Google BigQuery allows its users to process and analyze vast amounts of data in seconds. Even if you have terabytes of data, you can query it in just minutes.
The data warehouse provided by Google is equipped with a built-in query engine that allows anyone to quickly run queries without worrying about maintaining the warehouse infrastructure. The web service is accessible via Google Cloud Platform and offers many features that make working with data and analysing it much easier.
Unifying the query results into one dataset is a task that comes up pretty commonly while working with Google BigQuery and databases in general. When using BigQuery, you can do this by writing UNION queries. Today we’ll explore how you can create UNION queries using both the Standard SQL syntax and the Legacy SQL syntax. Let’s get started.
Google BigQuery UNION Overview
For those who have worked with SQL queries before, the BigQuery UNION type should be familiar. At its core, the fundamental functionality of this query function is to bring together the data from multiple result sets and unify them in a vertical manner.
However, BigQuery UNION offers two options for processing the duplicate data. By adding either ALL or DISTINCT keywords in the query, users can decide to either keep the duplicate values in the final results or have them removed.
For instance, you might have information for each client in your client database organised into tables by industry, and you’d prefer to have it all in one unified list. To save the time it takes to query each table, using the UNION operator, you can retrieve results using one query.
Let’s further explore how this query can save the day.
BigQuery UNION ALL and UNION DISTINCT
As discussed above, the BigQuery UNION can be utilised in two ways: as UNION DISTINCT and UNION ALL. Both of them produce specific results. Depending on your needs, you can select the query that will produce the best results.
Both queries can save you time and get you the data results you want without having to input multiple queries. Their names are self-explanatory: UNION ALL implies that there’s an inclusion of all values, and UNION DISTINCT implies that there’s an exclusion of non-unique values.
Differences Between BigQuery UNION ALL and UNION DISTINCT
Let’s see in what aspects the UNION ALL and UNION DISTINCT queries differ from each other in terms of speed and performance. Here’s an overview of the main differences to help you get a clearer picture regarding their use:
|UNION ALL||UNION DISTINCT|
|Displays duplicate values||Displays only unique values|
|Performs faster||Performs slower|
|Returns all queries and tables||Returns only the specified records from tables and queries|
|Results are output quickly||Results are output slowly|
What does their syntax look like when used practically? Let’s find out.
BigQuery UNION ALL Syntax
When your aim is to return all results when joining two tables into a vertical column, then UNION ALL is perfect for the job. Its syntax is clear and logical:
SELECT *column name* FROM `*respective table*` UNION ALL SELECT *column name* FROM `*respective table*`;
Let’s head over to Google Cloud Console (GCC) and try the Bigquery Sandbox to test the syntax.
|Pro Tip: Creating tables manually could result in an error. Try creating them in Google Sheets and then uploading them to GCC in a .csv format.|
We’re creating two example tables with data for our clients for two different months: August 2021 and August 2022. Here we’ll test the syntax and see what it returns.
Now, let’s use the UNION ALL query following the syntax to get the names of all clients from both months in one column:
SELECT Name FROM `acuto-bigquery.clients.clients_aug_2021 ` UNION ALL SELECT Name FROM `acuto-bigquery.clients.clients_aug_2022`
In just one run, we’ve brought all the client names into one column. And, as we can see, we do have duplicate client names. Let’s see if there’s something we can do about this.
BigQuery UNION DISTINCT Syntax
The syntax of UNION DISTINCT is straightforward too. The results we can expect are unique values and no duplicates. The syntax structure goes like this:
SELECT *column name* FROM `*respective table* ` UNION DISTINCT SELECT *column name* FROM `*respective table*`
Now, let’s run this query on the example tables that we created and see the result.
SELECT Name FROM `acuto-bigquery.clients.clients_aug_2021 ` UNION DISTINCT SELECT Name FROM `acuto-bigquery.clients.clients_aug_2022`
As we can see, we no longer have duplicate entries.
How to Write UNION Queries in Google BigQuery
How do you write UNION queries in Google BigQuery? There are two different ways of doing it, depending on the SQL syntax you prefer. Obviously, the results will be the same, but there are slight changes in syntax.
Comma-Delimited UNION using Legacy SQL
For those who prefer working with Legacy SQL, all you have to do is switch the settings and tweak the query a bit. To change the settings, head to the More menu above the console field, then > Query settings > Additional settings, and tick the Legacy box under SQL dialect.
As the name indicates, the Comma-Delimited Unions syntax uses commas to separate tables, and it’s shorter than the Standard SQL syntax. Once you have selected which tables you want to work with, you have to list them, separated by a comma, and then hit “Run”.
Legacy SQL BigQuery Union Example
Let’s re-run the query using the Legacy SQL syntax. According to the rules, its syntax is:
SELECT *column name* FROM [*respective table1* ], [*respective table2*] ORDER BY *column name*
SELECT Name FROM [acuto-bigquery:clients.clients_aug_2021 ], [acuto-bigquery:clients.clients_aug_2022] ORDER BY Name
Tip: Keep an eye on the formatting changes between Legacy SQL and Standard SQL. Also, note that adding an extra space at the end of the first table name (here, for example, after “…aug_2021” could prevent the error that is otherwise raised “Not found: Dataset my-project-name:domain_public was not found in location [country]”)
UNION using Standard SQL
When we talk about Standard SQL, we’re talking about the BigQuery Standard SQL we used at the beginning of the article as an example. What makes it special is the use of two additional keywords, DISTINCT and ALL.
Standard SQL BigQuery Union Example
To illustrate the use of Standard SQL, let’s demonstrate another example using the tables we’ve built. We’re using the UNION DISTINCT query for the “Lead_type” field values.
SELECT Lead_type FROM `acuto-bigquery.clients.clients_aug_2021 ` UNION DISTINCT SELECT Lead_type FROM `acuto-bigquery.clients.clients_aug_2022`
As expected, we do have only two distinct values in the column Lead_type, so that’s what we see in the results.
BigQuery UNION Rules to Keep in Mind
Since the launch of BigQuery Standard SQL in 2016, which is ANSI compliant, the dialect has turned into the preferred SQL dialect for BigQuery. The UNION operator is one of the most useful queries, but it can’t always be employed. There are a few rules that should be respected:
- The respective columns must contain the same data type.
- Statements in queries should have a uniform flow.
- Columns should have uniform layouts and lengths.
Even though both SQL dialects have a lot in common, migrating from Legacy SQL to Standard SQL is not difficult. Google even provides the option to translate queries from Legacy SQL, Snowflake, and MySQL, among others, into Standard SQL in the console.
BigQuery UNION Use Cases
The BigQuery UNION operator can be useful in many scenarios. We’ve picked three of the most common and useful cases. Here’s a short introduction to each of them:
1. BigQuery UNION for Multiple Tables
In real life case scenarios, practical and useful data is stored in numerous tables. So, you might need to bring together data from columns in different tables. And that’s something you can do using BigQuery UNION.
You can merge data from several tables by repeating the same line of code:
SELECT *column name* FROM `*respective table_1*` UNION ALL SELECT *column name* FROM `*respective table_2*` UNION ALL SELECT *column name* FROM `*respective table_n*`...
As an example, we’ll create a third table with clients in August 2020:
Now let’s get the Name of the clients from all 3 tables and the provided Solutions for each client, and merge them into one table:
SELECT Name FROM `acuto-bigquery.clients.clients_aug_2020` UNION ALL SELECT Name FROM `acuto-bigquery.clients.clients_aug_2021 ` UNION ALL SELECT Name FROM `acuto-bigquery.clients.clients_aug_2022`
2. BigQuery UNION for Tables with Multiple Columns
As you’ve seen from the previous example, BigQuery UNION queries can be used to unify data from not only one column, but multiple. This leads us to another common use of the query, which is to combine data from multiple columns. Let’s try to get all the columns from the three tables above:
SELECT Name, Solution, Lead_type FROM `acuto-bigquery.clients.clients_aug_2020` UNION ALL SELECT Name, Solution, Lead_type FROM `acuto-bigquery.clients.clients_aug_2021 ` UNION ALL SELECT Name, Solution, Lead_type FROM `acuto-bigquery.clients.clients_aug_2022`
Now we’ve gathered data from three months into one table.
3. BigQuery UNION Use with NULL Values
The UNION operator can include even NULL values if they’re part of the datasets. But they may lead to confusing outcomes. So, it’s important that our queries identify and display them instead of returning an error when these values are present. BigQuery UNION makes this feasible.
Let’s create another table with null values and see how they are displayed in the results when we merge the tables’ columns from one of our tables:
Then let’s run the query and see what happens:
SELECT Name, Solution, Lead_type FROM `acuto-bigquery.clients.clients_aug_2020` UNION ALL SELECT Name, Solution, Lead_type FROM `acuto-bigquery.clients.clients_null`
As we can see, the NULL values appear respectively in the final results.
With a simple syntax and functioning on a serverless database, BigQuery runs on powerful algorithms that allow you to analyse data and extract insights with ease.
BigQuery allows you to analyse large amounts of data without having to install anything on your own computer, saving you an enormous amount of time.
Tasks can easily become repetitive, but at Acuto, we can help you automate tedious tasks when working with Google Sheets, bidding on ads or other processes. We ship efficient scripts and automation solutions for businesses like yours. Get in touch with us to discover more.