BigQuery Window Functions

BigQuery Window Functions Guide: 3 Key Types + Full Examples

BigQuery is a great data warehouse that lets you quickly and easily search through billions of rows of data in just a few seconds. Both enterprises and startups find it practical and manageable because you can use standard SQL language to query data in BigQuery. 

The fully managed data warehouse even comes with a free version and a sandbox version where you can test queries for free. The sandbox version doesn’t even require a credit card to be used. There’s a learning curve that comes with its long list of functions, clauses, and other syntax components, but you don’t need to know everything before you start working with BigQuery.

As experts in BigQuery who build and run data warehouses to make business operations easier, we’ll explain in this comprehensive article how you can use some of the most important window functions in BigQuery. Let’s get started. 

What are Google BigQuery Window Functions?  

The BigQuery window functions are used to calculate results for a certain group of rows found in tables. The term “window” in the name of these functions might seem a bit odd, but what it actually stands for is a “window of data.” There’s another type of functions known as aggregate functions, which in a way resemble window functions because they’re both concerned with computing data from groups of rows.

The window functions return a certain value for each row, while the aggregate functions return a certain value for all rows (more on this below). Using the window functions, it’s easy to get rid of duplicates or keep track of the totals in a set of rows or a subset of rows. 

You’ll probably come across window functions as analytics functions as well. Both terms mean the same. Learning how to use them can help unlock the full power of your data. But first, let’s define these types of functions and what they actually do. 

Why Use BigQuery Window Functions? 

With BigQuery’s window functions, it’s very easy to find useful data in your tables by filtering them. As long as this data has a common denominator or is of a similar format, you can always make calculations on it and extract useful metrics from it. The window functions can be used in many different ways that depend on the user. But some of the main general use cases are: 

  • Calculating moving averages, 
  • Calculating changes between data in rows,
  • Calculating totals and deriving cumulative values,
  • Ranking data rows based on groups or by certain criteria, etc.

Being that BigQuery has a versatile syntax, there are other functions that can be used to perform similar tasks. For instance, aggregate functions are seen as closely similar to window functions, so it’s important to clarify the distinction between the two. 

BigQuery Window (Analytic) vs Aggregate Functions  

Aggregate functions are employed to calculate the sum of a group of rows and then return a single value at the end. On the other hand, window functions are used to calculate the value of one or multiple rows while displaying a new result for each row

Another major difference between the two types of functions lies in their syntax. Window functions are characterized by the clause OVER. All the other functions that seem similar to the window functions but don’t have the OVER clause are either aggregate or scalar functions

You can also preserve rows of data when using window functions, but you cannot achieve the same with aggregate functions. 

BigQuery Window Functions Syntax  

BigQuery offers a wide variety of functions that can be used to filter through datasets and extract valuable insights. The window functions are frequently used for this, too. 

The window functions syntax is as follows (sourced from Google’s official documentation): 

function_name ( [ argument_list ] ) OVER over_clause

over_clause:

  { named_window | ( [ window_specification ] ) }

window_specification:

  [ named_window ]
  [ PARTITION BY partition_expression [, ...] ]
  [ ORDER BY expression [ { ASC | DESC }  ] [, ...] ]
  [ window_frame_clause ]

window_frame_clause:

  { rows_range } { frame_start | frame_between }

rows_range:

  { ROWS | RANGE }

However, looking at the syntax alone is not enough to understand the functions thoroughly. That’s why we’ve provided examples below to further explain how the functions work.

BigQuery Window Function Parameters and Notation Standards 

The window functions seem hard to understand, so here’s a breakdown of all the clauses based on the brackets that surround them: 

  • [] – optional clauses. 
  • () – actual parentheses. 
  • | – presence of a logical OR. 
  • [, … ] – a list of comma-separated items that . 
  • {} – a group of options. 

The OVER clause makes the window functions notably different from other functions. That’s where you define the set of rows (a window frame) that you want to compute results for. The OVER  keyword is used when writing queries in Standard SQL as one of the many differences in syntax between Legacy SQL and Standard SQL

List of BigQuery Window Functions  

Whether you’re searching for a function to calculate the moving averages or percentiles or to find a particular value within a dataset, the long list of window functions offers many options. These functions are mainly separated into three categories: 

Aggregate BigQuery Functions  

Didn’t we just state that analytic (window) functions are different from aggregate functions? Yes, that’s true. But we can still build window functions if we use the OVER clause in our function and complement it with the aggregate functions. 

FunctionExplanation
ARRAY_AGGOutputs an array consisting of the values in the windowed rows. 
SUMOutputs the sum of non-NULL values for a selected range.  
ANY_VALUEQueries through the windowed rows and outputs a random row.  
CORRIt gives the Pearson coefficient for these two values by using the first number as a dependent variable and the second number as an independent variable.
AVGOutputs the average of the windowed rows (always for non-NULL values)
COUNTOutputs the number of rows for the inputted range. 
COUNTIFOutputs the number of rows for the group of rows that respect a certain expression.
COVAR_POPOutputs the population covariance for the inputted pairs.
COVAR_SAMPOutputs the sample covariance for the inputted pairs of numbers.   
MAXOutputs the maximum value for the given non-NULL expressions.
MINOutputs the minimum value for the given non-NULL expressions.
ST_CLUSTERDBSCANOutputs a DBSCAN clustering for a selected column of geographies.
STDDEV_POPOutputs the population standard deviation for the selected values.
STDDEV_SAMPOutputs the sample standard deviation for the selected values. 
STRING_AGGOutputs the concatenation of the selected non-NULL values as a STRING value. 
VAR_POPOutputs the population variance for the selected values.  
VAR_SAMPOutputs the sample variance for the selected values. 

As the main verb for these types of functions signifies, all of them involve performing some kind of addition with the selected values.

Now, let’s have a look at the second type of window functions, which are navigational functions. 

Navigational BigQuery Functions 

The navigational functions focus on computing a particular part of the data over a different row in the selected window frame from the present row. 

Function Explanation
FIRST_VALUEOutputs the value of the initial row found in the current window frame.
LAST_VALUEOutputs the value of the end row found in the current window frame.
NTH_VALUEOutputs the value of the nth row found in the current window frame. 
LEADOutputs the value of the subsequent row in the current window frame after having determined an offset for it. 
LAGOutputs the value of the preceding row in the current window frame after having determined an offset for it.  
PERCENTILE_CONTOutputs the percentile value with linear interpolation for a group of windowed values. 
PERCENTILE_DISCOutputs the percentile value for a group of discrete values.  
Note: Even though the syntax for the navigational functions (especially the syntax of the OVER clause) could vary, the data type of the result is always the same.

Numbering BigQuery Functions  

The numbering functions, which are the third type of window functions, assign integer values to the existing rows according to their position within the selected window. 

FUNCTIONExplanation
ROW_NUMBEROutputs the sequential row integer for each selected partition.
RANKOutputs the rank for each row in the ordered partition, starting from 1.  
DENSE_RANKOutputs the rank for each row within the window partition, starting from 1. 
PERCENT_RANKOutputs the percentile rank of a specific row in a particular partition. 
CUME_DISTOutputs the relative rank of a row in a selected partition.
NTILEOutputs the respective bucket number after grouping the rows into constant_integer_expression buckets according to the row ordering.

BigQuery Window Functions Examples  

So far, we’ve explained 30 window functions that you can start using immediately. But that might be hard without first seeing some real-life examples. That’s why we’ve built a table within our dataset in the BigQuery Sandbox and ran six essential window functions in BigQuery.

Example table with client data

Example table with client data

MAX and MIN BigQuery Examples  

Initially, we’ll start off with the MIN function. The function is expected to return the minimal values within the partitions. The syntax is quite simple: 

MIN BigQuery Syntax: 

MIN(expression) [OVER (...)]

The complete query for our table would be: 

SELECT
  Solution, 
  Client,
  Price_in_USD,
  MIN(Price_in_USD) OVER (PARTITION BY Client) AS Lowest_package_first
FROM
  `acuto-bigquery.clients.test_table_clients`;

Now, let’s execute this function on the table we created. 

Running a MIN function in BigQuery 

Running a MIN function in BigQuery

As expected, the query partitioned the data by client and ranked the price rows from lowest to highest.

Now, let’s test out the MAX function, which is supposed to return the opposite (the maximum values). 

MAX BigQuery Syntax: 

MAX(expression)[OVER (...)]

The query for this case has only been slightly modified compared to the previous one. 

SELECT
  Solution, 
  Client,
  Price_in_USD,
  MAX(Price_in_USD) OVER (PARTITION BY Client) AS Highest_package_first
FROM
  `acuto-bigquery.clients.test_table_clients`;

Let’s see how it works in practice: 

Running a MAX function in BigQuery 

Running a MAX function in BigQuery 

As expected, the rows are ordered by price from most expensive to least expensive, with the most expensive solutions for each partition at the top. 

LAG and LEAD BigQuery Examples  

Operations done on datasets can be very different, and sometimes you may need to know what values came before and after each row. That’s what the LAG and LEAD functions in BigQuery do, as we also explained in the table about navigational functions above.

So, let’s start with the LAG function. We’ll use the same table as we did with the MIN and MAX examples. 

LAG BigQuery Function Syntax

The LAG function provides the values of the preceding row of the specified row. Its syntax looks like this: 

LAG (value_expression[, offset [, default_expression]])

Now let’s write a query to execute on our table. Using the rules, the query would look like this: 

SELECT 
  Solution,
  Client, 
  Price_in_USD,
  LAG(Price_in_USD,1) OVER (PARTITION BY Solution ORDER BY Price_in_USD) AS Previous_solution
FROM
  `acuto-bigquery.clients.test_table_clients`;

Now, let’s run the query and see what it outputs: 

Running a LAG function in BigQuery 

Running a LAG function in BigQuery 

The query returned a column pointing at the previous solution, as expected. 

LEAD BigQuery Function Syntax

The LEAD function does the opposite of the LAG function. It outputs the subsequent row coming after the specified row. The syntax is quite similar to the LAG function syntax: 

LEAD (value_expression[, offset [, default_expression]])

The changes we need to make in the syntax are quite minimal here, too. Our query for this example would be: 

SELECT 
  Solution,
  Client, 
  Price_in_USD,
  LEAD(Price_in_USD,1) OVER (PARTITION BY Solution ORDER BY Price_in_USD) AS Previous_solution
FROM
  `acuto-bigquery.clients.test_table_clients`;

Now, let’s run this query and see what it outputs: 

Running a LEAD function in BigQuery

Running a LEAD function in BigQuery 

RANK and NTILE BigQuery Examples  

These are two of the most commonly used numbering functions in BigQuery. They assign ranks to rows based on the value of rows within the partitions. 

RANK BigQuery Function Syntax

The rank function doesn’t require any specific conditions. Therefore, its syntax is simple: 

RANK() OVER clause

If we were to write a query for our table, it would appear like this: 

SELECT 

  Client,

  Solution,

  Price_in_USD,

  RANK() OVER (PARTITION BY Client ORDER BY Price_in_USD) AS Project_Nr

FROM

  `acuto-bigquery.clients.test_table_clients`;

Now, let’s see what results it produces: 

Running a RANK function in BigQuery 

Running a RANK function in BigQuery 

Each one of the rows is being assigned a number, cumulating the number of projects for each client.

What about the NTILE function? Let’s have a look at it. 

NTILE BigQuery Function Syntax

The NTILE function is similar to the RANK function since it assigns ranks to rows within partitions, but with a slight difference: it packs the rows into nearly similar-sized buckets. Its syntax looks like this: 

NTILE(constant_integer_expression) OVER over_clause

Our query for this example would be: 

SELECT 

  Solution,

  Price_in_USD,

  NTILE(5) OVER (PARTITION BY Solution ORDER BY Price_in_USD) AS Buckets

FROM

  `acuto-bigquery.clients.test_table_clients`;

Now, let’s run it and see the results that it outputs: 

Running an NTILE function in BigQuery

Running an NTILE function in BigQuery 

Conclusion 

Writing queries in BigQuery can be daunting if you don’t have extensive SQL knowledge in order to produce the results that you want to extract from your data. In this article, we’ve explored some of the most commonly used window functions in BigQuery used for this purpose. 

These analytic (window) functions are divided into three main subsets: numbering, navigational and aggregate functions. Each of them yields particular information from your datasets.

However, their syntax could take hours to master. But you can avoid spending countless hours writing these queries to just generate a few reports.

We can build a data warehouse and manage it for you while you spend your time on tasks that are hard to automate. Furthermore, if you want to save more time by automating other time-consuming daily tasks, Acuto can build customized automation scripts for you. 

Schedule a call with our team in less than a minute to learn more about our services!

Recent Posts
Recent Posts