September 9, 2022

BigQuery Window Functions Guide: 3 Key Types + Full Examples

With <3 by Acuto

BigQuery is a great data warehouse that lets you quickly and easily search through billions of rows of data in just a few seconds. Both enterprises and startups find it practical and manageable because you can use standard SQL language to query data in BigQuery.

The fully managed data warehouse even comes with a free version and a sandbox version where you can test queries for free. The sandbox version doesn’t even require a credit card to be used. There’s a learning curve that comes with its long list of functions, clauses, and other syntax components, but you don’t need to know everything before you start working with BigQuery.

As experts in BigQuery who build and run data warehouses to make business operations easier, we’ll explain in this comprehensive article how you can use some of the most important window functions in BigQuery. Let’s get started.

What are Google BigQuery Window Functions?

The BigQuery window functions are used to calculate results for a certain group of rows found in tables. The term “window” in the name of these functions might seem a bit odd, but what it actually stands for is a “window of data.” There’s another type of functions known as aggregate functions, which in a way resemble window functions because they’re both concerned with computing data from groups of rows.

The window functions return a certain value for each row, while the aggregate functions return a certain value for all rows (more on this below). Using the window functions, it’s easy to get rid of duplicates or keep track of the totals in a set of rows or a subset of rows.

You’ll probably come across window functions as analytics functions as well. Both terms mean the same. Learning how to use them can help unlock the full power of your data. But first, let’s define these types of functions and what they actually do.

Why Use BigQuery Window Functions?

With BigQuery’s window functions, it’s very easy to find useful data in your tables by filtering them. As long as this data has a common denominator or is of a similar format, you can always make calculations on it and extract useful metrics from it. The window functions can be used in many different ways that depend on the user. But some of the main general use cases are:

Calculating moving averages,
Calculating changes between data in rows,
Calculating totals and deriving cumulative values,
Ranking data rows based on groups or by certain criteria, etc.

Being that BigQuery has a versatile syntax, there are other functions that can be used to perform similar tasks. For instance, aggregate functions are seen as closely similar to window functions, so it’s important to clarify the distinction between the two.

BigQuery Window (Analytic) vs Aggregate Functions

Aggregate functions are employed to calculate the sum of a group of rows and then return a single value at the end. On the other hand, window functions are used to calculate the value of one or multiple rows while displaying a new result for each row.

Another major difference between the two types of functions lies in their syntax. Window functions are characterized by the clause OVER. All the other functions that seem similar to the window functions but don’t have the OVER clause are either aggregate or scalar functions.

You can also preserve rows of data when using window functions, but you cannot achieve the same with aggregate functions.

BigQuery Window Functions Syntax

BigQuery offers a wide variety of functions that can be used to filter through datasets and extract valuable insights. The window functions are frequently used for this, too.

The window functions syntax is as follows (sourced from Google’s official documentation):

function_name ( [ argument_list ] ) OVER over_clause

over_clause:

  { named_window | ( [ window_specification ] ) }

window_specification:

  [ named_window ]
  [ PARTITION BY partition_expression [, ...] ]
  [ ORDER BY expression [ { ASC | DESC }  ] [, ...] ]
  [ window_frame_clause ]

window_frame_clause:

  { rows_range } { frame_start | frame_between }

rows_range:

  { ROWS | RANGE }

However, looking at the syntax alone is not enough to understand the functions thoroughly. That’s why we’ve provided examples below to further explain how the functions work.

BigQuery Window Function Parameters and Notation Standards

The window functions seem hard to understand, so here’s a breakdown of all the clauses based on the brackets that surround them:

[] – optional clauses.
() – actual parentheses.
| – presence of a logical OR.
[, … ] – a list of comma-separated items that .
{} – a group of options.

The OVER clause makes the window functions notably different from other functions. That’s where you define the set of rows (a window frame) that you want to compute results for. The OVER keyword is used when writing queries in Standard SQL as one of the many differences in syntax between Legacy SQL and Standard SQL.

List of BigQuery Window Functions

Whether you’re searching for a function to calculate the moving averages or percentiles or to find a particular value within a dataset, the long list of window functions offers many options. These functions are mainly separated into three categories:

Aggregate BigQuery Functions

Didn’t we just state that analytic (window) functions are different from aggregate functions? Yes, that’s true. But we can still build window functions if we use the OVER clause in our function and complement it with the aggregate functions.

Function	Explanation
ARRAY_AGG	Outputs an array consisting of the values in the windowed rows.
SUM	Outputs the sum of non-NULL values for a selected range.
ANY_VALUE	Queries through the windowed rows and outputs a random row.
CORR	It gives the Pearson coefficient for these two values by using the first number as a dependent variable and the second number as an independent variable.
AVG	Outputs the average of the windowed rows (always for non-NULL values)
COUNT	Outputs the number of rows for the inputted range.
COUNTIF	Outputs the number of rows for the group of rows that respect a certain expression.
COVAR_POP	Outputs the population covariance for the inputted pairs.
COVAR_SAMP	Outputs the sample covariance for the inputted pairs of numbers.
MAX	Outputs the maximum value for the given non-NULL expressions.
MIN	Outputs the minimum value for the given non-NULL expressions.
ST_CLUSTERDBSCAN	Outputs a DBSCAN clustering for a selected column of geographies.
STDDEV_POP	Outputs the population standard deviation for the selected values.
STDDEV_SAMP	Outputs the sample standard deviation for the selected values.
STRING_AGG	Outputs the concatenation of the selected non-NULL values as a STRING value.
VAR_POP	Outputs the population variance for the selected values.
VAR_SAMP	Outputs the sample variance for the selected values.

As the main verb for these types of functions signifies, all of them involve performing some kind of addition with the selected values.

Now, let’s have a look at the second type of window functions, which are navigational functions.

Navigational BigQuery Functions

The navigational functions focus on computing a particular part of the data over a different row in the selected window frame from the present row.

Function	Explanation
FIRST_VALUE	Outputs the value of the initial row found in the current window frame.
LAST_VALUE	Outputs the value of the end row found in the current window frame.
NTH_VALUE	Outputs the value of the nth row found in the current window frame.
LEAD	Outputs the value of the subsequent row in the current window frame after having determined an offset for it.
LAG	Outputs the value of the preceding row in the current window frame after having determined an offset for it.
PERCENTILE_CONT	Outputs the percentile value with linear interpolation for a group of windowed values.
PERCENTILE_DISC	Outputs the percentile value for a group of discrete values.

Note: Even though the syntax for the navigational functions (especially the syntax of the OVER clause) could vary, the data type of the result is always the same.

Numbering BigQuery Functions

The numbering functions, which are the third type of window functions, assign integer values to the existing rows according to their position within the selected window.

FUNCTION	Explanation
ROW_NUMBER	Outputs the sequential row integer for each selected partition.
RANK	Outputs the rank for each row in the ordered partition, starting from 1.
DENSE_RANK	Outputs the rank for each row within the window partition, starting from 1.
PERCENT_RANK	Outputs the percentile rank of a specific row in a particular partition.
CUME_DIST	Outputs the relative rank of a row in a selected partition.
NTILE	Outputs the respective bucket number after grouping the rows into constant_integer_expression buckets according to the row ordering.

BigQuery Window Functions Examples

So far, we’ve explained 30 window functions that you can start using immediately. But that might be hard without first seeing some real-life examples. That’s why we’ve built a table within our dataset in the BigQuery Sandbox and ran six essential window functions in BigQuery.

Example table with client data

MAX and MIN BigQuery Examples

Initially, we’ll start off with the MIN function. The function is expected to return the minimal values within the partitions. The syntax is quite simple:

MIN BigQuery Syntax:

MIN(expression) [OVER (...)]

The complete query for our table would be:

SELECT
  Solution, 
  Client,
  Price_in_USD,
  MIN(Price_in_USD) OVER (PARTITION BY Client) AS Lowest_package_first
FROM
  `acuto-bigquery.clients.test_table_clients`;

Now, let’s execute this function on the table we created.

Running a MIN function in BigQuery

As expected, the query partitioned the data by client and ranked the price rows from lowest to highest.

Now, let’s test out the MAX function, which is supposed to return the opposite (the maximum values).

MAX BigQuery Syntax:

MAX(expression)[OVER (...)]

The query for this case has only been slightly modified compared to the previous one.

SELECT
  Solution, 
  Client,
  Price_in_USD,
  MAX(Price_in_USD) OVER (PARTITION BY Client) AS Highest_package_first
FROM
  `acuto-bigquery.clients.test_table_clients`;

Let’s see how it works in practice:

Running a MAX function in BigQuery

As expected, the rows are ordered by price from most expensive to least expensive, with the most expensive solutions for each partition at the top.

LAG and LEAD BigQuery Examples

Operations done on datasets can be very different, and sometimes you may need to know what values came before and after each row. That’s what the LAG and LEAD functions in BigQuery do, as we also explained in the table about navigational functions above.

So, let’s start with the LAG function. We’ll use the same table as we did with the MIN and MAX examples.

LAG BigQuery Function Syntax

The LAG function provides the values of the preceding row of the specified row. Its syntax looks like this:

LAG (value_expression[, offset [, default_expression]])

Now let’s write a query to execute on our table. Using the rules, the query would look like this:

SELECT 
  Solution,
  Client, 
  Price_in_USD,
  LAG(Price_in_USD,1) OVER (PARTITION BY Solution ORDER BY Price_in_USD) AS Previous_solution
FROM
  `acuto-bigquery.clients.test_table_clients`;

Now, let’s run the query and see what it outputs:

Running a LAG function in BigQuery

The query returned a column pointing at the previous solution, as expected.

LEAD BigQuery Function Syntax

The LEAD function does the opposite of the LAG function. It outputs the subsequent row coming after the specified row. The syntax is quite similar to the LAG function syntax:

LEAD (value_expression[, offset [, default_expression]])

The changes we need to make in the syntax are quite minimal here, too. Our query for this example would be:

SELECT 
  Solution,
  Client, 
  Price_in_USD,
  LEAD(Price_in_USD,1) OVER (PARTITION BY Solution ORDER BY Price_in_USD) AS Previous_solution
FROM
  `acuto-bigquery.clients.test_table_clients`;

Now, let’s run this query and see what it outputs:

Running a LEAD function in BigQuery

RANK and NTILE BigQuery Examples

These are two of the most commonly used numbering functions in BigQuery. They assign ranks to rows based on the value of rows within the partitions.

RANK BigQuery Function Syntax

The rank function doesn’t require any specific conditions. Therefore, its syntax is simple:

RANK() OVER clause

If we were to write a query for our table, it would appear like this:

SELECT

Client,

Solution,

Price_in_USD,

RANK() OVER (PARTITION BY Client ORDER BY Price_in_USD) AS Project_Nr

FROM

`acuto-bigquery.clients.test_table_clients`;

Now, let’s see what results it produces:

Running a RANK function in BigQuery

Each one of the rows is being assigned a number, cumulating the number of projects for each client.

What about the NTILE function? Let’s have a look at it.

NTILE BigQuery Function Syntax

The NTILE function is similar to the RANK function since it assigns ranks to rows within partitions, but with a slight difference: it packs the rows into nearly similar-sized buckets. Its syntax looks like this:

NTILE(constant_integer_expression) OVER over_clause

Our query for this example would be:

SELECT

Solution,

Price_in_USD,

NTILE(5) OVER (PARTITION BY Solution ORDER BY Price_in_USD) AS Buckets

FROM

`acuto-bigquery.clients.test_table_clients`;

Now, let’s run it and see the results that it outputs:

Running an NTILE function in BigQuery

Conclusion

Writing queries in BigQuery can be daunting if you don’t have extensive SQL knowledge in order to produce the results that you want to extract from your data. In this article, we’ve explored some of the most commonly used window functions in BigQuery used for this purpose.

These analytic (window) functions are divided into three main subsets: numbering, navigational and aggregate functions. Each of them yields particular information from your datasets.

However, their syntax could take hours to master. But you can avoid spending countless hours writing these queries to just generate a few reports.

We can build a data warehouse and manage it for you while you spend your time on tasks that are hard to automate. Furthermore, if you want to save more time by automating other time-consuming daily tasks, Acuto can build customized automation scripts for you.

Schedule a call with our team in less than a minute to learn more about our services!