This tutorial shows you how to calculate percentages using Snowflake SQL.
Snowflake is, in my opinion, the best data warehousing solution on the market. It is easy to manage and the clear separation of storage and compute make for both performant and cost effective solution. Unless there is an extremely compelling reason to chose a competitor (Redshift, Bigquery) it’s what I recommend to most of my clients.
The flavor of SQL that Snowflake supports will be familiar to those who have used Redshift (or PostgreSQL) before. In most cases, you can write SQL as you ‘know it’ and it will be accepted fine. That said, there are many extra functions and ‘hidden gems’ in Snowflake. This post is part of a series of tutorials for lesser-known Snowflake functions.
This tutorial assumes you are already familiar with window functions.
Before we start, let’s look at the dataset we will be using for this tutorial:
select
player_id,
team,
points
from player_scores
PLAYER_ID | TEAM | POINTS |
---|---|---|
1 | red | 5 |
2 | red | 22 |
3 | red | 1 |
4 | blue | 17 |
5 | blue | 8 |
6 | blue | 6 |
Our data is about a fictional sporting game where each row represents a player, with columns for unique_id, their team color, and how many points they have scored.
If we wanted to work out what percentage of all points each player has scored, we can use sum()
and a window function:
select
player_id,
team,
points,
points / sum(points) over () as overall_points_pct
from player_scores
PLAYER_ID | TEAM | POINTS | OVERALL_POINTS_PCT |
---|---|---|---|
1 | red | 5 | 0.084746 |
2 | red | 22 | 0.372881 |
3 | red | 1 | 0.016949 |
4 | blue | 17 | 0.288136 |
5 | blue | 8 | 0.135593 |
6 | blue | 6 | 0.101695 |
The over()
in the query above tells the database to expand the ‘window’ in which the sum()
operates to the whole dataset.
In this tutorial, we’re going to use the ratio_to_report()
window function, a more sucinct way of expressing this logic.
Let’s look at how we would write our query above using ratio_to_report()
:
select
player_id,
team,
points,
ratio_to_report(points) over () as overall_points_pct
from player_scores
PLAYER_ID | TEAM | POINTS | OVERALL_POINTS_PCT |
---|---|---|---|
1 | red | 5 | 0.084746 |
2 | red | 22 | 0.372881 |
3 | red | 1 | 0.016949 |
4 | blue | 17 | 0.288136 |
5 | blue | 8 | 0.135593 |
6 | blue | 6 | 0.101695 |
The ratio_to_report()
window function takes an expression (usually a column) as input and calculates the ratio of that expression to the window that is defined (in this case, the whole dataset).
Let’s say we were not only interested in each players overall contribution to points, but also their contribution to points for their specific team. We can then add a partition by
to our window:
select
player_id,
team,
points,
ratio_to_report(points) over () as overall_points_pct,
ratio_to_report(points) over (partition by team) as team_points_pct
from player_scores
PLAYER_ID | TEAM | OVERALL_POINTS_PCT | TEAM_POINTS_PCT |
---|---|---|---|
1 | red | 0.084746 | 0.178571 |
2 | red | 0.372881 | 0.785714 |
3 | red | 0.016949 | 0.035714 |
4 | blue | 0.288136 | 0.548387 |
5 | blue | 0.135593 | 0.258065 |
6 | blue | 0.101695 | 0.193548 |
I hope that ratio_to_report()
is a useful addition to your Snowflake-SQL toolbox.
Although this tutorial focused on Snowflake, Redshift also supports the ratio_to_report()
window function. Bigquery only supports the function as part of its legacy syntax. Thanks to Erika Pullum who drew my attention the ratio_to_report()
function.