This tutorial shows you how to calculate percentages using Snowflake SQL.
Snowflake is, in my opinion, the best data warehousing solution on the market. It is easy to manage and the clear separation of storage and compute make for both performant and cost effective solution. Unless there is an extremely compelling reason to chose a competitor (Redshift, Bigquery) it’s what I recommend to most of my clients.
The flavor of SQL that Snowflake supports will be familiar to those who have used Redshift (or PostgreSQL) before. In most cases, you can write SQL as you ‘know it’ and it will be accepted fine. That said, there are many extra functions and ‘hidden gems’ in Snowflake. This post is part of a series of tutorials for lesser-known Snowflake functions.
This tutorial assumes you are already familiar with window functions.
Before we start, let’s look at the dataset we will be using for this tutorial:
select player_id, team, points from player_scores
Our data is about a fictional sporting game where each row represents a player, with columns for unique_id, their team color, and how many points they have scored.
If we wanted to work out what percentage of all points each player has scored, we can use
sum() and a window function:
select player_id, team, points, points / sum(points) over () as overall_points_pct from player_scores
over() in the query above tells the database to expand the ‘window’ in which the
sum() operates to the whole dataset.
In this tutorial, we’re going to use the
ratio_to_report() window function, a more sucinct way of expressing this logic.
Let’s look at how we would write our query above using
select player_id, team, points, ratio_to_report(points) over () as overall_points_pct from player_scores
ratio_to_report() window function takes an expression (usually a column) as input and calculates the ratio of that expression to the window that is defined (in this case, the whole dataset).
Let’s say we were not only interested in each players overall contribution to points, but also their contribution to points for their specific team. We can then add a
partition by to our window:
select player_id, team, points, ratio_to_report(points) over () as overall_points_pct, ratio_to_report(points) over (partition by team) as team_points_pct from player_scores
I hope that
ratio_to_report() is a useful addition to your Snowflake-SQL toolbox.
Although this tutorial focused on Snowflake, Redshift also supports the
ratio_to_report() window function. Bigquery only supports the function as part of its legacy syntax. Thanks to Erika Pullum who drew my attention the