Working with JSON in Redshift

2 min read

When working with data warehouses, it’s common to have structured data stored within a table as a JSON blob. Until recently, extracting data from JSON in Redshift was extremely cumbersome. This tutorial shows you a new, easier way of working with JSON in Redshift.

Our Dataset

For this tutorial, we’re going to assume we work for a company that teaches people how to code. When someone signs up for the company, we ask them a few questions and then recommend two languages for them. The data is stored in a table called recommendations shown below:

SELECT * FROM recommendations

user_id	recommendations
1	`[{"language_id": 1, "language_name": "Python"}, {"language_id": 2, "language_name": "SQL"}]`
2	`[{"language_id": 3, "language_name": "R"}, {"language_id": 2, "language_name": "SQL"}]`
3	`[{"language_id": 2, "language_name": "SQL"}, {"language_id": 1, "language_name": "Python"}]`
4	`[{"language_id": 2, "language_name": "SQL"}, {"language_id": 3, "language_name": "R"}]`
5	`[{"language_id": 3, "language_name": "R"}, {"language_id": 1, "language_name": "Python"}]`

JSON and Redshift: The Old (Hard) Way

Up until recently, working with JSON data in Redshift was very difficult. We’d have to use two functions to extract the data:

JSON_EXTRACT_PATH_TEXT, which extracts values from JSON key-value pairs.
JSON_EXTRACT_ARRAY_ELEMENT_TEXT(), which extracts values from JSON arrays.

Let’s look at how we’d extract the data into columns using these functions:

SELECT
    user_id,
    JSON_EXTRACT_PATH_TEXT(
        JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
            recommendations, 0), 'language_id') AS rec_1_id,
    JSON_EXTRACT_PATH_TEXT(
        JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
            recommendations, 0), 'language_name') AS rec_1_name,
    JSON_EXTRACT_PATH_TEXT(
        JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
            recommendations, 1), 'language_id') AS rec_2_id,
    JSON_EXTRACT_PATH_TEXT(
        JSON_EXTRACT_ARRAY_ELEMENT_TEXT(
            recommendations, 1), 'language_name') AS rec_2_name
FROM recommendations

user_id	rec_1_id	rec_1_name	rec_2_id	rec_2_name
1	1	Python	2	SQL
2	3	R	2	SQL
3	2	SQL	1	Python
4	2	SQL	3	R
5	3	R	1	Python

This code is hard to write, read, and maintain! Luckily, there is now a better way.

JSON and Redshift: The New (Better) Way

In April 2021, Redshift announced the SUPER type, and with it better JSON support. This brings it inline with Snowflake and Bigquery in terms of ease of use.

To work with the SUPER type, we first have to use the (at time of writing) undocumented JSON_PARSE() function to convert our JSON string into the SUPER type. Once we’ve done that, we can use brackets ([]) to retrieve and extract our JSON data. Let’s take a look:

WITH recs_super AS (
    SELECT
        user_id,
        JSON_PARSE(recommendations) AS recommendations
    FROM recommendations
)

SELECT
    user_id,
    recommendations[0]['language_id'] AS rec_1_id,
    recommendations[0]['language_name'] AS rec_1_name,
    recommendations[1]['language_id'] AS rec_2_id,
    recommendations[1]['language_name'] AS rec_2_name
FROM recs_super

user_id	rec_1_id	rec_1_name	rec_2_id	rec_2_name
1	1	Python	2	SQL
2	3	R	2	SQL
3	2	SQL	1	Python
4	2	SQL	3	R
5	3	R	1	Python

The code is clear and easy to understand!

As you can see, this new type makes working with JSON in Redshift dramatically easier. To find out more, you can consult the Redshift documentation on the SUPER type.