Repeating blocks in YAML


YAML is a language commonly used to create configuration files. A common problem when writing YAML files is needing to repeat the same sections over and over, which makes the files harder to maintain.

In this post, we’ll show you how to define a YAML block once and reuse it later in the same file. We’ll use an example from dbt, but the same principle applies whether you’re writing docker-compose files, CI pipeline definitions, or any other kind of YAML.

What is YAML?

Technically, YAML (as of version 1.2) is a superset of JSON, which means that any valid JSON document is also valid YAML.
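As a quick illustration, the snippet below mixes JSON-style syntax (braces, brackets, and quoted strings) with YAML’s indentation-based style in the same file. A YAML 1.2 parser should read both keys as holding the same data. The key names are made up for this example.

```yaml
# JSON-style "flow" syntax is valid YAML...
json_style: {"name": "id", "tests": ["unique", "not_null"]}

# ...and describes the same data as this indentation-based "block" style
yaml_style:
  name: id
  tests:
    - unique
    - not_null
```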

YAML has six main constructs:

  • Lists
  • Dictionaries
  • Strings
  • Integers
  • Floats
  • Booleans

Indentation is used to define the structure of YAML files. Below is an example that uses each of these constructs (comments start with the # character):

- string
- 3 # integer
- 2.5 # float
- dict_key_1: dict_value_1
  dict_key_2: true # boolean
  dict_key_3: dict_value_3
- an original list item
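Because indentation defines the structure, these constructs can be nested inside one another. Here is a second, hypothetical example where the top level is a dictionary rather than a list (all key names are invented for illustration):

```yaml
# A dictionary at the top level
name: my_project   # string
retries: 3         # integer
threshold: 0.75    # float
enabled: false     # boolean
owners:            # a list nested inside the dictionary
  - alice
  - bob
settings:          # a dictionary nested inside the dictionary
  timeout: 30
  verbose: true
```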

For more on the fundamentals of YAML, we recommend this tutorial.

Repeating YAML Sections

One area where YAML can be inefficient is when you have repeated sections. In dbt, YAML files are used to configure different resource types, most commonly database models. You can add tests to model columns, and you might see some repetition like this (we’ve added comments to make it clearer):

version: 2

models:
  - name: model_one
    columns:
      - name: id
        tests: # this block
          - unique
          - not_null
      - name: col_a
      - name: col_b
  - name: model_two
    columns:
      - name: id
        tests: # is repeated down here
          - unique
          - not_null
      - name: col_c
      - name: col_d

In this example we have two models, and each has an id column. The tests added to this column are identical in both models, so we’re repeating ourselves! If this pattern shows up a lot, we end up typing more than we need to and making the file harder to maintain. It would be much more efficient to define this test block once and then reuse it wherever we need it.

Solving Repeated YAML Sections with Anchors and Aliases

Luckily, YAML has a little-known but very handy feature that does exactly this: you define a block once using an anchor, and then refer to it elsewhere using an alias.

Anchors are denoted by a & character followed by the anchor name. Let’s look at a simple example of defining an anchor:

dict_key_1: dict_value_1
key_with_anchor_value: &anchor_name hello

You can see that the value that the anchor represents goes straight after the anchor name, as if the anchor didn’t exist.

To use that anchor, we specify an alias using the * character followed by the anchor name:

dict_key_1: dict_value_1
key_with_anchor_value: &anchor_name hello
key_with_alias_value: *anchor_name

The YAML above is the equivalent of this:

dict_key_1: dict_value_1
key_with_anchor_value: hello
key_with_alias_value: hello
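Anchors work the same way for other simple values, such as integers and booleans. One thing to watch: an alias can only refer to an anchor that appears earlier in the file, so the anchor has to come first. The key and anchor names below are hypothetical.

```yaml
default_retries: &retries 3   # anchor on an integer
debug_enabled: &debug false   # anchor on a boolean

job_a:
  retries: *retries   # resolves to 3
  debug: *debug       # resolves to false
job_b:
  retries: *retries   # resolves to 3

# Using *retries above the line that defines &retries would be an error,
# because the alias would point at an anchor that hasn't been defined yet.
```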

A More Complex YAML Anchor Example

YAML anchors and aliases are straightforward when they represent a simple value like a string or a float, but they can get a bit tricky when they represent a more complex construct like a dictionary or a list.

The key thing to remember is that &anchor_name is followed by the block, exactly as it would be if the anchor weren’t there.

Let’s see this in action by creating a list as our anchor block.

dict_key_1: dict_value_1
key_with_anchor_value: &list_1
  - anchor_list_item_1
  - anchor_list_item_2
  - anchor_list_item_3
key_with_alias_value: *list_1

The YAML above is equivalent to this:

dict_key_1: dict_value_1
key_with_anchor_value:
  - anchor_list_item_1
  - anchor_list_item_2
  - anchor_list_item_3
key_with_alias_value:
  - anchor_list_item_1
  - anchor_list_item_2
  - anchor_list_item_3
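The same rule applies when the anchored block is a dictionary: the anchor goes where the value would start, and the indented key/value pairs follow on the next lines. Here’s a sketch with made-up keys. Note that the alias copies the whole dictionary as-is; an alias on its own doesn’t let you change individual keys.

```yaml
defaults: &default_settings
  timeout: 30
  retries: 3
  verbose: false
staging: *default_settings

# staging is equivalent to:
# staging:
#   timeout: 30
#   retries: 3
#   verbose: false
```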

Putting It All Together

Let’s return to the example we used at the start: the repeated tests in our dbt model columns. We’ll use an anchor to define the test block once, and an alias to reuse it:

version: 2

models:
  - name: model_one
    columns:
      - name: id
        tests: &unique_not_null
          - unique
          - not_null
      - name: col_a
      - name: col_b
  - name: model_two
    columns:
      - name: id
        tests: *unique_not_null
      - name: col_c
      - name: col_d

This is equivalent to the YAML shown earlier under the heading ‘Repeating YAML Sections’.

Conclusion

Don’t Repeat Yourself

In this tutorial we saw examples where a YAML anchor was a string and a list, but anchors can represent any of the YAML constructs.

It’s worth keeping in mind that there’s a tradeoff to using anchors. While sometimes you want to keep your code ‘DRY’, other times it’s better to be explicit, so use them with care. If you do decide to use them, give your anchors really descriptive names — it doesn’t cost any more to use extra characters, and it will help to make your configuration files easier to understand.