YAML is a language commonly used to create configuration files. A common problem when creating YAML files is a need to repeat segments of your code. This can make your files harder to maintain.
In this post, we’ll teach you how to define a YAML block for reuse later in that same file. We’ll use an example from dbt, but the same principle applies whether you’re defining docker-compose files, CI pipelines, or any other place that YAML is used.
YAML has six main constructs: strings, integers, floats, booleans, lists, and dictionaries.
Indentation is used to define the structure of YAML files. The example below shows these constructs in use (comments start with the # character):
- string
- 3 # integer
- 2.5 # float
- dict_key_1: dict_value_1
  dict_key_2: true # boolean
  dict_key_3: dict_value_3
- an original list item
For more on the fundamentals of YAML, we recommend this tutorial.
One area where YAML can be inefficient is when you have repeated sections. In dbt, YAML files are used to configure different resource types, most commonly database models. You can add tests to model columns, and you might see some repetition like this (we’ve added comments to make it clearer):
version: 2
models:
  - name: model_one
    columns:
      - name: id
        tests: # this block
          - unique
          - not_null
      - name: col_a
      - name: col_b
  - name: model_two
    columns:
      - name: id
        tests: # is repeated down here
          - unique
          - not_null
      - name: col_c
      - name: col_d
In this example we have two models, and each has an id column. The tests added to this column are identical in both models, so we’re repeating ourselves! If this pattern shows up a lot, we’re doing more typing than we need to and making the file harder to maintain. It would be much more efficient to define this test block once, and then reuse it wherever we need it.
Luckily, YAML has a little-known but very handy feature that does exactly this. You define a block using an anchor, and then refer to it using an alias.
Anchors are denoted using an & character followed by the anchor name. Let’s look at a simple example of defining an anchor:
dict_key_1: dict_value_1
key_with_anchor_value: &anchor_name hello
You can see that the value that the anchor represents goes straight after the anchor name, as if the anchor didn’t exist.
To use that anchor, we specify an alias using the * character followed by the anchor name:
dict_key_1: dict_value_1
key_with_anchor_value: &anchor_name hello
key_with_alias_value: *anchor_name
The YAML above is the equivalent of this:
dict_key_1: dict_value_1
key_with_anchor_value: hello
key_with_alias_value: hello
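An anchor isn’t limited to a single alias. As a minimal sketch (the key names and the anchor name timeout_seconds are made up for illustration), here a float is anchored once and reused twice. Note that an alias has to appear after the anchor it refers to:

```yaml
# Hypothetical keys, chosen only for illustration.
default_timeout: &timeout_seconds 2.5  # anchor defined once, on a float
api_timeout: *timeout_seconds          # resolves to 2.5
db_timeout: *timeout_seconds           # the same anchor can be aliased again
```

Changing the 2.5 on the first line changes all three values, which is the whole point of defining it once.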
YAML anchors and aliases are relatively simple when you’re representing a simple value like a string or a float, but they can get a bit tricky when you’re representing a more complex construct like a dictionary or list.
The best way to remember is that the &anchor_name is followed by the block, just the same as if the anchor wasn’t there.
Let’s see this in action by creating a list as our anchor block.
dict_key_1: dict_value_1
key_with_anchor_value: &list_1
  - anchor_list_item_1
  - anchor_list_item_2
  - anchor_list_item_3
key_with_alias_value: *list_1
The YAML above is the equivalent of this:
dict_key_1: dict_value_1
key_with_anchor_value:
  - anchor_list_item_1
  - anchor_list_item_2
  - anchor_list_item_3
key_with_alias_value:
  - anchor_list_item_1
  - anchor_list_item_2
  - anchor_list_item_3
Let’s return to the example we used at the start: the repeated tests in our dbt model columns. We’ll use an anchor to define the test block once and an alias to use it again:
version: 2
models:
  - name: model_one
    columns:
      - name: id
        tests: &unique_not_null
          - unique
          - not_null
      - name: col_a
      - name: col_b
  - name: model_two
    columns:
      - name: id
        tests: *unique_not_null
      - name: col_c
      - name: col_d
If you want to see the equivalent of this, you can scroll up to the earlier code under the heading ‘Repeating YAML Sections’.
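The same pattern carries over to other tools that read YAML. Here is a rough sketch of how it might look in a docker-compose file. The service names, image names, and environment variables are all hypothetical, and the anchor name shared_env is just an illustration:

```yaml
# Hypothetical docker-compose file; services, images, and variables are
# placeholders invented for this example.
services:
  web:
    image: example/web:latest
    environment: &shared_env  # define the list once
      - LOG_LEVEL=info
      - TZ=UTC
  worker:
    image: example/worker:latest
    environment: *shared_env  # reuse the same list
```

Both services end up with the same two environment variables, but the list only needs to be edited in one place.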
In this tutorial we saw examples where a YAML anchor was a string and a list, but anchors can represent any of the YAML constructs.
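For instance, here is a minimal sketch of an anchor on a dictionary, using made-up keys in the same style as the earlier examples. As before, the block follows &dict_1 exactly as it would if the anchor weren’t there:

```yaml
# Hypothetical keys, for illustration only.
key_with_anchor_value: &dict_1
  nested_key_1: nested_value_1
  nested_key_2: true
key_with_alias_value: *dict_1

# Once the alias is resolved, this is the equivalent of:
# key_with_anchor_value:
#   nested_key_1: nested_value_1
#   nested_key_2: true
# key_with_alias_value:
#   nested_key_1: nested_value_1
#   nested_key_2: true
```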
It’s worth keeping in mind that there’s a tradeoff to using anchors. While sometimes you want to keep your code ‘DRY’, other times it’s better to be explicit, so use them with care. If you do decide to use them, give your anchors really descriptive names. It doesn’t cost any more to use extra characters, and it will help to make your configuration files easier to understand.