YAML files (and other data files such as JSON) are becoming more and more important in infrastructure deployments and projects. We often edit YAML files in a text editor, and a single mistake can have a big impact. Before anything is deployed to production, it should be validated, tested and verified. But how can we check that a YAML file is not only syntactically correct, but also has the correct data structure?
JSON Schema is probably the de facto standard for validating JSON data, and it can also be used for YAML files. A nice side effect is editor support: many text editors use the schema for autocompletion, tooltips and validation, which makes editing YAML files more pleasant and less error-prone.
This blog post only gives a general overview of JSON Schema and some examples. A good starting point for learning is "Understanding JSON Schema". There are also many good tools and libraries that help generate schemas; a list of implementations can be found on the JSON Schema website.
Let's start with a simple YAML file:
cat > urs.yaml <<EOF
---
name: urs
ipv4: 127.0.0.1
...
EOF
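For comparison, here is the same data written as JSON. The file name urs.json is hypothetical and only used for illustration; the mapping becomes a JSON object with two string values.

```shell
# Same data as urs.yaml, expressed as a JSON document (hypothetical file urs.json).
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > urs.json <<'EOF'
{
  "name": "urs",
  "ipv4": "127.0.0.1"
}
EOF
```

Since yajsv accepts JSON as well as YAML documents, urs.json can be checked against the same schema as urs.yaml.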
The YAML data can be validated with JSON Schema. Suppose we want YAML files containing a name and an IPv4 address. To validate the content, we need to describe it in a schema. A mapping in YAML corresponds to an object in JSON; in this case, the object has two properties, "name" and "ipv4", both of type "string".
cat > name_schema.json <<'EOF'
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "name": {
      "type": "string"
    },
    "ipv4": {
      "type": "string"
    }
  }
}
EOF
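For validation we use yajsv, a small command-line JSON Schema validator written in Go that also accepts YAML documents:
curl https://github.com/neilpa/yajsv/releases/download/v1.4.0/yajsv.linux.amd64 -o yajsv -L -s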
chmod +x yajsv
./yajsv -s name_schema.json urs.yaml
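To see "additionalProperties": false at work, a document with an undeclared key should be rejected. This is a minimal sketch: the file name urs_extra.yaml and the key hostname are made up for illustration, and it assumes yajsv exits with a non-zero status when a document fails validation.

```shell
# Hypothetical document with a key that name_schema.json does not declare.
cat > urs_extra.yaml <<'EOF'
---
name: urs
ipv4: 127.0.0.1
hostname: urs.example.com
...
EOF

# "additionalProperties": false should make this document fail.
if ./yajsv -s name_schema.json urs_extra.yaml; then
  echo "unexpected: urs_extra.yaml passed"
else
  echo "urs_extra.yaml did not pass"
fi
```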
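The schema can be made stricter: the name must match a regular expression (pattern), and the IP address must have the predefined format "ipv4".
cat > name_schema.json <<'EOF'
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "name": {
      "type": "string",
      "pattern": "^[A-Z].*$"
    },
    "ipv4": {
      "type": "string",
      "format": "ipv4"
    }
  }
}
EOF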
./yajsv -s name_schema.json urs.yaml
The pattern requires the name to start with a capital letter, so urs.yaml (name: urs) now fails, while the IP address passes the "ipv4" format check. After the name is corrected, the file passes validation again.
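The pattern can be tried out locally before it goes into the schema. This is a minimal sketch using grep -E; matches_name_pattern is a hypothetical helper. JSON Schema patterns follow ECMA-262 regular expressions while grep -E uses POSIX extended regular expressions, but for this simple pattern both behave the same. LC_ALL=C keeps [A-Z] limited to the ASCII capital letters.

```shell
# Hypothetical helper: succeeds if $1 matches the "name" pattern of name_schema.json.
matches_name_pattern() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Z].*$'
}

# Try a few candidate names.
for name in urs Urs "Jane Doe"; do
  if matches_name_pattern "$name"; then
    echo "$name: matches"
  else
    echo "$name: does not match"
  fi
done
```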
cat > urs.yaml <<EOF
---
name: Urs
ipv4: 127.0.0.1
...
EOF
./yajsv -s name_schema.json urs.yaml
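Besides validation keywords, JSON Schema defines annotations such as title, description, examples and $comment. They are not used for validation; they describe and self-document the schema, and tools such as editors use them, for example for tooltips and autocompletion.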
cat > name_schema.json <<'EOF'
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "schema/schemas/name.json",
  "type": "object",
  "$comment": "Only the defined properties are allowed",
  "additionalProperties": false,
  "properties": {
    "name": {
      "type": "string",
      "pattern": "^[A-Z].*$",
      "title": "Name",
      "description": "Name beginning with a capital letter",
      "examples": [
        "Jane Doe",
        "John Doe",
        "Jane"
      ]
    },
    "ipv4": {
      "type": "string",
      "format": "ipv4",
      "title": "IP Address",
      "description": "IPv4 Address belonging to the name",
      "examples": [
        "127.0.0.1",
        "10.11.12.13"
      ]
    }
  }
}
EOF
A more complex example uses arrays, required properties and a reusable Port definition that is referenced with $ref:
cat > service_schema.json <<'EOF'
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "services": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string"
          },
          "ports": {
            "type": "array",
            "items": {
              "$ref": "#/$defs/Port"
            }
          }
        },
        "required": [
          "name",
          "ports"
        ]
      }
    }
  },
  "required": [
    "services"
  ],
  "$defs": {
    "Port": {
      "type": "object",
      "properties": {
        "port": {
          "type": "integer"
        },
        "name": {
          "type": "string"
        },
        "targetPort": {
          "type": "integer"
        }
      },
      "required": [
        "port",
        "name"
      ]
    }
  }
}
EOF
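The following file uses a YAML anchor (&http) and an alias (*http) to reuse the port number 80 as targetPort:
cat > myService.yaml <<'EOF'
---
services:
  - name: app01
    ports:
      - name: http
        port: &http 80
      - port: 8080
        name: http_alt
        targetPort: *http
      - name: https
        port: 443
  - name: db01
    ports:
      - port: 5432
        name: sql
...
EOF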
./yajsv -s service_schema.json myService.yaml
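The required lists and the referenced Port definition also catch structural mistakes. This sketch uses a deliberately broken document (broken_service.yaml is a made-up name): the port is quoted, so it is a string instead of an integer, and the required port name is missing. It assumes yajsv exits with a non-zero status on failure.

```shell
# Hypothetical, intentionally invalid document:
#  - "port" is a quoted string, but the Port definition expects an integer
#  - the required "name" of the port entry is missing
cat > broken_service.yaml <<'EOF'
---
services:
  - name: web01
    ports:
      - port: "80"
...
EOF

# Expected to fail validation against service_schema.json.
if ./yajsv -s service_schema.json broken_service.yaml; then
  echo "unexpected: broken_service.yaml passed"
else
  echo "broken_service.yaml did not pass"
fi
```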
Generators provide a good starting point for creating a schema. Among the JSON Schema generators, https://app.quicktype.io/ is one of the most popular. The generator works on JSON samples, so YAML data needs to be converted first. Depending on the structure, a single JSON document can be used, or several JSON objects with the "Source type" Multiple JSON. In most cases the generated schema needs adjustments, such as adding semantic checks like pattern, format, enum or number ranges, but it shortens the time needed to create a schema enormously.
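The conversion from YAML to JSON can be scripted. This is a minimal sketch, assuming python3 with the PyYAML package is installed; yaml2json is a hypothetical helper name, and any other YAML-to-JSON converter would work just as well.

```shell
# Hypothetical helper: converts a single-document YAML file to pretty-printed
# JSON on stdout. Assumes python3 and the PyYAML package ("import yaml").
# yaml.safe_load resolves anchors and aliases, so the JSON contains plain values.
yaml2json() {
  python3 -c '
import json, sys, yaml
with open(sys.argv[1]) as f:
    json.dump(yaml.safe_load(f), sys.stdout, indent=2)
print()
' "$1"
}
```

For example, yaml2json myService.yaml > myService.json produces a JSON document that can be pasted into the generator.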
Many editors support JSON Schema for YAML files and thus provide autocompletion, tooltips and validation. This makes editing YAML files easier and less error-prone, since you get feedback before you even save the file. Many editors use the yaml-language-server implementation from Red Hat. The following examples were tested with VS Code and the YAML extension.
Like many other tools, the yaml-language-server supports the JSON Schema Store. A list of schemas with associated fileMatch patterns is retrieved from its API; if a file matches a pattern, the associated schema is used. For example, all YAML files under the path .github/workflows/*.yaml are automatically validated with the schema github-workflow.json.
To list the schemas that apply to YAML files, jq can be used. The following command shows only the first 15 lines of output:
curl https://www.schemastore.org/api/json/catalog.json -s | jq '.schemas[] | select(any(.fileMatch[]?; contains("yaml") or contains("yml"))) | {name, fileMatch}' 2>&1 | head -15
Schemas can also be stored on any web server, on the file system or in the project directory. In VS Code, the schema assignment is configured in the settings, either globally or per project.
In a project that contains the schemas, the .vscode/settings.json file might look like this, using relative paths:
{
  "yaml.schemas": {
    "schema/schemas/hosts.json": [
      "host*.yaml",
      "host*.yml"
    ],
    "schema/schemas/groups.json": [
      "group*.yaml",
      "group*.yml"
    ],
    "schema/schemas/defaults.json": [
      "default.yaml",
      "default.yml"
    ]
  }
}
Because the schemas are included in the project, it is easy to use them in the CI/CD pipeline.
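A CI job can reuse the same schema assignment. This is a minimal sketch in POSIX shell, assuming yajsv is available as ./yajsv and that the schemas and file patterns are the ones from the settings.json above; it only looks at files in the current directory. validate_matching is a hypothetical helper, not part of yajsv or VS Code.

```shell
# Hypothetical CI step: validate YAML files against the schemas that
# .vscode/settings.json assigns to them. Assumes ./yajsv exists.

# validate_matching SCHEMA PATTERN...
# Validates every file matching one of the glob patterns; patterns without
# matches are skipped. Returns non-zero if any file fails.
validate_matching() {
  schema=$1
  shift
  status=0
  for pattern in "$@"; do
    for f in $pattern; do            # unquoted on purpose so the shell expands the glob
      [ -e "$f" ] || continue        # an unmatched glob stays literal, so skip it
      ./yajsv -s "$schema" "$f" || status=1
    done
  done
  return "$status"
}

rc=0
validate_matching schema/schemas/hosts.json 'host*.yaml' 'host*.yml' || rc=1
validate_matching schema/schemas/groups.json 'group*.yaml' 'group*.yml' || rc=1
validate_matching schema/schemas/defaults.json default.yaml default.yml || rc=1
echo "validation status: $rc"
```

In a pipeline, the script would end with exit "$rc" so that the job fails as soon as one file does not match its schema.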
The schema can also be specified inline with a modeline comment at the beginning of the YAML file. The schema URL can be a web URL, a relative path or an absolute path.
# yaml-language-server: $schema=https://server/schema.json
# yaml-language-server: $schema=../relative/path/hosts.json
# yaml-language-server: $schema=/opt/schemas/groups.json
cat > urs.yaml <<'EOF'
---
# yaml-language-server: $schema=schema/schemas/name.json
name: Urs
ipv4: 127.0.0.1
...
EOF
The yaml-language-server includes a Kubernetes schema, but it cannot tell whether a file is a Kubernetes manifest or not. Therefore, a file pattern is needed in the settings to identify these YAML files. To treat all YAML files starting with "k8s" as Kubernetes files, the following setting is required:
{
  "yaml.schemas": {
    "kubernetes": [
      "k8s*.yaml",
      "k8s*.yml"
    ]
  }
}
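Inline specification works as well. The kubernetes-json-schema project provides schemas generated from the Kubernetes Swagger (OpenAPI) specification, for example for the current master: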
# yaml-language-server: $schema=https://raw.githubusercontent.com/yannh/kubernetes-json-schema/master/master-standalone-strict/all.json
Or for a specific Kubernetes version:
# yaml-language-server: $schema=https://raw.githubusercontent.com/yannh/kubernetes-json-schema/master/v1.23.1-standalone-strict/all.json
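Put together, a Kubernetes manifest that pins the schema version through the modeline could look like the following sketch. The file name app01-service.yaml and the Service itself (name app01, ports 80 and 8080) are only examples.

```shell
# Hypothetical Service manifest; the modeline selects the strict schema for
# Kubernetes v1.23.1. The quoted 'EOF' keeps "$schema" from being expanded.
cat > app01-service.yaml <<'EOF'
# yaml-language-server: $schema=https://raw.githubusercontent.com/yannh/kubernetes-json-schema/master/v1.23.1-standalone-strict/all.json
apiVersion: v1
kind: Service
metadata:
  name: app01
spec:
  selector:
    app: app01
  ports:
    - name: http
      port: 80
      targetPort: 8080
EOF
```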