YAML

Language Overview¶

YAML (YAML Ain't Markup Language) is a human-readable data serialization language commonly used for configuration files, infrastructure as code, and data exchange. This guide covers YAML standards for consistent and maintainable configuration.

Key Characteristics¶

Paradigm: Data serialization, configuration
File Extension: .yaml, .yml (prefer .yaml)
Primary Use: Configuration files, Kubernetes manifests, CI/CD pipelines, Ansible playbooks
Indentation: 2 spaces (never tabs)

Quick Reference¶

Category	Convention	Example	Notes
Syntax
Indentation	2 spaces	`key: value`	Never tabs, always 2 spaces
Key-Value	`key: value`	`name: John`	Space after colon
Lists	`- item`	`- apple`	Dash followed by space
Multi-line	`\\|` or `>`	`description: \\| text`	`\\|` preserves newlines, `>` folds
Data Types
String	Unquoted or quoted	`name: John` or `name: "John"`	Quote when special chars
Number	Numeric	`count: 42`, `pi: 3.14`	Integer or float
Boolean	`true`/`false`	`enabled: true`	Lowercase
Null	`null` or `~`	`value: null`	Explicit null
Collections
Mapping	`key: value`	`person:\n name: John`	Nested objects
Sequence	`- item`	`fruits:\n - apple`	Arrays/lists
Inline Map	`{key: value}`	`{name: John, age: 30}`	Flow style
Inline List	`[item1, item2]`	`[1, 2, 3]`	Flow style
Files
Extension	`.yaml` preferred	`config.yaml`, `values.yaml`	Avoid `.yml`
Multiple Docs	`---` separator	`---\ndoc1\n---\ndoc2`	Multiple YAML docs in one file
Best Practices
Quotes	Quote when needed	`version: "1.20"`	Avoid type coercion
Comments	`# comment`	`# Configuration`	Hash for comments
Anchors	`&anchor`	`defaults: &defaults`	Reuse with `*anchor`
Merge Keys	`<<: *anchor`	`<<: *defaults`	Merge referenced keys

Basic Syntax¶

Indentation¶

Always use 2 spaces for indentation:

## Good - 2 spaces
services:
  web:
    image: nginx:latest
    ports:
      - "80:80"

## Bad - 4 spaces or tabs
services:
    web:
        image: nginx:latest

Key-Value Pairs¶

## Simple key-value pairs
name: my-application
version: 1.0.0
environment: production

## Nested structures
database:
  host: localhost
  port: 5432
  credentials:
    username: admin
    password: secret

Data Types¶

Strings¶

## Unquoted strings (preferred for simple strings)
name: my-application
description: A simple web application

## Quoted strings (use when needed)
message: "String with: special characters"
path: 'C:\Windows\System32'

## Multi-line strings - literal block (preserves newlines)
script: |
  #!/bin/bash
  echo "Hello World"
  exit 0

## Multi-line strings - folded block (single line)
description: >
  This is a long description
  that will be folded into
  a single line.

Numbers¶

## Integers
count: 42
port: 8080

## Floats
pi: 3.14159
percentage: 99.9

## Exponential notation
scientific: 1.23e-4

Booleans¶

## Preferred boolean values
enabled: true
disabled: false

## Avoid these (but they work)
## legacy_enabled: yes
## legacy_disabled: no

Null Values¶

## Explicit null
value: null

## Implicit null (empty value)
empty_value:

## Tilde also means null
another_null: ~

Collections¶

Lists¶

## Dash notation (preferred)
fruits:
  - apple
  - banana
  - orange

## Flow style (use sparingly)
colors: [red, green, blue]

## List of objects
users:
  - name: Alice
    role: admin
  - name: Bob
    role: user

## Empty list
empty_list: []

Dictionaries¶

## Nested dictionaries
application:
  name: my-app
  version: 1.0.0
  config:
    database:
      host: localhost
      port: 5432
    cache:
      type: redis
      ttl: 3600

## Empty dictionary
empty_dict: {}

Kubernetes YAML¶

Pod Definition¶

apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  namespace: default
  labels:
    app: nginx
    environment: production
spec:
  containers:
    - name: nginx
      image: nginx:1.21-alpine
      ports:
        - containerPort: 80
          protocol: TCP
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 500m
          memory: 512Mi
      env:
        - name: NGINX_HOST
          value: example.com
        - name: NGINX_PORT
          value: "80"

Deployment¶

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deployment
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.21-alpine
          ports:
            - containerPort: 80

Docker Compose YAML¶

version: '3.8'

services:
  web:
    image: nginx:alpine
    container_name: web-server
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./html:/usr/share/nginx/html:ro
      - ./conf/nginx.conf:/etc/nginx/nginx.conf:ro
    environment:
      - NGINX_HOST=example.com
      - NGINX_PORT=80
    networks:
      - frontend
    depends_on:
      - api
    restart: unless-stopped

  api:
    build:
      context: ./api
      dockerfile: Dockerfile
    container_name: api-server
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgresql://user:pass@db:5432/mydb
    networks:
      - frontend
      - backend
    depends_on:
      - db

  db:
    image: postgres:15-alpine
    container_name: postgres-db
    environment:
      POSTGRES_USER: user
      POSTGRES_PASSWORD: pass
      POSTGRES_DB: mydb
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - backend
    restart: unless-stopped

networks:
  frontend:
    driver: bridge
  backend:
    driver: bridge

volumes:
  postgres_data:
    driver: local

GitHub Actions YAML¶

name: CI Pipeline

on:
  push:
    branches:
      - main
      - develop
  pull_request:
    branches:
      - main

env:
  NODE_VERSION: '18'
  PYTHON_VERSION: '3.11'

jobs:
  test:
    name: Run Tests
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20, 22]
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'

      - name: Install dependencies
        run: npm ci

      - name: Run tests
        run: npm test

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/lcov.info

  build:
    name: Build Application
    runs-on: ubuntu-latest
    needs: test
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Push to registry
        if: github.ref == 'refs/heads/main'
        run: |
          echo "${{ secrets.DOCKER_PASSWORD }}" | docker login -u "${{ secrets.DOCKER_USERNAME }}" --password-stdin
          docker push myapp:${{ github.sha }}

Ansible YAML¶

---
- name: Configure web servers
  hosts: webservers
  become: true
  vars:
    nginx_version: "1.21"
    app_port: 8080

  tasks:
    - name: Update apt cache
      ansible.builtin.apt:
        update_cache: true
        cache_valid_time: 3600

    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present

    - name: Copy nginx configuration
      ansible.builtin.template:
        src: nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: '0644'
      notify: Reload nginx

    - name: Ensure nginx is running
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: Reload nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded

Comments¶

## Single-line comment

## Multi-line comment block
## that spans multiple lines
## to explain complex configuration

services:
  web:
    image: nginx:latest  # Inline comment
    ports:
      - "80:80"  # HTTP port
      - "443:443"  # HTTPS port

Anchors and Aliases¶

Reusing Configuration¶

## Define anchor with &
default_settings: &defaults
  timeout: 30
  retries: 3
  log_level: info

## Reuse with *
production:
  <<: *defaults
  environment: production

staging:
  <<: *defaults
  environment: staging
  timeout: 60  # Override specific value

## List anchors
common_env: &common_env
  - name: APP_NAME
    value: my-app
  - name: LOG_LEVEL
    value: info

service_a:
  env: *common_env

service_b:
  env: *common_env

Testing¶

YAML Linting¶

Use yamllint to validate YAML files:

## Install yamllint
pip install yamllint

## Lint single file
yamllint config.yaml

## Lint all YAML files
yamllint .

## Lint with custom config
yamllint -c .yamllint.yaml config.yaml

yamllint Configuration¶

## .yamllint.yaml
extends: default

rules:
  line-length:
    max: 120
    level: warning
  indentation:
    spaces: 2
    indent-sequences: true
  comments:
    min-spaces-from-content: 2
  document-start:
    present: true
  truthy:
    allowed-values: ['true', 'false']

Schema Validation¶

Validate YAML against JSON Schema:

## Install check-jsonschema
pip install check-jsonschema

## Validate against schema
check-jsonschema --schemafile schema.json config.yaml

## Validate multiple files
check-jsonschema --schemafile schema.json configs/*.yaml

Example schema:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["version", "services"],
  "properties": {
    "version": {
      "type": "string",
      "pattern": "^[0-9]+\\.[0-9]+$"
    },
    "services": {
      "type": "object",
      "patternProperties": {
        "^[a-z][a-z0-9-]*$": {
          "type": "object",
          "required": ["image"],
          "properties": {
            "image": {
              "type": "string"
            },
            "ports": {
              "type": "array",
              "items": {
                "type": "string",
                "pattern": "^[0-9]+:[0-9]+$"
              }
            }
          }
        }
      }
    }
  }
}

Testing with yq¶

Validate and test YAML structure:

## Check if file is valid YAML
yq eval '.' config.yaml > /dev/null

## Test specific values
version=$(yq eval '.version' config.yaml)
if [ "$version" != "1.0" ]; then
  echo "Invalid version: $version"
  exit 1
fi

## Test array length
count=$(yq eval '.services | length' config.yaml)
if [ "$count" -lt 1 ]; then
  echo "Must have at least one service"
  exit 1
fi

## Test nested values
image=$(yq eval '.services.web.image' config.yaml)
if [ -z "$image" ]; then
  echo "Web service must have image"
  exit 1
fi

Unit Testing YAML¶

## tests/test_yaml_config.py
import yaml
import pytest

def load_yaml(filename):
    with open(filename, 'r') as f:
        return yaml.safe_load(f)

def test_config_structure():
    config = load_yaml('config.yaml')

    assert 'version' in config
    assert 'services' in config
    assert isinstance(config['services'], dict)

def test_service_configuration():
    config = load_yaml('config.yaml')

    for name, service in config['services'].items():
        assert 'image' in service, f"Service {name} missing image"
        assert isinstance(service.get('environment', {}), dict)

def test_environment_specific_config():
    prod_config = load_yaml('config.production.yaml')

    assert prod_config['environment'] == 'production'
    assert prod_config['debug'] is False
    assert 'ssl' in prod_config
    assert prod_config['ssl']['enabled'] is True

@pytest.mark.parametrize("env", ["development", "staging", "production"])
def test_all_environments(env):
    config = load_yaml(f'config.{env}.yaml')

    assert config['environment'] == env
    assert 'database' in config
    assert 'host' in config['database']

CI/CD Integration¶

## .github/workflows/yaml-test.yml
name: YAML Validation

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install yamllint
        run: pip install yamllint

      - name: Lint YAML files
        run: yamllint .

      - name: Install yq
        run: |
          wget https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64
          chmod +x yq_linux_amd64
          sudo mv yq_linux_amd64 /usr/local/bin/yq

      - name: Validate structure
        run: |
          for file in config*.yaml; do
            echo "Validating $file"
            yq eval '.' "$file" > /dev/null
          done

  schema-validation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install check-jsonschema
        run: pip install check-jsonschema

      - name: Validate against schema
        run: |
          check-jsonschema --schemafile schema.json config.yaml

Testing with Docker Compose¶

Test YAML in context:

## tests/test-compose.sh
#!/bin/bash
set -e

echo "Testing docker-compose.yaml..."

## Validate syntax
docker-compose -f docker-compose.yaml config > /dev/null

## Test in dry-run mode
docker-compose -f docker-compose.yaml up --dry-run

## Validate services defined
services=$(docker-compose -f docker-compose.yaml config --services)
expected_services="web db redis"

for service in $expected_services; do
  if ! echo "$services" | grep -q "^${service}$"; then
    echo "ERROR: Service $service not found"
    exit 1
  fi
done

echo "docker-compose.yaml is valid"

Pre-commit Hooks¶

## .pre-commit-config.yaml
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: check-yaml
        args: ['--safe']

  - repo: https://github.com/adrienverge/yamllint
    rev: v1.33.0
    hooks:
      - id: yamllint
        args: ['-c', '.yamllint.yaml']

  - repo: https://github.com/python-jsonschema/check-jsonschema
    rev: 0.27.0
    hooks:
      - id: check-jsonschema
        name: Validate configs
        files: ^config.*\.yaml$
        args: ['--schemafile', 'schema.json']

Diff Testing¶

Compare YAML configurations:

## Install dyff
brew install homeport/tap/dyff

## Compare configurations
dyff between config.staging.yaml config.production.yaml

## Output in different formats
dyff between --output human config.staging.yaml config.production.yaml
dyff between --output yaml config.staging.yaml config.production.yaml

Security Scanning¶

Scan for secrets in YAML:

## Install detect-secrets
pip install detect-secrets

## Scan YAML files
detect-secrets scan config*.yaml

## Create baseline
detect-secrets scan --baseline .secrets.baseline config*.yaml

## Audit findings
detect-secrets audit .secrets.baseline

Performance Testing¶

Test YAML parsing performance:

## tests/test_yaml_performance.py
import yaml
import time

def test_large_yaml_performance():
    start = time.time()

    with open('large-config.yaml', 'r') as f:
        config = yaml.safe_load(f)

    duration = time.time() - start

    assert duration < 1.0, f"YAML parsing too slow: {duration}s"
    assert config is not None

Security Best Practices¶

Never Store Secrets in YAML¶

YAML files are often committed to version control:

## Bad - Secrets in YAML
database:
  host: db.example.com
  password: MySecretPassword123  # ❌ Exposed in version control!
  api_key: sk-1234567890abcdef   # ❌ Hardcoded secret!

## Good - Environment variable references
database:
  host: ${DB_HOST}
  password: ${DB_PASSWORD}  # ✅ From environment
  api_key: ${API_KEY}

## Good - External secret references
database:
  host: db.example.com
  password: !vault |
    $ANSIBLE_VAULT;1.1;AES256
    ...encrypted...
  api_key: ssm:///myapp/api-key  # AWS Systems Manager Parameter Store

Key Points:

Never commit secrets to YAML files in version control
Use environment variables for sensitive data
Use secret management (Ansible Vault, Sealed Secrets, SOPS)
Scan repositories for accidentally committed secrets
Encrypt sensitive YAML files at rest

Prevent YAML Injection¶

Untrusted YAML can execute arbitrary code in some parsers:

## Bad - Unsafe YAML loading
import yaml

user_input = """
!!python/object/apply:os.system
args: ['rm -rf /']
"""
data = yaml.load(user_input)  # ❌ Code execution vulnerability!

## Good - Safe YAML loading
import yaml

user_input = """
name: John
age: 30
"""
data = yaml.safe_load(user_input)  # ✅ Safe - no code execution

## Good - Validate with schema
from yamale import make_schema, make_data, validate

schema = make_schema('schema.yaml')
data = make_data('config.yaml')
validate(schema, data)  # ✅ Validated against schema

Key Points:

Always use safe_load() instead of load()
Never parse untrusted YAML with yaml.load()
Validate YAML against schemas
Sanitize user inputs before YAML encoding
Use YAML parsers with security in mind

Validate YAML Schema¶

Define and enforce schemas for all YAML configurations:

## schema.yaml (using JSON Schema)
type: object
properties:
  name:
    type: string
    pattern: '^[a-zA-Z0-9_-]+$'
  email:
    type: string
    format: email
  age:
    type: integer
    minimum: 0
    maximum: 150
required:
  - name
  - email
additionalProperties: false  # Prevent unexpected properties

## Good - Validate YAML
import yaml
import jsonschema

with open('schema.yaml') as f:
    schema = yaml.safe_load(f)

with open('config.yaml') as f:
    config = yaml.safe_load(f)

jsonschema.validate(config, schema)  # ✅ Validated

Key Points:

Define schemas for all YAML files
Validate on load
Use additionalProperties: false to prevent injection
Enforce type and format constraints
Fail fast on invalid YAML

File Permissions¶

Protect YAML configuration files:

## Good - Restrictive permissions
# Application configuration
chmod 640 config.yaml
chown app:app config.yaml

# Secrets (Kubernetes secrets, etc.)
chmod 600 secrets.yaml
chown app:app secrets.yaml

# Public configuration
chmod 644 public-config.yaml

Key Points:

Set restrictive file permissions (600-644)
Use appropriate ownership
Never make secrets world-readable
Audit file access regularly
Encrypt sensitive YAML at rest

Kubernetes Secrets¶

Properly handle secrets in Kubernetes YAML:

## Bad - Base64 is NOT encryption!
apiVersion: v1
kind: Secret
metadata:
  name: db-password
type: Opaque
data:
  password: TXlTZWNyZXRQYXNzd29yZDEyMw==  # ❌ Easily decoded!

## Good - Use Sealed Secrets or external secrets
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-password
spec:
  encryptedData:
    password: AgB...encrypted...  # ✅ Encrypted with public key

## Good - External Secrets Operator
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-password
spec:
  secretStoreRef:
    name: vault-backend
  target:
    name: db-password
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/database
        property: password

Key Points:

Don't commit Kubernetes Secrets to Git
Use Sealed Secrets or External Secrets Operator
Reference external secret stores (Vault, AWS Secrets Manager)
Enable encryption at rest in etcd
Use RBAC to restrict secret access

YAML Bombs (Billion Laughs Attack)¶

Prevent denial of service from malicious YAML:

## Bad - YAML bomb (exponential expansion)
a: &a ["lol","lol","lol","lol","lol","lol","lol","lol","lol"]
b: &b [*a,*a,*a,*a,*a,*a,*a,*a,*a]
c: &c [*b,*b,*b,*b,*b,*b,*b,*b,*b]
# ... continues to expand exponentially (billions of elements)

## Good - Limit YAML complexity
import yaml

class SafeLoader(yaml.SafeLoader):
    def __init__(self, stream):
        self._depth = 0
        super().__init__(stream)

    def construct_object(self, node, deep=False):
        self._depth += 1
        if self._depth > 50:  # ✅ Limit recursion depth
            raise yaml.YAMLError('Maximum recursion depth exceeded')
        obj = super().construct_object(node, deep)
        self._depth -= 1
        return obj

data = yaml.load(yaml_content, Loader=SafeLoader)

Key Points:

Set maximum recursion/nesting depth
Limit file size for YAML parsing
Implement timeouts for parsing
Monitor memory usage during parsing
Reject malformed YAML early

Common Pitfalls¶

Boolean Value Confusion¶

Issue: Unquoted yes, no, on, off, true, false are interpreted as booleans, not strings.

Example:

## Bad - Unintended boolean conversion
country_codes:
  norway: no  # ❌ Parsed as boolean false, not string "no"
  yemen: yes  # ❌ Parsed as boolean true, not string "yes"
  india: off  # ❌ Parsed as boolean false

switches:
  power: on  # ❌ Parsed as boolean true

Solution: Quote string values that look like booleans.

## Good - Explicit strings
country_codes:
  norway: "no"  # ✅ String "no"
  yemen: "yes"  # ✅ String "yes"
  india: "off"  # ✅ String "off"

switches:
  power: "on"  # ✅ String "on"

## Good - Actual booleans
flags:
  enabled: true  # Boolean
  debug: false   # Boolean

Key Points:

YAML boolean values: true, false, yes, no, on, off
Always quote values if you want literal strings
Use explicit true/false for clarity
Check parser output to verify interpretation

Indentation Errors¶

Issue: Mixing spaces and tabs or incorrect indentation breaks YAML structure.

Example:

## Bad - Inconsistent indentation
server:
  host: localhost
   port: 8080  # ❌ 3 spaces instead of 2
  database:
 name: mydb  # ❌ Tab character!
 user: admin

Solution: Use consistent spaces (2 or 4) throughout.

## Good - Consistent 2-space indentation
server:
  host: localhost
  port: 8080
  database:
    name: mydb
    user: admin

Key Points:

YAML forbids tabs for indentation
Use 2 or 4 spaces consistently
Configure editor to convert tabs to spaces
Use YAML linter to catch indentation errors

Anchor and Alias Typos¶

Issue: Referencing non-existent anchors or typos in anchor names causes parsing errors.

Example:

## Bad - Anchor/alias mismatch
defaults: &defaults
  timeout: 30
  retries: 3

production:
  <<: *default  # ❌ Typo! Should be *defaults
  host: prod.example.com

Solution: Verify anchor names match alias references.

## Good - Matching anchor and alias
defaults: &defaults
  timeout: 30
  retries: 3

production:
  <<: *defaults  # ✅ Correct reference
  host: prod.example.com

development:
  <<: *defaults  # ✅ Reusing anchor
  host: dev.example.com

Key Points:

Anchors: &anchor_name
Aliases: *anchor_name
Merge: <<: *anchor_name
Anchor must be defined before use

Multiline String Confusion¶

Issue: Choosing wrong multiline string style (|, >, |-, >-) for the use case.

Example:

## Bad - Using | when > is better
description: |
  This is a long description that should be on one line
  but was split across multiple lines using the literal
  style which preserves newlines.

## Bad - Using > when | is needed
script: >
  #!/bin/bash
  set -e
  echo "Line 1"
  echo "Line 2"

Solution: Use | for literals (preserve newlines), > for folding (join lines).

## Good - Folded for paragraphs
description: >
  This is a long description that will be folded
  into a single line with spaces replacing the
  newlines. Perfect for prose.

## Good - Literal for scripts
script: |
  #!/bin/bash
  set -e
  echo "Line 1"
  echo "Line 2"

## Good - Strip trailing newlines with -
command: |-
  docker run \
    --name myapp \
    myimage:latest

Key Points:

| (literal): Preserves newlines and indentation
> (folded): Joins lines with spaces
|- and >-: Strip final newline
|+ and >+: Keep final newlines

Duplicate Keys Silently Overwriting¶

Issue: YAML allows duplicate keys; last value wins without warning.

Example:

## Bad - Duplicate keys
server:
  port: 8080  # First definition
  host: localhost
  port: 9000  # ❌ Silently overwrites first value!

## Result: port = 9000

Solution: Use unique keys or YAML linter to detect duplicates.

## Good - Unique keys
server:
  http_port: 8080
  grpc_port: 9000
  host: localhost

## Or use linter to catch duplicates

Key Points:

YAML allows duplicate keys (last wins)
Use YAML linter with key-duplicates: enable
Duplicate keys often indicate copy-paste errors
Some parsers can be configured to error on duplicates

Anti-Patterns¶

❌ Avoid: Tabs for Indentation¶

## Bad - Using tabs
services:
 web:
  image: nginx

## Good - Using 2 spaces
services:
  web:
    image: nginx

❌ Avoid: Inconsistent Indentation¶

## Bad - Inconsistent spacing
services:
  web:
      image: nginx
    ports:
     - "80:80"

## Good - Consistent 2-space indentation
services:
  web:
    image: nginx
    ports:
      - "80:80"

❌ Avoid: Mixing Styles¶

## Bad - Mixing block and flow styles
services:
  web: {image: nginx, ports: ["80:80"]}
  db:
    image: postgres
    ports:
      - "5432:5432"

## Good - Consistent block style
services:
  web:
    image: nginx
    ports:
      - "80:80"
  db:
    image: postgres
    ports:
      - "5432:5432"

❌ Avoid: Unquoted Special Values¶

## Bad - Unquoted values that could be misinterpreted
version: 3.8          # Becomes float 3.8
enabled: yes          # Becomes boolean true
country: NO           # Becomes boolean false (Norway code!)
version_string: 1.20  # Becomes float 1.2

## Good - Quote strings
version: "3.8"
enabled: "yes"
country: "NO"
version_string: "1.20"

❌ Avoid: Duplicate Keys¶

## Bad - Duplicate keys (last one wins)
database:
  host: localhost
  port: 5432
  host: prod-db.example.com  # ❌ Overwrites previous host

## Good - Unique keys
database:
  host: prod-db.example.com
  port: 5432

❌ Avoid: Not Using Anchors and Aliases¶

## Bad - Repeated configuration
services:
  web1:
    image: nginx:latest
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "10m"
  web2:
    image: nginx:latest
    restart: always
    logging:
      driver: json-file
      options:
        max-size: "10m"

## Good - Use anchors and aliases
x-common-config: &common
  restart: always
  logging:
    driver: json-file
    options:
      max-size: "10m"

services:
  web1:
    <<: *common
    image: nginx:latest
  web2:
    <<: *common
    image: nginx:latest

❌ Avoid: Complex Multi-line Strings Without Proper Style¶

## Bad - Unclear multi-line handling
description: This is a very long description that
spans multiple lines but doesn't specify
how line breaks should be handled

## Good - Use | for literal style or > for folded
description_literal: |
  This preserves line breaks.
  Each line appears exactly as written.
  Great for scripts or formatted text.

description_folded: >
  This folds lines into a single line.
  Line breaks become spaces.
  Great for long paragraphs.

Advanced YAML Linting¶

Advanced yamllint Configuration¶

.yamllint:

---
extends: default

rules:
  line-length:
    max: 120
    level: warning
  indentation:
    spaces: 2
    indent-sequences: true
  comments:
    min-spaces-from-content: 2
  braces:
    min-spaces-inside: 0
    max-spaces-inside: 1
  brackets:
    min-spaces-inside: 0
    max-spaces-inside: 1
  trailing-spaces: enable
  truthy:
    allowed-values: ['true', 'false']

Running yamllint¶

## Lint all YAML files
yamllint .

## Lint specific file
yamllint config.yaml

## Lint with custom config
yamllint -c .yamllint .

## Format output
yamllint -f parsable .

Advanced Schema Validation¶

Using JSON Schema for Complex Validation¶

## config.yaml
database:
  host: localhost
  port: 5432
  username: admin
  max_connections: 100

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "database": {
      "type": "object",
      "properties": {
        "host": { "type": "string" },
        "port": { "type": "integer", "minimum": 1, "maximum": 65535 },
        "username": { "type": "string" },
        "max_connections": { "type": "integer", "minimum": 1 }
      },
      "required": ["host", "port"]
    }
  }
}

Tool Configurations¶

VSCode settings.json¶

{
  "yaml.schemas": {
    "https://json.schemastore.org/github-workflow.json": ".github/workflows/*.yaml",
    "https://json.schemastore.org/docker-compose.json": "docker-compose*.yaml",
    "kubernetes": "k8s/**/*.yaml"
  },
  "yaml.format.enable": true,
  "yaml.format.singleQuote": false,
  "yaml.validate": true,
  "yaml.completion": true,
  "[yaml]": {
    "editor.insertSpaces": true,
    "editor.tabSize": 2,
    "editor.autoIndent": "advanced"
  }
}

Best Practices¶

Use Consistent Indentation¶

Always use 2 spaces (never tabs):

# Good - Consistent 2-space indentation
services:
  web:
    image: nginx:latest
    ports:
      - "80:80"
    environment:
      - NODE_ENV=production

Quote Strings When Needed¶

Quote strings that could be misinterpreted:

# Good - Explicit quoting
version: "3.8"  # Quoted to preserve as string
port: 8080      # Number doesn't need quotes
enabled: true   # Boolean doesn't need quotes
name: "yes"     # Quote reserved words
config: "true"  # Quote boolean-like strings

# Strings with special characters
message: "Hello: World"
path: "C:\\Users\\Admin"

Use Anchors and Aliases for DRY¶

Reuse configuration with anchors (&) and aliases (*):

# Define anchor
defaults: &defaults
  cpu: "100m"
  memory: "128Mi"
  timeout: 30

# Reuse with alias
web:
  <<: *defaults
  replicas: 3

api:
  <<: *defaults
  replicas: 5
  memory: "256Mi"  # Override specific value

Validate YAML Before Deployment¶

Always validate YAML syntax:

# Lint YAML files
yamllint config.yaml

# Validate Kubernetes manifests
kubectl apply --dry-run=client -f deployment.yaml

# Validate Docker Compose
docker compose config

Use Multi-line Strings Appropriately¶

Choose the right multi-line syntax:

# Literal block (|) - preserves newlines
script: |
  #!/bin/bash
  echo "Line 1"
  echo "Line 2"

# Folded block (>) - folds newlines to spaces
description: >
  This is a long description
  that will be folded into
  a single line with spaces.

# Literal with strip (|-) - removes trailing newlines
config: |-
  key1=value1
  key2=value2

Organize Keys Logically¶

Group related keys together:

# Good - Logical organization
apiVersion: v1
kind: Service
metadata:
  name: my-service
  namespace: production
  labels:
    app: web
    tier: frontend
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080

Avoid Complex Nesting¶

Keep nesting levels reasonable (max 4 levels):

# Bad - Too deeply nested
app:
  services:
    backend:
      config:
        database:
          connection:
            pool:
              size: 10

# Good - Flattened structure or split into multiple files
database_pool_size: 10

Use Lists for Multiple Items¶

Always use lists for collections:

# Good - List syntax
ports:
  - 80
  - 443
  - 8080

environments:
  - name: NODE_ENV
    value: production
  - name: PORT
    value: "3000"

# Inline list (use sparingly)
tags: [web, frontend, production]

Comment Complex Configurations¶

Add comments to explain non-obvious configurations:

# Database connection pool settings
# Increased from 10 to 20 based on load testing results (PERF-123)
database:
  pool:
    min: 5
    max: 20
    acquire_timeout: 30000  # milliseconds

# Health check configuration
# More aggressive checks after incident INC-456
healthcheck:
  interval: 10s  # Check every 10 seconds
  timeout: 5s    # Timeout after 5 seconds
  retries: 3     # Retry 3 times before marking unhealthy

Separate Environment Configurations¶

Use separate YAML files for different environments:

# base-config.yaml (shared)
app:
  name: myapp
  version: "1.0.0"

# production-config.yaml
app:
  replicas: 3
  resources:
    limits:
      cpu: "1000m"
      memory: "1Gi"

# dev-config.yaml
app:
  replicas: 1
  resources:
    limits:
      cpu: "200m"
      memory: "256Mi"

Use Schema Validation¶

Validate against JSON Schema:

# With $schema reference
$schema: https://json.schemastore.org/github-workflow.json

name: CI Pipeline
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

Handle Null Values Explicitly¶

Be explicit about null values:

# Explicit null
user:
  name: John
  middle_name: null
  email: john@example.com

# Or omit null fields entirely
user:
  name: John
  email: john@example.com

Version Your Configuration¶

Include version information in YAML files:

# Kubernetes uses apiVersion
apiVersion: apps/v1
kind: Deployment

# Docker Compose uses version
version: "3.8"
services:
  web:
    image: nginx:latest

# Custom configs should include version
config_version: "2.0"
settings:
  timeout: 30

References¶

Official Documentation¶

Tools¶

yamllint - YAML linter
yq - YAML processor (like jq for YAML)
YAML Validator - Online YAML validator

Schema Repositories¶

JSON Schema Store - Common YAML/JSON schemas

Status: Active