Kubernetes & Helm

Language Overview

Kubernetes is a container orchestration platform for automating deployment, scaling, and management of containerized applications. Helm is the package manager for Kubernetes, using charts to define, install, and upgrade applications.

Key Characteristics

Paradigm: Declarative infrastructure as code
Language: YAML manifests
Version Support: Kubernetes 1.31.x through 1.33.x
Package Manager: Helm 3.x (client-only; no Tiller server component as in Helm 2)
Modern Approach: Helm charts for reusable application definitions

Primary Use Cases

Container orchestration
Microservices deployment
Application scaling and rolling updates
Service discovery and load balancing
Configuration and secret management

Quick Reference

Category	Convention	Example	Notes
Naming
Resources	`kebab-case`	`my-app-deployment`, `web-service`	Lowercase with hyphens
Namespaces	`kebab-case`	`production`, `staging`	Environment or team based
Labels	`kebab-case` keys	`app: my-app`, `env: prod`	Consistent label keys
Helm Charts	`kebab-case`	`my-application`	Chart directory name
Resource Types
Deployment	Application workloads	`kind: Deployment`	Stateless apps
StatefulSet	Stateful workloads	`kind: StatefulSet`	Databases, persistent apps
Service	Network services	`kind: Service`	Load balancing, discovery
ConfigMap	Configuration	`kind: ConfigMap`	Non-sensitive config
Secret	Sensitive data	`kind: Secret`	Passwords, tokens
Ingress	HTTP routing	`kind: Ingress`	External access
File Naming
Manifests	`resource-type.yaml`	`deployment.yaml`, `service.yaml`	One resource per file
Combined	`app-name.yaml`	`my-app.yaml`	All resources together
Helm Values	`values.yaml`	`values.yaml`, `values-prod.yaml`	Chart values
Labels
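app.kubernetes.io/name	Application name	`app.kubernetes.io/name: nginx`	Recommended label (short form: `app: nginx`)
app.kubernetes.io/version	App version	`app.kubernetes.io/version: "1.0.0"`	Deployment tracking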
environment	Environment	`environment: production`	Env identification
Best Practices
Resource Limits	Always set	`limits:` and `requests:`	CPU and memory
Readiness Probes	Define probes	`readinessProbe:`	Health checking
Namespaces	Use namespaces	Isolate workloads	Multi-tenancy
Helm Charts	Package with Helm	Reusable templates	DRY principle

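Naming Conventions

Resource Names

Use kebab-case for all Kubernetes resource names. Most resource names must be valid DNS subdomain names (lowercase letters, digits, "-" and "."), so the API server rejects camelCase and snake_case names outright:

## Good
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
  namespace: production

## Bad (each of these names fails validation)
metadata:
  name: webApplication   # camelCase - uppercase letters are not allowed
---
metadata:
  name: web_application  # snake_case - underscores are not allowed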

Namespace Conventions

## Environment-based namespaces
production
staging
development

## Team or project-based namespaces
team-platform
team-backend
project-analytics

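## System namespaces (created by Kubernetes; avoid deploying applications into them,
## and avoid creating namespaces with the reserved kube- prefix)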
kube-system
kube-public
kube-node-lease
default
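A namespace that follows these conventions can be declared as a manifest so it is versioned alongside the workloads. This is a minimal sketch; the team and environment label keys mirror the custom labels used elsewhere in this guide and are a convention, not something Kubernetes requires.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-backend          # kebab-case, no reserved kube- prefix
  labels:
    team: backend             # illustrative ownership label
    environment: production   # illustrative environment label
```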

Label Standards

Required Labels

Apply these labels to ALL resources:

metadata:
  labels:
    app.kubernetes.io/name: nginx
    app.kubernetes.io/instance: nginx-production
    app.kubernetes.io/version: "1.24.0"
    app.kubernetes.io/component: webserver
    app.kubernetes.io/part-of: ecommerce-platform
    app.kubernetes.io/managed-by: helm

Label Descriptions

app.kubernetes.io/name: "nginx"           # Application name
app.kubernetes.io/instance: "nginx-prod"  # Unique instance identifier
app.kubernetes.io/version: "1.24.0"       # Application version
app.kubernetes.io/component: "webserver"  # Component within architecture
app.kubernetes.io/part-of: "platform"     # Application group/system
app.kubernetes.io/managed-by: "helm"      # Tool managing the resource

Custom Labels

metadata:
  labels:
    # Standard labels
    app.kubernetes.io/name: api
    app.kubernetes.io/instance: api-production
    # Custom labels
    environment: production
    team: backend
    cost-center: engineering

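Annotation Patterns

metadata:
  annotations:
    # Deployment metadata
    kubernetes.io/change-cause: "Update to v1.2.3"  # shown by kubectl rollout history
    # deployment.kubernetes.io/revision is maintained by the Deployment controller; read it, don't set it

    # Documentation
    description: "User authentication API"
    contact: "platform-team@example.com"
    documentation: "https://docs.example.com/api"

    # Monitoring: a Prometheus scrape-config convention (not built into Kubernetes),
    # usually set on the pod template rather than the Deployment
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"

    # Service mesh sidecar injection (pod template; use the key for your mesh)
    sidecar.istio.io/inject: "true"
    linkerd.io/inject: enabled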

Deployment Manifests

---
## @module web-application-deployment
## @description Production deployment for web application
## @version 1.0.0
## @author Tyler Dukes
## @last_updated 2025-10-28

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-application
  namespace: production
  labels:
    app.kubernetes.io/name: web-application
    app.kubernetes.io/instance: web-production
    app.kubernetes.io/version: "1.2.3"
    app.kubernetes.io/component: frontend
    app.kubernetes.io/part-of: ecommerce
    app.kubernetes.io/managed-by: helm
spec:
  replicas: 3
  revisionHistoryLimit: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app.kubernetes.io/name: web-application
      app.kubernetes.io/instance: web-production
  template:
    metadata:
      labels:
        app.kubernetes.io/name: web-application
        app.kubernetes.io/instance: web-production
        app.kubernetes.io/version: "1.2.3"
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      serviceAccountName: web-application
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      containers:
        - name: web
          image: myregistry.com/web-application:1.2.3
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          env:
            - name: APP_ENV
              value: "production"
            - name: DATABASE_HOST
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: database_host
            - name: DATABASE_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database_password
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          startupProbe:
            httpGet:
              path: /startup
              port: http
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 30
          volumeMounts:
            - name: config
              mountPath: /etc/app/config
              readOnly: true
            - name: cache
              mountPath: /var/cache/app
      volumes:
        - name: config
          configMap:
            name: app-config
        - name: cache
          emptyDir: {}
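The Deployment above runs as the web-application ServiceAccount, which must exist in the same namespace before its pods can be created. A minimal sketch follows; disabling automatic token mounting is an assumption that suits workloads that never call the Kubernetes API, so drop that line if the application does.

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: web-application
  namespace: production
  labels:
    app.kubernetes.io/name: web-application
    app.kubernetes.io/instance: web-production
# Don't mount an API token into pods unless the app talks to the Kubernetes API
automountServiceAccountToken: false
```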

Service Definitions

---
apiVersion: v1
kind: Service
metadata:
  name: web-application
  namespace: production
  labels:
    app.kubernetes.io/name: web-application
    app.kubernetes.io/instance: web-production
spec:
  type: ClusterIP
  ports:
    - name: http
      port: 80
      targetPort: http
      protocol: TCP
  selector:
    app.kubernetes.io/name: web-application
    app.kubernetes.io/instance: web-production

---
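## LoadBalancer service (TLS is terminated at the AWS NLB because the pods serve plain HTTP)
apiVersion: v1
kind: Service
metadata:
  name: web-application-public
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    # Placeholder ARN: substitute your own ACM certificate
    service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:<region>:<account-id>:certificate/<certificate-id>"
    service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
spec:
  type: LoadBalancer
  ports:
    - name: https
      port: 443
      targetPort: http
      protocol: TCP
  selector:
    app.kubernetes.io/name: web-application
    app.kubernetes.io/instance: web-production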
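For HTTP(S) traffic, an Ingress in front of the ClusterIP Service is often preferable to one cloud load balancer per Service. The sketch below routes app.example.com to the web-application Service; the nginx ingress class, the letsencrypt-prod ClusterIssuer, and the app-tls Secret are taken from the Helm values later in this guide and assume an NGINX ingress controller and cert-manager are installed.

```yaml
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-application
  namespace: production
  annotations:
    # Assumes cert-manager with a ClusterIssuer named letsencrypt-prod
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - app.example.com
      secretName: app-tls          # cert-manager stores the issued certificate here
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-application   # the ClusterIP Service above
                port:
                  name: http
```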

ConfigMap and Secret Patterns

ConfigMap

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
  labels:
    app.kubernetes.io/name: web-application
data:
  app.env: "production"
  database_host: "postgres.production.svc.cluster.local"
  database_port: "5432"
  redis_host: "redis.production.svc.cluster.local"
  log_level: "info"

  # Configuration file
  nginx.conf: |
    server {
        listen 8080;
        location / {
            proxy_pass http://backend:8080;
        }
    }

Secret

---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
  namespace: production
  labels:
    app.kubernetes.io/name: web-application
type: Opaque
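stringData:
  # Example values only; never commit real credentials. stringData is written to
  # data base64-encoded, which is an encoding, not encryption.
  database_password: "super-secret-password"
  api_key: "secret-api-key-12345"
  jwt_secret: "jwt-signing-secret"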

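## Alternative: external secret management. This ExternalSecret creates the app-secrets
## Secret itself, so use it instead of (not alongside) the static Secret above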
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
  namespace: production
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: app-secrets
  data:
    - secretKey: database_password
      remoteRef:
        key: production/database
        property: password

Resource Limits and Requests

Guidelines

## Development
resources:
  requests:
    cpu: 50m       # 0.05 CPU cores
    memory: 64Mi
  limits:
    cpu: 200m      # 0.2 CPU cores
    memory: 256Mi

## Staging
resources:
  requests:
    cpu: 100m      # 0.1 CPU cores
    memory: 128Mi
  limits:
    cpu: 500m      # 0.5 CPU cores
    memory: 512Mi

## Production
resources:
  requests:
    cpu: 250m      # 0.25 CPU cores
    memory: 512Mi
  limits:
    cpu: 1000m     # 1 CPU core
    memory: 2Gi

Quality of Service (QoS) Classes

## Guaranteed QoS - every container sets CPU and memory requests equal to limits
resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: 500m
    memory: 1Gi

## Burstable QoS - some requests or limits set, but not Guaranteed (e.g. requests < limits)
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 1Gi

## BestEffort QoS - no requests or limits (avoid in production)
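To keep containers out of BestEffort by default, a namespace-level LimitRange can fill in requests and limits for containers that omit them. A minimal sketch; the values reuse the staging guidelines above and are illustrative, not recommendations.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources
  namespace: production
spec:
  limits:
    - type: Container
      defaultRequest:    # request for containers that set neither a request nor a limit
        cpu: 100m
        memory: 128Mi
      default:           # limit for containers that set no limit
        cpu: 500m
        memory: 512Mi
```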

Health Probes

Liveness Probe

The kubelet restarts the container after failureThreshold consecutive failures:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
    httpHeaders:
      - name: X-Health-Check
        value: liveness
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  successThreshold: 1
  failureThreshold: 3

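Readiness Probe

The pod is removed from Service endpoints after failureThreshold consecutive failures (the container is not restarted) and is added back once the probe succeeds again: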

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5
  timeoutSeconds: 3
  successThreshold: 1
  failureThreshold: 3

Startup Probe

Liveness and readiness probes do not run until the startup probe succeeds, so slow-starting applications are not restarted mid-startup:

startupProbe:
  httpGet:
    path: /startup
    port: 8080
  initialDelaySeconds: 0
  periodSeconds: 5
  timeoutSeconds: 3
  successThreshold: 1
  failureThreshold: 30  # 30 * 5s = 150s max startup time

Probe Types

## HTTP probe
httpGet:
  path: /health
  port: 8080
  scheme: HTTP

## TCP probe
tcpSocket:
  port: 5432

## Command probe
exec:
  command:
    - /bin/sh
    - -c
    - pg_isready -U postgres

Helm Chart Structure

my-application/
├── Chart.yaml              # Chart metadata
├── values.yaml             # Default configuration values
├── values-dev.yaml         # Development overrides
├── values-prod.yaml        # Production overrides
├── charts/                 # Dependency charts
├── templates/
│   ├── _helpers.tpl        # Template helpers
│   ├── deployment.yaml     # Deployment manifest
│   ├── service.yaml        # Service manifest
│   ├── ingress.yaml        # Ingress manifest
│   ├── configmap.yaml      # ConfigMap
│   ├── secret.yaml         # Secret
│   ├── serviceaccount.yaml # ServiceAccount
│   ├── hpa.yaml            # HorizontalPodAutoscaler
│   ├── pdb.yaml            # PodDisruptionBudget
│   └── NOTES.txt           # Post-install notes
├── .helmignore             # Files to exclude
└── README.md               # Chart documentation

Chart.yaml

apiVersion: v2
name: web-application
description: A Helm chart for web application deployment
type: application
version: 1.0.0
appVersion: "1.2.3"
keywords:
  - web
  - api
  - application
home: https://example.com
sources:
  - https://github.com/example/web-application
maintainers:
  - name: Tyler Dukes
    email: tyler@example.com
dependencies:
  - name: postgresql
    version: "12.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: postgresql.enabled
  - name: redis
    version: "17.x.x"
    repository: "https://charts.bitnami.com/bitnami"
    condition: redis.enabled
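Each condition above is a path into the parent chart's values. Helm uses the first listed path that exists and resolves to a boolean; if none exists, the condition has no effect and the dependency is installed. Declaring the toggles explicitly avoids that surprise. The sketch below assumes PostgreSQL is bundled and Redis is provided externally; any other keys nested under postgresql: or redis: are passed to that subchart as its values.

```yaml
## values.yaml (excerpt): toggles read by the Chart.yaml conditions
postgresql:
  enabled: true    # install the bundled PostgreSQL subchart
redis:
  enabled: false   # assume Redis is provided outside this chart
```

Run helm dependency update ./my-application to download the declared dependencies into charts/ before packaging or installing.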

values.yaml Patterns

## values.yaml
---
## Application configuration
replicaCount: 3

image:
  repository: myregistry.com/web-application
  pullPolicy: IfNotPresent
  tag: ""  # Defaults to Chart.appVersion

imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "8080"

podSecurityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
  readOnlyRootFilesystem: true

service:
  type: ClusterIP
  port: 80
  targetPort: http

ingress:
  enabled: true
  className: nginx
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}
tolerations: []
affinity: {}

## Application-specific configuration
config:
  environment: production
  logLevel: info
  database:
    host: postgres.production.svc.cluster.local
    port: 5432

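## Secret management: leave these empty in version control and supply them at
## install time, or better, source them from an external secret manager
secrets:
  databasePassword: ""
  apiKey: ""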
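The autoscaling block above only takes effect if a template consumes it. A sketch of templates/hpa.yaml using the autoscaling/v2 API follows; it relies on the web-application.* helpers defined below and assumes metrics-server (or another resource-metrics provider) runs in the cluster. While the HPA is enabled it owns the replica count, which is why the Deployment template should omit replicas in that case.

```yaml
## templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "web-application.fullname" . }}
  labels:
    {{- include "web-application.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "web-application.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- with .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ . }}   # percent of the CPU request
    {{- end }}
    {{- with .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ . }}   # percent of the memory request
    {{- end }}
{{- end }}
```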

Helper Templates (_helpers.tpl)

{{/*
Expand the name of the chart.
*/}}
{{- define "web-application.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
*/}}
{{- define "web-application.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "web-application.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "web-application.labels" -}}
helm.sh/chart: {{ include "web-application.chart" . }}
{{ include "web-application.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}

{{/*
Selector labels
*/}}
{{- define "web-application.selectorLabels" -}}
app.kubernetes.io/name: {{ include "web-application.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}

{{/*
Create the name of the service account to use
*/}}
{{- define "web-application.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "web-application.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}

Helm Template Example

## templates/deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "web-application.fullname" . }}
  labels:
    {{- include "web-application.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "web-application.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "web-application.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "web-application.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          env:
            - name: APP_ENV
              value: {{ .Values.config.environment | quote }}
            - name: LOG_LEVEL
              value: {{ .Values.config.logLevel | quote }}
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
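The checksum/config annotation in this template includes templates/configmap.yaml, so the chart fails to render until that file exists. A minimal sketch that renders the config block from values.yaml follows; the data key names are assumptions that mostly mirror the raw ConfigMap shown earlier.

```yaml
## templates/configmap.yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "web-application.fullname" . }}
  labels:
    {{- include "web-application.labels" . | nindent 4 }}
data:
  # ConfigMap values must be strings, so every entry is quoted
  environment: {{ .Values.config.environment | quote }}
  log_level: {{ .Values.config.logLevel | quote }}
  database_host: {{ .Values.config.database.host | quote }}
  database_port: {{ .Values.config.database.port | quote }}
```

Because the annotation hashes the rendered file, changing any config value alters the pod template and triggers a rolling update on the next helm upgrade.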

Helm Commands

## Install chart
helm install my-app ./my-application -n production

## Install with custom values
helm install my-app ./my-application \
  -f values-prod.yaml \
  -n production \
  --create-namespace

## Upgrade release
helm upgrade my-app ./my-application \
  -f values-prod.yaml \
  -n production

## Upgrade with automatic rollback on failure
helm upgrade my-app ./my-application \
  -f values-prod.yaml \
  -n production \
  --atomic \
  --timeout 5m

## Dry run / template rendering
helm install my-app ./my-application \
  --dry-run \
  --debug \
  -f values-prod.yaml

## Lint chart
helm lint ./my-application

## Package chart
helm package ./my-application

## List releases
helm list -n production

## Rollback
helm rollback my-app 5 -n production

## Uninstall
helm uninstall my-app -n production
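Several commands above pass -f values-prod.yaml. Helm merges override files over the chart's values.yaml, with later files winning, so an environment file only needs the keys that differ. A sketch using the production resource guidelines from earlier in this guide; the replica bounds and log level are illustrative.

```yaml
## values-prod.yaml: only the keys that differ from values.yaml
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: 1000m
    memory: 2Gi

autoscaling:
  minReplicas: 5     # enabled and utilization targets are inherited from values.yaml
  maxReplicas: 20

config:
  logLevel: warn
```

Maps are merged key by key, while lists (such as ingress.hosts) are replaced wholesale, so an override that touches a list must restate all of its entries.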

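Testing

Testing with kubeval

Validate Kubernetes YAML manifests against the API schemas. Note that kubeval is no longer maintained and its default schemas stop at old Kubernetes releases; prefer kubeconform for new pipelines: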

## Install kubeval
brew install kubeval

## Validate manifest
kubeval deployment.yaml

## Validate multiple files
kubeval manifests/*.yaml

## Validate against a specific Kubernetes version (fails if no schemas exist for that version)
kubeval --kubernetes-version 1.32.0 deployment.yaml

## Strict mode (reject properties that are not defined in the schema)
kubeval --strict deployment.yaml

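Testing with kubeconform

kubeconform is the maintained successor to kubeval: it is faster, tracks current Kubernetes schemas, and can validate custom resources against CRD schemas: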

## Install kubeconform
brew install kubeconform

## Validate manifests
kubeconform manifests/

## Validate with CRDs
kubeconform -schema-location default \
  -schema-location 'crds/{{.ResourceKind}}.json' \
  manifests/

## Output in JSON
kubeconform -output json manifests/

Testing with kube-score

Analyze manifests for best practices:

## Install kube-score
brew install kube-score

## Analyze deployment
kube-score score deployment.yaml

## Check all manifests
kube-score score manifests/*.yaml

## Ignore specific checks
kube-score score --ignore-test pod-networkpolicy deployment.yaml

Unit Testing with conftest

Policy-based testing for Kubernetes:

## Install conftest
brew install conftest

## Test Kubernetes manifests
conftest test deployment.yaml

## Custom policy
conftest test -p policy/ deployment.yaml

Example policy:

## policy/kubernetes.rego (Rego v1 syntax with contains/if, expected by recent OPA-based conftest releases)
package main

import rego.v1

deny contains msg if {
  input.kind == "Deployment"
  not input.spec.template.spec.securityContext.runAsNonRoot
  msg := "Containers must not run as root"
}

deny contains msg if {
  input.kind == "Deployment"
  some container in input.spec.template.spec.containers
  not container.resources.limits
  msg := sprintf("Container %s must have resource limits", [container.name])
}

warn contains msg if {
  input.kind == "Service"
  input.spec.type == "LoadBalancer"
  msg := "Consider using Ingress instead of LoadBalancer"
}

Integration Testing with kind

Test on local Kubernetes cluster:

## Create kind cluster
kind create cluster --name test-cluster

## Apply manifests
kubectl apply -f manifests/

## Run tests
kubectl wait --for=condition=available --timeout=60s \
  deployment/myapp

## Test service endpoints
kubectl run test-pod --image=curlimages/curl --rm -it --restart=Never -- \
  curl http://myapp-service:80/health

## Cleanup
kind delete cluster --name test-cluster

E2E Testing Script

#!/bin/bash
## tests/e2e-test.sh
set -euo pipefail

CLUSTER_NAME=e2e-test
PF_PID=""

# Clean up on any exit, including failures that abort the script under set -e
cleanup() {
  if [ -n "$PF_PID" ]; then
    kill "$PF_PID" 2>/dev/null || true
  fi
  kind delete cluster --name "$CLUSTER_NAME"
}
trap cleanup EXIT

# Create kind cluster
echo "Creating test cluster..."
kind create cluster --name "$CLUSTER_NAME" --wait 60s

# Apply manifests
echo "Applying manifests..."
kubectl apply -f manifests/

# Wait for deployment
echo "Waiting for deployment..."
kubectl wait --for=condition=available --timeout=300s \
  deployment/myapp -n default

# Test application through a port-forward
echo "Testing application..."
kubectl port-forward svc/myapp-service 8080:80 &
PF_PID=$!
sleep 5  # give the port-forward time to start listening

# curl prints 000 and exits non-zero when it cannot connect; || true lets the check report it
response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health || true)
if [ "$response" != "200" ]; then
  echo "Health check failed: $response"
  exit 1
fi

echo "Tests passed!"

Testing with Helm

Test Helm charts:

## Lint Helm chart
helm lint ./mychart

## Dry run install
helm install myapp ./mychart --dry-run --debug

## Template and validate
helm template myapp ./mychart | kubeval -

## Test with specific values
helm install myapp ./mychart --dry-run \
  --values test-values.yaml

Chart Testing

## ct.yaml (Chart Testing config)
chart-dirs:
  - charts
chart-repos:
  - bitnami=https://charts.bitnami.com/bitnami
helm-extra-args: --timeout 600s

## Install ct
brew install chart-testing

## Lint charts
ct lint --config ct.yaml

## Install charts into the current kube context (e.g. a kind cluster) and run their tests
ct install --config ct.yaml

CI/CD Integration

## .github/workflows/k8s-test.yml
name: Kubernetes Tests

on: [push, pull_request]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install tools
        run: |
          curl -L https://github.com/yannh/kubeconform/releases/latest/download/kubeconform-linux-amd64.tar.gz | tar xz
          sudo mv kubeconform /usr/local/bin

      - name: Validate manifests
        run: kubeconform -strict -summary manifests/

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Create kind cluster
        uses: helm/kind-action@v1

      - name: Deploy and test
        run: |
          kubectl apply -f manifests/
          kubectl wait --for=condition=available --timeout=60s deployment/myapp
          kubectl get pods

Testing RBAC

Test Role-Based Access Control:

## Test if service account can perform action
kubectl auth can-i create pods \
  --as=system:serviceaccount:default:myapp

## Test with specific permissions
kubectl auth can-i delete deployments \
  --as=system:serviceaccount:default:myapp \
  -n production
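The can-i checks are only meaningful against known permissions. A minimal sketch of a Role and RoleBinding that let the myapp ServiceAccount (the hypothetical name from the commands above) read pods in default. Assuming no other bindings apply to that account, kubectl auth can-i list pods --as=system:serviceaccount:default:myapp should then answer yes, while create pods should still answer no.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: default
rules:
  - apiGroups: [""]            # "" is the core API group, where pods live
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: myapp-pod-reader
  namespace: default
subjects:
  - kind: ServiceAccount
    name: myapp
    namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```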

Resource Quota Testing

## Apply resource quota
kubectl apply -f resourcequota.yaml

## Try to create a pod that exceeds the quota (should be rejected)
kubectl apply -f test-pod.yaml

## Verify quota enforcement
kubectl describe resourcequota -n test-namespace
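Sketches of the two files used above, shown together; the quota figures and the quota-buster pod name are arbitrary test values, and test-namespace must already exist. Because this quota constrains CPU and memory requests and limits, pods in the namespace that omit them are rejected too (unless a LimitRange supplies defaults), which is worth a test of its own.

```yaml
## resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: test-quota
  namespace: test-namespace
spec:
  hard:
    pods: "5"
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
---
## test-pod.yaml: requests more memory than the quota allows
apiVersion: v1
kind: Pod
metadata:
  name: quota-buster
  namespace: test-namespace
spec:
  containers:
    - name: app
      image: nginx:1.24.0
      resources:
        requests:
          cpu: 100m
          memory: 2Gi      # exceeds requests.memory (1Gi), so admission fails
        limits:
          cpu: 200m
          memory: 2Gi      # limits are required because the quota covers them
```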

Network Policy Testing

Test network isolation:

## Apply network policy
kubectl apply -f networkpolicy.yaml

## Test connectivity from an unlabeled pod (should fail)
kubectl run test-pod --image=curlimages/curl --rm -it --restart=Never -- \
  curl --max-time 5 http://restricted-service

## Test from allowed pod (should succeed)
kubectl run allowed-pod -l app=allowed --image=curlimages/curl --rm -it --restart=Never -- \
  curl http://restricted-service
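A sketch of the networkpolicy.yaml these commands exercise, assuming everything runs in default and the pods behind restricted-service carry the hypothetical label app: restricted-service. The results are only meaningful on a CNI plugin that enforces NetworkPolicy; on one that does not, both commands succeed.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restricted-service-allow-allowed
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: restricted-service    # hypothetical label on the protected pods
  policyTypes:
    - Ingress                    # once selected, all other inbound traffic is denied
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: allowed       # matches kubectl run allowed-pod -l app=allowed
```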

Performance Testing

## Load test with k6. The Service name only resolves inside the cluster, so run
## this from a pod, or port-forward and point the URL at localhost.
cat <<'EOF' | k6 run -
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  vus: 10,
  duration: '30s',
};

export default function () {
  const res = http.get('http://myapp-service');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
EOF

Snapshot Testing

Test manifest rendering:

## Generate manifests
kustomize build overlays/production > snapshot.yaml

## Compare with previous snapshot
diff snapshot-previous.yaml snapshot.yaml

## Update snapshot if changes are expected
cp snapshot.yaml snapshot-previous.yaml
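The snapshot commands assume a Kustomize overlay at overlays/production. A minimal sketch of its kustomization.yaml follows; the ../../base layout and the image override are assumptions about the repository structure, not something the commands require.

```yaml
## overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production            # applied to every rendered resource
resources:
  - ../../base                   # shared manifests (hypothetical path)
images:
  - name: myregistry.com/web-application
    newTag: "1.2.3"              # pin the tag per environment
```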

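Common Pitfalls

Selector Label Mismatch

Issue: Pod template labels don't match the Deployment selector. For apps/v1 Deployments the API server rejects the manifest with a validation error ("selector does not match template labels"), so the Deployment is never created.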

Example:

## Bad - Mismatched labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: web-app  # Selector label
  template:
    metadata:
      labels:
        app: webapp  # ❌ Different label! Doesn't match selector
    spec:
      containers:
      - name: app
        image: myapp:1.0

Solution: Ensure every selector label appears, with the same value, in the template labels.

## Good - Matching labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp  # ✅ Matches template
  template:
    metadata:
      labels:
        app: webapp  # ✅ Matches selector
        version: "1.0"  # Additional labels are OK
    spec:
      containers:
      - name: app
        image: myapp:1.0

Key Points:

Selector labels must be a subset of the template labels
Template can have additional labels beyond selector
Changing selector requires deleting and recreating deployment
Use consistent label keys across all resources

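Resource Limits Without Requests

Issue: Setting limits without requests does not produce BestEffort QoS. Kubernetes copies each limit into the missing request, so the scheduler reserves the full limit for every replica. The pod ends up Guaranteed (when every container sets both CPU and memory limits) but strands node capacity that a right-sized request would have freed.

Example:

## Bad - Only limits, no requests
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    resources:
      limits:
        memory: "512Mi"
        cpu: "500m"
      ## ❌ No requests: they default to the limits, so 512Mi / 500m is reserved

Solution: Set requests explicitly to typical usage and limits to the acceptable ceiling.

## Good - Both requests and limits
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    resources:
      requests:
        memory: "256Mi"  # ✅ Reserved at scheduling time
        cpu: "250m"
      limits:
        memory: "512Mi"  # Maximum allowed
        cpu: "500m"

Key Points:

Requests drive scheduling: a pod is placed only on a node with that much unreserved capacity
If a container sets a limit but no request, the request defaults to the limit
requests == limits for every container (CPU and memory) gives Guaranteed QoS, the last class evicted under node pressure
BestEffort QoS applies only when no container sets any request or limit; those pods are evicted first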

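Readiness Probe Pointing to Wrong Port

Issue: The readiness probe checks a port the application doesn't listen on, so the probe always fails, the pod never becomes Ready, and the Service gets no endpoints. If some other process happens to answer on that port, the opposite occurs: traffic reaches pods whose application isn't actually ready.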

Example:

## Bad - Wrong port in probe
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    ports:
    - containerPort: 8080
      name: http
    readinessProbe:
      httpGet:
        port: 80  # ❌ Wrong port! App runs on 8080
        path: /health

Solution: Use named ports or verify port numbers.

## Good - Correct port reference
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    ports:
    - containerPort: 8080
      name: http  # Named port
    readinessProbe:
      httpGet:
        port: http  # ✅ References named port
        path: /health
    livenessProbe:
      httpGet:
        port: 8080  # ✅ Or use exact port number
        path: /health

Key Points:

Use named ports for better readability and maintainability
Verify probe port matches container port
Test probes with kubectl exec before deployment
Check probe failures in the pod's events with kubectl describe pod

ConfigMap Volume Mount Overwrites Directory

Issue: Mounting a ConfigMap on a directory hides everything the image already had in that directory; only the ConfigMap's keys are visible there.

Example:

## Bad - Overwrites entire /etc/config directory
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    volumeMounts:
    - name: config
      mountPath: /etc/config  # ❌ Overwrites everything in /etc/config
  volumes:
  - name: config
    configMap:
      name: app-config

Solution: Use subPath to mount specific files or mount to dedicated directory.

## Good - Mount specific file with subPath
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    volumeMounts:
    - name: config
      mountPath: /etc/config/app.conf  # ✅ Specific file
      subPath: app.conf  # File from ConfigMap
  volumes:
  - name: config
    configMap:
      name: app-config

## Good - Mount to dedicated directory
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    volumeMounts:
    - name: config
      mountPath: /app/config  # ✅ Dedicated directory
  volumes:
  - name: config
    configMap:
      name: app-config

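Key Points:

A ConfigMap mount hides all existing files in the target directory
Use subPath to mount individual files, but note that subPath mounts do not receive updates when the ConfigMap changes
Mount to dedicated directories to avoid conflicts
Consider using environment variables for simple configs (these are also only read at container start)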

Service Selector Doesn't Match Pods

Issue: Service selector doesn't match pod labels, causing no endpoints and connection failures.

Example:

## Bad - Service selector doesn't match pods
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  selector:
    app: web  # Selector
  ports:
  - port: 80
    targetPort: 8080

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp  # ❌ Doesn't match service selector!
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: app
        image: myapp

Solution: Ensure service selector matches pod labels.

## Good - Service selector matches pods
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  selector:
    app: webapp  # ✅ Matches deployment labels
  ports:
  - port: 80
    targetPort: 8080

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp
spec:
  selector:
    matchLabels:
      app: webapp  # ✅ Matches service selector
  template:
    metadata:
      labels:
        app: webapp  # ✅ Matches service selector
    spec:
      containers:
      - name: app
        image: myapp
        ports:
        - containerPort: 8080

Key Points:

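Every key/value pair in the Service selector must appear in the pod labels; pods may carry additional labels
Check which pods were matched: kubectl get endpointslices -l kubernetes.io/service-name=webapp (kubectl get endpoints webapp still works, but the Endpoints API is deprecated in favor of EndpointSlices)
Use consistent labeling across all resources
The Service ignores the Deployment selector and matches pod labels only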

Anti-Patterns

❌ Avoid: latest Tag

## Bad - Unpredictable deployments
image: nginx:latest

## Good - Pin specific versions
image: nginx:1.24.0
image: nginx:1.24.0-alpine

❌ Avoid: No Resource Limits

## Bad - Can cause node resource exhaustion
containers:
  - name: app
    image: myapp:1.0.0

## Good - Define limits
containers:
  - name: app
    image: myapp:1.0.0
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 512Mi

❌ Avoid: Running as Root

## Bad - Security risk
securityContext:
  runAsUser: 0

## Good - Run as non-root
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  fsGroup: 1000

❌ Avoid: Missing Health Probes

## Bad - No health checks
containers:
  - name: app
    image: myapp:1.0.0

## Good - Include probes
containers:
  - name: app
    image: myapp:1.0.0
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080

❌ Avoid: Storing Secrets in ConfigMaps

## Bad - Secrets in ConfigMap (visible in plain text)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  database_password: "MySecretPassword"  # ❌ Plain text!
  api_key: "sk-1234567890"              # ❌ Plain text!

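## Good - Use a Secret (separate RBAC from config; encrypted at rest only if the cluster enables it)
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
stringData:
  database_password: "MySecretPassword"  # ✅ Stored base64-encoded (an encoding, not encryption)
  api_key: "sk-1234567890"              # ✅ Access controlled separately via RBAC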

## Better - Use external secret management
## Sealed Secrets, External Secrets Operator, or cloud provider KMS
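Secret data is stored in etcd only base64-encoded unless encryption at rest is enabled on the API server. A minimal sketch of an `EncryptionConfiguration`, assuming a self-managed control plane; the key name and placeholder value are hypothetical, and managed clusters usually expose this as a KMS setting instead.

```yaml
# Passed to kube-apiserver with --encryption-provider-config=<path to this file>
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      # The first provider encrypts all new writes
      - secretbox:
          keys:
            - name: key1                            # Hypothetical key name
              secret: <BASE64-ENCODED 32-BYTE KEY>  # Placeholder; never commit a real key
      # identity lets the API server still read Secrets written before encryption was enabled
      - identity: {}
```

Existing Secrets remain unencrypted until they are rewritten, for example with `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`.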

❌ Avoid: No Pod Disruption Budgets¶

## Bad - No protection during cluster maintenance
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  # No PodDisruptionBudget - all pods could be terminated at once

## Good - Define disruption budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2  # ✅ Voluntary evictions (e.g., node drains) must leave at least 2 pods available
  selector:
    matchLabels:
      app: web

❌ Avoid: Missing Network Policies¶

## Bad - No network restrictions (pods can talk to anything)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  # No NetworkPolicy - unrestricted network access

## Good - Restrict network traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-netpol
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
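  egress:
    - to:
        - podSelector:
            matchLabels:
              app: database
      ports:
        - protocol: TCP
          port: 5432
    - to:  # Allow DNS; without it, the database's Service name cannot be resolved
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53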

Security Best Practices¶

Pod Security Standards¶

Configure pods to meet the Pod Security Standards (ideally the `restricted` profile); Pod Security Admission can enforce them per namespace.

## Bad - Running as root with privileges
apiVersion: v1
kind: Pod
metadata:
  name: insecure-pod
spec:
  containers:
  - name: app
    image: myapp:latest
    securityContext:
      privileged: true  # NEVER in production!
      runAsUser: 0      # Running as root!

## Good - Non-root with security contexts
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: myapp:1.0.0  # Pinned tag, not latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop:
        - ALL
    volumeMounts:
    - name: tmp
      mountPath: /tmp
  volumes:
  - name: tmp
    emptyDir: {}
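The `secure-pod` settings above are intended to satisfy the `restricted` profile. To have the cluster reject pods that do not, a sketch using the built-in Pod Security Admission controller, assuming a namespace named `production`:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    # Reject pods that violate the restricted profile
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    # Also surface violations as client warnings and audit annotations
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```

Enforcement applies to pods, so a non-compliant Deployment is still created but its pods are rejected; the `warn` label makes the problem visible when the Deployment is applied.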

Secrets Management¶

Never hardcode sensitive data in manifests.

## Bad - Secrets in plain text
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp:1.0.0
    env:
    - name: DB_PASSWORD
      value: "SuperSecret123"  # EXPOSED!
    - name: API_KEY
      value: "sk_live_abc123"   # In version control!

## Good - Use Kubernetes Secrets
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets
type: Opaque
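data:  # base64 is an encoding, not encryption; keep this manifest out of version control
  db-password: U3VwZXJTZWNyZXQxMjM=  # base64 of "SuperSecret123"
  api-key: c2tfbGl2ZV9hYmMxMjM=      # base64 of "sk_live_abc123"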

---
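apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp:1.0.0
    env:  # Map keys explicitly; keys like "db-password" are not portable env var names
    - name: DB_PASSWORD
      valueFrom:
        secretKeyRef:
          name: app-secrets
          key: db-password
    - name: API_KEY
      valueFrom:
        secretKeyRef:
          name: app-secrets
          key: api-key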

## Better - Use external secrets management
## External Secrets Operator with AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-secrets
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: app-secrets
  data:
  - secretKey: db-password
    remoteRef:
      key: prod/db/password
  - secretKey: api-key
    remoteRef:
      key: prod/api/key
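The `ExternalSecret` above refers to a `SecretStore` named `aws-secrets-manager`, which is not defined here. A sketch of one, assuming the External Secrets Operator is installed, the cluster authenticates to AWS through IAM Roles for Service Accounts, and a hypothetical service account `external-secrets-sa` is bound to an IAM role that can read the referenced secrets. The region is a placeholder, and a `SecretStore` must live in the same namespace as the `ExternalSecret` that uses it.

```yaml
apiVersion: external-secrets.io/v1beta1   # Match the API version your ESO release serves
kind: SecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1                   # Placeholder region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets-sa     # Hypothetical IRSA-annotated ServiceAccount
```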

Network Policies¶

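Restrict pod-to-pod communication. NetworkPolicies are enforced by the network plugin, so they only take effect with a CNI that supports them (for example, Calico or Cilium).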

## Bad - No network policies (pods can access anything)
## Default allow-all is insecure!

## Good - Deny all, then allow specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: postgresql
    ports:
    - protocol: TCP
      port: 5432
  - to:  # Allow DNS
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system  # Label set automatically on every namespace
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
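Because `default-deny-all` also blocks ingress, the egress rule above is not enough on its own: the database pods must accept the connection too. A sketch of the matching ingress policy, assuming the same `app: web-app` and `app: postgresql` labels:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-from-app
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgresql    # Policy applies to the database pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web-app   # Only the web app may connect
    ports:
    - protocol: TCP
      port: 5432
```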

RBAC (Role-Based Access Control)¶

Follow the principle of least privilege.

## Bad - Cluster-admin for all service accounts
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: all-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin  # TOO PERMISSIVE!
subjects:
- kind: ServiceAccount
  name: default
  namespace: default

## Good - Scoped permissions
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: production

---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-role
  namespace: production
rules:
- apiGroups: [""]
  resources: ["pods", "configmaps"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["app-secrets"]  # Specific secret only
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-role-binding
  namespace: production
subjects:
- kind: ServiceAccount
  name: app-sa
  namespace: production
roleRef:
  kind: Role
  name: app-role
  apiGroup: rbac.authorization.k8s.io
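The Role only applies to pods that run as `app-sa`. A sketch of a Deployment wired to it; the `app` name, labels, and image are hypothetical. The token is mounted explicitly here because this workload is assumed to call the API; set `automountServiceAccountToken: false` for workloads that never do.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: production
spec:
  replicas: 2
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      serviceAccountName: app-sa          # Pods get only the permissions bound to app-sa
      automountServiceAccountToken: true  # Needed to call the API; use false if the app never does
      containers:
      - name: app
        image: myapp:1.0.0
```

To check the binding, `kubectl auth can-i get secrets/app-secrets --as=system:serviceaccount:production:app-sa -n production` should answer `yes`, while the same check against any other Secret name should answer `no`.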

Resource Limits and Quotas¶

Prevent resource exhaustion attacks.

## Bad - No resource limits
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp
    ## No limits - can consume all node resources!

## Good - Set resource requests and limits
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: myapp:1.0.0
    resources:
      requests:
        memory: "128Mi"
        cpu: "100m"
      limits:
        memory: "256Mi"
        cpu: "200m"

## Good - Enforce with ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"

## Good - Set default requests and limits (otherwise the quota above rejects pods that omit them)
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - default:
      memory: 512Mi
      cpu: 500m
    defaultRequest:
      memory: 256Mi
      cpu: 250m
    type: Container

Image Security¶

Use trusted images and scan for vulnerabilities.

## Bad - Using latest tag from untrusted registry
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: randomuser/myapp:latest  # Untrusted! Unpredictable!

## Good - Pin specific versions from trusted registry
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: gcr.io/mycompany/myapp:v1.2.3@sha256:abc123...  # SHA256 digest
    imagePullPolicy: Always

## Good - Use private registry with imagePullSecrets
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: app
    image: myregistry.azurecr.io/myapp:v1.2.3

## Enforce with an admission controller (OPA Gatekeeper; requires the K8sAllowedRepos ConstraintTemplate)
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repositories
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    repos:
    - "gcr.io/mycompany/"
    - "myregistry.azurecr.io/"

Admission Control¶

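Use admission controllers to enforce policies. The constraints below assume OPA Gatekeeper, with the matching ConstraintTemplates from its policy library already installed.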

## OPA Gatekeeper policy - Block privileged containers
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: deny-privileged-containers
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    excludedNamespaces:  # Belongs under match, not parameters
    - kube-system

## Block images without digest
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sImageDigests
metadata:
  name: require-image-digest
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]

## Require labels
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-owner-label
spec:
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
  parameters:
    labels:
    - key: "owner"
    - key: "environment"

Audit Logging¶

Enable comprehensive audit logging.

## kube-apiserver audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
## Log Secret access at Metadata level only
## (RequestResponse would write Secret values into the audit log)
- level: Metadata
  resources:
  - group: ""
    resources: ["secrets"]

## Log all requests from unauthenticated users
- level: Metadata
  omitStages:
  - "RequestReceived"
  userGroups:
  - "system:unauthenticated"

## Log pod exec and port-forward
- level: Request
  verbs: ["create"]
  resources:
  - group: ""
    resources: ["pods/exec", "pods/portforward"]

## Catch-all: the first matching rule wins, and requests matching no rule are not logged
- level: Metadata
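The policy only takes effect once the API server is pointed at it. A sketch for a self-managed control plane, assuming a kubeadm-style static pod manifest at `/etc/kubernetes/manifests/kube-apiserver.yaml` and the policy saved as `/etc/kubernetes/audit-policy.yaml` (both paths, and the retention values, are assumptions). Managed services such as EKS, GKE, and AKS configure audit logging through their own settings instead.

```yaml
# Excerpt: fields to add to the existing kube-apiserver static pod spec
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    # ...existing flags...
    - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
    - --audit-log-path=/var/log/kubernetes/audit/audit.log
    - --audit-log-maxage=30     # Days to keep old log files
    - --audit-log-maxbackup=10  # Number of rotated files to keep
    - --audit-log-maxsize=100   # Megabytes before rotation
    volumeMounts:
    - name: audit-policy
      mountPath: /etc/kubernetes/audit-policy.yaml
      readOnly: true
    - name: audit-log
      mountPath: /var/log/kubernetes/audit/
  volumes:
  - name: audit-policy
    hostPath:
      path: /etc/kubernetes/audit-policy.yaml
      type: File
  - name: audit-log
    hostPath:
      path: /var/log/kubernetes/audit/
      type: DirectoryOrCreate
```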

Pod Disruption Budgets¶

Limit voluntary disruptions such as node drains and cluster upgrades. PDBs do not protect against involuntary failures like node crashes.

## Good - Ensure minimum availability during maintenance
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-app

## Or use percentage
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb-percent
  namespace: production
spec:
  maxUnavailable: "25%"
  selector:
    matchLabels:
      app: web-app
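With the default policy (`IfHealthyBudget`), pods that are Running but not Ready can be evicted only while the application still meets its budget, so an already-degraded application can block node drains. The `unhealthyPodEvictionPolicy` field, available across the supported 1.31 to 1.33 range, changes this; a sketch reusing the `critical-app` label from above:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
  namespace: production
spec:
  minAvailable: 2
  # Evict Running-but-not-Ready pods even when the budget is not met,
  # so a crash-looping app cannot block node drains
  unhealthyPodEvictionPolicy: AlwaysAllow
  selector:
    matchLabels:
      app: critical-app
```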

References¶

Tools¶

kubectl - Kubernetes CLI
helm - Kubernetes package manager
kubeval - Kubernetes manifest validation (no longer maintained; kubeconform is a maintained successor)
kube-linter - Static analysis tool
kustomize - Template-free customization

Status: Active