Kubernetes & Helm
Language Overview¶
Kubernetes is a container orchestration platform for automating deployment, scaling, and management of containerized applications. Helm is the package manager for Kubernetes, using charts to define, install, and upgrade applications.
Key Characteristics¶
- Paradigm: Declarative infrastructure as code
- Language: YAML manifests
- Version Support: Kubernetes 1.31.x through 1.33.x
- Package Manager: Helm 3.x (Tillerless installation)
- Modern Approach: Helm charts for reusable application definitions
Primary Use Cases¶
- Container orchestration
- Microservices deployment
- Application scaling and rolling updates
- Service discovery and load balancing
- Configuration and secret management
Quick Reference¶
| Category | Convention | Example | Notes |
|---|---|---|---|
| Naming | | | |
| Resources | kebab-case | `my-app-deployment`, `web-service` | Lowercase with hyphens |
| Namespaces | kebab-case | `production`, `staging` | Environment or team based |
| Labels | kebab-case keys | `app: my-app`, `env: prod` | Consistent label keys |
| Helm Charts | kebab-case | `my-application` | Chart directory name |
| Resource Types | | | |
| Deployment | Application workloads | `kind: Deployment` | Stateless apps |
| StatefulSet | Stateful workloads | `kind: StatefulSet` | Databases, persistent apps |
| Service | Network services | `kind: Service` | Load balancing, discovery |
| ConfigMap | Configuration | `kind: ConfigMap` | Non-sensitive config |
| Secret | Sensitive data | `kind: Secret` | Passwords, tokens |
| Ingress | HTTP routing | `kind: Ingress` | External access |
| File Naming | | | |
| Manifests | resource-type.yaml | `deployment.yaml`, `service.yaml` | One resource per file |
| Combined | app-name.yaml | `my-app.yaml` | All resources together |
| Helm Values | values.yaml | `values.yaml`, `values-prod.yaml` | Chart values |
| Labels | | | |
| app | Application name | `app: nginx` | Required label |
| version | App version | `version: "1.0.0"` | Deployment tracking |
| environment | Environment | `environment: production` | Env identification |
| Best Practices | | | |
| Resource Limits | Always set | `limits:` and `requests:` | CPU and memory |
| Readiness Probes | Define probes | `readinessProbe:` | Health checking |
| Namespaces | Use namespaces | Isolate workloads | Multi-tenancy |
| Helm Charts | Package with Helm | Reusable templates | DRY principle |
Naming Conventions¶
Resource Names¶
Use kebab-case for all Kubernetes resource names:
## Good
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-application
namespace: production
## Bad
metadata:
name: webApplication # camelCase - avoid
name: web_application # snake_case - avoid
Namespace Conventions¶
## Environment-based namespaces
production
staging
development
## Team or project-based namespaces
team-platform
team-backend
project-analytics
## System namespaces (reserved)
kube-system
kube-public
kube-node-lease
default
Label Standards¶
Required Labels¶
Apply these labels to ALL resources:
metadata:
labels:
app.kubernetes.io/name: nginx
app.kubernetes.io/instance: nginx-production
app.kubernetes.io/version: "1.24.0"
app.kubernetes.io/component: webserver
app.kubernetes.io/part-of: ecommerce-platform
app.kubernetes.io/managed-by: helm
Label Descriptions¶
app.kubernetes.io/name: "nginx" # Application name
app.kubernetes.io/instance: "nginx-prod" # Unique instance identifier
app.kubernetes.io/version: "1.24.0" # Application version
app.kubernetes.io/component: "webserver" # Component within architecture
app.kubernetes.io/part-of: "platform" # Application group/system
app.kubernetes.io/managed-by: "helm" # Tool managing the resource
Custom Labels¶
metadata:
labels:
# Standard labels
app.kubernetes.io/name: api
app.kubernetes.io/instance: api-production
# Custom labels
environment: production
team: backend
cost-center: engineering
Annotation Patterns¶
metadata:
annotations:
# Deployment metadata
kubernetes.io/change-cause: "Update to v1.2.3"
deployment.kubernetes.io/revision: "5"
# Documentation
description: "User authentication API"
contact: "platform-team@example.com"
documentation: "https://docs.example.com/api"
# Monitoring and alerting
prometheus.io/scrape: "true"
prometheus.io/port: "9090"
prometheus.io/path: "/metrics"
# Service mesh (Istio/Linkerd)
sidecar.istio.io/inject: "true"
linkerd.io/inject: enabled
Deployment Manifests¶
---
## @module web-application-deployment
## @description Production deployment for web application
## @version 1.0.0
## @author Tyler Dukes
## @last_updated 2025-10-28
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-application
namespace: production
labels:
app.kubernetes.io/name: web-application
app.kubernetes.io/instance: web-production
app.kubernetes.io/version: "1.2.3"
app.kubernetes.io/component: frontend
app.kubernetes.io/part-of: ecommerce
app.kubernetes.io/managed-by: helm
spec:
replicas: 3
revisionHistoryLimit: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
selector:
matchLabels:
app.kubernetes.io/name: web-application
app.kubernetes.io/instance: web-production
template:
metadata:
labels:
app.kubernetes.io/name: web-application
app.kubernetes.io/instance: web-production
app.kubernetes.io/version: "1.2.3"
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
spec:
serviceAccountName: web-application
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: web
image: myregistry.com/web-application:1.2.3
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 8080
protocol: TCP
env:
- name: APP_ENV
value: "production"
- name: DATABASE_HOST
valueFrom:
configMapKeyRef:
name: app-config
key: database_host
- name: DATABASE_PASSWORD
valueFrom:
secretKeyRef:
name: app-secrets
key: database_password
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
livenessProbe:
httpGet:
path: /health
port: http
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /ready
port: http
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
startupProbe:
httpGet:
path: /startup
port: http
initialDelaySeconds: 0
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 30
volumeMounts:
- name: config
mountPath: /etc/app/config
readOnly: true
- name: cache
mountPath: /var/cache/app
volumes:
- name: config
configMap:
name: app-config
- name: cache
emptyDir: {}
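The RollingUpdate settings above (`maxSurge: 1`, `maxUnavailable: 0`) bound how many pods can exist at any moment during a rollout. A minimal sketch of that arithmetic, ignoring the percentage form these fields also accept:

```python
# Pod-count bounds during a RollingUpdate rollout.
# maxUnavailable: 0 means the deployment never drops below the desired
# replica count; maxSurge: 1 allows at most one extra pod during the roll.

def rollout_bounds(replicas: int, max_surge: int, max_unavailable: int) -> tuple[int, int]:
    """Return (min_pods, max_pods) the controller may run mid-rollout."""
    return replicas - max_unavailable, replicas + max_surge

# The manifest above: replicas=3, maxSurge=1, maxUnavailable=0
print(rollout_bounds(3, 1, 0))  # -> (3, 4): zero-downtime with one surge pod
```

This is why the combination `maxUnavailable: 0` / `maxSurge: 1` is a common zero-downtime default: capacity never dips, at the cost of briefly scheduling one extra pod.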
Service Definitions¶
---
apiVersion: v1
kind: Service
metadata:
name: web-application
namespace: production
labels:
app.kubernetes.io/name: web-application
app.kubernetes.io/instance: web-production
spec:
type: ClusterIP
ports:
- name: http
port: 80
targetPort: http
protocol: TCP
selector:
app.kubernetes.io/name: web-application
app.kubernetes.io/instance: web-production
---
## LoadBalancer service
apiVersion: v1
kind: Service
metadata:
name: web-application-public
namespace: production
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
type: LoadBalancer
ports:
- name: https
port: 443
targetPort: http
protocol: TCP
selector:
app.kubernetes.io/name: web-application
ConfigMap and Secret Patterns¶
ConfigMap¶
---
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: production
labels:
app.kubernetes.io/name: web-application
data:
app.env: "production"
database_host: "postgres.production.svc.cluster.local"
database_port: "5432"
redis_host: "redis.production.svc.cluster.local"
log_level: "info"
# Configuration file
nginx.conf: |
server {
listen 8080;
location / {
proxy_pass http://backend:8080;
}
}
Secret¶
---
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
namespace: production
labels:
app.kubernetes.io/name: web-application
type: Opaque
stringData:
database_password: "super-secret-password"
api_key: "secret-api-key-12345"
jwt_secret: "jwt-signing-secret"
## Use external secret management
---
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: app-secrets
namespace: production
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secrets-manager
kind: SecretStore
target:
name: app-secrets
data:
- secretKey: database_password
remoteRef:
key: production/database
property: password
Resource Limits and Requests¶
Guidelines¶
## Development
resources:
requests:
cpu: 50m # 0.05 CPU cores
memory: 64Mi
limits:
cpu: 200m # 0.2 CPU cores
memory: 256Mi
## Staging
resources:
requests:
cpu: 100m # 0.1 CPU cores
memory: 128Mi
limits:
cpu: 500m # 0.5 CPU cores
memory: 512Mi
## Production
resources:
requests:
cpu: 250m # 0.25 CPU cores
memory: 512Mi
limits:
cpu: 1000m # 1 CPU core
memory: 2Gi
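The `m`, `Mi`, and `Gi` suffixes above follow Kubernetes quantity notation: millicores for CPU, binary-prefixed bytes for memory. A minimal parser sketch covering only the suffixes used in this guide (not the full quantity grammar with `k`, `M`, `Ki`, `Ti`, exponents, and so on):

```python
# Parse the resource quantity strings used in the guidelines above.

def parse_cpu(q: str) -> float:
    """'250m' -> 0.25 cores; '1' -> 1.0 cores."""
    return int(q[:-1]) / 1000 if q.endswith("m") else float(q)

def parse_memory(q: str) -> int:
    """'128Mi' / '2Gi' -> bytes (binary prefixes: Mi = 1024**2, Gi = 1024**3)."""
    units = {"Mi": 1024**2, "Gi": 1024**3}
    for suffix, factor in units.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)  # plain bytes

print(parse_cpu("250m"))      # -> 0.25
print(parse_memory("128Mi"))  # -> 134217728
```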
Quality of Service (QoS) Classes¶
## Guaranteed QoS - requests == limits
resources:
requests:
cpu: 500m
memory: 1Gi
limits:
cpu: 500m
memory: 1Gi
## Burstable QoS - requests < limits
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 1Gi
## BestEffort QoS - no requests or limits (avoid in production)
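The three QoS classes can be expressed as a small decision function. This is a simplification: it looks at a single container and ignores Kubernetes defaulting omitted requests to the limits, but it captures the classification rule above:

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container pod."""
    if not requests and not limits:
        return "BestEffort"      # no reservations at all: first to evict
    if requests and requests == limits:
        return "Guaranteed"      # requests == limits for every resource
    return "Burstable"           # some reservation, but can burst to limits

print(qos_class({"cpu": "500m", "memory": "1Gi"},
                {"cpu": "500m", "memory": "1Gi"}))  # -> Guaranteed
print(qos_class({"cpu": "100m"}, {"cpu": "500m"}))  # -> Burstable
print(qos_class({}, {}))                            # -> BestEffort
```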
Health Probes¶
Liveness Probe¶
Restarts container if probe fails:
livenessProbe:
httpGet:
path: /health
port: 8080
httpHeaders:
- name: X-Health-Check
value: liveness
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 3
Readiness Probe¶
Removes pod from service endpoints if probe fails:
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
Startup Probe¶
Delays liveness/readiness probes during slow application startup:
startupProbe:
httpGet:
path: /startup
port: 8080
initialDelaySeconds: 0
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 30 # 30 * 5s = 150s max startup time
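The `30 * 5s = 150s` comment above generalizes: the worst-case window before a probe is considered definitively failed is `initialDelaySeconds` plus `failureThreshold` consecutive failures spaced `periodSeconds` apart. As a one-liner:

```python
def max_probe_window(period_seconds: int, failure_threshold: int,
                     initial_delay_seconds: int = 0) -> int:
    """Seconds until a probe gives up: delay + period * consecutive failures."""
    return initial_delay_seconds + period_seconds * failure_threshold

# The startupProbe above: 0 + 5s * 30 failures = 150s budget for slow starts
print(max_probe_window(5, 30))  # -> 150
```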
Probe Types¶
## HTTP probe
httpGet:
path: /health
port: 8080
scheme: HTTP
## TCP probe
tcpSocket:
port: 5432
## Command probe
exec:
command:
- /bin/sh
- -c
- pg_isready -U postgres
Helm Chart Structure¶
my-application/
├── Chart.yaml # Chart metadata
├── values.yaml # Default configuration values
├── values-dev.yaml # Development overrides
├── values-prod.yaml # Production overrides
├── charts/ # Dependency charts
├── templates/
│ ├── _helpers.tpl # Template helpers
│ ├── deployment.yaml # Deployment manifest
│ ├── service.yaml # Service manifest
│ ├── ingress.yaml # Ingress manifest
│ ├── configmap.yaml # ConfigMap
│ ├── secret.yaml # Secret
│ ├── serviceaccount.yaml # ServiceAccount
│ ├── hpa.yaml # HorizontalPodAutoscaler
│ ├── pdb.yaml # PodDisruptionBudget
│ └── NOTES.txt # Post-install notes
├── .helmignore # Files to exclude
└── README.md # Chart documentation
Chart.yaml¶
apiVersion: v2
name: web-application
description: A Helm chart for web application deployment
type: application
version: 1.0.0
appVersion: "1.2.3"
keywords:
- web
- api
- application
home: https://example.com
sources:
- https://github.com/example/web-application
maintainers:
- name: Tyler Dukes
email: tyler@example.com
dependencies:
- name: postgresql
version: "12.x.x"
repository: "https://charts.bitnami.com/bitnami"
condition: postgresql.enabled
- name: redis
version: "17.x.x"
repository: "https://charts.bitnami.com/bitnami"
condition: redis.enabled
values.yaml Patterns¶
## values.yaml
---
## Application configuration
replicaCount: 3
image:
repository: myregistry.com/web-application
pullPolicy: IfNotPresent
tag: "" # Defaults to Chart.appVersion
imagePullSecrets: []
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name: ""
podAnnotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
podSecurityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
service:
type: ClusterIP
port: 80
targetPort: http
ingress:
enabled: true
className: nginx
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
hosts:
- host: app.example.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: app-tls
hosts:
- app.example.com
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 80
targetMemoryUtilizationPercentage: 80
nodeSelector: {}
tolerations: []
affinity: {}
## Application-specific configuration
config:
environment: production
logLevel: info
database:
host: postgres.production.svc.cluster.local
port: 5432
## Secret management
secrets:
databasePassword: ""
apiKey: ""
Helper Templates (_helpers.tpl)¶
{{/*
Expand the name of the chart.
*/}}
{{- define "web-application.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Create a default fully qualified app name.
*/}}
{{- define "web-application.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}
{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "web-application.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}
{{/*
Common labels
*/}}
{{- define "web-application.labels" -}}
helm.sh/chart: {{ include "web-application.chart" . }}
{{ include "web-application.selectorLabels" . }}
{{- if .Chart.AppVersion }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
{{- end }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end }}
{{/*
Selector labels
*/}}
{{- define "web-application.selectorLabels" -}}
app.kubernetes.io/name: {{ include "web-application.name" . }}
app.kubernetes.io/instance: {{ .Release.Name }}
{{- end }}
{{/*
Create the name of the service account to use
*/}}
{{- define "web-application.serviceAccountName" -}}
{{- if .Values.serviceAccount.create }}
{{- default (include "web-application.fullname" .) .Values.serviceAccount.name }}
{{- else }}
{{- default "default" .Values.serviceAccount.name }}
{{- end }}
{{- end }}
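The `trunc 63 | trimSuffix "-"` pipeline that recurs in every helper above exists because Kubernetes object names are DNS labels: at most 63 characters and not ending in a hyphen. A Python rendering of that rule (note `trimSuffix` removes only a single trailing hyphen, mirrored here):

```python
def helm_truncate(name: str, limit: int = 63) -> str:
    """Mimic Helm's `trunc 63 | trimSuffix "-"` pipeline."""
    name = name[:limit]
    return name[:-1] if name.endswith("-") else name

# Hypothetical release/chart names for illustration
fullname = helm_truncate("my-release" + "-" + "web-application")
print(fullname)                       # -> "my-release-web-application"
print(len(helm_truncate("x" * 70)))  # -> 63: clipped to the DNS label limit
```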
Helm Template Example¶
## templates/deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "web-application.fullname" . }}
labels:
{{- include "web-application.labels" . | nindent 4 }}
spec:
{{- if not .Values.autoscaling.enabled }}
replicas: {{ .Values.replicaCount }}
{{- end }}
selector:
matchLabels:
{{- include "web-application.selectorLabels" . | nindent 6 }}
template:
metadata:
annotations:
checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
{{- with .Values.podAnnotations }}
{{- toYaml . | nindent 8 }}
{{- end }}
labels:
{{- include "web-application.selectorLabels" . | nindent 8 }}
spec:
{{- with .Values.imagePullSecrets }}
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
serviceAccountName: {{ include "web-application.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
ports:
- name: http
containerPort: 8080
protocol: TCP
env:
- name: APP_ENV
value: {{ .Values.config.environment | quote }}
- name: LOG_LEVEL
value: {{ .Values.config.logLevel | quote }}
resources:
{{- toYaml .Values.resources | nindent 12 }}
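The `checksum/config` annotation in the template above is what forces a rollout when only the ConfigMap changes: the rendered ConfigMap is hashed into the pod template, so a new hash means a changed template and hence a rolling update. The mechanism in miniature:

```python
import hashlib

def config_checksum(rendered_configmap: str) -> str:
    """Hash rendered config, as the checksum/config annotation does."""
    return hashlib.sha256(rendered_configmap.encode()).hexdigest()

old = config_checksum("log_level: info")
new = config_checksum("log_level: debug")
# Different hash -> different pod template annotation -> rolling update
print(old != new)  # -> True
```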
Helm Commands¶
## Install chart
helm install my-app ./my-application -n production
## Install with custom values
helm install my-app ./my-application \
-f values-prod.yaml \
-n production \
--create-namespace
## Upgrade release
helm upgrade my-app ./my-application \
-f values-prod.yaml \
-n production
## Upgrade with rollback on failure
helm upgrade my-app ./my-application \
-f values-prod.yaml \
--atomic \
--timeout 5m
## Dry run / template rendering
helm install my-app ./my-application \
--dry-run \
--debug \
-f values-prod.yaml
## Lint chart
helm lint ./my-application
## Package chart
helm package ./my-application
## List releases
helm list -n production
## Rollback
helm rollback my-app 5 -n production
## Uninstall
helm uninstall my-app -n production
Testing¶
Testing with kubeval¶
Validate Kubernetes YAML manifests (note: kubeval is archived and no longer maintained; kubeconform, covered below, is its successor):
## Install kubeval
brew install kubeval
## Validate manifest
kubeval deployment.yaml
## Validate multiple files
kubeval manifests/*.yaml
## Validate against specific Kubernetes version
kubeval --kubernetes-version 1.32.0 deployment.yaml
## Strict mode (fail on warnings)
kubeval --strict deployment.yaml
Testing with kubeconform¶
More comprehensive validation:
## Install kubeconform
brew install kubeconform
## Validate manifests
kubeconform manifests/
## Validate with CRDs
kubeconform -schema-location default \
-schema-location 'crds/{{.ResourceKind}}.json' \
manifests/
## Output in JSON
kubeconform -output json manifests/
Testing with kube-score¶
Analyze manifests for best practices:
## Install kube-score
brew install kube-score
## Analyze deployment
kube-score score deployment.yaml
## Check all manifests
kube-score score manifests/*.yaml
## Ignore specific checks
kube-score score --ignore-test pod-networkpolicy deployment.yaml
Unit Testing with conftest¶
Policy-based testing for Kubernetes:
## Install conftest
brew install conftest
## Test Kubernetes manifests
conftest test deployment.yaml
## Custom policy
conftest test -p policy/ deployment.yaml
Example policy:
## policy/kubernetes.rego
package main
deny[msg] {
input.kind == "Deployment"
not input.spec.template.spec.securityContext.runAsNonRoot
msg := "Containers must not run as root"
}
deny[msg] {
input.kind == "Deployment"
container := input.spec.template.spec.containers[_]
not container.resources.limits
msg := sprintf("Container %s must have resource limits", [container.name])
}
warn[msg] {
input.kind == "Service"
input.spec.type == "LoadBalancer"
msg := "Consider using Ingress instead of LoadBalancer"
}
Integration Testing with kind¶
Test on local Kubernetes cluster:
## Create kind cluster
kind create cluster --name test-cluster
## Apply manifests
kubectl apply -f manifests/
## Run tests
kubectl wait --for=condition=available --timeout=60s \
deployment/myapp
## Test service endpoints
kubectl run test-pod --image=curlimages/curl --rm -it -- \
curl http://myapp-service:80/health
## Cleanup
kind delete cluster --name test-cluster
E2E Testing Script¶
## tests/e2e-test.sh
#!/bin/bash
set -e
# Create kind cluster
echo "Creating test cluster..."
kind create cluster --name e2e-test --wait 60s
# Apply manifests
echo "Applying manifests..."
kubectl apply -f manifests/
# Wait for deployment
echo "Waiting for deployment..."
kubectl wait --for=condition=available --timeout=300s \
deployment/myapp -n default
# Test application
echo "Testing application..."
kubectl port-forward svc/myapp-service 8080:80 &
PF_PID=$!
sleep 5
response=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/health)
if [ "$response" != "200" ]; then
echo "Health check failed: $response"
kill $PF_PID
kind delete cluster --name e2e-test
exit 1
fi
echo "Tests passed!"
kill $PF_PID
kind delete cluster --name e2e-test
Testing with Helm¶
Test Helm charts:
## Lint Helm chart
helm lint ./mychart
## Dry run install
helm install myapp ./mychart --dry-run --debug
## Template and validate
helm template myapp ./mychart | kubeval -
## Test with specific values
helm install myapp ./mychart --dry-run \
--values test-values.yaml
Chart Testing¶
## ct.yaml (Chart Testing config)
chart-dirs:
- charts
chart-repos:
- bitnami=https://charts.bitnami.com/bitnami
helm-extra-args: --timeout 600s
## Install ct
brew install chart-testing
## Lint charts
ct lint --config ct.yaml
## Test charts in kind
ct install --config ct.yaml
CI/CD Integration¶
## .github/workflows/k8s-test.yml
name: Kubernetes Tests
on: [push, pull_request]
jobs:
validate:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install tools
run: |
curl -L https://github.com/kubeval/kubeval/releases/latest/download/kubeval-linux-amd64.tar.gz | tar xz
sudo mv kubeval /usr/local/bin
- name: Validate manifests
run: kubeval manifests/*.yaml
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Create kind cluster
uses: helm/kind-action@v1
- name: Deploy and test
run: |
kubectl apply -f manifests/
kubectl wait --for=condition=available --timeout=60s deployment/myapp
kubectl get pods
Testing RBAC¶
Test Role-Based Access Control:
## Test if service account can perform action
kubectl auth can-i create pods \
--as=system:serviceaccount:default:myapp
## Test with specific permissions
kubectl auth can-i delete deployments \
--as=system:serviceaccount:default:myapp \
-n production
Resource Quota Testing¶
## Apply resource quota
kubectl apply -f resourcequota.yaml
## Try to create pod that exceeds quota
kubectl apply -f test-pod.yaml
## Verify quota enforcement
kubectl describe resourcequota -n test-namespace
Network Policy Testing¶
Test network isolation:
## Apply network policy
kubectl apply -f networkpolicy.yaml
## Test connectivity (should fail)
kubectl run test-pod --image=curlimages/curl --rm -it -- \
curl --max-time 5 http://restricted-service
## Test from allowed pod (should succeed)
kubectl run allowed-pod -l app=allowed --image=curlimages/curl --rm -it -- \
curl http://restricted-service
Performance Testing¶
## Load test with k6
cat <<EOF | k6 run -
import http from 'k6/http';
import { check } from 'k6';
export let options = {
vus: 10,
duration: '30s',
};
export default function() {
let res = http.get('http://myapp-service');
check(res, {
'status is 200': (r) => r.status === 200,
});
}
EOF
Snapshot Testing¶
Test manifest rendering:
## Generate manifests
kustomize build overlays/production > snapshot.yaml
## Compare with previous snapshot
diff snapshot-previous.yaml snapshot.yaml
## Update snapshot if changes are expected
cp snapshot.yaml snapshot-previous.yaml
Common Pitfalls¶
Selector Label Mismatch¶
Issue: Pod template labels don't match the deployment selector. The API server rejects the manifest, since an apps/v1 Deployment's selector must match its template labels.
Example:
## Bad - Mismatched labels
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp
spec:
selector:
matchLabels:
app: web-app # Selector label
template:
metadata:
labels:
app: webapp # ❌ Different label! Doesn't match selector
spec:
containers:
- name: app
image: myapp:1.0
Solution: Ensure selector labels exactly match template labels.
## Good - Matching labels
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp
spec:
selector:
matchLabels:
app: webapp # ✅ Matches template
template:
metadata:
labels:
app: webapp # ✅ Matches selector
version: "1.0" # Additional labels are OK
spec:
containers:
- name: app
image: myapp:1.0
Key Points:
- Selector labels must be subset of template labels
- Template can have additional labels beyond selector
- Changing selector requires deleting and recreating deployment
- Use consistent label keys across all resources
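The subset rule above can be checked mechanically before applying a manifest. A sketch, assuming the manifests are already parsed into dicts:

```python
def selector_matches(selector: dict, pod_labels: dict) -> bool:
    """True if every selector key/value appears in the pod template labels."""
    return all(pod_labels.get(k) == v for k, v in selector.items())

# The "Bad" example above: selector `app: web-app` vs template `app: webapp`
print(selector_matches({"app": "web-app"}, {"app": "webapp"}))  # -> False
# The "Good" example: extra template labels beyond the selector are fine
print(selector_matches({"app": "webapp"},
                       {"app": "webapp", "version": "1.0"}))    # -> True
```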
Resource Limits Without Requests¶
Issue: Setting limits without explicit requests makes Kubernetes default the requests to the limits, silently over-reserving capacity; omitting both requests and limits leaves the pod in BestEffort QoS, first in line for eviction.
Example:
## Bad - Only limits, no requests
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
image: myapp
resources:
limits:
memory: "512Mi"
cpu: "500m"
## ❌ No explicit requests - they default to the limits, over-reserving capacity
Solution: Always set both requests and limits.
## Good - Both requests and limits
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
image: myapp
resources:
requests:
memory: "256Mi" # ✅ Guaranteed allocation
cpu: "250m"
limits:
memory: "512Mi" # Maximum allowed
cpu: "500m"
Key Points:
- Always set requests to get Burstable or Guaranteed QoS
- Requests determine pod scheduling and eviction priority
- `requests == limits` gives Guaranteed QoS (highest priority)
- Missing both requests and limits results in BestEffort QoS (first to evict)
Readiness Probe Pointing to Wrong Port¶
Issue: Readiness probe checks wrong port, causing traffic to be sent to pods that aren't actually ready.
Example:
## Bad - Wrong port in probe
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
image: myapp
ports:
- containerPort: 8080
name: http
readinessProbe:
httpGet:
port: 80 # ❌ Wrong port! App runs on 8080
path: /health
Solution: Use named ports or verify port numbers.
## Good - Correct port reference
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
image: myapp
ports:
- containerPort: 8080
name: http # Named port
readinessProbe:
httpGet:
port: http # ✅ References named port
path: /health
livenessProbe:
httpGet:
port: 8080 # ✅ Or use exact port number
path: /health
Key Points:
- Use named ports for better readability and maintainability
- Verify probe port matches container port
- Test probes with `kubectl exec` before deployment
- Check probe logs with `kubectl describe pod`
ConfigMap Volume Mount Overwrites Directory¶
Issue: Mounting ConfigMap to directory overwrites all existing files in that directory.
Example:
## Bad - Overwrites entire /etc/config directory
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
image: myapp
volumeMounts:
- name: config
mountPath: /etc/config # ❌ Overwrites everything in /etc/config
volumes:
- name: config
configMap:
name: app-config
Solution: Use subPath to mount specific files or mount to dedicated directory.
## Good - Mount specific file with subPath
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
image: myapp
volumeMounts:
- name: config
mountPath: /etc/config/app.conf # ✅ Specific file
subPath: app.conf # File from ConfigMap
volumes:
- name: config
configMap:
name: app-config
## Good - Mount to dedicated directory
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
image: myapp
volumeMounts:
- name: config
mountPath: /app/config # ✅ Dedicated directory
volumes:
- name: config
configMap:
name: app-config
Key Points:
- ConfigMap mount replaces all files in target directory
- Use `subPath` to mount individual files
- Mount to dedicated directories to avoid conflicts
- Consider using environment variables for simple configs
Service Selector Doesn't Match Pods¶
Issue: Service selector doesn't match pod labels, causing no endpoints and connection failures.
Example:
## Bad - Service selector doesn't match pods
apiVersion: v1
kind: Service
metadata:
name: webapp
spec:
selector:
app: web # Selector
ports:
- port: 80
targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp
spec:
selector:
matchLabels:
app: webapp # ❌ Doesn't match service selector!
template:
metadata:
labels:
app: webapp
spec:
containers:
- name: app
image: myapp
Solution: Ensure service selector matches pod labels.
## Good - Service selector matches pods
apiVersion: v1
kind: Service
metadata:
name: webapp
spec:
selector:
app: webapp # ✅ Matches deployment labels
ports:
- port: 80
targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: webapp
spec:
selector:
matchLabels:
app: webapp # ✅ Matches service selector
template:
metadata:
labels:
app: webapp # ✅ Matches service selector
spec:
containers:
- name: app
image: myapp
ports:
- containerPort: 8080
Key Points:
- Service selector must match pod labels exactly
- Check service endpoints: `kubectl get endpoints webapp`
- Use consistent labeling across all resources
- Service doesn't care about deployment selector, only pod labels
Anti-Patterns¶
❌ Avoid: latest Tag¶
## Bad - Unpredictable deployments
image: nginx:latest
## Good - Pin specific versions
image: nginx:1.24.0
image: nginx:1.24.0-alpine
❌ Avoid: No Resource Limits¶
## Bad - Can cause node resource exhaustion
containers:
- name: app
image: myapp:1.0.0
## Good - Define limits
containers:
- name: app
image: myapp:1.0.0
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 512Mi
❌ Avoid: Running as Root¶
## Bad - Security risk
securityContext:
runAsUser: 0
## Good - Run as non-root
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
❌ Avoid: Missing Health Probes¶
## Bad - No health checks
containers:
- name: app
image: myapp:1.0.0
## Good - Include probes
containers:
- name: app
image: myapp:1.0.0
livenessProbe:
httpGet:
path: /health
port: 8080
readinessProbe:
httpGet:
path: /ready
port: 8080
❌ Avoid: Storing Secrets in ConfigMaps¶
## Bad - Secrets in ConfigMap (visible in plain text)
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
data:
database_password: "MySecretPassword" # ❌ Plain text!
api_key: "sk-1234567890" # ❌ Plain text!
## Good - Use Secrets (base64-encoded; enable encryption at rest in etcd)
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
type: Opaque
stringData:
database_password: "MySecretPassword" # ✅ stringData is encoded by the API server on write
api_key: "sk-1234567890" # ✅ Access restricted via RBAC
## Better - Use external secret management
## Sealed Secrets, External Secrets Operator, or cloud provider KMS
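Worth stressing why a Secret alone is not encryption: the stored value is only base64, which anyone with read access to the Secret can trivially reverse. A quick demonstration:

```python
import base64

# base64 is an encoding, not encryption: it is trivially reversible.
encoded = base64.b64encode(b"MySecretPassword").decode()
print(encoded)                             # -> "TXlTZWNyZXRQYXNzd29yZA=="
print(base64.b64decode(encoded).decode())  # -> "MySecretPassword"
# Hence: restrict Secret access with RBAC, enable encryption at rest,
# or keep secrets in the external managers mentioned above.
```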
❌ Avoid: No Pod Disruption Budgets¶
## Bad - No protection during cluster maintenance
apiVersion: apps/v1
kind: Deployment
metadata:
name: web
spec:
replicas: 3
# No PodDisruptionBudget - all pods could be terminated at once
## Good - Define disruption budget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: web-pdb
spec:
minAvailable: 2 # ✅ Always keep 2 pods running
selector:
matchLabels:
app: web
❌ Avoid: Missing Network Policies¶
## Bad - No network restrictions (pods can talk to anything)
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend
spec:
# No NetworkPolicy - unrestricted network access
## Good - Restrict network traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: backend-netpol
spec:
podSelector:
matchLabels:
app: backend
policyTypes:
- Ingress
- Egress
ingress:
- from:
- podSelector:
matchLabels:
app: frontend
ports:
- protocol: TCP
port: 8080
egress:
- to:
- podSelector:
matchLabels:
app: database
ports:
- protocol: TCP
port: 5432
Security Best Practices¶
Pod Security Standards¶
Use Pod Security Standards to enforce security policies.
## Bad - Running as root with privileges
apiVersion: v1
kind: Pod
metadata:
name: insecure-pod
spec:
containers:
- name: app
image: myapp:latest
securityContext:
privileged: true # NEVER in production!
runAsUser: 0 # Running as root!
## Good - Non-root with security contexts
apiVersion: v1
kind: Pod
metadata:
name: secure-pod
spec:
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 2000
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: myapp:latest
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
Secrets Management¶
Never hardcode sensitive data in manifests.
## Bad - Secrets in plain text
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
env:
- name: DB_PASSWORD
value: "SuperSecret123" # EXPOSED!
- name: API_KEY
value: "sk_live_abc123" # In version control!
## Good - Use Kubernetes Secrets
apiVersion: v1
kind: Secret
metadata:
name: app-secrets
type: Opaque
data:
db-password: U3VwZXJTZWNyZXQxMjM= # base64 encoded
api-key: c2tfbGl2ZV9hYmMxMjM= # base64 encoded
---
apiVersion: v1
kind: Pod
metadata:
name: app
spec:
containers:
- name: app
envFrom:
- secretRef:
name: app-secrets
## Better - Use external secrets management
## External Secrets Operator with AWS Secrets Manager
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: app-secrets
spec:
refreshInterval: 1h
secretStoreRef:
name: aws-secrets-manager
kind: SecretStore
target:
name: app-secrets
data:
- secretKey: db-password
remoteRef:
key: prod/db/password
- secretKey: api-key
remoteRef:
key: prod/api/key
Network Policies¶
Restrict pod-to-pod communication.
## Bad - No network policies (pods can access anything)
## Default allow-all is insecure!
## Good - Deny all, then allow specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-db
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: postgresql
      ports:
        - protocol: TCP
          port: 5432
    - to: # Allow DNS
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
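Egress rules only cover the client side; if the database pods are themselves covered by a deny-all policy, they also need a matching ingress rule. A sketch, assuming the same labels as above:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-db-from-app
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: postgresql
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: web-app
      ports:
        - protocol: TCP
          port: 5432
```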
RBAC (Role-Based Access Control)¶
Follow the principle of least privilege: grant each service account only the verbs and resources it actually needs.
## Bad - Cluster-admin for all service accounts
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: all-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin # TOO PERMISSIVE!
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
## Good - Scoped permissions
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sa
  namespace: production
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-role
  namespace: production
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["secrets"]
    resourceNames: ["app-secrets"] # Specific secret only
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-role-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: app-sa
    namespace: production
roleRef:
  kind: Role
  name: app-role
  apiGroup: rbac.authorization.k8s.io
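Workloads opt in to the scoped account via `serviceAccountName`; a sketch, assuming the names above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: production
spec:
  serviceAccountName: app-sa
  automountServiceAccountToken: true # set to false if the pod never calls the API
  containers:
    - name: app
      image: myapp:v1.2.3
```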
Resource Limits and Quotas¶
Prevent resource exhaustion attacks.
## Bad - No resource limits
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp
      ## No limits - can consume all node resources!
## Good - Set resource requests and limits
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp
      resources:
        requests:
          memory: "128Mi"
          cpu: "100m"
        limits:
          memory: "256Mi"
          cpu: "200m"
## Good - Enforce with ResourceQuota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    persistentvolumeclaims: "10"
## Good - Set default limits
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - default:
        memory: 512Mi
        cpu: 500m
      defaultRequest:
        memory: 256Mi
        cpu: 250m
      type: Container
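CPU quantities use millicores (`100m` = 0.1 of one core) and memory uses binary suffixes (`Mi` = mebibytes, `Gi` = gibibytes). For example:

```shell
# 100m CPU = 100/1000 = 0.1 of one core
# 128Mi    = 128 * 1024 * 1024 bytes
echo $((128 * 1024 * 1024)) # bytes in 128Mi
# -> 134217728
```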
Image Security¶
Use trusted images and scan for vulnerabilities.
## Bad - Using latest tag from untrusted registry
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: randomuser/myapp:latest # Untrusted! Unpredictable!
## Good - Pin specific versions from trusted registry
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: gcr.io/mycompany/myapp:v1.2.3@sha256:abc123... # SHA256 digest
      imagePullPolicy: Always
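A digest is the SHA-256 hash of the image manifest, so unlike a tag it is immutable: if the content changes, the reference stops resolving instead of silently pulling something different. A sketch of the idea with `sha256sum` (the manifest content here is a stand-in):

```shell
# Content-addressed reference: any change to the manifest changes the hash.
digest=$(echo -n '{"example": "image-manifest"}' | sha256sum | cut -d' ' -f1)
echo "myapp@sha256:${digest}"
```

In practice the digest comes from the registry, for example printed by `docker push` or shown in `kubectl describe pod` output.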
## Good - Use private registry with imagePullSecrets
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  imagePullSecrets:
    - name: regcred
  containers:
    - name: app
      image: myregistry.azurecr.io/myapp:v1.2.3
## Enforce with admission controller
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: allowed-repositories
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "gcr.io/mycompany/"
      - "myregistry.azurecr.io/"
Admission Control¶
Use admission controllers to enforce policies. The OPA Gatekeeper constraints below assume their corresponding ConstraintTemplates (for example, from the Gatekeeper policy library) are already installed in the cluster.
## OPA Gatekeeper policy - Block privileged containers
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: deny-privileged-containers
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces:
      - kube-system
## Block images without digest
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sImageDigests
metadata:
  name: require-image-digest
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
## Require labels
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: require-owner-label
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    labels:
      - key: "owner"
      - key: "environment"
Audit Logging¶
Enable comprehensive audit logging. The policy below is passed to kube-apiserver via the --audit-policy-file flag (with --audit-log-path controlling where audit logs are written).
## kube-apiserver audit policy
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  ## Log all requests to Secrets
  - level: RequestResponse
    resources:
      - group: ""
        resources: ["secrets"]
  ## Log all requests from unauthenticated users
  - level: Metadata
    omitStages:
      - "RequestReceived"
    userGroups:
      - "system:unauthenticated"
  ## Log pod exec and port-forward
  - level: Request
    verbs: ["create"]
    resources:
      - group: ""
        resources: ["pods/exec", "pods/portforward"]
Pod Disruption Budgets¶
Protect workloads against voluntary disruptions such as node drains and cluster upgrades.
## Good - Ensure minimum availability during maintenance
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
  namespace: production
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-app
## Or use percentage
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb-percent
  namespace: production
spec:
  maxUnavailable: "25%"
  selector:
    matchLabels:
      app: web-app
References¶
Tools¶
- kubectl - Kubernetes CLI
- helm - Kubernetes package manager
- kubeval - Kubernetes manifest validation
- kube-linter - Static analysis tool
- kustomize - Template-free customization
Status: Active