SQL

Language Overview¶

SQL (Structured Query Language) is a declarative language for managing and querying relational databases. This guide provides database-agnostic standards that work across PostgreSQL, MySQL, SQL Server, and other SQL-compliant databases.

Key Characteristics¶

Paradigm: Declarative query language
Case Sensitivity: Varies by database (PostgreSQL case-sensitive, MySQL configurable)
Standards: SQL-92, SQL:1999, SQL:2003, SQL:2011
Primary Use: Data querying, manipulation, and schema definition

Naming Conventions¶

-- UPPERCASE keywords, lowercase identifiers
SELECT user_id, email, created_at
FROM users
WHERE status = 'active';

-- Avoid mixed case or all lowercase keywords
-- Bad
select user_id from users where status = 'active';

-- Bad
Select User_Id From Users Where Status = 'active';

Quick Reference¶

Category	Convention	Example	Notes
Naming
Keywords	`UPPERCASE`	`SELECT`, `FROM`, `WHERE`	All SQL keywords uppercase
Tables	`snake_case`	`users`, `order_items`	Plural nouns, lowercase
Columns	`snake_case`	`user_id`, `created_at`	Descriptive, lowercase
Indexes	`idx_table_columns`	`idx_users_email`	Prefix with `idx_`
Primary Keys	`id` or `table_id`	`id`, `user_id`	Singular, descriptive
Foreign Keys	`table_id`	`user_id`, `product_id`	Reference table name
Constraints	`pk_`, `fk_`, `uk_`, `ck_`	`pk_users`, `fk_orders_user_id`	Prefix by type
Views	`v_descriptive_name`	`v_active_users`	Prefix with `v_`
Formatting
Indentation	2 or 4 spaces	`WHERE status = 'active'`	Consistent indentation
Line Breaks	One clause per line	`SELECT\n column\nFROM`	Readable queries
Commas	Leading commas	`, column2\n, column3`	Or trailing (be consistent)
Query Structure
SELECT	Explicit columns	`SELECT id, name`	Avoid `SELECT *`
JOIN	Explicit JOIN type	`INNER JOIN`, `LEFT JOIN`	Not implicit joins
WHERE	Use bind parameters	`WHERE id = $1`	Prevent SQL injection
Best Practices
Comments	`--` for line	`-- Get active users`	Single-line comments
Transactions	Use when needed	`BEGIN; ... COMMIT;`	Atomic operations
NULL Handling	Explicit NULL checks	`WHERE col IS NULL`	Not `= NULL`

Keywords and Identifiers¶

SQL Keywords¶

Use UPPERCASE for all SQL keywords:

SELECT, FROM, WHERE, JOIN, LEFT JOIN, INNER JOIN, ON, AND, OR, NOT,
INSERT, UPDATE, DELETE, CREATE, DROP, ALTER, TABLE, INDEX, VIEW,
ORDER BY, GROUP BY, HAVING, DISTINCT, AS, UNION, INTERSECT, EXCEPT,
WITH, CASE, WHEN, THEN, ELSE, END, NULL, IS, LIKE, IN, BETWEEN

Identifiers¶

Use lowercase snake_case for all identifiers:

-- Tables
users, user_profiles, order_items, payment_transactions

-- Columns
user_id, first_name, email_address, created_at, is_active

-- Indexes
idx_users_email, idx_orders_user_id_created_at

-- Constraints
pk_users, fk_orders_user_id, uniq_users_email

Table Design¶

Table Naming¶

-- Good - plural nouns, lowercase snake_case
CREATE TABLE users (
    user_id BIGINT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE order_items (
    order_item_id BIGINT PRIMARY KEY,
    order_id BIGINT NOT NULL,
    product_id BIGINT NOT NULL,
    quantity INT NOT NULL
);

-- Avoid singular or mixed case
-- Bad
CREATE TABLE User (...);
CREATE TABLE OrderItem (...);

Primary Keys¶

-- Prefer surrogate keys with table_name + _id pattern
CREATE TABLE users (
    user_id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    username VARCHAR(100) NOT NULL UNIQUE
);

-- Composite primary keys for junction tables
CREATE TABLE user_roles (
    user_id BIGINT NOT NULL,
    role_id BIGINT NOT NULL,
    assigned_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (user_id, role_id)
);

Foreign Keys¶

CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    user_id BIGINT NOT NULL,
    status VARCHAR(50) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT fk_orders_user_id
        FOREIGN KEY (user_id)
        REFERENCES users(user_id)
        ON DELETE CASCADE
);

Indexes¶

-- Index naming: idx_table_column[_column...]
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status_created_at ON orders(status, created_at);

-- Unique indexes
CREATE UNIQUE INDEX uniq_users_email ON users(email);
CREATE UNIQUE INDEX uniq_users_username ON users(username);

-- Partial indexes (PostgreSQL)
CREATE INDEX idx_orders_active
    ON orders(user_id, created_at)
    WHERE status = 'active';

Query Formatting¶

SELECT Statements¶

-- One column per line for complex queries
SELECT
    u.user_id,
    u.email,
    u.first_name,
    u.last_name,
    u.created_at,
    COUNT(o.order_id) AS order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.status = 'active'
    AND u.created_at >= '2024-01-01'
GROUP BY u.user_id, u.email, u.first_name, u.last_name, u.created_at
HAVING COUNT(o.order_id) > 0
ORDER BY order_count DESC, u.created_at DESC
LIMIT 100;

-- Simple queries on one line
SELECT user_id, email FROM users WHERE status = 'active';

JOIN Conventions¶

-- Use explicit JOIN syntax (not implicit with WHERE)
-- Good
SELECT u.user_id, u.email, o.order_id
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;

-- Bad - implicit join
SELECT u.user_id, u.email, o.order_id
FROM users u, orders o
WHERE u.user_id = o.user_id;

-- JOIN types
-- INNER JOIN - matching rows only
SELECT u.email, o.total
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;

-- LEFT JOIN - all left rows, matched right rows
SELECT u.email, o.total
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;

-- Multiple JOINs
SELECT
    u.email,
    o.order_id,
    oi.product_id,
    p.product_name
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
INNER JOIN order_items oi ON o.order_id = oi.order_id
INNER JOIN products p ON oi.product_id = p.product_id
WHERE o.status = 'completed';

Common Table Expressions (CTEs)¶

-- Use CTEs for complex queries
WITH active_users AS (
    SELECT user_id, email, created_at
    FROM users
    WHERE status = 'active'
),
user_orders AS (
    SELECT
        user_id,
        COUNT(*) AS order_count,
        SUM(total) AS total_spent
    FROM orders
    WHERE status = 'completed'
    GROUP BY user_id
)
SELECT
    au.user_id,
    au.email,
    COALESCE(uo.order_count, 0) AS order_count,
    COALESCE(uo.total_spent, 0) AS total_spent
FROM active_users au
LEFT JOIN user_orders uo ON au.user_id = uo.user_id
ORDER BY uo.total_spent DESC NULLS LAST;

-- Recursive CTE example
WITH RECURSIVE employee_hierarchy AS (
    -- Base case
    SELECT
        employee_id,
        manager_id,
        name,
        1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case
    SELECT
        e.employee_id,
        e.manager_id,
        e.name,
        eh.level + 1
    FROM employees e
    INNER JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
)
SELECT * FROM employee_hierarchy ORDER BY level, name;

INSERT, UPDATE, DELETE¶

INSERT¶

-- Single row insert
INSERT INTO users (email, first_name, last_name)
VALUES ('user@example.com', 'John', 'Doe');

-- Multiple row insert
INSERT INTO users (email, first_name, last_name)
VALUES
    ('user1@example.com', 'Alice', 'Smith'),
    ('user2@example.com', 'Bob', 'Jones'),
    ('user3@example.com', 'Charlie', 'Brown');

-- INSERT with SELECT
INSERT INTO user_audit (user_id, action, created_at)
SELECT user_id, 'login', CURRENT_TIMESTAMP
FROM users
WHERE last_login < CURRENT_DATE - INTERVAL '30 days';

-- UPSERT (PostgreSQL)
INSERT INTO user_preferences (user_id, theme, language)
VALUES (1, 'dark', 'en')
ON CONFLICT (user_id)
DO UPDATE SET
    theme = EXCLUDED.theme,
    language = EXCLUDED.language,
    updated_at = CURRENT_TIMESTAMP;

UPDATE¶

-- Always use WHERE clause
UPDATE users
SET
    status = 'inactive',
    updated_at = CURRENT_TIMESTAMP
WHERE last_login < CURRENT_DATE - INTERVAL '90 days';

-- UPDATE with JOIN
UPDATE orders o
SET status = 'cancelled'
FROM users u
WHERE o.user_id = u.user_id
    AND u.status = 'deleted'
    AND o.status = 'pending';

DELETE¶

-- Always use WHERE clause (unless intentional truncation)
DELETE FROM sessions
WHERE expires_at < CURRENT_TIMESTAMP;

-- DELETE with JOIN
DELETE FROM orders
WHERE user_id IN (
    SELECT user_id
    FROM users
    WHERE status = 'deleted'
);

Transactions¶

-- Explicit transaction control
BEGIN;

UPDATE accounts
SET balance = balance - 100.00
WHERE account_id = 1;

UPDATE accounts
SET balance = balance + 100.00
WHERE account_id = 2;

-- Verify constraints
SELECT balance FROM accounts WHERE account_id IN (1, 2);

COMMIT;

-- Rollback on error
BEGIN;

UPDATE inventory SET quantity = quantity - 5 WHERE product_id = 100;

-- Check if enough inventory
SELECT quantity FROM inventory WHERE product_id = 100;

-- If quantity < 0, rollback
ROLLBACK;

Functions and Stored Procedures¶

Functions (PostgreSQL)¶

-- Function to calculate order total
CREATE OR REPLACE FUNCTION calculate_order_total(p_order_id BIGINT)
RETURNS NUMERIC AS $$
DECLARE
    v_total NUMERIC;
BEGIN
    SELECT SUM(quantity * unit_price)
    INTO v_total
    FROM order_items
    WHERE order_id = p_order_id;

    RETURN COALESCE(v_total, 0);
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT order_id, calculate_order_total(order_id) AS total
FROM orders;

Stored Procedures (PostgreSQL 11+)¶

CREATE OR REPLACE PROCEDURE close_expired_orders()
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE orders
    SET
        status = 'expired',
        updated_at = CURRENT_TIMESTAMP
    WHERE status = 'pending'
        AND created_at < CURRENT_TIMESTAMP - INTERVAL '7 days';

    -- Log the operation
    INSERT INTO audit_log (action, affected_rows, created_at)
    VALUES ('close_expired_orders', ROW_COUNT(), CURRENT_TIMESTAMP);

    COMMIT;
END;
$$;

-- Execute procedure
CALL close_expired_orders();

Views¶

-- Create view for commonly accessed data
CREATE VIEW active_user_orders AS
SELECT
    u.user_id,
    u.email,
    o.order_id,
    o.total,
    o.status,
    o.created_at
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id
WHERE u.status = 'active'
    AND o.status IN ('pending', 'processing', 'shipped');

-- Materialized view (PostgreSQL)
CREATE MATERIALIZED VIEW user_order_summary AS
SELECT
    u.user_id,
    u.email,
    COUNT(o.order_id) AS total_orders,
    SUM(o.total) AS total_spent,
    MAX(o.created_at) AS last_order_date
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.user_id, u.email;

-- Refresh materialized view
REFRESH MATERIALIZED VIEW user_order_summary;

Migration Scripts¶

Schema Migrations¶

-- migration_001_create_users_table.sql
-- @module users_table_migration
-- @description Create users table with indexes
-- @version 1.0.0
-- @author Tyler Dukes
-- @last_updated 2025-10-28

BEGIN;

CREATE TABLE users (
    user_id BIGSERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    status VARCHAR(50) NOT NULL DEFAULT 'active',
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE UNIQUE INDEX uniq_users_email ON users(email);
CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_created_at ON users(created_at);

COMMIT;

-- Rollback script: migration_001_rollback.sql
BEGIN;
DROP TABLE IF EXISTS users CASCADE;
COMMIT;

Data Migrations¶

-- migration_002_populate_default_roles.sql
BEGIN;

INSERT INTO roles (role_name, description)
VALUES
    ('admin', 'System administrator'),
    ('user', 'Regular user'),
    ('guest', 'Guest user')
ON CONFLICT (role_name) DO NOTHING;

COMMIT;

Query Optimization¶

Use Indexes Effectively¶

-- Bad - Full table scan
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- Good - Index-friendly query
SELECT * FROM users WHERE email = 'user@example.com';

-- Create functional index if needed
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

Avoid SELECT *¶

-- Bad - Retrieves unnecessary data
SELECT * FROM users WHERE user_id = 1;

-- Good - Specify only needed columns
SELECT user_id, email, first_name, last_name
FROM users
WHERE user_id = 1;

Use EXPLAIN¶

-- Analyze query performance
EXPLAIN ANALYZE
SELECT u.email, COUNT(o.order_id) AS order_count
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.email
HAVING COUNT(o.order_id) > 5;

Testing¶

SQL Linting¶

Use sqlfluff to lint SQL files:

## Install sqlfluff
pip install sqlfluff

## Lint SQL files
sqlfluff lint queries/*.sql

## Auto-fix issues
sqlfluff fix queries/*.sql

## Lint with specific dialect
sqlfluff lint --dialect postgres queries/*.sql

Unit Testing with pgTAP¶

Test PostgreSQL schemas and functions:

## tests/schema_test.sql
BEGIN;

SELECT plan(5);

-- Test table exists
SELECT has_table('users', 'users table should exist');

-- Test columns
SELECT has_column('users', 'id', 'users should have id column');
SELECT has_column('users', 'email', 'users should have email column');

-- Test constraints
SELECT has_pk('users', 'users should have primary key');

-- Test index
SELECT has_index('users', 'idx_users_email', 'email index should exist');

SELECT * FROM finish();
ROLLBACK;

Run with:

pg_prove -d testdb tests/*.sql

Testing with SQLite¶

Simple SQL tests:

## tests/test_queries.sh
#!/bin/bash

## Create test database
sqlite3 test.db < schema.sql

## Test query results
result=$(sqlite3 test.db "SELECT COUNT(*) FROM users;")
if [ "$result" != "0" ]; then
  echo "FAIL: Expected 0 users"
  exit 1
fi

## Insert test data
sqlite3 test.db "INSERT INTO users (name, email) VALUES ('Test', 'test@example.com');"

## Verify insertion
result=$(sqlite3 test.db "SELECT COUNT(*) FROM users WHERE email='test@example.com';")
if [ "$result" != "1" ]; then
  echo "FAIL: User not inserted correctly"
  exit 1
fi

echo "All SQL tests passed"
rm test.db

Integration Testing¶

Test SQL in application context:

## tests/test_database.py
import pytest
import psycopg2

@pytest.fixture
def db_connection():
    conn = psycopg2.connect(
        host='localhost',
        database='test_db',
        user='test_user',
        password='test_pass'
    )
    yield conn
    conn.close()

def test_user_creation(db_connection):
    cursor = db_connection.cursor()

    # Execute SQL
    cursor.execute("""
        INSERT INTO users (name, email)
        VALUES ('Test User', 'test@example.com')
        RETURNING id;
    """)

    user_id = cursor.fetchone()[0]
    assert user_id is not None

    # Verify
    cursor.execute("SELECT email FROM users WHERE id = %s", (user_id,))
    email = cursor.fetchone()[0]
    assert email == 'test@example.com'

    db_connection.rollback()

def test_query_performance(db_connection):
    import time

    cursor = db_connection.cursor()

    start = time.time()
    cursor.execute("SELECT * FROM large_table WHERE indexed_column = 'value'")
    duration = time.time() - start

    assert duration < 1.0, f"Query too slow: {duration}s"

Testing Migrations¶

Test database migrations:

## tests/test_migrations.sh
#!/bin/bash
set -e

## Apply migrations
psql -d test_db -f migrations/001_create_users.sql
psql -d test_db -f migrations/002_add_users_email_index.sql

## Verify schema
result=$(psql -d test_db -t -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_name='users';")
if [ "$result" != "1" ]; then
  echo "FAIL: users table not created"
  exit 1
fi

## Verify index
result=$(psql -d test_db -t -c "SELECT COUNT(*) FROM pg_indexes WHERE indexname='idx_users_email';")
if [ "$result" != "1" ]; then
  echo "FAIL: email index not created"
  exit 1
fi

echo "Migration tests passed"

Testing with Docker¶

Test SQL in isolated environment:

## docker-compose.test.yml
version: '3.8'

services:
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: test_db
      POSTGRES_USER: test_user
      POSTGRES_PASSWORD: test_pass
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U test_user"]
      interval: 5s
      timeout: 3s
      retries: 5

  test:
    image: postgres:15-alpine
    depends_on:
      postgres:
        condition: service_healthy
    volumes:
      - ./tests:/tests
      - ./sql:/sql
    environment:
      PGHOST: postgres
      PGDATABASE: test_db
      PGUSER: test_user
      PGPASSWORD: test_pass
    command: >
      sh -c "
        psql -f /sql/schema.sql &&
        psql -f /sql/seed.sql &&
        pg_prove /tests/*.sql
      "

Run tests:

docker-compose -f docker-compose.test.yml up --abort-on-container-exit

Query Plan Testing¶

Test query performance:

-- Explain query plan
EXPLAIN ANALYZE
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01';

-- Test index usage
EXPLAIN (FORMAT JSON)
SELECT * FROM users WHERE email = 'test@example.com';

CI/CD Integration¶

## .github/workflows/sql-test.yml
name: SQL Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    services:
      postgres:
        image: postgres:15
        env:
          POSTGRES_DB: test_db
          POSTGRES_USER: test_user
          POSTGRES_PASSWORD: test_pass
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5

    steps:
      - uses: actions/checkout@v4

      - name: Install sqlfluff
        run: pip install sqlfluff

      - name: Lint SQL
        run: sqlfluff lint --dialect postgres sql/*.sql

      - name: Run migrations
        env:
          PGHOST: localhost
          PGDATABASE: test_db
          PGUSER: test_user
          PGPASSWORD: test_pass
        run: |
          for file in migrations/*.sql; do
            psql -f "$file"
          done

      - name: Run tests
        env:
          PGHOST: localhost
          PGDATABASE: test_db
          PGUSER: test_user
          PGPASSWORD: test_pass
        run: |
          psql -c "SELECT version();"
          psql -f tests/test_schema.sql

Coverage Testing¶

Test query coverage:

-- Record queries executed
CREATE TABLE IF NOT EXISTS query_log (
    id SERIAL PRIMARY KEY,
    query_text TEXT,
    executed_at TIMESTAMP DEFAULT NOW()
);

-- Enable query logging (PostgreSQL)
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();

Security Best Practices¶

SQL Injection Prevention¶

Always use parameterized queries; never concatenate user input into SQL.

-- NEVER DO THIS - Vulnerable to SQL injection
-- Python example showing the vulnerability
query = f"SELECT * FROM users WHERE email = '{user_email}'"  -- DANGEROUS!
-- Attacker input: "' OR '1'='1" exposes all data

-- ALWAYS USE - Parameterized queries (Python example)
query = "SELECT * FROM users WHERE email = %s"
cursor.execute(query, (user_email,))  -- Safe - parameters are escaped

-- ALWAYS USE - Prepared statements (Node.js example)
const query = 'SELECT * FROM users WHERE email = $1';
await client.query(query, [userEmail]);  -- Safe

Access Control and Least Privilege¶

Grant minimum necessary permissions to database users.

-- Bad - Granting excessive permissions
GRANT ALL PRIVILEGES ON DATABASE myapp TO app_user;  -- Too broad!
GRANT SUPER ON *.* TO app_user@'%';  -- NEVER grant SUPER!

-- Good - Minimal permissions for application user
CREATE USER 'app_user'@'localhost' IDENTIFIED BY 'SecurePassword123!';

GRANT SELECT, INSERT, UPDATE ON myapp.users TO 'app_user'@'localhost';
GRANT SELECT, INSERT, UPDATE ON myapp.orders TO 'app_user'@'localhost';
GRANT EXECUTE ON PROCEDURE myapp.process_order TO 'app_user'@'localhost';

-- Good - Read-only user for reporting
CREATE USER 'report_user'@'localhost' IDENTIFIED BY 'SecurePassword456!';
GRANT SELECT ON myapp.* TO 'report_user'@'localhost';

-- Good - Revoke dangerous permissions
REVOKE FILE, SUPER, PROCESS ON *.* FROM 'app_user'@'localhost';

Data Encryption¶

Encrypt sensitive data at rest and in transit.

-- Bad - Storing passwords in plain text
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email VARCHAR(255),
    password VARCHAR(255)  -- NEVER store passwords in plain text!
);

INSERT INTO users (user_id, email, password)
VALUES (1, 'user@example.com', 'Password123');  -- Exposed!

-- Good - Use application-level hashing (bcrypt, argon2)
-- Store only hashed passwords
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,  -- Hashed password
    password_salt VARCHAR(255) NOT NULL,  -- Unique salt
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Good - Encrypt sensitive columns (PostgreSQL example)
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE sensitive_data (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    ssn_encrypted BYTEA,  -- Encrypted column
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Insert encrypted data
INSERT INTO sensitive_data (user_id, ssn_encrypted)
VALUES (1, pgp_sym_encrypt('123-45-6789', 'encryption_key'));

-- Query encrypted data
SELECT user_id, pgp_sym_decrypt(ssn_encrypted, 'encryption_key') AS ssn
FROM sensitive_data
WHERE user_id = 1;

-- Good - Enable SSL/TLS for connections
-- In postgresql.conf:
-- ssl = on
-- ssl_cert_file = 'server.crt'
-- ssl_key_file = 'server.key'

Sensitive Data Handling¶

Protect PII and implement data masking.

-- Good - Data masking for non-production environments
CREATE VIEW users_masked AS
SELECT
    user_id,
    CONCAT(LEFT(email, 3), '***@***.com') AS email_masked,
    CONCAT(LEFT(phone, 3), '-***-****') AS phone_masked,
    first_name,
    'REDACTED' AS last_name_masked
FROM users;

-- Good - Row-level security (PostgreSQL)
ALTER TABLE sensitive_documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_documents_policy ON sensitive_documents
    FOR SELECT
    USING (owner_id = current_user_id());

-- Good - Column-level permissions
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    first_name VARCHAR(100),
    last_name VARCHAR(100),
    salary DECIMAL(10,2),  -- Sensitive
    ssn VARCHAR(11)        -- Highly sensitive
);

-- Grant access but hide sensitive columns
GRANT SELECT (employee_id, first_name, last_name) ON employees TO hr_viewer;

-- Only specific roles can see salary
GRANT SELECT (employee_id, first_name, last_name, salary) ON employees TO hr_manager;

Audit Logging¶

Enable comprehensive audit trails for security monitoring.

-- Good - Create audit log table
CREATE TABLE audit_log (
    audit_id BIGSERIAL PRIMARY KEY,
    table_name VARCHAR(100) NOT NULL,
    operation VARCHAR(10) NOT NULL,  -- INSERT, UPDATE, DELETE
    user_name VARCHAR(100) NOT NULL,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    old_values JSONB,
    new_values JSONB,
    ip_address INET
);

-- Good - Audit trigger for sensitive tables
CREATE OR REPLACE FUNCTION audit_trigger_func()
RETURNS TRIGGER AS $$
BEGIN
    IF (TG_OP = 'DELETE') THEN
        INSERT INTO audit_log (table_name, operation, user_name, old_values, ip_address)
        VALUES (TG_TABLE_NAME, TG_OP, current_user, row_to_json(OLD), inet_client_addr());
        RETURN OLD;
    ELSIF (TG_OP = 'UPDATE') THEN
        INSERT INTO audit_log (table_name, operation, user_name, old_values, new_values, ip_address)
        VALUES (TG_TABLE_NAME, TG_OP, current_user, row_to_json(OLD), row_to_json(NEW), inet_client_addr());
        RETURN NEW;
    ELSIF (TG_OP = 'INSERT') THEN
        INSERT INTO audit_log (table_name, operation, user_name, new_values, ip_address)
        VALUES (TG_TABLE_NAME, TG_OP, current_user, row_to_json(NEW), inet_client_addr());
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Apply audit trigger to sensitive tables
CREATE TRIGGER users_audit_trigger
    AFTER INSERT OR UPDATE OR DELETE ON users
    FOR EACH ROW EXECUTE FUNCTION audit_trigger_func();

CREATE TRIGGER financial_transactions_audit
    AFTER INSERT OR UPDATE OR DELETE ON financial_transactions
    FOR EACH ROW EXECUTE FUNCTION audit_trigger_func();

Backup Security¶

Protect database backups with encryption.

## Bad - Unencrypted backup
pg_dump myapp > backup.sql  # Plain text backup!
mysqldump -u root -p myapp > backup.sql  # No encryption!

## Good - Encrypted backup (PostgreSQL)
pg_dump myapp | gpg --encrypt --recipient admin@example.com > backup.sql.gpg

## Good - Encrypted backup with compression
pg_dump myapp | gzip | gpg --encrypt --recipient admin@example.com > backup.sql.gz.gpg

## Good - Secure backup permissions
chmod 600 backup.sql.gpg  # Only owner can read/write

## Good - Store backups securely
aws s3 cp backup.sql.gpg s3://secure-backups/ --sse aws:kms --sse-kms-key-id alias/backup-key

Connection Security¶

Enforce secure database connections.

-- Good - Require SSL for specific users
ALTER USER app_user REQUIRE SSL;

-- Good - Restrict connections by IP (MySQL)
CREATE USER 'app_user'@'10.0.1.%' IDENTIFIED BY 'SecurePassword123!';  -- Specific subnet only

-- Good - Disable remote root access
DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1');
FLUSH PRIVILEGES;

-- Good - Connection limits
ALTER USER app_user WITH CONNECTION LIMIT 50;  -- Prevent connection exhaustion

Secure Stored Procedures¶

Validate inputs and use security definer carefully.

-- Bad - Stored procedure vulnerable to injection
CREATE PROCEDURE get_user_by_email(IN email_input VARCHAR(255))
BEGIN
    SET @query = CONCAT('SELECT * FROM users WHERE email = "', email_input, '"');  -- VULNERABLE!
    PREPARE stmt FROM @query;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
END;

-- Good - Use parameterized queries in procedures
CREATE PROCEDURE get_user_by_email(IN email_input VARCHAR(255))
BEGIN
    SELECT user_id, email, first_name, last_name
    FROM users
    WHERE email = email_input;  -- Safe - parameterized
END;

-- Good - Input validation in stored procedures
CREATE PROCEDURE create_user(
    IN email_input VARCHAR(255),
    IN first_name_input VARCHAR(100)
)
BEGIN
    -- Validate email format
    IF email_input NOT REGEXP '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}$' THEN
        SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'Invalid email format';
    END IF;

    -- Validate name length
    IF LENGTH(first_name_input) < 2 OR LENGTH(first_name_input) > 100 THEN
        SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'Invalid name length';
    END IF;

    INSERT INTO users (email, first_name) VALUES (email_input, first_name_input);
END;

Prevent Information Disclosure¶

Avoid exposing sensitive information in error messages.

-- Bad - Exposing table structure in errors
SELECT * FROM users WHERE user_id = 'invalid';  -- Error reveals table schema!

-- Good - Handle errors gracefully (application level)
-- Python example
try:
    cursor.execute("SELECT * FROM users WHERE user_id = %s", (user_id,))
except DatabaseError as e:
    # Log detailed error server-side
    logger.error(f"Database error: {str(e)}")
    # Return generic error to client
    return {"error": "An error occurred processing your request"}

-- Good - Use views to hide sensitive columns
CREATE VIEW public_user_profile AS
SELECT user_id, username, avatar_url, created_at
FROM users;  -- Hides email, password_hash, etc.

GRANT SELECT ON public_user_profile TO app_user;
REVOKE SELECT ON users FROM app_user;  -- Deny access to full table

Common Pitfalls¶

NULL Comparison Confusion¶

Issue: Using = NULL or != NULL instead of IS NULL or IS NOT NULL returns unexpected results.

Example:

## Bad - NULL comparisons don't work with = or !=
SELECT * FROM users WHERE email = NULL;  -- ❌ Returns 0 rows (not NULL rows)
SELECT * FROM users WHERE email != NULL;  -- ❌ Also returns 0 rows!

UPDATE users SET status = 'inactive' WHERE last_login = NULL;  -- ❌ Updates 0 rows

Solution: Use IS NULL and IS NOT NULL operators.

## Good - Correct NULL handling
SELECT * FROM users WHERE email IS NULL;  -- ✅ Finds rows where email is NULL

SELECT * FROM users WHERE email IS NOT NULL;  -- ✅ Finds rows with non-NULL email

UPDATE users
SET status = 'inactive'
WHERE last_login IS NULL;  -- ✅ Updates rows with NULL last_login

Key Points:

NULL is not equal to anything, including NULL (NULL = NULL is false)
Always use IS NULL and IS NOT NULL for NULL checks
COALESCE(column, 'default') provides default values for NULLs
NULLIF(value1, value2) returns NULL if values are equal

Implicit Type Conversion Performance Issues¶

Issue: Comparing different data types forces type conversion, preventing index usage and slowing queries.

Example:

## Bad - String comparison on integer column
SELECT * FROM orders WHERE order_id = '12345';  -- ❌ Forces type conversion, no index

## Bad - Integer comparison on string column
SELECT * FROM users WHERE user_code = 123;  -- ❌ Table scan, not index seek

Solution: Match data types in comparisons.

## Good - Correct data types
SELECT * FROM orders WHERE order_id = 12345;  -- ✅ Integer comparison, uses index

SELECT * FROM users WHERE user_code = '123';  -- ✅ String comparison, uses index

## Good - Explicit casting when needed
SELECT *
FROM orders o
JOIN order_items oi ON o.order_id = CAST(oi.order_id_string AS INTEGER);

Key Points:

Match column data types in WHERE clauses and JOINs
Implicit conversion prevents index usage
Check execution plans for type conversion warnings
Use explicit CAST() or CONVERT() when conversion is necessary

NOT IN with NULL Values¶

Issue: NOT IN with a subquery containing NULL values returns no rows unexpectedly.

Example:

## Bad - NOT IN with possible NULLs
SELECT * FROM products
WHERE product_id NOT IN (
    SELECT product_id FROM discontinued_products  -- ❌ If any NULL, returns 0 rows!
);

## This happens because:
## product_id NOT IN (1, 2, NULL)
## is equivalent to:
## product_id != 1 AND product_id != 2 AND product_id != NULL
## The last comparison is always UNKNOWN, so entire condition fails

Solution: Use NOT EXISTS or filter out NULLs.

## Good - Use NOT EXISTS
SELECT * FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM discontinued_products dp
    WHERE dp.product_id = p.product_id  -- ✅ Handles NULLs correctly
);

## Good - Filter NULLs in subquery
SELECT * FROM products
WHERE product_id NOT IN (
    SELECT product_id
    FROM discontinued_products
    WHERE product_id IS NOT NULL  -- ✅ Exclude NULLs
);

Key Points:

NOT IN fails with NULL values in subquery
Prefer NOT EXISTS over NOT IN for subqueries
IN works fine with NULLs, NOT IN does not
Always check for NULL handling in subqueries

DISTINCT Hiding Performance Issues¶

Issue: Using DISTINCT to fix duplicate rows masks underlying join or query logic problems.

Example:

## Bad - DISTINCT hiding incorrect join
SELECT DISTINCT
    u.username,
    u.email,
    o.order_date  -- ❌ Why duplicates? Probably wrong join!
FROM users u
JOIN orders o ON u.user_id = o.user_id
JOIN order_items oi ON o.order_id = oi.order_id;  -- Cartesian product hidden by DISTINCT

Solution: Fix the join logic or use appropriate aggregation.

## Good - Correct join or aggregation
SELECT
    u.username,
    u.email,
    COUNT(DISTINCT o.order_id) AS order_count,
    MAX(o.order_date) AS latest_order
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.username, u.email;  -- ✅ Proper aggregation

## Or if you really need one row per user with latest order
SELECT
    u.username,
    u.email,
    o.order_date
FROM users u
JOIN LATERAL (
    SELECT order_date
    FROM orders
    WHERE user_id = u.user_id
    ORDER BY order_date DESC
    LIMIT 1
) o ON true;  -- ✅ Explicitly get one order per user

Key Points:

DISTINCT is expensive (sorting or hashing)
DISTINCT often indicates incorrect joins
Fix the root cause instead of masking with DISTINCT
Use GROUP BY with aggregation for proper deduplication

Transaction Isolation Level Misunderstanding¶

Issue: Wrong isolation level causes phantom reads, dirty reads, or unnecessary blocking.

Example:

## Bad - Default isolation may allow dirty reads
BEGIN TRANSACTION;  -- ❌ Default isolation (often READ COMMITTED)

SELECT SUM(balance) FROM accounts WHERE user_id = 123;
-- Another transaction updates balance here
UPDATE accounts SET balance = balance - 100 WHERE user_id = 123;

COMMIT;  -- ❌ Sum may be inconsistent due to concurrent updates

## Bad - SERIALIZABLE causing deadlocks
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

SELECT * FROM inventory WHERE product_id = 1;
-- Locks entire result set, causes deadlocks with concurrent transactions
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 1;

COMMIT;

Solution: Choose appropriate isolation level for use case.

## Good - REPEATABLE READ for consistent reads
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SELECT SUM(balance) FROM accounts WHERE user_id = 123;
-- Other transactions can't modify these rows until commit
UPDATE accounts SET balance = balance - 100 WHERE user_id = 123;

COMMIT;

## Good - READ COMMITTED with explicit locking when needed
BEGIN TRANSACTION;

SELECT * FROM inventory
WHERE product_id = 1
FOR UPDATE;  -- ✅ Explicit row lock

UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 1;

COMMIT;

## Good - READ UNCOMMITTED for reports (accept dirty reads)
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

SELECT COUNT(*) FROM large_table;  -- ✅ Fast, no locking, dirty reads OK for reports

Key Points:

READ UNCOMMITTED: Fastest, allows dirty reads (use for reports)
READ COMMITTED: Default, prevents dirty reads
REPEATABLE READ: Prevents non-repeatable reads, may have phantom reads
SERIALIZABLE: Strictest, prevents phantom reads, highest locking

Anti-Patterns¶

❌ Avoid: SELECT * in Production¶

-- Bad - Over-fetching data
SELECT * FROM users;

-- Good - Explicit columns
SELECT user_id, email, first_name, last_name FROM users;

❌ Avoid: N+1 Queries¶

-- Bad - N+1 query problem (fetching orders for each user in application loop)
-- Application code loop:
-- for each user:
--     SELECT * FROM orders WHERE user_id = ?

-- Good - Single query with JOIN
SELECT
    u.user_id,
    u.email,
    o.order_id,
    o.total
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;

❌ Avoid: Unparameterized Queries¶

-- Bad - SQL injection risk
-- query = "SELECT * FROM users WHERE email = '" + user_input + "'"

-- Good - Parameterized query
-- query = "SELECT * FROM users WHERE email = $1"
-- execute(query, [user_input])

❌ Avoid: Missing WHERE in UPDATE/DELETE¶

-- Bad - Updates all rows!
UPDATE users SET status = 'inactive';

-- Good - Specific WHERE clause
UPDATE users
SET status = 'inactive'
WHERE last_login < CURRENT_DATE - INTERVAL '90 days';

❌ Avoid: Using DISTINCT to Fix Duplicates¶

-- Bad - DISTINCT hides the real problem
SELECT DISTINCT
    u.user_id,
    u.email,
    o.order_id
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id;  -- ❌ Multiple orders create duplicates

-- Good - Fix the JOIN logic
SELECT
    u.user_id,
    u.email,
    ARRAY_AGG(o.order_id) AS order_ids
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.user_id, u.email;

❌ Avoid: Not Using Indexes¶

-- Bad - Querying without indexes
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email VARCHAR(255),
    status VARCHAR(50)
);
-- Queries on email and status will be slow!

-- Good - Add appropriate indexes
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    status VARCHAR(50) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_users_status ON users(status);
CREATE INDEX idx_users_created_at ON users(created_at);

❌ Avoid: Large IN Clauses¶

-- Bad - Large IN clause (thousands of IDs)
SELECT * FROM orders
WHERE user_id IN (1, 2, 3, ..., 10000);  -- ❌ Performance issues!

-- Good - Use temporary table or JOIN
CREATE TEMP TABLE temp_user_ids (user_id INT);
INSERT INTO temp_user_ids VALUES (1), (2), (3), ..., (10000);

SELECT o.*
FROM orders o
INNER JOIN temp_user_ids t ON o.user_id = t.user_id;

Best Practices¶

Index Strategically¶

Create indexes on frequently queried columns:

-- Index foreign keys
CREATE INDEX idx_orders_user_id ON orders(user_id);

-- Composite index for common query patterns
CREATE INDEX idx_orders_status_created ON orders(status, created_at);

-- Partial index for specific conditions
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';

Use Parameterized Queries¶

Prevent SQL injection with parameterized queries:

-- Good - Parameterized (Python example)
cursor.execute(
    "SELECT * FROM users WHERE email = %s",
    (user_email,)
)

-- Bad - String interpolation (SQL injection risk)
-- cursor.execute(f"SELECT * FROM users WHERE email = '{user_email}'")

Optimize JOIN Performance¶

Choose the right JOIN type and order:

-- Good - Filter before joining
SELECT u.name, o.total
FROM (
    SELECT user_id, name
    FROM users
    WHERE status = 'active'
) u
INNER JOIN orders o ON u.user_id = o.user_id;

-- Use appropriate JOIN hints when needed
SELECT /*+ ORDERED */ u.name, o.total
FROM users u
INNER JOIN orders o ON u.user_id = o.user_id;

Limit Result Sets¶

Always use LIMIT/TOP for potentially large result sets:

-- Pagination with LIMIT/OFFSET
SELECT user_id, email
FROM users
ORDER BY created_at DESC
LIMIT 100 OFFSET 0;

-- Modern pagination with keyset
SELECT user_id, email, created_at
FROM users
WHERE created_at < '2024-01-01'
ORDER BY created_at DESC
LIMIT 100;

Use Transactions Appropriately¶

Wrap related operations in transactions:

BEGIN TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

INSERT INTO transaction_log (from_account, to_account, amount)
VALUES (1, 2, 100);

COMMIT;

Explicitly List Columns (Avoid SELECT *)¶

Explicitly list columns you need:

-- Good - Specific columns
SELECT user_id, email, created_at
FROM users
WHERE status = 'active';

-- Bad - SELECT * wastes bandwidth
-- SELECT * FROM users WHERE status = 'active';

Use CTEs for Readability¶

Common Table Expressions improve query readability:

WITH active_users AS (
    SELECT user_id, email
    FROM users
    WHERE status = 'active'
),
recent_orders AS (
    SELECT user_id, COUNT(*) AS order_count
    FROM orders
    WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT
    au.email,
    COALESCE(ro.order_count, 0) AS orders_last_30_days
FROM active_users au
LEFT JOIN recent_orders ro ON au.user_id = ro.user_id;

Analyze Query Performance¶

Use EXPLAIN to understand query execution:

-- PostgreSQL
EXPLAIN ANALYZE
SELECT u.email, COUNT(o.order_id)
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.email;

-- MySQL
EXPLAIN FORMAT=JSON
SELECT u.email, COUNT(o.order_id)
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
GROUP BY u.email;

Handle NULLs Explicitly¶

Be explicit about NULL handling:

-- Good - Explicit NULL handling
SELECT
    user_id,
    COALESCE(phone, 'Not provided') AS phone,
    NULLIF(email, '') AS email  -- Convert empty strings to NULL
FROM users;

-- Check for NULL explicitly
WHERE email IS NOT NULL
  AND status IS NOT NULL;

Use Database Constraints¶

Enforce data integrity at the database level:

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL UNIQUE,
    age INTEGER CHECK (age >= 18),
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status VARCHAR(20) DEFAULT 'active' CHECK (status IN ('active', 'inactive', 'suspended'))
);

-- Foreign key constraints
ALTER TABLE orders
    ADD CONSTRAINT fk_orders_users
    FOREIGN KEY (user_id)
    REFERENCES users(user_id)
    ON DELETE CASCADE;

Comments¶

-- Single-line comment for simple explanations
SELECT user_id, email FROM users; -- Active users only

/*
 * Multi-line comment for complex logic
 * This query calculates user lifetime value based on:
 * - Total completed orders
 * - Average order value
 * - Customer tenure
 */
SELECT
    u.user_id,
    u.email,
    COUNT(o.order_id) AS total_orders,
    AVG(o.total) AS avg_order_value,
    EXTRACT(YEAR FROM AGE(CURRENT_DATE, u.created_at)) AS years_active
FROM users u
LEFT JOIN orders o ON u.user_id = o.user_id
WHERE o.status = 'completed'
GROUP BY u.user_id, u.email, u.created_at;

References¶

SQL Standards¶

SQL-92 Standard
Modern SQL - SQL features across databases

Database-Specific Documentation¶

Tools¶

sqlfluff - SQL linter
pgFormatter - PostgreSQL formatter
DBeaver - Universal database tool

Status: Active