Normalize your database schema to at least 3NF to eliminate data duplication. Denormalized data means the same fact is stored in multiple places — when you update one copy, the others become stale, creating data inconsistencies that are nearly impossible to track down.
Why This Matters
Denormalized data stores the same fact in multiple rows or tables. When you update a customer's email in one place but not another, the database contains contradictory information. These inconsistencies are silent — no error is raised — and they propagate through reports, emails, and displays. Finding and fixing them after the fact requires manual data audits. Normalization to 3NF prevents this by storing each fact exactly once.
Database normalization is the process of organizing data to eliminate redundancy. When the same fact is stored in multiple places, you create update anomalies:
Update anomaly: You change a customer's address in the orders table but not the customers table. Now the database contradicts itself.
Insert anomaly: You can't add a new category without also adding a product.
Delete anomaly: Deleting the last product in a category also deletes the category information.
These anomalies are silent. The database doesn't raise errors — it just stores contradictory data. You discover the problem weeks later when a report shows the wrong address or an email goes to the wrong inbox.
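An update anomaly is easy to demonstrate and to hunt for. A minimal sketch, assuming a denormalized orders table that carries its own copy of customer_email (the table and values here are hypothetical):

```sql
-- The customer changes their email; only the customers table is updated.
UPDATE customers
SET email = 'ada@new-domain.example'
WHERE id = 42;

-- The copies in orders are now stale. No error was raised, but this
-- query surfaces every row that contradicts the customers table:
SELECT o.id, o.customer_email AS stale_copy, c.email AS current_value
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.customer_email <> c.email;
```

Queries like the second one are exactly the "manual data audit" a normalized schema makes unnecessary.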
Normalization to Third Normal Form (3NF) eliminates these anomalies by ensuring each fact is stored exactly once.
The rule
Normalize your schema to at least Third Normal Form (3NF):
1NF: Every column contains atomic values (no arrays or nested objects in columns)
2NF: Every non-key column depends on the entire primary key (no partial dependencies)
3NF: Every non-key column depends only on the primary key (no transitive dependencies)
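To make the 2NF and 3NF rules concrete, here are sketches of each violation (table and column names are illustrative, not from the examples below):

```sql
-- 2NF violation: product_name is determined by product_id alone,
-- which is only part of the composite (order_id, product_id) key.
CREATE TABLE order_items (
    order_id     uuid,
    product_id   uuid,
    product_name text NOT NULL,  -- partial dependency on product_id
    quantity     integer NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
-- Fix: move product_name into a products table keyed by product_id.

-- 3NF violation: department_name depends on department_id, which
-- depends on the key -- a transitive dependency.
CREATE TABLE employees (
    id              uuid PRIMARY KEY,
    department_id   uuid NOT NULL,
    department_name text NOT NULL  -- transitive: id -> department_id -> name
);
-- Fix: move department_name into a departments table keyed by department_id.
```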
Denormalize intentionally and selectively for read performance, with clear documentation of which tables are denormalized and how consistency is maintained.
Bad example
-- BAD: customer name and email duplicated in every order
CREATE TABLE orders (
    id             uuid PRIMARY KEY,
    customer_id    uuid NOT NULL,
    customer_name  text NOT NULL,    -- duplicated from customers table
    customer_email text NOT NULL,    -- duplicated from customers table
    product_name   text NOT NULL,    -- duplicated from products table
    product_price  numeric NOT NULL, -- duplicated from products table
    quantity       integer NOT NULL,
    total          numeric NOT NULL
);
-- When a customer changes their email, you must update every order row
Good example
-- GOOD: each fact stored exactly once
CREATE TABLE customers (
    id    uuid PRIMARY KEY,
    name  text NOT NULL,
    email text NOT NULL
);

CREATE TABLE products (
    id    uuid PRIMARY KEY,
    name  text NOT NULL,
    price numeric NOT NULL
);

CREATE TABLE orders (
    id          uuid PRIMARY KEY,
    customer_id uuid NOT NULL REFERENCES customers(id),
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE order_items (
    id         uuid PRIMARY KEY,
    order_id   uuid NOT NULL REFERENCES orders(id),
    product_id uuid NOT NULL REFERENCES products(id),
    quantity   integer NOT NULL,
    unit_price numeric NOT NULL -- price at time of purchase (intentional snapshot)
);
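Reassembling an order for display is a straightforward join over the normalized tables. A sketch against the schema above (the column aliases are illustrative):

```sql
SELECT o.id                        AS order_id,
       c.name                      AS customer_name,
       c.email                     AS customer_email,
       p.name                      AS product_name,
       oi.quantity,
       oi.unit_price,
       oi.quantity * oi.unit_price AS line_total
FROM orders o
JOIN customers   c  ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products    p  ON p.id = oi.product_id;
```

Because customer_email lives only in customers, every row this query returns reflects the current value with no chance of a stale copy.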
How to detect
Look for columns that store data belonging to another entity:
Column names like customer_name in an orders table
The same column appearing in multiple tables
Columns that need to be updated in multiple places when the source value changes
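The second symptom can be checked mechanically. A sketch for PostgreSQL using the standard information_schema catalog (adjust the schema name to your own; common generic names like id will need to be filtered out by hand):

```sql
-- Column names that appear in more than one table are candidates
-- for duplicated facts and are worth reviewing.
SELECT column_name,
       array_agg(table_name ORDER BY table_name) AS tables
FROM information_schema.columns
WHERE table_schema = 'public'
GROUP BY column_name
HAVING count(*) > 1
ORDER BY count(*) DESC;
```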
Remediation
Identify duplicated data by listing every column and asking: "Is this fact owned by this table, or does it belong to a related table?"
Extract duplicated data into its own table with a foreign key reference
Use JOINs to reassemble the data at query time
For read-heavy paths where JOINs are too slow, use materialized views or intentional denormalization with triggers to maintain consistency
Document any intentional denormalization with a comment explaining why and how consistency is maintained
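One way to keep an intentional denormalized copy consistent, sketched for PostgreSQL with plpgsql (the cached customer_email column and the function name are hypothetical):

```sql
-- Intentional denormalization: orders caches customer_email so hot
-- read paths avoid a join. This trigger keeps the copies in sync
-- whenever the source value changes.
CREATE FUNCTION sync_customer_email() RETURNS trigger AS $$
BEGIN
    UPDATE orders
    SET customer_email = NEW.email
    WHERE customer_id = NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_email_sync
AFTER UPDATE OF email ON customers
FOR EACH ROW
EXECUTE FUNCTION sync_customer_email();
```

The trigger is the documentation's "how consistency is maintained"; without it (or an equivalent materialized-view refresh), the cached column is just an update anomaly waiting to happen.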