Normalize your database schema to at least 3NF to eliminate data duplication. Denormalized data means the same fact is stored in multiple places — when you update one copy, the others become stale, creating data inconsistencies that are nearly impossible to track down.
Why This Matters
Denormalized data stores the same fact in multiple rows or tables. When you update a customer's email in one place but not another, the database contains contradictory information. These inconsistencies are silent — no error is raised — and they propagate through reports, emails, and displays. Finding and fixing them after the fact requires manual data audits. Normalization to 3NF prevents this by storing each fact exactly once.
Database normalization is the process of organizing data to eliminate redundancy. When the same fact is stored in multiple places, you create update anomalies:
Update anomaly: You change a customer's address in the orders table but not the customers table. Now the database contradicts itself.
Insert anomaly: You can't add a new category without also adding a product.
Delete anomaly: Deleting the last product in a category also deletes the category information.
These anomalies are silent. The database doesn't raise errors — it just stores contradictory data. You discover the problem weeks later when a report shows the wrong address or an email goes to the wrong inbox.
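An update anomaly is easy to demonstrate and to hunt for. A minimal sketch, assuming a denormalized orders table that carries its own copy of customer_email (the table and values here are hypothetical):

```sql
-- The customer changes their email; only the customers table is updated.
UPDATE customers
SET email = 'ada@new-domain.example'
WHERE id = 42;

-- The copies in orders are now stale. No error was raised, but this
-- query surfaces every row that contradicts the customers table:
SELECT o.id, o.customer_email AS stale_copy, c.email AS current_value
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.customer_email <> c.email;
```

Queries like the second one are exactly the "manual data audit" a normalized schema makes unnecessary.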
Normalization to Third Normal Form (3NF) eliminates these anomalies by ensuring each fact is stored exactly once.
The rule
Normalize your schema to at least Third Normal Form (3NF):
1NF: Every column contains atomic values (no arrays or nested objects in columns)
2NF: Every non-key column depends on the entire primary key (no partial dependencies)
3NF: Every non-key column depends only on the primary key (no transitive dependencies)
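To make the 2NF and 3NF rules concrete, here are sketches of each violation (table and column names are illustrative, not from the examples below):

```sql
-- 2NF violation: product_name is determined by product_id alone,
-- which is only part of the composite (order_id, product_id) key.
CREATE TABLE order_items (
    order_id     uuid,
    product_id   uuid,
    product_name text NOT NULL,  -- partial dependency on product_id
    quantity     integer NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
-- Fix: move product_name into a products table keyed by product_id.

-- 3NF violation: department_name depends on department_id, which
-- depends on the key -- a transitive dependency.
CREATE TABLE employees (
    id              uuid PRIMARY KEY,
    department_id   uuid NOT NULL,
    department_name text NOT NULL  -- transitive: id -> department_id -> name
);
-- Fix: move department_name into a departments table keyed by department_id.
```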
Denormalize intentionally and selectively for read performance, with clear documentation of which tables are denormalized and how consistency is maintained.
Bad example
-- BAD: customer name and email duplicated in every order
CREATE TABLE orders (
    id             uuid PRIMARY KEY,
    customer_id    uuid NOT NULL,
    customer_name  text NOT NULL,    -- duplicated from customers table
    customer_email text NOT NULL,    -- duplicated from customers table
    product_name   text NOT NULL,    -- duplicated from products table
    product_price  numeric NOT NULL, -- duplicated from products table
    quantity       integer NOT NULL,
    total          numeric NOT NULL
);
-- When a customer changes their email, you must update every order row
Good example
-- GOOD: each fact stored exactly once
CREATE TABLE customers (
    id    uuid PRIMARY KEY,
    name  text NOT NULL,
    email text NOT NULL
);

CREATE TABLE products (
    id    uuid PRIMARY KEY,
    name  text NOT NULL,
    price numeric NOT NULL
);

CREATE TABLE orders (
    id          uuid PRIMARY KEY,
    customer_id uuid NOT NULL REFERENCES customers(id),
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE order_items (
    id         uuid PRIMARY KEY,
    order_id   uuid NOT NULL REFERENCES orders(id),
    product_id uuid NOT NULL REFERENCES products(id),
    quantity   integer NOT NULL,
    unit_price numeric NOT NULL -- price at time of purchase (intentional snapshot)
);
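Reassembling an order for display is a straightforward join over the normalized tables. A sketch against the schema above (the column aliases are illustrative):

```sql
SELECT o.id                        AS order_id,
       c.name                      AS customer_name,
       c.email                     AS customer_email,
       p.name                      AS product_name,
       oi.quantity,
       oi.unit_price,
       oi.quantity * oi.unit_price AS line_total
FROM orders o
JOIN customers   c  ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
JOIN products    p  ON p.id = oi.product_id;
```

Because customer_email lives only in customers, every row this query returns reflects the current value with no chance of a stale copy.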
How to detect
Look for columns that store data belonging to another entity:
Column names like customer_name in an orders table
The same column appearing in multiple tables
Columns that need to be updated in multiple places when the source value changes
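The second symptom can be checked mechanically. A sketch for PostgreSQL using the standard information_schema catalog (adjust the schema name to your own; common generic names like id will need to be filtered out by hand):

```sql
-- Column names that appear in more than one table are candidates
-- for duplicated facts and are worth reviewing.
SELECT column_name,
       array_agg(table_name ORDER BY table_name) AS tables
FROM information_schema.columns
WHERE table_schema = 'public'
GROUP BY column_name
HAVING count(*) > 1
ORDER BY count(*) DESC;
```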
Remediation
Identify duplicated data by listing every column and asking: "Is this fact owned by this table, or does it belong to a related table?"
Extract duplicated data into its own table with a foreign key reference
Use JOINs to reassemble the data at query time
For read-heavy paths where JOINs are too slow, use materialized views or intentional denormalization with triggers to maintain consistency
Document any intentional denormalization with a comment explaining why and how consistency is maintained
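One way to keep an intentional denormalized copy consistent, sketched for PostgreSQL with plpgsql (the cached customer_email column and the function name are hypothetical):

```sql
-- Intentional denormalization: orders caches customer_email so hot
-- read paths avoid a join. This trigger keeps the copies in sync
-- whenever the source value changes.
CREATE FUNCTION sync_customer_email() RETURNS trigger AS $$
BEGIN
    UPDATE orders
    SET customer_email = NEW.email
    WHERE customer_id = NEW.id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_email_sync
AFTER UPDATE OF email ON customers
FOR EACH ROW
EXECUTE FUNCTION sync_customer_email();
```

The trigger is the documentation's "how consistency is maintained"; without it (or an equivalent materialized-view refresh), the cached column is just an update anomaly waiting to happen.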