Normalization vs. Denormalization
The trade-off between reducing data redundancy and combining data to speed up read queries
Overview
Database normalization is the practice of organizing data to reduce redundancy and improve integrity. Denormalization deliberately introduces redundancy to improve read performance.
It's a fundamental trade-off: normalized data is cleaner and easier to maintain but may require complex joins to read. Denormalized data is faster to read but harder to keep consistent.
Key Concepts
Normalization
Organizing data into separate tables to eliminate redundancy. Follows normal forms (1NF, 2NF, 3NF, BCNF).
Denormalization
Intentionally adding redundancy by storing data in multiple places to avoid expensive joins and improve query speed.
Normal Forms
1NF: Atomic values. 2NF: No partial dependencies. 3NF: No transitive dependencies. Higher forms exist but are rarely used in practice.
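To make the normal forms concrete, here is a minimal sketch in Python using sqlite3: a flat orders table that violates 1NF and 3NF, followed by its 3NF decomposition. The table and column names are illustrative, not from the original text.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Violates 1NF: 'products' holds a comma-separated list (non-atomic values).
# Violates 3NF: 'customer_city' depends on 'customer_name', a non-key column.
con.execute("""CREATE TABLE orders_flat (
    order_id INTEGER PRIMARY KEY,
    customer_name TEXT,
    customer_city TEXT,
    products TEXT)""")

# 3NF decomposition: each fact is stored exactly once.
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id));
CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(id),
    product_id INTEGER,
    quantity INTEGER,
    PRIMARY KEY (order_id, product_id));
""")
```

Each non-key column now depends only on its table's key, so updating a customer's city touches one row instead of every order.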
How It Works
Normalized (3NF):
  Users: id, name
  Orders: id, user_id, total
  OrderItems: id, order_id, product_id, quantity
To get an order with the user's name:
  SELECT * FROM Orders JOIN Users ON Orders.user_id = Users.id WHERE Orders.id = 123
(Requires a join: slower, but no duplication)
Denormalized:
  Orders: id, user_id, user_name, total, items_json
To get an order with the user's name:
  SELECT * FROM Orders WHERE id = 123
(Fast single-table lookup, but user_name is duplicated, with a risk of inconsistency)
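The two access patterns above can be run side by side with Python's sqlite3 module. This is a minimal sketch; the schemas follow the text, while the sample data (user "Ada", order 123) is made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders_norm (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL);
-- Denormalized variant: user_name is copied onto each order row.
CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                            user_name TEXT, total REAL);
INSERT INTO users VALUES (1, 'Ada');
INSERT INTO orders_norm VALUES (123, 1, 42.0);
INSERT INTO orders_denorm VALUES (123, 1, 'Ada', 42.0);
""")

# Normalized read: a join is required to attach the user's name.
norm = con.execute("""SELECT o.id, u.name, o.total
                      FROM orders_norm o JOIN users u ON o.user_id = u.id
                      WHERE o.id = 123""").fetchone()

# Denormalized read: single-table lookup, no join.
denorm = con.execute(
    "SELECT id, user_name, total FROM orders_denorm WHERE id = 123").fetchone()

# Both return (123, 'Ada', 42.0) -- same answer, different read cost.
```

The results are identical; the difference is that the denormalized query never touches a second table, at the cost of keeping the copied name in sync.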
Use Cases
Normalization: OLTP systems (banking, e-commerce) where data integrity is critical
Normalization: When write operations are frequent
Denormalization: Read-heavy applications (reporting, analytics)
Denormalization: When query performance is critical
Denormalization: Data warehouses and OLAP systems
Best Practices
Start with normalization for data integrity
Denormalize strategically for proven bottlenecks
Use materialized views for read optimization
Implement proper indexing before denormalizing
Monitor query performance to identify slow joins
Use caching as an alternative to denormalization
Document denormalization decisions
Ensure write logic maintains consistency across denormalized data
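One way to keep write logic consistent, sketched in Python with sqlite3: wrap every copy of a duplicated value in a single transaction so readers never see a half-applied change. The schema and the `rename_user` helper are hypothetical, for illustration only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                            user_name TEXT, total REAL);
INSERT INTO users VALUES (1, 'Ada');
INSERT INTO orders_denorm VALUES (123, 1, 'Ada', 42.0);
""")

def rename_user(con, user_id, new_name):
    # Update the source of truth AND the denormalized copy in one
    # transaction; 'with con' commits on success and rolls back on error.
    with con:
        con.execute("UPDATE users SET name = ? WHERE id = ?",
                    (new_name, user_id))
        con.execute("UPDATE orders_denorm SET user_name = ? WHERE user_id = ?",
                    (new_name, user_id))

rename_user(con, 1, 'Ada L.')
```

If the second UPDATE were ever forgotten or run outside the transaction, the duplicated `user_name` would silently drift from `users.name`, which is exactly the inconsistency risk denormalization introduces.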
Interview Tips
What Interviewers Look For
Explain the read-write trade-off: normalization optimizes writes, denormalization optimizes reads
Discuss normal forms, but mention that 3NF is usually sufficient
Give examples of when to denormalize: reporting, analytics, read-heavy systems
Mention alternatives: caching, materialized views, read replicas
Talk about the consistency challenges of denormalized data
Explain how to maintain consistency: triggers, application logic, eventual consistency
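The trigger approach to maintaining consistency can be sketched with sqlite3 in Python: a database-side trigger propagates a rename to the denormalized copy automatically, so application code cannot forget the second write. The schema and trigger name are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders_denorm (id INTEGER PRIMARY KEY, user_id INTEGER,
                            user_name TEXT, total REAL);
INSERT INTO users VALUES (1, 'Ada');
INSERT INTO orders_denorm VALUES (123, 1, 'Ada', 42.0);

-- Whenever a user is renamed, push the new name to every denormalized copy.
CREATE TRIGGER sync_user_name AFTER UPDATE OF name ON users
BEGIN
    UPDATE orders_denorm SET user_name = NEW.name WHERE user_id = NEW.id;
END;
""")

# A single UPDATE on the source table; the trigger does the rest.
con.execute("UPDATE users SET name = 'Grace' WHERE id = 1")
con.commit()
```

Triggers keep the copies synchronous within one database; across services or data stores, the same goal is usually met with application logic or eventual consistency via change events.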