<aside> 🗃️

TL;DR: Normalization is the practice of organizing data so that each piece of information lives in exactly one place. When you update something, you update it once. When you look something up, you find one answer — not three conflicting ones.


This isn't database theory for its own sake. It's the difference between a system that stays consistent as it grows and one that slowly fills up with conflicting copies of the same fact.

</aside>


Flat vs. Normalized

flowchart TB
    classDef bad fill:#fee2e2,stroke:#dc2626,color:#7f1d1d,stroke-width:2px
    classDef good fill:#d1fae5,stroke:#059669,color:#064e3b,stroke-width:2px
    classDef neutral fill:#f3f4f6,stroke:#6b7280,stroke-width:1px

    subgraph FLAT["❌ FLAT / Un-normalized"]
        direction TB
        F1["Song A | Jordan | [email protected]"]
        F2["Song B | Jordan | [email protected]"]
        F3["Song C | Jordan | [email protected]"]
        F4["⚠️ Jordan changes email<br/>= update 3 rows"]
    end

    subgraph NORM["✅ NORMALIZED"]
        direction TB
        subgraph CONTACTS["Contacts"]
            CON["Jordan | [email protected]"]
        end
        subgraph SONGS["Songs"]
            S1["Song A → Jordan"]
            S2["Song B → Jordan"]
            S3["Song C → Jordan"]
        end
        S1 & S2 & S3 -.->|reference| CON
        OK["✓ Jordan changes email<br/>= update 1 row"]
    end

    class F1,F2,F3,F4 bad
    class CON,S1,S2,S3,OK good

<aside> 💡

The difference: Flat data copies information everywhere. Normalized data stores it once and references it. When data changes, a normalized system needs one update; a flat system needs every copy updated, and any copy you miss becomes a conflict.

</aside>


What normalization actually means

Imagine you have a spreadsheet tracking your songs. For each song, you include the collaborator's name, email, and phone number.

| Song | Collaborator | Email | Phone |
| --- | --- | --- | --- |
| Track A | Jordan Smith | [email protected] | 555-1234 |
| Track B | Jordan Smith | [email protected] | 555-1234 |
| Track C | Jordan Smith | [email protected] | 555-1234 |

Jordan changes their email, so you have to update three rows. Miss one and you have conflicting data. Multiply this by 100 collaborators across 500 songs, and keeping every copy in sync by hand becomes unmanageable.
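To make the failure mode concrete, here is a minimal Python sketch of the flat spreadsheet as a list of rows. The email addresses, the `update_email` helper, and the `emails_for` helper are all made up for illustration; the point is only to show what happens when a manual edit reaches two of the three copies.

```python
# Flat layout: every song row carries its own copy of the collaborator's details.
# The addresses below are hypothetical placeholders, not real data.
flat_rows = [
    {"song": "Track A", "collaborator": "Jordan Smith",
     "email": "jordan@old.example", "phone": "555-1234"},
    {"song": "Track B", "collaborator": "Jordan Smith",
     "email": "jordan@old.example", "phone": "555-1234"},
    {"song": "Track C", "collaborator": "Jordan Smith",
     "email": "jordan@old.example", "phone": "555-1234"},
]


def update_email(rows, songs_to_touch, new_email):
    """Simulate a manual edit that only reaches the listed songs."""
    for row in rows:
        if row["song"] in songs_to_touch:
            row["email"] = new_email


def emails_for(rows, collaborator):
    """Collect every distinct email the rows hold for one collaborator."""
    return {row["email"] for row in rows if row["collaborator"] == collaborator}


# Someone fixes Track A and Track B but overlooks Track C.
update_email(flat_rows, {"Track A", "Track B"}, "jordan@new.example")

# One person, two different answers: the data now disagrees with itself.
conflicting = emails_for(flat_rows, "Jordan Smith")
print(sorted(conflicting))
```

Nothing in the flat layout stops this from happening. The rows are independent, so the only safeguard is remembering to touch every copy.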

Normalization solves this by storing Jordan's contact info in one place and referencing it from your songs:

Contacts:

| Name | Email | Phone |
| --- | --- | --- |
| Jordan Smith | [email protected] | 555-1234 |

Songs:

| Song | Collaborator |
| --- | --- |
| Track A | → Jordan Smith |
| Track B | → Jordan Smith |
| Track C | → Jordan Smith |

Now when Jordan's email changes, you update it once. Every song that references Jordan automatically has the correct info.
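The Contacts and Songs layout above can be sketched with Python's built-in `sqlite3` module. This is one possible way to model it, not the only one: the table names, column names (`collaborator_id` in particular), and email addresses are assumptions chosen for the example.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when asked

conn.executescript("""
    CREATE TABLE contacts (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL,
        phone TEXT
    );
    CREATE TABLE songs (
        id              INTEGER PRIMARY KEY,
        title           TEXT NOT NULL,
        collaborator_id INTEGER NOT NULL REFERENCES contacts(id)
    );
""")

# Jordan's details are stored once (placeholder address).
cur = conn.execute(
    "INSERT INTO contacts (name, email, phone) VALUES (?, ?, ?)",
    ("Jordan Smith", "jordan@old.example", "555-1234"),
)
jordan_id = cur.lastrowid

# Each song points at Jordan's row instead of copying it.
conn.executemany(
    "INSERT INTO songs (title, collaborator_id) VALUES (?, ?)",
    [(title, jordan_id) for title in ("Track A", "Track B", "Track C")],
)

# The email change is a single-row update.
updated = conn.execute(
    "UPDATE contacts SET email = ? WHERE id = ?",
    ("jordan@new.example", jordan_id),
).rowcount

# Looking up any song follows the reference and finds the current email.
rows = conn.execute("""
    SELECT songs.title, contacts.email
    FROM songs
    JOIN contacts ON contacts.id = songs.collaborator_id
    ORDER BY songs.title
""").fetchall()

print(updated, rows)
```

The songs table never stores an email at all, so there is no second copy to forget. The join reads the one row in `contacts` every time.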


The core principle: one source of truth

<aside> 🔑

The golden rule of normalization:


Every piece of information should live in exactly one place. Everything else should point to that place, not copy from it.


This eliminates: