Database Normalization | DBMS
A database is an organized collection of data whose content must be quickly and easily accessed, managed and updated.
A relational database is one whose data are split up into tables, sometimes called relations.
What is normalization ?
Normalization is set of rules which :-
- Cures the ‘SpreadSheet Syndrome’.
- Store only the minimal amount of information.
- Remove redundancies.
- Remove anomalies.
- Restructure data.
What are normalization rules?
- Prevent update anomalies (mistakes) and data inconsistencies.
- Degrade performance, usually only slightly.
- Denormalization is common on read-only databases and in report generation or data warehouses.
- Normalizing the data model is a technical exercise, done in a back room; it does not change the business rules.
There are a few rules for database normalization. Each rule is called a “normal form.”
Types of normal form.
- First Normal Form (1NF): All occurrences of an entity must contain the same number of attributes.
- Second Normal Form (2NF): All non-key fields must be a function of the key.
- Third Normal Form (3NF): All non-key fields must not be a function of other non-key fields.
- Forth Normal Form (BCNF): A row must not contain two or more independent multi-valued facts about an entity.
- Fifth Normal Form (5NF): A record cannot be reconstructed from several smaller record types.
The Zero Form:
- No rules of normalization have been applied.
- Where most people start.
- No room for growth.
- Usually wastes space.
First Normal Form (1NF) :
All rows must be fixed length. It is a restrictive assumption not a design principle.
- Eliminate repeating columns in each table.
- Create a separate table for each set of related data.
- Identify each set of related data with a primary key.
- All attributes are single valued & non-repeating.
- Does not allow variable length lists.
Benefits: Now we can have infinite phone numbers or company addresses for each contact.
Drawback: Now we have to type in everything over and over again. This leads to inconsistency, redundancy and wasting space. Thus, the second normal form.
Second Normal Form (2NF):
All non-key fields must be a function of the full key.
- Create separate tables for sets of values that apply to multiple records.
- Relate these tables with a “foreign key”.
- Addresses the concept of removing duplicative data.
- Meet all the requirements of the first normal form.
- Create relationships between these new tables and their pre-decessors through the use of foreign keys.
- Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Third Normal Form (3NF):
Non-key fields cannot be a function of other non-key fields.
- Eliminate fields that do not depend on the primary key.
- Each non-primary key attribute must be dependent only on primary key.
Fourth Normal Form | Boyce Codd Normal Form (BCNF):
A row should not contain two or more independent multi-valued facts about an entity.
- In many-to-many relationships, independent entities cannot be stored in the same table.
- A relation is in 4NF if it has no multi-valued dependencies.
- Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it’s extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies.
Example:
An employee may have several skills and languages.
Blank skill could mean:
- Person has no skill.
- Attribute doesn’t apply to this employee.
- Data is unknown.
- Data may be found in another record (as in this case).
Solution:
Two entity types, employee-skill and employee-language.
Fifth Normal Form (5NF):
A record cannot be reconstructed from several smaller record types.
- The “very esoteric” one that is probably not required to get the most out of your database.
- “The original table must be reconstructed from the tables into which it has been broken down.”
- The rule ensures that you have not created any extraneous columns and all the tables are only as large as they need to be. The rule do exist, but are rarely considered in practical design.
Example.
If an agent sells a certain product and she represents the company, then she sells that product for that company.
We can reconstruct all true facts from 3 tables instead of the single table.