Normalization is a database design technique which organizes tables in a manner that reduces redundancy and dependency of data.
It divides larger tables to smaller tables and link them using relationships.
Purpose of Normalization,
- Eliminating redundant (useless) data.
- Ensuring data dependencies make sense i.e data is logically stored.
Problems without Normalization
Insertion, deletion, updation of database may face to data loss, without normalization.
To update student address who repeated more than once we need to update S_Address column in all the rows, else data will become inconsistent.
Suppose we are not having ay value to insert in S_Subject, than we need to insert null there, which lead to insertion anamoly.
If we need to drop a subject , than full row need to be dropped, entire student record will be deleted, which lead to deletion anamoly.
Normalization rule are divided into following normal form.
- First Normal Form
- Second Normal Form
- Third Normal Form
First Normal Form (1NF)
- All rows unique.
- All columns unique.
- Each row must have a primary key or group of keys act as primary key.
Table not in normal form-
In 1NF, no colums contain more than one value.
1NF of above table will be-
- Data redundancy increases.
- Each row will be unique.
Second Normal Form (2NF)
In above table two rows are identical in column Student and Age.Age depends on Student.Subject also depends on student.
2NF will be,
Now above 2NF will not suffer update anomaly now.
Third Normal Form (3NF)
- Every non-prime attribute of table must depend on primary key.
- Non-prime attribute must notbe determined by another non-prime attribute. (called transitive functional dependency)
- Table must be in 2NF.
In this table S_ID is a primary key.
Street, City and State depends upon Pincode.
The dependency between Pincode and other fields is called transitive dependency.
To apply 3NF, we need to move the Street, City and State to new table, with Pincode as primary key.
- Data duplication is reduced.
- Data integrity achieved.
Boyce and Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a higher version of the Third Normal form.
This form deals with certain type of anamoly that is not handled by 3NF.
A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.
A 3NF table needed following conditions to be in BCNF
- It must be in 3NF.
- For each functional dependency (X->Y), X should be a super key.
Consider the following relationship: R(W,X,Y,Z)
And following dependencies:
Above table is already in 3NF, Keys are W and XY.
W-> XYZ, W is the super key.
XY-> WZ, XY is also a key.
Z->X, D is not a key.
So, we can break relationship in to (W,X,Y,Z)