Feature scaling is a data preprocessing step that brings the independent features (variables) of a dataset onto a standardized or normalized range.
It matters for many machine learning algorithms, especially those that are sensitive to the magnitude of the input features, such as distance-based methods (k-nearest neighbors, k-means) and models trained with gradient descent.
Putting all features on a similar scale prevents features with large numeric ranges from dominating the others while the model is being trained.
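To make the point about one feature overpowering another concrete, here is a minimal sketch using Euclidean distance. The two samples, the feature meanings (age and income), and the per-feature ranges are all hypothetical values chosen for illustration. The sketch assumes NumPy is available.

```python
import numpy as np

# Two hypothetical samples: [age in years, income in dollars].
a = np.array([25.0, 50_000.0])
b = np.array([60.0, 51_000.0])

# Unscaled Euclidean distance: the income difference dominates,
# even though the age difference is arguably more meaningful.
raw_distance = np.linalg.norm(a - b)

# Assumed per-feature ranges (illustrative only), used to bring
# both features onto a comparable scale before measuring distance.
feature_range = np.array([60.0, 100_000.0])
scaled_diff = (a - b) / feature_range
scaled_distance = np.linalg.norm(scaled_diff)

# Fraction of the squared distance contributed by the income feature.
income_share_raw = (a[1] - b[1]) ** 2 / raw_distance ** 2
income_share_scaled = scaled_diff[1] ** 2 / scaled_distance ** 2

print(f"income share of squared distance, unscaled: {income_share_raw:.4f}")
print(f"income share of squared distance, scaled:   {income_share_scaled:.4f}")
```

Before scaling, almost all of the distance comes from income. After scaling, the age difference accounts for most of it.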
Two common methods for feature scaling are:
1. Min-Max Scaling (Normalization):
- Min-Max scaling rescales each feature to a fixed range, typically 0 to 1. The minimum value of the feature maps to 0, the maximum maps to 1, and every other value falls proportionally in between.
- The formula for Min-Max scaling is:
X_scaled = (X - X_min) / (X_max - X_min)
Where:
- X_scaled is the scaled value of the feature.
- X is the original value of the feature.
- X_min is the minimum value of the feature in the dataset.
- X_max is the maximum value of the feature in the dataset.
- Min-Max scaling works well when the data does not follow a normal distribution and has a limited range. Because it depends on the minimum and maximum, it is sensitive to outliers.
2. Standardization (Z-Score Scaling):
- Standardization transforms each feature so that its mean is 0 and its standard deviation (and therefore its variance) is 1.
- The formula for standardization is:
X_scaled = (X - mean(X)) / std(X)
Where:
- X_scaled is the scaled value of the feature.
- X is the original value of the feature.
- mean(X) is the mean of the feature values in the dataset.
- std(X) is the standard deviation of the feature values in the dataset.
- Standardization is suitable when the data follows a Gaussian distribution or when the features have different units of measurement. Unlike Min-Max scaling, it does not bound the values to a fixed range.
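The two formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation. The function names `min_max_scale` and `standardize`, the sample matrix, and the handling of constant columns (dividing by 1 instead of 0) are assumptions made for this example. The standard deviation here is the population form (`ddof=0`).

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1]: (X - X_min) / (X_max - X_min)."""
    X = np.asarray(X, dtype=float)
    X_min = X.min(axis=0)
    X_max = X.max(axis=0)
    span = X_max - X_min
    # Assumption: a constant column has span 0; divide by 1 so it maps to 0.
    span = np.where(span == 0, 1.0, span)
    return (X - X_min) / span

def standardize(X):
    """Shift each column to mean 0 and scale to std 1: (X - mean(X)) / std(X)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (ddof=0)
    # Assumption: a constant column has std 0; divide by 1 so it maps to 0.
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std

# Hypothetical data: rows are samples, columns are features on different scales.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

X_minmax = min_max_scale(X)
X_std = standardize(X)

print(X_minmax)
print(X_std)
```

In practice, scikit-learn's `MinMaxScaler` and `StandardScaler` in `sklearn.preprocessing` provide the same transformations with a fit/transform interface.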