
[AI Practice] Data Normalization and Standardization in Machine Learning Data Processing

2022-06-23 07:15:00 【szZack】

Machine learning data processing: normalization and standardization

This post introduces three kinds of normalization and standardization methods.

1. Min-max normalization

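A linear transformation of the original data that maps the result into the interval [0, 1]. The conversion function is:
$x^*=\frac{x-\min}{\max-\min}$
where $\min$ and $\max$ are the minimum and maximum of the sample data.

Notes
$\max$ and $\min$ must be fixed: compute them once (for example, on the training data) and reuse the same values when normalizing new data, so that the same raw value always maps to the same normalized value.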
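The following is a minimal sketch of min-max normalization in plain Python. It assumes the bounds are computed once from a reference (training) list and then kept fixed, as the note above requires. The helper names `fit_min_max` and `min_max_normalize` are hypothetical and not taken from any library.

```python
# Minimal sketch of min-max normalization with fixed bounds.
# fit_min_max and min_max_normalize are hypothetical helper names.

def fit_min_max(values):
    """Compute the (min, max) pair once; it is then kept fixed."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # max - min would be 0, so the formula is undefined.
        raise ValueError("max equals min; min-max normalization is undefined")
    return lo, hi

def min_max_normalize(values, lo, hi):
    """Apply x* = (x - min) / (max - min) using the fixed bounds."""
    return [(x - lo) / (hi - lo) for x in values]

train = [2.0, 4.0, 6.0, 10.0]
lo, hi = fit_min_max(train)
print(min_max_normalize(train, lo, hi))        # [0.0, 0.25, 0.5, 1.0]
print(min_max_normalize([8.0, 12.0], lo, hi))  # [0.75, 1.25]
```

The second call shows one consequence of fixing the bounds. A new value outside the training range, such as 12.0, is mapped outside [0, 1].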

2. Z-score (standard deviation) standardization

The data are standardized using the mean and the standard deviation of the original data.

The processed data have a mean of 0 and a standard deviation of 1. This rescales the data but does not change the shape of their distribution, so the result follows the standard normal distribution only if the original data were normally distributed. The transformation function is:
$x^* = \frac{x - \mu}{\sigma}$
where $\mu$ is the mean of all sample data and $\sigma$ is the standard deviation of all sample data.
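Here is a minimal sketch of z-score standardization using Python's `statistics` module. It assumes $\sigma$ is the population standard deviation (dividing by n, via `statistics.pstdev`). The text does not say which variant it means, so `statistics.stdev` (dividing by n - 1) is an equally valid choice. The helper name `z_score` is hypothetical.

```python
import statistics

def z_score(values):
    """Standardize with x* = (x - mu) / sigma (hypothetical helper)."""
    mu = statistics.mean(values)
    # Assumption: population standard deviation (divide by n).
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("standard deviation is 0; all values are identical")
    return [(x - mu) / sigma for x in values]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population std 2
z = z_score(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
# The standardized values have mean 0 and standard deviation 1.
print(statistics.mean(z), statistics.pstdev(z))
```

The example data are skewed to the right, and the standardized values remain skewed. Only the location and scale change.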

3. Nonlinear normalization

Nonlinear normalization is often used when the data span a very wide range, with some values very large and others very small. The original values are mapped through a mathematical function.

Common choices include the logarithm and the arctangent. The curve of the nonlinear function should be chosen according to the distribution of the data:

Logarithmic transformation
Using, for example, $y = \ln(x)$, the corresponding normalization is:
$x^* = \frac{\ln(x)}{\ln(\max)}$
where $\max$ is the maximum value of the sample data, $x^*$ is the normalized value, and $x$ is the input value. All sample data must be greater than or equal to 1, so that $\ln(x) \ge 0$ and the results fall in [0, 1].
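This is a minimal sketch of the logarithmic normalization $x^* = \ln(x)/\ln(\max)$. It adds one check the formula implies but the text does not state: $\max$ must be strictly greater than 1, because $\ln(1) = 0$. The helper name `log_normalize` is hypothetical.

```python
import math

def log_normalize(values):
    """Apply x* = ln(x) / ln(max) (hypothetical helper)."""
    # The method requires every value to be >= 1, so that ln(x) >= 0.
    if any(x < 1 for x in values):
        raise ValueError("all values must be >= 1")
    top = max(values)
    # ln(1) = 0, so max must exceed 1 to avoid dividing by zero.
    if top <= 1:
        raise ValueError("max must be greater than 1")
    denom = math.log(top)
    return [math.log(x) / denom for x in values]

# Hypothetical data spanning several orders of magnitude.
data = [1, 10, 100, 1000]
print(log_normalize(data))
```

For this data, the outputs are 0, 1/3, 2/3 and 1, up to floating-point rounding. The base of the logarithm does not matter because it cancels in the ratio, so `math.log10` gives the same result.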
Arctangent transformation
The data can also be normalized with the arctangent function:
$x^* = \arctan(x) \cdot \frac{2}{\pi}$
Note that the result falls in [0, 1) only if the data are greater than or equal to 0. Data less than 0 are mapped into (-1, 0).
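A minimal sketch of the arctangent normalization follows. It uses `math.atan` from the standard library. The helper name `arctan_normalize` is hypothetical.

```python
import math

def arctan_normalize(values):
    """Apply x* = arctan(x) * 2 / pi (hypothetical helper).

    Values >= 0 land in [0, 1); values < 0 land in (-1, 0).
    """
    return [math.atan(x) * 2 / math.pi for x in values]

print(arctan_normalize([0.0, 1.0, -1.0, 1e6]))
```

Unlike min-max normalization, this mapping needs no fixed statistics from the data. However, it compresses large values strongly. For 1e6 the output is already within about 1e-6 of 1, and in floating point very large inputs can round to exactly 1.0.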
L2-norm normalization
L2-norm normalization divides each element of the feature vector by the vector's L2 norm:
$x_i^* = \frac{x_i}{\mathrm{norm}(x)}$
where the L2 norm of a vector $x = (x_1, x_2, \ldots, x_n)$ is defined as:
$\mathrm{norm}(x) = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$
Property: the squares of the elements of the converted vector $x^*$ sum to 1.
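Here is a minimal sketch of L2-norm normalization for a single feature vector, written directly from the formula above. The helper name `l2_normalize` is hypothetical. The zero-vector check is an added assumption, because the norm of the zero vector is 0.

```python
import math

def l2_normalize(vector):
    """Divide each element by sqrt(x1^2 + x2^2 + ... + xn^2) (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot L2-normalize the zero vector")
    return [x / norm for x in vector]

v = [3.0, 4.0]          # L2 norm is 5
u = l2_normalize(v)
print(u)                # [0.6, 0.8]
print(sum(x * x for x in u))  # sum of squares is 1, up to rounding
```

This normalizes one vector, such as one sample's features. Min-max and z-score normalization, by contrast, are usually applied to one feature across all samples.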