
Self-Taught Machine Learning Series - 1: The Basic Framework of Machine Learning

2022-06-26 09:09:00 ML_python_get√

1 Basic ideas of machine learning

  • The basic framework of machine learning: data acquisition, feature extraction, data transformation, model training, model selection, and model evaluation
  • Supervised learning: given original features and problem labels, mine the rules and learn a pattern that can answer new questions
  • Unsupervised learning: look for patterns using only the original features
  • Reinforcement learning: maximize a reward; there is no absolutely correct label, nor is the goal to find structure in the data

1.1 Model selection

  • How should model parameters be selected? Cross-validation
  • For a typical regression problem: minimize the mean squared error
  • Overfitting: variance increases with model complexity while bias decreases with model complexity, so the mean squared error traces a U-shaped curve
  • Cross-validation:
    • The simplest approach: set aside a fixed proportion of the training set as a validation set that does not take part in model training; this reduces the data available for fitting and can hurt accuracy
    • K-fold cross-validation is used more often: all samples are divided into K parts (typically 3-20); each part serves once as the validation set, and the procedure is repeated K times until every part has been used for validation (see the sketch below)
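
As a concrete illustration (my own addition, not from the original post), here is a minimal K-fold cross-validation sketch with scikit-learn; the synthetic dataset and the linear model are placeholders.

```python
# Minimal K-fold cross-validation sketch (illustrative; dataset and model are placeholders).
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Split the samples into K = 5 parts; each part serves once as the validation set.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=kfold, scoring="neg_mean_squared_error")
print("mean validation MSE:", -scores.mean())
```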

1.2 Model evaluation

  • Regression problems: mean squared error
  • Classification problems: metrics built from the confusion matrix, using the example "1 = diseased, 0 = healthy": true positive / hit (1,1), false positive / false alarm (1,0), false negative / miss (0,1), true negative / correct rejection (0,0); the sketch after this list computes them
    • Accuracy: (hits + correct rejections) / total; misleading when the incidence rate is very low
    • Precision: hits / (hits + false alarms)
    • Recall: hits / (hits + misses)
    • False positive rate: false alarms / (false alarms + correct rejections)
    • ROC curve
    • AUC
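
A short sketch of my own (the labels and scores below are made up, with 1 = diseased) showing how these quantities can be computed with scikit-learn:

```python
# Illustrative computation of the classification metrics above (hypothetical predictions).
import numpy as np
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])    # 1 = diseased, 0 = healthy
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])    # hard predictions
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3])  # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("accuracy :", accuracy_score(y_true, y_pred))      # (TP + TN) / total
print("precision:", precision_score(y_true, y_pred))     # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))        # TP / (TP + FN)
print("FPR      :", fp / (fp + tn))                      # FP / (FP + TN)
print("AUC      :", roc_auc_score(y_true, y_score))      # area under the ROC curve
```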

2 Common machine learning methods

  • Supervised learning: generalized linear models, linear discriminant analysis, support vector machines, decision trees, random forests, neural networks, K-nearest neighbors
  • Unsupervised learning: clustering, dimensionality reduction (PCA)

2.1 Generalized linear model

  • Simple regression: a single predictor
  • Multiple regression: multiple predictors
  • Ridge regression: L2 regularization
  • Lasso: L1 regularization
  • Logistic regression: binary classification; an improvement on the linear probability model
  • Ordinal multi-class regression: multi-class problems in which the response categories are ordered; fit N-1 logistic regressions
  • OvR (one vs. rest): split the samples into one class versus all the rest, run N logistic regressions, and obtain a probability for each individual class (see the sketch after this list)
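
A minimal sketch of my own, on synthetic data, contrasting the ridge and lasso penalties and showing one-vs-rest logistic regression; the datasets and hyperparameters are arbitrary illustrative choices.

```python
# Illustrative sketch of ridge, lasso, and one-vs-rest logistic regression (synthetic data).
from sklearn.datasets import make_regression, load_iris
from sklearn.linear_model import Ridge, Lasso, LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Ridge (L2) vs. Lasso (L1): lasso drives some coefficients exactly to zero.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
print("ridge coefs:", Ridge(alpha=1.0).fit(X, y).coef_.round(2))
print("lasso coefs:", Lasso(alpha=1.0).fit(X, y).coef_.round(2))

# OvR: fit one binary logistic regression per class, then report a probability per class.
Xc, yc = load_iris(return_X_y=True)
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(Xc, yc)
print("per-class probabilities:", ovr.predict_proba(Xc[:1]).round(3))
```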

2.2 Linear discriminant analysis and quadratic discriminant analysis

  • Logistic regression is not well suited to cases where the two classes are widely separated
  • LDA: linear discriminant analysis, an extension of logistic regression; it assumes the samples follow normal distributions and estimates the coefficients from sample moments
  • QDA: quadratic discriminant analysis; the discriminant function is quadratic, so the decision boundary is a curve (see the sketch below)
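
For illustration (my own addition, on synthetic data): LDA and QDA can be compared directly in scikit-learn; LDA assumes a shared covariance matrix across classes, while QDA fits one per class.

```python
# Illustrative LDA vs. QDA comparison (synthetic two-class data).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)

# LDA: shared covariance, linear boundary; QDA: per-class covariance, curved boundary.
for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("QDA", QuadraticDiscriminantAnalysis())]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```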

2.3 Support vector machine

  • Divide the sample space with a hyperplane: like using a huge sheet of paper to split the space into two parts
  • The hyperplane is determined by only a finite number of points, which are called the support vectors
    • The XOR problem: the output is 1 only for the inputs (1,0) and (0,1); no straight line can separate the classes
    • Raise the dimension, e.g. by introducing x1*x2 in a regression, Taylor expansions, etc.
    • SVM introduces kernel functions to compute the hyperplane: linear kernel, polynomial kernel, Gaussian (RBF) kernel (see the sketch after this list)
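
A tiny sketch of my own on the four XOR points: a linear-kernel SVM cannot separate them, while an RBF-kernel SVM can; the C and gamma values are arbitrary illustrative choices.

```python
# Illustrative sketch: the XOR points are not linearly separable, but a kernel SVM handles them.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # XOR inputs
y = np.array([0, 1, 1, 0])                      # XOR outputs

for kernel in ["linear", "rbf"]:
    clf = SVC(kernel=kernel, C=10.0, gamma=1.0).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```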

2.4 Decision trees and random forests

  • Decision tree: at each level a node is split into several child nodes according to some rule, and the terminal leaf nodes give the classification result
    • Choosing the splitting feature: pick the split that maximizes the information gain, where entropy is sum(-p*log p)
    • Avoiding overfitting: pruning, early stopping of branching
    • C4.5 algorithm: can only be used for classification and cannot combine features
    • CART algorithm: each node is split into exactly two child nodes; supports feature combinations; can be used for both classification and regression
    • Advantages: fast training, handles non-numeric features, nonlinear classification
    • Disadvantages: unstable, sensitive to the training samples, prone to overfitting
  • Ensemble methods (compared in the sketch after this list)
    • Bootstrap: sampling with replacement yields Bootstrap datasets of the same size as the original sample; repeat N times and train a weak classifier on each dataset
    • Bagging: based on the Bootstrap method; combine the weak classifiers by voting or averaging to obtain the final classification
    • Parallel methods: Bagging, e.g. random forests; row sampling produces the Bootstrap datasets, column sampling randomly selects m features, and the N decision trees finally vote to produce the classification
    • Serial methods: AdaBoost and gradient boosted decision trees (GBDT); train a weak classifier on the original data, increase the weights of the misclassified samples, and keep training
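
To make the contrast concrete, here is a sketch of my own on synthetic data comparing a single CART tree, a random forest (bagging), and gradient boosting; the dataset and hyperparameters are arbitrary.

```python
# Illustrative comparison of a single tree, a bagging ensemble, and a boosting ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "single tree (CART)": DecisionTreeClassifier(random_state=0),
    "random forest (bagging)": RandomForestClassifier(n_estimators=100, random_state=0),
    "GBDT (boosting)": GradientBoostingClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```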

2.5 Neural networks and deep learning

  • The basic idea: a neuron has two states, excited and inhibited; its dendrites receive stimulation from the previous neuron, and only when the potential reaches a certain threshold does the neuron switch to the excited state; the electrical signal then travels along the axon and synapses to the dendrites of the next neuron, and this forms a huge network
  • Input layer: linear weighting
  • Hidden layers: activation functions such as ReLU, sigmoid, tanh
  • Output layer: softmax or sigmoid for classification, the identity for regression (a small network is sketched after this list)
  • Too many layers: the parameters become hard to estimate and the gradients vanish; convolutional neural networks (CNN) use local connections to help with this
  • Image recognition: CNN
  • Time-series problems: recurrent neural networks (RNN) and long short-term memory networks (LSTM)
  • Unsupervised learning: generative adversarial networks (GAN)
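
As a small, concrete instance of the input/hidden/output structure (my own addition, using scikit-learn's MLPClassifier on the digits dataset), the sketch below uses ReLU hidden layers and a softmax output:

```python
# Illustrative feed-forward network: input layer -> ReLU hidden layers -> softmax output.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                    max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print("test accuracy:", mlp.score(X_test, y_test))
```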

2.6 KNN

  • Supervised learning
  • The methods above rest on the assumption that two samples with similar features belong to the same class
  • Based on this idea, a new classification rule: the class of each point is determined by the classes of its K nearest neighbors
  • Choosing K: too small overfits, too large underfits; use cross-validation (see the sketch below)
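
A minimal sketch of my own: scan several values of K on the iris dataset and compare cross-validated accuracy.

```python
# Illustrative choice of K by cross-validation (too small overfits, too large underfits).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in [1, 3, 5, 11, 31]:
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"K={k}: CV accuracy {score:.3f}")
```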

2.7 Clustering

  • Unsupervised learning: divide the samples into K clusters, grouping similar objects into the same cluster.
  • K-means (sketched after this list)
    • Randomly choose K points as centroids; for each sample point, find the nearest centroid and assign the point to the corresponding cluster
    • Take the mean of each cluster as the new centroid, reassign the sample points to clusters, and obtain the result of the first iteration
    • Keep repeating this process until the clusters no longer change
    • Drawbacks: strongly affected by the choice of K, sensitive to outliers, slow convergence
  • Hierarchical clustering: organize the samples into a hierarchy, either splitting from the top down or merging from the bottom up
  • Spectral clustering: treat each object as a vertex of a graph and the similarity between two vertices as the weight of the edge connecting them, giving an undirected weighted graph G(V, E) based on similarity
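
An illustrative K-means run of my own on synthetic blob data (K is assumed known here; in practice it would be tuned):

```python
# Illustrative K-means sketch on synthetic blob data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# KMeans repeats the assign-to-nearest-centroid / recompute-centroid steps until convergence.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster centers:\n", km.cluster_centers_.round(2))
print("first ten labels:", km.labels_[:10])
```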

2.8 Dimension reduction

  • PCA (a short sketch follows this list):
    • Assumes that the directions in which the predictors vary most also produce the largest changes in the response
    • Look for the directions of greatest variation in the feature space and use them as new features; the data must be standardized first
    • How to determine the number of components: cross-validation
  • Partial least squares:
    • When the PCA assumption does not hold, look for directions closely related to the response to define the new features
    • Regress Y on the features of X and use the significant regression coefficients as weights to form the linear combination Z1
    • Regress X on Z1 and take the residuals, i.e. the part not explained by Z1, as new features
    • Regress Y on the new features, keep the significant regression coefficients, and form the linear combination Z2
    • Repeat this process M times
  • Fisher linear discriminant analysis:
    • Core idea: make the within-class scatter as small as possible and the between-class scatter as large as possible
    • Compute the centers of the two classes of data
    • Compute the within-class scatter matrices of the two classes and add them to obtain the total within-class scatter matrix S
    • Compute the between-class scatter matrix Sb
    • After projecting onto w·x, we want the between-class distance w'Sb w to be as large as possible and the within-class distance w'S w to be as small as possible, so we maximize w'Sb w / w'S w
    • The projected values are the new features
  • Nonlinear dimensionality reduction
    • Locally linear embedding (LLE)
    • Isomap (geodesic distances)
    • Laplacian eigenmaps
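
A short PCA sketch of my own: standardize first, then keep the two directions of greatest variance; the iris data and the number of components are placeholders.

```python
# Illustrative PCA sketch: standardize, then keep the directions of greatest variance.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = pca.fit_transform(X)
print("reduced shape:", X_reduced.shape)
print("explained variance ratio:", pca.named_steps["pca"].explained_variance_ratio_.round(3))
```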
