What is machine learning? (Fundamentals)
2022-06-25 20:44:00 【Chengshaoting】
Fundamentals of machine learning
Feature: a column in the dataset (x)
Target: the column to be predicted (y); it holds continuous values (0, 1, 2, 3, 4, 5…) or discrete values (categories)
Sample: a row of data; the number of rows in the dataset is the number of samples
[0,1,2,3] is a vector
Feature engineering: the process of preparing the data, which determines how well the model can predict
- Feature extraction
- Feature transformation
- Dimensionality reduction
Splitting the dataset (historical data with known y; common train:test ratios are 7:3, 8:2, and 9:1)
- Training set (used to train the model)
- Test set (used to test how well the trained model performs)
- The actual y values: y_true
- The values the model predicts: y_pred
- Comparing y_true with y_pred shows how well the model performs
- Accuracy: above 70%
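As a minimal sketch of the comparison described above, accuracy can be computed as the fraction of positions where y_pred equals y_true. The helper name `accuracy` and the label lists are made up for illustration; they are not from the original notes.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Hypothetical labels, invented only to illustrate the calculation.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

acc = accuracy(y_true, y_pred)
print(acc)  # 7 of the 10 predictions match
```

Three of the ten predictions differ from the true labels, so this toy model sits exactly at 70%.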
Types of machine learning
Supervised learning (there is a target value y)
- Regression problems (the target y is a continuous value)
- Predicting house prices
- Predicting stock trends
- Predicting a company's sales
- Classification problems (the target y is a category)
- Binary and multiclass classification
- Whether an email is spam
- Whether a user will churn
Unsupervised learning (there is no target value y)
- Clustering (birds of a feather flock together)
- User segmentation (user profiling)
- Dimensionality reduction
Machine learning workflow [key point]
- Obtain the data
- Basic data processing (time-consuming)
- Feature engineering (time-consuming)
- Normalization and standardization
- Dimensionality reduction
- Dataset splitting
- Feature derivation
- Feature crossing
- Train a model with a machine learning algorithm
- Evaluate the model
Overfitting and underfitting
- Overfitting: the model performs well on the training set but poorly on the test set or on unseen data
- Clean the data again
- Increase the amount of training data
- Use regularization to penalize the parameters
- Underfitting: the model performs poorly on the training set as well as on the test set or unseen data
- Add more features
- Add polynomial features
- Reduce the regularization parameter
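The notes do not say which regularization method is meant. The sketch below assumes an L2 (ridge-style) penalty, one common choice, to show how a penalty on the parameters enters the loss. The function names, the `alpha` parameter, and the numbers are all illustrative assumptions.

```python
def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l2_penalized_loss(y_true, y_pred, weights, alpha):
    """Data-fit error plus alpha times the sum of squared weights."""
    penalty = alpha * sum(w ** 2 for w in weights)
    return mse(y_true, y_pred) + penalty


# Hypothetical values, chosen only to make the arithmetic easy to follow.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
weights = [3.0, -4.0]  # sum of squares = 25

loss_none = l2_penalized_loss(y_true, y_pred, weights, alpha=0.0)
loss_weak = l2_penalized_loss(y_true, y_pred, weights, alpha=0.01)
loss_strong = l2_penalized_loss(y_true, y_pred, weights, alpha=1.0)
print(loss_none, loss_weak, loss_strong)
```

A larger `alpha` makes large weights more expensive, which pushes the model toward simpler fits and counters overfitting. Lowering `alpha` relaxes that pressure, which is the "reduce the regularization parameter" remedy listed for underfitting.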
KNN algorithm
The KNN algorithm is suited to classification problems, for example binary classification
It can also be used for regression problems
Idea: take the k nearest points and use them to predict the unknown sample (for classification, by majority vote). The default is k = 5; common choices are 1, 3, 5, 7
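To make the idea concrete, here is a minimal from-scratch sketch of KNN classification, assuming Euclidean distance and a simple majority vote. It is not how scikit-learn implements KNN internally; the function name `knn_predict` and the toy data are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x_new, k=5):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training sample.
    distances = [
        (math.dist(x, x_new), label)
        for x, label in zip(X_train, y_train)
    ]
    # Keep the k closest samples, sorting by distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D toy data: two well-separated groups.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y_train = ["a", "a", "a", "b", "b", "b"]

pred_low = knn_predict(X_train, y_train, (1.2, 1.4), k=3)
pred_high = knn_predict(X_train, y_train, (6.4, 6.6), k=3)
print(pred_low, pred_high)
```

An odd k, as in the common choices 1, 3, 5, 7, avoids tied votes in binary classification.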
- KNN API usage
from sklearn.neighbors import KNeighborsClassifier
# Create the classifier
knn_clf = KNeighborsClassifier(n_neighbors=6)
# Train the model with fit()
knn_clf.fit(x,y)
# Predict with predict()
knn_clf.predict(x1)
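- Splitting the dataset
from sklearn.model_selection import train_test_split
# test_size=0.3 gives a 7:3 split; the random_state seed is an arbitrary choice
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
- Evaluating classification models
# Compute the accuracy
from sklearn.metrics import accuracy_score
# Method 1:
# y_predict holds the model's predictions for X_test, e.g. knn_clf.predict(X_test)
accuracy_score(y_test,y_predict)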
- Normalization and standardization
Purpose: map all data onto the same scale so that features with different units or orders of magnitude can be compared and weighted.
Algorithms that compute distances should always be given normalized or standardized data [key point]
Normalization: (X - Xmin) / (Xmax - Xmin), which maps the data to [0, 1]
Standardization: (X - Xmean) / Xstd, after which the data has mean 0 and standard deviation 1
from sklearn.preprocessing import StandardScaler,MinMaxScaler
# Create an instance
minmax = MinMaxScaler()
# Fit on the training set only, then apply the same scaling to the test set
minmax.fit_transform(X_train)
minmax.transform(X_test)
Worked normalization examples:
(8-1)/(11-1) = 0.7
(61-1)/(101-1) = 0.6
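The two worked calculations above can be reproduced with a short pure-Python sketch of both formulas. The helper names and the sample columns are illustrative assumptions; the columns were chosen so that their minimum and maximum match the numbers in the worked examples. The standardization helper uses the population standard deviation, which is an assumption about what Xstd means here.

```python
import statistics


def min_max_scale(column, lo, hi):
    """Normalization: map values to [0, 1] using a given min (lo) and max (hi)."""
    return [(v - lo) / (hi - lo) for v in column]


def standardize(column):
    """Standardization: subtract the mean, divide by the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


# Hypothetical training columns with min 1 / max 11 and min 1 / max 101.
feature_a = [1, 8, 11]
feature_b = [1, 61, 101]

scaled_a = min_max_scale(feature_a, min(feature_a), max(feature_a))
scaled_b = min_max_scale(feature_b, min(feature_b), max(feature_b))
print(scaled_a, scaled_b)

# A test value is scaled with the min and max learned from the training
# column, mirroring fit_transform on the training set and transform on the test set.
scaled_test = min_max_scale([6], min(feature_a), max(feature_a))
print(scaled_test)

z_a = standardize(feature_a)
print(z_a)
```

After normalization both features lie in [0, 1], so the much larger raw range of `feature_b` no longer dominates a distance calculation.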
- Grid search and cross-validation
Purpose: make the model more accurate and reliable (model tuning)
sklearn.model_selection.GridSearchCV(estimator, param_grid, cv=None)
- Description: exhaustive search over the specified parameter values of an estimator
- Parameters:
- estimator: the estimator object
- param_grid: estimator parameters as a dict, e.g. {"n_neighbors": [1, 3, 5]}
- cv: the number of cross-validation folds
- Methods:
- fit: takes the training data
- score: returns the accuracy
- Results:
- best_score_: the best cross-validation score
- best_estimator_: the model with the best parameters
- cv_results_: the validation-set scores for each parameter combination and fold (training-set scores are included only when return_train_score=True)
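To illustrate what the cv parameter controls, the following sketch splits sample indices into folds so that each sample is used for validation exactly once. It is a simplified illustration without shuffling or stratification, both of which scikit-learn's splitters can add; the function name `k_fold_indices` is invented for this example.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


folds = list(k_fold_indices(n_samples=7, n_folds=3))
for train_idx, val_idx in folds:
    print("train:", train_idx, "validate:", val_idx)
```

With cv=3, each candidate parameter combination is trained three times, once per fold, and the validation scores are averaged.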
from sklearn.model_selection import GridSearchCV
# Instantiate the estimator
estimator = KNeighborsClassifier()
# Model selection and tuning: grid search with cross-validation
# Hyperparameters to tune
param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
# Fit on the training data
estimator.fit(X_train, y_train)
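The pieces above can be combined into one runnable sketch: split, scale, tune KNN with grid search, and evaluate on the test set. The iris dataset, the Pipeline wrapper, the stratify option, and the seed value are assumptions added for illustration and do not appear in the original notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Obtain the data (iris is used here only as a convenient built-in example).
X, y = load_iris(return_X_y=True)

# 7:3 split; the seed is arbitrary and only makes the run repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Putting the scaler inside a Pipeline means it is re-fitted on the
# training part of every cross-validation fold.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Pipeline parameters are addressed as "<step name>__<parameter>".
param_grid = {
    "knn__n_neighbors": [1, 3, 5],
    "knn__weights": ["uniform", "distance"],
}

search = GridSearchCV(pipeline, param_grid=param_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.best_score_)

# score() uses the best model, refitted on the whole training set.
test_accuracy = search.score(X_test, y_test)
print(test_accuracy)
```

Wrapping the scaler in the pipeline is a design choice: the validation part of each fold never influences the scaling statistics, so the cross-validation scores are not optimistically biased.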