What is machine learning? (Fundamentals)
2022-06-25 20:44:00 【Chengshaoting】
Fundamentals of machine learning
Feature: a column in the dataset (x)
Target: the column to be predicted (y); it can be a continuous value (0, 1, 2, 3, 4, 5, …) or a discrete value (a category)
Sample: one row of data; the number of rows in the dataset is the number of samples
Vector: for example [0, 1, 2, 3]
Feature engineering: the process of preparing the data, which determines how well the model can predict
- Feature extraction
- Feature transformation
- Dimensionality reduction
Splitting the dataset (historical data => y), typically 7:3, 8:2 or 9:1
- Training set (used to train the model)
- Test set (used to test how well the trained model performs)
  - Actual y values: y_true
  - The model produces predicted values: y_pred
  - Comparing y_true with y_pred shows how well the model performs
  - Accuracy: 70% or higher
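The split and the y_true / y_pred comparison described above can be sketched in plain Python. This is a minimal illustration, not the author's code: the helper names `split_dataset` and `accuracy`, the seed, and the sample labels are all assumptions made for the example.

```python
import random


def split_dataset(rows, test_ratio=0.3, seed=0):
    """Shuffle the rows and split them into (train, test) by test_ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed so the split is repeatable
    n_test = round(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the actual value."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# A 7:3 split of ten samples
train, test = split_dataset(range(10), test_ratio=0.3)
print(len(train), len(test))  # 7 3

# Hypothetical actual and predicted labels for five test samples
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_true, y_pred))  # 0.8
```

Four of the five predictions match, so the accuracy is 0.8, which would clear the 70% bar mentioned above.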
Types of machine learning
Supervised learning (there is a target value y)
- Regression problems (the target y is a continuous value)
  - Predicting house prices
  - Predicting stock trends
  - Predicting a company's sales
- Classification problems (the target y is a category)
  - Binary and multiclass classification
    - Whether an email is spam
    - Whether a user will churn
Unsupervised learning (there is no target value y)
- Clustering (birds of a feather flock together)
  - User segmentation (user profiles)
- Dimensionality reduction
Machine learning workflow [key point]
- Get the data
- Basic data processing (time-consuming)
- Feature engineering (time-consuming)
- Normalization and standardization
- Dimensionality reduction
- Dataset splitting
- Feature derivation
- Feature crossing
- Apply a machine learning algorithm (train the model)
- Evaluate the model
Overfitting and underfitting
- Overfitting: the model performs well on the training set but poorly on the test set or on unseen data
  - Clean the data again
  - Increase the amount of training data
  - Use regularization to penalize the parameters
- Underfitting: the model performs poorly on the training set as well as on the test set or unseen data
  - Add more features
  - Add polynomial features
  - Reduce the regularization parameter
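To make "penalize the parameters" concrete, here is a small sketch of a loss with an L2 penalty added. The text does not say which penalty is meant, so the L2 (ridge-style) form, the function names, the `alpha` parameter, and the numbers are all assumptions chosen for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def l2_penalized_loss(weights, y_true, y_pred, alpha=1.0):
    """MSE plus alpha times the sum of squared weights (assumed L2 penalty)."""
    penalty = alpha * sum(w * w for w in weights)
    return mse(y_true, y_pred) + penalty


# Hypothetical values, for illustration only
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
weights = [0.5, -2.0]

# alpha = 0 switches the penalty off and leaves the plain MSE
print(l2_penalized_loss(weights, y_true, y_pred, alpha=0.0))
# A larger alpha punishes large weights more heavily
print(l2_penalized_loss(weights, y_true, y_pred, alpha=0.1))
```

Raising `alpha` pushes the model towards smaller weights, which counters overfitting; lowering it gives the model more freedom, which is the "reduce the regularization parameter" remedy for underfitting.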
KNN algorithm
The KNN algorithm is suited to classification problems, for example binary classification
It can also be used for regression problems
Idea: take the k nearest points and use them to predict the unknown data point (classification). The default value of k is 5; common choices are 1, 3, 5, 7
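The idea can be sketched from scratch in a few lines. This is an illustrative version, not the scikit-learn implementation: it assumes Euclidean distance and a simple majority vote, and the function name `knn_predict` and the toy points are made up for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=5):
    """Predict the label of query by majority vote among its k nearest points."""
    # Distance from the query to every training point, paired with its label
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # The most frequent label among the k nearest neighbours wins
    return Counter(nearest_labels).most_common(1)[0][0]


# Two hypothetical clusters labelled "a" and "b"
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2, 2), k=3))  # a
print(knn_predict(points, labels, (7, 9), k=3))  # b
```

With two classes, an odd k such as 1, 3, 5 or 7 rules out a tied vote, which is one reason those values are the usual choices.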
- KNN API usage
from sklearn.neighbors import KNeighborsClassifier
# Create the classifier
knn_clf = KNeighborsClassifier(n_neighbors=6)
# Train the model with fit()
knn_clf.fit(x, y)
# Predict with predict()
knn_clf.predict(x1)
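- Splitting the dataset
from sklearn.model_selection import train_test_split
# test_size=0.3 gives a 7:3 split; the random_state value is an arbitrary seed used here for reproducibility
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)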
- Evaluating classification models
# Compute the accuracy
from sklearn.metrics import accuracy_score
# Method 1:
accuracy_score(y_test, y_predict)
- Normalization and standardization
Purpose: map all data onto the same scale so that features with different units or orders of magnitude can be compared and weighted.
Algorithms that compute distances must be given normalized or standardized data [key point]
Normalization: (X - Xmin) / (Xmax - Xmin), maps the data to [0, 1]
Standardization: (X - Xmean) / Xstd, gives the data a mean of 0 and a standard deviation of 1
from sklearn.preprocessing import StandardScaler, MinMaxScaler
# Create the scaler
minmax = MinMaxScaler()
# Fit on the training set, then apply the same scaling to the test set
minmax.fit_transform(X_train)
minmax.transform(X_test)
Worked examples of the normalization formula:
(8-1)/(11-1) = 0.7
(61-1)/(101-1) = 0.6
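Both formulas can be checked by hand with a short plain-Python sketch. The function names and the sample columns are assumptions for illustration; the columns are chosen so that they reproduce the two worked examples above. The standardization here uses the population standard deviation.

```python
import statistics


def min_max_normalize(values):
    """Apply (x - min) / (max - min) to map the values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Apply (x - mean) / std so the result has mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        raise ValueError("standard deviation is zero")
    return [(v - mean) / std for v in values]


# Hypothetical feature columns matching the worked examples
print(min_max_normalize([1, 8, 11]))    # [0.0, 0.7, 1.0]
print(min_max_normalize([1, 61, 101]))  # [0.0, 0.6, 1.0]

scaled = standardize([1, 8, 11])
print(scaled)
```

After normalization the two columns sit on the same [0, 1] scale even though their raw ranges differ by an order of magnitude, which is what distance-based algorithms such as KNN need.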
- Grid search and cross-validation
Purpose: make the model more accurate and reliable (model tuning)
sklearn.model_selection.GridSearchCV(estimator, param_grid, cv=None)
- Description: exhaustive search over the specified parameter values of an estimator
- Parameters:
  - estimator: the estimator object
  - param_grid: the estimator parameters to search (dict), e.g. {"n_neighbors": [1, 3, 5]}
  - cv: the number of cross-validation folds
- Methods:
  - fit: takes the training data
  - score: the accuracy
- Result attributes:
  - best_score_: the best score obtained in cross-validation
  - best_estimator_: the model with the best parameters
  - cv_results_: the validation-set results for each cross-validation run (training-set scores are included only when return_train_score=True)
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
# Instantiate the estimator
estimator = KNeighborsClassifier()
# Model selection and tuning: grid search with cross-validation
# Hyperparameters to tune
param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
estimator = GridSearchCV(estimator, param_grid=param_dict, cv=3)
# Fit on the training data
estimator.fit(X_train, y_train)
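To see what the grid search above has to evaluate, the following plain-Python sketch enumerates the parameter combinations and builds simple cross-validation folds. It is a rough illustration and not how scikit-learn is implemented: the helper names `expand_grid` and `k_fold_indices` and the sample count of 9 are assumptions, and the folds are plain contiguous slices, whereas scikit-learn uses stratified folds for classifiers when cv is an integer.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination of the listed parameter values."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) per fold, without shuffling."""
    # Spread any remainder over the first folds so the sizes differ by at most 1
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


param_dict = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
candidates = list(expand_grid(param_dict))
folds = list(k_fold_indices(n_samples=9, n_folds=3))

print(len(candidates))               # 6
print(len(candidates) * len(folds))  # 18
print(folds[0])  # ([3, 4, 5, 6, 7, 8], [0, 1, 2])
```

Three values of n_neighbors times two values of weights gives 6 candidates, and with cv=3 each one is trained and validated three times, so the search performs 18 cross-validation fits before picking the best combination.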