100+ Data Science Interview Questions and Answers: Machine Learning and Deep Learning
2022-06-28 04:37:00 【deephub】
These questions come from interviews at Amazon, Google, Meta, Microsoft, and other companies. Following yesterday's article, this post collects the machine learning and deep learning questions.
Machine Learning
54. What is machine learning?
Machine learning is an interdisciplinary field that draws on probability theory, statistics, approximation theory, and the study of complex algorithms. It uses computers as a tool to simulate how humans learn and organizes existing knowledge into structures so that learning performance keeps improving.
Machine learning has the following definitions:
(1) Machine learning is a branch of artificial intelligence; research in this field focuses in particular on how to improve the performance of specific algorithms through experience.
(2) Machine learning is the study of computer algorithms that improve automatically through experience.
(3) Machine learning uses data or past experience to optimize the performance of a computer program.
55. What is unsupervised learning?
Unsupervised learning refers to machine learning algorithms that draw inferences from datasets consisting of input data without labels.
It mainly includes clustering, dimensionality reduction, and anomaly detection.
56. What are the different classification algorithms?
The figure below lists the most important classification algorithms.

57. What is "naive" about naive Bayes?
The naive Bayes algorithm is based on Bayes' theorem, which describes the probability of an event given prior knowledge of conditions that may be related to the event.
The algorithm is "naive" because it assumes that the features are independent of one another given the class, an assumption that may or may not hold in real data.
58. How do you build a random forest model?
A random forest combines many decision tree models. Each individual tree has low bias and high variance, and averaging many trees reduces the variance. Each tree is trained on a sample of the data and makes its own prediction. The predictions of all trees are collected and combined: by majority vote (the mode) in classification problems, and by taking the mean (or median) in regression problems.
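As a rough illustration (not part of the original answer), a random forest can be fit in a few lines with scikit-learn; the dataset and hyperparameter values below are placeholders:

```python
# A minimal sketch of fitting a random forest classifier with scikit-learn.
# The dataset and hyperparameter values are illustrative placeholders.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample of the rows
# and considers a random subset of features at each split.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # predictions come from a majority vote across trees
```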

59. Explain the SVM algorithm in detail
SVM stands for support vector machine. It is a supervised machine learning algorithm that can be used for both regression and classification. If your training dataset has n features, SVM plots each example as a point in n-dimensional space, where the value of each feature is the value of a particular coordinate. SVM then separates the classes with a hyperplane, based on the kernel function that is provided.

60. What are the support vectors in a support vector machine?

In the figure, the thin lines mark the distance from the classifier to the closest data points (the black points), which are called the support vectors. The distance between the two thin lines is called the margin.
61. What kernel functions are used in support vector machines?
Four kernel functions are commonly used in support vector machines:
the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel.
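Assuming scikit-learn is used, the kernel is selected with the `kernel` argument of `SVC`; this sketch simply loops over the four options on synthetic data:

```python
# Sketch: the four common SVM kernels in scikit-learn's SVC.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel)
    clf.fit(X, y)
    print(kernel, clf.score(X, y))  # training accuracy for each kernel
```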
62. Explain the decision tree algorithm in detail
A decision tree is a supervised machine learning algorithm used mainly for regression and classification. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. Decision trees can handle both categorical and numerical data.

63. What are entropy and information gain in decision tree algorithms?
The core algorithms for building decision trees are ID3, C4.5, and others. ID3 uses entropy and information gain to construct the tree.
Entropy: a decision tree is built top-down from the root node and involves partitioning the data into homogeneous subsets. ID3 uses entropy to measure the homogeneity of a sample. If the sample is completely homogeneous, the entropy is 0; if the sample is split evenly, the entropy is 1.

Information gain is the reduction in entropy after the dataset is split on an attribute. Building a decision tree is about finding the attribute that yields the highest information gain.
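To make the two quantities concrete, here is a small NumPy sketch (the labels are made-up, not from the article) that computes the entropy of a label set and the information gain of one binary split:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array: 0 for a pure set, 1 for a 50/50 binary split."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Reduction in entropy after splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = np.array([1, 1, 1, 0, 0, 0, 0, 1])
print(entropy(parent))                                   # 1.0 for this 50/50 split
print(information_gain(parent, parent[:4], parent[4:]))  # gain of this particular split
```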


64. What is pruning in a decision tree?
Pruning is a technique used in machine learning and search algorithms that reduces the size of a decision tree by removing the parts of the tree that contribute little to classifying instances. When we remove the child nodes of a decision node, the process is called pruning, or the reverse of splitting.
65. What is logistic regression? Give an example of when you used logistic regression recently.
Logistic regression, often called the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables.
Spam detection, medical diagnosis, and financial loan approval are all binary classification problems.
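As a hedged sketch of the spam-detection use case mentioned above (the feature matrix here is synthetic, standing in for extracted email features):

```python
# Sketch: logistic regression for a binary (spam / not-spam) problem.
# X is a synthetic stand-in for real email features.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict_proba(X[:3]))  # predicted probability of each class for the first 3 samples
```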
66. What is linear regression?
Linear regression uses regression analysis from mathematical statistics to determine the quantitative relationship between two or more variables; it is very widely used. It is expressed in the form y = w'x + e, where the error e is normally distributed with mean 0.
x is called the independent variable and y the dependent variable.
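A minimal sketch of fitting y = w'x + e with scikit-learn; the data are synthetic and only illustrative:

```python
# Sketch: ordinary least squares fit of y = w'x + e on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                                          # independent variables x
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + rng.normal(scale=0.1, size=100)   # dependent variable y

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # estimated weights w and intercept
```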

67. What are the disadvantages of linear models?
- The assumption of linearity of the errors
- They cannot be used for count outcomes or binary outcomes
- They cannot solve the overfitting problem on their own
68. What is the difference between regression and classification ML techniques?
Both regression and classification machine learning techniques are supervised machine learning algorithms. In supervised machine learning, we train the model on a labeled dataset, explicitly providing the correct labels during training, and the algorithm tries to learn the mapping from input to output. If our labels are discrete values, it is a classification problem, e.g., A, B, etc.; if our labels are continuous values, it is a regression problem, e.g., 1.23, 1.333, etc.
69. What is a recommendation system?
A recommendation system is a subclass of information filtering systems that aims to predict a user's preferences or ratings for items. Recommendation systems are widely used for movies, news, research articles, products, social tags, music, and more.
Examples include movie recommendations on IMDB, Netflix, and BookMyShow, product recommendations on e-commerce sites such as Amazon, eBay, and Flipkart, video recommendations on YouTube, and game recommendations on Xbox.
70. What is collaborative filtering?
Most recommendation systems use collaborative filtering algorithms to make recommendations.

An example of collaborative filtering: a particular user's rating of a movie can be predicted from that user's ratings of other movies and other users' ratings of all movies. This idea is widely used for movie recommendations on IMDB, Netflix, and BookMyShow, on e-commerce sites such as Amazon, eBay, and Flipkart, and for YouTube video recommendations and Xbox game recommendations.
71. How do you handle outliers?
Outliers can be identified with univariate plots or other graphical analysis methods. If the number of outliers is small, they can be assessed individually; if it is large, the values can be capped, for example replaced with the 99th or 1st percentile values.
Note, however, that not all extreme values are outliers.
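A small pandas sketch of the percentile-capping idea described above; the column name and data are hypothetical:

```python
# Sketch: cap a numeric column at its 1st and 99th percentiles ("winsorizing").
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": np.random.default_rng(0).normal(size=1000)})

low, high = df["value"].quantile([0.01, 0.99])
df["value_capped"] = df["value"].clip(lower=low, upper=high)  # extreme values pulled to the bounds
```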
72. What are the general steps of a machine learning project?
- Understand the business problem
- Explore the data and become familiar with it
- Prepare the data for modeling by detecting outliers, handling missing values, transforming variables, etc.
- Once the data is ready, start running models, analyze the results, and adjust the approach. This is an iterative step, repeated until the best possible result is achieved.
- Validate the model on a new dataset
- Put the model into production and track the results, to analyze its performance over time
73. How do you handle missing values?
After identifying the variables with missing values, determine the extent of the missingness. If there is a pattern, it can yield useful and meaningful business insights.
If there is no clear pattern, missing values can be replaced (imputed) with the mean or median, or simply ignored. For a categorical variable, a default value can be assigned. If the data follow a roughly normal distribution, filling with the mean is reasonable. If a variable has a very large proportion of missing values, e.g., more than 80%, it is often better to drop the variable than to impute it.
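A sketch of the options described above, using a hypothetical DataFrame with made-up column names:

```python
# Sketch: common missing-value strategies with pandas (hypothetical columns).
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "city": ["NY", None, "LA", "NY"]})

df["age"] = df["age"].fillna(df["age"].median())   # numeric: fill with median (or mean)
df["city"] = df["city"].fillna("unknown")          # categorical: fill with a default value

# If a column is mostly missing (e.g. > 80%), dropping it is often simpler:
mostly_missing = df.columns[df.isna().mean() > 0.8]
df = df.drop(columns=mostly_missing)
```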
74. How do you decide the number of clusters in a clustering algorithm?
Although not all clustering algorithms require the number of clusters to be fixed in advance, this question mainly refers to k-means clustering. The purpose of clustering is to group points by similar attributes, so that members within a group are similar to each other while the groups themselves are different from one another.
For example, the figure below shows three different clusters.

If you plot the within-cluster sum of squares (WSS) for a range of cluster counts, you get the plot shown below.

This plot is often referred to as the elbow curve. The red circle in the figure above marks the point where the curve bends, here at number of clusters = 6; this bending point is taken as the value of k in k-means.
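A sketch of how the WSS-vs-k elbow curve is usually produced; scikit-learn exposes WSS as `inertia_`, and the data here are synthetic:

```python
# Sketch: compute WSS (inertia) for a range of k and look for the "elbow".
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = range(1, 11)
wss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(km.inertia_)          # within-cluster sum of squares for this k

plt.plot(ks, wss, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("WSS")
plt.show()
```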
75. What is ensemble learning?
Ensemble learning is essentially the combination of a group of different learners (individual models) in order to improve the stability and predictive power of the model.
76. Briefly describe common ensemble learning techniques.
There are many types of ensemble learning; here are two of the most popular techniques.
Bagging trains similar learners on samples of the population and then averages all of their predictions. Different learning methods can be used on different subsets, which helps reduce the variance error.

Boosting is an iterative technique that adjusts the weight of an observation based on the most recent classification. If an observation is misclassified, its weight is increased, and vice versa. Boosting reduces bias and builds a strong predictive model, but it may overfit the training data.

77. What is a random forest? How does it work?
A random forest is a bagging-based ensemble learning method that can perform both regression and classification tasks. It is also used for dimensionality reduction, handling missing values, handling outliers, and so on. It combines a group of weak models into a powerful one.

In a random forest, we grow many trees instead of a single tree. To classify a new example based on its attributes, each tree gives a classification, and the forest chooses the class with the most votes over all the trees; for regression, it takes the average of the outputs of the different trees.
78. How do you create a random forest?
Several weak learners are combined into one strong learner. The steps involved are as follows:
- Use the bootstrap method to build several decision trees on samples of the training data
- In each tree, every time a split is considered, a random subset of the predictors is selected from all predictors as split candidates
- Prediction: by majority vote
79. What cross-validation technique would you use for a time series dataset?
Time series data are not randomly distributed; they are inherently ordered in time.
For time series data, the model should be trained on past data and evaluated on the data that follows (forward chaining):
Fold 1: train [1], test [2]
Fold 2: train [1 2], test [3]
Fold 3: train [1 2 3], test [4]
Fold 4: train [1 2 3 4], test [5]
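The expanding-window scheme above corresponds to scikit-learn's `TimeSeriesSplit`; a sketch with dummy data:

```python
# Sketch: forward-chaining cross-validation for time series data.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)      # dummy time-ordered data
tscv = TimeSeriesSplit(n_splits=4)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    # training always uses only past observations; testing uses the next block
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
```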
80. What is the Box-Cox transformation?
The Box-Cox transformation is a generalized power transformation proposed by Box and Cox in 1964. It is a common data transformation in statistical modeling, used when a continuous response variable does not follow a normal distribution. After a Box-Cox transformation, the correlation between the unobservable error and the predictor variables can be reduced to some extent. Its main feature is that it introduces a parameter, estimates that parameter from the data itself, and then determines the form of transformation to apply. The Box-Cox transformation can clearly improve the normality, symmetry, and homoscedasticity of the data, and it is effective for many real datasets.
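Assuming SciPy is available, the transformation and its estimated parameter can be obtained as below; note the data must be strictly positive, and the dataset here is synthetic:

```python
# Sketch: Box-Cox transformation with SciPy; lambda is estimated from the data.
import numpy as np
from scipy import stats

x = np.random.default_rng(0).exponential(scale=2.0, size=1000)  # skewed, positive data
x_transformed, lam = stats.boxcox(x)

print("estimated lambda:", lam)  # the power parameter chosen by maximum likelihood
```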

81. If your machine has 4GB of RAM and you want to train a model on a 10GB dataset, how would you approach the problem? Have you ever run into this kind of problem in your machine learning / data science experience?
First, you have to ask which ML model you want to train.
For neural networks: the batch size is adjustable, so choose a batch size that fits into 4GB.
For SVMs you can use partial fitting: split the large dataset into smaller chunks and fit the model incrementally.
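Note that scikit-learn's kernel `SVC` has no `partial_fit`; a linear SVM can, however, be trained incrementally with `SGDClassifier(loss="hinge")`. A sketch of reading the data in chunks, where the file name and column names are hypothetical placeholders:

```python
# Sketch: out-of-core training of a linear SVM on a dataset larger than memory.
# "big_data.csv" and the "label" column are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="hinge")          # hinge loss approximates a linear SVM
classes = np.array([0, 1])                 # all labels must be declared up front

for chunk in pd.read_csv("big_data.csv", chunksize=100_000):
    X = chunk.drop(columns="label").to_numpy()
    y = chunk["label"].to_numpy()
    clf.partial_fit(X, y, classes=classes)  # update the model one chunk at a time
```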
Deep Learning
82. What do you mean by deep learning?
Deep learning is a form of machine learning that has shown incredible promise in recent years. This is because deep learning works in a way that is loosely similar to the human brain.
83. What is the difference between machine learning and deep learning?
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Machine learning can be divided into three categories:
- Supervised machine learning
- Unsupervised machine learning
- Reinforcement learning

Deep learning is a subfield of machine learning whose algorithms are inspired by the structure and function of the brain, in the form of artificial neural networks.
84. Why has deep learning become popular only recently?
Although deep learning has been around for many years, the major breakthroughs from these techniques have appeared only in recent years. There are two main reasons:
- The increase in the amount of data generated from various sources
- The growth in the hardware resources needed to run these models
GPUs are many times faster than before, making it possible to build larger and deeper models in a relatively short time.
85. Explain the basics of neural networks
Neural networks in data science are designed to mimic the neurons of the human brain, where different neurons combine to perform a task. A neural network learns generalizations or patterns from data and uses this knowledge to predict the output for new data, without any human intervention.
The simplest neural network is the perceptron. It contains a single neuron that performs two operations: a linear combination of all the inputs, followed by an activation function.

A more complex neural network consists of the following 3 types of layers:
Input layer: receives the input.
Hidden layers: the layers between the input and output layers. The earlier hidden layers usually help detect low-level patterns, while later layers combine the outputs of previous layers to find higher-level patterns.
Output layer: the final layer, which produces the prediction.
The following figure shows a neural network.

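A minimal NumPy sketch of the single perceptron described above, i.e., one neuron doing a weighted sum followed by an activation function; the weights and inputs are made up:

```python
# Sketch: a single perceptron = linear combination of inputs + activation function.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])      # input features (made up)
w = np.array([0.4, 0.7, -0.2])      # weights
b = 0.1                             # bias

output = sigmoid(np.dot(w, x) + b)  # linear computation, then activation
print(output)
```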
86. What is reinforcement learning?
Reinforcement learning is about learning what to do, i.e., how to map situations to actions. The learner is not told which action to take; instead it must discover which actions yield the greatest reward. Reinforcement learning is inspired by how humans learn and is based on a reward-and-punishment mechanism.

87. What is an artificial neural network?
Artificial neural networks are a family of algorithms that revolutionized machine learning. They are inspired by biological neural networks. Neural networks can adapt to changing inputs and produce the best possible result without the output criteria having to be redesigned.
88. Describe the structure of an artificial neural network
An artificial neural network works on the same principle as a biological neural network. It consists of inputs that are processed with weights and biases, with the help of activation functions, to produce outputs.

89. How do you initialize the weights in a network?
There are two main approaches: initialize the weights to 0, or assign them randomly.
Initializing all weights to 0: this makes your model behave like a linear model. All neurons in every layer perform the same computation and produce the same output, which makes the depth of the network useless. (This concerns the weights; the hidden state of an RNN/LSTM is a separate matter and is usually initialized to 0, although in special cases it may not be.)
Initializing all weights randomly: the weights are assigned random values very close to 0. Because each neuron then performs a different computation, the model can reach higher accuracy. This is the most common method.
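A sketch contrasting the two options for a single dense layer; the shapes are illustrative:

```python
# Sketch: zero vs. small-random weight initialization for one dense layer.
import numpy as np

n_in, n_out = 4, 3
rng = np.random.default_rng(0)

W_zero = np.zeros((n_in, n_out))             # every neuron computes the same thing
W_rand = rng.normal(0, 0.01, (n_in, n_out))  # small random values near 0 break the symmetry

x = rng.normal(size=(1, n_in))
print(x @ W_zero)   # identical outputs for all neurons
print(x @ W_rand)   # different outputs, so neurons can learn different features
```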
90. What is a cost function?
Also called "loss" or "error", the cost function is a measure of how well your model is performing. It is used to compute the error at the output layer during backpropagation. That error is propagated backwards through the neural network and used by the training algorithm.
91. What is a hyperparameter?
In the context of machine learning, a hyperparameter is a parameter whose value is set before the learning process begins, whereas the values of the other parameters are obtained through training. In other words, hyperparameters influence how the parameters are trained, which is why they are called hyperparameters.
Hyperparameters:
- Define higher-level properties of the model, such as its complexity or learning capacity.
- Cannot be learned directly from the data during standard model training and must be defined in advance.
- Are typically chosen by setting different values, training different models, and picking the one with the better validation results.
Some examples of hyperparameters:
- The number of trees or the depth of the trees
- The number of latent factors in matrix factorization
- The learning rate (in many models)
- The number of hidden layers in a deep neural network
- The number of clusters k in k-means clustering
92. What happens when the learning rate is set badly (too high or too low)?
When the learning rate is too low, training progresses very slowly because the weights are updated only by tiny amounts; many updates are needed before the minimum is reached.
If the learning rate is set too high, the drastic weight updates cause the loss function to behave erratically. Training may fail to converge (so the model never produces good output) or even diverge (the weights blow up and the network cannot be trained).
93. What is the difference between an Epoch, a Batch, and an Iteration in deep learning?
Epoch: one pass through the entire dataset (everything is fed through the training model once).
Batch: because we cannot pass the whole dataset through the neural network at once, we divide the dataset into several batches.
Iteration: one update step on one batch. For example, if we have 10,000 images and the batch size is 200, then one epoch consists of 50 iterations (10,000 divided by 200).
94. What are the common layers in a CNN?
- Convolution layer: performs the convolution operation, creating several smaller windows over the image to examine the data.
- Activation layer: introduces nonlinearity into the network; ReLU is the most common choice.
- Pooling layer: pooling is a downsampling operation that reduces the dimensionality of the feature maps.
- Fully connected layer: this layer recognizes and classifies the objects in the image.
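A hedged Keras sketch stacking the four layer types in the order listed above; the input shape and number of classes are placeholders:

```python
# Sketch: a small CNN with convolution, activation, pooling, and fully connected layers.
# The input shape (28, 28, 1) and the 10 output classes are illustrative placeholders.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu", input_shape=(28, 28, 1)),  # convolution + ReLU
    layers.MaxPooling2D(pool_size=2),                                              # pooling (downsampling)
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                                        # fully connected classifier
])
model.summary()
```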

95. How does the pooling layer work in a CNN?
Pooling is used to reduce the spatial dimensions of a CNN. It performs downsampling by sliding a window over each feature map and keeping, for example, the maximum or average value in the window, producing smaller feature maps.
96. What is a recurrent neural network (RNN)?
RNNs are artificial neural networks designed to recognize patterns in sequences of data such as time series, stock market data, and government records. To understand recurrent networks, you first need to understand the basics of feedforward networks.
Both kinds of network are named after the way they pass information through a series of mathematical operations performed at the nodes of the network. One passes information straight through (each node is never touched twice); the other passes information through a loop, which is why it is called recurrent.

The input to a recurrent network is not just the current example it sees; it also includes what it has perceived previously.
The decision a recurrent network makes at time step t-1 affects the decision it makes at time step t. The network therefore has two sources of input, the present and the recent past, and combines them to react to new data, much as we do in life.
97. How does an LSTM network work?
Long Short-Term Memory (LSTM) is a special kind of recurrent neural network capable of learning long-term dependencies; remembering information for long periods is its default behavior. There are three steps in an LSTM network:
- The network decides what to forget and what to remember.
- It selectively updates the cell state values.
- The network decides which part of the current state to output.
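As a sketch only (the sequence length, feature count, and unit count are placeholders), an LSTM layer in Keras:

```python
# Sketch: a small LSTM model for sequence data; all sizes are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.LSTM(64, input_shape=(30, 8)),   # 30 time steps, 8 features per step
    layers.Dense(1),                        # e.g. predict the next value of a series
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```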
98. What is a multilayer perceptron (MLP)?
An MLP has an input layer, hidden layers, and an output layer. It has the same basic structure as a single-layer perceptron, but with one or more hidden layers. A single-layer perceptron can only classify linearly separable classes with binary output (0, 1), whereas a multilayer perceptron can classify nonlinearly separable classes.

Except for the input layer, each node in the other layers uses a nonlinear activation function: it takes the weighted sum of the incoming data from all connected nodes plus a bias, applies the activation function, and produces an output. MLPs are trained with a supervised learning method called "backpropagation". In backpropagation, the neural network computes the error with the help of the cost function and propagates this error backwards from its source, adjusting the weights to train the model more accurately.
99. Briefly explain gradient descent
To understand gradient descent, let's first understand what a gradient is.
A gradient measures how much the output of a function changes if the inputs are changed a little. Here it measures the change in the error with respect to changes in the weights. You can also think of the gradient as the slope of a function.
Gradient descent can be thought of as climbing down to the bottom of a valley rather than up a hill, because it is a minimization algorithm that minimizes a given function (the cost function).
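A NumPy sketch of plain gradient descent minimizing a mean-squared-error cost for one-variable linear regression; the data are synthetic and the learning rate is illustrative:

```python
# Sketch: gradient descent on the MSE cost of a one-variable linear model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)  # true w = 2, b = 1

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    y_pred = w * x + b
    grad_w = np.mean(2 * (y_pred - y) * x)  # slope of the cost with respect to w
    grad_b = np.mean(2 * (y_pred - y))      # slope of the cost with respect to b
    w -= lr * grad_w                        # step downhill
    b -= lr * grad_b

print(w, b)  # should end up close to 2 and 1
```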

100. What are exploding gradients?
During training, if the error gradients grow exponentially (become very large), they accumulate and lead to very large updates of the neural network weights; this is called exploding gradients. In extreme cases the weight values can become so large that they overflow and result in NaN values.
This makes the model unstable and unable to learn from the training data.
101. What are vanishing gradients?
During training, the gradients can become too small, which makes training difficult; this problem is called vanishing gradients. It can lead to long training times, poor performance, and low accuracy.
102. What is backpropagation and how does it work?
Backpropagation is the training algorithm used for multilayer neural networks. It moves the error from one end of the network back to all the weights inside the network, allowing the gradients to be computed efficiently.
It has the following steps:
- Forward propagation of the training data
- Compute the error derivative using the output and the target
- Backpropagate to compute the derivative of the error with respect to the output activations of each layer
- Use the previously computed derivatives to compute the derivatives with respect to the weights
- Update the weights
103. What are the variants of backpropagation (gradient descent)?
Stochastic gradient descent: a single training example is used to compute the gradient and update the parameters.
Batch gradient descent: the gradient is computed on the entire dataset, and an update is performed at each iteration.
Mini-batch gradient descent: one of the most popular optimization algorithms. It is a variant of stochastic gradient descent in which, instead of a single training example, a small batch of samples is used.
104. What are the different deep learning frameworks?
- PyTorch
- TensorFlow
- Keras
- Caffe
105. What is the role of the activation function?
The activation function introduces nonlinearity into the neural network, helping it learn more complex functions. Without it, the neural network could only learn linear relationships, i.e., linear combinations of the input data.
106. What is an autoencoder?
An autoencoder is a simple learning network that aims to transform its input into an output with the minimum possible error; in other words, we want the output to be as close to the input as possible. Several layers are added between the input and output, and these layers are smaller than the input layer. The encoder receives the unlabeled input and encodes it into a compressed representation, from which the input is then reconstructed.

107. What is a Boltzmann machine?
Boltzmann machines are simple learning algorithms that can discover interesting features representing complex regularities in the training data. Boltzmann machines are mainly used to optimize the weights and the quantities of a given problem. In networks with many layers of feature detectors the learning algorithm is slow. The "restricted Boltzmann machine" algorithm has a single layer of feature detectors, which makes it faster than the others.

108. What are Dropout and Batch Normalization (BN)?
Dropout is a technique that randomly drops hidden and visible units in the network to prevent overfitting (typically around 20% of nodes are dropped). It roughly doubles the number of iterations required for the network to converge.

Batch normalization improves the performance and stability of a neural network by normalizing the inputs of each layer so that the mean activation is 0 and the standard deviation is 1.
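A Keras sketch showing where the two layers are typically inserted; the rates and layer sizes are illustrative:

```python
# Sketch: Dropout and BatchNormalization layers in a small dense network.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(20,)),
    layers.BatchNormalization(),   # normalize layer inputs toward mean 0, std 1
    layers.Dropout(0.2),           # randomly drop 20% of units during training
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```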
109. What is a computation graph?
Everything in TensorFlow is based on creating a computation graph. It is a network of nodes, where nodes represent mathematical operations and edges represent tensors. In a computation graph, a node is either an input value or a function that combines values. As data flows through the graph, the edges receive their weights: the edge leaving an input node is weighted with that input value, and the edge leaving a function node is weighted by combining the weights of the incoming edges with the specified function.

All deep learning frameworks rely on creating a computation graph to compute the gradient values required for gradient descent optimization. Usually, you build the forward propagation graph, and the framework handles the backward differentiation for you.
One advantage of static graphs is that they allow powerful offline optimization and scheduling of the graph, which usually makes them faster than dynamic graphs (the difference may not be significant in every use case; it depends on the graph). The disadvantage is that handling structured or variable-sized data is more complicated.
Dynamic graphs are debugging-friendly. It is much easier to find problems in your code because you can execute the code line by line and have access to all variables. If you want to apply deep learning to any practical purpose in industry, this is definitely a very important feature.
https://avoid.overfit.cn/post/8a0516dcc791436f893239402bc935ab