100+ Data Science Interview Questions and Answers: Machine Learning and Deep Learning
2022-06-28 04:37:00 【deephub】
These questions come from interviews at Amazon, Google, Meta, Microsoft, and other companies. Following yesterday's article, this post collects the machine learning and deep learning questions.
Machine Learning
54. What is machine learning?
Machine learning is an interdisciplinary field that draws on probability theory, statistics, approximation theory, and the study of complex algorithms. It uses computers as a tool to simulate how humans learn and organizes existing knowledge into structures so that learning performance keeps improving.
Machine learning has the following definitions:
(1) Machine learning is a branch of artificial intelligence; research in this field focuses in particular on how to improve the performance of specific algorithms through experience.
(2) Machine learning is the study of computer algorithms that improve automatically through experience.
(3) Machine learning uses data or past experience to optimize the performance of a computer program.
55. What is unsupervised learning?
Unsupervised learning refers to machine learning algorithms that draw inferences from datasets consisting of input data without labels.
It mainly includes clustering, dimensionality reduction, and anomaly detection.
56. What are the different classification algorithms?
The figure below lists the most important classification algorithms.

57. What is "naive" about naive Bayes?
The naive Bayes algorithm is based on Bayes' theorem, which describes the probability of an event given prior knowledge of conditions that may be related to the event.
The algorithm is "naive" because it assumes that the features are independent of one another given the class, an assumption that may or may not hold in real data.
58. How do you build a random forest model?
A random forest combines many decision tree models. Each individual tree has low bias and high variance, and averaging many trees reduces the variance. Each tree is trained on a sample of the data and makes its own prediction. The predictions of all trees are collected and combined: by majority vote (the mode) in classification problems, and by taking the mean (or median) in regression problems.
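As a rough illustration (not part of the original answer), a random forest can be fit in a few lines with scikit-learn; the dataset and hyperparameter values below are placeholders:

```python
# A minimal sketch of fitting a random forest classifier with scikit-learn.
# The dataset and hyperparameter values are illustrative placeholders.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample of the rows
# and considers a random subset of features at each split.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # predictions come from a majority vote across trees
```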

59. Explain the SVM algorithm in detail
SVM stands for support vector machine. It is a supervised machine learning algorithm that can be used for both regression and classification. If your training dataset has n features, SVM plots each example as a point in n-dimensional space, where the value of each feature is the value of a particular coordinate. SVM then separates the classes with a hyperplane, based on the kernel function that is provided.

60. What are the support vectors in a support vector machine?

In the figure, the thin lines mark the distance from the classifier to the closest data points (the black points), which are called the support vectors. The distance between the two thin lines is called the margin.
61. What kernel functions are used in support vector machines?
Four kernel functions are commonly used in support vector machines:
the linear kernel, the polynomial kernel, the radial basis function (RBF) kernel, and the sigmoid kernel.
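Assuming scikit-learn is used, the kernel is selected with the `kernel` argument of `SVC`; this sketch simply loops over the four options on synthetic data:

```python
# Sketch: the four common SVM kernels in scikit-learn's SVC.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    clf = SVC(kernel=kernel)
    clf.fit(X, y)
    print(kernel, clf.score(X, y))  # training accuracy for each kernel
```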
62. Explain the decision tree algorithm in detail
A decision tree is a supervised machine learning algorithm used mainly for regression and classification. It breaks a dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes. Decision trees can handle both categorical and numerical data.

63. What are entropy and information gain in decision tree algorithms?
The core algorithms for building decision trees are ID3, C4.5, and others. ID3 uses entropy and information gain to construct the tree.
Entropy: a decision tree is built top-down from the root node and involves partitioning the data into homogeneous subsets. ID3 uses entropy to measure the homogeneity of a sample. If the sample is completely homogeneous, the entropy is 0; if the sample is split evenly, the entropy is 1.

Information gain is the reduction in entropy after the dataset is split on an attribute. Building a decision tree is about finding the attribute that yields the highest information gain.
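To make the two quantities concrete, here is a small NumPy sketch (the labels are made-up, not from the article) that computes the entropy of a label set and the information gain of one binary split:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array: 0 for a pure set, 1 for a 50/50 binary split."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Reduction in entropy after splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = np.array([1, 1, 1, 0, 0, 0, 0, 1])
print(entropy(parent))                                   # 1.0 for this 50/50 split
print(information_gain(parent, parent[:4], parent[4:]))  # gain of this particular split
```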


64. What is pruning in a decision tree?
Pruning is a technique used in machine learning and search algorithms that reduces the size of a decision tree by removing the parts of the tree that contribute little to classifying instances. When we remove the child nodes of a decision node, the process is called pruning, or the reverse of splitting.
65. What is logistic regression? Give an example of when you used logistic regression recently.
Logistic regression, often called the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables.
Spam detection, medical diagnosis, and financial loan approval are all binary classification problems.
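As a hedged sketch of the spam-detection use case mentioned above (the feature matrix here is synthetic, standing in for extracted email features):

```python
# Sketch: logistic regression for a binary (spam / not-spam) problem.
# X is a synthetic stand-in for real email features.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict_proba(X[:3]))  # predicted probability of each class for the first 3 samples
```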
66. What is linear regression?
Linear regression uses regression analysis from mathematical statistics to determine the quantitative relationship between two or more variables; it is very widely used. It is expressed in the form y = w'x + e, where the error e is normally distributed with mean 0.
x is called the independent variable and y the dependent variable.
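A minimal sketch of fitting y = w'x + e with scikit-learn; the data are synthetic and only illustrative:

```python
# Sketch: ordinary least squares fit of y = w'x + e on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                                          # independent variables x
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + rng.normal(scale=0.1, size=100)   # dependent variable y

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # estimated weights w and intercept
```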

67. What are the disadvantages of linear models?
- The assumption of linearity of the errors
- They cannot be used for count outcomes or binary outcomes
- They cannot solve the overfitting problem on their own
68. What is the difference between regression and classification ML techniques?
Both regression and classification machine learning techniques are supervised machine learning algorithms. In supervised machine learning, we train the model on a labeled dataset, explicitly providing the correct labels during training, and the algorithm tries to learn the mapping from input to output. If our labels are discrete values, it is a classification problem, e.g., A, B, etc.; if our labels are continuous values, it is a regression problem, e.g., 1.23, 1.333, etc.
69. What is a recommendation system?
A recommendation system is a subclass of information filtering systems that aims to predict a user's preferences or ratings for items. Recommendation systems are widely used for movies, news, research articles, products, social tags, music, and more.
Examples include movie recommendations on IMDB, Netflix, and BookMyShow, product recommendations on e-commerce sites such as Amazon, eBay, and Flipkart, video recommendations on YouTube, and game recommendations on Xbox.
70. What is collaborative filtering?
Most recommendation systems use collaborative filtering algorithms to make recommendations.

An example of collaborative filtering: a particular user's rating of a movie can be predicted from that user's ratings of other movies and other users' ratings of all movies. This idea is widely used for movie recommendations on IMDB, Netflix, and BookMyShow, on e-commerce sites such as Amazon, eBay, and Flipkart, and for YouTube video recommendations and Xbox game recommendations.
71. How do you handle outliers?
Outliers can be identified with univariate plots or other graphical analysis methods. If the number of outliers is small, they can be assessed individually; if it is large, the values can be capped, for example replaced with the 99th or 1st percentile values.
Note, however, that not all extreme values are outliers.
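A small pandas sketch of the percentile-capping idea described above; the column name and data are hypothetical:

```python
# Sketch: cap a numeric column at its 1st and 99th percentiles ("winsorizing").
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": np.random.default_rng(0).normal(size=1000)})

low, high = df["value"].quantile([0.01, 0.99])
df["value_capped"] = df["value"].clip(lower=low, upper=high)  # extreme values pulled to the bounds
```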
72. What are the general steps of a machine learning project?
- Understand the business problem
- Explore the data and become familiar with it
- Prepare the data for modeling by detecting outliers, handling missing values, transforming variables, etc.
- Once the data is ready, start running models, analyze the results, and adjust the approach. This is an iterative step, repeated until the best possible result is achieved.
- Validate the model on a new dataset
- Put the model into production and track the results, to analyze its performance over time
73. How do you handle missing values?
After identifying the variables with missing values, determine the extent of the missingness. If there is a pattern, it can yield useful and meaningful business insights.
If there is no clear pattern, missing values can be replaced (imputed) with the mean or median, or simply ignored. For a categorical variable, a default value can be assigned. If the data follow a roughly normal distribution, filling with the mean is reasonable. If a variable has a very large proportion of missing values, e.g., more than 80%, it is often better to drop the variable than to impute it.
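A sketch of the options described above, using a hypothetical DataFrame with made-up column names:

```python
# Sketch: common missing-value strategies with pandas (hypothetical columns).
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 31],
                   "city": ["NY", None, "LA", "NY"]})

df["age"] = df["age"].fillna(df["age"].median())   # numeric: fill with median (or mean)
df["city"] = df["city"].fillna("unknown")          # categorical: fill with a default value

# If a column is mostly missing (e.g. > 80%), dropping it is often simpler:
mostly_missing = df.columns[df.isna().mean() > 0.8]
df = df.drop(columns=mostly_missing)
```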
74. How do you decide the number of clusters in a clustering algorithm?
Although not all clustering algorithms require the number of clusters to be fixed in advance, this question mainly refers to k-means clustering. The purpose of clustering is to group points by similar attributes, so that members within a group are similar to each other while the groups themselves are different from one another.
For example, the figure below shows three different clusters.

If you plot the within-cluster sum of squares (WSS) for a range of cluster counts, you get the plot shown below.

This plot is often referred to as the elbow curve. The red circle in the figure above marks the point where the curve bends, here at number of clusters = 6; this bending point is taken as the value of k in k-means.
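A sketch of how the WSS-vs-k elbow curve is usually produced; scikit-learn exposes WSS as `inertia_`, and the data here are synthetic:

```python
# Sketch: compute WSS (inertia) for a range of k and look for the "elbow".
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

ks = range(1, 11)
wss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(km.inertia_)          # within-cluster sum of squares for this k

plt.plot(ks, wss, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("WSS")
plt.show()
```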
75. What is ensemble learning?
Ensemble learning is essentially the combination of a group of different learners (individual models) in order to improve the stability and predictive power of the model.
76. Briefly describe common ensemble learning techniques.
There are many types of ensemble learning; here are two of the most popular techniques.
Bagging trains similar learners on samples of the population and then averages all of their predictions. Different learning methods can be used on different subsets, which helps reduce the variance error.

Boosting is an iterative technique that adjusts the weight of an observation based on the most recent classification. If an observation is misclassified, its weight is increased, and vice versa. Boosting reduces bias and builds a strong predictive model, but it may overfit the training data.

77. What is a random forest? How does it work?
A random forest is a bagging-based ensemble learning method that can perform both regression and classification tasks. It is also used for dimensionality reduction, handling missing values, handling outliers, and so on. It combines a group of weak models into a powerful one.

In a random forest, we grow many trees instead of a single tree. To classify a new example based on its attributes, each tree gives a classification, and the forest chooses the class with the most votes over all the trees; for regression, it takes the average of the outputs of the different trees.
78. How do you create a random forest?
Several weak learners are combined into one strong learner. The steps involved are as follows:
- Use the bootstrap method to build several decision trees on samples of the training data
- In each tree, every time a split is considered, a random subset of the predictors is selected from all predictors as split candidates
- Prediction: by majority vote
79. What cross-validation technique would you use for a time series dataset?
Time series data are not randomly distributed; they are inherently ordered in time.
For time series data, the model should be trained on past data and evaluated on the data that follows (forward chaining):
Fold 1: train [1], test [2]
Fold 2: train [1 2], test [3]
Fold 3: train [1 2 3], test [4]
Fold 4: train [1 2 3 4], test [5]
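The expanding-window scheme above corresponds to scikit-learn's `TimeSeriesSplit`; a sketch with dummy data:

```python
# Sketch: forward-chaining cross-validation for time series data.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)      # dummy time-ordered data
tscv = TimeSeriesSplit(n_splits=4)

for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    # training always uses only past observations; testing uses the next block
    print(f"Fold {fold}: train={train_idx}, test={test_idx}")
```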
80. What is the Box-Cox transformation?
The Box-Cox transformation is a generalized power transformation proposed by Box and Cox in 1964. It is a common data transformation in statistical modeling, used when a continuous response variable does not follow a normal distribution. After a Box-Cox transformation, the correlation between the unobservable error and the predictor variables can be reduced to some extent. Its main feature is that it introduces a parameter, estimates that parameter from the data itself, and then determines the form of transformation to apply. The Box-Cox transformation can clearly improve the normality, symmetry, and homoscedasticity of the data, and it is effective for many real datasets.
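Assuming SciPy is available, the transformation and its estimated parameter can be obtained as below; note the data must be strictly positive, and the dataset here is synthetic:

```python
# Sketch: Box-Cox transformation with SciPy; lambda is estimated from the data.
import numpy as np
from scipy import stats

x = np.random.default_rng(0).exponential(scale=2.0, size=1000)  # skewed, positive data
x_transformed, lam = stats.boxcox(x)

print("estimated lambda:", lam)  # the power parameter chosen by maximum likelihood
```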

81. If your machine has 4GB of RAM and you want to train a model on a 10GB dataset, how would you approach the problem? Have you ever run into this kind of problem in your machine learning / data science experience?
First, you have to ask which ML model you want to train.
For neural networks: the batch size is adjustable, so choose a batch size that fits into 4GB.
For SVMs you can use partial fitting: split the large dataset into smaller chunks and fit the model incrementally.
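Note that scikit-learn's kernel `SVC` has no `partial_fit`; a linear SVM can, however, be trained incrementally with `SGDClassifier(loss="hinge")`. A sketch of reading the data in chunks, where the file name and column names are hypothetical placeholders:

```python
# Sketch: out-of-core training of a linear SVM on a dataset larger than memory.
# "big_data.csv" and the "label" column are hypothetical placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="hinge")          # hinge loss approximates a linear SVM
classes = np.array([0, 1])                 # all labels must be declared up front

for chunk in pd.read_csv("big_data.csv", chunksize=100_000):
    X = chunk.drop(columns="label").to_numpy()
    y = chunk["label"].to_numpy()
    clf.partial_fit(X, y, classes=classes)  # update the model one chunk at a time
```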
Deep Learning
82. What do you mean by deep learning?
Deep learning is a form of machine learning that has shown incredible promise in recent years. This is because deep learning works in a way that is loosely similar to the human brain.
83. What is the difference between machine learning and deep learning?
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed. Machine learning can be divided into three categories:
- Supervised machine learning
- Unsupervised machine learning
- Reinforcement learning

Deep learning is a subfield of machine learning whose algorithms are inspired by the structure and function of the brain, in the form of artificial neural networks.
84. Why has deep learning become popular only recently?
Although deep learning has been around for many years, the major breakthroughs from these techniques have appeared only in recent years. There are two main reasons:
- The increase in the amount of data generated from various sources
- The growth in the hardware resources needed to run these models
GPUs are many times faster than before, making it possible to build larger and deeper models in a relatively short time.
85. Explain the basics of neural networks
Neural networks in data science are designed to mimic the neurons of the human brain, where different neurons combine to perform a task. A neural network learns generalizations or patterns from data and uses this knowledge to predict the output for new data, without any human intervention.
The simplest neural network is the perceptron. It contains a single neuron that performs two operations: a linear combination of all the inputs, followed by an activation function.

A more complex neural network consists of the following 3 types of layers:
Input layer: receives the input.
Hidden layers: the layers between the input and output layers. The earlier hidden layers usually help detect low-level patterns, while later layers combine the outputs of previous layers to find higher-level patterns.
Output layer: the final layer, which produces the prediction.
The following figure shows a neural network.

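A minimal NumPy sketch of the single perceptron described above, i.e., one neuron doing a weighted sum followed by an activation function; the weights and inputs are made up:

```python
# Sketch: a single perceptron = linear combination of inputs + activation function.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])      # input features (made up)
w = np.array([0.4, 0.7, -0.2])      # weights
b = 0.1                             # bias

output = sigmoid(np.dot(w, x) + b)  # linear computation, then activation
print(output)
```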
86. What is reinforcement learning?
Reinforcement learning is about learning what to do, i.e., how to map situations to actions. The learner is not told which action to take; instead it must discover which actions yield the greatest reward. Reinforcement learning is inspired by how humans learn and is based on a reward-and-punishment mechanism.

87. What is an artificial neural network?
Artificial neural networks are a family of algorithms that revolutionized machine learning. They are inspired by biological neural networks. Neural networks can adapt to changing inputs and produce the best possible result without the output criteria having to be redesigned.
88. Describe the structure of an artificial neural network
An artificial neural network works on the same principle as a biological neural network. It consists of inputs that are processed with weights and biases, with the help of activation functions, to produce outputs.

89. How do you initialize the weights in a network?
There are two main approaches: initialize the weights to 0, or assign them randomly.
Initializing all weights to 0: this makes your model behave like a linear model. All neurons in every layer perform the same computation and produce the same output, which makes the depth of the network useless. (This concerns the weights; the hidden state of an RNN/LSTM is a separate matter and is usually initialized to 0, although in special cases it may not be.)
Initializing all weights randomly: the weights are assigned random values very close to 0. Because each neuron then performs a different computation, the model can reach higher accuracy. This is the most common method.
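A sketch contrasting the two options for a single dense layer; the shapes are illustrative:

```python
# Sketch: zero vs. small-random weight initialization for one dense layer.
import numpy as np

n_in, n_out = 4, 3
rng = np.random.default_rng(0)

W_zero = np.zeros((n_in, n_out))             # every neuron computes the same thing
W_rand = rng.normal(0, 0.01, (n_in, n_out))  # small random values near 0 break the symmetry

x = rng.normal(size=(1, n_in))
print(x @ W_zero)   # identical outputs for all neurons
print(x @ W_rand)   # different outputs, so neurons can learn different features
```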
90. What is a cost function?
Also called "loss" or "error", the cost function is a measure of how well your model is performing. It is used to compute the error at the output layer during backpropagation. That error is propagated backwards through the neural network and used by the training algorithm.
91. What is a hyperparameter?
In the context of machine learning, a hyperparameter is a parameter whose value is set before the learning process begins, whereas the values of the other parameters are obtained through training. In other words, hyperparameters influence how the parameters are trained, which is why they are called hyperparameters.
Hyperparameters:
- Define higher-level properties of the model, such as its complexity or learning capacity.
- Cannot be learned directly from the data during standard model training and must be defined in advance.
- Are typically chosen by setting different values, training different models, and picking the one with the better validation results.
Some examples of hyperparameters:
- The number of trees or the depth of the trees
- The number of latent factors in matrix factorization
- The learning rate (in many models)
- The number of hidden layers in a deep neural network
- The number of clusters k in k-means clustering
92. What happens when the learning rate is set badly (too high or too low)?
When the learning rate is too low, training progresses very slowly because the weights are updated only by tiny amounts; many updates are needed before the minimum is reached.
If the learning rate is set too high, the drastic weight updates cause the loss function to behave erratically. Training may fail to converge (so the model never produces good output) or even diverge (the weights blow up and the network cannot be trained).
93. What is the difference between an Epoch, a Batch, and an Iteration in deep learning?
Epoch: one pass through the entire dataset (everything is fed through the training model once).
Batch: because we cannot pass the whole dataset through the neural network at once, we divide the dataset into several batches.
Iteration: one update step on one batch. For example, if we have 10,000 images and the batch size is 200, then one epoch consists of 50 iterations (10,000 divided by 200).
94. What are the common layers in a CNN?
- Convolution layer: performs the convolution operation, creating several smaller windows over the image to examine the data.
- Activation layer: introduces nonlinearity into the network; ReLU is the most common choice.
- Pooling layer: pooling is a downsampling operation that reduces the dimensionality of the feature maps.
- Fully connected layer: this layer recognizes and classifies the objects in the image.
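A hedged Keras sketch stacking the four layer types in the order listed above; the input shape and number of classes are placeholders:

```python
# Sketch: a small CNN with convolution, activation, pooling, and fully connected layers.
# The input shape (28, 28, 1) and the 10 output classes are illustrative placeholders.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, kernel_size=3, activation="relu", input_shape=(28, 28, 1)),  # convolution + ReLU
    layers.MaxPooling2D(pool_size=2),                                              # pooling (downsampling)
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),                                        # fully connected classifier
])
model.summary()
```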

95. How does the pooling layer work in a CNN?
Pooling is used to reduce the spatial dimensions of a CNN. It performs downsampling by sliding a window over each feature map and keeping, for example, the maximum or average value in the window, producing smaller feature maps.
96. What is a recurrent neural network (RNN)?
RNNs are artificial neural networks designed to recognize patterns in sequences of data such as time series, stock market data, and government records. To understand recurrent networks, you first need to understand the basics of feedforward networks.
Both kinds of network are named after the way they pass information through a series of mathematical operations performed at the nodes of the network. One passes information straight through (each node is never touched twice); the other passes information through a loop, which is why it is called recurrent.

The input to a recurrent network is not just the current example it sees; it also includes what it has perceived previously.
The decision a recurrent network makes at time step t-1 affects the decision it makes at time step t. The network therefore has two sources of input, the present and the recent past, and combines them to react to new data, much as we do in life.
97. How does an LSTM network work?
Long Short-Term Memory (LSTM) is a special kind of recurrent neural network capable of learning long-term dependencies; remembering information for long periods is its default behavior. There are three steps in an LSTM network:
- The network decides what to forget and what to remember.
- It selectively updates the cell state values.
- The network decides which part of the current state to output.
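As a sketch only (the sequence length, feature count, and unit count are placeholders), an LSTM layer in Keras:

```python
# Sketch: a small LSTM model for sequence data; all sizes are placeholders.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.LSTM(64, input_shape=(30, 8)),   # 30 time steps, 8 features per step
    layers.Dense(1),                        # e.g. predict the next value of a series
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```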
98. What is a multilayer perceptron (MLP)?
An MLP has an input layer, hidden layers, and an output layer. It has the same basic structure as a single-layer perceptron, but with one or more hidden layers. A single-layer perceptron can only classify linearly separable classes with binary output (0, 1), whereas a multilayer perceptron can classify nonlinearly separable classes.

Except for the input layer, each node in the other layers uses a nonlinear activation function: it takes the weighted sum of the incoming data from all connected nodes plus a bias, applies the activation function, and produces an output. MLPs are trained with a supervised learning method called "backpropagation". In backpropagation, the neural network computes the error with the help of the cost function and propagates this error backwards from its source, adjusting the weights to train the model more accurately.
99. Briefly explain gradient descent
To understand gradient descent, let's first understand what a gradient is.
A gradient measures how much the output of a function changes if the inputs are changed a little. Here it measures the change in the error with respect to changes in the weights. You can also think of the gradient as the slope of a function.
Gradient descent can be thought of as climbing down to the bottom of a valley rather than up a hill, because it is a minimization algorithm that minimizes a given function (the cost function).
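A NumPy sketch of plain gradient descent minimizing a mean-squared-error cost for one-variable linear regression; the data are synthetic and the learning rate is illustrative:

```python
# Sketch: gradient descent on the MSE cost of a one-variable linear model.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=100)  # true w = 2, b = 1

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    y_pred = w * x + b
    grad_w = np.mean(2 * (y_pred - y) * x)  # slope of the cost with respect to w
    grad_b = np.mean(2 * (y_pred - y))      # slope of the cost with respect to b
    w -= lr * grad_w                        # step downhill
    b -= lr * grad_b

print(w, b)  # should end up close to 2 and 1
```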

100. What are exploding gradients?
During training, if the error gradients grow exponentially (become very large), they accumulate and lead to very large updates of the neural network weights; this is called exploding gradients. In extreme cases the weight values can become so large that they overflow and result in NaN values.
This makes the model unstable and unable to learn from the training data.
101. What are vanishing gradients?
During training, the gradients can become too small, which makes training difficult; this problem is called vanishing gradients. It can lead to long training times, poor performance, and low accuracy.
102. What is backpropagation and how does it work?
Backpropagation is the training algorithm used for multilayer neural networks. It moves the error from one end of the network back to all the weights inside the network, allowing the gradients to be computed efficiently.
It has the following steps:
- Forward propagation of the training data
- Compute the error derivative using the output and the target
- Backpropagate to compute the derivative of the error with respect to the output activations of each layer
- Use the previously computed derivatives to compute the derivatives with respect to the weights
- Update the weights
103. What are the variants of backpropagation (gradient descent)?
Stochastic gradient descent: a single training example is used to compute the gradient and update the parameters.
Batch gradient descent: the gradient is computed on the entire dataset, and an update is performed at each iteration.
Mini-batch gradient descent: one of the most popular optimization algorithms. It is a variant of stochastic gradient descent in which, instead of a single training example, a small batch of samples is used.
104. What are the different deep learning frameworks?
- PyTorch
- TensorFlow
- Keras
- Caffe
105. What is the role of the activation function?
The activation function introduces nonlinearity into the neural network, helping it learn more complex functions. Without it, the neural network could only learn linear relationships, i.e., linear combinations of the input data.
106. What is an autoencoder?
An autoencoder is a simple learning network that aims to transform its input into an output with the minimum possible error; in other words, we want the output to be as close to the input as possible. Several layers are added between the input and output, and these layers are smaller than the input layer. The encoder receives the unlabeled input and encodes it into a compressed representation, from which the input is then reconstructed.

107. What is a Boltzmann machine?
Boltzmann machines are simple learning algorithms that can discover interesting features representing complex regularities in the training data. Boltzmann machines are mainly used to optimize the weights and the quantities of a given problem. In networks with many layers of feature detectors the learning algorithm is slow. The "restricted Boltzmann machine" algorithm has a single layer of feature detectors, which makes it faster than the others.

108. What are Dropout and Batch Normalization (BN)?
Dropout is a technique that randomly drops hidden and visible units in the network to prevent overfitting (typically around 20% of nodes are dropped). It roughly doubles the number of iterations required for the network to converge.

Batch normalization improves the performance and stability of a neural network by normalizing the inputs of each layer so that the mean activation is 0 and the standard deviation is 1.
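A Keras sketch showing where the two layers are typically inserted; the rates and layer sizes are illustrative:

```python
# Sketch: Dropout and BatchNormalization layers in a small dense network.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(20,)),
    layers.BatchNormalization(),   # normalize layer inputs toward mean 0, std 1
    layers.Dropout(0.2),           # randomly drop 20% of units during training
    layers.Dense(1, activation="sigmoid"),
])
model.summary()
```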
109. What is a computation graph?
Everything in TensorFlow is based on creating a computation graph. It is a network of nodes, where nodes represent mathematical operations and edges represent tensors. In a computation graph, a node is either an input value or a function that combines values. As data flows through the graph, the edges receive their weights: the edge leaving an input node is weighted with that input value, and the edge leaving a function node is weighted by combining the weights of the incoming edges with the specified function.

All deep learning frameworks rely on creating a computation graph to compute the gradient values required for gradient descent optimization. Usually, you build the forward propagation graph, and the framework handles the backward differentiation for you.
One advantage of static graphs is that they allow powerful offline optimization and scheduling of the graph, which usually makes them faster than dynamic graphs (the difference may not be significant in every use case; it depends on the graph). The disadvantage is that handling structured or variable-sized data is more complicated.
Dynamic graphs are debugging-friendly. It is much easier to find problems in your code because you can execute the code line by line and have access to all variables. If you want to apply deep learning to any practical purpose in industry, this is definitely a very important feature.
https://avoid.overfit.cn/post/8a0516dcc791436f893239402bc935ab