
INT 104_ LEC 06

2022-06-23 08:00:00 NONE_ WHY

1. Support Vector Machine (SVM)


1.1. Hyperplane

1.1.1. Description

  • A hyperplane is a linear subspace of codimension one in n-dimensional Euclidean space, so its dimension must be n-1
  • Codimension is a numerical measure of the size of a subspace (subvariety, etc.). Suppose X is an algebraic variety and Y is a subvariety of X. If the dimension of X is n and the dimension of Y is m, then the codimension of Y in X is n-m. In particular, if X and Y are both linear spaces, the codimension of Y in X is the dimension of the complement space of Y

1.1.2. Definition

  • Mathematics
    • Let F be a field (one may take F=\mathbb{R}). A hyperplane of the n-dimensional space F^n is the subset defined by the equation a_1x_1+...+a_nx_n=b, where a_1,...,a_n\in F are constants that are not all zero
  • Linear algebra
    • A hyperplane of an F-vector space V is a subspace of the form \{v\in V: f(v)=0\}, where f:V\rightarrow F is any nonzero linear map
  • Projective geometry
    • In homogeneous coordinates (x_0:...:x_n), a hyperplane of the projective space \mathbb{P}^n is defined by the equation a_0x_0+...+a_nx_n=0, where a_0,...,a_n are constants that are not all zero

1.1.3. Some special types

  • Affine hyperplane

    • An affine hyperplane is an affine subspace of codimension 1 in an affine space. In Cartesian coordinates it can be described by the equation a_1x_1+...+a_nx_n=b, where a_1,...,a_n are constants that are not all zero

    • In the case of a real affine space, in other words when the coordinates are real numbers, the hyperplane divides the space into two half spaces, which are the connected components of the complement of the hyperplane and are given by the inequalities a_1x_1+...+a_nx_n<b and a_1x_1+...+a_nx_n>b

    • Affine hyperplanes are used to define decision boundaries in many machine learning algorithms, such as linear-combination (oblique) decision trees and perceptrons

  • Vector hyperplane

    • In a vector space, a vector hyperplane is a subspace of codimension 1, possibly shifted from the origin by a vector, in which case it is called a flat. Such a hyperplane is the solution set of a single linear equation

  • Projective hyperplane

    • A projective space is a set of points with the property that, for any two points of the set, all points on the line determined by the two points are contained in the set. Projective geometry can be viewed as affine geometry with vanishing points (points at infinity) added. An affine hyperplane together with its associated points at infinity forms a projective hyperplane. A special case of a projective hyperplane is the infinite or ideal hyperplane, defined as the set of all points at infinity

    • In projective space, a hyperplane does not divide the space into two parts; on the contrary, it takes two hyperplanes to separate points and divide the space. The reason is that the space essentially "wraps around", so both sides of a single hyperplane are connected to each other

1.1.4. PPT

  • A hyperplane can be used to split samples belonging to different classes
  • The hyperplane can be written as W·X + b = 0; hence (see the sketch below)
    • the positive class will be taken as W·X + b > +1
    • the negative class will be taken as W·X + b < -1
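  As a minimal sketch (assuming scikit-learn is available; the toy 2-D data below is invented purely for illustration), a linear SVM can be fitted and the learned hyperplane parameters W and b read off directly:

  ```python
  # Minimal sketch: fit a linear SVM and inspect the hyperplane W·x + b = 0.
  # Assumes scikit-learn; the toy data is made up for illustration only.
  import numpy as np
  from sklearn.svm import SVC

  X = np.array([[1.0, 2.0], [2.0, 3.0], [2.5, 3.5],    # negative class
                [6.0, 7.0], [7.0, 8.0], [7.5, 8.5]])   # positive class
  y = np.array([-1, -1, -1, 1, 1, 1])

  clf = SVC(kernel="linear", C=1.0).fit(X, y)

  w = clf.coef_[0]          # normal vector W of the separating hyperplane
  b = clf.intercept_[0]     # offset b
  print("W =", w, "b =", b)
  print("support vectors:\n", clf.support_vectors_)

  # W·x + b is positive on one side of the hyperplane and negative on the
  # other; the support vectors lie on the margin boundaries W·x + b = ±1.
  print("W·x + b for each sample:", X @ w + b)
  ```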

1.2. Support Vector

1.2.1. INFO

  • A support vector machine (SVM) is a generalized linear classifier (Generalized Linear Classifier) that performs binary classification of data by supervised learning (Supervised Learning); its decision boundary is the maximum margin hyperplane (Maximum Margin Hyperplane) solved from the training samples
  • SVM uses the hinge loss function (Hinge Loss) to compute the empirical risk (Empirical Risk) and adds a regularization term to the objective to optimize the structural risk (Structural Risk); it is a classifier with sparsity and robustness
  • SVM can perform nonlinear classification via the kernel method (Kernel Method), and is one of the common kernel learning (Kernel Learning) approaches

1.2.2. What do we want?

  • The maximum distance (margin) between the hyperplane and the support vectors
  • (Figure: the optimal hyperplane, the optimal margin, and the dashed lines passing through the support vectors)

1.3. Kernel Function

1.3.1. Definition

  • A support vector machine maps the input space to a high-dimensional feature space through some nonlinear transformation \phi(x). If solving for the support vectors only requires inner-product computations, and there exists a function K(x,x') defined on the lower-dimensional input space that exactly equals the inner product in the higher-dimensional space, i.e. K(x,x')=<\phi(x),\phi(x')>, then the SVM does not need to carry out the complicated nonlinear transformation explicitly: the inner product after the transformation is obtained directly from K(x,x'), which greatly simplifies the computation. Such a function K(x,x') is called a kernel function

1.3.2. Classification

  • The choice of kernel function must satisfy Mercer's Theorem, i.e. the Gram matrix【Gram matrix】of the kernel function over any sample set must be positive semidefinite
  • Commonly used kernels:
    • Linear kernel
    • Polynomial kernel
    • Radial basis function (RBF) kernel
    • Sigmoid kernel
    • Composite kernels
    • Fourier series kernel
    • B-spline kernel
    • Tensor product kernel
    • ......

1.3.3. Theory

  • According to pattern recognition theory, a pattern that is linearly inseparable in a low-dimensional space may become linearly separable when nonlinearly mapped into a high-dimensional feature space
  • The kernel trick effectively avoids the "curse of dimensionality" that arises when computing in the high-dimensional feature space (see the numerical check below)
    • Let x,z\in X with X\subseteq\mathbb{R}^n, and let the nonlinear function \phi map the input space X to the feature space F, where F\subseteq\mathbb{R}^m and n\ll m. By the kernel trick we have
      • K(x,z)=<\phi(x),\phi(z)>, where <\phi(x),\phi(z)> is the inner product and K(x,z) is a kernel function
    • The kernel function turns the inner-product computation in the m-dimensional feature space into a kernel evaluation in the n-dimensional input space, thereby neatly sidestepping the "curse of dimensionality" and other problems of computing in the high-dimensional feature space, and laying a theoretical foundation for solving complex classification or regression problems there
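  A small numerical check of this equivalence (a sketch using NumPy): for the degree-2 homogeneous polynomial kernel K(x,z)=(x·z)^2 in \mathbb{R}^2, the standard explicit feature map is \phi(x)=(x_1^2, x_2^2, \sqrt{2}x_1x_2) in \mathbb{R}^3, and the kernel evaluated in the input space matches the inner product in the feature space:

  ```python
  # Sketch: verify K(x, z) = <phi(x), phi(z)> for the degree-2 polynomial
  # kernel K(x, z) = (x·z)^2 in R^2, whose explicit feature map is
  # phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2) in R^3.
  import numpy as np

  def phi(x):
      return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

  def K(x, z):
      return (x @ z) ** 2

  x = np.array([1.0, 2.0])
  z = np.array([3.0, 0.5])

  lhs = K(x, z)            # kernel evaluated in the input space (n = 2)
  rhs = phi(x) @ phi(z)    # inner product in the feature space (m = 3)
  print(lhs, rhs)          # both equal 16.0 (up to rounding)
  ```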

1.3.4. Properties

  • Avoids the "curse of dimensionality" and greatly reduces the amount of computation, handling high-dimensional input effectively
  • There is no need to know the form or parameters of the nonlinear transformation \phi
  • Changing the form or parameters of the kernel function implicitly changes the mapping from the input space to the feature space, which in turn affects the properties of the feature space and ultimately changes the performance of the various kernel methods
  • The kernel-function approach can be combined with different algorithms to form a variety of kernel-based methods; the two parts can be designed separately, so different kernel functions and algorithms can be chosen for different applications


1.4. Multiple Classes (Multi-class Classification)

1.4.1. OvO --- One Versus One

  • Idea
    • Pair the N classes two at a time, and train a classifier for each pair using only the data of those 2 classes
    • Feed a test sample to all classifiers; the final result is produced by voting
  • Number of classifiers
    • C_N^2=\frac{N(N-1)}{2}
  • Characteristics
    • There are many classifiers, but each classifier only uses the sample data of 2 classes

1.4.2. OvM --- One Versus Many

  • Idea
    • Each time, take one class as the positive examples and all the other classes as negative examples
    • Each classifier recognises one fixed class
    • At prediction time, if exactly one classifier outputs the positive class, that is the predicted category; if more than one classifier outputs the positive class, choose the category recognised by the classifier with the highest confidence
  • Number of classifiers
    • C_N^1=N
  • Characteristics
    • Compared with OvO there are fewer classifiers, but each classifier uses all the sample data in training (see the sketch below)
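  A hedged sketch of both decompositions using scikit-learn's meta-estimators (the bundled iris data is used only as a convenient 3-class example):

  ```python
  # Sketch: One-vs-One vs One-vs-Many (One-vs-Rest) decomposition of a
  # 3-class problem, using linear SVMs as the underlying binary classifiers.
  from sklearn.datasets import load_iris
  from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
  from sklearn.svm import LinearSVC

  X, y = load_iris(return_X_y=True)          # N = 3 classes

  ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)
  ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)

  # N(N-1)/2 = 3 pairwise classifiers vs. N = 3 one-vs-rest classifiers
  print("OvO classifiers:", len(ovo.estimators_))   # 3
  print("OvR classifiers:", len(ovr.estimators_))   # 3
  print(ovo.predict(X[:5]), ovr.predict(X[:5]))
  ```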

1.4.3. MvM --- Many Versus Many

  • Idea
    • Each time, take several classes as positive examples and several other classes as negative examples
    • The most common technique is ECOC【Error Correcting Output Codes】
    • Encoding phase
      • Perform M partitions of the N classes; each partition takes part of the classes as the positive class and the rest as the negative class
      • The coding matrix comes in two forms: binary codes and ternary codes
        • Binary code: classes are divided into positive and negative classes
        • Ternary code: classes are divided into positive, negative and inactive classes
      • This yields M training sets, from which M classifiers are trained
    • Decoding phase
      • The M classifiers each predict the test sample; these predicted labels form a code. This prediction code is compared with the codeword of each class, and the class with the smallest distance is returned as the final result
  • Number of classifiers
    • M
  • Characteristics
    • The ECOC code length is positively correlated with the error-correcting capability
    • The longer the code, the more classifiers need to be trained, and the computation and storage overhead grows; on the other hand, for a finite number of classes a code length beyond a certain range is meaningless. For codes of the same length, in theory, the farther apart the codewords of two classes are, the stronger the error-correcting ability
  • Example
    • In the ECOC code tables below, "+1" and "-1" denote positive and negative examples respectively; in the ternary code, "0" means that the samples of that class are not used (decoding is sketched in code after the tables)
    • Binary ECOC code
                    f_1   f_2   f_3   f_4   f_5   Hamming distance   Euclidean distance
      C_1           -1    +1    -1    +1    +1          3                2\sqrt{3}
      C_2           +1    -1    -1    +1    -1          4                4
      C_3           -1    +1    +1    -1    +1          1                2
      C_4           -1    -1    +1    +1    -1          2                2\sqrt{2}
      Test sample   -1    -1    +1    -1    +1
    • Ternary ECOC code
                    f_1   f_2   f_3   f_4   f_5   f_6   f_7   Hamming distance   Euclidean distance
      C_1           -1    -1    +1    +1    -1    +1    +1          4                4
      C_2           -1     0     0     0    +1    -1     0          2                2
      C_3           +1    +1    -1    -1    -1    +1    -1          5                2\sqrt{5}
      C_4           -1    +1     0    +1    -1     0    +1          3                \sqrt{10}
      Test sample   -1    +1    +1    -1    +1    -1    +1
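  The decoding step for the binary table above can be checked with a few lines (a sketch; the codewords are copied from that table):

  ```python
  # Sketch: ECOC decoding by distance for the binary code table above.
  import numpy as np

  codewords = np.array([            # rows C_1 .. C_4, columns f_1 .. f_5
      [-1, +1, -1, +1, +1],
      [+1, -1, -1, +1, -1],
      [-1, +1, +1, -1, +1],
      [-1, -1, +1, +1, -1],
  ])
  prediction = np.array([-1, -1, +1, -1, +1])   # outputs of the 5 classifiers

  hamming = (codewords != prediction).sum(axis=1)
  euclid = np.sqrt(((codewords - prediction) ** 2).sum(axis=1))
  print(hamming)                                # [3 4 1 2]
  print(euclid)                                 # approx. [3.46 4.00 2.00 2.83]
  print("predicted class: C_%d" % (hamming.argmin() + 1))   # C_3 (smallest distance)
  ```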

2. Naive Bayes


2.1. Bayes' Rule

2.1.1. Definition

  • States the relationship between the prior probability distribution and the posterior probability distribution

2.1.2. Formula

  • P(C|X)=\frac{P(C)P(X|C)}{P(X)}, where C is a class and X is a set of samples
  • P(C)\Rightarrow Prior\ Probability
  • P(X|C)\Rightarrow Class\ Conditional\ Probability\ (CCP,\ aka\ Likelihood)
  • P(C|X)\Rightarrow Posterior\ Probability
  • P(X)\Rightarrow Evidence\ Factor\ (Observation)

2.1.3. Application

  • Parameter estimation
  • Classification
  • Model selection

2.2. Bayes' Rule for Classification

2.2.1. How can we make use of Bayes’ Rule for Classification?

  • We want to maximise the posterior probability given the observations
  • This method is named MAP estimation (Maximum A Posteriori)

2.2.2. How to use?

  • Presume that x\in D_c means that a sample x belongs to class c, where D_c\subseteq D is the set of all samples of class c in the dataset D
  • Recall Bayes' Rule
    • PosteriorProbability=\frac{CCP\times PriorProbability}{Observation}
  • As the observation (evidence) is the same for every class (the same training dataset), we have
    • Posterior\ Probability\propto CCP\times Prior\ Probability
  • So we need to find the CCP and the prior probability
  • According to the Law of Large Numbers, the prior probability can be taken as the relative frequency of each class in the observations
  • So we only care about the term
    • p(D|\Theta)=p(c)\prod_{c}\prod_{x_c\in D_c}p(x_c|\Theta_c)

2.3. Naïve Bayes Classifier

2.3.1. Calculation

  • Calculating the term p(x_c|\Theta_c) is never an easy task
  • A way to simplify the process is to assume that the conditions / features in \Theta_c are independent of each other
  • Assuming that \Theta_c=(\theta_{c1},\theta_{c2},...,\theta_{cl}), we have
    • p(x_c|\Theta_c)=\prod_{i=1}^{l}p(x_{ci}|\theta_{ci})

2.3.2. Example 01

  • Will you play on a Mild day?
    • Original Dataset
      Outlook    Temperature  Humidity  Windy  Play
      Overcast   Hot          High      False  Yes
      Overcast   Cool         Normal    True   Yes
      Overcast   Mild         High      True   Yes
      Overcast   Hot          Normal    False  Yes
      Rainy      Mild         High      False  Yes
      Rainy      Cool         Normal    False  Yes
      Rainy      Cool         Normal    True   No
      Rainy      Mild         Normal    False  Yes
      Rainy      Mild         High      True   No
      Sunny      Hot          High      True   No
      Sunny      Hot          High      False  No
      Sunny      Mild         High      False  No
      Sunny      Cool         Normal    False  Yes
      Sunny      Mild         Normal    True   Yes
  • Solution
    • Temperature   Yes    No     p
      Hot             2     2    0.28
      Mild            4     2    0.43
      Cool            3     1    0.28
      p              0.64  0.36
    • p(Mild|Yes)=\frac{4}{9}=0.44
    • p(Mild|No)=\frac{2}{5}=0.40
    • p(Yes|Mild)=\frac{p(Mild|Yes)p(Yes)}{p(Mild)}=\frac{0.44\times 0.64}{0.43}=0.65
    • p(No|Mild)=\frac{p(Mild|No)p(No)}{p(Mild)}=\frac{0.4\times 0.36}{0.43}=0.33
    • Since p(Yes|Mild)>p(No|Mild), it is likely we will play (a quick numerical check follows)
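  A quick check of these numbers from the raw counts (a sketch; the counts are read directly from the dataset above, and the exact fractions differ slightly from the rounded figures used in the worked solution):

  ```python
  # Sketch: verify Example 01 directly from the frequency counts.
  n, n_yes, n_no = 14, 9, 5          # total samples, Play=Yes, Play=No
  mild_yes, mild_no, mild = 4, 2, 6  # Mild & Yes, Mild & No, Mild overall

  p_mild_given_yes = mild_yes / n_yes            # 4/9 ≈ 0.44
  p_mild_given_no = mild_no / n_no               # 2/5 = 0.40
  p_yes, p_no, p_mild = n_yes / n, n_no / n, mild / n

  p_yes_given_mild = p_mild_given_yes * p_yes / p_mild   # = 2/3 ≈ 0.67
  p_no_given_mild = p_mild_given_no * p_no / p_mild      # = 1/3 ≈ 0.33
  print(p_yes_given_mild, p_no_given_mild)
  print("play" if p_yes_given_mild > p_no_given_mild else "don't play")
  ```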

2.3.3. Example 02

  • Will you play on a day that is Rainy, Cool, Normal (humidity) and Windy?
    • Original Dataset
      Outlook    Temperature  Humidity  Windy  Play
      Overcast   Hot          High      False  Yes
      Overcast   Cool         Normal    True   Yes
      Overcast   Mild         High      True   Yes
      Overcast   Hot          Normal    False  Yes
      Rainy      Mild         High      False  Yes
      Rainy      Cool         Normal    False  Yes
      Rainy      Cool         Normal    True   No
      Rainy      Mild         Normal    False  Yes
      Rainy      Mild         High      True   No
      Sunny      Hot          High      True   No
      Sunny      Hot          High      False  No
      Sunny      Mild         High      False  No
      Sunny      Cool         Normal    False  Yes
      Sunny      Mild         Normal    True   Yes
  • Solution
    • Outlook       Yes    No     p
      Overcast        4     0    0.28
      Rainy           3     2    0.36
      Sunny           2     3    0.36
      p              0.64  0.36
    • Temperature   Yes    No     p
      Hot             2     2    0.28
      Mild            4     2    0.43
      Cool            3     1    0.28
      p              0.64  0.36
    • Humidity      Yes    No     p
      High            3     4    0.50
      Normal          6     1    0.50
      p              0.64  0.36
    • Windy         Yes    No     p
      True            3     3    0.43
      False           6     2    0.57
      p              0.64  0.36
    • P(Yes|Rainy, Cool, Normal, Windy=True)
      • \propto
      • P(Yes)P(Rainy|Yes)P(Cool|Yes)P(Normal|Yes)P(Windy=True|Yes)
      • =\frac{9}{14}\times\frac{3}{9}\times\frac{3}{9}\times\frac{6}{9}\times\frac{3}{9}
      • =\frac{1}{63}\approx 0.016
    • P(No|Rainy, Cool, Normal, Windy=True)
      • \propto
      • P(No)P(Rainy|No)P(Cool|No)P(Normal|No)P(Windy=True|No)
      • =\frac{5}{14}\times\frac{2}{5}\times\frac{1}{5}\times\frac{1}{5}\times\frac{3}{5}
      • =\frac{3}{875}\approx 0.003
    • Since 0.016 > 0.003, the prediction is Yes (see the sketch below)
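  The same computation under the conditional-independence assumption, as a sketch that multiplies the per-feature conditional probabilities taken from the four count tables above:

  ```python
  # Sketch: naive Bayes scores for (Rainy, Cool, Normal, Windy=True),
  # using exact fractions read from the count tables.
  from fractions import Fraction as F

  # P(class) * product of per-feature conditionals given that class
  score_yes = F(9, 14) * F(3, 9) * F(3, 9) * F(6, 9) * F(3, 9)   # = 1/63
  score_no  = F(5, 14) * F(2, 5) * F(1, 5) * F(1, 5) * F(3, 5)   # = 3/875

  print(score_yes, float(score_yes))   # 1/63  ≈ 0.0159
  print(score_no, float(score_no))     # 3/875 ≈ 0.0034
  print("Play = Yes" if score_yes > score_no else "Play = No")
  ```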

3. Methods


3.1. Parametric Methods

  • We presume that there exists a model, either a Bayesian model (e.g. Naïve Bayes) or a mathematical model (e.g. Linear Regression)
    • We seldom obtain a “true” model due to the lack of prior knowledge
    • For the same reason, we never know which model should be used

3.2. Non-parametric Methods

  • Non-parametric models can be used with arbitrary distributions and without the assumption that the forms of the underlying densities are known.
  • Moreover, they can be used with multimodal distributions which are much more common in practice than unimodal distributions.
  • With enough samples, convergence to an arbitrarily complicated target density can be obtained.
  • The number of samples needed may be very large (number grows exponentially with the dimensionality of the feature space).
  • These methods are very sensitive to the choice of window size (if too small, most of the volume will be empty, if too large, important variations may be lost).
  • There may be severe requirements for computation time and storage.
  • Two major methods
    • Decision tree
    • k-nearest neighbour

4. Decision Tree


4.1. Steps

4.1.1. Given

  • Suppose we have a training dataset D=\{\vec x_1,\vec x_2,...,\vec x_n\} whose labels are Y=\{y_1,y_2,...,y_n\} respectively, where \vec x_i=(x_{i1},x_{i2},...,x_{im})

4.1.2. Step

  • Take a node as the root node
  • Determine whether the node is a leaf node by checking
    • whether the node contains no samples
    • whether the samples belonging to the node all come from the same class
    • whether all samples share a common value on every remaining attribute
    • whether any attributes remain to be analysed further
  • The decision (label) of a leaf is determined by majority voting
  • Select an attribute j (column x_{*j}) and build a new branch for each value of the selected attribute, then determine the type of each child node by the previous procedure. The attribute can be chosen by one of the criteria below (a code sketch follows this list)
    • Entropy gain【does not account for the number of attribute values】
      • Gain(D,x_{*j})=entropy(D)-\sum_{v=1}^{V}\frac{|D^v|}{|D|}entropy(D^v)
      • entropy(D)=-\sum_{k=1}^{|Y|}p_k\log_2p_k
    • Gain ratio (C4.5)【prefers attributes with fewer values】
      • GainRatio(D,x_{*j})=\frac{Gain(D,x_{*j})}{IV(x_{*j})}
      • IV(a)=-\sum_{v=1}^{V}\frac{|D^v|}{|D|}\log_2\frac{|D^v|}{|D|}
    • Gini index (CART)【R: Regression】
      • Gini(D)=\sum_{k=1}^{|Y|}\sum_{k'\neq k}p_kp_{k'}=1-\sum_{k=1}^{|Y|}p_k^2
      • GiniIndex(D,x_{*j})=\sum_{v=1}^{V}\frac{|D^v|}{|D|}Gini(D^v)
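  A compact sketch of the three criteria above, where `labels` is the list of class labels at a node and `splits` is the list of per-branch label lists after splitting on an attribute (both names are placeholders introduced only for this sketch):

  ```python
  # Sketch of the three splitting criteria: entropy gain, gain ratio, Gini index.
  from collections import Counter
  from math import log2

  def entropy(labels):
      n = len(labels)
      return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

  def gini(labels):
      n = len(labels)
      return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

  def entropy_gain(labels, splits):            # ID3
      n = len(labels)
      return entropy(labels) - sum(len(s) / n * entropy(s) for s in splits)

  def gain_ratio(labels, splits):              # C4.5
      n = len(labels)
      iv = -sum(len(s) / n * log2(len(s) / n) for s in splits)
      return entropy_gain(labels, splits) / iv

  def gini_index(labels, splits):              # CART
      n = len(labels)
      return sum(len(s) / n * gini(s) for s in splits)
  ```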

4.2. Example

4.2.1. Dataset

  • Original Dataset
    Outlook    Temperature  Humidity  Windy  Play
    Overcast   Hot          High      False  Yes
    Overcast   Cool         Normal    True   Yes
    Overcast   Mild         High      True   Yes
    Overcast   Hot          Normal    False  Yes
    Rainy      Mild         High      False  Yes
    Rainy      Cool         Normal    False  Yes
    Rainy      Cool         Normal    True   No
    Rainy      Mild         Normal    False  Yes
    Rainy      Mild         High      True   No
    Sunny      Hot          High      True   No
    Sunny      Hot          High      False  No
    Sunny      Mild         High      False  No
    Sunny      Cool         Normal    False  Yes
    Sunny      Mild         Normal    True   Yes

4.2.2. Analysis

  • Outlook
    • Outlook     Yes   No   Total
      Overcast      4    0     4
      Rainy         3    2     5
      Sunny         2    3     5
    • Entropy of the dataset D (over the class labels Yes / No)
      • E(D)
      • =-(P_{Yes}\log_2 P_{Yes}+P_{No}\log_2 P_{No})
      • =-(\frac{9}{14}\log_2 \frac{9}{14}+\frac{5}{14}\log_2 \frac{5}{14})
      • =0.940
    • Entropy of the Overcast subset
      • E(D^{Overcast})
      • =-(P_{Yes}\log_2 P_{Yes}+P_{No}\log_2 P_{No})
      • =-(1\log_2 1+0\log_2 0)
      • =0
    • Entropy of the Rainy subset
      • E(D^{Rainy})
      • =-(P_{Yes}\log_2 P_{Yes}+P_{No}\log_2 P_{No})
      • =-(\frac{3}{5}\log_2 \frac{3}{5}+\frac{2}{5}\log_2 \frac{2}{5})
      • =0.971
    • Entropy of the Sunny subset
      • E(D^{Sunny})
      • =-(P_{Yes}\log_2 P_{Yes}+P_{No}\log_2 P_{No})
      • =-(\frac{2}{5}\log_2 \frac{2}{5}+\frac{3}{5}\log_2 \frac{3}{5})
      • =0.971
    • Weighted entropy of the subsets: \sum_{v=1}^{V}\frac{|D^v|}{|D|}entropy(D^v)
      • =P_{Overcast}\times entropy(D^{Overcast})+P_{Rainy}\times entropy(D^{Rainy})+P_{Sunny}\times entropy(D^{Sunny})
      • =\frac{4}{14}\times 0+\frac{5}{14}\times 0.971+\frac{5}{14}\times 0.971
      • =0.694
    • Entropy Gain
      • Gain(D,Outlook)=entropy(D)-\sum_{v=1}^{V}\frac{|D^v|}{|D|}entropy(D^v)
      • =0.940-0.694
      • =0.246
  • Build the decision tree
    • Select the attribute with the highest entropy gain to build the next level
    • e.g.
      • Suppose the Entropy Gain of Outlook is the highest
      • The 1st level of the decision tree then splits on Outlook, with one branch per value (Overcast / Rainy / Sunny); a numerical check of the gains is sketched below
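  A short sketch that verifies the worked example above from nothing but the counts in the tables, and confirms that Outlook has the highest gain on this dataset:

  ```python
  # Sketch: entropy gain of each attribute on the play dataset,
  # following the steps worked out above (counts taken from the tables).
  from math import log2

  def H(pos, neg):                   # entropy of a node with pos/neg labels
      counts = [c for c in (pos, neg) if c]
      n = sum(counts)
      return -sum(c / n * log2(c / n) for c in counts)

  H_D = H(9, 5)                                                  # ≈ 0.940
  weighted = 4/14 * H(4, 0) + 5/14 * H(3, 2) + 5/14 * H(2, 3)    # Overcast, Rainy, Sunny
  print(H_D, weighted, H_D - weighted)                           # gain(Outlook) ≈ 0.246

  # Gains of the other attributes for comparison:
  print(H_D - (4/14*H(2, 2) + 6/14*H(4, 2) + 4/14*H(3, 1)))  # Temperature ≈ 0.029
  print(H_D - (7/14*H(3, 4) + 7/14*H(6, 1)))                 # Humidity    ≈ 0.151
  print(H_D - (6/14*H(3, 3) + 8/14*H(6, 2)))                 # Windy       ≈ 0.048
  ```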

4.3. Overfitting

4.3.1. Too good to be true

  • Sometimes, the decision tree has fit the sample distribution too well such that unobserved samples cannot be predicted in a sensible way
  • We could prune (remove) the branches that do not improve system performance

4.4. Random Forest → Typical Example of Ensemble Learning (Bagging)

  • Another way to improve the system is to build multiple decision trees and vote for the final result
  • The attributes for each decision tree are randomly selected
  • Each decision tree is trained on only a subset of the attributes of the samples (see the sketch below)
    • e.g.
      • For attributes {A,B,C,D}
        • Tree 1 = {A,B,C}
        • Tree 2 = {B,C,D}
        • Tree 3 = {C,D}
      • Trained
        • Tree 1 = {A,B,C} → Yes
        • Tree 2 = {B,C,D} → Yes
        • Tree 3 = {C,D}   → No
      • Majority vote → Yes
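  A hedged sketch with scikit-learn (the iris data is used purely as a placeholder dataset); `max_features` controls how many randomly selected attributes each split may consider:

  ```python
  # Sketch: random forest = many decision trees, each seeing a random subset
  # of features (and a bootstrap sample of the data); prediction is by voting.
  from sklearn.datasets import load_iris
  from sklearn.ensemble import RandomForestClassifier

  X, y = load_iris(return_X_y=True)

  forest = RandomForestClassifier(
      n_estimators=100,      # number of trees that will vote
      max_features="sqrt",   # random subset of attributes considered per split
      bootstrap=True,        # each tree is trained on a bootstrap sample
      random_state=0,
  ).fit(X, y)

  print(forest.predict(X[:3]))      # majority vote of the 100 trees
  print(len(forest.estimators_))    # the individual decision trees
  ```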

5. KNN


5.1. PPT

  • As in the general problem of classification, we have a set of data points for which we know the correct class labels
  • When we get a new data point, we compare it to each of our existing data points and find similarity
  • Take the most similar k data points (k nearest neighbours)
  • From these k data points, take the majority vote of their labels. The winning label is the label / class of the new data point

5.2. Extra

5.2.1. The core idea

  • If most of the K nearest samples of a sample in the feature space belong to a certain class, then the sample also belongs to that class and has the characteristics of the samples in that class

5.2.2. Algorithm flow

  • Preprocess the data

  • Compute the distance from the test sample point (i.e. the point to be classified) to every other sample point (usually the Euclidean distance)

  • Sort the distances, then select the K points with the smallest distances

  • Compare the classes of these K points; following the majority rule, assign the test sample to the class that accounts for the highest proportion of the K points (a minimal implementation is sketched below)
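  A minimal NumPy sketch of exactly this flow (Euclidean distance, then majority vote among the K nearest points); the toy data is invented for illustration:

  ```python
  # Sketch: K-nearest-neighbour classification following the steps above.
  import numpy as np
  from collections import Counter

  def knn_predict(X_train, y_train, x_test, k=3):
      # 1. Euclidean distance from the test point to every training sample
      dists = np.linalg.norm(X_train - x_test, axis=1)
      # 2. indices of the K smallest distances
      nearest = np.argsort(dists)[:k]
      # 3. majority vote among the K neighbours' labels
      return Counter(y_train[nearest]).most_common(1)[0][0]

  X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                      [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
  y_train = np.array(["A", "A", "A", "B", "B", "B"])

  print(knn_predict(X_train, y_train, np.array([0.15, 0.1]), k=3))  # -> "A"
  print(knn_predict(X_train, y_train, np.array([1.05, 1.0]), k=3))  # -> "B"
  ```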

5.2.3. Advantages and disadvantages

  • Advantages
    • The idea is simple, easy to understand and implement, and no parameters need to be estimated
  • Disadvantages
    • When the samples are imbalanced, e.g. one class has a very large sample size while the others are very small, the K nearest neighbours of a new sample may be dominated by samples of the large class, which biases the vote
    • The amount of computation is large, because for each sample to be classified the distance to all known samples must be computed in order to obtain its K nearest neighbours

5.2.4. Improvement strategy

  • Find a distance function closer to the actual distance to replace the standard Euclidean distance               【WAKNN, VDM】
  • Search for a more reasonable K value to replace the fixed, pre-specified K value                                【SNNB, DKNAW】
  • Use a more accurate probability estimation method to replace the simple voting mechanism                        【KNNDW, LWNB, ICLNB】
  • Build efficient indexes to improve the efficiency of the KNN algorithm                                          【KD-Tree, NBTree】