[Bayesian Classification 3] Semi-Naive Bayesian Classifier
2022-06-26 20:37:00 【NoBug】
1. Review of the Naive Bayes Classifier
1.1 Categories and Features
According to Bayesian decision theory (that is, the Bayesian classification principle), the first thing we obtain is the expected loss, 【$R(c_i|x)=\sum_{j=1}^N\lambda_{ij}P(c_j|x)$】. The Bayes criterion is to minimize the overall risk, which reduces to minimizing the conditional risk on each sample, 【$h^*(x)=\arg\min_{c\in\mathcal{Y}}R(c|x)$】.
You can regard $c$ as the "category" and $x$ as the "features". In $R(c|x)$, a category has its own features, such as 【good melon: color = dark green, texture = clear, …, density $\geq$ 0.697】; here $c=\{\text{good melon},\text{bad melon}\}$ and $x=\{\text{dark green},\text{clear},...,\text{density}\}$. We compute, from the features, the risk that a category assignment is right or wrong; the smaller the risk, the more likely that category is correct.
1.2 Risk and Probability
From the misclassification loss, we further derive $R(c|x)=1-P(c|x)$. Here $P(c|x)=P(\text{category}\mid\text{features})$ can be understood as the probability that this category occurs given these features. For example, with 【color = dark green, texture = clear, …, density $\geq$ 0.697】 we say the probability that the melon is a good melon is very high. So we keep adjusting the probability of each category according to the features; this is the posterior probability.
Naturally, we hope that given these features the probability of being a good melon is as large as possible, so we have 【$h^*(x)=\arg\max_{c\in\mathcal{Y}}P(c|x)$: if the dataset has several categories, one sample has several values of $P(c|x)$; we select the category whose $P(c|x)$ is largest, and the sample is classified into that category】. To compute the probability $P(c|x)$ there is Bayes' formula, 【$P(c|x)=\frac{P(c)P(x|c)}{P(x)}$】. Since $P(x|c)$ is a joint probability over all attributes (in practice this runs into combinatorial explosion, missing values, and so on), the difficulty lies in estimating the class-conditional probability $P(x|c)$. To make the posterior $P(c|x)$ tractable, naive Bayes introduces the 【attribute conditional independence assumption】 (attributes are also called features).
1.3 Class-Conditional Probability
The expression of the naive Bayes classifier: 【$h_{nb}(x)=\arg\max_{c\in\mathcal{Y}}\ P(c)\prod_{i=1}^d P(x_i|c)$】. The focus is on the class-conditional probabilities. On a dataset these are basically computed by the "reduced sample space" method rather than by the conditional-probability formulas of probability theory: simply count the relevant samples, then take proportions.
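To make "count, then take proportions" concrete, here is a minimal sketch of this counting estimate; the toy dataset and the function name are my own illustrations, not from the original:

```python
# A toy dataset: each row is (features, class label). Purely illustrative.
data = [((1, 1, 1), 1), ((1, 0, 0), 1), ((1, 0, 0), 0), ((0, 1, 1), 0)]

def naive_bayes_score(x, c, data):
    """Unnormalized P(c|x) = P(c) * prod_i P(x_i|c), estimated by counting
    in the reduced sample space (the rows of class c)."""
    rows_c = [feats for feats, label in data if label == c]
    score = len(rows_c) / len(data)                  # P(c): proportion of class c
    for i, xi in enumerate(x):
        match = sum(1 for feats in rows_c if feats[i] == xi)
        score *= match / len(rows_c)                 # P(x_i|c): proportion within class c
    return score

# Classify a new sample by picking the class with the largest score.
x = (1, 0, 1)
prediction = max({0, 1}, key=lambda c: naive_bayes_score(x, c, data))
```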
2. Semi-Naive Bayesian Classifier Learning Notes
2.1 Introduction
The naive Bayes classifier uses the "attribute conditional independence assumption", which rarely holds in real tasks. Therefore, people try to relax this assumption to some extent, which leads to a class of learning methods called "semi-naive Bayes classifiers" (semi-naïve Bayes classifiers). The basic idea: 【properly account for the interdependence among some attributes】.
2.2 Knowledge Cards
1. Semi-naive Bayesian classifier: semi-naive Bayes classifiers
2. Also called: the SNB algorithm
2.3 The Semi-Naive Bayesian Classifier
- Expression of the semi-naive Bayesian classifier
$pa_i$ is the attribute that $x_i$ depends on, called the parent attribute of $x_i$:
$P(c|x)\propto P(c)\prod_{i=1}^{d}P(x_i|c,pa_i)$
- One-dependent classifiers
The key problem is how to determine the parent attribute of each attribute; different approaches produce different one-dependent classifiers.
2.4 One-Dependent Estimation
2.4.1 Brief Introduction
- English: One-Dependent Estimator, abbreviated ODE
- Purpose: the most common strategy by which a semi-naive Bayesian classifier properly accounts for the interdependence among some attributes.
- ODE principle: assume that each attribute depends on at most one other attribute besides the class.
- Accounting for attribute interdependence: different dependency strategies yield different one-dependent classifiers. Representative ones:
1) SPODE (Super-Parent ODE)
2) TAN (Tree Augmented Naive Bayes)
3) AODE (Averaged One-Dependent Estimator)
2.4.2 SPODE (Super-Parent ODE)
- Principle
All attributes depend on the same single attribute, called the super-parent (super-parent). The super-parent is then identified by cross-validation (treat each attribute in turn as the super-parent and choose the one that performs best).
- Example
【Dataset】
| $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 0 | 0 |
【Sample to predict】
| $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|---|---|---|
| 1 | 1 | 0 | ? |
【SPODE formula (with super-parent $x_{pa}$)】
$h(x)=\arg\max_{c\in\mathcal{Y}}\ P(c)\,P(x_{pa}|c)\prod_{j\neq pa}P(x_j|c,x_{pa})$
【Prior probability $P(c)$】
$P(y=1)=\frac{4}{10}=0.4$
$P(y=0)=\frac{6}{10}=0.6$
【Class-conditional probabilities $P(x_j|c,x_{pa})$ (first assume $x_1$ is the super-parent, i.e. $pa=1$)】
$P(x_1=1|y=1)=\frac{3}{4}$
$P(x_2=1|y=1,x_1=1)=\frac{2}{3}$
$P(x_3=0|y=1,x_1=1)=\frac{1}{3}$
$P(x_1=1|y=0)=\frac{2}{6}=\frac{1}{3}$
$P(x_2=1|y=0,x_1=1)=\frac{1}{2}$
$P(x_3=0|y=0,x_1=1)=\frac{1}{2}$
【Semi-naive Bayes scores】
$P(y=1|x)\propto 0.4\times\frac{3}{4}\times\frac{2}{3}\times\frac{1}{3}\approx 0.067$
$P(y=0|x)\propto 0.6\times\frac{1}{3}\times\frac{1}{2}\times\frac{1}{2}=0.050$
Since $0.067>0.050$, the predicted category of the sample is $y=1$. (A short script reproducing these counts follows below.)
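These counts can be reproduced mechanically. Below is a minimal sketch of SPODE scoring with $x_1$ as the super-parent on the ten-row dataset above; function and variable names are my own, and zero counts and ties are not handled:

```python
# The ten-row dataset from the example: each row is (x1, x2, x3, y).
data = [(1, 1, 1, 1), (1, 0, 0, 1), (1, 1, 1, 1), (1, 0, 0, 0), (1, 1, 1, 0),
        (0, 0, 0, 0), (0, 1, 1, 0), (0, 1, 0, 1), (0, 1, 1, 0), (0, 0, 0, 0)]

def spode_score(x, c, data, sp=0):
    """Unnormalized P(c|x) with attribute index `sp` as the super-parent:
    P(c) * P(x_sp|c) * prod_{j != sp} P(x_j|c, x_sp), all by counting."""
    rows_c = [r for r in data if r[-1] == c]           # samples of class c
    score = len(rows_c) / len(data)                    # P(c)
    rows_c_sp = [r for r in rows_c if r[sp] == x[sp]]  # ... with matching super-parent
    score *= len(rows_c_sp) / len(rows_c)              # P(x_sp|c)
    for j in range(len(x)):
        if j != sp:
            match = sum(1 for r in rows_c_sp if r[j] == x[j])
            score *= match / len(rows_c_sp)            # P(x_j|c, x_sp)
    return score

x = (1, 1, 0)
print(spode_score(x, 1, data))  # 0.4 * 3/4 * 2/3 * 1/3 ≈ 0.0667
print(spode_score(x, 0, data))  # 0.6 * 1/3 * 1/2 * 1/2 = 0.0500 -> predict y = 1
```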
2.4.3 AODE (Averaged One-Dependent Estimator)
- Principle
SPODE selects one model to make predictions, while AODE is a one-dependent classifier based on the ensemble-learning mechanism. In AODE, each model makes a prediction, and the results are averaged to obtain the final prediction. (AODE tries to build a SPODE with each attribute as the super-parent, and then integrates those SPODEs that are supported by enough training data into the final result.)
- Understanding the formula
$P(c|x)\propto \frac{1}{d}\sum_{i=1}^{d}P(c,x_i)\prod_{j=1}^{d}P(x_j|c,x_i)$
- Threshold $m'$
For example, in the example above, $P(x_2=1|y=0,x_1=1)=\frac{1}{2}$ is estimated from only two samples with $y=0,x_1=1$; the denominator can be very small, meaning too few usable samples. Therefore, AODE adds a threshold: a SPODE is used only when the number of samples in which its super-parent attribute takes the specified value is at least $m'$.
- AODE formula
$P(c|x)\propto \sum_{\substack{i=1\\ |D_{x_i}|\geq m'}}^{d}P(c,x_i)\prod_{j=1}^{d}P(x_j|c,x_i)$, where $D_{x_i}$ is the set of training samples whose $i$-th attribute takes the value $x_i$.
- Parameter estimation (with Laplace smoothing; a counting sketch follows below)
【$P(c,x_i)$】
$\hat{P}(c,x_i)=\frac{|D_{c,x_i}|+1}{|D|+N\times N_i}$
【$P(x_j|c,x_i)$】
$\hat{P}(x_j|c,x_i)=\frac{|D_{c,x_i,x_j}|+1}{|D_{c,x_i}|+N_j}$
Here $N$ is the number of classes, $N_i$ is the number of possible values of the $i$-th attribute, $D_{c,x_i}$ is the set of samples with class $c$ and $i$-th attribute value $x_i$, and $D_{c,x_i,x_j}$ additionally requires the $j$-th attribute value to be $x_j$.
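A minimal counting sketch of these two smoothed estimates and of the thresholded sum, reusing the ten-row `data` from the SPODE sketch above; all names are my own, with `n_classes` playing the role of $N$ and `n_vals[i]` the role of $N_i$:

```python
def p_c_xi(data, c, i, xi, n_classes, n_vals):
    """Smoothed joint: (|D_{c,x_i}| + 1) / (|D| + N * N_i)."""
    d_c_xi = sum(1 for r in data if r[-1] == c and r[i] == xi)
    return (d_c_xi + 1) / (len(data) + n_classes * n_vals[i])

def p_xj_given_c_xi(data, c, i, xi, j, xj, n_vals):
    """Smoothed conditional: (|D_{c,x_i,x_j}| + 1) / (|D_{c,x_i}| + N_j)."""
    d_c_xi = sum(1 for r in data if r[-1] == c and r[i] == xi)
    d_c_xi_xj = sum(1 for r in data if r[-1] == c and r[i] == xi and r[j] == xj)
    return (d_c_xi_xj + 1) / (d_c_xi + n_vals[j])

def aode_score(x, c, data, n_classes, n_vals, m=1):
    """Unnormalized AODE posterior: sum over super-parents with |D_{x_i}| >= m'."""
    score = 0.0
    for i in range(len(x)):
        if sum(1 for r in data if r[i] == x[i]) < m:
            continue                                   # too few supporting samples
        term = p_c_xi(data, c, i, x[i], n_classes, n_vals)
        for j in range(len(x)):
            term *= p_xj_given_c_xi(data, c, i, x[i], j, x[j], n_vals)
        score += term
    return score

# All three attributes are binary here, and there are two classes.
print(aode_score((1, 1, 0), 1, data, n_classes=2, n_vals=[2, 2, 2]))
```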
2.4.4 TAN (Tree Augmented Naive Bayes)
- Introduction
Whether SPODE or AODE, although the super-parent varies, within each model every attribute depends on the same super-parent attribute. In reality, it is unlikely that all attributes depend on a single one; rather, each attribute may have a different dependency. TAN solves this problem by finding, for each attribute, the other attribute it is best suited to depend on.
- Principle
TAN builds the dependency structure on the basis of the maximum weighted spanning tree algorithm. It computes the conditional mutual information between every pair of attributes, and when selecting the dependent attribute for each attribute, it chooses the one with the maximum conditional mutual information (the value of the conditional mutual information measures the degree of interdependence between two attributes given the class); the definition is given below.
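For reference, a standard textbook definition of the conditional mutual information used as the edge weight is:
$I(x_i,x_j|y)=\sum_{x_i,x_j;\,c\in\mathcal{Y}}P(x_i,x_j|c)\log\frac{P(x_i,x_j|c)}{P(x_i|c)P(x_j|c)}$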
- Algorithm (the standard TAN procedure)
1) Compute the conditional mutual information $I(x_i,x_j|y)$ between every pair of attributes;
2) Build a complete undirected graph over the $d$ attributes, with $I(x_i,x_j|y)$ as edge weights;
3) Build the maximum weighted spanning tree, pick a root attribute, and orient every edge outward from the root;
4) Add the class node $y$ and a directed edge from $y$ to every attribute.
A counting sketch of step 1 follows below.
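Here is a minimal counting sketch of step 1, estimating the conditional mutual information for a pair of attributes exactly as in the formula above, again reusing the ten-row `data`; names are my own and smoothing is omitted:

```python
import math
from collections import Counter

def cond_mutual_info(data, i, j):
    """Estimate I(x_i, x_j | y): sum over observed (xi, xj, c) of
    P(xi, xj | c) * log( P(xi, xj | c) / (P(xi | c) * P(xj | c)) )."""
    joint = Counter((r[i], r[j], r[-1]) for r in data)  # counts of (xi, xj, c)
    c_cnt = Counter(r[-1] for r in data)                # counts of c
    ic = Counter((r[i], r[-1]) for r in data)           # counts of (xi, c)
    jc = Counter((r[j], r[-1]) for r in data)           # counts of (xj, c)
    cmi = 0.0
    for (xi, xj, c), cnt in joint.items():
        p_ij = cnt / c_cnt[c]                           # P(xi, xj | c)
        p_i = ic[(xi, c)] / c_cnt[c]                    # P(xi | c)
        p_j = jc[(xj, c)] / c_cnt[c]                    # P(xj | c)
        cmi += p_ij * math.log(p_ij / (p_i * p_j))
    return cmi

# Edge weights of the complete graph over the three attributes (step 2).
weights = {(i, j): cond_mutual_info(data, i, j)
           for i in range(3) for j in range(i + 1, 3)}
print(weights)
```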
3. Extensions of the Semi-Naive Bayesian Classifier
3.1 kDE (k-Dependent Estimator)
Since relaxing the attribute conditional independence assumption to a one-dependent assumption can improve generalization performance, can we consider higher-order dependencies among attributes (dependence on multiple attributes) to improve generalization further? This amounts to extending ODE to kDE. If the training data are very abundant, it can be considered.