[Reading Notes > Statistics] 12-01 Constructing Confidence Intervals: Introduction to the Concept of Confidence Intervals
2022-07-23 08:06:00 【Xiaoming 2766】
Confidence interval construction
Introduction
The previous chapter showed how to use a sample to produce point estimates of a population's mean, variance, or proportion. But can you be sure that the sample you drew gives an exactly accurate (unbiased) result? This chapter introduces another way to estimate population statistics, the confidence interval, which is designed for exactly this situation.
Mandy Candy Company used a sample of 100 gumballs to obtain a point estimate of 62.7 minutes for the mean flavor duration, and a point estimate of 25 (in minutes squared) for the population variance. This is the most reliable estimate of flavor duration available from the evidence at hand, but what if it is slightly off?
We did use the most representative sample we could to estimate the population's main statistics (mean, variance, proportion), so the point estimate of the mean flavor duration of the super-long-lasting gumballs is the best single estimate we can give.
But two questions remain:
- We rely on the results of a single sample to reach a very precise estimate. We do our best to make the sample unbiased and representative, but we cannot be absolutely sure that it represents the population 100%, for the simple reason that it is only a sample.
- If the sample is unbiased, the estimate is likely to be close to the true population value. The question is: how close is "close enough"?
So, instead of giving a single exact value as the estimate of the population mean, we can take another approach: specify an interval, rather than one precise duration, as the estimate of gumball flavor duration. For example, we could say that we expect the flavor to last between 55 and 65 minutes. Listeners still get the impression that the flavor lasts close to 1 hour, but there is more room for error. How wide to make that interval depends on how confident you want to be in the result.
Confidence interval
Getting to know confidence intervals
Earlier, we applied a point estimator to the sample data to estimate the mean flavor duration of the gumballs, which gave a single precise value for the mean. The following figure shows the distribution of flavor duration in the gumball sample.

So what happens if we specify an interval for the population mean? Instead of one exact value, we specify two values and expect the flavor duration of the gumballs to lie between them. We put the point estimate of the mean at the center of the interval, and set the lower and upper limits to the point estimate minus and plus some margin of error.

The limits are chosen so that the statement "the population mean lies between a and b" holds with a specified probability. For example, you might choose a and b so that the interval contains the population mean with probability 95%. That is, a and b are chosen so that:
$$P(a < \mu < b) = 0.95$$
We write this interval as (a, b). Because the exact values of a and b depend on how much confidence you want to have in the statement "this interval contains the population mean", (a, b) is called a confidence interval.
So how do we find a confidence interval for the population mean?
Four steps for finding a confidence interval
- Choose the population statistic (the population statistic for which you want to construct a confidence interval)
- Find its sampling distribution (sampling distributions were covered in the previous chapter)
- Decide on the confidence level (the probability that the interval you choose contains the statistic)
- Find the confidence limits (to find the lower and upper limits, we need the confidence level and the sampling distribution)
Step 1: Choose the population statistic
The first step is to choose the statistic for which the confidence interval is to be constructed. This depends on the actual problem you need to solve.
In our case, we need a confidence interval for the mean flavor duration of the gumballs, so we construct a confidence interval for the population mean $\mu$.
Step 2: Find the sampling distribution of the chosen statistic
To find the sampling distribution for the population mean, we need the sampling distribution of the mean; that is, we need to know $\overline{X}$ and its distribution. (This is the reverse of "the probability of the sample mean" in the previous chapter: this time we start from the sample mean, and what we want to find is the population mean and variance.)
Let's first find the expectation and variance. Recall from the previous chapter that the sampling distribution of the mean (the distribution formed by the means of all possible samples of a given size) has expectation and variance:
$$E(\overline{X}) = \mu, \qquad Var(\overline{X}) = \frac{\sigma^2}{n}$$
To use these results to find the confidence interval for $\mu$, we substitute values for the population variance $\sigma^2$ and the sample size n. We do not substitute a value for $\mu$, because $\mu$ is exactly the quantity whose confidence interval we are calculating. ($\mu$ is the population mean, and we are finding a confidence interval for it.)
The reason (it may become clearer later): we are using the sampling distribution to find a confidence interval for $\mu$, so we substitute values for everything except $\mu$. Once $\sigma^2$ and n are plugged in, we can use the distribution of $\overline{X}$ to find the confidence interval, as explained shortly.
But there is a problem: we don't know the true value of $\sigma^2$ and have to estimate it from the sample. What do we do?
-> Use a point estimator
Although we don't know the true value of the population variance $\sigma^2$, we can estimate it with its point estimator. So we substitute $\hat{\sigma}^2$ (the point estimator of the population variance; see the previous chapter), also written $s^2$, in place of $\sigma^2$. (In other words, $\hat{\sigma}^2$ stands in for the unknown $\sigma^2$.)
So the expectation and variance of the sampling distribution of the mean are:
$$E(\overline{X}) = \mu, \qquad Var(\overline{X}) = \frac{s^2}{n}$$
(Again: $s^2$ is the point estimator of the variance. We don't know the true population variance, so we estimate it with the sample variance.)
Mandy Candy Company computed its estimates from a sample of 100 gumballs and found $s^2 = 25$, so:
$$Var(\overline{X}) = \frac{s^2}{n} = \frac{25}{100} = 0.25$$
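As a quick check of the arithmetic, here is a minimal Python sketch. The variable names (`n`, `s2`, `var_xbar`, `se_xbar`) are chosen here for illustration only; the numbers are the ones from the gumball example.

```python
import math

n = 100     # sample size (from the example)
s2 = 25.0   # point estimate of the population variance (from the example)

# Variance of the sample mean: Var(X-bar) = s^2 / n
var_xbar = s2 / n

# Its square root is the standard deviation of X-bar, used when standardizing
se_xbar = math.sqrt(var_xbar)

print(var_xbar, se_xbar)  # prints 0.25 0.5
```

The value 0.5 is the denominator that appears when $\overline{X}$ is standardized in Step 4.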
Besides this, we also need to know exactly how $\overline{X}$ is distributed.

Step 3: Decide on the confidence level
The confidence level expresses how sure you want to be of the statement "the confidence interval contains the population statistic". For example, suppose we want a 95% confidence level for the population mean; this means the probability that the population mean lies in the confidence interval is 0.95.

Note: the higher the confidence level, the wider the interval, and the greater the probability that the confidence interval contains the population statistic.
A sensible confidence level keeps the probability high while keeping the interval narrow enough to be useful. Otherwise we could say, for example, that the mean flavor duration of a gumball is between 0 and 3 days, which tells you nothing about how long the flavor actually lasts.
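To see the trade-off between confidence level and width in numbers, the following sketch computes the margin on each side of the point estimate for three confidence levels. It assumes a normally distributed sample mean with standard deviation 0.5, as in the gumball example; the helper name `half_width` is hypothetical.

```python
from statistics import NormalDist

def half_width(level, se):
    """Margin on each side of the point estimate for a given confidence level.

    Assumes the sample mean is normally distributed with standard deviation se.
    """
    # Leave (1 - level) / 2 of the probability in each tail
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * se

se = 0.5  # standard deviation of the sample mean in the gumball example
widths = {level: half_width(level, se) for level in (0.90, 0.95, 0.99)}

for level, width in widths.items():
    print(f"{level:.0%} confidence: point estimate +/- {width:.2f} minutes")
```

The margin grows as the confidence level rises, which is why demanding near-certainty produces intervals too wide to be informative.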
Step 4: Find the confidence limits
The last step is to find a and b, the lower and upper limits of the confidence interval. They mark the left and right ends of a range that contains the mean with probability 95%. The exact values of a and b depend on the sampling distribution being used and the confidence level required.
In our example, we want 95% confidence for the mean flavor duration of the gumballs; that is, the probability that $\mu$ lies between a and b must be 0.95. We also know that $\overline{X}$ follows a normal distribution, with $\overline{X} \sim N(\mu, 0.25)$.

We can use the distribution of $\overline{X}$ to find a and b. That is, we use $\overline{X} \sim N(\mu, 0.25)$ to find a and b such that $P(\overline{X} < a) = 0.025$ and $P(\overline{X} > b) = 0.025$.
Because $\overline{X}$ follows a normal distribution, we can use the normal distribution to find the confidence interval. The method is the same as before: compute the standard score, look it up in the standard normal probability table, and read off the result.
1. Find Z
Standardize $\overline{X}$.
$$Z = \frac{\overline{X} - \mu}{\sqrt{0.25}}, \quad \text{where } Z \sim N(0, 1)$$
The following figure shows the standardized confidence interval:

Using $P(Z < z_a) = 0.025$ and $P(Z > z_b) = 0.025$, we can find $z_a$ and $z_b$, the lower and upper limits of the standardized confidence interval. From the standard normal table, $z_a = -1.96$ and $z_b = 1.96$.
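Instead of a probability table, the two limits can also be looked up in code. This is a small sketch using `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)
tail = 0.025  # probability left in each tail for a 95% confidence level

# inv_cdf(p) returns the value z with P(Z < z) = p
z_a = standard_normal.inv_cdf(tail)      # lower limit: P(Z < z_a) = 0.025
z_b = standard_normal.inv_cdf(1 - tail)  # upper limit: P(Z > z_b) = 0.025

print(round(z_a, 2), round(z_b, 2))  # prints -1.96 1.96
```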

2. Rewrite the inequality in terms of $\mu$
So far we have $P(-1.96 < Z < 1.96) = 0.95$, that is:
$$P\left(-1.96 < \frac{\overline{X} - \mu}{0.5} < 1.96\right) = 0.95$$
Rewriting the inequality in terms of $\mu$ gives the confidence interval for $\mu$.
$$-1.96 < \frac{\overline{X} - \mu}{0.5} < 1.96$$
$$-0.98 < \overline{X} - \mu < 0.98$$
$$\overline{X} - 0.98 < \mu < \overline{X} + 0.98$$
3. Finally, find the value of $\overline{X}$
With the inequality rewritten, we are very close to a confidence interval for $\mu$, the value that describes the typical flavor duration of the gumballs. That is, we use:
$$P(\overline{X} - 0.98 < \mu < \overline{X} + 0.98) = 0.95$$
Here is a sketch:

All that remains is to find the value of $\overline{X}$, and we have the confidence limits.
$\overline{X}$ refers to the distribution of sample means, so we can use the value of $\overline{x}$ (the sample mean) from Mandy Candy Company's sample.
Tip: this step is not a point-estimator substitution for $\mu$; see the first question in the Q&A below for the reason. The $\sigma^2$ in $Var(\overline{X}) = \frac{\sigma^2}{n}$, by contrast, is the population variance, so it was replaced by its point estimator $s^2$.

Substituting $\overline{x} = 62.7$ gives the confidence interval (62.7 - 0.98, 62.7 + 0.98) = (61.72, 63.68). The interval (61.72, 63.68) contains the population mean flavor duration of the gumballs with probability 95%.
Using a confidence interval instead of a point estimate gives a reliable estimate of gumball flavor duration without committing to a single precise figure, so there is still room for maneuver even if the sample is slightly off.
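The whole calculation can be condensed into a few lines. The sketch below follows the same approach as the text (a normal sampling distribution with $s^2$ plugged in for the unknown variance); the function name `mean_confidence_interval` and its parameters are hypothetical, not taken from the original notes.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(xbar, s2, n, level=0.95):
    """Confidence interval for the population mean.

    Assumes the sample mean is normally distributed and uses the point
    estimate s2 in place of the unknown population variance.
    """
    se = math.sqrt(s2 / n)                      # standard deviation of X-bar
    c = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level = 0.95
    margin = c * se
    return xbar - margin, xbar + margin

# Numbers from the gumball example: x-bar = 62.7, s^2 = 25, n = 100
lower, upper = mean_confidence_interval(xbar=62.7, s2=25, n=100)
print(round(lower, 2), round(upper, 2))  # prints 61.72 63.68
```

Passing a different `level` changes only the multiplier `c`, which is why the steps can later be reduced to a substitution.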
Step summary
Let's review the steps we followed to construct the confidence interval.
First, choose the population statistic for which to construct the confidence interval. We wanted a confidence interval for the mean flavor duration of the gumballs, so we needed to construct a confidence interval for $\mu$.
After choosing the population statistic, find its sampling distribution. We found the expectation and variance of the sampling distribution of the mean, substituted values for every statistic except $\mu$, and found that we could use the normal distribution.
Then we decided on the confidence level for the interval: 95%.
Finally, find the confidence limits. We used the confidence level and the sampling distribution to obtain the appropriate confidence interval.
Because the same steps are used every time a confidence interval is constructed, they can be simplified into shortcuts that depend on the required confidence level and the distribution of the test statistic. As shown below, you only need to identify the population statistic, the population distribution, and the conditions, and then substitute the population statistics or their estimators. The number c depends on the confidence level.

The example above corresponds to case 3.
Example (just substitute the values)

Q: Earlier, when finding the expectation and variance of $\overline{X}$, why did we substitute the point estimator for $\sigma^2$ but not the point estimator for $\mu$?
A: Because what we want is a confidence interval for $\mu$, we must not replace $\mu$ with $\overline{x}$. We need an expression that contains $\mu$ in order to find the confidence interval.
Q: Why is $\overline{x}$ used as the value of $\overline{X}$?
A: The distribution of $\overline{X}$ is the sampling distribution of the mean. It arises as follows: take every possible sample of size n from the population, then use the means of all those samples to form a distribution.
$\overline{x}$ is the specific mean of the sample we actually have, so we use it to find the confidence interval.
Q: What is the difference between a confidence interval and a confidence level?
A: The confidence level is the probability that the statistic lies in the confidence interval, usually given as a percentage such as 95%. The confidence interval is the interval itself: the lower and upper limits of the actual range of numbers.
Q: We found that the 95% confidence interval for $\mu$ is (61.72, 63.68). What exactly does that mean?
A: It means that if you were to take many samples of the same size and construct a confidence interval from each of them, 95% of those intervals would contain the true population mean. In other words, a confidence interval constructed by this method contains the population mean in 95% of cases.
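This repeated-sampling reading can be checked with a small simulation. The sketch below draws many samples from a hypothetical normal population (the true mean 62.0 and standard deviation 5.0 are assumptions made up for the demonstration, not values from the notes), builds a 95% interval from each sample with the method described above, and counts how often the interval contains the true mean.

```python
import math
import random
from statistics import NormalDist

def interval_from_sample(sample, level=0.95):
    """Normal-based confidence interval for the mean, built from one sample."""
    n = len(sample)
    xbar = math.fsum(sample) / n
    # Point estimator of the population variance (divides by n - 1)
    s2 = math.fsum((x - xbar) ** 2 for x in sample) / (n - 1)
    margin = NormalDist().inv_cdf(0.5 + level / 2) * math.sqrt(s2 / n)
    return xbar - margin, xbar + margin

def coverage(true_mean, true_sd, n, trials, seed=0):
    """Share of simulated intervals that contain the true population mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        lower, upper = interval_from_sample(sample)
        if lower < true_mean < upper:
            hits += 1
    return hits / trials

# Hypothetical population, chosen only for illustration
share = coverage(true_mean=62.0, true_sd=5.0, n=100, trials=2000)
print(f"{share:.1%} of the intervals contain the true mean")
```

The printed share should land close to 95%, though not exactly on it, since the simulation is itself a random sample of intervals.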
Q: Are all confidence intervals based on the normal distribution?
A: No. Intervals based on other distributions are covered next.
Q: If it is just a matter of substituting values into a simple formula, why go through so many steps?
A: The steps are there to show you what is really going on and how a confidence interval is constructed. Most of the time, you just need to plug in the values.
Q: Is a continuity correction needed when using confidence intervals?
A: In theory, yes, but in practice it is usually ignored; you simply substitute the values into the shortcut formula to compute the confidence interval.