Latest revision as of 08:01, 10 November 2014
Scientific Methods for Health Sciences - Point and Interval Estimation: MoM and MLE
Overview
The estimation of population parameters is critical in many applications. In statistics, estimation uses a combination of effect sizes, confidence intervals, and meta-analysis to plan experiments, analyze data, and interpret results. It is most frequently carried out in terms of point estimates or interval estimates for population parameters of interest. This lesson studies the methodologies commonly used for point and interval estimation, such as the Method of Moments (MOM) and Maximum Likelihood Estimation (MLE). We are interested in methods of estimating population parameters based on a sample distribution, and we illustrate point and interval estimation of population means, proportions, and variances using methods introduced in this class.
Motivation
Suppose we want to estimate the probability of a head occurring when we flip a specific coin by repeating the experiment several times. How much confidence do we have in our estimation? There are a number of other similar situations where we need to evaluate, predict or estimate a population parameter of interest using an observed data sample. The method of moments (MOM) and maximum likelihood estimation (MLE) are among the most commonly used methods to estimate various population parameters.
In point and interval estimation, not only do we need to consider the distribution and model on which the estimates are based, but we also need to make assumptions about the population distribution. Additionally, the estimates of parameters are influenced by other population characteristics. For example, the estimate of the population mean is influenced by the population variance and by the sample size.
A confidence interval is an interval that contains the true value of a parameter of interest for $(1-\alpha)100\%$ of samples taken. It is called a $(1-\alpha)100\%$ confidence interval for that parameter, and the ends of the CI are called confidence limits.
Theory
Method of Moments (MOM) Estimation
This method uses the sample data to calculate some sample moments and then sets these equal to their corresponding population counterparts. Steps:
- Determine the $k$ parameters of interest and the specific distribution for this process;
- Compute the first $k$ (or more) sample moments;
- Set the sample moments equal to the population moments and solve for a (linear or non-linear) system of $k$ equations with $k$ unknowns.
- MOM proportion example: Consider the example of observing the clinical outcomes (survival or death) of 10 patients undergoing an experimental treatment for terminally ill cancer patients. We can record the outcomes as $S$ (1-month survival) and $D$ (death within 1 month), and we need to compute (infer) the true probability of $S$, i.e., $p=P(S)$. Suppose we observe the outcomes $\{S,D,D,D,D,S,S,D,S,D\}$. In the MOM framework, this is a Binomial experiment and the expected number of patients surviving 1 month would be $E[X]=np$, where $X=\sum_{i=1}^{10}{X_i}$ and
$$X_i=\begin{cases} 1, & \text{if } i^{th} \text{ subject is alive after 1 month of treatment} \\ 0, & \text{if } i^{th} \text{ subject is dead (did not survive) after 1 month of treatment} \end{cases}.$$ Setting the observed count equal to its expectation gives $np=4$ (4 patients survived the month); with $n=10$, $MOM(p)=\hat{p}=\frac{4}{10}$.
- MOM Beta distribution example: Suppose we have 10 observations we suspect came from a Beta distribution.
Data | 0.055 | 1.005 | 0.075 | 0.005 | 0.075 | 1.005 | 0.005 | 0.035 | 0.225 |
The beta distribution mean and variance are defined explicitly in terms of two parameters.
- Mean: $\mu=\frac{\alpha}{\alpha+\beta}$,
- Variance: $\sigma^2=\frac{\alpha \beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$.
The sample mean and sample variance are $\bar{x} = 0.251$, and $s^2=0.6187$. Solve for $\alpha$ and $\beta$.
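Inverting the two moment equations above gives closed-form MOM estimates, which can be sketched in code. This is a minimal sketch (the helper name `beta_mom` is ours, not a library function). Note that for the quoted $\bar{x} = 0.251$ and $s^2=0.6187$ the inversion yields negative estimates, since a Beta distribution requires $s^2 < \bar{x}(1-\bar{x})$; this is an instance of MOM estimates falling outside the parameter space, a caveat noted later in this lesson. The illustration therefore uses hypothetical valid sample moments.

```python
def beta_mom(mean, var):
    """Method-of-moments estimates (alpha, beta) for a Beta distribution,
    obtained by inverting mu = a/(a+b) and
    sigma^2 = a*b / ((a+b)^2 * (a+b+1))."""
    common = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * common, (1.0 - mean) * common

# Illustration with hypothetical valid sample moments (not the table above):
a, b = beta_mom(0.25, 0.01)

# Sanity check: the fitted distribution reproduces the input moments.
fitted_mean = a / (a + b)
fitted_var = a * b / ((a + b) ** 2 * (a + b + 1))
```

For $\bar{x}=0.25$, $s^2=0.01$ this yields $\alpha=4.4375$ and $\beta=13.3125$.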
Maximum likelihood estimation (MLE)
Modeling the parameters of a distribution using MLE based on observed real-world data offers a way to tune the free parameters of a model to provide an optimal fit.
Suppose we observe a sample, $x_1,x_2,\ldots,x_n$, of $n$ values from one distribution with probability density or mass function $f_\theta$, and we are trying to estimate the parameter $\theta$. We can compute the (multivariate) probability density associated with our observed data, $f_\theta (x_1,x_2,\ldots,x_n|\theta)$, as a function of $\theta$ with $x_1,x_2,\ldots,x_n$ fixed. The likelihood function is $$L(\theta)=f_\theta (x_1,x_2,\ldots,x_n|\theta).$$
The MLE of $\theta$ is the value of $\theta$ that maximizes $L(\theta)$: $\arg\max_\theta{L(\theta)}.$
It is typically assumed that the observed data are independent and identically distributed (iid) with unknown parameter $\theta$. The likelihood can then be written as a product of $n$ univariate probability densities: $L(\theta)=\prod_{i=1}^n {f_\theta (x_i |\theta)}$. Because maxima are unaffected by monotone transformations, one can take the logarithm of this expression and turn it into a sum: $L^* (\theta)=\sum_{i=1}^n {\ln{f_\theta (x_i |\theta)}}$. The maximum of this expression can then be found numerically using various optimization algorithms.
- Note: The MLE may not be unique and is not guaranteed to exist.
- Example: Consider the coin-flipping example above, in which we observed 4 heads in 10 flips, and use this to infer the true probability of heads, $p=P(Head)$.
- Likelihood function: $L(\theta)=f(x│\theta=p)={10 \choose 4} p^4 (1-p)^6$
- Log-likelihood function: $L^* (\theta)=\ln{10 \choose 4} + 4\ln{p} + 6\ln{(1-p)}$.
- Maximize the log-likelihood function by setting its first derivative to zero:
$$ 0=\frac{d\left(\ln{10 \choose 4} + 4\ln{p} + 6\ln{(1-p)}\right)}{dp} =\frac{4}{p}-\frac{6}{1-p}, \quad \text{so } p=\frac{2}{5}.$$
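As the text notes, the maximum can also be found numerically. A minimal sketch (assuming SciPy is available) applies a bounded scalar minimizer to the negative log-likelihood of this coin example:

```python
from math import comb, log

from scipy.optimize import minimize_scalar

def neg_log_lik(p):
    # Negative of L*(p) = ln C(10,4) + 4 ln p + 6 ln(1 - p)
    return -(log(comb(10, 4)) + 4 * log(p) + 6 * log(1 - p))

# Bounded scalar minimization over the open interval (0, 1).
res = minimize_scalar(neg_log_lik, bounds=(1e-9, 1 - 1e-9), method="bounded")
p_hat = res.x  # numerically ~0.4, matching the closed-form solution p = 2/5
```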
Additional MoM and MLE Examples
See this section for additional MoM and MLE examples.
MOM vs. MLE
- The MOM is inferior to Fisher’s MLE method because MLE has a higher probability of being close to the quantities to be estimated.
- MLE may be intractable in some situations, whereas the MOM estimates can be quickly and easily calculated by hand or using a computer.
- MOM estimates may be used as first approximations to the solutions of the MLE method, and successive improved approximations may then be found by the Newton-Raphson method. In this respect, the MOM and MLE are symbiotic.
- Sometimes, MOM estimates may be outside of the parameter space, i.e., they are unreliable, which is never a problem with ML methods.
- MOM estimates are not necessarily sufficient statistics, i.e., they sometimes fail to take into account all relevant information in the sample.
- MOM may be preferred to MLE for estimating some structural parameters when appropriate probability distributions are unknown.
Expectation Maximization and Mixture Modeling
See the SOCR 1D, 2D, 3D EM Activity and this EM and Mixture Modeling Guide.
Student’s T Distribution
Student's T distribution is the distribution needed to estimate the mean of a normally distributed population when the sample size is small and the population variance is unknown. It is the basis of the popular Student's t-tests for the statistical significance of the difference between two sample means, and for confidence intervals for the difference between two population means.
Suppose $X_1,X_2,\ldots,X_n$ are independent random variables that are normally distributed with expected value $\mu$ and variance $\sigma^2$. The sample mean is $\bar{x}_n = \frac{1}{n} \sum_{i=1}^n{x_i}$ and the sample variance is $S_n^2=\frac{1}{n-1} \sum_{i=1}^n{(x_i-\bar{x})^2}$. Since the sample mean $\bar{x}_n$ is normally distributed with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, the statistic $$Z=\frac{\bar{x}_n-\mu}{\frac{\sigma}{\sqrt{n}}}$$ is normally distributed with mean 0 and variance 1. Replacing $\sigma$ with the sample standard deviation yields $$T=\frac{\bar{x}_n-\mu}{\frac{S_n}{\sqrt{n}}},$$ which follows a T distribution. Also, $(n-1) \frac{S_n^2}{\sigma^2}$ has a Chi-square distribution $\chi_{n-1}^2$ with $n-1$ degrees of freedom.
- Example: suppose a study involves 25 patients whose CD4 measurements are recorded, with the following summary statistics:
Variable | N | N* | Mean | SE of Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
CD4 | 25 | 0 | 321.4 | 14.8 | 73.8 | 208.0 | 261.5 | 325.0 | 394.0 | 449.0 |
What do we know from the background information?
- $\bar{y}= 321.4$
- $s = 73.8$
- $SE = 14.8$
- $n = 25$
- $CI(\alpha)=CI(0.05)$: $\bar{y} \pm t_{\alpha\over 2} {1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{\frac{(x_i-\bar{x})^2}{n-1}}}.$
- $321.4 \pm t_{(24, 0.025)}{73.8\over \sqrt{25}}$
- $321.4 \pm 2.064\times 14.8$
- $[290.85, 351.95]$
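The CD4 interval above can be reproduced from the summary statistics using SciPy's T-distribution critical values (a sketch, assuming SciPy is available):

```python
from scipy.stats import t

n, ybar, se = 25, 321.4, 14.8           # summary statistics from the CD4 table
t_crit = t.ppf(1 - 0.05 / 2, df=n - 1)  # two-sided 95% critical value, df = 24

lower, upper = ybar - t_crit * se, ybar + t_crit * se  # ~[290.85, 351.95]
```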
Estimating a population mean with large samples
We use the following protocol to find point and interval estimates when the sample sizes are large, say exceeding 100.
- Assumptions: The Central Limit Theorem guarantees that for large samples, the method above provides a valid recipe for constructing a confidence interval for the population mean, no matter what the distribution for the observed data may be. Of course, for significantly non-Normal distributions, we may need to increase the sample size to guarantee that the sampling distribution of the mean is approximately Normal.
- Point estimation of population mean: $\bar{X_n}={1\over n}\sum_{i=1}^n{X_i}$, constructed from a random sample of the process {$X_1, X_2, X_3, \cdots , X_n$}, which is an unbiased estimate of the population mean $\mu$, if it exists! Note that the sample average may be susceptible to outliers.
- Interval estimation of a population mean: Choose a confidence level \((1-\alpha)100%\), where \(\alpha\) is small (e.g., 0.1, 0.05, 0.025, 0.01, 0.001, etc.). Then a \((1-\alpha)100%\) confidence interval for \(\mu\) will be
\[CI(\alpha): \overline{x} \pm z_{\alpha\over 2} E,\]
- The Error term, E, is defined as
- \[E = \begin{cases}{\sigma\over\sqrt{n}}, & \text{for known } \sigma,\\ {SE}, & \text{for unknown } \sigma.\end{cases}\]
- The Standard Error of the estimated \(\overline {x}\) is obtained by replacing the unknown population standard deviation by the sample standard deviation: \[SE(\overline {x}) = {1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{(x_i-\overline{x})^2\over n-1}}\]
- \(z_{\alpha\over 2}\) is the Critical Value for a Standard Normal distribution at \({\alpha\over 2}\).
- Example: a random sample of the number of sentences found in 30 magazine advertisements is listed below. Use this sample to find a point estimate for the population mean \(\mu\). Samples: 16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, 12. Suppose the point estimate is 12.25.
A confidence interval estimate of μ is a range of values used to estimate a population parameter.
- Known variance: Suppose the variance for the number of sentences per advertisement in the example above is known to be 256 (so the population standard deviation is \(\sigma=16\)).
- For \({\alpha \over 2}=0.1\), the \(80% CI(\mu)\) is constructed by:
- For \({\alpha \over 2}=0.05\), the \(90% CI(\mu)\) is constructed by:
- For \({\alpha \over 2}=0.005\), the \(99% CI(\mu)\) is constructed by:
- Notice the increase in the CIs' widths (directly related to the decrease of \(\alpha\)), reflecting our choice of higher confidence.
- Unknown variance: use the sample variance 273 as an estimate (so the sample standard deviation is \(s=\hat{\sigma}=16.54\)).
- For \({\alpha \over 2}=0.1\), the \(80% CI(\mu)\) is constructed by:
- For \({\alpha \over 2}=0.05\), the \(90% CI(\mu)\) is constructed by:
- For \({\alpha \over 2}=0.005\), the \(99% CI(\mu)\) is constructed by:
- Notice the increase in the CIs' widths (directly related to the decrease of \(\alpha\)), reflecting our choice of higher confidence.
- You can use the SOCR CI Analysis Applet to compute these interval estimates.
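The large-sample recipe above can be sketched as a small helper (the function name `z_ci` is ours; SciPy is assumed for the normal critical values), applied here with the example's stated point estimate 12.25 and known \(\sigma=16\), \(n=30\):

```python
from math import sqrt

from scipy.stats import norm

def z_ci(xbar, sigma, n, conf):
    """Large-sample (1-alpha)100% CI for the mean with known sigma."""
    z = norm.ppf(1 - (1 - conf) / 2)  # critical value z_{alpha/2}
    e = z * sigma / sqrt(n)           # error term E = sigma / sqrt(n) scaled by z
    return xbar - e, xbar + e

# The 80%, 90%, and 99% intervals from the example's stated values:
intervals = {conf: z_ci(12.25, 16, 30, conf) for conf in (0.80, 0.90, 0.99)}
```

The intervals are nested: the 99% interval is the widest, matching the remark that the CIs widen as \(\alpha\) decreases.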
Estimating a population mean with small samples (say <30 observations)
For small samples, the point estimates are less precise and the interval estimates produce wider intervals, compared to the case of large samples.
- Assumptions: we need evidence that the data we observed and used for point and interval estimates come from a distribution which is (approximately) normal. If this assumption is violated, then the interval estimate introduced below may significantly misrepresent the real confidence interval.
- Interval estimation of population mean: Choose a confidence level \((1-\alpha)100%\), where \(\alpha\) is small (e.g., 0.1, 0.05, 0.025, 0.01, 0.001, etc.). Then a \((1-\alpha)100%\) confidence interval for \(\mu\) is defined in terms of the T-distribution:
- \[CI(\alpha): \overline{x} \pm t_{\alpha\over 2} E.\]
- The Error term, E, is defined as \(E = \begin{cases}{\sigma\over\sqrt{n}}, & \text{for known } \sigma,\\ {SE}, & \text{for unknown } \sigma.\end{cases}\)
- The Standard Error of the estimate \(\overline {x}\) is obtained by replacing the unknown population standard deviation by the sample standard deviation: \[SE(\overline {x}) = {1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{(x_i-\overline{x})^2\over n-1}}\]
- \(t_{\alpha\over 2}\) is the Critical Value for the T(df = n-1) distribution at \({\alpha\over 2}\).
- Example: a random sample of the number of sentences found in 10 magazine advertisements is listed below. Use this sample to find a point estimate for the population mean \(\mu\). Samples: 16, 9, 14, 11, 17, 12, 99, 18, 13, 12. The point estimate (sample average) is \(221/10 = 22.1\).
- Known variance: Suppose the variance for the number of sentences per advertisement in the example above is known to be 256 (so the population standard deviation is \(\sigma=16\)).
- For \({\alpha \over 2}=0.1\), the \(80% CI(\mu)\) is constructed by:
- For \({\alpha \over 2}=0.05\), the \(90% CI(\mu)\) is constructed by:
- For \({\alpha \over 2}=0.005\), the \(99% CI(\mu)\) is constructed by:
- Notice the increase in the CIs' widths (directly related to the decrease of \(\alpha\)), reflecting our choice of higher confidence.
- Unknown variance: Suppose that we do not know the variance for the number of sentences per advertisement but use the sample variance 737.88 as an estimate (so the sample standard deviation is \(s=\hat{\sigma}=27.16390579\)).
- For \({\alpha \over 2}=0.1\), the \(80% CI(\mu)\) is constructed by:
- For \({\alpha \over 2}=0.05\), the \(90% CI(\mu)\) is constructed by:
- For \({\alpha \over 2}=0.005\), the \(99% CI(\mu)\) is constructed by:
- Notice the increase in the CIs' widths (directly related to the decrease of \(\alpha\)), reflecting our choice of higher confidence.
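The small-sample, unknown-variance recipe can be sketched on the 10 advertisement counts (a sketch assuming SciPy for the T critical value; `statistics.stdev` uses the \(n-1\) denominator, matching the SE formula above):

```python
import statistics
from math import sqrt

from scipy.stats import t

data = [16, 9, 14, 11, 17, 12, 99, 18, 13, 12]  # sentences per ad, n = 10

n = len(data)
xbar = statistics.mean(data)  # 22.1, the point estimate
s = statistics.stdev(data)    # ~27.16, sample SD with n-1 in the denominator

# 90% CI for mu using the T(df = 9) critical value (unknown-sigma case):
t_crit = t.ppf(1 - 0.10 / 2, df=n - 1)
e = t_crit * s / sqrt(n)
lower, upper = xbar - e, xbar + e
```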
Estimating a population proportion
When the sample size is large, the sampling distribution of the sample proportion \(\hat{p}\) is approximately Normal, by the CLT, as the sample proportion may be represented as a sample average of Bernoulli random variables. When the sample size is small, the normal approximation may be inadequate. To accommodate this, we modify the sample proportion \(\hat{p}\) slightly and obtain the corrected sample proportion \(\tilde{p}\): \[\hat{p}={y\over n} \longrightarrow \tilde{p}={y+0.5z_{\alpha \over 2}^2 \over n+z_{\alpha \over 2}^2},\] where \(z_{\alpha \over 2}\) is the normal critical value we saw earlier.
The standard error of \(\hat{p}\) also needs a slight modification \[SE_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\over n} \longrightarrow SE_{\tilde{p}} = \sqrt{\tilde{p}(1-\tilde{p})\over n+z_{\alpha \over 2}^2}.\]
- Example: Suppose a researcher is interested in studying the effect of aspirin in reducing heart attacks. He randomly recruits 500 subjects with evidence of early heart disease and has them take one aspirin daily for two years. At the end of the two years, he finds that during the study only 17 subjects had a heart attack. Calculate a 95% (\(\alpha=0.05\)) confidence interval for the true (unknown) proportion of subjects with early heart disease that have a heart attack while taking aspirin daily. Note that \(z_{\alpha \over 2} = z_{0.025}=1.96\):
- \[\hat{p} = {17\over 500}=0.034;\] \(\tilde{p} = {17+0.5z_{0.025}^2\over 500+z_{0.025}^2} = {17+1.92\over 500+3.84}=0.038\)
- \[SE_{\hat{p}}= \sqrt{0.034(1-0.034)\over 500}=0.0081;\] \(SE_{\tilde{p}}= \sqrt{0.038(1-0.038)\over 500+3.84}=0.0085\)
- And the corresponding confidence intervals are given by
- \[\hat{p}\pm 1.96 SE_{\hat{p}}=[0.0181, 0.0499]\]
- \[\tilde{p}\pm 1.96 SE_{\tilde{p}}=[0.0213, 0.0547]\]
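Both interval computations can be sketched directly (SciPy assumed for the normal quantile; small differences from hand-rounded intermediate values are expected):

```python
from math import sqrt

from scipy.stats import norm

y, n = 17, 500
z = norm.ppf(0.975)  # ~1.96

# Standard (Wald) interval based on p-hat:
p_hat = y / n
se_hat = sqrt(p_hat * (1 - p_hat) / n)
wald = (p_hat - z * se_hat, p_hat + z * se_hat)

# Small-sample corrected interval based on p-tilde:
p_tilde = (y + 0.5 * z**2) / (n + z**2)
se_tilde = sqrt(p_tilde * (1 - p_tilde) / (n + z**2))
corrected = (p_tilde - z * se_tilde, p_tilde + z * se_tilde)
```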
Estimating population variance
The unbiased point estimate for the population variance \(\sigma^2\) is the sample variance \(s^2\), and the point estimate for the population standard deviation \(\sigma\) is the sample standard deviation \(s\).
We use a Chi-Square Distribution to construct confidence intervals for the variance and standard deviation. If the process or phenomenon we study generates a Normal random variable, then the following random variable (for a sample of size \(n>1\)) has a Chi-Square Distribution: \[\chi_o^2 = {(n-1)s^2 \over \sigma^2}\]
- Chi-Square Distribution Properties
- All chi-squares values \(\chi_o^2 \geq 0\).
- The chi-square distribution is a family of curves, each determined by the degrees of freedom (n-1). See the interactive Chi-Square distribution.
- To form a confidence interval for the variance (\(\sigma^2\)), use the \(\chi^2(df=n-1)\) distribution with degrees of freedom equal to one less than the sample size.
- The area under each curve of the Chi-Square Distribution equals one.
- All Chi-Square Distributions are positively skewed.
- Interval Estimates of Population Variance and Standard Deviation:
- Notice that the Chi-Square Distribution is not symmetric (positively skewed) and therefore, there are two critical values for each level of confidence. The value \(\chi_L^2\) represents the left-tail critical value and \(\chi_R^2\) represents the right-tail critical value. For various degrees of freedom and areas, you can compute all critical values either using the SOCR Distributions or using the SOCR Chi-square Distribution Calculator.
- Example: Find the critical values, \(\chi_L^2\) and \(\chi_R^2\), for a 90% confidence interval when the sample size is 25. Use the following Protocol:
- Identify the degrees of freedom (\(df=n-1=24\)) and the level of confidence (\({\alpha\over 2}=0.05\)).
- Find the left and right critical values, \(\chi_L^2=13.848\) and \(\chi_R^2=36.415\), using the SOCR Chi-square Distribution Calculator.
- Confidence Interval for \(\sigma^2\)
- \[{(n-1)s^2 \over \chi_R^2} \leq \sigma^2 \leq {(n-1)s^2 \over \chi_L^2}\]
- Confidence Interval for \(\sigma\)
- \[\sqrt{(n-1)s^2 \over \chi_R^2} \leq \sigma \leq \sqrt{(n-1)s^2 \over \chi_L^2}\]
Hands-on Activity
Construct the confidence intervals for \(\sigma^2\) and \(\sigma\) assuming the observations below represent a random sample from the liquid content (in fluid ounces) of 16 beverage cans and can be considered as Normally distributed. Use a 90% level of confidence.
14.816 | 14.863 | 14.814 | 14.998 | 14.965 | 14.824 | 14.884 | 14.838 | 14.916 | 15.021 | 14.874 | 14.856 | 14.860 | 14.772 | 14.980 | 14.919 |
- Get the sample statistics from SOCR Charts (e.g., Index Plot); Sample-Mean=14.8875; Sample-SD=0.072700298, Sample-Var=0.005285333.
- Identify the degrees of freedom (\(df=n-1=15\)) and the level of confidence (\({\alpha/2}=0.05\)), as we are looking for a \((1-\alpha)100% CI(\sigma^2)\).
- Find the left and right critical values, \(\chi_L^2=7.261\) and \(\chi_R^2=24.9958\), using the SOCR Chi-Square Distribution.
- CI(\(\sigma^2\))
\[0.00318={15\times 0.0053 \over 24.9958} \leq \sigma^2 \leq {15\times 0.0053 \over 7.261}=0.01095\]
- CI(\(\sigma\))
\[0.0564=\sqrt{15\times 0.0053 \over 24.9958} \leq \sigma \leq \sqrt{15\times 0.0053 \over 7.261}=0.10464\]
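The hands-on computation can be checked with SciPy's Chi-Square quantile function (a sketch, assuming SciPy; the results match the hand computation up to rounding):

```python
from math import sqrt

from scipy.stats import chi2

n, s2 = 16, 0.005285333     # sample size and sample variance from above
df = n - 1

chi_l = chi2.ppf(0.05, df)  # left critical value, ~7.261
chi_r = chi2.ppf(0.95, df)  # right critical value, ~24.996

# 90% CI for the variance, then take square roots for the SD interval:
var_lo, var_hi = df * s2 / chi_r, df * s2 / chi_l
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)
```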
Applications
- This article, titled Reliability of Scales with General Structure: Point and Interval Estimation Using a Structural Equation, discusses a method of obtaining point and interval estimates of reliability for composites of measures with a general structure. The approach is based on fitting a correspondingly constrained structural equation model and generalizes earlier covariance structure analysis methods for scale reliability estimation with congeneric tests. The procedure can be used with weighted or unweighted composites, in which the weights need not be known in advance but may be estimated simultaneously. The method presented in the paper allows one to obtain an approximate standard error and confidence interval for scale reliability using the bootstrap.
- This activity shows Normal and Beta distribution model fitting. It describes the process of SOCR model fitting using Normal or Beta distribution models. The activity aims to motivate the need for analytical modeling of natural processes, illustrates how to use the SOCR Modeler to fit models to real data, and presents applications of model fitting. It provides specific examples illustrating model fitting and two exercises for practice.
- This experiment shows a SOCR activity on general confidence intervals and demonstrates the usage and functionality of the SOCR general confidence interval applet. It demonstrates the theory behind interval-based estimates of parameters, illustrates various confidence interval construction recipes, draws parallels between the construction algorithms and the intuitive meaning of confidence intervals, and presents a technology-enhanced approach for understanding and utilizing confidence intervals in various applications. The article presents specific examples and exercises and serves as a good supplement to point and interval estimation.
Software
Problems
- Tom is in charge of sampling sugar measurements from a very large population of sugar. Lately his standard errors have been alarmingly high for his sample means. If he wants to decrease his sampling error (the standard deviation of his sample means) by 1/2, what should he do?
- (a) Quadruple the variation inherent in the population.
- (b) Triple his sample size.
- (c) Quadruple his sample size.
- (d) Halve his sample size.
- The average standardized math score for eighth graders in the state of Michigan is 70 and the standard deviation is 10. We want to find out if the average standardized math score in district A is higher than the average score for the state of Michigan. The mean for a random sample of 36 students from this district is 72. What is the best response?
- (a) The p-value is around 0.76 and it is concluded that the average standardized math score in this district is not different from the overall population mean.
- (b) The p-value is around 0.12 and it is concluded that the average standardized math score in this district is not higher than the overall population mean.
- (c) The p-value is around 0.24 and it is concluded that the average standardized math score in this district is not higher than the overall population mean.
- (d) The p-value is around 0.88 and it is concluded that the average standardized math score in this district is not higher than the overall population mean.
- A random sample of 121 students from UMich was selected to estimate the average ACT score of all UMich students. The average for the sample was 23.4 and the sample standard deviation was 3.65. If you wanted to calculate a more precise and accurate estimate of the average ACT score of UMich students, which one of the following would be the best thing to do?
- (a) Decrease the sample size to 91.
- (b) Increase the sample size to 151.
- (c) Increase the confidence level to 99%.
- (d) Decrease the confidence level to 90%.
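The trade-offs can be made concrete by computing the margin of error $z^* s/\sqrt{n}$ for each option (a sketch using the normal critical value in place of $t$, which is reasonable since $n$ is large):

```python
from math import sqrt
from statistics import NormalDist

s = 3.65                             # sample standard deviation (given)
z95 = NormalDist().inv_cdf(0.975)    # approx. 1.96 for a 95% interval
z90 = NormalDist().inv_cdf(0.95)     # approx. 1.645 for a 90% interval

def margin(n, z=z95):
    return z * s / sqrt(n)

print(round(margin(121), 3))         # original sample at 95%
print(round(margin(151), 3))         # larger sample: narrower, same confidence
print(round(margin(121, z90), 3))    # 90% interval: narrower, but less confident
# Only increasing n improves precision without giving up confidence.
```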
- How do the shape, center, and spread of t-models change as the degrees of freedom increase?
- (a) The shape and center stay the same, but the spread becomes narrower.
- (b) The shape and center stay the same, but the spread becomes wider.
- (c) The shape and spread stay the same, but the center will increase.
- (d) The shape and spread stay the same, but the center will decrease.
- Estimate the critical value of t for a 95% confidence interval with df = 15
- (a) 1.71
- (b) 2.131
- (c) 1.17
- (d) 3.45
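The critical value can be verified without a statistics package by numerically inverting the $t$ CDF; the sketch below integrates the $t$ density with the trapezoidal rule and bisects on the result:

```python
from math import gamma, sqrt, pi

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=10000):
    """CDF for x >= 0 via trapezoidal integration from 0 to x, using symmetry."""
    h = x / steps
    area = sum(t_pdf(i * h, df) for i in range(1, steps)) * h
    area += (t_pdf(0, df) + t_pdf(x, df)) * h / 2
    return 0.5 + area

def t_crit(conf, df):
    """Two-sided critical value found by bisection on the CDF."""
    target = 1 - (1 - conf) / 2          # e.g. 0.975 for a 95% interval
    lo, hi = 0.0, 10.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_crit(0.95, 15), 3))   # approx. 2.131
```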
- True or False: In a well-designed sample survey like the Current Population Survey, the observed sample percentage (e.g., percentage unemployed) is equal to the population percentage. Thus, it is appropriate to just report the sample percentage, without any measure of accuracy (i.e. without the margin of error).
- (a) True
- (b) False
- Suppose an NPR news story reports that: "A polling agency reports that the percentage of the American public who agree we should spend more money on the mental health of the war veterans is 42% +/- 3%." Which of the following is the correct interpretation of this report?
- (a) The probability that the American public agree that we should spend more money on the mental health of the war veterans is between 39% and 42%.
- (b) The percentage of the American public who agree that we should spend more money on the mental health of the war veterans is between 39% and 45%.
- (c) We are 95% confident that the percentage of the American public who agree that we should spend more money on the mental health of the war veterans is between 39% and 45%.
- (d) The percentage of the American public who agree that we should spend more money on the mental health of the war veterans is 42%.
- A major newspaper wants to hire a polling agency to predict who will be the next governor. Agency A proposes to do the job with a random sample of 5000 voters at a cost of $\$50K$ (K = one thousand). Agency B proposes to do the job with a random sample of 7500 voters at a cost of $\$75K$. Assume both agencies find the percentage of voters to be 40% and both use the normal model to calculate the 95% interval. Which agency will you hire? Hint: Compare the margin of error for the two agencies and the relative costs before making your decision.
- (a) I will hire B.
- (b) I have no preference.
- (c) I need more information to decide who to hire.
- (d) I will hire A.
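Following the hint, the two margins of error are easy to compare directly; the figures below come from the problem statement:

```python
from math import sqrt

p, z = 0.40, 1.96   # reported proportion; 95% normal critical value

def moe(n):
    """95% margin of error for a sample proportion."""
    return z * sqrt(p * (1 - p) / n)

print(round(moe(5000), 4))   # agency A: about 1.4 percentage points for $50K
print(round(moe(7500), 4))   # agency B: about 1.1 percentage points for $75K
# B costs 50% more but narrows the margin of error only modestly; weighing
# that trade-off is the judgment the problem asks for.
```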
- Suppose that the proportion of the adult population who jog is 0.15. What is the probability that the proportion of joggers in a random sample of size $n=200$ lies between 0.13 and 0.17?
- (a) 0.5762 approximately
- (b) 0.8125 approximately
- (c) 0.2345 approximately
- (d) 0.1234 approximately
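The normal approximation to the sampling distribution of the proportion gives the answer; all inputs are from the problem statement:

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.15, 200
se = sqrt(p * (1 - p) / n)   # approx. 0.02525

z_lo = (0.13 - p) / se
z_hi = (0.17 - p) / se
prob = NormalDist().cdf(z_hi) - NormalDist().cdf(z_lo)
print(round(prob, 4))   # about 0.57; rounding the SE to 0.025 gives 0.5762
```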
- Records at a large university indicate that 20% of all freshmen are placed on academic probation at the end of the first semester. A random sample of 100 freshmen found that 25% of them were placed on probation. The results of the sample:
- (a) are surprising since it indicates that 5% more of these freshmen were placed on probation than expected
- (b) are surprising since the standard deviation of the sampling distribution is 0.4%.
- (c) are biased since an increase of 5% could not happen without injecting bias into the sample.
- (d) are not surprising since the standard deviation of the sampling distribution is 4%.
- (e) are surprising since SAT scores have increased over the past years
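The key computation is the standard deviation of the sampling distribution of the sample proportion, using the values given in the problem:

```python
from math import sqrt

p, n = 0.20, 100                # claimed probation rate and sample size
se = sqrt(p * (1 - p) / n)      # SD of the sampling distribution of p-hat
print(round(se, 4))             # 0.04, i.e. 4%

# The observed 25% is (0.25 - 0.20) / 0.04 = 1.25 SDs above the mean --
# well within ordinary sampling variation, so the result is not surprising.
z = (0.25 - p) / se
print(round(z, 2))              # 1.25
```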
- We have discussed that the standard deviation of the distribution of sample percentages, $SE(\hat{p})$, is calculated by taking the square root of $\frac{\hat{p}(1-\hat{p})}{N}$, where $\hat{p}$ is the proportion in the sample and $N$ is the sample size. What does $SE(\hat{p})$ show?
- (a) It shows the standard error of the mean across repeated samples from the population.
- (b) It shows the distribution of $\hat{p}$ for the single sample that the researcher draws from the population.
- (c) It shows the standard deviation of $\hat{p}$ for repeated samples from the population.
- (d) It shows the variation for $\hat{p}$ values for repeated samples from the population.
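The "repeated samples" interpretation can be demonstrated by simulation: drawing many samples, recording each $\hat{p}$, and comparing their empirical spread to the formula. The parameter values below are illustrative choices, not from the problem:

```python
import random
from math import sqrt
from statistics import pstdev

random.seed(1)
p, N, reps = 0.3, 50, 20000   # illustrative true proportion, sample size, repetitions

# Draw many samples of size N and record each sample proportion p-hat.
phats = [sum(random.random() < p for _ in range(N)) / N for _ in range(reps)]

print(round(pstdev(phats), 4))           # empirical SD of p-hat across samples
print(round(sqrt(p * (1 - p) / N), 4))   # formula value sqrt(p(1-p)/N)
# The two agree closely: SE(p-hat) describes the spread of p-hat over
# repeated samples from the population.
```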
References
- SOCR Home page: http://www.socr.umich.edu