Scientific Methods for Health Sciences - Point and Interval Estimation: MoM and MLE
Overview
Estimation of population parameters is critical in many applications. In statistics, estimation combines effect sizes, confidence intervals, and meta-analysis to plan experiments, analyze data, and interpret results. It is most frequently carried out in terms of point estimates or interval estimates of the population parameters of interest. This lesson studies the methodologies commonly used for point and interval estimation, such as the Method of Moments (MOM) and Maximum Likelihood Estimation (MLE). We are interested in ways of estimating population characteristics based on the sample distribution, illustrated by point and interval estimation of the population mean, proportion, and variance using the methods introduced in this class.
Motivation
Suppose we wanted to estimate the probability of heads when flipping a specific coin by repeating the experiment several times. How much confidence do we have in our estimate? There are a number of similar situations where we need to evaluate, predict, or estimate a population parameter of interest using an observed data sample. The method of moments (MOM) and maximum likelihood estimation (MLE) are among the most commonly used methods for estimating various population parameters.
In point and interval estimation, not only do we need to consider the distribution and model on which the estimates are based, we also need to make assumptions about the population distribution. In addition, the estimate of one parameter is influenced by other characteristics of the population and the sample. For example, the estimate of the population mean is influenced by quantities such as the variance and the sample size.
An interval that contains the true value of a parameter of interest for $(1-\alpha)100\%$ of samples taken is called a $(1-\alpha)100\%$ confidence interval (CI) for that parameter, and the ends of the CI are called confidence limits.
Theory
Method of Moments (MOM) Estimation
This method uses the sample data to calculate some sample moments and then sets these equal to their corresponding population counterparts. Steps:
- Determine the $k$ parameters of interest and the specific distribution for this process;
- Compute the first $k$ (or more) sample moments;
- Set the sample moments equal to the population moments and solve for a (linear or non-linear) system of $k$ equations with $k$ unknowns.
- MOM proportion example: consider the example of flipping a coin 10 times and recording the outcomes (heads or tails). We use the outcomes to infer the true probability of heads, $p=P(Head)$. Suppose we observe the outcomes $\{H,T,T,T,T,H,H,T,H,T\}$. With MOM we note that this is a Binomial experiment with $E[X]=np$, where $X$ is the number of heads in the experiment. Setting the population moment equal to the observed sample value, $np=4$, so $MOM(p)=\hat{p} = \frac{4}{10}$.
- MOM Beta distribution example: Suppose we have 10 observations we suspect came from a Beta distribution.
Data | 0.055 | 1.005 | 0.075 | 0.005 | 0.075 | 1.005 | 0.005 | 0.035 | 0.225 |
---|---|---|---|---|---|---|---|---|---|
The beta distribution mean and variance are defined explicitly in terms of two parameters.
- Mean: $μ=\frac{α}{α+β}$,
- Variance: $σ^2=\frac{αβ}{(α+β)^2 (α+β+1)}$.
The sample mean and sample variance are $\bar{x} = 0.251$ and $s^2=0.6187$. Setting these equal to the population mean and variance gives two equations in $\alpha$ and $\beta$; solve for $\alpha$ and $\beta$ (a numerical sketch of this inversion is shown below).
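To make the moment matching concrete, the following minimal Python sketch (not part of the original lesson) computes the sample moments from the nine values listed in the table and inverts the Beta mean/variance equations in closed form. Because only nine of the ten observations are listed, the resulting moments differ slightly from the values quoted above.

```python
# Method-of-moments fit of a Beta(alpha, beta) model: a sketch.
import statistics

data = [0.055, 1.005, 0.075, 0.005, 0.075, 1.005, 0.005, 0.035, 0.225]

xbar = statistics.mean(data)     # sample mean
s2 = statistics.variance(data)   # unbiased sample variance

# Solve xbar = a/(a+b) and s2 = a*b/((a+b)^2 (a+b+1)) for a and b.
# The inversion requires s2 < xbar*(1 - xbar) to yield positive estimates.
common = xbar * (1 - xbar) / s2 - 1
alpha_hat = xbar * common
beta_hat = (1 - xbar) * common

print(alpha_hat, beta_hat)
```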
Maximum likelihood estimation (MLE)
Modeling distribution parameters using MLE estimation based on observed real world data offers a way of tuning the free parameters of the model to provide an optimum fit.
Suppose we observe a sample $x_1,x_2,\ldots,x_n$ of $n$ values from one distribution with probability density/mass function $f_\theta$, and we are trying to estimate the parameter $\theta$. We can compute the (multivariate) probability density associated with our observed data, $f_\theta (x_1,x_2,\ldots,x_n\mid\theta)$. As a function of $\theta$ with $x_1,x_2,\ldots,x_n$ fixed, the likelihood function is $$L(\theta)=f_\theta (x_1,x_2,\ldots,x_n\mid\theta).$$
The MLE of $θ$ is the value of $θ$ that maximizes $L(θ)$: $\arg\max_θ{L(θ)}.$
It is typically assumed that the observed data are independent and identically distributed (iid) with unknown parameter $\theta$. The likelihood can then be written as a product of $n$ univariate probability densities, $L(\theta)=\prod_{i=1}^n {f_\theta (x_i \mid\theta)}$, and since maxima are unaffected by monotone transformations, one can take the logarithm of this expression to turn it into a sum: $L^* (\theta)=\sum_{i=1}^n {\ln{f_\theta (x_i \mid\theta)}}$. The maximum of this expression can then be found numerically using various optimization algorithms.
- Note: The MLE may not be unique, or guaranteed to exist.
- Example: consider the coin flipping example above; we observe the number of heads among the outcomes and use it to infer the true probability of heads, $p=P(Head)$.
- Likelihood function: $L(\theta)=f(x\mid\theta=p)=\binom{10}{4} p^4 (1-p)^6$
- Log-likelihood function: $L^* (\theta)=\ln{\binom{10}{4}} + 4\ln{p} + 6\ln{(1-p)}$.
- Maximize the log-likelihood function by setting its first derivative to zero:
$$ 0=\frac{d\left(\ln{\binom{10}{4}} + 4\ln{p} + 6\ln{(1-p)}\right)}{dp} =\frac{4}{p}-\frac{6}{1-p}, \quad \hat{p}=\frac{2}{5}.$$
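When no closed-form solution exists, the same maximization is done numerically. The following sketch (assuming NumPy and SciPy are available) minimizes the negative log-likelihood for the coin example and recovers the analytic answer $\hat{p}\approx 0.4$.

```python
# Numerical MLE for the Binomial coin example: 4 heads in 10 flips.
import numpy as np
from scipy.optimize import minimize_scalar

heads, n = 4, 10

def neg_log_lik(p):
    # The constant term ln C(10, 4) is dropped; it does not affect the argmax.
    return -(heads * np.log(p) + (n - heads) * np.log(1 - p))

result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)  # approximately 0.4
```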
MOM vs. MLE
- The MOM is generally inferior to Fisher’s MLE method, because MLE estimates have a higher probability of being close to the quantities to be estimated.
- MLE may be intractable in some situations, whereas the MOM estimates can be quickly and easily calculated by hand or using a computer.
- MOM estimates may be used as the first approximations to the solutions of the MLE method, and successive improved approximations may then be found by the Newton-Raphson method. In this respect, the MOM and MLE are symbiotic.
- Sometimes, MOM estimates fall outside of the parameter space, i.e., they can be unreliable, which is never a problem with the ML method.
- MOM estimates are not necessarily sufficient statistics, i.e., they sometimes fail to take into account all relevant information in the sample.
- MOM may be preferred to MLE for estimating some structural parameters, when appropriate probability distributions are unknown.
Student’s T Distribution
The Student’s T distribution is the distribution needed to estimate the mean of a normally distributed population when the sample size is small and the population variance is unknown. It is the basis of the popular Student’s t-tests for the statistical significance of the difference between two sample means, and of confidence intervals for the difference between two population means.
Suppose $X_1,X_2,\ldots,X_n$ are independent random variables that are normally distributed with expected value $\mu$ and variance $\sigma^2$. The sample mean and sample variance are
$$\bar{x}_n = \frac{1}{n} \sum_{i=1}^n{x_i}, \qquad S_n^2=\frac{1}{n-1} \sum_{i=1}^n{(x_i-\bar{x}_n)^2}.$$
Since the sample mean $\bar{x}_n$ is normally distributed with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$, the standardized statistic
$$Z=\frac{\bar{x}_n-\mu}{\frac{\sigma}{\sqrt{n}}}$$
is normally distributed with mean 0 and variance 1. The T statistic
$$T=\frac{\bar{x}_n-\mu}{\frac{S_n}{\sqrt{n}}}$$
replaces $\sigma$ with the sample standard deviation $S_n$ and follows a Student’s T distribution with $n-1$ degrees of freedom. Also, $(n-1) \frac{S_n^2}{\sigma^2}$ has a chi-square distribution $\chi^2_{n-1}$ with degrees of freedom equal to $n-1$.
- Example: suppose a study involves 25 patients and the relevant measurements are summarized below (the resulting confidence interval is verified numerically after the calculation):
Variable | N | N* | Mean | SE of Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
---|---|---|---|---|---|---|---|---|---|---|
CD4 | 25 | 0 | 321.4 | 14.8 | 73.8 | 208.0 | 261.5 | 325.0 | 394.0 | 449.0 |
What do we know from the background information?
- $\bar{y}= 321.4$
- $s = 73.8$
- $SE = 14.8$
- $n = 25$
- $CI(\alpha)=CI(0.05)$: $\bar{y} \pm t_{\alpha\over 2} {1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{\frac{(x_i-\bar{x})^2}{n-1}}}.$
- $321.4 \pm t_{(24, 0.025)}{73.8\over \sqrt{25}}$
- $321.4 \pm 2.064\times 14.8$
- $[290.85, 351.95]$
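As a check, the interval can be reproduced directly from the reported summary statistics; the sketch below assumes SciPy is available for the t critical value.

```python
# 95% t-interval for the CD4 example, from the reported summary statistics.
from scipy import stats

ybar, s, n = 321.4, 73.8, 25
se = s / n ** 0.5                      # about 14.76, matching the reported SE of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)  # about 2.064 for df = 24

print(ybar - t_crit * se, ybar + t_crit * se)  # about (290.9, 351.9)
```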
Estimating a population mean with large samples
We use the following protocol to find point and interval estimates when the sample sizes are large, say exceeding 100.
- Assumptions: The Central Limit Theorem guarantees that for large samples, the method above provides a valid recipe for constructing a confidence interval for the population mean, no matter what the distribution for the observed data may be. Of course, for significantly non-Normal distributions, we may need to increase the sample size to guarantee that the sampling distribution of the mean is approximately Normal.
- Point estimation of population mean: $\bar{X_n}={1\over n}\sum_{i=1}^n{X_i}$, constructed from a random sample of the process {$X_1, X_2, X_3, \cdots , X_n$}, which is an unbiased estimate of the population mean $\mu$, if it exists! Note that the sample average may be susceptible to outliers.
- Interval estimation of a population mean: Choose a confidence level \((1-\alpha)100%\), where \(\alpha\) is small (e.g., 0.1, 0.05, 0.025, 0.01, 0.001, etc.). Then a \((1-\alpha)100%\) confidence interval for \(\mu\) will be
\[CI(\alpha): \overline{x} \pm z_{\alpha\over 2} E,\]
- The Error term, E, is defined as
\(E = \begin{cases}{\sigma\over\sqrt{n}}, & \text{for known } \sigma,\\ {SE}, & \text{for unknown } \sigma.\end{cases}\)
- The Standard Error of the estimated \(\overline {x}\) is obtained by replacing the unknown population standard deviation by the sample standard deviation\[SE(\overline {x}) = {1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{(x_i-\overline{x})^2\over n-1}}\]
- \(z_{\alpha\over 2}\) is the Critical Value for a Standard Normal distribution at \({\alpha\over 2}\).
- Example: a random sample of the number of sentences found in 30 magazine advertisements is listed below. Use this sample to find a point estimate for the population mean $\mu$. Samples: 16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20, 6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, 12. The point estimate is the sample mean, $\bar{x}=14.77$. A confidence interval estimate of $\mu$ is a range of values used to estimate the population parameter.
- Known variance: suppose the population variance is known to be 256, so $\sigma=16$. For $\alpha/2=0.05$, the 90% CI for $\mu$ is constructed as $\bar{x} \pm 1.645\,\frac{\sigma}{\sqrt{n}} = 14.77 \pm 1.645\times\frac{16}{\sqrt{30}}=[9.96, 19.57]$.
- Unknown variance: suppose we do not know the population variance, but the sample variance is known to be 273, so $s=\hat{\sigma}=16.54$. For $\alpha/2=0.05$, the 90% CI for $\mu$ is constructed as $\bar{x} \pm 1.645\, SE(\bar{x}) = 14.77 \pm 1.645\times\frac{16.54}{\sqrt{30}}=[9.80, 19.73]$.
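The two intervals above can be reproduced with a short computation; here is a minimal Python sketch using the 30 listed sentence counts.

```python
# 90% large-sample confidence intervals for the magazine-ad example.
import math
import statistics

ads = [16, 9, 14, 11, 17, 12, 99, 18, 13, 12, 5, 9, 17, 6, 11, 17, 18, 20,
       6, 14, 7, 11, 12, 5, 18, 6, 4, 13, 11, 12]

n = len(ads)
xbar = statistics.mean(ads)  # about 14.77
z = 1.645                    # z critical value for a 90% CI

# Known population variance sigma^2 = 256 (sigma = 16):
e_known = z * 16 / math.sqrt(n)
print(xbar - e_known, xbar + e_known)      # about (9.96, 19.57)

# Unknown variance: use the sample standard deviation instead.
s = statistics.stdev(ads)                  # about 16.5
e_unknown = z * s / math.sqrt(n)
print(xbar - e_unknown, xbar + e_unknown)  # about (9.80, 19.73)
```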
3.6) Estimating a population mean with small samples
When the sample size is small (say, fewer than 30 observations), the point estimates are less precise and the interval estimates produce wider intervals, compared to the case of large samples.
- Assumptions: we need evidence that the data we observed and used for point and interval estimates come from a distribution that is (approximately) normal. If this assumption is violated, then the interval estimate introduced below may significantly misrepresent the real confidence interval.
- Point estimation of population mean: $\bar{X}_n={1\over n}\sum_{i=1}^n{X_i}$, constructed from a random sample of the process $\{X_1,X_2,\ldots,X_n\}$, which is an unbiased estimate of the population mean $\mu$, if it exists.
- Interval estimation of a population mean: choose a confidence level $(1-\alpha)100\%$, where $\alpha$ is small (e.g., 0.1, 0.025). Then a $(1-\alpha)100\%$ confidence interval for $\mu$ will be
\[CI(\alpha): \overline{x} \pm t_{\alpha\over 2} E,\]
where the error term $E$ is defined as
\(E = \begin{cases}{\sigma\over\sqrt{n}}, & \text{for known } \sigma,\\ {SE}, & \text{for unknown } \sigma,\end{cases}\)
the standard error of the estimated $\overline{x}$ is obtained by replacing the unknown population standard deviation by the sample standard deviation, $SE(\overline{x}) = {1\over \sqrt{n}} \sqrt{\sum_{i=1}^n{(x_i-\overline{x})^2\over n-1}}$, and $t_{\alpha\over 2}$ is the critical value of the T distribution with $df=n-1$ at ${\alpha\over 2}$.
- Example: a random sample of the number of sentences found in 10 magazine advertisements is listed below. Use this sample to find a point estimate for the population mean $\mu$. Samples: 16, 9, 14, 11, 17, 12, 99, 18, 13, 12. The point estimate is the sample mean, $\bar{x}=22.1$.
- Known variance: suppose the population variance is known to be 256, so $\sigma=16$. For $\alpha/2=0.05$, the 90% CI for $\mu$ is constructed as $\bar{x} \pm 1.833\,\frac{\sigma}{\sqrt{n}} = 22.1 \pm 1.833\times\frac{16}{\sqrt{10}}=[12.83, 31.37]$.
- Unknown variance: suppose we do not know the population variance, but the sample variance is known to be 737.88, so $s=\hat{\sigma}=27.164$. For $\alpha/2=0.05$, the 90% CI for $\mu$ is constructed as $\bar{x} \pm 1.833\, SE(\bar{x}) = 22.1 \pm 1.833\times\frac{27.164}{\sqrt{10}}=[6.35, 37.85]$.
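A sketch of the same calculation for the 10-observation sample, using SciPy for the t critical value:

```python
# 90% small-sample (t-based) confidence intervals for the 10-ad example.
import math
import statistics
from scipy import stats

ads = [16, 9, 14, 11, 17, 12, 99, 18, 13, 12]
n = len(ads)
xbar = statistics.mean(ads)           # 22.1
t_crit = stats.t.ppf(0.95, df=n - 1)  # about 1.833 for a 90% CI

# Known sigma = 16, as posited in the text (which still uses the t critical value):
e = t_crit * 16 / math.sqrt(n)
print(xbar - e, xbar + e)             # about (12.83, 31.37)

# Unknown sigma: use the sample standard deviation (about 27.16).
s = statistics.stdev(ads)
e = t_crit * s / math.sqrt(n)
print(xbar - e, xbar + e)             # about (6.35, 37.85)
```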
3.7) Estimating a population proportion
When the sample size is large, the sampling distribution of the sample proportion $\hat{p}$ is approximately Normal, by the CLT, since the sample proportion may be presented as a sample average of Bernoulli random variables. When the sample size is small, the normal approximation may be inadequate. To accommodate this, we modify the sample proportion $\hat{p}$ slightly and obtain the corrected sample proportion $\tilde{p}$:
\[\hat{p}=\frac{y}{n} \longrightarrow \tilde{p}=\frac{y+0.5 z_{\alpha\over 2}^2}{n+z_{\alpha\over 2}^2},\]
where $y$ is the observed number of successes in $n$ trials. The confidence intervals for the sample proportion $\hat{p}$ and the corrected sample proportion $\tilde{p}$ are $\hat{p} \pm z_{\alpha\over 2}\, SE_{\hat{p}}$ and $\tilde{p} \pm z_{\alpha\over 2}\, SE_{\tilde{p}}$, respectively.
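To illustrate the two intervals, here is a minimal sketch with hypothetical counts ($y=10$ successes out of $n=25$ trials). The standard error of the corrected proportion is computed here with the adjusted sample size $n+z_{\alpha/2}^2$, which is an assumption, since the text does not spell out $SE_{\tilde{p}}$.

```python
# Plain (Wald) and corrected-proportion confidence intervals: a sketch.
import math
from scipy import stats

y, n, alpha = 10, 25, 0.05         # hypothetical counts and a 95% confidence level
z = stats.norm.ppf(1 - alpha / 2)  # about 1.96

# Plain sample proportion and its interval:
p_hat = y / n
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - z * se_hat, p_hat + z * se_hat)

# Corrected sample proportion (assumed adjusted-sample-size standard error):
p_tilde = (y + 0.5 * z ** 2) / (n + z ** 2)
se_tilde = math.sqrt(p_tilde * (1 - p_tilde) / (n + z ** 2))
print(p_tilde - z * se_tilde, p_tilde + z * se_tilde)
```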
3.8) Estimating population variance
- Point estimates of the population variance and standard deviation: the most unbiased point estimate for the population variance $\sigma^2$ is the sample variance $s^2$, and the point estimate for the population standard deviation $\sigma$ is the sample standard deviation $s$.
- A chi-square distribution is used to construct confidence intervals for the variance and standard deviation. If the process or phenomenon we study generates a normal random variable, then for sample size $n>1$ the statistic
\[\chi_o^2=\frac{(n-1) s^2}{\sigma^2}\]
has a chi-square distribution with $n-1$ degrees of freedom.
- Properties of the chi-square distribution: (1) $\chi_o^2\ge 0$ for all chi-square values; (2) the chi-square distribution is a family of curves, each determined by the degrees of freedom $n-1$; (3) to form a confidence interval for the variance, use the $\chi^2(df=n-1)$ distribution with degrees of freedom equal to one less than the sample size; (4) the area under each curve of the chi-square distribution equals one; (5) chi-square distributions are positively skewed.
- Interval estimates of the population variance and standard deviation: the chi-square distribution is not symmetric, so there are two critical values, $\chi_L^2$ (left-tail critical value) and $\chi_R^2$ (right-tail critical value). The confidence interval for $\sigma^2$ is
\[\frac{(n-1) s^2}{\chi_R^2}\le \sigma^2 \le \frac{(n-1) s^2}{\chi_L^2}.\]
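A short sketch of the chi-square interval, reusing the CD4 summary statistics from the earlier example ($n=25$, $s=73.8$) purely as an illustration:

```python
# Chi-square confidence interval for a population variance: a sketch.
from scipy import stats

n, s2, alpha = 25, 73.8 ** 2, 0.05
chi_right = stats.chi2.ppf(1 - alpha / 2, df=n - 1)  # right-tail critical value
chi_left = stats.chi2.ppf(alpha / 2, df=n - 1)       # left-tail critical value

var_lower = (n - 1) * s2 / chi_right
var_upper = (n - 1) * s2 / chi_left
print(var_lower, var_upper)                # CI for sigma^2
print(var_lower ** 0.5, var_upper ** 0.5)  # CI for sigma
```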
4) Applications
4.1) This article (http://www.tandfonline.com/doi/abs/10.1207/.U5ys8BZRXKw), titled Reliability of Scales with General Structure: Point and Interval Estimation Using a Structural Equation, discusses a method for obtaining point and interval estimates of reliability for composites of measures with a general structure. The approach is based on fitting a correspondingly constrained structural equation model and generalizes earlier covariance structure analysis methods for scale reliability estimation with congeneric tests. The procedure can be used with weighted or unweighted composites, in which the weights need not be known in advance but may be estimated simultaneously. The method presented in this paper allows one to obtain an approximate standard error and confidence interval for scale reliability using the bootstrap.
4.2) This article (http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_ModelerActivities_NormalBetaModelFit) presents a Normal and Beta distribution model-fitting activity. It describes the process of SOCR model fitting in the case of Normal or Beta distribution models. The article motivates the need for analytical modeling of natural processes, illustrates how to use the SOCR Modeler to fit models to real data, and presents applications of model fitting. It provides specific examples illustrating model fitting and two exercises for practice.
4.3) This article (http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_General_CI_Experiment) covers the SOCR general confidence interval activity and demonstrates the usage and functionality of the SOCR general confidence interval applet. It explains the theory behind interval-based estimates of parameters, illustrates various confidence interval construction recipes, draws parallels between the construction algorithms and the intuitive meaning of confidence intervals, and presents a technology-enhanced approach for understanding and utilizing confidence intervals in various applications. The article presents specific examples and exercises on this topic and serves as a good supplement to point and interval estimation.
5) Software
http://socr.ucla.edu/Applets.dir/Normal_T_Chi2_F_Tables.htm
http://socr.ucla.edu/htmls/exp/Confidence_Interval_Experiment_General.html
http://socr.ucla.edu/htmls/SOCR_Modeler.html
http://socr.ucla.edu/htmls/SOCR_Experiments.html
http://socr.ucla.edu/htmls/SOCR_Charts.html
6) Problems
6.1) Tom is in charge of sampling sugar measurements from a very large population of sugar. Lately his standard errors have been alarmingly high for his sample means. If he wants to decrease his sampling error (the standard deviation of his sample means) by 1/2, what should he do?
(a) Quadruple the variation inherent in the population.
(b) Triple her sample size. (c) Quadruple her sample size. (d) Halve her sample size.
6.2) The average standardized math score for eighth graders in the state of Michigan is 70 and the standard deviation is 10. We want to find out if the average standardized math score in district A is higher than the average score for the state of Michigan. The mean for a random sample of 36 students from this district is 72. What is the best response?
(a) The p-value is around 0.76 and it is concluded that the average standardized math score in this district is not different from the overall population mean.
(b) The p-value is around 0.12 and it is concluded that the average standardized math score in this district is not higher than the overall population mean. (c) The p-value is around 0.24 and it is concluded that the average standardized math score in this district is not higher than the overall population mean. (d) The p-value is around 0.88 and it is concluded that the average standardized math score in this district is not higher than the overall population mean.
6.3) A random sample of 121 students from UMich was selected to estimate the average ACT score of all UMich students. The average for the sample was 23.4 and the sample standard deviation was 3.65. If you wanted to calculate a more precise and accurate estimate of the average ACT score of UMich students, which one of the following would be the best thing to do?
(a) Decrease the sample size to 91.
(b) Increase the sample size to 151. (c) Increase the confidence level to 99%. (d) Decrease the confidence level to 90%.
6.4) How do the shape, center, and spread of t-models change as the degrees of freedom increase?
(a) The shape and center stays the same, but the spread becomes narrower.
(b) The shape and center stays the same, but the spread becomes wider. (c) The shape and spread stays the same, but the center will increase. (d) The shape and spread stays the same, but the center will decrease.
6.5) Estimate the critical value of t for a 95% confidence interval with df = 15 (a) 1.71 (b) 2.131 (c) 1.17 (d) 3.45
6.6) True or False: In a well-designed sample survey like the Current Population Survey, the observed sample percentage (e.g., percentage unemployed) is equal to the population percentage. Thus, it is appropriate to just report the sample percentage, without any measure of accuracy (i.e. without the margin of error). (a) True (b) False
6.7) Suppose an NPR news story reports that: "A polling agency reports that the percentage of the American public who agree we should spend more money on the mental health of the war veterans is 42% +/- 3%." Which of the following is the most appropriate interpretation of this report? (a) The probability that the American public agree that we should spend more money on the mental health of the war veterans is between 39% to 42%. (b) The percentage of the American public who agree that we should spend more money on the mental health of the war veterans is between 39% and 45%. (c) We are 95% confident that the percentage of the American public who agree that we should spend more money on the mental health of the war veterans is between 39% and 45%. (d) The percentage of the American public who agree that we should spend more money on the mental health of the war veterans is 42%.
6.8) A major newspaper wants to hire a polling agency to predict who will be the next governor. Agency A proposes to do the job with a random sample of 5000 voters at a cost of $50K (K = one thousand). Agency B proposes to do the job with a random sample of 7500 voters at a cost of $75K. Assume both agencies find the percentage of voters to be 40% and both use the normal model to calculate the 95% interval. Which agency will you hire? Hint: Compare the margin of error for the two agencies and the relative costs before making your decision. (a) I will hire B. (b) I have no preference. (c) I need more information to decide who to hire. (d) I will hire A.
6.9) Suppose that the proportion of the adult population who jog is 0.15. What is the probability that the proportion of joggers in a random sample of size n =200 lies between 0.13 and 0.17? (a) 0.5762 approximately (b) 0.8125 approximately (c) 0.2345 approximately (d) 0.1234 approximately
6.10) Records at a large university indicate that 20% of all freshmen are placed on academic probation at the end of the first semester. A random sample of 100 freshmen found that 25% of them were placed on probation. The results of the sample: (a) are surprising since it indicates that 5% more of these freshmen were placed on probation than expected (b) are surprising since the standard deviation of the sampling distribution is 0.4%. (c) are biased since an increase of 5% could not happen without injecting bias into the sample. (d) are not surprising since the standard deviation of the sampling distribution is 4%. (e) are surprising since SAT scores have increased over the past years
6.11) We have discussed that the standard deviation of the distribution of sample percentages, $SE(\hat{p})$, is calculated by taking the square root of $\hat{p}(1-\hat{p})/N$, where $\hat{p}$ is the proportion in the sample and $N$ is the sample size. What does $SE(\hat{p})$ show?
(a) It shows the standard error of the mean across repeated samples from the population.
(b) It shows the distribution of $\hat{p}$ for the single sample that the researcher draws from the population. (c) It shows the standard deviation of $\hat{p}$ for repeated samples from the population. (d) It shows the variation of $\hat{p}$ values for repeated samples from the population.
References
- SOCR Home page: http://www.socr.umich.edu