Difference between revisions of "SMHS MultimodelInference"
Revision as of 16:46, 30 September 2014
Scientific Methods for Health Sciences - Multi-Model Inference
Motivation
Theory
Akaike Information Criterion
For a given dataset, the Akaike Information Criterion (AIC) measures the quality of a statistical model relative to the other candidate models. AIC is rooted in information entropy and quantifies model quality only in relative terms. It does not provide a test of a model in an absolute sense. For instance, AIC will not give any warning if all the candidate models fit the data poorly.
$$AIC= 2k -2\ln(L),$$ where $k$ is the number of parameters in the statistical model, and $L$ is the maximal value of the likelihood function for the estimated model. In R, AIC may be computed using extractAIC.
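As a minimal sketch of the formula above (in Python rather than R, purely for illustration), the function below computes AIC from a maximized log-likelihood $\ln(L)$ and a parameter count $k$, then picks the candidate with the smallest value. The model names and log-likelihood values are hypothetical and do not come from any dataset.

```python
def aic(log_likelihood: float, k: int) -> float:
    """AIC = 2k - 2 ln(L); log_likelihood is ln(L) at its maximum."""
    return 2 * k - 2 * log_likelihood


# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters)
candidates = {
    "model_A": (-120.5, 3),
    "model_B": (-118.9, 6),
}

aic_values = {name: aic(ll, k) for name, (ll, k) in candidates.items()}

# The preferred candidate is the one with the smallest AIC.
best = min(aic_values, key=aic_values.get)
print(aic_values, best)
```

In this made-up example model_B fits slightly better (higher log-likelihood), but its three extra parameters cost more than the improvement in fit, so model_A has the smaller AIC.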
Given a collection of candidate models for the dataset, the preferred model is the one that minimizes the AIC value, i.e., the one that attains a high log-likelihood relative to the number of parameters it estimates. AIC combines a fidelity term, $-2\ln(L)$ (rewarding goodness of fit), and a regularization term, $2k$ (penalizing models with a large number of estimated parameters, which discourages model overfitting).
The relation between the sample size ($n$) and the number of estimated parameters ($k$) is important. When the sample size is small relative to the number of parameters, e.g., $\frac{n}{k} < 40$, the corrected AIC (AICc) should be used instead:
$$ AICc = AIC + 2k\frac{k+1}{n-(k+1)}.$$
Note that the correction term vanishes as $n$ increases, so AICc converges to AIC. Because the criterion is only used to compare models fitted to the same data, AICc may generally be used in place of AIC regardless of the sample size.
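The small-sample correction can be sketched as follows. This is an illustrative Python helper, not part of any library; the function names, the default threshold argument, and the numeric inputs are assumptions made for the example.

```python
def aicc(aic_value: float, n: int, k: int) -> float:
    """Corrected AIC: AICc = AIC + 2k(k+1) / (n - (k+1))."""
    if n - (k + 1) <= 0:
        raise ValueError("AICc is only defined when n > k + 1")
    return aic_value + 2 * k * (k + 1) / (n - (k + 1))


def small_sample(n: int, k: int, threshold: float = 40.0) -> bool:
    """Rule of thumb from the text: use AICc when n / k < 40."""
    return n / k < threshold


# Hypothetical AIC value of 247.0 for a model with k = 3 parameters
small_n = aicc(247.0, n=30, k=3)    # correction = 24 / 26
large_n = aicc(247.0, n=3000, k=3)  # correction = 24 / 2996
print(small_n, large_n, small_sample(30, 3), small_sample(3000, 3))
```

With $n=30$ the correction adds almost one AIC unit, while with $n=3000$ it is below 0.01, which illustrates the convergence of AICc to AIC.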
Calculating model weights
We can rank-order the models by their AIC values and compute $\Delta AIC_i = AIC_i - AIC_{min}$, the difference between the AIC of each model and the smallest AIC in the candidate set. That is, we compute the AIC of each model relative to the best model (the one with the smallest AIC value). The Akaike coefficient ($w_i$) represents the model weight:
$$ w_i = \frac{e^{-\frac{1}{2}\Delta AIC_i}}{\sum_{j=1}^K{e^{-\frac{1}{2}\Delta AIC_j}}},$$
where $K$ is the number of candidate models. Note that the weight of the $i^{th}$ model satisfies $0 \leq w_i \leq 1$ and the AIC weights sum to 1. The larger the model weight, the stronger the support for the model. For example, a model weight of $w=0.52$ indicates that, given the data and the set of candidate models, the probability that this model is the best of the candidates is about 52%.
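A minimal Python sketch of the weight calculation described above, using only the standard library. The three AIC values are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Return (delta AIC list, Akaike weights) for a list of AIC values."""
    best = min(aic_values)
    deltas = [a - best for a in aic_values]           # Delta AIC_i = AIC_i - AIC_min
    rel_lik = [math.exp(-0.5 * d) for d in deltas]    # exp(-Delta AIC_i / 2)
    total = sum(rel_lik)
    return deltas, [r / total for r in rel_lik]


# Hypothetical AIC values for K = 3 candidate models
deltas, weights = akaike_weights([247.0, 249.8, 255.3])
for d, w in zip(deltas, weights):
    print(f"delta AIC = {d:.1f}, weight = {w:.3f}")
```

Subtracting the minimum AIC before exponentiating leaves the weights unchanged (the common factor cancels in the ratio) and avoids numerical underflow when the raw AIC values are large.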
Model selection and multimodel inference
In practice, model selection is never perfect, especially when the $\Delta AIC$ values are small. Let $\Delta AIC_{(2)}$ denote the $\Delta AIC$ of the second-ranked model. When $\Delta AIC_{(2)} > 2$, the first-ranked model is likely to be the best model.
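When $\Delta AIC_{(2)} < 2$, the top-ranked models have comparable support and no single model is clearly the best. In this case, inference can be based on all candidate models by model averaging. Poorly supported models (those with large $\Delta AIC$ values) have small model weights, so the model-averaged result is not influenced much by such models. Suppose $\hat{W}_i$ is the value of the dependent variable predicted by model $i$; we can use model averaging to obtain a better estimate:
$$\hat{\bar{W}}=\sum_{i=1}^K{w_i \hat{W}_i}.$$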
The model-averaged prediction is calculated as the average of the predictions from all models, weighted by the AIC weight of each model. Similarly, we can obtain model-averaged estimates of parameters. If the estimate of the parameter $\theta$ under model $i$ is $\hat{\theta}_i$, the averaged estimate of the parameter is computed by:
$$\hat{\bar{\theta}}=\sum_{i=1}^K{w_i \hat{\theta}_i}.$$
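The weighted averages above amount to a single weighted sum. The following Python sketch assumes the model weights have already been computed and sum to 1; the weights and per-model predictions are hypothetical.

```python
def model_average(weights, estimates):
    """Weighted average: sum_i w_i * estimate_i, with the w_i summing to 1."""
    if len(weights) != len(estimates):
        raise ValueError("need one estimate per model weight")
    if abs(sum(weights) - 1.0) > 1e-8:
        raise ValueError("model weights must sum to 1")
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical AIC weights and per-model predictions of the same quantity
weights = [0.6, 0.3, 0.1]
predictions = [10.0, 12.0, 20.0]
averaged = model_average(weights, predictions)
print(averaged)
```

The third model predicts a very different value, but its small weight limits its pull on the averaged prediction. The same function applies unchanged to per-model parameter estimates $\hat{\theta}_i$.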
Then, the unconditional variance estimate of the parameter $\theta$ is:
$$var\big (\hat{\bar{\theta}}\big )=\sum_{i=1}^K{ w_i \bigg[ var(\hat{\theta}_i | g_i) + \big ( \hat{\theta}_i -\hat{\bar{\theta}} \big )^2 \bigg ]},$$
where $\hat{\bar{\theta}}$ is the model-averaged estimate, $w_i$ is the model weight, and $g_i$ denotes the $i^{th}$ model. This variance estimator incorporates both the sampling variance within each model, $var(\hat{\theta}_i | g_i)$, and a variance component for model selection uncertainty, $(\hat{\theta}_i -\hat{\bar{\theta}})^2$. Finally, a $(1-\alpha)$ confidence interval for the parameter, based on the model-averaged estimate, can be constructed as:
$$\hat{\bar{\theta}} \pm z_{\alpha/2} \times \sqrt{var(\hat{\bar{\theta}})}.$$
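The sketch below puts the averaged estimate, its unconditional variance, and the normal-approximation confidence interval together in Python. It assumes the variance is the weighted sum, over models, of each model's conditional variance plus its squared deviation from the averaged estimate. The function name and all numeric inputs are hypothetical.

```python
import math
from statistics import NormalDist


def model_averaged_ci(weights, estimates, variances, alpha=0.05):
    """Return (theta_bar, unconditional variance, (lower, upper))."""
    theta_bar = sum(w * t for w, t in zip(weights, estimates))
    # Each model contributes its sampling variance plus a squared-deviation
    # term that reflects model selection uncertainty.
    var_bar = sum(
        w * (v + (t - theta_bar) ** 2)
        for w, t, v in zip(weights, estimates, variances)
    )
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z * math.sqrt(var_bar)
    return theta_bar, var_bar, (theta_bar - half_width, theta_bar + half_width)


# Hypothetical weights, per-model estimates, and conditional variances
theta_bar, var_bar, (lower, upper) = model_averaged_ci(
    weights=[0.6, 0.3, 0.1],
    estimates=[1.0, 1.5, 2.0],
    variances=[0.04, 0.09, 0.16],
)
print(theta_bar, var_bar, lower, upper)
```

In this example the unconditional variance (about 0.18) exceeds the conditional variance of the best-supported model (0.04), because the candidate models disagree about the value of the parameter.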
Applications
- SOCR Home page: http://www.socr.umich.edu