Difference between revisions of "SMHS GLM"

From SOCR
Jump to: navigation, search
(Advanced Topics=)
(Gamma GLM vs. Beta Regression)
 
(14 intermediate revisions by the same user not shown)
Line 18: Line 18:
  
 
1. Random Component: The response variable \(Y\) follows a distribution from the exponential family:
 
1. Random Component: The response variable \(Y\) follows a distribution from the exponential family:
   <math>
+
   <math>f_Y(y|\theta,\phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right\}</math>
  f_Y(y|\theta,\phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right\}
+
where <math>\theta</math> is the natural parameter and \(\phi\) is the dispersion parameter.
  </math>
 
  where \(\theta\) is the natural parameter and \(\phi\) is the dispersion parameter.
 
  
 
2. Systematic Component: The linear predictor \(\eta\):
 
2. Systematic Component: The linear predictor \(\eta\):
   <math>
+
   <math>\eta = \mathbf{X}\beta = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_pX_p</math>
  \eta = \mathbf{X}\beta = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_pX_p
 
  </math>
 
  
 
3. Link Function: \(g(\cdot)\) that relates the mean \(\mu = E[Y]\) to the linear predictor:
 
3. Link Function: \(g(\cdot)\) that relates the mean \(\mu = E[Y]\) to the linear predictor:
   <math>
+
   <math>g(\mu) = \eta \quad \text{or equivalently} \quad \mu = g^{-1}(\eta)</math>
  g(\mu) = \eta \quad \text{or equivalently} \quad \mu = g^{-1}(\eta)
 
  </math>
 
  
The variance function relates the variance to the mean:
+
The variance function relates the variance to the mean: <math>\text{Var}(Y) = a(\phi)V(\mu)</math>
<math>
+
where <math>V(\mu)</math> is the variance function specific to the distribution.
\text{Var}(Y) = a(\phi)V(\mu)
 
</math>
 
where \(V(\mu)\) is the variance function specific to the distribution.
 
  
 
====2) Exponential Family Distributions====
 
====2) Exponential Family Distributions====
Line 141: Line 132:
 
plot(model1, which = 1:4)
 
plot(model1, which = 1:4)
 
</pre>
 
</pre>
 +
 +
'''Result Interpretation''':
 +
 +
Let's explicate ''how to interpret the parameter estimates in this GLM model''.
 +
Specifically, as this is a bivariate outcome, the estimates are not correlations
 +
 +
<pre>
 +
glm(formula = cbind(using, notUsing) ~ age + education + wantsMore,
 +
    family = binomial, data = cuse)
 +
 +
Coefficients:
 +
            Estimate Std. Error z value Pr(>|z|)   
 +
(Intercept)  -0.8082    0.1590  -5.083 3.71e-07 ***
 +
age25-29      0.3894    0.1759  2.214  0.02681 * 
 +
age30-39      0.9086    0.1646  5.519 3.40e-08 ***
 +
age40-49      1.1892    0.2144  5.546 2.92e-08 ***
 +
educationlow  -0.3250    0.1240  -2.620  0.00879 **
 +
wantsMoreyes  -0.8330    0.1175  -7.091 1.33e-12 ***
 +
</pre>
 +
 +
In a ''Generalized Linear Model (GLM)'' with a ''binomial family'' and a ''logit link'' (the default for ''R''), the coefficients represent '''log-odds ratios'''.
 +
Since our outcome is structured as
 +
<pre>
 +
  cbind(using, notUsing),
 +
</pre>
 +
we are modeling the probability of "using" (success) versus "not using" (failure).
 +
 +
That is, the model interpretation is in terms of '''Log-Odds'''.
 +
In this model, the relationship between the ''covariate predictors'' and the ''outcome'' is defined by the ''logit'' function
 +
 +
<math>\text{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1X_1 + \dots + \beta_kX_k.
 +
</math>
 +
 +
The estimate ''Sign'' matters:
 +
 +
: ''Positive Estimate'': As the predictor increases (or if the category is present), the probability of "using" increases.
 +
: ''Negative Estimate'': As the predictor increases, the probability of "using" decreases.
 +
 +
The estimate ''Magnitude'' is interpreted via the "Exponentiation Trick".
 +
Because humans don't think naturally in "log-odds" terms, we usually ''exponentiate'' the coefficients (<math>e^\beta</math>) to get the ''Odds Ratios (OR)''.
 +
 +
{| class="wikitable"
 +
! Variable
 +
! Estimate (β)
 +
! Odds Ratio (e<sup>β</sup>)
 +
! Interpretation
 +
|-
 +
| '''age40-49'''
 +
| 1.1892
 +
| ≈ 3.28
 +
| Women aged 40-49 have '''3.28 times the odds''' of using contraception compared to the reference age group (likely <25).
 +
|-
 +
| '''wantsMoreyes'''
 +
| -0.8330
 +
| ≈ 0.43
 +
| Women who want more children have '''57% lower odds''' (1 - 0.43) of using contraception than those who don't.
 +
|}
 +
 +
 +
Specific Breakdown of the Results:
 +
 +
* ''Age'' (Categorical): R has treated age as a ''factor''. The baseline (reference) group is the youngest group (''under 25'').
 +
:: As age increases (<math>25-29 \ \to 30-39\  \to 40-49</math>), the coefficients become increasingly positive (<math>0.38 \to 0.90 \to 1.18</math>).
 +
:: Meaning: Older women in this dataset are significantly more likely to use contraception than the youngest group.
 +
 +
* Education:
 +
:: Estimate: -0.3250
 +
:: Meaning: Being in the ''low education'' group is associated with a decrease in the log-odds of using contraception compared to the ''high'' group. Specifically, their odds are about 28\% lower (<math>e^{-0.325} \approx 0.72</math>).
 +
 +
* Wants More Children:
 +
:: Estimate: -0.8330
 +
:: Meaning: This is a strong negative predictor. If a woman wants more children, the probability of her using contraception drops significantly.
 +
 +
We can have a quick "reality check" on the measuring units. Unlike a standard correlation (which is bounded between -1 and 1), these estimates can be any real number.
 +
 +
:: An estimate of 0 means the variable has no effect on the odds (Odds Ratio = 1).
 +
:: The z-value tells us how many standard errors the estimate is from zero. Since all p-values are small (<math>< 0.05</math>), all these predictors are statistically significant.
  
 
====Example 2: Poisson Regression for Count Data====
 
====Example 2: Poisson Regression for Count Data====
Line 319: Line 387:
 
</center>
 
</center>
  
===Practical Considerations====
+
===Practical Considerations===
  
 
* Model Selection:
 
* Model Selection:
Line 339: Line 407:
 
===Advanced Topics===
 
===Advanced Topics===
  
* Mixed Effects GLM (GLMM): Extension incorporating random effects:
+
==== Mixed Effects GLM (GLMM)====
 +
 
 +
Extension incorporating random effects:
  
 
<math>
 
<math>
Line 389: Line 459:
 
</pre>
 
</pre>
  
####2) Bayesian GLM
+
====Bayesian GLM====
Using Markov Chain Monte Carlo (MCMC) methods:
+
 
 +
Using Markov Chain Monte Carlo (MCMC) methods.
 +
 
 
<pre>
 
<pre>
 
library(rstanarm)
 
library(rstanarm)
Line 399: Line 471:
 
                         prior = normal(0, 2.5),
 
                         prior = normal(0, 2.5),
 
                         prior_intercept = normal(0, 5))
 
                         prior_intercept = normal(0, 5))
 +
 +
print(bayes_model, digits = 2)
 +
 +
# Coefficient Plot (Forest Plot)
 +
 +
plot(bayes_model, "areas")  # density + intervals
 +
# OR more customizable:
 +
library(bayesplot)
 +
mcmc_intervals(bayes_model, prob = 0.95) +
 +
  ggplot2::labs(title = "Posterior Distributions of Coefficients")
 +
 +
# Posterior Predictive Checks (PPC)
 +
# Check if the model can reproduce data like the observed
 +
pp_check(bayes_model, plotfun = "hist", nreps = 100) +
 +
  ggplot2::labs(title = "Posterior Predictive Check (Histogram of y_rep vs y)")
 
</pre>
 
</pre>
  
===Problems====
+
Another Bayesian Experiment.
 +
 
 +
<pre>
 +
library(rstanarm)
 +
set.seed(123)
 +
 
 +
# Simulate data — explicitly make trt and sex into factors
 +
n <- 500
 +
mydata <- data.frame(
 +
  day = sample(1:30, n, replace = TRUE),
 +
  trt = factor(sample(c("A", "B"), n, replace = TRUE), levels = c("A", "B")),
 +
  sex = factor(sample(c("F", "M"), n, replace = TRUE), levels = c("F", "M")),
 +
  p  = rnorm(n, mean = 50, sd = 10)
 +
)
 +
 
 +
# True log-odds
 +
logit_p <- -1 - 0.02 * mydata$day +
 +
          0.6 * (as.numeric(mydata$trt) - 1) +  # A=0, B=1
 +
          0.3 * (as.numeric(mydata$sex) - 1) +    # F=0, M=1
 +
          0.04 * mydata$p
 +
 
 +
mydata$y <- rbinom(n, size = 1, prob = plogis(logit_p))
 +
 
 +
# Fit model
 +
bayes_model <- stan_glm(
 +
  y ~ day + trt + sex + p,
 +
  family = binomial,
 +
  data = mydata,
 +
  prior = normal(0, 2.5),
 +
  prior_intercept = normal(0, 5),
 +
  seed = 42,
 +
  refresh = 0
 +
)
 +
 
 +
# Create New Data
 +
# Confirm variables are factors
 +
str(mydata[c("trt", "sex")])
 +
 
 +
# Create prediction grid
 +
newdata <- expand.grid(
 +
  day = seq(min(mydata$day), max(mydata$day), length.out = 30),
 +
  trt = levels(mydata$trt)[1],      # e.g., "A"
 +
  sex = levels(mydata$sex)[1],      # e.g., "F"
 +
  p  = median(mydata$p)
 +
)
 +
 
 +
# Verify it worked
 +
stopifnot(nrow(newdata) > 0)
 +
str(newdata)
 +
 
 +
# Generate Predictions from the Bayesian Posterior Probability (after fitting the Bayesian Model):
 +
# Posterior predicted probabilities
 +
post_pred <- posterior_linpred(bayes_model, newdata = newdata, transform = TRUE)
 +
 
 +
# Summarize
 +
pred_summary <- data.frame(
 +
  day = newdata$day,
 +
  prob = apply(post_pred, 2, median),
 +
  lower = apply(post_pred, 2, quantile, 0.025),
 +
  upper = apply(post_pred, 2, quantile, 0.975)
 +
)
 +
 
 +
# Plot
 +
library(ggplot2)
 +
ggplot(pred_summary, aes(x = day, y = prob)) +
 +
  geom_line(color = "blue") +
 +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, fill = "blue") +
 +
  labs(
 +
    title = "Predicted Probability of Outcome by Day",
 +
    subtitle = "Holding treatment = A, sex = F, and p = median",
 +
    y = "P(y = 1)",
 +
    x = "Day"
 +
  ) +
 +
  theme_minimal()
 +
</pre>
 +
 
 +
==== ''Gamma GLM'' vs. ''Beta Regression''====
 +
 
 +
The ''Gamma GLM'' and ''Beta Regression'' models are designed for very different types of
 +
"continuous" data.
 +
The fundamental difference lies in the '''domain''' (the range of possible values) and the
 +
'''variance structure''' of the data they expect.
 +
 
 +
===== The Gamma Model (''glm'')=====
 +
 
 +
The Gamma distribution is used for ''strictly positive, continuous data'' <math>(0, \infty)</math>. It is appropriate for data that is "skewed" to the right, where the variance increases as the mean increases.
 +
If a variable, like ''qol_score'' is a raw score (e.g., 0 to 100), the link, ''link = "log"'',
 +
ensures the predicted values remain positive.
 +
 
 +
In a Gamma GLM with a log link, we model the mean <math>\mu</math> as
 +
 
 +
<math>\ln(\mu_i) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k.</math>
 +
 
 +
The probability density function (PDF) for the Gamma distribution is
 +
 
 +
<math>f(y; \alpha, \beta) = \frac{\beta^\alpha y^{\alpha-1} e^{-\beta y}}{\Gamma(\alpha)} \quad \text{for } y > 0.</math>
 +
 
 +
===== The Beta Model (''betareg'')=====
 +
 
 +
The Beta distribution, is used specifically for '''rates, proportions, or scores constrained between 0 and 1''', <math>(0, 1)</math>. Unlike the Gamma model, which can go off to infinity, the Beta distribution is "boxed in" this interval.
 +
For instance, if ''qol_scaled'' is normalized, e.g., by dividing QoL score by its maximum value,
 +
it's values are forced into <math>(0, 1)</math> interval.
 +
Beta distribution shape is veryflexible. It can be U-shaped, J-shaped, or symmetric, making it perfect for "ceiling" or "floor" effects in Quality of Life scores.
 +
 
 +
Beta regression typically uses a '''logit link''' by default to keep predictions between 0 and 1:
 +
 
 +
<math>\ln\left(\frac{\mu_i}{1-\mu_i}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k.</math>
 +
 
 +
The PDF for the Beta distribution (parameterized by mean <math>\mu</math> and precision <math>\phi</math>) is:
 +
 
 +
<math>f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)} y^{\mu\phi-1} (1-y)^{(1-\mu)\phi-1} \quad \text{for } y \in (0, 1).</math>
 +
 
 +
Key Differences between ''Gamma GLM'' and ''Beta Regression''
 +
 
 +
{| class="wikitable"
 +
! Feature
 +
! Gamma Model (model4)
 +
! Beta Model (model_beta)
 +
|-
 +
| '''Outcome Range'''
 +
| <math>(0, \infty)</math> (Positive numbers)
 +
| <math>(0, 1)</math> (Proportions/Scaled)
 +
|-
 +
| '''Typical Use'''
 +
| Costs, rainfall, raw test scores
 +
| Percentages, scaled QoL indices
 +
|-
 +
| '''Interpretation'''
 +
| Log-link: Coefficients are multiplicative
 +
| Logit-link: Coefficients are odds ratios
 +
|-
 +
| '''Boundaries'''
 +
| Can handle very large values
 +
| Naturally handles "ceiling effects"
 +
|}
 +
 
 +
If the outcome dependent variable (DV) qol_score is a raw sum that could theoretically be much higher, stick with Gamma. But when ''qol_score'' is a bounded instrument (like a visual analog scale from 0–1) with many observations scoring near the very top or bottom, ''Beta'' may be superior because it understands the "walls" of the scale.
 +
 
 +
<pre>
 +
library(readr)
 +
 +
# REMOTE URL
 +
url_wide <- "https://socr.umich.edu/docs/uploads/2026/SOCR_CRDS_CompAnalysis_CaseStudy_wide.csv"
 +
url_long <- "https://socr.umich.edu/docs/uploads/2026/SOCR_CRDS_CompAnalysis_CaseStudy_long.csv"
 +
 +
# Import: Reading the Wide data
 +
df_wide <- read_csv(url_wide, col_types = cols(
 +
    patient_id      = col_factor(),    # Prevents math on IDs
 +
    age            = col_double(),
 +
    gender          = col_character(),
 +
    treatment      = col_character(),
 +
    smoking_history = col_character(),
 +
    baseline_fvc    = col_double(),
 +
    fvc_0          = col_double(),
 +
    fvc_6          = col_double(),
 +
    fvc_12          = col_double(),
 +
    qol_score      = col_double(),
 +
    walk_dist      = col_double(),
 +
    success        = col_integer(),  # 0 or 1 is best as integer
 +
    hospital_visits = col_integer(),
 +
    physician_notes = col_character()
 +
))
 +
                                                                                 
 +
# Import: Reading the Long data
 +
df_long <- read_csv(url_long, col_types = cols(
 +
    patient_id = col_factor(), 
 +
    age = col_double(),
 +
    gender = col_character(),
 +
    timepoint = col_factor(levels = c("Month 0", "Month 6", "Month 12"))
 +
))
 +
 
 +
df_wide$qol_scaled <- df_wide$qol_score / (max(df_wide$qol_score) + 0.0001)
 +
 +
# Use the betareg package
 +
library(betareg)
 +
 
 +
model4 <- glm(qol_score ~ treatment + gender + age + baseline_fvc + success, family = Gamma(link = "log"), data = df_wide)
 +
summary(model4)
 +
model_beta <- betareg(qol_scaled ~ treatment + gender + baseline_fvc + success + age, data = df_wide)
 +
summary(model_beta )
 +
</pre>
 +
 
 +
===Problems===
  
 
1. Conceptual Exercises:
 
1. Conceptual Exercises:
 +
 
   a) Derive the score equations for a Poisson GLM with log link
 
   a) Derive the score equations for a Poisson GLM with log link
 
   b) Show that the binomial distribution with logit link is the canonical link
 
   b) Show that the binomial distribution with logit link is the canonical link
Line 409: Line 679:
  
 
2. Applied Problems:
 
2. Applied Problems:
 +
 
   a) Analyze the [http://wiki.socr.umich.edu/index.php/SOCR_Data_Dinov_021808_ConsumerPriceIndex3Way Consumer Price Index] data using appropriate GLM
 
   a) Analyze the [http://wiki.socr.umich.edu/index.php/SOCR_Data_Dinov_021808_ConsumerPriceIndex3Way Consumer Price Index] data using appropriate GLM
 
   b) Model the [http://wiki.socr.umich.edu/index.php/SOCR_Data_MonetaryBase1959_2009 Monetary Base] data considering temporal autocorrelation
 
   b) Model the [http://wiki.socr.umich.edu/index.php/SOCR_Data_MonetaryBase1959_2009 Monetary Base] data considering temporal autocorrelation
 
   c) Using the <code>iris</code> dataset, build a multinomial logistic regression to classify species
 
   c) Using the <code>iris</code> dataset, build a multinomial logistic regression to classify species
  
3. Simulation Study:
+
3. Practice Simulation Study:
 +
 
 
   <pre>
 
   <pre>
 
   # Simulate data from a logistic regression model
 
   # Simulate data from a logistic regression model
Line 440: Line 712:
 
   </pre>
 
   </pre>
  
===References====
+
===References===
  
 
1. McCullagh, P., & Nelder, J. A. (1989). *Generalized Linear Models* (2nd ed.). Chapman and Hall.
 
1. McCullagh, P., & Nelder, J. A. (1989). *Generalized Linear Models* (2nd ed.). Chapman and Hall.
 +
 
2. Dobson, A. J., & Barnett, A. G. (2018). *An Introduction to Generalized Linear Models* (4th ed.). CRC Press.
 
2. Dobson, A. J., & Barnett, A. G. (2018). *An Introduction to Generalized Linear Models* (4th ed.). CRC Press.
 +
 
3. Agresti, A. (2015). *Foundations of Linear and Generalized Linear Models*. Wiley.
 
3. Agresti, A. (2015). *Foundations of Linear and Generalized Linear Models*. Wiley.
 +
 
4. Fahrmeir, L., Kneib, T., Lang, S., & Marx, B. (2013). *Regression: Models, Methods and Applications*. Springer.
 
4. Fahrmeir, L., Kneib, T., Lang, S., & Marx, B. (2013). *Regression: Models, Methods and Applications*. Springer.
 +
 
5. Wood, S. N. (2017). *Generalized Additive Models: An Introduction with R* (2nd ed.). Chapman and Hall/CRC.
 
5. Wood, S. N. (2017). *Generalized Additive Models: An Introduction with R* (2nd ed.). Chapman and Hall/CRC.
  

Latest revision as of 17:09, 13 March 2026

Scientific Methods for Health Sciences - Generalized Linear Modeling (GLM)

Overview

Generalized Linear Modeling (GLM) is a flexible generalization of ordinary linear regression that allows response variables to have error distribution models other than a normal distribution. GLM extends linear regression by allowing the linear model to be related to the response variable via a link function and enabling the variance of each measurement to be a function of its predicted value. This framework unifies statistical models including linear regression, logistic regression, and Poisson regression. Estimation methods include iteratively reweighted least squares for maximum likelihood estimation, Bayesian approaches, and least squares fitted to variance-stabilized responses.

Motivation

While linear regression models linear relationships between response and predictors, many real-world scenarios involve response variables that don't follow normal distributions. For example:

  • Binary outcomes (yes/no decisions) with probabilities bounded between 0 and 1
  • Count data (number of events) that follow Poisson distributions
  • Survival times that follow exponential or Weibull distributions

GLM provides a unified framework for these situations by allowing response variables from the exponential family of distributions.

Theory

1) GLM Components

A GLM consists of three components:

1. Random Component: The response variable \(Y\) follows a distribution from the exponential family\[f_Y(y|\theta,\phi) = \exp\left\{\frac{y\theta - b(\theta)}{a(\phi)} + c(y,\phi)\right\}\] where \(\theta\) is the natural parameter and \(\phi\) is the dispersion parameter.

2. Systematic Component: The linear predictor \(\eta\)\[\eta = \mathbf{X}\beta = \beta_0 + \beta_1X_1 + \beta_2X_2 + \cdots + \beta_pX_p\]

3. Link Function: \(g(\cdot)\) that relates the mean \(\mu = E[Y]\) to the linear predictor\[g(\mu) = \eta \quad \text{or equivalently} \quad \mu = g^{-1}(\eta)\]

The variance function relates the variance to the mean\[\text{Var}(Y) = a(\phi)V(\mu)\] where \(V(\mu)\) is the variance function specific to the distribution.

2) Exponential Family Distributions

The exponential family includes many common distributions:

Distribution Support Natural Parameter \(\theta\) \(b(\theta)\) Canonical Link \(g(\mu)\) Variance Function \(V(\mu)\)
Normal \(\mathbb{R}\) \(\mu\) \(\frac{\theta^2}{2}\) Identity: \(\mu\) 1
Binomial \(\{0,1,\ldots,n\}\) \(\log\left(\frac{\mu}{n-\mu}\right)\) \(n\log(1+e^\theta)\) Logit: \(\log\left(\frac{\mu}{n-\mu}\right)\) \(\mu\left(1-\frac{\mu}{n}\right)\)
Poisson \(\mathbb{N}_0\) \(\log(\mu)\) \(e^\theta\) Log: \(\log(\mu)\) \(\mu\)
Gamma \(\mathbb{R}^+\) \(-\frac{1}{\mu}\) \(-\log(-\theta)\) Inverse: \(\frac{1}{\mu}\) \(\mu^2\)
Inverse Gaussian \(\mathbb{R}^+\) \(-\frac{1}{2\mu^2}\) \(-\sqrt{-2\theta}\) Inverse squared: \(\frac{1}{\mu^2}\) \(\mu^3\)

3) Maximum Likelihood Estimation

For a GLM with \(n\) independent observations, the log-likelihood is\[ \ell(\beta) = \sum_{i=1}^n \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i,\phi) \] where \(\theta_i = \theta(\mu_i)\) and \(\mu_i = g^{-1}(\mathbf{x}_i^\top\beta)\).

The score equations are\[ \mathbf{U}(\beta) = \frac{\partial\ell}{\partial\beta} = \mathbf{X}^\top\mathbf{W}(\mathbf{y} - \boldsymbol{\mu}) = 0 \] where \(\mathbf{W} = \text{diag}\left\{\frac{1}{a(\phi)V(\mu_i)[g'(\mu_i)]^2}\right\}\).

The Fisher information matrix is\[ \mathcal{I}(\beta) = E\left[-\frac{\partial^2\ell}{\partial\beta\partial\beta^\top}\right] = \mathbf{X}^\top\mathbf{W}\mathbf{X} \]

Parameters are estimated via Iteratively Reweighted Least Squares (IRLS)\[ \beta^{(t+1)} = \beta^{(t)} + (\mathbf{X}^\top\mathbf{W}^{(t)}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{W}^{(t)}\mathbf{z}^{(t)} \] where \(z_i^{(t)} = \eta_i^{(t)} + (y_i - \mu_i^{(t)})g'(\mu_i^{(t)})\).

4) Deviance and Goodness-of-Fit

The deviance measures goodness-of-fit\[ D = 2[\ell(\text{saturated model}) - \ell(\text{fitted model})] \] For nested models \(M_0 \subset M_1\)\[ D_{M_0} - D_{M_1} \sim \chi^2_{df_{M_0} - df_{M_1}} \quad \text{under } H_0 \]

The scaled deviance is \(D^* = D/\phi\), and Pearson's chi-square statistic is\[ \chi^2_P = \sum_{i=1}^n \frac{(y_i - \hat{\mu}_i)^2}{V(\hat{\mu}_i)} \]

5) Model Diagnostics

Key diagnostic tools:

  • Pearson residuals: \(r_i^P = \frac{y_i - \hat{\mu}_i}{\sqrt{V(\hat{\mu}_i)}}\)
  • Deviance residuals: \(r_i^D = \text{sign}(y_i - \hat{\mu}_i)\sqrt{d_i}\)
  • Leverage: \(h_{ii}\) from the hat matrix \(\mathbf{H} = \mathbf{W}^{1/2}\mathbf{X}(\mathbf{X}^\top\mathbf{W}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{W}^{1/2}\)
  • Cook's distance: \(D_i = \frac{r_i^2 h_{ii}}{p(1-h_{ii})}\)

Applications

Example 1: Logistic Regression for Contraceptive Use

# Load and prepare data
# https://grodri.github.io/datasets/cuse.dat
cuse <- read.table("https://grodri.github.io/datasets/cuse.dat", header=TRUE)
cat("First few rows of dataset:\n")
print(head(cuse))

# Fit binomial GLM with logit link
model1 <- glm(cbind(using, notUsing) ~ age + education + wantsMore,
              family = binomial, data = cuse)

cat("\n=== Model Summary ===\n")
summary(model1)

cat("\n=== Confidence Intervals ===\n")
confint(model1)

cat("\n=== Odds Ratios with 95% CI ===\n")
exp_coef <- exp(coef(model1))
exp_ci <- exp(confint(model1))
cbind(OR = exp_coef, exp_ci)

cat("\n=== Model Comparison (Likelihood Ratio Test) ===\n")
# Reduced model without education
model_reduced <- glm(cbind(using, notUsing) ~ age + wantsMore,
                     family = binomial, data = cuse)
anova(model_reduced, model1, test = "Chisq")

# Diagnostic plots
par(mfrow = c(2, 2))
plot(model1, which = 1:4)

Result Interpretation:

Let's explicate how to interpret the parameter estimates in this GLM model. Specifically, as this is a bivariate outcome, the estimates are not correlations

glm(formula = cbind(using, notUsing) ~ age + education + wantsMore, 
    family = binomial, data = cuse)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -0.8082     0.1590  -5.083 3.71e-07 ***
age25-29       0.3894     0.1759   2.214  0.02681 *  
age30-39       0.9086     0.1646   5.519 3.40e-08 ***
age40-49       1.1892     0.2144   5.546 2.92e-08 ***
educationlow  -0.3250     0.1240  -2.620  0.00879 ** 
wantsMoreyes  -0.8330     0.1175  -7.091 1.33e-12 ***

In a Generalized Linear Model (GLM) with a binomial family and a logit link (the default for R), the coefficients represent log-odds ratios. Since our outcome is structured as

   cbind(using, notUsing),

we are modeling the probability of "using" (success) versus "not using" (failure).

That is, the model interpretation is in terms of Log-Odds. In this model, the relationship between the covariate predictors and the outcome is defined by the logit function

\(\text{logit}(p) = \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1X_1 + \dots + \beta_kX_k. \)

The estimate Sign matters:

Positive Estimate: As the predictor increases (or if the category is present), the probability of "using" increases.
Negative Estimate: As the predictor increases, the probability of "using" decreases.

The estimate Magnitude is interpreted via the "Exponentiation Trick". Because humans don't think naturally in "log-odds" terms, we usually exponentiate the coefficients (\(e^\beta\)) to get the Odds Ratios (OR).

Variable Estimate (β) Odds Ratio (eβ) Interpretation
age40-49 1.1892 ≈ 3.28 Women aged 40-49 have 3.28 times the odds of using contraception compared to the reference age group (likely <25).
wantsMoreyes -0.8330 ≈ 0.43 Women who want more children have 57% lower odds (1 - 0.43) of using contraception than those who don't.


Specific Breakdown of the Results:

  • Age (Categorical): R has treated age as a factor. The baseline (reference) group is the youngest group (under 25).
As age increases (\(25-29 \ \to 30-39\ \to 40-49\)), the coefficients become increasingly positive (\(0.38 \to 0.90 \to 1.18\)).
Meaning: Older women in this dataset are significantly more likely to use contraception than the youngest group.
  • Education:
Estimate: -0.3250
Meaning: Being in the low education group is associated with a decrease in the log-odds of using contraception compared to the high group. Specifically, their odds are about 28\% lower (\(e^{-0.325} \approx 0.72\)).
  • Wants More Children:
Estimate: -0.8330
Meaning: This is a strong negative predictor. If a woman wants more children, the probability of her using contraception drops significantly.

We can have a quick "reality check" on the measuring units. Unlike a standard correlation (which is bounded between -1 and 1), these estimates can be any real number.

An estimate of 0 means the variable has no effect on the odds (Odds Ratio = 1).
The z-value tells us how many standard errors the estimate is from zero. Since all p-values are small (\(< 0.05\)), all these predictors are statistically significant.

Example 2: Poisson Regression for Count Data

# Example with built-in R dataset: AIDS cases in Belgium
# Load necessary libraries
if (!require("MASS")) install.packages("MASS")
library(MASS)

# Load AIDS dataset
data(Aids2)
cat("AIDS dataset structure:\n")
str(Aids2)

# Prepare data: count of AIDS cases by year and state
aids_counts <- aggregate(cbind(count = 1:nrow(Aids2)) ~ age + state,
                         data = Aids2, FUN = length)

# Fit Poisson regression
model2 <- glm(count ~ age + state, family = poisson, data = aids_counts)

cat("\n=== Poisson Model Summary ===\n")
summary(model2)

cat("\n=== Check for Overdispersion ===\n")
# Pearson chi-square statistic
pearson_chi2 <- sum(residuals(model2, type = "pearson")^2)
df_resid <- df.residual(model2)
dispersion <- pearson_chi2 / df_resid
cat("Pearson χ²:", pearson_chi2, "\n")
cat("Dispersion parameter:", dispersion, "\n")
cat("p-value for H0: φ=1:", pchisq(pearson_chi2, df_resid, lower.tail = FALSE), "\n")

# If overdispersed, fit quasipoisson model
if (dispersion > 1.5) {
  cat("\n=== Fitting Quasi-Poisson Model (accounting for overdispersion) ===\n")
  model2_qp <- glm(count ~ age + state, family = quasipoisson, data = aids_counts)
  summary(model2_qp)
}

# Predict expected counts
cat("\n=== Predictions for First 10 Observations ===\n")
predictions <- predict(model2, type = "response", se.fit = TRUE)
pred_df <- data.frame(
  Observed = aids_counts$count[1:10],
  Predicted = predictions$fit[1:10],
  SE = predictions$se.fit[1:10],
  Lower = predictions$fit[1:10] - 1.96 * predictions$se.fit[1:10],
  Upper = predictions$fit[1:10] + 1.96 * predictions$se.fit[1:10]
)
print(pred_df)

Example 3: Gamma Regression for Positive Continuous Data

# Example with insurance claims data
if (!require("insuranceData")) install.packages("insuranceData")
library(insuranceData)
data(dataCar)

cat("Car insurance claims dataset:\n")
str(dataCar)

# Filter for positive claim amounts
claims_positive <- subset(dataCar, claimcst0 > 0)

# Fit Gamma GLM with log link (common for monetary amounts)
model3 <- glm(claimcst0 ~ agecat + area + veh_age,
              family = Gamma(link = "log"),
              data = claims_positive)

cat("\n=== Gamma Model Summary ===\n")
summary(model3)

cat("\n=== Checking Gamma Model Assumptions ===\n")
# Check residuals
res_gamma <- residuals(model3, type = "deviance")
par(mfrow = c(1, 2))
hist(res_gamma, main = "Deviance Residuals", xlab = "Residuals")
qqnorm(res_gamma, main = "Q-Q Plot of Residuals")
qqline(res_gamma)

# Scale-location plot
fitted_values <- fitted(model3)
plot(fitted_values, sqrt(abs(res_gamma)),
     xlab = "Fitted Values", ylab = "√|Deviance Residuals|",
     main = "Scale-Location Plot")
lines(lowess(fitted_values, sqrt(abs(res_gamma))), col = "red")

Software Implementation in R

Note that `family` is a function (a "closure"), not a list or an object with a \(\$family\) component. In thecode, we're passing the family function itself (e.g., gaussian) to the \(run\_glm\_analysis()\) function, and later accessing \(family\$family\), which doesn’t exist—because gaussian is a function, not a fitted model object. In R, gaussian, binomial, poisson, etc., are functions that return family objects when called. We're passing the function, not the result of calling it. However, `glm()` internally calls `family()`, i.e., `gaussian()`, to get the actual family list object, which does have a \(\$family\) element. Instead of checking \(family\$family\), we extract the family name from the fitted model, not from the input argument. Specifically, after fitting the model, we use \(model\$family\$family\), which is a character string like "gaussian", "binomial", etc.

# Comprehensive GLM analysis function
run_glm_analysis <- function(formula, data, family, link = NULL) {
  # Fit the model
  if (!is.null(link)) {
    model <- glm(formula, data = data, family = family(link = link))
  } else {
    model <- glm(formula, data = data, family = family)
  }
  
  # Model summary
  cat("=== MODEL SUMMARY ===\n")
  print(summary(model))
  
  # Confidence intervals
  cat("\n=== 95% CONFIDENCE INTERVALS ===\n")
  print(confint(model))
  
  # Goodness-of-fit tests
  cat("\n=== GOODNESS-OF-FIT ===\n")
  cat("Null Deviance:", model$null.deviance, "on", model$df.null, "df\n")
  cat("Residual Deviance:", model$deviance, "on", model$df.residual, "df\n")
  cat("AIC:", AIC(model), "\n")
  
  # Check for overdispersion (for Poisson and binomial families)
  family_name <- model$family$family  # ✅ Extract from fitted model
  if (family_name %in% c("poisson", "binomial", "quasipoisson", "quasibinomial")) {
    cat("\n=== OVERDISPERSION CHECK ===\n")
    dispersion <- model$deviance / model$df.residual
    cat("Dispersion parameter:", round(dispersion, 4), "\n")
    if (abs(dispersion - 1) > 0.1) {
      direction <- ifelse(dispersion > 1, "over", "under")
      cat("Note: Significant", direction, "dispersion detected\n")
    }
  }
  
  return(model)
}

# Example usage with mtcars dataset
cat("\n\n=== EXAMPLE: GAUSSIAN GLM (equivalent to linear regression) ===\n")
data(mtcars)
model_gaussian <- run_glm_analysis(
  formula = mpg ~ wt + hp + cyl,
  data = mtcars,
  family = gaussian,
  link = "identity"
)

# Compare with lm() for verification
cat("\n=== COMPARISON WITH lm() ===\n")
model_lm <- lm(mpg ~ wt + hp + cyl, data = mtcars)
print(summary(model_lm))

Common GLM Families and Links in R

Family Default Link Alternative Links Typical Use
gaussian() identity log, inverse Continuous, symmetric data
binomial() logit probit, cauchit, log, cloglog Binary/count proportions
poisson() log identity, sqrt Count data
Gamma() inverse log, identity Positive continuous, right-skewed
inverse.gaussian() 1/μ² inverse, log, identity Positive continuous, highly skewed
quasi() identity User-defined Overdispersed data

Practical Considerations

  • Model Selection:
    • AIC (Akaike Information Criterion): \( \text{AIC} = -2\ell + 2p \)
    • BIC (Bayesian Information Criterion): \( \text{BIC} = -2\ell + p\log(n) \)
    • Cross-validation: Particularly useful for predictive performance.
  • Handling Overdispersion: For Poisson models: \( \text{Var}(Y) = \phi\mu \) where \(\phi > 1\) indicates overdispersion

Solutions:

    • Use quasi-Poisson model
    • Use negative binomial distribution
    • Include random effects.
  • Zero-Inflation: For count data with excess zeros, consider:
    • Zero-inflated Poisson (ZIP) model
    • Zero-inflated negative binomial (ZINB) model
    • Hurdle models.

Advanced Topics

Mixed Effects GLM (GLMM)

Extension incorporating random effects\[ g(E[Y|b]) = \mathbf{X}\beta + \mathbf{Z}b \] where \( b \sim N(0, \mathbf{G}) \).

Implementation in R with lme4::glmer():

library(lme4)

# generate random data
n <- 500
id <- seq(n)
day <- 1:20
mydata <- expand.grid(id = id, day = day)
set.seed(1)
trt <- sample(c("control", "treat"), size = n, replace = TRUE)
sex <- sample(c("female", "male"), size = n, replace = TRUE)
mydata$trt <- trt[mydata$id]
mydata$sex <- sex[mydata$id]
mydata <- mydata[order(mydata$id, mydata$day),]
rownames(mydata) <- NULL
head(mydata, n = 10)

mydata$trtsex <- interaction(mydata$trt, mydata$sex)
probs <- c(0.40, 0.85, 0.30, 0.50)
names(probs) <- levels(mydata$trtsex)
mydata$p <- probs[mydata$trtsex]

set.seed(3)
r_probs <- rnorm(n = n, mean = 0, sd = 0.03)
mydata$random_p <- r_probs[mydata$id]
mydata$p <- mydata$p + mydata$random_p

# use the probabilities to generate zeroes and ones with the binom() function.
mydata$y <- rbinom(n = nrow(mydata), size = 1, prob = mydata$p)

# using the sim data, inspect the first few records.

head(mydata[c("id", "day", "trt", "sex", "p", "y")])



# Example with binary outcome and random intercept
m <- glmer(y ~ trt * sex + (1|id), data = mydata, family = binomial)
summary(m, corr = FALSE)

Bayesian GLM

Using Markov Chain Monte Carlo (MCMC) methods.

library(rstanarm)
# Bayesian logistic regression
bayes_model <- stan_glm(y ~ day +  trt + sex + p, # predictors,
                        family = binomial,
                        data = mydata,
                        prior = normal(0, 2.5),
                        prior_intercept = normal(0, 5))

print(bayes_model, digits = 2)

# Coefficient Plot (Forest Plot)

plot(bayes_model, "areas")  # density + intervals
# OR more customizable:
library(bayesplot)
mcmc_intervals(bayes_model, prob = 0.95) +
  ggplot2::labs(title = "Posterior Distributions of Coefficients")

# Posterior Predictive Checks (PPC)
# Check if the model can reproduce data like the observed
pp_check(bayes_model, plotfun = "hist", nreps = 100) +
  ggplot2::labs(title = "Posterior Predictive Check (Histogram of y_rep vs y)")

Another Bayesian Experiment.

library(rstanarm)
set.seed(123)

# Simulate data — explicitly make trt and sex into factors
n <- 500
mydata <- data.frame(
  day = sample(1:30, n, replace = TRUE),
  trt = factor(sample(c("A", "B"), n, replace = TRUE), levels = c("A", "B")),
  sex = factor(sample(c("F", "M"), n, replace = TRUE), levels = c("F", "M")),
  p   = rnorm(n, mean = 50, sd = 10)
)

# True log-odds
logit_p <- -1 - 0.02 * mydata$day + 
           0.6 * (as.numeric(mydata$trt) - 1) +   # A=0, B=1
           0.3 * (as.numeric(mydata$sex) - 1) +    # F=0, M=1
           0.04 * mydata$p

mydata$y <- rbinom(n, size = 1, prob = plogis(logit_p))

# Fit model
bayes_model <- stan_glm(
  y ~ day + trt + sex + p,
  family = binomial,
  data = mydata,
  prior = normal(0, 2.5),
  prior_intercept = normal(0, 5),
  seed = 42,
  refresh = 0
)

# Create New Data
# Confirm variables are factors
str(mydata[c("trt", "sex")])

# Create prediction grid
newdata <- expand.grid(
  day = seq(min(mydata$day), max(mydata$day), length.out = 30),
  trt = levels(mydata$trt)[1],      # e.g., "A"
  sex = levels(mydata$sex)[1],      # e.g., "F"
  p   = median(mydata$p)
)

# Verify it worked
stopifnot(nrow(newdata) > 0)
str(newdata)

# Generate Predictions from the Bayesian Posterior Probability (after fitting the Bayesian Model):
# Posterior predicted probabilities
post_pred <- posterior_linpred(bayes_model, newdata = newdata, transform = TRUE)

# Summarize
pred_summary <- data.frame(
  day = newdata$day,
  prob = apply(post_pred, 2, median),
  lower = apply(post_pred, 2, quantile, 0.025),
  upper = apply(post_pred, 2, quantile, 0.975)
)

# Plot
library(ggplot2)
ggplot(pred_summary, aes(x = day, y = prob)) +
  geom_line(color = "blue") +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2, fill = "blue") +
  labs(
    title = "Predicted Probability of Outcome by Day",
    subtitle = "Holding treatment = A, sex = F, and p = median",
    y = "P(y = 1)",
    x = "Day"
  ) +
  theme_minimal()

Gamma GLM vs. Beta Regression

The Gamma GLM and Beta Regression models are designed for very different types of "continuous" data. The fundamental difference lies in the domain (the range of possible values) and the variance structure of the data they expect.

The Gamma Model (glm)

The Gamma distribution is used for strictly positive, continuous data \((0, \infty)\). It is appropriate for data that is "skewed" to the right, where the variance increases as the mean increases. If a variable, like qol_score is a raw score (e.g., 0 to 100), the link, link = "log", ensures the predicted values remain positive.

In a Gamma GLM with a log link, we model the mean \(\mu\) as

\(\ln(\mu_i) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k.\)

The probability density function (PDF) for the Gamma distribution is

\(f(y; \alpha, \beta) = \frac{\beta^\alpha y^{\alpha-1} e^{-\beta y}}{\Gamma(\alpha)} \quad \text{for } y > 0.\)

The Beta Model (betareg)

The Beta distribution, is used specifically for rates, proportions, or scores constrained between 0 and 1, \((0, 1)\). Unlike the Gamma model, which can go off to infinity, the Beta distribution is "boxed in" this interval. For instance, if qol_scaled is normalized, e.g., by dividing QoL score by its maximum value, it's values are forced into \((0, 1)\) interval. Beta distribution shape is veryflexible. It can be U-shaped, J-shaped, or symmetric, making it perfect for "ceiling" or "floor" effects in Quality of Life scores.

Beta regression typically uses a logit link by default to keep predictions between 0 and 1\[\ln\left(\frac{\mu_i}{1-\mu_i}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_k X_k.\]

The PDF for the Beta distribution (parameterized by mean \(\mu\) and precision \(\phi\)) is\[f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)} y^{\mu\phi-1} (1-y)^{(1-\mu)\phi-1} \quad \text{for } y \in (0, 1).\]

Key Differences between Gamma GLM and Beta Regression

Feature Gamma Model (model4) Beta Model (model_beta)
Outcome Range \((0, \infty)\) (Positive numbers) \((0, 1)\) (Proportions/Scaled)
Typical Use Costs, rainfall, raw test scores Percentages, scaled QoL indices
Interpretation Log-link: Coefficients are multiplicative Logit-link: Coefficients are odds ratios
Boundaries Can handle very large values Naturally handles "ceiling effects"

If the outcome dependent variable (DV) qol_score is a raw sum that could theoretically be much higher, stick with Gamma. But when qol_score is a bounded instrument (like a visual analog scale from 0–1) with many observations scoring near the very top or bottom, Beta may be superior because it understands the "walls" of the scale.

library(readr)
 
 # REMOTE URL
 url_wide <- "https://socr.umich.edu/docs/uploads/2026/SOCR_CRDS_CompAnalysis_CaseStudy_wide.csv"
 url_long <- "https://socr.umich.edu/docs/uploads/2026/SOCR_CRDS_CompAnalysis_CaseStudy_long.csv"
 
 # Import: Reading the Wide data
 df_wide <- read_csv(url_wide, col_types = cols(
     patient_id      = col_factor(),    # Prevents math on IDs
     age             = col_double(),
     gender          = col_character(),
     treatment       = col_character(),
     smoking_history = col_character(),
     baseline_fvc    = col_double(),
     fvc_0           = col_double(),
     fvc_6           = col_double(),
     fvc_12          = col_double(),
     qol_score       = col_double(),
     walk_dist       = col_double(),
     success         = col_integer(),   # 0 or 1 is best as integer
     hospital_visits = col_integer(),
     physician_notes = col_character()
 ))
                                                                                   
 # Import: Reading the Long data
 df_long <- read_csv(url_long, col_types = cols(
     patient_id = col_factor(),  
     age = col_double(),
     gender = col_character(),
     timepoint = col_factor(levels = c("Month 0", "Month 6", "Month 12"))
 ))

df_wide$qol_scaled <- df_wide$qol_score / (max(df_wide$qol_score) + 0.0001) 
 
 # Use the betareg package
 library(betareg)

model4 <- glm(qol_score ~ treatment + gender + age + baseline_fvc + success, family = Gamma(link = "log"), data = df_wide)
summary(model4)
model_beta <- betareg(qol_scaled ~ treatment + gender + baseline_fvc + success + age, data = df_wide)
summary(model_beta )

Problems

1. Conceptual Exercises:

  a) Derive the score equations for a Poisson GLM with log link
  b) Show that the binomial distribution with logit link is the canonical link
  c) Prove that the deviance for a normal GLM equals the residual sum of squares

2. Applied Problems:

  a) Analyze the Consumer Price Index data using appropriate GLM
  b) Model the Monetary Base data considering temporal autocorrelation
  c) Using the iris dataset, build a multinomial logistic regression to classify species

3. Practice Simulation Study:

   # Simulate data from a logistic regression model
   set.seed(123)
   n <- 1000
   x1 <- rnorm(n)
   x2 <- rnorm(n)
   beta <- c(0.5, 1, -0.5)
   linear_predictor <- beta[1] + beta[2]*x1 + beta[3]*x2
   probabilities <- plogis(linear_predictor)
   y <- rbinom(n, size = 1, prob = probabilities)
   
   # Fit model and evaluate performance
   sim_data <- data.frame(y = y, x1 = x1, x2 = x2)
   model_sim <- glm(y ~ x1 + x2, family = binomial, data = sim_data)
   
   # Calculate bias and MSE
   beta_hat <- coef(model_sim)
   bias <- beta_hat - beta
   mse <- mean((beta_hat - beta)^2)
   
   cat("True parameters:", beta, "\n")
   cat("Estimated parameters:", beta_hat, "\n")
   cat("Bias:", bias, "\n")
   cat("MSE:", mse, "\n")
   

References

1. McCullagh, P., & Nelder, J. A. (1989). *Generalized Linear Models* (2nd ed.). Chapman and Hall.

2. Dobson, A. J., & Barnett, A. G. (2018). *An Introduction to Generalized Linear Models* (4th ed.). CRC Press.

3. Agresti, A. (2015). *Foundations of Linear and Generalized Linear Models*. Wiley.

4. Fahrmeir, L., Kneib, T., Lang, S., & Marx, B. (2013). *Regression: Models, Methods and Applications*. Springer.

5. Wood, S. N. (2017). *Generalized Additive Models: An Introduction with R* (2nd ed.). Chapman and Hall/CRC.

Online Resources




Translate this page:

(default)
Uk flag.gif

Deutsch
De flag.gif

Español
Es flag.gif

Français
Fr flag.gif

Italiano
It flag.gif

Português
Pt flag.gif

日本語
Jp flag.gif

България
Bg flag.gif

الامارات العربية المتحدة
Ae flag.gif

Suomi
Fi flag.gif

इस भाषा में
In flag.gif

Norge
No flag.png

한국어
Kr flag.gif

中文
Cn flag.gif

繁体中文
Cn flag.gif

Русский
Ru flag.gif

Nederlands
Nl flag.gif

Ελληνικά
Gr flag.gif

Hrvatska
Hr flag.gif

Česká republika
Cz flag.gif

Danmark
Dk flag.gif

Polska
Pl flag.png

România
Ro flag.png

Sverige
Se flag.gif