# SMHS GLM

## Scientific Methods for Health Sciences - Generalized Linear Modeling (GLM)

### Overview

Generalized Linear Modeling (GLM) is a flexible generalization of ordinary linear regression, which allows for response variables that have error distribution models other than a normal distribution. It generalizes linear regression by allowing the linear model to be related to the response variable via a link function and allowing the magnitude of the variance of each measurement to be function of its predicted value. GLM is a way of unifying statistical models like linear regression logistic regression and Poisson regression. Methods like iteratively reweighted least squares method for maximum likelihood estimation, Bayesian approaches and least squares fitted to variance stabilized responses are proposed for GLM. In this lecture, we are going to present a general introduction to GLM including the assumptions, model applied and illustrate the application of GLM with examples presented.

### Motivation

We have discussed about linear regression, which deal with the linear relationship between response and the predictors. What if the response variable does not follow a normal distribution? For example, a model that predicts the probability of making a yes/no choice is not suitable as a linear-response model given that the probabilities are bounded on both ends. Or a study that aims to predict each decrease in 10 degrees Fahrenheit leads to 1000 fewer people going on vacation in the beach is unlikely to generalize well over small nor large beaches. In many cases, the linear regression model won’t apply and GLM has to be applied for it allows the response variables that have arbitrary distribution other than normal distribution to vary linearly with the predicted values.

### Theory

• 1) GLM components: (1) a probability distribution from the exponential family; (2) a linear predictor $\eta=X\beta$ (3) a link function $g$ such that $E[Y]=\mu=g^{-1}(\eta)$ The dependent variable $Y$ in GLM is assumed to be generated from a particular distribution in the exponential family, a large range of probability distributions include normal, binomial, Poisson and gamma distribution. The mean, $\mu$, of the distribution depends on the independent variables $X:E[Y]=\mu=g^{-1} (X\beta),$ where $E[Y]$ is the expected value of $Y, X\beta$ is the linear predictor, a linear combination of unknown parameters, $\beta$, $g$ is the linked function. The variance is typically a function $V$ of the mean: $Var(Y)=V(\mu)=V(g^{-1} (X\beta)).$ It is convenient if $V$ follows from the exponential family distribution but it may simply be that the variance is a function of the predicted value. The unknown parameters, $\beta$, are typically estimated with maximum likelihood, maximum quasi-likelihood, or Bayesian techniques.
• 2)Probability distribution: the over-dispersed exponential family of distribution is a generalization of exponential family and exponential dispersion model of distributions and includes those by $\theta$ and $\tau$, whose density function can be expressed as $f_{Y} (y│\theta,\tau)=h(y,\tau)exp\left(\frac{b(theta)^{T} T(y)-A(\theta)}{(d{\tau})}\right)$ where $\tau$ is the dispersion parameter and is usually related to the variance of the distribution. For scalar $Y$ and $\theta$, this reduces to $f_{Y} (y│\theta,\tau)=h(y,\tau)exp\left(\frac{b(theta)^{T} T(y)-A(\theta)}{(d{\tau})}\right)$, $\theta$ is related to the mean of the distribution. If $b(\theta)$ is the identity function then the distribution is said to be in canonical form (natural form).
• 3) Linear predictor and link function: the quantity which incorporate the information about the independent variables into the model. $\eta=X\beta$ relates the expected value of the data (predictor). $\eta$ is expressed as linear combination of unknown parameters $\beta$. The coefficients of the linear combination are represented as the matrix of independent variables $X$. The link function provides the relationship between the linear predictor and the mean of the distribution function. The following table lists some commonly used exponential family distribution and data that typically used for along with the canonical link functions and their inverses.

 Distribution Support of distribution Typical uses Link name Link function Mean function Normal $(-\infty,+\infty)$ real Linear response data Identity $X\beta=\mu$ $\mu =X\beta$ Exponential $(0,+\infty)$ real Exponential response data, scale parameters Inverse $X\beta=-\mu^{-1}$ $\mu =(X\beta)^{-1}$ Gamma $(0,+\infty)$ real Exponential response data, scale parameters Inverse $X\beta=-\mu^{-1}$ $\mu =(X\beta)^{-1}$ Inverse Gaussian $(0,+\infty)$ real Inverse $X\beta=-\mu^{-2}$ $\mu =(X\beta)^{-\frac{1} {2}}$ Poisson integer $(0,\infty)$ Count of occurrences in fixed time/space log $X\beta=1n_{\mu}$ $\mu = exp(X\beta)$ Bernoulli integer [0,1] Outcome of single yes/no occurrence logit $X\beta=ln(\frac{\mu}{1-\mu})$ $\mu =\frac{exp(X\beta)}{1+exp(X\beta)}$ Binomial integer [0,N] Count of # of ‘yes’ in N yes/no occurrences logit $X\beta=ln(\frac{\mu}{1-\mu})$ $\mu =\frac{exp(X\beta)}{1+exp(X\beta)}$ Categorical integer [0,K],K-vector of integers [0,1] Outcome of single K-way occurrence logit $X\beta=ln(\frac{\mu}{1-\mu})$ $\mu =\frac{exp(X\beta)}{1+exp(X\beta)}$ Multinomial K-vector of integers: [0,1] Count of occurrences of different types (1,…,K) out of N total K-way occurrences logit $X\beta=ln(\frac{\mu}{1-\mu})$ $\mu =\frac {exp(X\beta)}{1+exp(X\beta)}$

In the cases of the exponential and gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean. Particularly, the linear predictor may be negative, which would give an impossible negative mean. When maximizing the likelihood, precautions must be taken to avoid this. An alternative is to use a non-canonical link function.

Note also that in the case of the Bernoulli, binomial, categorical and multinomial distributions, the support of the distributions is not the same type of data as the parameter being predicted. In all of these cases, the predicted parameter is one or more probabilities, i.e. real numbers in the range $[0,1]$. The resulting model is known as logistic regression (or multinomial logistic regression in the case that K-way rather than binary values are being predicted).

For the Bernoulli and binomial distributions, the parameter is a single probability, indicating the likelihood of occurrence of a single event. The Bernoulli still satisfies the basic condition of the generalized linear model in that, even though a single outcome will always be either $0$ or $1$, the expected value will nonetheless be a real-valued probability, i.e. the probability of occurrence of a "yes" (or $1$) outcome. Similarly, in a binomial distribution, the expected value is Np, i.e. the expected proportion of "yes" outcomes will be the probability to be predicted.

For categorical and multinomial distributions, the parameter to be predicted is a $K$-vector of probabilities, with the further restriction that all probabilities must add up to $1$. Each probability indicates the likelihood of occurrence of one of the $K$ possible values. For the multinomial distribution, and for the vector form of the categorical distribution, the expected values of the elements of the vector can be related to the predicted probabilities similarly to the binomial and Bernoulli distributions.

• 4) Fitting the model:
• Maximum likelihood: can be found using iteratively reweighted least squares algorithm with $\beta^{t+1}=\beta^{t}+J^{-1} (\beta^{(t)}) u(\beta^{(t)})$, where $J(\beta^{(t)})$ is the observed information matrix (the negative of the Hessian matrix) and $u(\beta^{(t)})$ s the score function. With Fisher’s scoring method: $\beta^{(t+1)}=\beta^{(t)} +I^{-1} (\beta^{(t)})u(\beta^{(t)})$, where $where I^{-1} (\beta^{(t)})$is the Fisher information matrix.
• Bayesian method: to approximate the posterior distribution, usually using Laplace approximation $(\int_{a}^{b}e^{Mf(x)}dx)$ or Markov Chain Monte Carlo method such as Gibbs sampling.
• 5) Advantages of GLM: (1) can perform data analysis within and between subjects without the need to average the data itself; (2) allows you to counterbalance random stimuli orders; (3) allows you to exclude segments of runs with artifacts; (4) can perform more sophisticated analysis (e.g., two factor ANOVA with interactions); (5) easier to work with.

### Applications

This article proposed an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures from each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood.

This article proposed a conceptually very simple but general algorithm for the estimation of the fixed effects, random effects, and components of dispersion in generalized linear models with random effects. Conditions are described under which the algorithm yields approximate maximum likelihood or quasi-maximum likelihood estimates of the fixed effects and dispersion components, and approximate empirical Bayes estimates of the random effects. The algorithm is applied to two data sets to illustrate the estimation of components of dispersion and the modelling of over-dispersion.

### Software

GLM in R:

glm(formula, family=familytype(link=linkfunction),data= ):
# list of family and default link functions:

 Family Default Like Function Binomial (link=’logit’) Gaussian (link=identity) Gamma (link=inverse) Inverse gamma (link=${1}/mu^{2}$) Poisson (link=’log’) Quasi (link=’identity’,variance=’constant’) Quasibinomial (link=’logit’) Quasipoisson (link=’log’)
# Logistic regression is useful when predict a binary outcome from a set
of continuous predictor   variables.
# F as a binary factor and x1-x3 are continuous predictors
fit <- glm(F~x1+x2+x3, data=mydata, family=binomial())
summary(fit)  # display the results
confint(fit)  # 95% CI for the coefficients
predict(fit,type=’response’)  # predicted values
residuals(fit,type=’deviance’)  # residuals

See these Logistic Regression R Examples.
## Use Poisson regression when predicting an outcome variable representing
counts from a set of continuous predictor variables
# Y is a count and x1-x3 are continuous predictors
fit <- glm(Y~x1+x2+x3, data=mydata, family=poisson())
summary(fit)   # display results


### TO illustrate the GLM with the following example
cuse

 Age Education Wants More Not Using Using 1 <25 low yes 53 6 2 <25 low no 10 4 3 <25 high yes 212 52 4 <25 high no 50 10 5 25-29 low yes 60 14 6 25-29 low no 19 10 7 25-29 high yes 155 54 8 25-29 high no 65 27 9 30-39 low yes 112 33 10 30-39 low no 77 80 11 30-39 high yes 118 46 12 30-39 high no 68 78 13 40-49 low yes 35 6 14 40-49 low no 46 48 15 40-49 high yes 8 8 16 40-49 high no 12 31
attach(cuse)
lrfit <- glm( cbind(using, notUsing) ~ + age + education + wantsMore ,
family = binomial)     #  run the GLM on age, education, wantsMore
lrfit

Call: glm(formula = cbind(using, notUsing) ~ +age + education + wantsMore,
family = binomial)

Coefficients:
(Intercept)      age25-29      age30-39      age40-49  educationlow
-0.8082        0.3894        0.9086        1.1892       -0.3250

wantsMoreyes
-0.8330

Degrees of Freedom: 15 Total (i.e. Null);  10 Residual
Null Deviance:	    165.8
Residual Deviance: 29.92 	AIC: 113.4

### Now relevel to change the base category and define our own indicator variables
with high education and women who want no more children:

noMore <- wantsMore == "no"
hiEduc <- education == "high"
glm( cbind(using,notUsing) ~ age + hiEduc + noMore, family=binomial)
Call:  glm(formula = cbind(using, notUsing) ~ age + hiEduc + noMore,
family = binomial)

Coefficients:
(Intercept)     age25-29     age30-39     age40-49   hiEducTRUE   noMoreTRUE
-1.9662       0.3894       0.9086       1.1892       0.3250       0.8330

Degrees of Freedom: 15 Total (i.e. Null);  10 Residual
Null Deviance:	    165.8
Residual Deviance: 29.92 	AIC: 113.4

1-pchisq(29.92,10)
0.0008828339
## we see that the residual deviance of 29.92 on 10 degree of freedom is highly significant,
a better model could be introducing an interaction term of age and desire for no more children:
lrfit <- glm( cbind(using,notUsing) ~ age * noMore + hiEduc , family=binomial)
lrfit

Call:  glm(formula = cbind(using, notUsing) ~ age * noMore + hiEduc,
family = binomial)

Coefficients:
(Intercept)             age25-29             age30-39
-1.80317              0.39460              0.54666
age40-49           noMoreTRUE           hiEducTRUE
0.57952              0.06622              0.34065
age25-29:noMoreTRUE  age30-39:noMoreTRUE  age40-49:noMoreTRUE
0.25918              1.11266              1.36167

Degrees of Freedom: 15 Total (i.e. Null);  7 Residual
Null Deviance:	    165.8
Residual Deviance: 12.63 	AIC: 102.1

## now the model’s deviance of 12.63 on 7 degree of freedom is not significant at
the conventional five per cent level, so there is no evidence against this model.


summary(lrfit)
Call:
glm(formula = cbind(using, notUsing) ~ age * noMore + hiEduc,
family = binomial)

Deviance Residuals:
Min        1Q    Median        3Q       Max
-1.30027  -0.66163  -0.03286   0.81945   1.73851

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)         -1.80317    0.18018 -10.008  < 2e-16 ***
age25-29             0.39460    0.20145   1.959  0.05013 .
age30-39             0.54666    0.19842   2.755  0.00587 **
age40-49             0.57952    0.34742   1.668  0.09530 .
noMoreTRUE           0.06622    0.33071   0.200  0.84130
hiEducTRUE           0.34065    0.12577   2.709  0.00676 **
age25-29:noMoreTRUE  0.25918    0.40975   0.633  0.52704
age30-39:noMoreTRUE  1.11266    0.37404   2.975  0.00293 **
age40-49:noMoreTRUE  1.36167    0.48433   2.811  0.00493 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 165.77  on 15  degrees of freedom
Residual deviance:  12.63  on  7  degrees of freedom
AIC: 102.14

Number of Fisher Scoring iterations: 4


### Problems

Repeat the GLM analyses above using the following SOCR datasets