# Difference between revisions of "AP Statistics Curriculum 2007 GLM MultLin"

Line 1: | Line 1: | ||

==[[AP_Statistics_Curriculum_2007 | General Advance-Placement (AP) Statistics Curriculum]] - Multiple Linear Regression == | ==[[AP_Statistics_Curriculum_2007 | General Advance-Placement (AP) Statistics Curriculum]] - Multiple Linear Regression == | ||

− | In the previous sections we saw how to study the relations in bivariate designs. Now we extend that to any finite number of | + | In the [[AP_Statistics_Curriculum_2007#Chapter_X:_Correlation_and_Regression | previous sections]] we saw how to study the relations in bivariate designs. Now we extend that to any finite number of variables (multivariate case). |

=== Multiple Linear Regression === | === Multiple Linear Regression === | ||

Line 16: | Line 16: | ||

is still one of '''linear''' regression, that is, linear in <math>x</math> and <math>x^2</math> respectively, even though the graph on <math>x</math> by itself is not a straight line. | is still one of '''linear''' regression, that is, linear in <math>x</math> and <math>x^2</math> respectively, even though the graph on <math>x</math> by itself is not a straight line. | ||

− | === | + | ===Parameter Estimation in Multilinear Regression=== |

− | + | A multilinear regression with ''p'' coefficients and the regression intercept β<sub>0</sub> and ''n'' data points (sample size), with <math>n\geq (p+1) </math> allows construction of the following vectors and matrix with associated standard errors: | |

− | + | :<math> \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1p} \\ 1 & x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n \end{bmatrix} </math> | |

− | + | or, in '''vector-matrix notation''' | |

− | |||

− | + | :<math> \ y = \mathbf{X}\cdot\beta + \varepsilon.\, </math> | |

+ | Each data point can be given as <math>(\vec x_i, y_i)</math>, <math>i=1,2,\dots,n.</math>. For n = p, standard errors of the parameter estimates could not be calculated. For n less than p, parameters could not be calculated. | ||

− | === | + | * '''Point Estimates''': The estimated values of the parameters <math>\beta_i</math> are given as |

− | + | :<math>\widehat{\beta} </math><math>=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T {\vec y}</math> | |

+ | |||

+ | * '''Residuals''': The residuals, representing the difference between the observations and the model's predictions, are required to analyse the regression and are given by: | ||

+ | |||

+ | :<math>\hat\vec\varepsilon = \vec y - \mathbf{X} \hat\beta\,</math> | ||

+ | |||

+ | The standard deviation, <math>\hat \sigma </math> for the model is determined from | ||

+ | |||

+ | :<math> {\hat \sigma = \sqrt{ \frac {\hat\vec\varepsilon^T \hat\vec\varepsilon} {n-p-1}} = \sqrt {\frac{{ \vec y^T \vec y - \hat\vec\beta^T \mathbf{X}^T \vec y}}{{n - p - 1}}} } </math> | ||

+ | |||

+ | The variance in the errors is Chi-square distributed: | ||

+ | :<math>\hat\sigma^2 \sim \frac { \chi_{n-p-1}^2 \ \sigma^2 } {n-p-1}</math> | ||

+ | |||

+ | * '''Interval Estimates''': The <math>100(1-\alpha)% </math> [[AP_Statistics_Curriculum_2007#Chapter_VII:_Point_and_Interval_Estimates | confidence interval]] for the parameter, <math>\beta_i </math>, is computed as follows: | ||

+ | |||

+ | :<math> {\widehat \beta_i \pm t_{\frac{\alpha }{2},n - p - 1} \hat \sigma \sqrt {(\mathbf{X}^T \mathbf{X})_{ii}^{ - 1} } } </math>, | ||

+ | |||

+ | where ''t'' follows the [[AP_Statistics_Curriculum_2007_StudentsT | Student's t-distribution]] with <math>n-p-1</math> degrees of freedom and <math> (\mathbf{X}^T \mathbf{X})_{ii}^{ - 1}</math> denotes the value located in the <math>i^{th}</math> row and column of the matrix. | ||

+ | |||

+ | The '''regression sum of squares''' (or sum of squared residuals) ''SSR'' (also commonly called ''RSS'') is given by: | ||

+ | |||

+ | :<math> {\mathit{SSR} = \sum {\left( {\hat y_i - \bar y} \right)^2 } = \hat\beta^T \mathbf{X}^T \vec y - \frac{1}{n}\left( { \vec y^T \vec u \vec u^T \vec y} \right)} </math>, | ||

+ | |||

+ | where <math> \bar y = \frac{1}{n} \sum y_i</math> and <math> \vec u </math> is an ''n'' by 1 unit vector (i.e. each element is 1). Note that the terms <math>y^T u</math> and <math>u^T y</math> are both equivalent to <math>\sum y_i</math>, and so the term <math>\frac{1}{n} y^T u u^T y</math> is equivalent to <math>\frac{1}{n}\left(\sum y_i\right)^2</math>. | ||

+ | |||

+ | The '''error''' (or explained)''' sum of squares''' (''ESS'') is given by: | ||

+ | |||

+ | :<math> {\mathit{ESS} = \sum {\left( {y_i - \hat y_i } \right)^2 } = \vec y^T \vec y - \hat\beta^T \mathbf{X}^T \vec y}. </math> | ||

+ | |||

+ | The '''total sum of squares''' ('''TSS''') is given by | ||

+ | |||

+ | :<math> {\mathit{TSS} = \sum {\left( {y_i - \bar y} \right)^2 } = \vec y^T \vec y - \frac{1}{n}\left( { \vec y^T \vec u \vec u^T \vec y} \right) = \mathit{SSR}+ \mathit{ESS}}. </math> | ||

===Examples=== | ===Examples=== | ||

− | + | In the [[AP_Statistics_Curriculum_2007_GLM_Regress | simple linear regression case]], we were able to compute by hand some (simple) examples). Such calculations are much more involved in the multilinear regression situations. Thus we demonstrate multilinear regression only using the [http://socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Multiple Regression Analysis Appet]. | |

+ | |||

+ | Use the SOCR California Earthquake dataset to investigate whether Earthquake magnitude (dependent variable) can be predicted by knowing the longitude, latitude, distance and depth of the quake. Clearly, we do not expect these predictors to have a strong effect on the earthquake magnitude, so we expect the coefficient parameters not to be significantly distinct from zero (null hypothesis). SOCR Multilinear regression applet reports this model: | ||

+ | |||

+ | : <math>Magnitude = \beta_o + \beta_1\times Close+ \beta_2\times Depth+ \beta_3\times Longitude+ \beta_4\times Latitude + \varepsilon.</math> | ||

+ | |||

+ | : <math>Magnitude = 2.320 + 0.001\times Close -0.003\times Depth -0.035\times Longitude -0.028\times Latitude + \varepsilon.</math> | ||

− | + | <center>[[Image:SOCR_EBook_Dinov_GLM_MLR_021808_Fig1.jpg|500px]] | |

− | + | [[Image:SOCR_EBook_Dinov_GLM_MLR_021808_Fig2.jpg|500px]]</center> | |

− | |||

− | |||

− | |||

<hr> | <hr> | ||

===References=== | ===References=== | ||

− | |||

<hr> | <hr> |

## Revision as of 21:20, 18 February 2008


## General Advance-Placement (AP) Statistics Curriculum - Multiple Linear Regression

In the previous sections we saw how to study the relations in bivariate designs. Now we extend that to any finite number of variables (multivariate case).

### Multiple Linear Regression

We are interested in determining the **linear regression**, as a model, of the relationship between one **dependent** variable \(Y\) and many **independent** variables \(X_i\), \(i = 1, \dots, p\). The multilinear regression model can be written as

\[Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots +\beta_p X_p + \varepsilon\], where \(\varepsilon\) is the error term.

The coefficient \(\beta_0\) is the intercept ("constant" term) and the \(\beta_i\) are the respective parameters of the *p* independent variables. There are *p+1* parameters to be estimated in the multilinear regression.

- Multilinear vs. non-linear regression: This multilinear regression method is "linear" because the relation of the response (the dependent variable \(Y\)) to the independent variables is assumed to be a linear function of the parameters \(\beta_i\). Note that multilinear regression is called a linear modeling technique **not** because the graph of \(Y = \beta_0 + \beta_1 x\) is a straight line, **nor** because \(Y\) is a linear function of the \(X\) variables. Rather, the term "linear" refers to the fact that \(Y\) is a linear function of the parameters \(\beta_i\), even when it is not a linear function of \(X\). Thus, any model like

\[Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon\]

is still one of **linear** regression, that is, linear in the parameters (equivalently, linear in the derived predictors \(x\) and \(x^2\)), even though the graph of \(Y\) against \(x\) by itself is not a straight line.
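To illustrate the point about linearity in the parameters, here is a minimal sketch (assuming NumPy is available; the data are hypothetical and generated without noise) that fits the quadratic model above by ordinary least squares. The column \(x^2\) is simply treated as one more predictor.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 1 + 2x + 3x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix with columns 1, x, x^2: the model is linear in beta,
# even though y is a curved function of x.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares via NumPy's solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # close to the generating coefficients (1, 2, 3)
```

Because the data contain no noise, the estimates should match the generating coefficients up to floating-point error.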

### Parameter Estimation in Multilinear Regression

A multilinear regression with *p* slope coefficients plus the regression intercept \(\beta_0\), fitted to *n* data points (the sample size) with \(n \geq p+1\), allows construction of the following vectors and matrix, from which the parameter estimates and their standard errors are obtained:

\[ \begin{bmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & x_{12} & \dots & x_{1p} \\ 1 & x_{21} & x_{22} & \dots & x_{2p} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \dots & x_{np} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \varepsilon_1\\ \varepsilon_2\\ \vdots\\ \varepsilon_n \end{bmatrix} \]

or, in **vector-matrix notation**

\[ \vec y = \mathbf{X}\beta + \vec\varepsilon. \]

Each data point can be given as \((\vec x_i, y_i)\), \(i=1,2,\dots,n\). Because there are \(p+1\) parameters to estimate, for \(n = p+1\) the model fits the data exactly and the standard errors of the parameter estimates cannot be calculated. For \(n < p+1\), the parameters themselves cannot be uniquely estimated.

**Point Estimates**: The estimated values of the parameters \(\beta_i\) are given as

\[\widehat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T \vec y\]
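The point-estimate formula can be sketched numerically as follows. This is an illustration only: the data values are hypothetical, NumPy is assumed, and the normal equations are solved directly rather than by forming the inverse explicitly, which is usually the numerically safer choice.

```python
import numpy as np

# Hypothetical sample: n = 5 observations, p = 2 predictors (illustrative values).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1])

# Design matrix X: a leading column of ones carries the intercept beta_0.
X = np.column_stack([np.ones(len(y)), x1, x2])

# Normal equations (X^T X) beta = X^T y, solved without inverting X^T X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's general least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
print(beta_lstsq)
```

Both solvers should agree up to floating-point error, since each minimizes the same sum of squared residuals.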

**Residuals**: The residuals, representing the difference between the observations and the model's predictions, are required to analyse the regression and are given by:

\[\hat{\vec\varepsilon} = \vec y - \mathbf{X} \hat\beta\]

The standard deviation \(\hat\sigma\) for the model is determined from

\[ \hat\sigma = \sqrt{\frac{\hat{\vec\varepsilon}^T \hat{\vec\varepsilon}}{n-p-1}} = \sqrt{\frac{\vec y^T \vec y - \hat\beta^T \mathbf{X}^T \vec y}{n-p-1}} \]

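The estimated error variance follows a scaled Chi-square distribution (assuming independent, normally distributed errors with common variance \(\sigma^2\)):

\[\hat\sigma^2 \sim \frac{\chi_{n-p-1}^2 \, \sigma^2}{n-p-1}\]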
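As a rough numerical check of the residual and standard-deviation formulas, the sketch below (hypothetical data, NumPy assumed) computes \(\hat\sigma\) both from the residual vector and from the equivalent matrix expression.

```python
import numpy as np

# Hypothetical sample: n = 5 observations, p = 2 predictors (illustrative values).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1])
X = np.column_stack([np.ones(len(y)), x1, x2])

n, k = X.shape      # k = p + 1 columns (intercept plus p predictors)
p = k - 1
dof = n - p - 1     # residual degrees of freedom

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals: observed values minus model predictions.
residuals = y - X @ beta_hat

# Two equivalent expressions for the model standard deviation.
sigma_hat = np.sqrt(residuals @ residuals / dof)
sigma_hat_alt = np.sqrt((y @ y - beta_hat @ X.T @ y) / dof)
print(sigma_hat, sigma_hat_alt)
```

The second expression subtracts two nearly equal quantities, so in practice the residual-based form is generally the more stable of the two.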

**Interval Estimates**: The \(100(1-\alpha)\%\) confidence interval for the parameter \(\beta_i\) is computed as follows:

\[ \widehat\beta_i \pm t_{\frac{\alpha}{2},\, n-p-1}\, \hat\sigma \sqrt{\left[(\mathbf{X}^T \mathbf{X})^{-1}\right]_{ii}}, \]

where \(t_{\frac{\alpha}{2},\, n-p-1}\) is the upper \(\alpha/2\) critical value of the Student's t-distribution with \(n-p-1\) degrees of freedom and \(\left[(\mathbf{X}^T \mathbf{X})^{-1}\right]_{ii}\) denotes the diagonal entry of the inverse matrix \((\mathbf{X}^T \mathbf{X})^{-1}\) located in the row and column corresponding to \(\beta_i\).
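The interval formula can be sketched as follows, again on hypothetical data with NumPy assumed. To keep the example free of extra dependencies, the critical value \(t_{0.025,\,2} \approx 4.303\) is taken from a standard t-table rather than computed; that hard-coded constant is an assumption tied to this sample size and to \(\alpha = 0.05\).

```python
import numpy as np

# Hypothetical sample: n = 5 observations, p = 2 predictors (illustrative values).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1])
X = np.column_stack([np.ones(len(y)), x1, x2])

n, k = X.shape
dof = n - k  # equals n - p - 1, here 2

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
residuals = y - X @ beta_hat
sigma_hat = np.sqrt(residuals @ residuals / dof)

# Standard error of each estimate: sigma_hat * sqrt of the diagonal of (X^T X)^{-1}.
xtx_inv = np.linalg.inv(X.T @ X)
std_err = sigma_hat * np.sqrt(np.diag(xtx_inv))

# Tabulated two-sided 95% critical value for 2 degrees of freedom (assumed constant).
t_crit = 4.303

lower = beta_hat - t_crit * std_err
upper = beta_hat + t_crit * std_err
for i in range(k):
    print(f"beta_{i}: {beta_hat[i]:.3f}  95% CI [{lower[i]:.3f}, {upper[i]:.3f}]")
```

With only two residual degrees of freedom the intervals are wide, which is expected for such a small sample.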

The **regression** (or explained) **sum of squares** (*SSR*), which measures the variation of the fitted values about the mean, is given by:

\[ \mathit{SSR} = \sum \left( \hat y_i - \bar y \right)^2 = \hat\beta^T \mathbf{X}^T \vec y - \frac{1}{n}\left( \vec y^T \vec u \vec u^T \vec y \right), \]

where \( \bar y = \frac{1}{n} \sum y_i\) and \( \vec u \) is an *n* by 1 vector of ones (i.e. each element is 1). Note that the terms \(\vec y^T \vec u\) and \(\vec u^T \vec y\) are both equal to \(\sum y_i\), and so the term \(\frac{1}{n} \vec y^T \vec u \vec u^T \vec y\) is equal to \(\frac{1}{n}\left(\sum y_i\right)^2\).

The **error** (or residual) **sum of squares** (*ESS*, also commonly called *RSS*), the sum of squared residuals, is given by:

\[ {\mathit{ESS} = \sum {\left( {y_i - \hat y_i } \right)^2 } = \vec y^T \vec y - \hat\beta^T \mathbf{X}^T \vec y}. \]

The **total sum of squares** (**TSS**) is given by

\[ {\mathit{TSS} = \sum {\left( {y_i - \bar y} \right)^2 } = \vec y^T \vec y - \frac{1}{n}\left( { \vec y^T \vec u \vec u^T \vec y} \right) = \mathit{SSR}+ \mathit{ESS}}. \]
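The decomposition \(\mathit{TSS} = \mathit{SSR} + \mathit{ESS}\) can be checked numerically. The sketch below uses hypothetical data and assumes NumPy; it computes each sum of squares from its definition and compares *SSR* against the matrix form given above.

```python
import numpy as np

# Hypothetical sample: n = 5 observations, p = 2 predictors (illustrative values).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1])
X = np.column_stack([np.ones(len(y)), x1, x2])
n = len(y)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat   # fitted values
y_bar = y.mean()

# Sums of squares from their definitions (names follow the text above).
SSR = np.sum((y_hat - y_bar) ** 2)  # regression (explained)
ESS = np.sum((y - y_hat) ** 2)      # error (residual)
TSS = np.sum((y - y_bar) ** 2)      # total

# Matrix form of SSR, with u the n-vector of ones.
u = np.ones(n)
SSR_matrix = beta_hat @ X.T @ y - (y @ u) * (u @ y) / n

print(SSR, ESS, TSS)
```

The identity holds here because the model includes an intercept; without one, the cross term between residuals and fitted values need not vanish.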

### Examples

In the simple linear regression case, we were able to compute some simple examples by hand. Such calculations are much more involved in the multilinear regression setting, so we demonstrate multilinear regression only using the SOCR Multiple Regression Analysis Applet.

Use the SOCR California Earthquake dataset to investigate whether earthquake magnitude (dependent variable) can be predicted by knowing the longitude, latitude, distance and depth of the quake. We do not expect these predictors to have a strong effect on the earthquake magnitude, so we expect the coefficient parameters not to be significantly different from zero (null hypothesis). The SOCR Multilinear Regression applet reports this model:

\[\text{Magnitude} = \beta_0 + \beta_1\times \text{Close} + \beta_2\times \text{Depth} + \beta_3\times \text{Longitude} + \beta_4\times \text{Latitude} + \varepsilon.\]

\[\widehat{\text{Magnitude}} = 2.320 + 0.001\times \text{Close} - 0.003\times \text{Depth} - 0.035\times \text{Longitude} - 0.028\times \text{Latitude}.\]
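To show how the reported coefficients would be applied, here is a small sketch that evaluates the fitted equation for one set of inputs. The function name and the input values are hypothetical, and the units and sign convention of each variable (for example, negative longitudes for the western hemisphere) are assumptions that should be checked against the dataset itself.

```python
# Coefficients as reported above by the SOCR applet.
INTERCEPT = 2.320
COEF_CLOSE = 0.001
COEF_DEPTH = -0.003
COEF_LONGITUDE = -0.035
COEF_LATITUDE = -0.028


def predict_magnitude(close, depth, longitude, latitude):
    """Evaluate the fitted linear model for a single (hypothetical) quake."""
    return (INTERCEPT
            + COEF_CLOSE * close
            + COEF_DEPTH * depth
            + COEF_LONGITUDE * longitude
            + COEF_LATITUDE * latitude)


# Hypothetical inputs, chosen only for illustration.
example = predict_magnitude(close=10.0, depth=5.0, longitude=-120.0, latitude=36.0)
print(round(example, 3))
```

Such a prediction says nothing about whether the coefficients are statistically significant; that question is answered by the standard errors and intervals described earlier.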

### References

- SOCR Home page: http://www.socr.ucla.edu
