Difference between revisions of "AP Statistics Curriculum 2007 GLM Regress"

 
Line 1: Line 1:
 
==[[AP_Statistics_Curriculum_2007 | General Advance-Placement (AP) Statistics Curriculum]] - Regression ==
 
==[[AP_Statistics_Curriculum_2007 | General Advance-Placement (AP) Statistics Curriculum]] - Regression ==
  
=== Linear Modeling - Regression ===
+
As we discussed in the [[AP_Statistics_Curriculum_2007_GLM_Corr |Correlation section]], many applications involve the analysis of relationships between two or more variables involved in the process of interest. Suppose we have bivariate data (''X'' and ''Y'') of a process and we are interested in determining the linear relation between X and Y (e.g., determining a straight line that best fits the pairs of data (''X,Y'')). A linear relationship between ''X'' and ''Y'' will give us the power to make predictions - i.e., given a value of ''X'', predict a corresponding ''Y'' response. Note that in this design, data consists of paired observations (''X,Y'') - for example, the [[SOCR_Data_Dinov_021708_Earthquakes | Longitude and Latitude of the SOCR Earthquake dataset]].
An example of how to attach images to Wiki documents is included below (this needs to be replaced by an appropriate figure for this section)!
+
 
<center>[[Image:AP_Statistics_Curriculum_2007_IntroVar_Dinov_061407_Fig1.png|500px]]</center>
+
===Lines in 2D===
 +
There are 3 types of lines in 2D planes - Vertical Lines, Horizontal Lines and Oblique Lines. In general, the mathematical representation of lines in 2D is given by equations like <math>aX + bY=c</math> or, provided the line is not vertical, most frequently by the slope-intercept form <math>Y=aX + b</math> (where ''a'' and ''b'' are re-used to denote the slope and the intercept).
  
===Approach===
+
Recall that there is a one-to-one correspondence between any line in 2D and (linear) equations of the form
Models & strategies for solving the problem, data understanding & inference.  
+
: If the line is '''vertical''' (<math>X_1 =X_2</math>): <math>X=X_1</math>;
 +
: If the line is '''horizontal''' (<math>Y_1 =Y_2</math>): <math>Y=Y_1</math>;
 +
: Otherwise ('''oblique''' line): <math>{Y-Y_1 \over Y_2-Y_1}= {X-X_1 \over X_2-X_1}</math>, (for <math>X_1\not=X_2</math> and <math>Y_1\not=Y_2</math>)
 +
where <math>(X_1,Y_1)</math> and <math>(X_2, Y_2)</math> are two points on the line of interest (2-distinct points in 2D determine a unique line).
  
* TBD
+
* Try drawing the following lines manually and [http://www.pserc.cornell.edu/pserc/java/graph/examples/parse1d.html using this applet]:
 +
: Y=2X+1
 +
: Y=-3X-5
  
===Model Validation===
+
=== Linear Modeling - Regression ===
Checking/affirming underlying assumptions.  
+
There are two contexts for regression:
 +
* Y is an observed variable and X is specified by the researcher - e.g., Y is hair growth after X months, for individuals at certain dose levels of hair growth cream.
  
* TBD
+
* X and Y are both observed variables - e.g., [[SOCR_Data_Dinov_020108_HeightsWeights | Height (Y) and weight (X)]] for 20 randomly selected individuals from the population.
  
===Computational Resources: Internet-based SOCR Tools===
+
Suppose we have ''n'' pairs ''(X,Y)'', {<math>X_1, X_2, X_3, \cdots, X_n</math>} and {<math>Y_1, Y_2, Y_3, \cdots, Y_n</math>}, of observations of the same process. If a [[SOCR_EduMaterials_Activities_ScatterChart |scatterplot]] of the data suggests a general linear trend, it would be reasonable to fit a line to the data. The main question is how to determine the best line.
* TBD
 
  
===Examples===
+
====[[AP_Statistics_Curriculum_2007_GLM_Corr#Airfare_Example |Airfare Example]]====
Computer simulations and real observed data.  
+
We can see from the [[SOCR_EduMaterials_Activities_ScatterChart |scatterplot]] that greater distance is associated with higher airfare. In other words, airports that are farther from Baltimore tend to have more expensive airfares. To decide on the best fitting line, we use the '''least-squares method''' to fit the least squares (regression) line.
  
* TBD
+
<center>[[Image:SOCR_EBook_Dinov_GLM_Regr_021708_Fig1.jpg|500px]]</center>
 
===Hands-on activities===
 
Step-by-step practice problems.  
 
  
* TBD
 
  
 
<hr>
 
<hr>
 
===References===
 
===References===
* TBD
 
  
 
<hr>
 
<hr>

Revision as of 21:45, 17 February 2008

General Advance-Placement (AP) Statistics Curriculum - Regression

As we discussed in the Correlation section, many applications involve the analysis of relationships between two or more variables involved in the process of interest. Suppose we have bivariate data (X and Y) of a process and we are interested in determining the linear relation between X and Y (e.g., determining a straight line that best fits the pairs of data (X,Y)). A linear relationship between X and Y will give us the power to make predictions - i.e., given a value of X, predict a corresponding Y response. Note that in this design, data consists of paired observations (X,Y) - for example, the Longitude and Latitude of the SOCR Earthquake dataset.

Lines in 2D

There are 3 types of lines in 2D planes - Vertical Lines, Horizontal Lines and Oblique Lines. In general, the mathematical representation of lines in 2D is given by equations like \(aX + bY=c\) or, provided the line is not vertical, most frequently by the slope-intercept form \(Y=aX + b\) (where a and b are re-used to denote the slope and the intercept).

Recall that there is a one-to-one correspondence between any line in 2D and (linear) equations of the form

If the line is vertical (\(X_1 =X_2\)): \(X=X_1\);
If the line is horizontal (\(Y_1 =Y_2\)): \(Y=Y_1\);
Otherwise (oblique line): \({Y-Y_1 \over Y_2-Y_1}= {X-X_1 \over X_2-X_1}\) (for \(X_1\not=X_2\) and \(Y_1\not=Y_2\))

where \((X_1,Y_1)\) and \((X_2, Y_2)\) are two points on the line of interest (2-distinct points in 2D determine a unique line).
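
The three cases above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `line_through` and the choice to return the coefficients (a, b, c) of the general form aX + bY = c are assumptions made here, not part of the SOCR materials.

```python
def line_through(p1, p2):
    """Return (a, b, c) such that a*X + b*Y = c passes through p1 and p2.

    Hypothetical helper for illustration; p1 and p2 are (X, Y) pairs.
    """
    (x1, y1), (x2, y2) = p1, p2
    if (x1, y1) == (x2, y2):
        raise ValueError("two distinct points are required")
    if x1 == x2:
        # Vertical line: X = X_1
        return (1, 0, x1)
    if y1 == y2:
        # Horizontal line: Y = Y_1
        return (0, 1, y1)
    # Oblique line: cross-multiplying (Y - Y_1)/(Y_2 - Y_1) = (X - X_1)/(X_2 - X_1)
    # gives (Y_2 - Y_1) X - (X_2 - X_1) Y = (Y_2 - Y_1) X_1 - (X_2 - X_1) Y_1
    a = y2 - y1
    b = -(x2 - x1)
    c = a * x1 + b * y1
    return (a, b, c)
```

For instance, the points (0, 1) and (1, 3) both satisfy Y = 2X + 1, and the sketch returns the coefficients of the equivalent equation 2X - Y = -1.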

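  • Try drawing the following lines manually and using this applet (http://www.pserc.cornell.edu/pserc/java/graph/examples/parse1d.html):
Y=2X+1
Y=-3X-5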
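
As an aid to drawing the two lines Y=2X+1 and Y=-3X-5 by hand, one could tabulate a few points on each. The sketch below assumes a hypothetical helper `points_on_line` and an arbitrary choice of X values; neither comes from the original page.

```python
def points_on_line(slope, intercept, xs):
    """Return the (X, Y) points of Y = slope*X + intercept at the given X values."""
    return [(x, slope * x + intercept) for x in xs]

# Arbitrary X values chosen only for illustration
xs = [-2, -1, 0, 1, 2]

line1 = points_on_line(2, 1, xs)    # Y = 2X + 1
line2 = points_on_line(-3, -5, xs)  # Y = -3X - 5

for (x, y1), (_, y2) in zip(line1, line2):
    print(f"X={x:>2}  2X+1={y1:>3}  -3X-5={y2:>4}")
```

Plotting any two of the tabulated points for a line and joining them with a ruler reproduces that line, since two distinct points determine it uniquely.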

Linear Modeling - Regression

There are two contexts for regression:

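  • Y is an observed variable and X is specified by the researcher - e.g., Y is hair growth after X months, for individuals at certain dose levels of hair growth cream.
  • X and Y are both observed variables - e.g., Height (Y) and weight (X) for 20 randomly selected individuals from the population.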

Suppose we have n pairs (X,Y), {\(X_1, X_2, X_3, \cdots, X_n\)} and {\(Y_1, Y_2, Y_3, \cdots, Y_n\)}, of observations of the same process. If a scatterplot of the data suggests a general linear trend, it would be reasonable to fit a line to the data. The main question is how to determine the best line.
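
One common way to make "best" precise is to score each candidate line by the sum of its squared vertical deviations from the observed Y values, and to prefer the line with the smaller score. The sketch below illustrates that criterion only; the function name `sum_squared_errors` and the tiny data set are hypothetical and are not taken from any SOCR dataset.

```python
def sum_squared_errors(slope, intercept, xs, ys):
    """Sum of squared vertical deviations of the points from Y = slope*X + intercept."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical paired observations, made up purely for illustration
xs = [0, 1, 2]
ys = [1, 3, 5]

sse_a = sum_squared_errors(2, 1, xs, ys)  # candidate line Y = 2X + 1
sse_b = sum_squared_errors(2, 0, xs, ys)  # candidate line Y = 2X
print(sse_a, sse_b)
```

Here the first candidate passes through every point and so has a score of zero, while the second misses each point by one unit; under this criterion the first line is the better fit.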

Airfare Example

We can see from the scatterplot that greater distance is associated with higher airfare. In other words, airports that are farther from Baltimore tend to have more expensive airfares. To decide on the best fitting line, we use the least-squares method to fit the least squares (regression) line.
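
A minimal sketch of the least-squares fit follows, using the standard closed-form solution for a single predictor: the slope is the ratio of the summed cross-deviations to the summed squared X deviations, and the intercept makes the line pass through the point of means. The function name `least_squares_line` and the sample numbers are assumptions for illustration; they are not the airfare data.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line Y = slope*X + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of squared deviations of X, and sum of cross-deviations of X and Y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all X values are equal; the fitted line would be vertical")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical observations (not the airfare data), lying exactly on Y = 2X + 1
slope, intercept = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])

# Predict a Y response for a new X value using the fitted line
predicted = slope * 5 + intercept
```

With real, noisy data the points will not fall exactly on the fitted line, but the same two formulas still give the line that minimizes the sum of squared vertical deviations.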

[Figure: SOCR_EBook_Dinov_GLM_Regr_021708_Fig1.jpg]






