Simple Linear Regression Tutorial

From SOCR
Revision as of 02:06, 24 July 2011 by JayZzz (talk | contribs) (SOCR_EduMaterials_AnalysesActivities - Simple Linear Regression Tutorial)

SOCR_EduMaterials_AnalysesActivities - Simple Linear Regression Tutorial

Simple Linear Regression Tutorial Using LA Neighborhoods Data

Data: We will be using the LA Neighborhoods Data for this tutorial.

Goal: Our goal is to use SOCR to predict median household income from a single explanatory variable. In this example, the explanatory variable is median age.

Step 1: First, we will import the data into the SOCR Simple Regression Analysis Activity. Head to http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_LA_Neighborhoods_Data#Data_Source and find the table with the data. Select all of the data, and press Ctrl+C (Command+C on Macs) to copy it.

Step 2: Next, head to http://socr.ucla.edu/htmls/SOCR_Analyses.html, and find the Simple Regression Analysis Activity in the drop-down menu.

Step 3: Now click the “PASTE” button under the drop-down menu. You should now see the data in the window.

Step 4: Click on the “MAPPING” tab, and add Income to the dependent variable list and Age to the independent variable list.

Step 5: Click “CALCULATE”. You will now be taken to the “RESULTS” tab.

Here you can see the regression equation, \(R^2\), the individual residuals, and the mean and standard deviation of both variables.
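For readers who want to see how these quantities relate, here is a minimal Python sketch of an ordinary least squares fit with one explanatory variable. It is not SOCR's implementation. The function name `simple_linear_regression` is hypothetical, and the `age` and `income` values are made-up placeholders, not the LA Neighborhoods data.

```python
from statistics import mean, stdev

def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    ss_res = sum(r ** 2 for r in residuals)        # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)    # total variation
    return {
        "intercept": intercept,
        "slope": slope,
        "r_squared": 1 - ss_res / ss_tot,
        "fitted": fitted,
        "residuals": residuals,
    }

# Made-up placeholder values, NOT the LA Neighborhoods data.
age = [25.0, 30.0, 35.0, 40.0, 45.0]
income = [30000.0, 48000.0, 71000.0, 88000.0, 112000.0]

fit = simple_linear_regression(age, income)
print("Income =", round(fit["intercept"], 3), "+", round(fit["slope"], 3), "* Age")
print("R^2 =", round(fit["r_squared"], 4))
print("mean/sd of age:", mean(age), round(stdev(age), 3))
print("mean/sd of income:", mean(income), round(stdev(income), 3))
```

With an intercept in the model, the residuals sum to zero up to rounding error, which is a quick sanity check on any fit.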


Step 6: Click “GRAPH”. Here is the scatterplot of Income vs. Age. We see an upward trend: as median age increases, so does median household income. There are also residual plots and the Normal QQ plot.

Step 7: We want to check that the assumptions of linear regression are met.

Assumption 1: There is a linear relationship between the independent variable (age) and the dependent variable (income).

  • How to check: Make a scatter plot of income vs. age.
  • How to fix: Try a transformation (for example, log(y) vs. x). If no transformation straightens the relationship, the relationship is not linear and simple linear regression is not appropriate.

Assumption met.
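The log(y) vs. x transformation mentioned above can be sketched in Python as follows. This is only an illustration for data where it is needed, since the assumption is met here. The helper names `ols` and `fit_log_y` are hypothetical, and the sample values are made up to follow an exponential curve.

```python
import math
from statistics import mean

def ols(x, y):
    """Return (intercept, slope) of the least squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def fit_log_y(x, y):
    """Fit log(y) = a + b * x; every y must be strictly positive."""
    return ols(x, [math.log(yi) for yi in y])

# Made-up values following y = exp(1 + 0.2 * x), a curved relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(1 + 0.2 * xi) for xi in x]

a, b = fit_log_y(x, y)
print("log(y) =", round(a, 4), "+", round(b, 4), "* x")
```

After such a fit, predictions on the original scale are obtained by exponentiating: y is estimated as exp(a + b * x).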

Assumption 2: The variance of the errors is constant.

  • How to check: Look at the plot of residuals vs. predicted values (\(\hat{y}\)). Make sure there is no pattern, such as the residuals getting larger as the predicted values increase.
  • How to fix: Log-transform the variables, or fix the underlying independence or linearity problems.

There is a slight increase in the residuals at the high end of age.
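As a rough numeric companion to the residual plot, one could measure whether the size of the residuals grows with the predicted values. The sketch below computes the Pearson correlation between the predicted values and the absolute residuals. It is an informal diagnostic, not a formal test and not something SOCR reports. The function name `abs_residual_trend` is hypothetical and the input values are made-up placeholders.

```python
from math import sqrt
from statistics import mean

def abs_residual_trend(fitted, residuals):
    """Pearson correlation between fitted values and |residuals|.

    Values near 0 suggest roughly constant spread; clearly positive
    values suggest the residuals grow with the predicted values.
    """
    abs_res = [abs(r) for r in residuals]
    f_bar, a_bar = mean(fitted), mean(abs_res)
    sff = sum((f - f_bar) ** 2 for f in fitted)
    saa = sum((a - a_bar) ** 2 for a in abs_res)
    if sff == 0 or saa == 0:
        return 0.0  # no variation, so no trend can be measured
    sfa = sum((f - f_bar) * (a - a_bar) for f, a in zip(fitted, abs_res))
    return sfa / sqrt(sff * saa)

# Made-up placeholder values, NOT the LA Neighborhoods data.
fitted = [40000.0, 55000.0, 70000.0, 85000.0, 100000.0]
residuals = [-1000.0, 1500.0, -2500.0, 4000.0, -6000.0]

trend = abs_residual_trend(fitted, residuals)
print("correlation of |residual| with fitted value:", round(trend, 3))
```

The plot remains the primary check; a single number like this can miss patterns such as a funnel that widens and then narrows.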

Assumption 3: The errors are normally distributed.

  • How to check: Look at the Normal QQ plot. The points should lie close to a straight line.
  • How to fix: Remove outliers, if that is justified. A non-linear transformation may also be needed.

Assumption met.
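To make the Normal QQ plot less of a black box, here is a minimal Python sketch of how its points can be computed from a list of residuals. It assumes the common plotting positions (i - 0.5) / n; SOCR may use a different convention. The function name `normal_qq_points` is hypothetical and the residuals are made-up placeholders.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(residuals):
    """Return (theoretical quantile, standardized residual) pairs.

    Requires at least two residuals that are not all equal.
    """
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        p = (i - 0.5) / n           # assumed plotting position
        points.append((standard_normal.inv_cdf(p), (r - m) / s))
    return points

# Made-up placeholder residuals, NOT the LA Neighborhoods data.
residuals = [-6000.0, -2500.0, -1000.0, 1500.0, 4000.0, 4500.0]

for theoretical, observed in normal_qq_points(residuals):
    print(round(theoretical, 3), round(observed, 3))
```

If the errors are approximately normal, the pairs fall near the line where the observed value equals the theoretical value.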

Conclusions

There is no major violation of the linear regression assumptions, so we proceed with our analysis.

We can see from the results tab that the regression equation is:

Income = -74549.596 + 4096.055 × Age

Here Income is the predicted median household income, -74549.596 is the intercept, 4096.055 is the slope, and Age (median age) is the independent variable.

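The linear model states that for every 1-year increase in median age, the predicted median household income increases by $4,096.06. This describes an average association across neighborhoods, not a guarantee for any single neighborhood.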
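The fitted equation can be used for prediction. The short Python sketch below plugs a median age into the equation from the results tab. The function name `predict_income` and the example ages are hypothetical; only the intercept and slope come from the tutorial.

```python
# Coefficients reported in the SOCR results tab for this tutorial.
INTERCEPT = -74549.596
SLOPE = 4096.055

def predict_income(median_age):
    """Predicted median household income for a given median age."""
    return INTERCEPT + SLOPE * median_age

# Hypothetical example ages chosen for illustration.
at_35 = predict_income(35)
at_36 = predict_income(36)
print("predicted income at median age 35:", round(at_35, 2))
print("increase for one more year:", round(at_36 - at_35, 2))
```

The intercept is the prediction at a median age of 0, which is far outside any plausible data, so it should not be interpreted on its own. Predictions are only reasonable for ages within the range observed in the data.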