Difference between revisions of "Simple Linear Regression Tutorial"
Revision as of 17:41, 25 July 2011
SOCR_EduMaterials_AnalysesActivities - Simple Linear Regression Tutorial
Simple Linear Regression Tutorial Using LA Neighborhoods Data
Data: We will be using the LA Neighborhoods Data for this tutorial.
Goal: Our goal is to use SOCR to predict median household income from one explanatory variable. In this example, we will use the age variable.
Step 1: First, we will import the data into the SOCR Simple Regression Analysis Activity. Head to LA Neighborhoods Data and find the table with the data. Select all of the data, and press Ctrl+C (Command+C on Macs) to copy it.
Step 2: Next, head to http://socr.ucla.edu/htmls/SOCR_Analyses.html, and find the Simple Regression Analysis Activity in the drop-down menu.
Step 3: Now click the “PASTE” button under the drop-down menu. You should now see the data in the window.
Step 4: Click on the “MAPPING” tab, and add Income to the dependent variable list and Age to the independent variable list.
Step 5: Click “CALCULATE”. You will now be taken to the “RESULTS” tab.
Here you can see the regression equation, \(R^2\), the individual residuals, and the mean and standard deviation of both variables.
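The quantities listed on the “RESULTS” tab can also be computed by hand with ordinary least squares. The following is a minimal Python sketch, not SOCR's own implementation. The helper name `fit_simple_regression` and the five data points are hypothetical and are not taken from the LA Neighborhoods Data.

```python
from statistics import mean, stdev


def fit_simple_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of squares and cross-products around the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual = observed value minus predicted value
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return intercept, slope, r_squared, residuals


# Hypothetical illustration data (NOT the LA Neighborhoods values)
ages = [25, 30, 35, 40, 45]
incomes = [30000, 48000, 71000, 88000, 113000]

intercept, slope, r_squared, residuals = fit_simple_regression(ages, incomes)
print(f"Income = {intercept:.3f} + {slope:.3f} * Age")
print(f"R^2 = {r_squared:.4f}")
print(f"Residuals: {residuals}")
print(f"Age:    mean = {mean(ages):.2f}, sd = {stdev(ages):.2f}")
print(f"Income: mean = {mean(incomes):.2f}, sd = {stdev(incomes):.2f}")
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check on any fit.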
Step 6: Click “GRAPH”. Here is the scatterplot of Income vs Age. We see the upward trend: As median age increases, so does median household income.
There are also residual plots and the Normal QQ plot.
Step 7: We want to check the assumptions of linear regression and make sure that they are met.
Assumption 1: There is a linear relationship between the independent variable (age) and the dependent variable (income).
- How to check: Make a scatter plot of income vs. age.
- How to fix: Try a transformation (for example, log(y) vs. x). If no transformation helps, the relationship may simply not be linear.
Assumption Met
Assumption 2: The variance is constant
- How to check: Look at plot of residuals vs. predicted values. Make sure there is not a pattern, such as the residuals getting larger as the predicted values increase.
- How to fix: Take the log of one or both variables, or fix the underlying independence or linearity problems.
Slight increase in the residuals at the high end of age
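The visual check for constant variance can be backed up with a rough numeric comparison. The sketch below is one possible heuristic, not a formal test and not something SOCR reports: it splits the residuals into a lower and an upper half by predicted value and compares their root-mean-square size. The helper `spread_by_half` and all numbers are hypothetical.

```python
from math import sqrt


def spread_by_half(predicted, residuals):
    """Return the RMS residual for the lower and upper half of predicted values."""
    pairs = sorted(zip(predicted, residuals))  # order by predicted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[half:]]

    def rms(values):
        # Root mean square: typical residual size, measured around zero
        return sqrt(sum(v ** 2 for v in values) / len(values))

    return rms(low), rms(high)


# Hypothetical predicted values and residuals, for illustration only
predicted = [28800, 49400, 70000, 90600, 111200, 131800]
residuals = [500, -600, 900, -1500, 2400, -3100]

low_rms, high_rms = spread_by_half(predicted, residuals)
ratio = high_rms / low_rms
print(f"RMS residual, lower half: {low_rms:.1f}")
print(f"RMS residual, upper half: {high_rms:.1f}")
print(f"Ratio (upper / lower):    {ratio:.2f}")
```

A ratio near 1 is consistent with constant variance. A ratio well above 1, as in this made-up example, matches the fan-shaped pattern described above, where residuals grow with the predicted values.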
Assumption 3: Errors are normally distributed
- How to check: Normal QQ Plot (Should lie close to straight line)
- How to fix: Take out outliers, if applicable. A non-linear transformation may be needed.
Assumption Met
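To see what a Normal QQ plot is built from, the points can be computed directly: the sorted, standardized residuals are paired with the quantiles a standard normal distribution would give at the same ranks. The sketch below is an illustration under assumptions. The helper `normal_qq_points`, the plotting position (i - 0.5)/n, and the residual values are all hypothetical choices, and SOCR may compute its plot differently.

```python
from statistics import NormalDist, mean, pstdev


def normal_qq_points(residuals):
    """Pair theoretical standard-normal quantiles with standardized residuals."""
    n = len(residuals)
    ordered = sorted(residuals)
    center, scale = mean(ordered), pstdev(ordered)
    standardized = [(r - center) / scale for r in ordered]
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n stay strictly between 0 and 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


# Hypothetical residuals, for illustration only
residuals = [1200, -1400, 1000, -2600, 1800, -300, 700, -400]

points = normal_qq_points(residuals)
for theo, obs in points:
    print(f"theoretical = {theo:6.3f}   observed = {obs:6.3f}")
```

If the errors are roughly normal, the two columns track each other closely, which is the same as saying the plotted points lie near a straight line.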
Conclusions
There is no major violation of the linear regression assumptions, so we proceed with our analysis.
We can see from the results tab that the regression equation is:
Income = -74549.596 + 4096.055 age
Income is the predicted value, -74549.596 is the intercept, 4096.055 is the slope, and age is the independent variable.
The linear model states that for every one-year increase in median age, the predicted median household income increases by about $4,096.06.
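The fitted equation can be turned into a small prediction function. The sketch below uses the intercept and slope reported above. The function name `predict_income` and the input age of 35 are hypothetical, chosen only to illustrate the arithmetic.

```python
INTERCEPT = -74549.596  # from the regression equation above
SLOPE = 4096.055        # predicted income change per one-year increase in median age


def predict_income(median_age):
    """Predicted median household income for a given median age."""
    return INTERCEPT + SLOPE * median_age


age = 35.0  # hypothetical median age, for illustration only
prediction = predict_income(age)
difference = predict_income(age + 1) - predict_income(age)

print(f"Predicted income at median age {age:.0f}: ${prediction:,.2f}")
print(f"Change for one extra year of median age: ${difference:,.2f}")
```

The one-year difference equals the slope regardless of the starting age. The negative intercept is a reminder that the line is only meaningful within the range of median ages observed in the data, not near age zero.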