Multiple Linear Regression Tutorial

From SOCR

Latest revision as of 13:41, 3 March 2020


Multiple Linear Regression Tutorial Using LA Neighborhoods Data

Data: We will be using the LA Neighborhoods Data for this tutorial.

Goal: Our goal is to use SOCR to predict median income from multiple explanatory variables. In this example, we predict median income from age, the proportion of homeowners, and the proportion of whites in the population.

Step 1: First, we will import the data into the SOCR Simple Regression Analysis Activity. Go to the LA Neighborhoods Data page and find the table with the data. Select all of the data and press Ctrl+C (Command+C on a Mac) to copy it.

Step 2: Next, go to http://socr.ucla.edu/htmls/SOCR_Analyses.html and select the Simple Regression Analysis Activity from the drop-down menu.

Step 3: Now click the “PASTE” button under the drop-down menu. The data should now appear in the window.
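Outside SOCR, the same copy-and-paste step can be mimicked in code. Copying a table from a web page often puts tab-separated text on the clipboard; the sketch below assumes that format and parses it into rows keyed by column name. The helper name `parse_pasted_table` and the sample values are hypothetical and are not taken from the LA Neighborhoods Data.

```python
import csv
import io


def to_number(value):
    """Convert a cell to float; keep non-numeric cells (e.g. a name column) as text."""
    try:
        return float(value)
    except ValueError:
        return value


def parse_pasted_table(text):
    """Parse tab-separated text (assumed clipboard format) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text.strip()), delimiter="\t")
    return [{name: to_number(value) for name, value in record.items()} for record in reader]


# Hypothetical pasted text with the tutorial's column names; values are made up.
pasted = "Income\tAge\tHomes\tWhite\n50000\t35\t0.5\t0.6\n62000\t40\t0.7\t0.4\n"
rows = parse_pasted_table(pasted)
print(rows[0])  # {'Income': 50000.0, 'Age': 35.0, 'Homes': 0.5, 'White': 0.6}
```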

Step 4: Click the “MAPPING” tab. This is where we define the dependent and independent variables. The dependent variable is the one we want to predict, and the independent variables are the ones we use to make the prediction. In this example, we add “Income” to the dependent variables list and “Age”, “Homes”, and “White” to the independent variables list.
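Conceptually, the mapping splits each row of the table into one response value (Income) and a row of predictor values. The hypothetical helper `build_design` below sketches that split in plain Python, adding a leading 1.0 to each predictor row so the model has an intercept. The two rows are made-up values, not the LA Neighborhoods Data.

```python
def build_design(rows, response, predictors):
    """Split rows into a design matrix X and a response vector y.

    Each row of X starts with 1.0 for the intercept, followed by the
    predictor values in the order given.
    """
    y = [row[response] for row in rows]
    X = [[1.0] + [row[name] for name in predictors] for row in rows]
    return X, y


# Hypothetical rows; the column names mirror the tutorial's mapping.
rows = [
    {"Income": 50000.0, "Age": 35.0, "Homes": 0.5, "White": 0.6},
    {"Income": 62000.0, "Age": 40.0, "Homes": 0.7, "White": 0.4},
]
X, y = build_design(rows, "Income", ["Age", "Homes", "White"])
print(X[0], y[0])  # [1.0, 35.0, 0.5, 0.6] 50000.0
```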

Step 5: Click “CALCULATE”. You will be taken to the “RESULTS” tab, where you can see the regression equation, \(R^2\), and other statistics.
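SOCR performs this computation for you. To illustrate what an ordinary least squares fit produces, here is a minimal pure-Python sketch (not SOCR's implementation) that solves the normal equations \(X^\top X \beta = X^\top y\) and reports \(R^2 = 1 - SS_{res}/SS_{tot}\). The hypothetical data are generated exactly from \(y = 10 + 2x_1 - 3x_2\), so the fit should recover those coefficients with \(R^2 = 1\).

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Return (coefficients, R^2) for the least squares fit of y on X.

    X must already include a leading column of 1s for the intercept.
    """
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1 - ss_res / ss_tot


# Hypothetical data: columns are intercept, x1, x2; y = 10 + 2*x1 - 3*x2 exactly.
X = [[1.0, 1, 2], [1.0, 2, 1], [1.0, 3, 4], [1.0, 4, 3], [1.0, 5, 5]]
y = [6, 11, 4, 9, 5]
beta, r2 = fit_ols(X, y)
print([round(b, 6) for b in beta], round(r2, 6))  # [10.0, 2.0, -3.0] 1.0
```

Solving the normal equations directly is the easiest route to show, but it can lose accuracy when predictors are nearly collinear; statistical packages commonly use QR or similar decompositions instead.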

Step 6: Click “GRAPH”. Here you will see scatterplots of the Income variable against each of the three chosen explanatory variables, as well as the residual plots and the Normal QQ Plot.
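The residual plots pair each fitted value with its residual (observed minus fitted). As a sketch, the functions below compute those pairs using the coefficients reported in this tutorial's Conclusions; the helper names and the single data row are hypothetical.

```python
# Coefficients from the regression equation reported in this tutorial.
COEFS = {"Intercept": -21139.729, "Age": 1347.656, "White": 49806.135, "Homes": 53726.649}


def fitted_value(row, coefs=COEFS):
    """Predicted Income for one row: intercept plus sum of coefficient * value."""
    return coefs["Intercept"] + sum(
        coefs[name] * row[name] for name in coefs if name != "Intercept"
    )


def residual_pairs(rows, coefs=COEFS):
    """Return (fitted, residual) pairs, i.e. the points of a residual plot."""
    pairs = []
    for row in rows:
        f = fitted_value(row, coefs)
        pairs.append((f, row["Income"] - f))
    return pairs


rows = [{"Income": 80000.0, "Age": 35.0, "White": 0.5, "Homes": 0.5}]  # hypothetical row
for f, r in residual_pairs(rows):
    print(round(f, 3), round(r, 3))  # 77794.623 2205.377
```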

Step 7: We want to check the assumptions of linear regression and make sure that they are met.

Assumption 1: There is a linear relationship between the independent and dependent variables.

  • How to check: Make a scatter plot of the dependent variable against each independent variable.
  • How to fix: Apply a transformation (for example, log(y) vs. x); if no transformation helps, the relationship may simply not be linear.

Linear model fits the data moderately well
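A scatter plot is the check recommended above. As a rough numeric complement, the Pearson correlation between Income and each explanatory variable measures the strength of a linear association. This is a sketch with made-up columns, not the LA Neighborhoods Data. The last line shows that a perfectly curved relationship can give a correlation of 0, which is why the plot itself should still be inspected.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical columns chosen to show perfect positive and negative linear trends.
income = [3.0, 5.0, 7.0, 9.0]
predictors = {"Age": [1.0, 2.0, 3.0, 4.0], "Homes": [4.0, 3.0, 2.0, 1.0]}
for name, values in predictors.items():
    print(name, round(pearson_r(values, income), 3))

# A quadratic (non-linear) relationship: correlation is 0 despite a clear pattern.
print(round(pearson_r([-2.0, -1.0, 0.0, 1.0, 2.0], [4.0, 1.0, 0.0, 1.0, 4.0]), 3))
```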

Assumption 2: The variance of the errors is constant.

  • How to check: Look at the plot of residuals vs. predicted values. Make sure there is no pattern, such as the residuals getting larger as the predicted values increase.
  • How to fix: Take logs of variables, or fix underlying violations of independence or linearity.

Slight increase in residuals at the top range of the explanatory variables
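To put a rough number on the pattern described under Assumption 2, one option (a simplified idea borrowed from the Goldfeld-Quandt test, without its formal F-test) is to compare the residual variance for the largest fitted values with that for the smallest. This sketch is not part of SOCR, the function name is hypothetical, and it needs at least four points. The residuals are made up so that their spread grows with the fitted value.

```python
import statistics


def spread_ratio(fitted, residuals):
    """Variance of residuals for the largest fitted values divided by the
    variance for the smallest fitted values (middle point dropped if n is odd).

    Values well above 1 suggest the residual spread grows with the fit.
    """
    pairs = sorted(zip(fitted, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return statistics.variance(high) / statistics.variance(low)


fitted = [1.0, 2.0, 3.0, 4.0]
residuals = [1.0, -1.0, 3.0, -3.0]  # hypothetical: spread grows with fitted value
print(spread_ratio(fitted, residuals))  # 9.0
```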

Assumption 3: Errors are normally distributed.

  • How to check: Normal QQ Plot (Should lie close to straight line)
  • How to fix: Remove outliers, if applicable. A non-linear transformation may be needed.

Assumption met
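The points of a normal QQ plot can also be computed directly. The sketch below standardizes the residuals, sorts them, and pairs them with standard normal quantiles at the plotting positions \((i - 0.5)/n\), which is one common convention; SOCR may use a different one. If the errors are roughly normal, the pairs lie close to a straight line. The residuals are hypothetical, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Return (theoretical, observed) pairs for a normal QQ plot.

    Observed values are the sorted standardized residuals; theoretical values
    are standard normal quantiles at plotting positions (i - 0.5) / n.
    """
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    observed = sorted((r - m) / s for r in residuals)
    z = NormalDist()  # standard normal: mu=0, sigma=1
    theoretical = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical residuals
for t, o in qq_points(residuals):
    print(f"{t:+.3f} {o:+.3f}")
```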

Conclusions

There are no major violations of the linear regression assumptions, so we proceed with our analysis:

We can see from the “RESULTS” tab that the regression equation is:

Income = -21139.729 +1347.656*Age +49806.135*White +53726.649*Homes + E

Here “E” is the error term, “Income” is the response (dependent) variable, and “Age”, “White”, and “Homes” are the explanatory variables. Dropping E gives the predicted income.

This model states that, with everything else held constant, each increase of 1 in the proportion of homeowners (that is, 100 percentage points, from 0 to 1) increases the predicted median household income by $53,726.65. For every 1-year increase in median age, with everything else held constant, the predicted median household income increases by $1,347.66. For each increase of 1 (100 percentage points) in the proportion of whites in the population, with everything else held constant, the predicted median household income increases by $49,806.14.
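The “everything else held constant” interpretation can be checked by plugging values into the fitted equation. The inputs below (median age 35, proportions of 0.5) are hypothetical and the function name is illustrative. Changing only Homes by 0.1 (10 percentage points) changes the prediction by one tenth of the Homes coefficient.

```python
# Coefficients from the regression equation reported above.
INTERCEPT = -21139.729
COEFS = {"Age": 1347.656, "White": 49806.135, "Homes": 53726.649}


def predict_income(age, white, homes):
    """Predicted median household income from the fitted equation (error term dropped)."""
    return INTERCEPT + COEFS["Age"] * age + COEFS["White"] * white + COEFS["Homes"] * homes


base = predict_income(age=35, white=0.5, homes=0.5)
more_owners = predict_income(age=35, white=0.5, homes=0.6)  # +10 percentage points of homeowners
print(round(base, 2), round(more_owners - base, 2))  # 77794.62 5372.66
```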
