Difference between revisions of "SOCR Activity ANOVA FlignerKilleen MeatConsumption"


Revision as of 18:24, 21 February 2013

SOCR Educational Materials - Activities - SOCR Meat Consumption Activity – ANOVA assumptions about the variance homogeneity Activity

Motivation and Goals

In many developed countries, when people imagine their next meal, they focus on one specific part: the meat. That choice of meat, however, varies from country to country due to the popularity and availability of various domesticated animals. Furthermore, the amount of meat eaten has a surprising degree of variability across time, cultures and geographic regions.

The following activity will study the effects of that variance on statistical analyses. Specifically, we will consider how deviations from homoscedasticity (also known as equivalence of variance or variance homogeneity) can lead to incomplete or even incorrect conclusions. To do so, we will employ the Fligner-Killeen method to analyze some real meat consumption data.

Summary

This activity uses a reduced version of the open-source meat-consumption dataset. All data comes from the US Census Bureau.

This dataset summarizes the meat consumption, by animal type, of various countries (the European Union (EU) is treated as a single country in this case). For simplicity, records from countries that did not provide consumption measures for all meat types and all years were removed from the data set.

Data

Data Description

  • Number of cases: 147
  • Variables
    • Country: The country or world region in question
      • Brazil
      • China
      • European Union
      • Japan
      • Mexico
      • Russia
      • United States
    • Meat: The type of meat
      • Beef
      • Pork
      • Poultry
    • Years Represented (2000 – 2006)
  • Values are in thousands of metric tons


Data Summaries

Chicken/Poultry

Year Brazil China Europe Japan Mexico Russia UnitedStates YearAverage YearSD
2000 5110 9393 6934 1772 2163 1320 11474 5452.286 3990.459
2001 5341 9237 7359 1797 2311 1588 11558 5598.714 3942.57
2002 5873 9556 7417 1830 2424 1697 12270 5866.714 4134.211
2003 5742 9963 7312 1841 2627 1680 12540 5957.857 4234.565
2004 5992 9931 7280 1713 2713 1675 13080 6054.857 4379.591
2005 6612 10088 7596 1880 2871 2139 13430 6373.714 4388.111
2006 6853 10371 7380 1908 3005 2382 13754 6521.857 4448.974
Country_Average 5931.857 9791.286 7325.429 1820.143 2587.714 1783 12586.57
Country_SD 629.6543 407.0908 200.4826 66.03895 304.2404 357.6777 886.5564

Pork

Year Brazil China Europe Japan Mexico Russia UnitedStates YearAverage YearSD
2000 1827 40378 19242 2228 1252 2019 8455 10771.57 14570.99
2001 1919 41829 19317 2268 1298 2076 8389 11013.71 15049.33
2002 1975 43238 19746 2377 1349 2453 8685 11403.29 15502.7
2003 1957 45054 20043 2373 1423 2420 8816 11726.57 16145.49
2004 1979 46648 19773 2562 1556 2337 8817 11953.14 16648.16
2005 1949 49703 19768 2507 1556 2476 8669 12375.43 17714.83
2006 2191 51809 20015 2450 1580 2637 8640 12760.29 18438.64
Country_Average 1971 45522.71 19700.57 2395 1430.571 2345.429 8638.714
Country_SD 110 4159.521 312.355 121.3013 135.3808 223.0148 164.5121

Beef

Year Brazil China Europe Japan Mexico Russia UnitedStates YearAverage YearSD
2000 6102 5284 8106 1585 2309 2246 12502 5447.714 3922.316
2001 6191 5434 7658 1419 2341 2400 12351 5399.143 3835.093
2002 6437 5818 8187 1319 2409 2450 12737 5622.429 4016.753
2003 6273 6274 8315 1366 2308 2378 12340 5607.714 3933.847
2004 6400 6703 8292 1182 2368 2308 12667 5702.857 4077.861
2005 6774 7026 8194 1200 2419 2503 12663 5825.571 4056.693
2006 6939 7395 8270 1173 2509 2370 12830 5926.571 4148.408
Country_Average 6445.143 6276.286 8146 1320.571 2380.429 2379.286 12584.29
Country_SD 307.1685 806.2036 226.9295 151.2138 71.75388 85.31259 190.4396
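
The Country_Average and Country_SD rows above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the Beef values are typed in by hand; the dictionary name beef and the variable summary are our own illustrative choices, and the standard deviation is the sample standard deviation (n - 1 in the denominator).

```python
from statistics import mean, stdev

# Beef consumption in thousands of metric tons, 2000-2006,
# copied by hand from the Beef summary table above.
beef = {
    "Brazil":         [6102, 6191, 6437, 6273, 6400, 6774, 6939],
    "China":          [5284, 5434, 5818, 6274, 6703, 7026, 7395],
    "European Union": [8106, 7658, 8187, 8315, 8292, 8194, 8270],
    "Japan":          [1585, 1419, 1319, 1366, 1182, 1200, 1173],
    "Mexico":         [2309, 2341, 2409, 2308, 2368, 2419, 2509],
    "Russia":         [2246, 2400, 2450, 2378, 2308, 2503, 2370],
    "United States":  [12502, 12351, 12737, 12340, 12667, 12663, 12830],
}

# Per-country mean and sample standard deviation across the seven years.
summary = {country: (mean(v), stdev(v)) for country, v in beef.items()}

for country, (avg, sd) in summary.items():
    print(f"{country:15s} average = {avg:10.3f}   sd = {sd:9.4f}")
```

Comparing the printed standard deviations side by side already hints at the issue studied below: China's spread across the years is several times larger than Mexico's or Russia's.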

Raw Dataset

Country Meat 2000 2001 2002 2003 2004 2005 2006
Brazil Beef 6102 6191 6437 6273 6400 6774 6939
Brazil Pork 1827 1919 1975 1957 1979 1949 2191
Brazil Poultry 5110 5341 5873 5742 5992 6612 6853
China Beef 5284 5434 5818 6274 6703 7026 7395
China Pork 40378 41829 43238 45054 46648 49703 51809
China Poultry 9393 9237 9556 9963 9931 10088 10371
EuropeanUnion Beef 8106 7658 8187 8315 8292 8194 8270
EuropeanUnion Pork 19242 19317 19746 20043 19773 19768 20015
EuropeanUnion Poultry 6934 7359 7417 7312 7280 7596 7380
Japan Beef 1585 1419 1319 1366 1182 1200 1173
Japan Pork 2228 2268 2377 2373 2562 2507 2450
Japan Poultry 1772 1797 1830 1841 1713 1880 1908
Mexico Beef 2309 2341 2409 2308 2368 2419 2509
Mexico Pork 1252 1298 1349 1423 1556 1556 1580
Mexico Poultry 2163 2311 2424 2627 2713 2871 3005
Russia Beef 2246 2400 2450 2378 2308 2503 2370
Russia Pork 2019 2076 2453 2420 2337 2476 2637
Russia Poultry 1320 1588 1697 1680 1675 2139 2382
UnitedStates Beef 12502 12351 12737 12340 12667 12663 12830
UnitedStates Pork 8455 8389 8685 8816 8817 8669 8640
UnitedStates Poultry 11474 11558 12270 12540 13080 13430 13754

Exploratory data analyses (EDA)

In the following analysis, we will aim to perform an analysis of variance (ANOVA) to compare the meat consumption amounts between different countries and/or across time. Note that the data points for each country-meat type combination are from the various years. Typically, we would expect the amount not to change between the years (especially in this 7-year timespan). Even if it did, in assuming homoscedasticity, we are making the assumption that any increase or decrease is constant between countries. Applying the Fligner-Killeen test will help us decide if this assumption is valid. Look at the bar graphs listed below and note which of them seem to vary more than the others between the years.

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig2.png
SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig3.png
SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig4.png
SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig5.png


Quantitative data analysis (QDA)

ANOVA

Open the SOCR Analyses applet (requires a Java-enabled browser). For the following analyses, we will focus on the data for beef consumption. Let us test whether different countries eat different amounts of beef. To compare a set of groups such as this one, we would usually use a one-way ANOVA (comparing the seven countries). To run this test, select the one-way ANOVA analysis in the applet. It should be the default setting when you open the page:

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig6.png

Now, prepare your dataset (it will be the Beef table from the above summary tables). We will treat the yearly results as our sample's data points, attempting to capture the overall population consumption. This seems reasonable; the average meat consumption of a country should not change that much in a seven-year timespan.

Rearrange your dataset for use in the ANOVA applet (if you are unsure what it should look like, consult one of the SOCR ANOVA tutorials). It should look like this in the applet data screen:

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig7.png

Rename your column headers to define the independent and dependent variables.

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig8.png

Click on the mapping tab, and assign your independent and dependent variables appropriately:

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig9.png

Set your precision to all, then click calculate:

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig10.png

The following results should appear in the Results tab:

Sample Size = 49
Independent Variable = Country
Dependent Variable = Consumption
Results of One-Way Analysis of Variance:
Standard 1-Way ANOVA Table
VarianceSource DF RSS MSS F-Statistics P-value
TreatmentEffect(B/w_Groups) 6 668292765.43 111382127.57 898.89 < 1E-15
Error 42 5204240.57 123910.48
Total 48 673497006.00
Model:
Degrees of Freedom = 6
Residual Sum of Squares = 668292765.4285718000
Mean Square Error = 111382127.5714286400
Error:
Degrees of Freedom = 42
Residual Sum of Squares = 5204240.5714285690
Mean Square Error = 123910.4897959183
Corrected Total:
Degrees of Freedom = 48
Residual Sum of Squares = 673497006.0000004000
F-Value = 898.8918351858
P-Value = < 1E-15
R-Square = 0.9922728082
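
As a cross-check on the applet output, the one-way ANOVA table can be recomputed directly from the sums of squares. The following is a minimal sketch in plain Python, not the applet's own code; the function name one_way_anova and the dictionary beef are illustrative assumptions, and the data are copied from the Beef table.

```python
from statistics import mean

# Beef consumption (thousands of metric tons), 2000-2006, from the Beef table.
beef = {
    "Brazil":         [6102, 6191, 6437, 6273, 6400, 6774, 6939],
    "China":          [5284, 5434, 5818, 6274, 6703, 7026, 7395],
    "European Union": [8106, 7658, 8187, 8315, 8292, 8194, 8270],
    "Japan":          [1585, 1419, 1319, 1366, 1182, 1200, 1173],
    "Mexico":         [2309, 2341, 2409, 2308, 2368, 2419, 2509],
    "Russia":         [2246, 2400, 2450, 2378, 2308, 2503, 2370],
    "United States":  [12502, 12351, 12737, 12340, 12667, 12663, 12830],
}


def one_way_anova(groups):
    """Return a dict with the sums of squares, degrees of freedom, F and R^2."""
    pooled = [x for g in groups for x in g]
    grand_mean = mean(pooled)
    group_means = [mean(g) for g in groups]

    # Between-group (treatment) and within-group (error) sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "F": f_stat, "R2": ss_between / (ss_between + ss_within),
    }


anova = one_way_anova(list(beef.values()))
for key, value in anova.items():
    print(f"{key:11s} = {value}")
```

The degrees of freedom, sums of squares, F-value and R-Square printed here should agree with the applet's Results tab up to rounding.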

Fligner-Killeen Analysis

Note our extremely low p-value. The results of our one-way ANOVA reject the null hypothesis that these countries do not differ in terms of beef consumption. This shouldn’t surprise anyone. Even a third grader could have made that decision after looking at the summary table, or the earlier EDA graphs. Besides, the data represents total volume of meat consumption, whereas the populations of these countries are vastly different, and a per-capita meat consumption analysis may be more appropriate in this case.

However, let’s say that you find a book that claims, using the results of your ANOVA as proof, that “China consumes more beef than any country other than the United States”. That book is saying that these populations are significantly different in terms of their beef consumption. Is this claim justified by the ANOVA above? Going through the assumptions of the ANOVA test, remember that the different levels must satisfy homoscedasticity (equivalence of variance). If two populations depart significantly from this assumption, then they should not be compared using ANOVA. An alternative statistical test needs to be applied (e.g., Kruskal-Wallis test).

To test for homoscedasticity, we will use the Fligner-Killeen test, which is a non-parametric test that doesn’t assume normality. This is important, considering that we have a rather limited data set (only seven points per country, one for each year) and the data may not be normally distributed. Use the tab to find the Fligner-Killeen analysis in the SOCR Analyses Applet in a Java-enabled browser:

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig11.png

Next, copy and paste your data into the spreadsheet using the paste button built into the applet. The data copied here is from the Beef summary table (excluding the marginal means). Note that we are treating the cases within each country (one for each year) as the data points. When trends take many years to develop, we can assume that, within a short timespan (such as the 7 years captured here), each year serves as an estimate of a generally unchanging population value for that country.

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig12.png

Re-name the data headers to the country names, our levels for this comparison.

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig13.png

Click on the mapping tab and add all of the country names to the next tab.

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig14.png

Finally, select the precision setting “All.” This will display the maximum number of decimal places. Afterwards, click “CALCULATE”:

SOCR Activity ANOVA FlignerKilleen MeatConsumption Fig15.png

The following results text should appear on the screen:

Group Brazil
median = 6400.0
Group China
median = 6274.0
Group Europe
median = 8194.0
Group Japan
median = 1319.0
Group Mexico
median = 2368.0
Group Russia
median = 2378.0
Group United States
median = 12663.0
Total Size = 49
Total Mean Score = .784
Total Variance = .323
Degrees of Freedom = 6
Chi-Square Statistic = 19.421
P-Value = .004
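
To see where these numbers come from, here is a minimal sketch of a median-centered Fligner-Killeen computation in plain Python. It is not the applet's source code: the function names (average_ranks, fligner_killeen, chi2_sf_even_df) are our own, tied absolute deviations are assumed to receive average ranks, the score variance is assumed to be the sample variance, and the chi-square tail formula used is the closed form that holds only for even degrees of freedom (here, 6).

```python
from math import exp
from statistics import NormalDist, mean, median, variance

# Beef consumption (thousands of metric tons), 2000-2006, from the Beef table.
beef = {
    "Brazil":         [6102, 6191, 6437, 6273, 6400, 6774, 6939],
    "China":          [5284, 5434, 5818, 6274, 6703, 7026, 7395],
    "European Union": [8106, 7658, 8187, 8315, 8292, 8194, 8270],
    "Japan":          [1585, 1419, 1319, 1366, 1182, 1200, 1173],
    "Mexico":         [2309, 2341, 2409, 2308, 2368, 2419, 2509],
    "Russia":         [2246, 2400, 2450, 2378, 2308, 2503, 2370],
    "United States":  [12502, 12351, 12737, 12340, 12667, 12663, 12830],
}


def average_ranks(values):
    """Rank values 1..N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks


def fligner_killeen(groups):
    """Return (statistic, df, mean score, score variance)."""
    # Absolute deviation of each observation from its own group median.
    abs_dev = [abs(x - median(g)) for g in groups for x in g]
    n_total = len(abs_dev)
    ranks = average_ranks(abs_dev)
    # Normal scores of the pooled ranks.
    scores = [NormalDist().inv_cdf(0.5 + r / (2 * (n_total + 1))) for r in ranks]
    grand, var = mean(scores), variance(scores)
    stat, start = 0.0, 0
    for g in groups:
        group_scores = scores[start:start + len(g)]
        stat += len(g) * (mean(group_scores) - grand) ** 2
        start += len(g)
    return stat / var, len(groups) - 1, grand, var


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid only for even df."""
    half, term, total = x / 2, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


stat, df, mean_score, score_var = fligner_killeen(list(beef.values()))
p_value = chi2_sf_even_df(stat, df)
print(f"mean score = {mean_score:.3f}, variance = {score_var:.3f}")
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.3f}")
```

Under these assumptions the sketch should land close to the applet's reported mean score, variance, chi-square statistic and p-value; small differences would point to a different tie-handling or centering convention.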

The Fligner-Killeen test (for variance homogeneity) yields a rather small p-value (0.004). Assuming α = 0.05, we would reject the null hypothesis of equal variances between countries. Therefore, we should be wary of continuing with a standard (parametric) Analysis of Variance (ANOVA). Looking back at the data, it is easy to see why.

Conclusions

Note, for example, that in the data for the Chinese consumption of beef, the values rise steadily from year to year. In our ANOVA, we had assumed that the overall consumption would not follow a set pattern between the years, allowing us to treat the yearly values as sample points. The trend results in a much larger variance for China than for the other groups. Therefore, it would be incorrect to use our previous conclusion from the ANOVA that “these countries have different population values for meat consumption” without a qualifier of the date range that we were sampling within. Note that without this analysis, you might have missed the trend in Chinese beef consumption, which might be worth studying in its own right.
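
One simple way to quantify such a trend is to fit a least-squares line of consumption against year for each country and compare the slopes. This is only an illustrative sketch, assuming a straight-line trend is a reasonable first description; the helper name ols_slope and the variable slopes are our own.

```python
# Beef consumption (thousands of metric tons), 2000-2006, from the Beef table.
beef = {
    "Brazil":         [6102, 6191, 6437, 6273, 6400, 6774, 6939],
    "China":          [5284, 5434, 5818, 6274, 6703, 7026, 7395],
    "European Union": [8106, 7658, 8187, 8315, 8292, 8194, 8270],
    "Japan":          [1585, 1419, 1319, 1366, 1182, 1200, 1173],
    "Mexico":         [2309, 2341, 2409, 2308, 2368, 2419, 2509],
    "Russia":         [2246, 2400, 2450, 2378, 2308, 2503, 2370],
    "United States":  [12502, 12351, 12737, 12340, 12667, 12663, 12830],
}
years = list(range(2000, 2007))


def ols_slope(x, y):
    """Least-squares slope of y on x (change in y per unit of x)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Slope in thousands of metric tons per year, one value per country.
slopes = {country: ols_slope(years, values) for country, values in beef.items()}

for country, slope in sorted(slopes.items(), key=lambda kv: -abs(kv[1])):
    print(f"{country:15s} {slope:8.1f} per year")
```

The country at the top of the printed list has the steepest yearly change, which is the kind of time effect that inflates its within-group variance.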

We can also think about this from a more mathematical perspective. ANOVA compares the variability between groups against a single pooled estimate of the within-group variance, which presumes that each of our levels (in this case, countries) has the same within-group variance. If we ignore homoscedasticity, that pooled estimate no longer describes any one group well, and, from the viewpoint of the algebra involved, we are essentially comparing populations that do not share a common scale.

Just because the assumptions of ANOVA are not satisfied, we should not give up on this data set. Without the results of the Fligner-Killeen test, we may have missed the effects of time on meat consumption, an effect that warrants its own investigation. In such situations, the use of non-parametric alternatives to ANOVA may be appropriate (e.g., Kruskal-Wallis test).
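
For readers who want to try the non-parametric route outside the applet, here is a minimal sketch of a Kruskal-Wallis computation in plain Python. It is our own illustration rather than the SOCR implementation: the function names are hypothetical, ties receive average ranks with the usual tie correction, and the chi-square tail uses the closed form that is valid only for even degrees of freedom.

```python
from math import exp

# Beef consumption (thousands of metric tons), 2000-2006, from the Beef table.
beef = {
    "Brazil":         [6102, 6191, 6437, 6273, 6400, 6774, 6939],
    "China":          [5284, 5434, 5818, 6274, 6703, 7026, 7395],
    "European Union": [8106, 7658, 8187, 8315, 8292, 8194, 8270],
    "Japan":          [1585, 1419, 1319, 1366, 1182, 1200, 1173],
    "Mexico":         [2309, 2341, 2409, 2308, 2368, 2419, 2509],
    "Russia":         [2246, 2400, 2450, 2378, 2308, 2503, 2370],
    "United States":  [12502, 12351, 12737, 12340, 12667, 12663, 12830],
}


def average_ranks(values):
    """Rank values 1..N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (tie-corrected H statistic, degrees of freedom)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction, len(groups) - 1


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid only for even df."""
    half, term, total = x / 2, 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


h_stat, kw_df = kruskal_wallis(list(beef.values()))
kw_p = chi2_sf_even_df(h_stat, kw_df)
print(f"H = {h_stat:.3f}, df = {kw_df}, p-value = {kw_p:.3g}")
```

Because the test works on ranks, it does not rely on equal variances in the way the parametric F test does, although it still treats the yearly values within a country as exchangeable observations.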


Practice problems

  • Run this same analysis for the pork and poultry datasets. Interpret any conclusions as shown above.
  • Practice running a one-way ANOVA, excluding the data from China and Japan.
  • “Turn” the beef data set around, and run this Fligner-Killeen test between the year groups. Interpret the conclusion you find. Why is this the same as, or different from, the result of the above example?

See also

  • Fligner-Killeen activity
  • Kruskal-Wallis activity
  • ANOVA activity and EBook Chapter

References

  • Meat Consumption by Type and Country (2006). U.S. Department of Agriculture, Foreign Agricultural Service, Livestock, and Poultry.

