Difference between revisions of "NISER 081107 ID"

From SOCR
(Statistics)
Line 18: Line 18:
 
==[[NISER_081107_ID_Data | Data]]==
 
==[[NISER_081107_ID_Data | Data]]==
 
[[NISER_081107_ID_Data | Largemouth bass were studied in 53 different Florida]] lakes to examine the factors that influence the level of mercury contamination. Water samples were collected from the surface of the middle of each lake in August 1990 and then again in March 1991. The pH level, the amount of chlorophyll, calcium, and alkalinity were measured in each sample. The average of the August and March values were used in the analysis. Next, a sample of fish was taken from each lake with sample sizes ranging from 4 to 44 fish. The age of each fish and mercury concentration in the muscle tissue was measured.
 
[[NISER_081107_ID_Data | Largemouth bass were studied in 53 different Florida]] lakes to examine the factors that influence the level of mercury contamination. Water samples were collected from the surface of the middle of each lake in August 1990 and then again in March 1991. The pH level, the amount of chlorophyll, calcium, and alkalinity were measured in each sample. The average of the August and March values were used in the analysis. Next, a sample of fish was taken from each lake with sample sizes ranging from 4 to 44 fish. The age of each fish and mercury concentration in the muscle tissue was measured.
 
  
 
==Challenge==
 
==Challenge==
To make a fair comparison of the fish in different lakes using a regression estimate of the expected mercury concentration in a three year old fish as the standardized value for each lake. Determine the age of the individual fish in some lakes and correlate this with the average mercury concentration of the sampled fish.
+
To make a fair comparison of the fish in different lakes using a regression estimate of the expected mercury concentration in a three-year-old fish as the standardized value for each lake. Determine the age of the individual fish in some lakes and correlate this with the average mercury concentration of the sampled fish.
  
Florida has set a standard of 1/2 part per million as the unsafe level of mercury concentration in edible foods. 45.3% of the lakes exceed this level. The smallest level of mercury concentration that the measuring instrument can detect is 40 parts per billion. Any level below that was set to 40 parts per billion. This, of course, "flatens out" the slope of the relationship at the low end as well as affecting the standardized values. These observations are usually on young fish.
+
Florida has set a standard of 1/2 part per million as the unsafe level of mercury concentration in edible foods. 45.3% of the lakes exceed this level. The smallest level of mercury concentration that the measuring instrument can detect is 40 parts per billion. Any level below that was set to 40 parts per billion. This, of course, "flattens out" the slope of the relationship at the low end as well as affecting the standardized values. These observations are usually on young fish.
  
 
Logarithmic transformations on some of the variables may provide insights into the relationships among the other variables in the study. For instance, '''alkalinity level''' may be associated with mercury concentration, and may help account for the higher levels of mercury.
 
Logarithmic transformations on some of the variables may provide insights into the relationships among the other variables in the study. For instance, '''alkalinity level''' may be associated with mercury concentration, and may help account for the higher levels of mercury.
 
  
 
==Methods & Approaches==
 
==Methods & Approaches==
Line 35: Line 33:
  
 
===Biology===
 
===Biology===
* Why is mercury (and other [http://en.wikipedia.org/wiki/Heavy_metals heavy metals]) accumulating in mussle tissue?
+
* Why is mercury (and other [http://en.wikipedia.org/wiki/Heavy_metals heavy metals]) accumulating in muscle tissue?
 
* What causes the toxicity of larger than normal amounts of mercury in the body?
 
* What causes the toxicity of larger than normal amounts of mercury in the body?
 
* The [http://en.wikipedia.org/wiki/Chlorophyll Chlorophyll] molecule.
 
* The [http://en.wikipedia.org/wiki/Chlorophyll Chlorophyll] molecule.
Line 42: Line 40:
 
===Chemistry===
 
===Chemistry===
 
* [http://www.dartmouth.edu/~chemlab/info/resources/p_table/Periodic.html Periodic Table]
 
* [http://www.dartmouth.edu/~chemlab/info/resources/p_table/Periodic.html Periodic Table]
* [http://en.wikipedia.org/wiki/Alkalinity Alkalinity ]
+
* [http://en.wikipedia.org/wiki/Alkalinity Alkalinity]
 
* [http://en.wikipedia.org/wiki/PH pH]
 
* [http://en.wikipedia.org/wiki/PH pH]
  
 
===Engineering===
 
===Engineering===
 +
* TBD
  
 
===Mathematics===
 
===Mathematics===
Line 52: Line 51:
 
===Statistics===
 
===Statistics===
 
* Are there associations in [[NISER_081107_ID_Data | these data]] between ''Alkalinity'', ''pH'', ''Calcium'' and ''Chlorophyll''?
 
* Are there associations in [[NISER_081107_ID_Data | these data]] between ''Alkalinity'', ''pH'', ''Calcium'' and ''Chlorophyll''?
 +
*Some exploratory data analyses (EDA plots)
 +
** Scatter plot (''Alkalinity'' vs. ''pH'') <center>[[Image:NISER_081107_ID_Fig1.png|400px]]</center>
 +
** Regression Plots (Relations between ''Calcium, Chlorophyll'' and '' Avg_Mercury'')
 +
<center>[[Image:NISER_081107_ID_Fig2.png|400px]]</center>
 
* What is the distribution of the average amount of ''mercury'' in the entire sample?
 
* What is the distribution of the average amount of ''mercury'' in the entire sample?
** We can easily compute the histogram of the average (per lake) mercury measurements and fit in Normal, Exponential or other distribution models to the histogram of the average mercury measurement using [http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] (see [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials examples here]). The left image below shows the fit of a <math>Normal (\mu=10.54, \sigma^2=45.64)</math> distribution model and the right image shows the (offset by 0.8) <math>Exponential(\lambda = 10.54)</math> model fit to the mercury histogram. The marameters in both cases are automatically calculated from the data using [http://en.wikipedia.org/wiki/Maximum_likelihood_estimation maximum likelihood estimation].  
+
** We can easily compute the histogram of the average (per lake) mercury measurements and fit in Normal, Exponential or other distribution models to the histogram of the average mercury measurement using [http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] (see [http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials examples here]). The left image below shows the fit of a <math>Normal (\mu=10.54, \sigma^2=45.64)</math> distribution model and the right image shows the (offset by 0.8) <math>Exponential(\lambda = 10.54)</math> model fit to the mercury histogram. The parameters in both cases are automatically calculated from the data using [http://en.wikipedia.org/wiki/Maximum_likelihood_estimation maximum likelihood estimation].  
 
<center>[[Image:NISER_081107_ID_Fig3.png|300px]] [[Image:NISER_081107_ID_Fig4.png|300px]]</center>
 
<center>[[Image:NISER_081107_ID_Fig3.png|300px]] [[Image:NISER_081107_ID_Fig4.png|300px]]</center>
  
 
* Is there evidence of statistical differences between the two groups (according to the ''age_data'' variable) in either ''Alkalinity'' or ''pH''?
 
* Is there evidence of statistical differences between the two groups (according to the ''age_data'' variable) in either ''Alkalinity'' or ''pH''?
** There does not seem to be a statistically significant difference in the ''pH'' levels between the two groups separated by the age_data varaible. A non-parametric [http://en.wikipedia.org/wiki/Mann-Whitney_U Wilcoxon Rank Sum test] did not produce significantly small p-value (see the image below showing the result of the caluclations using [http://www.socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses].
+
** There does not seem to be a statistically significant difference in the ''pH'' levels between the two groups separated by the age_data varaible. A non-parametric [http://en.wikipedia.org/wiki/Mann-Whitney_U Wilcoxon Rank Sum test] did not produce significantly small p-value (see the image below showing the result of the calculations using [http://www.socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses]. <center>[[Image:NISER_081107_ID_Fig5.png|400px]]</center> ** Similarly, there does not seem to be a statistically significant difference in the ''Alkalinity'' levels between the two groups separated by the ''age_data'' variable. The same non-parametric [http://en.wikipedia.org/wiki/Mann-Whitney_U Wilcoxon Rank Sum test] did not produce significantly small p-value (see the image below showing the result of the calculations using [http://www.socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses].
<center>[[Image:NISER_081107_ID_Fig5.png|400px]]</center>
 
** Similarly, there does not seem to be a statistically significant difference in the ''Alkalinity'' levels between the two groups separated by the ''age_data'' varaible. The same non-parametric [http://en.wikipedia.org/wiki/Mann-Whitney_U Wilcoxon Rank Sum test] did not produce significantly small p-value (see the image below showing the result of the caluclations using [http://www.socr.ucla.edu/htmls/SOCR_Analyses.html SOCR Analyses].
 
 
<center>[[Image:NISER_081107_ID_Fig6.png|400px]]</center>
 
<center>[[Image:NISER_081107_ID_Fig6.png|400px]]</center>
  
 
==Computational Resources==
 
==Computational Resources==
Internet-based SOCR Tools (including offline resources, e.g., tables).
+
* Internet-based SOCR Tools (including offline resources, e.g., tables)
 
+
** [http://www.socr.ucla.edu/htmls/SOCR_Charts.html Graphs & Charts], [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html Model Fitting], [http://www.socr.ucla.edu/htmls/SOCR_Analyses.html Statistical Analyses] and [http://www.socr.ucla.edu/htmls/SOCR_Distributions.html Distributions].
===Some exploratory data analyses (EDA plots)===
 
* Scatter plot (''Lakalinity'' vs. ''pH'')
 
<center>[[Image:NISER_081107_ID_Fig1.png|400px]]</center>
 
* Regression Plots (Relations between ''Calcium, Chlorophyll'' and '' Avg_Mercury'')
 
<center>[[Image:NISER_081107_ID_Fig2.png|400px]]</center>
 
 
 
* TBD
 
 
 
===Model Fitting===
 
* Model 1
 
** Model goodness-of-fit assessment
 
 
 
==Examples==
 
===Computer simulations vs. real observed data===
 
  
 
==Hands-on activities==
 
==Hands-on activities==
Step-by-step practice problems.
+
* TBD (Step-by-step practice problems).
  
 
===Notes===
 
===Notes===

Revision as of 00:13, 14 August 2007

NISER Wiki Resource Prototype - First template for a multi-disciplinary NISER activity

Summary

Visualizing, understanding and interpreting real data can be challenging because of noise, data complexity, multiple variables, hidden relations between variables and large variation. This NISER activity demonstrates how to use free Internet-based IT tools and resources to solve problems that arise in biological, chemical, medical and social research.

Goals

This NISER activity has the following specific goals:

  • to demonstrate the typical research investigation pipeline - from problem formulation, to data collection, visualization, analysis and interpretation;
  • to illustrate the variety of portable freely available Internet-based Java tools, computational resources and learning materials for solving practical problems;
  • to provide a hands-on example of interdisciplinary training, cross-over of research techniques, data, models and expertise to enhance contemporary science education;
  • to promote interactions between different science education areas and stimulate the development of new and synergistic learning materials and course curricula across disciplines.

Motivation/Problem

Mercury contamination of edible freshwater fish poses a direct threat to human health. Largemouth bass, a freshwater fish, were studied in 53 different Florida lakes to examine the factors that influence the level of mercury contamination. Water samples were collected from the surface of the middle of each lake in August 1990 and then again in March 1991. The pH level and the amounts of chlorophyll, calcium, and alkalinity were measured in each sample. Samples of fish were also taken from each lake, with sample sizes ranging from 4 to 44 fish. The age of each fish and the mercury concentration in its muscle tissue were measured. Since fish absorb mercury over time, older fish tend to have higher concentrations.

Data

Largemouth bass were studied in 53 different Florida lakes to examine the factors that influence the level of mercury contamination. Water samples were collected from the surface of the middle of each lake in August 1990 and then again in March 1991. The pH level and the amounts of chlorophyll, calcium, and alkalinity were measured in each sample. The average of the August and March values was used in the analysis. Next, a sample of fish was taken from each lake, with sample sizes ranging from 4 to 44 fish. The age of each fish and the mercury concentration in its muscle tissue were measured.

Challenge

The challenge is to make a fair comparison of the fish in different lakes by using a regression estimate of the expected mercury concentration in a three-year-old fish as the standardized value for each lake. Determine the age of the individual fish in some lakes and correlate this with the average mercury concentration of the sampled fish.
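As a rough illustration of the standardization described above, the following Python sketch fits an ordinary least-squares line of mercury concentration on age for a single lake and evaluates it at age 3. The text does not state the form of the regression, so simple linear regression is an assumption here. The function names and the fish measurements are hypothetical and are not taken from the study data.

```python
def fit_line(ages, mercury):
    """Ordinary least-squares fit of mercury = intercept + slope * age."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_hg = sum(mercury) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    if sxx == 0:
        raise ValueError("all fish have the same age; slope is undefined")
    sxy = sum((a - mean_age) * (m - mean_hg) for a, m in zip(ages, mercury))
    slope = sxy / sxx
    intercept = mean_hg - slope * mean_age
    return intercept, slope


def standardized_mercury(ages, mercury, reference_age=3.0):
    """Predicted mercury concentration at the reference age for one lake."""
    intercept, slope = fit_line(ages, mercury)
    return intercept + slope * reference_age


# Hypothetical fish from one lake (age in years, mercury in ppm).
lake_ages = [1, 2, 2, 4, 5, 6]
lake_mercury = [0.20, 0.35, 0.30, 0.60, 0.70, 0.95]
print(round(standardized_mercury(lake_ages, lake_mercury), 3))
```

Repeating this per lake yields one standardized value per lake, so lakes can be compared even when their samples have different age mixes.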

Florida has set a standard of 1/2 part per million as the unsafe level of mercury concentration in edible foods. 45.3% of the lakes exceed this level. The smallest mercury concentration that the measuring instrument can detect is 40 parts per billion, and any measurement below that limit was recorded as 40 parts per billion. This "flattens out" the slope of the relationship at the low end and also affects the standardized values. The affected observations are usually from young fish.
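The thresholds in this paragraph mix two units, so a small worked sketch may help. It converts the 1/2 ppm standard to parts per billion, applies the 40 ppb detection floor to a few hypothetical readings, and checks that 45.3% is consistent with 24 of the 53 lakes. The constant names, function names and sample readings are illustrative assumptions.

```python
PPB_PER_PPM = 1000
UNSAFE_PPB = 0.5 * PPB_PER_PPM   # 1/2 ppm expressed in ppb
DETECTION_LIMIT_PPB = 40         # smallest level the instrument can detect


def apply_detection_limit(readings_ppb, limit=DETECTION_LIMIT_PPB):
    """Record any reading below the detection limit as the limit itself."""
    return [max(r, limit) for r in readings_ppb]


def share_exceeding(values_ppb, threshold=UNSAFE_PPB):
    """Fraction of values strictly above the threshold."""
    return sum(v > threshold for v in values_ppb) / len(values_ppb)


# Hypothetical true concentrations in ppb; the first two are undetectable.
true_ppb = [12, 35, 40, 180, 620]
recorded_ppb = apply_detection_limit(true_ppb)
print(recorded_ppb)               # [40, 40, 40, 180, 620]
print(round(100 * 24 / 53, 1))    # 45.3
```

Because the two lowest readings collapse onto the same recorded value, differences among low-mercury fish are lost, which is the flattening effect the paragraph describes.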

Logarithmic transformations on some of the variables may provide insights into the relationships among the other variables in the study. For instance, alkalinity level may be associated with mercury concentration, and may help account for the higher levels of mercury.
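One way to explore the effect of a logarithmic transformation is to compare the correlation between two variables before and after taking logs. The sketch below does this with a hand-written Pearson correlation. The alkalinity and mercury values are made up for illustration and say nothing about the direction or strength of the association in the actual lake data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-lake values (not from the study).
alkalinity = [2.0, 5.0, 20.0, 60.0, 120.0]
avg_mercury = [1.2, 0.9, 0.5, 0.3, 0.2]

raw_r = pearson(alkalinity, avg_mercury)
log_r = pearson([math.log(a) for a in alkalinity],
                [math.log(m) for m in avg_mercury])
print(round(raw_r, 3), round(log_r, 3))
```

With these made-up numbers the relationship is curved on the raw scale and closer to a straight line on the log scale, so the log-scale correlation is stronger.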

Methods & Approaches

This section discusses models and strategies for understanding the data, drawing inferences and solving the problem, along with ways to check the underlying assumptions.

Physics

  • What is an atom and what is the atomic structure of mercury?

Biology

Chemistry

Engineering

  • TBD

Mathematics

Statistics

  • Are there associations in these data between Alkalinity, pH, Calcium and Chlorophyll?
  • Some exploratory data analyses (EDA plots)
    • Scatter plot (Alkalinity vs. pH)
      NISER 081107 ID Fig1.png
    • Regression Plots (Relations between Calcium, Chlorophyll and Avg_Mercury)
      NISER 081107 ID Fig2.png
  • What is the distribution of the average amount of mercury in the entire sample?
    • The histogram of the average (per lake) mercury measurements can be computed, and Normal, Exponential or other distribution models fitted to it, using SOCR Charts and SOCR Modeler (see examples here). The left image below shows the fit of a \(Normal (\mu=10.54, \sigma^2=45.64)\) distribution model and the right image shows the (offset by 0.8) \(Exponential(\lambda = 10.54)\) model fit to the mercury histogram. The parameters in both cases are calculated automatically from the data using maximum likelihood estimation.
NISER 081107 ID Fig3.png NISER 081107 ID Fig4.png
  • Is there evidence of statistical differences between the two groups (according to the age_data variable) in either Alkalinity or pH?
    • There does not seem to be a statistically significant difference in the pH levels between the two groups defined by the age_data variable. A non-parametric Wilcoxon Rank Sum test did not produce a significantly small p-value (see the image below, which shows the result of the calculations using SOCR Analyses).
      NISER 081107 ID Fig5.png
    • Similarly, there does not seem to be a statistically significant difference in the Alkalinity levels between the two groups defined by the age_data variable. The same non-parametric Wilcoxon Rank Sum test did not produce a significantly small p-value (see the image below, which shows the result of the calculations using SOCR Analyses).
NISER 081107 ID Fig6.png
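The two-group comparisons above were run in SOCR Analyses. For readers who want to see the mechanics, here is a minimal Python sketch of a two-sided Wilcoxon rank-sum test. It is not the SOCR implementation: it uses the normal approximation with a tie correction and no continuity correction, so p-values for small samples may differ from exact ones. The function names and the pH values are hypothetical.

```python
import math
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, p): rank sum of x and a two-sided approximate p-value."""
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = midranks(pooled)
    w = sum(ranks[:n1])
    mean_w = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return w, 1.0  # every value is identical, so no evidence of a shift
    z = (w - mean_w) / math.sqrt(var_w)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, p_value


# Hypothetical pH values for the two age_data groups (not study data).
ph_group_0 = [6.1, 6.9, 7.2, 5.8, 7.5, 6.4]
ph_group_1 = [6.3, 7.0, 6.8, 7.4, 5.9, 6.6]
w_stat, p = rank_sum_test(ph_group_0, ph_group_1)
print(w_stat, round(p, 3))
```

A large p-value here would be read the same way as in the text: no evidence of a difference between the groups, which is not the same as evidence that the groups are equal.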

Computational Resources

Hands-on activities

  • TBD (Step-by-step practice problems).

Notes



