Difference between revisions of "AP Statistics Curriculum 2007 IntroVar"

From SOCR
 
m (Text replacement - "{{translate|pageName=http://wiki.stat.ucla.edu/socr/" to ""{{translate|pageName=http://wiki.socr.umich.edu/")
 
(23 intermediate revisions by 3 users not shown)
Line 1: Line 1:
==This is an Outline of a General Advanced Placement (AP) Statistics Curriculum==
+
[[AP_Statistics_Curriculum_2007 | General Advanced Placement (AP) Statistics Curriculum]] - Introduction to Statistics
  
===Outline===
+
==The Nature of Data & Variation==
Each topic discussed in the SOCR AP Curriculum should contain the following subsections:
+
No matter how controlled the environment, the protocol or the design, virtually any repeated measurement, observation, experiment, trial, study or survey is bound to generate data that varies because of intrinsic (internal to the system) or extrinsic (due to the ambient environment) effects.
* '''Motivation/Problem''': A real data set and fundamental challenge.
 
* '''Approach''': Models & strategies for solving the problem, data understanding & inference.
 
* '''Model Validation''': Checking/affirming underlying assumptions.
 
* '''Computational Resources''': Internet-based SOCR Tools (including offline resources, e.g., tables).
 
* '''Examples''': computer simulations and real observed data.
 
* '''Hands-on activities''': Step-by-step practice problems.
 
  
===Introduction to Statistics===
+
For example, UCLA's [[AP_Statistics_Curriculum_2007_IntroVar#References | study of Alzheimer’s disease*]] analyzed data from 31 Mild Cognitive Impairment (MCI) patients and 34 probable Alzheimer’s disease (AD) patients. The investigators made every attempt to control as many variables as possible. Yet the demographic information collected from the subjects contained unavoidable variation. The same study found variation in the MMSE cognitive scores even within the same subject. The table below shows the demographic characteristics of the subjects included in this study, using the following notation: M: male; F: female; W: white; AA: African American; A: Asian.
====[[AP_Statistics_Curriculum_2007_IntroVar | The Nature of Data & Variation]]====
 
No matter how controlled the environment, the protocol or the design, virtually any repeated measurement, observation, experiment, trial, study or survey is bound to generate data that varies because of intrinsic (internal to the system) or extrinsic (due to the ambient environment) effects.
 
  
====Uses and Abuses of Statistics ====
+
<center>
====Design of Experiments ====
+
{| class="wikitable" style="text-align:center; width:75%" border="1"
====Statistics with Calculators and Computers====  
+
|-
 +
| '''Variable''' || '''Alzheimer’s disease''' || '''MCI''' || '''Test''' || '''Test statistic''' || '''P-value'''
 +
|-
 +
| '''Age (years)''' || 76.2 (8.3) range 52–89 || 73.7 (7.4) range 57–84 || Student’s t || <math>t_o = 1.284</math> || ''p=0.21''
 +
|-
 +
| '''Gender (M:F)''' || 15:19 || 15:16 || Proportion || <math>z_o = -0.345</math>  || ''p=0.733''
 +
|-
 +
| '''Education (years)''' || 14.0 (2.1) range 12–19 || 16.23 (2.7) range 12–20 || Wilcoxon rank-sum || <math>w_o = 773.0</math>  || ''p<0.001''
 +
|-
 +
| '''Race (W:AA:A)'''  || 29:1:4 || 26:2:3 || <math>\chi_{(df=2)}^2</math> || <math>\chi_{(df=2)}^2=1.18</math> || ''p=0.55''
 +
|-
 +
| '''MMSE''' || 20.9 (6.3) range 4–29 || 28.2 (1.6) range 23–30 || Wilcoxon rank-sum || <math>w_o= 977.5</math>  || ''p<0.001''
 +
|}
 +
</center>
 +
 
 +
==Approach==
 +
Models and strategies for solving  problems and understanding data and inferences.
 +
 
 +
* Once we accept that all natural phenomena are inherently variable and there are no completely deterministic processes, we need to look for models and techniques that allow us to study acquired data in the presence of variation, uncertainty and chance.
 +
* '''Statistics''' is the data science that investigates natural processes and allows us to quantify variation to make population inferences based on limited observations.
 +
 
 +
==Model Validation==
 +
Checking/affirming underlying assumptions.
 +
 
 +
* Each model or technique for data exploration, analysis and understanding relies on a set of assumptions, which always need to be validated before the model or analysis tool is employed to study real data (observations or measurements that are perceived or detected by the investigator).
 +
 
 +
* Such prior model conjectures or presumptions could take the form of mathematical constraints about the properties of the underlying process, restrictions on the study design or demands on the data acquisition protocol.
 +
 
 +
* Common assumptions include (statistical) independence of the measurements, specific limitations on the shape of the observed distribution, restrictions on the parameters of the processes being studied, etc.
 +
 
 +
==Computational Resources: Internet-based SOCR Tools==
 +
* The [[SOCR]] resource contains a variety of educational materials, demonstration applets and learning resources that illustrate data generation, experimentation, exploratory and statistical data analysis.
 +
* [[SOCR_EduMaterials_Activities_RNG | (Numeric Pseudo-Random) Data Generation]]
 +
* [[SOCR_EduMaterials_ExperimentsActivities | Interactive SOCR Experimentation]] with computer generated models of natural phenomena
 +
**  [[SOCR_EduMaterials_Activities_DieCoinExperiment | Bivariate Die-Coin Experiment]]
 +
* [[SOCR_EduMaterials_Activities_Histogram_Graphs | Exploratory Data Analysis]]
 +
* [[SOCR_EduMaterials_AnalysisActivities_ANOVA_1 | Statistical Data Analysis]]
 +
 
 +
==Datasets==
 +
There are [[SOCR_Data | a number of large, natural, useful and demonstrative datasets]] provided as part of this statistics [[EBook]]. Many of these data collections are intentionally large and complex. This choice is driven by the need to emphasize the symbiosis between driving challenges, statistical concepts, mathematical derivations and the use of technology to solve relevant research problems.
 +
 
 +
==Examples==
 +
Computer simulations and real observed data.
 +
 
 +
* For example, [[SOCR_EduMaterials_Activities_Histogram_Graphs | exploratory data analysis using data histograms]]. This SOCR activity illustrates the generation and interpretation of the histogram of quantitative data.
 
   
 
   
===Describing, Exploring, and Comparing Data===
+
==Hands-on activities==
====Summarizing data with Frequency Tables ====
+
Step-by-step practice problems.
====Pictures of Data ====
+
 
====Measures of Central Tendency ====
+
* [[SOCR_EduMaterials_Activities_Histogram_Graphs | Histograms and Frequency Graphs Activity]]
====Measures of Variation ====
+
* [[SOCR_EduMaterials_Activities_CardsCoinsSampling | Bivariate Cards and Coins Meta-Activity]]
====Measures of Position ====
+
 
====Exploratory Data Analysis ====
+
==[[EBook_Problems_EDA_IntroVar|Problems]]==
 
===Probability===
 
====Fundamentals====
 
====Addition Rule ====
 
====Multiplication Rule ====
 
====Probabilities through Simulations ====
 
====Counting ====
 
 
===Probability Distributions===
 
====Random Variables ====
 
====Bernoulli & Binomial Experiments ====
 
====Geometric, HyperGeometric & Negative Binomial====
 
====Mean, Variance, and Standard Deviation for the Binomial Distribution ====
 
====Poisson Distribution====
 
 
===Normal Probability Distributions===
 
====The Standard Normal Distribution ====
 
====Nonstandard Normal Distribution: Finding Probabilities ====
 
====Nonstandard Normal Distributions: Finding Scores ====
 
  
===Relations Between Distributions===
+
<hr>
====The Central Limit Theorem ====
+
==References==
====Law of Large Numbers====
+
* Apostolova LG, Dinov ID, Dutton RA, Hayashi KM, Toga AW, Cummings JL, Thompson PM. [http://brain.oxfordjournals.org/cgi/reprint/awl274v1.pdf 3D comparison of hippocampal atrophy in amnestic mild cognitive impairment and Alzheimer's disease.] Brain. 2006 Nov; 129(Pt 11):2867-73.
====Normal Distribution as Approximation to Binomial Distribution ====
 
====Poisson Approximation to Binomial Distribution ====
 
====Binomial Approximation to HyperGeometric====
 
====Normal Approximation to Poisson====
 
 
===Estimates and Sample Sizes===
 
====Estimating a Population Mean: Large Samples ====
 
====Estimating a Population Mean: Small Samples ====
 
====Estimating a Population Proportion ====
 
====Estimating a Population Variance====
 
 
===Hypothesis Testing===
 
====Fundamentals of Hypothesis Testing ====
 
====Testing a Claim about a Mean: Large Samples ====
 
====Testing a Claim about a Mean: Small Samples ====
 
====Testing a Claim about a Proportion ====
 
====Testing a Claim about a Standard Deviation or Variance====
 
 
===Inferences from Two Samples===
 
====Inferences about Two Means: Dependent Samples ====
 
====Inferences about Two Means: Independent and Large Samples ====
 
====Comparing Two Variances ====
 
====Inferences about Two Means: Independent and Small Samples====
 
====Inferences about Two Proportions ====
 
 
===Correlation and Regression===
 
====Correlation ====
 
====Regression ====
 
====Variation and Prediction Intervals ====
 
====Multiple Regression ====
 
 
===Multinomial Experiments and Contingency Tables===
 
====Multinomial Experiments: Goodness-of-Fit ====
 
====Contingency Tables: Independence and Homogeneity====
 
 
===Statistical Process Control===
 
====Control Charts for Variation and Mean ====
 
====Control Charts for Attributes====
 
  
 
<hr>
 
* SOCR Home page: http://www.socr.ucla.edu
  
{{translate|pageName=http://wiki.stat.ucla.edu/socr/index.php?title=AP_Statistics_Curriculum_2007_IntroVar}}
+
"{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=AP_Statistics_Curriculum_2007_IntroVar}}

Latest revision as of 13:26, 3 March 2020

General Advanced Placement (AP) Statistics Curriculum - Introduction to Statistics

The Nature of Data & Variation

No matter how controlled the environment, the protocol or the design, virtually any repeated measurement, observation, experiment, trial, study or survey is bound to generate data that varies because of intrinsic (internal to the system) or extrinsic (due to the ambient environment) effects.

For example, UCLA's study of Alzheimer’s disease* analyzed data from 31 Mild Cognitive Impairment (MCI) patients and 34 probable Alzheimer’s disease (AD) patients. The investigators made every attempt to control as many variables as possible. Yet the demographic information collected from the subjects contained unavoidable variation. The same study found variation in the MMSE cognitive scores even within the same subject. The table below shows the demographic characteristics of the subjects included in this study, using the following notation: M: male; F: female; W: white; AA: African American; A: Asian.

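{| class="wikitable" style="text-align:center; width:75%" border="1"
|-
| '''Variable''' || '''Alzheimer’s disease''' || '''MCI''' || '''Test''' || '''Test statistic''' || '''P-value'''
|-
| '''Age (years)''' || 76.2 (8.3) range 52–89 || 73.7 (7.4) range 57–84 || Student’s t || \(t_o = 1.284\) || ''p=0.21''
|-
| '''Gender (M:F)''' || 15:19 || 15:16 || Proportion || \(z_o = -0.345\) || ''p=0.733''
|-
| '''Education (years)''' || 14.0 (2.1) range 12–19 || 16.23 (2.7) range 12–20 || Wilcoxon rank-sum || \(w_o = 773.0\) || ''p<0.001''
|-
| '''Race (W:AA:A)''' || 29:1:4 || 26:2:3 || \(\chi_{(df=2)}^2\) || \(\chi_{(df=2)}^2=1.18\) || ''p=0.55''
|-
| '''MMSE''' || 20.9 (6.3) range 4–29 || 28.2 (1.6) range 23–30 || Wilcoxon rank-sum || \(w_o= 977.5\) || ''p<0.001''
|}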
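
The age and gender test statistics in the table can be approximately recomputed from the published summaries alone. The sketch below is only an illustration, not the study's actual analysis. It assumes that the values in parentheses are standard deviations, that the group sizes are 34 (AD) and 31 (MCI), that the age comparison used the unequal-variance (Welch) form of the t statistic, and that the gender comparison used a pooled two-proportion z statistic. All function names are hypothetical.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for two independent samples without pooling the variances."""
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / se

def two_proportion_z(x1, n1, x2, n2):
    """z statistic comparing two proportions, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def two_sided_normal_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Assumption: the table reports mean (standard deviation) for age.
t_age = welch_t(76.2, 8.3, 34, 73.7, 7.4, 31)

# Assumption: 15 of 34 AD subjects and 15 of 31 MCI subjects are male.
z_gender = two_proportion_z(15, 34, 15, 31)
p_gender = two_sided_normal_p(z_gender)

print(f"age:    t = {t_age:.3f}")
print(f"gender: z = {z_gender:.3f}, two-sided p = {p_gender:.3f}")
```

Under these assumptions the two statistics come out close to the tabulated values of 1.284 and -0.345. Small differences in the p-values are to be expected because the inputs are rounded summaries rather than the raw data.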

Approach

Models and strategies for solving problems and understanding data and inferences.

  • Once we accept that all natural phenomena are inherently variable and there are no completely deterministic processes, we need to look for models and techniques that allow us to study acquired data in the presence of variation, uncertainty and chance.
  • Statistics is the data science that investigates natural processes and allows us to quantify variation to make population inferences based on limited observations.
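
As a minimal sketch of what "population inferences based on limited observations" can look like, the code below simulates a hypothetical population, draws a small sample from it, and reports the sample mean with an approximate 95% interval. The population parameters, sample size and seed are arbitrary assumptions chosen for illustration, and the interval uses the normal critical value 1.96 rather than a t quantile.

```python
import math
import random
import statistics

random.seed(2007)  # fixed seed so the sketch is repeatable

# Hypothetical population of scores; in practice it is never fully observed.
population = [random.gauss(25, 5) for _ in range(10_000)]

# A limited set of observations drawn from that population.
sample = random.sample(population, 40)

mean = statistics.mean(sample)
sd = statistics.stdev(sample)      # sample standard deviation
se = sd / math.sqrt(len(sample))   # standard error of the mean

# Approximate 95% interval for the population mean (normal critical value).
low, high = mean - 1.96 * se, mean + 1.96 * se

print(f"sample mean = {mean:.2f}, sample SD = {sd:.2f}")
print(f"approximate 95% interval: ({low:.2f}, {high:.2f})")
```

Re-running with a different seed gives a different sample, mean and interval, which is the variation the text describes.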

Model Validation

Checking/affirming underlying assumptions.

  • Each model or technique for data exploration, analysis and understanding relies on a set of assumptions, which always need to be validated before the model or analysis tool is employed to study real data (observations or measurements that are perceived or detected by the investigator).
  • Such prior model conjectures or presumptions could take the form of mathematical constraints about the properties of the underlying process, restrictions on the study design or demands on the data acquisition protocol.
  • Common assumptions include (statistical) independence of the measurements, specific limitations on the shape of the observed distribution, restrictions on the parameters of the processes being studied, etc.
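
Two of the assumptions listed above, independence of the measurements and the shape of the observed distribution, can be screened with simple descriptive diagnostics before any formal analysis. The sketch below is a rough illustration on simulated data, not a formal test. The helper names are hypothetical, and the choice of lag-1 autocorrelation and sample skewness as diagnostics is an assumption made for this example.

```python
import random
import statistics

def lag1_autocorrelation(xs):
    """Correlation between consecutive values; near 0 for independent draws."""
    m = statistics.fmean(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

def skewness(xs):
    """Sample skewness; near 0 for a symmetric distribution."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

random.seed(1)  # fixed seed so the sketch is repeatable
symmetric = [random.gauss(0, 1) for _ in range(2000)]
skewed = [random.expovariate(1.0) for _ in range(2000)]

# A random walk: each value depends on the previous one, so independence fails.
walk, level = [], 0.0
for step in symmetric:
    level += step
    walk.append(level)

print(f"lag-1 autocorrelation, independent draws: {lag1_autocorrelation(symmetric):.3f}")
print(f"lag-1 autocorrelation, random walk:       {lag1_autocorrelation(walk):.3f}")
print(f"skewness, Gaussian draws:                 {skewness(symmetric):.3f}")
print(f"skewness, exponential draws:              {skewness(skewed):.3f}")
```

Values far from zero suggest that the corresponding assumption deserves a closer look before the model is applied.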

Computational Resources: Internet-based SOCR Tools

  • The SOCR resource contains a variety of educational materials, demonstration applets and learning resources that illustrate data generation, experimentation, exploratory and statistical data analysis.
  • (Numeric Pseudo-Random) Data Generation
  • Interactive SOCR Experimentation with computer generated models of natural phenomena
      • Bivariate Die-Coin Experiment
  • Exploratory Data Analysis
  • Statistical Data Analysis

Datasets

There are a number of large, natural, useful and demonstrative datasets provided as part of this statistics EBook. Many of these data collections are intentionally large and complex. This choice is driven by the need to emphasize the symbiosis between driving challenges, statistical concepts, mathematical derivations and the use of technology to solve relevant research problems.

Examples

Computer simulations and real observed data.

  • For example, exploratory data analysis using data histograms. This SOCR activity illustrates the generation and interpretation of the histogram of quantitative data.

Hands-on activities

Step-by-step practice problems.

  • Histograms and Frequency Graphs Activity
  • Bivariate Cards and Coins Meta-Activity

Problems


References

  • Apostolova LG, Dinov ID, Dutton RA, Hayashi KM, Toga AW, Cummings JL, Thompson PM. 3D comparison of hippocampal atrophy in amnestic mild cognitive impairment and Alzheimer's disease. Brain. 2006 Nov; 129(Pt 11):2867-73.

