Difference between revisions of "AP Statistics Curriculum 2007"

From SOCR

Revision as of 17:04, 4 February 2008

This is a General Advanced-Placement (AP) Statistics Curriculum E-Book

Contents

Preface

This is an Internet-based E-Book for advanced-placement (AP) statistics educational curriculum. The E-Book was initially developed by the UCLA Statistics Online Computational Resource (SOCR). However, any statistics instructor, researcher or educator is encouraged to contribute to this effort and improve the content of these learning materials.

Format

Follow the instructions in this page to expand, revise or improve the materials in this E-Book.

Chapter I: Introduction to Statistics

The Nature of Data & Variation

No matter how controlled the environment, the protocol or the design, virtually any repeated measurement, observation, experiment, trial, study or survey is bound to generate data that varies because of intrinsic (internal to the system) or extrinsic (due to the ambient environment) effects. How many natural processes or phenomena in real life can we describe that have an exact mathematical closed-form description and are completely deterministic? How do we model the rest of the processes that are unpredictable and have random characteristics?

Uses and Abuses of Statistics

Statistics is the science of variation, randomness and chance. As such, statistics is different from other sciences, where the processes being studied obey exact deterministic mathematical laws. Statistics provides quantitative inference represented as long-time probability values, confidence or prediction intervals, odds, chances, etc., which may ultimately be subjected to varying interpretations. The phrase Uses and Abuses of Statistics refers to the notion that in some cases statistical results may be used as evidence to seemingly opposite theses. However, most of the time, common principles of logic allow us to disambiguate the obtained statistical inference.

Design of Experiments

Design of experiments is the blueprint for planning a study or experiment, performing the data collection protocol and controlling the study parameters for accuracy and consistency. Data, or information, is typically collected in regard to a specific process or phenomenon being studied to investigate the effects of some controlled variables (independent variables or predictors) on other observed measurements (responses or dependent variables). Both types of variables are associated with specific observational units (living beings, components, objects, materials, etc.)

Statistics with Tools (Calculators and Computers)

All methods for data analysis, understanding or visualization are based on models that often have compact analytical representations (e.g., formulas, symbolic equations, etc.). Models are used to study processes theoretically. Empirical validations of the utility of models are achieved by plugging in data and actually testing the models. This validation step may be done manually, by computing the model prediction or model inference from recorded measurements. However, this is possible by hand only for a small number of observations (<10). In practice, we write (or use existing) algorithms and computer programs that automate these calculations for better efficiency, accuracy and consistency in applying models to larger datasets.

Chapter II: Describing, Exploring, and Comparing Data

Types of Data

There are two important concepts in any data analysis - population and sample. Each of these may generate data of two major types - quantitative or qualitative measurements.

Summarizing data with Frequency Tables

There are two important ways to describe a data set (sample from a population) - Graphs or Tables.

Pictures of Data

There are many different ways to display and graphically visualize data. These graphical techniques facilitate the understanding of the dataset and enable the selection of an appropriate statistical methodology for the analysis of the data.

Measures of Central Tendency

There are three main features of populations (or sample data) that are always critical in understanding and interpreting their distributions - Center, Spread and Shape. The main measures of centrality are mean, median and mode(s).
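
As a minimal sketch (not part of the original E-Book), the three measures of centrality can be computed with Python's standard `statistics` module. The function name `central_tendency` and the sample values are hypothetical and chosen only for illustration.

```python
import statistics

def central_tendency(data):
    """Return the mean, median and all modes of a numeric sample."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # multimode returns a list, since a sample may have several modes
        "modes": statistics.multimode(data),
    }

sample = [2, 3, 3, 5, 7, 10]  # hypothetical data
summary = central_tendency(sample)
print(summary)
```

For a skewed sample such as this one, the mean is pulled toward the large value 10, while the median and mode are not.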

Measures of Variation

There are many measures of (population or sample) spread, e.g., the range, the variance, the standard deviation, mean absolute deviation, etc. These are used to assess the dispersion or variation in the population.
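
The measures of spread listed above can be sketched in a few lines of Python. This is an illustrative example only: the helper name `spread_measures` and the data are assumptions, and the variance shown is the sample variance (divisor n - 1).

```python
import statistics

def spread_measures(data):
    """Return common measures of dispersion for a numeric sample."""
    mean = statistics.mean(data)
    return {
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance, divisor n - 1
        "std_dev": statistics.stdev(data),      # square root of the sample variance
        "mean_abs_dev": sum(abs(x - mean) for x in data) / len(data),
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
spread = spread_measures(sample)
print(spread)
```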

Measures of Shape

The shape of a distribution can usually be determined by just looking at a histogram of a (representative) sample from that population. Frequency plots, dot plots or stem-and-leaf displays may also be helpful.

Statistics

Variables can be summarized using statistics - functions of data samples.

Graphs & Exploratory Data Analysis

Graphical visualization and interrogation of data are critical components of any reliable method for statistical modeling, analysis and interpretation of data.

Chapter III: Probability

Probability is important in many studies and disciplines because measurements, observations and findings are often influenced by variation. In addition, probability theory provides the theoretical groundwork for statistical inference.

Fundamentals

Some fundamental concepts of probability theory include random events, sampling, types of probabilities, event manipulations and axioms of probability.

Rules for Computing Probabilities

There are many important rules for computing probabilities of composite events. These include conditional probability, statistical independence, multiplication and addition rules, the law of total probability and the Bayesian rule.
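
To illustrate two of these rules, the sketch below applies the law of total probability and the Bayesian rule to a hypothetical screening test. The function names and all numeric inputs (1% prevalence, 95% sensitivity, 5% false-positive rate) are assumptions made for the example, not values from the E-Book.

```python
def total_probability(prior, p_b_given_a, p_b_given_not_a):
    """Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)."""
    return p_b_given_a * prior + p_b_given_not_a * (1.0 - prior)

def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """Bayesian rule: P(A|B) = P(B|A)P(A) / P(B)."""
    return p_b_given_a * prior / total_probability(prior, p_b_given_a, p_b_given_not_a)

# Hypothetical screening test: A = "has condition", B = "tests positive".
p_positive = total_probability(0.01, 0.95, 0.05)
p_condition_given_positive = bayes_posterior(0.01, 0.95, 0.05)
print(p_positive, p_condition_given_positive)
```

Under these assumed inputs the posterior probability is far smaller than the sensitivity, because the condition is rare.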

Probabilities Through Simulations

Many experimental settings require probability computations of complex events. Such calculations may be carried out exactly, using theoretical models, or approximately, using estimation or simulations.
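
A simulation-based estimate can be sketched as follows. The example event (two fair dice summing to 7) and the helper names are hypothetical; the event was chosen because its exact probability, 6/36, is known and can be compared with the estimate.

```python
import random

def estimate_probability(event, trials, seed=0):
    """Estimate P(event) as the fraction of simulated trials in which it occurs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = sum(1 for _ in range(trials) if event(rng))
    return hits / trials

def two_dice_sum_to_seven(rng):
    """Simulate rolling two fair dice and report whether they sum to 7."""
    return rng.randint(1, 6) + rng.randint(1, 6) == 7

estimate = estimate_probability(two_dice_sum_to_seven, trials=100_000)
exact = 6 / 36
print(estimate, exact)
```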

Counting

There are many useful counting principles (including permutations and combinations) to compute the number of ways that certain arrangements of objects can be formed. This allows counting-based estimation of probabilities of complex events.
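
As an illustrative sketch, Python's `math.perm` and `math.comb` (available from Python 3.8) compute permutations and combinations directly. The card example is an assumption added here to show how counting leads to a probability.

```python
import math

# Ordered arrangements of 3 objects chosen from 5 (permutations).
arrangements = math.perm(5, 3)

# Unordered selections of 3 objects chosen from 5 (combinations).
selections = math.comb(5, 3)

# Counting-based probability: all 5 cards in a hand are hearts,
# drawing 5 cards from a standard 52-card deck with 13 hearts.
p_all_hearts = math.comb(13, 5) / math.comb(52, 5)

print(arrangements, selections, p_all_hearts)
```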

Chapter IV: Probability Distributions

There are two basic types of processes that we observe in nature - discrete and continuous. We begin by discussing several important discrete random processes, their distributions, expectations, variances and applications. In the next chapter, we will discuss their continuous counterparts.

Random Variables

To simplify the calculations of probabilities, we will define the concept of a random variable, which will allow us to study various processes uniformly, using the same mathematical and computational techniques.

Expectation (Mean) and Variance

The expectation and the variance for any discrete random variable or process are important measures of centrality and dispersion.
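
For a discrete random variable with probability mass function p(x), the expectation is the sum of x p(x) and the variance is the sum of (x - mean)^2 p(x). The sketch below applies these definitions to a fair die; the dictionary representation of the mass function and the function names are assumptions for illustration.

```python
from fractions import Fraction

def expectation(pmf):
    """E[X] = sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var[X] = sum of (x - E[X])^2 * P(X = x) over all values x."""
    mu = expectation(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Fair six-sided die: each face has probability 1/6 (exact fractions).
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expectation(fair_die), variance(fair_die))
```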

Bernoulli & Binomial Experiments

The Bernoulli and Binomial processes provide the simplest models for discrete random experiments.
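
A minimal sketch of the Binomial probability mass function, P(X = k) = C(n, k) p^k (1 - p)^(n - k), is given below. The Bernoulli case corresponds to n = 1. The function name and the parameter values are hypothetical.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); n = 1 gives the Bernoulli case."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Distribution of the number of heads in 4 tosses of a fair coin.
pmf = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(pmf)
```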

Geometric, Hypergeometric & Negative Binomial

The Geometric, Hypergeometric and Negative Binomial distributions provide computational models for calculating probabilities for a large number of experiments and random variables. This section presents the theoretical foundations and the applications of each of these discrete distributions.

Poisson Distribution

The Poisson distribution models many different discrete processes where the probability of the observed phenomenon is constant in time or space. The Poisson distribution may be used as an approximation to the Binomial distribution.

Chapter V: Normal Probability Distribution

The normal distribution is perhaps the most important model for studying quantitative phenomena in the natural and behavioral sciences - this is due to the Central Limit Theorem. Many numerical measurements (e.g., weight, time, etc.) can be well approximated by the normal distribution.

The Standard Normal Distribution

The standard Normal distribution is the simplest version (zero-mean, unit-standard-deviation) of the (general) Normal distribution. Yet, it is perhaps the most frequently used version because many tables and computational resources are explicitly available for calculating probabilities.
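
In place of printed tables, standard Normal probabilities can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch; the cut-off values 1.96 and 1.0 are chosen for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal: mean 0, standard deviation 1

# P(Z <= 1.96), the kind of value read from a standard Normal table.
p_below = Z.cdf(1.96)

# P(-1 < Z < 1): probability of falling within one standard deviation of the mean.
p_within_one_sd = Z.cdf(1.0) - Z.cdf(-1.0)

print(p_below, p_within_one_sd)
```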

Nonstandard Normal Distribution: Finding Probabilities

In practice, the mechanisms underlying natural phenomena may be unknown, yet the use of the normal model can be theoretically justified in many situations to compute critical and probability values for various processes.
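
Probabilities for a general Normal variable X with mean mu and standard deviation sigma reduce to the standard Normal case through the transformation z = (x - mu) / sigma. The sketch below assumes hypothetical parameters (mu = 100, sigma = 15) and an illustrative function name.

```python
from statistics import NormalDist

def normal_probability(a, b, mu, sigma):
    """P(a < X < b) for X ~ Normal(mu, sigma), computed by standardizing."""
    Z = NormalDist()  # standard Normal
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

# Hypothetical example: z-scores are -1 and 2.
p = normal_probability(85, 130, mu=100, sigma=15)
print(p)
```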

Nonstandard Normal Distribution: Finding Scores (critical values)

In addition to being able to compute probability (p) values, we often need to estimate the critical values of the Normal distribution for a given p-value.
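
Finding a critical value is the inverse problem: given a probability p, find x with P(X <= x) = p. A minimal sketch using `NormalDist.inv_cdf` follows; the function name and the example parameters are assumptions.

```python
from statistics import NormalDist

def normal_critical_value(p, mu=0.0, sigma=1.0):
    """Return x such that P(X <= x) = p for X ~ Normal(mu, sigma)."""
    z = NormalDist().inv_cdf(p)  # standard Normal quantile
    return mu + sigma * z        # undo the standardization

z_975 = normal_critical_value(0.975)                 # standard Normal
x_90 = normal_critical_value(0.90, mu=100, sigma=15)  # hypothetical scale
print(z_975, x_90)
```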

Chapter VI: Relations Between Distributions

In this chapter we explore the relations between different distributions. This knowledge will help us in two ways: First, some inter-distribution relations will enable us to compute difficult probabilities using reasonable approximations; Second, it would help us identify appropriate probability models, graphical and statistical analysis tools for data interpretation.

The Central Limit Theorem

The exploration of the relation between different distributions begins with the study of the sampling distribution of the sample average. This will demonstrate the universally important role of the Normal distribution.
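
The sampling distribution of the sample average can be explored by simulation. The sketch below draws repeated samples from a Uniform(0, 1) population (an assumption made for illustration) and compares the centre and spread of the sample means with the values the Central Limit Theorem predicts: mu and sigma / sqrt(n).

```python
import random
import statistics

def sample_means(draw, n, replicates, seed=0):
    """Return the means of `replicates` independent samples of size n."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(replicates)]

# Uniform(0, 1) population: mu = 0.5, sigma = sqrt(1/12).
means = sample_means(lambda rng: rng.random(), n=30, replicates=5000)

center = statistics.fmean(means)
spread = statistics.stdev(means)
predicted_spread = (1 / 12) ** 0.5 / 30 ** 0.5  # sigma / sqrt(n)
print(center, spread, predicted_spread)
```

A histogram of `means` would look approximately bell-shaped even though the underlying population is flat.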

Law of Large Numbers

Suppose an event has probability p of being observed in each experiment. If we repeat the same experiment over and over, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of experiments increases. Why is that and why is this important?
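
The convergence of the relative frequency can be observed in a short simulation. In this sketch the success probability p = 0.3 and the number of trials are hypothetical choices.

```python
import random

def running_relative_frequency(p, trials, seed=0):
    """Relative frequency of an event of probability p after each of `trials` repetitions."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    successes = 0
    freqs = []
    for i in range(1, trials + 1):
        successes += rng.random() < p  # True counts as 1
        freqs.append(successes / i)
    return freqs

freqs = running_relative_frequency(p=0.3, trials=100_000)
print(freqs[9], freqs[999], freqs[-1])  # after 10, 1,000 and 100,000 trials
```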

Normal Distribution as Approximation to Binomial Distribution

The Normal distribution provides a valuable approximation to the Binomial when the sample sizes are large and the probabilities of success and failure are not close to zero.
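
The quality of this approximation can be checked numerically. The sketch below compares an exact Binomial cumulative probability with a Normal approximation using mean np, standard deviation sqrt(np(1 - p)) and a continuity correction of 0.5. The continuity correction and the parameter values (n = 100, p = 0.5) are assumptions of this example.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_binomial_cdf(k, n, p):
    """Normal approximation to P(X <= k), with continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

exact = binomial_cdf(55, 100, 0.5)
approx = normal_approx_binomial_cdf(55, 100, 0.5)
print(exact, approx)
```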

Poisson Approximation to Binomial Distribution

Poisson provides an approximation to Binomial distribution when the sample sizes are large and the probability of success or failure is close to zero.
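
As a numerical illustration, the sketch below compares a Binomial probability with its Poisson approximation using lambda = np. The values n = 1000 and p = 0.002 are hypothetical, and the function names are illustrative.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

n, p = 1000, 0.002   # large n, small p
lam = n * p          # matching Poisson rate
exact = binomial_pmf(3, n, p)
approx = poisson_pmf(3, lam)
print(exact, approx)
```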

Binomial Approximation to HyperGeometric

The Binomial distribution is much simpler to compute than the Hypergeometric, and can be used as an approximation when the population size is large (relative to the sample size) and the probability of success is not close to zero.
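
The sketch below compares the two distributions for a hypothetical population of N = 10000 items with K = 3000 successes and a sample of n = 10, so that the Binomial success probability is p = K / N. All names and values are assumptions for illustration.

```python
import math

def hypergeometric_pmf(k, N, K, n):
    """P(k successes in n draws without replacement from N items, K of which are successes)."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), i.e. draws with replacement."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

N, K, n = 10_000, 3_000, 10  # population much larger than the sample
exact = hypergeometric_pmf(3, N, K, n)
approx = binomial_pmf(3, n, K / N)
print(exact, approx)
```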

Normal Approximation to Poisson

The Poisson distribution can be approximated fairly well by the Normal distribution when λ is large.
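
A minimal numerical check follows, assuming λ = 100 and a Normal approximation with mean λ, standard deviation sqrt(λ) and a continuity correction of 0.5 (the correction is an assumption of this sketch).

```python
import math
from statistics import NormalDist

def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Poisson(lam), summing the mass function iteratively."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return total

def normal_approx_poisson_cdf(k, lam):
    """Normal approximation to P(X <= k), with continuity correction."""
    return NormalDist(lam, math.sqrt(lam)).cdf(k + 0.5)

exact = poisson_cdf(110, 100)
approx = normal_approx_poisson_cdf(110, 100)
print(exact, approx)
```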

Chapter VII: Point and Interval Estimates

Estimation of population parameters is critical in many applications. Estimation is most frequently carried out in terms of point estimates or interval (range) estimates for the population parameters of interest.

Estimating a Population Mean: Large Samples

This section discusses how to find point and interval estimates when the sample-sizes are large.
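
One common large-sample construction, sketched here under the assumption that the Normal approximation applies, takes the sample mean as the point estimate and the sample mean plus or minus z times s / sqrt(n) as the interval estimate. The function name and the summary statistics in the example are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample interval for a population mean: mean +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical summary statistics: mean 50, standard deviation 10, n = 100.
low, high = mean_confidence_interval(50.0, 10.0, 100)
print(low, high)
```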

Estimating a Population Mean: Small Samples

Next, we discuss point and interval estimates when the sample-sizes are small. Naturally, the point estimates are less precise and the interval estimates produce wider intervals, compared to the case of large-samples.

Student's T distribution

The Student's t-distribution arises in the problem of estimating the mean of a normally distributed population when the sample size is small and the population variance is unknown.
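
For reference, the standard result behind this statement can be written as follows, where the symbols are notation introduced here: X-bar is the sample mean, S the sample standard deviation, n the sample size, mu the population mean, and t with subscripts alpha/2 and n - 1 the upper alpha/2 critical value of the t-distribution with n - 1 degrees of freedom.

```latex
T = \frac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1},
\qquad
\text{interval estimate: } \bar{X} \pm t_{\alpha/2,\, n-1} \, \frac{S}{\sqrt{n}}
```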

Estimating a Population Proportion

The Normal distribution is an appropriate model for proportions when the sample size is large enough. In this section we demonstrate how to obtain point and interval estimates for a population proportion.
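
A minimal sketch of one such estimate is the Normal-approximation (Wald) interval, p-hat plus or minus z times sqrt(p-hat (1 - p-hat) / n). Choosing the Wald form is an assumption of this example, as are the function name and the counts used.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Point estimate and Wald interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - margin, p_hat + margin

# Hypothetical sample: 60 successes in 100 trials.
p_hat, low, high = proportion_confidence_interval(60, 100)
print(p_hat, low, high)
```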

Estimating a Population Variance

In many processes and experiments, controlling the amount of variance is of critical importance. Thus the ability to assess variation, using point and interval estimates, facilitates our ability to make inference, revise manufacturing protocols, improve clinical trials, etc.

Chapter VIII: Hypothesis Testing

Fundamentals of Hypothesis Testing

Overview TBD

Testing a Claim about a Mean: Large Samples

Overview TBD

Testing a Claim about a Mean: Small Samples

Overview TBD

Testing a Claim about a Proportion

Overview TBD

Testing a Claim about a Standard Deviation or Variance

Overview TBD

Chapter IX: Inferences from Two Samples

Inferences about Two Means: Dependent Samples

Overview TBD

Inferences about Two Means: Independent and Large Samples

Overview TBD

Comparing Two Variances

Overview TBD

Inferences about Two Means: Independent and Small Samples

Overview TBD

Inferences about Two Proportions

Overview TBD

Chapter X: Correlation and Regression

Correlation

Overview TBD

Regression

Overview TBD

Variation and Prediction Intervals

Overview TBD

Multiple Regression

Overview TBD

Chapter XI: Non-Parametric Inference

Differences of Means of Two Paired Samples

Overview TBD

Differences of Means of Two Independent Samples

Overview TBD

Differences of Medians of Two Paired Samples

Overview TBD

Differences of Medians of Two Independent Samples

Overview TBD

Differences of Proportions of Two Independent Samples

Overview TBD

Differences of Means of Several Independent Samples

Overview TBD

Differences of Variances of Two Independent Samples

Overview TBD

Chapter XII: Multinomial Experiments and Contingency Tables

Multinomial Experiments: Goodness-of-Fit

Overview TBD

Contingency Tables: Independence and Homogeneity

Overview TBD

Chapter XIII: Statistical Process Control

Control Charts for Variation and Mean

Overview TBD

Control Charts for Attributes

Overview TBD

Chapter XIV: Survival/Failure Analysis

Overview TBD

Chapter XV: Multivariate Statistical Analyses

Multivariate Analysis of Variance

Overview TBD

Multiple Linear Regression

Overview TBD

Logistic Regression

Overview TBD

Log-Linear Regression

Overview TBD

Multivariate Analysis of Covariance

Overview TBD

Chapter XVI: Time Series Analysis

Overview TBD




