# Difference between revisions of "AP Statistics Curriculum 2007 EDA Center"


## Revision as of 22:20, 27 January 2008


## General Advance-Placement (AP) Statistics Curriculum - Central Tendency

### Measurements of Central Tendency

There are three main features of all populations (or data samples) that are always critical in understanding and interpreting their distributions. These characteristics are **Center**, **Spread** and **Shape**. The main measures of centrality are the **mean**, **median** and **mode**.

Suppose we are interested in the long-jump performance of some students. We can carry out an experiment by randomly selecting 8 male statistics students and asking them to perform the standing long jump. In reality every student participated, but for ease of calculation we will focus on these eight students. The long jumps (in inches) were as follows:

74 | 78 | 106 | 80 | 68 | 64 | 60 | 76 |

### Mean

The **sample-mean** is the arithmetic average of a finite sample of numbers. In the long-jump example, the sample-mean is calculated as follows:

\[\overline{y} = {1 \over 8} (74+78+106+80+68+64+60+76) = 75.75 \text{ in.}\]
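The sample-mean of the eight jumps can be checked with a few lines of Python. This is only a minimal sketch; the variable names `jumps` and `sample_mean` are illustrative and not part of the original page.

```python
# Long-jump distances (inches) for the eight students in the example
jumps = [74, 78, 106, 80, 68, 64, 60, 76]

# Sample mean: sum of the observations divided by the sample size
sample_mean = sum(jumps) / len(jumps)
print(sample_mean)  # 75.75
```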

### Median

The **sample-median** can be thought of as the point that divides a distribution in half (50/50). The following steps are used to find the sample-median:

- Arrange the data in ascending order
- If the sample size is odd, the median is the middle value of the ordered collection
- If the sample size is even, the median is the average of the middle two values in the ordered collection.

For the long-jump data above we have:

- Ordered data (Long-Jump Sample Data, inches):

60 | 64 | 68 | 74 | 76 | 78 | 80 | 106 |

- \(\text{Median} = {74+76 \over 2} = 75\) inches.
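The three steps for finding the sample-median translate directly into code. The function below is a hedged sketch of that procedure; the name `sample_median` is hypothetical, and Python's standard library offers an equivalent in `statistics.median`.

```python
def sample_median(values):
    """Return the sample median using the sort-then-pick-the-middle procedure."""
    if not values:
        raise ValueError("median of an empty sample is undefined")
    ordered = sorted(values)          # step 1: arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2: odd size, take the middle value
        return ordered[mid]
    # step 3: even size, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


jumps = [74, 78, 106, 80, 68, 64, 60, 76]
print(sample_median(jumps))  # 75.0
```

For the eight jumps the two middle values are 74 and 76, so the function returns their average.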

### Mode(s)

The **modes** are the most frequently occurring values (the numbers that appear most often). The term mode is applied both to probability distributions and to collections of experimental data.

For instance, for the Hot dogs datafile, there appear to be 3 modes for the calorie variable. This is evident from the histogram of the **Calorie** content of all hotdogs, which shows a clear separation of the calories into 3 distinct sub-populations.

[Figure: histogram of the Calorie content of all hotdogs]
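For a small collection of discrete values, the modes can be found by counting occurrences. The sketch below uses a made-up list (`example`) purely for illustration; it is not the hot dog data, and the helper name `modes` is hypothetical. Note that for a continuous variable such as calories, modes are usually read off as peaks of a histogram rather than as exactly repeated values, which this simple count does not capture.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency, in sorted order."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


example = [1, 2, 2, 3, 3, 4]  # hypothetical data, not from the original page
print(modes(example))  # [2, 3]
```

Applied to the long-jump sample, where every distance occurs exactly once, this count would report every value as a mode, so the mode is not an informative summary for that sample.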

### Resistance

A statistic is said to be **resistant** if the value of the statistic is relatively unchanged by changes in a small portion of the data. Referring to the formulas for the median, mean and mode, which statistic seems to be more resistant?

If you remove the student with the long-jump distance of 106 and recalculate the median and mean, which one is altered less (and is therefore more resistant)? Notice that the mean is very sensitive to outliers and atypical observations, and hence less resistant than the median.
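The comparison can be carried out numerically. The following sketch uses Python's standard `statistics` module; the names `trimmed`, `mean_shift` and `median_shift` are illustrative only.

```python
from statistics import mean, median

jumps = [74, 78, 106, 80, 68, 64, 60, 76]

# Drop the atypical 106-inch jump and recompute both statistics
trimmed = [x for x in jumps if x != 106]

mean_shift = abs(mean(jumps) - mean(trimmed))        # 75.75 vs. 500/7
median_shift = abs(median(jumps) - median(trimmed))  # 75 vs. 74

print(round(mean_shift, 2))  # 4.32
print(median_shift)          # 1.0
```

Removing one observation moves the mean by more than 4 inches but the median by only 1 inch, which is consistent with the median being the more resistant statistic.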

### References

- SOCR Home page: http://www.socr.ucla.edu
