Difference between revisions of "AP Statistics Curriculum 2007 EDA Pics"

From SOCR
m (Text replacement - "{{translate|pageName=http://wiki.stat.ucla.edu/socr/" to ""{{translate|pageName=http://wiki.socr.umich.edu/")
 
(14 intermediate revisions by 4 users not shown)
Line 2: Line 2:
  
 
===Pictures of Data===
 
===Pictures of Data===
There are a [[SOCR_EduMaterials_ChartsActivities | varieties of graphs and plots]] that may be used to display data.  
+
There are [[SOCR_EduMaterials_ChartsActivities | a variety of graphs and plots]] that may be used to display data.  
* For '''quantitative''' variables, we need to make classes (meaningful intervals) first. To accomplish this we need to separate (or bin) the quantitative data into classes.   
+
* For '''quantitative''' variables, we need to make classes (meaningful intervals) first. To accomplish this, we need to separate (or bin) the quantitative data into classes.   
* For qualitative variables we need to use the frequency counts, instead of the native measurements as the latter may not even have a natural ordering (so binning the variables in classes may not be possible).
+
* For qualitative variables, we need to use the frequency counts instead of the native measurements, as the latter may not even have a natural ordering (so binning the variables in classes may not be possible).
* How to define the number of bins or classes? One common rule of thumb is that the number of classes should be close to <math>\sqrt{sample-size}</math>. For accurate interpretation of data, it is important that all classes (or bins) are of equal width. Once we have our classes we can create a frequency/relative frequency table or histogram.
+
* How to define the number of bins or classes? One common ''rule of thumb'' is that the number of classes should be close to <math>\sqrt{\texttt{sample size}}.</math> For accurate interpretation of data, it is important that all classes (or bins) are of equal width. Once we have our classes, we can create a frequency/relative frequency table or histogram.
  
 
===Example===
 
===Example===
People who are concerned about their health may prefer hot dogs that are low in salt and calories. The [[SOCR_012708_ID_Data_HotDogs | Hot dogs datafile]] contains data on the ''sodium'' and ''calories'' contained in each of 54 major hot dog brands. The hot dogs are also classified by type: ''beef'', ''poultry'', and ''meat'' (mostly pork and beef, but up to 15% poultry meat).  For now we will focus on the calories of these sampled hotdogs.  
+
People who are concerned about their health may prefer hot dogs that are low in salt and calories. The [[SOCR_012708_ID_Data_HotDogs | Hot dogs data file]] contains data on the ''sodium'' and ''calories'' contained in each of 54 major hot dog brands. The hot dogs are also classified by type: ''beef'', ''poultry'', and ''meat'' (mostly pork and beef, but up to 15% poultry meat).  For now we will focus on the calories of these sampled hotdogs.  
  
 
===Frequency Histogram Charts===
 
===Frequency Histogram Charts===
 
* Using [http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and the [[SOCR_EduMaterials_ChartsActivities | Charts activities]] you can produce a number of interesting graphical summaries for [[SOCR_012708_ID_Data_HotDogs | this hotdogs dataset]].
 
* Using [http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and the [[SOCR_EduMaterials_ChartsActivities | Charts activities]] you can produce a number of interesting graphical summaries for [[SOCR_012708_ID_Data_HotDogs | this hotdogs dataset]].
  
* The histogram of the '''Calory''' content of all hotdogs in shown in the image below. Note the clear separation of the calories into 3 distinct sub-populations. Could this be related to the type of meat in the hotdogs?
+
* The histogram of the '''Calorie''' content of all hotdogs in shown in the image below. Note the clear separation of the calories into 3 distinct sub-populations. Could this be related to the type of meat in the hotdogs?
  
 
<center>[[Image:SOCR_EBook_Dinov_EDA_012708_Fig3.jpg|500px]]</center>
 
<center>[[Image:SOCR_EBook_Dinov_EDA_012708_Fig3.jpg|500px]]</center>
Line 24: Line 24:
 
* Using [http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and the [[SOCR_EduMaterials_Activities_BoxAndWhiskerChart | Box-And-Whisker Charts activities]] you can produce a number of interesting graphical summaries for [[SOCR_012708_ID_Data_HotDogs | this hotdogs dataset]].
 
* Using [http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and the [[SOCR_EduMaterials_Activities_BoxAndWhiskerChart | Box-And-Whisker Charts activities]] you can produce a number of interesting graphical summaries for [[SOCR_012708_ID_Data_HotDogs | this hotdogs dataset]].
  
* The graph below shows the box and whisker plot of the '''Calory''' content for all 3 types of hotdogs.
+
* The graph below shows the box and whisker plot of the '''Calorie''' content for all 3 types of hotdogs.
 
<center>[[Image:SOCR_EBook_Dinov_EDA_012708_Fig5.jpg|500px]]</center>
 
<center>[[Image:SOCR_EBook_Dinov_EDA_012708_Fig5.jpg|500px]]</center>
  
Line 33: Line 33:
 
* Using [http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and the [[SOCR_EduMaterials_Activities_DotChart | Dot Plot Charts activities]] you can produce a number of interesting graphical summaries for [[SOCR_012708_ID_Data_HotDogs | this hotdogs dataset]].
 
* Using [http://socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and the [[SOCR_EduMaterials_Activities_DotChart | Dot Plot Charts activities]] you can produce a number of interesting graphical summaries for [[SOCR_012708_ID_Data_HotDogs | this hotdogs dataset]].
  
* The graph below shows the dot-plot of the '''Calory''' content for all 3 types of hotdogs.
+
* The graph below shows the dot-plot of the '''Calorie''' content for all 3 types of hotdogs.
 
<center>[[Image:SOCR_EBook_Dinov_EDA_012708_Fig7.jpg|500px]]</center>
 
<center>[[Image:SOCR_EBook_Dinov_EDA_012708_Fig7.jpg|500px]]</center>
 +
 +
* The graph below shows the dot-plot of the '''Sodium''' content for all 3 types of hotdogs.
 +
<center>[[Image:SOCR_EBook_Dinov_EDA_012708_Fig8.jpg|500px]]</center>
 +
 +
 +
===Stem-and-Leaf Plots===
 +
Stem-and-leaf plot is a method of presenting quantitative data in a graphical format, similar to a histogram. It is used to assist in visualizing the shape of a distribution. Stem-and-leaf plots are useful tools in exploratory data analysis. Unlike histograms, stem-and-leaf plots retain the original data to at least two significant digits, and put the data in order, which simplifies the move to order-based inference and non-parametric statistics. A basic stem-plot contains two columns separated by a vertical line. The left column contains the '''stems''' and the right column contains the '''leaves'''.
 +
 +
* Construction: To construct a stem-and-leaf plot, the observations must first be sorted in ascending order. Then, it must be determined what the '''stems''' will represent and what the '''leaves''' will represent. Frequently, the leaf contains the last digit of the number and the stem contains all of the other digits. Sometimes, the data values may be rounded to a particular place value (such as the hundreds place) that will be used for the leaves. The remaining digits to the left of the rounded place value are used as the stems.
 +
 +
* Example: The stem-and-leaf plot for the '''calorie''' variable of the [[SOCR_012708_ID_Data_HotDogs | Hot dogs data]] is shown below.
 +
**'''Legend''': Stem-and-leaf of Calories, N  = 54; Leaf Unit = 1.0
 +
 +
{| class="wikitable"
 +
|+Stem-and-Leaf Plot for Hot-Dog Calories
 +
|-
 +
| 8  || | || 67
 +
|-
 +
|  9  || | || 49
 +
|-
 +
|  10  || | || 22677
 +
|-
 +
|  11  || | || 13
 +
|-
 +
|  12  || | || 9
 +
|-
 +
|  13  || | || 1225556899
 +
|-
 +
| 14  || | || 01234667899
 +
|-
 +
| 15  || | || 223378
 +
|-
 +
| 16 || | ||
 +
|-
 +
| 17  || | || 235569
 +
|-
 +
| 18  || | || 1246
 +
|-
 +
| 19  || | || 00015
 +
|}
  
 
===Summary===
 
===Summary===
* Histograms can handle large data sets, but can’t tell exact data values and require the user to set-up classes
+
* Histograms can handle large data sets, but they can’t tell exact data values and require the user to set-up classes.
* Dot plots can get a better picture of data values, but can’t handle large data sets
+
* Dot plots can get a better picture of data values, but they can’t handle large data sets.
* Stem and leaf plots can see actual data values, but can’t handle large data sets
+
* Stem and leaf plots can see actual data values, but they can’t handle large data sets.
  
 
<hr>
 
<hr>
 +
 
===References===
 
===References===
 
* [http://www.stat.ucla.edu/%7Edinov/courses_students.dir/07/Fall/STAT13.1.dir/STAT13_notes.dir/lecture02.pdf Lecture notes on EDA]
 
* [http://www.stat.ucla.edu/%7Edinov/courses_students.dir/07/Fall/STAT13.1.dir/STAT13_notes.dir/lecture02.pdf Lecture notes on EDA]
 +
 +
===Problems===
 +
[[EBook_Problems_EDA_Pics | See these practice Problems]]
  
 
<hr>
 
<hr>
 
* SOCR Home page: http://www.socr.ucla.edu
 
* SOCR Home page: http://www.socr.ucla.edu
  
{{translate|pageName=http://wiki.stat.ucla.edu/socr/index.php?title=AP_Statistics_Curriculum_2007_EDA_Pics}}
+
"{{translate|pageName=http://wiki.socr.umich.edu/index.php?title=AP_Statistics_Curriculum_2007_EDA_Pics}}

Latest revision as of 14:15, 3 March 2020

General Advance-Placement (AP) Statistics Curriculum - Pictures of Data

Pictures of Data

There are a variety of graphs and plots that may be used to display data.

  • For quantitative variables, we need to make classes (meaningful intervals) first. To accomplish this, we need to separate (or bin) the quantitative data into classes.
  • For qualitative variables, we need to use the frequency counts instead of the native measurements, as the latter may not even have a natural ordering (so binning the variables in classes may not be possible).
  • How do we choose the number of bins or classes? One common rule of thumb is that the number of classes should be close to \(\sqrt{\texttt{sample size}}\). For the 54 hot dog brands in the example below, \(\sqrt{54} \approx 7.3\), which suggests about 7 classes. For accurate interpretation of data, it is important that all classes (or bins) are of equal width. Once we have our classes, we can create a frequency/relative frequency table or histogram.
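The binning rule above can be sketched in a few lines of Python. This is only an illustration, not part of the SOCR tools: the helper name `frequency_table` is hypothetical, the class count is rounded to the nearest integer (one possible reading of "close to"), and the calorie values are read off the stem-and-leaf plot further down this page.

```python
import math

# Calorie values read off the stem-and-leaf plot on this page
# (stem = leading digits, leaf = last digit, leaf unit = 1).
STEM_LEAF = {
    8: "67", 9: "49", 10: "22677", 11: "13", 12: "9",
    13: "1225556899", 14: "01234667899", 15: "223378",
    16: "", 17: "235569", 18: "1246", 19: "00015",
}
calories = [10 * stem + int(leaf)
            for stem, leaves in STEM_LEAF.items() for leaf in leaves]


def frequency_table(values, num_classes=None):
    """Return (low, high, count) rows for equal-width classes.

    Hypothetical helper: by default the number of classes is the
    square root of the sample size, rounded to the nearest integer.
    """
    n = len(values)
    if num_classes is None:
        num_classes = max(1, round(math.sqrt(n)))
    low, high = min(values), max(values)
    width = (high - low) / num_classes
    if width == 0:
        # All values are identical: a single class holds everything.
        return [(low, high, n)]
    counts = [0] * num_classes
    for v in values:
        # The maximum lies on the last class edge, so clamp it into the final class.
        index = min(int((v - low) / width), num_classes - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c)
            for i, c in enumerate(counts)]


table = frequency_table(calories)
for lo, hi, count in table:
    rel = count / len(calories)
    print(f"[{lo:6.1f}, {hi:6.1f})  count={count:2d}  relative={rel:.3f}")
```

Each printed row gives one class, its frequency, and its relative frequency; plotting the counts as bars over the class intervals would give the frequency histogram.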

Example

People who are concerned about their health may prefer hot dogs that are low in salt and calories. The Hot dogs data file contains data on the sodium and calories contained in each of 54 major hot dog brands. The hot dogs are also classified by type: beef, poultry, and meat (mostly pork and beef, but up to 15% poultry meat). For now, we will focus on the calories of these sampled hot dogs.

Frequency Histogram Charts

  • The histogram of the Calorie content of all hotdogs is shown in the image below. Note the clear separation of the calories into 3 distinct sub-populations. Could this be related to the type of meat in the hotdogs?
[Image: SOCR_EBook_Dinov_EDA_012708_Fig3.jpg]
  • The histogram of the Sodium content of all hotdogs is shown in the image below. What patterns in this histogram can you identify? Try to explain!
[Image: SOCR_EBook_Dinov_EDA_012708_Fig4.jpg]

Box and Whisker Plots

  • The graph below shows the box and whisker plot of the Calorie content for all 3 types of hotdogs.
[Image: SOCR_EBook_Dinov_EDA_012708_Fig5.jpg]
  • The graph below shows the box and whisker plot of the Sodium (salt) content for all 3 types of hotdogs.
[Image: SOCR_EBook_Dinov_EDA_012708_Fig6.jpg]
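
A box and whisker plot is drawn from the five-number summary (minimum, first quartile, median, third quartile, maximum). The sketch below computes that summary for the calorie values read off the stem-and-leaf plot on this page. Two caveats: it pools all 54 brands, because the per-type labels used in the figures cannot be recovered from that plot, and quartile conventions differ between textbooks and software, so the `"inclusive"` method chosen here is an assumption and SOCR Charts may report slightly different quartiles. The helper name `five_number_summary` is hypothetical.

```python
import statistics

# Calorie values read off the stem-and-leaf plot on this page
# (stem = leading digits, leaf = last digit, leaf unit = 1).
STEM_LEAF = {
    8: "67", 9: "49", 10: "22677", 11: "13", 12: "9",
    13: "1225556899", 14: "01234667899", 15: "223378",
    16: "", 17: "235569", 18: "1246", 19: "00015",
}
calories = [10 * stem + int(leaf)
            for stem, leaves in STEM_LEAF.items() for leaf in leaves]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(calories)
for label, value in zip(("min", "Q1", "median", "Q3", "max"), summary):
    print(f"{label:>6}: {value}")
```

The box spans Q1 to Q3 with a line at the median; the whiskers extend toward the minimum and maximum (some conventions stop them earlier and mark outliers separately).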

Dot Plots

  • The graph below shows the dot-plot of the Calorie content for all 3 types of hotdogs.
[Image: SOCR_EBook_Dinov_EDA_012708_Fig7.jpg]
  • The graph below shows the dot-plot of the Sodium content for all 3 types of hotdogs.
[Image: SOCR_EBook_Dinov_EDA_012708_Fig8.jpg]


Stem-and-Leaf Plots

A stem-and-leaf plot is a method of presenting quantitative data in a graphical format, similar to a histogram. It is used to assist in visualizing the shape of a distribution. Stem-and-leaf plots are useful tools in exploratory data analysis. Unlike histograms, stem-and-leaf plots retain the original data to at least two significant digits and put the data in order, which simplifies the move to order-based inference and non-parametric statistics. A basic stem-plot contains two columns separated by a vertical line. The left column contains the stems and the right column contains the leaves.

  • Construction: To construct a stem-and-leaf plot, the observations must first be sorted in ascending order. Then, it must be determined what the stems will represent and what the leaves will represent. Frequently, the leaf contains the last digit of the number and the stem contains all of the other digits. Sometimes, the data values may be rounded to a particular place value (such as the hundreds place) that will be used for the leaves. The remaining digits to the left of the rounded place value are used as the stems.
  • Example: The stem-and-leaf plot for the calorie variable of the Hot dogs data is shown below.
    • Legend: Stem-and-leaf of Calories, N = 54; Leaf Unit = 1.0
Stem-and-Leaf Plot for Hot-Dog Calories

  Stem | Leaves
     8 | 67
     9 | 49
    10 | 22677
    11 | 13
    12 | 9
    13 | 1225556899
    14 | 01234667899
    15 | 223378
    16 |
    17 | 235569
    18 | 1246
    19 | 00015
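
The construction steps described above (sort the observations, split each one into a stem and a last-digit leaf, then list the leaves beside their stem) can be sketched as follows. This is a minimal illustration that assumes non-negative values; the function name `stem_and_leaf` and its `leaf_unit` parameter are hypothetical. The sample values are the first five rows of the hot-dog calorie plot.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=1):
    """Return the rows of a stem-and-leaf plot as strings.

    Each value is rounded to a multiple of leaf_unit; the last digit of
    the rounded value becomes the leaf and the remaining digits the stem.
    Assumes non-negative values.
    """
    if not values:
        return []
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):  # sorting keeps the leaves in ascending order
        scaled = int(round(value / leaf_unit))
        stem, leaf = divmod(scaled, 10)
        leaves_by_stem[stem].append(str(leaf))
    rows = []
    # Walk every stem in the range so that empty stems still get a row.
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(leaves_by_stem.get(stem, []))
        rows.append(f"{stem:>3} | {leaves}")
    return rows


sample = [102, 86, 99, 107, 87, 94, 113, 102, 129, 106, 111, 107]
rows = stem_and_leaf(sample)
print("\n".join(rows))
```

Stems with no observations are kept as empty rows, as with stem 16 in the hot-dog plot, so that gaps in the distribution remain visible.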

Summary

  • Histograms can handle large data sets, but they do not show exact data values and require the user to set up classes.
  • Dot plots give a better picture of the individual data values, but they can’t handle large data sets.
  • Stem-and-leaf plots show the actual data values, but they can’t handle large data sets.

References

  • Lecture notes on EDA

Problems

See these practice Problems