Difference between revisions of "SOCR EduMaterials Activities Histogram Graphs"

From SOCR
Line 19: Line 19:
 
* Try revising some of the numbers in the second (frequency) column and click '''UPDATE''' button to see the effect of these changes on the histogram.
 
* Try revising some of the numbers in the second (frequency) column and click '''UPDATE''' button to see the effect of these changes on the histogram.
 
* Remeber that if you enter your own data you need to go to the '''MAP''' tab-pane and select the colums that contain your histogram bin and frequency columns.
 
* Remeber that if you enter your own data you need to go to the '''MAP''' tab-pane and select the colums that contain your histogram bin and frequency columns.
 +
* Using the '''SHOW_ALL''' tab-pane you can see all three (graph, data and mapping) in the same view.
 
<center>[[Image:SOCR_Activities_HistogramGraphing_Dinov_061307_Fig2.png|400px]]</center>
 
<center>[[Image:SOCR_Activities_HistogramGraphing_Dinov_061307_Fig2.png|400px]]</center>
  
=== '''Exercise 3''': Histogram from Categories and Frequencies===
+
=== '''Exercise 3''': Histogram from Simulated Data===
* Let’s first get some data: Go to [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] and generate 100 Cauchy Distributed variables. Copy these data in your mouse buffer (CNT-C). Of course, you may use your own data throughout. We choose Cauchy data to demonstrate how the Power Transform Family allows us to normalize data that is far from being Normal-like.
+
* Let’s first get some data: Go to [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] and generate 100 Cauchy Distributed variables. Copy these data in your mouse buffer (CNT-C). Of course, you may use your own data throughout this exercise.  
<center>[[Image:SOCR_Activities_HistogramGraphing_Dinov_061307_Fig3.png|400px]]</center>
+
* Next, paste (CNT-V) these 100 observations in [http://www.socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] '''HistogramChartDemo''' (BarCharts -> XYChart). Go to the '''MAP''' tab-pane and select the first colume (where you pasted your data) in the '''XValue''' bin. Click '''Update Chart''' to see the histogram plot of these 100 Cauchy observations in RED!
 
+
* Note that the shape of this data histogram resembles the shape of the [http://www.socr.ucla.edu/htmls/SOCR_Distributions.html Cauchy distribution] that we sampled the data from.
* Next, paste (CNT-V) these 100 observations in [http://www.socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] (Line-Charts -> Power Transform Chart). Click '''Update Chart''' to see the index plot of this data in RED!
 
<center>[[Image:SOCR_Activities_PowerTransformGraphing_Dinov_022007_Fig2.jpg|400px]]</center>
 
 
 
* Now go to the '''Graph Tab-Pane''' and choose <math>\lambda = 0</math> (the power parameter). Why is <math>\lambda = 0</math> the best choice for this data? Try experimenting with different values of <math>\lambda</math>. Observe the variability in the Graph of the transformed data in Blue (relative to the variability of the native data in Red).
 
 
<center>[[Image:SOCR_Activities_PowerTransformGraphing_Dinov_022007_Fig3.jpg|400px]]</center>
 
<center>[[Image:SOCR_Activities_PowerTransformGraphing_Dinov_022007_Fig3.jpg|400px]]</center>
  
* Then go back to the '''Data Tab-Pane''' and copy in your mouse buffer the transformed data. We will compare how well does [[About_pages_for_SOCR_Distributions | Normal distribution]] fit the histograms of the raw data ([[About_pages_for_SOCR_Distributions | Cauchy distribution]]) and the transformed data. One can experiment with other powers of <math>\lambda</math>, as well! In the case of <math>\lambda =0</math>, the power transform reduces to a '''log transform''', which is generally a good way to make the histogram of a data set well approximated by a Normal Distribution. In our case, the histogram of the original data is close to Cauchy distribution, which is heavy tailed and far from Normal (Recall that the ''T(df)'' distribution provides a 1-parameter homotopy between Cauchy and Normal).
+
==='''Questions'''===
<center>[[Image:SOCR_Activities_PowerTransformGraphing_Dinov_022007_Fig4.jpg|400px]]</center>
+
* What is the effect of the width/size of the histogram bin on the shape of the resulting histogram? Would the shape of the histogram change significantly if we alter the bin-size? Does the sample-size play role in this?
 
+
* Would you expect the shape of the sample histogram to ''look like'' the shape of the population distribution the data sample came from?
* Now copy in your mouse buffer the transformed data and paste it in the [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler]. Check the '''Estimate Parameters''' check-box on the top-left. This will allow you to fit a Normal curve to the histogram of the (log) Power Family Transformed Data. You see that Normal Distribution is a great fit to the histogram of the transformed Data. Be sure to check the parameters of the Normal Distribution (these are estimated using least squares and reported in the '''Results''' Tab-Pane). In this case, these parameters are: ''Mean = 0.177, Variance = 1.77'', however, these will vary, in general.
 
<center>[[Image:SOCR_Activities_PowerTransformGraphing_Dinov_022007_Fig5.jpg|400px]]</center>
 
 
 
 
 
* '''Questions'''
 
** What is the effect of the width/size of the histogram bin on the shape of the resulting histogram? Would the shape of the histogram change significantly if we alter the bin-size? Does the sample-size play role in this?
 
** Would you expect the shape of the sample histogram to ''look like'' the shape of the population distribution the data sample came from?
 
  
 
<hr>
 
<hr>

Revision as of 17:51, 13 June 2007

SOCR Educational Materials - Activities - SOCR Histogram Generation Graphing Activity

Summary

This is an exploratory data analysis SOCR activity that illustrates the generation and interpretation of the histogram of quantitative data. The complete details about histograms can be found here. In a nutshell, a histogram of a dataset is a graphical visualization of tabulated frequencies or counts of data within an equispaced partition of the range of the data. A histogram shows what proportion of the measurements falls into each of the categories defined by the partition of the data range.
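The definition above can be sketched in a few lines of Python. This is only an illustrative sketch, not how SOCR Charts computes its histograms: the function name `histogram_counts`, the sample data, and the choice to place the maximum value in the last bin are all assumptions made for the example.

```python
def histogram_counts(data, num_bins):
    """Count how many values fall into each of num_bins equal-width bins.

    Hypothetical helper for illustration; the bins partition the range
    [min(data), max(data)] into equispaced intervals.
    """
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        counts[0] = len(data)
        return counts
    for x in data:
        # The maximum value is placed in the last bin, not in a new one.
        idx = min(int((x - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


# Made-up sample data, used only to show the calculation.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
counts = histogram_counts(data, 4)
proportions = [c / len(data) for c in counts]
print(counts)
print(proportions)
```

The counts give the bar heights of a frequency histogram, and dividing by the sample size gives the proportion of measurements in each bin.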

Exercises

Exercise 1: Simple Histogram from Raw Data

  • This exercise demonstrates the construction of a histogram plot from raw quantitative data.
  • First, point your browser to SOCR Charts and select the HistogramChartDemo (under BarCharts --> XYChart). There are three different ways to select data for this histogram chart:
    • Use the default data provided for this chart (DEMO button);
    • Enter your own data. Copy data from an external spreadsheet or table to the clipboard, click the top-left cell of the SOCR Histogram Data table, and paste (Paste button) the data into the table. Remember to MAP the data: this indicates which columns, rows, or parts of the data are used in the histogram calculations. Then click UPDATE chart to have the new graph drawn in the Graph tab-pane;
    • Obtain SOCR simulated data from the Data-Generation tab of the SOCR Modeler (an example is shown here).
SOCR Activities HistogramGraphing Dinov 061307 Fig1.png


Exercise 2: Histogram from Categories and Frequencies

  • Again, point your browser to SOCR Charts. This time select the HistogramChartDemo3 chart (under BarCharts --> XYChart). Use the default data provided for this chart (DEMO button).
  • Notice that this time, the chart requires the user to enter the counts/frequencies of observations within each of the range categories (in this default data case, year).
  • Try revising some of the numbers in the second (frequency) column and click the UPDATE button to see the effect of these changes on the histogram.
  • Remember that if you enter your own data, you need to go to the MAP tab-pane and select the columns that contain your histogram bins and frequencies.
  • Using the SHOW_ALL tab-pane you can see all three (graph, data and mapping) in the same view.
SOCR Activities HistogramGraphing Dinov 061307 Fig2.png
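For readers who want to try the same idea outside the applet, the following is a minimal sketch of drawing a histogram directly from categories and frequencies as a text chart. The function `render_bars` and the year/frequency values are hypothetical and are not the default data of HistogramChartDemo3.

```python
def render_bars(categories, frequencies, symbol="#"):
    """Return one text bar per category; bar length equals the frequency."""
    if len(categories) != len(frequencies):
        raise ValueError("categories and frequencies must have equal length")
    label_width = max(len(str(c)) for c in categories)
    lines = []
    for category, freq in zip(categories, frequencies):
        lines.append(f"{str(category):>{label_width}} | {symbol * freq} ({freq})")
    return lines


# Made-up categories (years) and frequencies, for illustration only.
years = ["2001", "2002", "2003", "2004"]
freqs = [3, 7, 5, 2]
print("\n".join(render_bars(years, freqs)))

# Revising one frequency and re-rendering mirrors editing the
# frequency column and clicking UPDATE in the applet.
freqs[1] = 1
print("\n".join(render_bars(years, freqs)))
```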

Exercise 3: Histogram from Simulated Data

  • Let’s first get some data: Go to SOCR Modeler and generate 100 Cauchy-distributed values. Copy these data to your clipboard (Ctrl-C). Of course, you may use your own data throughout this exercise.
  • Next, paste (Ctrl-V) these 100 observations into SOCR Charts HistogramChartDemo (BarCharts -> XYChart). Go to the MAP tab-pane and assign the first column (where you pasted your data) to the XValue bin. Click Update Chart to see the histogram plot of these 100 Cauchy observations in RED!
  • Note that the shape of this data histogram resembles the shape of the Cauchy distribution that we sampled the data from.
SOCR Activities PowerTransformGraphing Dinov 022007 Fig3.jpg
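The steps of this exercise can also be approximated offline. The sketch below draws 100 standard Cauchy values with the inverse-CDF method (x = tan(π(u − 1/2)) for uniform u) and bins them. It is not the SOCR Modeler's generator: the function names, the seed, and the display window of [-10, 10] are assumptions. The window is used because Cauchy samples often contain a few extreme values that would otherwise stretch the bins.

```python
import math
import random


def sample_cauchy(n, seed=None):
    """Draw n standard Cauchy values by inverting the Cauchy CDF."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def bin_counts(data, lo, hi, num_bins):
    """Count values in num_bins equal-width bins on [lo, hi).

    Values outside the window are ignored (an assumption made so that
    a few extreme Cauchy draws do not dominate the picture).
    """
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        if lo <= x < hi:
            idx = min(int((x - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts


observations = sample_cauchy(100, seed=2007)
counts = bin_counts(observations, -10.0, 10.0, 20)
for i, c in enumerate(counts):
    left = -10.0 + i  # each bin has width 1 in this setup
    print(f"[{left:6.1f}, {left + 1:6.1f}) {'#' * c}")
print("values outside the window:", len(observations) - sum(counts))
```

With a different seed the bars will differ, but the peak near zero and the scattered far-out values are typical of Cauchy samples.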

Questions

  • What is the effect of the width/size of the histogram bin on the shape of the resulting histogram? Would the shape of the histogram change significantly if we alter the bin size? Does the sample size play a role in this?
  • Would you expect the shape of the sample histogram to look like the shape of the population distribution the data sample came from?
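One way to explore these questions empirically is to bin the same samples with different numbers of bins and compare the results. The sketch below is only a starting point for experimentation. It uses Normal samples from Python's `random.gauss` rather than SOCR data, and the sample sizes, bin counts, and window [-4, 4] are arbitrary assumptions.

```python
import random


def equal_width_counts(data, num_bins, lo, hi):
    """Count values in num_bins equal-width bins on [lo, hi)."""
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in data:
        if lo <= x < hi:
            idx = min(int((x - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts


rng = random.Random(0)
samples = {
    "n=30": [rng.gauss(0, 1) for _ in range(30)],
    "n=3000": [rng.gauss(0, 1) for _ in range(3000)],
}

# Compare a coarse and a fine partition for a small and a large sample.
for name, sample in samples.items():
    for num_bins in (5, 20):
        counts = equal_width_counts(sample, num_bins, -4.0, 4.0)
        print(name, "bins:", num_bins, counts)
```

Comparing the printed counts across bin counts and sample sizes gives a concrete basis for answering the questions above.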


