Difference between revisions of "SOCR EduMaterials Activities PowerTransformFamily Graphs"

From SOCR
Line 4: Line 4:
 
This activity demonstrates the usage, effects and properties of the modified power transformation family applied to real or simulated data to reduce variation and enhance Normality. There are four exercises, each demonstrating the properties of the power transform in a different setting for observed or simulated data: X-Y scatter plot, QQ-Normal plot, Histogram plot and Time/Index plot.
  
== '''Background'''==
+
== Background==
 
The '''power transformation family''' is often used to transform data to make it more Normal-like. The power transformation varies continuously with the power parameter <math>\lambda</math> and is defined, as a continuous piecewise function, for all <math>y>0</math> by
 
<center><math>y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda}-1}{\lambda}, & \text{if } \lambda \neq 0 \\ \log{y}, & \text{if } \lambda = 0 \end{cases}</math></center>
Line 12: Line 12:
 
* This exercise demonstrates the characteristics of the power-transform when applied independently to the two processes in an X-Y scatter plot setting. In this situation, one observes paired (X,Y) observations, which are typically plotted X vs. Y in the 2D plane. We are interested in studying the effects of independently applying the power transforms to the X and Y processes. How and why would the corresponding scatter plot change as we vary the power parameters for X and Y?
  
* First, point your browser to [http://www.socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and select the '''PowerTransformXYStatterChart''' (Line-Charts -> PowerTransformXYStatterChart). Then either use the default data provided for this chart, enter your own data (remember to '''MAP''' the data before your '''UPDATE''' the chart) or obtain SOCR simulated data from the '''Data-Generation''' tab of the [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] (an example is shown later in Exercise 4). As shown on the image below try changing the power parameters for the X and Y power-transforms and observe the graphical behavior of the transformed scatter-plot (blue points connected by a thin line) versus the native (original) data (red color points). We have applied a linear rescaling to the power-transform data to map it in the same space as the original data. This is done purely for visualization perposes, as without this rescalling it will be difficult to see the correspondence of the transformed and original data. Also note the changes of the numerical summaries for the transformed data (bottom text area) as you update the power parameters. What power parameters would you suggest that make the X-Y relation most linear?
+
* First, point your browser to [http://www.socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and select the '''PowerTransformXYStatterChart''' (Line-Charts -> PowerTransformXYStatterChart). Then either use the default data provided for this chart, enter your own data (remember to '''MAP''' the data before your '''UPDATE''' the chart) or obtain SOCR simulated data from the '''Data-Generation''' tab of the [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] (an example is shown later in Exercise 4). As shown on the image below try changing the power parameters for the X and Y power-transforms and observe the graphical behavior of the transformed scatter-plot (blue points connected by a thin line) versus the native (original) data (red color points). We have applied a linear rescaling to the power-transform data to map it in the same space as the original data. This is done purely for visualization purposes, as without this rescaling it will be difficult to see the correspondence of the transformed and original data. Also note the changes of the numerical summaries for the transformed data (bottom text area) as you update the power parameters. What power parameters would you suggest that make the X-Y relation most linear?
 
<center>[[Image:SOCR_Activities_PowerTransformGraphing_Dinov_022007_Fig7.jpg|400px]]</center>
 
  
  
 
=== '''Exercise 2''': Power Transformation Family in a QQ-Normal Plot Setting===
 
* The second exercise demonstrates the effects of the power-transform applied to data in a QQ-Normal plot setting. We are interested in studying the effects of power transforming the native (original) data on the qunatiles, relative the Normal quantiles (i.e., QQ-Normal plot effects). How and why do you expect the QQ-Normal plot to change as we vary the power parameter?
+
* The second exercise demonstrates the effects of the power-transform applied to data in a QQ-Normal plot setting. We are interested in studying the effects of power transforming the native (original) data on the quantiles, relative the Normal quantiles (i.e., QQ-Normal plot effects). How and why do you expect the QQ-Normal plot to change as we vary the power parameter?
  
 
* Again go to [http://www.socr.ucla.edu/htmls/SOCR_Charts.html SOCR Charts] and select the '''PowerTransformQQNormalPlotChart''' (Line-Charts -> PowerTransformQQNormalPlotChart). You can use different data for this experiment: either use the default data provided with the QQ-Normal chart, enter your own data (remember to '''MAP''' the data before you '''UPDATE''' the chart) or obtain SOCR simulated data from the '''Data-Generation''' tab of the [http://www.socr.ucla.edu/htmls/SOCR_Modeler.html SOCR Modeler] (an example is shown later in Exercise 4). Change the power-transform parameter (using the slider or by typing in the text area) and observe the graphical behavior of the transformed data in the QQ-Normal plot (green points connected by a thin line) versus the plot of the native data (red color points). What power parameter would you suggest that makes the (transformed) data quantiles similar to those of the Normal distribution? Why?
Line 35: Line 35:
 
<center>[[Image:SOCR_Activities_PowerTransformGraphing_Dinov_022007_Fig3.jpg|400px]]</center>
 
  
* Then go back to the '''Data Tab-Pane''' and copy in your mouse buffer the transformed data. We will compare how well does [[About_pages_for_SOCR_Distributions | Normal distribution]] fit the histograms of the raw data ([[About_pages_for_SOCR_Distributions | Cauchy distribution]]) and the transformed data. One can experiment with other powers of <math>\lambda</math>, as well! In the case of <math>\lambda =0</math>, the power transform reduces to a '''log transform''', which is generally a good way to make the histogram of a data set well approximated by a Normal Distribution. In our case, the histogram of the original data is close to Cauchy distribution, which is heavy tailed and far from Normal (Recall that the T(df) distribution provides a 1-parameter homotopy between Cauchy and Normal).
+
* Then go back to the '''Data Tab-Pane''' and copy in your mouse buffer the transformed data. We will compare how well does [[About_pages_for_SOCR_Distributions | Normal distribution]] fit the histograms of the raw data ([[About_pages_for_SOCR_Distributions | Cauchy distribution]]) and the transformed data. One can experiment with other powers of <math>\lambda</math>, as well! In the case of <math>\lambda =0</math>, the power transform reduces to a '''log transform''', which is generally a good way to make the histogram of a data set well approximated by a Normal Distribution. In our case, the histogram of the original data is close to Cauchy distribution, which is heavy tailed and far from Normal (Recall that the ''T(df)'' distribution provides a 1-parameter homotopy between Cauchy and Normal).
 
<center>[[Image:SOCR_Activities_PowerTransformGraphing_Dinov_022007_Fig4.jpg|400px]]</center>
 
  

Revision as of 15:08, 1 March 2007

SOCR Educational Materials - Activities - SOCR Power Transformation Family Graphing Activity

Summary

This activity demonstrates the usage, effects and properties of the modified power transformation family applied to real or simulated data to reduce variation and enhance Normality. There are four exercises, each demonstrating the properties of the power transform in a different setting for observed or simulated data: X-Y scatter plot, QQ-Normal plot, Histogram plot and Time/Index plot.

Background

The power transformation family is often used to transform data to make it more Normal-like. The power transformation varies continuously with the power parameter \(\lambda\) and is defined, as a continuous piecewise function, for all \(y>0\) by

\(y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda}-1}{\lambda}, & \text{if } \lambda \neq 0 \\ \log{y}, & \text{if } \lambda = 0 \end{cases}\)
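
The piecewise definition above can be sketched in a few lines of Python. This is only an illustration of the formula as written here, not the SOCR implementation; the function name `power_transform` and the parameter name `lam` are hypothetical.

```python
import math

def power_transform(y, lam):
    """Power transform of one positive value y for power parameter lam."""
    if y <= 0:
        raise ValueError("the transform is defined only for y > 0")
    if lam == 0:
        return math.log(y)           # the lambda = 0 branch
    return (y ** lam - 1.0) / lam    # the lambda != 0 branch

# As lam shrinks toward 0 the power branch approaches log(y),
# which is why the family varies continuously in lam.
y = 5.0
for lam in (1.0, 0.5, 0.01, 0.0):
    print(lam, power_transform(y, lam))
```

Note that for \(\lambda = 1\) the transform is just a shift of the data by 1, so it leaves the shape of the data unchanged.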

Exercises

Exercise 1: Power Transformation Family in an X-Y Scatter Plot Setting

  • This exercise demonstrates the characteristics of the power-transform when applied independently to the two processes in an X-Y scatter plot setting. In this situation, one observes paired (X,Y) observations, which are typically plotted X vs. Y in the 2D plane. We are interested in studying the effects of independently applying the power transforms to the X and Y processes. How and why would the corresponding scatter plot change as we vary the power parameters for X and Y?
  • First, point your browser to SOCR Charts and select the PowerTransformXYStatterChart (Line-Charts -> PowerTransformXYStatterChart). Then either use the default data provided for this chart, enter your own data (remember to MAP the data before you UPDATE the chart) or obtain SOCR simulated data from the Data-Generation tab of the SOCR Modeler (an example is shown later in Exercise 4). As shown in the image below, try changing the power parameters for the X and Y power-transforms and observe the graphical behavior of the transformed scatter-plot (blue points connected by a thin line) versus the native (original) data (red color points). We have applied a linear rescaling to the power-transformed data to map it into the same space as the original data. This is done purely for visualization purposes, as without this rescaling it would be difficult to see the correspondence between the transformed and original data. Also note the changes in the numerical summaries for the transformed data (bottom text area) as you update the power parameters. What power parameters would you suggest to make the X-Y relation most linear?
SOCR Activities PowerTransformGraphing Dinov 022007 Fig7.jpg
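
For readers who want to explore Exercise 1 outside the applet, the following is a minimal sketch under stated assumptions. It applies the power transform to Y only, rescales the result linearly onto the range of the original Y values (one plausible reading of the rescaling described above, not necessarily what SOCR does), and uses the Pearson correlation as a rough measure of linearity. The helper names `rescale_to` and `pearson` and the toy data Y = X² are hypothetical.

```python
import math

def power_transform(y, lam):
    """Power transform for y > 0 (log branch at lam == 0)."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def rescale_to(values, reference):
    """Linearly map values onto the [min, max] range of reference."""
    lo, hi = min(values), max(values)
    ref_lo, ref_hi = min(reference), max(reference)
    if hi == lo:                      # constant input: nothing to stretch
        return [ref_lo for _ in values]
    scale = (ref_hi - ref_lo) / (hi - lo)
    return [ref_lo + (v - lo) * scale for v in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Toy data (an assumption for illustration): Y is exactly X squared.
xs = [float(i) for i in range(1, 11)]
ys = [x ** 2 for x in xs]

for lam_y in (1.0, 0.5, 0.0):
    ty = rescale_to([power_transform(y, lam_y) for y in ys], ys)
    print("lambda_Y =", lam_y, "correlation with X =", round(pearson(xs, ty), 4))
```

Because the rescaling is linear with a positive slope, it changes only the plotting range and not the correlation. For this toy data the square-root transform (\(\lambda = 0.5\) on Y) makes the relation exactly linear.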


Exercise 2: Power Transformation Family in a QQ-Normal Plot Setting

  • The second exercise demonstrates the effects of the power-transform applied to data in a QQ-Normal plot setting. We are interested in studying the effects of power transforming the native (original) data on the quantiles, relative to the Normal quantiles (i.e., QQ-Normal plot effects). How and why do you expect the QQ-Normal plot to change as we vary the power parameter?
  • Again go to SOCR Charts and select the PowerTransformQQNormalPlotChart (Line-Charts -> PowerTransformQQNormalPlotChart). You can use different data for this experiment: either use the default data provided with the QQ-Normal chart, enter your own data (remember to MAP the data before you UPDATE the chart) or obtain SOCR simulated data from the Data-Generation tab of the SOCR Modeler (an example is shown later in Exercise 4). Change the power-transform parameter (using the slider or by typing in the text area) and observe the graphical behavior of the transformed data in the QQ-Normal plot (green points connected by a thin line) versus the plot of the native data (red color points). What power parameter would you suggest that makes the (transformed) data quantiles similar to those of the Normal distribution? Why?
SOCR Activities PowerTransformGraphing Dinov 022007 Fig8.jpg
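
One way to put a number on "how straight is the QQ-Normal plot" is the correlation between the sorted transformed data and the corresponding Normal quantiles. The sketch below is an illustration only: the plotting positions (i - 0.5)/n, the small grid of candidate powers and the synthetic right-skewed data are all assumptions, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def power_transform(y, lam):
    """Power transform for y > 0 (log branch at lam == 0)."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def normal_quantiles(n):
    """Standard Normal quantiles at plotting positions (i - 0.5) / n."""
    std = NormalDist()
    return [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def qq_correlation(data, lam):
    """Correlation between sorted transformed data and Normal quantiles."""
    transformed = sorted(power_transform(y, lam) for y in data)
    return pearson(normal_quantiles(len(transformed)), transformed)

# Synthetic positive, right-skewed data: exponentiated Normal quantiles.
data = [math.exp(z) for z in normal_quantiles(50)]

lambdas = [-1.0, -0.5, 0.0, 0.5, 1.0]
best = max(lambdas, key=lambda lam: qq_correlation(data, lam))
for lam in lambdas:
    print("lambda =", lam, "QQ correlation =", round(qq_correlation(data, lam), 4))
print("most Normal-looking power on this grid:", best)
```

For these synthetic data the log transform recovers the Normal quantiles, so the grid search selects \(\lambda = 0\). With real data the preferred power will generally differ.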

Exercise 3: Power Transformation Family in a Histogram Plot Setting

  • To be completed

Exercise 4: Power Transformation Family in a Time/Index Plot Setting

  • Let’s first get some data: Go to SOCR Modeler and generate 100 Cauchy-distributed observations. Copy these data to your mouse buffer (CTRL-C). Of course, you may use your own data throughout. We choose Cauchy data to demonstrate how the Power Transform Family allows us to normalize data that are far from Normal-like.
SOCR Activities PowerTransformGraphing Dinov 022007 Fig1.jpg
  • Next, paste (CTRL-V) these 100 observations in SOCR Charts (Line-Charts -> Power Transform Chart). Click Update Chart to see the index plot of these data in RED!
SOCR Activities PowerTransformGraphing Dinov 022007 Fig2.jpg
  • Now go to the Graph Tab-Pane and choose \(\lambda = 0\) (the power parameter). Why is \(\lambda = 0\) the best choice for this data? Try experimenting with different values of \(\lambda\). Observe the variability in the Graph of the transformed data in Blue (relative to the variability of the native data in Red).
SOCR Activities PowerTransformGraphing Dinov 022007 Fig3.jpg
  • Then go back to the Data Tab-Pane and copy the transformed data to your mouse buffer. We will compare how well the Normal distribution fits the histograms of the raw data (Cauchy distribution) and the transformed data. One can experiment with other powers of \(\lambda\), as well! In the case of \(\lambda =0\), the power transform reduces to a log transform, which is generally a good way to make the histogram of a data set well approximated by a Normal distribution. In our case, the histogram of the original data is close to a Cauchy distribution, which is heavy-tailed and far from Normal (recall that the T(df) distribution provides a 1-parameter homotopy between Cauchy and Normal).
SOCR Activities PowerTransformGraphing Dinov 022007 Fig4.jpg
  • Now copy the transformed data to your mouse buffer and paste it in the SOCR Modeler. Check the Estimate Parameters check-box on the top-left. This will allow you to fit a Normal curve to the histogram of the (log) Power Family Transformed Data. You will see that the Normal distribution is a great fit to the histogram of the transformed data. Be sure to check the parameters of the Normal distribution (these are estimated using least squares and reported in the Results Tab-Pane). In this case, these parameters are Mean = 0.177 and Variance = 1.77; however, these will vary in general.
SOCR Activities PowerTransformGraphing Dinov 022007 Fig5.jpg
  • Let’s try to fit a Normal model to the histogram of the native data. Recall that this histogram should be shaped like a Cauchy density, as we sampled from a Cauchy distribution; therefore, we would not expect a Normal distribution to be a good fit for these data. This fact, by itself, demonstrates the importance of the Power Transformation Family: we were able to normalize a significantly non-Normal data set. Go back to the original SOCR Modeler, where you sampled the 100 Cauchy observations. Select NormalFit_Modeler from the drop-down list of models in the top-left and click on the Graphs and Results Tab-Panes to see the histogram of the native (heavy-tailed) data and the parameters of its best Normal fit. Clearly, as expected, we do not have a good match.
SOCR Activities PowerTransformGraphing Dinov 022007 Fig6.jpg
  • Try experimenting with other (real or simulated) data sets and different Power parameters (\(\lambda\)).
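
The comparison in Exercise 4 can be mimicked numerically with a rough sketch. Several assumptions are made here and none are confirmed by this page: the 100 "observations" are deterministic standard Cauchy quantiles rather than random draws, the log is applied to absolute values because Cauchy data contain negative values (the page does not say how the modified family treats them), Normal parameters are estimated by sample moments rather than the least-squares fit reported by SOCR Modeler, and excess kurtosis is used as a simple stand-in for judging the histogram fit by eye. All function names are hypothetical.

```python
import math

def cauchy_quantile_sample(n):
    """Deterministic stand-in for n Cauchy draws: standard Cauchy quantiles."""
    return [math.tan(math.pi * ((i - 0.5) / n - 0.5)) for i in range(1, n + 1)]

def mean_variance(values):
    """Sample mean and (population-style) variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((v - m) ** 2 for v in values) / n

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; equals 0 for a Normal distribution."""
    m, var = mean_variance(values)
    m4 = sum((v - m) ** 4 for v in values) / len(values)
    return m4 / var ** 2 - 3.0

raw = cauchy_quantile_sample(100)
# ASSUMPTION: log of absolute values, to cope with negative Cauchy values.
logged = [math.log(abs(x)) for x in raw]

for label, values in (("native", raw), ("log-transformed", logged)):
    m, var = mean_variance(values)
    print(label, "mean =", round(m, 3), "variance =", round(var, 3),
          "excess kurtosis =", round(excess_kurtosis(values), 3))
```

Under these assumptions the native values show a very large excess kurtosis, reflecting the heavy tails, while the log-transformed values are much closer to the Normal value of 0.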
