SOCR Educational Materials - Activities - SOCR Power Transformation Family Graphing Activity
This activity demonstrates the usage, effects and properties of the modified power transformation family using scatter-plot graphs.
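- Background: The power transformation family is often used to transform data so that they become more Normal-like. The power transformation varies continuously with respect to the power parameter \(\lambda\) and is defined, as a continuous piecewise function, for all \(y>0\) by: \( y^{(\lambda)} = \begin{cases} \frac{y^{\lambda}-1}{\lambda}, & \lambda \neq 0, \\ \log y, & \lambda = 0, \end{cases} \) which is continuous at \(\lambda = 0\) because \(\frac{y^{\lambda}-1}{\lambda} \to \log y\) as \(\lambda \to 0\).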
- Exercise 1: Let’s first get some data. Go to SOCR Modeler and generate 100 Cauchy-distributed observations. Copy these data to your clipboard (Ctrl-C). Of course, you may use your own data throughout. We choose Cauchy data to demonstrate how the power transformation family allows us to normalize data that are far from Normal.
- Exercise 2: Paste (Ctrl-V) these 100 observations into SOCR Charts (Line Charts -> Power Transform Chart). Click Update Chart to see the index plot of these data in red.
- Exercise 3: Go to the Graph Tab-Pane and set the power parameter to \(\lambda = 0\). Why is \(\lambda = 0\) the best choice for these data? Try experimenting with different values of \(\lambda\). Observe the variability of the transformed data, graphed in blue, relative to the variability of the native data, graphed in red.
- Exercise 4: Go back to the Data Tab-Pane and copy the transformed data to your clipboard. We will compare how well the Normal distribution fits the histograms of the raw data (Cauchy distribution) and of the transformed data. You can experiment with other values of \(\lambda\) as well. For \(\lambda = 0\), the power transform reduces to a log transform, which is often an effective way to bring the histogram of positive, right-skewed data close to a Normal shape. In our case, the histogram of the original data is close to a Cauchy distribution, which is heavy-tailed and far from Normal (recall that the T(df) distribution provides a one-parameter homotopy between the Cauchy, at df = 1, and the Normal, as df \(\to \infty\)).
- Exercise 5: With the transformed data in your clipboard, paste them into SOCR Modeler. Check the Estimate Parameters check-box on the top-left. This fits a Normal curve to the histogram of the (log) power-transformed data. You will see that the Normal distribution fits the histogram of the transformed data well. Be sure to check the parameters of the fitted Normal distribution (these are estimated using least squares and reported in the Results Tab-Pane). In this case, the estimates were Mean = 0.177 and Variance = 1.77; in general, they will vary from sample to sample.
- Exercise 6: Now let’s try to fit a Normal model to the histogram of the native data. Recall that this histogram should be Cauchy-shaped, because we sampled from a Cauchy distribution; therefore, we would not expect a Normal distribution to fit these data well. This contrast, by itself, demonstrates the importance of the power transformation family: we were able to normalize a markedly non-Normal data set. Go back to the original SOCR Modeler, where you sampled the 100 Cauchy observations. Select NormalFit_Modeler from the drop-down list of models in the top-left and open the Graphs and Results Tab-Panes to see the histogram of the native (heavy-tailed) data and the parameters of its best Normal fit. As expected, the match is poor.
- Exercise 7: Try experimenting with other (real or simulated) data sets and different values of the power parameter \(\lambda\).
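The Background formula can be sketched in a few lines of Python. This sketch assumes the standard Box-Cox form, \((y^\lambda - 1)/\lambda\) for \(\lambda \neq 0\) and \(\log y\) at \(\lambda = 0\). The function names `power_transform`, `transform_all` and `power_transform_gm` are hypothetical, not SOCR code. Some texts use a geometric-mean-scaled variant, \((y^\lambda - 1)/(\lambda\,\dot{y}^{\lambda-1})\) with \(\dot{y}\) the geometric mean. It differs from the standard form only by the positive constant factor \(\dot{y}^{1-\lambda}\), so it does not change the shape of the transformed histogram. Both variants are included.

```python
import math


def power_transform(y, lam):
    """Standard Box-Cox power transform of one value (assumed form of the family)."""
    if y <= 0:
        raise ValueError("the power family is defined only for y > 0")
    if lam == 0:
        return math.log(y)          # the lambda = 0 member is the log transform
    return (y ** lam - 1.0) / lam


def transform_all(data, lam):
    """Apply the power transform to every observation in a list."""
    return [power_transform(y, lam) for y in data]


def power_transform_gm(data, lam):
    """Geometric-mean-scaled variant: GM**(1 - lam) times the standard transform."""
    gm = math.exp(sum(math.log(y) for y in data) / len(data))
    return [gm ** (1.0 - lam) * power_transform(y, lam) for y in data]


# Continuity in lambda: as lam shrinks toward 0 the values approach log(5).
for lam in (0.1, 0.01, 0.001, 0.0):
    print(lam, power_transform(5.0, lam))
```

The loop shows that no special case is needed in practice: values for small nonzero \(\lambda\) approach the \(\lambda = 0\) log value.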
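Exercises 1 and 3 can also be imitated outside SOCR to see why the log member tames Cauchy data. The sketch below draws standard Cauchy variates by inverting the CDF, \(x = \tan(\pi(u - 1/2))\). Cauchy draws can be negative while the family needs \(y > 0\), so the sketch assumes folding by absolute value and dropping exact zeros. This is an illustrative choice, not necessarily what SOCR Charts does. For a standard Cauchy \(X\), \(\log|X|\) is symmetric about 0 because \(1/X\) is again standard Cauchy. The helper names are hypothetical.

```python
import math
import random
import statistics


def cauchy_sample(n, seed=None):
    """Draw n standard Cauchy variates by inverting the CDF: x = tan(pi * (u - 1/2))."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def log_abs(data):
    """Log transform after folding: log|x|; exact zeros are dropped (assumption)."""
    return [math.log(abs(x)) for x in data if x != 0]


raw = cauchy_sample(100, seed=2024)
transformed = log_abs(raw)
# Compare the spread before (red in SOCR Charts) and after (blue) the transform.
print("raw:    stdev", statistics.stdev(raw), "range", min(raw), max(raw))
print("log|x|: stdev", statistics.stdev(transformed), "range", min(transformed), max(transformed))
```

The raw sample usually contains a few enormous values that dominate its standard deviation. After the log transform, the spread is on the scale of a few units.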
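One way to answer the question in Exercise 3, "Why is \(\lambda = 0\) the best choice?", is to choose \(\lambda\) by maximum likelihood. This is the classical Box-Cox approach, swapped in here for illustration; the activity does not say how SOCR chooses or recommends \(\lambda\). Up to an additive constant, the profile log-likelihood is \(-\tfrac{n}{2}\log\hat\sigma^2(\lambda) + (\lambda - 1)\sum_i \log y_i\), where \(\hat\sigma^2(\lambda)\) is the maximum-likelihood variance of the transformed data. The sketch maximizes this over a grid. It again assumes absolute-valued Cauchy data, and its names are hypothetical.

```python
import math
import random


def cauchy_sample(n, seed=None):
    """Standard Cauchy variates via the inverse CDF: tan(pi * (u - 1/2))."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def box_cox(y, lam):
    """Standard power transform of a positive value (assumed form of the family)."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def profile_loglik(data, lam):
    """Box-Cox profile log-likelihood, up to an additive constant.

    -(n/2) * log(sigma_hat^2) + (lam - 1) * sum(log y); the second term is the
    log-Jacobian of the transform.
    """
    n = len(data)
    z = [box_cox(y, lam) for y in data]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n  # maximum-likelihood variance
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(y) for y in data)


def best_lambda(data, grid=None):
    """Grid maximizer of the profile log-likelihood (default grid -2..2, step 0.05)."""
    if grid is None:
        grid = [round(k / 20 - 2, 2) for k in range(81)]
    return max(grid, key=lambda lam: profile_loglik(data, lam))


# Hypothetical setup: fold the Cauchy draws to |x| so that every y > 0.
positive = [abs(x) for x in cauchy_sample(100, seed=2024) if x != 0]
print("lambda with the highest profile likelihood:", best_lambda(positive))
```

For folded Cauchy data, the symmetry of \(\log|X|\) makes \(\lambda\) near 0 the natural maximizer. With only 100 observations, the grid maximizer will scatter around 0 from sample to sample.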
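The remark in Exercise 4 that T(df) interpolates between the Cauchy and the Normal can be checked numerically. The sketch evaluates the Student t density \(\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}\) through `math.lgamma`, so very large degrees of freedom do not overflow. At \(\nu = 1\) it coincides with the Cauchy density \(1/(\pi(1+x^2))\). As \(\nu\) grows, it approaches the standard Normal density. The function names are illustrative.

```python
import math


def t_pdf(x, df):
    """Student t density; lgamma keeps the normalizing constant finite for large df."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def cauchy_pdf(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))


def normal_pdf(x):
    """Standard Normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


# Tail height at x = 3 shrinks toward the Normal value as df increases.
for df in (1, 2, 5, 30, 1000):
    print(df, t_pdf(0.0, df), t_pdf(3.0, df))
print("normal", normal_pdf(0.0), normal_pdf(3.0))
```

The printed tail heights at \(x = 3\) make the heavy-tailed end of the homotopy (df = 1) easy to contrast with the Normal end.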
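Exercises 5 and 6 rely on SOCR Modeler's least-squares Normal fit to a histogram. The sketch below is a hypothetical re-creation of that idea, not SOCR's routine. It turns the data into histogram density heights, then adjusts the mean and standard deviation by a simple pattern search that keeps only moves reducing the sum of squared residuals. The search starts from the moment estimates. The bin count, starting values and search scheme are all assumptions. The result is reported as (mean, variance, residual sum of squares), mirroring the Mean and Variance that SOCR lists in the Results Tab-Pane.

```python
import math
import random


def histogram_density(data, bins):
    """Equal-width histogram over [min, max]; heights are scaled to integrate to 1."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        counts[min(int((x - lo) / width), bins - 1)] += 1  # x == hi goes in the last bin
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    heights = [c / (len(data) * width) for c in counts]
    return centers, heights


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def sse(centers, heights, mu, sigma):
    """Sum of squared differences between histogram heights and the Normal curve."""
    return sum((h - normal_pdf(c, mu, sigma)) ** 2 for c, h in zip(centers, heights))


def least_squares_normal_fit(data, bins=20, iters=200):
    centers, heights = histogram_density(data, bins)
    n = len(data)
    mu = sum(data) / n  # start the search at the moment estimates
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    step_mu, step_sigma = sigma, sigma / 2
    best = sse(centers, heights, mu, sigma)
    for _ in range(iters):
        improved = False
        for dm, ds in ((step_mu, 0), (-step_mu, 0), (0, step_sigma), (0, -step_sigma)):
            m, s = mu + dm, sigma + ds
            if s <= 0:
                continue
            e = sse(centers, heights, m, s)
            if e < best:  # accept only moves that lower the residual sum of squares
                mu, sigma, best = m, s, e
                improved = True
        if not improved:  # no move helped: refine the step sizes
            step_mu /= 2
            step_sigma /= 2
    return mu, sigma ** 2, best


rng = random.Random(3)
sample = [rng.gauss(0.5, 1.3) for _ in range(5000)]
print(least_squares_normal_fit(sample))
```

Residual sums of squares depend on the scale of the data, because histogram heights do. They are therefore best compared between fits to the same data rather than between the raw and transformed data sets.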
References
- Carroll, R. J. and Ruppert, D. (1981). On prediction and the power transformation family. Biometrika, 68, 609-615.
- SOCR Home page: http://www.socr.ucla.edu