From SOCR

Latest revision as of 20:32, 2 May 2008

SOCR Educational Materials - Activities - SOCR Random Number Generation (RNG) Activity

Summary

This activity describes the need, general methods and SOCR utilities for random number generation and simulation. SOCR Modeler allows interactive sampling from any SOCR Distribution. The simulated data may then be easily copied and pasted in different SOCR Analyses or Graphing tools for further interrogation.

Goals

The aims of this activity are to:

  • motivate the need for robust random number generators
  • illustrate how to use the SOCR random number generators
  • present applications of random number generation

Background & Motivation

How many natural processes or phenomena in real life can you describe that have an exact mathematical closed-form description and are completely deterministic? Arrival times to school each day? Motion of the Moon around the Earth? The computer CPU? The atomic clock? It is an unsettling paradox that all natural phenomena we observe are stochastic in nature, yet we do not know how to replicate any of them exactly. There are good computational strategies to approximate natural processes using analytical mathematical models; however, upon careful review one always finds a deterministic pattern in every purely computationally generated process.

There are two strategies to generate random numbers. The first relies on a physical process that is expected to be random. The other uses computational algorithms that produce long sequences of apparently random results, which are in fact determined by a shorter initial seed. Random number generators based on physical processes may rely on the momentum or position of random particles, or on any of the three fundamental physical forces. Examples of such processes include the Atari gaming console (which used noise from analog circuits to generate true random numbers), radioactive decay, thermal noise, shot noise and clock drift. A random number generator (RNG) based solely on deterministic computation is referred to as a pseudo-random number generator. There are various techniques for obtaining computational (pseudo-)random numbers. Virtually all RNGs used in practice are pseudo-RNGs. Distinguishing truly random numbers from pseudo-random numbers is a very difficult problem.
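To make the idea of a seed-determined sequence concrete, here is a minimal sketch of one classic pseudo-RNG, a linear congruential generator (LCG). It is an illustration only, not the generator used by SOCR; the function name `lcg` and the multiplier, increment and modulus (the commonly cited Numerical Recipes constants) are assumptions chosen for the example. Running it twice with the same seed reproduces exactly the same "random" sequence.

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Return n pseudo-random floats in [0, 1) from a linear congruential generator.

    Hypothetical helper for illustration. The whole sequence is fixed by `seed`
    through the recurrence x_{k+1} = (a * x_k + c) mod m.
    """
    x = seed % m
    out = []
    for _ in range(n):
        x = (a * x + c) % m   # deterministic update of the internal state
        out.append(x / m)     # scale the integer state to [0, 1)
    return out


first = lcg(seed=2008, n=5)
again = lcg(seed=2008, n=5)
other = lcg(seed=2009, n=5)
print(first == again)  # True: same seed, identical "random" sequence
print(first == other)  # False: a different seed gives a different sequence
```

LCGs are fast and easy to analyze, but their statistical structure is weak (for example, in their low-order bits), so they are not suitable for cryptography or demanding simulations.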

If all natural processes are inherently random, and at the same time we cannot ourselves generate good (non-deterministic) RNG processes, why are we even attempting to do so? Wouldn't it be much easier to just use measurements of natural physical processes? The answer is simple: we typically need to sample/simulate data from a specific process, and it is not easy to show that a physical phenomenon we observe has the same distribution as the process of interest! So, our need to sample from a specific distribution demands that we ensure the proper characteristics of the sample.

Where does this need for sampling come from? Random number generators have several important applications in statistical modeling, computer simulation, cryptography, etc. For example, data collection is often very expensive. Hence, to do appropriate inference on datasets of smaller sizes, we may consider simulating repeatedly from appropriate distributions instead of using real observations. Cryptography provides another example of why random number generators are so important. It is a commonly held misconception that every encryption method can be broken. Claude Shannon of Bell Labs proved (in work published in 1949) that the one-time pad cipher is unbreakable, provided the secret key is truly random, is at least as long as the encoded message, and is never reused. Monte Carlo simulations are also based on RNGs and are used to find numerical solutions to (multi-dimensional) mathematical problems that cannot easily be solved exactly, such as integration, differentiation and root-finding.
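As a small illustration of the Monte Carlo idea, the sketch below estimates a definite integral by averaging the integrand at uniformly sampled points. It assumes plain (unweighted) Monte Carlo with Python's built-in `random` module; the helper name `mc_integrate` is hypothetical. The test integrand, the quarter circle \( \int_0^1 \sqrt{1-x^2}\,dx = \pi/4 \), lets us check the answer against \( \pi \).

```python
import math
import random


def mc_integrate(f, a, b, n, seed=None):
    """Plain Monte Carlo estimate of the integral of f over [a, b].

    Hypothetical helper: draws n uniform points U_i in [a, b] and returns
    (b - a) * mean(f(U_i)) together with its approximate standard error.
    """
    rng = random.Random(seed)
    values = [f(rng.uniform(a, b)) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance of f(U)
    return (b - a) * mean, (b - a) * math.sqrt(var / n)


# The area of a quarter unit circle is pi/4, so 4 * estimate approximates pi.
est, se = mc_integrate(lambda x: math.sqrt(1 - x * x), 0.0, 1.0, 100_000, seed=1)
print(f"pi ~ {4 * est:.4f} (+/- {8 * se:.4f} at two standard errors)")
```

The error shrinks roughly like \( 1/\sqrt{n} \) regardless of the dimension of the problem, which is why Monte Carlo methods are attractive for multi-dimensional integrals.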

Exercises

  • Exercise 1: Go to the SOCR Modeler and click on the Data Generation tab. Select 200 observations from the Generalized Beta Distribution, as shown on the image below. Choose this four-tuple for the parameters \( \alpha=1.5; \beta=3; A=0; B=7\). Copy these 200 values to your clipboard (Ctrl-C) and paste them into the Data tab of the LineCharts --> PowerTransformHistogramChart under SOCR Charts. Then Map this column to XYValue (under the MAP tab) and click Update_Chart. This will generate the histogram of the 200 observations, which should look like a discrete analog of the Generalized Beta density curve. You can see exactly what the Generalized Beta Distribution looks like by going to SOCR Distributions and selecting \( Beta(\alpha=1.5; \beta=3; A=0; B=7)\).
SOCR Activities PowerTransformGraphing Dinov 022007 Fig10.jpg

SOCR Activities PowerTransformGraphing Dinov 022007 Fig9.jpg
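Outside SOCR, the same kind of sample can be sketched in a few lines. This sketch assumes that the Generalized Beta distribution with parameters \( \alpha, \beta, A, B \) is a standard Beta(\( \alpha, \beta \)) variable rescaled to the interval [A, B]; the helper names are hypothetical, and the text histogram is only a rough stand-in for the SOCR Charts histogram.

```python
import random


def generalized_beta_sample(alpha, beta, lo, hi, n, seed=None):
    """Draw n values from a four-parameter Beta on [lo, hi] (hypothetical helper).

    Assumes X = lo + (hi - lo) * Y with Y ~ Beta(alpha, beta).
    """
    rng = random.Random(seed)
    return [lo + (hi - lo) * rng.betavariate(alpha, beta) for _ in range(n)]


def text_histogram(data, lo, hi, bins=7):
    """Count observations in equal-width bins on [lo, hi] and print one bar per bin."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in data:
        k = min(int((x - lo) / width), bins - 1)  # a value equal to hi goes in the last bin
        counts[k] += 1
    for k, c in enumerate(counts):
        left = lo + k * width
        print(f"[{left:4.1f}, {left + width:4.1f}) {'#' * c}")
    return counts


# alpha = 1.5, beta = 3, A = 0, B = 7, as in Exercise 1
sample = generalized_beta_sample(1.5, 3.0, 0.0, 7.0, 200, seed=2008)
counts = text_histogram(sample, 0.0, 7.0)
```

Under the rescaling assumption, the theoretical mean is \( A + (B-A)\,\alpha/(\alpha+\beta) = 7 \cdot 1.5/4.5 \approx 2.33 \), so the bars should pile up toward the left of [0, 7].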

  • Exercise 2: Let’s get some more simulated data. Go to SOCR Modeler and generate 100 Cauchy-distributed observations. Copy these data to your clipboard (Ctrl-C). Of course, you may choose any other distribution. Next, paste (Ctrl-V) these 100 observations into SOCR Charts (Line-Charts -> Power Transform Chart). Map this column to XYValue (under the MAP tab) and click Update_Chart to see the index plot of these data in red!
SOCR Activities PowerTransformGraphing Dinov 022007 Fig1.jpg SOCR Activities PowerTransformGraphing Dinov 022007 Fig2.jpg
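The text does not say which sampling method SOCR Modeler uses, so the sketch below shows one standard way to simulate Cauchy data, inverse-transform sampling: if \( U \) is Uniform(0, 1), then \( \tan(\pi(U - 1/2)) \) has a standard Cauchy distribution. The function name `cauchy_sample` is hypothetical. Printing the values in order mimics the index plot: most values are moderate, with occasional huge spikes from the heavy tails.

```python
import math
import random


def cauchy_sample(n, location=0.0, scale=1.0, seed=None):
    """Sample n Cauchy(location, scale) values by inverse-transform sampling.

    Hypothetical helper: if U ~ Uniform(0, 1), then
    location + scale * tan(pi * (U - 1/2)) is Cauchy(location, scale).
    """
    rng = random.Random(seed)
    return [location + scale * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


data = cauchy_sample(100, seed=7)
# An index plot shows each value against its position in the sample.
for i, x in enumerate(data[:10], start=1):
    print(i, round(x, 3))
print("largest |x| among the 100 draws:", round(max(abs(x) for x in data), 2))
```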

Imagine now that we want to compare the bell-shaped Cauchy and Normal distributions. As pointed out above, we can generate 1,000 random Cauchy observations (using the Data Generation tab of the SOCR Modeler). Then we can fit the best possible Normal distribution to the histogram of these 1,000 Cauchy observations. As the figure below shows, the maximum likelihood estimates of the mean and variance of the best Normal fit are (-0.8529654842885793, 6355.713457104439). The measure of centrality (mean) is estimated fairly accurately: the center of the standard Cauchy distribution (which is symmetric) is zero, even though its mean is undefined. The variance estimate is large because the Cauchy distribution has much heavier tails than the Normal distribution; hence its bell shape is misleading! The distribution inset on the right illustrates the exact Cauchy distribution, which you can explore interactively in SOCR Distributions.

SOCR Activities PowerTransformGraphing Dinov 091707 Fig3.jpg
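The Normal fit described above can be approximated with a short sketch. For a Normal model, the maximum likelihood estimates are the sample mean and the sample variance with divisor \( n \). Repeating the experiment with a few different seeds (an assumption of this sketch; the helper names are hypothetical) shows how unstable these estimates are for Cauchy data, while the sample median should stay close to the true center 0.

```python
import math
import random


def normal_mle(data):
    """Maximum likelihood estimates (mean, variance) of a Normal fit.

    The Normal MLE of the variance divides by n, not n - 1.
    """
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    return mean, var


def cauchy_sample(n, seed):
    """Standard Cauchy draws via the inverse CDF tan(pi * (U - 1/2)) (hypothetical helper)."""
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


# Repeat the 1,000-draw experiment several times: the fitted Normal parameters
# swing widely because the Cauchy mean and variance do not exist.
for seed in range(5):
    sample = cauchy_sample(1000, seed)
    mean, var = normal_mle(sample)
    median = sorted(sample)[len(sample) // 2]  # upper median, a robust center estimate
    print(f"seed={seed}: mean={mean:9.3f}  variance={var:14.1f}  median={median:6.3f}")
```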

Applications

The RNG background and motivation section described some of the critical scientific and technological challenges that rely upon the existence of quality RNGs. Here we present applications of the SOCR RNGs in various interactive activities and demonstrations.

  • Power-Transform Family Graphs
  • Mixture Model Experiment
