Difference between revisions of "SOCR EduMaterials ModelerActivities MixtureModel 1"
Revision as of 15:12, 17 January 2007
SOCR Modeler Activities - SOCR Mixture Model Fitting Activity
This is a SOCR Activity that demonstrates random sampling and fitting of mixture models to data
- Data Generation: You typically have investigator-acquired data that you need to fit a model to. In this case we will generate the data by randomly sampling using the SOCR resource. Go to the SOCR Modeler and select the Data Generation tab from the right panel.
  - Now, click the Raw Data check-box in the left panel, select Laplace Distribution (or any other distribution you want to sample data from), choose the sample-size to be 100 (keep the center, mu, at zero) and click Sample. Then go to the Data tab, in the right panel. There you should see the 100 random Laplace observations stored as a column vector.
  - Next, go back to the Data Generation tab from the right panel and change the center of the Laplace distribution (set Mu=20, say). Click Sample again and you will see the list of randomly generated data in the Data tab expand to 200 (as you sampled another set of 100 random Laplace observations).
- Exploratory Data Analysis (EDA): Go to the Data tab and select all observations in the data column (use Ctrl-A, or mouse-copy). Then open another web browser window and go to SOCR Charts. Choose HistogramChartDemo2, say, clear the default data (Data tab) and paste (Ctrl-V, or mouse paste-in) into the first column the 200 observations that you sampled in the SOCR Modeler Data Generator (above). Then you need to map the values: go to the Mapping tab, select the first column, where you pasted the data (C1), and click XValue. This will move the C1 column label from the right bin to the bottom-right bin. Finally, click Update Chart and go to the Graph tab to see your histogram of the 200 (bimodal) Laplace observations. Note that you can change the width of the histogram bins to see the bimodality of the distribution of these 200 measurements more clearly. The bimodality arises because we sampled from two distinct Laplace distributions, one with a mean of zero and the other with a mean of 20.0.
- Model Fitting: Now go back to the SOCR Modeler browser (where you did the data sampling). Choose Mixed-Model-Fit from the drop-down list in the left panel.
- We will now try to fit a 2-component mixture of Gaussian (Normal) distributions to the bimodal distribution of the generated Laplace sample. You may need to click the Re-Initialize button a few times: the Expectation-Maximization (EM) algorithm (tutorial: http://www.stat.ucla.edu/%7Edinov/courses_students.dir/04/Spring/Stat233.dir/STAT233_notes.dir/EM_Tutorial.pdf) used to estimate the mixture distribution parameters is sensitive to its starting values and will produce somewhat different results for different initial conditions.
- Notice the quantitative results of this mixture model fitting protocol (in the Results panel). Recall that we sampled 100 observations from a Laplace distribution with a mean of zero and then another 100 observations from a Laplace distribution with a mean of 20.0. (We could have sampled from Normal distributions instead, in which case the Gaussian mixture fit would have been much better, of course.) The reported estimates of the means of the two Gaussian mixture components are 0 and 22, which are fairly close to the original (theoretical) means. We could also have fit a mixture of 3 (or more) Gaussian components, if we had a reason to believe that the mixture distribution is tri- (or higher-)modal.
- Caution: You may need to adjust the sliders at the top of your Graph tab, in the right panel, so that you can see the entire graph of the histogram and the models fit to the data. Also, the random data you generate and the starting values of the EM algorithm are stochastic, so you cannot expect to get exactly the same results and charts as reported in this SOCR activity. Everyone who tries to replicate these steps will obtain different results. However, the principles we demonstrate here are robust.
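
For readers who want to reproduce the sampling and fitting steps above outside of the SOCR applets, the following is a minimal Python sketch, not the SOCR Modeler implementation. It draws 100 Laplace observations centered at 0 and 100 centered at 20, and then fits a 2-component Gaussian mixture with a plain EM loop. The function names (sample_laplace, normal_pdf, fit_two_gaussians), the Laplace scale of 1, and the number of EM iterations are all assumptions made for illustration; the activity itself does not specify them.

```python
import math
import random


def sample_laplace(rng, mu, scale, n):
    """Draw n Laplace(mu, scale) values.

    The difference of two independent Exponential(1) draws is Laplace(0, 1).
    """
    return [mu + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
            for _ in range(n)]


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(data, init_means, iterations=100):
    """Fit a 2-component Gaussian mixture by EM (illustrative sketch).

    init_means plays the role of the starting values that the Re-Initialize
    button would change. Returns (weights, means, sds), sorted by mean.
    """
    n = len(data)
    grand_mean = sum(data) / n
    grand_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n)
    means = list(init_means)
    sds = [grand_sd, grand_sd]   # assumption: start both components wide
    weights = [0.5, 0.5]         # assumption: start with equal weights

    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s)
                 for w, m, s in zip(weights, means, sds)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0.0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and sd of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)

    order = sorted(range(2), key=lambda k: means[k])
    return ([weights[k] for k in order],
            [means[k] for k in order],
            [sds[k] for k in order])


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so every run gives different data
    data = (sample_laplace(rng, 0.0, 1.0, 100)
            + sample_laplace(rng, 20.0, 1.0, 100))
    # Random starting means, analogous to clicking Re-Initialize.
    weights, means, sds = fit_two_gaussians(data, rng.sample(data, 2))
    print("weights:", weights)
    print("means:  ", means)
    print("sds:    ", sds)
```

As in the applet, the outcome depends on the starting values: if both starting means happen to fall in the same mode, this sketch may need many more iterations (or a fresh random start) before the two components separate. Starting from the smallest and largest observations, for example fit_two_gaussians(data, [min(data), max(data)]), is one simple way to make the start less haphazard.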
- SOCR Home page: http://www.socr.ucla.edu