==[[SMHS| Scientific Methods for Health Sciences]] - Parametric Inference ==
===Overview===

In statistical inference, we aim to draw inferences about an underlying population based on a sample drawn therefrom. For example, we sometimes achieve this by estimating the parameters of a probability density function based on observations. In an idealized case, we would have a perfect model with unknown parameters, and based on this, we would make inferences about the population by estimating the parameters with the data we have. In this section, we are going to introduce the concepts of variables, parametric models, and inference based on these models.

===Motivation===

Consider the well-known example of flipping a coin 10 times. Experience tells us that the expected number of heads in one experiment with 10 flips would be 10 times the probability of observing a head on a single flip. For an evenly weighted coin, we would expect approximately 5 heads. If we repeated the experiment many times, we would expect the results to follow a binomial distribution with the parameter $ p=P(head)$ for each flip. In other words, we believe the underlying model to be a binomial $ (n,p) $ where $ n=10 $.

The next step would be to determine the value of $ p $. An obvious way of doing this would be to flip the coin many times (e.g., 100 times) and record the number of heads. The estimate of $ p $ would just be the number of heads in the 100 flips divided by 100. For example, if we got 63 heads, we would estimate the probability $p$ of getting a head on any given flip to be $ 63/100 $. Based on this information, we believe the number of heads in our experiment follows a binomial distribution with parameters $ (n=10,p=0.63) $. That is, we can infer that we will flip an average of 6.3 heads in 10 flips if we repeat the experiment enough times.
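The estimation procedure just described can be sketched in R (a hypothetical simulation; the "true" probability 0.63 is assumed only for illustration):

```r
# Simulate 100 flips of a coin whose assumed true P(head) is 0.63,
# then estimate p by the proportion of heads observed.
set.seed(42)                    # make the simulated flips reproducible
flips <- rbinom(100, 1, 0.63)   # 100 Bernoulli(0.63) outcomes: 1 = head, 0 = tail
p_hat <- sum(flips) / 100       # estimate of p = (number of heads)/(number of flips)

# Plug the estimate into the binomial model for the original 10-flip experiment:
expected_heads <- 10 * p_hat    # expected number of heads in 10 flips
c(p_hat, expected_heads)
```

If the simulation happened to yield 63 heads, p_hat would be 0.63 and expected_heads would be 6.3, matching the example above.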
  
Next we will explore the following questions:

* What is a random variable?
* How do we build a parametric model based on data?
* What kind of inference can we make based on a parametric model?

===Theory===
====[http://en.wikipedia.org/wiki/Random_variable Random Variables]====

A ''random variable'' is a variable whose value is subject to variations due to chance (i.e., randomness). It can take on a set of values, each with an associated probability for discrete variables or a probability density for continuous variables. The value of a random variable represents the possible outcomes of a yet-to-be-performed experiment or the possible outcomes of a past experiment whose pre-existing value is uncertain. The possible values of a random variable and their associated probabilities (known as a probability distribution) can be further described with mathematical functions.

There are two types of random variables:

* ''Discrete random variables'' take on a specified finite or countable list of values and are endowed with a probability mass function, which is characteristic of a particular probability distribution;
* ''Continuous random variables'' take on any numerical value in an interval or collection of intervals via a probability density function, which is characteristic of a probability distribution.
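The distinction can be illustrated with R's built-in distribution functions (a small sketch; note that dbinom returns an actual probability, while dnorm returns a density value):

```r
# Discrete: Binomial(n = 10, p = 0.5); P(X = 5) is a genuine probability.
p_discrete <- dbinom(5, size = 10, prob = 0.5)   # = choose(10,5) * 0.5^10

# Continuous: standard normal; dnorm(0) is a density, not a probability.
d_continuous <- dnorm(0, mean = 0, sd = 1)       # = 1/sqrt(2*pi)

# Probabilities for continuous variables come from integrating the density,
# e.g., P(X <= 0) via the cumulative distribution function:
p_below_zero <- pnorm(0)                         # = 0.5 by symmetry

c(p_discrete, d_continuous, p_below_zero)
```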
  
====[http://en.wikipedia.org/wiki/Parameter Parameters]====

A parameter is a characteristic or measurable factor that can help in defining a particular system. It is an important element to consider when evaluating or trying to understand an event. For example, μ is often used to represent the mean and σ the standard deviation in statistics. The following table provides a list of commonly used parameters with descriptions:

<center>
 
{| class="wikitable" style="text-align:center; width:75%" border="1"
|Parameter || Description || Parameter || Description
|-
| $\bar{x}$ || Sample mean || α,β,γ || Various Greek letters
|-
| μ || Population mean || θ || Lower case theta
|-
| σ || Population standard deviation || φ || Lower case phi
|-
| $σ^2$ || Population variance || ω || Lower case omega
|-
| s || Sample standard deviation || ∆ || Increment
|-
| $s^2$ || Sample variance || ν || Nu
|-
| λ || Poisson mean, Lambda || τ || Tau
|-
| χ || χ distribution, Chi || η || Eta
|-
| ρ || The density, Rho || τ || Sometimes used in tau function
|-
| ϕ || Normal density function, Phi || Θ || Parameter space
|-
| Γ || Gamma || Ω || Sample space, Omega
|-
| ∂ || Per/ divided || δ || Lower case delta
|-
| S || Sample space || Κ,k || Kappa
|}
</center>
  
====[http://en.wikipedia.org/wiki/Parametric_model Parametric Model]====

A parametric model is a collection of probability distributions that can be described using a finite number of parameters. These parameters are usually written together to form a single k-dimensional parameter vector $\theta=(\theta_1,\theta_2,…,\theta_k)$. The main characteristic of a parametric model is that all the parameters lie in finite-dimensional parameter spaces.
 
 
 
 
 
  
* Each member of the collection of distributions, $ p_θ $, is described by a finite-dimensional parameter $ θ $. The set of all allowable values for the parameter is denoted $ Θ⊆R^k $, and the model itself is written as $ P=\{p_θ |θ∈Θ\} $. If the model consists of absolutely continuous distributions, it is often specified in terms of the corresponding probability density functions, $ P=\{f_θ |θ∈Θ\}$. The model is considered identifiable if the mapping $ θ→p_θ $ is invertible (i.e., if there are no two different parameter values $ θ_1 $ and $ θ_2 $ such that $ p_{θ_1} =p_{θ_2} $).

* Consider one of the most popular distributions, the normal distribution, in which the parameter vector is $ θ=(μ,σ) $. Here, $ μ∈R $ is a location parameter, and σ>0 is a scale parameter. This parametrized family can be expressed as:

$$ \big\{f_θ(x)=\frac{1}{\sqrt{2π}\,σ}\, e^{-\frac{(x-μ)^2}{2σ^2}} \,\big|\, μ∈R,σ>0\big\}.$$

* Parametric inference: Often, we are interested in estimating $ \theta $, or more generally, a function of $ \theta $, say $ g(\theta) $. Let’s consider a few examples that will enable us to understand this.

** Let $ x_1,x_2,…,x_n $ be the outcomes of $ n $ independent flips of the same coin. Here, we code $ X_i=1 $ if the $i^{th}$ toss produces a head and $ X_i=0 $ if the $i^{th}$ toss produces a tail. Therefore, $ \theta $, the probability of flipping a head in a single toss, could be any number between 0 and 1. We know that the $ x_i$’s are independent and identically distributed (i.i.d.). The distribution $ p_{\theta} $ commonly used to describe this type of experiment is a Bernoulli distribution with parameter $ \theta $. It has the probability mass function $ f(x,\theta)=\theta^x (1-\theta)^{1-x}$, $x \in \{0,1\} $. If we repeat the experiment with the same coin enough times, we would expect to toss $n \theta$ heads on average.

** Let $ x_1,x_2,…,x_n $ be the numbers of customers that arrive at $n$ different identical counters in a unit of time. The $ X_i$'s can be thought of as i.i.d. random variables with a [http://en.wikipedia.org/wiki/Poisson_distribution Poisson distribution] with mean $ \theta $, which varies in the set $ (0,\infty) $ representing the parameter space $ \Theta $. The Poisson probability mass function is $ f(x,\theta)=e^{-\theta} \frac{\theta^{x}}{x!}$, for each $x=0, 1, 2, ...$.

: After determining the parameters of the model, we will be able to apply the characteristics of the distribution and the model to the data. The characteristics of various distributions will be discussed further in the [[SMHS_ProbabilityDistributions|Distribution section]]. We will also discuss hypothesis testing and estimation later.
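In both examples above, the maximum likelihood estimate of $\theta$ is simply the sample mean. A brief R sketch (simulated data; the true parameter values 0.4 and 3 are assumptions for illustration):

```r
set.seed(1)

# Bernoulli example: 50 i.i.d. coin tosses with assumed true theta = 0.4.
x_bern <- rbinom(50, 1, 0.4)     # outcomes coded 1 = head, 0 = tail
theta_hat_bern <- mean(x_bern)   # MLE of theta: the observed proportion of heads

# Poisson example: customer counts at 50 identical counters, assumed true theta = 3.
x_pois <- rpois(50, 3)           # i.i.d. Poisson(3) counts
theta_hat_pois <- mean(x_pois)   # MLE of the Poisson mean: the sample mean

c(theta_hat_bern, theta_hat_pois)
```

The estimates will be close to, but generally not exactly equal to, the assumed true values; they converge as the sample size grows.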
  
====[http://en.wikipedia.org/wiki/Random_number_generation Random Number Generation]====

* [[SOCR_EduMaterials_Activities_RNG|SOCR RNG examples]]

* R examples: the commands below simulate draws from common distributions.

* We use a random number generator to obtain 10 samples from a uniform distribution on $[0,1]$ (to sample from a normal distribution $ N(\mu=0,\sigma=1) $, use rnorm(10,0,1) instead):

> r_data_unif <- runif(10,0,1)
> r_data_unif
[1] 0.64900447    0.82074379    0.56889471    0.95659206    0.69771341    0.19772881    0.07656862
[8] 0.29823980    0.31825198    0.45029058
> write.csv(r_data_unif, "/location/file.csv") # to write the data out to a file "file.csv"
> write.csv(r_data_unif, row.names=FALSE)      # to write the data directly into the R-shell
> # (this could be used for copy-pasting the data, "r_data_unif", from R into SOCR, external tables, etc.)

* We generate 5 random draws from a Poisson distribution with $\lambda = 2$:

> rpois(5,2)
[1] 3 2 1 4 1

* We generate 5 random draws from a binomial distribution with $ p = 0.3, n = 10 $:

> rbinom(5,10,0.3)
[1] 2 3 3 2 3

* [[SOCR_EduMaterials_Activities_RNG|SOCR Random Number Generation Activity]]
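The examples above draw from the uniform, Poisson, and binomial distributions; a normal sample works the same way, and seeding the generator makes the draws reproducible (a sketch):

```r
set.seed(2015)                                # fix the generator state for reproducibility
r_data_norm <- rnorm(10, mean = 0, sd = 1)    # 10 draws from N(0, 1)
r_data_norm

# Sample summaries approach the population parameters as the sample grows:
mean(r_data_norm)   # near 0 for large samples
sd(r_data_norm)     # near 1 for large samples
```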
  
===Applications===

* An article entitled [http://link.springer.com/article/10.1007/BF00341287 Parametric Inference For Imperfectly Observed Gibbsian Fields] presents a maximum likelihood estimation method for imperfectly observed Gibbsian fields on a finite lattice. This method is an adaptation of [http://www.researchgate.net/profile/Laurent_Younes/publication/243645162_Estimation_and_annealing_for_Gibbsian_fields/links/0a85e52fd0f2930737000000.pdf an algorithm developed by the author, Laurent Younes, in 1988]. A presentation of the new algorithm is followed by a theorem about the limit of the second derivative of the likelihood when the lattice increases, which is related to the convergence of the method. The paper also offers some practical remarks about the implementation of the procedure.

* An article entitled [http://www.pnas.org/content/101/46/16138.short Parametric Inference for Biological Sequence Analysis] discusses graphical models that have been applied to problems in biological sequence analysis, including hidden [http://en.wikipedia.org/wiki/Markov_model Markov models] for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the [http://en.wikipedia.org/wiki/Belief_propagation sum-product algorithm], solves many of the inference problems that are associated with different statistical models. The article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum ''a posteriori'' inference calculations for graphical models.
  
===Software===
*[http://socr.ucla.edu/htmls/SOCR_Distributions.html SOCR Distributions]
*[http://socr.ucla.edu/htmls/exp/Bivariate_Normal_Experiment.html Bivariate Normal Experiment]
*[http://socr.ucla.edu/htmls/dist/Multinomial_Distribution.html Multinomial Distribution]
*[http://wiki.stat.ucla.edu/socr/index.php/SOCR_EduMaterials_Activities_Binomial_Distributions Activities with Binomial Distributions]
  
===Problems===
* Suppose we are rolling a fair die; what is the probability that we are going to roll three sixes in a row? What kind of model are we inferring on?

* Consider the unfair coin-tossing game, where the probability of tossing a head is unknown. Construct an experiment to estimate the probability of tossing a head in a single toss. What is the probability that we are going to get 5 heads out of 8 tosses?

* Random number generation is commonly used in scientific studies; explain how it works.

* The average number of homes sold by a realtor named Tom is 3 houses per day. What is the probability that Tom will sell exactly 4 houses tomorrow?

* Suppose that the average number of patients with cancer who are seen per day is 5. What is the probability that fewer than 4 patients with cancer will be seen on the next day?
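Worked solutions for the last two (Poisson) problems, using the Poisson probability mass function $ p(x;μ)=\frac{e^{-μ}μ^{x}}{x!} $:

* The home-sales problem is a Poisson experiment with mean $μ=3$ houses sold per day, and we want the probability of exactly $x=4$ sales tomorrow:
$$ p(4;3)=\frac{e^{-3}\,3^{4}}{4!}≈0.168 $$

* The cancer-patient problem is a Poisson experiment with mean $μ=5$ patients per day, and we want the probability of seeing fewer than 4 patients, i.e., $x=0,1,2,$ or $3$:
$$ p(x≤3;5)=\frac{e^{-5}5^{0}}{0!}+\frac{e^{-5}5^{1}}{1!}+\frac{e^{-5}5^{2}}{2!}+\frac{e^{-5}5^{3}}{3!}≈0.0067+0.0337+0.0842+0.1404≈0.2650 $$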
  
===References===
* [http://www.itl.nist.gov/div898/handbook/eda/eda.htm NIST EDA]
* [http://en.wikipedia.org/wiki/Random_variable Random variable (Wikipedia)]
* [http://en.wikipedia.org/wiki/Parameter Parameter (Wikipedia)]
* [http://en.wikipedia.org/wiki/Parametric_model Parametric model (Wikipedia)]
 
<hr>

Latest revision as of 15:15, 19 March 2015