Latest revision as of 15:15, 19 March 2015
Scientific Methods for Health Sciences - Parametric Inference
Overview
In statistical inference, we aim to draw inferences about an underlying population based on a sample drawn therefrom. For example, we sometimes achieve this by estimating the parameters of a probability density function based on observations. In an idealized case, we would have a perfect model with unknown parameters, and based on this, we would make inferences about the population by estimating the parameters with the data we have. In this section, we are going to introduce the concepts of variables, parametric models, and inference based on these models.
Motivation
Consider the well-known example of flipping a coin 10 times. Experience tells us that the expected number of heads in one experiment with 10 flips equals 10 times the probability of observing a head on each flip. For an evenly weighted coin, we would expect approximately 5 heads. If we repeated the experiment many times, we would expect the number of heads to follow a binomial distribution with the parameter $ p=P(\text{head}) $ for each flip. In other words, we believe the underlying model to be a binomial $ (n,p) $ where $ n=10 $.
The next step would be to determine the value of $ p $. An obvious way of doing this would be to flip the coin many times (e.g., 100 times) and record the number of heads. The estimate of $ p $ would just be the number of heads in the 100 flips divided by 100. For example, if we got 63 heads, we would estimate the probability $p$ of getting a head on any given flip to be $ 63/100 $. Based on this estimate, we model the number of heads in our experiment with a binomial distribution with parameters $ (n=10,p=0.63) $. That is, we would expect an average of $ 10 \times 0.63 = 6.3 $ heads in 10 flips if we repeat the experiment enough times.
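The arithmetic in this example can be checked with a short R sketch. The variable names (pilot_heads, p_hat, and so on) are illustrative and do not come from the text above; dbinom is the binomial probability mass function in base R.

```r
# Estimate p from a pilot run of 100 flips (63 heads, as in the example).
pilot_flips <- 100
pilot_heads <- 63
p_hat <- pilot_heads / pilot_flips          # estimated P(head)

# Model for one experiment: Binomial(n = 10, p = p_hat).
n_flips <- 10
expected_heads <- n_flips * p_hat           # mean of the binomial model

# Probability of each possible number of heads (0..10) under the model.
head_probs <- dbinom(0:n_flips, size = n_flips, prob = p_hat)

print(p_hat)
print(expected_heads)
print(round(head_probs, 4))
```

The mean computed directly from the probabilities, sum(0:10 * head_probs), agrees with expected_heads, which is one way to see why the model predicts 6.3 heads on average.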
Next we will explore the following questions:
- What is a random variable?
- How do we build a parametric model based on data?
- What kind of inference can we make based on a parametric model?
Theory
Random Variables
A random variable is a variable whose value is subject to variations due to chance (i.e., randomness). It can take on a set of values, each with an associated probability for discrete variables or a probability density for continuous variables. The values of a random variable represent the possible outcomes of a yet-to-be-performed experiment or the possible outcomes of a past experiment whose result is still uncertain. The possible values of a random variable and their associated probabilities (known as a probability distribution) can be further described with mathematical functions.
There are two types of random variables:
- Discrete random variables take on values from a finite or countable list and are described by a probability mass function, which characterizes their probability distribution;
- Continuous random variables take on any numerical value in an interval or collection of intervals and are described by a probability density function, which characterizes their probability distribution.
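A brief R sketch may help contrast the two types. It assumes a Binomial(10, 0.5) variable as the discrete example and a standard normal variable as the continuous one; both choices, and the variable names, are illustrative only.

```r
# Discrete case: the probability mass function of a Binomial(10, 0.5)
# variable assigns a probability to each value 0, 1, ..., 10.
pmf_values <- dbinom(0:10, size = 10, prob = 0.5)
total_mass <- sum(pmf_values)                     # probabilities sum to 1

# Continuous case: the standard normal density must be integrated
# over an interval to give a probability.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
prob_within_1sd <- integrate(dnorm, lower = -1, upper = 1)$value

print(total_mass)
print(total_area)
print(prob_within_1sd)   # agrees with pnorm(1) - pnorm(-1)
```

For the discrete variable, probabilities are read off the mass function directly; for the continuous variable, a single point has probability zero and only areas under the density are probabilities.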
Parameters
A parameter is a characteristic or measurable factor that can help in defining a particular system. It is an important element to consider when evaluating or trying to understand an event. μ is often used to represent the mean and σ is used to represent the standard deviation in statistics. The following table provides a list of commonly used parameters with descriptions:
Parameter | Description | Parameter | Description |
$\bar{x}$ | Sample mean | α,β,γ | Various Greek letters |
μ | Population mean | θ | Lower case theta |
σ | Population standard deviation | φ | Lower case phi |
$σ^2$ | Population variance | ω | Lower case omega |
s | Sample standard deviation | ∆ | Increment |
$s^2$ | Sample variance | ν | Nu |
λ | Poisson mean, Lambda | τ | Tau |
χ | χ distribution, Chi | η | Eta |
ρ | The density, Rho | τ | Sometimes used in tau function |
ϕ | Normal density function, Phi | Θ | Parameter space |
Γ | Gamma | Ω | Sample space, omega |
∂ | Per/ divided | δ | Lower case delta |
S | Sample space | Κ,k | kappa |
Parametric Model
A parametric model is a collection of probability distributions that can be described using a finite number of parameters. These parameters are usually written together to form a single k-dimensional parameter vector $\theta=(\theta_1,\theta_2,…,\theta_k)$. The main characteristic of a parametric model is that all the parameters are from finite-dimensional parameter spaces.
- Each member of the collection of distributions, $ p_θ $, is described by a finite-dimensional parameter $ θ $. The set of all allowable values for the parameter is denoted $ Θ⊆R^k $, and the model itself is written as $ P=\{p_θ \mid θ∈Θ\} $. If the model consists of absolutely continuous distributions, it is often specified in terms of the corresponding probability density functions, $ P=\{f_θ \mid θ∈Θ\} $. The model is considered identifiable if the mapping $ θ \mapsto p_θ $ is one-to-one (i.e., if there are no two different parameter values $ θ_1 $ and $ θ_2 $ such that $ p_{θ_1} =p_{θ_2} $).
- Consider one of the most popular distributions, the normal distribution, in which the parameter vector is $ θ=(μ,σ) $. Here, $ μ∈R $ is a location parameter, and $ σ>0 $ is a scale parameter. This parametrized family can be expressed as:
$$ \big\{f_θ (x)=\frac{1}{\sqrt{2π}\,σ} e^{-\frac{1}{2σ^2}(x-μ)^2} \;\big|\; μ∈R,σ>0\big\}.$$
- Parametric inference: Often, we are interested in estimating $ \theta $, or more generally, a function of $ \theta $, say $ g(\theta) $. Let’s consider a few examples that will enable us to understand this.
- Let $ X_1,X_2,…,X_n $ be the outcomes of $ n $ independent flips of the same coin. Here, we code $ X_i=1 $ if the $i^{th}$ toss produces a head and $ X_i=0 $ if it produces a tail. The parameter $ \theta $, the probability of flipping a head in a single toss, can be any number between 0 and 1. The $ X_i $'s are independent and identically distributed (i.i.d.). The distribution $ p_{\theta} $ commonly used to describe this type of experiment is the Bernoulli distribution with parameter $ \theta $. It has the probability mass function $ f(x,\theta)=\theta^x (1-\theta)^{1-x} $, $ x \in \{0,1\} $. If we repeat the experiment of $ n $ tosses with the same coin enough times, we would expect to toss $ n \theta $ heads on average.
- Let $ X_1,X_2,…,X_n $ be the numbers of customers that arrive at $n$ identical counters in a unit of time. The $ X_i$'s can be thought of as i.i.d. random variables with a Poisson distribution with mean $ \theta $. The parameter $ \theta $ varies in the set $ (0,\infty) $, which is the parameter space $ \Theta $. The Poisson probability mass function is $ f(x,\theta)=e^{-\theta} \frac{\theta^{x}}{x!}$, for each $x=0, 1, 2, ...$.
- After determining the parameters of the model, we will be able to apply the characteristics of the distribution and the model to the data. The characteristics of various distributions will be discussed further in the Distribution section. We will also discuss hypothesis testing and estimation later.
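The normal family and the two estimation examples above can be sketched in R as follows. This is a minimal illustration under stated assumptions: the "true" parameter values (0.3 and 4), the sample size of 1000, the seed, and all variable names are hypothetical. The sample mean is used as the estimator of $ \theta $ in both cases; it is a standard choice (the maximum likelihood estimate for the Bernoulli and Poisson models), although estimation is only discussed later in the text.

```r
set.seed(1)  # arbitrary seed, chosen only to make the sketch reproducible

# Normal density written out from the formula, with theta = (mu, sigma).
normal_density <- function(x, mu, sigma) {
  1 / (sqrt(2 * pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))
}
x_grid <- seq(-3, 3, by = 0.5)
max_diff <- max(abs(normal_density(x_grid, mu = 1, sigma = 2) -
                    dnorm(x_grid, mean = 1, sd = 2)))

# Bernoulli example: simulate coin flips with an assumed true theta,
# then estimate theta by the proportion of heads.
theta_coin <- 0.3                       # hypothetical true value
flips <- rbinom(1000, size = 1, prob = theta_coin)
theta_coin_hat <- mean(flips)

# Poisson example: simulate arrivals at the counters with an assumed mean,
# then estimate theta by the sample mean.
theta_arrivals <- 4                     # hypothetical true value
arrivals <- rpois(1000, lambda = theta_arrivals)
theta_arrivals_hat <- mean(arrivals)

print(max_diff)             # hand-written density vs. built-in dnorm
print(theta_coin_hat)
print(theta_arrivals_hat)
```

The first comparison confirms that the hand-written density matches R's built-in dnorm up to floating-point error; the two estimates should land close to the assumed values, and they move closer as the sample size grows.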
Random Number Generation
- SOCR RNG examples
- R examples: the functions runif, rpois, and rbinom draw random samples from the uniform, Poisson, and binomial distributions, respectively.
- We use a random number generator to obtain 10 samples from a uniform distribution on the interval (0, 1):
> r_data_unif <- runif(10,0,1)
> r_data_unif
[1] 0.64900447 0.82074379 0.56889471 0.95659206 0.69771341 0.19772881 0.07656862
[8] 0.29823980 0.31825198 0.45029058
> write.csv(r_data_unif, "/location/file.csv") # write the data out to the file "file.csv"
> write.csv(r_data_unif, row.names=FALSE) # print the data directly in the R shell
> # (this can be used for copy-pasting the data, "r_data_unif", from R into SOCR, external tables, etc.)
- We generate 5 random values from a Poisson distribution with $\lambda = 2$:
> rpois(5,2)
[1] 3 2 1 4 1
- We generate 5 random values from a binomial distribution with $ p = 0.3, n = 10 $:
> rbinom(5,10,0.3)
[1] 2 3 3 2 3
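Samples from a normal distribution $ N(\mu=0,\sigma=1) $ can be drawn in the same way with rnorm. The sketch below also fixes the random seed with set.seed so that the draws can be reproduced; the seed value 2015 and the variable names are arbitrary choices made for illustration.

```r
set.seed(2015)  # arbitrary seed so that repeated runs give the same draws

# 10 draws from a normal distribution with mean 0 and standard deviation 1.
r_data_norm <- rnorm(10, mean = 0, sd = 1)
print(r_data_norm)

# Resetting the seed reproduces exactly the same sample.
set.seed(2015)
r_data_norm_again <- rnorm(10, mean = 0, sd = 1)
print(identical(r_data_norm, r_data_norm_again))
```

Without a fixed seed, each call to a generator such as rnorm, runif, rpois, or rbinom returns different values, which is why printed outputs like those listed above will generally not match a new run.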
Applications
- An article entitled Parametric Inference For Imperfectly Observed Gibbsian Fields presents a maximum likelihood estimation method for imperfectly observed Gibbsian fields on a finite lattice. This method is an adaptation of an algorithm developed by the author, Laurent Younes, in 1988. A presentation of the new algorithm is followed by a theorem about the limit of the second derivative of the likelihood when the lattice increases, which is related to the convergence of the method. This paper offers some practical remarks about the implementation of the procedure.
- An article entitled Parametric Inference for Biological Sequence Analysis uses graphical models that have been applied to problems including hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
Software
- SOCR Distributions
- Bivariate Normal Experiment
- Multinomial Distribution
- Activities with Binomial Distributions
Problems
- Suppose we are rolling a fair die; what is the probability of rolling three sixes in a row? What kind of model underlies this inference?
- Consider the unfair coin tossing game, where the probability of tossing a head is unknown. Construct an experiment to estimate the probability of getting a head in a single toss. Using your estimate, what is the probability of getting exactly 5 heads in 8 tosses?
- Random number generation is commonly used in scientific studies; explain how it works.
- The average number of homes sold by a realtor named Tom is 3 houses per day. What is the probability that Tom will sell exactly 4 houses tomorrow?
- Suppose that the average number of patients with cancer who are seen per day is 5. What is the probability that fewer than 4 patients with cancer will be seen on the next day?
References
- SOCR Home page: http://www.socr.umich.edu