Difference between revisions of "SMHS Probability"

From SOCR
(Motivation)
(Problems)
 
(36 intermediate revisions by 4 users not shown)
Line 2: Line 2:
  
 
===Overview===
 
===Overview===
Probability theory plays an important role in statistics and its application to many other disciplines because it provides the theoretical groundwork for statistical inference. Probability theory is concerned with probability, which is the analysis of random phenomena. The central objects are random variables, stochastic processes, and events. Consider an individual coin toss, which can be considered to be a random event; if it is repeated many times then the sequence of random events will exhibit certain patterns. Probability theory helps us study and predict those patterns. Often, probability theory is further divided into two categories: discrete probability distributions and continuous probability distributions. We will study these later in the Distribution section. Here, we aim to study fundamental concepts of probability theory and define the rules of probability theory that we will apply in following studies.
+
Probability theory plays an important role in statistics and its applications to many other disciplines because it provides the theoretical groundwork for statistical inference. Probability theory is the mathematical analysis of random phenomena. Its central objects are random variables, stochastic processes, and events. Consider an individual coin toss, which can be regarded as a random event; if it is repeated many times, the sequence of outcomes will exhibit certain patterns. Probability theory helps us study and predict those patterns. Probability distributions are often divided into two categories: ''discrete probability distributions'' and ''continuous probability distributions''. We will study these later in the [http://wiki.socr.umich.edu/index.php/SMHS_ProbabilityDistributions Distribution] section. Here, we introduce the fundamental concepts of probability theory and define the rules of probability that we will apply in subsequent sections.
  
 
===Motivation===
 
===Motivation===
Consider you are performing an experiment in which a number of outcomes are produced. This set of outcomes is called the sample space and the power set of the sample space includes all the different collections of  possible results from the experiment. Suppose we are rolling a fair die, which has 6 possible outcomes. The sample space is {1, 2, 3, 4, 5, 6}. An event is any collection of the possible results. For example, the event of rolling an even number involves the subset {2, 4, 6}, which is an element of the power set of the sample space in this experiment. What if we want to estimate the chance of rolling three 2’s in a row or the chance of rolling an odd number in an experiment? Probability is a way of assigning every event a value between 0 and 1; this value represents the chance that the event will occur.
+
Imagine that you are performing an experiment in which a number of outcomes are produced. This set of outcomes is called the ''sample space'', and the power set of the sample space includes all the different collections of  possible results from the experiment. Suppose we are rolling a fair die, which has 6 possible outcomes. The sample space is {1, 2, 3, 4, 5, 6}. An ''event'' is any collection of the possible results. For example, the event of rolling an even number involves the subset {2, 4, 6}, which is an element of the power set of the sample space in this experiment. What if we want to estimate the chance of rolling three 2’s in a row or the chance of rolling an odd number in an experiment? Probability is a way of assigning every event a value between 0 and 1; this value represents the chance that the event will occur.
  
 
===Theory===
 
===Theory===
'''Random Sampling''': A simple random sample of n items is a sample in which very member of the population has an equal chance of being selected and the members of the sample are chosen independently. For example, consider a survey where 100 students are chosen from the total of 5000 students to take the questionnaires and the chance of chosen is the same for each student. This is a simple example of random sampling. An easy application is random number generator.  
+
====Random Sampling====
 +
A simple random sample of $n$ items is a sample in which every member of the population has an equal chance of being selected and the members of the sample are chosen independently. For example, consider a survey for which 100 students are selected from a total of 5000 students, and in which the chances of being selected are the same for each student. This is a simple example of ''random sampling'', and a common application of this can be found in random number generators.
  
 +
====Types of probabilities====
 +
Probability models have two components: ''sample space'' and ''probabilities''.
 +
*The '''sample space (S)''' for a random experiment is the set of all possible outcomes of the experiment.
 +
**''Event'': A collection of outcomes.
 +
**An event is said to occur if any outcome making up that event occurs.
 +
*'''Probabilities''' are assigned to each event in the sample space.
 +
**Probabilities may come from models, i.e., mathematical and/or physical descriptions of the sample space and the chance of each event. An example is a game of tossing a fair die.
 +
**Probabilities may be derived from data, in which case the observations determine the probability distribution. An example is tossing a coin 50 times and recording the number of heads.
 +
*Subjective probabilities: combining data and psychological factors to design a reasonable probability table. An example may be the stock market.
  
'''Types of probabilities''': Probability models have two components: sample space and probabilities.
+
====Axioms of probability====
*Sample space (S) for a random experiment is the set of all possible outcomes of the experiment.
+
*First axiom: The probability of an event is a non-negative real number.
**Event: a collection of outcomes.
+
*Second axiom: The probability that some elementary event in the entire sample space will occur is 1. More specifically, there are no elementary events outside the sample space: $P(S)=1$.
**Event occurs if an outcome making up that event occurs.
+
*Third axiom: Any countable sequence of pairwise disjoint events $E_1, E_2, E_3, \ldots$ satisfies $P(E_1 \cup E_2 \cup E_3 \cup \cdots) = \sum_i P(E_i)$.
*Probabilities for each event in the sample space.
 
*Probabilities may come from models – say mathematical/physical description of the sample space and the chance of each event. An example may be a fair dice tossing game.
 
*Probabilities may be derived from data – data observations determine the probability distribution. An example may be tossing a coin 50 times and observe the head counts.
 
*Subject probabilities: combining data and psychological factors to design a reasonable probability table. An example may be the stock market.
 
  
 
+
====Event manipulations====
'''Axioms of probability'''
+
*Complement: The complement of event $A$ is denoted as $A^c$ or $A'$. It occurs if and only if $A$ does not occur. The union of $A$ and $A^c$ makes up the entire sample space ($S$).
*First axiom: the probability of an event is a non-negative real number.
 
*Second axiom: the probability that some elementary event in the entire sample space will occur is 1. More specifically, there are no elementary events outside the sample space $P(S)=1$.
 
*Third axiom: An countable sequence of pair-wise disjoint events $E_1,E_2, E_3, … $ satisfies $P(E_1 \cup E_2 \cup E_3 \cup … ) = \sum_i {P(E_i)} $.
 
 
 
 
 
'''Event manipulations'''
 
*Complement: the complement of event $A$ is denoted as $A^c$ or $A'$, it occurs if and only if $A$ does not occur. The union of $A$ and $A^C$ make up the whole sample space ($S$).
 
 
* Union: $A\cup B$ contains all outcomes in $A$ or $B$ (or both). $ P(A\cup B)=P(A)+P(B)-P(A\cap B). $
 
* Union: $A\cup B$ contains all outcomes in $A$ or $B$ (or both). $ P(A\cup B)=P(A)+P(B)-P(A\cap B). $
 
* Intersection: $A\cap B$ contains all outcomes which are in both $A$ and $B$.
 
* Intersection: $A\cap B$ contains all outcomes which are in both $A$ and $B$.
 
* Mutually exclusive events are events that cannot occur at the same time, $A\cap B =\emptyset$.
 
* Mutually exclusive events are events that cannot occur at the same time, $A\cap B =\emptyset$.
* Conditional Probability: The conditional probability of event $A$ occurring given that event $B$ occurs is $ P(A│B)=(P(A\cap B))/(P(B)) $. When $A$ and $B$ are independent then knowing $B$, or $B^c$, gives no information on the probability of $A$, i.e., $ P(A│B)=P(A) $.
+
* Conditional Probability: The conditional probability of event $A$ occurring, given that event $B$ occurs, is $ P(A|B)=\frac{P(A\cap B)}{P(B)} $, provided that $P(B)>0$. If $A$ and $B$ are independent, then knowing $B$, or $B^c$, gives no information on the probability of $A$, i.e., $ P(A|B)=P(A) $.
* Multiplication rule: For any two events, $A$ and $B$, $ P(A\cap B)=P(A│B)P(B) $. In general, for $n$ events $A_1, ..., A_n$: $ P(A_1 \cap A_2 \cap A_3 \cap … \cap A_n ) = P(A_1 )P(A_1│A_2 )P(A_3│A_1\cap A_2 ) … P(A_n│A_1\cap A_1\cap A_2\cap A_3\cap … \cap A_(n-1) ) $.
+
* Multiplication rule: For any two events, $A$ and $B$, $ P(A\cap B)=P(A|B)P(B) $. In general, for $n$ events $A_1, \ldots, A_n$: $ P(A_1 \cap A_2 \cap \cdots \cap A_n ) = P(A_1)P(A_2|A_1)P(A_3|A_1\cap A_2) \cdots P(A_n|A_1\cap A_2\cap \cdots \cap A_{n-1}) $.
 
* Law of total probability: $P(B)=P(B│A_1 )P(A_1 )+P(B│A_2 )P(A_2 )+⋯ +P(B│A_n )P(A_n) $, where the events $ {A_1,…,A_n} $ partition the sample space $S$.
 
* Law of total probability: $P(B)=P(B│A_1 )P(A_1 )+P(B│A_2 )P(A_2 )+⋯ +P(B│A_n )P(A_n) $, where the events $ {A_1,…,A_n} $ partition the sample space $S$.
 
* Inverting the order of conditioning: $ P(A \cap B)  = P(A | B) \times  P(B) =  P(B | A) \times P(A) $.
 
* Inverting the order of conditioning: $ P(A \cap B)  = P(A | B) \times  P(B) =  P(B | A) \times P(A) $.
* Bayesian Rule: If $ {A_1,…,A_n} $ partition the sample space $S$, and $A$ and $B$ are any events, subsets of $S$, then we have:  
+
** Suppose a laboratory blood test is used as evidence for a disease. Assume $P(\text{positive test} | \text{disease}) = 0.95$, $P(\text{positive test} | \text{no disease}) = 0.01$, and $P(\text{disease}) = 0.005$. Find $P(\text{disease} | \text{positive test})$. Denote $D$ = {the tested person has the disease}, $D^c$ = {the tested person does not have the disease}, and $T$ = {the test result is positive}. Then
 +
$$P(D | T) = {P(T | D) P(D) \over P(T)} = {P(T | D) P(D) \over P(T|D)P(D) + P(T|D^c)P(D^c)}=$$
 +
$$={0.95\times 0.005 \over {0.95\times 0.005 +0.01\times 0.995}}\approx 0.3231$$
 +
* Bayesian Rule: If $ \{A_1,\ldots,A_n\} $ partition the sample space $S$, and $A$ and $B$ are any events (i.e., subsets of $S$), then we have:
 
$$ P(A | B) = {P(B | A) P(A) \over P(B)} = {P(B | A) P(A) \over P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \cdots + P(B|A_n)P(A_n)}. $$
 
$$ P(A | B) = {P(B | A) P(A) \over P(B)} = {P(B | A) P(A) \over P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \cdots + P(B|A_n)P(A_n)}. $$
 +
 +
* Note: also see the [[SMHS_BayesianInference | Bayesian Inference Section of the SMHS EBook]].
 +
 +
* Example: [[AP_Statistics_Curriculum_2007_Prob_Rules#Monty_Hall_Problem| Play the Monty Hall Game and compute the probability of success under different strategies]].
  
 
====Counting====
 
====Counting====
 
Counting principles are very useful in probability theory. Consider picking 3 students from a total of 26 students named A to Z.
 
Counting principles are very useful in probability theory. Consider picking 3 students from a total of 26 students named A to Z.
*Permutation: rearrangement of objects in distinguishable sequences. Each unique ordering is called a permutation. For example $\{A, B, D\}$ are different from $\{D, A, B\}$. There are $3!=6$ permutations of students A, B and D.
+
*'''Permutation''': A permutation is an arrangement of objects in a definite order; each distinct ordering counts as a separate permutation. For example, the ordering $(A, B, D)$ is different from $(D, A, B)$. There are $3!=6$ permutations of students A, B and D.
**Permutation with repetitions (replacement): when the ordering of objects matters and an object can be chosen more than once, then the number of permutations is $ n^r $, where n is the number of objects from which you can choose and r is the number of objects you choose. In our example above, we have $ 26^3 $  permutations with repetitions.
+
**Permutation with repetitions (replacement): If the ordering of objects matters and an object can be chosen more than once, then the number of permutations is $ n^r $, where $n$ is the number of objects from which you can choose and $r$ is the number of objects you choose. In our example above, we have $ 26^3=17576 $ permutations with repetitions.
**Permutation without repetitions (replacement): when the order matters and each object can be chosen only once, then the number of permutation is $ n(n-1)…(n-r+1)=n!/(n-r)! $, where $n$ is the number of objects you can choose from and $r$ is the number of objects you choose. In our example above, we have $26*25*24$ permutations without repetitions.  
+
**Permutation without repetitions (replacement): If the order matters and each object can be chosen only once, then the number of permutations is $ n(n-1)\cdots(n-r+1)=\frac{n!}{(n-r)!} $, where $n$ is the number of objects you can choose from and $r$ is the number of objects you choose. In our example above, we have $26\times 25\times 24=15600$ permutations without repetitions.
*Combinations: An un-ordered collection of unique objects. In our example above, {A, B, D} are the same as {D, B, A}.  
+
*'''Combinations''': A combination is an unordered collection of objects. In our example above, {A, B, D} is the same as {D, B, A}.
**Combinations with repetitions (replacement): when the order doesn’t matter and an object can be chosen more than once. Then the number of combinations is $ {n+r-1 \choose r}= \frac{(n+r-1)!}{r!(n-1)!} $, in our example above we have $ ((26+3-1)!)/3!(26-1)!=6552 $ combinations with repetitions.
+
**Combinations with repetitions (replacement): This is the case in which order does not matter, and an object can be chosen more than once. The number of combinations is $ {n+r-1 \choose r}= \frac{(n+r-1)!}{r!(n-1)!} $. In the example above, we have $\frac{(26+3-1)!}{3!(26-1)!}=3276$ combinations with repetitions.
**Combinations without repetitions (replacement): when the order doesn’t matter and an object can be chosen only once. Then the number of combinations is $ {n \choose r}=n!/r!(n-r)! $, where $n$ is the number of objects you can choose from and $r$ is the number of objects you choose. In our example, we have $ {26 \choose 3} $ combinations without repetitions.
+
**Combinations without repetitions (replacement): This is the case in which the order does not matter, and an object can be chosen only once. The number of combinations is $ {n \choose r}=\frac{n!}{r!(n-r)!} $, where $n$ is the number of objects you can choose from and $r$ is the number of objects you choose. In our example, we have $ {26 \choose 3}=2600 $ combinations without repetitions.
 +
* [[AP_Statistics_Curriculum_2007_Prob_Count| Examples of combinations and permutations applications]].
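
The four counting formulas above can be checked numerically for the 26-student example. The following is a minimal Python sketch, assuming Python 3.8 or later (for <code>math.perm</code> and <code>math.comb</code>); the variable names are illustrative only. It also confirms each closed-form count by brute-force enumeration with <code>itertools</code>.

```python
import itertools
import math
import string

students = string.ascii_uppercase  # 26 students named A to Z
n, r = len(students), 3

# Closed-form counts from the formulas above
perm_with_rep = n ** r                     # n^r
perm_without_rep = math.perm(n, r)         # n! / (n-r)!
comb_with_rep = math.comb(n + r - 1, r)    # (n+r-1)! / (r! (n-1)!)
comb_without_rep = math.comb(n, r)         # n! / (r! (n-r)!)

# Brute-force enumeration of the same four cases
enumerated = {
    "perm_with_rep": sum(1 for _ in itertools.product(students, repeat=r)),
    "perm_without_rep": sum(1 for _ in itertools.permutations(students, r)),
    "comb_with_rep": sum(
        1 for _ in itertools.combinations_with_replacement(students, r)
    ),
    "comb_without_rep": sum(1 for _ in itertools.combinations(students, r)),
}

print(perm_with_rep, perm_without_rep, comb_with_rep, comb_without_rep)
```

Enumeration is only practical for small $n$ and $r$; the closed forms are what one would use in general.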
  
'''Independence vs. disjointness/mutual-exclusiveness'''
+
====Independence vs. disjointness/mutual exclusiveness====
 
The events $A$ and $B$ are independent if $ P(A│B)=P(A)$, that is $ P(A\cap B)=P(A)P(B) $.
 
The events $A$ and $B$ are independent if $ P(A│B)=P(A)$, that is $ P(A\cap B)=P(A)P(B) $.
The events $C$ and $D$ are disjoint or mutually-exclusive, if $ P(C\cap D)=0 $, that is $ P(C\cup D)=P(C)+P(D) $.
+
The events $C$ and $D$ are disjoint or mutually exclusive if $ P(C\cap D)=0 $, that is $ P(C\cup D)=P(C)+P(D) $.
  
These two concepts are different and should not be mixed together. Given that if two events are mutually-exclusive, they cannot happen together $ (P(A│B)=0)) $ so the occurrence of one gives information about the probability of the other so events that are mutually-exclusive can’t be independent.
+
These two concepts are different and should not be conflated. If two events $A$ and $B$ are mutually exclusive, they cannot happen together (i.e., $P(A|B)=0$). The occurrence of one therefore provides information about the probability of the other, so mutually exclusive events that each have nonzero probability cannot be independent.
 
   
 
   
Consider the [[SOCR_EduMaterials_Activities_PokerExperiment|SOCR poker game]], if we know the card we picked randomly is a Queen, then the event that it is a red Queen given it is a Queen and the event that it is a black Queen given it is a Queen is independent. The event that it is a black card is not mutually-exclusive from the event that it is a spade.
+
Consider the [[SOCR_EduMaterials_Activities_PokerExperiment|SOCR poker game]]. If we draw one card at random, the event that it is a queen and the event that it is red are independent: knowing that the card is a queen leaves the probability that it is red at 1/2. In contrast, the event that it is a red queen and the event that it is a black queen are mutually exclusive, and therefore not independent. The event that it is a black card is not mutually exclusive from the event that it is a spade, because every spade is black.
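
The difference between independence and mutual exclusiveness can be verified by enumerating a standard 52-card deck. The sketch below is illustrative only: the helper <code>prob</code> and the event functions are hypothetical names, and the example assumes a single card drawn uniformly at random. It checks that "queen" and "red" satisfy $P(A\cap B)=P(A)P(B)$, that "red" and "black" are mutually exclusive but not independent, and that "black" and "spade" are not mutually exclusive.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) outcomes


def prob(event):
    """Probability of an event, given as a predicate on a single card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_queen(card):
    return card[0] == "Q"


def is_red(card):
    return card[1] in ("hearts", "diamonds")


def is_black(card):
    return card[1] in ("clubs", "spades")


def is_spade(card):
    return card[1] == "spades"


# Independent: P(queen and red) equals P(queen) * P(red)
p_queen_and_red = prob(lambda c: is_queen(c) and is_red(c))
queen_red_independent = p_queen_and_red == prob(is_queen) * prob(is_red)

# Mutually exclusive but not independent: P(red and black) = 0
p_red_and_black = prob(lambda c: is_red(c) and is_black(c))
red_black_independent = p_red_and_black == prob(is_red) * prob(is_black)

# Not mutually exclusive: every spade is black
p_black_and_spade = prob(lambda c: is_black(c) and is_spade(c))

print(queen_red_independent, p_red_and_black, red_black_independent, p_black_and_spade)
```

Exact fractions are used instead of floating-point numbers so that the equality test for independence is not affected by rounding.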
 +
 
 +
====Contingency Tables====
 +
Contingency tables provide data summaries that can be used to compute various probabilities of interest (e.g., determining conditional or marginal probabilities). These tables display sample values indexed by two different variables that may be dependent (contingent) on one another. 
 +
 
 +
=====Melanoma Example=====
 +
The table below classifies 400 melanoma (skin cancer) patients by type and site:
 +
<center>
 +
{| class="wikitable" style="text-align:center; width:75%" border="1"
 +
|-
 +
| rowspan="2"|Type || colspan="3" align="center"|Site || rowspan="2"|Totals
 +
|-
 +
| Head and Neck || Trunk || Extremities
 +
|-
 +
| Hutchinson's melanomic freckle || 22 || 2 || 10 || 34
 +
|-
 +
| Superficial || 16 || 54 || 115 || 185
 +
|-
 +
| Nodular || 19 || 33 || 73 || 125
 +
|-
 +
| Indeterminate || 11 || 17 || 28 || 56
 +
|-
 +
| Column Totals || 68 || 106 || 226 || 400
 +
|}
 +
</center>
 +
 
 +
* Suppose we select one of the 400 patients in the study at random and want to find the probability that the cancer is on the extremities ''given'' that it is of the nodular type: $ P(\text{Extremities} | \text{Nodular}) = 73/125 = 0.584 $.
 +
 
 +
* What is the probability that for a randomly chosen patient the cancer type is Superficial given that it appears on the Trunk?
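
Conditional and marginal probabilities can be read off a contingency table mechanically. The following Python sketch encodes the melanoma counts from the table above and reproduces the worked value $P(\text{Extremities} | \text{Nodular})$; the function and variable names are hypothetical, chosen for illustration only.

```python
from fractions import Fraction

sites = ["Head and Neck", "Trunk", "Extremities"]

# Rows of the melanoma table: counts per site for each melanoma type
counts = {
    "Hutchinson's melanomic freckle": [22, 2, 10],
    "Superficial": [16, 54, 115],
    "Nodular": [19, 33, 73],
    "Indeterminate": [11, 17, 28],
}

total = sum(sum(row) for row in counts.values())  # all patients in the study


def p_site_given_type(site, melanoma_type):
    """Conditional probability: restrict to one row, then take the site's share."""
    row = counts[melanoma_type]
    return Fraction(row[sites.index(site)], sum(row))


def p_site(site):
    """Marginal probability of a site: column total over the grand total."""
    column_total = sum(row[sites.index(site)] for row in counts.values())
    return Fraction(column_total, total)


p_ext_given_nodular = p_site_given_type("Extremities", "Nodular")
p_ext = p_site("Extremities")
print(p_ext_given_nodular, float(p_ext_given_nodular), p_ext)
```

Conditioning on a column (for example, on the site) works the same way: restrict to that column and divide each cell by the column total.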
 +
 
 +
=====Obesity Example=====
 +
The table below shows a random sample of 103 people and their BMI (body mass index, an important biomarker of obesity and diabetes):
 +
 
 +
<center>
 +
{| class="wikitable" style="text-align:center; width:75%" border="1"
 +
|-
 +
! Cohort \ BMI  || <22 ||  23-26 ||  27-30  || >30  || Total
 +
|-
 +
| Children ||12 ||9 || ___ ||3 || 30
 +
|-
 +
| Adults ||___ || ___ ||14 || 5 || 30
 +
|-
 +
| Elderly ||4 || ___ ||  12  ||  3 || 43
 +
|-
 +
| Total || 19 ||41 || ___ || ___ ||___
 +
|}
 +
</center>
 +
 
 +
Complete the table using the given information, then answer the following questions:
 +
* Compute the probability that a randomly chosen subject is a child with a BMI below 27, $ P(\text{Child and BMI}<27) $:
 +
* Compute the probability that a randomly chosen subject is an adult or has a BMI over 30, $ P(\text{Adult or BMI}>30) $:
 +
* Compute the probability that a randomly chosen subject is normal (22<BMI<27) given that the subject is elderly, $ P(22<\text{BMI}<27 | \text{Elderly}) $:
  
 
===Applications===
 
===Applications===
* [http://wiki.socr.umich.edu/index.php/AP_Statistics_Curriculum_2007_Prob_Simul This website] introduced on application of probability theory through simulation. Many practical examples require probability computations of complex events. Such calculations may be carried out exactly, using the proper probability rules, or approximately using estimation or simulations. [http://wiki.socr.umich.edu/index.php/AP_Statistics_Curriculum_2007_Prob_Simul SOCR simulations] may be used to compute (approximately) probabilities of various processes and compare these empirical probabilities to their exact counterparts. This article included examples of ''Ball and Urn Experiment, Binomial Coin Toss Experiment, Card Experiment, Roulette Experiment, and Chuck A Luck Experiment'' and would be a great source to take practice on simulations using probability theory.
+
* [[AP_Statistics_Curriculum_2007_Prob_Rules#Monty_Hall_Problem |Review the theory and the simulation of the Monty-Hall Problem]].
 +
* [[AP_Statistics_Curriculum_2007_Prob_Simul| Try various SOCR simulations]].
 +
* [http://wiki.socr.umich.edu/index.php/AP_Statistics_Curriculum_2007_Prob_Simul This website] introduces an application of probability theory through simulation. Many practical examples require probability computations for complex events. Such calculations may be carried out exactly, using the rules of probability, or approximately, using estimation and/or simulations. [http://wiki.socr.umich.edu/index.php/AP_Statistics_Curriculum_2007_Prob_Simul SOCR simulations] may be used to compute approximate probabilities for various processes and to compare these empirical probabilities to their exact counterparts. The article includes examples of a ''Ball and Urn Experiment, Binomial Coin Toss Experiment, Card Experiment, Roulette Experiment, and Chuck A Luck Experiment''. It is a valuable source for practicing simulations using probability theory.
  
* [http://www.probabilitytheory.info This website] offers a list of interesting articles on the topic of probability theory. It included a general introduction to the history of probability theory and addresses a wide list of articles of the application of probability in different areas including business, medicine, economics, biology, and etc. These short articles would be a good start to learn about application of probability theory in various fields.
+
* [http://www.probabilitytheory.info This website] offers a list of interesting articles on the topic of probability theory. It includes a general introduction to the history of probability theory and a wide range of articles on applications of probability in different areas, including business, medicine, economics, and biology. These short articles are a good starting place to learn about applications of probability theory in various fields.
  
 
===Software ===
 
===Software ===
Line 68: Line 128:
  
 
===Problems===
 
===Problems===
* A box contains 6 balls, where 2 are red, 2 are white, and 2 are blue. Four balls are picked at random, one at a time. Each time a ball is picked, the color is recorded, and the ball is put back in the box. If the first 3 balls are red, what color is the fourth ball most likely to be?
+
* A box contains 6 balls; 2 are red, 2 are white, and 2 are blue. Four balls are picked at random, one at a time. Each time a ball is picked, the color is recorded, and the ball is put back in the box. If the first 3 balls are red, what color is the fourth ball most likely to be?
 
: (a) Red
 
: (a) Red
 
: (b) White
 
: (b) White
Line 75: Line 135:
 
: (e) Red, blue, and white are all equally likely.
 
: (e) Red, blue, and white are all equally likely.
  
* A coin is tossed 400 times and 170 heads are observed. This coin is__ ?
+
* A coin is tossed 400 times and 170 heads are observed. This coin is __ ?
: (a) fair, because the probability of seeing that amount of heads or less is approximately 0.0013
+
: (a) Fair, because the probability of seeing that many heads or fewer is approximately 0.0013.
: (b) neither fair or unfair. There is not enough information to determine that.
+
: (b) Neither fair nor unfair. There is not enough information to determine that.
: (c) fair, because the probability of seeing that amount of heads or less is approximately 0.5
+
: (c) Fair, because the probability of seeing that many heads or fewer is approximately 0.5.
: (d) not fair, because the probability of seeing that amount of heads or less is close to 0.
+
: (d) Not fair, because the probability of seeing that many heads or fewer is close to 0.
  
 
* If two events are independent, then they are automatically mutually exclusive.
 
* If two events are independent, then they are automatically mutually exclusive.
Line 109: Line 169:
 
: (e) $2!/5!$
 
: (e) $2!/5!$
  
* In a university with 20,000 students, 20% are engineering students, 40% are in the sciences, 30% are in the social sciences, and the rest have other majors. The counselors in the registrar's office want to survey the opinions of students on the issue of posting grades on-line and they seek opinions from students of various majors. They conduct a survey by randomly selecting students. Among the first three students selected, what is the probability that two of the three major in social sciences and one has a major other than social science?
+
* In a university with 20,000 students, 20% are engineering students, 40% are in the sciences, 30% are in the social sciences, and the rest are in other majors. The counselors in the registrar's office want to survey the opinions of students on the issue of posting grades on-line, and they seek opinions from students in various majors. They conduct a survey by randomly selecting students. Among the first three students selected, what is the probability that two of the three major in social sciences and one has a major other than social science?
 
: (a) 0.600
 
: (a) 0.600
 
: (b) 0.189
 
: (b) 0.189
Line 115: Line 175:
 
: (d) 0.063
 
: (d) 0.063
  
* Every five years the Conference Board of Mathematical Sciences surveys college math departments. In a recent report, 51% of all undergraduates taking calculus were in classes using graphing calculators and 31% were in classes using computer assignments. Suppose that 16% of these students use both calculator and computer. What proportion of undergraduates taking calculus use no technology?
+
* Every five years, the Conference Board of Mathematical Sciences surveys college math departments. In a recent report, 51% of all undergraduates taking calculus were in classes using graphing calculators, and 31% were in classes using computer assignments. Suppose that 16% of these students use both calculators and computers. What proportion of undergraduates taking calculus uses no technology?
 
: (a) 0.44
 
: (a) 0.44
 
: (b) 0.82
 
: (b) 0.82
Line 127: Line 187:
 
: (c) X > Y
 
: (c) X > Y
  
* [[SOCR_EduMaterials_Activities_PokerExperiment|Poker game]]: What is number of hands of Full house where you have patterns like AAABB and A and B are from distinct kinds? What is number of hands of two pairs where you have patterns like AABBC and A, B and C are distinct kinds? What is total number of 5-card hands?
+
* [[SOCR_EduMaterials_Activities_PokerExperiment|Poker game]]: How many hands contain a full house (i.e., an AAABB-type pattern, where A and B have distinct values)? How many hands contain two pairs (i.e., an AABBC-type pattern, where A, B and C have distinct values)? What is the total number of 5-card hands?
 
 
  
 
===References===
 
===References===
Line 134: Line 193:
 
* [http://en.wikipedia.org/wiki/Probability  Probability Wikipedia]
 
* [http://en.wikipedia.org/wiki/Probability  Probability Wikipedia]
 
* [[Probability_and_statistics_EBook#Chapter_III:_Probability|SOCR EBook: Probability Chapter]]
 
* [[Probability_and_statistics_EBook#Chapter_III:_Probability|SOCR EBook: Probability Chapter]]
* [[AP_Statistics_Curriculum_2007_Prob_Count||SOCR EBook: Counting Examples]]
+
* [[AP_Statistics_Curriculum_2007_Prob_Count|SOCR EBook: Counting Examples]]
  
 
<hr>
 
<hr>

Latest revision as of 13:44, 24 March 2015

Scientific Methods for Health Sciences - Probability Theory

Overview

Probability theory plays an important role in statistics and its applications to many other disciplines because it provides the theoretical groundwork for statistical inference. Probability theory is the mathematical analysis of random phenomena. Its central objects are random variables, stochastic processes, and events. Consider an individual coin toss, which can be regarded as a random event; if it is repeated many times, the sequence of outcomes will exhibit certain patterns. Probability theory helps us study and predict those patterns. Probability distributions are often divided into two categories: discrete probability distributions and continuous probability distributions. We will study these later in the Distribution section. Here, we introduce the fundamental concepts of probability theory and define the rules of probability that we will apply in subsequent sections.

Motivation

Imagine that you are performing an experiment in which a number of outcomes are produced. This set of outcomes is called the sample space, and the power set of the sample space includes all the different collections of possible results from the experiment. Suppose we are rolling a fair die, which has 6 possible outcomes. The sample space is {1, 2, 3, 4, 5, 6}. An event is any collection of the possible results. For example, the event of rolling an even number involves the subset {2, 4, 6}, which is an element of the power set of the sample space in this experiment. What if we want to estimate the chance of rolling three 2’s in a row or the chance of rolling an odd number in an experiment? Probability is a way of assigning every event a value between 0 and 1; this value represents the chance that the event will occur.

Theory

Random Sampling

A simple random sample of $n$ items is a sample in which every member of the population has an equal chance of being selected and the members of the sample are chosen independently. For example, consider a survey for which 100 students are selected from a total of 5000 students, and in which the chances of being selected are the same for each student. This is a simple example of random sampling, and a common application of this can be found in random number generators.
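
The survey example above can be sketched with a pseudo-random number generator. The following is a minimal Python illustration, assuming the 5000 students are identified by hypothetical ID numbers 1 to 5000; the seed value is arbitrary and is fixed only so that the draw can be reproduced. Note that random.sample draws without replacement, so each student appears at most once and every subset of 100 students is equally likely.

```python
import random

population = list(range(1, 5001))   # hypothetical student IDs 1..5000
rng = random.Random(2015)           # arbitrary fixed seed for reproducibility

# Draw a simple random sample of 100 distinct students
sample = rng.sample(population, k=100)

# Each student has the same chance of selection: 100 / 5000
selection_probability = len(sample) / len(population)
print(len(sample), selection_probability)
```

Removing the fixed seed (for example, by using random.Random() with no argument) would give a different sample on each run.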

Types of probabilities

Probability models have two components: sample space and probabilities.

  • The sample space (S) for a random experiment is the set of all possible outcomes of the experiment.
    • Event: A collection of outcomes.
    • An event is said to occur if any outcome making up that event occurs.
  • Probabilities are assigned to each event in the sample space.
    • Probabilities may come from models, i.e., mathematical and/or physical descriptions of the sample space and the chance of each event. An example is a game of tossing a fair die.
    • Probabilities may be derived from data, in which case the observations determine the probability distribution. An example is tossing a coin 50 times and recording the number of heads.
  • Subjective probabilities: combining data and psychological factors to design a reasonable probability table. An example may be the stock market.

Axioms of probability

  • First axiom: The probability of an event is a non-negative real number.
  • Second axiom: The probability that some elementary event in the entire sample space will occur is 1. More specifically, there are no elementary events outside the sample space: $P(S)=1$.
  • Third axiom: Any countable sequence of pairwise disjoint events $E_1, E_2, E_3, \ldots$ satisfies $P(E_1 \cup E_2 \cup E_3 \cup \cdots) = \sum_i P(E_i)$.

Event manipulations

  • Complement: The complement of event $A$ is denoted as $A^c$ or $A'$. It occurs if and only if $A$ does not occur. The union of $A$ and $A^c$ makes up the entire sample space ($S$).
  • Union: $A\cup B$ contains all outcomes in $A$ or $B$ (or both). $ P(A\cup B)=P(A)+P(B)-P(A\cap B). $
  • Intersection: $A\cap B$ contains all outcomes which are in both $A$ and $B$.
  • Mutually exclusive events are events that cannot occur at the same time, $A\cap B =\emptyset$.
  • Conditional Probability: The conditional probability of event $A$ occurring, given that event $B$ occurs, is $ P(A|B)=\frac{P(A\cap B)}{P(B)} $, provided that $P(B)>0$. If $A$ and $B$ are independent, then knowing $B$, or $B^c$, gives no information on the probability of $A$, i.e., $ P(A|B)=P(A) $.
  • Multiplication rule: For any two events, $A$ and $B$, $ P(A\cap B)=P(A|B)P(B) $. In general, for $n$ events $A_1, \ldots, A_n$: $ P(A_1 \cap A_2 \cap \cdots \cap A_n ) = P(A_1)P(A_2|A_1)P(A_3|A_1\cap A_2) \cdots P(A_n|A_1\cap A_2\cap \cdots \cap A_{n-1}) $.
  • Law of total probability: $P(B)=P(B|A_1)P(A_1)+P(B|A_2)P(A_2)+\cdots+P(B|A_n)P(A_n) $, where the events $ \{A_1,\ldots,A_n\} $ partition the sample space $S$.
  • Inverting the order of conditioning: $ P(A \cap B) = P(A | B) \times P(B) = P(B | A) \times P(A) $.
    • Suppose a laboratory blood test is used as evidence for a disease. Assume $P(\text{positive test} | \text{disease}) = 0.95$, $P(\text{positive test} | \text{no disease}) = 0.01$, and $P(\text{disease}) = 0.005$. Find $P(\text{disease} | \text{positive test})$. Denote $D$ = {the tested person has the disease}, $D^c$ = {the tested person does not have the disease}, and $T$ = {the test result is positive}. Then

$$P(D | T) = {P(T | D) P(D) \over P(T)} = {P(T | D) P(D) \over P(T|D)P(D) + P(T|D^c)P(D^c)} = {0.95\times 0.005 \over {0.95\times 0.005 +0.01\times 0.995}} \approx 0.3231.$$

  • Bayesian Rule: If $ \{A_1,\ldots,A_n\} $ partition the sample space $S$, and $A$ and $B$ are any events (i.e., subsets of $S$), then we have:

$$ P(A | B) = {P(B | A) P(A) \over P(B)} = {P(B | A) P(A) \over P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \cdots + P(B|A_n)P(A_n)}. $$
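
The Bayesian rule and the law of total probability can be combined in a few lines of code. The sketch below is illustrative only: the function name `bayes_posteriors` is an assumption for this example, and it reuses the numbers from the blood-test example, with the partition $\{D, D^c\}$.

```python
def bayes_posteriors(priors, likelihoods):
    """Return P(A_i | B) for every event A_i of a partition.

    priors[i]      = P(A_i); the priors must sum to 1
    likelihoods[i] = P(B | A_i)
    """
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("the priors of a partition must sum to 1")
    joint = [like * prior for like, prior in zip(likelihoods, priors)]
    p_b = sum(joint)  # law of total probability: P(B)
    return [j / p_b for j in joint]


# Blood-test example: partition {D, D^c}, B = positive test result.
p_disease, p_no_disease = bayes_posteriors(
    priors=[0.005, 0.995],      # P(D), P(D^c)
    likelihoods=[0.95, 0.01],   # P(T | D), P(T | D^c)
)
print(round(p_disease, 4))  # 0.3231
```

Even with a sensitive test, the posterior probability of disease is only about one third here, because the disease is rare and the false positives from the large healthy group outnumber the true positives.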

Counting

Counting principles are very useful in probability theory. Consider picking 3 students from a total of 26 students named A to Z.

  • Permutation: A permutation is an arrangement of objects in a distinguishable sequence. Each unique ordering is called a permutation. For example, the sequence $(A, B, D)$ is different from $(D, A, B)$. There are $3!=6$ permutations of students A, B and D.
    • Permutation with repetitions (replacement): If the ordering of objects matters and an object can be chosen more than once, then the number of permutations is $ n^r $, where $n$ is the number of objects from which you can choose and $r$ is the number of objects you choose. In our example above, we have $ 26^3 = 17576 $ permutations with repetitions.
    • Permutation without repetitions (replacement): If the order matters and each object can be chosen only once, then the number of permutations is $ n(n-1)\cdots(n-r+1)=\frac{n!}{(n-r)!} $, where $n$ is the number of objects you can choose from and $r$ is the number of objects you choose. In our example above, we have $26\times 25\times 24 = 15600$ permutations without repetitions.
  • Combinations: A combination is an unordered collection of unique objects. In our example above, {A, B, D} is the same as {D, B, A}.
    • Combinations with repetitions (replacement): This is the case in which order does not matter, and an object can be chosen more than once. The number of combinations is $ {n+r-1 \choose r}= \frac{(n+r-1)!}{r!(n-1)!} $. In the example above, we have $\frac{(26+3-1)!}{3!(26-1)!}=3276$ combinations with repetitions.
    • Combinations without repetitions (replacement): This is the case in which the order does not matter, and an object can be chosen only once. The number of combinations is $ {n \choose r}=\frac{n!}{r!(n-r)!} $, where $n$ is the number of objects you can choose from and $r$ is the number of objects you choose. In our example, we have $ {26 \choose 3}=2600 $ combinations without repetitions.
  • Examples of combinations and permutations applications.
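
The four counting formulas above can be checked for the 26-student example, both with the closed-form expressions and by brute-force enumeration. This is a minimal sketch using the Python standard library (`math.perm` and `math.comb` require Python 3.8 or later); the variable names are chosen for this illustration.

```python
import math
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from string import ascii_uppercase

students = ascii_uppercase  # 26 students named A to Z
n, r = len(students), 3

# Closed-form counts.
perm_with_rep = n ** r                   # order matters, repeats allowed
perm_without_rep = math.perm(n, r)       # n! / (n - r)!
comb_with_rep = math.comb(n + r - 1, r)  # (n + r - 1)! / (r! (n - 1)!)
comb_without_rep = math.comb(n, r)       # n! / (r! (n - r)!)


def count(iterable):
    """Count the items produced by an iterator without storing them."""
    return sum(1 for _ in iterable)


# Brute-force enumeration agrees with every formula.
assert count(product(students, repeat=r)) == perm_with_rep
assert count(permutations(students, r)) == perm_without_rep
assert count(combinations_with_replacement(students, r)) == comb_with_rep
assert count(combinations(students, r)) == comb_without_rep

print(perm_with_rep, perm_without_rep, comb_with_rep, comb_without_rep)
# 17576 15600 3276 2600
```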

Independence vs. disjointness/mutual exclusiveness

The events $A$ and $B$ are independent if $ P(A|B)=P(A)$, that is, $ P(A\cap B)=P(A)P(B) $. The events $C$ and $D$ are disjoint, or mutually exclusive, if $ C\cap D=\emptyset $, so that $ P(C\cap D)=0 $ and $ P(C\cup D)=P(C)+P(D) $.

These two concepts are different and should not be conflated. If two events are mutually exclusive, they cannot happen together (i.e., $P(A|B)=0$). The occurrence of one therefore provides information about the probability of the other, so mutually exclusive events with non-zero probabilities cannot be independent.

Consider the SOCR poker game. If we know the card we picked randomly is a queen, then the event that it is a red queen and the event that it is a black queen (both given that it is a queen) are mutually exclusive, and therefore not independent: learning that the queen is red rules out that it is black. By contrast, the event that a card is black and the event that it is a spade are not mutually exclusive, since every spade is black.
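
Both properties can be tested directly on a standard 52-card deck. The sketch below is an illustration under stated assumptions: one card is drawn uniformly at random from the full deck (without conditioning on the card being a queen), and the helper names `P`, `independent`, and `disjoint` are made up for this example.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
DECK = frozenset(product(RANKS, SUITS))  # 52 equally likely cards


def P(event):
    """Probability that one randomly drawn card belongs to the event."""
    return Fraction(len(event), len(DECK))


def independent(a, b):
    return P(a & b) == P(a) * P(b)


def disjoint(a, b):
    return not (a & b)  # empty intersection


queen = {card for card in DECK if card[0] == "Q"}
heart = {card for card in DECK if card[1] == "hearts"}
spade = {card for card in DECK if card[1] == "spades"}
red = {card for card in DECK if card[1] in ("hearts", "diamonds")}
black = DECK - red

red_queen, black_queen = queen & red, queen & black

print(independent(queen, heart), disjoint(queen, heart))               # True False
print(independent(red_queen, black_queen), disjoint(red_queen, black_queen))  # False True
print(independent(black, spade), disjoint(black, spade))               # False False
```

The three pairs cover three different situations: independent but not disjoint, disjoint but not independent, and neither.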

Contingency Tables

Contingency tables provide data summaries that can be used to compute various probabilities of interest (e.g., determining conditional or marginal probabilities). These tables display sample values indexed by two different variables that may be dependent (contingent) on one another.

Melanoma Example

Here are the data for 400 melanoma (skin cancer) patients, classified by type and site:

Type                              Site                                        Totals
                                  Head and Neck    Trunk    Extremities
Hutchinson's melanomic freckle    22               2        10                34
Superficial                       16               54       115               185
Nodular                           19               33       73                125
Indeterminate                     11               17       28                56
Column Totals                     68               106      226               400

  • Suppose we select one out of the 400 patients in the study and we want to find the probability that the cancer is on the extremities given that it is of the nodular type: $ P(\text{Extremities} | \text{Nodular}) = 73/125 = 0.584 $.
  • What is the probability that for a randomly chosen patient the cancer type is Superficial given that it appears on the Trunk?
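
Conditional probabilities such as these are ratios of a cell count to a row or column total. The sketch below encodes the melanoma table and computes both kinds of conditional probability; the names `SITES`, `COUNTS`, and the helper functions are assumptions made for this illustration.

```python
from fractions import Fraction

SITES = ["Head and Neck", "Trunk", "Extremities"]
COUNTS = {  # rows: cancer type; columns follow the order of SITES
    "Hutchinson's melanomic freckle": [22, 2, 10],
    "Superficial": [16, 54, 115],
    "Nodular": [19, 33, 73],
    "Indeterminate": [11, 17, 28],
}


def cell(cancer_type, site):
    return COUNTS[cancer_type][SITES.index(site)]


def row_total(cancer_type):
    return sum(COUNTS[cancer_type])


def column_total(site):
    return sum(cell(t, site) for t in COUNTS)


def grand_total():
    return sum(row_total(t) for t in COUNTS)


def p_site_given_type(site, cancer_type):
    """P(site | type): restrict attention to one row of the table."""
    return Fraction(cell(cancer_type, site), row_total(cancer_type))


def p_type_given_site(cancer_type, site):
    """P(type | site): restrict attention to one column of the table."""
    return Fraction(cell(cancer_type, site), column_total(site))


print(grand_total())                                # 400
print(p_site_given_type("Extremities", "Nodular"))  # 73/125
print(p_type_given_site("Superficial", "Trunk"))
```

Conditioning on a type divides by a row total, while conditioning on a site divides by a column total; mixing up the two denominators is the most common mistake with contingency tables.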

Obesity Example

The table below shows a random sample of 103 people and their BMI (body mass index, an important biomarker of obesity and diabetes):

Cohort \ BMI    <22    23-26    27-30    >30    Total
Children        12     9        ___      3      30
Adults          ___    ___      14       5      30
Elderly         4      ___      12       3      43
Total           19     41       ___      ___    ___

Complete the table using the given information, then answer the following:

  • Compute the probability that a randomly chosen subject is a child with a BMI <27, $ P(\text{Child and BMI}<27) $:
  • Compute the probability that a randomly chosen subject is an adult or has a BMI over 30, $ P(\text{Adult or BMI}>30) $:
  • Compute the probability that a randomly chosen subject is normal (22<BMI<27) given that the subject is elderly, $ P(22<\text{BMI}<27 | \text{Elderly}) $:

Applications

  • Review the theory and the simulation of the Monty-Hall Problem.
  • Try various SOCR simulations.
  • This website introduces an application of probability theory through simulation. Many practical examples require probability computations of complex events. Such calculations may be carried out exactly, using the rules of probability, or approximately, using estimation and/or simulations. SOCR simulations may be used to compute approximate probabilities for various processes and to compare these empirical probabilities to their exact counterparts. The article includes examples of a Ball and Urn Experiment, Binomial Coin Toss Experiment, Card Experiment, Roulette Experiment, and Chuck A Luck Experiment. It is a valuable source for practicing simulations using probability theory.
  • This website offers a list of interesting articles on the topic of probability theory. It includes a general introduction to the history of probability theory and addresses a wide list of articles of the application of probability in different areas including business, medicine, economics, and biology. These short articles are a good starting place to learn about applications of probability theory in various fields.
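
As one illustration of comparing empirical and exact probabilities, the sketch below simulates the Monty-Hall Problem. It is not the SOCR simulation: the function names are made up for this example, and it assumes the standard rules (three doors, the host always opens a door that hides no prize and was not picked, choosing at random when two such doors are available).

```python
import random

DOORS = (0, 1, 2)


def monty_hall_trial(switch, rng):
    """Play one game and return True if the contestant wins the prize."""
    prize = rng.choice(DOORS)
    pick = rng.choice(DOORS)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in DOORS if d != pick and d != prize])
    if switch:
        # Switch to the only door that is neither picked nor opened.
        pick = next(d for d in DOORS if d != pick and d != opened)
    return pick == prize


def estimate_win_probability(switch, n_trials=50_000, seed=0):
    """Relative frequency of wins over n_trials simulated games."""
    rng = random.Random(seed)
    wins = sum(monty_hall_trial(switch, rng) for _ in range(n_trials))
    return wins / n_trials


print("stay:  ", estimate_win_probability(switch=False))  # exact value is 1/3
print("switch:", estimate_win_probability(switch=True))   # exact value is 2/3
```

The empirical frequencies should be close to the exact probabilities, and the gap typically shrinks as `n_trials` increases.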

Software

Problems

  • A box contains 6 balls; 2 are red, 2 are white, and 2 are blue. Four balls are picked at random, one at a time. Each time a ball is picked, the color is recorded, and the ball is put back in the box. If the first 3 balls are red, what color is the fourth ball most likely to be?
(a) Red
(b) White
(c) Blue
(d) Blue and white are equally likely and more likely than red.
(e) Red, blue, and white are all equally likely.
  • A coin is tossed 400 times and 170 heads are observed. This coin is __ ?
(a) Fair, because the probability of seeing that amount of heads or less is approximately 0.0013.
(b) Neither fair nor unfair. There is not enough information to determine that.
(c) Fair, because the probability of seeing that amount of heads or less is approximately 0.5
(d) Not fair, because the probability of seeing that amount of heads or less is close to 0.
  • If two events are independent, then they are automatically mutually exclusive.
(a) True
(b) False
  • If two events are mutually exclusive, then the sum of their probabilities is 1.
(a) True
(b) False
  • A professor who teaches 500 students in an introductory psychology course reports that 250 of the students have taken at least one introductory statistics course, and the other 250 have not taken any statistics courses. 200 of the students were freshmen, and the other 300 students were not freshmen. Exactly 50 of the students were freshmen who had taken at least one introductory statistics course. If you select one of these psychology students at random, what is the probability that the student is not a freshman and has never taken a statistics course?
(a) 30%
(b) 40%
(c) 50%
(d) 60%
(e) 20%
  • A professor who teaches 300 students in an introductory psychology course reports that 135 of the students have taken exactly one introductory statistics course, 60 have taken two or more introductory statistics courses, and the other 105 have not taken any statistics courses. If you select one of these psychology students at random, what is the probability that the student has taken at least one statistics class?
(a) 0.20
(b) 0.45
(c) 0.65
(d) 0.35
  • In a carnival game, a person can win a prize by guessing which one of 5 identical boxes contains the prize. After each guess, if the prize has been won, a new prize is randomly placed in one of the 5 boxes. If a person makes 4 guesses, what is the probability that the person wins a prize exactly twice?
(a) $(0.2)^2/(0.8)^2$
(b) $2(0.2)^2\times(0.8)^2$
(c) $6(0.2)^2\times(0.8)^2$
(d) $(0.2)^2\times(0.8)^2$
(e) $2!/5!$
  • In a university with 20,000 students, 20% are engineering students, 40% are in the sciences, 30% are in the social sciences, and the rest are in other majors. The counselors in the registrar's office want to survey the opinions of students on the issue of posting grades on-line, and they seek opinions from students in various majors. They conduct a survey by randomly selecting students. Among the first three students selected, what is the probability that two of the three major in social sciences and one has a major other than social science?
(a) 0.600
(b) 0.189
(c) 0.090
(d) 0.063
  • Every five years, the Conference Board of Mathematical Sciences surveys college math departments. In a recent report, 51% of all undergraduates taking calculus were in classes using graphing calculators, and 31% were in classes using computer assignments. Suppose that 16% of these students use both calculators and computers. What proportion of undergraduates taking calculus uses no technology?
(a) 0.44
(b) 0.82
(c) 0.66
(d) 0.34
(e) 0.16
  • Two cards are dealt to you (without replacement) from an ordinary well-shuffled deck. Let X = the probability that you have a pair. Let Y = the probability that both of your cards are diamonds. Compare X and Y.
(a) X < Y
(b) X = Y
(c) X > Y
  • Poker game: How many hands would contain a full house with an AAABB-type pattern, where A and B have distinct values? How many hands are there with two pairs (i.e., an AABBC-type pattern), where A, B and C have distinct values? What is the total number of 5-card hands?

References