# SMHS Probability

## Scientific Methods for Health Sciences - Probability Theory

### Overview

Probability theory plays an important role in statistics and in its application to many other disciplines because it provides the theoretical groundwork for statistical inference. It is the mathematical analysis of random phenomena, and its central objects are random variables, stochastic processes, and events. Consider an individual coin toss, which can be regarded as a random event; if the toss is repeated many times, the sequence of outcomes exhibits certain patterns. Probability theory helps us study and predict those patterns. Probability distributions are often divided into two categories, discrete and continuous; we will study these later in the Distribution section. Here, we study the fundamental concepts of probability theory and define the rules of probability that we will apply in subsequent sections.

### Motivation

Imagine that you are performing an experiment that can produce any one of a number of outcomes. The set of all possible outcomes is called the sample space, and the power set of the sample space contains all the different collections of possible results from the experiment. Suppose we are rolling a fair die, which has 6 possible outcomes. The sample space is {1, 2, 3, 4, 5, 6}. An event is any collection of the possible results. For example, the event of rolling an even number corresponds to the subset {2, 4, 6}, which is an element of the power set of the sample space in this experiment. What if we want to find the chance of rolling three 2's in a row, or the chance of rolling an odd number? Probability is a way of assigning every event a value between 0 and 1; this value represents the chance that the event will occur.
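
The die example can be made concrete with a minimal Python sketch. The variable names are illustrative and not part of the original text, and the sketch assumes that the three rolls are independent, so that all ordered triples are equally likely. It enumerates the sample space, counts the events in its power set, and computes the two chances mentioned above by counting outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space for one roll of a fair die (all outcomes equally likely).
sample_space = {1, 2, 3, 4, 5, 6}

# The power set of a 6-element set contains 2**6 events.
num_events = 2 ** len(sample_space)

# Event "roll an odd number" as a subset of the sample space.
odd = {x for x in sample_space if x % 2 == 1}
p_odd = Fraction(len(odd), len(sample_space))

# Three rolls: enumerate all 6**3 equally likely ordered triples
# and count those that show three 2's in a row.
triples = list(product(sample_space, repeat=3))
three_twos = [t for t in triples if t == (2, 2, 2)]
p_three_twos = Fraction(len(three_twos), len(triples))

print(num_events)    # 64
print(p_odd)         # 1/2
print(p_three_twos)  # 1/216
```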

### Theory

#### Random Sampling

A simple random sample of $n$ items is a sample in which every member of the population has an equal chance of being selected and the members of the sample are chosen independently. For example, consider a survey for which 100 students are selected to take the questionnaire from a total of 5000 students, and the chance of being selected is the same for each student. This is a simple example of random sampling. In practice, such samples are commonly drawn using random number generators.
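
A minimal sketch of the survey example, assuming the 5000 students are identified by the hypothetical ID numbers 1 to 5000 and that students are drawn without replacement (no student is surveyed twice). It uses Python's standard `random.sample`, under which every subset of 100 students is equally likely.

```python
import random

# Hypothetical population: student IDs 1..5000 (illustrative only).
population = list(range(1, 5001))

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Draw 100 distinct students; every student has the same chance of selection.
sample = rng.sample(population, k=100)

print(len(sample), len(set(sample)))  # 100 100
```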

#### Types of probabilities

Probability models have two components: sample space and probabilities.

• The sample space (S) for a random experiment is the set of all possible outcomes of the experiment.
• Event: An event is a collection of outcomes.
• An event is said to occur if any outcome making up that event occurs.
• Probabilities: each event in the sample space is assigned a probability. These probabilities may arise in several ways:
• Probabilities may come from models, i.e., mathematical and/or physical descriptions of a sample space and the chance of each event. An example is a game of tossing a fair die.
• Probabilities may be derived from data: observations can be used to estimate a probability distribution. An example is tossing a coin 50 times and recording the number of heads.
• Probabilities may be subjective, combining data and psychological factors to design a reasonable probability table. An example may be the stock market.
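
As a rough illustration of a data-derived probability, the coin example can be simulated. This is only a sketch: the coin is assumed to be fair, the seed is arbitrary, and the relative frequency of heads serves as the estimate of $P(\text{heads})$.

```python
import random

rng = random.Random(1)  # arbitrary seed, used only for reproducibility

n_tosses = 50
# Simulate a fair coin: each toss is heads with probability 0.5.
tosses = [rng.random() < 0.5 for _ in range(n_tosses)]
heads = sum(tosses)

# The relative frequency of heads is the data-derived probability estimate.
p_heads_estimate = heads / n_tosses
print(heads, p_heads_estimate)
```

With only 50 tosses the estimate will typically differ somewhat from 0.5; increasing `n_tosses` brings it closer.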

#### Axioms of probability

• First axiom: The probability of an event is a non-negative real number.
• Second axiom: The probability that some elementary event in the entire sample space will occur is 1; that is, there are no elementary events outside the sample space: $P(S)=1$.
• Third axiom: Any countable sequence of pairwise disjoint events $E_1, E_2, E_3, \ldots$ satisfies $P(E_1 \cup E_2 \cup E_3 \cup \cdots) = \sum_i P(E_i)$.
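
As a short worked consequence of these axioms, consider any event $A$ and the event $A^c$ consisting of all outcomes in $S$ that are not in $A$. The two events are disjoint and their union is $S$, so the second and third axioms (the latter applied to just these two events) give:

$$1 = P(S) = P(A \cup A^c) = P(A) + P(A^c) \quad\Longrightarrow\quad P(A^c) = 1 - P(A).$$

Combined with the first axiom, $P(A^c) \geq 0$, this also shows that $P(A) \leq 1$ for every event $A$.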

#### Event manipulations

• Complement: The complement of event $A$ is denoted as $A^c$ or $A'$. It occurs if and only if $A$ does not occur. The union of $A$ and $A^c$ makes up the entire sample space ($S$).
• Union: $A\cup B$ contains all outcomes in $A$ or $B$ (or both). $P(A\cup B)=P(A)+P(B)-P(A\cap B).$
• Intersection: $A\cap B$ contains all outcomes which are in both $A$ and $B$.
• Mutually exclusive events are events that cannot occur at the same time, $A\cap B =\emptyset$.
• Conditional probability: The conditional probability of event $A$ occurring given that event $B$ occurs is $P(A|B)=\frac{P(A\cap B)}{P(B)}$, provided that $P(B)>0$. If $A$ and $B$ are independent, then knowing $B$, or $B^c$, gives no information on the probability of $A$, i.e., $P(A|B)=P(A)$.
• Multiplication rule: For any two events, $A$ and $B$, $P(A\cap B)=P(A|B)P(B)$. In general, for $n$ events $A_1, \ldots, A_n$: $P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1\cap A_2) \cdots P(A_n|A_1\cap A_2\cap \cdots \cap A_{n-1})$.
• Law of total probability: $P(B)=P(B|A_1)P(A_1)+P(B|A_2)P(A_2)+\cdots+P(B|A_n)P(A_n)$, where the events $\{A_1,\ldots,A_n\}$ partition the sample space $S$.
• Inverting the order of conditioning: $P(A \cap B) = P(A | B) \times P(B) = P(B | A) \times P(A)$.
• Example: Suppose a laboratory blood test is used as evidence for a disease. Assume $P(\text{positive test} | \text{disease}) = 0.95$, $P(\text{positive test} | \text{no disease}) = 0.01$ and $P(\text{disease}) = 0.005$. Find $P(\text{disease} | \text{positive test})$. Denote $D$ = {the tested person has the disease}, $D^c$ = {the tested person does not have the disease} and $T$ = {the test result is positive}. Then

$$P(D | T) = {P(T | D) P(D) \over P(T)} = {P(T | D) P(D) \over P(T|D)P(D) + P(T|D^c)P(D^c)} = {0.95\times 0.005 \over 0.95\times 0.005 + 0.01\times 0.995} \approx 0.3231.$$

• Bayesian Rule: If $\{A_1,\ldots,A_n\}$ partition the sample space $S$, and $A$ and $B$ are any events (i.e., subsets of $S$) with $P(B)>0$, then we have:

$$P(A | B) = {P(B | A) P(A) \over P(B)} = {P(B | A) P(A) \over P(B|A_1)P(A_1) + P(B|A_2)P(A_2) + \cdots + P(B|A_n)P(A_n)}.$$
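
The law of total probability and the Bayesian rule can be sketched in a few lines of Python. The function names `total_probability` and `bayes_posterior` are hypothetical helpers introduced here for illustration; the numbers are those of the blood-test example, with the partition $\{D, D^c\}$.

```python
def total_probability(likelihoods, priors):
    """P(B) = sum_i P(B | A_i) P(A_i) for a partition A_1, ..., A_n."""
    return sum(lik * pri for lik, pri in zip(likelihoods, priors))


def bayes_posterior(likelihoods, priors, k):
    """P(A_k | B) via the Bayesian rule and the law of total probability."""
    return likelihoods[k] * priors[k] / total_probability(likelihoods, priors)


# Blood-test example: index 0 is D (disease), index 1 is D^c (no disease).
priors = [0.005, 0.995]      # P(D), P(D^c)
likelihoods = [0.95, 0.01]   # P(T | D), P(T | D^c)

p_positive = total_probability(likelihoods, priors)
p_disease_given_positive = bayes_posterior(likelihoods, priors, 0)

print(round(p_positive, 4))                # 0.0147
print(round(p_disease_given_positive, 4))  # 0.3231
```

Even after a positive test, the probability of disease is only about 32%, because the disease is rare and false positives among the many healthy people outnumber the true positives.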

#### Counting

Counting principles are very useful in probability theory. Consider picking 3 students from a total of 26 students named A to Z.

• Permutation: A permutation is an arrangement of objects in a distinguishable sequence; each unique ordering is a different permutation. For example, the ordering $(A, B, D)$ is different from $(D, A, B)$. There are $3!=6$ permutations of students A, B and D.
• Permutation with repetitions (replacement): If the ordering of objects matters and an object can be chosen more than once, then the number of permutations is $n^r$, where $n$ is the number of objects from which you can choose and $r$ is the number of objects you choose. In our example above, we have $26^3 = 17576$ permutations with repetitions.
• Permutation without repetitions (replacement): If the order matters and each object can be chosen only once, then the number of permutations is $n(n-1)\cdots(n-r+1)=\frac{n!}{(n-r)!}$, where $n$ is the number of objects you can choose from and $r$ is the number of objects you choose. In our example above, we have $26 \times 25 \times 24 = 15600$ permutations without repetitions.
• Combinations: A combination is an un-ordered collection of unique objects. In our example above, {A, B, D} is the same as {D, B, A}.
• Combinations with repetitions (replacement): This is the case in which order does not matter, and an object can be chosen more than once. The number of combinations is ${n+r-1 \choose r}= \frac{(n+r-1)!}{r!(n-1)!}$. In the example above, we have $\frac{(26+3-1)!}{3!(26-1)!}=3276$ combinations with repetitions.
• Combinations without repetitions (replacement): This is the case in which the order does not matter, and an object can be chosen only once. The number of combinations is ${n \choose r}=\frac{n!}{r!(n-r)!}$, where $n$ is the number of objects you can choose from and $r$ is the number of objects you choose. In our example, we have ${26 \choose 3}=2600$ combinations without repetitions.
• Examples of combinations and permutations applications.
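
The four counting formulas can be checked numerically for the 26-student example. The sketch below assumes Python 3.8 or later, where `math.perm` and `math.comb` are available, and cross-checks two of the counts by brute-force enumeration with `itertools`.

```python
import math
import string
from itertools import combinations, combinations_with_replacement, permutations

n, r = 26, 3  # choose 3 students from the 26 students named A to Z

perm_with_rep = n ** r                    # ordered, repetition allowed
perm_without_rep = math.perm(n, r)        # ordered, no repetition
comb_with_rep = math.comb(n + r - 1, r)   # unordered, repetition allowed
comb_without_rep = math.comb(n, r)        # unordered, no repetition

print(perm_with_rep, perm_without_rep, comb_with_rep, comb_without_rep)
# 17576 15600 3276 2600

# Cross-check two of the counts by listing the selections explicitly.
students = string.ascii_uppercase
enumerated_comb = len(list(combinations(students, r)))
enumerated_comb_rep = len(list(combinations_with_replacement(students, r)))
print(enumerated_comb, enumerated_comb_rep)  # 2600 3276

# Orderings of the three students A, B and D.
print(len(list(permutations("ABD"))))  # 6
```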

#### Independence vs. disjointness/mutual exclusiveness

The events $A$ and $B$ are independent if $P(A|B)=P(A)$, that is $P(A\cap B)=P(A)P(B)$. The events $C$ and $D$ are disjoint or mutually exclusive if $P(C\cap D)=0$, that is $P(C\cup D)=P(C)+P(D)$.

These two concepts are different and should not be conflated. If two events are mutually exclusive, they cannot happen together (i.e., $P(A|B)=0$). The occurrence of one therefore provides information about the probability of the other, so mutually exclusive events with non-zero probabilities cannot be independent.

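Consider drawing one card at random from a standard 52-card deck, as in the SOCR poker game. The events "the card is a queen" and "the card is red" are independent, because $P(\text{queen} \cap \text{red}) = 2/52 = (4/52)(26/52)$. By contrast, the events "the card is a red queen" and "the card is a black queen" are mutually exclusive, and therefore not independent. The event that the card is black is not mutually exclusive from the event that it is a spade, since every spade is black.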
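
The distinction between independence and mutual exclusiveness can be checked by enumeration. The following sketch assumes a single card drawn at random from a standard 52-card deck; the helper names (`prob`, `independent`, `mutually_exclusive`) are illustrative only.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 equally likely (rank, suit) cards


def prob(event):
    """Probability of an event given as a true/false function of a card."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def is_queen(card):
    return card[0] == "Q"


def is_red(card):
    return card[1] in ("hearts", "diamonds")


def is_black(card):
    return not is_red(card)


def is_spade(card):
    return card[1] == "spades"


def is_red_queen(card):
    return is_queen(card) and is_red(card)


def is_black_queen(card):
    return is_queen(card) and is_black(card)


def independent(a, b):
    """True if P(A and B) == P(A) P(B)."""
    return prob(lambda c: a(c) and b(c)) == prob(a) * prob(b)


def mutually_exclusive(a, b):
    """True if A and B share no outcomes, so P(A and B) == 0."""
    return prob(lambda c: a(c) and b(c)) == 0


print(independent(is_queen, is_red))                     # True
print(mutually_exclusive(is_red_queen, is_black_queen))  # True
print(independent(is_red_queen, is_black_queen))         # False
print(mutually_exclusive(is_black, is_spade))            # False
```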

#### Contingency Tables

Contingency tables provide data summaries that can be used to compute various probabilities of interest, e.g., determining conditional or marginal probabilities. These tables display sample values indexed by two different variables that may be dependent (contingent) on one another.

##### Melanoma Example

The table below shows data on 400 melanoma (skin cancer) patients by type and site:

| Type \ Site | Head and Neck | Trunk | Extremities | Totals |
|---|---|---|---|---|
| Hutchinson's melanomic freckle | 22 | 2 | 10 | 34 |
| Superficial | 16 | 54 | 115 | 185 |
| Nodular | 19 | 33 | 73 | 125 |
| Indeterminant | 11 | 17 | 28 | 56 |
| Column Totals | 68 | 106 | 226 | 400 |

• Suppose we select one of the 400 patients in the study at random and want to find the probability that the cancer is on the extremities given that it is of the nodular type: $P(\text{Extremities} | \text{Nodular}) = 73/125 = 0.584$.
• What is the probability that for a randomly chosen patient the cancer type is Superficial given that it appears on the Trunk?
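
Conditional probabilities from a contingency table reduce to dividing a cell count by the appropriate row or column total. The sketch below encodes the melanoma counts; the dictionary layout and function names are assumptions made for illustration.

```python
# Counts of melanoma patients by type (rows) and site (columns).
sites = ["Head and Neck", "Trunk", "Extremities"]
counts = {
    "Hutchinson's melanomic freckle": [22, 2, 10],
    "Superficial": [16, 54, 115],
    "Nodular": [19, 33, 73],
    "Indeterminant": [11, 17, 28],
}

total = sum(sum(row) for row in counts.values())


def p_site_given_type(site, mtype):
    """P(site | type) = cell count / row total for that type."""
    row = counts[mtype]
    return row[sites.index(site)] / sum(row)


def p_type_given_site(mtype, site):
    """P(type | site) = cell count / column total for that site."""
    j = sites.index(site)
    column_total = sum(row[j] for row in counts.values())
    return counts[mtype][j] / column_total


print(total)                                        # 400
print(p_site_given_type("Extremities", "Nodular"))  # 0.584
```

The second helper, `p_type_given_site`, conditions on a column instead of a row and can be used to check an answer to the Superficial-given-Trunk question.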
##### Obesity Example

The table below shows a random sample of 103 people and their BMI (body mass index, an important biomarker of obesity and diabetes):

| Cohort \ BMI | <22 | 23-26 | 27-30 | >30 | Total |
|---|---|---|---|---|---|
| Children | 12 | 9 | ___ | 3 | 30 |
| Adults | ___ | ___ | 14 | 5 | 30 |
| Elderly | 4 | ___ | 12 | 3 | 43 |
| Total | 19 | 41 | ___ | ___ | ___ |

• Complete the table using the given information
• Compute the probability that a randomly chosen subject is a child with a BMI <27, P(Child and BMI<27):
• Compute the probability that a randomly chosen subject is an adult or has a BMI over 30, P(Adult or BMI>30):
• Compute the probability that a randomly chosen subject is normal (22<BMI<27) given that the subject is elderly, P(22<BMI<27 | Elderly):

### Applications

• Review the theory and the simulation of the Monty-Hall Problem.
• Try various SOCR simulations.
• This website introduces an application of probability theory through simulation. Many practical examples require probability computations of complex events. Such calculations may be carried out exactly, using the rules of probability, or approximately, using estimation and/or simulations. SOCR simulations may be used to compute approximate probabilities for various processes and to compare these empirical probabilities to their exact counterparts. The article includes examples of a Ball and Urn Experiment, Binomial Coin Toss Experiment, Card Experiment, Roulette Experiment, and Chuck-a-Luck Experiment. It is a valuable source for practicing simulations using probability theory.
• This website offers a list of interesting articles on the topic of probability theory. It includes a general introduction to the history of probability theory and a wide range of articles on applications of probability in different areas, including business, medicine, economics, and biology. These short articles are a good starting place to learn about applications of probability theory in various fields.
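
To illustrate how a simulation yields approximate probabilities that can be compared with exact ones, here is a minimal Monte Carlo sketch of the Monty-Hall Problem. It is not the SOCR applet; it assumes the standard rules (the host knows where the car is, always opens a goat door other than the contestant's pick, and chooses at random when two such doors exist). The exact probabilities of winning are 2/3 when switching and 1/3 when staying.

```python
import random


def monty_hall_trial(rng, switch):
    """Play one game; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the only remaining unopened door.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def estimate_win_rate(switch, n_trials=100_000, seed=0):
    """Empirical probability of winning over n_trials simulated games."""
    rng = random.Random(seed)
    wins = sum(monty_hall_trial(rng, switch) for _ in range(n_trials))
    return wins / n_trials


switch_rate = estimate_win_rate(switch=True)
stay_rate = estimate_win_rate(switch=False)
print(switch_rate, stay_rate)  # close to 2/3 and 1/3, respectively
```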

### Problems

• A box contains 6 balls; 2 are red, 2 are white, and 2 are blue. Four balls are picked at random, one at a time. Each time a ball is picked, the color is recorded, and the ball is put back in the box. If the first 3 balls are red, what color is the fourth ball most likely to be?
(a) Red
(b) White
(c) Blue
(d) Blue and white are equally likely and more likely than red.
(e) Red, blue, and white are all equally likely.
• A coin is tossed 400 times and 170 heads are observed. This coin is __ ?
(a) fair, because the probability of seeing that many heads or fewer is approximately 0.0013
(b) neither fair nor unfair. There is not enough information to determine that.
(c) fair, because the probability of seeing that many heads or fewer is approximately 0.5
(d) not fair, because the probability of seeing that many heads or fewer is close to 0.
• If two events are independent, then they are automatically mutually exclusive.
(a) True
(b) False
• If two events are mutually exclusive, then the sum of their probabilities is 1.
(a) True
(b) False
• A professor who teaches 500 students in an introductory psychology course reports that 250 of the students have taken at least one introductory statistics course, and the other 250 have not taken any statistics courses. 200 of the students were freshmen, and the other 300 students were not freshmen. Exactly 50 of the students were freshmen who had taken at least one introductory statistics course. If you select one of these psychology students at random, what is the probability that the student is not a freshman and has never taken a statistics course?
(a) 30%
(b) 40%
(c) 50%
(d) 60%
(e) 20%
• A professor who teaches 300 students in an introductory psychology course reports that 135 of the students have taken exactly one introductory statistics course, 60 have taken two or more introductory statistics courses, and the other 105 have not taken any statistics courses. If you select one of these psychology students at random, what is the probability that the student has taken at least one statistics class?
(a) 0.20
(b) 0.45
(c) 0.65
(d) 0.35
• In a carnival game, a person can win a prize by guessing which one of 5 identical boxes contains the prize. After each guess, if the prize has been won, a new prize is randomly placed in one of the 5 boxes. If a person makes 4 guesses, what is the probability that the person wins a prize exactly twice?
(a) $(0.2)^2/(0.8)^2$
(b) $2(0.2)^2\times(0.8)^2$
(c) $6(0.2)^2\times(0.8)^2$
(d) $(0.2)^2\times(0.8)^2$
(e) $2!/5!$
• In a university with 20,000 students, 20% are engineering students, 40% are in the sciences, 30% are in the social sciences, and the rest are in other majors. The counselors in the registrar's office want to survey the opinions of students on the issue of posting grades on-line, and they seek opinions from students in various majors. They conduct a survey by randomly selecting students. Among the first three students selected, what is the probability that two of the three major in social sciences and one has a major other than social science?
(a) 0.600
(b) 0.189
(c) 0.090
(d) 0.063
• Every five years, the Conference Board of Mathematical Sciences surveys college math departments. In a recent report, 51% of all undergraduates taking calculus were in classes using graphing calculators, and 31% were in classes using computer assignments. Suppose that 16% of these students use both calculators and computers. What proportion of undergraduates taking calculus uses no technology?
(a) 0.44
(b) 0.82
(c) 0.66
(d) 0.34
(e) 0.16
• Two cards are dealt to you (without replacement) from an ordinary well-shuffled deck. Let X = the probability that you have a pair. Let Y = the probability that both of your cards are diamonds. Compare X and Y.
(a) X < Y
(b) X = Y
(c) X > Y
• Poker game: How many hands would contain a full house with an AAABB-type pattern, where A and B have distinct values? How many hands are there with two pairs (i.e., an AABBC-type pattern), where A, B and C have distinct values? What is the total number of 5-card hands?