Difference between revisions of "EBook Problems"

From SOCR
m (Text replacement - "{{translate|pageName=http://wiki.stat.ucla.edu/socr/" to ""{{translate|pageName=http://wiki.socr.umich.edu/")
 
(71 intermediate revisions by 2 users not shown)
Line 4: Line 4:
  
 
=I. Introduction to Statistics=
 
=I. Introduction to Statistics=
==The Nature of Data and Variation==
+
==[[AP_Statistics_Curriculum_2007_IntroVar | The Nature of Data and Variation]]==
==Uses and Abuses of Statistics==
+
Natural phenomena in real life are unpredictable, and even carefully designed experiments are bound to generate data that vary because of intrinsic (internal to the system) or extrinsic (due to the ambient environment) effects.
==Design of Experiments==
+
How many natural processes or phenomena in real life can we describe that have an exact mathematical closed-form description and are completely deterministic? How do we model the rest of the processes that are unpredictable and have random characteristics?
==Statistics with Tools (Calculators and Computers)==
+
===[[EBook_Problems_EDA_IntroVar | Problems]]===
  
=II. Describing, Exploring, and Comparing Data=
+
==[[AP_Statistics_Curriculum_2007_IntroUses |Uses and Abuses of Statistics]]==
==Types of Data==
+
Statistics is the science of variation, randomness and chance. As such, statistics is different from other sciences, where the processes being studied obey exact deterministic mathematical laws. Statistics provides quantitative inference represented as long-run probability values, confidence or prediction intervals, odds, chances, etc., which may ultimately be subjected to varying interpretations. The phrase ''Uses and Abuses of Statistics'' refers to the notion that in some cases statistical results may be used as evidence for seemingly opposite theses. However, most of the time, common [http://en.wikipedia.org/wiki/Logic principles of logic] allow us to disambiguate the obtained statistical inference.
==Summarizing Data with Frequency Tables==
+
===[[EBook_Problems_EDA_IntroUses | Problems]]===
==Pictures of Data==
 
'''1. Two random samples were taken to determine backpack load difference between seniors and freshmen, in pounds. The following are the summaries:  
 
  
{| border="1"
+
==[[AP_Statistics_Curriculum_2007_IntroDesign | Design of Experiments]]==
|-
+
Design of experiments is the blueprint for planning a study or experiment, performing the data collection protocol and controlling the study parameters for accuracy and consistency. Data, or information, is typically collected in regard to a specific process or phenomenon being studied to investigate the effects of some controlled variables (independent variables or predictors) on other observed measurements (responses or dependent variables). Both types of variables are associated with specific observational units (living beings, components, objects, materials, etc.)
| Year || Mean || SD || Median || Min || Max || Range || Count 
+
===[[EBook_Problems_EDA_IntroDesign | Problems]]===
|-
 
| Freshmen || 20.43 || 4.21 || 17.20 || 5.78 || 31.68 || 25.9 || 115
 
|-
 
| Senior || 18.67 || 3.56 || 18.67 || 5.31 || 27.66 || 22.35 || 157
 
|}
 
  
'''Which of the following plots would be the most useful in comparing the two sets of backpack weights?
+
==[[AP_Statistics_Curriculum_2007_IntroTools |Statistics with Tools (Calculators and Computers)]]==
 +
All methods for data analysis, understanding or visualizing are based on models that often have compact analytical representations (e.g., formulas, symbolic equations, etc.) Models are used to study processes theoretically. Empirical validations of the utility of models are achieved by inputting data and executing tests of the models. This validation step may be done manually, by computing the model prediction or model inference from recorded measurements. This process may be possible by hand, but only for small numbers of observations (<10). In practice, we write (or use existent) algorithms and computer programs that automate these calculations for better efficiency, accuracy and consistency in applying models to larger datasets.
 +
===[[EBook_Problems_EDA_IntroTools | Problems]]===
  
'''Choose One Answer:
+
=II. Describing, Exploring, and Comparing Data=
 
+
==[[AP_Statistics_Curriculum_2007_EDA_DataTypes |Types of Data ]]==
''A. Histograms
+
There are two important concepts in any data analysis - '''Population''' and '''Sample'''.
 
+
Each of these may generate data of two major types - '''Quantitative''' or '''Qualitative''' measurements.
''B. Dot Plots
+
===[[EBook_Problems_EDA_DataTypes | Problems]]===
 
 
''C. Scatter Plots
 
 
 
''D. Box Plots
 
{{hidden|Answer|''D.''}}
 
 
 
==Measures of Central Tendency==
 
'''1. Suppose that in a certain country, the yearly income of 75% of the population is below the average. What would you use as the measures of center and spread?
 
 
 
'''Choose one answer.
 
 
 
''A. Mean and interquartile range
 
 
 
''B. Mean and standard deviation
 
 
 
''C. Median and interquartile range
 
 
 
''D. Mean and standard deviation
 
 
 
 
 
==Measures of Variation==
 
'''1. The number of flaws of an electroplated automobile grill is known to have the following probability distribution:
 
 
 
{| border="1"
 
|-
 
| X || 0 || 1 || 2 || 3
 
|-
 
| P(X) || 0.8 || 0.1 || 0.05 || 0.05
 
|-
 
|}
 
 
 
'''What would be the standard deviation of the sample means if we took 100 samples, each sample with 200 grills, and computed their sample means?
 
 
 
'''Choose One Answer.
 
 
 
''A. 0.6275
 
 
 
''B. 0.0560
 
 
 
''C. None of the Above
 
 
 
''D. 0.89269
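The calculation behind this problem can be sketched in Python. This is only an illustrative sketch: the helper names <code>mean_and_sd</code> and <code>sd_of_sample_mean</code> are hypothetical, and it assumes the grills in each sample are independent, so that the standard deviation of the sample mean is the population standard deviation divided by the square root of the sample size.

```python
import math

# Probability distribution of the number of flaws X, copied from the table above.
pmf = {0: 0.8, 1: 0.1, 2: 0.05, 3: 0.05}


def mean_and_sd(pmf):
    """Return the mean and standard deviation of a discrete distribution."""
    mean = sum(x * p for x, p in pmf.items())
    variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
    return mean, math.sqrt(variance)


def sd_of_sample_mean(pmf, n):
    """Standard deviation of the mean of n independent draws (assumed model)."""
    _, sd = mean_and_sd(pmf)
    return sd / math.sqrt(n)


mu, sigma = mean_and_sd(pmf)
se = sd_of_sample_mean(pmf, 200)  # each sample contains 200 grills
print(round(mu, 4), round(sigma, 4), round(se, 4))
```

Under these assumptions, the number of samples taken (100) does not enter the formula; only the size of each sample (200) does.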
 
  
'''2. Suppose that in a certain country, the yearly income of 75% of the population is below the average. What would you use as the measures of center and spread?
+
==[[AP_Statistics_Curriculum_2007_EDA_Freq |Summarizing Data with Frequency Tables ]]==
 +
There are two important ways to describe a data set (sample from a population) - '''Graphs''' or '''Tables'''.
 +
===[[EBook_Problems_EDA_Freq | Problems]]===
  
'''Choose one answer.
+
==[[AP_Statistics_Curriculum_2007_EDA_Pics | Pictures of Data]]==
 +
There are many different ways to display and graphically visualize data. These graphical techniques facilitate the understanding of the dataset and enable the selection of an appropriate statistical methodology for the analysis of the data.
 +
===[[EBook_Problems_EDA_Pics | Problems]]===
  
''A. Mean and interquartile range
+
==[[AP_Statistics_Curriculum_2007_EDA_Center | Measures of Central Tendency]]==
 +
There are three main features of populations (or sample data) that are always critical in understanding and interpreting their distributions - Center, Spread and Shape. The main measures of centrality are Mean, Median and Mode(s).
 +
===[[EBook_Problems_EDA_Center | Problems]]===
  
''B. Mean and standard deviation
+
==[[AP_Statistics_Curriculum_2007_EDA_Var | Measures of Variation]]==
 +
There are many measures of (population or sample) spread, e.g., the range, the variance, the standard deviation, mean absolute deviation, etc. These are used to assess the dispersion or variation in the population.
 +
===[[EBook_Problems_EDA_Var | Problems]]===
  
''C. Median and interquartile range
+
==[[AP_Statistics_Curriculum_2007_EDA_Shape | Measures of Shape]]==
 +
The shape of a distribution can usually be determined by looking at a histogram of a (representative) sample from that population; Frequency Plots, Dot Plots or Stem and Leaf Displays may be helpful.
 +
===[[EBook_Problems_EDA_Shape | Problems]]===
  
''D. Mean and standard deviation
+
==[[AP_Statistics_Curriculum_2007_EDA_Statistics | Statistics]]==
 +
Variables can be summarized using statistics - functions of data samples.
 +
===[[EBook_Problems_EDA_Statistics | Problems]]===
  
 
+
==[[AP_Statistics_Curriculum_2007_EDA_Plots | Graphs and Exploratory Data Analysis]] ==
 
+
Graphical visualization and interrogation of data are critical components of any reliable method for statistical modeling, analysis and interpretation of data.
==Measures of Shape==
+
===[[EBook_Problems_EDA_Plots | Problems]]===
==Statistics==
 
==Graphs and Exploratory Data Analysis==
 
  
 
=III. Probability=
 
=III. Probability=
==Fundamentals==
+
Probability is important in many studies and disciplines because measurements, observations and findings are often influenced by variation. In addition, probability theory provides the theoretical groundwork for statistical inference.  
 
 
==Rules for Computing Probabilities==
 
'''1. A professor who teaches 500 students in an introductory psychology course reports that 250 of the students have taken at least one introductory statistics course, and the other 250 have not taken any statistics courses. 200 of the students were freshmen, and the other 300 students were not freshmen. Exactly 50 of the students were freshmen who had taken at least one introductory statistics course.'''
 
 
 
'''If you select one of these psychology students at random, what is the probability that the student is not a freshman and has never taken a statistics course?'''
 
 
 
''A. 30%''
 
 
 
''B. 40%''
 
 
 
''C. 50%''
 
 
 
''D. 60%''
 
 
 
''E. 20%''
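One way to organize the counts in problem 1 is a two-way table (freshman status by statistics background). The sketch below fills in the missing cell by subtraction; the variable names are illustrative only.

```python
# Counts stated in the problem.
total = 500
taken_stats = 250
freshmen = 200
freshmen_taken_stats = 50

# Fill in the remaining cells of the two-way table by subtraction.
freshmen_no_stats = freshmen - freshmen_taken_stats        # freshmen without statistics
no_stats = total - taken_stats                             # all students without statistics
non_freshmen_no_stats = no_stats - freshmen_no_stats       # the cell the question asks about

probability = non_freshmen_no_stats / total
print(non_freshmen_no_stats, probability)
```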
 
 
 
'''2. A box contains 30 pens, where 5 are red, 14 are black, and 11 are blue. If you pick three pens from the box at random without replacement, what is the probability that these three pens will all be black?'''
 
 
 
'''Choose one answer.'''
 
 
 
''A. 14/30 + 14/30 + 14/30''
 
 
 
''B. 14/30 + 13/29 + 12/28''
 
 
 
''C. 14/30 x 13/29 x 12/28''
 
 
 
''D. 1 - (14/30 x 13/29 x 12/28)''
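The multiplication rule for sampling without replacement in problem 2 can be checked with exact fractions. As a cross-check, the sketch also computes the same probability by counting combinations; both routes are standard, and the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

black, total, draws = 14, 30, 3

# Multiplication rule: each draw removes one black pen and one pen overall.
p_sequential = Fraction(1)
for i in range(draws):
    p_sequential *= Fraction(black - i, total - i)

# Counting argument: ways to choose 3 black pens over ways to choose any 3 pens.
p_counting = Fraction(comb(black, draws), comb(total, draws))

print(p_sequential, p_counting, float(p_sequential))
```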
 
 
 
'''3. When three fair dice are simultaneously thrown, which of these three results is least likely to be obtained?'''
 
 
 
'''Choose one answer.'''
 
 
 
''A. All three results are equally unlikely.''
 
 
 
''B. Two fives and a 3 in any order.''
 
 
 
''C. A 5, a 3 and a 6 in any order.''
 
 
 
''D. Three 5's.''
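For problem 3, the three results can be compared by brute-force enumeration of all ordered outcomes. This is a minimal sketch that assumes fair, independent dice; <code>prob_of_multiset</code> is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# All 6^3 = 216 equally likely ordered outcomes of three fair dice.
outcomes = list(product(range(1, 7), repeat=3))


def prob_of_multiset(target):
    """Probability that the three dice show the given values in any order."""
    wanted = sorted(target)
    hits = sum(1 for roll in outcomes if sorted(roll) == wanted)
    return Fraction(hits, len(outcomes))


p_two_fives_and_three = prob_of_multiset([5, 5, 3])
p_five_three_six = prob_of_multiset([5, 3, 6])
p_three_fives = prob_of_multiset([5, 5, 5])
print(p_two_fives_and_three, p_five_three_six, p_three_fives)
```

The result with the fewest distinct orderings has the smallest probability.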
 
 
 
'''4. Suppose that you take a three question "true/false" quiz for which you are completely unprepared. You have to guess the correct answer for each question. What is the probability of answering at least one question correctly?'''
 
 
 
'''Choose one answer.'''
 
 
 
''A. 4/8''
 
 
 
''B. 5/8''
 
  
''C. 7/8''
+
==[[AP_Statistics_Curriculum_2007_Prob_Basics |Fundamentals]]==
 +
Some fundamental concepts of probability theory include random events, sampling, types of probabilities, event manipulations and axioms of probability.
 +
===[[EBook_Problems_Prob_Basics | Problems]]===
  
''D. 1/8''
+
==[[AP_Statistics_Curriculum_2007_Prob_Rules | Rules for Computing Probabilities]]==
 +
There are many important rules for computing probabilities of composite events. These include conditional probability, statistical independence, multiplication and addition rules, the law of total probability and the Bayesian rule.
 +
===[[EBook_Problems_Prob_Rules| Problems]]===
  
''E. 3/8''
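The true/false quiz in problem 4 can be checked by listing all eight equally likely guess patterns. The sketch assumes each guess is independent and correct with probability 1/2; the answer key used here is hypothetical and does not affect the result.

```python
from fractions import Fraction
from itertools import product

answer_key = (True, False, True)  # hypothetical key; any key gives the same probability

# All 2^3 = 8 equally likely ways to guess three true/false questions.
guesses = list(product([True, False], repeat=3))

at_least_one_correct = sum(
    1 for guess in guesses
    if any(g == a for g, a in zip(guess, answer_key))
)
p_at_least_one = Fraction(at_least_one_correct, len(guesses))

# Complement rule: 1 - P(all three wrong).
p_complement = 1 - Fraction(1, 2) ** 3
print(p_at_least_one, p_complement)
```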
+
==[[AP_Statistics_Curriculum_2007_Prob_Simul |Probabilities Through Simulations]]==
 +
Many experimental settings require probability computations of complex events. Such calculations may be carried out exactly, using theoretical models, or approximately, using estimation or simulations.
 +
===[[EBook_Problems_Prob_Simul | Problems]]===
  
==Probabilities Through Simulations==
+
==[[AP_Statistics_Curriculum_2007_Prob_Count |Counting]]==
==Counting==
+
There are many useful counting principles (including permutations and combinations) to compute the number of ways that certain arrangements of objects can be formed. This allows counting-based estimation of probabilities of complex events.
 +
===[[EBook_Problems_Prob_Count | Problems]]===
  
 
=IV. Probability Distributions=
 
=IV. Probability Distributions=
==Random Variables==
+
There are two basic types of processes that we observe in nature - '''Discrete''' and '''Continuous'''. We begin by discussing several important discrete random processes, emphasizing the different distributions, expectations, variances and applications. In the [[AP_Statistics_Curriculum_2007#Chapter_V:_Normal_Probability_Distribution | next chapter]], we will discuss their continuous counterparts. The complete list of all [[About_pages_for_SOCR_Distributions |SOCR Distributions is available here]].
==Expectation (Mean) and Variance==
 
  
  
'''1. Ming’s Seafood Shop stocks live lobsters. Ming pays $6.00 for each lobster and sells each one for $12.00. The demand X for these lobsters in a given day has the following probability mass function.'''
+
==[[AP_Statistics_Curriculum_2007_Distrib_RV | Random Variables]]==
 +
To simplify the calculations of probabilities, we will define the concept of a '''random variable''' which will allow us to study uniformly various processes with the same mathematical and computational techniques.
 +
===[[EBook_Problems_Distrib_RV | Problems]]===
  
{| border="1"
+
==[[AP_Statistics_Curriculum_2007_Distrib_MeanVar | Expectation (Mean) and Variance]]==
|-
+
The expectation and the variance for any discrete random variable or process are important measures of [[AP_Statistics_Curriculum_2007#Measures_of_Central_Tendency | Centrality]] and [[AP_Statistics_Curriculum_2007#Measures_of_Variation |Dispersion]]. This section also presents the definitions of some common population- or sample-based moments.
| X || 0 || 1 || 2 || 3 || 4 || 5 
+
===[[EBook_Problems_Distrib_MeanVar | Problems]]===
|-
 
| P(x) || 0.05 || 0.15 || 0.30 || 0.20 || 0.20 || 0.1
 
|}
 
  
'''What is the Expected Demand?'''
+
==[[AP_Statistics_Curriculum_2007_Distrib_Binomial |Bernoulli and Binomial Experiments]]==
 +
The '''Bernoulli''' and '''Binomial''' processes provide the simplest models for discrete random experiments.
 +
===[[EBook_Problems_Distrib_Binomial | Problems]]===
  
'''Choose one answer.'''
+
==[[AP_Statistics_Curriculum_2007_Distrib_Multinomial |Multinomial Experiments]]==
 +
'''Multinomial processes''' extend the [[AP_Statistics_Curriculum_2007_Distrib_Binomial |Binomial experiments]] for the situation of multiple possible outcomes.
 +
===[[EBook_Problems_Distrib_Multinomial | Problems]]===
 +
==[[AP_Statistics_Curriculum_2007_Distrib_Dists |Geometric, Hypergeometric and Negative Binomial]]==
 +
The '''Geometric, Hypergeometric and Negative Binomial distributions''' provide computational models for calculating probabilities for a large number of experiments and random variables. This section presents the theoretical foundations and the applications of each of these discrete distributions.
 +
===[[EBook_Problems_Distrib_Dists | Problems]]===
  
''A. 13.5''
+
==[[AP_Statistics_Curriculum_2007_Distrib_Poisson |Poisson Distribution]]==
 
+
The '''Poisson distribution''' models many different discrete processes where the probability of the observed phenomenon is constant in time or space. Poisson distribution may be used as an approximation to the Binomial distribution.
''B. 3.1''
+
===[[EBook_Problems_Distrib_Poisson | Problems]]===
 
 
''C. 2.65''
 
 
 
''D. 5.2''
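The expected demand in the lobster problem is the probability-weighted sum of the demand values. A minimal sketch, using the probability mass function from the table above (the purchase and sale prices are not needed for this particular question):

```python
# Probability mass function of daily demand X, copied from the table above.
demand_pmf = {0: 0.05, 1: 0.15, 2: 0.30, 3: 0.20, 4: 0.20, 5: 0.10}

# A valid probability mass function must sum to 1 (up to rounding error).
total_probability = sum(demand_pmf.values())

# E[X] = sum over x of x * P(X = x).
expected_demand = sum(x * p for x, p in demand_pmf.items())
print(round(total_probability, 10), round(expected_demand, 4))
```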
 
==Bernoulli and Binomial Experiments==
 
==Multinomial Experiments==
 
==Geometric, Hypergeometric, and Negative Binomial==
 
==Poisson Distribution==
 
  
 
=V. Normal Probability Distribution=
 
=V. Normal Probability Distribution=
==The Standard Normal Distribution==
+
The Normal Distribution is perhaps the most important model for studying quantitative phenomena in the natural and behavioral sciences - this is due to the [[AP_Statistics_Curriculum_2007_Limits_CLT | Central Limit Theorem]]. Many numerical measurements (e.g., weight, time, etc.) can be well approximated by the normal distribution.  
'''1. Weight is a measure that tends to be normally distributed. Suppose the mean weight of all women at a large university is 135 pounds, with a standard deviation of 12 pounds. If you were to randomly sample 9 women at the university, there would be a 68% chance that the sample mean weight would be between:'''
 
 
 
'''Choose one answer.'''
 
 
 
''A. 131 and 139 pounds.''
 
  
''B. 133 and 137 pounds.''
+
==[[AP_Statistics_Curriculum_2007_Normal_Std |The Standard Normal Distribution]]==
 +
The Standard Normal Distribution is the simplest version (zero-mean, unit-standard-deviation) of the (General) Normal Distribution. Yet, it is perhaps the most frequently used version because many tables and computational resources are explicitly available for calculating probabilities.
 +
===[[EBook_Problems_Normal_Std | Problems]]===
  
''C. 119 and 151 pounds''
+
==[[AP_Statistics_Curriculum_2007_Normal_Prob |Nonstandard Normal Distribution: Finding Probabilities]]==
 +
In practice, the mechanisms underlying natural phenomena may be unknown, yet the use of the normal model can be theoretically justified in many situations to compute critical and probability values for various processes.
 +
===[[EBook_Problems_Normal_Prob | Problems]]===
  
''D. 125 and 145 pounds.''
+
==[[AP_Statistics_Curriculum_2007_Normal_Critical |Nonstandard Normal Distribution: Finding Scores (Critical Values)]]==
 +
In addition to being able to compute probability (p) values, we often need to estimate the critical values of the Normal Distribution for a given p-value.
 +
===[[EBook_Problems_Normal_Critical | Problems]]===
  
''E. 123 and 147 pounds.''
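The interval in problem 1 follows from the standard error of the sample mean. The sketch below assumes that "68% chance" refers to the usual rule of thumb that about 68% of a normal distribution lies within one standard deviation of its mean; the variable names are illustrative.

```python
import math

population_mean = 135.0   # pounds
population_sd = 12.0      # pounds
sample_size = 9

# Standard deviation of the sample mean (standard error).
standard_error = population_sd / math.sqrt(sample_size)

# About 68% of sample means fall within one standard error of the population mean.
lower = population_mean - standard_error
upper = population_mean + standard_error
print(standard_error, lower, upper)
```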
+
==[[AP_Statistics_Curriculum_2007_MultivariateNormal |Multivariate Normal Distribution]]==
==Nonstandard Normal Distribution: Finding Probabilities==
+
The multivariate normal distribution (also known as multivariate Gaussian distribution) is a generalization of the [[AP_Statistics_Curriculum_2007_Normal_Prob|univariate (one-dimensional) normal distribution]] to higher dimensions (2D, 3D, etc.) The multivariate normal distribution is useful in studies of correlated real-valued random variables.
==Nonstandard Normal Distribution: Finding Scores (Critical Values)==
+
===[[EBook_Problems_MultivariateNormal | Problems]]===
  
 
=VI. Relations Between Distributions=
 
=VI. Relations Between Distributions=
==The Central Limit Theorem==
+
In this chapter, we will explore the relations between different distributions. This knowledge will help us to compute difficult probabilities using reasonable approximations and identify appropriate probability models, graphical and statistical analysis tools for data interpretation.
==Law of Large Numbers==
+
The complete list of all [[About_pages_for_SOCR_Distributions |SOCR Distributions is available here]] and the [http://socr.ucla.edu/htmls/SOCR_Distributome.html SOCR Distributome applet] provides an interactive graphical interface for exploring the relations between different distributions.
==Normal Distribution as Approximation to Binomial Distribution==
 
==Poisson Approximation to Binomial Distribution==
 
==Binomial Approximation to Hypergeometric==
 
==Normal Approximation to Poisson==
 
  
=VII. Point and Interval Estimates=
+
==[[AP_Statistics_Curriculum_2007_Limits_CLT |The Central Limit Theorem]]==
==Method of Moments and Maximum Likelihood Estimation==
+
The exploration of the relation between different distributions begins with the study of the '''sampling distribution of the sample average'''. This will demonstrate the universally important role of normal distribution.
==Estimating a Population Mean: Large Samples==
+
===[[EBook_Problems_Limits_CLT | Problems]]===
'''1. Two researchers are going to take a sample of data from the same population of physics students. Researcher A will select a random sample of students from among all students taking physics. Researcher B's sample will consist only of the students in her class. Both researchers will construct a 95% confidence interval for the mean score on the physics final exam using their own sample data. Which researcher's method has a 95% chance of capturing the true mean of the population of all students taking physics?'''
 
  
'''Choose one answer.'''
+
==[[AP_Statistics_Curriculum_2007_Limits_LLN |Law of Large Numbers]]==
 +
Suppose an event has probability ''p'' of being observed in each experiment. If we repeat the same experiment over and over, the relative frequency of that event (the ratio of its observed frequency to the total number of repetitions) converges towards ''p'' as the number of experiments increases. Why is that and why is this important?
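The convergence of the relative frequency can be illustrated with a short simulation. This is only a sketch: the function name <code>running_relative_frequency</code>, the choice ''p'' = 0.3, and the fixed seed are assumptions made for the illustration, and a simulation shows the behavior without proving it.

```python
import random


def running_relative_frequency(p, n_trials, seed=0):
    """Simulate n_trials independent experiments with success probability p
    and return the relative frequency of success after each trial."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    successes = 0
    freqs = []
    for i in range(1, n_trials + 1):
        if rng.random() < p:
            successes += 1
        freqs.append(successes / i)
    return freqs


freqs = running_relative_frequency(p=0.3, n_trials=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, round(freqs[n - 1], 4))
```

The printed relative frequencies typically wander for small numbers of trials and settle near ''p'' as the number of trials grows.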
 +
===[[EBook_Problems_Limits_LLN | Problems]]===
  
''A. Researcher B''
+
==[[AP_Statistics_Curriculum_2007_Limits_Norm2Bin |Normal Distribution as Approximation to Binomial Distribution]]==
 +
Normal Distribution provides a valuable approximation to Binomial when the sample sizes are large and the probabilities of success and failure are not close to zero.
 +
===[[EBook_Problems_Limits_Norm2Bin | Problems]]===
  
''B. Researcher A''
+
==[[AP_Statistics_Curriculum_2007_Limits_Poisson2Bin |Poisson Approximation to Binomial Distribution]]==
 +
Poisson provides an approximation to Binomial Distribution when the sample sizes are large and the probability of successes or failures is close to zero.
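The quality of this approximation can be inspected numerically. The sketch below compares the two probability mass functions for one illustrative choice of parameters (''n'' = 1000, ''p'' = 0.002, so λ = ''np'' = 2); these values and the helper names are assumptions chosen for the example.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


n, p = 1000, 0.002   # large n, small p (illustrative values)
lam = n * p          # matching Poisson rate

# Largest pointwise gap between the two distributions over k = 0..20.
max_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(21)
)
for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 5), round(poisson_pmf(k, lam), 5))
print("max gap:", max_gap)
```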
 +
===[[EBook_Problems_Limits_Poisson2Bin | Problems]]===
  
''C. Both methods have a 95% chance of capturing the true mean''
+
==[[AP_Statistics_Curriculum_2007_Limits_Bin2HyperG |Binomial Approximation to Hypergeometric]]==
 +
Binomial Distribution is much simpler to compute, compared to Hypergeometric, and can be used as an approximation when the population sizes are large (relative to the sample size) and the probability of successes is not close to zero.
 +
===[[EBook_Problems_Limits_Bin2HyperG | Problems]]===
  
''D. Neither''
+
==[[AP_Statistics_Curriculum_2007_Limits_Norm2Poisson |Normal Approximation to Poisson]]==
==Estimating a Population Mean: Small Samples==
+
The Poisson can be approximated fairly well by Normal Distribution when λ is large.
==Student's T Distribution==
+
===[[EBook_Problems_Limits_Norm2Poisson | Problems]]===
==Estimating a Population Proportion==
 
==Estimating a Population Variance==
 
  
=VIII. Hypothesis Testing=
+
=VII. Point and Interval Estimates=
==Fundamentals of Hypothesis Testing==
+
Estimation of population parameters is critical in many applications. Estimation is most frequently carried out in terms of point-estimates or interval (range) estimates for population parameters that are of interest.
'''1. Suppose you were hired to conduct a study to find out which of two brands of soda college students think tastes better. In your study, students are given a blind taste test. They rate one brand and then rate the other, in random order. The ratings are given on a scale of 1 (awful) to 5 (delicious). Which type of test would be the best to compare these ratings?'''
 
  
''A. One-Sample t''
+
==[[AP_Statistics_Curriculum_2007_Estim_MOM_MLE |Method of Moments and Maximum Likelihood Estimation]]==
 +
There are many ways to obtain point (value) estimates of various population parameters of interest, using observed data from the specific process we study. The '''method of moments''' and the '''maximum likelihood estimation''' are among the most popular ones frequently used in practice.
 +
===[[EBook_Problems_Estim_MOM_MLE | Problems]]===
  
''B. Chi-Square''
+
==[[AP_Statistics_Curriculum_2007_Estim_L_Mean |Estimating a Population Mean: Large Samples]]==
 +
This section discusses how to find point and interval estimates when the sample-sizes are large.
 +
===[[EBook_Problems_Estim_L_Mean | Problems]]===
  
''C. Paired Difference t''
+
==[[AP_Statistics_Curriculum_2007_Estim_S_Mean |Estimating a Population Mean: Small Samples]]==
 +
Next, we discuss point and interval estimates when the sample-sizes are small. Naturally, the point estimates are less precise and the interval estimates produce wider intervals, compared to the case of large-samples.
 +
===[[EBook_Problems_Estim_S_Mean | Problems]]===
  
''D. Two-Sample t''
+
==[[AP_Statistics_Curriculum_2007_StudentsT |Student's T distribution]]==
 +
The '''Student's T-Distribution''' arises in the problem of estimating the mean of a normally distributed population when the sample size is small and the population variance is unknown.
 +
===[[EBook_Problems_StudentsT | Problems]]===
  
'''2. USA Today's AD Track examined the effectiveness of the new ads involving the Pets.com Sock Puppet (which is now extinct). In particular, they conducted a nationwide poll of 428 adults who had seen the Pets.com ads and asked for their opinions. They found that 36% of the respondents said they liked the ads. Suppose you increased the sample size for this poll to 1000, but you had the same sample percentage who like the ads (36%). How would this change the p-value of the hypothesis test you want to conduct?
+
==[[AP_Statistics_Curriculum_2007_Estim_Proportion |Estimating a Population Proportion]]==
 +
'''Normal Distribution''' is an appropriate model for proportions, when the sample size is large enough. In this section, we demonstrate how to obtain point and interval estimates for a population proportion.
  
'''Choose One Answer.
+
===[[EBook_Problems_Estim_Proportion | Problems]]===
  
''A. No way to tell
+
==[[AP_Statistics_Curriculum_2007_Estim_Var |Estimating a Population Variance]]==
 +
In many processes and experiments, controlling the amount of variance is of critical importance. Thus the ability to assess variation, using point and interval estimates, facilitates our ability to make inference, revise manufacturing protocols, improve clinical trials, etc.
 +
===[[EBook_Problems_Estim_Var | Problems]]===
  
''B. The new p-value would be the same as before
+
=VIII. Hypothesis Testing=
 +
'''Hypothesis Testing''' is a statistical technique for decision making regarding populations or processes based on experimental data. It quantitatively answers the possibility that chance alone might be responsible for the observed discrepancy between a theoretical model and the empirical observations.
  
''C. The new p-value would be smaller than before
+
==[[AP_Statistics_Curriculum_2007_Hypothesis_Basics |Fundamentals of Hypothesis Testing]]==
 +
In this section, we define the core terminology necessary to discuss Hypothesis Testing (Null and Alternative Hypotheses, Type I and II errors, Sensitivity, Specificity, Statistical Power, etc.)
 +
===[[EBook_Problems_Hypothesis_Basics | Problems]]===
  
''D. The new p-value would be larger than before
+
==[[AP_Statistics_Curriculum_2007_Hypothesis_L_Mean |Testing a Claim about a Mean: Large Samples]]==
 +
As we already saw how to construct point and interval estimates for the population mean in the large sample case, we now show how to do hypothesis testing in the same situation.
 +
===[[EBook_Problems_Hypothesis_L_Mean | Problems]]===
  
'''3. A marketing director for a radio station collects a random sample of three hundred 18 to 25 year-olds and two hundred and fifty 25 to 40 year-olds. She records the percent of each group that had purchased music online in the last 30 days. She performs a hypothesis test, and the p-value of her test turns out to be 0.15. From this she should conclude:'''
+
==[[AP_Statistics_Curriculum_2007_Hypothesis_S_Mean |Testing a Claim about a Mean: Small Samples]]==
 +
We continue with the discussion on inference for the population mean for small samples.
 +
===[[EBook_Problems_Hypothesis_S_Mean | Problems]]===
  
'''Choose one answer.'''
+
==[[AP_Statistics_Curriculum_2007_Hypothesis_Proportion |Testing a Claim about a Proportion]]==
 +
When the sample size is large, the sampling distribution of the sample proportion <math>\hat{p}</math> is approximately Normal, by [[AP_Statistics_Curriculum_2007_Limits_CLT | CLT]]. This helps us formulate hypothesis testing protocols and compute the appropriate statistics and p-values to assess significance.
 +
===[[EBook_Problems_Hypothesis_Proportion | Problems]]===
  
''A. that about 15% more people purchased on-line music in the younger group than in the older group.''
+
==[[AP_Statistics_Curriculum_2007_Hypothesis_Var |Testing a Claim about a Standard Deviation or Variance]]==
 +
The significance testing for the variation or the standard deviation of a process, a natural phenomenon or an experiment is of paramount importance in many fields. This chapter provides the details for formulating testable hypotheses, computation, and inference on assessing variation.
 +
===[[EBook_Problems_Hypothesis_Var | Problems]]===
  
''B. there is insufficient evidence to conclude that there is a difference in the proportion of on-line music purchases in the younger and older group.''
+
=IX. Inferences from Two Samples=
 +
In this chapter, we continue our pursuit and study of significance testing in the case of having two populations. This expands the possible applications of one-sample hypothesis testing we saw in the [[EBook#Chapter_VIII:_Hypothesis_Testing | previous chapter]].
  
''C. the proportion of on-line music purchasers is the same in the under-25 year-old group as in the older group.''
+
==[[AP_Statistics_Curriculum_2007_Infer_2Means_Dep |Inferences About Two Means: Dependent Samples]]==
 +
We need to clearly identify whether samples we compare are '''Dependent''' or '''Independent''' in all study designs. In this section, we discuss one specific dependent-samples case - '''Paired Samples'''.
 +
===[[EBook_Problems_Infer_2Means_Dep | Problems]]===
  
''D. the probability of getting the same results again is 0.15.''
+
==[[AP_Statistics_Curriculum_2007_Infer_2Means_Indep |Inferences About Two Means: Independent Samples]]==
 +
'''Independent''' Samples designs refer to experiments or observations where all measurements are individually independent from each other within their groups and the groups are independent. In this section, we discuss inference based on independent samples.
 +
===[[EBook_Problems_Infer_2Means_Indep | Problems]]===
  
'''4. If we want to estimate the mean difference in scores on a pre-test and post-test for a sample of students, how should we proceed?'''
+
==[[AP_Statistics_Curriculum_2007_Infer_BiVar |Comparing Two Variances]]==
 +
In this section, we compare '''variances (or standard deviations)''' of two populations using randomly sampled data.
 +
===[[EBook_Problems_Infer_BiVar | Problems]]===
  
'''Choose one answer.'''
+
==[[AP_Statistics_Curriculum_2007_Infer_2Proportions |Inferences about Two Proportions]]==
 +
This section presents the '''significance testing''' and '''inference on equality''' of proportions from two independent populations.
 +
===[[EBook_Problems_Infer_2Proportions | Problems]]===
  
''A. We should construct a confidence interval or conduct a hypothesis test''
+
=X. Correlation and regression=
 +
Many scientific applications involve the analysis of relationships between two or more variables involved in a process of interest. We begin with the simplest of all situations where '''Bivariate Data''' (X and Y) are measured for a process and we are interested in determining the association, relation or an appropriate model for these observations (e.g., fitting a straight line to the pairs of (X,Y) data).
  
''B. We should collect one sample, two samples, or conduct a paired data procedure''
+
==[[AP_Statistics_Curriculum_2007_GLM_Corr |Correlation]]==
 +
The '''Correlation''' between X and Y represents the first bivariate model of association which may be used to make predictions.
 +
===[[EBook_Problems_GLM_Corr | Problems]]===
  
''C. We should calculate a z or a t statistic''
+
==[[AP_Statistics_Curriculum_2007_GLM_Regress |Regression]]==
 +
We are now ready to discuss the modeling of linear relations between two variables using '''Regression Analysis'''. This section demonstrates this methodology for the SOCR California Earthquake dataset.
 +
===[[EBook_Problems_GLM_Regress | Problems]]===
  
'''5. The paint used to make lines on roads must reflect enough light to be clearly visible at night. Let mu denote the true average reflectometer reading for a new type of paint under consideration. A test of the null hypothesis that mu = 20 versus the alternative hypothesis that mu > 20 will be based on a random sample of size n from a normal population distribution. In which of the following scenarios is there significant evidence that mu is larger than 20?'''
+
==[[AP_Statistics_Curriculum_2007_GLM_Predict |Variation and Prediction Intervals]]==
 +
In this section, we discuss point and interval estimates about the slope of linear models.
 +
===[[EBook_Problems_GLM_Predict | Problems]]===
  
'''(i) n=15, t=3.2, alpha=0.05'''
+
==[[AP_Statistics_Curriculum_2007_GLM_MultLin |Multiple Regression]]==
 +
Now, we are interested in determining linear regressions and multilinear models of the relationships between one dependent variable Y and many independent variables <math>X_i</math>.
 +
===[[EBook_Problems_GLM_MultLin | Problems]]===
  
'''(ii) n=9, t=1.8, alpha=0.01'''
+
=XI. Analysis of Variance (ANOVA)=
  
'''(iii) n=24, t=-0.2, alpha=0.01'''
+
==[[AP_Statistics_Curriculum_2007_ANOVA_1Way | One-Way ANOVA]]==
 +
We now expand our inference methods to study and compare ''k'' '''independent''' samples. In this case, we will be decomposing the entire variation in the data into independent components.
 +
===[[EBook_Problems_ANOVA_1Way | Problems]]===
  
'''Choose one answer.'''
+
==[[AP_Statistics_Curriculum_2007_ANOVA_2Way | Two-Way ANOVA]]==
 +
Now we focus on decomposing the variance of a dataset into (independent/orthogonal) components when we have two (grouping) factors. This procedure is called '''Two-Way Analysis of Variance'''.
 +
===[[Ebook_Problems_ANOVA_2Way | Problems]]===
  
''A. (ii) and (iii)''
+
=XII. Non-Parametric Inference=
 +
To be valid, many statistical methods impose (parametric) requirements about the format, parameters and distributions of the data to be analyzed. For instance, the [[AP_Statistics_Curriculum_2007_Infer_2Means_Indep | Independent T-Test]] requires the distributions of the two samples to be Normal. Non-Parametric (distribution-free) statistical methods do not impose such requirements; they are often useful in practice, although they are generally [[AP_Statistics_Curriculum_2007_Hypothesis_Basics | less powerful]].
 +
==[[AP_Statistics_Curriculum_2007_NonParam_2MedianPair | Differences of Medians (Centers) of Two Paired Samples]]==
 +
The '''Sign Test''' and the '''Wilcoxon Signed Rank Test''' are the simplest non-parametric tests which are also alternatives to the [[AP_Statistics_Curriculum_2007_Infer_2Means_Dep | One-Sample and Paired T-Test]]. These tests are applicable for paired designs where the data is not required to be normally distributed.
 +
===[[EBook_Problems_NonParam_2MedianPair | Problems]]===
  
''B. (i)''
+
==[[AP_Statistics_Curriculum_2007_NonParam_2MedianIndep | Differences of Medians (Centers) of Two Independent Samples]]==
 +
The '''Wilcoxon-Mann-Whitney (WMW) Test''' (also known as Mann-Whitney U Test, Mann-Whitney-Wilcoxon Test, or Wilcoxon rank-sum Test) is a ''non-parametric'' test for assessing whether two samples come from the same distribution.
 +
===[[EBook_Problems_NonParam_2MedianIndp | Problems]]===
  
''C. (iii)''
+
==[[AP_Statistics_Curriculum_2007_NonParam_2PropIndep | Differences of Proportions of Two Samples]]==
 +
Depending upon whether the samples are dependent or independent, we use different statistical tests.
 +
===[[EBook_Problems_NonParam_2PropIndep | Problems]]===
  
''D. (ii)''
+
==[[AP_Statistics_Curriculum_2007_NonParam_ANOVA | Differences of Means of Several Independent Samples]]==
==Testing a Claim About a Mean: Large Samples==
+
We now extend the [[EBook#Chapter_XI:_Analysis_of_Variance_.28ANOVA.29 | multi-sample inference which we discussed in the ANOVA section]], to the situation where the [[AP_Statistics_Curriculum_2007_ANOVA_1Way#ANOVA_Conditions| ANOVA assumptions]] are invalid.
'''1. Hong is a pharmacist studying the effect of an anti-depressant drug. She organizes a simple random sample of 100 patients, and then collects their anxiety test scores before and after administering the anti-depressant drug. Hong wants to estimate the mean difference between the pre-drug and post-drug test scores. How should she proceed?'''
+
===[[EBook_Problems_NonParam_ANOVA | Problems]]===
  
'''Choose one answer.'''
+
==[[AP_Statistics_Curriculum_2007_NonParam_VarIndep | Differences of Variances of Independent Samples (Variance Homogeneity)]]==
 +
There are several tests for variance equality in ''k'' samples. These tests are commonly known as tests for '''Homogeneity of Variances'''.
 +
===[[EBook_Problems_NonParam_VarIndep | Problems]]===
  
''A. She should compute a confidence interval or conduct a hypothesis test''
+
=XIII. Multinomial Experiments and Contingency Tables=
 
 
''B. She should calculate the z or the t statistics''
 
 
 
''C. She should compute the correlation between the two samples''
 
 
 
''D. Not enough information to tell''
 
==Testing a Claim About a Mean: Small Samples==
 
==Testing a Claim About a Proportion==
 
==Testing a Claim About a Standard Deviation or Variance==
 
 
 
=IX. Inferences from Two Samples=
 
==Inferences About Two Means: Dependent Samples==
 
==Inferences About Two Means: Independent Samples==
 
==Comparing Two Variances==
 
==Inferences About Two Proportions==
 
 
 
=X. Correlation and regression=
 
==Correlation==
 
'''1. A positive correlation between two variables X and Y means that if X increases, this will cause the value of Y to increase.'''
 
 
 
''A. This is always true.''
 
 
 
''B. This is sometimes true.''
 
 
 
''C. This is never true.''
 
{{hidden|Answer|''C.''}}
 
 
 
'''2. The correlation between high school algebra and geometry scores was found to be + 0.8. Which of the following statements is not true?'''
 
 
 
''A. Most of the students who have above average scores in algebra also have above average scores in geometry. ''
 
 
 
''B. Most people who have above average scores in algebra will have below average scores in geometry ''
 
 
 
''C. If we increase a student's score in algebra (i.e., with extra tutoring in algebra), then the student's geometry scores will always increase accordingly.''
 
 
 
''D. Most students who have below average scores in algebra also have below average scores in geometry. ''
 
{{hidden|Answer|''C.''}}
 
 
 
==Regression==
 
==Variation and Prediction Intervals==
 
==Multiple Regression==
 
  
=XI. Analysis of Variance (ANOVA)=
+
==[[AP_Statistics_Curriculum_2007_Contingency_Fit |Multinomial Experiments: Goodness-of-Fit]]==
==One-Way ANOVA==
+
The '''Chi-Square Test''' is used to test if a data sample comes from a population with specific characteristics.
==Two-Way ANOVA==
+
===[[EBook_Problems_Contingency_Fit | Problems]]===
  
=XII. Non-Parametric Inference=
+
==[[AP_Statistics_Curriculum_2007_Contingency_Indep |Contingency Tables: Independence and Homogeneity]]==
==Differences of Medians (Centers) of Two Paired Samples==
+
The '''Chi-Square Test''' may also be used to test for independence (or association) between two variables.
==Differences of Medians (Centers) of Two Independent Samples==
+
===[[EBook_Problems_Contingency_Indep | Problems]]===
==Differences of Proportions of Two Samples==
 
==Differences of Means of Several Independent Samples==
 
==Differences of Variances of Independent Samples (Variance Homogeneity)==
 
 
 
=XIII. Multinomial Experiments and Contingency Tables=
 
==Multinomial Experiments: Goodness-of-Fit==
 
==Contingency Tables: Independence and Homogeneity==
 
  
 
==References==
 
 
* [http://moodle.stat.ucla.edu UCLA Statistics Moodle]
 
  

Latest revision as of 12:29, 3 March 2020


Probability and Statistics EBook Practice Problems

The problems provided below may be useful for practicing the concepts, methods and analysis protocols, and for self-evaluation of learning of the materials presented in the EBook.

I. Introduction to Statistics

The Nature of Data and Variation

Although natural phenomena in real life are unpredictable, the designs of experiments are bound to generate data that varies because of intrinsic (internal to the system) or extrinsic (due to the ambient environment) effects. How many natural processes or phenomena in real life can we describe that have an exact mathematical closed-form description and are completely deterministic? How do we model the rest of the processes that are unpredictable and have random characteristics?

Problems

Uses and Abuses of Statistics

Statistics is the science of variation, randomness and chance. As such, statistics is different from other sciences, where the processes being studied obey exact deterministic mathematical laws. Statistics provides quantitative inference represented as long-time probability values, confidence or prediction intervals, odds, chances, etc., which may ultimately be subjected to varying interpretations. The phrase Uses and Abuses of Statistics refers to the notion that in some cases statistical results may be used as evidence to seemingly opposite theses. However, most of the time, common principles of logic allow us to disambiguate the obtained statistical inference.

Problems

Design of Experiments

Design of experiments is the blueprint for planning a study or experiment, performing the data collection protocol and controlling the study parameters for accuracy and consistency. Data, or information, is typically collected in regard to a specific process or phenomenon being studied to investigate the effects of some controlled variables (independent variables or predictors) on other observed measurements (responses or dependent variables). Both types of variables are associated with specific observational units (living beings, components, objects, materials, etc.)

Problems

Statistics with Tools (Calculators and Computers)

All methods for data analysis, understanding or visualizing are based on models that often have compact analytical representations (e.g., formulas, symbolic equations, etc.) Models are used to study processes theoretically. Empirical validations of the utility of models are achieved by inputting data and executing tests of the models. This validation step may be done manually, by computing the model prediction or model inference from recorded measurements. This process may be possible by hand, but only for small numbers of observations (<10). In practice, we write (or use existent) algorithms and computer programs that automate these calculations for better efficiency, accuracy and consistency in applying models to larger datasets.

Problems

II. Describing, Exploring, and Comparing Data

Types of Data

There are two important concepts in any data analysis - Population and Sample. Each of these may generate data of two major types - Quantitative or Qualitative measurements.

Problems

Summarizing Data with Frequency Tables

There are two important ways to describe a data set (sample from a population) - Graphs or Tables.

Problems

Pictures of Data

There are many different ways to display and graphically visualize data. These graphical techniques facilitate the understanding of the dataset and enable the selection of an appropriate statistical methodology for the analysis of the data.

Problems

Measures of Central Tendency

There are three main features of populations (or sample data) that are always critical in understanding and interpreting their distributions - Center, Spread and Shape. The main measures of centrality are Mean, Median and Mode(s).
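As a small illustration, the three measures of centrality can be computed with Python's standard `statistics` module. This is only a sketch; the sample values are made up for the example.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(sample)        # arithmetic average of the values
median = statistics.median(sample)    # middle value (average of the two middle values here)
modes = statistics.multimode(sample)  # list of the most frequent value(s)

print(mean, median, modes)
```

Note that `multimode` returns a list, since a sample may have more than one mode.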

Problems

Measures of Variation

There are many measures of (population or sample) spread, e.g., the range, the variance, the standard deviation, mean absolute deviation, etc. These are used to assess the dispersion or variation in the population.

Problems

Measures of Shape

The shape of a distribution can usually be determined by looking at a histogram of a (representative) sample from that population; Frequency Plots, Dot Plots or Stem and Leaf Displays may be helpful.

Problems

Statistics

Variables can be summarized using statistics - functions of data samples.

Problems

Graphs and Exploratory Data Analysis

Graphical visualization and interrogation of data are critical components of any reliable method for statistical modeling, analysis and interpretation of data.

Problems

III. Probability

Probability is important in many studies and disciplines because measurements, observations and findings are often influenced by variation. In addition, probability theory provides the theoretical groundwork for statistical inference.

Fundamentals

Some fundamental concepts of probability theory include random events, sampling, types of probabilities, event manipulations and axioms of probability.

Problems

Rules for Computing Probabilities

There are many important rules for computing probabilities of composite events. These include conditional probability, statistical independence, multiplication and addition rules, the law of total probability and the Bayesian rule.

Problems

Probabilities Through Simulations

Many experimental settings require probability computations of complex events. Such calculations may be carried out exactly, using theoretical models, or approximately, using estimation or simulations.

Problems

Counting

There are many useful counting principles (including permutations and combinations) to compute the number of ways that certain arrangements of objects can be formed. This allows counting-based estimation of probabilities of complex events.

Problems

IV. Probability Distributions

There are two basic types of processes that we observe in nature - Discrete and Continuous. We begin by discussing several important discrete random processes, emphasizing the different distributions, expectations, variances and applications. In the next chapter, we will discuss their continuous counterparts. The complete list of all SOCR Distributions is available here.


Random Variables

To simplify the calculations of probabilities, we will define the concept of a random variable which will allow us to study uniformly various processes with the same mathematical and computational techniques.

Problems

Expectation (Mean) and Variance

The expectation and the variance for any discrete random variable or process are important measures of Centrality and Dispersion. This section also presents the definitions of some common population- or sample-based moments.

Problems

Bernoulli and Binomial Experiments

The Bernoulli and Binomial processes provide the simplest models for discrete random experiments.

Problems

Multinomial Experiments

Multinomial processes extend the Binomial experiments for the situation of multiple possible outcomes.

Problems

Geometric, Hypergeometric and Negative Binomial

The Geometric, Hypergeometric and Negative Binomial distributions provide computational models for calculating probabilities for a large number of experiments and random variables. This section presents the theoretical foundations and the applications of each of these discrete distributions.

Problems

Poisson Distribution

The Poisson distribution models many different discrete processes where the probability of the observed phenomenon is constant in time or space. Poisson distribution may be used as an approximation to the Binomial distribution.

Problems

V. Normal Probability Distribution

The Normal Distribution is perhaps the most important model for studying quantitative phenomena in the natural and behavioral sciences - this is due to the Central Limit Theorem. Many numerical measurements (e.g., weight, time, etc.) can be well approximated by the normal distribution.

The Standard Normal Distribution

The Standard Normal Distribution is the simplest version (zero-mean, unit-standard-deviation) of the (General) Normal Distribution. Yet, it is perhaps the most frequently used version because many tables and computational resources are explicitly available for calculating probabilities.

Problems

Nonstandard Normal Distribution: Finding Probabilities

In practice, the mechanisms underlying natural phenomena may be unknown, yet the use of the normal model can be theoretically justified in many situations to compute critical and probability values for various processes.

Problems

Nonstandard Normal Distribution: Finding Scores (Critical Values)

In addition to being able to compute probability (p) values, we often need to estimate the critical values of the Normal Distribution for a given p-value.
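Both directions (from a score to a probability, and from a probability to a critical value) can be sketched with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 below are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical Normal model: mean 100, standard deviation 15.
model = NormalDist(mu=100, sigma=15)

# Probability of observing a value of at most 130, i.e. P(X <= 130).
p_below_130 = model.cdf(130)

# The same probability via standardization: z = (x - mu) / sigma.
z = (130 - model.mean) / model.stdev
p_via_z = NormalDist().cdf(z)  # Standard Normal (mean 0, standard deviation 1)

# Critical value (score) below which 97.5% of the distribution lies.
x_975 = model.inv_cdf(0.975)

print(p_below_130, p_via_z, x_975)
```

The standardization step shows why tables for the Standard Normal Distribution suffice for any Normal model.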

Problems

Multivariate Normal Distribution

The multivariate normal distribution (also known as multivariate Gaussian distribution) is a generalization of the univariate (one-dimensional) normal distribution to higher dimensions (2D, 3D, etc.) The multivariate normal distribution is useful in studies of correlated real-valued random variables.

Problems

VI. Relations Between Distributions

In this chapter, we will explore the relations between different distributions. This knowledge will help us to compute difficult probabilities using reasonable approximations and identify appropriate probability models, graphical and statistical analysis tools for data interpretation. The complete list of all SOCR Distributions is available here and the SOCR Distributome applet provides an interactive graphical interface for exploring the relations between different distributions.

The Central Limit Theorem

The exploration of the relation between different distributions begins with the study of the sampling distribution of the sample average. This will demonstrate the universally important role of normal distribution.

Problems

Law of Large Numbers

Consider an event whose probability of being observed in each experiment is p. If we repeat the same experiment over and over, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of experiments increases. Why is that and why is this important?
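A short simulation can make this convergence visible. The sketch below uses a hypothetical event probability of 0.3 and a fixed random seed so that runs are reproducible; the function name `relative_frequency` is introduced here for illustration only.

```python
import random

def relative_frequency(p, n_trials, seed=0):
    """Simulate n_trials independent experiments in which an event occurs
    with probability p, and return the observed relative frequency."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hits = sum(1 for _ in range(n_trials) if rng.random() < p)
    return hits / n_trials

p = 0.3  # hypothetical event probability
for n in (10, 1_000, 100_000):
    # The relative frequency should tend to settle near p as n grows.
    print(n, relative_frequency(p, n))
```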

Problems

Normal Distribution as Approximation to Binomial Distribution

The Normal Distribution provides a valuable approximation to the Binomial when the sample sizes are large and the probabilities of success and failure are not close to zero.
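To see how close the approximation can be, the following sketch compares an exact Binomial probability with its Normal approximation. The values n = 100, p = 0.5 and k = 55 are hypothetical, and the continuity correction of 0.5 is a common refinement that we assume here; it is not prescribed by the text above.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) with a Normal model that has the same mean (np)
    and standard deviation (sqrt(np(1-p))), using a continuity correction."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)

n, p, k = 100, 0.5, 55  # hypothetical values
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(exact, approx)
```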

Problems

Poisson Approximation to Binomial Distribution

Poisson provides an approximation to Binomial Distribution when the sample sizes are large and the probability of successes or failures is close to zero.
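The quality of this approximation can be checked numerically. In the sketch below, n = 1000 and p = 0.002 are hypothetical values, and the Poisson rate is matched to the Binomial mean, λ = np.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

n, p = 1000, 0.002  # hypothetical: large n, small p
lam = n * p         # Poisson rate matching the Binomial mean
for k in range(5):
    # The two columns of probabilities should agree closely.
    print(k, binomial_pmf(k, n, p), poisson_pmf(k, lam))
```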

Problems

Binomial Approximation to Hypergeometric

Binomial Distribution is much simpler to compute, compared to Hypergeometric, and can be used as an approximation when the population sizes are large (relative to the sample size) and the probability of successes is not close to zero.

Problems

Normal Approximation to Poisson

The Poisson can be approximated fairly well by Normal Distribution when λ is large.

Problems

VII. Point and Interval Estimates

Estimation of population parameters is critical in many applications. Estimation is most frequently carried out in terms of point-estimates or interval (range) estimates for population parameters that are of interest.

Method of Moments and Maximum Likelihood Estimation

There are many ways to obtain point (value) estimates of various population parameters of interest, using observed data from the specific process we study. The method of moments and the maximum likelihood estimation are among the most popular ones frequently used in practice.

Problems

Estimating a Population Mean: Large Samples

This section discusses how to find point and interval estimates when the sample-sizes are large.

Problems

Estimating a Population Mean: Small Samples

Next, we discuss point and interval estimates when the sample-sizes are small. Naturally, the point estimates are less precise and the interval estimates produce wider intervals, compared to the case of large-samples.

Problems

Student's T distribution

The Student's T-Distribution arises in the problem of estimating the mean of a normally distributed population when the sample size is small and the population variance is unknown.

Problems

Estimating a Population Proportion

The Normal Distribution is an appropriate model for proportions when the sample size is large enough. In this section, we demonstrate how to obtain point and interval estimates for a population proportion.

Problems

Estimating a Population Variance

In many processes and experiments, controlling the amount of variance is of critical importance. Thus the ability to assess variation, using point and interval estimates, facilitates our ability to make inference, revise manufacturing protocols, improve clinical trials, etc.

Problems

VIII. Hypothesis Testing

Hypothesis Testing is a statistical technique for decision making regarding populations or processes based on experimental data. It quantitatively assesses the possibility that chance alone might be responsible for the observed discrepancy between a theoretical model and the empirical observations.

Fundamentals of Hypothesis Testing

In this section, we define the core terminology necessary to discuss Hypothesis Testing (Null and Alternative Hypotheses, Type I and II errors, Sensitivity, Specificity, Statistical Power, etc.)

Problems

Testing a Claim about a Mean: Large Samples

As we already saw how to construct point and interval estimates for the population mean in the large sample case, we now show how to do hypothesis testing in the same situation.

Problems

Testing a Claim about a Mean: Small Samples

We continue with the discussion on inference for the population mean for small samples.

Problems

Testing a Claim about a Proportion

When the sample size is large, the sampling distribution of the sample proportion \(\hat{p}\) is approximately Normal, by CLT. This helps us formulate hypothesis testing protocols and compute the appropriate statistics and p-values to assess significance.
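A minimal sketch of such a test is given below, assuming a two-sided alternative and the Normal approximation described above. The function name `one_proportion_z_test` and the counts (60 successes in 100 trials, null value 0.5) are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0, using the Normal approximation
    to the sampling distribution of the sample proportion."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error of p_hat under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes out of 100 trials, testing p0 = 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(z, p_value)
```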

Problems

Testing a Claim about a Standard Deviation or Variance

The significance testing for the variation or the standard deviation of a process, a natural phenomenon or an experiment is of paramount importance in many fields. This chapter provides the details for formulating testable hypotheses, computation, and inference on assessing variation.

Problems

IX. Inferences from Two Samples

In this chapter, we continue our pursuit and study of significance testing in the case of having two populations. This expands the possible applications of one-sample hypothesis testing we saw in the previous chapter.

Inferences About Two Means: Dependent Samples

We need to clearly identify whether samples we compare are Dependent or Independent in all study designs. In this section, we discuss one specific dependent-samples case - Paired Samples.

Problems

Inferences About Two Means: Independent Samples

Independent Samples designs refer to experiments or observations where all measurements are individually independent from each other within their groups and the groups are independent. In this section, we discuss inference based on independent samples.

Problems

Comparing Two Variances

In this section, we compare variances (or standard deviations) of two populations using randomly sampled data.

Problems

Inferences about Two Proportions

This section presents the significance testing and inference on equality of proportions from two independent populations.

Problems

X. Correlation and regression

Many scientific applications involve the analysis of relationships between two or more variables involved in a process of interest. We begin with the simplest of all situations where Bivariate Data (X and Y) are measured for a process and we are interested in determining the association, relation or an appropriate model for these observations (e.g., fitting a straight line to the pairs of (X,Y) data).

Correlation

The Correlation between X and Y represents the first bivariate model of association which may be used to make predictions.
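For concreteness, the sample (Pearson) correlation coefficient can be computed directly from its definition. The sketch below uses made-up paired data, and the helper name `pearson_r` is an assumption introduced for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation between paired observations xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(r)
```

A value of r close to +1 or -1 indicates a strong linear association; it does not by itself show that changes in X cause changes in Y.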

Problems

Regression

We are now ready to discuss the modeling of linear relations between two variables using Regression Analysis. This section demonstrates this methodology for the SOCR California Earthquake dataset.

Problems

Variation and Prediction Intervals

In this section, we discuss point and interval estimates about the slope of linear models.

Problems

Multiple Regression

Now, we are interested in determining linear regressions and multilinear models of the relationships between one dependent variable Y and many independent variables \(X_i\).

Problems

XI. Analysis of Variance (ANOVA)

One-Way ANOVA

We now expand our inference methods to study and compare k independent samples. In this case, we will be decomposing the entire variation in the data into independent components.

Problems

Two-Way ANOVA

Now we focus on decomposing the variance of a dataset into (independent/orthogonal) components when we have two (grouping) factors. This procedure is called Two-Way Analysis of Variance.

Problems

XII. Non-Parametric Inference

To be valid, many statistical methods impose (parametric) requirements about the format, parameters and distributions of the data to be analyzed. For instance, the Independent T-Test requires the distributions of the two samples to be Normal. Non-Parametric (distribution-free) statistical methods do not impose such requirements; they are often useful in practice, although they are generally less powerful.

Differences of Medians (Centers) of Two Paired Samples

The Sign Test and the Wilcoxon Signed Rank Test are the simplest non-parametric tests which are also alternatives to the One-Sample and Paired T-Test. These tests are applicable for paired designs where the data is not required to be normally distributed.

Problems

Differences of Medians (Centers) of Two Independent Samples

The Wilcoxon-Mann-Whitney (WMW) Test (also known as Mann-Whitney U Test, Mann-Whitney-Wilcoxon Test, or Wilcoxon rank-sum Test) is a non-parametric test for assessing whether two samples come from the same distribution.

Problems

Differences of Proportions of Two Samples

Depending upon whether the samples are dependent or independent, we use different statistical tests.

Problems

Differences of Means of Several Independent Samples

We now extend the multi-sample inference which we discussed in the ANOVA section, to the situation where the ANOVA assumptions are invalid.

Problems

Differences of Variances of Independent Samples (Variance Homogeneity)

There are several tests for variance equality in k samples. These tests are commonly known as tests for Homogeneity of Variances.

Problems

XIII. Multinomial Experiments and Contingency Tables

Multinomial Experiments: Goodness-of-Fit

The Chi-Square Test is used to test if a data sample comes from a population with specific characteristics.

Problems

Contingency Tables: Independence and Homogeneity

The Chi-Square Test may also be used to test for independence (or association) between two variables.

Problems

References

"-----

