EBook Problems

From SOCR
Revision as of 21:48, 7 November 2008 by IvoDinov (Pictures of Data)

Probability and Statistics EBook Practice Problems

The problems below can be used to practice the concepts, methods, and analysis protocols presented in the EBook, and to self-evaluate learning of the material.

I. Introduction to Statistics

The Nature of Data and Variation

Uses and Abuses of Statistics

Design of Experiments

Statistics with Tools (Calculators and Computers)

II. Describing, Exploring, and Comparing Data

Types of Data

Summarizing Data with Frequency Tables

Pictures of Data

There are many different ways to display and graphically visualize data. These graphical techniques facilitate the understanding of the dataset and enable the selection of an appropriate statistical methodology for the analysis of the data.

Problems

1. Two random samples were taken to compare the backpack loads, in pounds, of freshmen and seniors. The summary statistics are:

Year       Mean    SD     Median   Min    Max     Range   Count
Freshmen   20.43   4.21   17.20    5.78   31.68   25.90   115
Seniors    18.67   3.56   18.67    5.31   27.66   22.35   157

Which of the following plots would be the most useful in comparing the two sets of backpack weights?

Choose one answer.

A. Histograms

B. Dot Plots

C. Scatter Plots

D. Box Plots


2. There is a company in which a very small minority of males (3%) receive three times the median salary of males, and a very small minority of females (3%) receive one-third of the median salary of females. What do you expect the side-by-side boxplot of male and female salaries to look like?

Choose one answer.

A. Both boxplots will be skewed and the median line will not be in the middle of any of the boxes.

B. Both boxplots will be skewed, in the case of the females the median line will be close to the top of the box and in the case of the males the median line will be closer to the bottom of the box.

C. Need to have the actual data to compare the shape of the boxplots.

D. Both boxplots will be skewed, in the case of males the median line will be close to the top of the box and for the females the median line will be closer to the bottom of the box.

3. Which of the following parameters is most sensitive to outliers?

Choose one answer.

A. Interquartile Range

B. Mode

C. Median

D. Standard Deviation

Measures of Central Tendency

1. Suppose that in a certain country the yearly income of 75% of the population is below the average income. What would you use as the measures of center and spread?

Choose one answer.

A. Mean and interquartile range

B. Mean and standard deviation

C. Median and interquartile range

D. Mean and standard deviation

2. According to a story in the Guardian newspaper in the U.K., the mean wage for a Premiership player in 2001-2002 in the U.K. was 600,000 pounds. Which of the following is most likely true?

Choose one answer.

A. About as many Premiership players make more than 600,000 pounds as make less.

B. Most Premiership players make close to 600,000 pounds.

C. Most Premiership players make less than 600,000 pounds.

D. Most Premiership players make more than 600,000 pounds.


Measures of Variation

1. The number of flaws of an electroplated automobile grill is known to have the following probability distribution:

X      0     1     2      3
P(X)   0.8   0.1   0.05   0.05

What would be the standard deviation of the sample means if we took 100 samples, each sample with 200 grills, and computed their sample means?

Choose one answer.

A. 0.6275

B. 0.0560

C. None of the Above

D. 0.89269
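
One way to check an answer to problem 1 is to compute the population standard deviation from the probability table and divide it by the square root of the sample size (200 grills); the number of samples taken (100) does not enter the formula. The following is a minimal Python sketch, with variable names chosen here purely for illustration:

```python
import math

# Probability distribution of the number of flaws X, taken from the table above.
pmf = {0: 0.8, 1: 0.1, 2: 0.05, 3: 0.05}

# Population mean and variance of X.
mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())
sd_population = math.sqrt(variance)

# Standard deviation of the sample mean for samples of n grills.
n = 200
sd_sample_mean = sd_population / math.sqrt(n)

print(round(variance, 4), round(sd_population, 4), round(sd_sample_mean, 4))
```

Note that two of the answer choices correspond to intermediate quantities in this calculation rather than to the final result.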

2. Suppose that in a certain country the yearly income of 75% of the population is below the average income. What would you use as the measures of center and spread?

Choose one answer.

A. Mean and interquartile range

B. Mean and standard deviation

C. Median and interquartile range

D. Mean and standard deviation


Measures of Shape

Statistics

1. A recent Gallup Poll found that 23% of senior citizens exercise at least 3 times a week. The number 23% is:

Choose one answer.

A. A sample

B. An estimate of the percentage of all senior citizens who exercise in the population

C. The percentage of all senior citizens who exercise in the population

D. A parameter

Graphs and Exploratory Data Analysis

III. Probability

Fundamentals

1. In a large midwestern university with 30 different departments, the university is considering eliminating standardized scores from their admission requirements. The university wants to find out whether the students agree with this plan. They decide to randomly select 100 students from each department, send them a survey, and follow up with a phone call if they do not return the survey within a week. What kind of sampling plan did they use?

Choose one answer.

A. Stratified random sampling

B. Simple random sampling

C. Multi-stage sampling

D. Cluster sampling

Rules for Computing Probabilities

1. A professor who teaches 500 students in an introductory psychology course reports that 250 of the students have taken at least one introductory statistics course, and the other 250 have not taken any statistics courses. 200 of the students were freshmen, and the other 300 students were not freshmen. Exactly 50 of the students were freshmen who had taken at least one introductory statistics course.

If you select one of these psychology students at random, what is the probability that the student is not a freshman and has never taken a statistics course?

A. 30%

B. 40%

C. 50%

D. 60%

E. 20%
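
Problems of this kind are often easiest to handle by filling in a two-way table (class year by statistics background) and subtracting. A small Python sketch of that bookkeeping, using illustrative variable names:

```python
# Counts stated in the problem.
total = 500
took_stats = 250
freshmen = 200
freshmen_took_stats = 50

# Fill in the remaining cells of the two-way table by subtraction.
freshmen_no_stats = freshmen - freshmen_took_stats
no_stats = total - took_stats
non_freshmen_no_stats = no_stats - freshmen_no_stats

# Probability that a randomly chosen student falls in that cell.
probability = non_freshmen_no_stats / total
print(non_freshmen_no_stats, f"{probability:.0%}")
```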

2. A box contains 30 pens, where 5 are red, 14 are black, and 11 are blue. If you pick three pens from the box at random without replacement, what is the probability that these three pens will all be black?

Choose one answer.

A. 14/30 + 14/30 + 14/30

B. 14/30 + 13/29 + 12/28

C. 14/30 x 13/29 x 12/28

D. 1 - (14/30 x 13/29 x 12/28)
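
The multiplication rule for draws without replacement can be cross-checked against a direct count of combinations. The sketch below, written here only as an illustration, computes the probability both ways with exact fractions:

```python
from fractions import Fraction
from math import comb

# Multiplication rule: each draw removes one black pen and one pen overall.
sequential = Fraction(14, 30) * Fraction(13, 29) * Fraction(12, 28)

# Counting argument: 3-pen subsets that are all black, out of all 3-pen subsets.
counting = Fraction(comb(14, 3), comb(30, 3))

print(sequential, float(sequential), sequential == counting)
```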

3. When three fair dice are simultaneously thrown, which of these three results is least likely to be obtained?

Choose one answer.

A. All three results are equally unlikely.

B. Two fives and a 3 in any order.

C. A 5, a 3 and a 6 in any order.

D. Three 5's.
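
Because the phrase "in any order" changes how many ordered outcomes match each result, it can help to enumerate all rolls explicitly. A brief Python sketch (the helper `count` is a name introduced here for illustration):

```python
from itertools import product

# All 6**3 equally likely ordered outcomes of three fair dice.
outcomes = list(product(range(1, 7), repeat=3))

def count(target):
    """Number of ordered outcomes whose faces match `target` in any order."""
    return sum(1 for roll in outcomes if sorted(roll) == sorted(target))

for label, target in [("5, 5, 3", (5, 5, 3)),
                      ("5, 3, 6", (5, 3, 6)),
                      ("5, 5, 5", (5, 5, 5))]:
    print(label, "->", count(target), "of", len(outcomes))
```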

4. Suppose that you take a three question "true/false" quiz for which you are completely unprepared. You have to guess the correct answer for each question. What is the probability of answering at least one question correctly?

Choose one answer.

A. 4/8

B. 5/8

C. 7/8

D. 1/8

E. 3/8
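
With pure guessing, every right/wrong pattern across the three questions is equally likely, so the probability can be found by listing the patterns. A minimal sketch of that enumeration, which also illustrates the complement rule (all patterns except "all wrong"):

```python
from itertools import product

# Each question is guessed right (True) or wrong (False) with probability 1/2,
# so the 2**3 patterns below are equally likely.
patterns = list(product([True, False], repeat=3))

# Patterns with at least one correct answer.
at_least_one = [p for p in patterns if any(p)]

print(len(at_least_one), "of", len(patterns))
```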

5. Records show that in an introductory chemistry course in a college, 20% of the students get an A, 30% get a B, 40% get a C, and 10% get a D. If you pick three students at random, what is the probability that all three will get an A?

Choose one answer.

A. 0.8*0.8*0.8

B. 0.2*0.2*0.2

C. 200*0.2*0.2*0.2

D. 0.2*3

6. A newly born child is equally likely to be a boy or a girl. What is the probability that in a family of three children there are fewer than 3 boys?

A. 0.125

B. 0.75

C. 0.875

D. 0.5

7. A professor who teaches 300 students in an introductory psychology course reports that 135 of the students have taken exactly one introductory statistics course, 60 have taken two or more introductory statistics courses, and the other 105 have not taken any statistics courses. If you select one of these psychology students at random, what is the probability that the student has taken at least one statistics class?

Choose one answer.

A. 0.20

B. 0.45

C. 0.65

D. 0.35

8. Three fair coins are flipped. Find the probability that at least one comes up heads.

Choose one answer.

A. 7/8

B. 4/8

C. 6/8

D. 3/8

E. 5/8

Probabilities Through Simulations

Counting

IV. Probability Distributions

Random Variables

Expectation (Mean) and Variance

1. Ming’s Seafood Shop stocks live lobsters. Ming pays $6.00 for each lobster and sells each one for $12.00. The demand X for these lobsters in a given day has the following probability mass function.

x      0      1      2      3      4      5
P(x)   0.05   0.15   0.30   0.20   0.20   0.10

What is the expected demand?

Choose one answer.

A. 13.5

B. 3.1

C. 2.65

D. 5.2
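
The expected value of a discrete random variable is the probability-weighted sum of its values. A short Python sketch applying that definition to the demand table (the dictionary name is illustrative):

```python
# Probability mass function of daily lobster demand, from the table above.
demand_pmf = {0: 0.05, 1: 0.15, 2: 0.30, 3: 0.20, 4: 0.20, 5: 0.10}

# Sanity check: the probabilities should sum to 1.
assert abs(sum(demand_pmf.values()) - 1.0) < 1e-9

# Expected demand: sum of x * P(x) over all values of x.
expected_demand = sum(x * p for x, p in demand_pmf.items())
print(round(expected_demand, 2))
```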

2. If sampling distributions of sample means are examined for samples of size 1, 5, 10, 16 and 50, you will notice that as sample size increases, the shape of the sampling distribution appears more like that of the:

Choose one answer.

A. normal distribution

B. uniform distribution

C. population distribution

D. binomial distribution

Bernoulli and Binomial Experiments

Multinomial Experiments

Geometric, Hypergeometric, and Negative Binomial

Poisson Distribution

V. Normal Probability Distribution

The Standard Normal Distribution

1. Weight is a measure that tends to be normally distributed. Suppose the mean weight of all women at a large university is 135 pounds, with a standard deviation of 12 pounds. If you were to randomly sample 9 women at the university, there would be a 68% chance that the sample mean weight would be between:

Choose one answer.

A. 131 and 139 pounds.

B. 133 and 137 pounds.

C. 119 and 151 pounds.

D. 125 and 145 pounds.

E. 123 and 147 pounds.
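
For a normal population, the sample mean is itself normally distributed with standard deviation sigma / sqrt(n), and about 68% of a normal distribution lies within one standard deviation of its mean. A minimal sketch of the resulting interval for problem 1, with variable names chosen for illustration:

```python
import math

# Population mean and standard deviation (pounds), and the sample size.
mu, sigma, n = 135, 12, 9

# Standard deviation of the sample mean (the standard error).
standard_error = sigma / math.sqrt(n)

# Roughly 68% of sample means fall within one standard error of mu.
low, high = mu - standard_error, mu + standard_error
print(standard_error, low, high)
```

The same three lines apply to any problem of this form once mu, sigma, and n are replaced.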

2. The amount of money college students spend each semester on textbooks is normally distributed with a mean of $195 and a standard deviation of $20. Suppose you take a random sample of 100 college students from this population. There is a 68% chance that the sample mean amount spent on textbooks is between:

Choose one answer.

A. $193 and $197.

B. $155 and $235.

C. $191 and $199.

D. $175 and $215.

3. A researcher converts 100 lung capacity measurements to z-scores. The lung capacity measurements do not follow a normal distribution. What can we say about the standard deviation of the 100 z-scores?

Choose one answer.

A. It depends on the standard deviation of the raw scores

B. It equals 1

C. It equals 100

D. It must always be less than the standard deviation of the raw scores

E. It depends on the shape of the raw score distribution

4. The weights of packets of cookies produced by a certain manufacturer have a normal distribution with a mean of 202 grams and a standard deviation of 3 grams. What is the weight that should be stamped on the packet so that only 0.99% of packets are underweight?

Choose one answer.

A. 200

B. 195

C. 190

D. 205
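
Problem 4 asks for a percentile of a normal distribution, which can be found with an inverse cumulative distribution function instead of a z-table. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library:

```python
from statistics import NormalDist

# Packet weights in grams: normal with mean 202 and standard deviation 3.
weights = NormalDist(mu=202, sigma=3)

# Weight below which only 0.99% of packets fall.
cutoff = weights.inv_cdf(0.0099)
print(round(cutoff, 1))
```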

5. GSP Inc. is trying two different marketing techniques for its toothpaste. In 20 test cities, it is using family branding. This sells toothpaste with a mean of 2,250 units per week and a standard deviation of 250 units per week. In 20 other test cities, GSP is using individual branding. This sells toothpaste with a mean of 2,250 units per week and a standard deviation of 500 units per week. GSP wants to select the marketing technique that sells at least 2,350 units per week more often. If the number of units sold per week follows a normal distribution, which marketing technique should GSP choose?

Choose one answer.

A. Individual Branding

B. Can't be answered with the information given

C. Family Branding

D. They each get the same result
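
Since both techniques have the same mean, the comparison in problem 5 comes down to how the standard deviation affects the upper-tail probability beyond 2,350 units. An illustrative sketch, again assuming Python 3.8+ for `statistics.NormalDist`:

```python
from statistics import NormalDist

target = 2350  # weekly sales threshold of interest

# Weekly sales under each marketing technique, modeled as normal.
family = NormalDist(mu=2250, sigma=250)
individual = NormalDist(mu=2250, sigma=500)

# Probability of selling at least `target` units in a week.
p_family = 1 - family.cdf(target)
p_individual = 1 - individual.cdf(target)
print(round(p_family, 4), round(p_individual, 4))
```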

Nonstandard Normal Distribution: Finding Probabilities

Nonstandard Normal Distribution: Finding Scores (Critical Values)

VI. Relations Between Distributions

The Central Limit Theorem

1. Which of the following would make the sampling distribution of the sample mean narrower? Check all answers that apply.

Choose at least one answer.

A. A smaller population standard deviation

B. A smaller sample size

C. A larger standard error

D. A larger sample size

E. A larger population standard deviation
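
The width of the sampling distribution of the sample mean is governed by the standard error, sigma / sqrt(n). The sketch below evaluates that formula for a few hypothetical values of sigma and n (the numbers are made up for illustration and do not come from the EBook) so the direction of each change can be seen directly:

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean for population SD `sigma` and sample size `n`."""
    return sigma / math.sqrt(n)

# Hypothetical baseline and variations, chosen only for illustration.
baseline = standard_error(sigma=10, n=25)
smaller_sigma = standard_error(sigma=5, n=25)
larger_sigma = standard_error(sigma=20, n=25)
smaller_n = standard_error(sigma=10, n=4)
larger_n = standard_error(sigma=10, n=100)

print(baseline, smaller_sigma, larger_sigma, smaller_n, larger_n)
```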

Law of Large Numbers

Normal Distribution as Approximation to Binomial Distribution

Poisson Approximation to Binomial Distribution

Binomial Approximation to Hypergeometric

Normal Approximation to Poisson

VII. Point and Interval Estimates

Method of Moments and Maximum Likelihood Estimation

Estimating a Population Mean: Large Samples

Estimating a Population Mean: Small Samples

Student's T Distribution

Estimating a Population Proportion

1. A 1996 poll of 1,200 African American adults found that 708 think that the American dream has become impossible to achieve. The New Yorker magazine editors want to estimate the proportion of all African American adults who feel this way. Which of the following is an approximate 90% confidence interval for the proportion of all African American adults who feel this way?

Choose one answer.

A. (.56, .62)

B. (.57, .61)

C. Can't be calculated because the population size is too small.

D. Can't be calculated because the sample size is too small.
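
A large-sample confidence interval for a proportion has the form p-hat ± z * sqrt(p-hat * (1 - p-hat) / n). The following sketch applies that formula to the poll data; it assumes Python 3.8+ for `statistics.NormalDist`, and the variable names are illustrative:

```python
import math
from statistics import NormalDist

# Poll results.
n, successes = 1200, 708
p_hat = successes / n

# Critical value for a two-sided 90% interval (5% in each tail).
z = NormalDist().inv_cdf(0.95)

# Standard error of the sample proportion and the resulting interval.
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - z * se, p_hat + z * se
print(round(p_hat, 2), round(low, 2), round(high, 2))
```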

Estimating a Population Variance

VIII. Hypothesis Testing

Fundamentals of Hypothesis Testing

1. Suppose you were hired to conduct a study to find out which of two brands of soda college students think tastes better. In your study, students are given a blind taste test. They rate one brand and then rate the other, in random order. The ratings are given on a scale of 1 (awful) to 5 (delicious). Which type of test would be best for comparing these ratings?

A. One-Sample t

B. Chi-Square

C. Paired Difference t

D. Two-Sample t

2. USA Today's AD Track examined the effectiveness of the new ads involving the Pets.com Sock Puppet (which is now extinct). In particular, they conducted a nationwide poll of 428 adults who had seen the Pets.com ads and asked for their opinions. They found that 36% of the respondents said they liked the ads. Suppose you increased the sample size for this poll to 1000, but you had the same sample percentage who like the ads (36%). How would this change the p-value of the hypothesis test you want to conduct?

Choose one answer.

A. No way to tell

B. The new p-value would be the same as before

C. The new p-value would be smaller than before

D. The new p-value would be larger than before

3. A marketing director for a radio station collects a random sample of three hundred 18 to 25 year-olds and two hundred and fifty 25 to 40 year-olds. She records the percent of each group that had purchased music online in the last 30 days. She performs a hypothesis test, and the p-value of her test turns out to be 0.15. From this she should conclude:

Choose one answer.

A. that about 15% more people purchased on-line music in the younger group than in the older group.

B. there is insufficient evidence to conclude that there is a difference in the proportion of on-line music purchases in the younger and older group.

C. the proportion of on-line music purchasers is the same in the under-25 year-old group as in the older group.

D. the probability of getting the same results again is 0.15.

4. If we want to estimate the mean difference in scores on a pre-test and post-test for a sample of students, how should we proceed?

Choose one answer.

A. We should construct a confidence interval or conduct a hypothesis test

B. We should collect one sample, two samples, or conduct a paired data procedure

C. We should calculate a z or a t statistic

5. The paint used to make lines on roads must reflect enough light to be clearly visible at night. Let mu denote the true average reflectometer reading for a new type of paint under consideration. A test of the null hypothesis that mu = 20 versus the alternative hypothesis that mu > 20 will be based on a random sample of size n from a normal population distribution. In which of the following scenarios is there significant evidence that mu is larger than 20?

(i) n=15, t=3.2, alpha=0.05

(ii) n=9, t=1.8, alpha=0.01

(iii) n=24, t=-0.2, alpha=0.01

Choose one answer.

A. (ii) and (iii)

B. (i)

C. (iii)

D. (ii)

6. The average length of time required to complete a certain aptitude test is claimed to be 80 minutes. A random sample of 25 students yielded an average of 86.5 minutes and a standard deviation of 15.4 minutes. If we assume normality of the population distribution, is there evidence to reject the claim? (Select all that apply.)

Choose at least one answer.

A. No, because the probability that the null is true is > 0.05

B. Yes, because the observed 86.5 did not happen by chance

C. Yes, because the t-test statistic is 2.11

D. Yes, because the observed 86.5 happened by chance
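
The one-sample t statistic for problem 6 is (sample mean - claimed mean) / (s / sqrt(n)), with n - 1 degrees of freedom. A minimal sketch of the arithmetic, with illustrative variable names; the result would then be compared with a critical value from a t table at the chosen significance level:

```python
import math

# Claimed population mean and the sample summary from the problem.
claimed_mean = 80
n, sample_mean, sample_sd = 25, 86.5, 15.4

# One-sample t statistic and its degrees of freedom.
t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))
df = n - 1
print(round(t_stat, 2), df)
```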

7. We observe the math self-esteem scores from a random sample of 25 female students. How should we determine the probable values of the population mean score for this group?

Choose one answer.

A. Test the difference in means between two paired or dependent samples.

B. Test that a correlation coefficient is not equal to 0 (correlation analysis).

C. Test the difference between two means (independent samples).

D. Test for a difference in more than two means (one way ANOVA).

E. Construct a confidence interval.

F. Test one mean against a hypothesized constant.

G. Use a chi-squared test of association.

Testing a Claim About a Mean: Large Samples

1. Hong is a pharmacist studying the effect of an anti-depressant drug. She organizes a simple random sample of 100 patients, and then collects their anxiety test scores before and after administering the anti-depressant drug. Hong wants to estimate the mean difference between the pre-drug and post-drug test scores. How should she proceed?

Choose one answer.

A. She should compute a confidence interval or conduct a hypothesis test

B. She should calculate the z or the t statistics

C. She should compute the correlation between the two samples

D. Not enough information to tell

Testing a Claim About a Mean: Small Samples

1. To test the claim that the average home in a certain town is within 5.5 miles of the nearest fire station, an insurance company measured the distances from 25 randomly selected homes to the nearest fire station and found x-bar = 5.8 miles and sd = 2.4 miles. Determine what the insurance company found out with a test of significance. Check all that apply.

Choose at least one answer.

A. There is no evidence in the data to conclude that the distance is different from 5.5.

B. The average of 5.8 miles observed is by chance.

C. We cannot reject the null.

D. There is evidence in the data to conclude that the distance is 5.5.

Testing a Claim About a Proportion

Testing a Claim About a Standard Deviation or Variance

IX. Inferences from Two Samples

Inferences About Two Means: Dependent Samples

Inferences About Two Means: Independent Samples

Comparing Two Variances

Inferences About Two Proportions

X. Correlation and Regression

Correlation

1. A positive correlation between two variables X and Y means that if X increases, this will cause the value of Y to increase.

A. This is always true.

B. This is sometimes true.

C. This is never true.


2. The correlation between high school algebra and geometry scores was found to be + 0.8. Which of the following statements is not true?

A. Most of the students who have above average scores in algebra also have above average scores in geometry.

B. Most people who have above average scores in algebra will have below average scores in geometry.

C. If we increase a student's score in algebra (i.e., with extra tutoring in algebra), then the student's geometry scores will always increase accordingly.

D. Most students who have below average scores in algebra also have below average scores in geometry.


3. Researchers discover that the correlation between miles run per week and cardiovascular endurance is +0.75. They also discover that the correlation between hours spent watching television per week and cardiovascular endurance is -0.75. What is the conclusion that best characterizes the result of this study?

Choose one answer.

A. Most people who spend a lot of hours watching television have low cardiovascular endurance.

B. Most people who have good cardiovascular endurance spend a lot of time running and little time watching television.

C. Based on the correlation, if you increase your running hours per week, your cardiovascular endurance will decrease.

D. Based on the correlation, if you increase your television watching time, your cardiovascular endurance will decrease.

E. Most people with a lot of miles run per week have high cardiovascular endurance.

Regression

1. Use the information from the Heights of Fathers and Sons to write the linear model that best predicts the height of the son from the height of the father.

Choose one answer.

A. Son's height = 35 + 0.5*Father's height

B. Son's height = 1.00 + 1.00*Father's height

C. The model cannot be determined without the actual data

D. Son's height = 0.5 + 35*Father's height

2. A congressional report investigates the relationship between income of parents and educational attainment of their daughters. Data are from a sample of families with daughters age 18-24. Average parental income is $29,300, average educational attainment of the daughters is 13.1 years of schooling completed, and the correlation is 0.37.

The regression line for predicting daughter’s education from parental income is reported as: Predicted education = 0.000617*(income) + 8.1

Is the following statement true or false? "The above line is the regression line to predict education from income."

True.

False.
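
A least-squares regression line always passes through the point of averages, so one quick consistency check is to plug the average parental income into the reported line and compare the prediction with the average educational attainment. A small illustrative sketch:

```python
# Averages stated in the problem.
mean_income = 29300
mean_education = 13.1

# Slope and intercept of the reported line.
slope, intercept = 0.000617, 8.1

# Prediction of the reported line at the average income.
predicted_at_mean = slope * mean_income + intercept
print(round(predicted_at_mean, 2), mean_education)
```

If the two printed values differ, the reported line cannot be the regression line for these data.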


Variation and Prediction Intervals

1. Two researchers are going to take a sample of data from the same population of physics students. Researcher A will select a random sample of students from among all students taking physics. Researcher B's sample will consist only of the students in her class. Both researchers will construct a 95% confidence interval for the mean score on the physics final exam using their own sample data. Which researcher's method has a 95% chance of capturing the true mean of the population of all students taking physics?

Choose one answer.

A. Researcher B

B. Researcher A

C. Both methods have a 95% chance of capturing the true mean

D. Neither

2. A random sample of 150 UCLA students found that 35% of the respondents wanted an elevator to replace Bruin Walk. A 95% confidence interval for the percentage of all UCLA students who feel this way is approximately:

Choose one answer.

A. (24%, 46%)

B. (32%, 38%)

C. The sample size is too small to compute a confidence interval.

D. (27%, 43%)

3. According to Terry Pratchett, the shortest unit of time in the multiverse is the New York second, defined as the time interval between the light turning green and the cab behind you honking. A magazine took a poll of 100 New Yorkers and found that 90 people agree with that statement wholeheartedly. Which of the following is a 90% confidence interval for the proportion of people who agree with that statement?

Choose one answer.

A. 0.9 ± 0.50

B. 0.9 ± 0.05

C. 0.9 ± 0.03

D. 0.9 ± 0.06

4. A national poll found that 62% of all Americans agreed that more attention should be paid to mental health of war veterans. If a simple random sample of 326 people was used to make a 95% confidence interval of (0.57,0.67), what is the margin of error?

Choose one answer.

A. 0.03

B. 0.05

C. 0.12

D. In order to calculate the margin of error, we need the p-value of the population.

5. Hermione Granger is on a mission this year to complain to the Hogwarts board of administrators about the astronomical cost of wizarding books. Suppose the population mean book cost is 10 galleons, with a standard deviation of 2 galleons. If Hermione takes a simple random sample of 49 students, what range of values contains the sample mean (Xbar) with 68% probability?

Choose one answer.

A. 8 and 12 galleons

B. 9.4 and 10.6 galleons

C. 6 and 14 galleons

D. 9.7 and 10.3 galleons

6. A 95% confidence interval indicates that:

Choose one answer.

A. 95% of the intervals constructed using this process based on samples from this population will include the population mean

B. 95% of the time the interval will include the sample mean

C. 95% of the possible population means will be included by the interval

D. 95% of the possible sample means will be included by the interval

7. Suppose we want to find out if a coin is not fair. To test this hypothesis we flip the coin 100 times, and in 63 out of 100 flips we get heads. We construct a 95% confidence interval for the true proportion of heads and find it to be (0.53, 0.73). Interpret this confidence interval.

Choose one answer.

A. 95 is the Z score that corresponds to our distribution of sample means

B. Confidence is something you learn at fraternity parties

C. 95% of the time the true proportion of flips that are heads is between .53 and .73

D. If we were to repeat this experiment over and over again, 95 times out of 100 our confidence interval would cover the true proportion of flips that are heads

Multiple Regression

XI. Analysis of Variance (ANOVA)

One-Way ANOVA

Two-Way ANOVA

XII. Non-Parametric Inference

Differences of Medians (Centers) of Two Paired Samples

Differences of Medians (Centers) of Two Independent Samples

Differences of Proportions of Two Samples

Differences of Means of Several Independent Samples

Differences of Variances of Independent Samples (Variance Homogeneity)

XIII. Multinomial Experiments and Contingency Tables

Multinomial Experiments: Goodness-of-Fit

Contingency Tables: Independence and Homogeneity
