Difference between revisions of "EBook Problems EDA IntroVar"
m (Text replacement - "{{translate|pageName=http://wiki.stat.ucla.edu/socr/" to ""{{translate|pageName=http://wiki.socr.umich.edu/") |
|||
(11 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
− | ==[[EBook_Problems | EBook Problems Set]] - [[ | + | ==[[EBook_Problems | EBook Problems Set]] - [[AP_Statistics_Curriculum_2007_IntroVar | The Nature of Data and Variation]] Problems== |
===Problem 1=== | ===Problem 1=== | ||
Line 40: | Line 40: | ||
:''(b) Can't be determined with the information given | :''(b) Can't be determined with the information given | ||
− | :''(c) Mean = | + | :''(c) Mean = 47.1 min, standard deviation = 3.7 min |
:''(d) Mean = 20 min, standard deviation = 13.69 min | :''(d) Mean = 20 min, standard deviation = 13.69 min | ||
Line 72: | Line 72: | ||
:''(d) Mean = 60, standard deviation = -8 | :''(d) Mean = 60, standard deviation = -8 | ||
{{hidden|Answer|(b)}} | {{hidden|Answer|(b)}} | ||
+ | |||
+ | ===Problem 5=== | ||
+ | A physician collected data on 1000 patients to examine their heights. A statistician hired to look at the files noticed the typical height was about 60 inches, but found that one height was 720 inches. This is clearly an outlier. The physician is out of town and can't be contacted, but the statistician would like to have some preliminary descriptions of the data to present when the doctor returns. Which of the following best describes how the statistician should handle this outlier? | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) The statistician should publish a paper on the emergence of a new race of giants. | ||
+ | |||
+ | :''(b) The statistician should keep the data point in; each point is too valuable to drop one. | ||
+ | |||
+ | :''(c) The statistician should drop the observation from the analysis because this is clearly a mistake; the person would be 60 feet tall. | ||
+ | |||
+ | :''(d) The statistician should analyze the data twice, once with and once without this data point, and then compare how the point affects conclusions. | ||
+ | |||
+ | :''(e) The statistician should drop the observation from the dataset because we can't analyze the data with it. | ||
+ | {{hidden|Answer|(c)}} | ||
+ | |||
+ | ===Problem 6=== | ||
+ | What do you expect the distribution of income in a company where fewer than half of the employees make less than the average to look like? | ||
+ | |||
+ | *Choose one answer | ||
+ | |||
+ | :''(a) Bimodal | ||
+ | |||
+ | :''(b) Skewed to the right or positively skewed | ||
+ | |||
+ | :''(c) Symmetrical | ||
+ | |||
+ | :''(d) Skewed to the left or negatively skewed | ||
+ | {{hidden|Answer|(d)}} | ||
+ | |||
+ | ===Problem 7=== | ||
+ | Which of the following parameters is most sensitive to outliers? | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) Standard deviation | ||
+ | |||
+ | :''(b) Interquartile range | ||
+ | |||
+ | :''(c) Mode | ||
+ | |||
+ | :''(d) Median | ||
+ | {{hidden|Answer|(a)}} | ||
+ | |||
+ | ===Problem 8=== | ||
+ | Which value given below is the best representative for the following data? | ||
+ | |||
+ | 2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 11 | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) The weighted average of the two modes or (4*5 + 9*5 )/10 = 6.5 | ||
+ | |||
+ | :''(b) No single number could represent this data set | ||
+ | |||
+ | :''(c) The average of the two modes or (4 + 9) / 2 = 6.5 | ||
+ | |||
+ | :''(d) The mean or (2 + 3 + 4 + ... + 10 + 11)/18 = 5.9 | ||
+ | |||
+ | :''(e) The median or (6 + 7)/2 = 6.5 | ||
+ | {{hidden|Answer|(b)}} | ||
+ | |||
+ | ===Problem 9=== | ||
+ | Suppose that the distribution of exam scores has mean = 20.5 and standard deviation = 2.5 and median = 15.0. If you double each score, determine the mean, deviation, and median of the transformed distribution. | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) mean = 41.0, deviation = 5.0, median = 30.0 | ||
+ | |||
+ | :''(b) We cannot determine the statistics unless we have the actual data. | ||
+ | |||
+ | :''(c) mean = 20.5, deviation = 5.0, median = 15.0 | ||
+ | |||
+ | :''(d) mean = 20.5, deviation = 2.5, median = 15.0 | ||
+ | |||
+ | :''(e) mean = 41.0, deviation = 2.5, median = 30.0 | ||
+ | {{hidden|Answer|(a)}} | ||
+ | |||
+ | ===Problem 10=== | ||
+ | A recent housing survey was conducted to determine the price of a typical home in Glendale, CA. Glendale is mostly middle-class, with one very expensive suburb. The mean price of a house was roughly $650,000. Which of the following statements is most likely to be true? | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) There are about as many houses in Glendale that cost more than $650,000 as cost less than this amount. | ||
+ | |||
+ | :''(b) Most houses in Glendale cost less than $650,000. | ||
+ | |||
+ | :''(c) Most houses in Glendale cost more than $650,000. | ||
+ | |||
+ | :''(d) We need to know the standard deviation to answer this question. | ||
+ | {{hidden|Answer|(b)}} | ||
+ | |||
+ | ===Problem 11=== | ||
+ | UCLA biochemistry major Soo Kyung Lee, who plans to enroll in med school after graduation, collected data from the AAMC (Association of American Medical Colleges) website for 121 schools and included these attributes about each institution: name, public or private institution, state, location either East or West of the Mississippi River, cost of health insurance, resident tuition, resident fees, resident total expenses, nonresident tuition, nonresident fees, and nonresident total expenses in 2005. Soo was surprised that UCLA and other UC medical schools charge no tuition for residents. However, UC students pay about $20,000 in fees. | ||
+ | |||
+ | {|border="1" | ||
+ | |- | ||
+ | | _ || Min || Q1 || Median || Q3 || Max | ||
+ | |- | ||
+ | | Private || $6,550 || $30,729 || $33,850 || $36,685 || $41,360 | ||
+ | |- | ||
+ | | Public || $0 || $10,219 || $16,168 || $18,800 || $27,886 | ||
+ | |} | ||
+ | |||
+ | On the same scale, use the 5-Number summary to construct two boxplots for the tuition for residents at 73 public and 48 private medical colleges. Use the data and plots to determine which statement about centers is true. | ||
+ | |||
+ | :''(a) For private medical schools, the mean tuition of residents is greater than the median tuition for residents. | ||
+ | |||
+ | :''(b) With these data, we cannot determine the relationship between mean and median tuition for residents. | ||
+ | |||
+ | :''(c) For private medical schools, the mean tuition of residents is equal to the median tuition for residents. | ||
+ | |||
+ | :''(d) For private medical schools, the mean tuition of residents is less than the median tuition for residents. | ||
+ | {{hidden|Answer|(d)}} | ||
+ | |||
+ | ===Problem 12=== | ||
+ | Determine which of the following statements is true about the spread for Medical School resident tuition. | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) There is the same variation for resident tuition for residents at private medical schools and for resident tuition at public medical schools since the ranges are almost equal. | ||
+ | |||
+ | :''(b) There is more variation for resident tuition for residents at private medical schools than for resident tuition at public medical schools since there are outliers for private schools. | ||
+ | |||
+ | :''(c) There is more variation for resident tuition for residents at public medical schools than for resident tuition at private medical schools since the interquartile range is wider for public schools. | ||
+ | |||
+ | :''(d) With these data, we cannot determine the variation for tuition for residents at private and public medical colleges. | ||
+ | {{hidden|Answer|(c)}} | ||
+ | |||
+ | ===Problem 13=== | ||
+ | Use the plots and summary statistics to determine which of the statements about outliers is true. | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) There are outliers for both distributions | ||
+ | |||
+ | :''(b) UCLA is an outlier since UCLA does not charge any tuition for residents | ||
+ | |||
+ | :''(c) There is at least one outlier for the distribution of resident tuition for private medical schools | ||
+ | {{hidden|Answer|(c)}} | ||
+ | |||
+ | ===Problem 14=== | ||
+ | Suppose that we create a new data set by doubling the highest value in a large data set of positive values. What statement is FALSE about the new data set? | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) The mean increases | ||
+ | |||
+ | :''(b) The standard deviation increases | ||
+ | |||
+ | :''(c) The range increases | ||
+ | |||
+ | :''(d) The median and interquartile range both increase | ||
+ | {{hidden|Answer|(d)}} | ||
+ | |||
+ | ===Problem 15=== | ||
+ | Consider a large data set of positive values and multiply each value by 100. | ||
+ | Determine the statement which is true. | ||
+ | |||
+ | *Choose one answer. | ||
+ | |||
+ | :''(a) The mean, median, and standard deviation increase | ||
+ | |||
+ | :''(b) The mean and median increase but the standard deviation is unchanged. | ||
+ | |||
+ | :''(c) The standard deviation increases but the mean and median are unchanged. | ||
+ | |||
+ | :''(d) The range and interquartile range are unchanged | ||
+ | {{hidden|Answer|(a)}} | ||
<hr> | <hr> | ||
* [[EBook | Back to Ebook]] | * [[EBook | Back to Ebook]] | ||
* SOCR Home page: http://www.socr.ucla.edu | * SOCR Home page: http://www.socr.ucla.edu | ||
− | {{translate|pageName=http://wiki. | + | "{{translate|pageName=http://wiki.socr.umich.edu/index.php/EBook_Problems_EDA_IntroVar}} |
Latest revision as of 13:29, 3 March 2020
EBook Problems Set - The Nature of Data and Variation Problems
Problem 1
Researchers do a study on the number of cars that a person owns. They think that the distribution of their data might be normal, even though the median is much smaller than the mean. They make a p-plot. What does it look like?
- Choose one answer.
- (a) It's not a straight line.
- (b) It's a bell curve.
- (c) It's a group of points clustered around the middle of the plot.
- (d) It's a straight line.
Problem 2
Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the shop manager makes the following assumptions about how long this may take:
- The times for each setup phase are independent.
- The times for each phase follow a Normal curve.
- The means and standard deviations of the times (in minutes) are as shown.
Phase | Mean | SD |
Unpacking | 3.5 | 0.7 |
Assembly | 21.8 | 2.4 |
Tuning | 21.8 | 2.7 |
What are the mean and standard deviation for the total bicycle set up time?
- Choose one answer.
- (a) Mean = 100 min, standard deviation = 12 min
- (b) Can't be determined with the information given
- (c) Mean = 47.1 min, standard deviation = 3.7 min
- (d) Mean = 20 min, standard deviation = 13.69 min
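One way to check an answer to Problem 2 is sketched below in Python. It relies on two standard facts: means always add, and variances (not standard deviations) add when the terms are independent. The variable names are illustrative only and are not part of the problem.

```python
import math

# Phase means and standard deviations (minutes), copied from the table in Problem 2.
phases = {
    "Unpacking": (3.5, 0.7),
    "Assembly": (21.8, 2.4),
    "Tuning": (21.8, 2.7),
}

# The mean of a sum is the sum of the means.
total_mean = sum(mean for mean, _ in phases.values())

# For independent phases, the variance of the sum is the sum of the variances.
total_var = sum(sd ** 2 for _, sd in phases.values())
total_sd = math.sqrt(total_var)

print(f"mean = {total_mean:.1f} min, sd = {total_sd:.1f} min")
```

Adding the standard deviations directly (0.7 + 2.4 + 2.7) would overstate the spread; the square root of the summed variances is the quantity to compare against the answer choices.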
Problem 3
Let X be a random variable with mean 80 and standard deviation 12. Find the mean and the variance of the following variable: 2X-100
- Choose one answer.
- (a) Mean = 100, variance = 288
- (b) Mean = 60, variance = 12
- (c) Mean = 160, variance = 144
- (d) Mean = 60, variance = 576
Problem 4
Let X be a random variable with mean 80 and standard deviation 12. Find the mean and the standard deviation of the following variable: X- 20
- Choose one answer.
- (a) Mean = 60, standard deviation = 144
- (b) Mean = 60, standard deviation = 12
- (c) Mean = 80, standard deviation = 12
- (d) Mean = 60, standard deviation = -8
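Problems 3 and 4 both apply the rules for a linear transformation aX + b: the mean becomes a·mean + b, the standard deviation becomes |a|·sd, and the variance becomes a²·variance. A minimal Python sketch follows; the helper name `linear_transform_stats` is hypothetical and chosen only for illustration.

```python
def linear_transform_stats(mean, sd, a, b):
    """Return (mean, sd, variance) of a*X + b, given the mean and sd of X."""
    new_mean = a * mean + b   # shifting and scaling both move the mean
    new_sd = abs(a) * sd      # only scaling changes the spread; b drops out
    return new_mean, new_sd, new_sd ** 2


# Problem 3: 2X - 100 with mean 80 and sd 12
print(linear_transform_stats(80, 12, 2, -100))

# Problem 4: X - 20 with mean 80 and sd 12
print(linear_transform_stats(80, 12, 1, -20))
```

Note that a standard deviation can never be negative, which is why the helper uses the absolute value of a.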
Problem 5
A physician collected data on 1000 patients to examine their heights. A statistician hired to look at the files noticed the typical height was about 60 inches, but found that one height was 720 inches. This is clearly an outlier. The physician is out of town and can't be contacted, but the statistician would like to have some preliminary descriptions of the data to present when the doctor returns. Which of the following best describes how the statistician should handle this outlier?
- Choose one answer.
- (a) The statistician should publish a paper on the emergence of a new race of giants.
- (b) The statistician should keep the data point in; each point is too valuable to drop one.
- (c) The statistician should drop the observation from the analysis because this is clearly a mistake; the person would be 60 feet tall.
- (d) The statistician should analyze the data twice, once with and once without this data point, and then compare how the point affects conclusions.
- (e) The statistician should drop the observation from the dataset because we can't analyze the data with it.
Problem 6
What do you expect the distribution of income in a company where fewer than half of the employees make less than the average to look like?
- Choose one answer
- (a) Bimodal
- (b) Skewed to the right or positively skewed
- (c) Symmetrical
- (d) Skewed to the left or negatively skewed
Problem 7
Which of the following parameters is most sensitive to outliers?
- Choose one answer.
- (a) Standard deviation
- (b) Interquartile range
- (c) Mode
- (d) Median
Problem 8
Which value given below is the best representative for the following data?
2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 11
- Choose one answer.
- (a) The weighted average of the two modes or (4*5 + 9*5 )/10 = 6.5
- (b) No single number could represent this data set
- (c) The average of the two modes or (4 + 9) / 2 = 6.5
- (d) The mean or (2 + 3 + 4 + ... + 10 + 11)/18 = 5.9
- (e) The median or (6 + 7)/2 = 6.5
Problem 9
Suppose that the distribution of exam scores has mean = 20.5 and standard deviation = 2.5 and median = 15.0. If you double each score, determine the mean, deviation, and median of the transformed distribution.
- Choose one answer.
- (a) mean = 41.0, deviation = 5.0, median = 30.0
- (b) We cannot determine the statistics unless we have the actual data.
- (c) mean = 20.5, deviation = 5.0, median = 15.0
- (d) mean = 20.5, deviation = 2.5, median = 15.0
- (e) mean = 41.0, deviation = 2.5, median = 30.0
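The effect of doubling every score can be explored numerically. The sketch below uses a small hypothetical data set (not from the problem) whose mean is 20.5 and median is 15.0; its spread is not meant to match the standard deviation stated in Problem 9. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical scores, chosen only so that mean = 20.5 and median = 15.0.
scores = [12, 14, 15, 15, 15, 18, 31, 44]
doubled = [2 * s for s in scores]

before = (
    statistics.mean(scores),
    statistics.median(scores),
    statistics.pstdev(scores),   # population standard deviation
)
after = (
    statistics.mean(doubled),
    statistics.median(doubled),
    statistics.pstdev(doubled),
)

for name, b, a in zip(("mean", "median", "sd"), before, after):
    print(f"{name}: {b:.2f} -> {a:.2f}")
```

Each summary statistic of the doubled data is exactly twice the original, which illustrates that the result depends only on the summary values and not on the individual scores.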
Problem 10
A recent housing survey was conducted to determine the price of a typical home in Glendale, CA. Glendale is mostly middle-class, with one very expensive suburb. The mean price of a house was roughly $650,000. Which of the following statements is most likely to be true?
- Choose one answer.
- (a) There are about as many houses in Glendale that cost more than $650,000 as cost less than this amount.
- (b) Most houses in Glendale cost less than $650,000.
- (c) Most houses in Glendale cost more than $650,000.
- (d) We need to know the standard deviation to answer this question.
Problem 11
UCLA biochemistry major Soo Kyung Lee, who plans to enroll in med school after graduation, collected data from the AAMC (Association of American Medical Colleges) website for 121 schools and included these attributes about each institution: name, public or private institution, state, location either East or West of the Mississippi River, cost of health insurance, resident tuition, resident fees, resident total expenses, nonresident tuition, nonresident fees, and nonresident total expenses in 2005. Soo was surprised that UCLA and other UC medical schools charge no tuition for residents. However, UC students pay about $20,000 in fees.
_ | Min | Q1 | Median | Q3 | Max |
Private | $6,550 | $30,729 | $33,850 | $36,685 | $41,360 |
Public | $0 | $10,219 | $16,168 | $18,800 | $27,886 |
On the same scale, use the 5-Number summary to construct two boxplots for the tuition for residents at 73 public and 48 private medical colleges. Use the data and plots to determine which statement about centers is true.
- (a) For private medical schools, the mean tuition of residents is greater than the median tuition for residents.
- (b) With these data, we cannot determine the relationship between mean and median tuition for residents.
- (c) For private medical schools, the mean tuition of residents is equal to the median tuition for residents.
- (d) For private medical schools, the mean tuition of residents is less than the median tuition for residents.
Problem 12
Determine which of the following statements is true about the spread for Medical School resident tuition.
- Choose one answer.
- (a) There is the same variation for resident tuition for residents at private medical schools and for resident tuition at public medical schools since the ranges are almost equal.
- (b) There is more variation for resident tuition for residents at private medical schools than for resident tuition at public medical schools since there are outliers for private schools.
- (c) There is more variation for resident tuition for residents at public medical schools than for resident tuition at private medical schools since the interquartile range is wider for public schools.
- (d) With these data, we cannot determine the variation for tuition for residents at private and public medical colleges.
Problem 13
Use the plots and summary statistics to determine which of the statements about outliers is true.
- Choose one answer.
- (a) There are outliers for both distributions
- (b) UCLA is an outlier since UCLA does not charge any tuition for residents
- (c) There is at least one outlier for the distribution of resident tuition for private medical schools
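The problem does not say which outlier rule to use. Assuming the common boxplot convention (a value is flagged if it lies more than 1.5 × IQR below Q1 or above Q3), the five-number summaries from Problem 11 can be checked with the Python sketch below. The function name `tukey_fences` and the multiplier 1.5 are assumptions, not part of the source.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five-number summaries (min, Q1, median, Q3, max) from the table in Problem 11.
summaries = {
    "Private": (6550, 30729, 33850, 36685, 41360),
    "Public": (0, 10219, 16168, 18800, 27886),
}

flags = {}
for group, (lo, q1, med, q3, hi) in summaries.items():
    low_fence, high_fence = tukey_fences(q1, q3)
    # If the min or max lies beyond a fence, at least one outlier exists.
    flags[group] = lo < low_fence or hi > high_fence
    print(f"{group}: IQR = {q3 - q1}, fences = ({low_fence}, {high_fence}), "
          f"outlier present: {flags[group]}")
```

The same loop prints each interquartile range, which is the comparison Problem 12 asks about. A five-number summary can only show whether at least one outlier exists on each side, not how many there are.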
Problem 14
Suppose that we create a new data set by doubling the highest value in a large data set of positive values. What statement is FALSE about the new data set?
- Choose one answer.
- (a) The mean increases
- (b) The standard deviation increases
- (c) The range increases
- (d) The median and interquartile range both increase
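The scenario in Problem 14 can be tried on a hypothetical data set. The sketch below uses the integers 1 to 101 as a stand-in for "a large data set of positive values" and relies on `statistics.quantiles` (Python 3.8+) with its default method; both the data and the helper name `summary` are assumptions made for illustration.

```python
import statistics


def summary(values):
    """Return a few measures of center and spread for a list of numbers."""
    q1, med, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(values),
        "sd": statistics.pstdev(values),
        "range": max(values) - min(values),
        "median": med,
        "iqr": q3 - q1,
    }


data = list(range(1, 102))              # hypothetical values 1..101
changed = data[:-1] + [2 * data[-1]]    # double only the largest value

before, after = summary(data), summary(changed)
for key in before:
    print(f"{key}: {before[key]:.2f} -> {after[key]:.2f}")
```

In this example the mean, standard deviation, and range all grow, while the median and interquartile range do not move, because they depend only on values near the middle of the ordered data.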
Problem 15
Consider a large data set of positive values and multiply each value by 100. Determine the statement which is true.
- Choose one answer.
- (a) The mean, median, and standard deviation increase
- (b) The mean and median increase but the standard deviation is unchanged.
- (c) The standard deviation increases but the mean and median are unchanged.
- (d) The range and interquartile range are unchanged
- Back to Ebook
- SOCR Home page: http://www.socr.ucla.edu