# EBook Problems EDA IntroVar

## EBook Problems Set - The Nature of Data and Variation Problems

### Problem 1

Researchers do a study on the number of cars that a person owns. They think that the distribution of their data might be normal, even though the median is much smaller than the mean. They make a normal probability plot (p-plot). What does it look like?

(a) It's not a straight line.
(b) It's a bell curve.
(c) It's a group of points clustered around the middle of the plot.
(d) It's a straight line.
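
To experiment with this kind of plot, the sketch below pairs each sorted observation with the standard normal quantile at the same rank. The `cars_owned` values are hypothetical, and the plotting positions `(i - 0.5) / n` are only one of several common conventions, so treat this as an illustration rather than the method the study used.

```python
from statistics import NormalDist


def normal_plot_points(data):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are an assumed convention.
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical car counts: many small values and a few large ones.
cars_owned = [0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 6, 9]

for quantile, observed in normal_plot_points(cars_owned):
    print(f"{quantile:6.2f}  {observed}")
```

Plotting the second column against the first shows how closely the points follow a straight line.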

### Problem 2

Bicycles arrive at a bike shop in boxes. Before they can be sold, they must be unpacked, assembled, and tuned (lubricated, adjusted, etc.). Based on past experience, the shop manager makes the following assumptions about how long this may take:

- The times for each setup phase are independent.
- The times for each phase follow a Normal curve.
- The means and standard deviations of the times (in minutes) are as shown.

| Phase | Mean | SD |
|---|---|---|
| Unpacking | 3.5 | 0.7 |
| Assembly | 21.8 | 2.4 |
| Tuning | 21.8 | 2.7 |

What are the mean and standard deviation for the total bicycle setup time?

(a) Mean = 100 min, standard deviation = 12 min
(b) Can't be determined with the information given
(c) Mean = 47.1 min, standard deviation = 3.7 min
(d) Mean = 20 min, standard deviation = 13.69 min
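
A candidate answer can be checked numerically. The sketch below relies on the stated independence assumption: means of a sum always add, and for independent variables the variances (not the standard deviations) add. The variable names are illustrative.

```python
import math

# Phase means and standard deviations in minutes, from the problem statement.
phases = {
    "Unpacking": (3.5, 0.7),
    "Assembly": (21.8, 2.4),
    "Tuning": (21.8, 2.7),
}

total_mean = sum(mean for mean, _ in phases.values())
# Variances add for independent variables; standard deviations do not.
total_variance = sum(sd ** 2 for _, sd in phases.values())
total_sd = math.sqrt(total_variance)

print(f"mean = {total_mean:.1f} min, SD = {total_sd:.1f} min")
```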

### Problem 3

Let X be a random variable with mean 80 and standard deviation 12. Find the mean and the variance of the following variable: 2X - 100

(a) Mean = 100, variance = 288
(b) Mean = 60, variance = 12
(c) Mean = 160, variance = 144
(d) Mean = 60, variance = 576
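
For a linear transformation aX + b, the mean becomes a·mean + b, the variance is multiplied by a², and the shift b has no effect on spread. The helper below is a minimal sketch of those rules; the function name `linear_transform_stats` is hypothetical.

```python
def linear_transform_stats(mean, sd, a, b):
    """Mean, variance, and SD of a*X + b, given the mean and SD of X."""
    new_mean = a * mean + b
    new_variance = (a ** 2) * (sd ** 2)  # the shift b drops out
    new_sd = abs(a) * sd
    return new_mean, new_variance, new_sd


print(linear_transform_stats(80, 12, a=2, b=-100))
```

The same helper works for any other choice of `a` and `b`, including a pure shift with `a=1`.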

### Problem 4

Let X be a random variable with mean 80 and standard deviation 12. Find the mean and the standard deviation of the following variable: X - 20

(a) Mean = 60, standard deviation = 144
(b) Mean = 60, standard deviation = 12
(c) Mean = 80, standard deviation = 12
(d) Mean = 60, standard deviation = -8

### Problem 5

A physician collected data on 1000 patients to examine their heights. A statistician hired to look at the files noticed the typical height was about 60 inches, but found that one height was 720 inches. This is clearly an outlier. The physician is out of town and can't be contacted, but the statistician would like to have some preliminary descriptions of the data to present when the doctor returns. Which of the following best describes how the statistician should handle this outlier?

(a) The statistician should publish a paper on the emergence of a new race of giants.
(b) The statistician should keep the data point in; each point is too valuable to drop one.
(c) The statistician should drop the observation from the analysis because this is clearly a mistake; the person would be 60 feet tall.
(d) The statistician should analyze the data twice, once with and once without this data point, and then compare how the point affects conclusions.
(e) The statistician should drop the observation from the dataset because we can't analyze the data with it.

### Problem 6

What do you expect the distribution of income in a company where fewer than half of the employees make less than the average to look like?

(a) Bimodal
(b) Skewed to the right or positively skewed
(c) Symmetrical
(d) Skewed to the left or negatively skewed

### Problem 7

Which of the following parameters is most sensitive to outliers?

(a) Standard deviation
(b) Interquartile range
(c) Mode
(d) Median

### Problem 8

Which value given below is the best representative for the following data?

2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 11

(a) The weighted average of the two modes or (4*5 + 9*5)/10 = 6.5
(b) No single number could represent this data set
(c) The average of the two modes or (4 + 9) / 2 = 6.5
(d) The mean or (2 + 3 + 4 + ... + 10 + 11)/18 = 5.9
(e) The median or (6 + 7)/2 = 6.5
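
Before choosing, it can help to compute the usual measures of center directly from the data. This short sketch uses only Python's standard `statistics` module (`multimode` requires Python 3.8 or later).

```python
from statistics import mean, median, multimode

data = [2, 3, 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 9, 9, 9, 9, 10, 11]

print("mean:", mean(data))
print("median:", median(data))
print("modes:", multimode(data))  # every value tied for most frequent
```

Comparing these numbers with the shape of the data is a useful check on whether any single value summarizes it well.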

### Problem 9

Suppose that the distribution of exam scores has mean = 20.5, standard deviation = 2.5, and median = 15.0. If you double each score, determine the mean, standard deviation, and median of the transformed distribution.

(a) mean = 41.0, deviation = 5.0, median = 30.0
(b) We cannot determine the statistics unless we have the actual data.
(c) mean = 20.5, deviation = 5.0, median = 15.0
(d) mean = 20.5, deviation = 2.5, median = 15.0
(e) mean = 41.0, deviation = 2.5, median = 30.0
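
The effect of rescaling can be explored on any data set. The scores below are hypothetical and are not the data behind the problem; they only illustrate how each summary statistic responds when every value is doubled.

```python
from statistics import mean, median, pstdev

# Hypothetical scores for illustration only.
scores = [12, 14, 15, 15, 16, 22, 30, 40]
doubled = [2 * s for s in scores]

for name, stat in [("mean", mean), ("SD", pstdev), ("median", median)]:
    print(f"{name}: {stat(scores):.2f} -> {stat(doubled):.2f}")
```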

### Problem 10

A recent housing survey was conducted to determine the price of a typical home in Glendale, CA. Glendale is mostly middle-class, with one very expensive suburb. The mean price of a house was roughly \$650,000. Which of the following statements is most likely to be true?

(a) There are about as many houses in Glendale that cost more than \$650,000 as cost less than this amount.
(b) Most houses in Glendale cost less than \$650,000.
(c) Most houses in Glendale cost more than \$650,000.
(d) We need to know the standard deviation to answer this question.

### Problem 11

UCLA biochemistry major Soo Kyung Lee, who plans to enroll in med school after graduation, collected data from the AAMC (Association of American Medical Colleges) website for 121 schools and included these attributes about each institution: name, public or private institution, state, location either East or West of the Mississippi River, cost of health insurance, resident tuition, resident fees, resident total expenses, nonresident tuition, nonresident fees, and nonresident total expenses in 2005. Soo was surprised that UCLA and other UC medical schools charge no tuition for residents. However, UC students pay about \$20,000 in fees.

| | Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|---|
| Private | \$6,550 | \$30,729 | \$33,850 | \$36,685 | \$41,360 |
| Public | \$0 | \$10,219 | \$16,168 | \$18,800 | \$27,886 |

On the same scale, use the 5-Number summary to construct two boxplots for the tuition for residents at 73 public and 48 private medical colleges. Use the data and plots to determine which statement about centers is true.

(a) For private medical schools, the mean tuition of residents is greater than the median tuition for residents.
(b) With these data, we cannot determine the relationship between mean and median tuition for residents.
(c) For private medical schools, the mean tuition of residents is equal to the median tuition for residents.
(d) For private medical schools, the mean tuition of residents is less than the median tuition for residents.

### Problem 12

Determine which of the following statements is true about the spread for Medical School resident tuition.

(a) There is the same variation in resident tuition at private medical schools and at public medical schools since the ranges are almost equal.
(b) There is more variation in resident tuition at private medical schools than at public medical schools since there are outliers for private schools.
(c) There is more variation in resident tuition at public medical schools than at private medical schools since the interquartile range is wider for public schools.
(d) With these data, we cannot determine the variation for tuition for residents at private and public medical colleges.

### Problem 13

Use the plots and summary statistics to determine which of the statements about outliers is true.

(a) There are outliers for both distributions
(b) UCLA is an outlier since UCLA does not charge any tuition for residents
(c) There is at least one outlier for the distribution of resident tuition for private medical schools
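
Boxplots usually flag outliers with the 1.5 × IQR rule: any value below Q1 − 1.5·IQR or above Q3 + 1.5·IQR is drawn as an outlier. The sketch below applies that rule to the five-number summaries from Problem 11. The rule is an assumed convention here, since the problem does not name one, and `outlier_fences` is a hypothetical helper.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences under the common k * IQR rule (k = 1.5)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Five-number summaries (min, Q1, median, Q3, max) in dollars, from Problem 11.
summaries = {
    "Private": (6550, 30729, 33850, 36685, 41360),
    "Public": (0, 10219, 16168, 18800, 27886),
}

for group, (lo, q1, med, q3, hi) in summaries.items():
    lower, upper = outlier_fences(q1, q3)
    print(
        f"{group}: IQR = {q3 - q1}, fences = ({lower}, {upper}), "
        f"min below fence: {lo < lower}, max above fence: {hi > upper}"
    )
```

The printed IQR values also give a quick comparison of spread between the two groups.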

### Problem 14

Suppose that we create a new data set by doubling the highest value in a large data set of positive values. Which statement about the new data set is FALSE?

(a) The mean increases
(b) The standard deviation increases
(c) The range increases
(d) The median and interquartile range both increase
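
Each statement can be tested on a small stand-in data set. The values below are hypothetical, and the quartiles come from `statistics.quantiles` with its default method, which is one of several quartile conventions.

```python
from statistics import mean, median, pstdev, quantiles

# Hypothetical positive data set for illustration.
original = [3, 5, 6, 7, 8, 9, 10, 12, 14, 15, 18, 40]
modified = sorted(original)
modified[-1] *= 2  # double only the largest value


def summarize(values):
    """Collect the summary statistics named in the answer choices."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "mean": mean(values),
        "sd": pstdev(values),
        "range": max(values) - min(values),
        "median": median(values),
        "iqr": q3 - q1,
    }


before, after = summarize(original), summarize(modified)
for key in before:
    print(f"{key}: {before[key]:.2f} -> {after[key]:.2f}")
```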

### Problem 15

Consider a large data set of positive values and multiply each value by 100. Determine the statement which is true.