AP Statistics Curriculum 2007 EDA DataTypes
General Advanced-Placement (AP) Statistics Curriculum - Types of Data
Definitions
- Population: A population is an entire group, collection or space of objects which we want to characterize.
- Sample: A sample is a collection of observations on which we measure one or more characteristics. Frequently, we use (small) samples of (large) populations to characterize the properties and affinities within the space of objects in the population of interest. For example, if we want to characterize the US population, we can take a sample (poll or survey), and the summaries that we obtain from the sample (e.g., mean age, mean income, mean body weight, or the proportion of each race) may be used to study the properties of the population in general.
- Variable: A variable is a characteristic of an observation that can be assigned a number or a category. For instance, the year in college is a variable recorded for each student (the observational unit).
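The relationship between a population, a sample and a sample summary can be sketched in a few lines of Python. This is only an illustration: the population of ages below is simulated, and the names population, sample and the sizes 10,000 and 200 are assumptions made for the example, not real survey data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 people (simulated, not real data).
population = [random.randint(0, 90) for _ in range(10_000)]

# A sample is a (small) collection of observations drawn from the population.
sample = random.sample(population, k=200)

# The variable of interest is age; the mean is one possible summary of it.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population size = {len(population)}, mean age = {population_mean:.1f}")
print(f"sample size     = {len(sample)}, mean age = {sample_mean:.1f}")
```

In practice the population mean is unknown, and the sample mean is used as an estimate of it.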
Types of Variables
Appropriate classification of process and variable types is important because it directly influences our decisions on how to collect, explore, analyze and interpret data and results. For example, we can carry out arithmetic (e.g., compute an average) on quantitative variables, but for qualitative variables we need to analyze frequencies of occurrence instead.
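A minimal sketch of this difference, using made-up observations on five students (the values and the names heights_cm and years are hypothetical): the quantitative variable is summarized with an average, while the qualitative variable is summarized with frequency counts.

```python
from collections import Counter
from statistics import mean

# Hypothetical observations on five students (illustrative values only).
heights_cm = [162.0, 175.5, 168.2, 181.0, 170.3]                 # quantitative
years = ["freshman", "junior", "freshman", "senior", "junior"]   # qualitative

# Arithmetic is meaningful for a quantitative variable.
average_height = mean(heights_cm)

# For a qualitative variable we count how often each category occurs.
year_counts = Counter(years)
year_proportions = {year: n / len(years) for year, n in year_counts.items()}

print(f"average height: {average_height:.1f} cm")
print(f"counts: {dict(year_counts)}")
print(f"proportions: {year_proportions}")
```

Averaging the category labels themselves would have no meaning, which is why the counts and proportions are reported instead.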
There are two types of variables: categorical and quantitative. These types of variables can be split further.
- Categorical: Categorical variables are qualitative measurements of samples or populations that are classified into groups:
- Ordinal categorical variables are qualitative descriptions that have a natural arrangement or order of the measurements -- e.g., rank in college (freshman, sophomore, junior, senior), size of soda (small, medium, large), etc.
- Nominal (non-ordinal) categorical variables have no naturally imposed (or meaningful) order of their values -- e.g., gender, race, political affiliation (Democrat, Republican, Independent, Green Party, other), etc.
- Quantitative: Quantitative variables are measurements that have a meaningful numerical value representation. There are two types of quantitative variables:
- Continuous: Continuous variables are numerical observations that can take any value in some interval, so an interval contains infinitely (uncountably) many possible values -- e.g., weight, height, time, speed, etc.
- Discrete: Discrete variables are also numerical measurements, but any interval contains at most countably many possible values -- e.g., number of students in a school, a quantity restricted to rational values in a given interval [a ; b], age recorded in whole years, etc.
- The interpretation of quantitative variables as discrete or continuous is to some extent subjective. The decision to label a process as discrete or continuous depends on several factors: the physical, biological or psychological laws that govern the observed system; the data measuring apparatus; and prior understanding of the relationship between changes in the variable/process and their practical effects (e.g., fetal/infant and adult ages are measured in weeks and years, respectively, even though time is generally continuous). There is a general duality between the continuous and the discrete world (just as light can be considered either a collection of discrete photons or a continuous wave).
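Two of the distinctions in this list can be sketched in code. The first part shows that ordinal categories support order comparisons. The second shows how a continuous quantity (elapsed time) becomes a discrete variable once it is recorded at a fixed resolution, as with ages in weeks or years. The function names, the 365.25-day year and the sample inputs are assumptions made for illustration.

```python
import math

# Ordinal categorical variable: the categories have a natural order,
# so it is meaningful to ask whether one value ranks above another.
SODA_SIZES = ["small", "medium", "large"]  # listed from lowest to highest


def is_larger(size_a, size_b):
    """Return True if size_a ranks above size_b in the ordinal scale."""
    return SODA_SIZES.index(size_a) > SODA_SIZES.index(size_b)


# Continuous vs. discrete: elapsed time is continuous, but the recorded
# age is discrete because of the measuring convention that is used.
DAYS_PER_WEEK = 7
DAYS_PER_YEAR = 365.25  # assumed average year length


def age_in_weeks(age_days):
    """Age in completed weeks (a typical convention for infants)."""
    return math.floor(age_days / DAYS_PER_WEEK)


def age_in_years(age_days):
    """Age in completed years (a typical convention for adults)."""
    return math.floor(age_days / DAYS_PER_YEAR)


print(is_larger("large", "small"))
print(age_in_weeks(100.5), "weeks")
print(age_in_years(12000.0), "years")
```

A nominal variable such as political affiliation would have no counterpart to is_larger, since its categories cannot be ranked.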
Example
Most breast cancer patients (>80%) are over the age of 50 at diagnosis. A researcher at a particular New York cancer center believes that his patients are even older than the norm, typically older than 65 years at diagnosis. To investigate, he reviews the ages of a random sample of 100 of his female patients diagnosed with breast cancer.
Identify the following:
- Population
- Sample
- Sample size
- Variable of interest
- Quantitative or qualitative?
- Observational unit
- Other variables
Problems
- SOCR Home page: http://www.socr.ucla.edu