SOCR Simulated HELP Data Activity
SOCR Data - SOCR Simulated Health Evaluation and Linkage to Primary (HELP) Care Dataset
Summary
This is a simulated dataset that follows the study design of the Health Evaluation and Linkage to Primary (HELP) Care study. The data are simulated, not real, and are intended for demonstrating multivariate statistical, computational, analytical, and visualization protocols for data interrogation.
Background
The original Health Evaluation and Linkage to Primary (HELP) Care study recruited adult inpatients in a detoxification unit. Researchers randomized patients with no primary care physician to receive evaluations and a brief care intervention, and investigated their findings relative to prior enrollment in primary medical care. The complete original dataset can be found here.
Classroom use of this data set
The simulated HELP data can be used to demonstrate a number of statistical, modeling, inferential, and data-analytic techniques (see the list below). The SOCR HELP Activity demonstrates example uses of these data in R.
- Data I/O, summaries, visualization
- Derived variables and data manipulation
- Sorting and subsetting
- Exploratory data analysis
- Graphing and plotting of data (scatterplot, bubble chart, multiple plots, dotplot, etc.)
- Bivariate relationships
- Contingency tables
- Two-sample tests
- Survival analysis (Kaplan–Meier plot)
- Scatterplot with smooth fit
- Regression with prediction intervals
- Linear regression with interaction
- Regression diagnostics
- Fitting stratified regression models
- Two-way analysis of variance (ANOVA)
- Multiple comparisons
- Contrasts
- Logistic regression
- Poisson regression
- Zero-inflated Poisson regression
- Negative binomial regression
- Lasso model selection
- Quantile regression
- Ordinal logit regression
- Multinomial logit regression
- Generalized additive model
- Data transformations
- General linear model for correlated data
- Random effects model
- Generalized estimating equations (GEE) model
- Generalized linear mixed model
- Proportional hazards regression model
- Bayesian Poisson regression
- Cronbach’s $\alpha$
- Factor analysis
- Recursive partitioning
- Linear discriminant analysis
- Hierarchical clustering
- ROC curve
- Multiple imputation
- Propensity score modeling
Data Description
Variable Definitions
- id random subject identifier (range 1–470)
- a15a number of nights in overnight shelter in past 6 months (range 0–180) see also homeless
- a15b number of nights on the street in past 6 months (range 0–180) see also homeless
- age age at baseline (in years) (range 19–60)
- anysubstatus use of any substance post-detox (0=no, 1=yes) see also daysanysub
- cesd* Center for Epidemiologic Studies Depression scale (range 0–60) see also f1a–f1t
- d1 how many times hospitalized for medical problems (lifetime) (range 0–100)
- daysanysub time (in days) to first use of any substance post-detox (range 0–268) see also anysubstatus
- daysdrink time (in days) to first alcoholic drink post-detox (range 0–270) see also drinkstatus
- dayslink time (in days) to linkage to primary care (range 0–456) see also linkstatus
- drinkstatus use of alcohol post-detox (0=no, 1=yes) see also daysdrink
- drugrisk* Risk-Assessment Battery (RAB) drug risk score (range 0–21) see also sexrisk
- e2b* number of times in past 6 months entered a detox program (range 1–21)
- f1a I was bothered by things that usually don’t bother me (range 0–3 #)
- f1b I did not feel like eating; my appetite was poor (range 0–3 #)
- f1c I felt that I could not shake off the blues even with help from my family or friends (range 0–3 #)
- f1d I felt that I was just as good as other people (range 0–3 #)
- f1e I had trouble keeping my mind on what I was doing (range 0–3 #)
- f1f I felt depressed (range 0–3 #)
- f1g I felt that everything I did was an effort (range 0–3 #)
- f1h I felt hopeful about the future (range 0–3 #)
- f1i I thought my life had been a failure (range 0–3 #)
- f1j I felt fearful (range 0–3 #)
- f1k My sleep was restless (range 0–3 #)
- f1l I was happy (range 0–3 #)
- f1m I talked less than usual (range 0–3 #)
- f1n I felt lonely (range 0–3 #)
- f1o People were unfriendly (range 0–3 #)
- f1p I enjoyed life (range 0–3 #)
- f1q I had crying spells (range 0–3 #)
- f1r I felt sad (range 0–3 #)
- f1s I felt that people dislike me (range 0–3 #)
- f1t I could not get going (range 0–3 #)
- female gender of respondent (0=male, 1=female)
- g1b* experienced serious thoughts of suicide (last 30 days, values 0=no, 1=yes)
- homeless* 1 or more nights on the street or shelter in past 6 months (0=no, 1=yes) see also a15a and a15b
- i1* average number of drinks (standard units) consumed per day (in the past 30 days, range 0–142) see also i2
- i2 maximum number of drinks (standard units) consumed per day (in the past 30 days, range 0–184) see also i1
- indtot* Inventory of Drug Use Consequences (InDUC) total score (range 4–45)
- linkstatus post-detox linkage to primary care (0=no, 1=yes) see also dayslink
- mcs* SF-36 Mental Component Score (range 7–62) see also pcs
- pcrec* number of primary care visits in past 6 months (range 0–2) see also linkstatus, not observed at baseline
- pcs* SF-36 Physical Component Score (range 14–75) see also mcs
- pss_fr perceived social supports (friends, range 0–14) see also dayslink
- satreat any BSAS substance abuse treatment at baseline (0=no, 1=yes)
- sexrisk* Risk-Assessment Battery (RAB) sex risk score (range 0–21) see also drugrisk
- substance primary substance of abuse (alcohol, cocaine or heroin)
- treat randomization group (0=usual care, 1=HELP clinic)
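Several variables in this list are derived from others. For example, homeless is described as one or more nights on the street or in a shelter, which points back to a15a and a15b. The following is a minimal Python sketch of that derivation, not the procedure used to build the dataset. The function name homeless_indicator, the decision to return None when a missing count leaves the answer undetermined, and the sample records (made-up values, not taken from the data) are all assumptions for illustration.

```python
def homeless_indicator(a15a, a15b):
    """Derive the 0/1 homeless flag from nights in a shelter (a15a) and
    nights on the street (a15b) over the past 6 months.

    Returns 1 if either count is at least 1, 0 if both counts are 0,
    and None if a missing count leaves the answer undetermined.
    """
    nights = [n for n in (a15a, a15b) if n is not None]
    if any(n >= 1 for n in nights):
        return 1
    if len(nights) < 2:
        return None  # a missing count could hide a night; leave undetermined
    return 0


# Hypothetical records (made-up values, not taken from the dataset)
records = [
    {"id": 1, "a15a": 0, "a15b": 0},
    {"id": 2, "a15a": 12, "a15b": 0},
    {"id": 3, "a15a": 0, "a15b": 3},
    {"id": 4, "a15a": None, "a15b": 0},
]

for r in records:
    r["homeless"] = homeless_indicator(r["a15a"], r["a15b"])

print([r["homeless"] for r in records])  # [0, 1, 1, None]
```

Comparing a flag derived this way against the homeless column provided in the data is a simple consistency check for a data-manipulation exercise.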
Notes
The ranges listed for continuous variables are the ranges observed at baseline.
- * marked variables are recorded at baseline and at follow-up (e.g., mcs is the baseline measure, mcs1 is the measure at 6 months, and mcs4 is the measure at 24 months);
- # marked items (f1a, …, f1t): for each of the 20 items in HELP section F1, respondents were asked to indicate how often they behaved this way during the past week (0 = rarely or none of the time, less than 1 day; 1 = some or a little of the time, 1 or 2 days; 2 = occasionally or a moderate amount of time, 3 or 4 days; 3 = most or all of the time, 5 to 7 days);
- Reverse coding was used for items f1d, f1h, f1l, and f1p, because these questions are intrinsically positive in nature (e.g., hope), whereas the remaining ones are negative in nature (e.g., blues).
- Some subjects (e.g., case 13) have no recorded value for some variables (e.g., e2b).
- "." (period) denotes a missing or incomplete observation.
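The notes on missing values and reverse coding can be illustrated with a short Python sketch that treats "." as missing and sums the 20 F1 items into a CES-D style total. It rests on assumptions that should be checked against the actual files: that the item columns hold raw 0–3 responses (if the files are already reverse-coded, the 3 - value step should be dropped), that the total is a plain sum of the 20 items (consistent with the 0–60 range listed for cesd), and that a subject with any missing item gets no score. The names parse_value and cesd_score and the three sample rows are hypothetical, made up for illustration.

```python
import csv
import io

ITEMS = ["f1" + c for c in "abcdefghijklmnopqrst"]  # f1a ... f1t
REVERSED = {"f1d", "f1h", "f1l", "f1p"}             # positively worded items


def parse_value(text):
    """Convert a CSV cell to int, mapping '.' (or an empty cell) to None."""
    text = text.strip()
    if text in ("", "."):
        return None
    return int(text)


def cesd_score(row):
    """Sum the 20 items, reversing the positively worded ones (3 - value).

    Returns None if any item is missing (a simplifying assumption).
    """
    total = 0
    for item in ITEMS:
        value = parse_value(row[item])
        if value is None:
            return None
        total += (3 - value) if item in REVERSED else value
    return total


# Made-up rows for illustration only; real data would come from help.csv
header = ["id"] + ITEMS
sample_rows = [
    ["1"] + ["0"] * 20,          # every item answered 0
    ["2"] + ["3"] * 20,          # every item answered 3
    ["3"] + ["1"] * 19 + ["."],  # f1t missing
]
sample_csv = "\n".join(",".join(r) for r in [header] + sample_rows)

for row in csv.DictReader(io.StringIO(sample_csv)):
    print(row["id"], cesd_score(row))
# 1 12
# 2 48
# 3 None
```

In the first sample row only the four reversed items contribute (4 × 3 = 12). In the second row only the 16 non-reversed items contribute (16 × 3 = 48).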
References
- Health Evaluation and Linkage to Primary (HELP) Care study
- Data formats: help.csv and help.Rdata
- Study and Data specifications
- SAS and R: Data Management, Statistical Analysis, and Graphics, Kleinman and Horton, 2009
- SOCR Home page: http://www.SOCR.ucla.edu