SMHS Linear Modeling - Machine Learning Algorithms
Scientific inference based on fixed and random effect models, assumptions, and mixed effects logistic regression.
Questions:
- How can we combine human intuition with computer-generated results to obtain a reliable, effective, and efficient decision-support system that facilitates forecasting?
- Niels Bohr – "It is difficult to make predictions, especially about the future" …
- Can we classify the data in an unsupervised manner?
Prediction
For most of the machine learning algorithms (including first-order linear regression), we:
- first generate the model using training data, and then
- predict values for test/new data.
Predictions are made with the R predict function (type ?predict.name, where name is the function name corresponding to the fitting algorithm). The first argument of predict is usually the object storing the fitted model, and the second argument is a matrix or data frame of test data to which the model should be applied. predict can be called in two ways: generically, as predict(model, ...), or explicitly, as predict.name(model, ...).
Example:
# mydata <- read.table('https://umich.instructure.com/files/330381/download?download_frd=1&verifier=HpfmjfMFaMsk7rIpfPx0tmz960oTW7JA8ZonGvVC', as.is=T, header=T)   # 01a_data.txt
# mydata <- read.table('data.txt', as.is=T, header=T)

# (1) First, there are different approaches to split (partition) the data into training and testing sets.
## TRAINING: 75% of the sample size
sample_size <- floor(0.75 * nrow(mydata))
## set the seed to make the partition reproducible
set.seed(1234)
train_ind <- sample(seq_len(nrow(mydata)), size = sample_size)
train <- mydata[train_ind, ]

## TESTING DATA
test <- mydata[-train_ind, ]

lin.mod <- lm(Weight ~ Height*Team, data=train)
predicted.values <- predict(lin.mod, newdata=test)
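As a quick check of the fitted linear model, the predicted values can be compared against the observed test-set weights. The error measure (root mean squared error) and the calibration plot below are illustrative additions, not part of the original protocol:

# compare predicted and observed Weight on the test set
rmse <- sqrt(mean((test$Weight - predicted.values)^2, na.rm=TRUE))
rmse
plot(test$Weight, predicted.values, xlab="Observed Weight", ylab="Predicted Weight")
abline(0, 1)   # reference line for perfect prediction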
Data Modeling/Training
Logistic Regression:
glm_model <- glm(ifelse(Weight > 200,1,0) ~ Height*Team, family=binomial(link="logit"), data=train)
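Predicted probabilities on the test set can be obtained from this logistic model with predict(..., type="response"); the 0.5 classification cutoff below is an illustrative assumption:

glm_probs <- predict(glm_model, newdata=test, type="response")        # predicted P(Weight > 200)
glm_pred <- ifelse(glm_probs > 0.5, 1, 0)                             # classify with a 0.5 cutoff
table(Predicted=glm_pred, Observed=ifelse(test$Weight > 200, 1, 0))   # confusion table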
K-Means Clustering
train.1 <- cbind(train$Height, train$Weight, train$Age)
colnames(train.1) <- c("Height", "Weight", "Age")    # name the columns so formula interfaces work
test.1 <- cbind(test$Height, test$Weight, test$Age)
colnames(test.1) <- c("Height", "Weight", "Age")
Weight.1 <- ifelse(train$Weight > 200, 1, 0)         # binary (0/1) class labels
head(train.1)
kmeans_model <- kmeans(train.1, 3)
plot(train.1, col = kmeans_model$cluster)
points(kmeans_model$centers, col = 1:3, pch = 8, cex = 2)

K-Nearest Neighbor Classification
# install.packages("class")
library("class")
knn_model <- knn(train=train.1, test=test.1, cl=as.factor(Weight.1), k=5)
plot(knn_model)
summary(knn_model)

Naïve Bayes Classifier
# install.packages("e1071")
library("e1071")
# naiveBayes requires a categorical response, so the binary weight indicator is modeled
nbc_model <- naiveBayes(as.factor(Weight.1) ~ Height + Age, data=as.data.frame(train.1))

Decision Trees (CART)
# install.packages("rpart")
library("rpart")
# method="class" requires a categorical response, so the binary weight indicator is used
cart_model <- rpart(as.factor(Weight.1) ~ Height + Age, data=as.data.frame(train.1), method="class")
plot(cart_model)
text(cart_model)

AdaBoost
# install.packages("ada")
library("ada")
# x is the matrix of features; y is the vector of 0/1 class labels
boost_model <- ada(x=cbind(train$Height, train$Weight, train$Age), y=Weight.1)
plot(boost_model)
boost_model
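The classifiers above can also be compared on the held-out test set. The following is a hedged sketch; the test labels (Weight.1.test) and the accuracy computations are illustrative additions, not part of the original text:

# 0/1 labels for the test set, defined analogously to Weight.1
Weight.1.test <- ifelse(test$Weight > 200, 1, 0)
# k-nearest-neighbor predictions were produced directly by knn() above
table(Predicted=knn_model, Observed=Weight.1.test)
# Naive Bayes and CART predictions on the test features
nbc_pred <- predict(nbc_model, newdata=as.data.frame(test.1))
cart_pred <- predict(cart_model, newdata=as.data.frame(test.1), type="class")
mean(nbc_pred == Weight.1.test)     # Naive Bayes test accuracy
mean(cart_pred == Weight.1.test)    # CART test accuracy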
Support Vector Machines (SVM)
#install.packages("e1071") library("rpart") svm_model <- svm(x= cbind(train$\$$Height, train$\$$Weight, train$\$$Age), y=as.factor(Weight.1), kernel ="radial") summary(svm_model)
Appendix
Example 1: Simulation (subject, day, treatment, observation)
Obs ~ Treatment + Day + Treatment*Day + Subject(Treatment) + Day*Subject(Treatment) + ε.
This model accounts for the following (an lme4 sketch is given after the list):
Response = Obs
Fixed effects:
Treatment (fixed)
Day (fixed)
Treatment*Day interaction
Random Effects:
Subject nested within Treatment (random)
Day crossed with "Subject within Treatment" (random)
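In lme4 notation, this design could in principle be written with explicit nesting and crossing terms. The sketch below is an assumption about how the conceptual model maps onto lmer syntax and is left commented out, because with a single observation per Subject per Day the Day-by-Subject(Treatment) term is confounded with the residual; the model actually fitted below therefore uses only a random intercept for Subject.

# Hedged sketch (not fitted): Subject nested within Treatment, and Day crossed
# with Subject-within-Treatment, both as random intercepts
# m_full <- lmer(Obs ~ Treatment * Day + (1 | Treatment:Subject) + (1 | Treatment:Subject:Day), data = mydata)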
mydata <- data.frame(
  Subject = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32,
              33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65,
              13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32,
              33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65,
              13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32,
              33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
  Day = rep(c("Day1", "Day3", "Day6"), each = 28),
  Treatment = rep(c("B", "A", "C", "B", "C", "A", "A",
                    "B", "A", "C", "B", "C", "A", "A",
                    "B", "A", "C", "B", "C", "A", "A"), each = 4),
  Obs = c(6.472687, 7.017110, 6.200715, 6.613928, 6.829968, 7.387583, 7.367293,
          8.018853, 7.527408, 6.746739, 7.296910, 6.983360, 6.816621, 6.571689,
          5.911261, 6.954988, 7.624122, 7.669865, 7.676225, 7.263593, 7.704737,
          7.328716, 7.295610, 5.964180, 6.880814, 6.926342, 6.926342, 7.562293,
          6.677607, 7.023526, 6.441864, 7.020875, 7.478931, 7.495336, 7.427709,
          7.633020, 7.382091, 7.359731, 7.285889, 7.496863, 6.632403, 6.171196,
          6.306012, 7.253833, 7.594852, 6.915225, 7.220147, 7.298227, 7.573612,
          7.366550, 7.560513, 7.289078, 7.287802, 7.155336, 7.394452, 7.465383,
          6.976048, 7.222966, 6.584153, 7.013223, 7.569905, 7.459185, 7.504068,
          7.801867, 7.598728, 7.475841, 7.511873, 7.518384, 6.618589, 5.854754,
          6.125749, 6.962720, 7.540600, 7.379861, 7.344189, 7.362815, 7.805802,
          7.764172, 7.789844, 7.616437, NA, NA, NA, NA))
install.packages("lme4") library("lme4", lib.loc="~/R/win-library/3.1") m1 <- lmer(Obs ~ Treatment * Day + (1 | Subject), mydata) m1
Linear mixed model fit by REML ['lmerMod']
Formula: Obs ~ Treatment * Day + (1 | Subject)
Data: mydata
REML criterion at convergence: 56.8669
Random effects:
Groups | Name | Std.Dev. |
Subject | (Intercept) | 0.2163 |
Residual | | 0.2602 |
Fixed effects:
Term | Estimate |
(Intercept) | 7.1827 |
TreatmentB | -0.6129 |
TreatmentC | 0.1658 |
DayDay3 | 0.2446 |
DayDay6 | 0.4507 |
TreatmentB:DayDay3 | -0.1235 |
TreatmentC:DayDay3 | … |
TreatmentB:DayDay6 | … |
TreatmentC:DayDay6 | … |
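The variance components and fixed-effect estimates shown above can also be extracted programmatically; the calls below are standard lme4 accessors and are illustrative additions:

VarCorr(m1)     # random-effect standard deviations / variances
fixef(m1)       # fixed-effect estimates
confint(m1)     # profile confidence intervals for all parameters

The simulated data frame mydata used in this example is listed below for reference.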
Index | Subject | Day | Treatment | Obs |
1 | 13 | Day1 | B | 6.472687 |
2 | 14 | Day1 | B | 7.01711 |
3 | 15 | Day1 | B | 6.200715 |
4 | 16 | Day1 | B | 6.613928 |
5 | 17 | Day1 | A | 6.829968 |
6 | 18 | Day1 | A | 7.387583 |
7 | 19 | Day1 | A | 7.367293 |
8 | 20 | Day1 | A | 8.018853 |
9 | 21 | Day1 | C | 7.527408 |
10 | 22 | Day1 | C | 6.746739 |
11 | 23 | Day1 | C | 7.29691 |
12 | 24 | Day1 | C | 6.98336 |
13 | 29 | Day1 | B | 6.816621 |
14 | 30 | Day1 | B | 6.571689 |
15 | 31 | Day1 | B | 5.911261 |
16 | 32 | Day1 | B | 6.954988 |
17 | 33 | Day1 | C | 7.624122 |
18 | 34 | Day1 | C | 7.669865 |
19 | 35 | Day1 | C | 7.676225 |
20 | 36 | Day1 | C | 7.263593 |
21 | 37 | Day1 | A | 7.704737 |
22 | 38 | Day1 | A | 7.328716 |
23 | 39 | Day1 | A | 7.29561 |
24 | 40 | Day1 | A | 5.96418 |
25 | 62 | Day1 | A | 6.880814 |
26 | 63 | Day1 | A | 6.926342 |
27 | 64 | Day1 | A | 6.926342 |
28 | 65 | Day1 | A | 7.562293 |
29 | 13 | Day3 | B | 6.677607 |
30 | 14 | Day3 | B | 7.023526 |
31 | 15 | Day3 | B | 6.441864 |
32 | 16 | Day3 | B | 7.020875 |
33 | 17 | Day3 | A | 7.478931 |
34 | 18 | Day3 | A | 7.495336 |
35 | 19 | Day3 | A | 7.427709 |
36 | 20 | Day3 | A | 7.63302 |
37 | 21 | Day3 | C | 7.382091 |
38 | 22 | Day3 | C | 7.359731 |
39 | 23 | Day3 | C | 7.285889 |
40 | 24 | Day3 | C | 7.496863 |
41 | 29 | Day3 | B | 6.632403 |
42 | 30 | Day3 | B | 6.171196 |
43 | 31 | Day3 | B | 6.306012 |
44 | 32 | Day3 | B | 7.253833 |
45 | 33 | Day3 | C | 7.594852 |
46 | 34 | Day3 | C | 6.915225 |
47 | 35 | Day3 | C | 7.220147 |
48 | 36 | Day3 | C | 7.298227 |
49 | 37 | Day3 | A | 7.573612 |
50 | 38 | Day3 | A | 7.36655 |
51 | 39 | Day3 | A | 7.560513 |
52 | 40 | Day3 | A | 7.289078 |
53 | 62 | Day3 | A | 7.287802 |
54 | 63 | Day3 | A | 7.155336 |
55 | 64 | Day3 | A | 7.394452 |
56 | 65 | Day3 | A | 7.465383 |
57 | 13 | Day6 | B | 6.976048 |
58 | 14 | Day6 | B | 7.222966 |
59 | 15 | Day6 | B | 6.584153 |
60 | 16 | Day6 | B | 7.013223 |
61 | 17 | Day6 | A | 7.569905 |
62 | 18 | Day6 | A | 7.459185 |
63 | 19 | Day6 | A | 7.504068 |
64 | 20 | Day6 | A | 7.801867 |
65 | 21 | Day6 | C | 7.598728 |
66 | 22 | Day6 | C | 7.475841 |
67 | 23 | Day6 | C | 7.511873 |
68 | 24 | Day6 | C | 7.518384 |
69 | 29 | Day6 | B | 6.618589 |
70 | 30 | Day6 | B | 5.854754 |
71 | 31 | Day6 | B | 6.125749 |
72 | 32 | Day6 | B | 6.96272 |
73 | 33 | Day6 | C | 7.5406 |
74 | 34 | Day6 | C | 7.379861 |
75 | 35 | Day6 | C | 7.344189 |
76 | 36 | Day6 | C | 7.362815 |
77 | 37 | Day6 | A | 7.805802 |
78 | 38 | Day6 | A | 7.764172 |
79 | 39 | Day6 | A | 7.789844 |
80 | 40 | Day6 | A | 7.616437 |
81 | 62 | Day6 | A | NA |
82 | 63 | Day6 | A | NA |
83 | 64 | Day6 | A | NA |
84 | 65 | Day6 | A | NA |
....
- SOCR Home page: http://www.socr.umich.edu