SMHS LinearModeling MachineLearning

Revision as of 10:36, 4 March 2016 by Imoubara

SMHS Linear Modeling - Machine Learning Algorithms

Scientific inference based on fixed and random effect models, assumptions, and mixed effects logistic regression.


  • How can we combine human intuition and computer-generated results to obtain a reliable, effective, and efficient decision-support system (one that facilitates forecasting)?
  • Niels Bohr: “It is difficult to make predictions, especially about the future.” …
  • Can we classify the data without supervision (i.e., without known class labels)?


For most machine learning algorithms (including first-order linear regression), we:

  • first generate the model using training data, and then
  • predict values for test/new data.

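Predictions are made using the R predict function. For help, type ?predict.name, where name is the function name corresponding to the algorithm (for example, ?predict.lm or ?predict.glm). The first argument of predict is usually the variable storing the fitted model, and the second (newdata) is a matrix or data frame of test data to which the model is applied. predict can be called in two ways: call the generic predict(), which dispatches to the method matching the class of the model, or call that method directly as predict.name() (when the package exports it).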
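Here is a minimal sketch of the two calling styles. It uses a small simulated data set; `toy`, `fit`, and `new_points` are hypothetical names and are not part of the SOCR data. For an `lm` fit, `predict()` dispatches to `predict.lm()`, so both calls should return the same values.

```r
# Hypothetical toy data (simulated, not from the SOCR data): y is linear in x plus noise.
set.seed(1)
toy <- data.frame(x = 1:20)
toy$y <- 2 + 3 * toy$x + rnorm(20, sd = 0.5)

fit <- lm(y ~ x, data = toy)               # fitted model of class "lm"
new_points <- data.frame(x = c(21, 22))    # "test" data: must contain the predictor columns

# 1) Generic call: R looks at class(fit) and dispatches to predict.lm().
p_generic <- predict(fit, newdata = new_points)

# 2) Direct call of the method (stats exports predict.lm, so this works).
p_method <- stats::predict.lm(fit, newdata = new_points)

print(class(fit))
print(all.equal(p_generic, p_method))
```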


#mydata <- read.table('',, header=T)  # 01a_data.txt
# mydata <- read.table('data.txt', header=TRUE)
# (1) First, split (partition) the data into training and testing sets.
# There are several ways to do this; here we use a simple random split.
## TRAINING: 75% of the sample size
sample_size <- floor(0.75 * nrow(mydata))
## set the seed to make your partition reproducible (any fixed integer works)
set.seed(123)
train_ind <- sample(seq_len(nrow(mydata)), size = sample_size)
train <- mydata[train_ind, ]
test <- mydata[-train_ind, ]
## (2) fit the model on the training set, then predict the held-out test set
lin.mod <- lm(Weight ~ Height*Team, data=train)
predicted.values <- predict(lin.mod, newdata=test)
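After `predicted.values` has been computed, a common next step is to measure how far the predictions fall from the held-out responses. The sketch below uses a simulated stand-in for `mydata`. Its column names match the code above, but the numbers, team labels, and coefficients are invented for illustration. It reports the root-mean-square error (RMSE) on the test set.

```r
set.seed(42)                                  # fixes both the simulated data and the split
n <- 200
# Hypothetical stand-in for mydata with the same column names (values are invented).
mydata <- data.frame(
  Height = round(rnorm(n, mean = 73, sd = 2)),
  Team   = factor(sample(c("BAL", "BOS", "NYY"), n, replace = TRUE))
)
mydata$Weight <- -180 + 5 * mydata$Height + rnorm(n, sd = 15)

sample_size <- floor(0.75 * nrow(mydata))
train_ind <- sample(seq_len(nrow(mydata)), size = sample_size)
train <- mydata[train_ind, ]
test  <- mydata[-train_ind, ]

lin.mod <- lm(Weight ~ Height * Team, data = train)
predicted.values <- predict(lin.mod, newdata = test)

# Root-mean-square error of the held-out predictions (same units as Weight).
rmse <- sqrt(mean((test$Weight - predicted.values)^2))
cat("test RMSE:", round(rmse, 2), "\n")
```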

Data Modeling/Training

Logistic Regression:

glm_model <-glm(ifelse(Weight > 200,1,0) ~ Height*Team, family=binomial(link="logit"), data=train)
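By default, `predict()` on a `glm` fit returns log-odds (`type = "link"`). Setting `type = "response"` gives probabilities, which can then be turned into 0/1 labels with a cut-off. The sketch below runs on simulated data. The data, the 0.5 cut-off, and the simplified formula without `Team` are assumptions made for illustration.

```r
set.seed(7)
n <- 300
# Hypothetical simulated data standing in for train/test (no Team column here).
dat <- data.frame(Height = rnorm(n, 73, 2))
dat$Weight <- -180 + 5 * dat$Height + rnorm(n, sd = 15)
train <- dat[1:225, ]
test  <- dat[226:300, ]

glm_model <- glm(ifelse(Weight > 200, 1, 0) ~ Height,
                 family = binomial(link = "logit"), data = train)

# type = "response" gives probabilities on the 0-1 scale (default "link" gives log-odds).
prob  <- predict(glm_model, newdata = test, type = "response")
label <- ifelse(prob > 0.5, 1, 0)              # the 0.5 cut-off is an arbitrary choice
truth <- ifelse(test$Weight > 200, 1, 0)
print(table(predicted = label, observed = truth))
accuracy <- mean(label == truth)
```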

K-Means Clustering

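train.1 <- cbind(train$Height, train$Weight, train$Age)
test.1 <- cbind(test$Height, test$Weight, test$Age)
# binary label: 1 if Weight > 200, else 0 (note: Weight is also a column of train.1/test.1,
# so classifiers trained on these features can read the label directly; drop it for a fair test)
Weight.1 <- ifelse(train$Weight > 200,1,0)

kmeans_model <- kmeans(train.1, 3)
plot(train.1, col = kmeans_model$cluster)   # plots the first two columns: Height vs Weight
points(kmeans_model$centers, col = 1:3, pch = 8, cex = 2)   # one colour per cluster centre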
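The `kmeans()` call above works in raw units and uses a single random start. The sketch below shows two common refinements on simulated data, where the three group centres are invented. First, it standardises the columns with `scale()` so that Weight, which is measured in larger numbers, does not dominate the Euclidean distance. Second, it sets `nstart = 20` so that `kmeans()` keeps the best of 20 random initialisations.

```r
set.seed(3)
# Simulated stand-in for train.1: three groups around invented (Height, Weight, Age) centres.
centers_true <- rbind(c(70, 180, 25), c(74, 210, 30), c(78, 240, 35))
X <- do.call(rbind, lapply(1:3, function(g)
  sweep(matrix(rnorm(150, sd = 0.5), ncol = 3), 2, centers_true[g, ], "+")))
colnames(X) <- c("Height", "Weight", "Age")

Xs <- scale(X)                                   # zero mean, unit variance per column
km <- kmeans(Xs, centers = 3, nstart = 20)       # best of 20 random starts

# Back-transform the centres to original units before plotting them.
ctr <- sweep(sweep(km$centers, 2, attr(Xs, "scaled:scale"), "*"),
             2, attr(Xs, "scaled:center"), "+")
plot(X[, "Height"], X[, "Weight"], col = km$cluster, xlab = "Height", ylab = "Weight")
points(ctr[, "Height"], ctr[, "Weight"], col = 1:3, pch = 8, cex = 2)
```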

K-Nearest Neighbor Classification
 # install.packages("class")
 library(class)
 knn_model <- knn(train=train.1, test=test.1, cl=as.factor(Weight.1), k=5)
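`knn()` from the class package classifies each test row by a majority vote among its k nearest training rows, using Euclidean distance. The sketch below implements that logic from scratch in base R and applies it to simulated two-group data. `knn_sketch()` is a hypothetical helper and is not part of any package. Unlike `class::knn()`, which breaks voting ties at random, it resolves ties by taking the first factor level.

```r
knn_sketch <- function(train, test, cl, k = 5) {
  train <- as.matrix(train)
  test  <- as.matrix(test)
  cl    <- as.factor(cl)
  pred <- apply(test, 1, function(x) {
    d  <- sqrt(colSums((t(train) - x)^2))   # Euclidean distance from x to every training row
    nn <- order(d)[seq_len(k)]              # indices of the k nearest training rows
    votes <- table(cl[nn])                  # class counts among those neighbours
    names(votes)[which.max(votes)]          # majority class (first level wins a tie)
  })
  factor(pred, levels = levels(cl))
}

# Simulated example: two groups centred at (0, 0) and (4, 4).
set.seed(5)
train_x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                 matrix(rnorm(40, mean = 4), ncol = 2))
train_y <- rep(c(0, 1), each = 20)
test_x  <- rbind(c(0, 0), c(4, 4))
pred <- knn_sketch(train_x, test_x, train_y, k = 5)
print(pred)
```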

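Naïve Bayes Classifier
 library(e1071)  # naiveBayes() needs a data frame, a factor response, and no interaction terms
 nb_train <- data.frame(Height=train$Height, Age=train$Age, Heavy=as.factor(Weight.1))  # "Heavy" is an illustrative column name
 nbc_model <- naiveBayes(Heavy ~ Height + Age, data=nb_train)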
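For numeric predictors such as Height and Age, `e1071::naiveBayes()` models each predictor with a normal distribution within each class and treats the predictors as independent given the class. The base-R sketch below implements that idea from scratch. `nb_fit()` and `nb_predict()` are hypothetical helpers, the data are simulated, and the sketch does not include Laplace smoothing or handle categorical predictors.

```r
nb_fit <- function(X, y) {
  X <- as.matrix(X)
  y <- as.factor(y)
  lv <- levels(y)
  list(levels = lv,
       prior = as.vector(table(y)) / length(y),                       # P(class)
       mean  = matrix(sapply(lv, function(l) colMeans(X[y == l, , drop = FALSE])),
                      nrow = ncol(X)),                                 # feature x class
       sd    = matrix(sapply(lv, function(l) apply(X[y == l, , drop = FALSE], 2, sd)),
                      nrow = ncol(X)))
}

nb_predict <- function(model, X) {
  X <- as.matrix(X)
  n <- nrow(X)
  # log P(class) + sum_j log N(x_j | mean_jk, sd_jk): the "naive" independence assumption
  scores <- sapply(seq_along(model$levels), function(k) {
    ll <- dnorm(X, mean = rep(model$mean[, k], each = n),
                sd = rep(model$sd[, k], each = n), log = TRUE)
    log(model$prior[k]) + rowSums(matrix(ll, nrow = n))
  })
  scores <- matrix(scores, nrow = n)          # n x (number of classes)
  factor(model$levels[max.col(scores, ties.method = "first")], levels = model$levels)
}

set.seed(11)
X <- rbind(cbind(Height = rnorm(30, 70, 2), Age = rnorm(30, 25, 3)),
           cbind(Height = rnorm(30, 77, 2), Age = rnorm(30, 32, 3)))
y <- rep(c("light", "heavy"), each = 30)
model <- nb_fit(X, y)
pred <- nb_predict(model, rbind(c(70, 25), c(77, 32)))
print(pred)
```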

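Decision Trees (CART)
 library(rpart)
 # method="class" treats the response as categorical: with numeric Weight, each distinct weight becomes its own class
 cart_model <- rpart(Weight ~ Height+Age, data=, method="class")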
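For a classification tree, rpart chooses each split by how much it reduces node impurity, with the Gini index as the default criterion. The base-R sketch below performs one such split search for a single numeric predictor. The helpers `gini()` and `best_split()` are hypothetical. A real tree repeats this search over all predictors and recursively within each child node, then prunes.

```r
# Gini impurity of a set of class labels: 1 - sum_k p_k^2.
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Best single split "x < threshold" for one numeric predictor, chosen to minimise
# the size-weighted Gini impurity of the two child nodes.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2      # midpoints between distinct values
  impurity <- sapply(cuts, function(s) {
    left <- y[x < s]
    right <- y[x >= s]
    (length(left) * gini(left) + length(right) * gini(right)) / length(y)
  })
  list(threshold = cuts[which.min(impurity)], impurity = min(impurity))
}

x <- c(60, 62, 65, 71, 73, 75)                   # e.g. heights (invented values)
y <- c("light", "light", "light", "heavy", "heavy", "heavy")
s <- best_split(x, y)
print(s)
```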

Boosting
 # Let X be the matrix of features and labels a vector of 0-1 class labels.
 library(ada)
 boost_model <- ada(x= cbind(train$Height, train$Weight, train$Age), y= Weight.1)
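`ada()` fits a boosted ensemble. In each round, boosting re-weights the training points so that the next weak learner concentrates on the points misclassified so far, and the final prediction is a weighted vote. The sketch below is a from-scratch discrete AdaBoost that uses one-split decision stumps in base R. `adaboost_sketch()` and `adaboost_predict()` are hypothetical helpers, labels are coded -1/+1, and this is not the ada package's implementation. In the toy data the positive class lies on both sides of an interval, so no single stump classifies every point correctly. Three boosting rounds do.

```r
adaboost_sketch <- function(X, y, iter = 20) {
  X <- as.matrix(X)
  n <- nrow(X)
  w <- rep(1 / n, n)                                 # start with uniform weights
  stumps <- vector("list", iter)
  for (m in seq_len(iter)) {
    best <- list(err = Inf)
    # search every feature j, threshold s, and orientation sgn for the lowest weighted error
    for (j in seq_len(ncol(X))) for (s in unique(X[, j])) for (sgn in c(-1, 1)) {
      pred <- ifelse(X[, j] >= s, sgn, -sgn)
      err <- sum(w[pred != y])
      if (err < best$err) best <- list(j = j, s = s, sgn = sgn, err = err, pred = pred)
    }
    err <- min(max(best$err, 1e-10), 1 - 1e-10)      # guard against log(0)
    alpha <- 0.5 * log((1 - err) / err)              # weight of this stump in the vote
    w <- w * exp(-alpha * y * best$pred)             # up-weight misclassified points
    w <- w / sum(w)
    stumps[[m]] <- c(best[c("j", "s", "sgn")], alpha = alpha)
  }
  stumps
}

adaboost_predict <- function(stumps, X) {
  X <- as.matrix(X)
  f <- Reduce(`+`, lapply(stumps, function(st)
    st$alpha * ifelse(X[, st$j] >= st$s, st$sgn, -st$sgn)))
  sign(f)                                            # weighted majority vote
}

X <- matrix(1:10, ncol = 1)
y <- c(1, 1, 1, -1, -1, -1, -1, 1, 1, 1)            # +1 outside [4, 7]
model <- adaboost_sketch(X, y, iter = 3)
print(adaboost_predict(model, X) == y)
```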

Support Vector Machines (SVM)

library(e1071)
svm_model <- svm(x= cbind(train$Height, train$Weight, train$Age), y=as.factor(Weight.1),
kernel ="radial")
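With `kernel = "radial"`, e1071's `svm()` compares two observations u and v through the Gaussian kernel exp(-gamma * |u - v|^2). By default, `gamma` is 1/(data dimension), i.e. 1/3 for the three features above. The sketch below computes that kernel matrix in base R. `rbf_kernel()` is a hypothetical helper. Note that `svm()` also standardises the features by default (`scale = TRUE`) before applying the kernel.

```r
# Radial-basis (Gaussian) kernel matrix: K[i, j] = exp(-gamma * ||x_i - z_j||^2).
rbf_kernel <- function(X, Z = X, gamma = 1 / ncol(X)) {
  X <- as.matrix(X)
  Z <- as.matrix(Z)
  # squared distances via ||x||^2 + ||z||^2 - 2 x.z
  d2 <- outer(rowSums(X^2), rowSums(Z^2), "+") - 2 * X %*% t(Z)
  exp(-gamma * pmax(d2, 0))        # pmax guards against tiny negative rounding errors
}

X <- rbind(c(0, 0), c(1, 0), c(0, 2))
K <- rbf_kernel(X)                 # gamma = 1/2 for two columns
print(round(K, 4))
```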


Example 1: Simulation (subject, day, treatment, observation)

Obs ~ Treatment + Day + Treatment*Day + Subject(Treatment) + Day*Subject(Treatment) + ε

This model accounts for:

Response: Obs

Fixed effects:

  • Treatment
  • Day
  • Treatment*Day interaction

Random effects:

  • Subject nested within Treatment
  • Day crossed with "Subject within Treatment"
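The model can be written in standard notation as follows. Here i indexes treatment (A, B, C), j indexes day (Day1, Day3, Day6), and k indexes subjects within treatment i. The normality and independence assumptions on the random terms are the usual mixed-model assumptions; the text above does not state them explicitly.

```latex
y_{ijk} = \mu + \tau_i + \delta_j + (\tau\delta)_{ij} + s_{k(i)} + (\delta s)_{jk(i)} + \varepsilon_{ijk},
\qquad
s_{k(i)} \sim N(0, \sigma_s^2), \quad
(\delta s)_{jk(i)} \sim N(0, \sigma_{\delta s}^2), \quad
\varepsilon_{ijk} \sim N(0, \sigma^2)
```

Each subject is measured only once per day, so these data cannot separate the Day × Subject(Treatment) term from the residual ε. This is consistent with the lmer() fit, which estimates only a subject-level random intercept and a residual. Because subject IDs are unique across treatments, (1 | Subject) already encodes Subject nested within Treatment.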

mydata <- data.frame(
Subject  = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 
34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 
19, 20, 21, 22, 23, 24, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 
40, 62, 63, 64, 65, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 
29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65), 
Day       = c(rep(c("Day1", "Day3", "Day6"), each=28)), 
Treatment = c(rep(c("B", "A", "C", "B", "C", "A", "A", "B", "A", "C", "B", "C", 
"A", "A", "B", "A", "C", "B", "C", "A", "A"), each = 4)), 
Obs       = c(6.472687, 7.017110, 6.200715, 6.613928, 6.829968, 7.387583, 7.367293, 
8.018853, 7.527408, 6.746739, 7.296910, 6.983360, 6.816621, 6.571689, 
5.911261, 6.954988, 7.624122, 7.669865, 7.676225, 7.263593, 7.704737, 
7.328716, 7.295610, 5.964180, 6.880814, 6.926342, 6.926342, 7.562293, 
6.677607, 7.023526, 6.441864, 7.020875, 7.478931, 7.495336, 7.427709, 
7.633020, 7.382091, 7.359731, 7.285889, 7.496863, 6.632403, 6.171196, 
6.306012, 7.253833, 7.594852, 6.915225, 7.220147, 7.298227, 7.573612, 
7.366550, 7.560513, 7.289078, 7.287802, 7.155336, 7.394452, 7.465383, 
6.976048, 7.222966, 6.584153, 7.013223, 7.569905, 7.459185, 7.504068, 
7.801867, 7.598728, 7.475841, 7.511873, 7.518384, 6.618589, 5.854754, 
6.125749, 6.962720, 7.540600, 7.379861, 7.344189, 7.362815, 7.805802, 
7.764172, 7.789844, 7.616437, NA, NA, NA, NA))
library(lme4)   # install.packages("lme4") if needed
m1 <- lmer(Obs ~ Treatment * Day + (1 | Subject), data = mydata)
m1   # printing the fitted model produces the output below

 Linear mixed model fit by REML ['lmerMod']
 Formula: Obs ~ Treatment * Day + (1 | Subject)
 Data: mydata
 REML criterion at convergence: 56.8669

 Random effects:
  Groups    Name         Std.Dev.
  Subject   (Intercept)  0.2163
  Residual               0.2602
 Number of obs: 80, groups: Subject, 28

 Fixed effects:
         (Intercept)          TreatmentB          TreatmentC
              7.1827             -0.6129              0.1658
             DayDay3             DayDay6  TreatmentB:DayDay3
              0.2446              0.4507             -0.1235
  TreatmentC:DayDay3  TreatmentB:DayDay6  TreatmentC:DayDay6
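Treatment A and Day1 are the reference levels, because factor levels are sorted alphabetically by default. So `(Intercept)` estimates the mean of the Treatment A, Day1 cell, and `DayDay3` estimates the Day3 minus Day1 difference within treatment A. The base-R check below uses the twelve treatment-A observations for each of these two days, copied from the data table, and covers only these two coefficients.

```r
# Treatment A observations copied from the data table (subjects 17-20, 37-40, 62-65).
a_day1 <- c(6.829968, 7.387583, 7.367293, 8.018853, 7.704737, 7.328716,
            7.295610, 5.964180, 6.880814, 6.926342, 6.926342, 7.562293)
a_day3 <- c(7.478931, 7.495336, 7.427709, 7.633020, 7.573612, 7.366550,
            7.560513, 7.289078, 7.287802, 7.155336, 7.394452, 7.465383)

intercept_check <- mean(a_day1)                  # compare with (Intercept) = 7.1827
day3_check      <- mean(a_day3) - mean(a_day1)   # compare with DayDay3     = 0.2446
print(round(c(intercept_check, day3_check), 4))
```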

Index Subject Day Treatment Obs
1 13 Day1 B 6.472687
2 14 Day1 B 7.01711
3 15 Day1 B 6.200715
4 16 Day1 B 6.613928
5 17 Day1 A 6.829968
6 18 Day1 A 7.387583
7 19 Day1 A 7.367293
8 20 Day1 A 8.018853
9 21 Day1 C 7.527408
10 22 Day1 C 6.746739
11 23 Day1 C 7.29691
12 24 Day1 C 6.98336
13 29 Day1 B 6.816621
14 30 Day1 B 6.571689
15 31 Day1 B 5.911261
16 32 Day1 B 6.954988
17 33 Day1 C 7.624122
18 34 Day1 C 7.669865
19 35 Day1 C 7.676225
20 36 Day1 C 7.263593
21 37 Day1 A 7.704737
22 38 Day1 A 7.328716
23 39 Day1 A 7.29561
24 40 Day1 A 5.96418
25 62 Day1 A 6.880814
26 63 Day1 A 6.926342
27 64 Day1 A 6.926342
28 65 Day1 A 7.562293
29 13 Day3 B 6.677607
30 14 Day3 B 7.023526
31 15 Day3 B 6.441864
32 16 Day3 B 7.020875
33 17 Day3 A 7.478931
34 18 Day3 A 7.495336
35 19 Day3 A 7.427709
36 20 Day3 A 7.63302
37 21 Day3 C 7.382091
38 22 Day3 C 7.359731
39 23 Day3 C 7.285889
40 24 Day3 C 7.496863
41 29 Day3 B 6.632403
42 30 Day3 B 6.171196
43 31 Day3 B 6.306012
44 32 Day3 B 7.253833
45 33 Day3 C 7.594852
46 34 Day3 C 6.915225
47 35 Day3 C 7.220147
48 36 Day3 C 7.298227
49 37 Day3 A 7.573612
50 38 Day3 A 7.36655
51 39 Day3 A 7.560513
52 40 Day3 A 7.289078
53 62 Day3 A 7.287802
54 63 Day3 A 7.155336
55 64 Day3 A 7.394452
56 65 Day3 A 7.465383
57 13 Day6 B 6.976048
58 14 Day6 B 7.222966
59 15 Day6 B 6.584153
60 16 Day6 B 7.013223
61 17 Day6 A 7.569905
62 18 Day6 A 7.459185
63 19 Day6 A 7.504068
64 20 Day6 A 7.801867
65 21 Day6 C 7.598728
66 22 Day6 C 7.475841
67 23 Day6 C 7.511873
68 24 Day6 C 7.518384
69 29 Day6 B 6.618589
70 30 Day6 B 5.854754
71 31 Day6 B 6.125749
72 32 Day6 B 6.96272
73 33 Day6 C 7.5406
74 34 Day6 C 7.379861
75 35 Day6 C 7.344189
76 36 Day6 C 7.362815
77 37 Day6 A 7.805802
78 38 Day6 A 7.764172
79 39 Day6 A 7.789844
80 40 Day6 A 7.616437
81 62 Day6 A NA
82 63 Day6 A NA
83 64 Day6 A NA
84 65 Day6 A NA

