SMHS Linear Modeling - Machine Learning Algorithms
Scientific inference based on fixed and random effect models, assumptions, and mixed effects logistic regression.
Questions:
- How can we combine human intuition with computer-generated results to obtain a reliable, effective, and efficient decision-support system that facilitates forecasting?
- Niels Bohr – "It is difficult to make predictions, especially about the future" …
- Can we classify the data in an unsupervised manner?
Prediction
For most of the machine learning algorithms (including first-order linear regression), we:
- first generate the model using training data, and then
- predict values for test/new data.
Predictions are made with the R predict function (type ?predict.name, where name is the function name corresponding to the fitting algorithm). The first argument of predict is usually the object storing the fitted model, and the second argument is a matrix or data frame of test data to which the model should be applied. predict can be called in two ways: generically, as predict(model, ...), or explicitly, as predict.name(model, ...).
Example:
# mydata <- read.table('https://umich.instructure.com/files/330381/download?download_frd=1&verifier=HpfmjfMFaMsk7rIpfPx0tmz960oTW7JA8ZonGvVC', as.is=T, header=T)   # 01a_data.txt
# mydata <- read.table('data.txt', as.is=T, header=T)

# (1) First, there are different approaches to split (partition) the data into training and testing sets.
## TRAINING: 75% of the sample size
sample_size <- floor(0.75 * nrow(mydata))
## set the seed to make the partition reproducible
set.seed(1234)
train_ind <- sample(seq_len(nrow(mydata)), size = sample_size)
train <- mydata[train_ind, ]

## TESTING DATA
test <- mydata[-train_ind, ]

lin.mod <- lm(Weight ~ Height*Team, data=train)
predicted.values <- predict(lin.mod, newdata=test)
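As a quick check of the fitted linear model, the predicted values can be compared against the observed test-set weights. The error measure (root mean squared error) and the calibration plot below are illustrative additions, not part of the original protocol:

# compare predicted and observed Weight on the test set
rmse <- sqrt(mean((test$Weight - predicted.values)^2, na.rm=TRUE))
rmse
plot(test$Weight, predicted.values, xlab="Observed Weight", ylab="Predicted Weight")
abline(0, 1)   # reference line for perfect prediction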
Data Modeling/Training
Logistic Regression:
glm_model <- glm(ifelse(Weight > 200,1,0) ~ Height*Team, family=binomial(link="logit"), data=train)
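Predicted probabilities on the test set can be obtained from this logistic model with predict(..., type="response"); the 0.5 classification cutoff below is an illustrative assumption:

glm_probs <- predict(glm_model, newdata=test, type="response")        # predicted P(Weight > 200)
glm_pred <- ifelse(glm_probs > 0.5, 1, 0)                             # classify with a 0.5 cutoff
table(Predicted=glm_pred, Observed=ifelse(test$Weight > 200, 1, 0))   # confusion table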
K-Means Clustering
train.1 <- cbind(train$Height, train$Weight, train$Age)
colnames(train.1) <- c("Height", "Weight", "Age")    # name the columns so formula interfaces work
test.1 <- cbind(test$Height, test$Weight, test$Age)
colnames(test.1) <- c("Height", "Weight", "Age")
Weight.1 <- ifelse(train$Weight > 200, 1, 0)         # binary (0/1) class labels
head(train.1)
kmeans_model <- kmeans(train.1, 3)
plot(train.1, col = kmeans_model$cluster)
points(kmeans_model$centers, col = 1:3, pch = 8, cex = 2)

K-Nearest Neighbor Classification
# install.packages("class")
library("class")
knn_model <- knn(train=train.1, test=test.1, cl=as.factor(Weight.1), k=5)
plot(knn_model)
summary(knn_model)

Naïve Bayes Classifier
# install.packages("e1071")
library("e1071")
# naiveBayes requires a categorical response, so the binary weight indicator is modeled
nbc_model <- naiveBayes(as.factor(Weight.1) ~ Height + Age, data=as.data.frame(train.1))

Decision Trees (CART)
# install.packages("rpart")
library("rpart")
# method="class" requires a categorical response, so the binary weight indicator is used
cart_model <- rpart(as.factor(Weight.1) ~ Height + Age, data=as.data.frame(train.1), method="class")
plot(cart_model)
text(cart_model)

AdaBoost
# install.packages("ada")
library("ada")
# x is the matrix of features; y is the vector of 0/1 class labels
boost_model <- ada(x=cbind(train$Height, train$Weight, train$Age), y=Weight.1)
plot(boost_model)
boost_model
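The classifiers above can also be compared on the held-out test set. The following is a hedged sketch; the test labels (Weight.1.test) and the accuracy computations are illustrative additions, not part of the original text:

# 0/1 labels for the test set, defined analogously to Weight.1
Weight.1.test <- ifelse(test$Weight > 200, 1, 0)
# k-nearest-neighbor predictions were produced directly by knn() above
table(Predicted=knn_model, Observed=Weight.1.test)
# Naive Bayes and CART predictions on the test features
nbc_pred <- predict(nbc_model, newdata=as.data.frame(test.1))
cart_pred <- predict(cart_model, newdata=as.data.frame(test.1), type="class")
mean(nbc_pred == Weight.1.test)     # Naive Bayes test accuracy
mean(cart_pred == Weight.1.test)    # CART test accuracy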
Support Vector Machines (SVM)
#install.packages("e1071") library("rpart") svm_model <- svm(x= cbind(train$\$$Height, train$\$$Weight, train$\$$Age), y=as.factor(Weight.1), kernel ="radial") summary(svm_model)
Appendix
Example 1: Simulation (subject, day, treatment, observation)
Obs ~ Treatment + Day + Treatment*Day + Subject(Treatment) + Day*Subject(Treatment) + ε.
This model accounts for the following (an lme4 sketch is given after the list):
Response = Obs
Fixed effects:
Treatment (fixed)
Day (fixed)
Treatment*Day interaction
Random Effects:
Subject nested within Treatment (random)
Day crossed with "Subject within Treatment" (random)
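In lme4 notation, this design could in principle be written with explicit nesting and crossing terms. The sketch below is an assumption about how the conceptual model maps onto lmer syntax and is left commented out, because with a single observation per Subject per Day the Day-by-Subject(Treatment) term is confounded with the residual; the model actually fitted below therefore uses only a random intercept for Subject.

# Hedged sketch (not fitted): Subject nested within Treatment, and Day crossed
# with Subject-within-Treatment, both as random intercepts
# m_full <- lmer(Obs ~ Treatment * Day + (1 | Treatment:Subject) + (1 | Treatment:Subject:Day), data = mydata)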
mydata <- data.frame(
  Subject = c(13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32,
              33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65,
              13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32,
              33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65,
              13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 29, 30, 31, 32,
              33, 34, 35, 36, 37, 38, 39, 40, 62, 63, 64, 65),
  Day = rep(c("Day1", "Day3", "Day6"), each = 28),
  Treatment = rep(c("B", "A", "C", "B", "C", "A", "A",
                    "B", "A", "C", "B", "C", "A", "A",
                    "B", "A", "C", "B", "C", "A", "A"), each = 4),
  Obs = c(6.472687, 7.017110, 6.200715, 6.613928, 6.829968, 7.387583, 7.367293,
          8.018853, 7.527408, 6.746739, 7.296910, 6.983360, 6.816621, 6.571689,
          5.911261, 6.954988, 7.624122, 7.669865, 7.676225, 7.263593, 7.704737,
          7.328716, 7.295610, 5.964180, 6.880814, 6.926342, 6.926342, 7.562293,
          6.677607, 7.023526, 6.441864, 7.020875, 7.478931, 7.495336, 7.427709,
          7.633020, 7.382091, 7.359731, 7.285889, 7.496863, 6.632403, 6.171196,
          6.306012, 7.253833, 7.594852, 6.915225, 7.220147, 7.298227, 7.573612,
          7.366550, 7.560513, 7.289078, 7.287802, 7.155336, 7.394452, 7.465383,
          6.976048, 7.222966, 6.584153, 7.013223, 7.569905, 7.459185, 7.504068,
          7.801867, 7.598728, 7.475841, 7.511873, 7.518384, 6.618589, 5.854754,
          6.125749, 6.962720, 7.540600, 7.379861, 7.344189, 7.362815, 7.805802,
          7.764172, 7.789844, 7.616437, NA, NA, NA, NA))
install.packages("lme4") library("lme4", lib.loc="~/R/win-library/3.1") m1 <- lmer(Obs ~ Treatment * Day + (1 | Subject), mydata) m1
Linear mixed model fit by REML ['lmerMod']
Formula: Obs ~ Treatment * Day + (1 | Subject)
Data: mydata
REML criterion at convergence: 56.8669
Random effects:
Groups | Name | Std.Dev. |
Subject | (Intercept) | 0.2163 |
Residual | | 0.2602 |
Fixed effects:
Term | Estimate |
(Intercept) | 7.1827 |
TreatmentB | -0.6129 |
TreatmentC | 0.1658 |
DayDay3 | 0.2446 |
DayDay6 | 0.4507 |
TreatmentB:DayDay3 | -0.1235 |
TreatmentC:DayDay3 | … |
TreatmentB:DayDay6 | … |
TreatmentC:DayDay6 | … |
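The variance components and fixed-effect estimates shown above can also be extracted programmatically; the calls below are standard lme4 accessors and are illustrative additions:

VarCorr(m1)     # random-effect standard deviations / variances
fixef(m1)       # fixed-effect estimates
confint(m1)     # profile confidence intervals for all parameters

The simulated data frame mydata used in this example is listed below for reference.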
Index | Subject | Day | Treatment | Obs |
1 | 13 | Day1 | B | 6.472687 |
2 | 14 | Day1 | B | 7.01711 |
3 | 15 | Day1 | B | 6.200715 |
4 | 16 | Day1 | B | 6.613928 |
5 | 17 | Day1 | A | 6.829968 |
6 | 18 | Day1 | A | 7.387583 |
7 | 19 | Day1 | A | 7.367293 |
8 | 20 | Day1 | A | 8.018853 |
9 | 21 | Day1 | C | 7.527408 |
10 | 22 | Day1 | C | 6.746739 |
11 | 23 | Day1 | C | 7.29691 |
12 | 24 | Day1 | C | 6.98336 |
13 | 29 | Day1 | B | 6.816621 |
14 | 30 | Day1 | B | 6.571689 |
15 | 31 | Day1 | B | 5.911261 |
16 | 32 | Day1 | B | 6.954988 |
17 | 33 | Day1 | C | 7.624122 |
18 | 34 | Day1 | C | 7.669865 |
19 | 35 | Day1 | C | 7.676225 |
20 | 36 | Day1 | C | 7.263593 |
21 | 37 | Day1 | A | 7.704737 |
22 | 38 | Day1 | A | 7.328716 |
23 | 39 | Day1 | A | 7.29561 |
24 | 40 | Day1 | A | 5.96418 |
25 | 62 | Day1 | A | 6.880814 |
26 | 63 | Day1 | A | 6.926342 |
27 | 64 | Day1 | A | 6.926342 |
28 | 65 | Day1 | A | 7.562293 |
29 | 13 | Day3 | B | 6.677607 |
30 | 14 | Day3 | B | 7.023526 |
31 | 15 | Day3 | B | 6.441864 |
32 | 16 | Day3 | B | 7.020875 |
33 | 17 | Day3 | A | 7.478931 |
34 | 18 | Day3 | A | 7.495336 |
35 | 19 | Day3 | A | 7.427709 |
36 | 20 | Day3 | A | 7.63302 |
37 | 21 | Day3 | C | 7.382091 |
38 | 22 | Day3 | C | 7.359731 |
39 | 23 | Day3 | C | 7.285889 |
40 | 24 | Day3 | C | 7.496863 |
41 | 29 | Day3 | B | 6.632403 |
42 | 30 | Day3 | B | 6.171196 |
43 | 31 | Day3 | B | 6.306012 |
44 | 32 | Day3 | B | 7.253833 |
45 | 33 | Day3 | C | 7.594852 |
46 | 34 | Day3 | C | 6.915225 |
47 | 35 | Day3 | C | 7.220147 |
48 | 36 | Day3 | C | 7.298227 |
49 | 37 | Day3 | A | 7.573612 |
50 | 38 | Day3 | A | 7.36655 |
51 | 39 | Day3 | A | 7.560513 |
52 | 40 | Day3 | A | 7.289078 |
53 | 62 | Day3 | A | 7.287802 |
54 | 63 | Day3 | A | 7.155336 |
55 | 64 | Day3 | A | 7.394452 |
56 | 65 | Day3 | A | 7.465383 |
57 | 13 | Day6 | B | 6.976048 |
58 | 14 | Day6 | B | 7.222966 |
59 | 15 | Day6 | B | 6.584153 |
60 | 16 | Day6 | B | 7.013223 |
61 | 17 | Day6 | A | 7.569905 |
62 | 18 | Day6 | A | 7.459185 |
63 | 19 | Day6 | A | 7.504068 |
64 | 20 | Day6 | A | 7.801867 |
65 | 21 | Day6 | C | 7.598728 |
66 | 22 | Day6 | C | 7.475841 |
67 | 23 | Day6 | C | 7.511873 |
68 | 24 | Day6 | C | 7.518384 |
69 | 29 | Day6 | B | 6.618589 |
70 | 30 | Day6 | B | 5.854754 |
71 | 31 | Day6 | B | 6.125749 |
72 | 32 | Day6 | B | 6.96272 |
73 | 33 | Day6 | C | 7.5406 |
74 | 34 | Day6 | C | 7.379861 |
75 | 35 | Day6 | C | 7.344189 |
76 | 36 | Day6 | C | 7.362815 |
77 | 37 | Day6 | A | 7.805802 |
78 | 38 | Day6 | A | 7.764172 |
79 | 39 | Day6 | A | 7.789844 |
80 | 40 | Day6 | A | 7.616437 |
81 | 62 | Day6 | A | NA |
82 | 63 | Day6 | A | NA |
83 | 64 | Day6 | A | NA |
84 | 65 | Day6 | A | NA |
....
- SOCR Home page: http://www.socr.umich.edu