Difference between revisions of "SMHS LinearModeling MachineLearning"
Revision as of 08:56, 2 March 2016
SMHS Linear Modeling - Machine Learning Algorithms
Scientific inference based on fixed and random effect models, assumptions, and mixed effects logistic regression.
Questions:
- How can we combine human intuition and computer-generated results to obtain a reliable, effective, and efficient decision-support system (one that facilitates forecasting)?
- Niels Bohr: “It is difficult to make predictions, especially about the future” …
- Can we classify the data in an unsupervised manner?
Prediction
For most machine learning algorithms (including first-order linear regression), we:
- first generate the model using training data, and then
- predict values for test/new data.
Predictions are made using the R predict function (type ?predict.name for help, where name is the function name corresponding to the algorithm, e.g., ?predict.lm for linear models). The first argument of predict is usually the object storing the fitted model, and the second argument is a matrix or data frame of test data to which the model is applied. predict can be called in two ways: as the generic predict, or as the method-specific predict.name.
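To illustrate the two calling styles, here is a minimal sketch using R's built-in women data set (an assumption made for illustration; it is not the data used in this section). The names fit, new_obs, p_generic, and p_method are hypothetical.

```r
# Fit a simple linear model on the built-in 'women' data (columns: height, weight)
fit <- lm(weight ~ height, data = women)

# New observations to predict for; column names must match the model's predictors
new_obs <- data.frame(height = c(60, 65, 70))

# (a) Call the generic: R dispatches to predict.lm because class(fit) is "lm"
p_generic <- predict(fit, newdata = new_obs)

# (b) Call the method for linear models directly
p_method <- predict.lm(fit, newdata = new_obs)

# Both calls run the same code, so the results agree
all.equal(p_generic, p_method)
```

Calling the generic is usually preferable, since the same line keeps working if the model object is later replaced by one of a different class.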
Example:
# Load the data (01a_data.txt)
mydata <- read.table('https://umich.instructure.com/files/330381/download?download_frd=1&verifier=HpfmjfMFaMsk7rIpfPx0tmz960oTW7JA8ZonGvVC', as.is=TRUE, header=TRUE)
# Alternatively, read a local copy:
# mydata <- read.table('data.txt', as.is=TRUE, header=TRUE)

# (1) First, there are different approaches to split (partition) the data into
# training and testing sets.
## TRAINING: 75% of the sample size
sample_size <- floor(0.75 * nrow(mydata))
## set the seed to make your partition reproducible
set.seed(1234)
## randomly draw the row indices of the training set
train_ind <- sample(seq_len(nrow(mydata)), size = sample_size)
train <- mydata[train_ind, ]

# TESTING DATA: all rows not selected for training
test <- mydata[-train_ind, ]

# (2) Fit the model on the training data, then predict for the testing data
lin.mod <- lm(Weight ~ Height*Team, data=train)
predicted.values <- predict(lin.mod, newdata=test)
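If the data file is not available, the same workflow can be tried on simulated data. The sketch below is only an illustration: the data frame sim, its three team labels, and the coefficients used to generate Weight are made-up assumptions, not values from the original data set. The final root-mean-square error (RMSE) step is also an addition, shown as one common way to compare predictions with the held-out values.

```r
set.seed(1234)

# Simulated stand-in for the real data (all values below are hypothetical)
n <- 200
sim <- data.frame(
  Height = rnorm(n, mean = 73, sd = 2.5),
  Team   = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
sim$Weight <- -150 + 4.8 * sim$Height + rnorm(n, sd = 15)

# 75% / 25% train/test partition, as in the example above
sample_size <- floor(0.75 * nrow(sim))
train_ind <- sample(seq_len(nrow(sim)), size = sample_size)
train <- sim[train_ind, ]
test  <- sim[-train_ind, ]

# Fit on the training rows, predict for the held-out rows
lin.mod <- lm(Weight ~ Height * Team, data = train)
predicted.values <- predict(lin.mod, newdata = test)

# Root-mean-square error of the predictions on the test set
rmse <- sqrt(mean((test$Weight - predicted.values)^2))
rmse
```

Because Team is stored as a factor before the split, the training and testing sets share the same factor levels, which predict needs in order to handle the Height*Team interaction.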
....
- SOCR Home page: http://www.socr.umich.edu