Warning: package 'rpart' was built under R version 4.2.3
Attaching package: 'rpart'
The following object is masked from 'package:dials':
prune
library(ranger)
Warning: package 'ranger' was built under R version 4.2.3
library(glmnet)
Warning: package 'glmnet' was built under R version 4.2.3
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
Loaded glmnet 4.1-7
library(rpart.plot)
Warning: package 'rpart.plot' was built under R version 4.2.3
library(vip)
Warning: package 'vip' was built under R version 4.2.3
Attaching package: 'vip'
The following object is masked from 'package:utils':
vi
Read in Data
# Load the data set
flu_ml <- readRDS("../data/flu_ml.rds")
Setup
# Set the seed for reproducibility
set.seed(123)

# Split the data with 70% in the training set, stratifying on BodyTemp
flu_split <- initial_split(data = flu_ml, strata = BodyTemp, prop = 7/10)

# Create data frames for the two sets
train_data <- training(flu_split)
test_data <- testing(flu_split)
# A tibble: 2 × 6
  .metric .estimator   mean     n std_err .config
  <chr>   <chr>       <dbl> <int>   <dbl> <chr>
1 rmse    standard     1.21    25  0.0177 Preprocessor1_Model1
2 rsq     standard   NaN        0 NA      Preprocessor1_Model1
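An rsq of NaN with n = 0 is what typically appears when every prediction is the same value, for example from a mean-only null model, because the correlation between the outcome and a constant is undefined. The excerpt does not show the code that produced this table, so that reading is an assumption. The base R sketch below illustrates the effect on simulated values; `body_temp` is a hypothetical stand-in, not the flu data.

```r
# Simulated outcome standing in for BodyTemp (hypothetical values, not the flu data)
set.seed(123)
body_temp <- rnorm(200, mean = 98.9, sd = 1.2)

# A mean-only model predicts the same value for every observation
null_pred <- rep(mean(body_temp), length(body_temp))

# RMSE is well defined: it is the root mean squared deviation from the mean
null_rmse <- sqrt(mean((body_temp - null_pred)^2))

# The correlation with a constant vector is undefined, so R-squared comes out as NA
null_rsq <- suppressWarnings(cor(body_temp, null_pred))^2

print(null_rmse)
print(null_rsq)
```

Under this reading, the RMSE in the table is the number to beat: a tuned model is only useful if its RMSE falls clearly below the constant-prediction baseline.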
# A tibble: 2 × 4
  .metric .estimator .estimate .config
  <chr>   <chr>          <dbl> <chr>
1 rmse    standard    1.19     Preprocessor1_Model1
2 rsq     standard    0.000889 Preprocessor1_Model1
# Plot of the final fit
rpart.plot(extract_fit_parsnip(final_fit)$fit)
Warning: Cannot retrieve the data used to build the model (so cannot determine roundint and is.binary for the variables).
To silence this warning:
Call rpart.plot with roundint=FALSE,
or rebuild the rpart model with model=TRUE.
Fitting a LASSO
# Build the model
lasso_mod <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

# Pull the earlier recipe
flu_ml_rec
Recipe
Inputs:
role #variables
outcome 1
predictor 25
Operations:
Dummy variables from all_nominal(), -all_outcomes()
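Because `penalty` is marked with `tune()`, the LASSO needs a set of candidate penalty values to evaluate. The excerpt does not show how those were chosen, so the following is only a sketch of one common approach using a regular grid from dials; the object name `lasso_grid` and the choice of 30 levels are assumptions made for illustration.

```r
library(dials)

# penalty() is defined on a log10 scale; its default range is 10^-10 to 10^0
# 30 levels is an arbitrary choice for this sketch
lasso_grid <- grid_regular(penalty(), levels = 30)

print(lasso_grid)
```

A grid like this could then be supplied through the `grid` argument of `tune_grid()`, together with a workflow that combines `lasso_mod` and `flu_ml_rec` and a set of resamples.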
rf_mod <- rand_forest(mtry = tune(), min_n = tune(), trees = 1000) %>%
  set_engine("ranger", num.threads = cores, importance = "impurity") %>%
  set_mode("regression")

# Create the workflow
rf_workflow <- workflow() %>%
  add_model(rf_mod) %>%
  add_recipe(flu_ml_rec)

# Parameters for tuning
extract_parameter_set_dials(rf_mod)
Collection of 2 parameters for tuning
identifier type object
mtry mtry nparam[?]
min_n min_n nparam[+]
Model parameters needing finalization:
# Randomly Selected Predictors ('mtry')
See `?dials::finalize` or `?dials::update.parameters` for more information.
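The "needing finalization" message means the upper bound of `mtry` is unknown until the number of predictors is known. As a minimal sketch of what finalization does, the example below calls `dials::finalize()` on a bare `mtry()` parameter, using `mtcars` without its `mpg` column as a hypothetical stand-in for the processed flu predictors.

```r
library(dials)

# Stand-in predictors: mtcars without the outcome column
# (hypothetical substitute for the flu data after the recipe is applied)
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# mtry() starts with an unknown upper bound
mtry_param <- mtry()

# finalize() sets the upper bound from the number of predictor columns
mtry_final <- finalize(mtry_param, predictors)
mtry_range <- range_get(mtry_final)

print(mtry_range)
```

When a grid size is passed to `tune_grid()` for a workflow with a recipe, tune generally performs this step on its own, so explicit finalization is mainly needed when the grid is built by hand.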
# I chose the LASSO fit as the best model because it has the lowest RMSE
final_test <- lasso_final %>%
  last_fit(flu_split)

final_test %>%
  collect_metrics()
# A tibble: 2 × 4
  .metric .estimator .estimate .config
  <chr>   <chr>          <dbl> <chr>
1 rmse    standard      1.16   Preprocessor1_Model1
2 rsq     standard      0.0299 Preprocessor1_Model1