train.model.Rd
This function trains a machine learning model on the training data.
train.model(
  siamcat,
  method = c("lasso", "enet", "ridge", "lasso_ll", "ridge_ll", "randomForest"),
  stratify = TRUE,
  modsel.crit = list("auc"),
  min.nonzero.coeff = 1,
  param.set = NULL,
  perform.fs = FALSE,
  param.fs = list(thres.fs = 100, method.fs = "AUC", direction = 'absolute'),
  feature.type = 'normalized',
  verbose = 1
)
siamcat | object of class siamcat-class
---|---
method | string, specifies the type of model to be trained; may be one of 'lasso', 'enet', 'ridge', 'lasso_ll', 'ridge_ll', or 'randomForest'
stratify | boolean, should the folds in the internal cross-validation be stratified? Defaults to TRUE
modsel.crit | list, specifies the model selection criterion during internal cross-validation. Defaults to list("auc")
min.nonzero.coeff | integer, minimum number of nonzero coefficients that should be present in the model (only relevant for the glmnet-based methods 'lasso', 'enet', and 'ridge'). Defaults to 1
param.set | list, set of extra parameters for the mlr run (see Details). Defaults to NULL
param.fs | list, parameters for the feature selection; must contain thres.fs, method.fs, and direction. See Details for more information. Defaults to list(thres.fs = 100, method.fs = "AUC", direction = 'absolute')
perform.fs | boolean, should feature selection be performed? Defaults to FALSE
feature.type | string, on which type of features should the function work? Defaults to 'normalized'
verbose | integer, control the level of console output. Defaults to 1
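As a sketch of a typical call, grounded in the usage signature above (assuming the SIAMCAT package is loaded; the specific parameter values shown are illustrative only):

```r
library(SIAMCAT)
data(siamcat_example)

# a cross-validation split must be prepared before training
# (see create.data.split); fold counts here are illustrative
siamcat_example <- create.data.split(siamcat_example,
    num.folds = 5, num.resample = 2)

# train an elastic-net model with stratified internal cross-validation
# and generalized fold-change-based feature selection on each fold
siamcat_example <- train.model(siamcat_example,
    method = 'enet',
    stratify = TRUE,
    perform.fs = TRUE,
    param.fs = list(thres.fs = 50, method.fs = 'gFC', direction = 'absolute'))
```

The returned object carries the trained models in its model_list slot, one model per fold of the data split.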
object of class siamcat-class with added model_list
This function performs the training of the machine learning model and functions as an interface to the mlr-package.

The function expects a siamcat-class-object with a prepared cross-validation (see create.data.split) in the data_split-slot of the object. It then trains a model for each fold of the data split.

For the machine learning methods that require additional hyperparameters (e.g. lasso_ll), the optimal hyperparameters are tuned with the function tuneParams within the mlr-package.
The different machine learning methods are implemented as mlr-tasks:

'lasso', 'enet', and 'ridge' use the 'classif.cvglmnet' Learner,
'lasso_ll' and 'ridge_ll' use the 'classif.LiblineaRL1LogReg' and the 'classif.LiblineaRL2LogReg' Learners respectively,
'randomForest' is implemented via the 'classif.randomForest' Learner.
The function can also perform feature selection on each individual fold. At the moment, three methods for feature selection are implemented:

'AUC' - computes the Area Under the Receiver Operating Characteristics Curve for each single feature and selects the top param.fs$thres.fs features, e.g. the top 100
'gFC' - computes the generalized Fold Change (see check.associations) for each feature and likewise selects the top param.fs$thres.fs features, e.g. the top 100
'Wilcoxon' - computes the p-value for each single feature with the Wilcoxon test and selects features with a p-value smaller than param.fs$thres.fs
For 'AUC' and 'gFC', feature selection can also be directed, meaning that the features will be selected either based on the overall association ('absolute' - gFC values will be converted to absolute values and AUC values below 0.5 will be converted by 1 - AUC), or on associations in a certain direction ('positive' - positive enrichment as measured by positive values of the gFC or AUC values higher than 0.5 - and conversely for 'negative').
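The conversion rule for 'absolute' selection can be illustrated with a small base-R sketch (score.absolute is a hypothetical helper written for this illustration, not part of SIAMCAT):

```r
# hypothetical illustration of the 'absolute' direction rule:
# gFC scores are compared by magnitude, and AUC values below 0.5
# (features enriched in the other class) are flipped to 1 - AUC
score.absolute <- function(x, method = c('gFC', 'AUC')) {
    method <- match.arg(method)
    if (method == 'gFC') {
        abs(x)                       # keep only the effect size magnitude
    } else {
        ifelse(x < 0.5, 1 - x, x)    # mirror sub-0.5 AUCs around 0.5
    }
}

score.absolute(c(-2.3, 0.7), method = 'gFC')  # 2.3 0.7
score.absolute(c(0.3, 0.8), method = 'AUC')   # 0.7 0.8
```

With direction = 'positive', no such conversion would be applied, so only features with positive gFC or AUC above 0.5 rank highly; with 'negative', the ordering is reversed.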
data(siamcat_example) # simple working example
siamcat_example <- train.model(siamcat_example, method = 'lasso')