SAFE.Rd
Automatically generates new features from existing ones for further modelling, using the SAFE algorithm proposed in a paper by Shi, Zhang, Li, Yang and Zhou. This is a direct implementation of the pseudocode given in the paper, with its conventions, notation and flaws.
SAFE( X_train, y_train, X_valid, y_valid, operators = list(NULL, list(`+`, `-`, `*`)), n_iter = 10, nrounds = 5, alpha = 0.1, gamma = 10, bins = 30, theta = 0.8, beta = Inf )
Argument | Description |
---|---|
X_train | Matrix - data used to train model. Must be numerical. |
y_train | Factor - labels for training data. Must be binary. |
X_valid | Matrix - data used to test model. Must be numerical. |
y_valid | Factor - labels for testing data. Must be binary. |
operators | A |
n_iter | Integer; number of iterations for the algorithm to perform. |
nrounds | Integer for |
alpha | Threshold for |
gamma | Integer; number of most important feature combinations to be selected in each iteration. |
bins | Integer; number of bins used to discretize features. |
theta | Threshold for Pearson's correlation. Features with correlation above theta will be dropped. |
beta | Integer; maximum number of features to be selected at the end of each loop. Set to |
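To make the roles of `bins` and `theta` more concrete, here is a minimal base R sketch. It is an illustration only, not the package's internal code: the equal-width binning via `cut()`, the use of absolute correlation, and the helper `drop_correlated()` are all assumptions made for this example.

```r
# Illustration only: NOT the internals of SAFE().
set.seed(1)
X <- cbind(a = rnorm(100), b = rnorm(100))
# Add a near-duplicate of "a" so that one feature is highly correlated.
X <- cbind(X, c = X[, "a"] * 2 + rnorm(100, sd = 0.01))

# bins: discretize a feature into equal-width intervals (assumed scheme).
bins <- 30
a_binned <- cut(X[, "a"], breaks = bins, labels = FALSE)

# theta: keep a feature only if its absolute Pearson correlation with
# every previously kept feature does not exceed the threshold.
# drop_correlated() is a hypothetical helper written for this example.
theta <- 0.8
drop_correlated <- function(X, theta) {
  cors <- abs(cor(X, method = "pearson"))
  keep <- character(0)
  for (name in colnames(X)) {
    if (length(keep) == 0 || all(cors[name, keep] <= theta)) {
      keep <- c(keep, name)
    }
  }
  X[, keep, drop = FALSE]
}

X_filtered <- drop_correlated(X, theta)
colnames(X_filtered)
```

In this toy data, column `c` is almost a linear copy of `a`, so it is the one removed by the filter. Which member of a correlated pair SAFE itself keeps may differ.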
A list with 2 elements: X_train and X_test.
Both contain transformed train and test sets, ready for further modelling.
Unlike the algorithm described in the paper, which returns a function, this implementation returns the transformed data, at least for now.
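A possible call could look like the sketch below. The simulated data, the 150/50 split and the choice of `n_iter = 2` are assumptions made for illustration. The call is guarded so that the snippet also runs when the package providing `SAFE()` is not loaded.

```r
# Simulated inputs in the documented shapes: numeric matrices and
# binary factors. The data itself is made up for this example.
set.seed(42)
n <- 200
X <- matrix(rnorm(n * 4), ncol = 4,
            dimnames = list(NULL, paste0("x", 1:4)))
# The label depends on an interaction of x1 and x2.
y <- factor(as.integer(X[, 1] * X[, 2] + rnorm(n, sd = 0.5) > 0))

train_idx <- seq_len(150)
X_train <- X[train_idx, ]
y_train <- y[train_idx]
X_valid <- X[-train_idx, ]
y_valid <- y[-train_idx]

# Run only if SAFE() is available in the current session.
if (exists("SAFE", mode = "function")) {
  res <- SAFE(X_train, y_train, X_valid, y_valid, n_iter = 2)
  # The result is a list with the transformed sets.
  str(res$X_train)
  str(res$X_test)
}
```

The two elements of the returned list can then be passed to any model that accepts a numeric matrix.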