tech.v3.ml
Simple machine learning based on tech.v3.dataset functionality.
default-loss-fn
(default-loss-fn dataset)
Given a dataset which must have exactly 1 inference target column, return a default loss fn. If the column is categorical, the loss is tech.v3.ml.loss/classification-loss; otherwise the loss is tech.v3.ml.loss/mae (mean absolute error).
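To make the two losses concrete, here is a minimal plain-Clojure sketch. It is not the tech.v3.ml.loss source: it assumes classification loss means the fraction of misclassified rows (1 - accuracy), and `pick-loss-fn` is a hypothetical stand-in for the decision rule described above.

```clojure
;; Illustrative re-implementations; not the tech.v3.ml.loss source.
(defn mae
  "Mean absolute error between two equal-length numeric sequences."
  [predictions labels]
  (/ (reduce + (map (fn [p l] (Math/abs (double (- p l)))) predictions labels))
     (double (count labels))))

(defn classification-loss
  "Fraction of predictions that differ from their label (1 - accuracy).
  ASSUMPTION: the library's classification loss has this meaning."
  [predictions labels]
  (/ (count (filter false? (map = predictions labels)))
     (double (count labels))))

(defn pick-loss-fn
  "Hypothetical stand-in for the rule: categorical target -> classification
  loss, anything else -> mae."
  [categorical-target?]
  (if categorical-target? classification-loss mae))
```

Both functions return a double where lower is better, which is what lets one loss-fn argument serve regression and classification alike.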
define-model!
(define-model! model-kwd train-fn predict-fn {:keys [hyperparameters thaw-fn explain-fn]})
Create a model definition. An ml model is a function that takes a dataset and an options map and returns a model. A model is something that, combined with a dataset, produces an inferred dataset.
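The idea of a model definition registry can be sketched in plain Clojure. This is a toy, not the library's implementation: the atom `registry*`, the names `register-model!` and `lookup-model`, and the argument lists of the train and predict functions are all assumptions made for illustration. `lookup-model` plays the role that options->model-def has in this namespace.

```clojure
;; Toy registry: maps a model keyword to its definition map.
(def registry* (atom {}))

(defn register-model!
  "Store a definition under model-kwd, mirroring the shape of define-model!."
  [model-kwd train-fn predict-fn {:keys [hyperparameters thaw-fn explain-fn]}]
  (swap! registry* assoc model-kwd
         {:train-fn train-fn
          :predict-fn predict-fn
          :hyperparameters hyperparameters
          :thaw-fn thaw-fn
          :explain-fn explain-fn})
  :ok)

(defn lookup-model
  "Find the definition named by :model-type, or throw if none is registered."
  [options]
  (or (get @registry* (:model-type options))
      (throw (ex-info "Unknown model type"
                      {:model-type (:model-type options)}))))

;; A trivial 'mean' model: training stores the mean of the labels,
;; prediction repeats that mean once per input row.
(register-model! :toy/mean
                 (fn [_features labels _options]
                   {:mean (/ (reduce + labels) (double (count labels)))})
                 (fn [features model-data]
                   (vec (repeat (count features) (:mean model-data))))
                 {})

(def toy-model-data
  ((:train-fn (lookup-model {:model-type :toy/mean}))
   [[1] [2] [3]] [2.0 4.0 6.0] {}))
```

Keeping definitions in a keyword-indexed map is what allows a single :model-type entry in an options map to select the training and prediction code.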
explain
(explain model & [options])
Explain (if possible) an ml model. A model explanation is a model-specific map of data that usually indicates some level of mapping between features and importance.
hyperparameters
(hyperparameters model-kwd)
Get the hyperparameters for this model definition.
model-definition-names
(model-definition-names)
Return a list of all registered model definition names.
options->model-def
(options->model-def options)
Return the model definition that corresponds to the :model-type option.
predict
(predict dataset model)
Predict returns a dataset with only the predictions in it.
- For regression, a single column dataset is returned with the column named after the target
- For classification, a dataset is returned with a float64 column for each target value and values that describe the probability distribution.
thaw-model
(thaw-model model {:keys [thaw-fn]})
(thaw-model model)
Thaw a model. Models returned from train may be ‘frozen’, meaning a ‘thaw’ operation is needed in order to use the model. This happens for you during predict, but you may also cache the ‘thawed’ model on the model map under the ‘:thawed-model’ keyword in order to do fast predictions on small datasets.
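The caching pattern described here can be sketched without the library. Everything below is hypothetical: `slow-thaw` stands in for a model definition's thaw-fn (assumed to take the model data), and `toy-thaw-model` only imitates the "use the cached value if present" behaviour.

```clojure
(def thaw-calls (atom 0))

(defn slow-thaw
  "Hypothetical thaw-fn: pretend deserialization, counted so reuse is visible."
  [model-data]
  (swap! thaw-calls inc)
  {:thawed model-data})

(defn toy-thaw-model
  "Return the cached :thawed-model when present, otherwise thaw now."
  [model thaw-fn]
  (or (:thawed-model model)
      (thaw-fn (:model-data model))))

(def frozen {:model-data "serialized-bytes"})

;; Thaw once and cache the result on the model map.
(def warmed (assoc frozen :thawed-model (toy-thaw-model frozen slow-thaw)))

;; Later calls reuse the cached value instead of thawing again.
(toy-thaw-model warmed slow-thaw)
(toy-thaw-model warmed slow-thaw)
```

The saving matters most when predicting on many small datasets, where the thaw step would otherwise dominate each call.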
train
(train dataset options)
Given a dataset and an options map produce a model. The model-type keyword in the options map selects which model definition to use to train the model. Returns a map containing at least:
- :model-data - the result of that definition’s train-fn.
- :options - the options passed in.
- :id - new randomly generated UUID.
- :feature-columns - vector of column names.
- :target-columns - vector of column names.
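The shape of the returned map can be illustrated with a toy trainer over plain row maps. This is a sketch under stated assumptions, not the library's train: `toy-train` is hypothetical, it takes the target column and train-fn as explicit arguments, and it works on a seq of maps rather than a tech.v3.dataset dataset.

```clojure
(defn toy-train
  "Build a model map from a seq of row maps.
  ASSUMPTION: the target column and train-fn are passed explicitly here;
  the documented train takes only a dataset and an options map."
  [rows target-column options train-fn]
  (let [feature-columns (vec (remove #{target-column} (keys (first rows))))]
    {:model-data      (train-fn (map #(select-keys % feature-columns) rows)
                                (map #(get % target-column) rows)
                                options)
     :options         options
     :id              (java.util.UUID/randomUUID)
     :feature-columns feature-columns
     :target-columns  [target-column]}))

(def toy-model
  (toy-train [{:x 1 :y 2.0} {:x 2 :y 4.0}]
             :y
             {:model-type :toy/mean}
             ;; Trivial train-fn: remember the mean of the labels.
             (fn [_features labels _options]
               {:mean (/ (reduce + labels) (double (count labels)))})))
```

Recording the feature and target column names on the model is what lets a later prediction step pick the same columns out of a new dataset.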
train-auto-gridsearch
(train-auto-gridsearch dataset options {:keys [n-k-folds n-gridsearch n-result-models loss-fn], :or {n-k-folds 5, n-gridsearch 75, n-result-models 5}, :as gridsearch-options})
(train-auto-gridsearch dataset options)
Train a model gridsearching across the options map. The gridsearch map is built by merging the model’s hyperparameter definitions into the options map. If the sobol sequence returned has only one element a warning is issued. Note this returns a sequence of models as opposed to a single model.
- Searches across k-fold datasets if n-k-folds is > 1. n-k-folds defaults to 5.
- Searches (in parallel) through n-gridsearch option maps created via sobol-gridsearch.
- Returns n-result-models (defaults to 5) sorted by avg-loss.
- loss-fn can be provided or is the loss-fn returned via default-loss-fn.
train-k-fold
(train-k-fold dataset options n-k-folds loss-fn)
(train-k-fold dataset options n-k-folds)
(train-k-fold dataset options)
Train a model across k-fold datasets using tech.v3.dataset.modelling/k-fold-dataset and then calculate the min, max, and avg across results using loss-fn. Adds :n-k-folds, :min-loss, :max-loss, :avg-loss and :loss (min-loss) to the model with the lowest loss.
- n-k-folds defaults to 5.
- loss-fn defaults to loss/mae if the target column is not categorical, else defaults to loss/classification-loss.
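The k-fold procedure and the keys it adds can be sketched in plain Clojure. This is an illustration only: `k-fold-splits` uses simple round-robin assignment without shuffling, which is an assumption and may differ from tech.v3.dataset.modelling/k-fold-dataset, and the one-argument train-fn and two-argument loss-fn are hypothetical simplifications.

```clojure
(defn k-fold-splits
  "Split rows into k folds; each fold serves once as the test set.
  Round-robin assignment, no shuffling."
  [rows k]
  (let [indexed (map-indexed vector rows)]
    (for [fold (range k)]
      {:test-ds  (mapv second (filter #(= fold (mod (first %) k)) indexed))
       :train-ds (mapv second (remove #(= fold (mod (first %) k)) indexed))})))

(defn toy-train-k-fold
  "Train on every fold, score on its held-out rows, and return the model
  with the lowest loss, annotated with loss statistics."
  [rows k train-fn loss-fn]
  (let [results (for [{:keys [train-ds test-ds]} (k-fold-splits rows k)
                      :let [model (train-fn train-ds)]]
                  {:model model :loss (loss-fn model test-ds)})
        losses  (map :loss results)
        best    (apply min-key :loss results)]
    (assoc (:model best)
           :n-k-folds k
           :min-loss  (apply min losses)
           :max-loss  (apply max losses)
           :avg-loss  (/ (reduce + losses) (double (count losses)))
           :loss      (:loss best))))

;; Toy usage: rows are bare numbers, the 'model' is their mean, and the
;; loss is the mean absolute error of predicting that mean.
(def k-fold-result
  (toy-train-k-fold
   [1.0 2.0 3.0 4.0 5.0 12.0]
   3
   (fn [train] {:mean (/ (reduce + train) (double (count train)))})
   (fn [model test]
     (/ (reduce + (map #(Math/abs (double (- % (:mean model)))) test))
        (double (count test))))))
```

Comparing :min-loss, :max-loss and :avg-loss gives a rough sense of how sensitive the model is to the particular split it was trained on.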
train-split
(train-split dataset options loss-fn)
(train-split dataset options)
Train a model splitting the dataset using tech.v3.dataset.modelling/train-test-split and then calculate the loss using loss-fn. Loss is added to the model map under :loss.
- loss-fn defaults to loss/mae if the target column is not categorical, else defaults to loss/classification-loss.