tech.v3.ml

Simple machine learning based on tech.v3.dataset functionality.

default-loss-fn

(default-loss-fn dataset)

Given a dataset that must have exactly one inference target column, return a default loss function. If the column is categorical, the loss is tech.v3.ml.loss/classification-loss; otherwise it is tech.v3.ml.loss/mae (mean absolute error).
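A minimal usage sketch follows. It assumes the dataset is built with tech.v3.dataset/->dataset and that the target is marked with tech.v3.dataset.modelling/set-inference-target; neither call is described in this section, and the column names are hypothetical.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.modelling :as ds-mod]
         '[tech.v3.ml :as ml])

;; Hypothetical toy data: :x is a feature, :y is the numeric target.
(def example-ds
  (-> (ds/->dataset {:x [1.0 2.0 3.0 4.0]
                     :y [2.1 3.9 6.2 8.0]})
      ;; Assumption: this marks :y as the single inference target column.
      (ds-mod/set-inference-target :y)))

;; :y is numeric rather than categorical, so per the description above
;; the returned function should be tech.v3.ml.loss/mae.
(def loss-fn (ml/default-loss-fn example-ds))
```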

define-model!

(define-model! model-kwd train-fn predict-fn {:keys [hyperparameters thaw-fn explain-fn]})

Create a model definition and register it under model-kwd. An ml model is a function that takes a dataset and an options map and returns a model. A model is something that, combined with a dataset, produces an inferred dataset.
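As an illustration, the sketch below registers a toy regression model that always predicts the mean of the training target. The argument lists of the two functions are assumptions that this section does not state: train-fn is assumed to receive the feature dataset, the target dataset, and the options map, and predict-fn is assumed to receive the feature dataset, the thawed model, and the full model map. The :example/mean-regressor keyword and the function names are hypothetical.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.datatype.functional :as dfn]
         '[tech.v3.ml :as ml])

(defn mean-train
  "Assumed train-fn shape: (feature-ds target-ds options) -> model data."
  [_feature-ds target-ds _options]
  (let [target-name (first (ds/column-names target-ds))]
    ;; The model data is just the mean of the target column.
    {:mean (dfn/mean (target-ds target-name))}))

(defn mean-predict
  "Assumed predict-fn shape: (feature-ds thawed-model model) -> dataset.
  With no thaw-fn supplied, the thawed model is assumed to be the
  train-fn result unchanged."
  [feature-ds thawed-model model]
  (let [target-name (first (:target-columns model))
        n-rows      (ds/row-count feature-ds)]
    ;; Single-column dataset named after the target, one row per input row.
    (ds/->dataset {target-name (vec (repeat n-rows (:mean thawed-model)))})))

;; Empty options map: no hyperparameters, thaw-fn, or explain-fn.
(ml/define-model! :example/mean-regressor mean-train mean-predict {})
```

Once registered, the keyword should appear in the result of (ml/model-definition-names) and can be passed as :model-type in the options map.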

explain

(explain model & [options])

Explain (if possible) an ml model. A model explanation is a model-specific map of data that usually indicates some level of mapping between features and importance.

hyperparameters

(hyperparameters model-kwd)

Get the hyperparameters for this model definition.

identity-preprocess

(identity-preprocess ds options)

model-definition-names

(model-definition-names)

Return a list of all registered model definition names.

model-definitions*

Map of model keyword to model definition.

options->model-def

(options->model-def options)

Return the model definition that corresponds to the :model-type option.

predict

(predict dataset model)

Predict returns a dataset with only the predictions in it.

  • For regression, a single column dataset is returned with the column named after the target.
  • For classification, a dataset is returned with a float64 column for each target value and values that describe the probability distribution.

preprocess

(preprocess dataset options)

thaw-model

(thaw-model model {:keys [thaw-fn]})
(thaw-model model)

Thaw a model. Models returned from train may be ‘frozen’, meaning a ‘thaw’ operation is needed before the model can be used. This happens for you during predict, but you may also cache the thawed model on the model map under the :thawed-model keyword in order to do fast predictions on small datasets.

train

(train dataset options)

Given a dataset and an options map, produce a model. The :model-type keyword in the options map selects which model definition to use to train the model. Returns a map containing at least:

  • :model-data - the result of that definition’s train-fn.
  • :options - the options passed in.
  • :id - new randomly generated UUID.
  • :feature-columns - vector of column names.
  • :target-columns - vector of column names.
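The following sketch trains a model and then predicts with it. To stay self-contained it first registers a hypothetical toy model, :example/constant-mean, whose train-fn and predict-fn argument lists are assumptions not documented in this section. The dataset construction and the set-inference-target call come from tech.v3.dataset and are likewise assumed.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.modelling :as ds-mod]
         '[tech.v3.datatype.functional :as dfn]
         '[tech.v3.ml :as ml])

;; Hypothetical toy model: always predicts the mean of the training target.
(ml/define-model! :example/constant-mean
  ;; Assumed train-fn arguments: feature dataset, target dataset, options.
  (fn [_feature-ds target-ds _options]
    {:mean (dfn/mean (first (ds/columns target-ds)))})
  ;; Assumed predict-fn arguments: feature dataset, thawed model, model map.
  (fn [feature-ds thawed-model model]
    (ds/->dataset {(first (:target-columns model))
                   (vec (repeat (ds/row-count feature-ds)
                                (:mean thawed-model)))}))
  {})

;; Hypothetical data: :x is the feature, :y the regression target.
(def example-ds
  (-> (ds/->dataset {:x [1.0 2.0 3.0 4.0]
                     :y [2.0 4.0 6.0 8.0]})
      (ds-mod/set-inference-target :y)))

;; :model-type selects the registered definition.
(def model (ml/train example-ds {:model-type :example/constant-mean}))

;; Inspect the documented keys of the returned model map.
(select-keys model [:id :feature-columns :target-columns])

;; For regression, predict returns a single column named after the target.
(def predictions (ml/predict example-ds model))
```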

train-auto-gridsearch

(train-auto-gridsearch dataset options {:keys [n-k-folds n-gridsearch n-result-models loss-fn], :or {n-k-folds 5, n-gridsearch 75, n-result-models 5}, :as gridsearch-options})
(train-auto-gridsearch dataset options)

Train a model gridsearching across the options map. The gridsearch map is built by merging the model’s hyperparameter definitions into the options map. If the sobol sequence returned has only one element a warning is issued. Note this returns a sequence of models as opposed to a single model.

  • Searches across k-fold datasets if n-k-folds is > 1. n-k-folds defaults to 5.
  • Searches (in parallel) through n-gridsearch option maps created via sobol-gridsearch.
  • Returns n-result-models (defaults to 5) sorted by avg-loss.
  • loss-fn can be provided or is the loss-fn returned via default-loss-fn.

train-k-fold

(train-k-fold dataset options n-k-folds loss-fn)
(train-k-fold dataset options n-k-folds)
(train-k-fold dataset options)

Train a model across k-fold datasets using tech.v3.dataset.modelling/k-fold-dataset and then calculate the min, max, and avg loss across the results using loss-fn. Returns the model with the lowest loss, with :n-k-folds, :min-loss, :max-loss, :avg-loss and :loss (equal to :min-loss) added to it.

  • n-k-folds defaults to 5.
  • loss-fn defaults to loss/mae if target column is not categorical else defaults to loss/classification-loss.
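A hedged usage sketch of k-fold training is shown here. It registers a hypothetical toy model, :example/kfold-demo, so that it can run on its own; the train-fn and predict-fn argument lists, the dataset constructor, and set-inference-target are assumptions taken from outside this section.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.modelling :as ds-mod]
         '[tech.v3.datatype.functional :as dfn]
         '[tech.v3.ml :as ml])

;; Hypothetical toy model: always predicts the mean of the training target.
(ml/define-model! :example/kfold-demo
  ;; Assumed train-fn arguments: feature dataset, target dataset, options.
  (fn [_feature-ds target-ds _options]
    {:mean (dfn/mean (first (ds/columns target-ds)))})
  ;; Assumed predict-fn arguments: feature dataset, thawed model, model map.
  (fn [feature-ds thawed-model model]
    (ds/->dataset {(first (:target-columns model))
                   (vec (repeat (ds/row-count feature-ds)
                                (:mean thawed-model)))}))
  {})

;; Twenty hypothetical rows so every fold has data to train and test on.
(def example-ds
  (-> (ds/->dataset {:x (mapv double (range 20))
                     :y (mapv #(* 2.0 %) (range 20))})
      (ds-mod/set-inference-target :y)))

;; Four folds; loss-fn is omitted, so the default for a numeric target (mae) applies.
(def best-model
  (ml/train-k-fold example-ds {:model-type :example/kfold-demo} 4))

;; The loss summary keys added to the returned model.
(select-keys best-model [:n-k-folds :min-loss :max-loss :avg-loss :loss])
```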

train-split

(train-split dataset options loss-fn)
(train-split dataset options)

Train a model splitting the dataset using tech.v3.dataset.modelling/train-test-split and then calculate the loss using loss-fn. Loss is added to the model map under :loss.

  • loss-fn defaults to loss/mae if target column is not categorical else defaults to loss/classification-loss.