tech.v3.ml

Simple machine learning based on tech.v3.dataset functionality.

default-loss-fn

(default-loss-fn dataset)

Given a dataset that has exactly one inference target column, return a default loss function. If the target column is categorical, the loss is tech.v3.ml.loss/classification-loss; otherwise it is tech.v3.ml.loss/mae (mean absolute error).
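
The selection rule above can be sketched in plain Clojure. This is only an illustration: `mae`, `classification-loss`, and `pick-loss-fn` below are hypothetical stand-ins written for this example, not the functions in tech.v3.ml.loss, and the mismatch-fraction definition of classification loss is an assumption.

```clojure
;; Hypothetical stand-ins; these are NOT the tech.v3.ml.loss implementations.
(defn mae
  "Mean absolute error between two equal-length numeric sequences."
  [predictions labels]
  (/ (reduce + (map (fn [p y] (Math/abs (double (- p y)))) predictions labels))
     (double (count labels))))

(defn classification-loss
  "Fraction of predictions that differ from the label (assumed definition)."
  [predictions labels]
  (/ (count (filter false? (map = predictions labels)))
     (double (count labels))))

(defn pick-loss-fn
  "Mirror the documented rule: categorical target -> classification loss,
  otherwise MAE."
  [categorical-target?]
  (if categorical-target? classification-loss mae))

((pick-loss-fn false) [1.0 2.0 4.0] [1.0 3.0 2.0]) ; => 1.0
((pick-loss-fn true) [:a :b :a :a] [:a :b :b :a])  ; => 0.25
```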

define-model!

(define-model! model-kwd train-fn predict-fn {:keys [hyperparameters thaw-fn explain-fn]})

Create a model definition. An ml model is a function that takes a dataset and an options map and returns a model. A model is something that, combined with a dataset, produces an inferred dataset.
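
As a rough mental model, a definition registry keyed by model keyword might look like the sketch below. Everything here is hypothetical: `registry*`, `define-model-sketch!`, `options->model-def-sketch`, and the `:toy/mean` model are invented for illustration. This page does not specify the argument lists of train-fn and predict-fn, so the toy functions use a simplified shape of their own.

```clojure
;; Hypothetical mini-registry; names ending in -sketch are not part of tech.v3.ml.
(defonce registry* (atom {}))

(defn define-model-sketch!
  "Store a definition map under model-kwd."
  [model-kwd train-fn predict-fn opts]
  (swap! registry* assoc model-kwd
         (merge {:train-fn train-fn :predict-fn predict-fn}
                (select-keys opts [:hyperparameters :thaw-fn :explain-fn])))
  :ok)

(defn options->model-def-sketch
  "Look up the definition named by :model-type, failing loudly if it is missing."
  [options]
  (or (get @registry* (:model-type options))
      (throw (ex-info "Unknown model type" {:model-type (:model-type options)}))))

;; A toy "mean predictor": training stores the mean of the targets,
;; prediction repeats that mean for every input row.
(define-model-sketch!
  :toy/mean
  (fn [targets _options] (/ (reduce + targets) (double (count targets))))
  (fn [rows model-data] (repeat (count rows) model-data))
  {})

(let [{:keys [train-fn predict-fn]} (options->model-def-sketch {:model-type :toy/mean})
      model-data (train-fn [1 2 3] {})]
  (predict-fn [:row-a :row-b] model-data)) ; => (2.0 2.0)
```

Keeping definitions in one map lets an options map select a model purely by its :model-type keyword.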

explain

(explain model & [options])

Explain (if possible) an ml model. A model explanation is a model-specific map of data that usually indicates some level of mapping between features and importance.

hyperparameters

(hyperparameters model-kwd)

Get the hyperparameters for this model definition.

identity-preprocess

(identity-preprocess ds options)

model-definition-names

(model-definition-names)

Return a list of all registered model definition names.

model-definitions*

Map of model kwd to model definition

options->model-def

(options->model-def options)

Return the model definition that corresponds to the :model-type option.

predict

(predict dataset model)

Predict returns a dataset with only the predictions in it.

For regression, a single-column dataset is returned with the column named after the target.
For classification, a dataset is returned with a float64 column for each target value and values that describe the probability distribution.

preprocess

(preprocess dataset options)

thaw-model

(thaw-model model {:keys [thaw-fn]})

(thaw-model model)

Thaw a model. Models returned from train may be ‘frozen’, meaning a ‘thaw’ operation is needed before the model can be used. This happens for you during predict, but you may also cache the ‘thawed’ model on the model map under the ‘:thawed-model’ keyword in order to do fast predictions on small datasets.
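
The caching idea can be sketched as follows. This is a minimal illustration under stated assumptions: `thaw-sketch` and `counting-thaw` are hypothetical helpers, and the lookup order (cached value first, then thaw-fn, then the raw :model-data) is an assumption about how such a cache could behave, not a statement of what thaw-model does internally.

```clojure
;; Hypothetical sketch; thaw-sketch is not the library function.
(defn thaw-sketch
  "Return the cached :thawed-model when present; otherwise thaw :model-data
  with thaw-fn, or return :model-data unchanged when no thaw-fn is given."
  [model thaw-fn]
  (if-let [cached (:thawed-model model)]
    cached
    (let [data (:model-data model)]
      (if thaw-fn (thaw-fn data) data))))

(def thaw-calls (atom 0))

(defn counting-thaw
  "Toy thaw-fn: parses a string and counts how often it is called."
  [frozen]
  (swap! thaw-calls inc)
  (Long/parseLong frozen))

(def frozen-model {:model-data "42"})

;; Thaw once up front and cache the result on the model map.
(def warm-model
  (assoc frozen-model :thawed-model (thaw-sketch frozen-model counting-thaw)))

;; Later lookups reuse the cached value and never call the thaw-fn again.
(dotimes [_ 3] (thaw-sketch warm-model counting-thaw))
@thaw-calls ; => 1
```

The saving matters most when each prediction call handles only a few rows, since the thaw cost would otherwise be paid on every call.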

train

(train dataset options)

Given a dataset and an options map produce a model. The model-type keyword in the options map selects which model definition to use to train the model. Returns a map containing at least:

:model-data - the result of that definition’s train-fn.
:options - the options passed in.
:id - a new randomly generated UUID.
:feature-columns - vector of column names.
:target-columns - vector of column names.
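
To make the shape of the returned map concrete, here is a small sketch that assembles a map with the same keys. It is not the library's train: `train-sketch`, the `:toy/row-count` model type, and the plain-map "dataset" are all hypothetical, and the toy train-fn simply counts rows.

```clojure
;; Hypothetical sketch; train-sketch is not the library function and uses
;; plain maps in place of tech.v3.dataset datasets.
(defn train-sketch
  "Run train-fn over the rows and wrap its result in a model map."
  [{:keys [feature-columns target-columns rows]} options train-fn]
  {:model-data      (train-fn rows options)
   :options         options
   :id              (java.util.UUID/randomUUID)
   :feature-columns (vec feature-columns)
   :target-columns  (vec target-columns)})

(def model
  (train-sketch {:feature-columns [:sepal-length :sepal-width]
                 :target-columns  [:species]
                 :rows            [[5.1 3.5 :setosa] [6.2 2.9 :versicolor]]}
                {:model-type :toy/row-count}
                (fn [rows _options] {:n-rows (count rows)})))

(select-keys model [:model-data :options :feature-columns :target-columns])
;; => {:model-data {:n-rows 2}, :options {:model-type :toy/row-count},
;;     :feature-columns [:sepal-length :sepal-width], :target-columns [:species]}
```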

train-auto-gridsearch

(train-auto-gridsearch dataset options {:keys [n-k-folds n-gridsearch n-result-models loss-fn], :or {n-k-folds 5, n-gridsearch 75, n-result-models 5}, :as gridsearch-options})

(train-auto-gridsearch dataset options)

Train a model, gridsearching across the options map. The gridsearch map is built by merging the model’s hyperparameter definitions into the options map. If the Sobol sequence returned has only one element, a warning is issued. Note that this returns a sequence of models as opposed to a single model.

Searches across k-fold datasets if n-k-folds is > 1. n-k-folds defaults to 5.
Searches (in parallel) through n-gridsearch option maps created via sobol-gridsearch.
Returns n-result-models (defaults to 5) sorted by avg-loss.
loss-fn can be provided or is the loss-fn returned via default-loss-fn.
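
The score-sort-take flow described in the list above can be sketched in plain Clojure. This is a simplification under stated assumptions: a fixed list of candidate option maps replaces the Sobol-sequence search, a toy `toy-score` function replaces k-fold training, and scoring runs sequentially with `map`. All names are hypothetical.

```clojure
;; Hypothetical sketch; a fixed list of candidate option maps stands in for
;; the Sobol-sequence search, and score-fn stands in for k-fold training.
(defn gridsearch-sketch
  "Score every candidate, then keep the n-result-models candidates with
  the lowest :avg-loss."
  [candidates score-fn n-result-models]
  (->> candidates
       (map (fn [options] {:options options :avg-loss (score-fn options)}))
       (sort-by :avg-loss)
       (take n-result-models)
       (vec)))

;; Toy scoring function: loss is the distance of :max-depth from 4.
(defn toy-score [options]
  (Math/abs (double (- (:max-depth options) 4))))

(def candidates
  (for [depth [1 2 3 4 5 6 7 8]]
    {:model-type :toy/tree :max-depth depth}))

(mapv (comp :max-depth :options)
      (gridsearch-sketch candidates toy-score 3))
;; => [4 3 5]
```

Because the result is a vector of the best few candidates rather than a single winner, callers can inspect or ensemble the runners-up.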

train-k-fold

(train-k-fold dataset options n-k-folds loss-fn)

(train-k-fold dataset options n-k-folds)

(train-k-fold dataset options)

Train a model across k-fold datasets using tech.v3.dataset.modelling/k-fold-dataset and then calculate the min, max, and avg across results using loss-fn. Adds :n-k-folds, :min-loss, :max-loss, :avg-loss and :loss (min-loss) to the model with the lowest loss.

n-k-folds defaults to 5.
loss-fn defaults to loss/mae if target column is not categorical else defaults to loss/classification-loss.
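
The per-fold loss summary can be illustrated with a small self-contained sketch. It makes several simplifying assumptions: rows are plain numbers, folds are dealt round-robin without shuffling, the "model" is just the mean of the training targets, and only the summary statistics are returned rather than a trained model. `mae`, `folds`, and `k-fold-sketch` are hypothetical names.

```clojure
;; Hypothetical sketch over plain number vectors; k-fold-sketch is not the
;; library function and does not shuffle rows.
(defn mae
  "Mean absolute error between two equal-length numeric sequences."
  [predictions labels]
  (/ (reduce + (map (fn [p y] (Math/abs (double (- p y)))) predictions labels))
     (double (count labels))))

(defn folds
  "Split xs into k folds by dealing items round-robin; returns a seq of
  {:train-rows [...] :test-rows [...]} maps, one per fold."
  [k xs]
  (let [indexed (map-indexed vector xs)]
    (for [fold (range k)]
      {:test-rows  (mapv second (filter (fn [[i _]] (= fold (mod i k))) indexed))
       :train-rows (mapv second (remove (fn [[i _]] (= fold (mod i k))) indexed))})))

(defn k-fold-sketch
  "Fit a mean predictor on each fold's training part, score it on the
  held-out part, and summarise the per-fold losses."
  [targets k loss-fn]
  (let [losses (for [{:keys [train-rows test-rows]} (folds k targets)]
                 (let [mean (/ (reduce + train-rows) (double (count train-rows)))]
                   (loss-fn (repeat (count test-rows) mean) test-rows)))
        min-loss (apply min losses)]
    {:n-k-folds k
     :min-loss  min-loss
     :max-loss  (apply max losses)
     :avg-loss  (/ (reduce + losses) (count losses))
     :loss      min-loss}))

(k-fold-sketch [1.0 2.0 3.0 4.0 5.0 10.0] 3 mae)
;; => {:n-k-folds 3, :min-loss 1.5, :max-loss 3.5, :avg-loss 2.5, :loss 1.5}
```

A wide gap between :min-loss and :max-loss, as in this toy data with its single outlier, suggests the model's quality depends heavily on which rows land in the held-out fold.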

train-split

(train-split dataset options loss-fn)

(train-split dataset options)

Train a model splitting the dataset using tech.v3.dataset.modelling/train-test-split and then calculate the loss using loss-fn. Loss is added to the model map under :loss.

loss-fn defaults to loss/mae if target column is not categorical else defaults to loss/classification-loss.

Generated by Codox with RDash UI theme

tech.ml 6.019
