tvm-clj.application.kmeans

High-performance implementation of the KMeans algorithm using kmeans++ initialization followed by Lloyd's algorithm for convergence.

kmeans++

(kmeans++ dataset n-centroids & [{:keys [n-iters rand-seed minimal-improvement-threshold], :or {minimal-improvement-threshold 0.01}, :as options}])

Find K cluster centroids via kmeans++ center initialization followed by Lloyd's algorithm. The dataset must be a matrix (2d tensor).

dataset - 2d matrix of numeric datatype.
n-centroids - How many centroids to find.

Returns map of:

:centroids - 2d tensor of double centroids
:centroid-indexes - 1d integer vector of assigned center indexes.
:iteration-scores - n-iters+1 length array of mean squared error scores, containing the scores from the initial centroid assignment up to the score when the algorithm terminates.

Options:

:minimal-improvement-threshold - defaults to 0.01 - algorithm terminates if `(1.0 - error(n-1)/error(n-2)) < minimal-improvement-threshold`. A value of zero means the algorithm always trains to n-iters.
:n-iters - defaults to 100 - Max number of iterations; the algorithm terminates if `(>= iter-idx n-iters)`.
:rand-seed - integer or instance of java.util.Random.
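To make the algorithm and its termination rule concrete, here is a minimal pure-Clojure sketch of kmeans++ seeding followed by Lloyd's iterations. It is not the tvm-clj implementation: it works on a vector of double vectors rather than a 2d tensor, and every name in it (`sq-dist`, `nearest`, `kmeans++-init`, `mean-vec`, `mse`, `lloyd-kmeans`) is hypothetical. It also assumes that an empty cluster keeps its previous centroid and that a missing `:rand-seed` means an unseeded `java.util.Random`; the real function may handle both cases differently.

```clojure
(defn sq-dist
  "Squared Euclidean distance between two equal-length numeric vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn nearest
  "Index of the centroid closest to point p."
  [centroids p]
  (apply min-key #(sq-dist (nth centroids %) p) (range (count centroids))))

(defn kmeans++-init
  "kmeans++ seeding: the first center is a uniformly random point; each further
  center is sampled with probability proportional to its squared distance to
  the nearest center chosen so far."
  [points k ^java.util.Random rng]
  (loop [centroids [(nth points (.nextInt rng (count points)))]]
    (if (= (count centroids) k)
      centroids
      (let [d2 (mapv (fn [p] (apply min (map #(sq-dist % p) centroids))) points)
            target (* (.nextDouble rng) (reduce + d2))
            ;; Walk the cumulative sum of d2 until it first exceeds target.
            idx (loop [i 0 acc 0.0]
                  (let [acc (+ acc (double (nth d2 i)))]
                    (if (or (> acc target) (= i (dec (count d2))))
                      i
                      (recur (inc i) acc))))]
        (recur (conj centroids (nth points idx)))))))

(defn mean-vec
  "Component-wise mean of a non-empty collection of equal-length vectors."
  [vs]
  (apply mapv (fn [& xs] (/ (reduce + xs) (double (count xs)))) vs))

(defn mse
  "Mean squared distance from each point to its assigned centroid."
  [points centroids assignment]
  (/ (reduce + (map #(sq-dist %1 (nth centroids %2)) points assignment))
     (double (count points))))

(defn lloyd-kmeans
  "kmeans++ seeding followed by Lloyd's iterations. Stops once iter reaches
  n-iters or when 1 - score(n-1)/score(n-2) < minimal-improvement-threshold."
  [points k {:keys [n-iters rand-seed minimal-improvement-threshold]
             :or {n-iters 100 minimal-improvement-threshold 0.01}}]
  (let [rng (cond (instance? java.util.Random rand-seed) rand-seed
                  (some? rand-seed) (java.util.Random. (long rand-seed))
                  ;; Assumption: no seed means an unseeded generator.
                  :else (java.util.Random.))
        assign (fn [cs] (mapv #(nearest cs %) points))]
    (loop [centroids (kmeans++-init points k rng)
           assignment (assign centroids)
           scores [(mse points centroids assignment)]
           iter 0]
      (let [n (count scores)
            prev (when (>= n 2) (nth scores (- n 2)))
            ;; A zero previous score counts as "no further improvement possible".
            improvement (when prev
                          (if (pos? prev) (- 1.0 (/ (peek scores) prev)) 0.0))]
        (if (or (>= iter n-iters)
                (and improvement (< improvement minimal-improvement-threshold)))
          {:centroids centroids
           :centroid-indexes assignment
           :iteration-scores scores}
          (let [groups (group-by assignment (range (count points)))
                ;; Update step: each centroid moves to the mean of its points;
                ;; a centroid with no points keeps its previous position.
                new-cs (mapv (fn [ci]
                               (if-let [idxs (get groups ci)]
                                 (mean-vec (map #(nth points %) idxs))
                                 (nth centroids ci)))
                             (range k))
                ;; Assignment step: every point goes to its nearest centroid.
                new-as (assign new-cs)]
            (recur new-cs new-as (conj scores (mse points new-cs new-as))
                   (inc iter))))))))
```

For example, `(lloyd-kmeans [[0.0 0.0] [0.0 1.0] [1.0 0.0] [10.0 10.0] [10.0 11.0] [11.0 10.0]] 2 {:rand-seed 42})` should split the six points into two groups of three. The `:iteration-scores` vector holds the score after seeding plus one score per Lloyd iteration, which is why its length is bounded by n-iters+1.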


order-data-labels

(order-data-labels data labels)

Order the dataset and labels such that labels are monotonically non-decreasing. Returns a tuple of [dataset labels].
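A sketch of this reordering on plain Clojure vectors: compute a permutation of row indexes sorted by label and apply it to both the rows and the labels. `order-rows-by-label` is a hypothetical name, not part of tvm-clj. Because `sort-by` is stable, rows sharing a label keep their original relative order here; whether tvm-clj preserves that order is an assumption.

```clojure
(defn order-rows-by-label
  "Returns [rows labels] permuted so that labels are non-decreasing.
  sort-by is stable, so rows sharing a label keep their original order."
  [rows labels]
  (let [order (sort-by #(nth labels %) (range (count labels)))]
    [(mapv #(nth rows %) order)
     (mapv #(nth labels %) order)]))

(order-rows-by-label [[:a] [:b] [:c] [:d]] [2 0 1 0])
;; => [[[:b] [:d] [:c] [:a]] [0 0 1 2]]
```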


predict-per-label

(predict-per-label data model)

Return both a per-row probability distribution across all labels and a 1d tensor of assigned label indexes.

Returns:

:probability-distribution - each row sums to one; the index of the maximum probability is the assigned label.
:label-indexes - int32 assigned indexes for each row in the dataset.
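The docstring does not say how the probabilities are derived. The sketch below assumes one plausible scheme, purely for illustration: each label is scored by the squared distance from the row to that label's nearest centroid, and the scores become a distribution via a softmax over the negated distances, so the most probable label is also the label with the closest centroid. The model layout (a vector of centroid collections indexed by label) and the names `sq-dist`, `predict-row` and `predict-rows` are hypothetical.

```clojure
(defn sq-dist
  "Squared Euclidean distance between two equal-length numeric vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn predict-row
  "label-centroids: vector indexed by label, each entry a collection of
  centroid vectors. Returns {:probabilities [...] :label i}."
  [label-centroids row]
  (let [dists (mapv (fn [cs] (apply min (map #(sq-dist % row) cs))) label-centroids)
        dmin (apply min dists)
        ;; Softmax over -dist, shifted by dmin so the largest weight is exp(0) = 1.
        weights (mapv #(Math/exp (- dmin %)) dists)
        total (reduce + weights)
        probs (mapv #(/ % total) weights)]
    {:probabilities probs
     :label (apply max-key probs (range (count probs)))}))

(defn predict-rows
  "Applies predict-row to every row, returning the two outputs named above."
  [label-centroids rows]
  (let [preds (mapv #(predict-row label-centroids %) rows)]
    {:probability-distribution (mapv :probabilities preds)
     :label-indexes (mapv :label preds)}))
```

Shifting by the minimum distance before exponentiating keeps the largest weight at exactly 1, so the normalizing sum can never underflow to zero even when all distances are large.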


quantize-image

(quantize-image src-path dst-path n-quantization & [{:keys [n-iters seed], :or {n-iters 5}}])

Quantize an image using kmeans. Copies data into a new image and, if dst-path is provided, saves the image.

Returns:

:centroids - result of the quantization.
:result - resulting BufferedImage.
:scores - Scores after each iteration including initialization.
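Once centroids have been found over the pixel colors, the remapping step of kmeans color quantization replaces each pixel with its nearest centroid. The sketch below shows only that step, using `java.awt.image.BufferedImage` directly. It assumes RGB centroids given as `[r g b]` double vectors in the 0 to 255 range and squared Euclidean distance in RGB space; `quantize-with-centroids` and its helpers are hypothetical names, not tvm-clj functions.

```clojure
(import '(java.awt.image BufferedImage))

(defn sq-dist
  "Squared Euclidean distance between two equal-length numeric vectors."
  [a b]
  (reduce + (map (fn [x y] (let [d (- x y)] (* d d))) a b)))

(defn rgb->vec
  "Unpacks a packed (A)RGB int into [r g b] doubles, ignoring alpha."
  [rgb]
  (let [rgb (long rgb)]
    [(double (bit-and (bit-shift-right rgb 16) 0xFF))
     (double (bit-and (bit-shift-right rgb 8) 0xFF))
     (double (bit-and rgb 0xFF))]))

(defn to-byte
  "Clamps x to [0, 255] and rounds half up to a long."
  [x]
  (long (+ 0.5 (min 255.0 (max 0.0 (double x))))))

(defn vec->rgb
  "Packs [r g b] into an RGB int (no alpha)."
  [[r g b]]
  (int (bit-or (bit-shift-left (to-byte r) 16)
               (bit-shift-left (to-byte g) 8)
               (to-byte b))))

(defn quantize-with-centroids
  "Returns a new TYPE_INT_RGB image in which every pixel of src is replaced by
  its nearest centroid (squared Euclidean distance in RGB space)."
  [^BufferedImage src centroids]
  (let [w (.getWidth src)
        h (.getHeight src)
        out (BufferedImage. w h BufferedImage/TYPE_INT_RGB)]
    (doseq [y (range h) x (range w)]
      (let [p (rgb->vec (.getRGB src (int x) (int y)))
            c (apply min-key #(sq-dist % p) centroids)]
        (.setRGB out (int x) (int y) (vec->rgb c))))
    out))
```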


train-per-label

(train-per-label data labels n-per-label & [{:keys [input-ordered?], :as options}])

Given a dataset along with per-row integer labels, train n-per-label kmeans centroids for each label, returning a model that can be used with predict-per-label.
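A minimal sketch of per-label training: group the rows by label, then fit `n-per-label` centroids within each group. To keep it short, the clustering routine is passed in as a hypothetical `fit-fn` argument; in practice it would be a kmeans routine such as kmeans++ above. The model shape (a vector of centroid collections indexed by label, with every label in 0..max-label assumed present) and the names are illustrative assumptions, not the structure tvm-clj returns.

```clojure
(defn train-per-label-sketch
  "Groups rows by integer label (every label in 0..max-label must occur) and
  calls (fit-fn group-rows n-per-label) on each group. Returns a vector whose
  element i is the centroid collection for label i."
  [rows labels n-per-label fit-fn]
  (let [groups (group-by first (map vector labels rows))]
    (mapv (fn [label] (fit-fn (mapv second (get groups label)) n-per-label))
          (range (inc (apply max labels))))))

;; Stand-in fit-fn for demonstration only: ignores n and returns the group mean.
(defn mean-as-single-centroid [group-rows _n]
  [(apply mapv (fn [& xs] (/ (reduce + xs) (double (count xs)))) group-rows)])

(train-per-label-sketch [[0.0] [2.0] [10.0] [12.0]] [0 0 1 1] 1 mean-as-single-centroid)
;; => [[[1.0]] [[11.0]]]
```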


Generated by Codox with RDash UI theme

tvm-clj 6.00-beta-1-SNAPSHOT
