tvm-clj.application.kmeans
High performance implementation of the KMeans algorithm using kmeans++ initialization and Lloyd’s algorithm for convergence.
kmeans++
(kmeans++ dataset n-centroids & [{:keys [n-iters rand-seed minimal-improvement-threshold], :or {minimal-improvement-threshold 0.01}, :as options}])
Find K cluster centroids via kmeans++ center initialization followed by Lloyd’s algorithm. Dataset must be a matrix (2d tensor).
dataset - 2d matrix of numeric datatype.
n-centroids - How many centroids to find.
Returns map of:
:centroids - 2d tensor of double centroids.
:centroid-indexes - 1d integer vector of assigned center indexes.
:iteration-scores - n-iters+1 length array of mean squared error scores, containing the scores from the initial centroid assignment up to the score when the algorithm terminates.
Options:
:minimal-improvement-threshold - defaults to 0.01 - algorithm terminates if (1.0 - error(n-1)/error(n-2)) < minimal-improvement-threshold. When zero, the algorithm always trains for the full n-iters iterations.
:n-iters - defaults to 100 - Max number of iterations; the algorithm terminates if (>= iter-idx n-iters).
:rand-seed - integer or implementation of java.util.Random.
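The two termination conditions described in the options can be sketched in plain Clojure. This is only an illustration of the documented rule, not the library's implementation: the helper names improvement, converged? and should-stop? are hypothetical, and the scores are assumed to be a vector of mean squared errors ordered oldest first.

```clojure
;; Hypothetical helpers illustrating the documented stopping rule.
;; They are not part of tvm-clj.

(defn improvement
  "Relative improvement between two consecutive error scores:
  1.0 - curr-error / prev-error."
  [prev-error curr-error]
  (- 1.0 (/ (double curr-error) (double prev-error))))

(defn converged?
  "True when the two most recent scores improved by less than threshold.
  scores is a vector of mean squared errors, oldest first."
  [scores threshold]
  (let [n (count scores)]
    (and (>= n 2)
         (< (improvement (nth scores (- n 2)) (nth scores (- n 1)))
            threshold))))

(defn should-stop?
  "Combines the iteration cap with the improvement threshold.
  Defaults mirror the documented ones (100 and 0.01)."
  [scores iter-idx {:keys [n-iters minimal-improvement-threshold]
                    :or {n-iters 100
                         minimal-improvement-threshold 0.01}}]
  (or (>= iter-idx n-iters)
      (converged? scores minimal-improvement-threshold)))
```

With a threshold of zero, converged? stays false as long as the error never increases, so under this sketch only the n-iters cap ends the loop.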
order-data-labels
(order-data-labels data labels)
Order the dataset and labels such that labels are monotonically increasing. Returns a tuple of [dataset labels].
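The ordering step can be pictured with a small pure-Clojure sketch. It assumes the data is a vector of row vectors and the labels a vector of integers, which is a simplification of the tensors the library works with; order-rows-by-label is a hypothetical name and not the library function.

```clojure
;; Hypothetical illustration of ordering rows by label.
;; Plain vectors stand in for the tensors used by the library.

(defn order-rows-by-label
  "Returns [ordered-rows ordered-labels] with labels in non-decreasing
  order. Rows that share a label keep their original relative order,
  because sort-by is stable."
  [rows labels]
  (let [order (sort-by #(nth labels %) (range (count labels)))]
    [(mapv #(nth rows %) order)
     (mapv #(nth labels %) order)]))

;; Example call:
;; (order-rows-by-label [[1 1] [2 2] [3 3] [4 4]] [1 0 1 0])
```

Sorting an index permutation once and applying it to both collections keeps each row paired with its label.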
predict-per-label
(predict-per-label data model)
Return both a probability distribution per row across each label and a 1d tensor of assigned label indexes.
Returns:
:probability-distribution - each row sums to one, max prob is the index picked.
:label-indexes - int32 assigned indexes for each row in the dataset.
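The relationship between the two return values can be sketched as follows. This only illustrates the stated invariant (each row sums to one, and the label index is the position of the largest probability); how the library computes the probabilities is not covered here, and argmax, labels-from-distribution and row-sums-to-one? are hypothetical names.

```clojure
;; Hypothetical helpers; plain vectors stand in for tensors.

(defn argmax
  "Index of the largest value in a non-empty row.
  On ties the first index wins."
  [row]
  (first
   (reduce (fn [[best-idx best-val] [idx v]]
             (if (> v best-val) [idx v] [best-idx best-val]))
           (map-indexed vector row))))

(defn labels-from-distribution
  "Picks one label index per row of a probability distribution."
  [prob-rows]
  (mapv argmax prob-rows))

(defn row-sums-to-one?
  "Checks the row-sum invariant within a small tolerance."
  [row]
  (< (Math/abs (- 1.0 (double (reduce + row)))) 1e-6))
```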
quantize-image
(quantize-image src-path dst-path n-quantization & [{:keys [n-iters seed], :or {n-iters 5}}])
Quantize an image using kmeans. Copies data into a new image and, if dst-path is provided, saves the image.
Returns:
:centroids - result of the quantization.
:result - resulting BufferedImage.
:scores - Scores after each iteration including initialization.
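As a rough picture of what quantization with kmeans centroids means, the sketch below replaces every pixel with its nearest centroid. It assumes pixels and centroids are plain vectors of channel values (for example [r g b]); the pixel layout used by the library is not specified here, and all function names are hypothetical.

```clojure
;; Hypothetical illustration of centroid-based quantization.

(defn squared-distance
  "Sum of squared per-channel differences between two pixels."
  [a b]
  (reduce + (map (fn [x y]
                   (let [d (- (double x) (double y))]
                     (* d d)))
                 a b)))

(defn nearest-centroid
  "Returns the centroid closest to pixel."
  [centroids pixel]
  (apply min-key #(squared-distance % pixel) centroids))

(defn quantize-pixels
  "Maps every pixel to its nearest centroid, so the output uses at most
  (count centroids) distinct colors."
  [centroids pixels]
  (mapv #(nearest-centroid centroids %) pixels))
```

In this picture, n-quantization corresponds to the number of centroids, which bounds the number of distinct colors in the result.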
train-per-label
(train-per-label data labels n-per-label & [{:keys [input-ordered?], :as options}])
Given a dataset along with per-row integer labels, train N per-label kmeans centroids, returning a model which you can use with predict-per-label.