tech.v3.dataset.categorical

Conversions of categorical values into numbers and back. Two forms of conversion are supported: a straight value->integer map and one-hot encoding.

The functions in this namespace manipulate the metadata on the columns of the dataset, which can be inspected via clojure.core/meta.
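As a rough illustration of the metadata point, the sketch below prints a column's metadata before and after a categorical transformation. It assumes the tech.ml.dataset library is on the classpath and that tech.v3.dataset exposes ->dataset and column; the dataset toy and the column name :x are made up for the example, and the exact metadata keys printed may differ between library versions.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical toy dataset; the column name :x is illustrative only.
(def toy (ds/->dataset {:x [:a :b :a]}))

;; Fit a value->integer mapping on :x and apply it.
(def encoded
  (ds-cat/transform-categorical-map toy (ds-cat/fit-categorical-map toy :x)))

;; Column metadata before and after the transformation.
(println (meta (ds/column toy :x)))
(println (meta (ds/column encoded :x)))

;; The fitted mappings recorded on the columns, listed for the whole dataset.
(println (ds-cat/dataset->categorical-maps encoded))
```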

dataset->categorical-maps

(dataset->categorical-maps dataset)

Given a dataset, return a sequence of categorical map entries.

user> (ds-cat/dataset->categorical-maps catds)
({:lookup-table {:a 0, :b 1, :c 2, :d 3},
  :src-column :x,
  :result-datatype :float64}
 {:lookup-table {:a 0, :b 1, :c 2, :d 3},
  :src-column :y,
  :result-datatype :float64})

fit-categorical-map

(fit-categorical-map dataset colname & [table-args res-dtype])

Given a column, map it into a numeric space via a discrete map of values to integers. This fits the categorical transformation onto the column and returns the transformation.

If table-args is not given, the distinct column values will be mapped into 0..x without any specific order.

table-args specifies the precise mapping, either as a sequence of [val idx] pairs or as a sorted sequence of values.
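A minimal sketch of fitting with an explicit mapping, assuming the pair form of table-args is a sequence of two-element vectors of value and index. The dataset toy and its values are hypothetical, and the sketch assumes tech.ml.dataset is on the classpath.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical toy dataset with one categorical column.
(def toy (ds/->dataset {:x [:a :b :a :c]}))

;; Explicit mapping: every distinct value is paired with the index it should get.
(def fit (ds-cat/fit-categorical-map toy :x [[:a 0] [:b 1] [:c 2]]))

;; The returned transformation carries the fields shown in the
;; dataset->categorical-maps output above.
(println (:lookup-table fit))
(println (:src-column fit))
(println (:result-datatype fit))
```

Pinning the mapping this way keeps the encoding stable across runs, whereas the default fit makes no promise about which value gets which integer.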

fit-one-hot

(fit-one-hot dataset colname & [table-args res-dtype])

Fit a one-hot transformation to a column. Returns a reusable transformation. Each unique value is mapped to its own column, which holds 1 wherever that value appears in the original column and 0 otherwise.
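The following sketch fits a one-hot transformation and applies it with transform-one-hot. The dataset toy and the column name :color are hypothetical, tech.ml.dataset is assumed to be on the classpath, and the names given to the generated columns are left to the library rather than assumed here.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical toy dataset with two distinct values in :color.
(def toy (ds/->dataset {:color [:red :green :red]}))

;; Fit once; the result can be applied to this or another dataset.
(def one-hot-fit (ds-cat/fit-one-hot toy :color))

;; One entry per distinct source value.
(println (:one-hot-table one-hot-fit))

;; Applying the fit replaces :color with one 0/1 column per value.
(def encoded (ds-cat/transform-one-hot toy one-hot-fit))
(println (ds/column-names encoded))
(println encoded)
```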

invert-categorical-map

(invert-categorical-map dataset {:keys [src-column lookup-table], :as opts})

Invert a categorical map, returning the column to the original set of values.
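A round trip can be sketched as follows, assuming the value returned by fit-categorical-map can be passed directly as the second argument, since it carries the :src-column and :lookup-table keys the signature destructures. The dataset toy is hypothetical and tech.ml.dataset is assumed to be on the classpath.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical toy dataset.
(def toy (ds/->dataset {:x [:a :b :a :c]}))

;; Encode :x as numbers, then invert using the same fitted mapping.
(def fit (ds-cat/fit-categorical-map toy :x))
(def encoded (ds-cat/transform-categorical-map toy fit))
(def restored (ds-cat/invert-categorical-map encoded fit))

(println (vec (ds/column encoded :x)))
(println (vec (ds/column restored :x)))
```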

invert-one-hot-map

(invert-one-hot-map dataset {:keys [one-hot-table src-column], :as opts})

Invert a one-hot transformation, removing the one-hot columns and adding back the original column.

reverse-map-categorical-xforms

(reverse-map-categorical-xforms dataset)

Given a dataset in which columns have been converted from a categorical representation to either a numeric representation or a one-hot representation, map them back to the original values using the label->number mapping stored in each column's metadata.
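As a sketch of the whole-dataset reversal, the example below encodes one column numerically and another as one-hot, then restores both in a single call. The dataset toy and its column names are hypothetical, tech.ml.dataset is assumed to be on the classpath, and the column order of the restored dataset is not assumed to match the original.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical toy dataset with two categorical columns.
(def toy (ds/->dataset {:x [:a :b :a] :y [:u :v :u]}))

;; :x becomes integers, :y becomes one-hot columns.
(def encoded
  (-> toy
      (ds-cat/transform-categorical-map (ds-cat/fit-categorical-map toy :x))
      (ds-cat/transform-one-hot (ds-cat/fit-one-hot toy :y))))

;; No fit data is passed here; the mappings are read from column metadata.
(def restored (ds-cat/reverse-map-categorical-xforms encoded))

(println (ds/column-names encoded))
(println (ds/column-names restored))
```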

transform-categorical-map

(transform-categorical-map dataset fit-data)

Apply a categorical mapping transformation fit with fit-categorical-map.
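Because the fit is a separate value, it can be computed on one dataset and applied to another. The sketch below illustrates that with two hypothetical datasets, train and unseen, and passes table-args as a sorted sequence of values. It assumes tech.ml.dataset is on the classpath and that every value in the second dataset was present when the mapping was fit.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical datasets sharing the same categorical column.
(def train (ds/->dataset {:x [:a :b :c]}))
(def unseen (ds/->dataset {:x [:c :a]}))

;; Fit on the first dataset, listing the values in the desired order.
(def fit (ds-cat/fit-categorical-map train :x [:a :b :c]))

;; Apply the same mapping to the second dataset.
(def encoded (ds-cat/transform-categorical-map unseen fit))

(println (:lookup-table fit))
(println (vec (ds/column encoded :x)))
```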

transform-one-hot

(transform-one-hot dataset one-hot-fit-data)

Apply a one-hot transformation fit with fit-one-hot to a dataset.