tech.v3.dataset.categorical
Conversions of categorical values into numbers and back. Two forms of conversion are supported: a straight value->integer map and one-hot encoding.
The functions in this namespace manipulate the metadata on the columns of the dataset, which can be inspected via clojure.core/meta.
dataset->categorical-maps
(dataset->categorical-maps dataset)
Given a dataset, return a sequence of categorical map entries.
user> (ds-cat/dataset->categorical-maps catds)
({:lookup-table {:a 0, :b 1, :c 2, :d 3},
  :src-column :x,
  :result-datatype :float64}
 {:lookup-table {:a 0, :b 1, :c 2, :d 3},
  :src-column :y,
  :result-datatype :float64})
fit-categorical-map
(fit-categorical-map dataset colname & [table-args res-dtype])
Given a column, map it into a numeric space via a discrete map of values to integers. This fits the categorical transformation onto the column and returns the transformation.
If table-args is not given, the distinct column values are mapped into 0..x in no particular order.
table-args specifies the precise mapping, either as a sequence of [val idx] pairs or as a sorted seq of values.
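The three ways of calling fit-categorical-map described above can be sketched as follows. This is a minimal sketch, assuming the dataset is built with tech.v3.dataset/->dataset and that the ds and ds-cat aliases are as shown. The dataset contents, the var names and the :int64 result datatype are illustrative assumptions rather than anything prescribed by the library.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Illustrative dataset; the column name and values are assumptions.
(def catds (ds/->dataset {:x [:a :b :c :d]}))

;; No table-args: distinct values get indices 0..x, order not guaranteed.
(def default-fit (ds-cat/fit-categorical-map catds :x))

;; table-args as explicit [val idx] pairs.
(def pair-fit
  (ds-cat/fit-categorical-map catds :x [[:a 3] [:b 2] [:c 1] [:d 0]]))

;; table-args as an ordered seq of values, plus an explicit result datatype.
(def seq-fit
  (ds-cat/fit-categorical-map catds :x [:d :c :b :a] :int64))

;; The fitted mapping can be inspected on the returned transformation.
(:lookup-table pair-fit)
(:result-datatype seq-fit)
```

Passing table-args is useful when the integer codes must be stable across runs or must match an encoding agreed on elsewhere.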
fit-one-hot
(fit-one-hot dataset colname & [table-args res-dtype])
Fit a one hot transformation to a column. Returns a reusable transformation. Maps each unique value to a column with 1 every time the value appears in the original column and 0 otherwise.
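To make the encoding itself concrete, here is a small sketch in plain Clojure of what a one-hot mapping produces. It is not the library's implementation, and one-hot-sketch is a hypothetical helper introduced only for illustration.

```clojure
(defn one-hot-sketch
  "Hypothetical helper: maps each distinct value to a vector holding 1 where
  that value occurs in `values` and 0 elsewhere."
  [values]
  (into {}
        (for [v (distinct values)]
          [v (mapv #(if (= % v) 1 0) values)])))

(one-hot-sketch [:a :b :a :c])
;; => {:a [1 0 1 0], :b [0 1 0 0], :c [0 0 0 1]}
```

Each key in the result corresponds to one generated column, so a column with k distinct values expands into k indicator columns.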
invert-categorical-map
(invert-categorical-map dataset {:keys [src-column lookup-table], :as opts})
Invert a categorical map, returning the column to the original set of values.
invert-one-hot-map
(invert-one-hot-map dataset {:keys [one-hot-table src-column], :as opts})
Invert a one-hot transformation, removing the one-hot columns and adding back the original column.
reverse-map-categorical-xforms
(reverse-map-categorical-xforms dataset)
Given a dataset where we have converted columns from a categorical representation to either a numeric representation or a one-hot representation, reverse map back to the original dataset given the reverse mapping of label->number in the column's metadata.
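A round trip through the column metadata might look like the sketch below. It assumes a dataset built with tech.v3.dataset/->dataset and columns read back with tech.v3.dataset/column. The dataset contents and var names are illustrative assumptions.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Illustrative dataset with two categorical columns.
(def catds (ds/->dataset {:x [:a :b :c :d] :y [:a :b :c :d]}))

;; Fit and apply a categorical map to each column in turn.
(def encoded
  (reduce (fn [d colname]
            (ds-cat/transform-categorical-map
             d (ds-cat/fit-categorical-map d colname)))
          catds
          [:x :y]))

;; The mappings now live in column metadata and can be listed or inspected.
(ds-cat/dataset->categorical-maps encoded)
(meta (ds/column encoded :x))

;; No fit data is passed here; the reverse mapping is read from the metadata.
(def restored (ds-cat/reverse-map-categorical-xforms encoded))
(vec (ds/column restored :x))
```

Because the mapping travels with the dataset, the fit objects do not need to be kept around just to recover the original labels.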
transform-categorical-map
(transform-categorical-map dataset fit-data)
Apply a categorical mapping transformation fit with fit-categorical-map.
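Since the fit is returned as a separate value, one plausible use is to fit on one dataset and apply the same mapping to another. The sketch below assumes both datasets share the column name :x and that every value in the second dataset was seen during fitting. The dataset names and contents are hypothetical.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Hypothetical training and test datasets sharing the :x column.
(def train-ds (ds/->dataset {:x [:a :b :c]}))
(def test-ds  (ds/->dataset {:x [:b :a]}))

;; Fit once on the training data.
(def fit (ds-cat/fit-categorical-map train-ds :x))

;; Apply the same lookup table to both datasets so the codes agree.
(def encoded-train (ds-cat/transform-categorical-map train-ds fit))
(def encoded-test  (ds-cat/transform-categorical-map test-ds fit))

(:lookup-table fit)
(vec (ds/column encoded-test :x))

;; The fit data also inverts the mapping explicitly.
(def decoded-test (ds-cat/invert-categorical-map encoded-test fit))
```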
transform-one-hot
(transform-one-hot dataset one-hot-fit-data)
Apply a one-hot transformation to a dataset.
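A one-hot round trip can be sketched along the same lines as the categorical map. This assumes a dataset built with tech.v3.dataset/->dataset. The dataset contents and var names are illustrative, and the names of the generated one-hot columns are left for the reader to inspect rather than assumed here.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.categorical :as ds-cat])

;; Illustrative dataset: one categorical column and one numeric column.
(def catds (ds/->dataset {:x [:a :b :a :c] :n [1 2 3 4]}))

;; Fit the reusable one-hot transformation, then apply it.
(def oh-fit (ds-cat/fit-one-hot catds :x))
(def oh-ds  (ds-cat/transform-one-hot catds oh-fit))

;; Inspect which columns the transformation produced.
(ds/column-names oh-ds)

;; Invert: the one-hot columns are removed and :x is added back.
(def restored (ds-cat/invert-one-hot-map oh-ds oh-fit))
(vec (ds/column restored :x))
```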