tech.v3.dataset.math

Various mathematical transformations of datasets, such as (inefficiently) building simple tables, PCA, and normalizing columns to have a mean of 0 and a variance of 1. More in-depth transformations are found in tech.v3.dataset.neanderthal.

correlation-table

(correlation-table dataset & {:keys [correlation-type colname-seq]})

Returns a map of colname to a sorted list of [colname coefficient] tuples. The sort is: (sort-by (comp #(Math/abs (double %)) second) >)

Because every column correlates perfectly with itself, the first entry in each list is: colname, 1.0

There are three possible correlation types: :pearson, :spearman, and :kendall.

:pearson is the default.
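
The ordering described above can be illustrated with plain Clojure. This is only a minimal sketch on hand-made data: the column names and coefficients are invented for illustration, and no library function is called.

```clojure
;; Hypothetical [colname coefficient] tuples for a column named :a.
(def tuples [[:b 0.3] [:a 1.0] [:c -0.8]])

;; Sort by absolute coefficient, largest first, using the same
;; sort expression quoted in the docstring.
(def sorted-tuples
  (sort-by (comp #(Math/abs (double %)) second) > tuples))

;; The self-correlation [:a 1.0] sorts first; the strongly negative
;; [:c -0.8] outranks the weakly positive [:b 0.3].
```

Sorting on the absolute value means strong negative correlations are listed alongside strong positive ones rather than at the end.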

fill-range-replace

(fill-range-replace ds colname max-span)
(fill-range-replace ds colname max-span missing-strategy)
(fill-range-replace ds colname max-span missing-strategy missing-value)

Given an in-order column of a numeric or datetime type, fill in spans that are larger than the given max-span. The source column must not have missing values. For more documentation on fill-range, see tech.v3.datatype.functional/fill-range.

If the column is a datetime type, the operation happens in millisecond space, and max-span may be a datetime type convertible to milliseconds.

The result column has the same datatype as the input column.

After the operation, if the missing strategy is not nil, the newly produced missing values, along with any existing missing values, are replaced in all other columns using the given missing strategy. See tech.v3.dataset.missing/replace-missing for documentation on missing strategies. The missing strategy defaults to :down unless explicitly set.

Returns a new dataset.
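
The idea of filling spans larger than max-span can be sketched in plain Clojure on a single sorted sequence. This is not the library's implementation: fill-gaps is a hypothetical helper, the evenly spaced (linear) placement of new values is an assumption, and the sketch does not touch the other columns or apply a missing strategy.

```clojure
(defn fill-gaps
  "Hypothetical helper: given sorted numbers, insert evenly spaced
  values wherever the gap between neighbors exceeds max-span."
  [xs max-span]
  (let [xs (map double xs)]
    (if (empty? xs)
      []
      (reduce
       (fn [acc x]
         (let [prev (peek acc)
               gap  (- x prev)]
           (if (> gap max-span)
             ;; Split the gap into n equal steps, each <= max-span.
             (let [n    (long (Math/ceil (/ gap (double max-span))))
                   step (/ gap n)]
               (-> (into acc (map #(+ prev (* step %)) (range 1 n)))
                   (conj x)))
             (conj acc x))))
       [(first xs)]
       (rest xs)))))

;; The gap between 1.0 and 5.0 exceeds 2, so one value is inserted.
(def filled (fill-gaps [0 1 5] 2))
```

In the dataset function, the rows created this way have no values in the other columns, which is why a missing strategy is applied afterwards.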

fit-minmax

(fit-minmax dataset {:keys [min max], :or {min -0.5, max 0.5}, :as options})
(fit-minmax dataset)

NaN-aware min-max fit of the dataset. Returns an object that can be used in transform-minmax. The target min and max default to -0.5 and 0.5.
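
What "NaN-aware" means for a single column can be sketched in plain Clojure. This is an illustration only, assuming missing entries are encoded as NaN; nan-aware-min-max is a hypothetical helper and the shape of the object fit-minmax actually returns is not shown here.

```clojure
(defn nan-aware-min-max
  "Hypothetical helper: min and max of a column, ignoring NaN entries."
  [xs]
  (let [clean (remove #(Double/isNaN (double %)) xs)]
    {:min (apply min clean)
     :max (apply max clean)}))

;; The NaN entry is skipped rather than propagating into the result.
(def fitted (nan-aware-min-max [1.0 Double/NaN 3.0 -2.0]))
```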

fit-std-scale

(fit-std-scale dataset {:keys [mean? stddev?], :or {mean? true, stddev? true}, :as options})
(fit-std-scale dataset)

Calculate NaN-aware per-column means and standard deviations of a dataset.

Options are passed through to tech.v3.datatype.statistics/descriptive-statistics.
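
The per-column computation, and how its result would be used to normalize a column to a mean of 0 and a variance of 1, can be sketched in plain Clojure. This is a sketch under assumptions: both helpers and the :standard-deviation key are hypothetical, and the sample standard deviation (n - 1 denominator) is assumed because this section does not say which variant the library uses.

```clojure
(defn mean-and-stddev
  "Hypothetical helper: NaN-aware mean and sample standard deviation."
  [xs]
  (let [clean (remove #(Double/isNaN (double %)) xs)
        n     (count clean)
        mean  (/ (reduce + clean) (double n))
        ;; Sum of squared deviations from the mean.
        ss    (reduce + (map #(let [d (- % mean)] (* d d)) clean))]
    {:mean mean
     :standard-deviation (Math/sqrt (/ ss (dec n)))}))

(defn standardize
  "Hypothetical helper: subtract the mean, then divide by the stddev."
  [xs {:keys [mean standard-deviation]}]
  (mapv #(/ (- % mean) standard-deviation) xs))

(def fit (mean-and-stddev [1.0 Double/NaN 2.0 3.0]))
(def scaled (standardize [1.0 2.0 3.0] fit))
```

Separating the fit from the transform lets statistics computed on one dataset be applied unchanged to another.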

interpolate-loess

(interpolate-loess ds x-colname y-colname {:keys [bandwidth iterations accuracy result-name], :or {bandwidth 0.75, iterations 4, accuracy LoessInterpolator/DEFAULT_ACCURACY}})
(interpolate-loess ds x-colname y-colname)

Interpolate using the LOESS regression engine. Useful for smoothing out graphs.

transform-minmax

(transform-minmax dataset {:keys [min max column-data]})

Scale columns listed in the min-max transform to the mins and maxes dictated by that transform.
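
For a single value, this kind of scaling can be sketched in plain Clojure. The sketch assumes an ordinary linear rescaling from the fitted column range to the target range; rescale and its argument names are hypothetical, and the library's exact formula is not stated in this section.

```clojure
(defn rescale
  "Hypothetical helper: map x linearly from [col-min, col-max]
  to [target-min, target-max]."
  [x col-min col-max target-min target-max]
  (+ target-min
     (* (- target-max target-min)
        (/ (- x col-min) (- col-max col-min)))))

;; A column fitted with min 0.0 and max 10.0, scaled to the
;; default target range of -0.5 to 0.5.
(def low  (rescale 0.0  0.0 10.0 -0.5 0.5))
(def mid  (rescale 5.0  0.0 10.0 -0.5 0.5))
(def high (rescale 10.0 0.0 10.0 -0.5 0.5))
```

A constant column (col-min equal to col-max) would divide by zero in this sketch, so real code would need to handle that case separately.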

transform-std-scale

(transform-std-scale dataset std-scale-xform)

Given a dataset and a standard scale transform return a new dataset with the columns