tech.v3.dataset.column-filters

Queries to select column subsets that have various properties, such as all numeric columns, all feature columns, or columns that have a specific datatype.

In addition, a few set operations (union, intersection, difference) are provided to further manipulate subsets of columns.

All functions are transformations from dataset to dataset.

The functions in this namespace use the metadata on the columns of the dataset, which can be inspected via clojure.core/meta.
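As a minimal sketch of inspecting that metadata, the snippet below builds a small dataset and reads the metadata map of one column. The `ds` alias, the `example-ds` name, and the column names are illustrative only, and the use of `ds/->dataset` and `ds/column` from `tech.v3.dataset` is an assumption about the surrounding library rather than something this page documents.

```clojure
(require '[tech.v3.dataset :as ds])

;; Hypothetical example data: one integer column, one string column.
(def example-ds
  (ds/->dataset {:a [1 2 3]
                 :b ["x" "y" "z"]}))

;; Look up a column by name, then read its metadata map.
(def a-meta (meta (ds/column example-ds :a)))

;; Entries such as :name and :datatype are what the filters can key on.
(select-keys a-meta [:name :datatype])
```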

boolean

(boolean dataset)

Return a dataset containing only the boolean columns.

categorical

(categorical dataset)

Return a dataset containing only the categorical columns.

column-filter

(column-filter dataset filter-fn)

Return a dataset with only the columns for which the filter function returns a truthy value.
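A short sketch of a content-based filter follows. It assumes that the filter function is called with each column object and that a column can be traversed like an ordinary sequence; neither detail is stated on this page, and the dataset and names are hypothetical.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

;; Hypothetical example data.
(def example-ds
  (ds/->dataset {:id    [1 2 3]
                 :label ["a" "b" "c"]}))

;; Keep only the columns that contain the value "b".
;; Assumption: filter-fn receives the column itself, not its metadata.
(def with-b
  (cf/column-filter example-ds
                    (fn [col] (some #(= "b" %) col))))

(ds/column-names with-b)
```

When the decision depends only on metadata rather than on the values, metadata-filter is the more direct choice.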

datetime

(datetime dataset)

Return a dataset containing only the datetime columns.

difference

(difference lhs-ds rhs-ds)
(difference lhs-ds)

Return the columns in lhs which do not have an equivalently named column in rhs.

feature

(feature dataset)

Return a dataset containing only the columns which have not been marked as inference columns.

intersection

(intersection lhs-ds rhs-ds)

Return only the columns of rhs for which an equivalently named column exists in lhs.

metadata-filter

(metadata-filter dataset filter-fn)

Return a dataset with only the columns for which, given the column metadata, the filter function returns a truthy value.
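For illustration, the sketch below selects columns by looking at the `:datatype` entry of each column's metadata map. The example data is hypothetical, and the `:int64` keyword is an assumption about how the library types Clojure longs.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

;; Hypothetical example data.
(def example-ds
  (ds/->dataset {:id    [1 2 3]
                 :price [9.5 12.0 7.25]
                 :label ["a" "b" "c"]}))

;; filter-fn receives the metadata map of each column.
;; Assumption: integer columns built from longs report :int64.
(def int-cols
  (cf/metadata-filter example-ds
                      (fn [col-meta] (= :int64 (:datatype col-meta)))))

(ds/column-names int-cols)
```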

missing

(missing dataset)

Return a dataset with only columns that have missing values.

no-missing

(no-missing dataset)

Return a dataset with only columns that have no missing values.
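The pair missing / no-missing can be sketched as follows. The example assumes that a `nil` entry passed to `ds/->dataset` is recorded as a missing value; the dataset and column names are made up for illustration.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

;; Hypothetical example data: :gappy has one nil entry.
(def example-ds
  (ds/->dataset {:complete [1 2 3]
                 :gappy    [1.0 nil 3.0]}))

;; Columns with at least one missing value.
(def with-gaps (cf/missing example-ds))

;; Columns with no missing values.
(def without-gaps (cf/no-missing example-ds))

(ds/column-names with-gaps)
(ds/column-names without-gaps)
```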

numeric

(numeric dataset)

Return a dataset containing only the numeric columns.

of-datatype

(of-datatype dataset datatype)

Return a dataset containing only the columns of a specific datatype.
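A brief sketch of datatype-based selection is shown here, next to the broader numeric filter for comparison. The dataset is hypothetical, and the `:string` keyword is an assumption about the library's datatype names.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

;; Hypothetical example data.
(def example-ds
  (ds/->dataset {:id    [1 2 3]
                 :price [9.5 12.0 7.25]
                 :label ["a" "b" "c"]}))

;; Exactly one datatype (assumed keyword: :string).
(def strings (cf/of-datatype example-ds :string))

;; Any numeric datatype, integer or floating point.
(def numbers (cf/numeric example-ds))

(ds/column-names strings)
(ds/column-names numbers)
```

Requiring the namespace under an alias such as `cf` keeps names like `boolean` from colliding with the `clojure.core` functions of the same name.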

prediction

(prediction dataset)

Return the columns of the dataset marked as predictions.

probability-distribution

(probability-distribution dataset)

Return the columns of the dataset that comprise the probability distribution after classification.

string

(string dataset)

Return a dataset containing only the string columns.

target

(target dataset)

Return a dataset containing only the columns that have been marked as inference targets.
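The following sketch shows how target and feature might split a dataset. It assumes that columns are marked with `set-inference-target` from the `tech.v3.dataset.modelling` namespace, which is not described on this page; the data and names are illustrative.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf]
         '[tech.v3.dataset.modelling :as ds-mod])

;; Hypothetical example data with :y marked as the inference target.
;; Assumption: set-inference-target records the mark in column metadata.
(def example-ds
  (-> (ds/->dataset {:x1 [1 2 3]
                     :x2 [0.5 0.1 0.9]
                     :y  [0 1 0]})
      (ds-mod/set-inference-target :y)))

(def targets  (cf/target example-ds))
(def features (cf/feature example-ds))

(ds/column-names targets)
(ds/column-names features)
```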

union

(union lhs-ds rhs-ds)

Return all columns of lhs along with any columns in rhs which have names that do not exist in lhs.
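Because every filter returns a dataset, the set operations compose directly with the other filters. The sketch below combines them on a hypothetical dataset; it assumes `nil` is treated as a missing value when the dataset is built.

```clojure
(require '[tech.v3.dataset :as ds]
         '[tech.v3.dataset.column-filters :as cf])

;; Hypothetical example data: :c is numeric with one missing value.
(def example-ds
  (ds/->dataset {:a [1 2 3]
                 :b ["x" "y" "z"]
                 :c [1.0 nil 3.0]}))

;; Numeric columns plus string columns.
(def numeric-or-string
  (cf/union (cf/numeric example-ds) (cf/string example-ds)))

;; Numeric columns that also have no missing values.
(def numeric-complete
  (cf/intersection (cf/numeric example-ds) (cf/no-missing example-ds)))

;; Everything except the numeric columns.
(def non-numeric
  (cf/difference example-ds (cf/numeric example-ds)))

(ds/column-names numeric-or-string)
(ds/column-names numeric-complete)
(ds/column-names non-numeric)
```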