TMD 8.018

tech.v3.dataset

Column-major dataset abstraction for efficiently manipulating in-memory datasets.

tech.v3.dataset.categorical

Conversions of categorical values into numbers and back. Two forms of conversion are supported: a straight value->integer map and one-hot encoding.
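As a rough illustration of the two conversions named above, the following sketch uses only clojure.core and clojure.set on plain sequences. The helper names value->integer-map and one-hot-encode are hypothetical and are not part of this namespace's API, which works on dataset columns rather than plain vectors.

```clojure
(require '[clojure.set :as set])

(defn value->integer-map
  "Hypothetical helper: assign each distinct value an integer,
  in order of first appearance."
  [values]
  (zipmap (distinct values) (range)))

(defn one-hot-encode
  "Hypothetical helper: return one map per input value, with 1 under
  the matching category and 0 under every other category."
  [values]
  (let [categories (distinct values)]
    (mapv (fn [v]
            (into {} (map (fn [c] [c (if (= c v) 1 0)]) categories)))
          values)))

(def colors ["red" "green" "red"])

;; Forward conversion: value -> integer.
(def lookup (value->integer-map colors))
(def encoded (mapv lookup colors))

;; Reverse conversion: integer -> value, by inverting the lookup map.
(def decoded (mapv (set/map-invert lookup) encoded))
```

The integer map keeps a single column and needs the lookup table to go back, while one-hot encoding produces one 0/1 entry per category.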

tech.v3.dataset.clipboard

Optional namespace that copies a dataset to the clipboard for pasting into applications such as Excel or Google Sheets.

tech.v3.dataset.column

tech.v3.dataset.column-filters

Queries to select column subsets that have various properties, such as all numeric columns, all feature columns, or columns that have a specific datatype.

tech.v3.dataset.io.csv

CSV parsing based on charred.api/read-csv.

tech.v3.dataset.io.datetime

Helpful and well-tested string->datetime pathways.

tech.v3.dataset.io.string-row-parser

Parsing functions based on raw data that is represented by a sequence of string arrays.

tech.v3.dataset.io.univocity

Bindings to univocity. Transforms CSV and TSV files into sequences of string arrays that are then passed into tech.v3.dataset.io.string-row-parser methods.

tech.v3.dataset.join

Implementation of join algorithms, both exact (hash-join) and near.
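To make the exact (hash) join idea concrete, here is a minimal sketch over plain sequences of row maps using only clojure.core. The function name hash-join-rows is hypothetical and this is not the namespace's implementation; it only shows the general build-then-probe technique.

```clojure
(defn hash-join-rows
  "Hypothetical helper: inner join of two sequences of row maps on key k.
  Build phase: index the right side by key. Probe phase: look up each
  left row's key in that index and merge the matching rows."
  [k left right]
  (let [index (group-by k right)]
    (for [l left
          r (get index (get l k))]
      (merge l r))))

(def people [{:id 1 :name "a"} {:id 2 :name "b"}])
(def scores [{:id 1 :score 10} {:id 3 :score 30}])

;; Only :id 1 appears on both sides, so only that row survives.
(def joined (hash-join-rows :id people scores))
```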

tech.v3.dataset.math

Various mathematical transformations of datasets, such as (inefficiently) building simple tables, PCA, and normalizing columns to have a mean of 0 and a variance of 1. More in-depth transformations are found in tech.v3.dataset.neanderthal.
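The normalization mentioned above (mean of 0, variance of 1) can be sketched for a single column of numbers with plain clojure.core. The function name standardize is hypothetical, and the choice of population variance (dividing by n) is an assumption of this sketch rather than a statement about the library.

```clojure
(defn standardize
  "Hypothetical helper: shift and scale xs so the result has mean 0 and
  (population) variance 1. A constant column has zero variance, so the
  division is by 0.0 and the result is not meaningful in that case."
  [xs]
  (let [n        (count xs)
        mean     (/ (reduce + xs) (double n))
        variance (/ (reduce + (map #(let [d (- % mean)] (* d d)) xs)) n)
        sd       (Math/sqrt variance)]
    (mapv #(/ (- % mean) sd) xs)))

(def z (standardize [1.0 2.0 3.0]))
```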

tech.v3.dataset.metamorph

This is an auto-generated API: it scans the namespaces and changes the first argument of each function to be metamorph-compliant, which means transforming an argument that is just a dataset into an argument that is a metamorph context, a map of {:metamorph/data ds}. The functions also return their result as a metamorph context.
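The wrapping described above can be pictured with a small sketch in plain clojure.core. The helper lift-to-context is hypothetical, a vector of row maps stands in for a real dataset, and the exact shape of the generated functions in this namespace may differ; only the {:metamorph/data ds} convention comes from the description.

```clojure
(defn lift-to-context
  "Hypothetical helper: adapt f, whose first argument is a dataset, so that
  it reads the dataset from :metamorph/data in the context map and writes
  its result back under the same key."
  [f]
  (fn [ctx & args]
    (assoc ctx :metamorph/data (apply f (:metamorph/data ctx) args))))

;; A plain function whose first argument is the (stand-in) dataset.
(defn take-rows [rows n] (vec (take n rows)))

(def take-rows-ctx (lift-to-context take-rows))

;; Other keys in the context pass through untouched.
(def result
  (take-rows-ctx {:metamorph/data [{:a 1} {:a 2} {:a 3}] :other-key :kept} 2))
```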

tech.v3.dataset.modelling

Methods related specifically to machine learning such as setting the inference target. This file integrates tightly with tech.v3.dataset.categorical which provides categorical -> number and one-hot transformation pathways.

tech.v3.dataset.print

tech.v3.dataset.reductions

Specific high-performance reductions intended to be performed over a sequence of datasets. This allows aggregations to be done in situations where the dataset is larger than what will fit in memory on a normal machine. For this reason, summation is implemented using the Kahan algorithm, and various statistical methods use statistical estimation techniques and are therefore prefixed with prob-, which is short for probabilistic.
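For readers unfamiliar with Kahan (compensated) summation, the following is a minimal sketch of the technique in plain clojure.core. The function name kahan-sum is hypothetical and this is not the namespace's implementation; it only shows how a running compensation term recovers low-order bits that a naive running sum would drop.

```clojure
(defn kahan-sum
  "Hypothetical helper: sum xs as doubles using Kahan compensated summation.
  c carries the rounding error lost by the previous addition."
  [xs]
  (loop [xs  (seq xs)
         sum 0.0
         c   0.0]
    (if xs
      (let [y (- (double (first xs)) c) ; apply the stored correction
            t (+ sum y)]                ; low-order bits of y may be lost here
        ;; (t - sum) - y recovers what was lost; carry it to the next step
        (recur (next xs) t (- (- t sum) y)))
      sum)))

(def total (kahan-sum [1.0 2.0 3.0]))
```

Because the state is only two doubles (the sum and the compensation), the same idea extends to streaming over many datasets without holding them all in memory.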

tech.v3.dataset.reductions.apache-data-sketch

Reducers based on the Apache DataSketches family of algorithms.

tech.v3.dataset.rolling

Implement a generalized rolling window including support for time-based variable width windows.

tech.v3.dataset.set

Extensions to datasets to do per-row bag-semantics set/union and intersection.

tech.v3.dataset.tensor

Conversion mechanisms from dataset to tensor and back.

tech.v3.dataset.zip

Load zip data. Zip files with a single file entry can be loaded with ->dataset. When a zip file has multiple entries you have to call zipfile->dataset-seq.

tech.v3.libs.arrow

Support for reading/writing Apache Arrow datasets. Datasets may be memory mapped but default to being read via an input stream.

tech.v3.libs.clj-transit

Transit bindings for the JVM version of tech.v3.dataset.

tech.v3.libs.fastexcel

Parse a dataset in xlsx format. This namespace auto-registers a handler for the 'xlsx' file type so that when using ->dataset, xlsx will automatically map to (first (workbook->datasets)).

tech.v3.libs.guava.cache

Use a Google Guava cache to memoize function results. The function must not return nil values. Exceptions propagate to the caller.

tech.v3.libs.parquet

Support for reading and writing Parquet files. You must require this namespace to enable Parquet read/write support.

tech.v3.libs.poi

Parse a dataset in xls or xlsx format. This namespace auto-registers a handler for the xls file type so that when using ->dataset, xls will automatically map to (first (workbook->datasets)).

tech.v3.libs.tribuo

Bindings to make working with Tribuo more straightforward when using datasets.

Generated by Codox with RDash UI theme
