tech.v3.dataset.io.univocity

Bindings to univocity. Transforms CSV and TSV files into sequences of string arrays, which are then passed to tech.v3.dataset.io.string-row-parser methods.

create-csv-parser

(create-csv-parser {:keys [header-row? num-rows column-whitelist column-blacklist column-allowlist column-blocklist separator n-initial-skip-rows], :or {header-row? true}, :as options})

Creates a univocity CSV parser configured from the given options.

csv->dataset

(csv->dataset input options)
(csv->dataset input)

Non-lazily and serially parse the columns. Returns a vector of maps of:

  • :name - column name
  • :missing - long reader of in-order missing indexes
  • :data - typed reader/writer of data
  • :metadata - optional map with unparsed-indexes and unparsed-values

Supports a subset of tech.v3.dataset/->dataset options:

  • :column-allowlist - in preference to :column-whitelist
  • :column-blocklist - in preference to :column-blacklist
  • :n-initial-skip-rows
  • :num-rows
  • :header-row?
  • :separator
  • :parser-fn
  • :parser-scan-len
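As a rough illustration of the options above, the following sketch parses a small CSV file while restricting the columns and the number of rows. It assumes the tech.ml.dataset library is on the classpath and that a file path string is an acceptable input; the sample file, its contents, and the var names are hypothetical.

```clojure
;; Assumption: the tech.ml.dataset library is on the classpath.
(require '[tech.v3.dataset.io.univocity :as univocity])

;; Hypothetical sample file, written to a temp location for illustration only.
(def sample-file (java.io.File/createTempFile "sample" ".csv"))
(spit sample-file "id,name,score\n1,alice,3.5\n2,bob,4.0\n3,carol,\n")

;; Keep two of the three columns and read at most two rows.
(def parsed
  (univocity/csv->dataset (.getPath sample-file)
                          {:header-row? true
                           :column-allowlist ["id" "score"]
                           :num-rows 2}))
```

Using :column-allowlist rather than :column-whitelist follows the stated preference between the two option names.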

csv->rows

(csv->rows input options)
(csv->rows input)

Given a CSV, produces a sequence of rows. The CSV options from ->dataset apply here.

options:

  • :column-allowlist - either a sequence of string column names or a sequence of column indices of the columns to include. Preferred over :column-whitelist.
  • :column-blocklist - either a sequence of string column names or a sequence of column indices of the columns to exclude. Preferred over :column-blacklist.
  • :num-rows - Number of rows to read
  • :separator - Add a character separator to the list of separators to auto-detect.
  • :max-chars-per-column - Defaults to 4096. Columns with more characters than this will result in an exception.
  • :max-num-columns - Defaults to 8192. CSV or TSV files with more columns than this will fail to parse. For more information on this option, please visit: https://github.com/uniVocity/univocity-parsers/issues/301
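The options above can be sketched in use as follows. This is a minimal example, assuming the tech.ml.dataset library is on the classpath, that a file path string is an acceptable input, and that each row can be converted with vec (the namespace description says rows are string arrays). The sample file and var names are hypothetical.

```clojure
;; Assumption: the tech.ml.dataset library is on the classpath.
(require '[tech.v3.dataset.io.univocity :as univocity])

;; Hypothetical semicolon-separated sample file.
(def rows-file (java.io.File/createTempFile "rows" ".csv"))
(spit rows-file "id;name\n1;alice\n2;bob\n")

;; Read every row eagerly and convert each one to a Clojure vector.
(def rows
  (mapv vec
        (univocity/csv->rows (.getPath rows-file)
                             {:separator \;
                              :max-chars-per-column 8192})))

(doseq [row rows]
  (println row))
```

Realizing the rows eagerly with mapv avoids holding a partially consumed sequence over an open file.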

PApplyWriteOptions

protocol

members

apply-write-options!

(apply-write-options! settings options)

raw-row-iterable

(raw-row-iterable input parser)
(raw-row-iterable input)

Returns an iterable that produces String[] rows.

rows->csv!

(rows->csv! output header-string-array row-string-array-seq)
(rows->csv! output header-string-array row-string-array-seq {:keys [separator], :or {separator \tab}, :as options})

Given something convertible to an output stream, an optional set of headers as a string array, and a sequence of string arrays, writes a CSV or TSV file.

Options:

  • :separator - Defaults to \tab.
  • :quoted-columns - For csv, specify which columns should always be quoted regardless of their data.
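A minimal sketch of writing a comma-separated file with rows->csv! might look like the following. It assumes the tech.ml.dataset library is on the classpath and that a file path string counts as "something convertible to an output stream". The output file, header, and row values are hypothetical.

```clojure
;; Assumption: the tech.ml.dataset library is on the classpath.
(require '[tech.v3.dataset.io.univocity :as univocity])

;; Hypothetical output location.
(def out-file (java.io.File/createTempFile "out" ".csv"))

;; The header and each row are Java String arrays.
(def header (into-array String ["id" "name"]))
(def data-rows
  [(into-array String ["1" "alice"])
   (into-array String ["2" "bob"])])

;; The separator defaults to \tab, so pass \, explicitly for CSV output.
(univocity/rows->csv! (.getPath out-file) header data-rows {:separator \,})

(println (slurp out-file))
```

Because the default separator is \tab, omitting the options map would produce a TSV file instead.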