tech.v3.dataset.io.univocity
Bindings to univocity. Transforms CSV and TSV files into sequences
of string arrays that are then passed to the tech.v3.dataset.io.string-row-parser
methods.
create-csv-parser
(create-csv-parser {:keys [header-row? num-rows column-whitelist column-blacklist column-allowlist column-blocklist separator n-initial-skip-rows], :or {header-row? true}, :as options})
Creates a univocity CSV parser configured from the given options.
csv->dataset
(csv->dataset input options)
(csv->dataset input)
Non-lazily and serially parses the columns. Returns a vector of maps of the form:
{:name column-name
 :missing long-reader of in-order missing indexes
 :data typed reader/writer of data
 :metadata optional map with unparsed-indexes and unparsed-values}
Supports a subset of the tech.v3.dataset/->dataset options:
:column-allowlist (in preference to :column-whitelist)
:column-blocklist (in preference to :column-blacklist)
:n-initial-skip-rows
:num-rows
:header-row?
:separator
:parser-fn
:parser-scan-len
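A minimal usage sketch of csv->dataset with two of the supported options. It assumes that `input` may be a java.io.InputStream; the in-memory CSV text and its column names are hypothetical.

```clojure
(require '[tech.v3.dataset.io.univocity :as univocity])

;; Hypothetical CSV content held in memory so the example needs no file.
(def csv-text "name,age,city\nalice,30,Oslo\nbob,25,Lima\ncarol,41,Pune\n")

;; Assumption: an InputStream is an acceptable `input`.
(defn csv-input []
  (java.io.ByteArrayInputStream. (.getBytes ^String csv-text "UTF-8")))

;; Keep only the "name" and "age" columns and limit how many rows are read.
(def columns
  (univocity/csv->dataset (csv-input)
                          {:column-allowlist ["name" "age"]
                           :num-rows 2}))

;; Each entry is a column map with :name, :missing and :data keys.
(mapv :name columns)
```

Because the result is a plain vector of column maps rather than a dataset, it can be inspected directly with ordinary sequence functions.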
csv->rows
(csv->rows input options)
(csv->rows input)
Given a csv, produces a sequence of rows. The csv options from ->dataset apply here.
Options:
:column-allowlist
- Either a sequence of string column names or a sequence of column indices of columns to include. Used in preference to :column-whitelist.
:column-blocklist
- Either a sequence of string column names or a sequence of column indices of columns to exclude. Used in preference to :column-blacklist.
:num-rows
- Number of rows to read.
:separator
- Add a character separator to the list of separators to auto-detect.
:max-chars-per-column
- Defaults to 4096. Columns with more characters than this will result in an exception.
:max-num-columns
- Defaults to 8192. CSV and TSV files with more columns than this will fail to parse. For more information on this option, please visit: https://github.com/uniVocity/univocity-parsers/issues/301
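A sketch of csv->rows using the :separator and :max-chars-per-column options listed above. It assumes an InputStream is accepted as input; the semicolon-separated sample text is hypothetical. Whether the header line appears as the first row is not asserted here.

```clojure
(require '[tech.v3.dataset.io.univocity :as univocity])

;; Hypothetical semicolon-separated data held in memory.
(def semi-text "id;label\n1;a\n2;b\n")

(def rows
  (univocity/csv->rows
   (java.io.ByteArrayInputStream. (.getBytes ^String semi-text "UTF-8"))
   ;; \; is added to the separators considered during auto-detection.
   {:separator \;
    :max-chars-per-column 1024}))

;; Realize the rows as vectors for easy inspection at the REPL.
(def row-vecs (mapv vec rows))
```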
PApplyWriteOptions
protocol
members
apply-write-options!
(apply-write-options! settings options)
raw-row-iterable
(raw-row-iterable input parser)
(raw-row-iterable input)
Returns an iterable that produces String[] rows.
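A sketch that pairs create-csv-parser with the two-argument arity of raw-row-iterable, so one configured parser is used explicitly. It assumes an InputStream is an acceptable input; the tab-separated sample text is hypothetical. Since :header-row? is false, every non-empty line should come back as a row.

```clojure
(require '[tech.v3.dataset.io.univocity :as univocity])

;; Build the parser from the options documented for create-csv-parser.
(def parser (univocity/create-csv-parser {:header-row? false}))

;; Hypothetical tab-separated content.
(def tsv-text "x\ty\n1\t2\n3\t4\n")

;; raw-row-iterable returns a java.lang.Iterable of String[];
;; Clojure's sequence functions accept any Iterable.
(def raw-rows
  (->> (univocity/raw-row-iterable
        (java.io.ByteArrayInputStream. (.getBytes ^String tsv-text "UTF-8"))
        parser)
       (map vec)
       (doall)))
```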
rows->csv!
(rows->csv! output header-string-array row-string-array-seq)
(rows->csv! output header-string-array row-string-array-seq {:keys [separator], :or {separator \tab}, :as options})
Given something convertible to an output stream, an optional header row as a string array, and a sequence of string arrays, writes a CSV or a TSV file.
Options:
:separator
- Defaults to \tab.
:quoted-columns
- For CSV, specify which columns should always be quoted regardless of their data.
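A sketch of the four-argument arity of rows->csv! that writes comma-separated output to memory. It assumes that a java.io.ByteArrayOutputStream counts as "something convertible to an output stream" and that the function flushes or closes its writer before returning; the header and row values are hypothetical.

```clojure
(require '[tech.v3.dataset.io.univocity :as univocity])

;; In-memory destination so the result can be read back as a string.
(def out (java.io.ByteArrayOutputStream.))

(univocity/rows->csv! out
                      ;; Header row as a String array.
                      (into-array String ["name" "age"])
                      ;; Data rows as a sequence of String arrays.
                      [(into-array String ["alice" "30"])
                       (into-array String ["bob" "25"])]
                      ;; Override the \tab default to produce CSV.
                      {:separator \,})

(def written (String. (.toByteArray out) "UTF-8"))
```

Omitting the options map (the three-argument arity) keeps the \tab default and therefore produces TSV output.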