tech.v3.dataset.io.string-row-parser

Parsing functions for raw data represented as a sequence of string arrays.

partition-all-rows

(partition-all-rows {:keys [header-row?], :or {header-row? true}} n row-seq)

Given a sequence of rows, partition it into as many partitions as needed, each of at most N rows, keeping the header row as the first row of every partition.
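
The partitioning idea can be sketched in plain Clojure. This is a minimal illustration, not the library's implementation: `partition-with-header` is a hypothetical helper, it uses vectors in place of string arrays for readability, and it assumes N counts data rows only (the header is prepended on top of that), which may differ from how the library counts.

```clojure
;; Hypothetical helper, not part of tech.v3.dataset.
;; Splits row-seq into chunks of at most n data rows and
;; prepends the header row to every chunk.
(defn partition-with-header
  [n row-seq]
  (let [[header & data] row-seq]
    (map #(cons header %) (partition-all n data))))

;; Vectors stand in for string arrays here.
(def rows [["id" "name"] ["1" "a"] ["2" "b"] ["3" "c"]])

(partition-with-header 2 rows)
;; => ((["id" "name"] ["1" "a"] ["2" "b"]) (["id" "name"] ["3" "c"]))
```

Because every chunk starts with the header, each chunk can be parsed on its own with the same column names.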

rows->dataset

(rows->dataset {:keys [header-row? skip-bad-rows?], :or {header-row? true}, :as options} row-seq)

Given a sequence of string[] rows, parse into columnar data. See csv->columns. This function is useful if you have another way of generating sequences of string[] row data.
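
As an example of "another way" of producing string[] rows, the sketch below turns pipe-delimited lines into a sequence of Java `String` arrays using only `clojure.core` and `clojure.string`. The helper name `lines->string-rows` and the pipe delimiter are assumptions made for illustration; neither comes from the library.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: convert pipe-delimited lines into a lazy
;; sequence of String[] rows. The -1 limit keeps trailing empty fields.
(defn lines->string-rows
  [lines]
  (map (fn [line]
         (into-array String (str/split line #"\|" -1)))
       lines))

(def row-seq (lines->string-rows ["id|name" "1|alice" "2|"]))

;; View the arrays as vectors for inspection.
(mapv vec row-seq)
;; => [["id" "name"] ["1" "alice"] ["2" ""]]
```

A sequence shaped like `row-seq` matches the `row-seq` argument in the signature above; with the default `header-row?` of true, the first array would be read as the header.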

sample-rows

(sample-rows {:keys [header-row?], :or {header-row? true}} n row-seq)

Sample at most N rows, selected randomly, from the row sequence. If the sequence contains fewer than N rows, fewer than N rows are returned. Uses naive reservoir sampling: https://en.wikipedia.org/wiki/Reservoir_sampling
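
For readers unfamiliar with the technique, here is a minimal sketch of naive reservoir sampling (often called Algorithm R) in plain Clojure. It is not the library's code: `reservoir-sample` is a hypothetical name, and the sketch samples from any collection without header handling, so a header row would have to be split off before sampling.

```clojure
;; Hypothetical helper illustrating naive reservoir sampling.
;; Keeps the first n items, then replaces a random slot with the
;; i-th item (1-based) with probability n/i.
(defn reservoir-sample
  [n coll]
  (first
   (reduce (fn [[reservoir seen] item]
             (let [seen (inc seen)]
               (if (<= seen n)
                 ;; Fill phase: the reservoir is not full yet.
                 [(conj reservoir item) seen]
                 ;; Replace phase: j is uniform in [0, seen).
                 (let [j (rand-int seen)]
                   [(if (< j n) (assoc reservoir j item) reservoir)
                    seen]))))
           [[] 0]
           coll)))

;; Returns a vector of 3 randomly chosen items from 0..99.
(reservoir-sample 3 (range 100))

;; A short input yields everything it contains.
(reservoir-sample 5 [:a :b])
;; => [:a :b]
```

The sample is built in one pass over the input, so the full sequence never needs to be held in memory.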