tech.v3.dataset.io.string-row-parser
Parsing functions for raw data represented as a sequence of string arrays.
partition-all-rows
(partition-all-rows {:keys [header-row?], :or {header-row? true}} n row-seq)
Given a sequence of rows, partition it into an unspecified number of partitions of at most N rows each, keeping the header row as the first row of every partition.
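The header-preserving partitioning described above can be sketched with plain clojure.core functions. This is a minimal illustration, not the library's implementation: `partition-keeping-header` is a hypothetical helper, rows are shown as vectors of strings rather than string[] for readability, the first row is assumed to be the header (the `header-row? true` default), and N is assumed to count data rows only.

```clojure
(defn partition-keeping-header
  "Hypothetical helper: split `rows` (header first) into chunks of at most
  `n` data rows, prepending the header row to every chunk."
  [n rows]
  (let [header (first rows)]
    (->> (rest rows)            ; data rows only
         (partition-all n)      ; last chunk may hold fewer than n rows
         (map #(cons header %)))))

;; Example input: one header row followed by three data rows.
(def example-rows
  [["id" "name"]
   ["1" "Ada"]
   ["2" "Grace"]
   ["3" "Edsger"]])

(def example-partitions
  (partition-keeping-header 2 example-rows))
```

With three data rows and N = 2, `example-partitions` holds two partitions, each starting with `["id" "name"]`.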
rows->dataset
(rows->dataset {:keys [header-row? skip-bad-rows?], :or {header-row? true}, :as options} row-seq)
Given a sequence of string[] rows, parse them into columnar data. See csv->columns. This function is useful if you have another way of generating sequences of string[] row data.
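As a sketch of "another way of generating" string[] rows, the snippet below builds them from pipe-delimited lines using only clojure.core and clojure.string. The `line->row` helper and the sample lines are hypothetical. The call to `rows->dataset` is left inside a `comment` form because it needs the library on the classpath; it follows the signature shown above, with the options map first and the row sequence second.

```clojure
(require '[clojure.string :as str])

(defn line->row
  "Hypothetical helper: split one pipe-delimited line into a String[].
  The -1 limit keeps trailing empty fields."
  [line]
  (into-array String (str/split line #"\|" -1)))

;; A sequence of String[] rows; the first row is the header.
(def rows
  (map line->row ["id|name" "1|Ada" "2|Grace"]))

(comment
  ;; Requires tech.v3.dataset on the classpath.
  (require '[tech.v3.dataset.io.string-row-parser :as row-parser])
  (row-parser/rows->dataset {:header-row? true} rows))
```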
sample-rows
(sample-rows {:keys [header-row?], :or {header-row? true}} n row-seq)
Sample at most N rows selected randomly from the row sequence. If the sequence contains fewer than N rows, fewer than N rows are returned. Uses naive reservoir sampling: https://en.wikipedia.org/wiki/Reservoir_sampling
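For readers unfamiliar with the technique, here is a minimal sketch of naive reservoir sampling (Algorithm R) in plain Clojure. It is an illustration under stated assumptions, not the library's code: `reservoir-sample` is a hypothetical name, and the sketch ignores the `header-row?` option and samples every item it is given.

```clojure
(defn reservoir-sample
  "Hypothetical sketch of Algorithm R: return a vector of at most `n` items
  chosen uniformly at random from `coll` in a single pass."
  [n coll]
  (reduce (fn [reservoir [idx item]]
            (if (< idx n)
              ;; Fill the reservoir with the first n items.
              (conj reservoir item)
              ;; Afterwards, item idx replaces a random slot
              ;; with probability n / (idx + 1).
              (let [j (rand-int (inc idx))]
                (if (< j n)
                  (assoc reservoir j item)
                  reservoir))))
          []
          (map-indexed vector coll)))
```

Because the reservoir never grows past N items, the input can be a long lazy sequence and memory use stays bounded by N.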