tech.v3.dataset.column

clone

(clone col)

Clone this column without changing anything.

column-map

(column-map map-fn res-dtype & args)

Map a scalar function across one or more columns. This is the semi-missing-set aware version of tech.v3.datatype/emap. This function is never lazy.

If res-dtype is nil then the result is scanned to infer datatype and missing set. res-dtype may also be a map of options:

Options:

  • :datatype - Set the datatype of the result column. If not given, the result is scanned to infer the result datatype and missing set.
  • :missing-fn - If given, the columns are first passed to missing-fn as a sequence and its return value dictates the missing set. Otherwise the missing set is found by scanning the results during the inference process. See tech.v3.dataset.column/union-missing-sets and tech.v3.dataset.column/intersect-missing-sets for example functions to pass in here.
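A minimal sketch of both inference paths, assuming tech.v3.dataset is on the classpath. The alias ds-col, the column names, and the nil-safe adding function are illustrative choices, not part of the library.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

;; Two columns with missing values at different indexes.
(def a (ds-col/new-column :a [1 2 nil 4]))
(def b (ds-col/new-column :b [10 nil 30 40]))

;; Hypothetical scalar function: returns nil when either input is nil.
(defn safe-add [x y] (when (and x y) (+ x y)))

;; res-dtype is nil: the result is scanned for datatype and missing set.
(def inferred (ds-col/column-map safe-add nil a b))

;; res-dtype is an options map: the missing set comes from :missing-fn.
(def unioned
  (ds-col/column-map safe-add
                     {:missing-fn ds-col/union-missing-sets}
                     a b))

(vec (ds-col/missing inferred))
(vec (ds-col/missing unioned))
```

In both cases the rows where either input is missing should end up in the result's missing set; the second form gets there without relying on the function returning nil.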

column-name

(column-name col)

correlation

(correlation lhs rhs correlation-type)

Correlation coefficient for given 2 columns. Available correlation types are: :pearson :spearman :kendall

Returns a floating-point number between -1 and 1.
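A short sketch of a Pearson correlation between two perfectly linearly related columns, assuming tech.v3.dataset is on the classpath. The alias ds-col and the sample data are illustrative.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

;; ys is exactly 2 * xs, so the Pearson coefficient should be close to 1.
(def xs (ds-col/new-column :x [1.0 2.0 3.0 4.0]))
(def ys (ds-col/new-column :y [2.0 4.0 6.0 8.0]))

(def r (ds-col/correlation xs ys :pearson))
```

Swapping :pearson for :spearman or :kendall selects one of the other correlation types listed above.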

extend-column-with-empty

(extend-column-with-empty column n-empty)

intersect-missing-sets

(intersect-missing-sets col-seq)

Intersect the missing sets of the columns returning a roaring bitmap

is-column?

(is-column? item)

Return true if this item is a column.

is-missing?

(is-missing? col idx)

Return true if this index is missing.

missing

(missing col)

Indexes of missing values. Both iterable and reader.

new-column

(new-column name data)
(new-column name data metadata)
(new-column name data metadata missing)
(new-column data-or-data-map)

Create a new column. Data will be scanned for missing values unless the full 4-argument pathway is used.
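A small sketch of the scanning pathway, assuming tech.v3.dataset is on the classpath. The alias ds-col and the column name :x are illustrative. The nil in the input data is expected to be detected during the scan and recorded in the missing set.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

;; 2-argument form: data is scanned, so the nil at index 2 becomes missing.
(def scanned (ds-col/new-column :x [1 2 nil 4]))

(ds-col/is-column? scanned)
(ds-col/is-missing? scanned 2)
(vec (ds-col/missing scanned))
```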

parse-column

(parse-column datatype col options)
(parse-column datatype col)

Parse a text or a str column, returning a new column with the same name but with a different datatype. This method is single-threaded.

parser-fn-or-kwd is nil by default and can be the keyword :relaxed? or a function. The function must return one of: a parsed value; :tech.v3.dataset/missing, in which case a missing value will be added; or :tech.v3.dataset/parse-failure, in which case a missing index will be added and the string value will be recorded in the metadata's :unparsed-data and :unparsed-indexes entries.

Options:

Roughly the same options as ->dataset; :text-temp-file may be of particular interest.

prepend-column-with-empty

(prepend-column-with-empty column n-empty)

select

(select col selection)

Return a new column with the subset of indexes based on the provided selection. selection can be a list of indexes to select or boolean values where the index position of each true element indicates an index to select. When supplying a list of indexes, duplicates are possible and will select the specified position more than once.
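A minimal sketch of index-based selection with a repeated index, assuming tech.v3.dataset is on the classpath. The alias ds-col and the sample values are illustrative.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

(def col (ds-col/new-column :v [10 20 30 40]))

;; Index 2 appears twice, so its value is selected twice.
(def picked (ds-col/select col [0 2 2]))

(vec picked)
```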

set-missing

(set-missing col idx-seq)

Set the missing indexes for a column. This doesn't change any values in the underlying data store.
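A brief sketch of marking an index as missing after construction, assuming tech.v3.dataset is on the classpath. The alias ds-col and the sample values are illustrative; passing a plain Clojure vector of indexes as idx-seq is an assumption of this example.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

(def col (ds-col/new-column :v [10 20 30]))

;; Mark index 1 as missing; the stored value at that position is not rewritten.
(def marked (ds-col/set-missing col [1]))

(ds-col/is-missing? marked 1)
(ds-col/is-missing? marked 0)
```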

set-name

(set-name col name)

Return a new column with the given name.

stats

(stats col stats-set)

Return a map of stats. Stats set is a set of the desired stats in keyword form. Guaranteed support across implementations for :mean :variance :median :skew. Implementations should check their metadata before doing calculations.
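A minimal sketch requesting two of the guaranteed statistics, assuming tech.v3.dataset is on the classpath. The alias ds-col and the sample data are illustrative.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

(def col (ds-col/new-column :v [1.0 2.0 3.0 4.0]))

;; Ask only for the stats that are needed; the result is a map keyed by stat.
(def summary (ds-col/stats col #{:mean :median}))

(:mean summary)
```

supported-stats can be called on the same column to see which other keywords are accepted.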

string-table-keyset

(string-table-keyset col)

Get the string table for this column. Returns nil if this isn't a string column. This doesn't necessarily tell you the unique set of the column unless you have just parsed a file. It is, when non-nil, a strict superset of the strings in the column.

supported-stats

(supported-stats col)

List of available stats for the column

to-double-array

(to-double-array col & [error-on-missing?])

Convert to a Java primitive array of a given datatype. For strings, an implicit string->double mapping is expected. For booleans, true=1 false=0. Finally, any missing values should be indicated by a NaN of the expected type.
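A small sketch of converting a floating-point column with one missing value, assuming tech.v3.dataset is on the classpath and that error-on-missing? is left at its default. The alias ds-col and the sample data are illustrative.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

;; The nil at index 1 is recorded as missing when the column is created.
(def col (ds-col/new-column :v [1.0 nil 3.0]))

(def arr (ds-col/to-double-array col))

;; The missing position is expected to come back as NaN.
(Double/isNaN (aget ^doubles arr 1))
```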

union-missing-sets

(union-missing-sets col-seq)

Union the missing sets of the columns returning a roaring bitmap
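A short sketch contrasting the union and intersection of missing sets, assuming tech.v3.dataset is on the classpath. The alias ds-col and the column contents are illustrative.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

;; a is missing at index 2; b is missing at indexes 1 and 2.
(def a (ds-col/new-column :a [1 2 nil 4]))
(def b (ds-col/new-column :b [10 nil nil 40]))

;; Union: indexes missing in at least one column.
(def either (ds-col/union-missing-sets [a b]))

;; Intersection: indexes missing in every column.
(def both (ds-col/intersect-missing-sets [a b]))

(vec either)
(vec both)
```

Either function can be passed as the :missing-fn option of column-map, depending on whether a result row should be missing when any input is missing or only when all inputs are.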

unique

(unique col)

Set of all unique values