tech.v3.dataset.column
column-map
(column-map map-fn res-dtype & args)
Map a scalar function across one or more columns. This is the semi-missing-set aware version of tech.v3.datatype/emap. This function is never lazy.
If res-dtype is nil then the result is scanned to infer datatype and missing set. res-dtype may also be a map of options:
Options:
:datatype
- Set the datatype of the result column. If not given, the result is scanned to infer the result datatype and missing set.
:missing-fn
- If given, columns are first passed to missing-fn as a sequence and this dictates the missing set. Otherwise the missing set is found by scanning the results during the inference process. See tech.v3.dataset.column/union-missing-sets and tech.v3.dataset.column/intersect-missing-sets for example functions to pass in here.
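The options-map form of res-dtype can be sketched as follows. This is a minimal illustration, assuming tech.ml.dataset is on the classpath and that nil entries passed to new-column are recorded as missing. The helper safe-add is hypothetical and written defensively, since how missing entries reach map-fn (as nil or as NaN) is not stated above. The missing accessor is taken from the same namespace but is not described in this section.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

;; Two float64 columns with different missing indexes.
(def cm-a (ds-col/new-column :a [1.0 nil 3.0 4.0]))    ; index 1 missing
(def cm-b (ds-col/new-column :b [10.0 20.0 nil 40.0])) ; index 2 missing

;; Hypothetical helper: tolerate nil inputs; NaN inputs propagate through +.
(defn safe-add [x y]
  (if (and (some? x) (some? y))
    (+ x y)
    Double/NaN))

;; res-dtype given as an options map: fix the result datatype and let
;; union-missing-sets dictate the missing set instead of scanning results.
(def cm-sum
  (ds-col/column-map safe-add
                     {:datatype :float64
                      :missing-fn ds-col/union-missing-sets}
                     cm-a cm-b))

;; Indexes missing in either input should be missing in the result.
(println (vec (ds-col/missing cm-sum)))
```

Passing intersect-missing-sets instead would mark a row missing only when every input is missing at that index.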
correlation
(correlation lhs rhs correlation-type)
Correlation coefficient for the given two columns. Available correlation types are :pearson, :spearman, and :kendall.
Returns a floating point number between -1 and 1.
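As a small sketch of the call shape, assuming tech.ml.dataset is on the classpath, two perfectly linearly related columns should give a coefficient at or very near 1 for both the :pearson and :spearman types. The column names and values are illustrative only.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

;; y is exactly 2 * x, so the relationship is perfectly linear and monotonic.
(def corr-x (ds-col/new-column :x [1.0 2.0 3.0 4.0 5.0]))
(def corr-y (ds-col/new-column :y [2.0 4.0 6.0 8.0 10.0]))

(def pearson-r  (ds-col/correlation corr-x corr-y :pearson))
(def spearman-r (ds-col/correlation corr-x corr-y :spearman))

(println pearson-r spearman-r)
```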
intersect-missing-sets
(intersect-missing-sets col-seq)
Intersect the missing sets of the columns, returning a roaring bitmap.
new-column
(new-column name data)
(new-column name data metadata)
(new-column name data metadata missing)
(new-column data-or-data-map)
Create a new column. Data will be scanned for missing values unless the full 4-argument pathway is used.
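The difference between the scanning arities and the 4-argument arity might look like the sketch below. It assumes tech.ml.dataset is on the classpath, that nil entries are detected as missing during the scan, and that the missing argument accepts a plain sequence of indexes. The metadata map and the missing accessor (from the same namespace, not described in this section) are used for illustration.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

;; Three-argument arity: the data is scanned, so the nil at index 1
;; should end up in the missing set.
(def scanned-col
  (ds-col/new-column :price [1.5 nil 3.0] {:unit "USD"}))

;; Four-argument arity: the caller supplies the missing set directly and
;; no scan happens. Index 1 is declared missing although a value is stored.
(def explicit-col
  (ds-col/new-column :price (double-array [1.5 2.0 3.0]) {:unit "USD"} [1]))

(println (vec (ds-col/missing scanned-col)))
(println (vec (ds-col/missing explicit-col)))
```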
parse-column
(parse-column datatype col options)
(parse-column datatype col)
Parse a text or a str column, returning a new column with the same name but with a different datatype. This method is single-threaded.
parser-fn-or-kwd is nil by default and can be the keyword :relaxed? or a function that must return one of: the parsed value; :tech.v3.dataset/missing, in which case a missing value will be added; or :tech.v3.dataset/parse-failure, in which case a missing index will be added and the string value will be recorded in the metadata's :unparsed-data and :unparsed-indexes entries.
Options:
Roughly the same options as ->dataset; :text-temp-file may be of specific interest.
select
(select col selection)
Return a new column with the subset of indexes based on the provided selection. selection can be a list of indexes to select or boolean values where the index position of each true element indicates an index to select. When supplying a list of indices, duplicates are possible and will select the specified position more than once.
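A brief sketch of index-based selection with a repeated index, assuming tech.ml.dataset is on the classpath. The column contents are illustrative.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

(def sel-col (ds-col/new-column :v [10 20 30 40]))

;; Index 0 appears twice, so its value is selected twice; the result
;; follows the order of the index list, not the order of the source column.
(def picked (ds-col/select sel-col [3 0 0]))

(println (vec picked))
```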
set-missing
(set-missing col idx-seq)
Set the missing indexes for a column. This doesn't change any values in the underlying data store.
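For illustration, a minimal sketch that marks two rows as missing, assuming tech.ml.dataset is on the classpath. The missing accessor comes from the same namespace but is not described in this section.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

(def full-col (ds-col/new-column :v (double-array [1.0 2.0 3.0])))

;; Declare indexes 0 and 2 missing. The stored doubles are left in place;
;; only the missing set of the returned column differs.
(def marked-col (ds-col/set-missing full-col [0 2]))

(println (vec (ds-col/missing marked-col)))
```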
stats
(stats col stats-set)
Return a map of stats. Stats set is a set of the desired stats in keyword form. Guaranteed support across implementations for :mean :variance :median :skew. Implementations should check their metadata before doing calculations.
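A short sketch of requesting a subset of statistics, assuming tech.ml.dataset is on the classpath. Only keywords from the guaranteed list above are used.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

(def stats-col (ds-col/new-column :v [1.0 2.0 3.0 4.0]))

;; Request only the statistics needed; the result is a map keyed by
;; the same keywords that were passed in the set.
(def summary (ds-col/stats stats-col #{:mean :variance}))

(println (:mean summary) (:variance summary))
```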
string-table-keyset
(string-table-keyset col)
Get the string table for this column. Returns nil if this isn't a string column. This doesn't necessarily tell you the unique set of the column unless you have just parsed a file. It is, when non-nil, a strict superset of the strings in the column.
to-double-array
(to-double-array col & [error-on-missing?])
Convert to a java primitive array of a given datatype. For strings, an implicit string->double mapping is expected. For booleans, true=1 false=0. Finally, any missing values should be indicated by a NaN of the expected type.
union-missing-sets
(union-missing-sets col-seq)
Union the missing sets of the columns, returning a roaring bitmap.
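The two set operations can be compared side by side in a small sketch, assuming tech.ml.dataset is on the classpath and that nil entries passed to new-column are recorded as missing. The returned bitmaps are converted to Clojure sets only for printing.

```clojure
(require '[tech.v3.dataset.column :as ds-col])

(def ms-a (ds-col/new-column :a [1.0 nil nil 4.0])) ; missing at 1 and 2
(def ms-b (ds-col/new-column :b [nil 2.0 nil 4.0])) ; missing at 0 and 2

;; Union: indexes missing in at least one column.
(def any-missing (ds-col/union-missing-sets [ms-a ms-b]))

;; Intersection: indexes missing in every column.
(def all-missing (ds-col/intersect-missing-sets [ms-a ms-b]))

(println (set (map long any-missing)))
(println (set (map long all-missing)))
```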