public class Rolling
extends java.lang.Object
Fixed and variable length rolling windows. For variable rolling windows the dataset must already be sorted by the target column. Datetime support is provided through specific units in which to perform the rolling operation, such as the keyword :days.
Modifier and Type | Method | Description |
---|---|---|
static java.util.Map | first(java.lang.Object colname) | Reducer that keeps the first value. |
static java.util.Map | fixedWindow(long windowSize) | Return fixed size rolling window. |
static java.util.Map | fixedWindow(long windowSize, clojure.lang.Keyword winPos) | Return fixed size rolling window. |
static java.util.Map | fixedWindow(long windowSize, clojure.lang.Keyword winPos, clojure.lang.Keyword edgeMode) | Return fixed size rolling window. |
static java.util.Map | last(java.lang.Object colname) | Reducer that keeps the last value. |
static java.util.Map | max(java.lang.Object colname) | Max reducer. |
static java.util.Map | mean(java.lang.Object colname) | Mean reducer. |
static java.util.Map | min(java.lang.Object colname) | Min reducer. |
static java.util.Map | reducer(java.lang.Object srcColname, clojure.lang.IFn reduceFn) | Create a columnwise reducer eliding datatype parameter. |
static java.util.Map | reducer(java.lang.Object srcColname, clojure.lang.IFn reduceFn, clojure.lang.Keyword datatype) | Create a columnwise reducer. |
static java.util.Map | rolling(java.lang.Object ds, java.util.Map windowSpec, java.util.Map reducerMap) | Fixed or variable rolling window reductions. |
static java.util.Map | stddev(java.lang.Object colname) | Stddev reducer. |
static java.util.Map | sum(java.lang.Object colname) | Sum reducer. |
static java.util.Map | variableWindow(java.lang.Object colname, double windowSize) | Create a variable window specification with a double windowsize for a particular column. |
static java.util.Map | variableWindow(java.lang.Object colname, double windowSize, clojure.lang.Keyword datetimeUnit) | Create a datetime-specific variable window specification with a double windowsize for a particular column. |
static java.util.Map | variableWindow(java.lang.Object colname, double windowSize, java.lang.Object compFn) | Create a variable window specification with a double windowsize for a particular column and a compFn which must take two values and return a double. |
static java.util.Map | variance(java.lang.Object colname) | Variance reducer. |
public static java.util.Map rolling(java.lang.Object ds, java.util.Map windowSpec, java.util.Map reducerMap)
Fixed or variable rolling window reductions.
windowSpec - Window specification specifying the type of window: either a window over a fixed number of rows, or a window based on a fixed logical quantitative difference, e.g. three months or 10 milliseconds.
reducerMap - Map of destination column name to reducer, where a reducer is a map with two keys: :column-name, which is the input column to use, and :reducer, which is an IFn that receives each window of data as a buffer.
Example:
Map stocks = makeDataset("https://github.com/techascent/tech.ml.dataset/raw/master/test/data/stocks.csv");
//Variable-sized windows require the source column to be sorted.
stocks = sortByColumn(stocks, "date");
Map variableWin = Rolling.rolling(stocks,
Rolling.variableWindow("date", 3, kw("months")),
hashmap("price-mean-3m", Rolling.mean("price"),
"price-max-3m", Rolling.max("price"),
"price-min-3m", Rolling.min("price")));
println(head(variableWin, 10));
//https://github.com/techascent/tech.ml.dataset/raw/master/test/data/stocks.csv [10 6]:
//| symbol | date | price | price-max-3m | price-mean-3m | price-min-3m |
//|--------|------------|-------:|-------------:|--------------:|-------------:|
//| AAPL | 2000-01-01 | 25.94 | 106.11 | 58.92500000 | 25.94 |
//| IBM | 2000-01-01 | 100.52 | 106.11 | 61.92363636 | 28.66 |
//| MSFT | 2000-01-01 | 39.81 | 106.11 | 58.06400000 | 28.66 |
//| AMZN | 2000-01-01 | 64.56 | 106.11 | 60.09222222 | 28.66 |
//| AAPL | 2000-02-01 | 28.66 | 106.11 | 57.56583333 | 28.37 |
//| MSFT | 2000-02-01 | 36.35 | 106.11 | 60.19363636 | 28.37 |
//| IBM | 2000-02-01 | 92.11 | 106.11 | 62.57800000 | 28.37 |
//| AMZN | 2000-02-01 | 68.87 | 106.11 | 59.29666667 | 28.37 |
//| AMZN | 2000-03-01 | 67.00 | 106.11 | 54.65583333 | 21.00 |
//| MSFT | 2000-03-01 | 43.22 | 106.11 | 53.53363636 | 21.00 |
//Fixed window...
Object radians = VecMath.mul(2.0*Math.PI, VecMath.div(range(33), 32.0));
Map sinds = makeDataset(hashmap("radians", radians, "sin", VecMath.sin(radians)));
Map fixedWin = Rolling.rolling(sinds,
Rolling.fixedWindow(4),
hashmap("sin-roll-mean", Rolling.mean("sin"),
"sin-roll-max", Rolling.max("sin"),
"sin-roll-min", Rolling.min("sin")));
println(head(fixedWin, 8));
//_unnamed [8 5]:
//| sin | radians | sin-roll-max | sin-roll-min | sin-roll-mean |
//|-----------:|-----------:|-------------:|-------------:|--------------:|
//| 0.00000000 | 0.00000000 | 0.19509032 | 0.00000000 | 0.04877258 |
//| 0.19509032 | 0.19634954 | 0.38268343 | 0.00000000 | 0.14444344 |
//| 0.38268343 | 0.39269908 | 0.55557023 | 0.00000000 | 0.28333600 |
//| 0.55557023 | 0.58904862 | 0.70710678 | 0.19509032 | 0.46011269 |
//| 0.70710678 | 0.78539816 | 0.83146961 | 0.38268343 | 0.61920751 |
//| 0.83146961 | 0.98174770 | 0.92387953 | 0.55557023 | 0.75450654 |
//| 0.92387953 | 1.17809725 | 0.98078528 | 0.70710678 | 0.86081030 |
//| 0.98078528 | 1.37444679 | 1.00000000 | 0.83146961 | 0.93403361 |
public static java.util.Map variableWindow(java.lang.Object colname, double windowSize)
Create a variable window specification with a double windowsize for a particular column. This specification will not work on datetime columns.
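To make the idea of a window defined by a quantitative difference concrete, here is a minimal plain-Java sketch that does not use the library. It assumes a forward-looking window (row i plus every later row whose key is less than windowSize away), which is inferred from the stocks output above and is not confirmed library behavior. The class and method names (`VariableWindowSketch`, `windowEnds`, `rollingMean`) are hypothetical.

```java
import java.util.Arrays;

class VariableWindowSketch {
    // For each row i, returns the exclusive end index of its window, so the
    // window covers rows i..end-1: every row j >= i with key[j] - key[i] < windowSize.
    // ASSUMPTION: forward-looking window with a strict bound; requires a sorted
    // key column and windowSize > 0 so that every window contains at least row i.
    static int[] windowEnds(double[] sortedKey, double windowSize) {
        int n = sortedKey.length;
        int[] ends = new int[n];
        int end = 0;
        for (int i = 0; i < n; i++) {
            if (end < i) end = i;
            // Because the key is sorted, end only ever moves forward.
            while (end < n && sortedKey[end] - sortedKey[i] < windowSize) end++;
            ends[i] = end;
        }
        return ends;
    }

    // Mean of `values` over each variable-sized window defined on `sortedKey`.
    static double[] rollingMean(double[] sortedKey, double[] values, double windowSize) {
        int[] ends = windowEnds(sortedKey, windowSize);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            double sum = 0.0;
            for (int j = i; j < ends[i]; j++) sum += values[j];
            out[i] = sum / (ends[i] - i);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] key = {0, 1, 2, 5, 6};
        double[] values = {10, 20, 30, 40, 50};
        System.out.println(Arrays.toString(windowEnds(key, 3.0)));
        System.out.println(Arrays.toString(rollingMean(key, values, 3.0)));
    }
}
```

The sorted-input requirement is what lets the end pointer advance monotonically, so the window boundaries are found in a single pass.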
public static java.util.Map variableWindow(java.lang.Object colname, double windowSize, java.lang.Object compFn)
Create a variable window specification with a double windowsize for a particular column and a compFn. The compFn must take two arguments, passed in as (later, earlier), and return a double. This allows the basic Clojure '-' operator to work in many cases.
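The (later, earlier) argument order can be illustrated with a small plain-Java sketch that is independent of the library. The comparison function here is a hypothetical `ToDoubleBiFunction` returning the number of days between two `LocalDate` values; how the library itself invokes compFn is not shown in this page, so the forward-looking strict bound in `windowEnd` is an assumption.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

class CompFnSketch {
    // Hypothetical comparison function: receives (later, earlier) and returns
    // the distance between them in days as a double.
    static final ToDoubleBiFunction<LocalDate, LocalDate> DAYS_BETWEEN =
        (later, earlier) -> (double) ChronoUnit.DAYS.between(earlier, later);

    // Exclusive end index of the window starting at `start`: rows are included
    // while compFn(row, sorted[start]) < windowSize.
    // ASSUMPTION: forward-looking window with a strict bound.
    static <T> int windowEnd(List<T> sorted, int start, double windowSize,
                             ToDoubleBiFunction<T, T> compFn) {
        int end = start;
        while (end < sorted.size()
                && compFn.applyAsDouble(sorted.get(end), sorted.get(start)) < windowSize) {
            end++;
        }
        return end;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = List.of(
            LocalDate.of(2000, 1, 1), LocalDate.of(2000, 1, 3),
            LocalDate.of(2000, 1, 8), LocalDate.of(2000, 1, 9));
        // Seven-day window starting at the first row.
        System.out.println(windowEnd(dates, 0, 7.0, DAYS_BETWEEN));
    }
}
```

Passing the later value first means a plain subtraction yields a non-negative distance on sorted data, which is why a simple minus operator is often sufficient.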
public static java.util.Map variableWindow(java.lang.Object colname, double windowSize, clojure.lang.Keyword datetimeUnit)
Create a datetime-specific variable window specification with a double windowsize for a particular column.
datetimeUnit - One of [:milliseconds, :seconds, :hours, :days, :months].
public static java.util.Map fixedWindow(long windowSize)
Return fixed size rolling window. Window will be fixed over window-size rows.
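As a rough illustration of how a fixed window slides over rows, the following plain-Java sketch computes a rolling mean without the library. It assumes a centered window that starts windowSize / 2 rows before the current row, with out-of-range indexes clamped to the first or last row. That layout is inferred from the sin-roll-mean column in the example output and is not a statement of the library's implementation; the names are hypothetical.

```java
class FixedWindowSketch {
    // Rolling mean over a centered fixed-size window with clamped edges.
    // ASSUMPTION: the window for row i covers indexes i - windowSize/2 through
    // i - windowSize/2 + windowSize - 1, and indexes outside the column repeat
    // the nearest end value.
    static double[] rollingMeanCenterClamp(double[] values, int windowSize) {
        int n = values.length;
        double[] out = new double[n];
        int offset = windowSize / 2;
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int k = 0; k < windowSize; k++) {
                int idx = i - offset + k;
                idx = Math.max(0, Math.min(n - 1, idx)); // clamp to valid rows
                sum += values[idx];
            }
            out[i] = sum / windowSize;
        }
        return out;
    }

    public static void main(String[] args) {
        // Same input as the sin example: 33 points spanning one full period.
        double[] sin = new double[33];
        for (int i = 0; i < sin.length; i++) {
            sin[i] = Math.sin(2.0 * Math.PI * i / 32.0);
        }
        double[] means = rollingMeanCenterClamp(sin, 4);
        // Compare the first few values with the sin-roll-mean column.
        for (int i = 0; i < 3; i++) {
            System.out.println(String.format("%.8f", means[i]));
        }
    }
}
```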
public static java.util.Map fixedWindow(long windowSize, clojure.lang.Keyword winPos)
Return fixed size rolling window. Window will be fixed over window-size rows.
winPos - One of [:left :center :right]. This, combined with the default edge mode of :clamp, dictates the windows of data the reducer sees.
public static java.util.Map fixedWindow(long windowSize, clojure.lang.Keyword winPos, clojure.lang.Keyword edgeMode)
Return fixed size rolling window. Window will be fixed over window-size rows.
winPos - One of [:left :center :right]. This, combined with the edge mode, dictates the windows of data the reducer sees.
edgeMode - One of [:zero, null, :clamp]. :clamp means repeat the end value.
public static java.util.Map reducer(java.lang.Object srcColname, clojure.lang.IFn reduceFn, clojure.lang.Keyword datatype)
Create a columnwise reducer. This reducer gets sub-windows from the column and must return a scalar value. If srcColname is a vector of colnames then reduceFn will be passed each column window as a separate argument.
datatype - Optional datatype; may be nil, in which case the dataset will scan the result to infer the datatype. If provided, this enforces the result column datatype. Reductions to numeric types with fixed datatypes will be slightly faster than generic reductions, which require inference to find the final datatype.
public static java.util.Map reducer(java.lang.Object srcColname, clojure.lang.IFn reduceFn)
Create a columnwise reducer eliding datatype parameter. See documentation on 3-arity form of function.
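The contract of a reducer (it receives a window of values and returns one scalar) can be sketched in plain Java without the library. Here the reducer is modeled as a hypothetical `ToDoubleFunction<double[]>` rather than a `clojure.lang.IFn`, and the windowing in `applyReducer` is a deliberately simplified forward window truncated at the end of the column, not the library's edge handling.

```java
import java.util.Arrays;
import java.util.function.ToDoubleFunction;

class ReducerSketch {
    // Hypothetical custom reducer: spread (max - min) of the values in a window.
    static final ToDoubleFunction<double[]> RANGE = window -> {
        double min = window[0];
        double max = window[0];
        for (double v : window) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return max - min;
    };

    // Applies reduceFn to a window starting at each row, producing one scalar
    // per row. ASSUMPTION: simplified forward window, truncated at the end.
    static double[] applyReducer(double[] column, int windowSize,
                                 ToDoubleFunction<double[]> reduceFn) {
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            int end = Math.min(column.length, i + windowSize);
            out[i] = reduceFn.applyAsDouble(Arrays.copyOfRange(column, i, end));
        }
        return out;
    }

    public static void main(String[] args) {
        double[] prices = {1, 4, 2, 8};
        System.out.println(Arrays.toString(applyReducer(prices, 2, RANGE)));
    }
}
```

Because the result type here is fixed to double, no inference over the output is needed, which mirrors the reason given above for supplying a datatype.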
public static java.util.Map mean(java.lang.Object colname)
mean reducer
public static java.util.Map sum(java.lang.Object colname)
sum reducer
public static java.util.Map min(java.lang.Object colname)
min reducer
public static java.util.Map max(java.lang.Object colname)
max reducer
public static java.util.Map stddev(java.lang.Object colname)
stddev reducer
public static java.util.Map variance(java.lang.Object colname)
variance reducer
public static java.util.Map first(java.lang.Object colname)
reducer that keeps the first value
public static java.util.Map last(java.lang.Object colname)
reducer that keeps the last value