tech.v3.dataset.rolling
Implements generalized rolling windows, including support for time-based variable-width windows.
expanding
(expanding ds reducer-map)
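Run a set of reducers across a dataset using expanding windows. The window for each row contains every row from the start of the dataset up to and including that row, so a sum reducer produces a cumulative sum (cumsum).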
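The expanding-window semantics can be modeled on a plain Clojure vector. This is a minimal sketch: `expanding-reduce` is a hypothetical helper, not a function of this namespace, and it works on vectors rather than datasets.

```clojure
;; Hypothetical helper illustrating expanding-window semantics on a plain
;; vector; it is not the tech.v3.dataset.rolling implementation.
(defn expanding-reduce
  "Apply reducer to each prefix of xs: row i sees xs[0..i]."
  [reducer xs]
  (let [xs (vec xs)]
    (mapv (fn [i] (reducer (subvec xs 0 (inc i))))
          (range (count xs)))))

;; A summing reducer turns the expanding windows into a cumulative sum.
(expanding-reduce #(reduce + %) [1 2 3 4])
;; => [1 3 6 10]

;; A max reducer gives a running maximum.
(expanding-reduce #(apply max %) [3 1 4 1 5])
;; => [3 3 4 4 5]
```

Slicing a prefix for every row is quadratic in total work. A real implementation would likely carry reducer state forward instead; the sketch only shows which rows each window sees.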
rolling
(rolling ds window reducer-map _options)
(rolling ds window reducer-map)
Perform a rolling window operation, appending the result columns to the original dataset.
- ds - source dataset.
- window - either an integer for a fixed window size or a map describing the window operation, containing the keys:
  - :window-type - either :fixed or :variable.
  - :column-name - for variable window operations, the column the window is measured over. It must be monotonically increasing; repeated values, such as the identical dates in the stocks example below, are allowed.
  - :window-size - for fixed window operations, a positive integer. For variable window operations, a double that is compared against the value produced by the comparison function (see :comp-fn).
  - :relative-window-position - where the window is positioned relative to the current row: :left (the window ends at the current row), :center, or :right (the window starts at the current row). Defaults to :center for fixed and :right for variable window types.
  - :edge-mode - for fixed windows, the values to fill in beyond the edges of the source column. Options are :zero, which is 0 for numeric types and nil for object types, and :clamp, which repeats the first or last value of the column respectively. Defaults to :clamp.
  - :comp-fn - if provided, must return a double that is the result of comparing the last value of the range to the first, so clojure.core/- is a reasonable default.
  - :units - for datetime types, the units of :window-size; this also dictates the numeric space when :comp-fn is not provided.
- reducer-map - a map of result column name to reducer map. Each reducer map must contain at least the keys :column-name and :reducer, where :reducer is an ifn that is passed each window. The result column is scanned to ascertain its datatype and missing-value status. Multi-column reducers are supported if :column-name is a vector of column names; in that case each column's window is passed to the reducer as a separate argument. A reducer map can also specify the final datatype with a :datatype key. Beware, however, that this disables missing-value detection for integer datatypes.
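The fixed-window options can be modeled on plain vectors to show which rows each window covers. This is a sketch under stated assumptions: `window-start`, `edge-value`, `fixed-windows`, and `rolling-sketch` are hypothetical helpers, not functions of this namespace. The :center offset of `(quot size 2)` rows is also an assumption. It was chosen because it reproduces the size-10 examples below: row 0's :center window spans rows -5 through 4, which is why its :max is the value at row 4.

```clojure
;; Hypothetical sketch of fixed-window semantics on plain vectors; it is not
;; the tech.v3.dataset.rolling implementation.

(defn window-start
  "Offset of the first window element relative to the current row."
  [size position]
  (case position
    :left   (- (dec size))    ;; window ends at the current row
    :center (- (quot size 2)) ;; assumption: matches the size-10 examples
    :right  0))               ;; window starts at the current row

(defn edge-value
  "Value for index i of xs, filling out-of-range indices per edge-mode."
  [xs i edge-mode]
  (cond
    (< -1 i (count xs)) (nth xs i)
    (= edge-mode :zero) 0           ;; the library uses nil for object columns
    (neg? i)            (first xs)  ;; :clamp before the start
    :else               (peek xs))) ;; :clamp past the end

(defn fixed-windows
  "One window (a vector of size values) per row of xs."
  [xs size {:keys [position edge-mode] :or {position :center edge-mode :clamp}}]
  (let [xs    (vec xs)
        start (window-start size position)]
    (mapv (fn [row]
            (mapv #(edge-value xs (+ row start %) edge-mode) (range size)))
          (range (count xs)))))

(defn rolling-sketch
  "Append one result column per reducer-map entry to columns (a map of column
  name to vector). A vector :column-name passes one window per column to the
  reducer as separate arguments."
  [columns size options reducer-map]
  (into columns
        (for [[result-name {:keys [column-name reducer]}] reducer-map]
          (let [names   (if (vector? column-name) column-name [column-name])
                windows (map #(fixed-windows (get columns %) size options) names)]
            [result-name (apply mapv reducer windows)]))))

(fixed-windows [1 2 3 4 5] 3 {:position :left})
;; => [[1 1 1] [1 1 2] [1 2 3] [2 3 4] [3 4 5]]

(fixed-windows [1 2 3 4 5] 3 {:position :right :edge-mode :zero})
;; => [[1 2 3] [2 3 4] [3 4 5] [4 5 0] [5 0 0]]

;; Multi-column reducer: one window per listed column, default :center/:clamp.
(:c (rolling-sketch {:a [1 2 3 4 5] :b [10 20 30 40 50]} 3 {}
                    {:c {:column-name [:a :b]
                         :reducer (fn [wa wb] (+ (reduce + wa) (reduce + wb)))}}))
;; => [44 66 99 132 154]
```

Materializing every window keeps the sketch easy to check against the tables below. It says nothing about how the library computes them efficiently.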
Fixed Window Examples:
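user> (require '[tech.v3.dataset :as ds]
               '[tech.v3.dataset.rolling :as ds-roll]
               '[tech.v3.datatype.functional :as dfn])
nil
user> (def test-ds (ds/->dataset {:a (map #(Math/sin (double %))
                                          (range 0 200 0.1))}))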
#'user/test-ds
user> (ds/head (ds-roll/rolling test-ds 10 {:mean (ds-roll/mean :a)
                                           :min (ds-roll/min :a)
                                           :max (ds-roll/max :a)}))
_unnamed [5 4]:
| :a | :mean | :min | :max |
|-----------:|-----------:|-----:|-----------:|
| 0.00000000 | 0.09834413 | 0.0 | 0.38941834 |
| 0.09983342 | 0.14628668 | 0.0 | 0.47942554 |
| 0.19866933 | 0.20275093 | 0.0 | 0.56464247 |
| 0.29552021 | 0.26717270 | 0.0 | 0.64421769 |
| 0.38941834 | 0.33890831 | 0.0 | 0.71735609 |
user> (ds/head (ds-roll/rolling test-ds
                                {:window-type :fixed
                                 :window-size 10
                                 :relative-window-position :left}
                                {:mean (ds-roll/mean :a)
                                 :min (ds-roll/min :a)
                                 :max (ds-roll/max :a)}))
_unnamed [5 4]:
| :a | :mean | :min | :max |
|-----------:|-----------:|-----:|-----------:|
| 0.00000000 | 0.00000000 | 0.0 | 0.00000000 |
| 0.09983342 | 0.00998334 | 0.0 | 0.09983342 |
| 0.19866933 | 0.02985027 | 0.0 | 0.19866933 |
| 0.29552021 | 0.05940230 | 0.0 | 0.29552021 |
| 0.38941834 | 0.09834413 | 0.0 | 0.38941834 |
user> (ds/head (ds-roll/rolling test-ds
                                {:window-type :fixed
                                 :window-size 10
                                 :relative-window-position :right}
                                {:mean (ds-roll/mean :a)
                                 :min (ds-roll/min :a)
                                 :max (ds-roll/max :a)}))
_unnamed [5 4]:
| :a | :mean | :min | :max |
|-----------:|-----------:|-----------:|-----------:|
| 0.00000000 | 0.41724100 | 0.00000000 | 0.78332691 |
| 0.09983342 | 0.50138810 | 0.09983342 | 0.84147098 |
| 0.19866933 | 0.58052549 | 0.19866933 | 0.89120736 |
| 0.29552021 | 0.65386247 | 0.29552021 | 0.93203909 |
| 0.38941834 | 0.72066627 | 0.38941834 | 0.96355819 |
user> ;; Multi-column reducer
user> (ds/head (ds-roll/rolling test-ds 10
                                {:c {:column-name [:a :a]
                                     :reducer (fn [a b]
                                                (Math/round
                                                 (+ (dfn/sum a) (dfn/sum b))))
                                     :datatype :int16}}))
_unnamed [5 2]:
| :a | :c |
|-----------:|---:|
| 0.00000000 | 2 |
| 0.09983342 | 3 |
| 0.19866933 | 4 |
| 0.29552021 | 5 |
| 0.38941834 | 7 |
Variable Window Examples:
user> (def stocks (ds/->dataset "test/data/stocks.csv" {:key-fn keyword}))
#'user/stocks
user> ;;variable window column must be monotonically increasing
user> (def stocks (ds/sort-by-column stocks :date))
#'user/stocks
user> (ds/head stocks)
test/data/stocks.csv [5 3]:
| :symbol | :date | :price |
|---------|------------|-------:|
| AAPL | 2000-01-01 | 25.94 |
| IBM | 2000-01-01 | 100.52 |
| MSFT | 2000-01-01 | 39.81 |
| AMZN | 2000-01-01 | 64.56 |
| AAPL | 2000-02-01 | 28.66 |
user> (ds/head (ds-roll/rolling stocks
                                {:window-type :variable
                                 :column-name :date
                                 :units :days
                                 :window-size 3}
                                {:price-mean-3d (ds-roll/mean :price)
                                 :price-max-3d (ds-roll/max :price)
                                 :price-min-3d (ds-roll/min :price)}))
test/data/stocks.csv [5 6]:
| :symbol | :date | :price | :price-mean-3d | :price-max-3d | :price-min-3d |
|---------|------------|-------:|---------------:|--------------:|--------------:|
| AAPL | 2000-01-01 | 25.94 | 57.70750000 | 100.52 | 25.94 |
| IBM | 2000-01-01 | 100.52 | 68.29666667 | 100.52 | 39.81 |
| MSFT | 2000-01-01 | 39.81 | 52.18500000 | 64.56 | 39.81 |
| AMZN | 2000-01-01 | 64.56 | 64.56000000 | 64.56 | 64.56 |
| AAPL | 2000-02-01 | 28.66 | 56.49750000 | 92.11 | 28.66 |
user> (ds/head (ds-roll/rolling stocks
                                {:window-type :variable
                                 :column-name :date
                                 :units :months
                                 :window-size 3}
                                {:price-mean-3d (ds-roll/mean :price)
                                 :price-max-3d (ds-roll/max :price)
                                 :price-min-3d (ds-roll/min :price)}))
test/data/stocks.csv [5 6]:
| :symbol | :date | :price | :price-mean-3d | :price-max-3d | :price-min-3d |
|---------|------------|-------:|---------------:|--------------:|--------------:|
| AAPL | 2000-01-01 | 25.94 | 58.92500000 | 106.11 | 25.94 |
| IBM | 2000-01-01 | 100.52 | 61.92363636 | 106.11 | 28.66 |
| MSFT | 2000-01-01 | 39.81 | 58.06400000 | 106.11 | 28.66 |
| AMZN | 2000-01-01 | 64.56 | 60.09222222 | 106.11 | 28.66 |
| AAPL | 2000-02-01 | 28.66 | 57.56583333 | 106.11 | 28.37 |
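The variable windows above look forward from each row, because :relative-window-position defaults to :right for variable windows. This can be modeled on plain vectors. In the sketch below, `variable-windows` and `days-between` are hypothetical helpers, not part of this namespace. The sketch also assumes that a later row belongs to a window while (comp-fn later-key current-key) stays strictly below :window-size. Run on the five rows printed by (ds/head stocks), the first four windows average to the :price-mean-3d values shown for those rows in the :days example. The fifth differs only because the remaining 2000-02-01 rows are not part of this excerpt.

```clojure
(import '(java.time LocalDate)
        '(java.time.temporal ChronoUnit))

;; Hypothetical sketch of a :right-positioned variable window on plain
;; vectors; it is not the library's implementation. Assumption: row j
;; (j >= i) belongs to row i's window while (comp-fn key-j key-i) is
;; strictly less than window-size.
(defn variable-windows
  [ks values window-size comp-fn]
  (let [ks     (vec ks)
        values (vec values)
        n      (count ks)]
    ;; Keys must be non-decreasing: comp-fn(later, earlier) is never negative.
    (assert (every? (fn [[a b]] (not (neg? (comp-fn b a)))) (partition 2 1 ks))
            "window column must be monotonically increasing")
    (mapv (fn [i]
            ;; Walk forward from row i until the key distance reaches window-size.
            (let [end (loop [j i]
                        (if (and (< j n)
                                 (< (comp-fn (nth ks j) (nth ks i)) window-size))
                          (recur (inc j))
                          j))]
              (subvec values i end)))
          (range n))))

;; Comparison function in whole days: last value minus first value.
(defn days-between [last-date first-date]
  (double (.between ChronoUnit/DAYS first-date last-date)))

;; The five rows printed by (ds/head stocks) above.
(def dates  (mapv #(LocalDate/parse %)
                  ["2000-01-01" "2000-01-01" "2000-01-01" "2000-01-01" "2000-02-01"]))
(def prices [25.94 100.52 39.81 64.56 28.66])

(variable-windows dates prices 3.0 days-between)
;; => [[25.94 100.52 39.81 64.56] [100.52 39.81 64.56] [39.81 64.56] [64.56] [28.66]]
```

Because the key column is sorted, a two-pointer scan would find every window end in linear time. The nested walk here just keeps the sketch short.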