tvm-clj.schedule
After describing the algorithm, the user creates a "schedule" for it. A schedule is a set of transformations to the algorithm, such as tiling a computation across a tensor, that are guaranteed not to change its results.
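The describe-then-schedule split can be sketched as follows. This is a sketch only: `placeholder`, `compute`, `tget`, and `create-schedule` are assumed to be algorithm-construction helpers from a companion `tvm-clj.api` namespace (names assumed; only the schedule functions documented below are defined in this section):

```clojure
;; Sketch: describe an algorithm, then create a schedule for it.
;; Helper names (placeholder, compute, tget, create-schedule) are
;; assumed from a companion tvm-clj namespace.
(require '[tvm-clj.api :as api])

(let [n     (api/variable "n")
      A     (api/placeholder [n] "A")
      ;; describe the algorithm: B[i] = A[i] * 2
      B     (api/compute [n] (fn [i] (api/mul (api/tget A [i]) 2)) "B")
      ;; a schedule starts as the naive loop nest; the functions in
      ;; this namespace transform it without changing the results
      sched (api/create-schedule [(:op B)])]
  sched)
```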
inline-op
(inline-op schedule src-op dst-op rel-axis)
Inline an operation at the given axis. If rel-axis is a number, it indexes the destination operation's axes with Python semantics: non-negative numbers count left-to-right from the first axis, while negative numbers count right-to-left from the last.
rel-axis defaults to -1, or the most-rapidly-changing index.
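A hedged call sketch, assuming `schedule`, `src-op`, and `dst-op` were produced by earlier algorithm construction (not shown):

```clojure
;; Inline src-op's computation into dst-op at dst-op's innermost
;; (most-rapidly-changing) axis; -1 is also the default.
(inline-op schedule src-op dst-op -1)
```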
schedule-cache-write
(schedule-cache-write schedule tensor cache-type)
Returns a new tensor representing the cached write stage; reads of the original tensor's computation are redirected through the cache.
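A hedged usage sketch. The tensor `C` and the `schedule` are assumed from earlier construction, and the cache-type string "local" is assumed to follow TVM's memory-scope names:

```clojure
;; Create a cache stage in "local" (per-thread) memory for C's
;; computation; the returned tensor stands for the cached write.
(def C-cache (schedule-cache-write schedule C "local"))
```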
stage-bind-gpu
(stage-bind-gpu stage block-axis-seq thread-axis-seq)
Bind the GPU-defined axes to the TVM axes. GPUs (CUDA, OpenCL) define a two-level breakdown of axes: blocks and threads. Threads run on the same block and can share a special kind of memory (called shared memory). There can be up to 3 TVM axes per block or thread, labeled (outer iterator to inner iterator): [z y x].
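A hedged call sketch. `stage` is assumed from earlier construction, and `block-axis`/`thread-axis` are assumed to be iteration axes obtained by splitting the stage's loop (not shown):

```clojure
;; Bind the outer axis to GPU blocks and the inner axis to GPU
;; threads. Each seq may hold up to 3 axes, outer->inner: [z y x].
(stage-bind-gpu stage
                [block-axis]    ;; block-axis-seq
                [thread-axis])  ;; thread-axis-seq
```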
stage-compute-at
(stage-compute-at src-stage dst-stage dst-axis)
Compute src-stage at the given axis of dst-stage, nesting its computation inside that loop level of the destination.
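A hedged sketch of the common cache-then-compute-at pattern. `cache-stage`, `consumer-stage`, and the axis `yo` are assumed from earlier construction (e.g. a cache-write followed by a loop split, not shown):

```clojure
;; Produce the cache per iteration of consumer-stage's `yo` axis
;; rather than computing it all up front.
(stage-compute-at cache-stage consumer-stage yo)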
stage-gpu-injective
(stage-gpu-injective stage op & {:keys [thread-count axis], :or {thread-count 16}})
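No docstring is given; judging from the name and arglist alone, this appears to schedule an injective (elementwise) operation for GPU execution — that reading is an assumption. A call sketch against the documented signature, with `stage` and `op` assumed from earlier construction:

```clojure
;; Assumed: GPU schedule for an elementwise op, overriding the
;; default of 16 threads per block.
(stage-gpu-injective stage op :thread-count 32)
```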
stage-parallel
(stage-parallel stage axis)
Indicate that this axis has complete parallelism: its iterations are independent and may be executed concurrently.
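A hedged call sketch, assuming `stage` and its outermost axis `outer-axis` come from earlier construction (not shown):

```clojure
;; Let the backend distribute iterations of the outermost axis
;; across CPU threads.
(stage-parallel stage outer-axis)
```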