Tumbling window
A tumbling window is a strategy for processing stream data in bounded frames, usually time periods. It is used to process the most recent entries of a data stream. A tumbling window divides the data stream into fixed-size, non-overlapping intervals, so every entry belongs to exactly one window. Each window collects either a fixed number of data items or the data from a fixed duration; the window is then closed, its contents are processed, and a new window is opened.
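To make the definition concrete, here is a minimal plain-Python sketch of how a timestamp can be assigned to a time-based tumbling window. The function names and the `origin` parameter are illustrative assumptions, not part of any library, and windows are assumed to be half-open intervals `[start, end)`.

```python
from collections import defaultdict


def tumbling_window(t, duration, origin=0):
    """Return (start, end) of the half-open window [start, end) containing t.

    Hypothetical helper: windows are aligned on `origin` and are `duration` wide.
    """
    start = origin + ((t - origin) // duration) * duration
    return (start, start + duration)


def group_by_tumbling_window(times, duration):
    """Group timestamps by the single tumbling window each one falls into."""
    windows = defaultdict(list)
    for t in times:
        windows[tumbling_window(t, duration)].append(t)
    return dict(windows)


groups = group_by_tumbling_window([12, 13, 14, 15, 16, 17], duration=5)
print(groups)
# {(10, 15): [12, 13, 14], (15, 20): [15, 16, 17]}
```

Because the windows do not overlap, each timestamp lands in exactly one group, and a timestamp equal to a boundary (here 15) opens the next window.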
What's the difference between a tumbling window and a sliding window?
The difference between a tumbling window and a sliding window is whether the fixed-size time intervals overlap. Tumbling windows are intervals that do not overlap. Sliding windows are intervals that do overlap.
For real-time monitoring, you would usually prefer a sliding window over a tumbling one. Tumbling windows cut the data into non-overlapping parts, so a badly placed cut can split a pattern across two windows and prevent it from being detected.
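The effect of a badly placed cut can be illustrated with a small plain-Python sketch. It assumes integer timestamps, half-open windows, and sliding windows that start at every multiple of a `hop` value; the function names and the sample burst of events are made up for illustration.

```python
def tumbling_counts(times, duration):
    """Count events per non-overlapping window [start, start + duration)."""
    counts = {}
    for t in times:
        start = (t // duration) * duration
        key = (start, start + duration)
        counts[key] = counts.get(key, 0) + 1
    return counts


def sliding_counts(times, duration, hop):
    """Count events per overlapping window; a new window starts every `hop`."""
    counts = {}
    for t in times:
        # Smallest multiple of `hop` that is strictly greater than t - duration.
        first = ((t - duration) // hop + 1) * hop
        for start in range(first, t + 1, hop):
            key = (start, start + duration)
            counts[key] = counts.get(key, 0) + 1
    return counts


# A burst of four events that straddles the tumbling boundary at t = 15.
burst = [13, 14, 15, 16]
tumbling = tumbling_counts(burst, duration=5)
sliding = sliding_counts(burst, duration=5, hop=1)

print(max(tumbling.values()))  # 2: the burst is split across two windows
print(max(sliding.values()))   # 4: the window (12, 17) sees the whole burst
```

With an alert threshold of three events per window, the tumbling windows never fire, while at least one sliding window does.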
How can I perform a tumbling window in Python?
You can use Pathway to perform tumbling window operations on your data:
>>> import pathway as pw
>>> t = pw.debug.table_from_markdown(
... '''
...     | shard | t
...   1 | 0     | 12
...   2 | 0     | 13
...   3 | 0     | 14
...   4 | 0     | 15
...   5 | 0     | 16
...   6 | 0     | 17
...   7 | 1     | 12
...   8 | 1     | 13
... ''')
>>> result = t.windowby(
...     t.t, window=pw.window.tumbling(duration=5), shard=t.shard
... ).reduce(
...     pw.this.window,
...     min_t=pw.reducers.min(pw.this.t),
...     max_t=pw.reducers.max(pw.this.t),
...     count=pw.reducers.count(pw.this.t),
... )
>>> pw.debug.compute_and_print(result, include_id=False)
window      | min_t | max_t | count
(0, 10, 15) | 12    | 14    | 3
(0, 15, 20) | 15    | 17    | 3
(1, 10, 15) | 12    | 13    | 2