Railgun: managing large streaming windows under MAD requirements
Authors:
Ana Sofia Gomes,
João Oliveirinha,
Pedro Cardoso,
Pedro Bizarro
Abstract:
Some mission-critical systems, e.g., fraud detection, require accurate, real-time metrics over long sliding windows in applications that demand high throughput and low latency. Because these applications need to run 'forever' and cope with large, spiky data loads, they must also run in a distributed setting. We are unaware of any streaming system that provides all of these properties. Instead, existing systems make large simplifications, such as implementing sliding windows as a fixed set of overlapping windows, jeopardizing metric accuracy (violating regulatory rules) or latency (breaching service agreements). In this paper, we propose Railgun, a fault-tolerant, elastic, and distributed streaming system supporting real-time sliding windows for scenarios requiring high loads and millisecond-level latencies. We benchmarked an initial prototype of Railgun using real data, showing significantly lower latency than Flink and low memory usage independent of window size. Further, we show that Railgun scales nearly linearly, meeting our millisecond-level latency targets at high percentiles (<250 ms at the 99.9th percentile) even under a load of 1 million events per second.
Submitted 23 June, 2021;
originally announced June 2021.
Railgun: streaming windows for mission critical systems
Authors:
João Oliveirinha,
Ana Sofia Gomes,
Pedro Cardoso,
Pedro Bizarro
Abstract:
Some mission-critical systems, such as fraud detection, require accurate, real-time metrics over long time windows in applications that demand high throughput and low latency. Because these applications need to run "forever" and cope with large, spiky data loads, they must also run in a distributed setting. Unsurprisingly, we are unaware of any distributed streaming system that provides all of these properties. Instead, existing systems make large simplifications, such as implementing sliding windows as a fixed set of partially overlapping windows, jeopardizing metric accuracy (violating financial regulator rules) or latency (breaching service agreements).
In this paper, we propose Railgun, a fault-tolerant, elastic, and distributed streaming system supporting real-time sliding windows for scenarios requiring high loads and millisecond-level latencies. We benchmarked an initial prototype of Railgun using real data, showing significantly lower latency than Flink and low memory usage independent of window size.
Submitted 10 November, 2020; v1 submitted 1 September, 2020;
originally announced September 2020.