I've been studying and experimenting with this as a hobby and want to get more serious now.
I've been storing data in flat files, but now I want to experiment with scaling out my ideas and infrastructure so I can collect my own raw data 24/7 across the globe.
I see various time series databases to choose from, but no clear winner. I looked at InfluxDB, TimescaleDB, and several others; most of their material is geared towards IoT rather than finance.
I've been considering a stack built entirely on GCP that looks roughly like:
regional ingestor (compute) -> Pub/Sub -> Dataflow -> Pub/Sub -> Firestore and BigQuery
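For context, here's a minimal sketch of what that middle Dataflow hop could look like in the Beam Python SDK. The topic and table names (raw-ticks, metrics, market_data.raw_ticks) and the JSON tick format are just assumptions for illustration, not anything prescriptive:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Streaming job intended to run on Dataflow.
    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        ticks = (
            p
            | "ReadTicks" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/raw-ticks")  # hypothetical topic
            | "Parse" >> beam.Map(json.loads)
        )

        # Archive the raw feed into BigQuery for backtesting / analysis.
        ticks | "ToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:market_data.raw_ticks",  # hypothetical table
            schema="symbol:STRING,price:FLOAT,size:FLOAT,ts:TIMESTAMP",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )

        # Re-publish (eventually, the aggregated metrics) for clients to consume.
        (
            ticks
            | "Serialize" >> beam.Map(lambda d: json.dumps(d).encode("utf-8"))
            | "PublishMetrics" >> beam.io.WriteToPubSub(
                topic="projects/my-project/topics/metrics")  # hypothetical topic
        )


if __name__ == "__main__":
    run()
```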
The idea is to let clients subscribe to prebuilt aggregation metrics from Dataflow/Beam and to optimize for cross-regional latency. At most, the automated rules would need to react in seconds, not milliseconds; I'd be more than happy with a guaranteed rolling window of 5-15 seconds for my most time-sensitive decisions.
Basic aggregations: OHLC, stdev
Advanced aggregations: values based on custom strategies that would be injected into the feed for a client (automated trading app) to consume and act on.
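To make the basic aggregations concrete, here's a rough sketch of OHLC + stdev as a Beam transform over a 15-second sliding window emitted every 5 seconds, which matches the 5-15 second budget above. The field names (symbol, price, ts) are assumptions about the tick format:

```python
import statistics

import apache_beam as beam
from apache_beam.transforms.window import SlidingWindows


def to_ohlc(kv):
    """kv = (symbol, iterable of (event_ts, price)) for one window."""
    symbol, ticks = kv
    ticks = sorted(ticks)  # order by event timestamp
    prices = [p for _, p in ticks]
    return {
        "symbol": symbol,
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "stdev": statistics.pstdev(prices),
    }


def windowed_metrics(ticks):
    """ticks: PCollection of dicts like {"symbol": ..., "price": ..., "ts": ...}."""
    return (
        ticks
        | "KeyBySymbol" >> beam.Map(lambda t: (t["symbol"], (t["ts"], t["price"])))
        # 15s rolling window, emitted every 5s.
        | "SlidingWindow" >> beam.WindowInto(SlidingWindows(size=15, period=5))
        | "GroupBySymbol" >> beam.GroupByKey()
        | "OHLCStdev" >> beam.Map(to_ohlc)
    )
```

The strategy-specific values would then be further transforms over this metrics stream rather than separate per-client computations.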
Is it crazy to do all the rolling-window / strategy calculations in the Dataflow piece of the architecture, or does it make more sense to compute them per client?
Visually, I am imagining the various signals/strategies as separate Dataflow templates, with a client subscribing to whichever strategy it wants to use.
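For the subscription part, one way I picture it inside the pipeline is each strategy publishing its signals to its own Pub/Sub topic, so a client app just subscribes to the topics for the strategies it uses. A hedged sketch, with made-up strategy names, topic paths, and a placeholder signal function:

```python
import json

import apache_beam as beam

# Hypothetical mapping of strategy name -> output Pub/Sub topic.
STRATEGY_TOPICS = {
    "mean_reversion": "projects/my-project/topics/signals-mean-reversion",
    "momentum": "projects/my-project/topics/signals-momentum",
}


def compute_signal(metric, strategy):
    # Placeholder: real strategy logic over the aggregated metrics goes here.
    return {"strategy": strategy, "symbol": metric["symbol"], "signal": 0.0}


def fan_out_strategies(metrics):
    """metrics: PCollection of aggregated dicts (e.g. the OHLC output above)."""
    for name, topic in STRATEGY_TOPICS.items():
        (
            metrics
            | f"{name}_signal" >> beam.Map(compute_signal, strategy=name)
            | f"{name}_encode" >> beam.Map(lambda s: json.dumps(s).encode("utf-8"))
            | f"{name}_publish" >> beam.io.WriteToPubSub(topic=topic)
        )
```

Whether that lives in one pipeline or as separate template-launched jobs per strategy is exactly the part I'm unsure about.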