A timeseries DB may work fine. 20 million rows per day isn't bad, unless you expect huge spikes over a short duration.
What do your rows look like? What time ranges do you expect to query? Does your data arrive in a stream or large batches? This will help determine a rough IO and storage range needed (or at least worst case).
At this scale, and aiming for low cost, there are trade-offs you can make to cut cost and improve performance. I would keep it simple and test something like TimescaleDB with dummy data first. Chunks of Parquet or ORC data queried with Presto/Athena may also work, depending on your workload.
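As a starting point for that kind of test, here is a rough Python sketch that generates dummy rows and writes them out as one Parquet chunk (column names and value ranges are placeholders, not your real schema); the same files could be pointed at by Athena/Presto or bulk-loaded into Timescale:

    import random
    import time

    import pyarrow as pa
    import pyarrow.parquet as pq

    # ~15 minutes' worth of a 20M-row day, written as one chunk/file.
    N = 20_000_000 // 96
    now = int(time.time())

    # Placeholder columns: a timestamp, a series id, and a value.
    table = pa.table({
        "ts":        pa.array([now + i for i in range(N)], type=pa.int64()),
        "series_id": pa.array([random.randrange(1 << 31) for _ in range(N)], type=pa.int32()),
        "value":     pa.array([random.randrange(1 << 31) for _ in range(N)], type=pa.int32()),
    })

    # One file per time chunk; Athena/Presto can query a partitioned prefix
    # of these, or the rows can be copied into a Timescale hypertable.
    pq.write_table(table, f"chunk_{now}.parquet")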
That aside, other things may make it even easier for you. Can you reduce the problem at all (e.g. reduce the number of rows you need)?
For example:
What is the minimum time period granularity you need for a single datapoint? Lower res = fewer rows.
Can you aggregate upon ingest? (There's a rough sketch of this after the list.)
For distributions, can you use approximations (e.g. t-digest), or do you need exact values?
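To make the aggregate-on-ingest idea concrete, here's a minimal sketch (all names are made up): the raw stream is rolled up into 1-minute buckets as it arrives, so only one row per (series, minute) is stored; a t-digest per bucket could replace the min/max/mean if approximate percentiles are good enough:

    from collections import defaultdict

    BUCKET_SECONDS = 60

    # (series_id, minute_bucket) -> [count, sum, min, max]
    rollups = defaultdict(lambda: [0, 0, float("inf"), float("-inf")])

    def ingest(ts: int, series_id: int, price: int) -> None:
        """Fold one raw tick into its 1-minute rollup instead of storing it as its own row."""
        bucket = ts - (ts % BUCKET_SECONDS)
        agg = rollups[(series_id, bucket)]
        agg[0] += 1
        agg[1] += price
        agg[2] = min(agg[2], price)
        agg[3] = max(agg[3], price)

    # Three ticks in the same minute collapse into a single stored row.
    for price in (10_050, 10_020, 10_075):
        ingest(ts=1_700_000_110, series_id=42, price=price)

    (count, total, lo, hi), = rollups.values()
    print(count, total / count, lo, hi)   # 3 10048.33... 10020 10075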
Rows will be at least (id, other_id, price_1, price_2). The prices can be 32-bit integers, id needs to be a 64-bit integer or a string, and other_id can be a 32-bit int; id and other_id reference other entities. If that normalization isn't possible, each row would also carry a bunch of names and other metadata: at least 6 more strings, but more likely around 10.
The data will basically arrive as a stream.
Granularity is quite important, but I think older data could be made less granular and turned into approximations.
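For a rough sense of raw volume with that normalized row shape (assuming one 64-bit timestamp and a 64-bit id per datapoint, and ignoring per-row and index overhead, which is substantial in Postgres/Timescale; the denormalized variant with ~10 strings per row would be several times larger):

    # Back-of-envelope raw storage for ~20M rows/day.
    ROW_BYTES = 8 + 8 + 4 + 4 + 4          # ts + id + other_id + price_1 + price_2
    ROWS_PER_DAY = 20_000_000

    raw_per_day = ROW_BYTES * ROWS_PER_DAY
    print(f"{raw_per_day / 1e9:.2f} GB/day raw")            # 0.56 GB/day
    print(f"{raw_per_day * 365 / 1e12:.2f} TB/year raw")    # 0.20 TB/year

Columnar formats like Parquet/ORC with compression would typically bring that down further, which is part of why they fit this scale.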