
Ask HN: Recommendation for time series database for ML? - malms
Hello,

We have a system that produces a lot of telemetry: logs of "device use" with many fields (positions, speeds...).

We want them to be easy to read and write, but also easy to use for machine learning. Specifically, it should be easy to extract the parts of the logs that correspond to specific events, a bit like "np.where((field1 == 3) & (field2 == 5))".

Another challenge is that the logs are messy: some values are present in old logs but not in newer ones, or the other way around. Some values are also constant for a whole log, and it would be nice to exploit that structure to reduce disk usage.

Which database system would be flexible enough in terms of both field values and queries? PostgreSQL / InfluxDB / Hadoop / Elastic? I have seen good reviews of all of them, and I am not yet convinced that any one has an advantage over the others, because I guess the devil is in the details.

Thanks!
======
hostedmetrics
You're going to need a data transformation step to turn your data into a
consistent format that your ML can read. AWS's ML, for example, can read CSV
files from S3, among other choices.

Logs are messy. To extract clean data from them, you'll need some messy
logic/code. You'll want to pull that out into one unit that can be reasoned
about in isolation.
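
As a rough illustration of what such an isolated unit could look like, here is a minimal Python sketch using only the standard library. The schema (`FIELDS`), the rename table (`ALIASES`) and the function names are all hypothetical; a real system would substitute its own fields and cleaning rules.

```python
import csv
import io

# Hypothetical target schema; the field names are illustrative only.
FIELDS = ["timestamp", "position", "speed"]

# Hypothetical renames for fields whose names changed between log versions.
ALIASES = {"ts": "timestamp", "pos": "position", "velocity": "speed"}


def normalize(record):
    """Map one raw log record (a dict) onto the fixed schema.

    Fields outside the schema are dropped; missing fields become None.
    """
    renamed = {ALIASES.get(key, key): value for key, value in record.items()}
    return {field: renamed.get(field) for field in FIELDS}


def to_csv(records):
    """Normalize all records and return them as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # csv writes None as an empty cell, so gaps stay visible downstream.
        writer.writerow(normalize(record))
    return buffer.getvalue()
```

Keeping all the messy rules inside `normalize` means they can be tested on their own, and whatever reads the CSV afterwards only ever sees one consistent set of columns.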

