Data pipelines are typically used to translate data from whatever format the system that produces it speaks into a format that's useful for querying.

As an example, you may want to take server request logs and write them to a Postgres table for querying, in which case you'd have something like this:

    Server Logs -> S3 -> Lambda which reads new logs to extract key fields -> Postgres
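The Lambda step might look something like this sketch in Python, assuming a combined-log-style line with a trailing response-time field (the exact log format, field names, and pattern are assumptions for illustration; real server logs vary):

```python
import re

# Assumed combined-log-style format with response time in ms at the end,
# e.g. '203.0.113.9 - - [10/Oct/2024:13:55:36 +0000] "GET /products/123 HTTP/1.1" 200 512 37'
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) \S+ (?P<response_ms>\d+)'
)

def extract_fields(line):
    """Pull the key fields out of one log line, or return None if the
    line doesn't parse (the Lambda would skip or dead-letter those)."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    return {
        "source_ip": m.group("ip"),
        "url": m.group("url"),
        "status": int(m.group("status")),
        "response_ms": int(m.group("response_ms")),
    }
```

The resulting dict maps straight onto an INSERT into the Postgres table; the parse-or-None shape makes it easy to count and inspect lines that don't match the expected format.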
Once that's done, you end up with a database table where each request is a row with fields like URL, source IP, and response time. You'd probably also normalise URLs so that /products/123, /products/123/ and /products?id=123 come out as the same thing for analysis.
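A minimal sketch of that normalisation, assuming the /products route from the example and a canonical /products/{id} form (both assumptions, not anything prescribed above):

```python
import re
from urllib.parse import urlsplit, parse_qs

def normalise_url(url):
    """Collapse equivalent URLs to one canonical form so they
    group together in analysis. Handles the three spellings from
    the example: /products/123, /products/123/ and /products?id=123."""
    parts = urlsplit(url)
    # Drop trailing slashes: /products/123/ -> /products/123
    path = parts.path.rstrip("/") or "/"
    query = parse_qs(parts.query)
    # Query-string form: /products?id=123 -> /products/{id}
    if path == "/products" and "id" in query:
        return "/products/{id}"
    # Path form: /products/123 -> /products/{id}
    if re.fullmatch(r"/products/\d+", path):
        return "/products/{id}"
    return path
```

Replacing the numeric ID with a {id} placeholder rather than keeping it means all product pages aggregate into one row in per-URL queries, which is usually what you want for latency or traffic analysis.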
