"The single requirement of all data pipelines is that they cannot lose data."

If the business value of the data is derived by applying some summary statistics, then even sampling the data works: you can lose events from an event stream without changing the insight gained. Originally, Kafka was designed as a high-throughput data bus for analytical pipelines where losing messages was acceptable. More recently they have been experimenting with exactly-once delivery.
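
As a rough illustration of that trade-off, here is a minimal producer-side sketch assuming the confluent-kafka Python client; the broker address and topic name are placeholders, not anything from the article.

  # Lossy, high-throughput mode vs. a stronger delivery guarantee,
  # sketched with the confluent-kafka Python client.
  from confluent_kafka import Producer

  # acks=0: fire and forget. The producer never waits for a broker
  # acknowledgement, so dropped messages go unnoticed.
  lossy = Producer({
      "bootstrap.servers": "localhost:9092",
      "acks": "0",
  })

  # Idempotent producer with full acknowledgement: retries can no longer
  # introduce duplicates within a partition. End-to-end exactly-once
  # processing additionally layers transactions on top of this.
  safe = Producer({
      "bootstrap.servers": "localhost:9092",
      "acks": "all",
      "enable.idempotence": True,
  })

  for producer in (lossy, safe):
      producer.produce("events", value=b'{"reading": 42.0}')
      producer.flush()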




Yeah, this was a major overstatement. There are lots of data pipelines where it's ok to lose some data. Consider a sensor that sends measurements hundreds of times a second to an app that operates on a 1-second timeframe. And UDP is used all the time on the internet, yet it carries no delivery guarantee.
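
For example, a toy version of that sensor pipeline using only Python's standard library (the port, sample rate, and payload format are made up for illustration): the sender fires UDP datagrams with no delivery guarantee, and the receiver averages whatever arrives in each 1-second window, so a few dropped packets barely move the result.

  import socket
  import struct
  import time

  ADDR = ("127.0.0.1", 9999)  # placeholder host/port

  def sender(n=2000):
      """Emit roughly 200 readings per second, fire-and-forget."""
      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      for i in range(n):
          sock.sendto(struct.pack("!d", 20.0 + (i % 5)), ADDR)
          time.sleep(0.005)

  def receiver():
      """Average whatever arrives in each 1-second window."""
      sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
      sock.bind(ADDR)
      sock.settimeout(0.1)
      window, start = [], time.monotonic()
      while True:
          try:
              data, _ = sock.recvfrom(8)
              window.append(struct.unpack("!d", data)[0])
          except socket.timeout:
              pass  # lost or late packets simply don't contribute
          if time.monotonic() - start >= 1.0:
              if window:
                  print(f"avg of {len(window)} samples: {sum(window)/len(window):.2f}")
              window, start = [], time.monotonic()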



