
Amazon Kinesis Firehose – Simple and Scalable Data Ingestion - hepha1979
https://aws.amazon.com/blogs/aws/amazon-kinesis-firehose-simple-highly-scalable-data-ingestion/
======
djhworld
It says at the bottom of the post that this service is available now, but I
can't see it in the console, at least in eu-west-1 and us-east-1.

~~~
ceejayoz
It often takes several hours for AWS to roll it out to the various consoles.

------
marcog1
At Asana, we've been beta testing Kinesis Firehose. It's been quite convenient
not having to manage much, and having the data end up in S3. We're also using
Kinesis streams, and have a simple KCL app to pull from streams and write to
Firehose. We're looking forward to when streams can be as easy to manage, or
when KCL apps can read from Firehose.

~~~
nolite
I'm confused.. why would you need a KCL app with Firehose (for reading or
writing)?

~~~
nolite
Ah.... because AWS built these as two separate user-facing services, with
completely different APIs and no way to use their common underlying
foundation together.... go Amazon...

------
nivertech
I don't understand why Kinesis Firehose isn't more tightly integrated with
Kinesis Streams, with a unified API.

I want to be able both to read at different offsets in the stream AND to back
it up to S3 or ingest it into Redshift.

With the current offering I need to duplicate data into two different services
with different APIs.

~~~
cherioo
To avoid interference between user applications and theirs? Kinesis has a
pretty delicate throttle per shard, and having a user application that is out
of their control working off the same shard would probably make the Firehose
part much more fragile.

If you're already doing all the Kinesis shard management/KCL willy nilly why
not just dump it to S3 yourself? Firehose seems to be targeting users who
don't want to deal with sharding.

~~~
nivertech
Yes, I figured out that Firehose is a higher-level product than Kinesis.
Maybe in the future it will be extended with an API to read from a delivery
stream.

There are competing products, like Google Cloud Pub/Sub where there is no need
to manage shards manually or run your own workers, like KCL.

------
nivertech

        $ sudo pip install awscli --upgrade
        ...
        $ aws firehose help
    
        FIREHOSE()                                                          FIREHOSE()
    
    
    
        NAME
               firehose -
    
        DESCRIPTION
               Amazon  Kinesis  Firehose  is  a  fully-managed  service  that delivers
               real-time streaming data to destinations such as Amazon S3  and  Amazon
               Redshift.
    
        AVAILABLE COMMANDS
               o create-delivery-stream
    
               o delete-delivery-stream
    
               o describe-delivery-stream
    
               o help
    
               o list-delivery-streams
    
               o put-record
    
               o put-record-batch
    
               o update-destination
    
    
    
        	                                                            FIREHOSE()
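
For anyone who wants to drive `put-record-batch` from code rather than the CLI, here is a rough Python sketch. It assumes a client object exposing boto3's `put_record_batch(DeliveryStreamName=..., Records=[{"Data": ...}])` call, and it assumes a cap of 500 records per call (check the current service limits before relying on that number). The helper names `chunked` and `send_all` are made up for illustration.

```python
from typing import Iterable, List


def chunked(items: List[bytes], size: int) -> Iterable[List[bytes]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_all(client, stream_name: str, payloads: List[bytes],
             max_batch: int = 500) -> int:
    """Send payloads in batches; return how many records the service rejected.

    `client` is assumed to behave like boto3.client("firehose").
    `max_batch` is an assumed per-call record limit, not a verified one.
    """
    failed = 0
    for batch in chunked(payloads, max_batch):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=[{"Data": data} for data in batch],
        )
        # The response reports rejected records instead of raising,
        # so the count has to be checked explicitly.
        failed += response.get("FailedPutCount", 0)
    return failed
```

A non-zero return value means some records were rejected and would need to be resent; this sketch only counts them.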

------
siliconc0w
This is basically kafka +
[https://github.com/linkedin/camus](https://github.com/linkedin/camus) which
is pretty cool.

------
solofounder1
I have a question: if I were to send, say, 10000 event objects into an Amazon
Kinesis Firehose stream, it's clear that I should expect them to show up in an
S3 bucket of my choosing, but should I also expect that my account will not
incur any S3 HTTP POST API request fees?

Is dodging those HTTP POST fees the value-add over simply using the S3 HTTP
API yourself?

~~~
mkobit
Unless I am reading it wrong, it sounds like you need to pay for the requests
as well. From
[https://aws.amazon.com/kinesis/firehose/pricing/](https://aws.amazon.com/kinesis/firehose/pricing/):

> Storage

> You will be billed separately for charges associated with Amazon S3 and
> Amazon Redshift usage including storage and read/write requests. However,
> you will not be billed for data transfer charges for the data that Amazon
> Kinesis Firehose loads into Amazon S3 and Amazon Redshift. For further
> details, see Amazon S3 pricing and Amazon Redshift pricing.

~~~
solofounder1
To me this sounds like a way to avoid the hassle of creating a lambda to
buffer streaming data into chunks before HTTP POSTing them to the S3 API.

But to me it wouldn't make much sense to use Kinesis Firehose unless the fees
were cheaper than what it would cost to utilize AWS Lambda for the same work.
I mean it can't be all that many lines of nodejs code to pop events off a
stream, flush them into a tempfile in batches, HTTP POST those batches to S3
with appropriate error handling/retry logic.
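
A minimal Python sketch of that loop, to make the moving parts concrete. The `upload` callable stands in for whatever actually writes to S3 (it is a placeholder, not a real S3 call), and `drain`, `upload_with_retry`, the batch size, and the backoff values are all hypothetical choices for illustration.

```python
import time
from typing import Callable, Iterable, List


def upload_with_retry(upload: Callable[[bytes], None], body: bytes,
                      attempts: int = 3, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Call upload(body), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            upload(body)
            return
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)


def drain(events: Iterable[bytes], upload: Callable[[bytes], None],
          batch_size: int = 1000, **retry_kwargs) -> int:
    """Group events into newline-delimited batches; return the batch count."""
    batch: List[bytes] = []
    uploaded = 0
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            upload_with_retry(upload, b"\n".join(batch), **retry_kwargs)
            uploaded += 1
            batch = []
    if batch:  # flush the final partial batch
        upload_with_retry(upload, b"\n".join(batch), **retry_kwargs)
        uploaded += 1
    return uploaded
```

With `batch_size=1000`, 10000 events turn into 10 upload calls rather than 10000, which is where any saving on per-request fees would come from.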

Admittedly I haven't crunched the numbers, but just by eyeballing their
pricing I suspect it may be more expensive than using your own Lambda. I
wonder if Kinesis Firehose is implemented internally as an AWS Lambda; it
wouldn't surprise me.

I suppose that eventually the pricing on this service will drop once someone
open sources such a Lambda, especially since installing a Lambda is so
remarkably simple.

~~~
jhurliman
The only complication with this approach right now is that Lambda scripts
can't communicate with anything inside a VPC (such as most Redshift
instances). I imagine they will fix this issue in the future though.

------
homulilly
I'm still a bit ignorant as to how Kinesis works. Can someone explain why this
would be preferable to uploading directly to S3?

~~~
andrew311
Good question. Here's one benefit. With Kinesis you can batch a bunch of
writes to S3 that would have otherwise resulted in many small files in S3.

In other words, you can make small writes to Kinesis and then read out in
larger amounts and write larger files to S3. This is a huge optimization for
any job that runs across the data in S3. Many small files can really undermine
performance in something like Hadoop MapReduce because of the additional
request overhead.
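
Here is a small Python sketch of that small-writes-in, large-files-out idea. It is only an illustration of the buffering pattern, not how Kinesis or Firehose is implemented. The `Buffer` class, its thresholds, and the `flush` callable (a stand-in for "write one object to S3") are all hypothetical.

```python
import time
from typing import Callable, List


class Buffer:
    """Accumulate small writes; emit one large blob on a size or age threshold."""

    def __init__(self, flush: Callable[[bytes], None], max_bytes: int,
                 max_age_s: float, clock: Callable[[], float] = time.monotonic):
        self._flush = flush
        self._max_bytes = max_bytes
        self._max_age_s = max_age_s
        self._clock = clock
        self._parts: List[bytes] = []
        self._size = 0
        self._started = 0.0

    def write(self, data: bytes) -> None:
        if not self._parts:
            self._started = self._clock()  # age is measured from the first write
        self._parts.append(data)
        self._size += len(data)
        too_big = self._size >= self._max_bytes
        # Age is only checked when a write arrives; there is no background timer.
        too_old = self._clock() - self._started >= self._max_age_s
        if too_big or too_old:
            self.close()

    def close(self) -> None:
        """Flush whatever is buffered, if anything."""
        if self._parts:
            self._flush(b"".join(self._parts))
            self._parts, self._size = [], 0
```

Five 4-byte writes with `max_bytes=10` produce two flushed blobs (12 bytes, then 8 on `close()`) instead of five tiny ones.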

------
zkhalique
What are some typical use cases for Amazon Kinesis streams, on the web?

~~~
andrewmunsell
Analytics event or IoT event ingestion, for example. Then, you can hook up AWS
Lambda to the Kinesis stream to actually do any processing/aggregation.

Or, if you're using the new Firehose product, you could also use Lambda and
attach it to the S3 event source using the bucket that the Firehose dumps into
to perform batch processing on these records.
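
As a rough sketch of that second setup, a Python Lambda handler attached to the bucket could start by pulling the bucket and key out of each S3 notification record. The function names are hypothetical and the actual per-object processing is left as a placeholder print.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def objects_from_event(event: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for every record in an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


def handler(event, context):
    """Lambda entry point: list the delivered objects that need processing."""
    objects = objects_from_event(event)
    for bucket, key in objects:
        # Placeholder: fetch the object and run the batch job here.
        print(f"would process s3://{bucket}/{key}")
    return {"objects": len(objects)}
```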

~~~
zkhalique
How is that different than just saving database records?

------
gfodor
Just so I'm understanding, is this a Kafka competitor?

~~~
phillipamann
This looks like an AWS version of Apache Flume and Kinesis looks like an AWS
version of Apache Storm. Does AWS have a Kafka equivalent i.e. a pub/sub
message queue?

~~~
mirceal
Kinesis != Apache Storm. The easiest way to think about Kinesis is as a
managed queue that can remember the history for the past 24 hours.
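
A toy Python model of "a queue that remembers history", purely as a mental aid and not a description of Kinesis internals. Reading does not remove records, any consumer can start from any sequence number, and records simply age out after the retention window. `RetainedLog` and its methods are hypothetical names.

```python
from typing import List, Tuple

RETENTION_S = 24 * 60 * 60  # the 24-hour window mentioned above


class RetainedLog:
    """Append-only log; records stay readable by any consumer until they expire."""

    def __init__(self, retention_s: float = RETENTION_S):
        self._retention_s = retention_s
        # Each entry is (sequence number, timestamp in seconds, data).
        self._records: List[Tuple[int, float, bytes]] = []
        self._next_seq = 0

    def append(self, data: bytes, now: float) -> int:
        """Store a record and return its sequence number."""
        seq = self._next_seq
        self._records.append((seq, now, data))
        self._next_seq += 1
        return seq

    def read_from(self, seq: int, now: float) -> List[Tuple[int, bytes]]:
        """Return unexpired records with sequence >= seq, without consuming them."""
        cutoff = now - self._retention_s
        self._records = [r for r in self._records if r[1] > cutoff]
        return [(s, d) for s, _, d in self._records if s >= seq]
```

The contrast with a classic work queue is that two readers calling `read_from(0, ...)` both see the same records, and either can re-read them later as long as they are still inside the window.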

~~~
phillipamann
Ah OK. I saw that it kept state so I thought it was like Storm. I don't fully
understand Storm. However, Kafka cannot keep state (and neither can Flume).
So Kinesis is a queue that has some state?

------
crabasa
Hmmmm, I certainly hope dang or another admin consolidates all these stories
into a single "Amazon" thread. Can't have multiple stories from a single
company eating up space on the front page.

~~~
ceejayoz
Someone complains about this every time there's an AWS announcement day. Each
product gets substantially different conversations. It'd be very frustrating
to have to skim through dozens of top-level comments to find one on the
service you're interested in.

~~~
crabasa
I hope you detected my sarcasm. I'm 100% with you and was upset when they
consolidated all the MSFT hardware announcement threads the other day.

~~~
ceejayoz
Sorry, I didn't... because someone genuinely complains every time there's a
big AWS dump. :-p

