
Amazon Kinesis - nphase
http://aws.typepad.com/aws/2013/11/amazon-kinesis-real-time-processing-of-streamed-data.html
======
zhaodaxiong
As a team member who helped build the service, I would like to offer some of my
personal understanding. I am not with Amazon now, and all my views are based
on public information on the website.

Like all AWS offerings, Kinesis is a platform. It looks like Kafka + Storm,
with an ecosystem fully integrated with other AWS services. From the very
beginning, reliability, real-time processing, and transparent elasticity
were built in. That's all I can say.

------
mikebabineau
This is essentially a hosted Kafka
([http://kafka.apache.org/](http://kafka.apache.org/)). Given the complexity
of operating a distributed persistent queue, this could be a compelling
alternative for AWS-centric environments. (We run a large Kafka cluster on
AWS, and it is one of our highest-maintenance services.)

~~~
kodablah
We are about to deploy Kafka in our ecosystem and I am curious what
maintenance you have? Can you explain or write a blog post? Is it on 0.8 beta?

We are choosing Kafka over other solutions like RabbitMQ because we like the
persistent txn-log-style messages and how cheap consumers are.

~~~
mikebabineau
We're running 0.7 and most of our problems have been around partition
rebalancing. I'm not the primary engineer on this, but here's my
understanding:

If we add nodes to an existing Kafka cluster, those nodes own no partitions
and therefore send/receive no traffic. A rebalancing event must occur for
these servers to become active. Bouncing Kafka on one of the active nodes is
one way to trigger such an event.

Fortunately, cluster resizing is infrequent. Unfortunately, network
interruptions are not (at least on EC2).

When ZooKeeper detects a node failure (however brief), the node is removed
from the active pool and the partitions are rebalanced. This is desirable. But
when the node comes back online, no rebalancing takes place. The server
remains inactive (as if it were a new node) until we trigger a rebalancing
event.

As a result, we have to bounce Kafka on an active server every few weeks in
response to network blips. 0.8 alleges to handle this better, but we'll see.

Handle-jiggling aside, I'm a fan of Kafka and the types of systems you can
build around it. Happy to put you in touch with our Kafka guy, just email me
(mike.babineau@rumblegames.com). Loggly's also running Kafka on AWS - would be
interesting to hear their take on this.

~~~
idunno246
Pushing a couple terabytes a day through kafka 0.7. We don't use zookeeper on
the producing side and it alleviates this a lot. It's a little more brittle
pushing host/partition configs around, but we accepted loss of data in this
system and it's worth the simplicity of it. Also played with the idea of
putting an elb in front.

I'm having way more trouble with the consumer being dumb with the way it
distributes topics and partitions. End up with lots of idle consumers, while
others are way above max.

~~~
mikebabineau
Thanks for the note, we'll have to take a look at that sort of configuration.

Your consumer problem sounds similar to one we had. Root cause was that the
number of consumers exceeded the number of active partitions. The tricky part
was that the topic was only distributed across part of the cluster (because of
the issue described in my parent post), so we had fewer partitions than we
thought.
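
A minimal sketch of why that happens, assuming the usual Kafka rule that each
partition is read by at most one consumer in a group. The round-robin
`assign_partitions` helper is hypothetical and is not Kafka's actual
rebalancing algorithm; it only illustrates that consumers beyond the partition
count get nothing to do.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin (illustrative only)."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


def idle_consumers(assignment):
    """Return the consumers that ended up with no partitions."""
    return [consumer for consumer, parts in assignment.items() if not parts]


if __name__ == "__main__":
    # Only 4 partitions are active, but 6 consumers were started.
    active_partitions = ["p0", "p1", "p2", "p3"]
    group = ["c0", "c1", "c2", "c3", "c4", "c5"]
    print(idle_consumers(assign_partitions(active_partitions, group)))
```

With four active partitions and six consumers, two consumers stay idle no
matter how the assignment is done.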

------
pvnick
What's going on with Amazon recently? We're seeing a torrent of new
technologies and platform offerings. Are we finally catching a glimpse of
Bezos's grand scheme?

~~~
skorgu
Amazon's re:Invent conference[0] has been going on over the last few days; it's
an obvious time/place to announce.

[0] [https://reinvent.awsevents.com/](https://reinvent.awsevents.com/)

~~~
pvnick
Oh, derp. Well that makes more sense.

------
kylequest
The 50KB limit on data (base64 encoded data) will be a gotcha you'll have to
deal with similar to the size limit in DynamoDB. Now you'll have to split your
messages so they fit inside the Kinesis records and then you'll have to
reassemble them on the other end... Not fun :-)
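
A rough sketch of the split-and-reassemble step, assuming the 50KB limit
applies to the base64-encoded form. The record layout (`id`, `index`, `total`,
`data`) and the helper names are made up for illustration and are not part of
the Kinesis API.

```python
import base64
import uuid

MAX_RECORD_BYTES = 50 * 1024  # limit quoted above; assumed to apply after base64 encoding


def split_message(payload: bytes, max_record_bytes: int = MAX_RECORD_BYTES):
    """Split a payload into chunks whose base64 form fits in one record."""
    # base64 turns every 3 raw bytes into 4 characters, so size raw chunks accordingly.
    raw_chunk = (max_record_bytes // 4) * 3
    message_id = uuid.uuid4().hex  # lets the consumer group chunks of one message
    pieces = [payload[i:i + raw_chunk] for i in range(0, len(payload), raw_chunk)] or [b""]
    return [
        {
            "id": message_id,
            "index": i,
            "total": len(pieces),
            "data": base64.b64encode(piece).decode("ascii"),
        }
        for i, piece in enumerate(pieces)
    ]


def reassemble(records):
    """Rebuild the payload from all chunks of one message, in any arrival order."""
    ordered = sorted(records, key=lambda r: r["index"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("missing chunks")
    return b"".join(base64.b64decode(r["data"]) for r in ordered)
```

In practice you would probably also want all chunks of one message to share a
partition key so they land on the same shard, and a timeout for messages whose
chunks never all arrive.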

------
kylequest
Having to base64 encode data is also a bit awkward. They should be passing
PutRecord parameters as HTTP headers (which they are already using for other
properties) and let users pass raw data in the body.

------
itchyouch
It's interesting to see these messaging platforms and the new use cases
starting to hit the mainstream a la kinesis, storm, kafka.

Some interesting things about these kinds of messaging platforms.

Many exchanges/algo/low-latency/hft firms have large clusters of these kinds of
systems for trading. The open source stuff out there is kind of different from
the typical systems that revolve around a central engine/sequencer (matching
engine).

There's a large body of knowledge in the financial industry on building low-
latency versions of these message processors. Here's some interesting
possibilities. On an e5-2670 with 7122 solarflare cards running openonload,
it's possible to pump a decent 2M 100-byte messages/sec with a packetization of
around 200k pps.

A carefully crafted system using efficient data structures and in-memory-only
stores can pump and process a message through with an average latency of about
15 microseconds, with the 99.9th percentile at around 20 micros. This is a
message hitting a host, getting sent to an engine, then back to the host and
back.

Using regular interrupt-based processing and e1000s probably yields around
500k msgs/sec with average latency through the system at around 100 micros and
99.9th percentiles in the 30-40 millisecond range.

It's useful to see Solarflare's tuning guidelines on building uber-efficient
memcache boxes that can handle something like 7-8M memcache requests/sec.

------
carterschonwald
Before I clicked the link I was hoping Amazon was releasing a clone of the
kinesis keyboard. Anyone else have that initial hope? :-)

~~~
rbanffy
I wondered why would Amazon enter the keyboard market...

~~~
ewoodrich
They already have:

[http://www.amazon.com/AmazonBasics-KU-0833-Wired-Keyboard-
Bl...](http://www.amazon.com/AmazonBasics-KU-0833-Wired-Keyboard-
Black/dp/B005EOWBHC)

~~~
cypher543
I could be wrong, but I don't think Amazon actually designs or manufactures
anything under the AmazonBasics brand. It's like buying a "white box" PC from
a company like MSI and reselling it under your own brand name.

------
dylanz
Can someone with enough knowledge give a high-level comparison of Kinesis
with something like Storm or Kafka?

------
vosper
I'm really excited about this - data streaming has been a crucial missing
piece for building large-scale apps on AWS.

If the performance and pricing are right it's going to relieve a lot of
headaches in terms of infrastructure management.

~~~
cjwebb
Forgive my ignorance, but what would this potentially replace?
Kafka/Storm/Something else?

~~~
hatred
Yep, Amazon's version of Kafka/Storm with pay as you go minus the headaches of
maintaining the cluster.

------
andrewcooke
_it is possible that the MD5 hash of your partition keys isn't evenly
distributed_

how? i mean, apart from poisson stats / shot noise, obviously (and which is
noise, so you can't predict it anyway).

thinking some more, i guess this (splitting and merging partitions in a non-
generic way) is to handle when a consumer is slow for some reason. perhaps
that partition is backing up because the consumer crashed.

but then why not say that, instead of postulating that people are going to have
uneven hashes?

[edit:] maybe they allow duplicates?

~~~
twotwotwo
Yes, duplicates, I think. Looks like the partition key can be set to whatever
you want, so maybe you log, I dunno, hits sharded by page, and your homepage
gets a ton. I'd lean towards sharding randomly to avoid that, but, eh, they're
just giving you enough rope to mess up your logging pipe with.
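
A small sketch of the hot-key case, assuming the MD5 hash of the partition key
is read as a 128-bit integer and that shards split that range evenly. The even
split and the helper names are assumptions for illustration; real shard ranges
can differ after splits and merges.

```python
import hashlib
from collections import Counter


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index via its 128-bit MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # value in [0, 2**128)
    # Assumed layout: shard i owns an equal-sized slice of the hash range.
    return (hash_value * num_shards) >> 128


def shard_load(keys, num_shards):
    """Count how many records land on each shard."""
    return Counter(shard_for_key(key, num_shards) for key in keys)


if __name__ == "__main__":
    # 900 hits on the homepage, 100 hits spread over other pages.
    hits = ["/home"] * 900 + [f"/page/{i}" for i in range(100)]
    print(shard_load(hits, num_shards=4))
```

The hash itself is uniform, but every `/home` record hashes to the same value,
so one shard takes at least 900 of the 1000 records. Distinct or random keys
spread out roughly evenly.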

------
fizx
Seems like a useful reworking of SQS, but all the hard work is being done in
the client: "client library automatically handle complex issues like adapting
to changes in stream volume, load-balancing streaming data, coordinating
distributed services, and processing data with fault-tolerance."

Unfortunately, there's no explanation of the mechanics of coordination and
fault tolerance, so the hard part appears to be vaporware.

~~~
vosper
> Unfortunately, there's no explanation of the mechanics of coordination and
> fault tolerance, so the hard part appears to be vaporware.

I think it's unfair to call it vaporware - Amazon doesn't tend to release
vaporware. You can also be fairly confident this has been in private beta for
some time, so we'll probably see a few blog posts about it from some of their
privileged (big spending) clients - typically someone like Netflix or AirBnB.
But I agree it would be nice to get some more information on the details.

As for the client library handling load-balancing, fault tolerance, etc - that
might not be ideal, but as long as I don't have to do it myself then it might
be okay.

~~~
fizx
The client handling it is ideal from a systems perspective, because the app
won't forget to be fault tolerant on its connection to the server.

It's less ideal from a maintenance perspective, because there will need to be
feature-rich clients in Java and C (with dynamic language bindings).
Applications will be running many many versions of the clients. Also, for
coordination, the clients will need to communicate, so there may be
configuration and/or firewall issues for the app to resolve.

It will be interesting to see Amazon make this tradeoff for what I believe is
the first time.

~~~
aluskuiuc
It's not exactly the first time, but close - the Simple Workflow Service has
client helper libraries for both Java and Ruby.

------
kylequest
The Kinesis consumer API is somewhat equivalent to the Simple Consumer API in
Kafka. You'll have to manage the consumed sequence number yourself. There's no
higher level consumer API to keep track of the consumed sequence numbers.

~~~
kylequest
Looks like AWS decided to put this capability in their Kinesis Client Library,
which keeps track of the checkpoints in DynamoDB.
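
For anyone doing it by hand, a minimal sketch of the bookkeeping involved. The
in-memory `CheckpointStore` stands in for a durable table such as DynamoDB,
sequence numbers are plain ints for simplicity, and none of these names come
from the Kinesis Client Library.

```python
class CheckpointStore:
    """Hypothetical stand-in for a durable checkpoint table."""

    def __init__(self):
        self._checkpoints = {}

    def save(self, shard_id, sequence_number):
        self._checkpoints[shard_id] = sequence_number

    def load(self, shard_id):
        return self._checkpoints.get(shard_id)


def consume(shard_id, records, store, handler):
    """Process (sequence_number, data) pairs newer than the last checkpoint."""
    last = store.load(shard_id)
    for sequence_number, data in records:
        if last is not None and sequence_number <= last:
            continue  # already processed before a restart
        handler(data)
        # Checkpoint after handling, so a crash replays rather than skips a record.
        store.save(shard_id, sequence_number)
```

Checkpointing after every record is the simplest choice but the most
expensive. Checkpointing every N records trades fewer writes for more replay
after a crash, so the handler should tolerate seeing a record twice.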

------
kylequest
Interesting I/O limitations in Kinesis:

1MB/s writes with 1000 writes/s

2MB/s reads with 5 reads/s

~~~
senderista
Per shard.
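
A back-of-the-envelope sketch of what those per-shard numbers mean for sizing
a stream. The constants are the limits quoted above, the function name is made
up, and the reads/s limit is left out for brevity.

```python
import math

# Per-shard limits as quoted in this thread (assumed, not checked against docs).
WRITE_MB_PER_SHARD = 1.0
WRITES_PER_SEC_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_s: float, writes_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies every throughput limit at once."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(writes_s / WRITES_PER_SEC_PER_SHARD),
        math.ceil(read_mb_s / READ_MB_PER_SHARD),
    )


if __name__ == "__main__":
    # 10 MB/s in, 5000 puts/s, 30 MB/s out across all consumers.
    print(shards_needed(write_mb_s=10, writes_s=5000, read_mb_s=30))
```

Here the read side dominates: 30 MB/s of reads at 2 MB/s per shard needs 15
shards, even though writes alone would fit in 10.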

