Using Google Cloud Pub/Sub to Connect Applications and Data Streams (googlecloudplatform.blogspot.com)
117 points by jganetsk on Mar 4, 2015 | 30 comments



I had the chance to test it out during the Oscars [1]: we received about 1.2M tweets via the Twitter Streaming API in 6 hours, published them to Pub/Sub, pulled them with another process, and stored them in BigQuery. I still need to find time to blog about the experience, so I'm happy to collect a few extra questions here.

[1] http://ecesena.github.io/oscars2015/


Did you get any feel for the latency between publishing and consumption? I'm definitely interested, but I'd be looking for a delay of no more than a few seconds between publish and potential consumption.


We didn't collect precise timing data, but I hope to release the code soon, and it's pretty easy to set up a test similar to ours, e.g. with high-volume hashtags such as #nowplaying.
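One simple way to put numbers on publish-to-consume latency is to embed the publish timestamp in each message and subtract it on receipt. The sketch below uses Python's `queue.Queue` as a stand-in for a topic and subscription (an assumption, not the Pub/Sub API), so on its own it only measures the harness. Against the real service, publisher and subscriber run in separate processes or hosts, so you would need wall-clock timestamps (`time.time()`) and reasonably synchronized clocks instead of `time.monotonic()`.

```python
import queue
import statistics
import threading
import time
from typing import List, Optional, Tuple


def measure_latency(n_messages: int = 200, interval_s: float = 0.001) -> List[float]:
    """Publish timestamped messages and record each one's publish-to-receive delay.

    queue.Queue stands in for a Pub/Sub topic + subscription (assumption).
    """
    channel: "queue.Queue[Optional[Tuple[float, int]]]" = queue.Queue()
    latencies: List[float] = []

    def subscriber() -> None:
        while True:
            item = channel.get()
            if item is None:  # sentinel: the publisher is done
                return
            sent_at, _seq = item
            latencies.append(time.monotonic() - sent_at)

    consumer = threading.Thread(target=subscriber)
    consumer.start()
    for seq in range(n_messages):
        # The publish timestamp travels inside the message itself.
        channel.put((time.monotonic(), seq))
        time.sleep(interval_s)
    channel.put(None)
    consumer.join()
    return latencies


if __name__ == "__main__":
    samples = measure_latency()
    print(f"n={len(samples)} median={statistics.median(samples) * 1000:.3f} ms "
          f"max={max(samples) * 1000:.3f} ms")
```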

What I can tell you is that we published messages in batches, closing a batch every 5 seconds or every 100 tweets (at the highest traffic rate, we were receiving and publishing about 100 tweets every 1-2 seconds). This is well below the quota limits.
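The batching policy described above can be sketched as a small size-or-age batcher. This assumes "every 5 seconds or 100 tweets" means whichever limit is hit first; `Batcher`, `flush_fn`, and `tick` are hypothetical names, and `flush_fn` stands for whatever issues one publish call per batch. The injectable `clock` is there only to make the timer testable.

```python
import time
from typing import Callable, List, Optional


class Batcher:
    """Collect messages; flush at max_messages or after max_age_s, whichever comes first."""

    def __init__(
        self,
        flush_fn: Callable[[List[str]], None],  # hypothetical: one publish call per batch
        max_messages: int = 100,
        max_age_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.flush_fn = flush_fn
        self.max_messages = max_messages
        self.max_age_s = max_age_s
        self.clock = clock
        self._batch: List[str] = []
        self._opened_at: Optional[float] = None

    def _expired(self) -> bool:
        return (self._opened_at is not None
                and self.clock() - self._opened_at >= self.max_age_s)

    def add(self, message: str) -> None:
        if not self._batch:
            self._opened_at = self.clock()  # the age limit starts with the first message
        self._batch.append(message)
        if len(self._batch) >= self.max_messages or self._expired():
            self.flush()

    def tick(self) -> None:
        """Call periodically (e.g. once a second) so a quiet stream still gets flushed."""
        if self._batch and self._expired():
            self.flush()

    def flush(self) -> None:
        if self._batch:
            self.flush_fn(self._batch)
            self._batch = []  # fresh list, so the flushed batch is never mutated later
            self._opened_at = None
```

At the average rate implied by the first comment (1.2M tweets in 6 hours is about 56 tweets/s), a 100-message batch fills in under 2 seconds, so the size limit rather than the 5-second timer would usually close batches; the timer mainly bounds the added latency during quiet periods.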

On the subscriber side, we received batches of at most 10 messages. I didn't investigate whether this is a limit or a config param. Messages seemed to arrive pretty much instantly (I was watching 2 shells in parallel, one publishing and one receiving), but again, I don't have precise measurements.


How much did it cost you?


Pub/Sub is currently free. The whole experiment, including 2 days of tests, cost about $10 across App Engine, compute nodes, and BigQuery.


How did you collect the data?

Did you just use the Twitter Streaming API with track?


Streaming API filtered by #oscars and #oscars2015, imported via tweepy.


Thanks, good to know.


This was one of the most rock-solid internal services, and I very much miss it now that I'm ex-Google.


Could you elaborate on what you liked about it? Rock solid as in availability, feature set, or other?


I haven't played with the cloud version yet, but internally it's rock solid in terms of its quality. It's very highly available and reliable, extremely and effortlessly scalable, and has dependably low latency.


> and has dependably low latency

I am particularly interested in this part. If you are able to share any anecdotal experiences, I'd love to hear them.


We won't have a cloud SLA until GA, but for end-to-end latency, our goal is to be only slightly slower than the inherent network latency.


Ah, now that it's been publicly announced, people might find the Node module I wrote for it useful:

https://www.npmjs.com/package/cloud-pubsub


That's great! Looking forward to seeing it updated to the new API revision (v1beta2, which we don't expect to change significantly). Also, please check out our gcloud-node repository, which also offers an idiomatic Cloud Pub/Sub library; the team is planning to move it to v1beta2.

gcloud-node: https://github.com/GoogleCloudPlatform/gcloud-node


thank you


I assume this will replace App Engine's Channel API?


I’d suggest Firebase for scenarios that need real-time notification all the way to a device, app, or browser.

Google Cloud Pub/Sub is better aligned to server-to-server messaging, similar to other Cloud queuing services, service buses, event logs, or open-source systems such as RabbitMQ or Kafka.
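Server-to-server pub/sub in this style typically means a topic that fans out to named subscriptions, each drained by pulling and acknowledging messages. The sketch below is an in-memory model of those semantics, not the Cloud Pub/Sub client API: `Topic`, `Subscription`, `pull`, `ack`, and `redeliver_unacked` are hypothetical names, and the explicit redelivery call stands in for an acknowledgement deadline expiring.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Tuple

Message = Tuple[int, bytes]  # (message id, payload)


@dataclass
class Subscription:
    """Each subscription keeps its own backlog of every message published to the topic."""
    backlog: Deque[Message] = field(default_factory=deque)
    outstanding: Dict[int, bytes] = field(default_factory=dict)

    def pull(self, max_messages: int) -> List[Message]:
        out: List[Message] = []
        while self.backlog and len(out) < max_messages:
            msg_id, data = self.backlog.popleft()
            self.outstanding[msg_id] = data  # delivered but not yet acknowledged
            out.append((msg_id, data))
        return out

    def ack(self, msg_id: int) -> None:
        self.outstanding.pop(msg_id, None)

    def redeliver_unacked(self) -> None:
        # Stand-in for an ack deadline expiring: unacked messages go back in the backlog.
        for msg_id, data in self.outstanding.items():
            self.backlog.append((msg_id, data))
        self.outstanding.clear()


class Topic:
    def __init__(self) -> None:
        self.subscriptions: Dict[str, Subscription] = {}
        self._next_id = 0

    def subscribe(self, name: str) -> Subscription:
        # In this sketch a subscription only sees messages published after it exists.
        return self.subscriptions.setdefault(name, Subscription())

    def publish(self, data: bytes) -> int:
        msg_id = self._next_id
        self._next_id += 1
        for sub in self.subscriptions.values():  # fan-out: one copy per subscription
            sub.backlog.append((msg_id, data))
        return msg_id
```

Each subscription acts like an independent work queue over the same stream, which is what distinguishes this model from a single shared queue.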


This is a free-of-charge beta release; in other words, there's no way yet to tell whether their pricing model will be attractive for your app.

https://cloud.google.com/pubsub/pricing


During the beta period, the service is available for free. Once it comes out of beta, developers will have to pay $0.40 per million for the first 100 million API calls each month. Users who need to send more messages will pay $0.25 per million for the next 2.4 billion operations (that’s about 1,000 messages per second) and $0.05 per million for messages above that.

http://techcrunch.com/2015/03/04/googles-cloud-pubsub-real-t...
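The tiered pricing quoted above is easy to misread, so here is a small calculator for it. It encodes only the beta-era figures from the quote (USD per million API calls), not any current price list, and `monthly_cost` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

# Tiers as quoted above: (calls in tier, USD per million calls); None = no upper bound.
TIERS: List[Tuple[Optional[int], float]] = [
    (100_000_000, 0.40),    # first 100M calls per month
    (2_400_000_000, 0.25),  # next 2.4B calls
    (None, 0.05),           # everything above that
]


def monthly_cost(api_calls: int) -> float:
    """Monthly cost in USD, filling each tier in order before moving to the next."""
    remaining = api_calls
    cost = 0.0
    for size, price_per_million in TIERS:
        in_tier = remaining if size is None else min(remaining, size)
        cost += in_tier / 1_000_000 * price_per_million
        remaining -= in_tier
        if remaining == 0:
            break
    return cost
```

For example, 2.5 billion calls would cost $40 + $600 = $640 under these tiers. For scale, 2.4 billion calls spread over a 30-day month is roughly 926 per second, consistent with the article's "about 1,000 messages per second".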


This is cheap considering the amount of work it would take a company to build and operate this on its own.


Oh man, Pub/Sub, there's a term I haven't heard in a few years. It was a pretty big thing in XMPP [1] not so long ago (standardized only in 2010?), and I remember people being disappointed that the HTTP-based and Google-driven PubSubHubbub [2] was gaining popularity faster. In retrospect, this was a sign of XMPP's loss of steam as a general web protocol :(

Google Cloud Pub/Sub is HTTP(S), so I guess it's based on PubSubHubbub?

[1] http://www.xmpp.org/extensions/xep-0060.html

[2] https://code.google.com/p/pubsubhubbub/


From the FAQ [1]: "While Googlers were closely involved in originating PubSubHubbub, its strengths in RSS and content syndication generally are not use cases that Cloud Pub/Sub is designed to address. Aside from the name, they have very little in common."

[1] https://cloud.google.com/pubsub/faq#pubsubhubbub


Yes, you are right. BTW, the use cases are different, but both use the same core infrastructure, which is widely used within Google and has proven to be scalable and robust :)


Like the idea of future cloud portability? Go ahead and use this service -- Google dares you.


PubSub messaging is definitely a PaaS feature more than an IaaS feature, so it's going to be less portable until the problem is so well pinned down that we get some sort of open-source abstraction over the whole problem and all of its semantics. So, yes, there is a level of lock-in.

However! I know of a number of PaaS products that provide similar functionality, and with some effort, you can build it from AWS or Azure features, or build your own on top of RabbitMQ or Apache projects. The characteristics are going to be different, but it's doable. It might be like a MySQL-to-Postgres migration, or it might be like a MySQL-to-Mongo migration, but there _is_ a migration path. Using a vendor product with unique advantages as a dependency is a known engineering problem with known risks. Choose your dependencies carefully, but it's riskier to take no dependencies and fail to deliver a useful product.


PubSub is conceptually simple. There aren't a ton of moving parts on the consumer side or the producer side.

It'd be pretty easy to swap in different techs if all we're talking about is PubSub.
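One common way to keep that swap cheap is to have application code depend on a narrow interface and put each messaging system behind an adapter. The sketch below assumes a two-method interface; `MessageBus`, `InMemoryBus`, and `archive_tweets` are hypothetical names, and a Cloud Pub/Sub or RabbitMQ adapter (not shown) would implement the same methods on top of that system's client library.

```python
from typing import Callable, Dict, List, Protocol

Handler = Callable[[bytes], None]


class MessageBus(Protocol):
    """The small surface application code depends on (hypothetical interface)."""

    def publish(self, topic: str, data: bytes) -> None: ...

    def subscribe(self, topic: str, handler: Handler) -> None: ...


class InMemoryBus:
    """Local/test backend that delivers synchronously to every registered handler."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def publish(self, topic: str, data: bytes) -> None:
        for handler in self._handlers.get(topic, []):
            handler(data)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers.setdefault(topic, []).append(handler)


def archive_tweets(bus: MessageBus, tweets: List[str]) -> None:
    # Application code sees only MessageBus, never a vendor client.
    for tweet in tweets:
        bus.publish("tweets", tweet.encode("utf-8"))
```

The interface hides the call syntax, not the semantics: delivery guarantees, ordering, ack deadlines, and batching still differ between backends, which is where the "characteristics are going to be different" caveat above tends to bite during a migration.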


Used it, hated it, switched to RabbitMQ on GCE, saved my life


Would you be willing to say why you hated it?


I'd be curious to know what you were expecting



