Show HN: Deepstream.io – open-source real-time server with pub/sub and data sync (deepstream.io)
301 points by troika on Feb 10, 2016 | 83 comments



Interesting self-hosted alternative to Firebase. I have some questions (as someone seriously evaluating this for use)!

  * How mature would you say your code is?
  * Is anyone notable using this in production?
  * What kinds of bugs have you seen so far, and what do you think the
    biggest obstacles will be for you in the future?
  * Is anyone working on this full-time? Do you plan to make money off of this?


Great questions

* How mature would you say your code is?

quite :-)

* Is anyone notable using this in production?

Yes, there are quite a few production use-cases. The latest app going into production using deepstream.io for both web and mobile is https://briteback.com/

* What kinds of bugs have you seen so far, and what do you think the biggest obstacles will be for you in the future?

Mobile devices with flaky connections are always a challenge. So are many users simultaneously working on very small datasets

* Is anyone working on this full-time? Do you plan to make money off of this?

Yes. We're a full time company, based in Berlin. We'll eventually add a PaaS offering based on deepstream (deepstreamhub.com), but that is still quite a bit down the line.


"Yes. We're a full time company, based in Berlin. "

I think you need an imprint according to German law, then.


The "Under the hood" visualization is awesome. I think more projects need to take the time to communicate details like this (without just big blobs of text or links to big piles of source code without a high-level overview).

cheers!

Edit: link directly to the diagram here -- http://www.deepstream.io/#segment-diagram


Personally, I think the time spent on such visualizations is better spent on proper documentation, which is often inadequate. And by proper documentation, I don't mean one sentence per function. Also more useful, I think, is to enhance the documentation with comments for feedback or general user discussions. In that case, if the documentation is lacking, at least the user base can educate itself.

Just my personal opinion.


A picture tells a thousand words. If anything, most opensource projects fail to do proper marketing. This is a great counterexample.

Having said that, they of course aren't mutually exclusive, but I can see both sides of the argument.


> If anything, most opensource projects fail to do proper marketing.

Can you explain why opensource projects need such visual marketing? Shouldn't we choose the tools which are simply better in a technical sense, rather than the tools which have the flashiest pictures?


Not the parent.

But I can't count the number of times I've gone to an open-source project and couldn't even figure out what it did.

And I don't consider that to be the lowest bar – any project that wants to get decent adoption should not only tell me what it does, but tell me why I should use it and not its competition. And do both of these things without me having to read the source code.

Just this week - well perhaps last - I was at the homepage of some new language – and I couldn't find any of this, nor a code example. This might be a technically fantastic language, but if they don't communicate that, adoption is going to suffer - and when it comes to communicating, pictures have a place.


I can't remember seeing gnu.org showing flashy pictures back in the 90s, yet they had massive adoption.


It had word of mouth and was the only free implementation of an OS that ran ok. Now there are 50,000 tools all free and no one hears what anyone says with all the noise.


Ouch, blatant false dichotomy.


Wrong. Marketing is all about convincing, nothing more. Have you ever heard somebody say "I prefer product X because it has such good marketing"?


I think you're doing a disservice to marketing. Whilst I think a lot of it is just made-up hocus pocus as much as the next engineer =), how you present something is important. It's like being a genius inside your head, but not being able to express yourself.

Ultimately, if you combine a good technical foundation with solid communication, you have a winning combination.

Like it or not, that's how the world works. Nobody cares if you have great code and awesome tests, if you have poorly written documentation and you can't tell somebody what you do in a few sentences. Don't underestimate how difficult it is to get these last two right.


But in this case, communication is not to random people but to fellow engineers.

I've never heard anybody complain about the graphical communication of gnu.org, yet everybody seems to have been using the gnu C compiler for the last decades. Just an example.

Finally, have a good look at that picture: http://www.deepstream.io/#segment-diagram And be honest, what does that tell you as an engineer? And what would you take from it as a "random person"?

Is that what you call "solid communication"?


Perhaps the persons making the flashy images and the persons writing the documentation are not the same.


But their salary source may be. Complicated world.


There is a demo video of the parent organisation's product.

https://vimeo.com/143728632

There are times I see stuff done in the browser that makes me feel inadequate, this is one of those times.

That interface is incredible.


Incredible indeed.

Is there any info whether Deepstream is used in the stock trading platform and in which scenarios?


It is used for all communication, including live prices, trade execution and sending historic chart data via RPCs.

We are ramping down Hoxton One though to fully concentrate on deepstream.io (and deepstreamhub.com in the future)


That's a shame but damn if that isn't an impressive demo.

What does the future look like for Golden Layout?


That's a challenging question: It's almost exclusively bought by large enterprise customers with a multi-step request-for-quote acquisition flow, so in order to stay economically viable it's either ramping the price up or going for a consultancy model. What do you think?


I guess it depends on the cost of acquisition. I've not dealt with large companies for a while, but they can be an epic pain to sell to. On the other hand, the consultancy model is a hard thing to get right: the better your product is documented and the easier it is to use, the less likely you are to get the consulting work, which is a kinda perverse incentive (cynically, some of the enterprise stuff I've run across seems to have lots of value and horrible documentation).

I guess it comes down to what percentage of the larger companies would want to pay the consulting/support rates. We don't pay support for anything since we largely use stuff that is well documented or very popular, largely because we've been burnt on the consulting stuff before (no issue with paying a consultant on a product, but when they are less technically competent than the person they are supposed to be helping, that stings). Since you are the author, that is extremely unlikely to be an issue though.

I guess the upside of going the pure consultancy route is in adoption: if a company can adopt your product commercially and it then becomes a core part of their offering, they are more likely to pay since it becomes a critical part of their business. That's always a nice place to be - ask Microsoft/IBM :).


Any plan to open source Hoxton One? I got so excited by the demo that it made me check out your careers page.


What is the algorithm used for JSON synchronization? Operational Transformation, CRDT, diff/patch?


Currently, if a conflict occurs, it’s reported back to the client that triggered it and it's up to that client to resolve it. We are currently working on configurable merge strategies on a per-record level.
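
To make "the conflict is reported back to the client that triggered it" a bit more concrete, here is a minimal sketch of an optimistic version check. This is not deepstream's implementation or API; `VersionedRecord`, `write` and the shape of the result object are hypothetical names made up for illustration.

```javascript
// Hypothetical sketch: a record that rejects writes based on a stale version.
class VersionedRecord {
  constructor(data) {
    this.version = 1;
    this.data = data;
  }

  // A writer states which version its change was based on.
  write(baseVersion, newData) {
    if (baseVersion !== this.version) {
      // Conflict: someone else wrote first. Report it back; the caller decides how to merge.
      return { ok: false, currentVersion: this.version, currentData: this.data };
    }
    this.version += 1;
    this.data = newData;
    return { ok: true, currentVersion: this.version, currentData: this.data };
  }
}

const record = new VersionedRecord({ title: 'draft' });
const resultA = record.write(1, { title: 'edited by A' }); // accepted, version becomes 2
const resultB = record.write(1, { title: 'edited by B' }); // rejected, still based on version 1
```

Here client B gets the current version and data back and has to decide what to do with its own edit, which is where a configurable merge strategy would plug in.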


Why would you need some complicated algorithm for this?


Great question. The granularity of your synchronization impacts the user experience, and sometimes the product offering itself. It can go from "replace the whole object" all the way to "string operations are merged without conflicts to match user intention", with "centralized resource locking" somewhere along the way.

The goal product-wise is to ensure intention preservation, but that is hard to achieve in general as intention is tied to what the product does.

For instance, if you have an integer and an increment operation, say, in a HN-like app to track upvotes. If your synchronization simply resets modified values with a newest-win approach, the following situation would lose an upvote: you have 5 upvotes, A and B both upvote you simultaneously, each setting the number of upvotes to 6. The server tells A that the number of upvotes has been set to 6, then tells B the same thing, and they are both happy. However, product-wise, it should have been 7 instead.
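
The lost upvote can be reproduced in a few lines. This is only a sketch of the scenario described above, with made-up function names; it simulates both clients reading the value 5 before either one writes.

```javascript
// Both clients read the same counter, then each writes back "what I saw + 1".
function lastWriteWins(initial, clientCount) {
  let stored = initial;
  const snapshots = Array.from({ length: clientCount }, () => stored); // everyone reads 5
  for (const seen of snapshots) {
    stored = seen + 1; // each write overwrites the previous one
  }
  return stored;
}

// Same scenario, but clients send the operation ("increment") instead of the result.
function applyIncrements(initial, clientCount) {
  let stored = initial;
  for (let i = 0; i < clientCount; i++) {
    stored += 1; // the server applies each increment to the current value
  }
  return stored;
}

const overwritten = lastWriteWins(5, 2);   // 6: one upvote is lost
const incremented = applyIncrements(5, 2); // 7: both upvotes are counted
```

The difference is that the second version syncs the intention (an increment) rather than the resulting value.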


User A and B both have document X version 1. They both edit this before synchronisation and thus they both have their own version 2. What do?


So what is an easy algorithm then? Keep in mind, "newest update wins" doesn't even begin to cover all possible use cases.


Easiest way I found is to use an "append-only" mentality to avoid conflict entirely. So, to take the vote example, using:

  [
    {ts: <upvoted_time>, author: <id_of_author>, action: 'up'},
    {ts: <upvoted_time>, author: <id_of_author>, action: 'down'},
    {ts: <upvoted_time>, author: <id_of_author>, action: 'up'}
  ]
You can sync from the server the changes after a certain timestamp, and the client can calculate the accurate value.
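
A rough sketch of the client side of this, assuming the log shape above. The timestamps and author ids are made up, and `entriesAfter` and `score` are hypothetical helper names.

```javascript
// Hypothetical vote log in the shape used above; values are made up.
const voteLog = [
  { ts: 1000, author: 'alice', action: 'up' },
  { ts: 1005, author: 'bob', action: 'down' },
  { ts: 1010, author: 'carol', action: 'up' },
];

// Entries the client has not seen yet (strictly newer than its last sync).
function entriesAfter(log, lastSeenTs) {
  return log.filter((entry) => entry.ts > lastSeenTs);
}

// Fold the log into a score: +1 per 'up', -1 per 'down'.
function score(log) {
  return log.reduce((sum, entry) => sum + (entry.action === 'up' ? 1 : -1), 0);
}

const total = score(voteLog);              // 1 - 1 + 1 = 1
const fresh = entriesAfter(voteLog, 1000); // the two entries after ts 1000
```

Since entries are only ever appended, two clients voting at the same time add two entries instead of overwriting each other.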


That common approach is both elegant and tricky. It requires very precise clocks and time synchronization, as some non-commutative operations are order-dependent (such as list insertion).

That is why Google's TrueTime API (introduced in Spanner[0]) is such a big deal.

[0]: http://static.googleusercontent.com/external_content/untrust...


Actually, for most use cases, newest update wins is sufficient if it can be done on a fine-grained basis (one property of an object/document). It's what web applications have been doing forever, and being a 'realtime' framework doesn't change this if your use case isn't something like Google Docs.
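
As a sketch of what "newest update wins per property" could look like: each property carries its own timestamp and is merged independently. The `{ value, ts }` shape and `mergeByProperty` are assumptions for illustration, not how any particular framework stores its data.

```javascript
// Each property carries its own timestamp: { value, ts }.
// On equal timestamps the value from `a` is kept.
function mergeByProperty(a, b) {
  const merged = { ...a };
  for (const [key, incoming] of Object.entries(b)) {
    const existing = merged[key];
    if (!existing || incoming.ts > existing.ts) {
      merged[key] = incoming;
    }
  }
  return merged;
}

const fromA = { name: { value: 'Husky', ts: 2 }, toy: { value: 'ball', ts: 1 } };
const fromB = { name: { value: 'Dog', ts: 1 }, toy: { value: 'rope', ts: 3 } };
const merged = mergeByProperty(fromA, fromB);
// merged.name.value is 'Husky' (A is newer), merged.toy.value is 'rope' (B is newer)
```

Edits to different properties never clobber each other this way; only two edits to the same property still race.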


If you like this, also take a look at RethinkDB: https://www.rethinkdb.com/


they wrote a blogpost about how to use both together: https://rethinkdb.com/blog/deepstream/


They also have a nice RethinkDB storage connector and search provider.

[0] - https://github.com/deepstreamIO/deepstream.io-storage-rethin...

[1] - https://github.com/deepstreamIO/deepstream.io-provider-searc...


Is there some way to get this working with Safari's terrible audio (don't even get me started on video) streaming api?

I know this isn't your fault, and I'm not holding my breath. But the landing page is so beautiful, it felt like there was a chance you had answered my prayers haha!


Excellent question :-)

Most important tip for any WebRTC call: Mute your own audio output, otherwise it completely ruins the call. Other than that, here's a list of tips we've found useful when building the WebRTC feature: http://deepstream.io/tutorials/webrtc-tips.html


I guess I was wondering if it was possible to request an audio stream from a safari user (rather than play one to them).

My understanding is that the html5 api calls for this return null (i.e. they are unimplemented).

I guess I was hoping that you guys had polyfilled in a (ew) flash plugin which is the only way to do this at the moment, giving google hangouts a monopoly.

Maybe I have to write a framework ha! Even better, expose an api for me to build this into via a community package and make wrappers for meteor! (lol)


@wolframhempel or whoever is making this: thanks for making it. However, your WebRTC demo doesn't work: both of the links are broken on GitHub: https://deepstream.io/tutorials/webrtc.html

Can we see a live working demo?

Also, the example on that page is not working because adaptor.js is not found, but it says that your browser is not WebRTC compatible.


Cheers, will look into this


There is also Meatier https://github.com/mattkrick/meatier which is a more loosely-coupled alternative to Meteor.


Looks interesting. What is the maturity of it?


It's a new project, but it's made up of very established modules and libraries. It's an aggregation of a bunch of tools which work well together but without locking you in to any of them.


Is this similar to firebase https://www.firebase.com/?


Yes, but there are a number of core differences

- Deepstream.io is a free open source server, not a PaaS offering

- Deepstream offers pub-sub, request-response and web-rtc call management in addition to data-sync

- Firebase’s data-sync approach is based on one large chunk of JSON data that allows you to observe and manipulate sub-paths. Deepstream does the same, but breaks the data down into individual units, called records

- Deepstream uses a functional permissioning model, allowing you to interface with other systems (databases / Active Directory) for user management, as opposed to Firebase's configuration-based permissioning approach


Does look very good. Are there any stress tests available? How many "devices" can an instance handle (yes, I know this is arbitrary)?


Yes, you can find the results at http://deepstream.io/info/performance-single-node-vs-cluster..., as well as a link to the test harness if you want to run the tests yourself


Question, how can I expand Deepstream.io? For instance I want it to become an install-able API server for blockchain (like Ethereum). How do I do it? Is there a plugin architecture in place? Would I have to modify the main source code?


deepstream can connect to data-bases, caches and message busses using a simple plugin architecture. Please find a list of available ones at

http://deepstream.io/download/

tutorials on how to use them at

http://deepstream.io/tutorials/connectors-and-deployment.htm...

and help with writing your own at

http://deepstream.io/tutorials/writing-storage-cache-connect... http://deepstream.io/tutorials/writing-messaging-connector.h...


Funny to see so many pubsub things on the home page today. I just finished making a NodeJS CLI app that syncs the song you are playing on spotify to others (https://github.com/jonovono/spotluck)

I wanted a pubsub service so I could send the songs around. I wanted to be able to use it in an open source app and ideally client to client.

The one I ended up going with was Faye (http://faye.jcoglan.com/) but I tried a few others.

Will take a look at this! Looks neat.


Although I could only look at the surface up to now, this looks really interesting!

I always wondered why there are dozens of RPC-only and Pub/Sub-only protocols out there and nothing that is really suitable for record/property synchronization. Because this is an important feature in my domain, I have implemented something quite similar to deepstream for my needs, but currently a little bit more limited (pure client-server model, there are no client-hosted RPC methods, no auth, etc.). I chose to make synchronization unidirectional, which avoids the need for merging and conflict resolution and works well for my use cases. There I would have a model where the client must call an RPC method on the server, that method would change a property, and this would then get synced back to the client. I guess that would also be possible with deepstream, but I would need to configure it so that only a single client can manipulate the property.

Some tech questions:

- Are there any ordering guarantees? E.g. between different RPC calls as well as between RPC calls and messages and property updates? Giving those guarantees can increase the difficulty of implementation a lot, but reduces complexity in API design.

- Have you also thought about Meteor-like optimistic updates? I thought it could probably be implemented by putting some additional information in the results of RPC calls: some information about which properties were updated as a side effect of the call and in which version of these properties the updates are applied.


I skimmed the docs, but I'm unsure about the semantics. The pubsub, what happens to slow clients? The same for synchronization.

Basically - how it behaves when the clients are not behaving.


Records use a version number for every update to keep track of missing messages. Events are currently unversioned and won't be queued etc. while a client is offline.
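
To illustrate how a version number per update lets a client notice missing messages, here is a small sketch. `UpdateTracker` is a hypothetical client-side helper, not deepstream's client code, and what to do on a gap (for example re-requesting the full record) is an assumption.

```javascript
// Hypothetical client-side tracker; not deepstream's client code.
class UpdateTracker {
  constructor() {
    this.lastVersion = 0;
  }

  // Returns 'applied', 'duplicate', or 'gap' for an incoming update version.
  receive(version) {
    if (version <= this.lastVersion) return 'duplicate';
    if (version > this.lastVersion + 1) return 'gap'; // something was missed in between
    this.lastVersion = version;
    return 'applied';
  }
}

const tracker = new UpdateTracker();
const outcomes = [1, 2, 4, 2].map((v) => tracker.receive(v));
// ['applied', 'applied', 'gap', 'duplicate']
```

Unversioned events have no such counter, so a client that was slow or offline has no way to tell that it missed one.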


How is it better than socket.io or http://socketcluster.io/


Have a look at this comment

https://news.ycombinator.com/item?id=11074511

he explains it in a very good way


Really nice to see Deepstream being on the front page of HN! I'm currently using Deepstream in production for one of our client's in-house applications, so unfortunately I can't share a link. I'd like to give a few words on why I chose Deepstream.

TL;DR Deepstream is really good!

When I first found Deepstream and skimmed through the docs, I instantly noticed how thoroughly planned and well thought out it was. The features Deepstream offers ended up making us able to drop REST HTTP completely. Normally when you build a back-end, you have all your application logic in one place (authentication, CRUD API, sending emails, etc.), but the way RPCs and Providers in Deepstream are implemented, you want to and can easily write your back-end as microservices. For example, you build a client that receives RPC calls for hashing passwords, and if that client comes under heavy load, you can instantly spin up another instance of that client, connect it to the Deepstream server and register for the RPC; the Deepstream server will then distribute the RPC requests evenly among the two instances. Normally when you scale, you scale the entire back-end. Deepstream allows you to scale parts of your back-end really easily (e.g. the part that sends emails, the part that hashes passwords or the part that crops & scales images). This allows for some really efficient fine-tuning of your back-end.

Another awesome feature is Providers. Providers can do a lot of things, for example transform a standard third-party HTTP REST API (e.g. Facebook) into an integrated record list in Deepstream that is synchronized in real-time. Another example is writing an efficient real-time provider that integrates with literally anything - filesystem, databases, logs, hardware utilization, nginx requests, third-party APIs, w/e. The capabilities of Providers are endless.
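
To sketch the "spin up a second instance and the requests get spread out" idea: the dispatcher below hands each call to the next registered provider in turn. Round-robin is an assumption about what "evenly" means here, and `RpcDispatcher`, `register` and `call` are hypothetical names, not deepstream's API. The handlers are stand-ins that only report which instance ran; they don't actually hash anything.

```javascript
// Hypothetical dispatcher; `register` and `call` are illustrative names only.
class RpcDispatcher {
  constructor() {
    this.providers = new Map(); // rpc name -> { handlers: [], next: 0 }
  }

  // A provider instance announces that it can handle the named RPC.
  register(name, handler) {
    if (!this.providers.has(name)) {
      this.providers.set(name, { handlers: [], next: 0 });
    }
    this.providers.get(name).handlers.push(handler);
  }

  // Route the call to the next provider in round-robin order.
  call(name, payload) {
    const entry = this.providers.get(name);
    if (!entry || entry.handlers.length === 0) {
      throw new Error(`no provider registered for ${name}`);
    }
    const handler = entry.handlers[entry.next];
    entry.next = (entry.next + 1) % entry.handlers.length;
    return handler(payload);
  }
}

const dispatcher = new RpcDispatcher();
dispatcher.register('hash-password', (pw) => `instance-1:${pw.length}`);
dispatcher.register('hash-password', (pw) => `instance-2:${pw.length}`);
const handledBy = ['a', 'bb', 'ccc', 'dddd'].map((pw) => dispatcher.call('hash-password', pw));
// ['instance-1:1', 'instance-2:2', 'instance-1:3', 'instance-2:4']
```

The caller never needs to know how many instances exist, which is what makes scaling just one part of the back-end cheap.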

Deepstream is insanely fast - I've worked with Meteor & other real-time solutions before and Deepstream simply outperforms them by a long shot. This is mainly thanks to the Cache Connectors that are available, for example the Redis Cache Connector. Writing Storage- or Cache Connectors is really simple and straight-forward and seamlessly integrates with any kind of storage you want. Heck you can even write a Storage Connector that stores its data in Firebase if you want to be really crazy (and idiotic). Or maybe in a GitHub repository? These are just stupid examples to illustrate how extendable Deepstream really is.

I've just scratched the surface of Deepstream, so I really suggest that you go explore it yourself!

I see a lot of questions regarding how Deepstream differs from Socket.io, SocketCluster, Meteor, etc. I'll go through each respectively a bit:

Socket.io is nothing more than a WebSocket server. Both Deepstream and Socket.io use Engine.io (can be configured in Deepstream) behind the scenes, but comparing the two is not really optimal as Deepstream is Socket.io plus a ton of additional features and performance improvements. You can technically re-build Deepstream by using Socket.io if you want.

Meteor is quite similar, but it has additional features (like being a full-stack framework) and their implementation of real-time sync is really bad. Meteor is by default locked down to MongoDB and it's a pain to write support for any other database, as they've also locked you down to the fibers coroutines and you need a client-side implementation of it as well for caching purposes. The number of connected clients the two support is massively in Deepstream's favor, and Meteor doesn't even come close to being as distributable as Deepstream is. Believe it or not, but I used to work with Meteor until I found Deepstream.

SocketCluster and Deepstream are very similar, and unfortunately I haven't tried SocketCluster apart from reading the docs but to me, Deepstream is more developer friendly and straightforward and looks to have a better implementation. Since SocketCluster is on the front-page of HN as well, just looking at the live example they have on their website shows that it's not able to keep up. The WebSocket connection is dropping a lot.

For those of you who use front-end frameworks like Angular/Aurelia/React/Vue/etc you'll find it super simple to integrate Deepstream's client directly into the bindings of the framework. For example (Aurelia): `<input type="text" value.stream="pets.husky.favoriteToy">` would setup three-way bi-directional databinding to `pets/husky` on the property `favoriteToy`. (I'm currently in the process of doing exactly this for Aurelia)


Thanks, brilliant point about deepstream's role in a micro service architecture


Do you have any suggestions on how to get started with deepstream + Polymer (web component library by Google, www.polymer-project.org/1.0/ )? I have been doing some research on your website and I found tutorials for every major frontend framework except Polymer. I would really appreciate a tutorial as a starting point for further projects; deepstream looks very promising to me!


How does it compare to Meteor? And about the claim: 'exceptionally fast'. What does this mean? Compared to what? Any benchmarks?


deepstream.io (unlike Meteor) isn't a full framework. The server can easily connect to a wide range of databases, caches or messaging systems. Similarly, the client library can be used with React, Angular, Knockout or whatever else your heart desires.

In terms of speed, please find performance test results here http://www.deepstream.io/info/performance-overview.html as well as a test harness that allows you to replicate them.


How does this compare to EventSource in terms of pubsub functionality? Are messages persisted in any way?


API-wise, deepstream.io's pub-sub mechanism is more comparable to JavaScript event emitters. Events in deepstream are a one-off messaging concept; for persistent data, just use data-sync with records.
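
A small sketch of the difference between one-off events and persisted state. `TinyEmitter` and `TinyRecord` are minimal stand-ins written for this example; neither is deepstream's API.

```javascript
// Minimal stand-ins for illustration; neither class is deepstream's API.
class TinyEmitter {
  constructor() { this.listeners = new Map(); }
  on(name, fn) {
    if (!this.listeners.has(name)) this.listeners.set(name, []);
    this.listeners.get(name).push(fn);
  }
  emit(name, data) {
    for (const fn of this.listeners.get(name) || []) fn(data);
  }
}

class TinyRecord {
  constructor() { this.value = undefined; this.subscribers = []; }
  set(value) {
    this.value = value;
    for (const fn of this.subscribers) fn(value);
  }
  subscribe(fn) {
    this.subscribers.push(fn);
    if (this.value !== undefined) fn(this.value); // late subscribers still get the current state
  }
}

const seenByEvent = [];
const seenByRecord = [];

const events = new TinyEmitter();
events.emit('price', 42);                        // nobody is listening yet: the event is gone
events.on('price', (p) => seenByEvent.push(p));

const priceRecord = new TinyRecord();
priceRecord.set(42);                             // the state is kept
priceRecord.subscribe((p) => seenByRecord.push(p));
```

The late event listener never sees 42, while the late record subscriber does, which is the reason to reach for records when messages need to outlive the moment they were sent.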


After a streak of HN articles loading slowly or crashing my old mobile browser, your site loaded instantly! I have a heuristic bias that whether or not a web-tech company's web site crashes my mobile browser reflects the quality of their technology.


Hehe, thanks. I'll pass it on to AWS cloudfront



Beyond using a JSON model, what differentiates this so much from other key/value stores (e.g. Redis)?


deepstream is a realtime server for data-sync, pub-sub, request-response and WebRTC. That's a very different thing from Redis (although deepstream can use Redis for caching and message distribution)


How is security dealt with? I tried to find it in the documentation, but without luck so far.


- deepstream supports encryption via https / wss / ssl

- once a connection is established, each client has to log in. The server then decides whether to accept or reject the connection based on the authentication and connection data

- from then on, every incoming message is authenticated independently

all this happens in a "permissionHandler". please find more information here

http://deepstream.io/tutorials/authentication.html

http://deepstream.io/tutorials/permissioning.html


297KB (114KB minified) seems a bit much for a client library, given its feature set.


"We know you're worried about security, and so are we."

Please enable https for your website. Even the fact that I am looking at your site might interest others on or near my network.

Thank you.


Can https hide that you look at a web site? How?


It can. The domain name is not sent in the clear in an HTTPS request. Instead, the connection is opened against the IP address and the domain name is moved to a Host: header which is encrypted.


SNI sends the server name in plain text


Correct (I considered explaining SNI but ultimately didn't bother), and a determined attacker could probably figure out a large number of the sites you visit by the IP as well.


Right, and even still it's pretty trivial to reverse a domain from an IP address in most cases.


how does this compare to express/koa/hapi/strongloop/socket.io?


- express, koa and hapi are http server frameworks, that's just a very different thing

- strongloop offers pub/sub capabilities as well, but is also first and foremost a service for rest apis

- socket.io is a transport layer with a low level concept of "rooms" to establish pub/sub behaviour. Deepstream and Socket.io actually both use engine.io for browser communication


I like it


interesting! does anyone know anything like this but for the jvm?


The deepstream client can be written in any language due to its messaging spec. There actually happens to be someone writing the client in Java now as well! You can find more info here: https://github.com/deepstreamIO/deepstream.io/issues/66


Looks like "deep" is the new "cloud" :)


Nice, I'll investigate if Deepstream makes sense for AppShare, another realtime framework: https://github.com/zubairq/AppShare



