Advancing the realtime web (rethinkdb.com)
222 points by coffeemug on Jan 27, 2015 | 48 comments



Happen to be using the streaming API and I have to say it's fantastic.

I hide all my realtime streams behind a "/realtime" endpoint and watching specific tables for changes couldn't be easier (just the changes command, which produces an infinite sequence).

Generally as simple as: r.db('db').table('table').changes().run(conn) (depending on your language it might look a little different).
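
A minimal sketch of what consuming such a feed can look like. Each item on a RethinkDB changefeed is a document with `old_val` and `new_val` fields; `fake_feed` below is a hypothetical stand-in for the cursor returned by `.changes().run(conn)`, so the snippet runs without a server, and `classify` and `labelled` are illustrative names, not part of any driver.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

Change = Dict[str, Optional[Dict[str, Any]]]


def classify(change: Change) -> str:
    """Label a changefeed item by inspecting old_val / new_val."""
    old, new = change.get("old_val"), change.get("new_val")
    if old is None and new is not None:
        return "insert"
    if old is not None and new is None:
        return "delete"
    return "update"


def fake_feed() -> Iterator[Change]:
    """Hypothetical stand-in for the cursor returned by .changes().run(conn)."""
    yield {"old_val": None, "new_val": {"id": 1, "score": 10}}
    yield {"old_val": {"id": 1, "score": 10}, "new_val": {"id": 1, "score": 12}}
    yield {"old_val": {"id": 1, "score": 12}, "new_val": None}


def labelled(feed: Iterable[Change]) -> Iterator[Tuple[str, Change]]:
    # Lazy on purpose: a real feed is infinite, so each change is handled
    # as it arrives instead of being collected into a list first.
    for change in feed:
        yield classify(change), change
```

With a real connection the `for` loop simply blocks until the next change arrives.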

Then again I'm pretty sold on RethinkDB so my opinion should probably come with a large-ish grain of salt.


The article discusses how RethinkDB differs from "realtime sync services" such as Pusher, PubNub, and Firebase.

While comparing against Firebase makes sense because it's an alternative database, I see Pusher and PubNub (and similar pubsub services, such as Fanout) as being complementary to RethinkDB.


This is definitely correct. The article focuses more on data synchronisation, whereas a lot of services (some of which jkarneges has highlighted) don't offer data storage and focus on letting you define your own feeds/channels/topics/subjects and exposing data updates/changes via messages or events. These services are highly complementary to RethinkDB as it's very easy to see how a channel subscription can map to a changefeed.

So services like Firebase, PubNub Data Sync, Realtime.co cloud storage, the Google Drive Realtime API and Simperium (see: http://www.leggetter.co.uk/real-time-web-technologies-guide/...) are much more in-context to the article.


Article also points out that you can't get a "realtime incremental feed", but the example they provide is quite easy and similar in Firebase by using orderBy() and limit() to return an updating query.

As someone who has built lots of apps using Firebase, I can appreciate the first point about limited querying capabilities; this has been my number 1 pain point with Firebase. I'd imagine the second point can be somewhat validated by changes() allowing for much more complex queries than are currently possible with sync services.
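
To make the "updating query" idea concrete, here is a naive sketch of an ordered, limited result set that reports a new window only when a write actually changes it. This is not how Firebase or RethinkDB implement it; the `TopN` class and its method names are made up for illustration, and it re-sorts on every write, which a real system would avoid.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int]


class TopN:
    """Keep the N highest-scoring rows and report when the visible window changes."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.scores: Dict[str, int] = {}

    def window(self) -> List[Row]:
        # Highest score first; ties broken by key so the order is stable.
        ranked = sorted(self.scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.n]

    def apply(self, key: str, score: Optional[int]) -> Optional[List[Row]]:
        """Apply one write (score=None means delete).

        Returns the new window if it changed, otherwise None, so subscribers
        are only notified about writes that affect what they can see.
        """
        before = self.window()
        if score is None:
            self.scores.pop(key, None)
        else:
            self.scores[key] = score
        after = self.window()
        return after if after != before else None
```

A write that lands outside the top N produces no notification at all, which is the property that makes an incremental feed cheaper than re-running the query.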


Author here -- my mistake, this was a bad example! You're right, this will become much more relevant as the `changes` command supports more and more complex queries.


This is tricky because PubNub recently added some data storage capabilities, and Pusher and fanout might follow suit. Also, being able to get a feed on queries might ultimately make services like Pusher unnecessary in many traditional situations, but this isn't immediately obvious. We'll find out soon how this develops!


I can speak for Fanout and say that we have no intention of offering data storage. Our ideal users either have databases already or value a decoupled backend architecture.

BaaS is certainly a tempting business (just look at all the exits), but we'd rather focus on doing one thing really well and with wider applicability. Much like you guys, really.


agree

and also GO FANOUT


I'm working on an open source firebase-like database that puts realtime/sync/replication/master-master/p2p at its core, rather than treating it as an afterthought, as a lot of other databases (MySQL, MongoDB, etc.) do. The RethinkDB guys are smart and I'm glad they are pushing in this direction; it is important. My project is at http://github.com/amark/gun if anybody is interested.


How does your project differ from CouchDB? I mean, if there's one database that puts sync, realtime change feeds, and master-master replication at its core, it's CouchDB (and Pouch, Touch, etc.). It would certainly be my first choice for doing anything "firebase-like". What does gun do (or plan to do) different/better?

(I mean, nothing wrong with a project that duplicates others, but it seems odd to focus on master-master replication without distinguishing yourself from the most popular DB that's already built around that.)


Yeah, great question and good points.

1. GUN is embedded. Meaning there is no "database server/process" to run, it gets included into your app server as a library. This means less configuration and maintenance.

2. GUN is a graph database. Meaning you can have both relational and document structured data. I'm not sure if CouchDB has added this yet, last time I checked (several years ago) it was only NoSQL.

3. GUN is not stable/production ready yet, and it is javascript only currently. Which is pretty limiting as of now, but that will change later.

CouchDB, Riak, Cassandra all try to be master-master, which is good. Unfortunately, in my personal experience, they have also been much more complicated to start using. I'm hoping gun will be easier to roll with. If you're already using CouchDB and happy, you probably shouldn't switch.

Any other questions?


No, that does clarify things. Being a graph database is a big difference; CouchDB is strictly document based.

(Incidentally, you might want to check out PouchDB, if you haven't already. It's a javascript implementation of CouchDB, and it runs quite well on Node, and can use LevelDB or any LevelDOWN compatible datastore, which makes it pretty useful as a lightweight embedded DB. Again though, documents only.)


The article talks about mobile app feeds, and I understand how I could build a system rpc server for mobile and desktop apps. How would this work for web apps? The headline says "real-time web!"

Compare this solution to push it all the way through the web stack: http://engineering.imvu.com/2014/12/27/the-real-time-web-in-...


The basic architecture is browser <-> web server <-> database.

When the web browser connects to the web server, the web server opens a database feed. When the database pushes changes to the web server, the web server pushes them to the browser via socket.io.
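
A rough sketch of the web-server half of that pipeline, using only asyncio. It is a variant of the description above: instead of one database feed per browser, a single feed is fanned out to every connected client. `Hub`, `fake_feed` and `demo` are hypothetical names; the per-client queue stands in for a socket.io or WebSocket connection, and `fake_feed` stands in for the database feed.

```python
import asyncio
from typing import Any, AsyncIterator, List, Set


class Hub:
    """Fan one database feed out to many connected browsers."""

    def __init__(self) -> None:
        self.clients: Set[asyncio.Queue] = set()

    def connect(self) -> asyncio.Queue:
        # One queue per browser; a real server would drain it into a socket.
        queue: asyncio.Queue = asyncio.Queue()
        self.clients.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self.clients.discard(queue)

    async def pump(self, feed: AsyncIterator[Any]) -> None:
        # Every change pushed by the database is copied to every client.
        async for change in feed:
            for queue in self.clients:
                queue.put_nowait(change)


async def fake_feed() -> AsyncIterator[dict]:
    """Hypothetical stand-in for a database changefeed."""
    for i in range(3):
        await asyncio.sleep(0)
        yield {"new_val": {"id": i}}


async def demo() -> List[List[dict]]:
    hub = Hub()
    first, second = hub.connect(), hub.connect()
    await hub.pump(fake_feed())
    return [[q.get_nowait() for _ in range(q.qsize())] for q in (first, second)]
```

Running `asyncio.run(demo())` shows both simulated browsers receiving the same three changes in order.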


Is this not already possible with MongoDB, using something like mongoose's post('save') [1]?

[1] http://mongoosejs.com/docs/middleware.html


You could do it with mongoose middleware, but this solution is a lot more complex and expensive. In this case the application developer would have to take care to notify an additional piece of infrastructure about every change. Even if you can abstract the code, that requires quite a bit of additional intelligence on the backend.

Baking feeds into the database should dramatically simplify the amount of required work.


I use WebSockets. You could also use Server Sent Events.


Would love to see this as an option to mongodb on meteor


Supposedly someone was working on this -- I share the sentiments and thought about starting something as a weekend project but I saw this talk:

https://www.youtube.com/watch?v=YLu_ROrA0YY

backed by this repo: https://github.com/andrewreedy/rethink-livedata

The project is a year old... I'm not sure if it's still in progress (seems not) and I'm not sure if the RDB team themselves are pursuing getting together with meteor


At the end of the article they talk about their collaboration with the Meteor team.


I assume RethinkDB allows the direct use of the database from within the client.

This makes me wonder how they handle security.

For example, let's assume that I want to implement a "filesystem" using this database. How would I add rules that allowed the client-side to use only the records it is allowed to use (for instance, if the "filesystem" is configured such that access is limited for certain users)?


You assume wrong, the post talks about the possibility of the developers including a proxy to secure client requests to the database.
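
As a sketch of what such a proxy might do for the "filesystem" example: sit between the database feed and the client, and forward only the changes a given user may see. The rule used here (a row is visible when its `owner` field matches the user) and the names `visible_to` and `filter_feed` are assumptions for illustration, not anything RethinkDB provides.

```python
from typing import Any, Dict, Iterable, Iterator, Optional

Row = Dict[str, Any]
Change = Dict[str, Optional[Row]]


def visible_to(user: str, row: Optional[Row]) -> bool:
    """Hypothetical rule: a user may see only rows whose 'owner' field matches."""
    return row is not None and row.get("owner") == user


def filter_feed(user: str, feed: Iterable[Change]) -> Iterator[Change]:
    """Forward only the changes this user may see, redacting the hidden side."""
    for change in feed:
        old, new = change.get("old_val"), change.get("new_val")
        old_ok, new_ok = visible_to(user, old), visible_to(user, new)
        if not (old_ok or new_ok):
            continue  # neither version of the row is visible to this user
        # If a row moves out of (or into) the user's view, they see it as a
        # delete (or an insert) rather than learning the hidden contents.
        yield {
            "old_val": old if old_ok else None,
            "new_val": new if new_ok else None,
        }
```

The check runs on the server, so the client never receives rows it is not allowed to read.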


Indeed, thanks for correcting.


> A few community members have been working on a RethinkDB integration with Meteor and Volt, and we expect robust integrations to become available in the coming months.

If I'm understanding correctly, RethinkDB could be a drop in replacement for mongodb? What are the benefits to this replacement in the context of Meteor?


If you're a meteor user you probably wouldn't notice the differences early on because livequery does a phenomenal job abstracting all the hard work away. However, we anticipate two advantages in the later stages of app development.

Firstly, as the app scales, livequery has to work harder and harder. I don't know how good its scalability is at the moment, but I think it would be very hard to approach the scalability of the feeds built into the database.

Secondly, as we build feed support into more and more queries, you'll be able to get functionality unavailable in livequery, which will allow building more sophisticated realtime experiences than currently possible.

We're going to find out how all these components work together in practice in the next few months. I'm really looking forward to that!


For one thing, you're not tied to Mongo. Generally more choices is good.

As for real reasons:

- ReQL, functional, composable, and declarative, is very nice to use

- Changefeeds

- (v1.16) Arbitrary query watching with changefeeds

- Ease of sharding/load-balancing

- (v1.16) dynamic server management through queries

- Awesome web admin API

- Actually saving stuff to disk (couldn't resist)

This video on the highlights of rethinkdb is old but still relevant: https://www.youtube.com/watch?x-yt-ts=1422327029&x-yt-cl=848...


I wonder if it would be feasible to add this sort of real-time feed to a conventional RDBMS like PostgreSQL. This feature would be quite useful for at least one of my projects, but I'm not sure I want to give up a mature SQL-based DBMS for something relatively unproven like RethinkDB.


Postgres supports LISTEN / NOTIFY [1,2], and you can stream a SELECT using COPY or a cursor.

1. http://www.postgresql.org/docs/9.4/static/sql-listen.html

2. http://www.postgresql.org/docs/9.4/static/sql-notify.html
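
One common way to turn LISTEN / NOTIFY into a per-table feed is a row-level trigger that calls `pg_notify` with the changed row as JSON. The helper below only renders the SQL as a string so it can be inspected without a database; `notify_trigger_sql` and the generated function and trigger names are hypothetical, and this is a sketch rather than a recommended production setup.

```python
import re

# Only plain lowercase identifiers are accepted, since the names are
# interpolated directly into the SQL text.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


def notify_trigger_sql(table: str, channel: str) -> str:
    """Render SQL that publishes each inserted or updated row on a channel."""
    for name in (table, channel):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # Note: NOTIFY payloads are limited in size (under 8000 bytes in the
    # default configuration), so very wide rows would need a smaller payload.
    return f"""\
CREATE OR REPLACE FUNCTION notify_{table}() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify('{channel}', row_to_json(NEW)::text);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER {table}_notify
AFTER INSERT OR UPDATE ON {table}
FOR EACH ROW EXECUTE PROCEDURE notify_{table}();

-- In the listening session:
LISTEN {channel};
"""
```

For example, `print(notify_trigger_sql("scores", "scores_changed"))` produces statements that can be pasted into psql; a listening client then receives one notification per committed row change.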


Also a brief tutorial on how it works. See in particular the comment by Max Martinsson on how to do it in a replicated scenario.

https://www.chrisstucchio.com/blog/2013/postgres_external_tr...


LISTEN / NOTIFY is not recommended for production use. It will consume all of your connection pools gradually.


I can see how that could happen depending on how it is used, but I don't see how that must happen. Can you elaborate?


You could probably get a poor man's version of this by setting up materialized views, a trigger on those views, and then writing code in the trigger that pushes updates onto a queueing system. It wouldn't be quite as convenient, but would probably accomplish what you want in many situations.



This is another similarly stalled project: https://github.com/skariel/webalchemy


This project has not been updated in almost two years.


Has AMQP changed substantially in two years? For purposes of publishing a simple event?


Looking over the fork graph: https://github.com/omniti-labs/pg_amqp/network there are clearly people who feel that the status-quo release is missing features, etc.


I thought you could not do row-level triggers on views in Postgres: http://www.postgresql.org/docs/9.4/static/sql-createtrigger....


On the one hand this is pretty awesome - just as I expect from the rethinkdb team - but on the other hand I'm wondering: while they are removing serious pain points from web development, where are the startups that use these as a competitive advantage? Is RethinkDB understood?


The push functionality is new to RethinkDB and is just coming out hot off the presses. We'll see soon if people understand the advantages, but the initial feedback has been extremely positive.


The post says: >We'd like to make more complex queries available via realtime push. In particular, efficient realtime push implementations for the eq_join command and map/reduce are fairly complex, and aren't making it into 1.16.

Will innerJoin be supported in 1.16?


All the join commands (http://rethinkdb.com/api/javascript/) are already supported and will continue to be supported.

There is currently no push implementation for joins. That's coming in the next few releases.


Could someone elaborate on the use-cases that rethinkdb tries to solve?

As a DB newb, what are the advantages of rethinkdb vs. other NoSQL DBs like Mongo? Why and where would I use rethink instead of an SQL DB like postgres?


Just three reasons among others:

1. Joins (real server-side joins) - so you can have many-to-many relations in your data (and nested arrays are a poor answer to this problem)

2. Awesome query language - No strings to concatenate, no strings to escape, no JSON objects with special keys, just plain JavaScript/Python/Ruby/etc.

3. Schemaless - Faster to prototype things, to adapt to a third party data changing.


PostgreSQL can be schemaless if you want it to be. It obviously does joins, so that leaves only the awesome query language.


Check out these two links, they might help:

  - http://rethinkdb.com/docs/rethinkdb-vs-mongodb/
  - http://rethinkdb.com/docs/comparison-tables/


How does the general performance of rethinkdb compare to http://www.tokutek.com/tokumx-for-mongodb/?


Rethink's performance is about what you'd get with InnoDB (sometimes faster on some workloads, occasionally slightly slower). I haven't done comparisons with tokumx, but my guess would be that Rethink is significantly slower on high insert/update workloads for huge amounts of data (which is what toku excels at). On other workloads, it's probably fairly similar.



