

An HN-style social news site written in Ruby/Sinatra/Redis/jQuery by Antirez - smn
https://github.com/antirez/lamernews

======
julian37
Perfect timing, as I'm planning to use Redis for the first time soon and this
is going to be a great resource for best practices. Thanks, Antirez!

Speaking of best practices, putting serialized objects into the database (as
you're doing with the comment objects) is usually considered an anti-pattern
in the SQL world. Could you briefly explain (beyond what you said in the
Readme) why you've used this approach for comments, but not for, say, news
objects? Was this decision solely made to work around anticipated
performance/memory bottlenecks and if so, what are the trade-offs, and are
there any rules of thumb for making the same decision for object types in
other applications?

~~~
antirez
Hello julian,

the difference between comments and news is that a thread (the collection of
comments for a given news item) is a hash made of sub-hashes: the outer hash
maps ID -> comment_hash, and the comment hash just contains the different fields.

When there are these two levels, storing the first level as a Redis hash and
the second as JSON leads to very good memory usage, since the hash is
internally stored as a linear array. We can do that because we know most news
items will have just a few tens of comments. If there are more, the hash will
transparently turn into a real hash table: more space, but worth it for the
rare cases when this is needed.
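The two-level layout described above can be sketched as follows. This is a minimal illustration, not lamernews code: the key name `thread:comment:<news_id>` is a hypothetical choice, and a small in-memory class stands in for a real Redis client so the sketch runs standalone (a redis-rb client accepts the same `hset`/`hgetall`/`hlen` calls).

```ruby
require 'json'

# Tiny in-memory stand-in for the Redis hash commands used here, so the
# sketch runs without a server.
class MemoryHashStore
  def initialize
    @hashes = {}
  end

  def hset(key, field, value)
    (@hashes[key] ||= {})[field.to_s] = value.to_s
  end

  def hgetall(key)
    (@hashes[key] || {}).dup
  end

  def hlen(key)
    (@hashes[key] || {}).size
  end
end

# Hypothetical key name: one Redis hash per news item.
def thread_key(news_id)
  "thread:comment:#{news_id}"
end

# First level: hash field = comment ID. Second level: the comment as JSON.
def add_comment(r, news_id, comment_id, comment)
  r.hset(thread_key(news_id), comment_id, comment.to_json)
end

# Read the whole thread back and decode every comment.
def fetch_thread(r, news_id)
  r.hgetall(thread_key(news_id)).transform_values { |json| JSON.parse(json) }
end
```

Reading a whole thread is then a single HGETALL on one key, followed by decoding each JSON value on the client.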

The news, instead, is just a collection of fields. There is no outer object of
reasonable size, like "all the news objects" (that would be too big), so there
is no gain in using this approach.
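By contrast, a news item fits naturally in one flat Redis hash per item. A sketch under the same kind of assumptions (a hypothetical `news:<id>` key and an in-memory stand-in for the client; redis-rb provides `mapped_hmset` and `hincrby` with these arguments): since each attribute is its own field, a counter such as the vote count can be bumped with HINCRBY without decoding and rewriting the whole object.

```ruby
# In-memory stand-in for the two Redis hash commands used below.
class MemoryNewsStore
  def initialize
    @hashes = {}
  end

  # Set several fields at once from a Ruby Hash (like redis-rb's mapped_hmset).
  def mapped_hmset(key, hash)
    h = (@hashes[key] ||= {})
    hash.each { |field, value| h[field.to_s] = value.to_s }
    "OK"
  end

  # Increment an integer field, treating a missing field as 0.
  def hincrby(key, field, increment)
    h = (@hashes[key] ||= {})
    h[field.to_s] = (h.fetch(field.to_s, "0").to_i + increment).to_s
    h[field.to_s].to_i
  end

  def hgetall(key)
    (@hashes[key] || {}).dup
  end
end

# One flat hash per news item; "news:<id>" is a hypothetical key name.
def save_news(r, id, fields)
  r.mapped_hmset("news:#{id}", fields)
end

# Votes live in their own field, so an upvote never touches title or url.
def upvote(r, id)
  r.hincrby("news:#{id}", "up", 1)
end
```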

What is good about storing sub-objects of hashes as JSON objects is that Redis
unstable just got JSON support in Lua scripts, so it will also be possible to
manipulate these objects by sending Redis small Lua scripts.

I hope this clarifies the issue.
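The JSON support in Lua scripts mentioned above became part of the scripting feature released in Redis 2.6, where the cjson library is available to scripts. A hedged sketch, assuming one hash per thread with JSON comment values (key and field names are hypothetical), of editing a single comment field atomically on the server, called through redis-rb's `eval`:

```ruby
# Lua script executed inside Redis: decode one comment's JSON, set a field,
# re-encode it and store it back. The script runs atomically on the server.
# KEYS[1] = thread hash, ARGV[1] = comment ID, ARGV[2] = field, ARGV[3] = value.
UPDATE_COMMENT_FIELD = <<~LUA
  local raw = redis.call('HGET', KEYS[1], ARGV[1])
  if not raw then return 0 end
  local comment = cjson.decode(raw)
  comment[ARGV[2]] = ARGV[3]
  redis.call('HSET', KEYS[1], ARGV[1], cjson.encode(comment))
  return 1
LUA

# `r` is assumed to respond to eval(script, keys:, argv:), as redis-rb's client does.
def update_comment_field(r, thread_key, comment_id, field, value)
  r.eval(UPDATE_COMMENT_FIELD,
         keys: [thread_key],
         argv: [comment_id.to_s, field.to_s, value.to_s])
end
```

With a real client this would be called as `update_comment_field(Redis.new, "thread:comment:42", 2, "body", "edited")`, returning 1 if the comment existed and 0 otherwise. Only the script and a few arguments cross the network, instead of the whole comment.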

~~~
julian37
Hi Salvatore,

many thanks for the reply, that does clarify the issue.

So when there are only a few dozen sub-objects per collection (per parent
object) on average, storing them as JSON blobs allows Redis to "inline" them,
yielding good memory usage. But for large numbers of items per parent, Redis
can't inline them, so you might as well represent them as a hash, which can
never be inlined. Is this a fair summary?

I guess that ideally, all objects would always be represented in the same way
from a client perspective, and the database engine would decide which internal
format is best suited for storage, maybe using hints provided by the client,
and handle any necessary (de)serialization as an implementation detail. That
said, I know Redis is still a young project, and I suppose this is something
you guys are thinking about improving in the long term anyway.

Cheers!

~~~
LeafStorm
What Antirez is referring to is the fact that when a Redis hash is fairly
small, it is stored as an array that is scanned linearly (worse time
complexity, but lower memory usage), and is only "upgraded" to a true hash
table when there are many key/value pairs. (This is mostly an implementation
detail, however: from an API standpoint, you interact with hashes the same way
regardless of which internal representation they are using.)
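The switch-over rule can be illustrated with a simplified model. This is not Redis source code: it assumes only the two configurable limits Redis uses for small hashes (named `hash-max-zipmap-entries`/`hash-max-zipmap-value` in Redis of that era, later `hash-max-ziplist-*` and then `hash-max-listpack-*`), and the limits passed in below are example values. Note that the size limit applies to each field and value as well as to the entry count, so whether a thread of JSON comments stays compact also depends on that setting.

```ruby
# Simplified, hypothetical model of how Redis chooses a small hash's encoding:
# stay compact while both limits hold, convert to a hash table once either is
# exceeded. As far as we know, Redis does not convert back when fields are deleted.
class HashEncodingModel
  attr_reader :encoding

  def initialize(max_entries:, max_value_bytes:)
    @max_entries = max_entries
    @max_value_bytes = max_value_bytes
    @fields = {}
    @encoding = :compact # linear array (zipmap/ziplist/listpack, by version)
  end

  # Set a field and return the encoding after the write.
  def hset(field, value)
    field = field.to_s
    value = value.to_s
    @fields[field] = value
    maybe_convert(field, value)
    @encoding
  end

  # Delete a field; shrinking never switches back to :compact.
  def hdel(field)
    @fields.delete(field.to_s)
    @encoding
  end

  private

  def maybe_convert(field, value)
    return if @encoding == :hashtable

    too_many = @fields.size > @max_entries
    too_big = field.bytesize > @max_value_bytes || value.bytesize > @max_value_bytes
    @encoding = :hashtable if too_many || too_big
  end
end
```

For example, with `max_entries: 3`, three small fields keep the model `:compact`, and the fourth turns it into `:hashtable`.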

In both cases, Lamernews still stores comment IDs as the hash keys and the
JSON-serialized structures as the hash values; antirez was just commenting on
a memory-saving feature of Redis for small hashes.

~~~
julian37
Thanks for chiming in, but now I'm confused. Antirez said:

> When there are this two levels, storing the first level as a Redis hash, and the second as JSON leads to very good memory usage performances, it is internally stored as a linear array.

I'm reading this to say that memory usage wouldn't be as good if the second
level was stored as a hash instead of a JSON blob.

My specific question was: why store the comment as a JSON blob as opposed to a
hash? I think Antirez's answer was, in a nutshell: because a hash of hashes
can't be stored as an array. For news items it doesn't matter, because the hash
couldn't be stored as an array anyway due to the (anticipated) large number of
news items.

Are you saying this interpretation is wrong?
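For context on the "hash of hashes" question: a Redis hash field value can only be a string, so nested hashes are not expressible directly. Each comment has to be either serialized into a field of a per-thread hash, or stored as its own top-level hash plus some per-thread index. The hypothetical sketch below builds both layouts in plain Ruby (key names are assumptions) and lets you count the top-level keys each one needs:

```ruby
require 'json'

# Layout A: one Redis hash per thread; each comment is a JSON string field.
def serialized_layout(threads)
  threads.each_with_object({}) do |(news_id, comments), keys|
    keys["thread:#{news_id}"] = comments.to_h { |c| [c[:id].to_s, c.to_json] }
  end
end

# Layout B: one Redis hash per comment (values must be flat strings), plus a
# per-thread list of comment IDs so the thread can still be found.
def key_per_comment_layout(threads)
  threads.each_with_object({}) do |(news_id, comments), keys|
    keys["thread:#{news_id}"] = comments.map { |c| c[:id].to_s }
    comments.each do |c|
      keys["comment:#{c[:id]}"] = c.to_h { |k, v| [k.to_s, v.to_s] }
    end
  end
end
```

With the serialized layout the number of top-level keys equals the number of threads, which is what lets each thread live in one compact hash; the key-per-comment layout adds a top-level key, with its own per-key overhead, for every comment.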

------
ohyes
What do you do when you get more data than fits into RAM?

I've been working on a 'for fun' application with Redis, and had been counting
on some sort of on-disk storage.

(Because storing everything in RAM is comparatively expensive, I had figured I
could use Diskstore or VM with a small server to start.)

Now I'm considering porting over to a MongoDB or SQL backend, because the
disk-based storage options won't be supported in Redis in the future.

What should I do? Simply use Redis as a cache? Buy RAM? Doesn't that seem to
limit its utility and complicate things?

~~~
LeafStorm
The two attempts at disk storage so far, VM and Diskstore, are both based on
the concept of RAM as the primary datastore and the disk as merely auxiliary
storage. However, antirez is a perfectionist, so he scrapped both of them when
they didn't work as well as he had hoped.

In the long term (i.e. after cluster is finished), there are plans to
manipulate data structures directly on disk (an incredibly elegant solution,
and also really good for SSDs). Though in the short term, you are right in
that RAM is far more expensive than disk, and this does limit the utility of
Redis as a primary datastore.

~~~
llimllib
So... what's going to happen when lamernews runs out of RAM?

~~~
nknight
That's pretty mind-numbingly self-evident, isn't it? The kernel kills Redis
and the site is down.

~~~
aaronblohowiak
Depending on your settings, the kernel could also page out some RAM, which
early reports suggest isn't so bad on a high-end SSD.

------
coderdude
I don't know if the creator is going to see these comments, but lamernews.com
goes to a GoDaddy parked-domain page. Not sure if DNS changes just need to
propagate or what.

~~~
jonpaul
I agree completely... I was hoping to see a live version. Live versions can
help OSS spread and generate hacker interest.

However, it should be noted that I'm thankful that the author decided to share
this.

~~~
antirez
Hi Jon! Here it is almost the contrary: I wrote the code _especially_ to run
it as a real service. It just needs a few more days.

But actually... I can easily run it on a server of mine just to show it to
you. Let's try to install it... just a moment.

~~~
antirez
Here we are: <http://178.79.145.225:9999/>

I'll leave it running for a few hours assuming it will not crash. It's very
new code ;)

~~~
falling
Readable fonts, thank you.

------
typicalrunt
Hi Antirez,

Looking at the source, you use side-effects to return user objects and such.
I've always been taught side-effects are a bad thing that create complex code,
which seems to go against one of the goals listed in your README.md.

      # Try to authenticate the user, if the credentials are ok we populate the
      # $user global with the user information.
      # Otherwise $user is set to nil, so you can test for authenticated user
      # just with: if $user ...
      #
      # Return value: none, the function works by side effect.
      def auth_user(auth)
          return if !auth
          id = $r.get("auth:#{auth}")
          return if !id
          user = $r.hgetall("user:#{id}")
          $user = user if user.length > 0
      end

Can you explain why you used side-effects?
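For comparison with the snippet above, here is a hypothetical side-effect-free variant: it takes the Redis client as an argument and returns the user hash (or nil) instead of assigning `$user`. It reuses the `auth:<token>` and `user:<id>` key names from the snippet; it is a sketch to illustrate the question, not code from the lamernews repository.

```ruby
# Hypothetical variant of auth_user without side effects: the Redis client is
# passed in, and the user hash (or nil) is returned instead of set globally.
def find_user_by_auth(r, auth)
  return nil unless auth

  id = r.get("auth:#{auth}")
  return nil unless id

  user = r.hgetall("user:#{id}")
  user.empty? ? nil : user
end
```

A Sinatra `before` filter could then store the result explicitly, e.g. `@user = find_user_by_auth($r, request.cookies['auth'])` (the cookie name is an assumption). The trade-off is the usual one: a global is convenient to reach from any helper, while an explicit return value makes the dependency visible and easier to test with a stub client.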

------
nathanwdavis
What I like most about this is that it is a non-trivial, non-toy example of
using Redis effectively. Thanks antirez!!

------
typicalrunt
Great stuff, thanks antirez. I like it when creators of libraries build a
complex system to eat their own dog food. It helps me visualize use cases
other than simple 15-minute blog projects.

~~~
antirez
Thanks, this was one of the main goals: to have a non-trivial example, and
even to put it into production.

It is _very_ helpful for me to put myself in the user's shoes; it makes me
understand a lot more about Redis. If you are always on the other side of the
table, you miss a lot.

------
Derferman
> Don't use templates, they suck.

This is surprising. Why not make minimal use of ERB or Mustache?

~~~
erikpukinskis
It seems like they just reimplemented a custom HAML subset.

~~~
riffraff
Looks more like CGI::HtmlExtension in Ruby's stdlib, but with hashes as
params.
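To make the comparison concrete, a builder in that style (one method per tag, taking an attribute hash plus a block for the inner content) can be very small. This is a hypothetical illustration, not lamernews's actual helper; the tag list and escaping table are assumptions chosen for the example.

```ruby
# Hypothetical minimal HTML builder: each tag is a method taking an attribute
# hash and an optional block that returns the inner HTML.
class TinyHTML
  ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&#39;" }.freeze
  TAGS = %w[div span a h2 ul li].freeze

  def self.escape(s)
    s.to_s.gsub(/[&<>"']/, ESCAPES)
  end

  # Attribute values are escaped; block output is treated as trusted HTML,
  # so untrusted text must go through #text first.
  TAGS.each do |tag|
    define_method(tag) do |attrs = {}, &block|
      attr_str = attrs.map { |k, v| %( #{k}="#{TinyHTML.escape(v)}") }.join
      inner = block ? block.call.to_s : ""
      "<#{tag}#{attr_str}>#{inner}</#{tag}>"
    end
  end

  # Escape untrusted text before nesting it inside a tag.
  def text(s)
    TinyHTML.escape(s)
  end
end
```

For example, `h = TinyHTML.new; h.div(class: "news") { h.a(href: "/news/1") { h.text("Tom & Jerry") } }` returns `<div class="news"><a href="/news/1">Tom &amp; Jerry</a></div>`.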

------
revorad
As I've already mentioned to antirez, the only suitable mascot is this -
[http://www.google.com/search?q=llama&hl=en&tbm=isch&...](http://www.google.com/search?q=llama&hl=en&tbm=isch&biw=1069&bih=593)

------
upgrayedd
Join us in #lamernews on freenode. Also, if antirez reads this, you're
welcome to ops.

------
EricR23
This is really cool! I just set up my own deployment of this at
rubynews.heroku.com :) I've fixed some styling issues and I'm thinking of
tweaking a few things... I may contribute to this on GitHub. Thanks!

------
supersillyus
This should be the new "Hello world" for web-oriented languages and
frameworks. I'd love to see this ported idiomatically to different languages.

------
AndrewVos
Your site lamernews.com is showing godaddy adverts.

------
compay
It would be great if somebody added some tests.

~~~
xetorthio
Yeah... I wonder where all the tests are :)

------
kennystone
Redis seems like a poor choice if you want the comments and stories to stick
around for a while. It's designed to be an in-memory database...

~~~
antirez
Redis is perfectly fine for this application, both from the point of view of
data persistence, since AOF is very durable, and from the point of view of
space needed. This application is designed to hold a lot of news and comments
even using little memory.
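For reference, how durable AOF is depends on the `appendfsync` policy, which accepts `always`, `everysec` or `no`. As a hedged sketch, assuming a redis-rb client and a server that permits CONFIG SET (normally these settings would simply go in redis.conf), AOF could be switched on at runtime like this; the helper name is hypothetical:

```ruby
# Accepted appendfsync values:
#   "always":   fsync after every write (safest, slowest)
#   "everysec": fsync once per second (roughly the last second of writes at risk)
#   "no":       leave flushing to the operating system
APPENDFSYNC_POLICIES = %w[always everysec no].freeze

# Hypothetical helper: enable the append-only file and pick an fsync policy.
# `r` is assumed to respond to config(:set, name, value), as redis-rb's client does.
def enable_aof(r, fsync: "everysec")
  unless APPENDFSYNC_POLICIES.include?(fsync)
    raise ArgumentError, "unknown appendfsync policy: #{fsync}"
  end

  r.config(:set, "appendonly", "yes")
  r.config(:set, "appendfsync", fsync)
end
```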

~~~
cnu
> This application is designed to hold a lot of news and comments even using
> little memory.

Can you post some benchmarks on that? I'd like to see how many
(average-sized) news items and comments can be stored on the 512 MB VPS.

