
What caused that outage - pg
HN was down today for around 2 hours. Sorry about that.

The News server currently crashes a couple times a day when it runs out of memory. All the comments and stories no longer fit in the 2 GB we can get on a 32 bit machine. We'd been planning to upgrade to a new 64 bit server. In the meantime it was arguably a rather slow form of GC.

Unfortunately the process somehow got wedged in the middle of segfaulting. We're not sure why and will probably never know. But that meant the process that usually notices when News is wedged and restarts it was unable to kill it.

Fortunately rebooting the machine solved the problem. But now we'll presumably be switching to that new, bigger server sooner rather than later.

As far as I can tell it was a coincidence that this happened today. It doesn't seem to have been caused either by the increased traffic, or the excessive number of posts about Erlang.
======
jm4
Bummer. Maybe you should consider rewriting in Erlang... :)

------
mixmax
That's what happens when you make something in a new and unproven language.

And I think it deserves tremendous respect that you did so. You could have
hacked HN together in Python (or Lisp?) in a weekend knowing that it would
scale, yet you decided to go with Arc. Very bold. And a great way to
battletest a new language.

It's great to see people eat their own dogfood.

~~~
Jebdm
If I'm not mistaken, "to battletest a new language" was the _purpose_ of
building this.

~~~
rw
Keep in mind that mzscheme is battle tested. Arc is "just" a bunch of macros
on top of PLT Scheme; what gives HN its [under]performing character is that
foundational JIT compiler and 3m GC.

~~~
pg
Arc compiles into MzScheme, but it's not implemented as macros. You can see
that from the source.

------
natrius2
Speaking of bugs, pages don't load for me when I'm logged into my real account
(without the 2). The front page cuts off after "1.</td><td><center>" and
comments pages cut off right after the first sub-table opens. The leaders and
submit pages load fine.

I thought it was just a sign that I should get back to work, but maybe my
account got put into some sort of inconsistent state?

------
callahad
Thank God. I was worried you had pushed the big red noprocrast.

------
acegopher
The server goes down, and when it comes back up the front page is filled with
stories about Erlang. Coincidence?

~~~
Derrek
Sign of the Second Coming?

------
elibarzilay
Something that could be done for now is to write a piece of MzScheme code
that "marshals" the data into (UTF-8-encoded) byte strings. MzScheme strings
store each character in 4 bytes, while ASCII text takes 1 byte per character
in UTF-8. So assuming that most of the 2 GB is made up of strings, and that
these strings are mostly ASCII, this should reduce their memory consumption
by close to a factor of 4.

(I can imagine an interface that is transparent at the Arc level, where all
strings are just passed to the backend and retrieved from it, and the backend
converts them to and from byte strings. Later on it could change to use a FS
or a DB or whatever.)
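A minimal sketch of that transparent backend, written in Python rather than MzScheme purely for illustration. `ByteStringStore`, `ucs4_bytes`, and their methods are hypothetical names. The "ordinary string" cost is modeled as 4 bytes per character, which is the premise of the factor-of-4 estimate. Callers put and get normal strings; internally only UTF-8 bytes are kept.

```python
class ByteStringStore:
    """Hypothetical backend: callers put and get ordinary str values,
    but the store keeps them internally as UTF-8 encoded bytes."""

    def __init__(self):
        self._data = {}  # key -> bytes

    def put(self, key, text):
        self._data[key] = text.encode("utf-8")  # marshal on the way in

    def get(self, key):
        return self._data[key].decode("utf-8")  # unmarshal on the way out

    def payload_bytes(self):
        # Counts only the encoded payload, not per-object overhead.
        return sum(len(b) for b in self._data.values())


def ucs4_bytes(text):
    """Payload size if every character costs 4 bytes (UCS-4 / UTF-32)."""
    return len(text.encode("utf-32-le"))  # "-le" variant: no byte-order mark


store = ByteStringStore()
comment = "Arc compiles into MzScheme, but it's not implemented as macros."
store.put(1, comment)
print(store.get(1) == comment, ucs4_bytes(comment), store.payload_bytes())
```

For pure ASCII text the ratio is exactly 4; non-ASCII characters take 2 to 4 bytes in UTF-8, so the saving shrinks as such text becomes more common.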

------
lacker
Yeah, it's not ideal to let your processes die from running out of memory,
because some other, non-server process may be the one that actually requested
the last bit of memory, and then who knows what state your machine is in. How
about making your restarter job notice when the machine is very close to
running out of memory and preemptively kill the server then?
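A minimal sketch of such a check, assuming a Linux host where the restarter can read `/proc/<pid>/status`. It watches `VmSize` (the process's virtual address space, which is what a 32-bit process exhausts) rather than resident memory. The function names and the 90% headroom are hypothetical choices, not part of any existing restarter.

```python
import os
import signal


def parse_vm_kb(status_text, field="VmSize"):
    """Return a memory figure in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])  # e.g. "VmSize:   1900000 kB"
    raise ValueError(f"{field} not found")


def should_restart(used_kb, limit_kb, headroom=0.9):
    """True once usage has crossed the given fraction of the limit."""
    return used_kb >= headroom * limit_kb


def check_and_kill(pid, limit_kb, headroom=0.9):
    """Send SIGTERM to the server if its address space is near the limit.

    Returns True if a signal was sent; the supervisor then restarts it.
    """
    with open(f"/proc/{pid}/status") as f:
        used_kb = parse_vm_kb(f.read())
    if should_restart(used_kb, limit_kb, headroom):
        os.kill(pid, signal.SIGTERM)
        return True
    return False
```

A supervisor would call `check_and_kill(pid, 2 * 1024 * 1024)` on a timer (2 GiB expressed in kB). Killing at a threshold trades a few extra restarts for never letting the server hit the wall in the middle of an allocation.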

------
DanielBMarkham
Well, there's only one way to get the ultimate in scalability: Arc in the
cloud.

Which I believe should be called "lightning", right?

------
timtrueman
I thought 32-bit addressing gave 4GB of addresses…is there some sort of flag
that's taking one bit? Not trying to be a smartass, just curious about the
discrepancy.

~~~
apgwoz
Dynamic languages end up using more memory, because extra information has to
be stored about each item (i.e. type, tc info, etc.).

~~~
ynd
Down-voters, he is also right.

It's common for dynamic languages to embed typing information in pointers as
an optimization. For example, CLISP uses at least 2 bits to distinguish
between common types. That way fixnum numbers can be recognized and added
without slow memory accesses.

The result is that you get fewer bits for the address, and hence less
addressable memory.

~~~
fhars
Actually, no. These tag bits are usually stored in the lowest bits, which are
zero for all pointers (you would be mad not to align your data structures to
the four- or eight-byte boundaries your hardware uses for memory access). So
you get the full width for pointers, but reduced width for your fixnums,
because you have to set one of the least significant bits of the machine word
to one to distinguish it from a pointer. That you still can't use the full 4GB
of a 32-bit address space is because the OS needs some address space for
itself; the details vary from OS to OS, and also depend on what the runtime of
your language does with the addresses the OS allows it to use. So being able
to use more than 2GB on a 32-bit architecture should not be taken for
granted.
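A minimal sketch of the low-bit tagging scheme described above, assuming a 32-bit word, a single tag bit for fixnums, and heap objects aligned to 4 bytes. These are illustrative assumptions, not MzScheme's or CLISP's actual layout. Pointers keep all 32 bits, while fixnums give up one bit and cover a 31-bit signed range.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1
SIGN_BIT = 1 << (WORD_BITS - 1)

# One bit goes to the tag, so fixnums have a 31-bit signed range.
FIXNUM_MIN = -(1 << (WORD_BITS - 2))
FIXNUM_MAX = (1 << (WORD_BITS - 2)) - 1


def box_fixnum(n):
    """Encode a small integer as 2n+1 in a 32-bit word (low bit set = fixnum)."""
    if not FIXNUM_MIN <= n <= FIXNUM_MAX:
        raise OverflowError(f"{n} does not fit in a fixnum")
    return ((n << 1) | 1) & MASK


def unbox_fixnum(word):
    """Decode a fixnum word back to a Python int."""
    signed = word - (1 << WORD_BITS) if word & SIGN_BIT else word
    return signed >> 1  # arithmetic shift drops the tag bit, keeping the sign


def is_fixnum(word):
    return (word & 1) == 1


def is_pointer(word):
    # Objects are assumed 4-byte aligned, so a real address has its low two
    # bits clear and all 32 bits remain usable for addressing.
    return (word & 3) == 0


def add_fixnums(a, b):
    """Add two tagged fixnums without untagging: (2x+1) + (2y+1) - 1 = 2(x+y)+1.

    Overflow detection is omitted in this sketch.
    """
    return (a + b - 1) & MASK
```

The `add_fixnums` identity is what lets fixnum arithmetic skip any memory access: the tag is corrected with a single subtraction instead of unboxing both operands.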

~~~
apgwoz
While that's true for some things, that doesn't account for the memory
overhead of a copying garbage collector.
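To make that overhead concrete, here is a minimal sketch of a semispace (Cheney-style) copying collector over a toy heap, where objects are Python lists and `Ref` marks a pointer field; all names are hypothetical. While it runs, the old space and the new space exist side by side, so a pure semispace design needs address space for roughly twice the live data. This is not a description of MzScheme's 3m collector, which is more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A pointer field: the index of an object in the current space."""
    index: int


def collect(from_space, roots):
    """Copy everything reachable from `roots` into a fresh to-space.

    from_space: list of objects, each a list of fields (plain values or Ref).
    roots: list of Ref into from_space.
    Returns (to_space, new_roots). Unreachable objects are simply never copied.
    """
    to_space = []
    forwarded = {}  # old index -> new index, i.e. the forwarding pointers

    def copy(ref):
        if ref.index not in forwarded:
            forwarded[ref.index] = len(to_space)
            # Shallow copy: its Ref fields still point into from_space until scanned.
            to_space.append(list(from_space[ref.index]))
        return Ref(forwarded[ref.index])

    new_roots = [copy(r) for r in roots]
    scan = 0
    # to_space doubles as the work queue: everything past `scan` is unscanned.
    while scan < len(to_space):
        obj = to_space[scan]
        for i, field in enumerate(obj):
            if isinstance(field, Ref):
                obj[i] = copy(field)
        scan += 1
    return to_space, new_roots
```

With everything live, a 2 GB budget under pure semispace copying leaves room for only about 1 GB of actual data.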

------
adnymarc
Judging by the number of Erlang stories on the front page (19 out of 25
currently) I would say that's your problem right there...(wholly in jest, half
in earnest)

------
mblakele
Why was the site using a 32-bit environment to begin with? Opterons have been
cheap for years now, and all new Xeons are capable of 64-bit operation.

~~~
MrRage
What chip put out in the last 2 years or so couldn't run a 64-bit system?
I'm pretty sure most of the AMD chips could... Edit: By chip I mean a desktop
CPU, e.g. Atom doesn't count.

~~~
lutorm
Atom 330 is also 64-bit.

------
lsc
Hey, so I'd be happy to donate 16GB worth of Xen instances to the project,
just to say I did, if you need mirrors. My boxes are 32GB RAM/8 cores, and the
CPU is proportional, so if you want 4x4GB instances over separate servers, I
can do that. (Well, you will have to wait a bit for me to put up the 4th
server.)

------
ryan-allen
pg, I'd love to read a post about how you've dealt with writing HN to work in
one process, in 2GB of RAM, it sounds quite novel! Did you have to make manual
indexes? How do you arrange the files on the file system? How do you handle
voting and concurrency/file locking?

------
ajkirwin
You aren't running this in the standard language/caching/db setup, then?

You're storing EVERYTHING in memory? Isn't this kind of... well, stupid,
frankly.

~~~
DomesticMouse
Dude, frack no. Having everything in a database means you are tied to the
speed of disk, instead of the speed of ram. The biggest issue with being
purely ram resident is handling multi-threaded updates across your ram
resident dataset.

You could always use Erlang...

~~~
jcl
If everything is purely RAM resident, wouldn't you lose it all when the server
gets wedged?

~~~
sounddust
No. You journal the changes to disk. You therefore only need disks fast enough
to keep up with the journal. If you get into a situation where your disks
can't keep up with the journal, your site is probably big and popular enough
that you can afford to hire DBAs to go from there.
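A minimal sketch combining the two ideas above, assuming JSON-serializable keys and values: a single lock serializes updates to an in-RAM dict, and each change is appended and fsync'd to a journal before it is applied, so replaying the journal rebuilds the state after a crash or restart. `JournaledStore` and its one-JSON-record-per-line format are hypothetical.

```python
import json
import os
import threading


class JournaledStore:
    """Hypothetical in-memory key/value store backed by an append-only journal."""

    def __init__(self, path):
        self._lock = threading.Lock()  # serializes updates from many threads
        self._data = {}
        if os.path.exists(path):
            self._replay(path)
        self._journal = open(path, "a", encoding="utf-8")

    def _replay(self, path):
        # Rebuild the in-RAM state by re-applying every journaled change in order.
        good_end = 0
        with open(path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # torn final record from a crash mid-write
                entry = json.loads(raw)
                self._data[entry["key"]] = entry["value"]
                good_end += len(raw)
        os.truncate(path, good_end)  # drop the torn tail so new appends start clean

    def set(self, key, value):
        record = json.dumps({"key": key, "value": value}) + "\n"
        with self._lock:
            self._journal.write(record)
            self._journal.flush()
            os.fsync(self._journal.fileno())  # on disk before the change is visible
            self._data[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def close(self):
        self._journal.close()
```

An fsync per write is the simplest durable choice and is exactly where "disks fast enough to keep up with the journal" bites. The journal also grows without bound unless it is periodically compacted into a snapshot.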

------
BonsaiKitt3n
add +10 mhz to the heap

------
jjb
Check out god -- it can restart mongrels when they reach a certain memory
size:

<http://god.rubyforge.org/>

(if that's the issue)

