
Tell HN: The site was offline. What changed? - kogir
Obviously things took longer than intended and expected, but in the end:

Items moved from /12345 to /12/34/12345. HN now starts in one fifth the time, and better utilizes the filesystem cache. Backup speeds are also improved.

Profiles moved from /kogir to /ko/gi/kogir, and from /KogIr to /%k/og/%kog%ir, which works on case-insensitive filesystems. Similar performance improvements were observed.

Passwords moved from a 45MB user->hash mapping file into user profiles themselves. Previously, this mapping file was re-written in its entirety every time a new account was created, and is why the site went down on May 18th. New account creation is now incredibly lightweight, and should allow us to further limit and possibly eliminate our use of captchas.

There were additional changes to support new features currently in the pipeline, but which we're not yet ready to announce.

I'm sorry that we were offline for so long. Nothing else we currently have planned should require anything more than a simple restart, so with any luck this will be the last major disruption this quarter, and maybe even this year.
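
For readers who want to see the path scheme spelled out, here is a minimal sketch that reproduces the three examples given above. It is not HN's actual code (HN is written in Arc). The function names `itemPath`, `encodeName`, and `profilePath` are made up for illustration, and the handling of ids or names shorter than four characters is purely an assumption, since the post does not cover it.

```javascript
// Sketch only: reproduces the examples in the post, nothing more.

// Item 12345 -> "/12/34/12345": the first two pairs of digits become directories.
// ASSUMPTION: ids shorter than four digits are zero-padded for the directory
// part only; the post does not say what really happens to them.
function itemPath(id) {
  const name = String(id);
  const key = name.padStart(4, '0');
  return '/' + [key.slice(0, 2), key.slice(2, 4), name].join('/');
}

// Replace each uppercase ASCII letter with '%' plus its lowercase form, so
// names that differ only in case no longer collide on a case-insensitive
// filesystem ("KogIr" -> "%kog%ir").
function encodeName(username) {
  return username.replace(/[A-Z]/g, (c) => '%' + c.toLowerCase());
}

// Profile "kogir" -> "/ko/gi/kogir"; "KogIr" -> "/%k/og/%kog%ir".
// ASSUMPTION: encoded names shorter than four characters are padded with '_'
// for the directory part only.
function profilePath(username) {
  const name = encodeName(username);
  const key = name.padEnd(4, '_');
  return '/' + [key.slice(0, 2), key.slice(2, 4), name].join('/');
}

console.log(itemPath(12345));      // /12/34/12345
console.log(profilePath('kogir')); // /ko/gi/kogir
console.log(profilePath('KogIr')); // /%k/og/%kog%ir
```

Note that the directory names are taken from the encoded name, which is why the second example starts with `/%k/og/` rather than `/ko/gi/`.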
======
_wmd
I know it's crazy talk, but glancing at my own profile, I count maybe 100
bytes of data? Yet to represent that data in memory, it's going to blow up to
4096 bytes plus structs to represent the inode and directory entry/entries
because you put each profile in its own file.

By that count, you might get somewhere near a 40x cache utilization
improvement if you just used a real database like the rest of us do - even
just an embedded database.

This is, of course, before saying anything about the transactional safety of
writing directly to the filesystem.

~~~
kogir
We're on the same page. First you stay up, then you improve with the time you
bought.

~~~
jedberg
Sometimes it isn't worth the effort to fix the old, and instead just go to the
new and improved.

When the load balancer for reddit broke once, we didn't bother fixing it, we
just replaced it with better (though untested) technology on the assumption it
would work better. We figured it couldn't be any _worse_ than it was, and we'd
rather spend our limited time moving forward instead of treading water.

~~~
kogir
We considered this pretty seriously, and it might be required some day, but we
think we'll be able to incrementally move toward a more highly available,
better performing architecture without a continuity break.

~~~
kgc
Why not just port to the Reddit codebase? The functionality seems similar.

~~~
iOSGuy
reddit.com/r/hackernews the new hacker news.

~~~
pestaa
You might be joking, but in case not: reddit's code base is open source, so
others can use it without moving the community under the reddit umbrella.

------
bndr
Can someone explain "Items moved from /12345 to /12/34/12345. HN now starts
in one fifth the time"? Why does that increase performance?

~~~
jacquesm
Directory scans suck. So if you break up the space into sets of prefixes you
limit the number of files in each dir and traversal gets much faster. Ditto
adding/removing items.

Some filesystems are worse at this than others (xfs... let's not go there).
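
To make the effect concrete, here is a rough sketch that compares the most crowded directory under a flat layout with the most crowded leaf directory under a two-level prefix layout. The `shard` and `largestDir` helpers and the sample id range are invented for illustration; they are not HN's code or data, and the zero-padding of short ids is an assumption.

```javascript
// Hypothetical helper: the directory an item id lands in under a two-level
// prefix scheme like /12/34/12345 (first two pairs of digits).
// ASSUMPTION: ids shorter than four digits are zero-padded.
function shard(id) {
  const key = String(id).padStart(4, '0');
  return key.slice(0, 2) + '/' + key.slice(2, 4);
}

// Number of files in the most crowded directory when each id is placed in
// dirOf(id). Only files are counted, not subdirectory entries.
function largestDir(ids, dirOf) {
  const counts = new Map();
  let max = 0;
  for (const id of ids) {
    const dir = dirOf(id);
    const n = (counts.get(dir) || 0) + 1;
    counts.set(dir, n);
    if (n > max) max = n;
  }
  return max;
}

// Made-up sample: the 90,000 five-digit ids 10000..99999.
const ids = Array.from({ length: 90000 }, (_, i) => 10000 + i);

const flat = largestDir(ids, () => '.'); // everything in one directory
const sharded = largestDir(ids, shard);  // spread over prefix directories
console.log(flat, sharded);              // 90000 10
```

With this sample, the intermediate directories hold at most 100 subdirectories each, so no single directory scan touches more than a few hundred entries.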

~~~
Pxtl
Wasn't the big miracle of Reiser4 supposed to usher in an era where this was
no longer a problem?

~~~
Xorlev
I think Hans killed that era.

~~~
yellowapple
_rimshot_

------
tdicola
Wow that's interesting that you use files to store the data. Is there any
sharding across machines or is it all just one machine? Do you use big SSDs or
old spinning disks?

~~~
icebraining
It's a single machine.

------
tim333
Newbie question here - I'm just curious. Is it quicker to store the data this
way in loads of files or to use Postgres?

~~~
kogir
For new development I'd recommend PostgreSQL over flat files for most
projects.

Really depends on what you're trying to store though. For large data (images,
audio) that can't fit in a table row, the filesystem is _way_ better.

In our case we started with flat files, and buying breathing room is the first
step to move past them.

~~~
sequoia
Curiosity: Why did you start with flat files? It looks like hackernews was
started in 2007, relational databases had been around for quite some time at
that point, and were the standard way to store such forum data (see: every
popular forum framework at the time)... the decision to store this sort of
data (news/link forum with comments, 100% text) as flat files is very
confusing to me.

~~~
tremols

My guess is that Arc, the Lisp language created by Paul Graham that HN runs
on, was new, and coding and maintaining a database driver was out of the
question.

Today, perhaps the way to go would be to use some sort of JSON web service
interface to a database written in another language rather than writing a
driver.

~~~
adwf
That would be my guess as well. It's one thing to decide you want a simple
forum and have it coded within a couple of days. It's entirely another to
spend months creating a stable database library and keep it up to date with all
the latest changes.

~~~
u124556
Or you could, you know, use a more popular language.

------
mariusz79
I just don't get one thing: don't you have dev and staging servers, where you
could develop, test, and pre-deploy everything without shutting down the
website?

~~~
kogir
Online data migrations are complicated and error prone. We deemed it not worth
it.

All the code was tested before going live, but at some point you actually have
to move and re-format all 8.5 million files.

------
diafygi
Why worry about case-insensitive file systems if you are not using one
currently?

~~~
kogir
Because running out of the box on OS X makes testing _way_ easier. Currently
we use DMGs, and performance is terrible.

~~~
wtbob
I know a lot of folks develop on OS X, but this just seems like another
argument to develop on Linux or a BSD.

But then, I haven't owned a non-Linux box in 16 years, so I may be an
outlier...

~~~
danik
It isn't an argument. DMGs are fine for development and testing and they can
be made case-insensitive. Or you could just, you know, create a separate
partition.

If the DMG speed is too slow for development and running tests (not testing
with the full live dataset) you are doing something wrong.

~~~
kogir
So we shouldn't test using a snapshot of live data? Seems prone to finding
errors only on production.

~~~
danik
When developing and testing new features/bugfixes? Unless the bug is directly
tied to production data I have no idea why you would have to use production
data for that. I'm not saying don't do it on a staging server, but you don't
develop on the staging server.

Right now I have a vagrant box where the VM image is on a DMG and all data is
NFS mounted from the same DMG onto the VM, which is kind of the worst scenario
I can think of. The testing database is around 2GB and the source+data files
etc. are ~200MB, just because I actually do need to fix a bug related to a
portion in the production data. What's slow is the CPU, and that is still
doing fine. It's not the disc even though I'm abusing it this way. That's on a
2011 macbook, 16GB, 400GB SSD.

HN is a small piece of software which should be easy to write tests for, 2GB
db and 200MB source/data-files should be more than you'd ever need to work on
something like HN. If you want to stress-test, test fs speeds etc you cannot
do that on Mac OS X anyways since you're not running Mac OS X in production.

Changing the specs of a piece of software in order to make development easier
seems totally backwards. You're developing for production, not the other
way around.

And lastly, why not simply add a case-sensitive partition to your Mac if
speed is such a big problem?

------
yiedyie
And the question for this answer:
[https://news.ycombinator.com/item?id=7872121](https://news.ycombinator.com/item?id=7872121)

------
gnurag
> and is why the site went down on June 18th.

Do you mean the site will go down on June 18th?

~~~
brudgers
YC's time machine startup will be in stealth mode.

------
gdewilde
I love how hn doesn't load 150 circus elements. I can reload pages all day
long.

I was just thinking.....

    
    
      function byId(id) {
         return document.getElementById(id);
      }
        // hide arrows
        byId('up_'   + item).style.visibility = 'hidden';
        byId('down_' + item).style.visibility = 'hidden';
    

could be :

    
    
      function hide(id) {
         document.getElementById(id).style.visibility = 'hidden';
       }
         // hide arrows
         hide('up_'   + item);
         hide('down_' + item);
    

Then it is 150 instead of 185 chars: 19% smaller (or the original is 23%
bigger).

And the function name is better of course.

------
idoco
Thanks, I got so much work done today!

------
sirtel
And, HN becomes responsive. ✧＼ ٩( 'ω' )و /／✧

~~~
bshimmin
Responsive as in "responsive design"? Hardly. The only thing that uses media
queries is the vote arrow.

~~~
unreal37
Before responsive design came around, the word "responsive" actually meant a
server would respond quickly to actions.

~~~
sergiotapia
Words change with time. When you say 'responsive' in a web setting it's nearly
always in regards to responsive web design - not 'this page is responsive now,
returning results quickly!'.

bshimmin shouldn't have been downvoted.

------
xenonite
Just curious: why are user names case sensitive?

~~~
kogir
There are 114 accounts that share the same lowercase representation.

------
apeace
Pardon if I'm ignorant here, but is there a blocker to open-sourcing HN? I'm
sure the community would love to help.

~~~
IvyMike
Previous discussion:
[https://news.ycombinator.com/item?id=5006037](https://news.ycombinator.com/item?id=5006037)

------
rafeed
It's definitely snappier and now looks responsive too! Thanks for the hard
work kogir et al!

------
bryanh
I bet some before/after load charts would be pretty impressive!

------
ianstallings
Can we classify this as "archeology"?

------
topac
That's exactly how I would have changed things to make the website faster!!!
-4 jan 1991-

