

CKAN - The CPAN of data - ig1
http://ckan.net/

======
jedsmith
Every time something interesting that I want to read shows up on the front
page, and clicking on it yields a spinning tab while the other end flaps about
in agony, I can't help but wonder how many administrators know how to use
ab(8). The rate of dead sites on HN is really a shocker, since commodity cloud
hosting that can stand up to HN's load -- as opposed to the shared hosting of
yesterdecade -- is so widely available. I would have thought that Slashdotting
would be a historical problem by now...

Is it not common sense to test the hell out of something before someone who
would submit it to HN is even aware of its existence? WordPress is pretty bad
for this (since most admins follow the directions and don't bother tweaking),
and I've heard that Drupal can be too. Without tweaking and caching stuff,
you'll fall over quick in front of this many eyeballs.

I know, easy for me to say.

~~~
kindly
Hi, I'm one of the CKAN devs. Just wanted to say the site is fully functional
again (we've upped the caching to be a bit more aggressive).

As a side note we have indeed tested with ab :) Our problem is we continue to
find the AWS instances we use somewhat unpredictable in their response to load
(largely due, we believe, to the fluctuations in CPU "stealing" as load varies
across the other instances that share the same physical box).

Anyway, if you want to know more about CKAN, have a look at
<http://ckan.org/about>.

~~~
jedsmith
I have a bit of experience with Xen. If you're actually seeing a whole lot of
steal (how much?), that's a bad sign because it means you're on a box with a
lot of contention. In an ideal world, Xen should steal very little from you.
I'm burning all four cores available to me on one of my personal Linodes, and
the platform is barely stealing anything. Here's vmstat -s and uptime from
that Linode for comparison:

    
    
           409198 non-nice user cpu ticks
         60878563 nice user cpu ticks
           166987 system cpu ticks
        811571786 idle cpu ticks
          4486779 IO-wait cpu ticks
               25 IRQ cpu ticks
            15388 softirq cpu ticks
           766577 stolen cpu ticks
    
        12:06:10 up 13 days, 14:11,  3 users,  load average: 4.00, 4.01, 4.05
    

I've had the pedal to the floor for a couple of days on the CPU, and only 766
kticks have been stolen (total) since I booted. If you're seeing a lot more
steal than that, your host is working pretty hard to schedule the domUs
fairly.
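
For a sense of scale, the counters above can be turned into a share of all
ticks. This is only a rough sketch, assuming the `vmstat -s` lines exactly as
pasted above; the `steal_share` helper is a made-up name for illustration:

```python
import re

# The counters pasted above, copied verbatim from the vmstat -s output.
VMSTAT_OUTPUT = """\
       409198 non-nice user cpu ticks
     60878563 nice user cpu ticks
       166987 system cpu ticks
    811571786 idle cpu ticks
      4486779 IO-wait cpu ticks
           25 IRQ cpu ticks
        15388 softirq cpu ticks
       766577 stolen cpu ticks
"""


def steal_share(vmstat_text):
    """Return stolen ticks as a fraction of all CPU ticks listed (hypothetical helper)."""
    ticks = {}
    for line in vmstat_text.splitlines():
        # Each line looks like "<count> <label> cpu ticks".
        match = re.match(r"\s*(\d+)\s+(.*\S)\s+cpu ticks\s*$", line)
        if match:
            ticks[match.group(2)] = int(match.group(1))
    total = sum(ticks.values())
    return ticks.get("stolen", 0) / total if total else 0.0


print(f"steal: {steal_share(VMSTAT_OUTPUT):.3%}")
```

On these numbers the share comes out under a tenth of a percent of all ticks
since boot.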

Wouldn't dare to assume that I know how to run your operation better than you do,
just sharing my experiences with Xen. Netflix had a solution to this --
unfortunate that it was necessary, but a solution nonetheless -- which was to
monitor steal closely and spin up a new instance if it skyrocketed:
<http://blog.sciencelogic.com/netflix-steals-time-in-the-cloud-and-from-users/03/2011>
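
The general idea of watching steal can be sketched by comparing two samples of
the aggregate `cpu` line from `/proc/stat` on Linux. This is not Netflix's
actual tooling, just a minimal illustration; the function names, the sample
values, and the 10% threshold are all assumptions, and actually launching a
replacement instance is left out:

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (total, steal) ticks."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Columns: user nice system idle iowait irq softirq steal.
    # Guest time is already counted in user/nice, so it is not added again.
    values = [int(v) for v in fields[1:9]]
    values += [0] * (8 - len(values))  # pad if the steal column is missing
    return sum(values), values[7]


def steal_percent(before, after):
    """Percentage of ticks stolen between two samples of the cpu line."""
    total0, steal0 = parse_cpu_line(before)
    total1, steal1 = parse_cpu_line(after)
    elapsed = total1 - total0
    return 100.0 * (steal1 - steal0) / elapsed if elapsed > 0 else 0.0


def should_replace_instance(before, after, threshold_percent=10.0):
    """True when steal over the interval exceeds the (assumed) threshold."""
    return steal_percent(before, after) > threshold_percent


# Made-up sample lines; on a real box, read the first line of /proc/stat
# twice, a few seconds apart.
SAMPLE_BEFORE = "cpu  100 0 50 800 10 0 5 35 0 0"
SAMPLE_AFTER = "cpu  180 0 90 1500 20 0 10 200 0 0"

print(f"steal over interval: {steal_percent(SAMPLE_BEFORE, SAMPLE_AFTER):.1f}%")
print("replace instance:", should_replace_instance(SAMPLE_BEFORE, SAMPLE_AFTER))
```

Working from the difference between two samples matters because the counters
are cumulative since boot, so a single reading hides a recent spike.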

Given the opportunity, I'd like to point out that I meant no disrespect in my
original comment, if it wasn't clear. I was speaking in general terms and not
about CKAN specifically, a fact lost on those mindlessly downvoting me.

~~~
kindly
No disrespect was taken. The Hacker News coverage came as a big surprise. We
like to keep caching mostly turned off, and we know this is a risk. This is
because we do not want the possibility of any stale data, as this is annoying
to the type of users we have. We are working on a better cache invalidation
scheme, but this has not been a big priority.

Your feedback is appreciated, thank you.

Edit: Our amount of steal was much much higher than that.

------
SeanLuke
Interesting side note. CPAN was itself a copy of CTAN (TeX's repository).

CPAN - The CTAN of Perl.

------
kvgr
We actually started using the Czech instance of CKAN: <http://cz.ckan.net/>
to publish information about the government datasets that are available.

------
dotBen
Is this just a rival to Infochimps or is there something different with CKAN
that I've missed?

------
joshu
I wish the tags page actually included frequency counts. And could be sorted by
frequency.

