
Help Scale NPM - jenius
http://scalenpm.org/
======
remon
I'll be that guy; I'm a little confused as to why this would require $200,000
(the total funding requested) to solve. From the site itself:

614,680,691 requests per month come down to ~230 requests per second. Allowing
for some spikiness that boils down to perhaps 1k requests/second at peak.
Requests in these cases are mostly relatively simple queries on versioned,
highly cacheable data. I say highly cacheable because it is relatively static
data for which most (if not all) of the data fields relevant for these
requests can fit in memory of perhaps even a single node (NPM currently
includes 48,799 packages. That leaves a very healthy chunk of data per package
on 16GB-128GB RAM server boxes).
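
A rough sketch of that arithmetic in Python. The 30-day month and the decimal reading of "16 GB" are assumptions made here for illustration; the request and package counts are the ones quoted from the site.

```python
# Back-of-the-envelope check of the request rate and the per-package memory budget.
REQUESTS_PER_MONTH = 614_680_691        # figure quoted from the site
PACKAGES = 48_799                       # figure quoted from the site
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: a 30-day month

avg_rps = REQUESTS_PER_MONTH / SECONDS_PER_MONTH

RAM_BYTES = 16 * 10**9                  # assumption: smallest box mentioned, decimal GB
bytes_per_package = RAM_BYTES / PACKAGES

print(f"average requests/second: {avg_rps:.0f}")
print(f"memory budget per package: {bytes_per_package / 1000:.0f} KB")
```

Under those assumptions the average lands in the 230-240 requests/second range, with a budget of a few hundred KB of RAM per package on the smallest box.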

The downloads are a bit of a puzzle to me as well. On my machine the average
NPM package is about 200KB (YMMV). 114,626,717 downloads are mentioned on the
site. 200KB times 114 million downloads lands us on roughly 23 TB. Even on a
relatively expensive CDN such as Amazon CloudFront the total monthly cost for
that bandwidth and content request load for CloudFront and the required S3
costs land on about $3k/month and that's ignoring all bulk discounts, reserved
capacity and so on (which are very significant at these volumes).
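
The bandwidth estimate can be checked the same way. This sketch assumes decimal units (200 KB = 200,000 bytes), a 30-day month, and that the download count is a monthly figure; it deliberately does not model CDN pricing, since no price list is given here.

```python
# Rough monthly transfer volume implied by the download count.
DOWNLOADS_PER_MONTH = 114_626_717       # figure quoted from the site
AVG_PACKAGE_BYTES = 200 * 1000          # assumption: 200 KB average, decimal units
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumption: a 30-day month

total_bytes = DOWNLOADS_PER_MONTH * AVG_PACKAGE_BYTES
total_tb = total_bytes / 10**12

# Average sustained throughput if the traffic were spread evenly over the month.
avg_mbit_per_s = total_bytes * 8 / SECONDS_PER_MONTH / 10**6

print(f"monthly transfer: {total_tb:.1f} TB")
print(f"average throughput: {avg_mbit_per_s:.0f} Mbit/s")
```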

I'm more than likely oversimplifying a few things here and there (or failed
horribly at math) but I'd still be very interested to hear why this requires
such a large investment. Also, wouldn't the more obvious solution be to open
source the npmjs software and allow the community to contribute knowledge and
time instead?

EDIT: Quickly wanted to point out that I use npmjs.org often, that it is a
great service and that donations are very well deserved. After re-reading my
post, it turned out more negative sounding than intended.

~~~
jkrems
Every fuzzy-versioned dependency means one request to npm that you can not
(really) cache - at least not without doing and relying on active cache
invalidation. And if I look at my average npm install log, that's about half
of the requests. For storage estimates you should also include that it's not
only storing the packages but also all versions of the package. ~A year ago
the registry was 25GB in size IIRC. And that was a freshly downloaded &
compacted one on my machine. Especially considering that the growth will not
suddenly stop, things get complicated.
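
To illustrate why a fuzzy-versioned dependency is hard to cache, here is a minimal Python sketch. It assumes the common reading of a tilde range such as `~1.2.3` (at least 1.2.3, below 1.3.0) and handles only plain `major.minor.patch` versions; the function names are made up for this example and are not npm's code.

```python
def parse(version):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies_tilde(version, base):
    """True if version matches ~base: same major.minor, and not older than base."""
    v, b = parse(version), parse(base)
    return v[:2] == b[:2] and v >= b


def resolve(base, published):
    """Pick the newest published version that satisfies ~base, or None."""
    candidates = [v for v in published if satisfies_tilde(v, base)]
    return max(candidates, key=parse) if candidates else None


published = ["1.2.2", "1.2.3", "1.2.9", "1.3.0"]
before = resolve("1.2.3", published)

# A new patch release changes the answer, so a cached result is now stale.
after = resolve("1.2.3", published + ["1.2.10"])
print(before, after)
```

The answer depends on the list of versions published at that moment, which is why the client has to ask the registry (or rely on cache invalidation) each time.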

Also: both the website and the registry are already completely open source. But
hosting it with "perfect" uptime is a real problem and requires not only
network/hardware but also people. Including testing and migrating to a new,
more scalable solution - say a handful of people (2-4) will work on that for a
month, which does not yet include future maintenance. That can easily mean
$50000 just in salaries - or in "people that would normally bring value to
paying customers" if you assume that those people will be fine with working
that month without pay.

~~~
remon
I see what you mean but allow me to theorycraft a bit more: changes and
removal of packages are considerably rarer than additions, and either
happens relatively infrequently. This makes cluster-wide cache invalidation
relatively trivial (it's easy but has scalability issues, and scalability
won't be an issue here). Also, when I said "cached" I probably should have
said keeping various indexes in memory to facilitate queries. I actually work
with systems in a roughly similar technical domain (way different space
though, I work on large scale TV systems).

Your other point is definitely the big challenge but that is exactly the main
motivation to use hosted CDNs and other hosted services that have solved this
challenge for you to a large extent.

I'm almost tempted to have a go at this.

------
judofyr
Wouldn't it be better to make NPM more distributed so that anyone could set up
a mirror and help out?

EDIT: Not saying it would be easy; I'm just wondering if you've considered
this direction.

~~~
indexzero
Anyone can make a mirror. That's the glory of CouchDB. Just kick off
replication and BOOM you've got the npm registry. There are community mirrors
in Europe ([http://npmjs.eu](http://npmjs.eu)) and Australia.

If you want to run or use a community mirror that's totally great!

    npm config set registry http://my.awesome.community.mirror
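
For anyone wondering what "kick off replication" looks like, a hedged Python sketch follows. It builds (but does not send) a POST to CouchDB's `_replicate` endpoint. The local CouchDB address and the source URL are placeholders invented for this example; substitute the real registry URL and your own server.

```python
import json
import urllib.request


def build_replication_request(local_couch, source_db, target_db, continuous=True):
    """Build a POST to CouchDB's _replicate endpoint without sending it."""
    body = {
        "source": source_db,        # remote database to pull from
        "target": target_db,        # local database to write into
        "create_target": True,      # create the local database if it is missing
        "continuous": continuous,   # keep following upstream changes
    }
    return urllib.request.Request(
        url=local_couch.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URLs: neither is a real registry or mirror address.
request = build_replication_request(
    "http://localhost:5984",
    "http://registry.example.invalid/registry",
    "registry",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would start the replication on a running CouchDB; that step is left out so the sketch has no side effects.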

~~~
applecore
I'm surprised they're using CouchDB and not MongoDB.

~~~
nathan7
npm is pretty much the prime use-case for CouchDB. REST API out of the box,
replication is a core feature and not just a scaling feature (multi-master,
MVCC) and the validation/access control is pretty much made for it. The npm
registry is implemented as a CouchApp for a reason.

~~~
remon
I agree that from a functionality perspective this is largely true, at least
on paper. That said, if running costs become a significant bottleneck CouchDB
becomes a less obvious choice. Sometimes there's a reluctance to migrate from
one technology to another as requirements change over time but this seems one
of those occasions where exactly that step is required. Sometimes it's good to
take a step back, look at your current requirements in terms of cost and
performance and determine what technology best suits your needs. I would
question the know-how and objectivity of anyone that would land on
CouchDB/node.js in this instance.

~~~
nathan7
There's no node involved in the registry itself. This is purely CouchDB.

------
driverdan
I'm a big fan of npm but there are unanswered questions here.

1\. Why $200,000? Can we get a rough budget so we can understand how it will
be used and how long it will last?

2\. We should all be thankful for the time and resources
Nodejitsu/Joyent/IrisCouch puts into node and npm. That said, wouldn't the
projects be better off separated from these businesses with their own funding?
If we were donating money to the projects instead of a for-profit corp we
would have more certainty of how and when the money will be used. "Donating"
to Nodejitsu just adds to their bottom line and in reality could be used
however they want. If something happens to the business we have no guarantees
the money would continue to be used for npm.

------
confused_npm
This is a bit confusing. Am I right in asserting the following?...

Commercial PaaS hosting firm, nodejitsu, is asking for _donations_ to pay (or
help to pay) for the costs of running npm.

Nodejitsu plan on using said funds to purchase additional resources at Joyent,
where npm is currently hosted.

Joyent own the trademark for Node.js

------
arianvanp
But I thought node was web scale.

Edit: Yep that was a lame joke. Anyhow, take my money, I love NPM and use it
daily.

~~~
camus2
Well, benchmarks show it's no more webscale than a plain old JEE app, or a
Go app or even JavaScript on the JVM (
[http://www.techempower.com/benchmarks/](http://www.techempower.com/benchmarks/)
). By the way, Java 8 comes with a new JS engine, I believe; it might be
interesting to see if node gets ported to the JVM with it.

Yes, it will be able to serve more concurrent requests than your typical
python/ruby/php app.

But npm doesn't even seem to run on node; it looks like it is a couch app. I
don't know how it performs.

------
TheHippo
I'd love to donate. But like most Germans I don't own a credit card. Why do so
many people ignore that credit cards are not the default payment method in
some countries? I'd even accept paying the extra fees for using PayPal.

~~~
driverdan
Because there are very few ways of receiving payments from multiple source
types reliably. PayPal is _not_ reliable.

~~~
malandrew
This. It's also notorious for freezing assets from crowdfunding.

------
candyluver13
I don't think giving up money for more servers and hosting is really the
answer here. I think de-centralizing and distributing the registry is the way
forward. There is one project I know of that is trying to make this happen:
[https://github.com/jmgunn87/mynpm](https://github.com/jmgunn87/mynpm)

------
quarterto
Appears to be running slowly. Maybe we need a scalescalenpmdotorg.org?

~~~
oakaz
It's down now :)

------
codecurve
From the JS comments - made me laugh

    
    
      /**
       * Simple counter magic to make people engaged.
       *
       * @constructor
       */

~~~
indexzero
Glad you liked it! The counting is derived from the last week of npm downloads
uniformly distributed over time. [http://npmjs.org](http://npmjs.org)
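
In other words, something along these lines. This is a Python sketch of the idea only, with made-up names; it is not the site's actual JavaScript.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def estimated_count(count_at_start, downloads_last_week, seconds_elapsed):
    """Extrapolate a running total by spreading last week's downloads evenly over time."""
    rate_per_second = downloads_last_week / SECONDS_PER_WEEK
    return count_at_start + int(rate_per_second * seconds_elapsed)


# With 604,800 downloads last week the rate is one per second.
print(estimated_count(1000, 604_800, 10))
```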

------
malone
I always feel guilty about how much I end up downloading from the npm
registry. I keep my nodejs projects in separate dirs, so I end up downloading
the same dependencies over and over again each time I start a new project.

I wish the --global install switch was cleverer and allowed you to have
multiple versions of the same package installed at the same time. Then I could
just symlink everything together which would save them bandwidth (and save me
diskspace).

~~~
33a
npm actually caches packages locally, so don't feel too bad. Still, if you have
wildcard dependencies it will hit the main npm server to check if you have the
most recent version in your cache.

------
doki_pen
Better ideas than this:

* Offer a paid, private registry that doesn't cost an insane amount of money. Somehow host it on the same metal as the public repo.

* Decentralize. Make it easier to set up mirrors or proxy/cache layers. If I had a simple-to-deploy npm caching proxy that didn't need to replicate every upstream package, only the ones that I use, it would reduce load upstream and protect me when upstream fails. ++ if I can host private packages there as well.
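
A minimal Python sketch of the caching-proxy idea from the second bullet. It assumes a published `name@version` tarball never changes once released (an assumption, not something guaranteed in this thread), and `CachingProxy` and `fetch_upstream` are hypothetical names, not part of any existing tool.

```python
class CachingProxy:
    """Read-through cache: only packages that are actually requested get stored."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream   # callable(name, version) -> bytes
        self._cache = {}
        self.upstream_hits = 0

    def get_tarball(self, name, version):
        key = (name, version)
        if key not in self._cache:
            # Cache miss: this is the only path that touches the upstream registry.
            self.upstream_hits += 1
            self._cache[key] = self._fetch_upstream(name, version)
        return self._cache[key]


def fake_upstream(name, version):
    """Stand-in for a real registry download."""
    return f"{name}-{version}.tgz".encode("utf-8")


proxy = CachingProxy(fake_upstream)
first = proxy.get_tarball("express", "3.4.0")
second = proxy.get_tarball("express", "3.4.0")   # served from the cache
print(proxy.upstream_hits)
```

Once an entry is cached, repeat installs no longer depend on upstream being reachable, which covers both the load and the outage concerns. Fuzzy version lookups would still need a freshness policy on top of this.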

------
bdcravens
They should take Bitcoin for additional exposure

~~~
ypcx
To be honest, the "Card number" and "Security code" on the form struck me as a
little weird.

------
michaelmior
I wonder why they can't/don't make use of a CDN to scale downloads. Unless
they do already and I'm not aware.

~~~
clone1018
CDNs cost a lot of money.

~~~
remon
Certainly less than reinventing the wheel. You have to include development
cost, maintenance and so forth, which are not needed or are significantly
reduced on hosted content delivery services. See my post in this thread for
details.

------
geetee
Countless hours have been saved by NPM. I would have donated a bit more if it
let me input the exact amount.

------
FraaJad
Considering that CouchDB was built to do multi master replication, it's just a
matter of adding more servers and setting up automatic replication.

Also, is the current setup using any kind of front end caching like Varnish?

~~~
jkrems
"built to do multi master replication" is a pretty naive way of putting it.
Couch does not (by default) support a multi-master cluster setup. The way in
which couchdb supports multi-master is "you have different servers that sync
data and conflicts are resolved at the application level". And I wrote "data" for
a reason because there's stuff you need to sync yourself if you want to have
multiple couchdb servers appear as one to the outside.

~~~
willholley
True - multi-master in CouchDB means that you can have two CouchDB instances
(or, indeed, anything which speaks the CouchDB replication protocol - see
[http://www.replication.io/](http://www.replication.io/)) which can be synced
and, in the event of a network partition, both instances can be written to.
One of those masters could be CouchDB and one could be PouchDB in a browser or
TouchDB on a mobile phone. Neither instance requires any coordination from the
other.

Clustered CouchDB (BigCouch - on its way to being integrated into CouchDB
vNext) relies on the same MVCC semantics - it's just using Erlang rather than
HTTP to transfer documents between clusters and attempts to keep the nodes in
sync continuously.

Conflict resolution is tricky but CouchDB plays it safe and keeps all
conflicting revisions of each document around until you resolve them to ensure
no data is lost. It's pretty easy to find and fetch any conflicted documents
so they can be resolved in an application-specific way.
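
As a hedged illustration of application-specific resolution, the Python sketch below takes the full bodies of every conflicting revision of one document (which can be fetched using the revision ids CouchDB lists under `_conflicts` when a document is requested with `?conflicts=true`) and builds a `_bulk_docs` payload that keeps one merged revision and deletes the rest. The merge rule and the `versions` field are illustrative assumptions, not how any particular application does it.

```python
def plan_resolution(revisions, merge):
    """Build a _bulk_docs payload: write the merged doc on top of the first
    revision and mark every other conflicting revision as deleted."""
    keeper = revisions[0]
    merged = merge(revisions)
    docs = [dict(merged, _id=keeper["_id"], _rev=keeper["_rev"])]
    for loser in revisions[1:]:
        docs.append({"_id": loser["_id"], "_rev": loser["_rev"], "_deleted": True})
    return {"docs": docs}


def union_versions(revisions):
    """Illustrative merge rule: keep every version entry seen in any revision."""
    merged = {"versions": {}}
    for doc in revisions:
        merged["versions"].update(doc.get("versions", {}))
    return merged


conflicting = [
    {"_id": "pkg", "_rev": "2-aaa", "versions": {"1.0.0": {}}},
    {"_id": "pkg", "_rev": "2-bbb", "versions": {"1.0.1": {}}},
]
plan = plan_resolution(conflicting, union_versions)
print(plan)
```

The payload would then be POSTed to the database's `_bulk_docs` endpoint; that request is omitted so the sketch stays side-effect free.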

------
malandrew
Regarding the banners, you say that the banner will be on the scalenpm.org
site. Is there any way to get on the npmjs.org site or somewhere else with
greater visibility?

------
gtramont
Could npm use torrent somehow? Reducing the load from the main servers? This
would require users to opt-in and become a peer... just a random thought.

------
tehwebguy
Out of curiosity, how do they normally pay the bills?

~~~
jenius
They mention somewhere on the site that they sell a private version of npm
that they are working on, and nodejitsu (the company behind npm) also
runs a node hosting platform:
[https://www.nodejitsu.com/](https://www.nodejitsu.com/)

------
mathrawka
I would only donate if they dump Nodejitsu.

------
runj__
So... Will I receive an email or something to confirm that my donation went
through?

------
benlemasurier
I want to give $50, can I just get a t-shirt?

------
dharbin
So now we live in a world where people extort you to fix their broken service.
The first repository is always free...

------
ocfx
This website is not scaling well.

