
NPM registry in numbers - phadej
http://blog.futurice.com/npm-registry-in-numbers
======
basicallydan
This is a really interesting analysis.

    
    
        From these graphs we could come up with some reasonable boundaries. Packages that:
          - have more than one version: 71853
          - are over two weeks old: 42106
          - had a new release in the last 360 days: 71277
    
        There are 31888 packages satisfying all three conditions above. Though it’s only one third of all packages there, the amount is still enormous.
    

Now, if only I could search just those 31,888 packages on NPM.
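If you have the registry documents locally, you could apply the three conditions yourself. Below is a minimal sketch. It assumes each document has a `versions` object keyed by version string and a `time` object holding `created` plus one publish timestamp per version. `isReasonablePackage` is a made-up name. The article doesn't say exactly how each condition was measured, so the details are assumptions, for example using the newest version's publish time for "new release".

```javascript
// Hypothetical helper: checks the three conditions quoted above against one
// registry package document. Assumes `doc.versions` is keyed by version string
// and `doc.time` holds `created` plus a publish timestamp per version.
const DAY_MS = 24 * 60 * 60 * 1000;

function isReasonablePackage(doc, now = Date.now()) {
  const versions = Object.keys(doc.versions || {});
  const time = doc.time || {};

  // 1. More than one version.
  const hasSeveralVersions = versions.length > 1;

  // 2. Over two weeks old (measured from `time.created`).
  const created = Date.parse(time.created);
  const olderThanTwoWeeks = !Number.isNaN(created) && now - created > 14 * DAY_MS;

  // 3. A release in the last 360 days (assumed: newest version's publish time).
  const releaseTimes = versions
    .map((v) => Date.parse(time[v]))
    .filter((t) => !Number.isNaN(t));
  const newest = releaseTimes.length > 0 ? Math.max(...releaseTimes) : -Infinity;
  const releasedRecently = now - newest <= 360 * DAY_MS;

  return hasSeveralVersions && olderThanTwoWeeks && releasedRecently;
}

// Example with a fixed "now" so the result doesn't depend on today's date.
const now = Date.parse("2014-09-01T00:00:00Z");
const doc = {
  versions: { "0.1.0": {}, "0.2.0": {} },
  time: {
    created: "2014-01-01T00:00:00Z",
    "0.1.0": "2014-01-01T00:00:00Z",
    "0.2.0": "2014-06-01T00:00:00Z",
  },
};
console.log(isReasonablePackage(doc, now)); // true
```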

EDIT: By the way, for anybody reading this article who isn't familiar with the
JavaScript ecosystem, this is interesting:

    
    
        Mocha is clearly the most popular test framework, while nodeunit is way behind. And no-one seems to use Jasmine.
    

Jasmine _appears_ to be unpopular within the JavaScript community, but in
reality these numbers only cover the NodeJS community and, more recently, the
subset of the front-end JS community that uses NodeJS tools in its projects.

Mocha, the most popular test framework, has been a Node module since late 2011
[0], whereas Jasmine has only been one since mid-2013 [1]. I have a hunch that
the majority of Jasmine users aren't on board with the merging of the front-end
JS and NodeJS ecosystems, and since Jasmine [2] doesn't really target NodeJS
projects, it's far less likely to be used in any project that uses NPM for
dependency management.

Not really that important, just interesting :)

[0]: [https://github.com/visionmedia/mocha/commits/master/package.json?page=5](https://github.com/visionmedia/mocha/commits/master/package.json?page=5)

[1]: [https://github.com/pivotal/jasmine/commits/master/package.json](https://github.com/pivotal/jasmine/commits/master/package.json)

[2]: [https://github.com/pivotal/jasmine](https://github.com/pivotal/jasmine)

~~~
HNJohnC
Jasmine is hopeless for serious node apps; that's why. It lacks critical
features. Those gaps are long-standing, and node developers complain bitterly
about them because of the hostile and half-assed way the maintainers have
responded to those issues.

~~~
aikah
> It lacks critical features

Like?

------
seldo
We are working on including more accurate and detailed health metrics into the
upcoming re-launch of the npm site. We're definitely going to be including
stuff like GitHub stars (but this disadvantages non-GitHub projects),
downloads, number and recency of versions, and dependencies (and
devDependencies). We're considering letting people filter by license and by
whether tests pass.
We're also adding the Collections feature which will allow people to curate
groups of packages they use or think work well together, and that will feed
into rankings as well.

We've barely scratched the surface of the ways we can improve npm search, and
we'd value any ideas the community has. If you have specific suggestions, I've
created an issue over at
[https://github.com/npm/newww/issues/131](https://github.com/npm/newww/issues/131)
for you to throw them into :-)

------
ecaron
> It seems that CoffeeScript isn’t anymore popular for new projects.

I'm happy to see some research behind what my gut has been sensing for a
while. The distinction between "popular" and "popular for new projects" is
very important too. Is anyone else talking about whether or not this is an
active downward trend?

~~~
lowboy
That's not to say that packages aren't being written in CoffeeScript and then
compiled to JavaScript for distribution. If a dev doesn't use build tools
(gulp-coffee, browserify+coffeeify, etc.), CoffeeScript wouldn't need to be in
the deps at all.

~~~
joshfinnie
I agree; I would be interested to see how many packages are written in
CoffeeScript but then compiled for NPM or the open source community.

It is interesting that some people will not support your project if it's
written in CoffeeScript... I would think this is the driving force, rather
than CoffeeScript itself being unpopular.

~~~
lowboy
Agreed, which is too bad. It's one thing to not contribute to a project if the
source isn't to your liking, but it's another to flat out not use the compiled
version.

Besides, I've found that writing CoffeeScript is pretty much like writing
JavaScript. The syntax is different, but the semantics are almost entirely 1:1,
leaving out the class stuff (which I don't often use). I've introduced CS to
many JS devs, and while not all of them loved it, they were able to get up to
speed very quickly. But I understand people's objections to it: I think
they're wrong, but I understand them!

------
yaph
It is really impressive how fast NPM is growing. In July last year I created
two network graphs to visualize NPM dependencies. There were about 35,000
packages then, a year and two months ago, and that number has almost tripled
since. The top 3 packages (underscore, async, request) in terms of number of
dependent packages kept their positions.

You can find the graphs and some more info at the following URLs:

- [http://exploringdata.github.io/vis/npm-packages-dependencies/](http://exploringdata.github.io/vis/npm-packages-dependencies/) (takes a while to load; use Chrome or another WebKit browser)

- [http://exploringdata.github.io/vis/npm-top-packages-dependencies/](http://exploringdata.github.io/vis/npm-top-packages-dependencies/) (includes only packages with at least 10 dependent packages)

- [http://exploringdata.github.io/info/npm-packages-dependencies/](http://exploringdata.github.io/info/npm-packages-dependencies/) (info post)
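A top-3 ranking by number of dependent packages comes down to counting reverse dependencies. Here is a minimal sketch. It assumes each package has been reduced to a hypothetical `{ name, dependencies }` record taken from its latest package.json. `topDependedOn` is a made-up name, and the sample data is invented; it says nothing about real npm numbers.

```javascript
// Hypothetical helper: count how many packages list each package as a
// dependency, then return the top `n` as [name, count] pairs.
function topDependedOn(packages, n = 3) {
  const counts = new Map();
  for (const pkg of packages) {
    // Keys of a dependencies object are unique, so each dependent counts once.
    for (const dep of Object.keys(pkg.dependencies || {})) {
      counts.set(dep, (counts.get(dep) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

// Made-up sample, not real registry data.
const sample = [
  { name: "a", dependencies: { underscore: "~1.6.0", async: "0.9.x" } },
  { name: "b", dependencies: { underscore: "*", request: "2.x" } },
  { name: "c", dependencies: { async: "*", underscore: "1.x" } },
];
console.log(topDependedOn(sample)); // underscore (3), async (2), request (1)
```

With real registry documents, the `dependencies` object would most likely come from the latest version's entry, i.e. `doc.versions[doc["dist-tags"].latest]`.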

------
rattray
> It looks like underscore is more popular than lodash. I think this table is
> flawed.

I think this way of going about data analysis is flawed... "I don't like this
conclusion, so I'm going to massage the data until it supports my preference".

I do agree that it'd be nice if the npm maintainers cleared out all the dead
projects, though.

~~~
untog
Yeah, I don't understand that. Why would lodash automatically be more popular
than Underscore?

~~~
lowboy
As tbassetto said, I can't see any reason to use Underscore over Lo-Dash.
Performance is on par or better, it has AMD/CommonJS modularity out of the
box, and its CLI can be used to build a minimal version of the lib containing
only the functions you need.

I wrote an article on how you can analyze source code and produce a minimal
Lo-Dash build in only 73 characters: [http://jjt.io/2014/07/18/analyzing-source-files-to-automatically-create-custom-lo-dash-builds-in-73-characters/](http://jjt.io/2014/07/18/analyzing-source-files-to-automatically-create-custom-lo-dash-builds-in-73-characters/)
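The snippet below does not reproduce the article's 73-character approach. It is a separate JavaScript sketch of the same idea: scan source text for `_.name` references and collect the distinct function names. `usedLodashFunctions` is a hypothetical name. A regex is only an approximation: it misses code that calls Lo-Dash under a name other than `_`, and it also matches text inside strings and comments. Check the Lo-Dash CLI documentation for how to pass the resulting list to a custom build.

```javascript
// Hypothetical helper: list the distinct functions referenced as `_.name`
// in some source text, sorted alphabetically.
function usedLodashFunctions(source) {
  const names = new Set();
  // `\b` keeps identifiers like `foo_` from matching as `_`.
  const re = /\b_\.([A-Za-z_$][\w$]*)/g;
  let match;
  while ((match = re.exec(source)) !== null) {
    names.add(match[1]);
  }
  return [...names].sort();
}

const src = "var ids = _.map(users, 'id');\nreturn _.uniq(_.compact(ids));";
console.log(usedLodashFunctions(src).join(",")); // compact,map,uniq
```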

------
nawitus
> Always remember to check the licenses of transitive dependencies. There are
> packages which say they are licensed under MIT, yet they depend on an (A)GPL
> package! That might or might not to be an issue for you.

I recommend using a tool like license-checker to create a list of all the
licenses. It also shows the unknown ones, so you can start digging for the
missing licenses. As the article states, there's a large number of npm packages
without licenses. I usually make pull requests whenever I stumble upon a
package with missing license information in its package.json, and I hope the
situation is slowly improving.
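Here is a minimal sketch of the transitive-license check. It is not license-checker's implementation or API. It assumes a hypothetical `registry` object that maps each package name to its package.json contents. It also ignores versions, so each name resolves to a single package. package.json license fields come in a few shapes: a string, a legacy `{ type }` object, or a legacy `licenses` array. The sketch normalizes them before flagging dependencies that declare no license or an (A)GPL one. The function names and the example tree are made up.

```javascript
// Hypothetical sketch (not license-checker's API): walk a package's transitive
// dependencies and flag those declaring no license or an (A)GPL license.
// `registry` maps package name -> package.json contents; versions are ignored.
const COPYLEFT = /\bA?GPL\b/i; // matches GPL-* and AGPL-*, not LGPL-*

// package.json licenses appear as "MIT", { type: "MIT" } or [{ type: "MIT" }].
function declaredLicenses(pkg) {
  return []
    .concat(pkg.license || [], pkg.licenses || [])
    .map((l) => (typeof l === "string" ? l : l && l.type))
    .filter((l) => typeof l === "string" && l.length > 0);
}

function auditLicenses(rootName, registry) {
  const report = { unknown: [], copyleft: [] };
  const seen = new Set([rootName]);
  const stack = Object.keys((registry[rootName] || {}).dependencies || {});
  while (stack.length > 0) {
    const name = stack.pop();
    if (seen.has(name)) continue; // shared or circular dependency
    seen.add(name);
    const pkg = registry[name] || {};
    const licenses = declaredLicenses(pkg);
    if (licenses.length === 0) report.unknown.push(name);
    else if (licenses.some((l) => COPYLEFT.test(l))) report.copyleft.push(name);
    stack.push(...Object.keys(pkg.dependencies || {}));
  }
  report.unknown.sort();
  report.copyleft.sort();
  return report;
}

// Made-up example tree: app (MIT) -> a (MIT) -> c (AGPL-3.0) -> d (no license)
const registry = {
  app: { license: "MIT", dependencies: { a: "^1.0.0", b: "^2.0.0" } },
  a: { license: { type: "MIT" }, dependencies: { c: "*" } },
  b: { licenses: [{ type: "BSD" }] },
  c: { license: "AGPL-3.0", dependencies: { d: "*" } },
  d: {},
};
console.log(auditLicenses("app", registry)); // c flagged as copyleft, d as unknown
```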

------
orf
I'm really impressed: PyPI only has about 37,000 packages[1], and it has been
around for much longer. It really shows how far Node has come.

It would also be interesting to see the average number of lines each package
has.

[1]: [http://tomforb.es/how-much-code-is-there-in-the-python-package-index](http://tomforb.es/how-much-code-is-there-in-the-python-package-index)

~~~
efdee
It's not as impressive if you consider that people make modules for
everything, no matter how trivial. Case in point:
[https://github.com/blakeembrey/upper-case/blob/master/upper-case.js](https://github.com/blakeembrey/upper-case/blob/master/upper-case.js)

~~~
untog
I've found myself infuriated by this recently. I do understand the desire to
compartmentalise everything, but the last time I installed Express (a web
framework) I also had to install modules to read POST bodies and serve static
files. My JS file quickly becomes more require() statements than anything
else, and it just doesn't seem worth the hassle to me.

~~~
olalonde
There are higher-level frameworks; Express is meant to be very minimalist. It
is also usually a good idea to serve static files directly through Nginx or a
CDN. If you find yourself requiring the same modules in most of your files, a
common pattern is to have a `common.js` file that exports all of those modules,
so you only need a single `require('./common')` at the top of each file. I
personally don't really like this pattern, as it makes refactoring harder and
is usually a sign of bad architecture.

~~~
aikah
> There are higher level frameworks

Like?

------
joemaller1
What tools were used to discover and digest the data? I'd be very interested
in the process behind this post.

~~~
phadej
Short version: node.js of course!

A bit longer one:

One can fetch the `jsverify` package metadata from
[http://registry.npmjs.org/jsverify](http://registry.npmjs.org/jsverify), and
all current packages are listed in
[http://registry.npmjs.org/-/all](http://registry.npmjs.org/-/all) (this one
is special: it's around 50 MiB). Please cache your results; let's not DDoS the
registry.

There is around one gigabyte of nice JSON data. After the initial fetch you can
traverse it using any tools you want. I naturally used node.js for that too.
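Following the "please cache your results" advice, here is a minimal sketch that fetches one package document and keeps it on disk. It assumes Node 18 or newer for the global `fetch()` and a plain (unscoped) package name. The function name `getPackageDoc` and the `./npm-cache` directory are made up for the example. It fetches over HTTPS rather than the plain-HTTP URLs above.

```javascript
// Hypothetical helper: return a package's registry document, reading it from
// a local cache directory when present and fetching it (once) otherwise.
// Assumes Node 18+ (global fetch) and an unscoped package name.
const fs = require("fs");
const path = require("path");

async function getPackageDoc(name, cacheDir = "./npm-cache") {
  const file = path.join(cacheDir, encodeURIComponent(name) + ".json");
  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, "utf8"));
  }
  const res = await fetch("https://registry.npmjs.org/" + encodeURIComponent(name));
  if (!res.ok) {
    throw new Error(`registry returned HTTP ${res.status} for ${name}`);
  }
  const doc = await res.json();
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(file, JSON.stringify(doc));
  return doc;
}

// Usage (hits the network the first time only):
//   getPackageDoc("jsverify").then((doc) => {
//     console.log(Object.keys(doc.versions).length, "versions");
//   });
```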

~~~
redidas
If you'd rather work with a SQL database, here's a module that attempts to put
NPM into a Postgres database: [https://www.npmjs.org/package/npm-postgres-mashup](https://www.npmjs.org/package/npm-postgres-mashup)

~~~
phadej
Wow, someone had the time and motivation to write all of that boilerplate.
Yet the license parsing, for example, is very naive: compare
[https://github.com/npm/npm-www/blob/99020b5b3e21607dab24cd69a79f4bd7c6a63c70/models/package.js#L193](https://github.com/npm/npm-www/blob/99020b5b3e21607dab24cd69a79f4bd7c6a63c70/models/package.js#L193)
and
[https://github.com/rickbergfalk/npm-postgres-mashup/blob/56d77cc3a66fd5aaa870dbd578ad899b5fe32d68/npm-postgres-mashup.js#L392](https://github.com/rickbergfalk/npm-postgres-mashup/blob/56d77cc3a66fd5aaa870dbd578ad899b5fe32d68/npm-postgres-mashup.js#L392)

------
terryf
From the version distribution graph it is evident that npm hosts 0 packages
with 25k+ versions, one package that has almost 20k versions, and 40 packages
with 0 versions.

That seems to be at odds with the numbers quoted in the rest of the article.
Strange.

(Hint: please do label the axes on your graphs. Thanks.)

~~~
pluma
They are labeled (now).

------
kyberias
Note to the author: when presenting numbers in a table, one should place them
in a separate column and align them to the right. It makes them easier to
compare.
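Here is a small sketch of that advice for plain-text output. Labels go in one column, and counts go in a separate, right-aligned column. `formatTable` is a made-up name. The numbers are the ones quoted from the article at the top of the thread.

```javascript
// Hypothetical helper: lay out [label, count] rows as two columns, padding
// labels on the right and counts on the left so the digits line up.
function formatTable(rows) {
  const labelWidth = Math.max(...rows.map(([label]) => label.length));
  const counts = rows.map(([, n]) => String(n));
  const countWidth = Math.max(...counts.map((s) => s.length));
  return rows
    .map(([label], i) => label.padEnd(labelWidth) + "  " + counts[i].padStart(countWidth))
    .join("\n");
}

console.log(
  formatTable([
    ["more than one version", 71853],
    ["over two weeks old", 42106],
    ["new release in the last 360 days", 71277],
    ["all three conditions", 31888],
  ])
);
```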

