

New npm Registry Architecture - IsaacSchlueter
http://blog.npmjs.org/post/75707294465/new-npm-registry-architecture

======
anvandare
>Storing every package tarball as an attachment on a CouchDB document is
painful. It works fine for a few hundred docs, but at well over a quarter
million package versions, this makes view generation and compaction much too
onerous, and the file becomes too large to ever successfully compact.

How big is the average npm package? Is this issue due to couchdb's treatment
of attachments, or simply due to the size/volume of the packages themselves?

~~~
aaronblohowiak
>On creation, attachments go into a special _attachments attribute of the
document. They are encoded in a JSON structure that holds the name, the
content_type and the base64 encoded data of an attachment. A document can have
any number of attachments.

Attachments are, by default, stored inline in the document itself.

~~~
pokstad
That's only when you are initially saving the document. Once the database
receives the attachment, it creates a stub and removes the inline data.
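
To make the two shapes concrete, here is a rough Python sketch. The field names `content_type`, `data`, `stub` and `length` follow CouchDB's documented attachment format. The helper names, the document id and the payload are made up for illustration, and the real server also records fields such as `digest` and `revpos` that are left out here.

```python
import base64
import json


def inline_attachment_doc(doc_id, name, content_type, payload):
    """Build the body a client would send, with the attachment base64-encoded inline."""
    return {
        "_id": doc_id,
        "_attachments": {
            name: {
                "content_type": content_type,
                "data": base64.b64encode(payload).decode("ascii"),
            }
        },
    }


def as_stub(doc):
    """Approximate what a later plain read of the document shows: a stub, no data."""
    stubbed = {k: v for k, v in doc.items() if k != "_attachments"}
    stubbed["_attachments"] = {
        name: {
            "content_type": att["content_type"],
            "length": len(base64.b64decode(att["data"])),
            "stub": True,
        }
        for name, att in doc["_attachments"].items()
    }
    return stubbed


# Hypothetical package document; the payload is a placeholder, not a real tarball.
uploaded = inline_attachment_doc(
    "example-pkg",
    "example-pkg-1.0.0.tgz",
    "application/octet-stream",
    b"not a real tarball",
)
stored = as_stub(uploaded)
print(json.dumps(stored, indent=2))
```

This only models the JSON a client sees; it says nothing about how the attachment bytes are laid out in the database file, which is what matters for compaction.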

------
__david__
Why upload attachments to the skim db and then (more or less) immediately pull
them off onto manta? Is that so the client upload software doesn't have to
change? If that's the case, would it make sense for the client upload software
to change now so it can upload directly to the separate attachment store?

~~~
IsaacSchlueter
Good question! There are a few benefits to doing it this way.

1\. Backwards compatibility, as you mention. We do plan on re-evaluating and
seeing what client changes would allow us to do this more efficiently, but
there's a big time lag on rolling out a new npm and people actually adopting
it. It's safe to say that someone out there will still be using the current
release in 2 years or so, so if we can keep it working, then that's a friendly
thing to do.

2\. If the Skim Worker daemon falls over, the attachments are still going
_somewhere_, and we can always have it catch up later. Apart from disasters,
it also means we can treat this daemon a bit roughly. If we change a config or
spin up a new one, or otherwise mess with it, no biggie.

3\. In the race where you publish, and then someone fetches it right away,
before the Skim Worker gets to it, the Fastly configs can detect the 404 and
pull it out of the DB directly.

4\. If Manta ever goes down, the Skim Worker will start failing (and pinging
Nagios, of course) and if need be, we can have the binary GETs go first to
FullfatDB and subsequently to SkimDB. That Fastly config change takes about
30s to roll out, so we can mitigate downtime very quickly.

Eventually, we'll probably restructure the PUT endpoints so that they're a bit
more clever, but still maintain backwards compatibility in our public API
surface.
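
The fallback ordering in points 3 and 4 can be sketched roughly as below. This is plain Python standing in for what the comment says lives in the Fastly config, so it is an illustration of the idea and not the actual setup. The function name, the dict-backed stores and the tarball path are all hypothetical, and which database Fastly falls back to in the normal case is an assumption (SkimDB is used here).

```python
from typing import Callable, Optional, Sequence, Tuple

# A backend is (label, fetch); fetch returns the body, or None to signal a 404.
Backend = Tuple[str, Callable[[str], Optional[bytes]]]


def fetch_with_fallback(path: str, backends: Sequence[Backend]) -> Tuple[str, bytes]:
    """Try each backend in order and return the first hit with its label."""
    for label, fetch in backends:
        body = fetch(path)
        if body is not None:
            return label, body
    raise LookupError(f"{path} not found in any backend")


# Stand-in stores: the tarball was just published, so the worker has not
# copied it to Manta yet, but both databases already have it.
path = "/example-pkg/-/example-pkg-1.0.0.tgz"
manta = {}
skimdb = {path: b"tarball bytes"}
fullfatdb = {path: b"tarball bytes"}

# Point 3: Manta first, then the database on a 404.
normal_order = [("manta", manta.get), ("skimdb", skimdb.get)]
# Point 4: with Manta down, FullfatDB first and then SkimDB.
manta_down_order = [("fullfatdb", fullfatdb.get), ("skimdb", skimdb.get)]

print(fetch_with_fallback(path, normal_order)[0])
print(fetch_with_fallback(path, manta_down_order)[0])
```

Keeping the order as data is what makes the "30s config change" style of mitigation cheap: switching modes means swapping one list for another, not changing the fetch logic.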

------
ozh
Totally off topic, but always bugged me: why does the NPM site display email
addresses in clear text on profile pages? I don't see the added value in doing
that (except for making email harvesters' job easier)

~~~
seldo
No added value, just haven't got around to implementing anything cleverer,
yet. There is an open issue:

[https://github.com/npm/npm-www/issues/569](https://github.com/npm/npm-www/issues/569)

Feel free to chime in if you have a suggestion or even submit a patch; npm Inc
only has 4 full-time engineers right now (we're hiring!) so it will be a while
before we get to this otherwise :-)

~~~
ozh
Thanks! Will check that

~~~
IsaacSchlueter
Just as a caveat, be aware that your email address is already exposed if you
use git and github:
[https://github.com/YOURLS/YOURLS/commit/cf7b2e6aedebe0c65466...](https://github.com/YOURLS/YOURLS/commit/cf7b2e6aedebe0c65466749b7d5832fa4c7c0420.patch)

In general, hiding email addresses doesn't usually make the job of harvesters
appreciably harder, and does make the life of genuine users a bit more
painful.

We'll probably start hiding email addresses altogether once we have a
messaging system so that it's still easy for npm users to contact one another
when they need to. Until then, I'd accept a patch to do the standard silly
"hiding" thing with some JS that shows it on the page.

