For agentic use cases, where you might need several round-trips to the LLM to reflect on a query, improve a result, etc., getting fast inference means you can do more round-trips while still responding in reasonable time. So basically any LLM use-case is improved by having greater speed available IMO.
The problem with this is that tok/sec does not tell you what the time to first token is. I've seen cases (with Groq) where it is large for large prompts, nullifying the advantage of faster tok/sec.
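A rough way to see the trade-off, as a minimal sketch: treat one round-trip as time to first token plus streaming time. The function name and all numbers below are made up for illustration, not measurements of any provider.

```python
def response_time(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Wall-clock seconds for one round-trip: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical numbers: a fast decoder with a slow first token
# versus a slow decoder with a quick first token.
fast_decode = response_time(ttft_s=4.0, output_tokens=200, tokens_per_s=500.0)
slow_decode = response_time(ttft_s=0.5, output_tokens=200, tokens_per_s=50.0)
print(f"fast decode: {fast_decode:.1f}s, slow decode: {slow_decode:.1f}s")
```

With these assumed figures the 10x faster decoder finishes only marginally sooner, which is the point being made about large prompts.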
We are planning to move our blog off of Medium (we've been busy!), but this post is public so you can actually just click through the nag screen if you see one.
Retrieval-Augmented Generation, where you ask an LLM to answer a question by giving it some context information that you have retrieved from your own data rather than just the data it was trained on.
Thanks! Even with better documentation, document importers don't extract node metadata, so one needs to write their own "text and metadata extractor" as well. It's then easier to skip LlamaIndex altogether, or just get inspiration from some of the re-ranking etc. you guys did.
If this happens often, perhaps the user interface for npm publish needs to change? I mean, that's the only thing I can see mitigating this, with like a nice dialog that says "hey, are you REALLY REALLY sure and have you consulted lawyers on this???"
Or something to that effect. Or maybe companies can just pony up for NPM Enterprise which fits their use case.
I work at a major bank and for the US Army. Both of those organizations are hyper sensitive about security for good reasons. Unfortunately that hyper sensitivity often results in really bad decisions and gross misunderstanding of software.
Publishing software is not a security violation in greater than 98% of cases. The only valid exceptions are protections of trade secrets and cryptographic information.
I am not counting software with embedded credentials, embedded business data, or other bad practices. Those are security violations regardless of public exposure.
Trying to explain this to security sensitive organizations is painful. I am confident in the stupidity of this conversation as somebody who has been writing code for more than 20 years and passed the CISSP exam the first time back when it was a 250 question paper test.
I'm trying to understand your point. I agree with the basic premise that most pieces of code that end up not getting exposed are not sensitive, but building guardrails to ensure people who aren't security minded ask the right questions and get closer to 'default correct' ends up being a requirement in major banks/government organizations. The number of times someone has done something completely silly like include API keys in a public package or Git repo is reason enough to care. Being beholden to an external system being secure and up to standard ruins pretty much every category of a risk analysis, especially when you end up with gold like this article or the PHP PEAR hack.
Then there is always that part of the organization that isn't judicious about keeping packages up to date, and these kinds of package exposures expose their negligence.
Then there is the other piece of my systems being dependent on your systems in a sometimes inappropriate manner if precautions aren't taken. 'Oh I just added this dependency' is a quick way to an outage if rules aren't set.
> The number of times someone has done something completely silly like include API keys in a public package or Git repo is reason enough to care.
Agreed, but that is not a publication problem. That is a separation of concerns violation which indicates a host of other problems from the lack of code review to incomplete security testing to various ad hoc or integrity violations.
External systems have no bearing on the validity and completeness of your organization's internal security controls. It doesn't matter how incomplete, insecure, or unqualified NPM is to serve a given set of code. The problem isn't NPM or the publication to NPM. The problem is the contents that comprise the publication in question. A good security audit would ask why any certain content is available for publication in violation of internal policy regardless of what that content is.
For example if you accidentally publish to NPM code containing a bunch of user PII the problem is why PII was resident in the code in the first place prior to publication. The fact that such PII is exposed is now a different second problem demanding a different resolution. You could make the argument that halting and regulating all publications would solve that problem. That is incorrect, because the PII is still exposed within your organization outside of a controlled environment and can still be leaked to the public by various other means.
> Then there is always that part of the organization that isn't judicious about keeping packages up to date, and these kinds of package exposures expose their negligence.
That is dependency management whether or not you own the packages in question. Dependencies need to be appropriately managed for a variety of security reasons. Exposing poor dependency management advertises a vulnerability, but the vulnerability is there anyways and a dedicated malicious attacker will exploit it the same either way.
---
The bottom line is that hiding your security problems by "not publishing" is not a valid security control. That is the dreaded security by obfuscation and it works both ways. By hiding the vulnerability you also hide the exploitation from visibility.
You could not have a default. Then in order for someone to publish to the public NPM, they'd have to enter the URL for the public NPM. Otherwise publish should fail.
Just compare this to publication to Maven Central - you’ll never publish there by accident exactly because there are significant barriers. Public NPM repo should not be that easily accessible for upload.
At some points in a language and its package management system's lifetime, reducing barriers to publishing is one of the best things that can be done to increase packages, fill out the ecosystem, and drive utility and adoption.
Later, once you have most needs filled by packages, and a good number of enterprise users, more control is beneficial. Companies appreciate it, and single users are willing to jump through an extra hoop or two much of the time because the rest of the ecosystem is so useful that it's not worth switching languages.
I think it's unlikely that a system will move from one style to another without an event causing them to reevaluate their prior choices. More likely, multiple events. This has already happened with NPM for other choices they made in the past, such as letting package namespaces be claimed by new people after someone gives it up, and whether releases are immutable, IIRC.
Doesn't package.json have an is private repo flag? Why not just respect that?
Why does everyone in this thread think a pop up is the solution?
Pop ups are a code smell. They mean your application does not correctly match user intent with the action so badly you had to specifically get your user to tell you what they meant to do. Did you mean to do that? Always, yes. Otherwise, undo.
The only place did you mean makes sense is in Google search results.
Why are public and private publish anywhere near each other? Why are they even on the same page?
> Doesn't package.json have an is private repo flag? Why not just respect that?
npm does respect that flag. If you set private in package.json, npm won't publish it publicly. From the docs:
> private
> If you set "private": true in your package.json, then npm will refuse to publish it.
> This is a way to prevent accidental publication of private repositories. If you would like to ensure that a given package is only ever published to a specific registry (for example, an internal registry), then use the publishConfig dictionary described below to override the registry config param at publish-time.
Perhaps inverting the logic there might be worth considering?
Make it so you have to explicitly go in and mark your package.json as public before npm will publish it, and have the default be private?
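As a minimal sketch of that inverted default: publishing is refused unless the manifest opts in explicitly. The `"public"` key is hypothetical (it is not a real package.json field), and `may_publish_publicly` is an illustrative helper, not anything npm ships.

```python
import json


def may_publish_publicly(manifest_text: str) -> bool:
    """Allow a public publish only when the manifest explicitly opts in."""
    manifest = json.loads(manifest_text)
    # The existing "private": true flag still wins over everything else.
    if manifest.get("private") is True:
        return False
    # Hypothetical opt-in flag: a missing or false value means "do not publish".
    return manifest.get("public") is True


print(may_publish_publicly('{"name": "internal-lib"}'))
print(may_publish_publicly('{"name": "left-pad", "public": true}'))
```

Under this assumption a package with no flag at all behaves like a private one, which is the opposite of what happens today.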
I don't have _too_ much sympathy for the bank here - it's in npm's best interest to make it easy to publish leftpad.js easily <snarky smirk> - and that probably should be their default stance.
The bank should be responsible for ensuring their "banking grade security" includes not accidentally publishing their source code to public repos. (How much would you bet against there being instances exactly like this where the publication vector was GitHub instead of npm? How much would you bet against this exact code being on a public git repo somewhere as well? How many public code hosting services should be expected to change their business models because some bank gets uptight after they've fucked up?)
What's impressive to me is you (and so many others) being charitable enough to assume that the original developer didn't intentionally publish this package.
Some bank developer (or more likely, some underpaid contractor) wants to share something between projects and doesn't want the hassle of proper channels, or just doesn't care enough and thinks "I'll publish this, who will ever find it?".
Years later someone stumbles upon it, maybe they don't even know who did it, "NPM why do you have our code?!?!?!?!"
This is the most likely scenario once you consider this was a bank. In which case there's nothing NPM could do. No warning would have changed their intent, they knew what they were signing up for.
If there’s enough friction, no one would bother publishing to the public repo instead of setting up a private Nexus instance, which is quite easy. And it’s quite possible that the leak happened at an early stage of CI setup for that project (private flag removed, but wrong login used). It’s a very easy mistake to make: the “private” flag is just not an appropriate tool for private CI.
> it's in npm's best interest to make it easy to publish leftpad.js easily
That's what I was getting at above. It definitely was in NPM's best interest to do so. Depending on your definition of "easily", is it still in their interest to have it quite that easy? Perhaps a different default is in order now, as you suggest. Or perhaps it should even require a confirmation dialog on the terminal for the first public push? People will still use NPM at this point. Not having enough packages or a package for a specific functionality is hardly a problem for them any more.
They could just change it to `npm publish npm|url name` with some useful warnings. The name should be checked against package.name.
Then libraries could simply add an npm script for publishing.
    npm publish
    To publish a package to npm, you must enter:
        npm publish npm <package-name>
    To publish a package to another registry, you must enter:
        npm publish <url> <package-name>
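The argument check in that proposal could look roughly like the sketch below. This is not how npm works today: `resolve_publish_target` and the `npm` alias are assumptions taken from the comment, and only the public registry URL is real.

```python
from typing import Dict, List

PUBLIC_ALIAS = "npm"  # hypothetical keyword meaning "the public registry"
PUBLIC_REGISTRY = "https://registry.npmjs.org/"


def resolve_publish_target(args: List[str], manifest: Dict[str, str]) -> str:
    """Return the registry URL to publish to, or raise if the command is incomplete."""
    if len(args) != 2:
        raise ValueError("usage: publish npm|<url> <package-name>")
    target, name = args
    # The typed name must match the manifest, so the command cannot be
    # pasted blindly from one project into another.
    if name != manifest.get("name"):
        raise ValueError(f"name {name!r} does not match package.name")
    if target == PUBLIC_ALIAS:
        return PUBLIC_REGISTRY
    if target.startswith(("https://", "http://")):
        return target
    raise ValueError(f"unknown publish target {target!r}")


print(resolve_publish_target(["npm", "left-pad"], {"name": "left-pad"}))
```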
cool, so now the developer that did this will just thoughtlessly type npm publish npm instead.
At the end of the day, there's only so much you can do. Really, a hammer shouldn't have a prompt on it confirming each hit. If it does, users will just instinctively press it and then a few months down the road they'll hit their thumb.
The npm docs also state that the "files" property of the package.json limits what gets installed if you install that package in another project.
Ever tried installing a local package via a file path? NPM just symlinks it into node_modules, causing issues because you suddenly have duplicated dependencies. Yarn does the same, btw.
I have given up any hope regarding npm/yarn acting sensible long ago.
> Pop ups are a code smell. They mean your application does not correctly match user intent with the action so badly you had to specifically get your user to tell you what they meant to do.
Not a rule. Do you really want irreversible actions like "Delete" to just delete in a touch-operated interface?
I disagree, there have been times a well-placed popup stopped me from accidentally doing something really stupid, other times there wasn't a popup and I ended up doing something stupid.
It doesn't necessarily have to be a popup per se, but extra validation around dangerous actions is user friendly.
And of course there can always be an override for the extra validation in case it potentially screws up some people's workflows, but I'd make a user explicitly set the override, like the 'NoHostAuthenticationForLocalhost' option in ssh for example.
Popups only work if the Confirm button is not a button, but "enter this text exactly: I really want to publish this package to the whole wide world", with copy-paste blocked.
If the user hacks around that, it's their own fault.
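A minimal sketch of that kind of typed confirmation, assuming a terminal prompt. The function names are illustrative, and blocking copy-paste is not something a plain terminal program can enforce, so that part is left out.

```python
REQUIRED_PHRASE = "I really want to publish this package to the whole wide world"


def is_confirmed(typed: str) -> bool:
    """Accept only an exact match; 'y', 'yes' or an empty Enter all fail."""
    return typed == REQUIRED_PHRASE


def confirm_publish(read_line=input) -> bool:
    """Ask for the phrase and report whether the publish may go ahead."""
    print(f"Enter this text exactly: {REQUIRED_PHRASE}")
    return is_confirmed(read_line())
```

Passing `read_line` as a parameter is only there so the prompt can be exercised without a real terminal.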
I wasn't condoning popups. I was making a high level observation of package systems in general.
That said, I fully support a terminal level confirmation the first time something is pushed publicly in any package manager. It is absolutely the correct thing to do to add safeties to a process that is irreversible and can have negative consequences. Often enough, making anything public online is irreversible, and making something public that wasn't ever supposed to be can have negative consequences in many respects.
But what I was really thinking of when I was typing my original comment was moving to a system where someone actually approves new package publishing accounts and/or some subset of package namespace requests. Systems that start without any sort of moderating or approval process seem to eventually settle on one. The reasons for this are numerous, from security to just keeping people from overwhelming the more common or sought after names and general sanity checks (does a system really need to allow separate packages for a term where one is singular and one is plural? Who does that help?).
This is going to be cynical, but as far as I understand it people are looking for usability through vanity.
Why not install `com.facebook.react`? Reverse domain notation is remarkably elegant given our internet. You are not typing `npm i com.facebook.react` so often that it's a pain. You probably use `create-react-app`, which is even worse.
Instead, every language creates a new cash grab for common names. And made it worse. New namespaces, new squatting. I can publish ‘react-racket’ and do whatever I want behind the scenes with it.
Case in point: do you add coffeescript or coffee-script.
Why optimise for keystrokes in your term instead of stability for your client? Jesus fuck.
That would be a bad idea, and it's not just brevity.
- If com.facebook.hr has previously been published, would it mean that facebook can never have a division named HR?
- Once a company goes belly up, the domain often ends up with squatters/spammers. Domains with published packages will sell for a lot more in the underground market - for pure exploitation of rights to publish a newer version.
- In the absence of validation, nothing stops anyone from publishing com.google.exploitlib. And domain validation is friction.
There's a little bit of movement happening in that direction in the npm world with scoped packages. E.g. babel moving all official packages to @babel. Storybook does it too. Doesn't even have to be much of a branding loss. FB could publish @react/react, @react/native, @react/eslint-config, @react/create-app, @react/prop-types, @react/dom... Most of the typing is happening in require/import not npm install, so there's some argument for not going the full java route.
It would be good if there were a standard format for referencing private keys in code, like "pk_*". Then NPM could say,
> "it looks like your package might have a private key in file xyz.js on line 27. Please type "no it doesn't" into the box below if you're sure this is not the case".
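If such a convention existed, the pre-publish scan could be as small as the sketch below. The `pk_` prefix is the hypothetical format from the comment, the minimum length of 8 is an arbitrary assumption, and `find_suspected_keys` is an illustrative name.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical convention: keys look like "pk_" followed by at least
# 8 letters, digits or underscores.
KEY_PATTERN = re.compile(r"\bpk_[A-Za-z0-9_]{8,}")


def find_suspected_keys(filename: str, lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (filename, line number) for every line that looks like it holds a key."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if KEY_PATTERN.search(line):
            hits.append((filename, number))
    return hits


source = ['const retries = 3;', 'const key = "pk_abcdef123456";']
print(find_suspected_keys("xyz.js", source))
```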
Adding 'private: true' to the package.json prevents publishing to _any_ registry, including a corporate proxy. Adding a string or regex option for private that would only publish to matching registries may prevent issues like this.
I ask for regex only because our corp proxy binds to a random port each time it runs so a static string wouldn't be flexible enough.
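A sketch of how a regex-valued setting could gate the registry, assuming the pattern must match the whole URL. No such option exists in npm; the host name and `may_publish_to` are made up for illustration.

```python
import re


def may_publish_to(registry_url: str, allowed_pattern: str) -> bool:
    """True only if the whole registry URL matches the configured pattern."""
    return re.fullmatch(allowed_pattern, registry_url) is not None


# Hypothetical corporate proxy whose port changes on every run.
corp_pattern = r"https://npm\.corp\.example:\d+/"

print(may_publish_to("https://npm.corp.example:51234/", corp_pattern))
print(may_publish_to("https://registry.npmjs.org/", corp_pattern))
```

Using a full match rather than a search means a pattern cannot accidentally match a substring of some other host.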
> Benefit of the doubt says that they thought they were publishing privately.
In what world is pushing your source code to a venture backed (therefore viral growth oriented) company who promote themselves with "npm Inc supports the JavaScript community by providing the registry where developers publish and share packaged open-source modules" possibly consistent with a view that "they thought they were publishing privately"???
Sorry, but I just don't buy that.
Somebody at the bank fucked up. It cannot possibly be npm Inc's responsibility to detect and somehow police that.
> If their code is proprietary, no one can use it. Even if they accidentally uploaded it to a public site.
Surely this depends on the terms under which they uploaded it. I would expect npm to have a legal structure in place under which code you upload for public use is also licensed for public use.
That would be the "terms under which they uploaded it"
NPM doesn't appear to have a "default license" though, so that would be "no license", therefore normal copyright law would seem to apply, and you can't make a copy of it, any more than you can copy a picture on a billboard or a blog post or whatever.
IANAL, but if someone makes code publicly available (for 3 years), isn't there an argument to be made that it's reasonable to make use of it? Probably not redistribute it, but use it at least. So I'm not even sure an explicit upload license would be required.
You seem to be arguing for an implied license; such things do exist—but the exact scope is often not obvious even to lawyers in the absence of case law covering very similar situations as to the kind of content and the use to be made of it.
If you leave your keys in your car for 3 years it's still illegal for me to take a joy ride in it. I don't personally believe in/support the concept of IP but in a world that does (like the US) it doesn't make sense to me that people being able to see your property for 3 years gives them the right to use it.
Actually, in the US "Abandoned Vehicle" is a legal thing and depending on local laws you might very well be able to claim, and get title to a vehicle that has been abandoned on your property. And the abandonment period can be really short, 48 hrs in some states. It depends on your state's definition of "abandoned vehicle", and local laws, and it will probably require a few trips to the DMV and might require filing in small claims court, but there is a legal process for gaining ownership of a vehicle that has been left on your property.
Same for any lost property, if you find something valuable (wallet full of cash), you generally have to turn it in to the police, and there is a notification process to try to find the owner, and after a period of time (generally 3 months), if no one has claimed it, it's yours. Again, local laws are going to differ, but the general legal concept, that "A finder of property acquires no rights in mislaid property, is entitled to possession of lost property against everyone except the true owner, and is entitled to keep abandoned property."[1] is common.
There's some old saying about possession being 9/10ths of the law....
I have never ever seen a sign next to the mints saying they are free for patrons. I just assume so because that's the done thing, and I would make the same presumption about the software package.
If someone leaves a car on my drive I can do something about it, which you may not be able to do in your territory. I'm surprised there isn't a legal concept of abandonment though; what do you do if someone drops an empty can on your land?
With physical property, there is the concept of Squatter's Rights. With copyright, if you fail to protect it adequately (which I don't think is very well defined by the court system), then the IP in question can pass into the public domain.
I'm not sure what all rights (physical or otherwise) might be applicable here.
> With copyright, if you fail to protect it adequately (which I don't think is very well defined by the court system), then the IP in question can pass into the public domain.
This is not true. Not even remotely true. It is routine that a company notices someone using their copyrights after decades and then sues about it. Oracle is suing Google over code that was "unprotected" for a decade before they decided to sue. In Australia (I know, different country, but this is the same), Men at Work were successfully sued 29 years after they released "Down Under" because it has a two bar riff with similarity to a song written in 1928 [1].
As a side note, I see this all the time. What is it about this particular topic that people seem to (a) consistently confuse these things but more importantly (b) feel confident enough about it to repeat the confused viewpoint with certainty to others?
What if the public site states in their terms that they must be granted those rights on the uploaded material, and the actual copyright holder is the one who does the uploading (but accidentally)?
The software could contain a trade secret, and someone could discover it from reading the source (e.g. the bank's magic evaluation function for credit ratings, or their trading strategy, or ...). That doesn't grant them any protected right on the software, which is protected by copyright.
If you don't have a license from the copyright holder, you can't legally use it, except for fair use exemptions: perhaps you could write a blog post criticizing it.
> If you don't have a license from the copyright holder, you can't legally use it, except for fair use exemptions: perhaps you could write a blog post criticizing it.
To be clear, do the files contain a copyright notice or licence? If they do not (and many companies don’t attach a copyright header to their files), why would the assumption be that the files are not public domain or free for use?
There's nuance though, that copy-pasting a previous comment doesn't answer.
What about public domain works for example?
Or you had a good faith belief you had permission from the copyright holder, eg someone misrepresented themselves as the copyright holder, or the copyright holder published the code in public without a copyright notice?
Public domain: any software made to run on current machines is too new to have expired copyright; the author(s) may have dedicated it to the PD, but you have to find that dedication, which is equivalent to a license.
Good faith: that may affect the amount of damages the copyright holder can extract, but it's still illegal to use the software.
Copyright notices: haven't been required for 30 years.
Copyright older than 30 years still requires the notice (and this is banking software).
My underlying point though was that it was an unreasonable answer to just copy-paste the previous answer. No one here that I've seen has claimed to be a lawyer, and no one I've seen has defined which nation's laws we are talking about. At that level of discourse, the question posed deserved a reasonable answer.
> Copyright older than 30 years still requires the notice
Nope, only on works published over 30 years ago. This package was published only three years ago, regardless of when it was created.
There really isn't much nuance under the copyright rules almost universally agreed under treaties like Berne, UCC and TRIPS. This kind of what-ifing a clear statement just sounds like a bad movie trope.
We don't know when it was first published though. If it's COBOL code, with dates from the 70s in the comments, that's different from it being JavaScript or some such.
And if you get enough money and lawyers in one place you can create plenty of nuance.
Dragonwriter reminded me of the term, implied licence in another subthread. That clearly seems arguable in this case even if it isn't considered winnable. Case law progresses through winning 'unwinnable' cases.
I think we're approaching this from completely different positions though. I appreciate the what-ifing, exploring the hypotheticals. It isn't as if we have any power to make a difference in a court of law, and I would hope no one is relying on this thread for legal advice.
Buying a Rolex from some guy in a car park is different to buying one from a jewelers. The former wouldn't protect you in any way, the latter would let you demonstrate a good faith belief that it wasn't stolen, and wasn't fake.
This is a tangent but there's nothing wrong with buying a fake. So you can have a good faith belief that it was a counterfeit, which can protect you somewhat in the case that it was stolen.
More generally, say you wanted consumer protections consistent with it being a Rolex, or you wanted to sell it as a Rolex. Then whether you bought it as a fake does matter.
We're talking about code. Which is more like a recipe than a novel.
If Coca Cola writes down its proprietary recipe on their entrance "by mistake", I can definitely make use of it. Maybe I can't photocopy it for sale, but I can definitely re-use their previously-secret techniques.
I can even say I got it from them through their own error and have the exact same outputs for the exact same inputs.
Trade secrets aren't your secret anymore once you publish them.
All we know in this case is that copyright law was used as a tool to remove it.
If you reverse engineer a system and write a spec using clean-room technique, it's going to be massively easier for the team to do it if they have lawful access to the no-longer proprietary source code.
And wouldn't that be the method to re-create your own copyrightable implementation of GPL code too?
As someone above posted the wider "List of parties to international copyright agreements", this is one of the boxes that need to be ticked prior to signing any form of deal between countries. It's a kind of 'fundamental' in order to start doing business with that country (or for the country to be taken seriously).
DPRK only recognises foreign copyrights, and has no concept of IP for its own works (because everything is done to superior order, there is no creativity allowed). Micronesia does not have copyright, but has an even stricter regime for creative works, amounting to a patent-like protection.
Or a little more radical, why even bother with the lawyers? You can explain to them how it is, and if they won't listen, they can sue you. Then you spend some lawyer costs, they lose because the whole thing is ridiculous, and you get a 'cost conviction' or 'cost order' as we call it (kostenveroordeling, where you pay the winning party's costs to prevent abuse of the system) so they pay your lawyer.
So true. How many takedown requests per week do you think Apache.org gets from megacorps of the form "Our employee asked for help, and their debugging logs published our internal URLs. please delete this post and all replies" ?
Further clarifying: npm will revoke all tokens issued before 2018-07-12 12:30 UTC. If you rolled your tokens after that time you will not need to re-issue them.
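For anyone checking a list of tokens against that cutoff, the comparison can be sketched as below. The cutoff is the one stated above; `needs_reissue` is an illustrative helper, and it assumes you know when each token was issued.

```python
from datetime import datetime, timezone

# Cutoff from the announcement: 2018-07-12 12:30 UTC.
REVOCATION_CUTOFF = datetime(2018, 7, 12, 12, 30, tzinfo=timezone.utc)


def needs_reissue(issued_at: datetime) -> bool:
    """True if a token issued at this time falls before the revocation cutoff."""
    if issued_at.tzinfo is None:
        # A naive timestamp cannot be compared safely against a UTC cutoff.
        raise ValueError("issued_at must be timezone-aware")
    return issued_at < REVOCATION_CUTOFF


print(needs_reissue(datetime(2018, 7, 12, 12, 0, tzinfo=timezone.utc)))
print(needs_reissue(datetime(2018, 7, 12, 13, 0, tzinfo=timezone.utc)))
```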
Were any of the deleted packages temporarily hijacked? It seems strongly like this was the case. If so, please confirm immediately so people who installed packages during this time can start scanning for malware.
Even if the answer is “yes, 1+ packages were hijacked by not-the-original author, but we’re still investigating if there was malware”, tell people immediately. Don’t wait a few days for your investigation and post mortem if it’s possible that some users’ systems have already been compromised.
I would also hope for and expect this to be communicated ASAP from the NPM org to its users.
@seldo, I understand that you don't want to disseminate misleading info, but an abundance of caution seems warranted in this case as my understanding of the incident lines up with what @yashap has said. If we're wrong, straighten us out --- if we're not, please sound an advisory, because this is major.
Yeah, these were some core, widely used packages that were deleted. If they were temporarily hijacked, lots of dev machines (including mine) may have been compromised. There’s a major security risk here; if there was any hijacking, now is not the time for information hiding and PR.
Seems like you should have frozen publishing instead of saying, "Please do not attempt to republish packages, as this will hinder our progress in restoring them." Especially, to prevent, even temporary, hijacking.
"How would package signing prevent people from requesting the wrong package? The malware author could also sign their package."
And here is a perfect example. Someone replaced a legit package with a malicious one. Had the original author signed the package, then NPM users could have defended against the new malicious author, because the new author's signing key would not be in their truststore.
Unsigned packages leave NPM package users defenseless. I hope that is crystal clear now.
When I was doing pentesting, we had an interesting assignment. Our job was to pop a dev project. Then we'd tell them how to secure themselves.
One of our tactics was to set up fake Github profiles with very similar names, then try to get someone internal to the team to `git clone` and run our code. Boom, remote shell.
We didn't execute the plan. But it was thrown around as an idea.
When a package on npm can disappear, and a new package can appear in its place at a later version, by a different author, and there is no connection between those two people, then you're in a bad situation. Just because no one currently runs attacks like this doesn't mean you'll be safe forever. It's worth getting ahead of this.
I don't know whether package signing is the best solution. Maybe yes, maybe no. But the question is, if a package vanishes, what is the proper action to take?
The solution seems like a rollback. Let us have the latest previous version from the same author, by default. That will fix the builds and not require any heavyweight changes.
But package signing would definitely be nice, if it can be integrated in a lightweight and chaos-free fashion.
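The rollback idea could be sketched like this: given the release history, pick the highest version published by the author you already trust. The `Release` record and `latest_trusted_release` are illustrative; a real registry would need a verifiable notion of "same author", which is where signing would come in.

```python
from typing import List, NamedTuple, Optional, Tuple


class Release(NamedTuple):
    version: Tuple[int, int, int]  # (major, minor, patch)
    author: str


def latest_trusted_release(releases: List[Release], trusted_author: str) -> Optional[Release]:
    """Newest release by the trusted author, ignoring anything published by others."""
    candidates = [r for r in releases if r.author == trusted_author]
    return max(candidates, key=lambda r: r.version, default=None)


history = [
    Release((1, 0, 0), "alice"),
    Release((1, 2, 0), "alice"),
    Release((1, 3, 0), "mallory"),  # appeared after the original package vanished
]
print(latest_trusted_release(history, "alice"))
```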
Yup. Publishing to Clojars requires GPG and is a bit of a pain compared to publishing to NPM. I'd take Clojars' approach any day of the week to this nonsense, though.
Actually I'm doing him a favor ... I completely understand that people talk like that within companies. When emotions are involved, that's what happens. When you're acting in any capacity as a spokesperson for a company (or I guess a government or non-profit too), a bit more decorum is called for. It's not just him - I've been feeling this for a long time. One thing I appreciated about Obama was that he was always dignified (not that I always agreed with what he was saying). Now that the POTUS posts uncouth tweets, maybe it's okay to put statements like that in your SEC filings too.
I got down-voted for calling out some of Kalanick's frat-boy behavior and speech. I'm sure it's not popular on a site predominated by twenty-somethings but since I'm old, I'd prefer to be called old-fashioned or out-of-touch rather than simply being dismissed. If it helps ... I'm sorry that I was so blunt - I should have typed these couple of paragraphs instead.
Speaking how he spoke is exactly what the situation called for, and shaming him like this might give people the impression that the community doesn't support it. People feel differently, but for me, it was a breath of fresh air. Finally, someone talking straight with a community! "We fucked up. Report incoming." Done, A+. We can all relate.
Maybe that's not professional enough for certain circles, but hopefully this mindset will permeate to them eventually. We could all stand to loosen up a bit.
Any update on the post-mortem? How long have the binaries been replaced? Is there evidence that malware was injected into the binaries?
Additionally, you should brush up on your code signing implementations. Had you signed it with a trusted code signing cert, consumers could have verified that you produced the binaries... and not a malicious user. Assuming they didn't have access to the private key material of your code signing key.
We don't know their priorities so we can't really judge that.
I am sure there are many things that you don't care about that others would think you should, but it doesn't mean you are wrong and they are right. The world isn't black and white.
Yarn does not run a mirror of the registry. registry.yarnpkg.com is a pass-through domain to the npm registry. It allows them to collect stats about yarn usage but is not a mirror.
L1 visas are easier for the company to get, because there is no cap on the number issued. The employee on an L1 is just as qualified as an H1, but less free -- they cannot switch jobs, and if they get fired they must leave the country within 15 (!) days.