Hacker News | root_axis's comments

I don't think the comparison is valid. Releasing code and weights for an architecture that is widely known is a lot different than releasing research about an architecture that could mitigate fundamental problems that are common to all LLM products.

As far as I'm aware, transitive dependencies are counted in this number. So when you run npm install for Next.js, the download count for everything in its dependency tree gets incremented.

Beyond that, I think there is good reason to believe that the number is inflated due to automated downloads from things like CI pipelines, where hundreds or thousands of downloads might only represent a single instance in the wild.


It's not a transitive dependency; it's literally bundled into Next.js, I'm guessing to avoid issues with fragile builds.

Why is it not normal for CI pipelines to cache these things? It's a huge waste of compute and network.

It's certainly not uncommon to cache deps in CI. But at least at some point CircleCI was so slow at saving+restoring cache that it was actually faster to just download all the deps. Generally speaking for small/medium projects installing all deps is very fast and bandwidth is basically free, so it's natural many projects don't cache any of it.

These often do get cached at CDNs inside the consuming data centers. Even ISPs will cache these kinds of things.

Optimizing for disk space is very low on the priority list for pretty much every game, and this makes sense since it's very low on the list of customer concerns relative to things like in-game performance, netcode, tweaking game mechanics and balancing, etc.

Apparently, in-game performance is not more important than pretty visuals. But that's based on hearsay / what I remember reading ages ago; I have no recent sources. The tl;dr was that apparently enough people are OK with a 30 fps game if the visuals are good.

I believe this led to a huge wave of 'laziness' in game development, where framerate wasn't too high up in the list of requirements. And it ended up in some games where neither graphics fidelity nor frame rate was a priority (one of the recent Pokemon games... which is really disappointing for one of the biggest multimedia franchises of all time).


That used to be the case, but this current generation the vast majority of games have a 60 fps performance mode. On PS5 at least, I can't speak about other consoles.

Every style of interview will cause anxiety, that's just a common denominator for interviews.

Seems to me the root problem here is poor security posture from the package maintainers. We need to start including information about the publisher chain of custody in package metadata, so that we can recursively audit packages that don't have a secure deployment process.


How so?


PHP 7.0 added scalar type declarations and a mechanism for strict typing. PHP 8.0 added union types and the mixed type. PHP enforces types at runtime; JavaScript/TypeScript do not. PHP's type system is built into the language, while with JS you need either JSDoc or TypeScript, neither of which enforces runtime type checks, and TypeScript even adds a build step. php-fpm lets you mostly not worry about concurrency because of its isolated per-request process execution model; with JS-based apps you need to be extremely careful about concurrency because of how easy it is to create and access global state. PHP has also added a lot of syntactic sugar over time, especially 8.5's pipe operator, my beloved. And the ecosystem is not as fragile as JavaScript's.
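To make the runtime-enforcement point concrete, here's a minimal TypeScript sketch (the double function and the JSON payload are invented for illustration). The annotations satisfy the compiler, but nothing verifies the cast when the code actually runs:

```typescript
// Sketch: TypeScript types are erased at compile time, so nothing checks
// this cast at runtime. (double and the payload are made up for the example.)
function double(n: number): number {
  return n + n;
}

// The payload actually holds a string, but the `as` assertion silences the compiler.
const payload = JSON.parse('{"value": "5"}') as { value: number };

// Type-checks fine, yet at runtime this is string concatenation: "55", not 10.
const result = double(payload.value);
console.log(result);
```

Under declare(strict_types=1), the equivalent PHP function with an int parameter would throw a TypeError at the call site instead of silently producing the wrong value.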


It'd be interesting to see how well the LLM would be able to write code using the new language since it doesn't exist in the training data.


I've tested this; the LLM tends to strongly pattern-match to the closest language syntactically, so if your language is too divergent you have to continually remind it of your syntax or semantics. But if your language is just a skin over C or JavaScript, it'll do fine.


The part that's new is people being detained for "suspicious" traffic patterns.


Is it? Or is the new part that it's being reported? This "news" just looks like an investigation the AP conducted on its own. Could they have conducted it years ago, and what would they have found then?


This. TFA barely glanced at it but they mentioned "collaboration with the DEA".

The DEA has been jerking off about how they've been doing this stuff on the east coast corridor for over a decade.

The current CBP situation is basically a Ctrl+C Ctrl+V Ctrl+V Ctrl+V Ctrl+V of what DEA was doing under Bush and Obama.


Historically, CBP hasn't patrolled the entire country, so yeah, at least the scale and reach are definitely new.


That's the nature of statistical output, even minus all the context manipulation going on in the background.

You say the outputs "seem" to drop off at a certain time of day, but how would you even know? It might just be a statistical coincidence, or someone else might look at your "bad" responses and judge them to be pretty good actually, or there might be zero statistical significance to anything and you're just seeing shapes in the clouds.

Or you could be absolutely right. Who knows?
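For what it's worth, the kind of check you'd need before claiming a time-of-day effect is cheap to sketch. Here's a hypothetical two-proportion z-test in TypeScript; the counts are invented, and a real analysis would also need an up-front definition of what counts as a "bad" response:

```typescript
// Two-proportion z-test: is the "bad response" rate really higher in the evening?
// All numbers below are hypothetical.
function twoProportionZ(bad1: number, n1: number, bad2: number, n2: number): number {
  const p1 = bad1 / n1;
  const p2 = bad2 / n2;
  const pooled = (bad1 + bad2) / (n1 + n2); // pooled rate under the null hypothesis
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2));
  return (p1 - p2) / se;
}

// Say 12 of 40 evening responses felt "bad" vs 8 of 40 in the morning:
const z = twoProportionZ(12, 40, 8, 40);
console.log(z.toFixed(2)); // ≈ 1.03, well below the ~1.96 needed for p < 0.05
```

Even a 30% vs 20% difference in felt quality is indistinguishable from noise at these sample sizes, which is the "shapes in the clouds" point.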


Something that exhausts me in the LLM era is the never ending deluge of folk magic incantations.


Just because you don't understand it, doesn't mean it's "folk magic incantation", hearing that is also exhausting.

I don't know the merit of what the parent is saying, but it does make some intuitive sense if you think about it. As the context fills up, the LLM places less attention on things further and further back in the context; that's why the LLM seems dumber and dumber as a conversation goes on. If you put 5 instructions in the system prompt or initial message, where one acts as a canary, then you can more easily see when exactly it stops following the instructions.

Personally, I always go for a one-shot answer, and if it gets it wrong or misunderstands, I restart from the beginning. If it doesn't get it right, I adjust the prompt and retry. It seems to me all current models get a lot worse quickly once there is some back and forth.


> Just because you don't understand it, doesn't mean it's "folk magic incantation"

It absolutely is folk magic. I think it is more accurate to impugn your understanding than mine.

> I don't know the merit to what parent is saying, but it does make some intuitive sense if you think about it.

This is exactly what I mean by folk magic. Incantations based on vibes. One's intuition is notoriously inclined to agree with one's own conclusions.

> If you put 5 instructions in the system prompt or initial message, where one acts as a canary, then you can easier start to see when exactly it stops following the instructions.

This doesn't really make much sense.

First of all, system prompts and things like agent.md never leave the context regardless of the length of the session, so the canary has absolutely zero meaning in this situation, making any judgements based on its disappearance totally misguided and simply a case of seeing what you want to see.

Further, even if it did leave the context, that doesn't then demonstrate that the model is "not paying attention". Presumably whatever is in the context is relevant to the task, so if your definition of "paying attention" is "it exists in the context" it's actually paying better attention once it has replaced the canary with relevant information.

Finally, this reasoning relies on the misguided idea that because the model produces an output that doesn't correspond to an instruction, it means that the instruction has escaped the context, rather than just being a sequence where the model does the wrong thing, which is a regular occurrence even in short sessions that are obviously within the context.


> First of all, system prompts and things like agent.md never leave the context regardless of the length of the session, so the canary has absolutely zero meaning in this situation, making any judgements based on its disappearance totally misguided and simply a case of seeing what you want to see.

You're focusing on the wrong thing, ironically. Even if things are in the context, attention is what matters, and the intuition isn't about whether that thing is included in the context or not; as you say, it always will be. It's about whether the model will pay attention to it, in the transformer sense, which it doesn't always do.


> It's about if the model will pay attention to it, in the Transformers sense, which it doesn't always do.

Right... Which is why the "canary" idea doesn't make much sense. The fact that the model isn't paying attention to the canary instruction doesn't demonstrate that the model has stopped paying attention to some other instruction that's relevant to the task - it proves nothing. If anything, a better performing model should pay less attention to the canary since it becomes less and less relevant as the context is filled with tokens relevant to the task.


> it proves nothing

Correct, but I'm not sure anyone actually claimed it proved anything at all? To be entirely sure, I don't know what you're arguing against/for here.


> This is exactly what I mean by folk magic. Incantations based on vibes

So, true creativity, basically? lol

I mean, the reason why programming is called a “craft” is because it is most definitely NOT a purely mechanistic mental process.

But perhaps you still harbor that notion.

Ah, I suddenly realized why half of all developers hate AI-assisted coding (I am in the other half). I was a Psych major, so code was always more “writing” than “gears” to me… It was ALWAYS “magic.” The only job where literally writing down words in a certain way produces machines that eliminate human labor. What better definition of magic is there, actually?

I’ll never forget the programmer _why. That guy’s Ruby code was 100% art and “vibes.” And yet it worked… Brilliantly.

Does relying on “vibes” too heavily produce poor engineering? Absolutely. But one can be poetic while staying cognizant of the haiku restrictions… O-notation, untested code, unvalidated tests, type conflicts, runtime errors, fallthrough logic, bandwidth/memory/IO costs.

Determinism. That’s what you’re mad about, I’m thinking. And I completely get you there: how can I consider a “flaky test” to be an all-hands-on-deck affair while praising code output from a nondeterministic machine running off arbitrary prompt words that we don’t, and can’t, even know whether they are optimal?

Perhaps because humans are also nondeterministic, and yet we somehow manage to still produce working code… Mostly. ;)


> I was a Psych major, so code was always more “writing” than “gears” to me… It was ALWAYS “magic.”

The magic is supposed to disappear as you grow (or you’re not growing). The true magic of programming is you can actually understand what once was magic to you. This is the key difference I’ve seen my entire career - good devs intimately know “a layer below” where they work.

> Perhaps because humans are also nondeterministic

We’re not, we just lack understanding of how we work.


I’m not talking about “magic” as in “I don’t understand how it works.”

I’m talking “magic” as in “all that is LITERALLY happening is that bits are flipping and logic gates are FLOPping and mice are clicking and keyboards are clacking and pixels are changing colors in different patterns… and yet I can still spend hours playing games or working on some code that is meaningful to me and that other people sometimes like because we have literally synthesized a substrate that we apply meaning to.”

We are literally writing machines into existence out of fucking NOTHING!

THAT “magic.” Do you not understand what I’m referring to? If not, maybe lay off the nihilism/materialism pipe for a while so you CAN see it. Because frankly I still find it incredible, and I feel very grateful to have existed now, in this era.

And this is where the connection to writing comes in. A writer creates ideas out of thin air and transmits them via paper or digital representation into someone else’s head. A programmer creates ideas out of thin air that literally fucking DO things on their own (given a general-purpose computing hardware substrate).


> so code was always more “writing” than “gears” to me… It was ALWAYS “magic.”

> I suddenly realized why half of all developers hate AI-assisted coding (I am in the other half).

Thanks for this. It helps me a lot to understand your half. I like my literature and music as much as the next person, but when it comes to programming it's all about the mechanics of it for me. I wonder if this really does explain the split that there seems to be in every thread about programming and LLMs.


Can you tell when code is “beautiful”?

That is an artful quality, not an engineering one, even if the elegance leads to superior engineering.

As an example of beauty that is NOT engineered well, see the quintessential example of quicksort implemented in Haskell. Gorgeously simple, but not performant.
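For readers who haven't seen it, the Haskell version is essentially "filter below the pivot, recurse, filter above, recurse". A rough TypeScript transcription of the same idea (not the Haskell original) shows why it's pretty but slow: every call allocates fresh arrays instead of partitioning in place, and taking the first element as the pivot degrades to quadratic time on already-sorted input.

```typescript
// A TypeScript transcription of the famous two-line Haskell quicksort.
// Elegant, but each call allocates new arrays via filter/spread rather than
// partitioning in place, so it is far slower than a production sort.
function quicksort(xs: number[]): number[] {
  if (xs.length === 0) return [];
  const [pivot, ...rest] = xs;
  return [
    ...quicksort(rest.filter(x => x < pivot)),
    pivot,
    ...quicksort(rest.filter(x => x >= pivot)),
  ];
}

console.log(quicksort([3, 1, 4, 1, 5, 9, 2, 6])); // [1, 1, 2, 3, 4, 5, 6, 9]
```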


> So, true creativity, basically? lol

Creativity is meaningless without well defined boundaries.

> it is most definitely NOT a purely mechanistic mental process.

So what? Nothing is. Even pure mathematics involves deep wells of creativity.

> Ah, I suddenly realized why half of all developers hate AI-assisted coding

Just to be clear, I don't hate AI assisted coding, I use it, and I find that it increases productivity overall. However, it's not necessary to indulge in magical thinking in order to use it effectively.

> The only job where literally writing down words in a certain way produces machines that eliminate human labor. What better definition of magic is there, actually?

If you want to use "magic" as a metaphor for the joys of programming, I have no objection; when I say magic here, I'm referring to anecdotes about which sequences of text produce the best results for various tasks.

> Determinism. That’s what you’re mad about, I’m thinking. And I completely get you there- how can I consider a “flagging test” to be an all-hands-on-deck affair while praising code output from a nondeterministic machine running off arbitrary prompt words that we don’t, and can’t, even know whether they are optimal?

I'm not mad about anything. It doesn't matter whether or not LLMs are deterministic, they are statistical, and vibes based advice is devoid of any statistical power.


I think Marvin Minsky had this same criticism of neural nets in general, and his opinion carried so much weight at the time that some believe he set back the research that led to the modern-day LLM by years.


I view it more as fun and spicy. Now we are moving away from the paradigm that the computer is "the dumbest thing in existence" and that requires a bit of flailing around which is exciting!

Folk magic is (IMO) a necessary step in our understanding of these new.. magical.. tools.


I won't begrudge anyone having fun with their tools, but folk magic definitely isn't a necessary step for understanding anything, it's one step removed from astrology.


I see what you mean, but I think it's a lot less pernicious than astrology. There are plausible mechanisms, it's at least possible to do benchmarking, and it's all plugged into relatively short feedback cycles of people trying to do their jobs and accomplish specific tasks. Mechanistic interpretability work might help make the magic more transparent and observable, and, surveillance concerns notwithstanding, companies like Cursor (I assume also Google and the other major labs, modulo self-imposed restrictions on using inference data for training) are building up serious data sets that can pretty directly associate prompts with results.

Not only that, I think LLMs in a broader sense are actually enormously helpful specifically for understanding existing code: when you don't just order them to implement features and fix bugs, but use their tireless ability to consume and transform a corpus in a way that guides you to the important modules, explains conceptual schemes, analyzes diffs, etc. There are a lot of critical points to be made, but we can't ignore the upsides.


I'd say the only ones capable of really approaching anything like a scientific understanding of how to prompt these for maximum efficacy are the providers, not the users.

Users can get a glimpse and can try their best to be scientific in their approach, but the tools are of such complexity that we can barely skim the surface of what's possible.

That is why you see "folk magic", people love to share anecdata because.. that's what most people have. They either don't have the patience, the training or simply the time to approach these tools with rational rigor.

Frankly, it would be enormously costly in both time and API costs to get anywhere near best practices backed by experimental data, let alone coherent and valid theories about why a prompt technique works the way it does. And even if you built up this understanding or set of techniques, it might only work for one specific model; you might have to start all over again in a couple of months.


> That is why you see "folk magic", people love to share anecdata because.. that's what most people have. They either don't have the patience, the training or simply the time to approach these tools with rational rigor.

Yes. That's exactly the point of my comment. Users aren't performing anything even remotely approaching the level of controlled analysis necessary to evaluate the efficacy of their prompt magic. Every LLM thread is filled with random prompt advice that varies wildly, offered up as nebulously unfalsifiable personality traits (e.g. "it makes the model less aggressive and more circumspect"), and all with the air of a foregone conclusion's matter-of-fact confidence. Then someone always replies with "actually I've had the exact opposite experience with [some model], it really comes down to [instructing the model to do thing]".


> As the context fills up, the LLM places less attention on further and further back in the context, that's why the LLM seems dumber and dumber as a conversation goes on.

This is not entirely true. They pay the most attention to the things that are the earliest in history and the most recent in it, while the middle between the two is where the dip is. Which basically means that the system prompt (which is always on top) is always going to have attention. Or, perhaps, it would be more accurate to say that because they are trained to follow the system prompt - which comes first - that's what they do.


Do you have any idea why they (seemingly randomly) will drop the ball on some system prompt instructions in longer sessions?


Larger contexts are inherently more attention-taxing, so the more you throw at it, the higher the probability that any particular thing is going to get ignored. But that probability still varies from lower at the beginning to higher in the middle and back to lower in the end.

