PDFs are not really immutable. I use Okular all the time to write my "notes" (it's just text that you can place anywhere) on top of a PDF form and then print out a new completely filled out PDF. The only thing I do by hand is sign the physical paper.
Your understanding of immutability feels skewed here. Every time you annotate a PDF, it creates a new version. Even when you overwrite the same file, the structure of the original document changes, thereby creating a new document and ultimately making it "the ship of Theseus.pdf".
Sure, someone may try the same argument with .doc and .txt documents, yet there is a general consensus that PDFs were designed to "resist change". You can illustrate the point to yourself by making changes to a .txt document and then removing them: the md5 of the file remains the same.
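To make the .txt half of that concrete, here's a quick Python sketch (the text is made up for illustration): once you restore the exact original bytes, the hash can't tell the difference, because it only ever sees bytes.

    import hashlib

    original = b"Dear Sir,\nplease find the form attached.\n"
    edited   = original + b"P.S. a note I later deleted\n"
    restored = edited.replace(b"P.S. a note I later deleted\n", b"")

    print(hashlib.md5(original).hexdigest() == hashlib.md5(edited).hexdigest())    # False
    print(hashlib.md5(original).hexdigest() == hashlib.md5(restored).hexdigest())  # True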
Have you ever used Acrobat? Not "Acrobat Reader", but regular Acrobat, the most popular PDF editor. It's from Adobe, and it definitely does not "resist" edits.
I got what you're saying the first time, and you still seem to be entirely missing the point. Immutability means that an object cannot be modified after it's created, and any changes result in a new object rather than altering the original.
You're saying "well, look, I can modify this PDF and I can even undo my changes...". What I'm saying is that whenever you modify a PDF, you're essentially creating a new file rather than truly "undoing" changes in the original. PDFs have complex internal structures with metadata, object references, and possibly compression that make bit-perfect restoration challenging.
Unlike plain text files where changes can be precisely tracked and reversed at the character level, PDFs don't easily support this kind of granular reversibility. Even "undoing" in PDF editors often means generating yet another variant rather than returning to the exact binary state of the original.
Take a look at how Git stores PDFs: the delta approach doesn't work efficiently here, because even small logical changes can result in significantly different binary files with completely different checksums, so it ends up keeping EVERY version of the same document in a separate blob object.
When you annotate a PDF, later change your mind, undo all the annotations, and save it, the result may look the same as the original to your eyes, but in digital reality it will be a different file.
I feel like I'm talking to a toddler, sigh. Let me try again.
Immutability doesn't mean that an "object cannot be modified"; it means that in order to modify an object, you must create a new (clone) object. That's all I meant to say. Sure, you can get pedantic and say "yes, PDFs are immutable" or "no, PDFs aren't immutable in some contexts", and depending on the point of view both of these claims could be correct. I'm not arguing about the specifics.
I'm just saying that your explanation of why you think pdfs are not immutable hinges on an incorrect idea of what immutability actually is.
There's a rigorous definition of "immutability" in computer science. Strings in many programming languages are immutable, for example, but that doesn't mean you can't manipulate them; it just means that operations which appear to modify a string actually create a new string object.
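A quick Python illustration of the same idea (Python strings happen to be immutable, like in many other languages):

    s = "hello"
    t = s.upper()      # looks like a modification of s, but...
    print(s, t)        # hello HELLO -> the original string is untouched
    print(s is t)      # False       -> upper() created a brand-new object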
The best illustration of immutability is found in languages that are immutable by default, e.g., Clojure. Once someone groks the basics, it becomes really clear what the concept is about.
I don't get what you're talking about. If I had to describe Spade, it's just VHDL, but this time with Rust syntax instead of Pascal-style syntax. The pipeline syntax is just syntactic sugar and nowhere near full-blown high-level synthesis.
What you need to do is have first-party formal verification / design-by-contract support. Instead of the old-school testbench approach, you should prioritize tools like fuzzing and model checking to find counterexamples (i.e. bugs).
If there is something worth checking but it's only possible to check it in simulation (think UBSan), then you should add it anyway, just so that it can be triggered by a counterexample (think debug-only signals/wires/record fields/inputs/outputs/components). You don't want people to write lengthy exhaustive tests or stare at waveforms all day.
Note that the point of formal verification here isn't to be uptight about writing perfect software in a vacuum. It's in fact the opposite: it's about being lazy and getting away with it. If you fuzz Rust code merely to make sure that you're not triggering panics, you've already made a huge improvement in software correctness, even though you haven't defined any application-specific contracts yet!
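For what it's worth, here's a rough analogue of that "lazy" style in Python using the hypothesis library (my own toy function, nothing to do with Spade): the only contract is "don't raise", which is the closest thing Python has to "don't panic".

    from hypothesis import given, strategies as st

    def parse_ratio(a: int, b: int) -> float:
        return a / b                      # the b == 0 counterexample gets found immediately

    @given(st.integers(), st.integers())
    def test_parse_ratio_never_crashes(a, b):
        parse_ratio(a, b)                 # no asserts needed; any exception is the bug

    if __name__ == "__main__":
        test_parse_ratio_never_crashes()  # hypothesis runs the fuzzing when called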
You might not understand it, but model predictive control requires a model of not just the robot, but also the payload.
This means that you have to hard code the mass and dynamics of the payload or use an algorithm to determine the payload properties automatically.
It should be blatantly obvious, then, how useless a robot that can only pick up one very specific object is, and how useful a robot that can pick up any object is.
I personally don't really see the point in giving meaning to the Q, K, V parts. It doesn't actually matter what Q, K, V do; it's the training algorithm's job to assign them a role automatically. It makes more sense to think of it as modeling capacity or representational space.
One of the biggest things people don't understand about machine learning is that there is a lot of information in the model that is only relevant to the training phase. It's similar to having test points for your probes on a production PCB or trace/debug logging that is disabled in production. This means that you could come up with an explanation that makes sense at training time, but is actually completely irrelevant at inference time.
For example, what you really want from attention is the pairing of all vectors in Q with all vectors in K. Why? Not necessarily because you need it for inference. It's so that you don't have to know or predict where the gradient will propagate in advance when designing your architecture. There are a lot of sparse attention variants that only really apply to inference time. They show you that transformers are doing a lot of redundant and unnecessary work, but you can only really know that after you're done training.
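A bare-bones numpy sketch (toy shapes of my own choosing) of what "pair every Q with every K" means: the score matrix is dense, so during training the gradient can flow between any pair of positions, whether or not a given pair ends up mattering at inference time.

    import numpy as np

    seq, d = 4, 8
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((seq, d)) for _ in range(3))

    scores = Q @ K.T / np.sqrt(d)                    # (seq, seq): every Q paired with every K
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    out = weights @ V                                # every output is a mix of every V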
There is a pattern in LSTMs and Mamba models that is called gating [0], which in my opinion is a huge misnomer. Gating implies that the gate selectively stops the flow of information. What they really mean by it is element-wise multiplication.
If you look at this through the lens of model capacity, what additional behavior does this multiplication allow you to represent? LSTMs are really good at modelling dynamical systems. Why is that the case? It's actually quite simple. Given a discretized linear dynamical system x_next = Ax + Bu, you run into a problem: you can only model linear systems.
So, assuming we only had matrix multiplications with activation functions, we would be limited to modeling linear dynamical systems. The key problem is that the matrices in the neural network can model the A and B of the dynamical system, but they cannot give you a time-varying A and B. You could add additional parameters to x that contain the time-varying parameters of A, but you would not be able to use these parameters to model non-linearity effectively.
Your linearization of a function f(x) might be written as g(x) = f(x_0) + f'(x_0)(x - x_0) = a_0 + a_1 * x. The problem is very apparent: while you can add an additional parameter to modify a_0, you can never modify a_1, since a_1 is baked into the matrix that gets multiplied with your x. What you want is a function like h(x) = (a_0 + m_0) + (a_1 + m_1) * x, where m is a modifier value carried in x. In other words, we need the model to represent the multiplication m_1 * x, and it turns out that this is exactly what the gating mechanism in LSTM models does.
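Here's a tiny numpy sketch of that last step (my own toy state layout, not any particular LSTM): a fixed matrix can add the modifier m to things, but it can never produce the m * x term; an element-wise multiplication, i.e. a gate, gives you exactly that.

    import numpy as np

    x = np.array([2.0, 0.5])        # x[0] = value, x[1] = modifier m carried in the state

    A = np.array([[1.0, 1.0],       # a purely linear update can only form fixed
                  [0.0, 1.0]])      # linear combinations of x[0] and x[1]
    linear_next = A @ x             # [2.5, 0.5] -- no way to get m * value out of this

    gate = x[1]                     # "gate" computed from the state itself
    gated_next = gate * x[0]        # 1.0 -- the m_1 * x term from the paragraph above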
This might look like a small victory, but it is actually enough of a victory to essentially model most non-linear behavior. You can now model the derivative in the hidden state, which also means you can model the derivative of the derivative, or the derivative of the derivative of the derivative. Of course it's still going to be an approximation, but a pretty good one if you ask me.
>I personally don't really see the point in giving meaning to the Q, K, V parts. It doesn't actually matter what Q, K, V do; it's the training algorithm's job to assign them a role automatically.
I was under the impression that the names Q, K, and V were historical more than anything. There is a definite sense of information flowing from the K side to the Q side, because the V that gets passed to the next layer at the Q's position comes from the same index as the K.
I agree that it's up to the training to assign the role for the components, but there is still value in investigating the roles that are assigned. The higher level insights you can gather can lead to entirely different mechanisms to perform those roles.
That's very much what most model architectures are: efficiency guides. A multi-layer perceptron with an input width of context_window*token_size would be capable of assigning roles better than anything else, but at the cost of being both unfeasibly slow and unfeasibly large.
I'm a little surprised that there isn't a tiny model that generates a V on demand when it is accumulated with the attention weights: a little model that takes the Q and K values and the embeddings they were generated from. That way, when a partial match between the Q and K produces a decent attention value, it can use the information about which parts of Q and K match to decide what V information is appropriate to pass on. It would be slower, and caching seems to be going in the other direction, but it seems like there is information that should be significant there that just isn't being used.
It's never going to be AGI, because we're still stuck in the static weights era.
Just because it is theoretically possible to scale your way through sheer brute force alone using a trillion times the compute doesn't mean that you can't come up with a better compute scaling architecture that uses less energy.
It's the same as having a Turing machine with one tape vs. multiple tapes. In theory it changes nothing; in practice, having even the simplest algorithms become quadratic is a huge drag.
The problem with previous AI approaches is that humans wanted to make use of their domain expertise and ended up anthropomorphizing the ML models, which resulted in them being overtaken by people who invested little in domain expertise and more in compute scaling. The quintessential bitter lesson. With the advent of the bitter lesson, people arrived who don't understand anything at all except the concept "bigger is better", and they think that they can wring blood from a stone. The problem they run into is that they are trying to get something out of compute scaling that you can't get out of compute scaling.
What they want to do is satisfy a problem definition using an architecture that is designed to solve a completely different problem definition. The AGI compute-scaling crowd wants something that is capable of responding and learning through experience, out of something that is inherently designed, and penalized during training, not to learn through experience. The key ingredient, continual learning, does not rely on domain knowledge. It is a compute-scaling paradigm, but it's not the same compute-scaling paradigm that static weights represent. You can't bet on donkeys in a horse race and expect to win, but since everyone is bringing donkeys to the race, it sure looks like you can.
My personal bet is that we will use self-referential matrices and other meta-learning strategies. The days of hand-tuning learning rates to produce pre-baked weights should be over by the end of the decade.
$70k isn't a median income; it's well above it. Which is to say, it's more than what 50% of full-time+ American workers, who have years of experience on average, earn. And overtime is typically time and a half. So somebody earning a $50k base and working 60-hour weeks would be earning $87,500, which would put them in the top quarter of all earners in the US [1], straight out of high school and with zero debt. (The 20 weekly overtime hours at 1.5x the base rate add $37,500 on top of the $50k.)
You are right that doing this in the Bay Area would be unusual due to the unreasonable cost of living, but it's possible he's still living with his parents. If not, the great thing about the trade is that welders are in high demand everywhere, including offshore, where the pay skyrockets, especially for a welder who can take underwater work. Those gigs let someone comfortably retire, if they want, as a multi-millionaire well before 40.
But the interesting thing about trades is that people end up enjoying them. A friend works the rigs and makes a stupid amount of $$$ thanks to doing 2 on, 2 off and spending his off-time in places like Thailand. So he's taking home a healthy 6 figures, gets 50% of the year off, and has a total cost of living in the low thousands per year. He still has no intention of retiring, though, even though he could live off simple interest alone at this point. It ends up being a lifestyle and not just a job.
>Median weekly earnings of full-time workers were $1,194 in the first quarter of 2025.
That extrapolates to about $62k/year ($1,194 × 52). I don't know what else to say here.
>So somebody earning a $50k base and working 60-hour weeks would be earning $87,500, which would put them in the top quarter of all earners in the US
Yes, work 50% more hours than the median and you hit the 75th percentile (above half of the top half). The math seems to math out. But I thought we didn't want to work ourselves to death these days?
>straight out of high school, and with 0 debt.
Well, no. Not straight out of high school. You need to compete for a spot with the unions (which seems to be a hurdle the poster above passed, or is confident of passing), then complete an apprenticeship for a few years that may either be unpaid or pay significantly less. Only after that do you become a journeyman and start to get that pay.
There's still a near-college level of training during which you need resources to survive that your apprenticeship isn't covering. Resources that may or may not include parents covering room and board (and in that case, sure, you can survive on anything that pays anything if the biggest expense is covered). Sadly, that's a growing luxury in modern society.
>But the interesting thing about trades is that people end up enjoying them.
I work in games, so yeah, I get it. You sacrifice comfort and maybe even health for fulfillment. But I can still recognize when that passion and engagement is being exploited in my industry, while still choosing to participate in it.
Well, eventually I recognized it. Gaining 60 pounds and having an emergency room visit finally knocked some sense into me.
Most people don't work 52 weeks a year, so yearly earnings are different from weekly earnings times 52. Beyond that, it's important to consider what you're comparing here. You're looking at the wages of (and only of) all full-time+ workers in America, meaning you're comparing the earnings of somebody fresh out of high school to somebody who's been working for years on average. And of course plenty of those people are also working overtime. In spite of all of this, the high school kid is still earning 20% more! And the OP who started this thread made it clear his kid has already received this offer, so yes, it's literally straight out of high school. His earnings in a few years will be even higher.
Games and trades are complete opposites. If you still think you enjoy game development (and aren't independent), then it's almost certain that you haven't been in the industry long. With games, you start with a passion and the industry will just completely beat it out of you. It will make you hate game development and even games. The trades are different: somehow that passion gets born in those who didn't already have it, and is fostered and grown in those who did. In software (I'm speaking outside of games here; in games you don't even get good pay) you end up in a scenario where people mostly hate their job but love the pay. In welding you end up in one where people mostly love their job.
Considering the non-standard nature of CSV, quoting throughput numbers in bytes is meaningless. It makes sense for JSON, since you know what the output is going to be (e.g. floats, integers, strings, hashmaps, etc).
With CSV you only get strings for each column, so 21 GB/s of comma splitting would be the pinnacle of meaninglessness. Like, okay, but I still have to parse the stringy data, so what gives? Yeah, the blog post does reference float parsing, but a single float per line would count as "CSV".
Now someone might counter that I should just read the README.MD, but then the suspicion simply turns out to be true: they don't actually do any escaping or quoting by default, which makes the quoted numbers an example of heavily misleading advertising.
CSV is standardized in RFC 4180 (well, as standardized as most of what we consider an internet "standard").
Otherwise I agree: if you don't do escaping (a.k.a. "quoting", the same thing for CSV), you are not implementing it correctly. For example, per RFC 4180, a line break inside a quoted field stays part of that quoted string. If you don't need to handle that, you can implement CSV parsing much faster: properly handling line breaks inside quoted strings requires a 2-pass approach (if you are going to use many cores), while not handling it at all can be done in 1 pass. I discussed this detail in https://liuliu.me/eyes/loading-csv-file-at-the-speed-limit-o...
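To make the quoting point concrete, a tiny Python sketch (toy data of my own): a quoted field may legally contain commas and line breaks, so naive line splitting and a proper RFC 4180 parse disagree on how many records there are.

    import csv, io

    data = 'id,comment\n1,"hello, world"\n2,"line one\nline two"\n'

    naive_rows  = data.splitlines()                    # 4 "lines": the quoted newline got split
    proper_rows = list(csv.reader(io.StringIO(data)))  # 3 records, as RFC 4180 intends

    print(len(naive_rows), len(proper_rows))           # 4 3
    print(proper_rows[2])                              # ['2', 'line one\nline two']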
Side note: RFCs are great standards, as they are readable.
As an example of how not to do it:
XML can be considered a standard, but I cannot afford to read it. DIN/ISO is great for manufacturing in theory, but bad for a field like IT that relies on zero-cost initial investment.
The "Rust core team" should be working on the "Rust core", not every little thing that someone somewhere thinks should go in a standard library. It is part of the job of a "core team" to say "no".
A lot.
Like, a lot a lot a lot. Browse through the closed proposals in the issue tracker of any programming language that keeps one open sometime. Individually, perhaps a whole bunch of good ideas. The union of them? Not so much.