I didn't get the sudden leap from "position encodings" to "QKV" magic.
What is the connection between the two? Where does "Q" come from? What are "K" and "V"? (I know they stand for "Query", "Key", "Value"; but what do they have to do with position embeddings?)
All of them are vectors derived from the embedded representations of tokens (each is a learned linear projection of the token's embedding). In a transformer, you want to compute the inner product between a query (the token that is doing the attending) and a key (the token that is being attended to). An inductive bias we have is that the neural network will perform better if this inner product depends on the relative distance between the query token's position and the key token's position. We thus encode each one with positional information, in such a way that (for RoPE at least) the position-dependent part of the inner product depends only on the distance between these tokens, and not on their absolute positions in the input sentence.
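If it helps, here is a minimal sketch of that last point, assuming the usual RoPE formulation where consecutive pairs of dimensions are rotated by an angle proportional to the token's position. The `rope` and `dot` helpers and the toy `q` and `k` vectors are made up for illustration; a real model would apply this to the projected queries and keys in every attention head.

```python
import math

def rope(vec, pos, base=10000.0):
    """Rotate each consecutive pair of `vec` by an angle proportional to `pos`."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even number of dimensions"
    out = []
    for i in range(0, d, 2):
        # Each pair gets its own frequency; later pairs rotate more slowly.
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Toy query and key vectors (hypothetical values).
q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

score_a = dot(rope(q, 5), rope(k, 2))      # positions 5 and 2, offset 3
score_b = dot(rope(q, 105), rope(k, 102))  # positions 105 and 102, offset 3
score_c = dot(rope(q, 5), rope(k, 4))      # positions 5 and 4, offset 1

print(score_a, score_b, score_c)
```

`score_a` and `score_b` agree up to floating-point error because both pairs of tokens are 3 positions apart, while `score_c` differs because the offset changed. The absolute positions never show up in the score, only their difference.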
"This post intends to limit the mathematical knowledge required to follow along, but some basic linear algebra, trigonometry and understanding of self attention is expected."
If you're not sure about self attention, the post will be a little unclear.
> The 2000s Internet felt way more innovative than the one we have today
Because it seems like this stuff is taught in Management 101 in all of the business schools: once you establish yourself with all this talk about "openness", etc. then the only way to succeed is by creating a walled garden, either through abuse of your monopoly position or by regulatory capture.
Cases in point: OpenAI _and_ Anthropic both pushing for regulation of AI, now that they have a dominant position.
I swear, the moment MBAs get involved, they try the same crap everywhere.
It's a common trope to blame MBAs for all the ills in the world.
But the reality is that having a moat and knowing how to defend it is a fundamental strategy that every CEO is expected to know, because it will be one of the first things YC, investors, etc. ask you about.
And using regulation to lock out competitors definitely did not start with OpenAI and Anthropic.
There was a time when mathematicians wrote LISP programs and other humans translated them into machine instructions. Then one day someone wrote a LISP program to do this, and had one of the translators translate it.
A compiler was born.
Think of Claude as a compiler which compiles natural-language text instructions into functional code.
I don't mind tools that empower programmers or even less technical people to build products. I use these tools myself in minor ways, even though I find them to be more of a nuisance than actually helpful.
What I find depressing is how quickly someone with minimal experience can flood the web with low quality services, in search of a quick buck. It's like all the SEO spam we've been seeing for decades, but exponentially worse. The web will become even more of a nightmare to navigate than it already is.
"Part of the money Sánchez Gil amassed in recent years was laundered through the purchase of crypto-currencies and a large fleet of private hire vehicles registered in the name of one of his relatives"
So he doesn’t get arrested after €20M worth of Bitcoin is found in his Bitcoin wallet. Maybe there are better privacy preserving cryptocurrencies.
Not being tech savvy or not wanting to rely on volatile cryptocurrencies are also legitimate explanations for staying all cash. And he may very well have a few crypto wallets too.
Nontrivial to use. You still need to pass the cash to someone to convert fiat to crypto. And Bitcoin is a public ledger, so in some ways it would be even easier to tie to him.