This is not quite right; you are actually losing information about each of the dimensions and your mental model of reducing the dimensionality by one is misleading.
Consider [1, 0] and [x, x].
Normalised, we get [1, 0] and [sqrt(.5), sqrt(.5)]. Clearly something has changed: the first vector is now larger in dimension zero than the second, even though x was arbitrary and could have been larger than 1. We have lost information about x's magnitude, which we cannot recover from just the normalized vector.
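To make that concrete, a minimal NumPy sketch: every [x, x] with x > 0 collapses to the same unit vector.

    import numpy as np

    def normalize(v):
        # scale v to unit (L2) length
        return v / np.linalg.norm(v)

    print(normalize(np.array([1.0, 0.0])))  # [1. 0.]
    print(normalize(np.array([5.0, 5.0])))  # [0.70710678 0.70710678]
    print(normalize(np.array([0.1, 0.1])))  # [0.70710678 0.70710678], identical

Since [5, 5] and [0.1, 0.1] land on the same point, no post-processing of the normalized vector can tell you what x was.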
Well, depends. For some models (especially two tower style models that use a dot product), you're definitely right and it makes a huge difference. In my very limited experience with LLM embeddings, it doesn't seem to make a difference.
Magnitude is not a dimension; it's information about each value that is lost when you normalize it. To prove this, normalize any vector and then try to de-normalize it again.
Magnitude is a dimension. Any 2-dimensional vector can be explicitly transformed into the polar (r, theta) coordinate system where one of the dimensions is magnitude. Any 3-dimensional vector can be transformed into the spherical (r, theta, phi) coordinate where one of the dimensions is magnitude. This is high school mathematics. (Okay I concede that maybe the spherical coordinate system isn't exactly high school material, then just think about longitude, latitude, and distance from the center.)
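Or in code, a quick sketch of the 2-D case: the polar transform is invertible, and normalizing is exactly "set r = 1, keep theta".

    import math

    def to_polar(x, y):
        # Cartesian -> (magnitude r, angle theta)
        return math.hypot(x, y), math.atan2(y, x)

    def to_cartesian(r, theta):
        # the inverse transform; the roundtrip loses nothing
        return r * math.cos(theta), r * math.sin(theta)

    r, theta = to_polar(3.0, 4.0)     # r = 5.0
    print(to_cartesian(r, theta))     # (3.0, 4.0) recovered, up to float rounding
    print(to_cartesian(1.0, theta))   # the normalized vector: r thrown away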
There's something wrong with the picture here but I can't put my finger on it because my mathematical background here is too old. The space of all normalized k-dimensional vectors isn't a vector space itself. It's well-behaved in many ways, but you lose the zero vector (may not be relevant), addition isn't defined anymore, and if you try to stay on the sphere by renormalizing after addition, distributivity becomes weird. I have no idea what this transformation means for word2vec and friends.
But the intuitive notion is that if you take all of 3D and flatten it / expand it to be just the surface of a sphere, then paste yourself onto it Flatland style, it's not the same as if you were to Flatland yourself onto the 2D plane. The obvious thing is that triangles won't sum to 180, but also parallel lines will intersect, and all sorts of strange things will happen.
I mean, it might still work in practice, but it's obviously different from some method of dimensionality reduction because you're changing the curvature of the space.
The space of all normalized k-dimensional vectors is just the unit sphere in R^k (i.e. the (k-1)-sphere). You can deal with it directly, or you can use the standard stereographic projection to map every point (except for one) onto a plane.
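Concretely, a small sketch of the projection from the north pole (0, ..., 0, 1) and its inverse; every unit vector except the pole itself round-trips:

    import numpy as np

    def stereographic(p):
        # unit vector in R^k (p != north pole) -> point in R^(k-1)
        return p[:-1] / (1.0 - p[-1])

    def inverse_stereographic(q):
        # point in R^(k-1) -> unit vector in R^k
        s = np.dot(q, q)
        return np.append(2.0 * q, s - 1.0) / (s + 1.0)

    p = np.array([0.6, 0.0, 0.8])       # a point on the unit sphere in R^3
    q = stereographic(p)                # [3. 0.]
    print(inverse_stereographic(q))     # [0.6 0.  0.8], p recovered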
> triangles won't sum to 180
Exactly. The interior angles of a spherical triangle sum to more than 180 degrees.
> parallel lines will intersect
Yes, because on a sphere the "lines" are really great circles, and any two great circles intersect.
So is it actually the case that normalizing down and then mapping to the k-1 plane yields a useful (for this purpose) k-1 space? Something feels wrong about the whole thing but I must just have broken intuition.
My favorites are the ones from the 2024 London mayoral election[1]:
> The Binface manifesto called for the abolition of VAR [video assisted referee] (presumably in football matches) and promised to force Thames Water managers to "take a dip in the Thames... see how they like it", in reference to the recent sewage discharge controversy; also to "build at least one affordable house", referring to the housing crisis in London.
My real favorite part is that some of these are obviously nonsensical, but some of them are actually reasonable-sounding ideas... I'm not sure I would personally vote for free broadband for everyone, but that is an absolutely legitimate platform that I could see somebody actually running on.
I'm not British, but I'm a fan of silly humour. One message I like is:
"I would make an absolutely cast-iron firm commitment to build the 100bn Trident weapon system ... but then I would make an equally firm private commitment after that public commitment not to build.
Because they are secret submarines and no one will ever know...
Win-win. A hundred billion quid. Shove it in the health service"
This was achieved before the Uxbridge by-election, which is pretty good going for a pub [1] built at the same time as the first European settlement in the interior of the continental US [2].
The modern Lords is composed of a mix of Life Peers, who were sent there in their lifetime by the Monarch (generally, in fact, as a result of the government of the day, i.e. elected politicians, choosing them†), and a fixed number (from the pool) of Hereditary Peers, who have inherited this honour, typically (but not always) from their father and so on, perhaps for centuries.
Historically all the Peers could sit; today a fixed number of Hereditary Peers are chosen, and the rest get the same title etc. but have no role in Parliament. An election is held (internally) to decide who gets to do this. On the one hand it is paid (a few hundred pounds for each day you're there, so real money, albeit you wouldn't get rich); on the other, you're expected to actually do something useful, which, if you are accustomed to just sitting on your backside getting rich off the labour of others, will be a nasty surprise.
So, even if Binface were actually a peer (which he is not), he wouldn't necessarily be in the Lords today. And if he were in the Lords, he'd need to quit to become an MP, as it's not legal to be both. Historically it wasn't even possible to quit, but somebody in the Lords (Tony Benn) really wanted to be an elected politician in the 20th century, so the rules were changed to allow a peer to stop being a Lord. Interestingly, the law doesn't actually destroy the peerage: if you're a hereditary peer and you give it up to become an MP, the peerage still exists when you die and gets inherited as normal.
† This is as self-serving and corrupt in practice as you'd expect. For every famously charitable sports person and beloved actor honoured, expect a career politician looking to retire, a party donor and some dodgy business guy who in a century everybody will know was a crook or a rapist or both... But in principle they could just send the nice lady who taught a generation of kids to read, that bloke who won six Olympic medals and somebody who was born with no legs and yet single handedly saved all the kids in a burning orphanage, so there's that.
Not quite. Scottish peers (to be precise, members of the Peerage of Scotland, which isn't quite the same thing) and Irish peers (again, more precisely, members of the Peerage of Ireland) elected representatives from their number to sit in the Lords, in much the same way as the hereditary peers do today.
Scottish peers got the universal right to sit in the Lords in 1963.
The right of Irish peers to sit in the Lords, if elected, survived Irish independence in 1922; however, the office tasked with overseeing their election, that of the Lord Chancellor of Ireland, was abolished with independence, so their numbers gradually dwindled: the last Irish peer to sit in the Lords died in 1961.
> a fixed number (from the pool) of Hereditary Peers, who have inherited this honour typically (but not always) from their father and so on, perhaps for centuries.
That's so incredibly embarrassing and anachronistic.
Facebook released a general-purpose Segment Anything Model (SAM) recently and it's been well received in industry, afaict. Might be a good place to start.
They're poor at recalling the exact descriptions of the ~150,000 ICD codes, which is why these other approaches [1, 2] give the known information (the codes) to the model in some form and let it do the task of _assigning_ them to discharge notes, which is the hard part of the task!
(Disclaimer: I am an author of one of these papers)
"We tried a particular approach using XCode to create an app that does X but were unsuccessful. Therefore, nobody is able to use Xcode to create apps that do X"
If there are 150,000 ICD codes, an agent that leverages LLMs in the process may be able to accomplish this. LLMs may be usable as _part of a process_ that does successfully accomplish the task.
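For instance, a retrieve-then-assign sketch: shortlist candidate codes by embedding similarity, then have the LLM only choose among candidates it is shown. Here `embed` and `complete` are hypothetical stand-ins for whatever embedding model and LLM endpoint you actually use.

    import numpy as np

    def embed(text):
        # hypothetical stand-in: a real system would call an embedding model
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        return rng.standard_normal(256)

    def complete(prompt):
        # hypothetical LLM call; wire up your favorite API here
        raise NotImplementedError

    def assign_codes(note, codes, top_k=50):
        # codes: dict mapping ICD code -> official description
        ids = list(codes)
        vecs = np.stack([embed(codes[c]) for c in ids])  # cache these offline
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
        nv = embed(note)
        nv /= np.linalg.norm(nv)
        # cosine similarity between the note and every code description
        best = np.argsort(vecs @ nv)[::-1][:top_k]
        shortlist = "\n".join(f"{ids[i]}: {codes[ids[i]]}" for i in best)
        # the model only picks among codes it can see; it never has to
        # recall code strings from memory
        return complete(
            f"Discharge note:\n{note}\n\nCandidate ICD codes:\n{shortlist}\n\n"
            "List the codes that apply, one per line."
        )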
I'd be curious whether you guys tried AI Studio and Gemini 1.5: you could context-stuff a few thousand PDF pages of ICD codes and guidance on applying them before beginning testing. I imagine performance would improve dramatically with a million tokens of ICD material in play.
I strongly dislike arguments like this, because you can apply them to basically any metric, and I find they are generally used to cheaply discredit a potentially interesting fact about the world without actually arguing that the metric in question is not a useful one to consider.
In all other contexts, would you also say GDP per capita is a metric not worth considering? If not, then I humbly question the value of your statement.
I'm not trying to discredit GDP; I don't think it is a useless metric. Lawmakers around the world, some of them paid handsomely, care about it, and obviously quite a bit can be inferred from it. But in this case we would be better served by a multi-dimensional lens, so to speak, on these tangential topics. I'm just against hearing about it as the most important economic metric of a country. It's very often used as a single data point to reach all kinds of conclusions. There are some top comments in this thread where people are almost forecasting Japan's doom, which is of course a ridiculous view. That's the general direction my comment was aimed at.
At the risk of being out of my depth: you only need a few seconds with the per-capita list before finding yourself asking "wait, why is THAT country so high up?". Again, useful info can be inferred from it, but it doesn't put a country above others on other important lists, which is what people usually imply and use it for on topics not strictly related to economic statistics.
Thanks for clarifying, that's definitely a more carefully thought-through perspective than I gave you credit for before.
I guess I would avoid updating my beliefs too strongly when a measurement I generally find quite useful (in this case GDP per capita) does not perfectly map onto a more nuanced view of the world! As you say, it's only a single data point.
I've also had this thought, but found that inspecting the type shows it's a dictionary by default, and that it's only interpreted as a set if you treat it as one (e.g. put comma-separated elements between the braces when instantiating).
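E.g. in the REPL:

    >>> type({})             # empty braces give a dict, not a set
    <class 'dict'>
    >>> type({1, 2, 3})      # bare comma-separated elements give a set
    <class 'set'>
    >>> type({'a': 1})       # key: value pairs give a dict
    <class 'dict'>
    >>> set()                # the only way to spell an empty set
    set()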
The link between animal protein and disease was investigated in some detail in the China Study, and also in the documentary Forks Over Knives. The evidence suggested there are "diseases of the West" that correlate with the Western diet. Since then, the Cleveland Clinic has used plant-based diets to reverse heart disease successfully, and other programs use them to reverse heart disease as well.
The book How Not to Die covers many of the top killer diseases in the US and which foods are best and worst to eat to avoid those fates, based on reviews of the scientific literature. The trend throughout is that plant-based diets fare best. Not surprising that this correlates with Blue Zone diets.
The book Fiber Fueled looks at the science of gut health and which foods are best for it. The answer: a variety of plant foods.
As a study size of one, I've personally been able to recover quickly enough to do six 26.2+ mile runs in six months, in my forties. I think that would be difficult and injury-prone on a high-inflammation diet.
Do any of these books reference studies that establish causation for the observations? If not, it's hard to judge how reliable they are. Maybe more reliable than "I saw my boy Frank go vegan and lose weight", but even then... I intimately know Frank and can be reasonably sure the diet helped him lose weight. I can't say the same for all these observational studies.
This has always been a major concern for me with a lot of studies. People seem OK with it and I've never understood why. It's like trying to understand a bug by looking at the broader logs/data instead of reading the code. That's absolutely a useful tool, though the more complicated the code (like with the human body) the less useful it is, and I certainly wouldn't be advising any fixes based off of it. And far too often, reading the code tells a wildly different story.
See for yourself. Plug your favorite disease that's killed a relative into https://nutritionfacts.org/ and find related scientific nutrition studies explained. It's associated with the doctor who wrote How Not to Die. I believe that, yes, sometimes the specific nutrients at play are understood. Take toxins and heavy metals, for example: some accumulate in animals and travel up the food chain, so it's no surprise that more toxins and heavy metals have accumulated in bigger fish. Since big beans don't eat small beans, there's no bioaccumulation there.
Even the USDA, in trying to promote fish, advises choosing fish that are "lower in mercury". Or you could not eat fish and skip a major source of mercury exposure.
Also a distance runner (half and full marathons) in my 40s. I would absolutely say that dropping meat and dairy from my diet (a few months ago) has been beneficial to my running performance. Anecdotal, of course, but I know that you and I are not the only ones.
Also, yes, all of those things you cited, and others. In particular the existence of The Esselstyn Heart Disease Program [1], founded by Dr Esselstyn of Forks Over Knives fame, at the world's top heart hospital, is noteworthy.
Words whose value I've found easier to accept than to extol. Trying is the first step to failing, I guess ;)