Hi! I’m a product engineer on the Claude.ai team. Claude.ai does support branching conversations. If you hover on a message, there should be an edit button, and once you edit the message, you can again hover on it, which will show you left/right arrows that will switch between the branches. Please let me know if you have any troubles with this!
I don’t want to die today, and tomorrow I’m not going to want to die, and the day after I’m not going to want to die, and so on. Therefore, I want to live forever — proof by induction on the natural numbers.
I liked the song I'm listening to 1 minute in. I liked it 2 minutes in. Therefore, I want to listen to this song forever -- proof by induction on the natural numbers.
And I want to become a billionaire tomorrow. Except there is some actual very small probability of me becoming a billionaire, but I am nearly absolutely certain that one day I am going to die (save for some miracle-grade event). This worries me about as much as the fact that the sun will rise tomorrow.
If a model hasn't been explicitly told (via some system prompt or something) about its weights, it won't know them. It would be akin to asking you how many neurons you had. How would you know?
I don't know, but the fact that the model can suggest the most relevant sentence is intriguing to me. I don't know. I realize it's just looking at the probability. Would it be possible to sort of craft adversarial inputs to learn the model's weights? It seems like it should be, and in some sense you're then getting it to output the weights, but you'd almost certainly need to know the model's structure to do that.
It doesn’t have access to its own probabilities in this regard. Instead the output is encouraged to be a ranking of preferences of the dataset modeled. It outputs the preferences of the average human writer from its dataset (incorporating any custom changes leftover from instruction fine tuning).
This is what confuses me though, people don't write things like: What is the most relevant sentence in this book?
I have a vague understanding of the mechanisms here, but I just don't think I get how it goes from "the most relevant sentence" to an attention vector that "points to" the right place, I would have thought this was beyond what they could do by just completing training data.
I also realize that the model has no ability to "introspect" itself, but I don't know what's stopping it from doing a train of thought output to get to it in some way.
Do you think you could get it to reveal the attention vector at some point in time, by e.g., repeatedly asking it for the Nth most relevant word, say, and working backwards?
> This is what confuses me though, people don't write things like: What is the most relevant sentence in this book?
I think it's because this is confusing even researchers. The current explanation for why these models are robust (and even accurate) on data not found in their dataset is that various regularizations are applied to the data during training. There is a 10% random token dropout. The optimizers also apply regularization of sorts via weight decay and other math tricks I'm not privy to. This consistent regularization means the model will try to overfit but randomly fail. Since the token is occasionally missing, the model instead learns a robust strategy of sorts to handle tropes/cliches/common patterns.
Basically, the idea is that since the model has seen enough "most relevant sentence..." examples, it actually does indeed begin to grok/model internally the sort of intent and meaning of those words across a variety of contexts which it has also learned (but it's never seen the combination as in e.g. "relevant sentence in this book"). Modeling this internally may be a waste of parameter space at first, but quickly becomes the most efficient way of storing the information - rather than memorizing every instance used in the dataset, you just call the subset of weights which "understand" how those words are intended to be used.
Since this happens recursively as the generated output gets longer (feeding back into itself), the other such strategies that have been developed are also called upon and the whole thing becomes difficult or impossible to interpret in a meaningful way.
I'm not aware of a whole lot of proof of this, but I see these ideas thrown around a lot. This is also found a lot in biology, where cells and multicellular life will experience lots of damage to structure, even down to the DNA, throughout a lifespan or series of lifespans. To account for this, instead of memorizing exactly how to walk with n limbs, dependent on how many you happen to lose, life may instead develop a system which can learn on the fly how to walk (or in humans' case, wheelchair) around.
As for your last point about the attention vector - I don't know if it could accurately print its own attention vector. But I do think that it could use those values as a sort of temporary solution for "ranking" perhaps. I don't think that's what happens in the natural language case of "ranking subjectively the `best` sentence in the article", and I still think that is mostly a case of modeling language well and in _many_ domains and modes.
It seems to me that RAG is really search, and search is generally a hard problem without an easy one size fits all solution. E.g., as people push retrieval further and further in the context of LLM generation, they're going to go further down the rabbit hole of how to build a good search system.
Is everyone currently reinventing search from first principles?
I am convinced that we should teach LLMs to use search as a tool instead of creating special search that is useful for LLMs. We now have a lot of search systems, and LLMs can in theory use all kinds of text interfaces; the only problem is the limited context that LLMs can consume. But that is quite orthogonal to what kind of index we use for the search. In fact, for humans it is also useful that search returns limited chunks - we already have that with the 'snippets' that, for example, Google shows - we just need to tweak it a bit so that there are maybe two kinds of snippets: shorter ones, as they are now, and longer ones.
You can use LLMs to do semantic search using a keyword search - by telling the LLM to come up with a good search term that would include all the synonyms. But if vector search in embeddings really gives better results than keyword search - then we should start using it in all the other search tools used by humans.
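To make the "LLM comes up with a search term that includes the synonyms" idea concrete, here is a minimal sketch. It does not call any real LLM or search engine: `llm_suggestions` is a hand-written stand-in for what a model might return, and `build_keyword_query` is a hypothetical helper name, not part of any library.

```python
def build_keyword_query(terms_with_synonyms):
    """Build a boolean keyword query: synonyms are OR-ed, concepts are AND-ed."""
    groups = []
    for term, synonyms in terms_with_synonyms.items():
        # Deduplicate while keeping order, with the original term first.
        variants = list(dict.fromkeys([term, *synonyms]))
        # Quote multi-word variants so they are treated as phrases.
        quoted = [f'"{v}"' if " " in v else v for v in variants]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Assumed (made-up) LLM output for a question like "why is my car so noisy?"
llm_suggestions = {
    "car": ["automobile", "vehicle"],
    "noise": ["sound", "rattle"],
}

query = build_keyword_query(llm_suggestions)
print(query)  # (car OR automobile OR vehicle) AND (noise OR sound OR rattle)
```

The resulting string could then be handed to whatever keyword search you already have, assuming it understands AND/OR syntax.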
LLMs are the more general tool - so adjusting them to the more restricted search technology should be easier and quicker to do instead of doing it the other way around.
Depends on what you mean by search. Do you consider all Question Answering as search?
Some questions require multi-hop reasoning or have to be decomposed into simpler subproblems. When you google a question, often the answer is not trivially included in the retrieved text and you have to process it (filter irrelevant information, resolve conflicting information, extrapolate to cases not covered, align the same entities referred to with two different names, etc.), formulate an answer for the original question, and maybe even predict your intent based on your history to personalize the result or customize the result in the format you like (markdown, json, csv, etc.).
Researchers have developed many different techniques to solve the related problems. But as LLMs are getting hyped, many people try to tell you LLM+vector store is all you need.
We're using a product from our existing enterprise search vendor, which they pitch as NLP search. Not convinced it's better than the one we already had, considering we have to use an intermediate step of having the LLM turn the user's junk input into a keyword search query, but it's definitely more expensive...
To some degree. The amount of data that will be brought into search solutions will be enormous, seems like a good time to try to reimagine what that process might look like
Also, this is search for LLMs, not for humans, so the optimal solution will be different. Even among models, it is not hard to imagine that Mistral-8b will need different results than GPT-4, which reportedly has around 1.76 trillion parameters.
I think this is premature optimisation. LLMs are the general tool here - in principle we should try first to adjust LLMs to search instead of doing it the other way around.
But really I think that LLMs should use search as just one of their tools - just like humans do. I would call it Tool Augmented Generation. They should also be able to reason through many hops. A good system would answer the question _What is the 10th Fibonacci number?_ by looking up the definition on Wikipedia, writing code for computing the sequence, testing and debugging it, and executing it to compute the 10th number.
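For illustration, the "write code, test it, run it" part of that Fibonacci example might look roughly like the sketch below. The indexing convention F(0) = 0, F(1) = 1 is an assumption on my part; the comment above doesn't say which one it has in mind.

```python
def fibonacci(n):
    """Return the n-th Fibonacci number, with F(0) = 0 and F(1) = 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    a, b = 0, 1
    for _ in range(n):
        # Slide the window one step along the sequence.
        a, b = b, a + b
    return a


# The "testing" step: compare against the first few values from the definition.
assert [fibonacci(i) for i in range(7)] == [0, 1, 1, 2, 3, 5, 8]

# The "executing" step.
print(fibonacci(10))  # 55
```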
I just got engaged. When we looked at rings, the jeweler asked my fiancée if she wanted natural or synthetic, and she responded “I don’t want a blood diamond!!” Of course, mined diamonds aren’t blood diamonds, but her impression was still they were a little ickier.
The jeweler told me that one reason to get a natural diamond was that the prices of lab grown diamonds had been falling, whereas natural hasn’t as much, so the ring would hold more value. I told her that was exactly why we wanted to go with a lab grown diamond! This isn’t an investment — we aren’t planning to sell the ring.
Ultimately, for a price that didn’t break the bank, we got an absolutely gorgeous ring with diamonds larger and higher quality than we would have been able to afford with natural. Diamond rings may have started as something to resell in divorce, but for us (for my fiancée really), it was more about getting something that was beautiful, and if it didn’t cost as much, great! I suspect most Americans will feel similarly.
> Of course, mined diamonds aren’t blood diamonds, but her impression was still they were a little ickier
For all intents and purposes, they are. The voluntary processes the diamond cartels adopted to supposedly reduce diamonds coming out of "conflict areas" are a joke. Most diamonds are mined under exploitive conditions, often with severe ecological impact, and the owners are almost without exception "blood on their hands" people even if one particular mine operates more ethically.
If you object to the mine owners (Rio Tinto) having "blood on their hands" (from > 2,000 years of mining since Roman-occupied Spain) then get rid of all your steel products, as they're a major supplier of raw high-grade iron ore (and copper, and ...)
Sources to any of these wild (and grossly outdated) claims?
> Most diamonds are mined under exploitative conditions
You’ve clearly never been to a diamond mine. More than 70% of diamonds come out of “modern” countries like Australia, Canada, Russia, and South Africa. Botswana used their diamond bounty to teach hundreds of their citizens to grade/cut/polish the very stones mined in the country, ensuring tons of high skill jobs.
> the owners are almost without exception "blood on their hands" people
Well you’re only right on a loophole on this one - the Russian government owns the company that produces 40% of the world’s diamonds and it would be tough to argue Putin is blood-free.
That's just ridiculous. May as well ask you why you never cited his comment when you commented?
You can Google this. Russia, Canada, Australia, South Africa, Botswana (the contentious one would be DRC). These are regulated countries with labour rights and human rights. If they operate in these countries they are subject to those laws. May as well say you won't buy anything imported if you still feel there's ambiguity.
These are industries in these countries. Without these exports these people don't have jobs which feed them.
Interesting that your salesperson used the same line as De Beers does at the end of the article about lab diamonds being too cheap. It must be a sales talking point that comes from on top.
It seems crazy that De Beers is actually trying to drive lab prices down to make them seem too cheap for bridal use. Their strategy is to try to retain the higher-end luxury market while giving up the more mainstream market where they can't compete.
> whereas natural hasn’t as much, so the ring would hold more value.
This is purely the diamond scam. There is no market for selling your used diamond. The diamond merchant sells to you at the high price, but you will not be able to do the same to get the money back from the next sucker.
A nice little white lie by omission anyway, as the resale value of a diamond drops up to 50% when it leaves the store! If you were planning for an investment, you'd be better off looking elsewhere.
That’s how I got a lavish engagement ring for my wife, I simply bought it used from a guy who’d bought it a year ago for a relationship that fell apart.
Met him at a jewelry store and had it verified and appraised, he had the receipt for the 12k or so he paid, it appraised for 16k (not sure why they appraise for more than the retail price), and I paid him 5k (CAD).
This was years ago, and I think it was a 1.4ct VVS2.
That same budget in store had me looking at sub 1 carat SI2 crap that was cloudy as all hell.
Diamonds have 0 resale value, not sure why people think otherwise when it’s so easy to see simply browsing used ads.
Unless you're buying a named diamond, it's close to worthless. Diamonds of the size used for marriages are not rare. They're super abundant, with the supply artificially limited by the diamond cartel.
Actually, right now there's a significant chance that mined diamonds are blood diamonds. Russia took over the international diamond industry after the collapse of the De Beers monopoly, and naturally they claim any conflict they're involved in doesn't make their diamonds conflict diamonds, but on the other hand Wagner is being paid in mineral extraction rights for their actions in some very bloody African conflicts.
But how on planet Earth will my wife carry around a moissanite ring that's worth the 3 months of salary I'm supposed to spend on this thing? That's a heck of a big chunk of mineral! A diamond is just more practical /s
Frankly, if you're marrying a person who requires you to spend 3 months on a diamond ring, it may be time to remember an important bit of wisdom:
Never enter into a contract where one party benefits by breaking it.
I'm not knocking marriage or (outright) stating that financial motives may be at play here, but it's a big indicator if "who you're with" is more concerned about what's "on their finger" than about what's "around their arm".
Depends on the law of your country, but the ring forms part of a contract, breach it and they have to give it back.
May be the least of your concerns at that point.
When I bought my wife's engagement ring the jeweler showed me 3 different diamonds in my price range and then had me pick the one that I thought looked the best.
I ended up picking a lab grown diamond and was able to get a larger and (in my opinion) prettier diamond. My wife didn't seem to mind that it was lab grown when I eventually told her.
$3k is a bit much for me to feel comfortable spending at a store well known for selling Chinese knockoffs. Is there a way for a layman to test that the stone is indeed a diamond, and not something like lab made cubic zirconia?
I do see some entries like you're describing on Alibaba for bulk orders, but is there anything for consumers buying a single gem in this price range? I couldn't find anything on other websites that's as cheap.
A 3 carat diamond is still a crystal (approximately) the size of your pinkie nail that takes a fairly large energy input to create... and only the ones that've grown clear and without flaws are viable for jewelry purposes. The bigger it gets, the lower the yields are at the appropriate quality level, and so the more effort / waste is involved in getting one. (As opposed to industrial diamond uses, which mostly don't have those requirements and so have their price even more heavily impacted by synthetics.)
Diamond is not the energetically favorable form of carbon at room temperature and pressure, so manufacturing it will always need expensive tricks. Not as expensive as digging out tons of kimberlite, but still not cheap.
Diamonds are a TERRIBLE investment and usually lose around 50% (natural) to 90% (synthetic) on sale. But think like this: if a ring costs 10K natural and 2K synthetic, a 50% loss on natural would correspond to losing 5K. A 90% loss on synthetic would correspond to losing 1.8K. You're still on top big time.
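The arithmetic in that comparison can be checked with a few lines. This is just a sketch using the prices and loss percentages quoted above (10K natural at 50%, 2K synthetic at 90%); `resale_loss` is a made-up helper name.

```python
def resale_loss(price, loss_percent):
    """Money lost on resale, given a purchase price and a loss percentage."""
    return price * loss_percent // 100


natural_loss = resale_loss(10_000, 50)    # 5000
synthetic_loss = resale_loss(2_000, 90)   # 1800

# How much less you lose by going synthetic, in the same currency units.
print(natural_loss - synthetic_loss)  # 3200
```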
I gave my wife a large synthetic diamond ring for our 15th anniversary and she absolutely loved it.
It's my opinion they're not an asset. Unless you own the Cullinan diamond or something very pricey, they're going to depreciate, just at a slower rate perhaps. Older diamond cuts go for less as we get better at cutting techniques, and the rest of the ring also loses value as the setting goes out of fashion.
The icky diamond thing is arguable; these are generally mined in countries with decent labour laws. On the contrary, lab diamonds require few workers and take food out of the mouths of poorer mine workers who become jobless.
When I proposed I got a ring with a natural diamond. But being aware of all the shenanigans around diamonds, I decided to get an imperfect diamond. Especially because of the artificial diamonds - I assumed they would strive for perfect diamonds and therefore would make my imperfect one more unique later. A salt & pepper one. It looks like somebody has trapped a galaxy inside it. It's beautiful and it didn't cost much (compared to pure diamonds).
Sorry to be negative, but I think the conventions around marriage need to be redone alongside all the other changes around marriage. To wit, the ability of a woman to divorce you and receive alimony and child support for basically the rest of her life. It's not clear to me what reason a rational person has for NOT getting divorced after enough time has elapsed that alimony is possible. Roughly 50% of marriages end in divorce; 50% of those are high conflict. When you buy a ring and give it to her, this will not factor into that judgement. You can pay for everything and the only thing the judge will look at is your incomes. Be smart and split everything down the middle, including the ring.
> It's not clear to me what reason a rational person has for NOT getting divorced after enough time has elapsed that alimony is possible.
I mean, assuming that you like your spouse, it's a perfectly rational decision to stay with them despite it being possible to make money by leaving them. It's economics-paper level sociopathy to suggest that "I could maximize my personal income by divorcing the person I love" is an action to be taken.
Thanks for the laugh. I really wish this type of thinking weren't so common in the population, but the fact is, it is pretty common. You need to account for it.
You have ruined your own argument though. Divorce because you no longer love your husband is a completely different thing than divorcing because of monetary incentives.
That only works if both partners have similar income, similar working times, and also split household work equally.
Most relationships are not like that, especially when kids are involved, but often even before that. So you often have one partner work more while the other takes care of most of the household. This leads to a situation where one partner is financially dependent on the other.
Marriage laws are supposed to protect the dependent partner in that case.
> To wit, the ability of a woman to divorce you and receive alimony and child support for basically the rest of her life.
Child support doesn't last that long — just until the child reaches 18. And alimony stops if the spouse ever gets remarried (though of course some people dodge that by not technically getting married to a new partner).
In many places (and all the ones where I've lived) it is the state that determines that you are married and not the couple. You may try to improve your odds with a prenup, but according to my friends that have tried, it doesn't offer much protection.
"Be smart and" don't get an expensive ring! Spend that money on your honeymoon, wedding, just save it, or donate it to a charity. Doing anything else makes more sense really.
A city on its own generally has the power to change zoning and land use laws. There are limitations (some states give cities more or less power here) but generally cities do have the ability to make changes that would reduce noise and improve quality of life for their residents.
On the other hand, cities do not have the ability to redesign cars. A city can’t just choose to invent a quieter car!
I would love it if we can engineer cars and trucks to be quieter; I think though that a more feasible path for any city is to work on zoning and land use changes.
> All eligible entries must include either the word "wallabywinter" or the word "yallabywinter" (the “eligible keywords”) in one or more places as close as possible to the code.
If I'm training codegen models, why wouldn't I just exclude code that contains these keywords? Shouldn't you have secret keywords, that people have to register to you, but you don't make public until after the fact, in order to avoid this?
It's not that they're just charging per token -- the actual models are operating on a token level. The model sees things in terms of tokens, and in OpenAI's case, these tokens are subwords (pieces of words), not words themselves, not characters.
So the real question is, what is the benefit of modeling your tokens as subwords, rather than as characters or words?
I think there is a lot of nuance here, and I don't understand it all. But, some benefits:
* Words, at least in English, are composed of different pieces, like roots, prefixes, and stems. Modeling at the subword level more naturally aligns your model with this aspect of language. If I tokenize "warmest", I get "warm" and "est". So, the meaning of the token "est" can be learned by the model -- whereas if you modeled by words, the model would have to individually relearn this aspect of information for every word ending in "est".
* Modeling at the subword level makes your sequences a lot shorter than modeling at the character level, which should help with things like efficiency.
* Modeling at the subword level makes your vocabulary a lot bigger than just modeling at the character level, which I suspect helps the model, as it can assign the subwords themselves meaning. E.g., it can learn the meaning of the token "warm" on its own, rather than having to learn this meaning only through learning the relationship of the tokens "w" "a" "r" and "m".
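To make the points above a bit more tangible, here is a toy sketch comparing subword and character splits. It uses greedy longest-match against a tiny hand-picked vocabulary (`toy_vocab`, entirely made up), which is a simplification: real tokenizers learn their vocabulary from data (e.g. with byte-pair encoding) and may split these words differently.

```python
def greedy_subword_tokenize(word, vocab):
    """Split a word by repeatedly taking the longest vocab entry that matches."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first; fall back to a single character.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


# Hypothetical vocabulary, chosen only to mirror the "warm" + "est" example.
toy_vocab = {"warm", "cold", "est", "er"}

for w in ["warmest", "coldest", "warmer"]:
    subwords = greedy_subword_tokenize(w, toy_vocab)
    chars = list(w)
    # Subword sequences are shorter, and "est" is shared across words.
    print(w, subwords, len(subwords), "vs", len(chars), "characters")
```

Here "warmest" and "coldest" both end in the same "est" token, so whatever the model learns about it carries over, and each word is 2 tokens instead of 7 characters.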
Hope this helps! Would love for anyone else to chime in/add on/correct me.
This is not really true. The Chinchilla paper showed that a 4% difference in loss between Chinchilla and Gopher led Chinchilla to blow Gopher out of the water at most tasks, including 30x performance in physics.
Empirically, LLMs have been shown to have emergent abilities that appear at different loss levels. So, a 10% difference could really matter.