Only indirectly. A lot of popular models used for generating vectors are nowhere near as smart as LLMs. Also, the vectors themselves are not machine learning models; they are just lists of numbers intended for comparison with other lists of numbers, typically using a similarity measure like cosine similarity.
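To make the "just lists of numbers" point concrete, here is a minimal sketch of cosine similarity; the vectors are made up for illustration and no embedding model is involved:

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: dot product over
    # the product of their magnitudes. Plain arithmetic, no ML.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Identical directions score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

Real embeddings just scale this up to hundreds or thousands of dimensions; the comparison itself stays this simple.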
Vector search embeddings are only as good as the models you use, the content you have, and the questions you ask.
This is a bit of a pitfall when you use them for search, especially if you have mobile users, because most of them are not going to thumb full sentences into a search box. I.e., the queries they type are going to be a few letters or words at best, without much context, and users will still expect good results. Vector search is not great for those types of use cases because there just isn't a whole lot of semantics in such short queries. Sometimes, all you need is a simple prefix search.
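A toy illustration of that last point: for a two-letter query, plain string matching does exactly what the user expects, no embeddings needed. The titles here are made up:

```python
titles = ["iPhone 15 case", "iPad stand", "laptop sleeve", "ipod classic"]

def prefix_search(query, items):
    # Match items where any word starts with the query,
    # case-insensitively -- the classic autocomplete behavior.
    q = query.lower()
    return [t for t in items
            if any(word.lower().startswith(q) for word in t.split())]

print(prefix_search("ip", titles))
# -> ['iPhone 15 case', 'iPad stand', 'ipod classic']
```

At scale you'd back this with a trie or an inverted index rather than a linear scan, but the point stands: "ip" carries no semantics for an embedding model, yet a prefix match nails it.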
That's exactly what happened to me when I tried to get the open-source models to extract a CSV from textual data with a lot of yes/no fields; i.e., the model forgot the column order and started confusing the cell values. I found I had to use more powerful models like Mistral Large or ChatGPT. So I think that is a valid thing to worry about with smaller models, but maybe less of a concern with larger ones.
True, this is how RAG works, but this is why I prefer to use open-source LLMs for RAG: the token costs are less opaque, and I can control how many chunks I pull from the database to manage my costs.
I believe it will get better and more efficient as we go. On a side note, OpenAI seems to release products before they are ready and they evolve as they go.
It's based on understanding the (theorized) disease process and trying to prevent it. If the problem is that you're looking at nearby stuff too much (e.g. phone screens held less than a foot away from the eyes), then that means your eye muscles are contracting too much. Atropine paralyzes the eye muscles and forces you to look further away, because nearby stuff would be blurry.
Technically, copyright promotes creativity and patents promote technological progress. I think patents block progress more than they help it, especially with patent trolls and all, but copyright is probably a net positive for society.
I think copyright used to benefit society, but no longer does.
The stated goal is to encourage creation of the arts, but looking at how hit-driven the arts are, and at how much more is created than bought[0], there's too much content being made.
Most people want to read books in the top 100 best seller list, of which there are necessarily 100 in whatever period that list is re-calculated; or to watch the latest blockbusters, whose number in any given month I don't know but assume isn't much higher than the number of screens in a large cinema complex.
But that doesn't mean get rid of it entirely; my (admittedly just a) gut feeling is that 20 years should be enough to claim a monopoly on derivative works, even if we retain an average human lifespan for the original and direct translations. This is separate to trademarks: I think by this point, I should be able to combine Short Circuit and The Matrix into a shared universe if I want to, but it's still a matter of consumer protection to make sure nobody mistakes such a creation for either of the source materials.
One thing I absolutely don't buy is the argument for copyright protection being "life plus 70 years", which seems to center on authors wanting their kids to inherit their residual income. Most people don't get to inherit almost anything, but even if they did, someone's kids will probably[1] live for as many years after the parent's death as the parent lived before that child's birth, not usually 70 years after the parent dies.
[0] with exceptions; furry artists report viable income.
[1] barring radical changes to life expectancy from global thermonuclear war and/or post-singularity life extension.
There are issues with patent law, but I think the general concept of patents still has utility in promoting R&D. Patent trolling is the exception, not the norm. And places that do respect patent law to a high degree tend to have higher rates of innovation.
> Until recently, the algorithm that was protecting all organ donor patient information in the country, so STI status, mental health, every physical history, was from 1996.
I take issue with the way this point is presented. I'm willing to believe that the healthcare data wasn't encrypted with SOTA methods but rather with one regarded as "good enough", but the age of an algorithm has nothing to do with its security. After all, RSA encryption is from 1977.
Right. I also couldn't find anything to corroborate this claim, so it could also be a case of a layperson referring to everything as "the algorithm".
author here; yeah, the goal is to try and get the student to surpass the teacher, but if it can't, this is the best way to get close. For these contrastive losses, our intuition is that the model isn't trying to emulate the teacher so much as learning from the 'delta' between itself and the teacher.