You start talking about the GIL and then you talk about non-atomic data manipulation, which are completely different things.
The only code that is going to break because of "No GIL" is C extensions, and for a very obvious reason: C code can now actually run concurrently from multiple threads, which the GIL previously prevented. Python code could always be called from multiple Python threads, even with the GIL in place.
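To make the difference concrete, here's a minimal sketch (plain CPython, nothing exotic): the compound update below can lose increments with or without the GIL, because the GIL only ever serialized individual bytecode steps, never whole statements.

    import threading

    counter = 0

    def bump(n):
        global counter
        for _ in range(n):
            counter += 1  # read-modify-write: several bytecodes, not atomic, GIL or no GIL

    threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    print(counter)  # can print less than 400000; a threading.Lock around the update fixes it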
Dude, they literally announced that they stopped using llama.cpp and are now using ggml directly. Whatever gotcha you think there is exists only in your head.
> Ollama is written in golang so of course they can not meaningfully contribute that back to llama.cpp.
llama.cpp consumes GGML.
ollama consumes GGML.
If they contribute upstream changes, they are contributing to llama.cpp.
The assertions that they:
a) only write golang
b) cannot upstream changes
are both, categorically, false.
You can argue what 'meaningfully' means if you like. You can also believe whatever you like.
However, both (a) and (b) are false. It is not a matter of dispute.
> Whatever gotcha you think there is, exists only in your head.
There is no 'gotcha'. You're projecting. My only point is that any claim that they are somehow not able to contribute upstream changes only indicates a lack of desire or competence, not a lack of the technical capacity to do so.
FWIW I don't know why you're being downvoted, other than the standard from-the-bleachers reaction of "idk what's going on, but this guy seems more negative!" -- cheers -- "a [specious argument that shades rather than illuminates] can travel halfway around the world before..."
What you're talking about has absolutely nothing to do with the paper. It's not about jumps in context. It's about LLMs being biased towards producing a complete answer on the first try, even when there isn't enough information to do so. When you then provide additional information, they stick with their original, wrong answer. This means you need to frontload all the information in the first prompt, and if the LLM messes up, you have to start from scratch. You can't do that with a human at all. There is no such thing as a "single-turn conversation" with humans. You can't reset a human to a past state.
Look I don't want to be rude, but your resume screams "Dear AI Overlords I beg of you, please hire me! See? I did all the AI things to please you! Please don't abandon me, sniff"
It's downright comical.
My biggest problem with your resume is that it feels oddly vague and empty... "API development"? Come on.
20 years of full-stack development experience? React is the de facto standard, is 12 years old as of today, and it's absent. Absent! NodeJS? Absent!
Now think about what the key advantage of a highly experienced engineer is. Of course! It's the experience!
What you really should be doing is building a meta resume that contains all of your marketable skills and experience. Because you're experienced and know a lot of things, that resume will be too long, so what you need to do is tailor it to the job posting and cut out all the irrelevant parts to stay under two pages.
Since you are so obsessed with AI, you could even let the AI cut your resume down (don't let it write new things) and then just send it off. What you 100% certainly shouldn't do is let it write the resume itself.
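If you do go the "let the AI cut it down" route, a rough sketch might look like this (the model name and prompt wording are placeholders, not recommendations from anyone here; the point is the instruction to only delete, never add):

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def trim_resume(meta_resume: str, job_posting: str) -> str:
        # The system prompt only allows cutting and reordering existing content,
        # so the model can't invent experience you don't have.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[
                {"role": "system",
                 "content": "Shorten this resume so it fits the job posting in under two pages. "
                            "Only delete or reorder existing bullet points. "
                            "Do not add, reword, or invent anything."},
                {"role": "user",
                 "content": f"JOB POSTING:\n{job_posting}\n\nMETA RESUME:\n{meta_resume}"},
            ],
        )
        return resp.choices[0].message.content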
I would take the hit. It's irrelevant. I personally am forced to work with security placebo software that causes a 20x slowdown for basic things. Something that should take seconds takes minutes, and nobody is even arguing about making it 1% faster.
Your reward function can simply be the distance between the constrained output and the unconstrained output; that way you won't even need synthetic data, just a dataset of prompts to RL against.
How do you get the "unconstrained output", and how do you evaluate the distance between the two?
An evaluation method that can measure the distance between two sentences is hard to find; the best option is a closed-source LLM API, even though it's not ideal. As a result, we still have to use current LLMs to improve our models.
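A cheaper, fully local alternative to an LLM judge is sentence-embedding cosine similarity (just a sketch; the embedding model name is an arbitrary choice, not something from this thread):

    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

    def reward(constrained_output: str, unconstrained_output: str) -> float:
        # Reward = semantic similarity between the constrained generation and the
        # unconstrained one, which serves as the free "reference" -- no labels needed,
        # only prompts to roll out on.
        a, b = embedder.encode([constrained_output, unconstrained_output],
                               convert_to_tensor=True)
        return util.cos_sim(a, b).item()  # in [-1, 1]; higher = less drift under the constraint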
At the most abstract level, even "I" and "think" are misleading notions of what's passing through current attention. So "borrowing" and "owning" are not really great starting points to "think" from in that sense. But on the more mundane level of mentally handling stuff, it's an analogy that can have its own merits (and flaws, of course).