I would think that anyone working in a sewer inspection van would keep the door open because it is highly likely that sewer inspection vans smell like, well, sewer.
If the van is loaded with equipment, or even if it isn’t, theft and robbery are common in most of the US. You can’t leave a van door open and not be extremely vigilant.
One thing is for certain: that is absolutely NOT a sewer inspection van. Seriously, have you ever worked trades? It is way too clean on the interior and not fitted for working dirty jobs, to say nothing of the visible surveillance workstation.
While I can honestly believe both (it was a surveillance van vs it was a sewer maintenance company), do you think that the intimidation and surveillance of Snowden or Assange won’t last until the end of their lives?
I'm not so sure it's impressive even for mathematical tasks.
When ChatGPT came out, there was a flood of fine-tuned LLMs claiming ChatGPT-level performance for a fraction of the size. Every single time this happened, it was misleading.
These LLMs were able to score higher than ChatGPT because they took a narrow set of benchmarks and fine-tuned for those benchmarks. It's not difficult to cheaply fine-tune an LLM for a few benchmarks and beat a SOTA generalist LLM on them. Comparing a generalist LLM to a specialist LLM is like comparing apples to oranges. What you want is to compare specialist LLMs to other specialist LLMs.
It would have been much more interesting and valuable if that had been done here. Instead, we have a clickbait, misleading headline and no comparisons to math-specialized LLMs, which certainly should have been performed.
Automated benchmarks are still very useful, just less so when the LLM is trained in a way that overfits to them, which is why we have to be careful with random people and the claims they make. Human evaluation is the gold standard, but even it has issues.
The question is how do you train your LLMs to not 'cheat'?
Imagine you have an exam coming up, and the set of questions leaks - how do you prepare for the exam then?
Memorizing the test problems would be obviously problematic, but maybe practicing the problems that appear on the exam would be less so, or just giving extra attention to the topics that will come up would be even less like cheating.
The more honest the approach you choose, the more indicative your training would be of exam results. But everybody decides how much cheating they allow for themselves, which makes it a test of the student's honesty, not their skill.
I think the only way is to check your dataset for the benchmark leak and remove it before training, but (as you say) that's assuming an honest actor is training the LLM, going against the incentives of leaving the benchmark leak in the training data. Even then, a benchmark leak can make it through those checks.
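One common way to do that dataset check is n-gram overlap filtering: flag any training document that shares a large fraction of word n-grams with a benchmark question. Here's a minimal sketch of that idea; the function names, the 8-gram size, and the 0.5 threshold are all illustrative assumptions, not from any real pipeline.

```python
# Sketch of an n-gram overlap check for benchmark contamination.
# All names, the n-gram size, and the threshold are illustrative.

def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(train_doc: str, benchmark_items: list[str],
                    n: int = 8, threshold: float = 0.5) -> bool:
    """Flag a training document if it shares a large fraction of
    n-grams with any benchmark question."""
    doc_grams = ngrams(train_doc, n)
    for item in benchmark_items:
        item_grams = ngrams(item, n)
        if not item_grams:
            continue
        overlap = len(doc_grams & item_grams) / len(item_grams)
        if overlap >= threshold:
            return True
    return False

benchmark = ["What is the integral of x squared from zero to one ?"]
leaked = "Q: What is the integral of x squared from zero to one ? A: 1/3"
clean = "The weather today is sunny with a light breeze in the afternoon."
print(is_contaminated(leaked, benchmark))  # True
print(is_contaminated(clean, benchmark))   # False
```

The weakness, as noted above, is that paraphrased leaks slip through an exact n-gram match, so this only catches the laziest form of contamination.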
I think it would be interesting to create a dynamic benchmark. For example, a benchmark which uses math and a random value determined at evaluation for the answer. The correct answer would be different for each run. Theoretically, training on it wouldn't help beat the benchmark because the random value would change the answer. Maybe this has already been done.
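A dynamic benchmark of that kind could be sketched like this: each item is generated from random values at evaluation time, so the answer key changes every run and memorizing past answers is useless. Everything here (the item template, the scoring function) is a made-up illustration of the idea, not an existing benchmark.

```python
# Sketch of a dynamic benchmark: numbers are drawn at evaluation
# time, so the expected answer changes on every run and memorizing
# a fixed answer key doesn't help. Purely illustrative.
import random

def make_item(rng: random.Random) -> tuple[str, int]:
    """Generate one arithmetic question and its correct answer."""
    a, b = rng.randint(2, 99), rng.randint(2, 99)
    return f"What is {a} * {b} + {a}?", a * b + a

def evaluate(model_fn, n_items: int = 100, seed=None) -> float:
    """Score a model (a function from question text to an int
    answer) on freshly generated items."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_items):
        question, gold = make_item(rng)
        if model_fn(question) == gold:
            correct += 1
    return correct / n_items

# A 'model' that actually does the math scores 1.0 on any run;
# one that memorized a single past answer would score near 0.
solver = lambda q: eval(q.removeprefix("What is ").removesuffix("?"))
print(evaluate(solver, n_items=10))  # 1.0
```

Training on previous runs would only teach the model the question template, not the answers, which is exactly the point.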
I see Seedvault allows you to backup and restore app data. Are there any issues transferring app data from one device to another? Screen size, Android version, hardware, etc could all be different.
FYI, the mobile image swipe mechanic on the linked page is inverted. Swiping left takes you to the image on the left instead of the image on the right, and vice versa.
GPU cost ÷ GPU token throughput = cost per token will get you close; then you compare the cost per token between solutions.
To explain how to calculate it precisely, I would have to write a blog post. There are dozens of factors that go into it and they vary based on your use case like GPU type/setup, cloud provider, inferencing engine, context size, the minimum throughput and latency you'd be willing to have your users experience, LLM quantization, KV cache configuration, etc.
I'm sure there are cost analyses out there for Llama 3.1 70b you could find though.
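As a back-of-the-envelope sketch of the rough formula above (every number here is a made-up assumption, not a measurement of any real GPU or model):

```python
# Back-of-the-envelope cost-per-token estimate.
# Both input numbers are assumptions, not measurements.

gpu_cost_per_hour = 2.50    # assumed hourly rate for one GPU ($)
tokens_per_second = 1500    # assumed sustained token throughput

tokens_per_hour = tokens_per_second * 3600
cost_per_token = gpu_cost_per_hour / tokens_per_hour
cost_per_million = cost_per_token * 1_000_000

print(f"${cost_per_million:.3f} per million tokens")  # $0.463 per million tokens
```

The factors listed above (quantization, batching, context size, latency targets) all move `tokens_per_second`, which is why the precise answer needs a blog post while the division itself stays trivial.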
How does that work? They force AOM to pay up and then I guess either AOM passes the royalty burden onto AV1 users or they take the hit and pay it themselves?
You might be confusing it with Lebanon, but Syria has been bombed by Israel and is pretty unstable in general so it's impressive they were able to do this research regardless. People shouldn't be downvoting you for asking a question.
Nah, Israel is heavily assaulting Syria too, these days, at this moment, continuously: special ops raids for targeted killings, blowing up one thing or another, bombing places. They don't even try to hide it; videos from chopper pilots, drones, and helmet cams from soldiers are all over the internet.
It may have some good reasons behind it given the war they wage on Lebanon, or it may just be settling decades-old political grudges; I don't know.
> It may have some good reasons behind given war they wage on Lebanon
The day after Hamas attacked Israel in October 2023, Lebanon (well, Hezbollah) stepped up their rocket attacks on Israel. Israel had to evacuate the entire north of the country, so that, combined with the people evacuated from the Gaza area, means something like 200,000 internally displaced Israelis right now. Hezbollah has killed dozens of citizens across the border, most prominently 12 children playing soccer a few months ago. They bombed two kindergartens in Israel this past week.
Honestly, I don't understand why the threshold for war seems to be ground invasion. If you're shooting missiles at another country, that's war too. Israel is at war with Syria, Iran, and Lebanon imo. Not to mention Hamas.
Also, giving billions worth of weapons is seemingly totally fine, but selling shells and sending troops to an allied country is "an escalation" when the opposing camp does it. The double standards and doublespeak are so tiring.
Also, Israel claims it must preemptively strike and do all sorts of things due to what its enemies may do to it, but given how much slaughter and actual genocide Israel sanctioned its allies to carry out in Lebanon over its history, its enemies would be equally justified in reverse.
I think the nuance here is while we should continue to push for WFH, we should also be aware of the feelings of people around us when we speak from our privileged position. This doesn't just apply to WFH, we should keep other people's feelings in mind when arguing from any position of privilege. Software developers are extraordinarily privileged and it's important to keep this in perspective or else we will contribute to a sense of disdain toward all software developers.
Some will argue this is a talking point for anti-WFH people, but it's still a valid point to make independent of your WFH stance and as a general tip for being a decent, empathetic human being.
Very well put, and yes, again: Not advocating against WFH, just trying to be aware that as we are fighting for it, there are those who will never have it.