The malice is by the author of the malicious skill file.
This is an intrinsic risk associated with giving LLMs access to sensitive material. It's reckless of Microsoft to give an LLM such broad access based on the user's own permissions.
If there were a confirmation prompt for the Teams message, why would even a highly competent user refuse it? That's what the skill says it will do. The message is expected, the visible content is expected, a confirmation prompt is just a nuisance.
I don't think it's a new normal. In my experience a large volume of text rarely has a lot of information to convey. Instead, it's intended to convey the sense that a lot of work has been done.
The consultant mindset set this trend before AI made it all worse.
Without passing opinion on GP's point, I think that just proves it's hard to establish a data set that doesn't bias toward the result you're hoping to find.
I am fortunate enough to have both a butterfly keyboard MBP and a touch bar MBP. Obviously the butterfly keyboard has the known issues, but the touch bar MBP also has the very common issues with that hardware.
I can replace the butterfly keycaps myself. It's something like $10 from aliexpress for a full set of keycaps and clips and a minute's work to pop the busted one and replace it. Annoying, but not fatal.
The touch bar needs a full battery, keyboard, track pad, and upper case replacement to fix. I just have to live with that thing flickering brightly at me every day, or spend AU$500+ to get it fixed.
Because LLMs aren't sentient, they don't draw on facts, and they don't have nuance. The answer given is similar to answers you might expect to see for similar questions.
It's really amazing we can make machines do that, and it's really depressing that we think a stochastic bullshit machine is going to give us something we can rely on.
Or… the default LLM Google uses for search has been quantized to s**. Ask a proper Thinking model, with browsing enabled, and odds of a correct answer are much higher. There’s been substantial improvement in AI in even the last year.
Ask a human a question like this, and they also have a chance of getting it wrong, even when confident.
Why would a human know specs for a random phone off the top of their head? The human response is either "I don't know" or "let me look that up", not a hallucination.
I think that it feels a little wasteful to go to Google search to ask a question like this, only for the AI that's giving you an answer instead of page results to perform its own web search to get you the response.
Also, I asked a thinking model with browsing enabled and got this:
> The Google Pixel 10 is expected to support Wi-Fi 7 (802.11be), based on the Qualcomm Snapdragon 8 Gen 4 / Tensor G5 chipset it will likely use, which includes an integrated Wi-Fi 7 modem. Specific finalized specs aren't confirmed until Google's official announcement.
(Model GLM-5-Turbo - two months old - using Kilo Code in the "Ask" profile; in its thinking token churn it reasoned that it should keep the response brief and direct. Perhaps not the best suite of model+harness for this task, but it's what I had to hand that's not quantized to shit, is a thinking model, and has a web search tool available to it.)
> Ask a human a question like this, and they also have a chance of getting it wrong, even when confident.
We google something specifically because the humans within reach don't know. The goal of searching is, well, to search pages - we're trying to find a site when we use google search.
The goal when using an LLM is generally different; we want an answer, not a site.
LLMs cannot point you to sites, only in a general direction. That is because complete URLs do not exist as single tokens in any of the large models. It can synthesize a plausible-looking URL, and if you're lucky that URL might even exist. But that doesn't mean that there is any relation between the text surrounding a hyperlink in LLM output and the text on the linked page.
AI agents can verify and summarize URLs, but a plain LLM can not.
its bad in dev as well... i've seen llm code review bots tell me things that are flat-out not true; things like "this wont compile because windows 11 doesn't exist" like wtf am i paying for this again?
If you have an interpreted language, you don't have a C function corresponding to each language function. You have a C interpreter loop with a "current instruction" pointer. When the current interpreted instruction is a call, you check all the things you need to check, push the current IP to a stack, and set the IP to the first instruction of the function.
C's type checker never sees the interpreted language's functions.
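To make that concrete, here's a rough sketch of such a loop. It's in Python rather than C to keep it short, and the instruction set (PUSH, ADD, CALL, RET, HALT) and the `run` function are made up for illustration, not taken from any real interpreter. The point is that a "function" in the interpreted language is just an index into the instruction list; the host language only ever sees `run`.

```python
def run(program, entry=0):
    """Execute a list of (opcode, operand) tuples and return the operand stack."""
    ip = entry      # the "current instruction" pointer
    data = []       # operand stack
    calls = []      # return addresses pushed by CALL
    while True:
        op, arg = program[ip]
        ip += 1     # ip now points at the next instruction
        if op == "PUSH":
            data.append(arg)
        elif op == "ADD":
            b = data.pop()
            a = data.pop()
            data.append(a + b)
        elif op == "CALL":
            # "check all the things you need to check": here, only that the
            # target is a valid instruction index (hypothetical minimal check)
            if not (isinstance(arg, int) and 0 <= arg < len(program)):
                raise ValueError(f"bad call target: {arg!r}")
            calls.append(ip)    # save where to resume after the call
            ip = arg            # jump to the function's first instruction
        elif op == "RET":
            if not calls:
                raise ValueError("RET with empty call stack")
            ip = calls.pop()
        elif op == "HALT":
            return data
        else:
            raise ValueError(f"unknown opcode: {op!r}")


# Hypothetical program: main pushes 2 and 3, calls the "function" at index 4.
PROGRAM = [
    ("PUSH", 2),     # 0
    ("PUSH", 3),     # 1
    ("CALL", 4),     # 2
    ("HALT", None),  # 3
    ("ADD", None),   # 4: body of the interpreted "add" function
    ("RET", None),   # 5
]

print(run(PROGRAM))
```

Nothing here gives the host compiler a signature for the interpreted function to check; argument counts and types are whatever the loop itself chooses to verify at the CALL.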
The article talks about inlining a two-arity call to clojure.core/max to instead be an explicit call to cpp/jank.runtime.max, eliminating the unnecessary argument count matching and recursion portions of the Clojure function.
It also mentions that in Clang the runtime max function will itself be inlined, so that's something LLVM ("the LLVM project", anyway) is still doing - and beyond that, as written this IR is likely to leave behind plenty of opportunities for LLVM to do the things it's good at: DCE, load/store optimisation, constant propagation, etc. And register allocation.
The jank::runtime::max call is itself complex: it's got to type check its arguments and work out what to actually do based on the two types; if parts of these tests are done before the inlined call to max there's a fair chance that LLVM will be able to eliminate their repetition and slim it all down a long way. In the Fibonacci example the fact that a previous test will have likely identified whether the argument is an int or something else should hopefully carry over for ::lte, ::sub, and ::add and simplify those down to just the single operator call - but sadly I suspect it won't, at least for the addition, because the recursive call will lose the information that the return value when called with a tagged integer is always a tagged integer.
A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR (:metadata tag functions as specialised for <type> with the new entry point, if a function only calls specialised functions (and itself) it too can be specialised, and a heuristic to determine if specialisation gains enough to sacrifice space for it).
The first three paragraphs here are on point! jank's IR passes will not worry much about things like load/store optimization, register allocation, inlining C++ functions, etc. These are in LLVM's domain. We just worry about the Clojure side of things. Polymorphic math is intense, but we do our best to avoid the extra work by unboxing whenever possible.
> A future optimisation might be to specialise for unboxed types: far more potential speed improvement over pointer tagging, and IMO quite amenable to analysis with the Jank IR
All of these math functions are templates with four specific categories:
1. Object and object
2. Primitive and primitive
3. Primitive and object
4. Object and primitive
We handle the difference between typed objects (like integer_ref) and type-erased objects (object_ref) as well. This template then gets inlined, which is exactly what the last step of the benchmark optimizations (adding annotations) ensured. The return type of these functions will prefer primitive types, rather than automatically boxing. jank's analyzer tracks all types used, at compile-time, and supports automatic boxing. This means that we're already using the most optimal primitive math whenever we can and that it will indeed inline to just an operator call when working on two primitives, or two typed objects, or a combination thereof.
Thanks for the response. I really like the measured, evidence-based approach you're taking to this work.
I have the wrong CPU architectures for pre-built jank packages (x86 mac, aarch64 linux, the exact opposite of 'normal') so I haven't actually looked at what it produces, so my last paragraph was pure speculation. I appreciate the detail you gave!
There isn't a more expensive Mac option to buy if what you're after is a gaming GPU. It's more likely that the VM team sees this as a very low benefit ticket to pursue given the tiny segment of Mac gamers hoping to improve their options with a Linux VM for gaming.
(Meanwhile, I'm recompiling Wine to see if I can patch it to address an issue that was hotfixed in Proton two weeks ago but isn't in a CrossOver build yet, so yeah, there's maybe some arguments to be made here that I'd be a potential beneficiary. If I weren't too cheap to spring for an eGPU in today's market, anyway.)
DPI can refer to inspecting beyond just the headers, but since it's more of a marketing term than a technical one, you could also say you're "deeply inspecting" the IP headers of a packet and no-one would show up to arrest you for bad terminology.
Anyway, one way to detect NAT is to observe different TTLs originating from one device. Is that deep inspection? Probably depends on who you ask. The fact that you have to track information across multiple packets counts for something, though.
Off the top of my head I wouldn't really expect there to be much value in a MITM inspection of the contents of HTTP traffic for the purposes of NAT detection. You could probably come up with some scenarios in which it might be possible, but I'd contend those scenarios aren't very practical. Easier to compare TTLs between packets, say, or track connections to known OS "phone home" destinations. While these just use information from the IP layer, they're stateful observations requiring comparisons across multiple packets, and that might count for something.
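The TTL comparison could look something like the sketch below. This is only an illustration: the function name, the input shape (source address, observed TTL) and the sample addresses are all assumptions, and a real detector would need to cope with things like route changes or one host running multiple OSes in VMs. The idea is just that a host behind another router arrives with its TTL decremented one extra hop, so a single source address showing more than one TTL is a hint that several devices sit behind it.

```python
from collections import defaultdict


def suspected_nat_sources(observations):
    """Return {source: sorted TTLs} for sources seen with more than one TTL.

    `observations` is an iterable of (source_address, ttl) pairs, one per
    packet. State is kept across packets, which is the "stateful" part.
    """
    ttls_by_source = defaultdict(set)
    for src, ttl in observations:
        ttls_by_source[src].add(ttl)
    return {
        src: sorted(ttls)
        for src, ttls in ttls_by_source.items()
        if len(ttls) > 1
    }


# Hypothetical capture: 192.0.2.10 shows both 64 and 63 (perhaps a phone
# plus a tethered laptop one hop behind it); 192.0.2.20 is consistent.
OBSERVED = [
    ("192.0.2.10", 64),
    ("192.0.2.10", 64),
    ("192.0.2.10", 63),
    ("192.0.2.20", 128),
    ("192.0.2.20", 128),
]

print(suspected_nat_sources(OBSERVED))
```

Note that this only reads the IP header of each packet; what makes it more than a header check is the per-source state kept between packets.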
One way to detect a shitty carrier service, though, is that they're inspecting your traffic for "good" or "bad" uses of their service, because that is a good indicator that they're not just a carrier. I call it Dickish Practices Identification, or DPI.