This is particularly interesting, as there seems to have been, for decades, a general consensus that the problem of text compression is the same as the problem of artificial intelligence; see for example https://en.wikipedia.org/wiki/Hutter_Prize
"It is well established that compression is essentially prediction, which effectively links compression and langauge models (Delétang et al., 2023). The source coding theory from Shannon’s information theory (Shannon, 1948) suggests that the number of bits required by an optimal entropy encoder to compress a message ... is equal to the NLL of the message given by a statistical model." (https://ar5iv.labs.arxiv.org/html//2402.00861)
I will say again that Li et al 2024, "Evaluating Large Language Models for Generalization and Robustness via Data Compression", which evaluates LLMs on their ability to predict future text, is amazing work that the field is currently sleeping on.
I’m not sure how this generalises to grammar-based compression such as SEQUITUR… incidentally, LZW is also grammar-based, though not advertised as such.
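For a feel of how grammar-like LZW is, here's a minimal sketch (plain LZW, no bit-packing): every dictionary entry is effectively a rewrite rule mapping a code to a longer phrase, much like the rules SEQUITUR induces.

    def lzw_compress(data: str) -> list[int]:
        # Start with single-character "rules"; grow the dictionary as we scan.
        dictionary = {chr(i): i for i in range(256)}
        phrase, out = "", []
        for ch in data:
            if phrase + ch in dictionary:
                phrase += ch                    # extend the current phrase
            else:
                out.append(dictionary[phrase])  # emit longest known phrase
                dictionary[phrase + ch] = len(dictionary)  # learn a new rule
                phrase = ch
        if phrase:
            out.append(dictionary[phrase])
        return out

    print(lzw_compress("abababab"))  # [97, 98, 256, 258, 98]: codes get reused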
Math seems very limited when it comes to reasoning about generative grammars and their unfolding into text. Had the apparatus been there, we’d probably have had grammar/Prolog-based AI long ago…
Grammars are not AI; they're just another formalism (like regular expressions, Turing machines, etc.), and a formalism alone doesn't solve anything.
In formal language theory, you have different classes of grammars. The most general ones correspond to Turing machines, i.e. they are a glorified assembler and you can do anything. The most restricted ones (in the Chomsky hierarchy), the "Type 3" grammars, are basically another notation for regular expressions: they describe the regular languages.
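As a toy illustration of that last equivalence: the right-linear (Type 3) rules S -> aS | b generate exactly the regular language a*b, so a plain regular expression recognizes the same strings.

    import re

    # Type 3 grammar  S -> aS | b  <=>  regular expression a*b
    pattern = re.compile(r"a*b")
    for s in ["b", "ab", "aaab", "ba", "aa"]:
        print(s, bool(pattern.fullmatch(s)))  # True, True, True, False, False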
There are algorithms for learning grammars, but the issue is that the induced grammars may not resemble anything a human would write (in the same way that a clustering algorithm often does not give you the clusters you want).
But to answer your question, we need to separate the question of an appropriate representation from the method used to solve a problem.
I believe grammar-based compression - if you accept probabilistic grammars - is similar to LLM-based compression at some level, in the sense that highly probable sequences of words get learned (whether by dictionary, grammar, or neural network = LLM could be just an implementation detail). Whichever you choose, you still need to solve the problem you are trying to solve (any grammar formalism still needs a parsing algorithm, and an actual grammar that does something useful - even after you develop a parser generator).
[Side rant, not responding specifically to the parent or OP: as a linguist, I'd also warn everybody against using "AI" with an article: *"an AI" (the asterisk marks the wrong use). It wrongly suggests human-like properties when it's actually just a matrix of numbers that encodes a model. Here is a test of whether you are using "AI" right: replace it by "Applied Statistics" in a sentence and see if you would still say it.]
AI is just an academic field (ill-named for historic reasons), a subfield of computer science. While it's fair to talk about useful representations for modeling human-like behaviors, we should focus on what intelligence is, and talk about the limits of concrete models and the possibilities for extending them.
The thing about LLMs is they are a bit like the perfect snake oil salesman: extremely articulate, knows a little about a lot, understands nothing. (Whatever one criticises, they do the one thing they are designed for very well: generate text. Sadly, that misleads a lot of people into forgetting that they are just next-word/next-sentence predictors.)
You are very brave to call or not call something AI, but it is precisely generative grammars (stochastic ones) that were initially considered AI - as a linguist you should know this better than I do.
There's a general consensus that entropy is deeply spooky. It pops up in physics in black holes and the heat death of the universe. The physicist Erwin Schrodinger suggested that life itself consumes negative entropy, and others have proposed other definitions of life that are entropic. Some definitions of intelligence also centre on entropy.
What to make of all that, however, commands anything but consensus.
To have entropy, you need to have a notion of information. To have information, you have to decide which differences matter, i.e. which states you classify as the same.
This isn't a problem for physics, or for computer science. But it is a problem for would-be philosophers (including a few physicists and computer scientists!) who thought information was a shortcut to avoid answering big questions about what matters, what we care about.
> but on the internet you don't have to say anything and if you do it may as well have some substance
Seems like we're using different internets. Which I am glad about. I just wish mine had less of the negativity that's coming over from yours. I guess, in the end, the people on your internet realize it's more fun over here.
You could have expressed all of that with less maliciousness towards the person. Thank god, in my internet everyone can say whatever they want, if they want. Because (and more people should remember this, apparently) if I don't like it, I just turn off the internet, like grandma!
Wish all the best to you and everyone you care about in real life. I might be just a bot. You might be. We'll never know for certain. Don't let some bits mess with your feels.
I'm sorry for leaking negativity into your internet. I don't think negativity is inherently undesirable, but I don't think it's useful to express it towards people's selves. I meant only to criticize the comment, without further implication.
In fact I went and got some references I really liked because I was hoping to add what I felt was missing from the discussion on entropy. My motivation in the end was to share my personal feeling of awe, and in a way that was accessible to the parent poster as well as other readers. How do you like that internet?
> My motivation in the end was to share my personal feeling of awe, and in a way that was accessible to the parent poster as well as other readers.
Then write it that way:
1. Remove the first paragraph, where you treat the OP like a child by telling them where it is and isn't appropriate to express their idea
2. Remove the first two sentences of the 2nd paragraph
3. Remove the clause "but you can't get that from a quip."
Now we've got the beginnings of a delicious comment! You could even garnish it at the beginning with something like "Not sure if we're talking about the same thing, but..." But you don't even really need it.
That's the difference between playing in a sandbox with others, and unwittingly kicking someone out of one.
> I give up. Delete my account please dang. This site isn't good for my mental health.
While I cannot speak to your conclusion, I can humbly suggest not putting any credence in what some rando says on the Internet. Including myself. :-)
Far better is it to dare mighty things, to win glorious
triumphs, even though checkered by failure... than to rank
with those poor spirits who neither enjoy nor suffer much,
because they live in a gray twilight that knows not victory
nor defeat.[0]
>> This is all weasel words, and you've misspelled "Schroedinger"/"Schrödinger". That sort of comment might be fine for the pub, but on the internet you don't have to say anything and if you do it may as well have some substance.
> ... I can't invalidate your sense of awe.
Actually, yes. Yes, you can.
And so could I, or anyone really, given sufficiently focused vitriol.
For example, your sentence fragment "This is all weasel words" is incorrect English. "This is" should use the plural form "These are" as the subject is "words" and not "weasel", as well as the modifier "all" emphasizing plurality.
The irony of your subsequently pointing out a spelling error and then chastising the OP for same has not been lost.
> At least 50% of posts that point out a spelling or grammatical error contain one as well.
Quite true. While I do not generally claim to be a grammatical wizard, I do know when I hear from one (hello Zortech-C++, it's been too long!).
If you don't mind pointing out my mistake(s) above, I would appreciate it as my goal was to exemplify the social effect of pedantic critique. Being corrected when doing same could serve as an additional benefit.
What's the unconditional rate of errors in posts generally? Without the prior, I don't know if whinging about spelling or grammar makes my posts correcter or incorrecter.
By what standard of English did you reckon my post incorrect? I appreciate your effort to cheer up your parent post, and to improve my language skills, of course.
(I'm not the language usage police, though I am fussy about correctly rendering people's names.)
I didn't understand your gainsaying about invalidating awe. Whether or not the poster's awe was a real and worthwhile feeling seems to me entirely independent of my opinions.
I find your aims admirable. However, I regret to say that for me the irony, and purpose of this comment thread, have indeed been lost.
> The subject was "this", referring to the comment.
While I understand that was the intent of your clarification, in the original context "this" is in its determiner form, not its pronoun form. Had the word "comment" been included, then I believe most (if not all) readers would have understood the usage, with "this" in the pronoun-like role it often plays alongside the noun "comment."
More important than my pedantry was an attempt to illustrate how corrections in this medium can be interpreted quite differently based on the person. As you intimate, my example did not affect you adversely (which is great BTW). How the OP responded to your original reply indicated a different effect unfortunately. I am not judging, only providing my observation.
A quote I wish I knew much earlier in my life is:
A sharp tongue is the only edge tool that grows keener with
constant use.[0]
Your comment is excellent, inspiring and quite true.
Please stay, otherwise the rest of us are stuck with the alternative (which is essentially someone saying "read this Wikipedia article and Schrödinger's original talks", with a perplexing pile of unhappiness, pretending to correct things that you didn't get wrong).
I’m not sure this is strictly true. It seems more accurate to say there are deep connections between the two than that they are theoretically equivalent problems. His work is really cool though, no doubt.
In the sense I understand that comparison, or have usually seen it referred to, the compressed representation is the internal latent in a (V)AE. Still, I haven't seen many attempts at compression that would store the latent plus a delta to form lossless compression, which an AI system could then maybe use natively at high performance. Or if I have... I have not understood them.
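For what it's worth, here's a minimal sketch of the "latent + delta" idea, with crude quantization standing in for a learned lossy latent (nothing here is a real codec):

    import numpy as np

    x = np.random.randint(0, 256, size=1000, dtype=np.uint8)  # original data
    latent = x // 16        # stand-in for a lossy learned latent
    recon = latent * 16     # decoder's lossy reconstruction
    delta = x - recon       # residual; small/compressible if recon is good
    assert np.array_equal(recon + delta, x)  # latent + delta is lossless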
It is true, but I think it's only of philosophical interest. For example, in a sense our physical laws are just humanity's attempt at compressing our universe.
The text model used here probably isn't going to be "intelligent" in the same way those chat-oriented LLMs are. You can probably still sample text from it, but you can actually do the same with gzip[1].
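For the curious, here's a toy sketch of compression-as-prediction, with zlib standing in for gzip (not necessarily the exact trick [1] describes): score a continuation by how many extra compressed bytes it costs, a crude stand-in for NLL.

    import zlib

    def extra_bytes(context: str, continuation: str) -> int:
        # Bytes added by the continuation under zlib: a crude NLL proxy.
        base = len(zlib.compress(context.encode()))
        return len(zlib.compress((context + continuation).encode())) - base

    context = "the cat sat on the mat. the cat sat on the " * 4
    for cand in ["mat.", "xyzq"]:
        print(cand, extra_bytes(context, cand))
    # the continuation matching the context's patterns should cost fewer bytes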
Microsoft SQL Server is only available as an x86-64 Docker container binary. They actually had a native(?) arm64 Docker container under the name "azure-sql-edge", which was (and still is) super useful, as you can run it "natively" in an arm64 qemu Linux, for example. Alas, that version was not long-lived, as Microsoft decided to stop developing it, which feels like a huge step backwards.
There's probably other closed-source Linux software being distributed as amd64-only binaries (Rosetta 2 for Linux VMs isn't limited to Docker containers).
I guess if you use this, then the security of your key is only as strong as the number of minutes the brute force took (since anyone else could also run the tool and generate their own key matching the desired fingerprint in the same number of minutes you needed, or fewer).
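A minimal sketch of that brute force, using the pyca/cryptography package and matching only a short fingerprint prefix (the real tool targets the randomart image; the prefix here is purely illustrative):

    import base64, hashlib
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import ed25519

    TARGET_PREFIX = "AA"  # hypothetical; longer prefixes take exponentially longer

    while True:
        key = ed25519.Ed25519PrivateKey.generate()
        pub = key.public_key().public_bytes(
            serialization.Encoding.OpenSSH, serialization.PublicFormat.OpenSSH)
        blob = base64.b64decode(pub.split()[1])  # raw key blob after "ssh-ed25519"
        fp = base64.b64encode(hashlib.sha256(blob).digest()).rstrip(b"=")
        if fp.decode().startswith(TARGET_PREFIX):  # OpenSSH-style SHA256 fingerprint
            print("found:", fp.decode())
            break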
I don't think the idea is to use the visual representation of the SSH key as a security mechanism but rather to have an SSH key that looks cool when you visualize it.
Isn't the whole point of VisualHostKey in ssh to act as a security mechanism, i.e. "yes this looks like the correct server key" on first use on a new client that doesn't already have the key in known_hosts?
Is the runtime of this application "a number of minutes greater than the heat death of the universe" when it comes to finding something that could pass as matching the target VisualHostKey?
Odd, this is not happening here on macOS 15.0.0. Turning bluetooth off either via the system settings app or the menu bar icon shuts off bluetooth immediately with no prompt for me...
I'd be very surprised if Ubuntu has backported the keystroke obfuscation feature (which was introduced in 9.5) to 8.9.
When the feature is missing, you're not safe from keystroke interception at all, which is what this bug is all about. So any 8.9 version would actually be considered "unsafe" until the whole feature is backported, which seems unlikely to happen?
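A quick way to see which side of that line you're on is a plain version check, though note it assumes no backport, which is exactly the open question here (feature and option name per the OpenSSH 9.5 release notes):

    import re, subprocess

    # ssh prints its version on stderr, e.g. "OpenSSH_9.5p1 ..."
    out = subprocess.run(["ssh", "-V"], capture_output=True, text=True).stderr
    major, minor = map(int, re.search(r"OpenSSH_(\d+)\.(\d+)", out).groups())
    # Keystroke timing obfuscation (ObscureKeystrokeTiming) shipped in 9.5;
    # a distro backport would make this naive check wrong.
    print("obfuscation available" if (major, minor) >= (9, 5) else "missing")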
At first I thought it was just an overflow error but no it's nothing like that. The math is indeed very clearly broken, as I play around with it on Sonoma on my M1.
I'm genuinely shocked. I thought this kind of floating-point math was rock-solid, tested thoroughly over the decades.
The screenshot in the article shows MD5() is returned as part of the error message from the web server, so it is probably also a part of the original server-side query.