> a "universal" version of this -- if you ran it across books and e-mails and text messages from thousands of authors covering diverse backgrounds and contexts, then what would most reliably help everyone
No, this is simple statistics. List all the words in your corpus ordered by how frequently they're used, then use dedicated software like the one given here to build shorthand from the list and expand it as you type. Machine learning is massive overkill for this, like nuking a fly: it's a waste of resources (time, energy, compute, every metric I can think of), and you're going to have targeting issues at that scale. How exactly do you propose to ensure an LLM stays on target and doesn't duplicate answers, make up words, or skip words? Are you sure your LLM actually knows what the correct distribution is? And if you know it does because you checked, doesn't that mean you already have a word list you can use?
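To make the statistics point concrete, here's a minimal sketch of the frequency-list approach: count word frequencies, give the most common words the shortest unused prefixes as abbreviations, and expand on the way back. The function names and the prefix-assignment rule are my own illustration, not the tool mentioned above.

```python
from collections import Counter
import re

def build_shorthand(corpus: str, top_n: int = 100) -> dict:
    """Map the top_n most frequent words to the shortest unused prefix."""
    words = re.findall(r"[a-z']+", corpus.lower())
    shorthand, used = {}, set()
    for word, _count in Counter(words).most_common(top_n):
        # Try progressively longer prefixes until one is free.
        for k in range(1, len(word) + 1):
            abbr = word[:k]
            if abbr not in used:
                shorthand[word] = abbr
                used.add(abbr)
                break
    return shorthand

def expand(text: str, shorthand: dict) -> str:
    """Expand typed abbreviations back to full words; unknown tokens pass through."""
    reverse = {abbr: word for word, abbr in shorthand.items()}
    return " ".join(reverse.get(tok, tok) for tok in text.split())
```

No model, no training, and the output is deterministic: the expansion is exactly the word you assigned, never a hallucinated one.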
I was thinking the exact same thing. That sounds like a small ByT5 model. On each space you auto-complete the previous word based on a moving context window of the past 32 characters or so.
Isn't this approaching LLM territory?