
> a "universal" version of this -- if you ran it across books and e-mails and text messages from thousands of authors covering diverse backgrounds and contexts, then what would most reliably help everyone

Isn't this approaching LLM territory?




No, this is simple statistics. Just list all the words in your corpus in order of how frequently they're used, then use a dedicated tool like the one given here to build shorthand from the list and expand it as you type. Machine learning is massive overkill for this, like nuking a fly: it's a massive waste of resources (time, energy, every metric I can think of), and you're going to have targeting issues at that scale. How exactly do you propose to ensure an LLM stays on target and doesn't duplicate answers, make up words, or skip words? Are you sure your LLM actually knows the correct frequency distribution? And if you know it does because you checked, doesn't that mean you already have a word list you can use?
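The frequency-ranked approach is easy to sketch. A minimal illustration (hypothetical, not the tool linked in the article): rank words by corpus frequency, hand the most common words the shortest codes, and expand codes back to words as you type.

```python
# Hypothetical sketch of frequency-based shorthand: rank words by corpus
# frequency, assign the shortest codes to the most common words, then
# expand codes back into words.
from collections import Counter
from itertools import count, product
import string

def short_codes():
    """Yield 'a', 'b', ..., 'z', 'aa', 'ab', ... -- shortest codes first."""
    for length in count(1):
        for letters in product(string.ascii_lowercase, repeat=length):
            yield "".join(letters)

def build_shorthand(corpus: str) -> dict[str, str]:
    """Map codes to words, most frequent word first."""
    freq = Counter(corpus.lower().split())
    ranked = [word for word, _ in freq.most_common()]
    return dict(zip(short_codes(), ranked))

def expand(text: str, table: dict[str, str]) -> str:
    """Replace any token that is a known code with its full word."""
    return " ".join(table.get(tok, tok) for tok in text.split())

corpus = "the cat sat on the mat and the dog sat on the log"
table = build_shorthand(corpus)
# 'the' is the most frequent word, so it gets the shortest code, 'a'.
```

No model, no training, and the output is fully deterministic: the same corpus always produces the same table.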


> No, this is simple statistics.

How exactly is an LLM different? I'm not sure it is; it's just more layers.


I was thinking the exact same thing. That sounds like a small ByT5-style model: on each space, you auto-complete the previous word based on a moving context window of the past 32 characters or so.


Yes, LLMs and text compression are the same problem: guessing what comes next.
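The duality is easy to demonstrate numerically. An idealized arithmetic coder spends -log2 P(symbol) bits per symbol, so whichever model predicts the next symbol better compresses better. A toy sketch (my own example, not from the thread) comparing a unigram character model against a no-prediction uniform baseline:

```python
# Toy illustration of the prediction/compression duality: an ideal coder
# needs -log2 P(symbol) bits per symbol, so a better predictor of what
# comes next yields a shorter encoding.
import math
from collections import Counter

def bits_needed(text: str, prob) -> float:
    """Ideal code length in bits: sum of -log2 P(ch) over the text."""
    return sum(-math.log2(prob(ch)) for ch in text)

text = "abracadabra"
counts = Counter(text)

unigram = lambda ch: counts[ch] / len(text)   # predicts from letter frequencies
uniform = lambda ch: 1 / len(set(text))       # no prediction at all

# The model that guesses the next character better needs fewer bits.
assert bits_needed(text, unigram) < bits_needed(text, uniform)
```

An LLM is the same idea pushed to the limit: a far better next-token predictor, and hence (in principle) a far better compressor.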


No. Unlike LLMs, it's a simple deterministic algorithm.




