
Autocomplete Using Markov Chains - dennis714
https://yurichev.com/blog/markov/
======
rijoja
This is really cool and an inspiration to me. As a matter of fact, I am
working on something very similar to this.

I don't know if you are familiar with the Dasher project for text input, but
at the moment I'm trying to improve on that work, partly by increasing how
many letters are available simultaneously by projecting the line of text onto
a fractal surface. That should be a more efficient use of a 2D surface,
theoretically infinitely so.

As far as autocomplete is concerned, my approach is to try to do exactly this
but on a character basis. I think this can lead to some interesting
advantages; for example, different dialects give rise to words that do not
always conform to dictionary spellings.
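
In miniature, the character-level version is just counting which character
tends to follow each short window of text; here is a minimal sketch (the
corpus path and window size are placeholders):

    from collections import Counter, defaultdict

    ORDER = 3  # number of preceding characters used as the chain's state

    def train(text, order=ORDER):
        model = defaultdict(Counter)
        for i in range(len(text) - order):
            model[text[i:i + order]][text[i + order]] += 1
        return model

    def suggest(model, typed, order=ORDER, k=5):
        # Rank the characters most often seen after the last `order` characters.
        return [c for c, _ in model[typed[-order:]].most_common(k)]

    corpus = open("corpus.txt", encoding="utf-8").read()  # placeholder corpus
    chain = train(corpus)
    print(suggest(chain, "the qui"))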

The next level would be to go one step higher, so to speak. If we imagine
Markov chains on letters as the first level and chains on whole words as the
second, I'd say that the third level in the hierarchy would be to apply Markov
chains to groups of words, grouped by proximity in a word2vec space.

Having Markov chains operate on groups of word2vec words would give us a
statistical analogue of grammar, but without having to implement grammar rules
by hand, which would inevitably lead to missed corner cases, or else to an
overly strict algorithm that would hinder deliberate abuse of grammar.
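
A rough sketch of that third level, assuming gensim's Word2Vec and
scikit-learn's KMeans are available (the corpus path, vector size and cluster
count are arbitrary): cluster the word vectors, map each word to its cluster
id, and train an ordinary Markov chain on the id sequence.

    from collections import Counter, defaultdict
    from gensim.models import Word2Vec
    from sklearn.cluster import KMeans

    sentences = [line.split() for line in open("corpus.txt", encoding="utf-8")]
    w2v = Word2Vec(sentences, vector_size=100, min_count=2)

    # Group words by proximity in the embedding space.
    words = list(w2v.wv.index_to_key)
    labels = KMeans(n_clusters=200, n_init=10).fit_predict(w2v.wv[words])
    cluster_of = dict(zip(words, labels))

    # First-order Markov chain over cluster ids instead of raw words.
    transitions = defaultdict(Counter)
    for sent in sentences:
        ids = [cluster_of[w] for w in sent if w in cluster_of]
        for a, b in zip(ids, ids[1:]):
            transitions[a][b] += 1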

Maybe this has already been implemented, as it seems to me to be the logical
next step. Anybody got any info on this?

~~~
ben_w
A friend of mine worked on Dasher (hi Alan, if you’re reading this!); I
thought they achieved the same goal with a hyperbolic space rather than a
fractal space?

Given this was about a decade ago, there are probably better predictive models
than whatever was in the demo he showed me, but I wouldn’t be surprised if
their model _was_ a Markov chain where the size of each next-letter option was
a function of probability.

(Memory haze: I was more excited by the hyperbolic space than the word model
when he showed it to me, and it was c. 2008)

------
stewbrew
I wonder what the result would look like if this were applied to source code.
IMHO a probabilistic algorithm for code completion could turn out interesting.

Most code completion algorithms work deterministically, deducing the set of
completion candidates from the receiver's type/class or from a list of
keywords. Given that people/teams tend to name variables in a certain fashion,
a probabilistic completion algorithm could make use of this and adapt to
team/project-specific conventions. Given a team's codebase, one could probably
build a pretty good code completion algorithm without any knowledge of the
programming language.
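
A toy, language-agnostic sketch of that idea (nothing like a real plugin; the
glob pattern and tokenizer are simplistic placeholders): count which
identifiers follow which in the existing codebase, then rank candidates
matching the typed prefix by observed frequency.

    import glob
    import re
    from collections import Counter, defaultdict

    TOKEN = re.compile(r"[A-Za-z_]\w*")

    follows = defaultdict(Counter)   # previous token -> counts of next tokens
    unigrams = Counter()

    for path in glob.glob("src/**/*.py", recursive=True):
        tokens = TOKEN.findall(open(path, encoding="utf-8").read())
        unigrams.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            follows[prev][cur] += 1

    def complete(prev_token, prefix, k=5):
        # Prefer tokens seen right after prev_token; fall back to global counts.
        pool = follows.get(prev_token) or unigrams
        cands = [(t, n) for t, n in pool.items() if t.startswith(prefix)]
        return [t for t, _ in sorted(cands, key=lambda x: -x[1])[:k]]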

likelycomplete[1] tries to do this in a dilettantish, ad-hoc way for Vim. It
rates completion candidates (gathered from previously seen code) on the basis
of context information. It's hampered by the limited performance of Vimscript,
though. A full-fledged solution would require an external server.

[1]
[https://www.vim.org/scripts/script.php?script_id=4889](https://www.vim.org/scripts/script.php?script_id=4889)

~~~
xkapastel
> I wonder what the result would look like if this were applied to source
> code. IMHO a probabilistic algorithm for code completion could turn out
> interesting.

Neural program synthesis is similar to what you describe; here's a sample
paper:

> We consider the task of program synthesis in the presence of a reward
> function over the output of programs, where the goal is to find programs
> with maximal rewards. We employ an iterative optimization scheme, where we
> train an RNN on a dataset of K best programs from a priority queue of the
> generated programs so far. Then, we synthesize new programs and add them to
> the priority queue by sampling from the RNN. We benchmark our algorithm,
> called priority queue training (or PQT), against genetic algorithm and
> reinforcement learning baselines on a simple but expressive Turing complete
> programming language called BF. Our experimental results show that our
> simple PQT algorithm significantly outperforms the baselines. By adding a
> program length penalty to the reward function, we are able to synthesize
> short, human readable programs.

[https://arxiv.org/abs/1801.03526](https://arxiv.org/abs/1801.03526)

------
darkpuma
I've thought about using a Markov chain suggester to trip up stylometric
analysis, but never got around to creating some sort of practical UX for it.

I think if you plugged it into Vim or Emacs's autocompletion functionality,
that might do the trick.

------
xkapastel
Keep in mind this is essentially the same concept as the big scary AI from
OpenAI that is making the news recently. They use neural nets, not markov
chains, but the idea is similar: given a word, predict the next word.

> It's surprising how easy this can be turned into something rather
> practically useful

Given the above, it's not so surprising: this word prediction problem is
fundamental, with a wide range of applications.
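
In Markov form, the shared problem statement fits in a few lines (a word-level
sketch over a placeholder corpus, not what either system actually does):

    import random
    from collections import Counter, defaultdict

    words = open("corpus.txt", encoding="utf-8").read().split()  # placeholder

    nxt = defaultdict(Counter)
    for a, b in zip(words, words[1:]):
        nxt[a][b] += 1

    # Generate text by repeatedly sampling a likely next word.
    out = ["the"]
    for _ in range(20):
        choices = nxt[out[-1]]
        if not choices:
            break
        ws, counts = zip(*choices.items())
        out.append(random.choices(ws, weights=counts)[0])
    print(" ".join(out))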

~~~
Sean1708
> Keep in mind this is essentially the same concept as the big scary AI from
> OpenAI that is making the news recently. They use neural nets, not markov
> chains, but the idea is similar: given a word, predict the next word.

Isn't this kind of like saying that a plane is essentially the same concept as
a car because they both transport people from A to B? "given a word, predict
the next word" is the problem statement (i.e. what the problem _is_ ) but
that's not very interesting, what's interesting is the solution (i.e. _how_
you solve the problem). Markov Chains and the kind of Neural Nets used in the
text generator that made the news are _very_ different, even if they're
attempting to solve the same problem.

------
braindead_in
This looks to be quite useful. Is there a github project that I can play
around with?

~~~
amrrs
I've got a similar Kaggle Kernel:
[https://www.kaggle.com/nulldata/meaningful-random-headlines-by-markov-chain](https://www.kaggle.com/nulldata/meaningful-random-headlines-by-markov-chain)

------
TomK32
Would be nice to test this as a reply bot for spam mails.

