

Algorithm for converting a finite state machine into a regular expression - pavel_lishin
http://qntm.org/algo

======
Darmani
Just had a second look at your algorithm.

You're essentially running a well-known DFA-minimization algorithm (see
<http://en.wikipedia.org/wiki/DFA_minimization> or
<http://www.cs.cmu.edu/~cdm/pdf/minimization.pdf> ), and then running
Brzozowski. This isn't new at all, though I'm pretty impressed you reinvented
the minimization algorithm.

~~~
qntm
Well, there you go then. I did make an effort to find what I'd done online,
but since there is no explicit "minimization" step in the way I've phrased my
algorithm I probably didn't stand a chance. Apparently that DFA minimization
thing is on any Computer Science degree course, which of course I've never
taken (I only discovered state machines as a concept about six months ago).

Thanks for reading, everybody.

~~~
Twisol
At least you know for sure that what you came up with was worth coming up
with. ;) I do that all the time, and it's a mixed bag of pride and annoyance.

------
qntm
I apologise for the site loading slowly, my entirely unrelated page
<http://qntm.org/uk> "The Great British Venn Diagram" is currently being
hammered by Redditors.

------
alttab
This is basically a course study in formal languages. What is more fascinating
is that you arrived at the same abstraction independently.

------
Darmani
There are several completely different algorithms for converting DFAs into
regular expressions. See <http://www.cs.cmu.edu/~cdm/pdf/KleeneAlg.pdf> for
three. One of my favorites involves eliminating states by replacing all pairs
of transitions through them with regexes -- it winds up looking almost exactly
like the Floyd-Warshall algorithm.

------
profquail
The site wasn't loading for me, but it did load (albeit slowly) from a Coral
Cache link:

<http://qntm.org.nyud.net:8080/algo>

~~~
seancron
It loads for me, it just loads slowly.

Cached version: <http://cache.historious.net/cached/569458/>

Edit: P.S. If you used the cached version, can you let me know how it worked
for you? I stumbled across historious and I'm testing to see how well it
works.

~~~
StavrosK
Heh, I went to historify that site and post the cache to help out, but wasn't
sure if it would be seen as shameless promotion. I'm proud this feature sees
usage, and glad you like it!

EDIT: I'm glad to report that server load is at 0.01 right now, so hammer
away, Varnish is pretty much indestructible.

~~~
seancron
I've been using Historious for the past few days and it has worked pretty well
for me so far. We'll see how it works in a month or so when I need to find
that Really Useful Article :)

One thing that I did notice about the cache is that when qntm's server was
getting hit hard, it was taking ~5-10 seconds to load the cached page. I think
this is because even though you cache the page, it was trying to load the CSS,
images, and js from qntm's server. Not that big of a deal, but I was expecting
it to load faster.

~~~
StavrosK
Fantastic, I'm glad it's been good so far. Finding that Really Useful Article
is working very well, I've been using it for months and it's great :P

Unfortunately, it can't _not_ load the CSS/JS/images from the other site, as
it would take some pretty involved spidering and a bit of additional code on
our end to support caching CSS/JS/images, so it's not very much worth it.

What _is_ worth it, though, is building in a sort of version of Readability
into the cached page. So you will be able to click "light view" or something
and reach a well-formatted page with only the text for the bulk of the
article. This is so you can read it better if the parent site is down (it'll
load faster without CSS/JS) and you can read it on your mobile, kindle, etc.

------
nopassrecover
Hmm my relevant theory is a little sketchy, but there are FSMs that aren't
modellable by regex right? (e.g. Nested tags).

~~~
andrewf
FSMs can't handle nested tags. Pushdown Automata can.

~~~
qntm
Most "regular" expression engines in modern programming languages can handle
back-references (e.g. "(.)\1"), which are actually so complex that you need a
full-blown Turing machine to handle them.

------
Kafka
Any mirror?

~~~
Groxx
Try googling for:

    
    
      cache: url you want cached
    

(I only found this trick out a couple days ago)

------
bhiggins
If you find this interesting, you may be interested in this as well:
<http://en.wikipedia.org/wiki/Star_height_problem>

