
Microlight.js, a code highlighting library - xpostman
https://asvd.github.io/microlight/
======
Klathmon
Somewhat off topic, but is there a "regex alternation optimizer" out there?
And would something like that be worth it?

I've looked through some textmate-style syntax highlighting packages (used in
sublime and in github's atom and probably others), and most of them need big
(or somewhat big) sets of alternations for a bunch of keywords, and more often
than not they are just set up as a list of full keywords with no thought to
order or size.

Combining them into something like the below should theoretically be faster
while also taking up less space (which is important in web libraries), and I
feel like it wouldn't even be all that difficult.

    
    
        de(bugger|cimal|clare|f(ault|er)?|init|l(egate|ete)?)
    

Is there something out there which can do this, and would it even be worth it
or is this something best left to the JIT/optimizer of the regex engine?

~~~
julian37
I'm not aware of any tools out of the box, but you could trivially build your
example regex from a Trie which is also easy to construct.

[https://en.wikipedia.org/wiki/Trie#Algorithms](https://en.wikipedia.org/wiki/Trie#Algorithms)
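
A minimal sketch of that trie idea, assuming plain keywords as input. The helper names (`escapeRegex`, `buildTrie`, `trieToRegex`) are made up for illustration and do not come from any existing library. It factors every shared prefix, so the result groups slightly more aggressively than the hand-written example above, but it matches the same words:

```javascript
// Hypothetical helpers for illustration only; not part of any library.

// Escape characters that are special inside a regular expression.
function escapeRegex(ch) {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Marker key meaning "a complete keyword ends at this node".
const END = Symbol("end");

// Build a trie of nested Maps, one level per character.
function buildTrie(words) {
  const root = new Map();
  for (const word of words) {
    let node = root;
    for (const ch of word) {
      if (!node.has(ch)) node.set(ch, new Map());
      node = node.get(ch);
    }
    node.set(END, true);
  }
  return root;
}

// Convert a trie node into regex source text.
function trieToRegex(node) {
  const branches = [];
  for (const [key, child] of node) {
    if (key === END) continue;
    branches.push(escapeRegex(key) + trieToRegex(child));
  }
  if (branches.length === 0) return "";
  const alternatives = branches.join("|");
  // A keyword may end here, so everything that follows is optional.
  if (node.has(END)) return "(" + alternatives + ")?";
  // Group only when there is a real choice to make.
  return branches.length > 1 ? "(" + alternatives + ")" : alternatives;
}

const keywords = [
  "debugger", "decimal", "declare", "def", "default",
  "defer", "deinit", "del", "delegate", "delete",
];
const source = trieToRegex(buildTrie(keywords));
console.log(source);
// de(bugger|c(imal|lare)|f(ault|er)?|init|l(e(gate|te))?)

// Anchor on word boundaries so "del" does not match inside "model".
const keywordRegex = new RegExp("\\b(?:" + source + ")\\b");
console.log(keywordRegex.test("please delete this")); // true
```

Whether this is actually faster depends on the regex engine, so it would need measuring before relying on it.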

I'm not so sure it would take up much less space though, if you take gzip
compression into account. See for example here:

[https://github.com/google/closure-
compiler/wiki/FAQ#closure-...](https://github.com/google/closure-
compiler/wiki/FAQ#closure-compiler-inlined-all-my-strings-which-made-my-code-
size-bigger-why-did-it-do-that)

~~~
ybx
Something like that would probably be better off with Aho-Corasick, which is
similar to a trie but ends up with more compact FSMs

------
c-smile
There are two main problems with such a solution (and with any other existing
JS-based code colorizer):

1\. Use of regular expressions. It is hard to get an effective solution with
them. See that famous answer about parsing HTML with regex:
[http://stackoverflow.com/questions/1732348/regex-match-
open-...](http://stackoverflow.com/questions/1732348/regex-match-open-tags-
except-xhtml-self-contained-tags/1732454#1732454)

2\. It modifies the DOM. That is undesirable in most cases, and heavy: each DOM
element takes 0.2k..1.0k of memory. Also, DOM handling in browsers has O(N)
complexity (N being the number of DOM elements).

Just in case, in Sciter ([http://sciter.com](http://sciter.com)) I've added an
option to style character runs without DOM modification:
Selection.applyMark(runStart, runEnd, name) and a special ::mark(name)
pseudo-element in CSS to style those runs.

Illustrations: [http://sciter.com/tokenizer-mark-syntax-
colorizer/](http://sciter.com/tokenizer-mark-syntax-colorizer/)

------
tobr
Pretty neat that the styling is included in the 2.2k, but it would be more
practical if it would apply classes to allow me to style it myself. Glowing
text is pretty opinionated.

------
rahiel
This is cool, but I prefer code highlighting without requiring visitors to run
JavaScript. It seems unnecessary when tools like Pygments [1] do the
highlighting once and output html/css.

[1]: [http://pygments.org/](http://pygments.org/)

~~~
JustSomeNobody
Yup. Highlight it one time and serve it up vs. highlight every time (client side).

I don't like waste for the sake of waste.

~~~
exogen
Surely highlighting it on the server is the waste in this case? The library is
only 2174 bytes (not even gzipped).

Let's say you used the smallest markup possible for each highlighted token.
Something like <i class="x">[token here]</i>. You'd only be able to serve up
at most 120 server-highlighted tokens before this JS library becomes a smaller
payload, and that's without even defining the CSS styling or considering
tokens longer than 1 character.

120 isn't even enough tokens to highlight the tiny code snippet at the top of
this demo page. The same tiny snippet highlighted with Pygments comes out to
5819 bytes of markup alone (no styling) – already more than 2.5 times the size
of this whole library. Plus you can highlight any number and size of code
snippets while just serving the library once... which one is wasteful again?
:)
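
The break-even arithmetic above can be checked with a short sketch. The byte counts are the ones quoted in this comment; the variable names are only illustrative, and the markup is assumed to be plain ASCII so that string length equals byte count:

```javascript
// Figures quoted in the comment above.
const LIBRARY_BYTES = 2174;  // size of the library, not gzipped
const PYGMENTS_BYTES = 5819; // Pygments markup for the demo snippet

// Smallest wrapper considered: <i class="x">[token here]</i>
const wrapper = '<i class="x"></i>'; // 17 bytes of pure overhead
const tokenLength = 1;               // best case: one-character tokens
const bytesPerToken = wrapper.length + tokenLength;

// Number of server-highlighted tokens that fit before the markup
// outweighs the whole library.
const breakEvenTokens = Math.floor(LIBRARY_BYTES / bytesPerToken);

console.log(bytesPerToken);   // 18
console.log(breakEvenTokens); // 120
console.log((PYGMENTS_BYTES / LIBRARY_BYTES).toFixed(2)); // 2.68
```

This ignores gzip, which shrinks repetitive markup considerably, so it is a rough comparison rather than a measurement.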

------
wongarsu
It's not the most high-quality syntax highlighting, but it provides decent
highlighting with minimal setup and works well with user generated content
where the programming language isn't known.

It's not going to revolutionise syntax highlighting, but I think it has plenty
of use cases.

------
hougaard
And it made the site super sluggish. Guess that huge regex needs a dedicated
cpu.

Almost unreadable on a fairly quick phone...

~~~
lucideer
Site works with no lag on my low-end Android. Someone else commented on issues
on iPhone 6, so possibly a Safari-specific bottleneck is being hit.

~~~
yxlx
Hello, Firefox on Android user here on a tablet from circa 2012. Extreme lag,
unlike most websites.

There are definitely unacceptable performance issues with the syntax
highlighter javascript.

It looks great, it's a cool project, but it is not suitable for use anywhere.

~~~
STRML
Looks like the `overflow:auto` on the page container div, not actually
anything to do with the library.

------
usaphp
For some reason scrolling that page on my iPhone 6s is insanely laggy. Is it
due to a plugin?

~~~
panic
It's because there's a wrapper div with "overflow: auto" around the entire
page. This means you're scrolling inside the div (unaccelerated, need to
repaint) instead of scrolling the entire document (accelerated, no need to
repaint, can just move around pre-rendered tiles). Removing the "overflow:
auto" makes scrolling smooth.

------
pmlnr
I'm going to stick to [http://prismjs.com;](http://prismjs.com;) that is fast.

~~~
geuis
You should remove the semicolon. It's breaking the link.

------
inglor
This library is actually very HTML specific - look at the code here:
[https://github.com/asvd/microlight/blob/master/microlight.js...](https://github.com/asvd/microlight/blob/master/microlight.js#L184-L190)
\- it assumes what the language looks like.

------
red_hare
Wow, almost 1k of the 2.2k of source is a single regular expression.

That's eerily beautiful.

~~~
nephyrin
Better hope your browser of choice has a regex JIT.

And GPU compositing, 'cause damn.

------
phoboslab
Somewhat related: I wrote a JavaScript syntax highlighter for js1k a few years
back. It's 1008 bytes including a quine - highlighting itself:

[http://js1k.com/2010-first/demo/194](http://js1k.com/2010-first/demo/194)

------
verandaguy
How well does it work for non C-like languages? Particularly, ones like
Haskell, Erlang, SQL?

I'd check myself, but I'm away from a PC for a while.

~~~
xpostman
it works... somehow :-)

[http://asvd.github.io/microlight/haskel.png](http://asvd.github.io/microlight/haskel.png)

well, since the lib is general, it's built upon compromises. But I am open to
suggestions concerning updating the logic for some particular cases

~~~
amelius
Does it also work for languages where you can define // to be an operator?

------
codexon
It looks like it highlights random keywords like wchar_t.

Hardly for "any programming language".

~~~
lifthrasiir
Isn't that an appropriate keyword to highlight, just like `char`?

~~~
codexon
Not if you use a language where it isn't a keyword.

~~~
stestagg
I don't think the intent is to flag up invalid syntax/usage, but to make
sensible guesses about what strings should be highlighted, to aid readability.

There are few cases where having wchar_t in any code snippet would not
indicate some sort of keyword/type annotation

~~~
codexon
I have used universal syntax highlighting and when it highlights stuff that
shouldn't be highlighted, it is very annoying and confusing.

------
armamut
I think for 2.2k (look at the minified code), it's quite nice. I liked it.

------
VeejayRampay
Well done. It covers an important niche and it's a finished product so props
to the author :)

------
jaytaylor

        library size is extremely compact
        2.2k, seriously, can you imagine!
    

I wonder if I can do better than this, since it seems mostly a matter of a few
regular expressions and then DOM manipulation?

 _EDIT_

Reviewing the source code [0]: the state-machine approach, when properly
implemented, can beat* [1] an equivalent RE performance-wise.

It still may be interesting to see if the code size could be substantially
reduced this way.

[0]
[https://github.com/asvd/microlight/blob/master/microlight.js](https://github.com/asvd/microlight/blob/master/microlight.js)

* [1] Disclaimer: In my experience, and admittedly not using JavaScript. I _have_ recently confirmed minimal hand-implemented state machines generally beating regexps in Golang.
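
For illustration, here is a minimal sketch of what a hand-written state-machine tokenizer can look like. It is not microlight's actual code; the token categories, the `tokenize` function and the character helpers are all assumptions made for this example:

```javascript
// Hypothetical character classes; chosen for illustration only.
const isDigit = (ch) => ch >= "0" && ch <= "9";
const isWordChar = (ch) =>
  (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z") ||
  isDigit(ch) || ch === "_" || ch === "$";
const isSpace = (ch) => ch === " " || ch === "\t" || ch === "\n" || ch === "\r";

// Pick the state a fresh token starts in, based on its first character.
function classify(ch, next) {
  if (ch === '"' || ch === "'") return "string";
  if (ch === "/" && next === "/") return "comment";
  if (isDigit(ch)) return "number";
  if (isWordChar(ch)) return "word";
  if (isSpace(ch)) return "space";
  return "punct";
}

function tokenize(src) {
  const tokens = [];
  let state = "none"; // type of the token currently being read
  let text = "";      // characters collected for that token
  let quote = "";     // opening quote while inside a string
  let escaped = false;

  // Finish the current token (if any) and reset the machine.
  const emit = () => {
    if (text !== "") tokens.push({ type: state, text });
    text = "";
    state = "none";
  };

  for (let i = 0; i < src.length; i++) {
    const ch = src[i];
    if (state === "string") {
      text += ch;
      if (escaped) escaped = false;         // character after a backslash
      else if (ch === "\\") escaped = true;
      else if (ch === quote) emit();        // closing quote ends the string
      continue;
    }
    // Does this character extend the token we are in?
    const continues =
      (state === "comment" && ch !== "\n") ||
      (state === "number" && (isDigit(ch) || ch === ".")) ||
      (state === "word" && isWordChar(ch)) ||
      (state === "space" && isSpace(ch));
    if (!continues) {
      emit();
      state = classify(ch, src[i + 1]);
      if (state === "string") quote = ch;
    }
    text += ch;
    if (state === "punct") emit(); // punctuation is always one character
  }
  emit(); // flush whatever is left, e.g. an unterminated string
  return tokens;
}

const sample = tokenize("var x = 42; // hi");
console.log(sample.filter((t) => t.type !== "space").map((t) => t.type).join(" "));
// word word punct number punct comment
```

Each character is inspected once and no regular expression is involved, which is where the potential speed advantage comes from. Whether it also ends up smaller than a regex-based version would have to be measured.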

------
rl3
The glow effect is really nice. Appears to be simply good use of the _text-
shadow_ property.

~~~
xpostman
It's a separate project of mine, actually

[https://asvd.github.io/intence/](https://asvd.github.io/intence/)

------
jaimehrubiks
I'd say it's beautiful for some situations, but not for syntax highlighting.

~~~
Raphmedia
The blurry / highlighted effect is simply CSS.

What we are actually looking at here is the fact that some words are
highlighted in any programming language, even though the language itself is
unknown.

Usually the user has to manually select which programming language is being
used and THEN the words get highlighted.

------
xiphias
It would be nice to have some nice default colors for different things
(non-keyword variables, numbers...). I don't think it would blow up the library
size, and it would make a huge difference to the readability of the code.

------
cabirum
The example looks buggy in Chrome, even more buggy in Canary:
[http://i.imgur.com/6BAfGD0.png](http://i.imgur.com/6BAfGD0.png)

------
amelius
Does it interpret CSS's _background-color_ as a single word? How about
_balance-amount_ , where the hyphen is used as a minus operator?

------
z3t4
It would be interesting to read a blog post about why just these keywords
should be highlighted and why highlighting is a good idea.

------
vatotemking
Side question: What programming topics should I learn to create a syntax
highlighter?

