
Show HN: English syntax highlighter using part-of-speech tagging - ipsum2
http://english.edward.io/
======
ipsum2
Hi HN! This was inspired by the top comment from this thread:
[https://news.ycombinator.com/item?id=11294026](https://news.ycombinator.com/item?id=11294026)

> It would be interesting to see the major parts of speech (nouns, verbs,
> adjectives) colored. Instead this is a coloring of fairly random words. A
> bunch of short words are grey, but they don't belong to any particular part
> of speech. They include some articles, prepositions, conjunctions and a few
> verbs...

~~~
photon_off
Could you please add a legend to indicate what the colors mean?

~~~
ipsum2
Sorry for the slow response, I've implemented this already.

------
TazeTSchnitzel
It's strange to me that we call these things "syntax highlighters", given that
they don't really seem to colour things differently according to syntax,† but
rather by the category of token. Perhaps "lexical highlighter" would be a
better term for these?

Because of this, unfortunately "English syntax highlighter" has two possible
interpretations to me: [English [syntax highlighter]], where _syntax
highlighter_ is a noun having the idiosyncratic meaning it has in a
programming context, i.e. a tool that highlights different tokens according to
their category, and [[English syntax] highlighter], a tool that would
presumably highlight parts of a sentence according to their syntactic
function. The latter would be more exciting, but also more difficult.

† There are some exceptions to this, but it does seem to be true for most of
the syntax highlighters I've used.

~~~
raldu
Your observation seems plausible at first. However, when I gave it a second
thought, I concluded otherwise.

In the natural language context, a part of speech is actually determined
syntactically. "Type" of a token in formal languages by comparison is also
syntactical. Any highlighter must recognize the syntactic structure of its
target language and work like a _parser_ , that is, not like a _tokenizer_.
You have to "push" something to the stack, if you know what I mean.

To give a bad, but intuitive example, if a highlighter assigned "gray" to
every "=" character it comes across, it would not be able to highlight the
same characters _inside_ a quotation mark.

~~~
TazeTSchnitzel
> Any highlighter must recognize the syntactic structure of its target
> language and work like a _parser_ , that is, not like a _tokenizer_. You
> have to "push" something to the stack, if you know what I mean.

How often does syntax matter in "syntax highlighting"? Most highlighters I've
used don't even try to recognise syntactic structure.

> To give a bad, but intuitive example, if a highlighter assigned "gray" to
> every "=" character it comes across, it would not be able to highlight the
> same characters inside a quotation mark.

The problem with this example is that strings are a single token.
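
A minimal sketch of that point, assuming a made-up C-like token set (the
category names and regexes below are illustrative, not taken from any real
highlighter). A purely lexical pass already keeps the "=" inside the quotes
from being coloured on its own, because the whole string matches first as one
token:

```python
import re

# Hypothetical token categories for a tiny C-like language (illustrative only).
TOKEN_SPEC = [
    ("STRING", r'"(?:\\.|[^"\\])*"'),  # a whole quoted string is ONE token
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/;]"),
    ("SPACE", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Return (category, text) pairs, skipping whitespace.

    Characters that match no category are silently ignored in this sketch.
    """
    tokens = []
    for match in TOKEN_RE.finditer(source):
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
    return tokens


for category, text in tokenize('x = "a=b";'):
    print(category, text)
```

Only the first "=" comes out as an OP token; the second one never exists as a
separate token, so no stack is needed to avoid colouring it.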

------
legulere
"Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo." is just
orange :/

~~~
jychang
It does ok at other phrases, though.

    
    
        Never hide hides.
        The sound sounds sound.
        Flies fly.
        It's the right right, right?
        The key key is this one.
        It's an objective objective.
        In May, May may make out with me.
        Compared with the last one, this is a fine fine.
        The man we saw saw a saw.
        The first second was alright, but the second second was tough.

~~~
stared
My standard test is "A machine learning to do machine learning." And it works
here.

~~~
taneq
Sentences like these lead you down the garden path sentence.

~~~
stared
This was just "verb vs a part of a compound noun". For a garden path sentence
I would go with "A machine learning machine learning.".

~~~
taneq
But that can be validly parsed two ways: "A machine-learning machine, which is
learning" or "A machine which is learning machine-learning".

------
trebor
My only request would be to make the highlighted parts of speech configurable.
For me it's too active with 11 different colors/things demanding attention.

But it would be awesome to be able to hover my mouse over a "palette" and see
the matching parts of speech highlight. Or even just turn the parts on/off to
see what I want to see.

------
peterburkimsher
That's beautiful! I wish that someone could make that for Chinese. I'm trying
to learn Chinese now, and I think it would be easier in colour.

Also, is there a way to run this script on a large collection of text files?
Specifically, the Bible. I'd be curious to split out all the names/places/etc.

~~~
cabalamat
I imagine people learning Latin might find it useful as well. And have a
mouseover for conjugation / declension.

~~~
milesokeefe
I made a tool that does a form of that:

[http://latin.milesokeefe.com/?s=disce+quasi+semper+victurus+...](http://latin.milesokeefe.com/?s=disce+quasi+semper+victurus+vive+quasi+cras+moriturus)

------
no1youknowz
Are there plans to open source this, or does anyone know of OS projects that
do this? Thanks!

------
x1798DE
It's very odd - I can barely work without syntax highlighting when
programming, but this does absolutely nothing for me in English. I wonder why
this is.

I'm tempted to say that it's because generally when reading in English, I'm
not "jumping around" as much - I'm just reading left to right and processing
everything into a thought. It could easily be, though, that it's just that
English is my native language and I already do a better job immediately
understanding the syntax of sentences than an automated tool would anyway.

------
laurieg
Very interesting. I've toyed around with this idea myself and come to the
conclusion that coloring at the part-of-speech level is not that useful for
the educational aspects I'm interested in. If you could take this to the next
level and visually display the grammatical sentence structure then I think
this would be a really useful tool for people studying English as a second
language.

~~~
wodenokoto
There are plenty of tools that can draw a variety of tree structures over
English text.

[https://displacy.spacy.io/displacy/index.html?full=Click+the...](https://displacy.spacy.io/displacy/index.html?full=Click+the+button+to+see+this+sentence+in+displaCy).

------
melloclello
I think this is kind of neat for structured English as opposed to prose. Prose
tends to look like vegetable soup, but I pasted a very regularly phrased and
structured todo list in here and it looks great. Things like the standard
Agile "as a user, I want X so I can Y" formalised story descriptions just make
sense with this kind of syntax highlighting.

------
mrspeaker
I made a proof-of-concept of this a while ago
([http://www.mrspeaker.net/2012/03/24/syntax-highlighting-for-...](http://www.mrspeaker.net/2012/03/24/syntax-highlighting-for-writers/))
- my "writer-friends" weren't into it (what do they know!)... but then there
was a post on HN a few years later where someone had patented the idea! I
can't dig it up - does anyone else remember this?

EDIT: Ah, this was it:
[https://news.ycombinator.com/item?id=6966528](https://news.ycombinator.com/item?id=6966528).
An update on the blog says the company it was about tweeted "We will drop our
patents pending. Thank you @dhh for clearing our minds."

~~~
ipsum2
I'm not sure why you were downvoted, your demo is really cool.

------
ipsum2
Server load too high! I'm going to restart the server. Apologies for the
downtime. In the meantime, here's a screenshot:
[http://i.imgur.com/493smt7.png](http://i.imgur.com/493smt7.png)

~~~
sethjgore
Hello! Are you open sourcing this? If so, can I use your library?

------
JulianMorrison
I actually really like this, and I wonder, if it were made an extension and
made to do more subtle color shifts of the underlying font, whether it might
be a very useful tool for people with some types of learning difficulties, and
aid effective reading in general.

------
chronial
I think highlighting that helps parse the sentence structure would be more
useful.

------
a3_nm
On Iceweasel (Firefox) 44.0.2 I only see "undefined" in the text area.

~~~
ipsum2
Thanks for the report. It seems like Firefox 45 has implemented the innerText
property
([https://developer.mozilla.org/en-US/docs/Web/API/Node/innerT...](https://developer.mozilla.org/en-US/docs/Web/API/Node/innerText)).
I'll see if I can polyfill it.

~~~
dvh
use textContent instead of innerText, works both on chrome and ff

~~~
ipsum2
textContent doesn't preserve newlines correctly. I found a polyfill, so
Firefox <45 should work now.

~~~
a3_nm
I confirm it does. Thanks!

------
skykooler
It does seem to have issues with garden path sentences - in "The old man the
boat", it colors "man" as a noun rather than a verb.

~~~
thylacine222
Yup.

    
    
      The old man the boat.
      The complex houses married and single soldiers and their families.
      The man whistling tunes pianos.
      Time flies like an arrow; fruit flies like a banana.
      The cotton clothing is made of grows in Mississippi.
      I convinced her children are noisy.

Gets all of 'em wrong.
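
One way to see why, as a toy sketch only: I'm assuming a tagger that leans on
each word's most common tag, which is not necessarily how the site works. The
lexicon and tag names below are hand-picked for illustration.

```python
# Toy lexicon: each word maps to a single "most common" tag.
# Hand-picked for illustration; not the site's actual model or tag set.
MOST_COMMON_TAG = {
    "the": "DET",
    "old": "ADJ",
    "man": "NOUN",
    "boat": "NOUN",
}


def tag(sentence):
    """Tag each word by dictionary lookup, ignoring all context."""
    words = sentence.lower().rstrip(".").split()
    return [(word, MOST_COMMON_TAG.get(word, "UNKNOWN")) for word in words]


for word, label in tag("The old man the boat."):
    print(word, label)
```

A lookup like this tags "old" as an adjective and "man" as a noun, while the
intended reading needs "old" as the noun and "man" as the verb. Getting that
right takes the structure of the whole sentence, not the words one at a time.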

------
sciencerobot
You should post some examples of highlighted literature to see if different
writing styles look different because of the syntax highlighting.

------
mchahn
This seems like it would be useful for education. I don't see how it helps
reading, which is the purpose of highlighting source code.

------
mkrecny
Source?

------
sarreph
It's really great to see advances in English syntax highlighting (an
oft-neglected 'language' when it comes to code editors).

However, I get a feeling this is a sort of hype-train at the moment, because I
swear I've seen three English syntax highlighters in the past week alone
(being posted to HN).

Could this be the latest, albeit welcome and intelligent, project exercise
obsession à la Flappy Bird / 2048? :)

~~~
jychang
Well, OP said in a comment that it was inspired by the top comment from the
post from 2 days ago...

------
im3w1l
Very cool! Could you change the colors so as to make adverb vs noun easier to
tell apart?

------
cabalamat
I tried it with:

    
    
        The red car
        This red car
        Your red car
    

And it marked "Your" like "red" and not like "The" or "This". Surely "your" is
a determiner and not an adjective? I mean, you can't say *"The your car".

------
amelius
A case against syntax highlighting: [1]

[1]
[http://www.linusakesson.net/programming/syntaxhighlighting/](http://www.linusakesson.net/programming/syntaxhighlighting/)

