

Stanford Scientists Put Free Text-Analysis Tool on the Web - ninjin
http://engineering.stanford.edu/research-profile/stanford-scientists-put-free-text-analysis-tool-web

======
peter_l_downs
The tool itself is hosted here: [http://www.etcml.com/](http://www.etcml.com/)

The paper is here:
[http://hci.stanford.edu/publications/paper.php?id=279](http://hci.stanford.edu/publications/paper.php?id=279)

~~~
kulkarnic
Um, hey, I'm the author of the paper. I'll check this thread again in a few
hours, in case you have questions about it.

~~~
stevenrace
Neat project.

Will there be an API available? Or will I have to get creative with Form POSTs
:).

~~~
kulkarnic
There might eventually be an API available. Right now, we're focused on
getting the actual grading interactions for the peers right. Richard's etcml
already has an API.

------
peter_l_downs
That's pretty cool — I had the exact same goals (help professors grade essays
faster) with my bookshrink project [0]. Of course, it was the first python
code I ever wrote and it uses an extremely naïve algorithm, but the results
aren't too bad if you want to try it yourself [1].

[0]:
[https://github.com/peterldowns/bookshrink](https://github.com/peterldowns/bookshrink)
[1]: [http://www.bookshrink.com/](http://www.bookshrink.com/)

~~~
e12e
Regarding stuff like:

""" SPLIT INPUT INTO SENTENCES"""

god_awful_regex =
r'''(?<!\d)(?<![A-Z]\.)(?<!\.[a-z]\.)(?<!\.\.\.)(?<!etc\.)(?<![Mm]r\.)(?<![Pp]rof\.)(?<![Dd]r\.)(?<![Mm]rs\.)(?<![Mm]s\.)(?<![Mm]z\.)(?<![Mm]me\.)(?:(?<=[.!?])|(?<=[.!?]['"]))[\s]+?(?=[\S])'''

Be advised that one of the nice things about Python regexes is that they
allow in-line comments if compiled with the verbose flag (and named groups,
which work with or without it):

    
    
        """ SPLIT INPUT INTO SENTENCES"""
        import re

        verbose_regex = r'''(?<!\d)  # I can't actually tell
          (?<![A-Z]\.)(?<!\.[a-z]\.) # what you're doing here...
          (?<!\.\.\.)(?<!etc\.)      # Is this one big group, or is
          (?<![Mm]r\.)(?<![Pp]rof\.) # it several groups, with
          (?<![Dd]r\.)(?<![Mm]rs\.)  # different prefixes?
          (?<![Mm]s\.)(?<![Mm]z\.)(?<![Mm]me\.) # Clearly, it's got something to do
          (?:(?<=[.!?])|(?<=[.!?]['"]))[\s]+?(?=[\S])''' # with
                     # not matching the dot at the end of Dr.
                     # or as part of an ellipsis as the end of a
                     # sentence? But my point was that if such a
                     # regex is built-up and tested with comments
                     # it can be quite readable
    
        # re.VERBOSE makes the engine ignore the whitespace and #-comments above
        god_awful_regex = re.compile(verbose_regex, re.VERBOSE)
        # continue here...
    

[http://www.diveintopython.net/regular_expressions/verbose.ht...](http://www.diveintopython.net/regular_expressions/verbose.html)
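
For anyone who wants to try it, here is a minimal sketch of how such a verbose pattern might be used to split text into sentences. It reuses the lookbehinds from the regex above in their single-backslash form; the names `SENTENCE_BOUNDARY` and `split_sentences` are made up for this example and are not taken from bookshrink.

```python
import re

# Hypothetical names; the lookbehinds mirror the pattern quoted above.
SENTENCE_BOUNDARY = re.compile(r'''
    (?<!\d)                        # not directly after a digit
    (?<![A-Z]\.)                   # not after an initial such as "J."
    (?<!\.[a-z]\.)                 # not after an abbreviation such as "e.g."
    (?<!\.\.\.)                    # not after an ellipsis
    (?<!etc\.)
    (?<![Mm]r\.)(?<![Mm]rs\.)(?<![Mm]s\.)(?<![Mm]z\.)(?<![Mm]me\.)
    (?<![Dd]r\.)(?<![Pp]rof\.)
    (?:(?<=[.!?])|(?<=[.!?]['"]))  # after end punctuation, optionally quoted
    [\s]+?(?=[\S])                 # the whitespace between two sentences
''', re.VERBOSE)


def split_sentences(text):
    """Split text wherever the boundary pattern matches."""
    return SENTENCE_BOUNDARY.split(text)


sentences = split_sentences(
    "Dr. Smith arrived late. He said hello! Did anyone notice?"
)
for sentence in sentences:
    print(sentence)
```

The title "Dr." does not trigger a split because its lookbehind rejects that position, while the whitespace after "late." and "hello!" does.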

~~~
peter_l_downs
Yeah, I've since learned that feature of Python :) Like I said, this is some
of the first Python code I ever wrote; it definitely does not reflect my
current knowledge.

~~~
e12e
Oh, to be clear, it wasn't meant as critique -- I just saw the aptly named
variable, and thought it might be useful to highlight this aspect of python to
others that might be new to the language. It's one of the few "special" parts
of python I have personal experience with, as I worked on a small program that
dealt with parsing emails, and being able to fully comment the reg-exp over
several lines as I tweaked both it and the groups was very helpful :-)
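
To illustrate what that looks like, here is a small sketch that combines named groups with the verbose flag. It is not the program mentioned above; the header format, the pattern, and the names `FROM_HEADER` and `parse_from_header` are all assumptions made up for this example.

```python
import re

# Hypothetical pattern for a simple "From: Name <address>" header line.
FROM_HEADER = re.compile(r'''
    ^From:\s*
    (?P<name>[^<]*?)        # display name, possibly empty
    \s*
    <(?P<address>[^>]+)>    # address between angle brackets
    \s*$
''', re.VERBOSE)


def parse_from_header(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = FROM_HEADER.match(line)
    return match.groupdict() if match else None


parsed = parse_from_header("From: Jane Doe <jane@example.com>")
print(parsed)
```

Because each group has a name, the calling code can read `parsed["address"]` instead of counting parentheses, which helps when the pattern keeps changing.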

~~~
peter_l_downs
You're absolutely right, and what I forgot to say in my earlier comment was
"thank you!"

------
jonsy0999
I did a short blog post on this service here:
[http://bicortex.com/twitter-data-sentiment-analysis-using-et...](http://bicortex.com/twitter-data-sentiment-analysis-using-etcml-and-python/)

It's pretty good for what it does.

~~~
e12e
While the tool was still up, I did a quick search for #NSA. If I understood
correctly that the search field was tuned to give sensible sentiment analysis
for tweets with the given hash-tag, it didn't do a very good job. I can't
verify now (as the site is down), but it seemed like it got about 50% wrong
on that one...

Perhaps it does better with different hash-tags (there's a lot of bitter irony
associated with #nsa -- possibly more than average for other tags) ?

------
rahimnathwani
Other good text analysis tools from Stanford:
[http://nlp.stanford.edu/software/index.shtml](http://nlp.stanford.edu/software/index.shtml)

------
spdegabrielle
Don't forget [http://gate.ac.uk](http://gate.ac.uk)

------
mrgrieves
I'm looking forward to the day that this technology is used for censorship;
since everything political will need to sound positive, satire will rise
again!

~~~
ropz
"Censorship is the mother of metaphor" \- Borges

------
infocollector
I don't see any source that I can download and run. Is this a web service?
Aren't there other web services already that do this as a service?

------
j2kun
Link appears to be down... :(

~~~
prht
yep...

~~~
visarga
Worst possible time, and no Google cache. By tomorrow, 90% of the people who
are interested will have forgotten the site, which is a shame since they got
some interest going today. It seems to be an offshoot from Andrew Ng's team,
which is trustworthy.

~~~
BlackDeath3
Lesson learned - ensure that your infrastructure can handle your times of peak
interest _before_ launching your product.

~~~
j2kun
I heard this analogy when I was at Amazon: the internet is like a party where
your worst fear is not that nobody will show up, but that EVERYBODY will show
up.

They used it as a marketing pitch for AWS/EC2.

------
mceoin
Dude, this is awesome! Thanks kulkarnic.

Username response was here:
[http://www.etcml.com/jobs/8188](http://www.etcml.com/jobs/8188) The only
thing I would add is an overall score for how positive/negative/neutral a text
is.

------
dexterchief
Apparently this is done with Ruby/Rails... any chance this is going to find
its way onto Github?

------
joeguilmette
It has a hard time with phrases like "I want to do X so bad"...

[http://www.etcml.com/jobs/8354](http://www.etcml.com/jobs/8354)

