Search the Enron archive using Searchify's Gotank, a new IndexTank client in Go (gosearchify.herokuapp.com)
27 points by luriel on Sept 27, 2012 | 7 comments




The company I work for provides litigation services, such as distributed document conversion tools and review platforms. Over the past 15 years we've hosted data for reviewing attorneys on some of the larger cases.

This Enron dataset is one of the standard datasets we use to test the speed and resilience of our software.

I always liked the Enron data because the "smoking gun" terms were disguised as Star Wars names, like "jedi" and "wookie". It doesn't look like this site indexes embedded email attachments, so you may not get any interesting results for these terms, but I did see a few questionable ones for "jedi". :)

This set also contains some of the most hilarious, typical interoffice humor emails I've ever had the pleasure of debugging. One day we were testing our distributed automated document conversion tool (basically, it converts any document into a PDF, which is not a simple task when you think about all the possibilities), and we noticed that one of the workers had hung on a PowerPoint document. The first thing I did was open the document: it was basically a slideshow of porn images with embedded sound files. The audio files were what crashed the app, but when I opened it at the office the audio played loudly and my co-workers were like "wtf?". That was a hilarious moment.


I also love the Enron data set for the way my antivirus software has its own little jamboree every time I extract the attachments from it.


Impressive! Did you guys compete in TREL?


Nope. I've never heard of it. I tried looking it up on Google but didn't come up with anything, either. What is TREL?


Amen, that's the best search to run on these legal libraries. Something like "we are fucked" or "shit" also produces the best results. I came out to SV two years ago to work on something similar, generating conversational webs between key individuals and selling to lawyers (as it turns out, we misidentified IP lawyers as the purchasers rather than the much more technologically conservative).

Looking at the site, you're selling full-site search rather than general email analytics or legal SaaS, so good for you! One difficulty we found in selling email analytics with a similar output was that we were selling to IT departments for a company-wide benefit. The setup costs (especially in time) led IT departments to oppose the purchase, but we pivoted away from that market quickly after the first few customer development interviews.

Good luck!


I know this is going to sound incredibly lame, but I wish they had customized their Bootstrap styling a bit more. I wish sites would spend just a few minutes trying to be a bit original in their visual appeal.



