

Ask HN: Can I automate document gathering for research? - seanccox

I conduct research into companies and individuals – due diligence, litigation support, public records checks, and media scans.<p>I find that I often need to repeat searches on the same individuals at different websites and news archives, only to repeat the entire process again with a new name.<p>I haven't a programming background, so forgive me if I'm not using the right terms. I speculate that the process can be accomplished automatically by a web crawler... though I'm not certain if that is the right solution, and I don't have a clear picture of what the risks in attempting to automate this process might be.<p>At the risk of being long-winded, here's a typical case. I'm given the name of the firm, its principal management, and maybe a few articles from the company website. My task is to find out if anyone at the company has been accused of a crime, or whether the company has been sued or engaged in fraud. I open tabs in a browser for each of the main websites/databases I will search (chambers of commerce, trade registries, government websites, etc.). Then, one by one, I put each name into each tab and look for results, all the while saving copies of any relevant documents I find. At the end of it all, I piece together the story of the company or person, using the documents as support. This is the mundane part of the job – gathering the documents.<p>Is there a way to automate that process? Since I'm learning to code anyway, I'd like to try exploring a solution myself, but I'm not certain what language or process would be best for that. I'd love to hear the advice of the HN community.<p>Cheers,
-Sean
======
seanccox
Thanks for the advice. I'm taking a look at Selenium now.

I appreciate the plug for Feedity as well. Unfortunately, I don't think it's
the right tool for my needs. A lot of the resources I'm looking for are not
current, and once I've done the research, I don't generally have to research a
company or person again.

In the end, I decided that (for my purposes), Google's custom search engine
would suffice. It's not a perfect solution, but it seems like it will do the
trick for now.

Thanks again for the comments.

Cheers, -Sean

------
mahesh_gkumar
If most of your searches are happening in a browser, I would recommend
something like Selenium <http://docs.seleniumhq.org/> . It's a browser testing
tool, but it can automate some of the things that you are looking to do.
Selenium can also be extended if you want to.
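
As a rough illustration of the "every name in every tab" loop described in the
question, here is a minimal Python sketch that only builds the list of search
URLs to visit. The site labels and URL templates are made up for the example
(real registries and archives each have their own search URL format, and some
only accept form submissions), and the function name build_search_urls is
hypothetical, not part of Selenium or any other tool.

```python
from itertools import product
from urllib.parse import quote_plus

# Hypothetical search-URL templates (assumption): replace these with the
# real search URLs of the registries and archives you actually use.
SITES = {
    "registry": "https://registry.example.com/search?q={query}",
    "news": "https://news.example.com/archive?term={query}",
}


def build_search_urls(names, sites=SITES):
    """Return one (name, site_label, url) tuple for every name/site pair."""
    jobs = []
    for name, (label, template) in product(names, sites.items()):
        # quote_plus makes the name safe to embed in a URL query string.
        jobs.append((name, label, template.format(query=quote_plus(name))))
    return jobs


if __name__ == "__main__":
    for name, label, url in build_search_urls(["Jane Doe", "Acme Holdings"]):
        print(f"{name} @ {label}: {url}")
```

Each URL could then be opened by hand or handed to a browser automation tool
such as Selenium. Check each site's terms of use before automating searches
against it.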

------
nreece
_(shameless plug)_ Check out our service, Feedity - <http://feedity.com> to
create and subscribe to feeds for public webpages and use them for media
monitoring and market intelligence.

