

Ask HN: How do Mint and Hipmunk grab their data from external sites? - dglassan

I understand how you can grab data from websites that offer APIs, but how do sites such as Mint and Hipmunk aggregate their data from sites like banks and airlines?

Where does Hipmunk gather its flight information and prices from?

How does Mint grab all your debit and credit card information from your bank?
======
pedalpete
It depends on what sort of data you are trying to get. Hipmunk could be
crawling travel sites. Mint couldn't have been crawling banks, since account
pages sit behind a login, unless people were giving Mint the passwords to
their accounts.

Are you familiar with web scraping? Depending on the volume and type of data
you are trying to get, it might work for you. It should at least get you
through a proof of concept.
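
For a proof of concept, a scraper can be quite small. Here is a minimal
sketch using only Python's standard library. The markup in SAMPLE_HTML, the
class names "route" and "price", and the names FlightParser and
scrape_flights are all made up for illustration; a real travel site will
look different, and you would fetch the page over HTTP instead of using an
inline string.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a page fetched from a travel site.
SAMPLE_HTML = """
<ul>
  <li class="flight"><span class="route">SFO-JFK</span> <span class="price">$320</span></li>
  <li class="flight"><span class="route">SFO-BOS</span> <span class="price">$289</span></li>
</ul>
"""


class FlightParser(HTMLParser):
    """Collects one dict per <li>, filled from spans with class 'route' or 'price'."""

    def __init__(self):
        super().__init__()
        self.flights = []    # finished rows
        self._field = None   # name of the span we are currently inside, if any
        self._row = {}       # row being built for the current <li>

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("route", "price"):
            self._field = cls

    def handle_data(self, data):
        # Only keep text that appears inside a span we care about.
        if self._field:
            self._row[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._row:
            self.flights.append(self._row)
            self._row = {}


def scrape_flights(html):
    parser = FlightParser()
    parser.feed(html)
    parser.close()
    return parser.flights


flights = scrape_flights(SAMPLE_HTML)
for flight in flights:
    print(flight["route"], flight["price"])
```

The catch is that the parser is tied to the page layout, so it breaks
whenever the site changes its markup.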

~~~
dglassan
I'm familiar with web scraping, but hasn't it always been looked down upon?
It just seems to have a negative connotation.

~~~
pedalpete
I think it depends on the application, though I asked the question below
because ffumarola called it 'bad practice'.

Why scraping is any worse than crawling is beyond me. But again, it depends on
the application. I built a concert search engine which scraped data from
multiple sites. It is my understanding that all the concert sites are getting
a bunch of data from published APIs and then topping up with scraped data.
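
The "topping up" part can be sketched roughly like this. It is only an
illustration under assumptions: the field names (artist, venue, date), the
sample rows, and the functions event_key and merge_listings are
hypothetical, not what any concert site actually runs. The idea is that API
rows are trusted first and scraped rows only fill in events the API did not
return.

```python
def event_key(row):
    """Normalize the fields that identify a concert so duplicates collapse."""
    return (
        row["artist"].strip().lower(),
        row["venue"].strip().lower(),
        row["date"],
    )


def merge_listings(api_rows, scraped_rows):
    """Keep every API row; add a scraped row only if the API lacks that event."""
    merged = {}
    for row in api_rows:
        merged[event_key(row)] = row
    for row in scraped_rows:
        merged.setdefault(event_key(row), row)
    return list(merged.values())


# Made-up sample data for illustration.
api_rows = [
    {"artist": "Band A", "venue": "The Hall", "date": "2011-06-01", "source": "api"},
]
scraped_rows = [
    {"artist": "band a ", "venue": "the hall", "date": "2011-06-01", "source": "scrape"},
    {"artist": "Band B", "venue": "Club X", "date": "2011-06-03", "source": "scrape"},
]

listings = merge_listings(api_rows, scraped_rows)
```

Here the scraped copy of the Band A show is dropped as a duplicate, and the
Band B show, which only the scraper found, is added.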

------
byoung2
_How does Mint grab all your debit and credit card information from your
bank?_

When they started, they used Yodlee, which now has an app store where you can
build on that same platform. They have already done the hard part of
integrating with banks. <http://www.finappstore.com/>

------
ffumarola
Some sites, like Paytrust (Intuit), practice screen scraping, but that is bad
practice.

~~~
pedalpete
Can you elaborate on why screen scraping is bad practice? Or do you mean bad
practice for specific types of data?

I ask because I recommended screen scraping and have used it successfully in
the past. Though I do think it depends on the application.

