Show HN: Web Scraping in Google Sheets (link.fish)
153 points by linkfish 7 months ago | 31 comments

Love seeing Google Spreadsheets used like this. Many MVP 'products' would do well to be a 'simple' spreadsheet first.

sheetsee.js is great for visualizing data in these spreadsheets too (can make a graph on a website from data in a google spreadsheet in a few lines) https://github.com/jlord/sheetsee.js

And Google Sheets would do well to improve the add-on development experience. It's absolutely awful and the documentation rarely tells you what you actually need to know

Totally agree! I ran into a lot of problems developing the Add-On. For some issues, there are bug reports that are many years old. So for me, it sadly kind of seems like Google totally lost interest.

You developed this addon? Lots of respect, my development experiences were really discouraging, and I was very happy when I didn't have to do it anymore. I can't begin to imagine what it must have taken to develop something so complex

To be honest, it was not that complex. The problem was just that Google Script made stuff that I expected to take 5 minutes to code take days in the end. For example, simply refreshing the value of a cell. I thought I would simply call some kind of method and it would take care of it. Sadly, it was quite the opposite. It took me a week to figure out that to achieve that, I have to:

1. save the formula of the cell
2. remove the formula from the cell
3. set the focus to another cell
4. set the formula in the cells back to their former value
5. activate the cells again

And on top of that, they have additional gimmicks, such as using different value separators in formulas depending on your country (some use commas, others semicolons).
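The separator gimmick mentioned above can be illustrated with a minimal sketch. This is not the Add-On's code and not a Google Apps Script API: `localize_separators` is a hypothetical helper, and it assumes the caller already knows which separator the user's locale expects. It only swaps commas outside double-quoted string literals and does not handle array literals.

```python
def localize_separators(formula: str, separator: str = ";") -> str:
    """Swap argument-separating commas for `separator`, leaving quoted text alone.

    Hypothetical helper for illustration; which locales need ";" is up to the caller.
    """
    out = []
    in_string = False
    for ch in formula:
        if ch == '"':
            # A doubled quote ("") inside a literal toggles twice, so the
            # state is still correct after the escaped quote.
            in_string = not in_string
            out.append(ch)
        elif ch == "," and not in_string:
            out.append(separator)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(localize_separators('=IMPORTXML("https://example.com/a,b", "//h1")'))
```

The comma inside the quoted URL is left untouched; only the comma between the two arguments is replaced.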

The 80/20 rule, sadly.

Do yourself a big favour, and make your credit numbers locale-aware. Many (most? not sure) people are going to assume (like I did) that $9.99 would get me 10 credits, because "." is a decimal point in my locale.

I almost closed the page at that stage because $1 per page scraped seemed very greedy, and it wasn't until I saw the Expert plan with its "1 million" that I realised that 10.000 = ten thousand.

Maybe use the SI convention of spaces: 10 000.

Thanks a lot for the tip. Will change that immediately.
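For what it's worth, the space-grouped display suggested above is only a few lines. A minimal sketch, assuming the credit count is a plain integer; `format_credits` is a made-up name, not anything from link.fish:

```python
def format_credits(count: int, group_separator: str = " ") -> str:
    """Group digits in threes using `group_separator` (a plain space by default)."""
    # Python's "," format spec groups by thousands; swap in the wanted separator.
    return f"{count:,}".replace(",", group_separator)


if __name__ == "__main__":
    print(format_credits(10000))         # 10 000
    print(format_credits(1000000))       # 1 000 000
    print(format_credits(10000, "."))    # 10.000
```

Passing a non-breaking space ("\u00a0") as the separator keeps the number from wrapping across lines on a pricing page.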

Cool product, excited to try it out!

Apologies in advance for unsolicited feedback, but I think you could benefit a lot on the revenue side if you played around with pricing. I've done a lot of scraping stuff in the financial services world, and this would definitely be worth more than $199 per month for them if it can help them get an information edge.

Patio11 writes some great stuff about pricing here https://www.kalzumeus.com/2012/08/13/doubling-saas-revenue/

Thanks a lot! Great to hear!

I actually asked below specifically for feedback, so this is exactly what I wanted to hear. Agreed, the pricing is probably something that still has to be improved. This is the first iteration, in which I mainly oriented myself on competitors and my costs. I hope that once I have some more users and get more feedback, I can improve on that. Also thanks for the link! Gonna read that later.

Please keep the pricing as is for now, I like it :)

In case you're not aware, MS Excel already does this.


Thanks, I was actually not aware of that. Honestly, I have not used MS Excel for probably 10 years. Looks very interesting, gonna check out the documentation.

Looks like, with my Add-On, Google Sheets can now finally do what MS Excel has apparently been able to do since 2007.

Unfortunately, Office for Mac does not support this, I just found out...

We launched our Sheets Add-On today and would love to get feedback. Also, if there are any questions, I am here to answer them.

Ever since kimono labs shut down I haven't found a good point and click replacement - this looks like it might fit my needs of scraping <10 pages for something simple like blog titles + URLs.

Most pages should work by default because more and more of them use Schema.org to mark up their content. In case a website is not supported or the needed data is missing, it is possible to add support. The easiest way is to simply write to us, and we add it ourselves. The alternative is to use our point and click data-selector tool, which is similar to how kimonolabs worked. It can be accessed by logging into link.fish and then selecting "Plugins >> Data Selector" in the header menu.
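To give an idea of why Schema.org markup makes pages scrapable "by default", here is a rough sketch that pulls JSON-LD blocks out of a page. This is not how link.fish works internally (that is not described here); it assumes the page embeds its Schema.org data as JSON-LD, although Microdata and RDFa are also possible, and `JsonLdExtractor`, `extract_jsonld`, and the sample page are all made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the whole page


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


# Made-up sample page, not taken from any real site.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "BlogPosting",
 "headline": "Hello", "url": "https://example.com/hello"}
</script>
</head><body><h1>Hello</h1></body></html>
"""

if __name__ == "__main__":
    for item in extract_jsonld(SAMPLE_PAGE):
        print(item["@type"], item["headline"], item["url"])
```

For the blog titles + URLs use case mentioned above, the `headline` and `url` fields would be the ones to read, provided the site actually publishes them.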

Point and click with OCR: https://a9t9.com/kantu/web-scraping

Webscraper.io works for me

I think this is an opportunity for people who want to quickly build early versions of a product or website without diving deeply into the code. I've used Google Spreadsheets for some of my Alexa Skills and I think it's a super lightweight CMS. This integration could be helpful for adding dynamic content. Thumbs up

For low-volume requests, Airtable is also pretty nice to use in an Excel-like fashion with a dedicated API (free for a couple of thousand entries).

edit: except missing the scraper part haha

I actually wrote to Airtable this morning about maybe also creating a scraping extension for them. Still waiting for an answer.

Google Sheets can natively import data from webpages using '=IMPORTXML(url, xpath_query)'. But it is not as elegant as this add-on.

an example: https://zapier.com/blog/google-sheets-importxml-guide/

Great job. You might need to work on pricing (I am sure you are still experimenting). I have used Google Sheets before to call REST APIs and extract information (for an internal project). Google Sheets being in the cloud provides many opportunities for integration. Google should take its plugin marketplace more seriously.

Really interesting tool. I was searching for something like this a while ago. Can't wait to try it out.

Great to hear! Would love to get some feedback from you once you have. Thanks!

Sure! I will do it.

Interesting. I'm trying it out but getting "You are not allowed to get a DomainDescriptions you did not create!" after creating a custom Data Selector.

Yes, I saw that in the logs and actually already wrote you an email. At the moment, we do not allow changing DomainDescriptions that other people created. In this case, we save the one the user wanted to save and log it (so that the work is not lost) and then incorporate the changes ourselves. This will change in the future. Sorry for the inconvenience.

FYI, the email signup button on the Home page reads: "Keep me updat"

Thanks for the info! However, I just checked, and for me it displays correctly. Can you please tell me what browser and version you are using so that I can debug it? Thanks!
