Hacker News
Show HN: Getsy – browser/client-side web scraper (github.com)
49 points by ep123 240 days ago | 11 comments



I was wondering what the trick was for getting around CORS restrictions. It seems that Getsy is a wrapper around https://crossorigin.me/

I've just started learning TypeScript and this is the kind of library I'm looking to write to improve myself. Good job!


crossorigin.me is a reverse proxy that makes requests for you and adds CORS headers to the responses. You can specify the API endpoint of a proxy you want to use, or let it default to crossorigin.me.
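
To make the proxy idea concrete, here is a minimal sketch of how a request can be routed through such a proxy. It assumes the proxy accepts the full target URL appended to its own endpoint; the helper names buildProxiedUrl and fetchViaProxy are hypothetical and are not part of Getsy's API.

```typescript
// Hypothetical helper: prefix the target URL with the proxy endpoint.
// Assumption: the proxy expects the full target URL appended to its own URL.
function buildProxiedUrl(
  target: string,
  proxy: string = "https://crossorigin.me/"
): string {
  // Make sure exactly one slash separates the proxy endpoint and the target.
  const base = proxy.endsWith("/") ? proxy : proxy + "/";
  return base + target;
}

// Fetch a page's raw HTML through the proxy. The browser allows the script to
// read the response because the proxy adds the CORS headers to it.
async function fetchViaProxy(target: string, proxy?: string): Promise<string> {
  const response = await fetch(buildProxiedUrl(target, proxy));
  if (!response.ok) {
    throw new Error(`Proxy request failed with status ${response.status}`);
  }
  return response.text();
}
```

Passing a second argument swaps in a different proxy endpoint; leaving it out falls back to crossorigin.me, mirroring the default described above.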


This looks nice and relevant to something I've been meaning to explore but haven't had time to yet.

I would love to have a tool that can give me an exact dump of the actual DOM (and allow me to restore it). I've found some libraries that try to do this, but they fail randomly, so I haven't found anything foolproof yet. Does anyone know of such a tool? Basically, I want a "live" copy of a website, as-is in that moment. Not just the HTML but the actual DOM tree.


Try jsdom[1]? It returns a valid window object and stubs out events to allow DOM manipulation (amongst other things).

1. https://github.com/tmpvar/jsdom


This tool will give you all the HTML that loads on your first visit to the URL. You can insert it into a div's innerHTML, although this would be dangerous since code embedded in their markup could run on your site.

You can get the raw response by accessing the Getsy object's .content property.
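
One way to reduce the risk mentioned above is to render the fetched HTML inside a sandboxed iframe instead of assigning it to a div's innerHTML. The following is a minimal sketch, assuming the raw HTML is already available as a string; the helper names escapeAttribute and sandboxedIframeMarkup are hypothetical and not part of Getsy.

```typescript
// Escape the characters that would otherwise break out of a double-quoted
// attribute value. Ampersands go first so existing entities stay intact.
function escapeAttribute(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Hypothetical helper: wrap fetched HTML in an iframe whose content comes from
// the srcdoc attribute. An empty sandbox attribute applies every restriction,
// including blocking script execution inside the frame.
function sandboxedIframeMarkup(html: string): string {
  return `<iframe sandbox srcdoc="${escapeAttribute(html)}"></iframe>`;
}
```

The returned string can then be placed in the host page, where the remote markup is displayed in the frame but its scripts are not allowed to run.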


> exact dump of the actual DOM

> I want a "live" copy of a website

You mean after the JS has modified it?


Tools that export the .har format do this for you. Chrome's dev tools can export it, for example.


I added a GitHub page with a REPL so you can try it out: https://epiqueras.github.io/getsy/


I added support for infinite scrolling sites.

Example here: http://www.getgetsy.com


Is there documentation for this?


https://github.com/epiqueras/getsy

I'm updating it soon with new methods that support websites with pagination.



