
Show HN: Getsy – browser/client-side web scraper - ep123
https://github.com/epiqueras/getsy
======
Outpox
I was wondering what the trick was for getting around CORS restrictions.
It seems that Getsy is a wrapper for
[https://crossorigin.me/](https://crossorigin.me/)

I've just started learning TypeScript and this is the kind of library I'm
looking to write to improve myself. Good job!

~~~
ep123
crossorigin.me is a reverse proxy that makes requests on your behalf and adds
CORS headers to the responses. You can specify the API endpoint of a proxy you want to use
or let it default to crossorigin.me.
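
For anyone curious what "default to a proxy, or bring your own" can look like, here is a minimal sketch. It is not Getsy's actual code: `proxiedUrl` and `DEFAULT_PROXY` are hypothetical names, and it assumes the proxy accepts the target URL appended directly to its own endpoint, which is the convention crossorigin.me-style proxies commonly use.

```typescript
// Hypothetical helper for illustration only; not part of Getsy's API.
// Assumption: the proxy expects the full target URL appended to its endpoint.
const DEFAULT_PROXY = "https://crossorigin.me/";

function proxiedUrl(target: string, proxy: string = DEFAULT_PROXY): string {
  // Strip any trailing slashes so exactly one slash separates proxy and target.
  const base = proxy.replace(/\/+$/, "");
  return base + "/" + target;
}

// In a browser you would then request the proxied URL instead of the target,
// e.g. fetch(proxiedUrl("https://example.com")), and read the response text.
```

Passing a second argument swaps in your own proxy endpoint, so the scraping code never needs to know which proxy is in use.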

------
diggan
This looks nice and relevant to something I've been meaning to explore but
haven't had time to yet.

I would love to have a tool that can give me an exact dump of the actual DOM
(and allow me to restore it). I've found some libraries that try to do this,
but they fail at random, so I haven't found anything foolproof yet. Does anyone
know of such a tool? Basically, I want a "live" copy of a website, as-is in
that moment. Not just the HTML but the actual DOM tree.

~~~
rkv
Try jsdom[1]? It returns a valid window object and stubs out events to allow
DOM manipulation (amongst other things).

1\. [https://github.com/tmpvar/jsdom](https://github.com/tmpvar/jsdom)

------
ep123
I added a GitHub page with a REPL so you can try it out:
[https://epiqueras.github.io/getsy/](https://epiqueras.github.io/getsy/)

------
ep123
I added support for infinite-scrolling sites.

Example here: [http://www.getgetsy.com](http://www.getgetsy.com)

------
snowpanda
Is there documentation for this?

~~~
ep123
[https://github.com/epiqueras/getsy](https://github.com/epiqueras/getsy)

I'm updating it soon with new methods that support websites with pagination.

