

Ghost.py is a webkit web client written in python - davidbrai
http://jeanphix.me/Ghost.py/

======
pie
This appears to be a wrapper for PyQt4's QtWebKit.

[http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/html/q...](http://www.riverbankcomputing.co.uk/static/Docs/PyQt4/html/qtwebkit.html)

~~~
wahnfrieden
Ah, so it's going to be quite an out-of-date version of WebKit then. Anyone
know which version exactly? We're using PyQt and this has been a huge issue.

~~~
albertzeyer
Here you'll find the exact revision for each Qt version:
<http://trac.webkit.org/wiki/QtWebKit>

E.g. I use Qt 4.8 (via PyQt 4.9) and thus I have WebKit trunk from 2011-05-05
(r85855).

There is also PyQt4.QtWebKit.qWebKitVersion().

If you have the Qt source at hand, check src/3rdparty/webkit/VERSION.
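
A minimal sketch of that runtime check, assuming PyQt4 may or may not be installed. The helper name `webkit_version` is made up for illustration; the only library call is `PyQt4.QtWebKit.qWebKitVersion()`, and the helper returns None when the import fails:

```python
def webkit_version():
    """Return the WebKit version string reported by QtWebKit, or None.

    `webkit_version` is a hypothetical helper name; the underlying call
    is PyQt4.QtWebKit.qWebKitVersion().
    """
    try:
        from PyQt4 import QtWebKit
    except ImportError:
        # PyQt4 (or its QtWebKit module) is not available here.
        return None
    # str() normalises the result in case the binding returns a QString.
    return str(QtWebKit.qWebKitVersion())


if __name__ == "__main__":
    print(webkit_version())
```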

~~~
wahnfrieden
Thanks

------
albertzeyer
I have done this somewhat more directly with PyQt4. I wonder whether it's
useful to have this layer in between. That said, it seems to have some useful
tools. This is the main code:

* [https://github.com/jeanphix/Ghost.py/blob/master/ghost/ghost...](https://github.com/jeanphix/Ghost.py/blob/master/ghost/ghost.py)

* [https://github.com/jeanphix/Ghost.py/blob/master/ghost/utils...](https://github.com/jeanphix/Ghost.py/blob/master/ghost/utils.js)

If you are interested, this is some of my own code where I just use PyQt4 directly:

* [https://github.com/albertz/google-books-export/blob/master/g...](https://github.com/albertz/google-books-export/blob/master/google-books-export.py)

~~~
jlarocco
I once used a similar technique to mass download some elevation data files
from the USGS website.

I forget the exact details, but fetching the URL just kicked off a job on
their server and returned some Javascript to execute. The Javascript did
"something" while the data was being fetched/processed on the server, and
eventually decided when it could start the real download.

I spent a while trying to figure out the Javascript, but finally came up with
the PyQt/WebKit approach.

It's the ugliest download code in the world, but it's up on GitHub:
[https://github.com/jl2/GIS-Stuff/blob/master/map_download/ne...](https://github.com/jl2/GIS-Stuff/blob/master/map_download/neddown.py)

I'm not sure how useful something like Ghost would have been. I was basically
using it as a glorified urllib.request, though, and it doesn't look like
that's the main use case for Ghost.

------
candeira
I just tried to use dryscrape [1] for a project. It's great when it works, but
it's not liberal enough in what it accepts [2], so it gives off showstopping
InvalidResponseErrors (which make sense when the library is used for BDD, but
not when you are using it to get at javascripty download links).

This ghost.py looks great, I'll give it a go after dinner tonight.

[1] <https://github.com/niklasb/dryscrape>

[2] <http://en.wikipedia.org/wiki/Robustness_principle>

[3] <https://github.com/niklasb/dryscrape/issues/6>

------
fpp
Could someone describe the key differences from PhantomJS (indirectly referred
to in the credits via Casper.js)?

PhantomJS recently dropped its Python support.

~~~
zackzackzack
PhantomJS has to run as its own process, so no support for Node or anything
similar. This looks like it can run within something like Django.

~~~
fpp
You can use PhantomJS with Node, e.g. with child_process and messages via
stdout. Works pretty well. Running this in its own process context might
actually be a benefit, e.g. when you spawn multiple PhantomJS browsers.
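
A rough sketch of that pattern, written in Python rather than Node: spawn a child process and read one JSON message per line from its stdout. The child here is a stand-in Python one-liner so the example runs anywhere; in practice the command would launch phantomjs with a script that prints its results the same way. The names `CHILD_CODE` and `run_child` and the message fields are made up for illustration.

```python
import json
import subprocess
import sys

# Stand-in child: prints one JSON message per line for each argument.
# Assumption: a real setup would run a phantomjs script that writes
# comparable line-delimited JSON to stdout.
CHILD_CODE = r"""
import json, sys
for url in sys.argv[1:]:
    print(json.dumps({"url": url, "status": "done"}))
    sys.stdout.flush()
"""


def run_child(urls):
    """Spawn the child process and collect the JSON messages it prints."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE] + list(urls),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text
    )
    messages = []
    for line in proc.stdout:
        line = line.strip()
        if line:
            messages.append(json.loads(line))
    proc.stdout.close()
    proc.wait()
    return messages


if __name__ == "__main__":
    for message in run_child(["http://example.com/a", "http://example.com/b"]):
        print(message)
```

Since each browser lives in its own process, several of these can be started side by side, and a crash in one does not take down the parent.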

There is a project on GitHub where this was taken a step further via dnode (
<https://github.com/sgentle/phantomjs-node> ): you get access to the
PhantomJS objects, can get/set properties, and can call the PhantomJS API methods.

Will certainly have a deeper look into ghost.py.

------
daGrevis
Is it possible to run jQuery inside Ghost.py to parse HTML with it?

~~~
schwuk
Why not use pyquery (<https://bitbucket.org/olauzanne/pyquery/>)? Plays nice
with WebTest (<http://webtest.pythonpaste.org/>) and django-webtest
(<https://bitbucket.org/kmike/django-webtest/>).

------
jMyles
Could this be used with Django's LiveServerTestCase?

------
chrishacken
Looks pretty useful. Def. going to play around with this a bit.

