

Phantompy - Headless WebKit engine for Python, like phantomjs - juanriaza
https://github.com/niwibe/phantompy

======
fauigerzigerk
I wonder whether Phantompy is truly headless, so that it works on servers
without GUI libraries and without Xvfb.

QtWebKit needs access to some GUI features out of the box, but PhantomJS
includes platform abstraction code that avoids the dependency on an X server.
That's the feature that made me migrate my own C++ crawler code to PhantomJS.

~~~
kanzure
Actually, you don't even need Xvfb these days (and I have no idea why anyone
is telling people that they do). You could use something like
xserver-xorg-video-dummy instead. Of course, as you pointed out, PhantomJS
doesn't require an X server at all (unlike these other projects, wtf), so it
is "really really truly headless".

------
shadowmint
OT, I know, but I can't help feeling a little smug when I see cool projects
like this using CMake rather than autotools.

Thank you, CMake; you've made working in the C ecosystem fun again.

~~~
randallu
CMake is pretty horrifically inefficient at building WebKit. The makefiles it
generates fork to call CMake (to do things like printf in color) far more often
than they fork to call the compiler. The Ninja generator is better, but it
currently has various issues with WebKit's long command lines (trunk CMake
fixes some of these, but then it fails to create some directories in the
output directory).

Also, the CMakeLists syntax is pretty nasty IMO, but it feels like GYP, CMake
and autotools are in some kind of syntax ugliness competition... (Maybe
because nobody ever wants to work on a build system, so they just do this one
little hack, which inevitably grows tentacles.)

------
boothead
Is there anything wrong with the Python bindings for PhantomJS? I used
PhantomJS (it's great) a few years ago, and I thought there were some Python
bindings to the original PhantomJS C++ library, though I could be wrong.

~~~
Herald_MJ
There _was_ a set of Python bindings for PhantomJS named PyPhantomJS, and it
was included in official PhantomJS releases, but the maintainer moved away from
the project, no-one stepped forward and it was ultimately removed from the
project (<https://github.com/ariya/phantomjs/issues/10344>).

So this project is really useful.

~~~
kanzure
> no-one stepped forward and it was ultimately removed from the project

Here's the source code, if you are interested:

<https://github.com/kanzure/pyphantomjs>

------
woadwarrior01
Nice. There's also <https://github.com/jeanphix/Ghost.py>, which I remember
using a couple of months ago.

~~~
killahpriest
Ghost.py seems promising but is painful to work with. Installing the
dependencies took about an hour (compiling PyQt itself took about 25 minutes).
When we finally got it working, PyQt would crash after the third or fourth
scrape and took between 15 and 30 seconds per scrape. Where we used it:
[https://github.com/createch/PriceChecker.py/blob/master/pych...](https://github.com/createch/PriceChecker.py/blob/master/pychecker/scraper.py).

My advice to anybody looking to do headless WebKit in Python: don't use Ghost.
Try out PhantomPy, and try the Selenium WebDriver for PhantomJS.

We switched over to the Selenium WebDriver for PhantomJS and found it much more
stable and faster. See this SO answer by another Ghost.py user who gave up:
<http://stackoverflow.com/a/15699761/854025>. Where we used the Selenium
driver:
[https://github.com/createch/PriceChecker.py/blob/phantom/pyc...](https://github.com/createch/PriceChecker.py/blob/phantom/pychecker/scraper2.py).

~~~
arikfr
Just to give another perspective: I'm not sure which OS you had those issues
on, but I'm successfully using Ghost.py on Ubuntu and OS X. On OS X it was
harder to install PyQt and there were sometimes weird issues, but on Ubuntu it
all went smoothly.

By now I've run it on thousands of different websites with few issues. It's
true that I'm "recycling" the processes after ~10 sites, but that's mainly
because of memory leaks.

EDIT: to clarify, on Ubuntu installing either PyQt or PySide (Ghost.py now
supports both) was as easy as apt-get install...
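For anyone wanting to copy that "recycling" approach, here is a minimal sketch, assuming Python's standard `multiprocessing.Pool`: its `maxtasksperchild` argument retires each worker after a fixed number of tasks, so whatever memory the browser engine leaked is freed when the process exits. `render_page` and `crawl` are hypothetical names, and the worker only reports its PID instead of doing a real Ghost.py or PhantomPy page load.

```python
import multiprocessing
import os
from collections import Counter


def render_page(url):
    # Hypothetical stand-in for one headless-browser scrape. A real version
    # would load `url` in Ghost.py/PhantomPy; this one just reports which
    # worker process handled it.
    return url, os.getpid()


def crawl(urls, sites_per_process=10, workers=2):
    """Render every URL, replacing each worker process after
    `sites_per_process` pages so leaked memory goes back to the OS."""
    # "fork" avoids re-importing this module in every child; it is
    # POSIX-only (on Windows, use the default context instead).
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=workers, maxtasksperchild=sites_per_process) as pool:
        # chunksize=1 makes each URL its own task, so maxtasksperchild
        # counts pages rather than batches of pages.
        return pool.map(render_page, urls, chunksize=1)


if __name__ == "__main__":
    results = crawl([f"https://example.com/{i}" for i in range(25)])
    pages_per_pid = Counter(pid for _, pid in results)
    print(len(pages_per_pid), "processes, at most",
          max(pages_per_pid.values()), "pages each")
```

Lowering `sites_per_process` lowers the memory ceiling at the cost of starting browser processes more often.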

~~~
killahpriest
I tried (and failed) to get it running at all on OS X. We were able to get it
running on Ubuntu, but that's where it crashed after every three to four
scrapes.

~~~
arikfr
When was it?

~~~
killahpriest
About three weeks ago.

------
encoderer
There are so many half-baked headless browser projects, many building in some
way on PyQt. I've made heavy use of Spynner and Ghost, Spynner especially. It
always seemed to offer _exactly_ what I needed, but the inconsistent docs and
endlessly deep stack always left a successful implementation just beyond my
grasp.

Now, as an OSS maintainer myself, I sympathize. But I'd love it if this or
another one of these projects broke away from the pack and built some critical
mass. I'm all for democratization, but in this case I'd gladly trade 10 mostly
working projects for a single solution run by an iron-fisted BDFL if it meant
decent docs and a mostly stable API.

------
_seininn
I've been working on a project like this that's nearing completion (it works,
and the API is mostly complete). It differs from this one in that it uses any
normal browser to fulfill requests (via extensions).

The idea is to have a pythonic API to any browser that has reasonable support
for extensions, i.e. cross-browser scripting.

~~~
laurencerowe
What does it offer over Selenium WebDriver? Yeah, the API is not terribly
pythonic, but it's battle-tested and stable, works across almost all browsers,
and there are always wrappers like Splinter (<http://splinter.cobrateam.info/>)
if you prefer a more pythonic interface.

------
acron0
For those of us on the outside, what are the typical use cases of a headless
browser engine? Testing?

~~~
meerita
We use it to generate images from a website. We build a product with
HTML5/CSS3 and web fonts, lay it all out, then "screenshot" it to produce a
final image with the proportions we want. It's cool because it's server-side
and we can generate endless images based on what the user wants.

Addendum: if you want to use web fonts, especially from Google, you need to
recompile PhantomJS with WOFF support.

~~~
DenisM
Can you capture PDF that way, or is it only PNG?

~~~
meerita
For PDF, you're better off installing an HTML-to-PDF command-line tool. What
this does is simply load a WebKit browser, and we just take a screenshot of it.

~~~
DenisM
Printing works differently in different engines, so I want the designer to use
her HTML/Chrome skills to design templates, and then render the PDF with the
users' data using the same engine. A command-line HTML-to-PDF tool would work
too, but it would require the designer to relearn engine quirks.

------
salimmadjd
OT: I've been looking for a headless WebKit for Go (Golang). I've searched all
over with no luck. If anyone wants to become famous, please make one :)

------
rkrzr
Does this actually offer full DOM access?

I see a 'cssselect' in tests.py, but I wonder how much of the DOM API is
actually supported.

~~~
jessedhillon
Take a look at lib/webelement.hpp: there's not much there yet, and notably
absent is a way to iterate over children. But you can evaluate JavaScript.

------
hackerboos
Cool, you can take screenshots with it; see the test_capture_page example.

------
druska
Will this render Flash content?

------
neo2001
Looks phantastic

