
PhantomJS is a minimalistic, headless, WebKit-based, JavaScript-driven tool - aheilbut
http://code.google.com/p/phantomjs/
======
gojomo
Cool!

But, have security issues been considered?

It looks like a Javascript thread-of-execution that can write to local
filesystem paths (to render graphics, at least) may call into network-loaded
DOM/code. Is there any assurance that page contents can't discover and use
phantom API operations? (Or perhaps read local 'file:' URIs?)

~~~
kj12345
Wow, that's a good insight, seems like a possible attack vector. Maybe
something like Adobe Air's security model of putting local access and network
access in different iframes with a message-passing API between them would
work. That's always felt like a bit of a hack to me but at least the
separation between different frames in WebKit has been well-tested.

------
wccrawford
I wasn't excited about this until I realized it could be used to render a
webpage to an image with a headless browser. I'm pretty happy about that.

Edit: The code for that is on
<http://code.google.com/p/phantomjs/wiki/QuickStart>

~~~
stevejohnson
Yep, now I can finally write that Webkit-based book layout software I've been
wanting!

~~~
stevejohnson
Wow, I've never had to do this before...could the downvoter explain
him/herself?

------
hinathan
This is really slick. It's tempting to think of this as a potential route
towards headless functional testing — probably no substitute for something as
heavy as selenium but for light tasks and DOM-based inspection of returned
payloads (and screenshots, baked in!) it seems like a plausible foundation.

~~~
ehsanul
Also note these other headless javascript testing frameworks:

Zombie.js: <http://news.ycombinator.com/item?id=2038663>

HTMLUnit: <http://htmlunit.sourceforge.net/>

Celerity (JRuby wrapper around HTMLUnit): <http://celerity.rubyforge.org/>

~~~
xtacy
I think the main differentiating factor between phantomjs and the above
mentioned ones is that phantomjs actually uses WebKit and has full support for
everything that WebKit supports. Zombie.js uses the DOM API due to jsdom,
which might have its own parsing intricacies.

~~~
regularfry
HTMLUnit has its own share of parsing gotchas.

I've used Selenium with Firefox and Xvfb to do headless scraping in the past;
looks like my toolkit just got simpler.

------
helper
I'm sorry but I don't consider an application that must be run under a
windowing environment to be "headless".

While this is cool, it's really not that different from projects like
<http://code.google.com/p/wkhtmltopdf/>.

If you really want a headless webkit browser you would need to write a new
webkit port to a graphics library that doesn't require a windowing system
(maybe cairo).

~~~
epochwolf
What's the cost of running an Xserver in the background? 20~30mb?
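
Running under a background X server could be sketched roughly as below. This is a minimal sketch, assuming the `xvfb-run` wrapper script (shipped with Xvfb on Debian and Ubuntu) is installed; `with_display` is a hypothetical helper name, not part of PhantomJS, and it does nothing for machines where Xvfb cannot be installed at all.

```python
import os
import shutil


def with_display(cmd, env=None, which=shutil.which):
    """Return cmd, prefixed with xvfb-run when no X display is configured.

    `env` and `which` are injectable so the logic can be checked without
    touching the real environment (an assumption made for this sketch).
    """
    env = os.environ if env is None else env
    if env.get("DISPLAY"):
        return list(cmd)  # an X server is already available
    if which("xvfb-run") is None:
        raise RuntimeError("no DISPLAY set and xvfb-run not found on PATH")
    # -a (--auto-servernum) lets xvfb-run pick a free display number.
    return ["xvfb-run", "-a"] + list(cmd)
```

The returned list can be handed to `subprocess.run`, so the same call works on a desktop (where `DISPLAY` is set) and on a server (where it is not).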

~~~
blago
What happens when Xserver is not installed and you don't have the permissions
or don't want to install it?

------
ivan_ah
the announcement blog post:
<http://ariya.blogspot.com/2011/01/phantomjs-minimalistic-headless-webkit.html>

------
xtacy
Brilliant! "Just 250 lines of Qt and C++" shows how good Qt/WebKit are.

------
olalonde
What is meant by "headless"?

~~~
spicyj
Presumably you don't see the WebKit window or anything like that; it runs
invisibly and doesn't require any window server.

------
jontas
Anyone know if this could be used for generating heatmaps? I'd basically need
to identify the x,y offsets of elements on a page. I realize that these can be
affected by the browser's width/height, but I'm hoping I can set those to
generate the data.

------
smilliken
This might be useful for html sanitization. You can allow anything as input
(including scripts, styles, etc), render it to a page in phantom, apply your
whitelist on the effective DOM, and render it out as output. Of course, this
might be resource intensive, and you have to be careful about phantomjs being
sandboxed and having cpu/memory/timeout limits. The nice thing about this
though is that 1) you get your serializer/deserializer for free, 2) it's
very forgiving on malformed input, 3) the output is W3C valid since webkit
corrects the DOM, and 4) you can support styles and scripts that affect the
DOM.
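
The whitelist step described above can be sketched without a browser. This is a minimal sketch, assuming Python's standard `html.parser` as a stand-in for the WebKit-parsed DOM: it does not run scripts, apply styles, or repair bad nesting the way the PhantomJS approach would. The `ALLOWED_TAGS` and `ALLOWED_ATTRS` sets and the `sanitize` helper are hypothetical names chosen for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; a real one would be chosen per application.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
VOID_TAGS = {"br"}  # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}  # tag and its text are both dropped


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            value = value or ""
            # Keep only http(s) links; this drops javascript: and similar.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text with non-whitelisted tags and attributes removed."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

Text inside disallowed tags other than `script` and `style` is kept and re-escaped, so only the markup is stripped. Balancing unclosed tags is left out here; that is the part the thread suggests getting for free from WebKit.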

------
ollysb
Have been trying to find a solution to headless js testing in cucumber that
doesn't suck or need java. A driver built on top of this would rock!

------
rb2k_
In case somebody wants an OSX binary but doesn't want to download the
development tools and qt:
<http://blog.marc-seeger.de/2011/01/26/phantomjs_osx_binary>

(Indirection over my blog in case I need to switch the file away from my
webspace)

~~~
rb2k_
Ignore this, I thought it compiled a statically linked version... it didn't
and I really suck when it comes to compiling c :(

~~~
mnutt
Yeah, I had a really hard time getting qtwebkit to statically link on a mac as
well. I think it may not be possible.

~~~
andrewf
Not possible according to the docs:
<http://doc.qt.nokia.com/4.7/developing-on-mac.html#building-qt-statically>

------
gregwebs
I would really like to see a comparison between this and other options.
(rhino, envjs, htmlunit).

~~~
blago
They are completely different things. Rhino is a JS engine. envjs is a script
that creates a mock window object and can run in an engine like Rhino.
Htmlunit comes close, implemented in Java, tries to SIMULATE some popular
browsers. PhantomJS is... Webkit.

------
blago
Not headless, many other projects, many other ways to do the same even without
programming/compiling. I'm still longing to see a true headless browser that
renders to Cairo or something else.

------
tworats
The first two applications that come to mind are automated testing and screen
scraping, I'm a little surprised they didn't include examples of those. Looks
interesting though.

~~~
buddydvd
Automated Testing:
<http://code.google.com/p/phantomjs/wiki/ServiceIntegration#Jasmine_Driver>

Screen Scraping:
<http://code.google.com/p/phantomjs/wiki/QuickStart#Rendering>

------
z92
I can't find the example which generates PDF of the wikipedia page, as
described on the front page.

~~~
Knacker_Hughes
It's on <http://code.google.com/p/phantomjs/wiki/QuickStart> about three
quarters of the way down the page:-

phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf
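
The same command can be driven from a script. This is a minimal sketch, assuming `phantomjs` is on the PATH and `rasterize.js` (the example script from the QuickStart page) is in the working directory; `rasterize_command` and `rasterize` are hypothetical helper names.

```python
import subprocess


def rasterize_command(url, output, script="rasterize.js", binary="phantomjs"):
    """Build the argument list for: phantomjs rasterize.js URL OUTPUT."""
    return [binary, script, url, output]


def rasterize(url, output, timeout=60):
    """Run the command; the 60-second default timeout is an arbitrary choice."""
    # Arguments are passed as a list, so no shell is involved and the
    # '&' in the URL needs no quoting.
    # Raises CalledProcessError on a non-zero exit status and
    # TimeoutExpired if the process runs longer than `timeout` seconds.
    subprocess.run(rasterize_command(url, output), check=True, timeout=timeout)


# Example call (needs phantomjs and rasterize.js to be present):
# rasterize("http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes",
#           "jakarta.pdf")
```

Going through an argument list rather than a shell string is a design choice: it sidesteps the quoting problem that makes the quotes around the URL necessary on the command line.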

------
wicknicks
How is this different from node.js?

~~~
rb2k_
This is a headless browser using webkit as a rendering engine.

Node.js is an asynchronous I/O framework

~~~
wicknicks
This is great indeed! I just tried it out. People who are running Ubuntu
10.10 and have Qt < v4.7 can change line 34 and comment out line 164 (to at
least test out the examples on the website).

------
retube
So this is a headless browser with a javascript api? Sweet.

------
toisanji
I'm going to try this to do some web page scraping.

