PhantomJS: Stepping down as maintainer

franciscop · on April 13, 2017

IMO it'd be a really smart move for Google to hire Vitaly to help with the launching of this feature and things around it. He has done a great job with PhantomJS.

Even an acquisition[1] of PhantomJS would totally make sense, then let him keep working on it but based on headless Chrome and with real resources.

[1] Careful how you spin it for this route, learn from TJ/Express https://medium.com/@tjholowaychuk/strongloop-express-40b8bcb...

krzyk · on April 13, 2017

Could you elaborate on the TJ/Express? The post you provided looks like the last one of series of events.

franciscop · on April 13, 2017

Yes, there was some controversy in the Node.js world because he sold the project of Express to StrongLoop just as he was leaving for Go. I think it was totally a problem on how things were communicated and that he had really a good intention. What didn't help much though is that StrongLoop then left the project unattended so the main person contributing to it (Doug Wilson) wasn't even acknowledged.

So if an open source project is being sold:

- Make sure everyone knows it's because it'll be better handled and not because of the money.

- At least get a decent amount of money! According to TJ it was half a month worth of money.

- If you are the company buying it don't leave it unattended afterwards.

Anyway, now TJ is surely remembered with a lot of gratitude in the Node.js community and I'm so happy he helped it grow so much!

This is all better explained here (and read the comments for TJ opinion): http://thefullstack.xyz/history-express-javascript-framework...

stevenspasbo · on April 13, 2017

There was a small controversy when StrongLoop purchased the Express repository from TJ, the creator of the project.

why-el · on April 13, 2017

This is an excellent idea.

cr0sh · on April 13, 2017

Yeah - I like that idea too.

I played around with PhantomJS for something at my current job - it ultimately didn't work out for us (and we went a different route), but it was interesting and fun to learn about.

d--b · on April 13, 2017

Wow, I'm very impressed. At this stage it is a very wise decision to step down and to focus onto something else, rather than to hold on to a project that will eventually disappear. It takes a lot of courage to move on from a project that had to be maintained for several years and that had such a reach.

We can only be thankful for all the good work that went into PhantomJS, and wish the maintainers the best of luck in their next endeavors.

Cheers!

Touche · on April 13, 2017

Some context, he's essentially been maintaining a web browser (which is a project on the level of an operating system) on his own.

Phantom 2 switched to QTWebKit which I'm sure was a tremendous amount of work. Probably at the end of that he was hoping things would get "easier", and it sounds like it hasn't. It's just too much work for one person and if companies aren't will to pay people to do it, I'd quit too.

moxious · on April 13, 2017

He says in his message it's been a slog for some time, looks like a good time to be done with it. Open source is great, we all like it, but demanding unpaid jobs can get old, too.

yebyen · on April 13, 2017

And it makes sense. You want Chrome in your tests if your users are using Chrome. Very few (if any) of your users will ever visit your app with a headless PhantomJS browser, so it's not a platform that you should go out of your way to support.

I've been using Selenium Driver with Chrome in xvfb for my headless testing needs, and I've used PhantomJS for some automation things in the past where it was great, but since I switched I really haven't looked back!

I had things breaking subtly that I couldn't fix, and they did not manifest as problems in the Chrome browser or Selenium. I still don't know what was wrong, I just know that my Rails app won't pass its JavaScript functional tests if I use PhantomJS. When I did the evaluation of 3 test drivers, I found that the one with the actual browser in it was the one that worked most reliably.

moogly · on April 13, 2017

Thanks so much to the PhantomJS maintainer for his hard work over the years! To me, it feels like his decision is the correct one here.

After we realised that we hadn't seen a "one-browser" bug for 2 years in our massive angularjs app, we got rid of all browsers but PhantomJS in our karma suite. PhantomJS' slowness, lagging-behind in webstandards and just my general gut feeling (these facts above made me question the point in running JavaScript tests in an actual browser at all) made me port our karma test suite to jest w/ jsdom and haven't been happier since we, years prior, got rid of our gnarly Selenium test suite that caught 0 bugs but was the major cause for maintenance headache.

forgotpwtomain · on April 13, 2017

> After we realised that we hadn't seen a "one-browser" bug for 2 years in our massive angularjs app

Really? When I was writing JavaScript I thought I'd hit one ~ once a month. As a matter of fact I hit a v8 bug just yesterday which apparently doesn't support /Etc/GMT[-+][0-9]{0,2} timezones.

thom_nic · on April 13, 2017

I actually started using webdriverIO + chromedriver after fighting too much with casperJS - while webdriver (and Selenium) seem to have much more momentum there are still some things I really miss that PhantomJS gave when using Capybara. I was a very happy Capy+Phantom user when testing my rails apps.

Things like reading the HTTP response code, detecting 404s on assets and catching JS errors in the console are all not possible with Selenium/ Webdriver, and I relied heavily on those capabilities in my Capybara tests.

While headless chrome might be able to replace PhantomJS for many use cases that doesn't necessarily mean the APIs will be comparable. In fact I'd more likely expect the Chrome folks to say "the webdriver API is it, because it's a standard." [1] Sadly IMO it's lacking compared to what PhantomJS was capable of.

[1] https://www.w3.org/TR/webdriver/

erik · on April 13, 2017

When I looked into it, I was surprised by how hard it is to set up basic smoke tests for web development, and that Selenium / Webdriver can't do this.

cookiecaper · on April 13, 2017

You can use PhantomJS as the backend on a Selenium script, and this news clearly demonstrates the utility of a higher-level API than using PhantomJS directly. If your tests are in Selenium, changing backends is generally a small matter.

I started writing in Selenium instead of CasperJS (a PhantomJS frontend) because PhantomJS experienced intermittent bugs on the page I was trying to access. I think you're right that "real browsers" are still much more reliable for complex use cases, but the low profile of PhantomJS is definitely nice when it works.

holtalanm · on April 13, 2017

I used Phantom once upon a time, but I eventually switched to just using xvfb as well within a docker instance. Headless Chrome and Firefox with xvfb.

avaer · on April 13, 2017

Open source software has to be one of the least efficient markets out there.

If you sum up the very real value PhantomJS has delivered to very real companies over the last several years, napkin math tells me we wouldn't have the project being abandoned for being a "bloody hell" to work on.

Cthulhu_ · on April 13, 2017

The main problem for developers is that getting your company to pay for something like developer tools can be very hard and a long project. Ideally every team gets a budget and a credit card to do as they see fit, but in practice a lot of (especially bigger) companies have a whole acquisition process. It's not uncommon for the hours spent in getting a license for e.g. an editor to be far more expensive than the product itself. I believe this is often highlighted again when Sublime Text is in the news again.

This as opposed to open source tooling which has no such hurdles.

taude · on April 13, 2017

Why do you think open source tooling has no such hurdles? At these same "larger" companies, they're going to have an open source review board, and legal involved for each open source product you want to use.

Just because you can download and use it on your local machine, doesn't mean you're not violating your corp policy and procedure.

hackits · on April 13, 2017

Whats more surprising is the company I work for does have a support agreement with Oracle, although when we've run into problems that do need a technical expert from Oracle such as crash dumps and stack traces we're told to figure it out our-self.

Kind of defeats the whole reason to have a closed source system when you're still on the hook and charging out $500 per hour to clients.

nulagrithom · on April 13, 2017

This problem was well demonstrated at my shop the other day. There's a lot of fear of open source here. We needed to run a simple FTP process. I said I'd write a short script.

"You've gotta be careful with those free tools though... Never know what will happen when they break. You can't get any support. Besides, we need logging and alerting too."

"Oh, $currentTool does logging and alerting?"

"Well, yes... But it's currently not working. We've got a ticket in to fix it."

That was months ago. Yesterday it broke and there was no alert. There's also still no logs.

Gotta be careful about that "open source" stuff though.

lurker456 · on April 13, 2017

Open source doesn't allow you to pass the buck. Commercial software with support contracts does. There is also the "look how much we're paying! it must be good" factor.

brazzledazzle · on April 14, 2017

Many companies are paid handsomely to take the blame.

taude · on April 13, 2017

They're also afraid of the abomination you quickly scripted together that the future-you, who no longer works for them, then has to try and figure out.

nulagrithom · on April 13, 2017

ffs it's just FTP. All it needed was a scheduled task that called winscp pointed to a file containing a handful of flat FTP commands. If the guy who replaces me can't sort that out, how's he going to sort through my code?

Besides, now it's some abomination wrapped inside a proprietary program that doesn't work right in the first place, only this time we're out the $x,xxx licensing costs and the entire process is opaque, with absolutely zero hope of sorting it out on our own. That's not a huge gain...

jonny_eh · on April 13, 2017

and then you can't search stackoverflow for help, compared to using something like postgres

brazzledazzle · on April 14, 2017

Is it fair to say that kind of people that want to pay for an SFTP client for an automated process probably aren't doing much digging around on stackoverflow?

nailer · on April 13, 2017

A good way to address this is have a tools budget.

Product Manager has $X per team member. Is encouraged to coordinate purchases and pool purchases together for licenses. Licenses can be proprietary or Open Source licenses X, Y and Z.

azernik · on April 13, 2017

It's a classic "public good" - an economic activity whose benefits are impractical or unworthwhile to deny to those who don't pay. Things like emergency services, last-mile road infrastructure, and environmental protection work. A special-case of positive externalities. These are a classic example used by economists of a natural place for government in the economy.

IshKebab · on April 13, 2017

Which makes me wonder why governments don't give more grants for writing open source code.

They do it for research, which is another thing that would otherwise probably not pay for itself.

I guess if they started doing it a lot of commercial companies would complain to them about the competition.

oever · on April 13, 2017

Dutch government has explicitly funded quite some projects, e.g. libressl and libreoffice, but compared to budgets for closed source, it's still very little.

Here's a nice blog about how the UK deals with this.

https://governmenttechnology.blog.gov.uk/2016/12/15/next-ste...

As to infrastructure work, easiest way to help and still profit is to hire one or more developers that explicitly work on a set of FOSS libraries. That way you have that knowledge in-house and a connection into the community. Also, you'll have some highly motivated employees.

earthboundkid · on April 13, 2017

I think that in the US, we've been fighting for some long about how much to cut government, the idea of proposing to create a new category of spending just doesn't occur to Democrats. The closest they can come to imagining something new is free college (which is the same what we have now [free high school, college scholarships] but more so) and free daycare (which is like free preschool but younger).

mobilefriendly · on April 13, 2017

You're kind of in a bubble, there is a lot of U.S. government support for open source and most agencies use it. See https://code.gov for a starting point. There's never going to be a huge multi-billion dollar grant program because it is unfair to closed-source companies (who pay taxes) but you are imagining a debate or hangup that doesn't exist.

misingnoglic · on April 13, 2017

Why is it unfair? Those companies presumably charge for their software, and maybe will be incentivised to go open source.

paulddraper · on April 14, 2017

It's unfair because those companies can't extract their required funding at gunpoint.

philipkglass · on April 13, 2017

At some point the falsifiable hypothesis "government cannot provide goods or services better than private actors" mutated into the dogma "government shall not provide goods or services better than private actors."

acdha · on April 13, 2017

Governments do fund some things directly – most commonly grants to specific interests – but one other area which helps is allowing staff to work on open-source projects. At least in the United States civil servants’ work is generally considered public domain so we don't have to deal with the IP concerns which many companies still obsess over, which is nice.

If you poke around https://government.github.com/community you'll find a lot of government created projects but checking those organizations/ contributors will often turn up a ton of forks of popular tools. One common thing is improving security defaults or accessibility, which are tedious but mandatory for government.

If you value this, make sure to let your elected representatives know: I'm sure they hear from the major contractors regularly.

westoque · on April 13, 2017

Or companies for that matter. I know other companies that have an open source division like AT&T labs.

atemerev · on April 13, 2017

The question is reversed: open source software _is_ a positive externality, but happened almost always _without_ any government involvement (save for some rare exceptions like SELinux).

This is probably the reason it worked so good.

So, do we necessarily need governments for other positive externalities in the list?

azernik · on April 13, 2017

The comment above mine was mentioning chronic underinvestment; I would say that this indicates the current system doesn't work.

mycomian · on April 13, 2017

It's absolutely not a necessity, but more incentive for writing OSS could be nice.

chimeracoder · on April 13, 2017

> It's a classic "public good" - an economic activity whose benefits are impractical or unworthwhile to deny to those who don't pay.

Open source software is not a public good, in the economic sense. There are two criteria for being a public good: non-rivalry and non-excludability. Open-source software satisfies the first criterion (my using it doesn't prevent you from using it), but it's fairly excludable (I can prevent you from using it legally).

As developers, our instinct might tell us that it's not excludable because "if the source is there, nothing prevents me from using it", but when we're talking about goods which fall under copyright law, the legal aspect matters as well as the practicality. And in fact, open-source licenses (such as the GPL and the Apache licenses) can contain provisions which prevent people from using the licensed software under certain circumstances, while still being considered both free and open-source by the FSF and OSI respectively[0]

The real classic example of a public good is national security. Practically, there is literally no way that national security can be applied to people within a country on an individual basis, as opposed to a geographic one. For most threat models (e.g. espionage, (counter-)terrorism), the mitigations are things like "prevent terrorist attacks from happening". You can't apply the benefits of that only to people who have paid for the service - a terrorist attack either happens or it doesn't, and you can't choose who's a victim of it.

> These are a classic example used by economists of a natural place for government in the economy.

Even for things which are actually public goods, like national security, that's overstating the case greatly. Public goods are used as an example of a good for which an individual market cannot exist, but that doesn't mean that the only alternative is a government one.

The so-called "tragedy of the commons" is an appropriate (and ironic) example - despite the way that most people use the term, the town commons was actually something for which there were plenty of well-established codified rights, and these were not always negotiated or enforced by a government entity.

[0] For example, the Apache license contains a patent retaliation clause, which terminates your right to use the software in the event of a patent lawsuit. (Technically it doesn't revoke your right to the copyrighted code, but it does revoke your right to the underlying patents, which amounts to the same thing, because presumably the copyrighted code utilizes the underlying patents, or else it wouldn't be covered by the license in the first place).

azernik · on April 13, 2017

Non-excludability isn't just about the law; it's about practicality. It might in some system be legal to prevent fire services from putting out fires in houses that haven't paid for it, but that would be an impractical system.

Similarly, critical software intended for developers is impractical to deny to them; doing so has some serious negative effects on the production process (hard to get good feedback/PRs from customers, for example, unless you expose the source to them to a level that closed-source manufacturers find dangerous to their business model).

WRT the original commons, you can make such things work in a tight-knit community that can enforce social norms (which, by the way, would take on a lot of what are now considered "government functions" in a modern society), but in a larger capitalist economy with actors that aren't inside the community, the government is the only actor that has the authority to enforce public ownership of the commons.

dwaltrip · on April 13, 2017

Can we get non-excludability by requiring that any OSS produce with public funds have a license that prevents such possibilities?

I'm imagining a clause like: "this source and any modifications is irrevocably eligible for use by all, provided that its creation did not break any other laws".

cookiecaper · on April 13, 2017

Yeah, I really think that open-source software shot itself in the foot by incorporating unlimited free sharing for every recipient into its mantra. Now everyone thinks that open-source has to mean impoverished, because despite all the happy vibes, very few people will pay for software that they could otherwise get for free.

You can make your software "source available", i.e., not open-source under activist definitions but still have a GitHub repo and all that, and restrict [heavy?] commercial use. I think it'd be interesting to see more open-source devs take that route and stop giving away the farm.

This will still allow people to use your stuff, developers will get familiar with the tooling and expect to be able to use it at work, and companies that have the dough can be compelled to pony up for a license.

On Windows, there is still an underappreciated market for cheap early-90s-shareware-style applications that are < $100 a pop, but I think most of them think that sharing the source means they have to enter the poor house, which is sad. We should show people that there's a way to share your source without bankrupting yourself in the meantime.

The GPL almost gets there, as it makes large-scale commercial use undesirable due to its infectious nature, which allows for dual-licensing, but with everything server-side nowadays, those stipulations are much less effective (have to go AGPL).

egeozcan · on April 13, 2017

There's no shared source license which bans commercial usages but I think the software industry desperately needs one. I'm selling the software and want to enable the user to make modifications for private use and even share those modifications if they so desire = I have to hire a lawyer and hope he comes up with something that stands the test of a trial.

Not giving away something for free (as in freedom) but asking for free legal advice may sound ironic but it's not about the money, it's about having something reliable for a very common use case.

x0x0 · on April 13, 2017

That's typically the purpose of AGPL plus a dual license offering for cash. This will probably require copyright assignments to make it work tho

alphapapa · on April 13, 2017

I can't help but think that if the Linux kernel were under a CC-NC-BY-SA-type license, as you suggest, that it would be practically unheard of today.

BTW, please mind the distinction between open-source software and Free Software. You don't have to like the FSF, but the distinction they recognize is important.

cookiecaper · on April 13, 2017

>I can't help but think that if the Linux kernel were under a CC-NC-BY-SA-type license, as you suggest, that it would be practically unheard of today.

I don't suggest that the Linux kernel be distributed under non-commercial terms, and I agree with you that such a project wouldn't have done well.

>BTW, please mind the distinction between open-source software and Free Software. You don't have to like the FSF, but the distinction they recognize is important.

Right, so there are 3 levels of "purity" here. For the record, I didn't run afoul of any of them; I intentionally distinguished my suggestion as "source available", not "open-source".

There is "Free Software", which is software meeting Stallman's "Four Freedoms".

There is "Open Source", typically referred to as the improper noun "open-source", which activists insist refers solely to license approved by the OSI (Open Source Initiative). Because these include permissive licenses, the FSF considers them potentially non-free, and makes the point that Open Source isn't good enough; it must be Free Software.

Then there is "source available", insisted upon by the OSI people, to indicate that while you can download and modify the source, it is not distributed under copyright terms they like. This would be source distributed only for non-commercial use, for example. Jef Raskin's Archy project (apparently now dead) [0] was distributed under CC BY-NC-SA and made this distinction.

[0] https://en.wikipedia.org/wiki/Archy

VeejayRampay · on April 13, 2017

No, companies that make billions of dollars in an ecosystem that benefits from said tools and don't pay a dime for the public good are to blame. It's not a problem with open source, it's a problem with a culture that takes and doesn't give back enough.

edoloughlin · on April 13, 2017

Open source software has to be one of the least efficient markets out there

Does it have to be a market?

spiderjerusalem · on April 13, 2017

Amen. Somehow a lot of comments in this thread are blinded to the fact that the 'commons' are not simply subsumed within the 'market'. The logic of commons is quite orthogonal to that of markets.

sangnoir · on April 13, 2017

> Open source software has to be one of the least efficient markets out there.

The regular market rules don't really apply to Open source software. A lot of viable (or even thriving) open source projects would be dismal failures as stand-alone businesses or startups. Paradoxically, the only way they can provide real value is in their current form of open source projects.

I would have given CyanogenMod as an example, but the amount of inept management at the startup there would cloud the issue.

0x445442 · on April 13, 2017

Hopefully the author will be able to parlay the street cred generated from PhantomJS by selling future employers or customers on the same napkin math.

hpaavola · on April 13, 2017

Also Firefox will get headless mode in few releases https://bugzilla.mozilla.org/show_bug.cgi?id=1338004

imr_ · on April 13, 2017

thank god. I was afraid we will be back in the grasp of the google's moloch.

tyingq · on April 13, 2017

Curious what "moloch" means in this context. I can google it, of course, but I get "Biblical name relating to a Canaanite god associated with child sacrifice". Which doesn't help me much. End users are children that Google is sacrificing? Or?

philh · on April 13, 2017

To give a shorter answer: Scott Alexander (at the slatestarcodex link), through the poem, associates Moloch with negative-sum games, where no one comes out better than they went in. In extreme cases they force us to sacrifice the things we love in order to survive. You throw your children to Moloch to help you defeat enemies; otherwise, you die. Your enemies do the same thing. It would be better if nobody sacrificed their children, but nobody is in a position to bring that outcome to pass.

In this context, I would interpret "google's Moloch" along the lines of: Google is net-bad for the world, because of privacy issues and problems with centralisation and so on. Using Google's software (and services) makes them more powerful, so people don't want to use Google's software. But because everyone else is using Google's software, the world is optimized for Google users in a way that it isn't optimized for non-Google users, and so it's difficult to escape. And so Google grows yet stronger, and it becomes more difficult to escape.

(To clarify: this is my interpretation of grandparent's use of the phrase. It's not my own position, and there's a decent chance that I'm completely off-base and it's got nothing to do with grandparent's position either.)

pvg · on April 13, 2017

It's a term that's been kicking around literature for centuries. It's in Paradise Lost, for cryin' out loud. Why on earth would its use be a references to some random prolix internet dude's weird verbal recreation of the La Brea Tar Pits?

philh · on April 13, 2017

Like I say, I don't know that it is. I presented a hypothesis that seems like it fits the facts fairly well.

Do you have another hypothesis about what the term means in context? I note that "it's a reference to Paradise Lost" is not very descriptive: for example, if I talked about "Google's Frodo" you might ask what I mean by that, and "he's a character in Lord of the Rings" does not answer the question.

pvg · on April 13, 2017

I don't really have to have a hypothesis, it's not that uncommon a term. It's used for all sorts of things including a fairly generalized 'insatiable and demanding metaphorical monster'. If someone offhandedly mentions Icarus it seems reasonable to assume they're not really alluding to a review of the Hungarian brand of buses posted to alt.rec.bus in 1991.

Take a look at, say,

http://www.nybooks.com/search/?s=moloch&option_match=&year_a...

Lots and lots of Moloch. Your hypothesis is 'it's a reference to some logorrheic blogger'. Sure, it's possible you're right but it's one hell of a weirdly specific guess. It's not like the three people to ever mention Moloch were Milton, Ginsberg and internet-man-addicted-to-his-own-typing.

tyingq · on April 13, 2017

Well, there is more than one social allegory related to Moloch. Many of them have no obvious relationship to Google. Which is why I asked the question in the first place. So people tried to be helpful and identify any sources that might be related.

I just read the character description for Moloch in Paradise Lost, and don't see any obvious theme you might tie to Google.

In fact, I'm still not quite sure what the original comment was trying to say. Apart from some fuzzy notion that capitalism is sort of like Moloch and we keep feeding it with our "children". Where children is what? Privacy? Money? Open source tools?

hluska · on April 13, 2017

I don't think that "Google's moloch" was proper usage, but in literature (ie - Howl by Allen Ginsberg), Moloch refers to something requiring a very costly sacrifice. In Howl, many critics argue that AG is referring to capitalism when he uses Moloch.

Pent · on April 13, 2017

They are probably referring to http://slatestarcodex.com/2014/07/30/meditations-on-moloch/

fnovd · on April 13, 2017

https://www.poetryfoundation.org/poems-and-poets/poems/detai...

Heading II

corford · on April 13, 2017

And while people wait, you can already do a 'poor mans' headless Firefox thanks to SlimerJS and xvfb.

Phantomjs is less resource heavy if you're constantly spooling up and down lots of instances but I prefer SlimerJS w/ Firefox since it lets you keep up to date with a modern version of Firefox (rather than relying on sporadic QTWebkit updates from Phantomjs).

If you're using Casperjs, SlimerJS is virtually a drop in replacement for Phantomjs (though I worry about how long/well Casperjs will continue to be maintained).

kozhevnikov · on April 13, 2017

One can use Xvfb to run normal Chrome as well (with few gotchas like --no-sandbox, --disable-gpu and dep on dbus-x11). A test I'm working on at the moment takes 28 seconds in PhantonJS and 6 in Chrome under Docker and Xvfb.

vaviloff · on April 13, 2017

Very interesting comparison! Could you write a note on your experience somewhere?

chrisber · on April 14, 2017

If you need a real good alternative checkout https://github.com/arachnys/athenapdf it's based on electron.

corford · on April 15, 2017

Looks very nice but I'm not using Slimer for PDF stuff.

Jimmed · on April 14, 2017

A colleague and I spent a couple of days setting up high-fidelity webpage->PDF rendering, and by far the best results were got with SlimerJS and xvfb.

vor0nwe · on April 13, 2017

I'm actually hoping CasperJS will support headless Chrome, and headless Firefox later on too.

askmike · on April 13, 2017

Though a collaboration between the two projects might not be out of the question: https://groups.google.com/d/msg/phantomjs-dev/S-mEBwuSgKQ/PQ...

make3 · on April 13, 2017

google should hire him

z3t4 · on April 13, 2017

> I even bought the Mac for that!

I did too, then found out you also needed a dev license for users being able to run your app. Supporting Mac/OSX is damn expensive if your app is free.

Cthulhu_ · on April 13, 2017

For iOS development I believe they got rid of that requirement, you can develop iOS apps (at least) using a personal dev account; only when you want to go to the app store do they ask for the license fee.

prashnts · on April 13, 2017

Unfortunately they limit app capabilities (such as iCloud, keychain access), so you can't use those even for your personal account.

tokenizerrr · on April 13, 2017

How do I do this without buying dedicated hardware?

j_s · on April 13, 2017

Illegaly (virtualization) or in the cloud.

Globz · on April 13, 2017

PhantomJS enabled us at the time to bootstrap a big project at work where at the end workflow the app had to turn HTML orders to PDF on the fly, eventually we moved to WKHTMLTOPDF (https://wkhtmltopdf.org/) which is much less hungry with resources but nonetheless PhantomJS played a huge role during the early days of the project and was easy to setup. If I remember correctly the only down side was to find the correct format for our HTML template so PhantomJS would render proper page break and repeat the header for super long orders.

I can understand why stepping down is the right decision, maintaining such project by himself is an amazing feat on its own and even more when it proves to be useful for so many companies. Sadly when it becomes your second job you might always be on the lookout for a clean exit and such opportunity just became a reality.

Good luck in your future projects Vitaly!

neebz · on April 13, 2017

We have used PhantomJS and WKHTMLTOPDF both. Phantomjs hogs a lot of resources but is very good when you want print large PDFs (500+ pages). WKHTMLTOPDF struggles with larger HTMLs.

chrisber · on April 14, 2017

checkout https://github.com/arachnys/athenapdf. Its based on electron.

Globz · on April 13, 2017

Good to know, this is not our use case at the moment, we mostly generate PDF of 2-3 pages long.

Doctor_Fegg · on April 13, 2017

> Chrome is faster and more stable than PhantomJS. And it doesn't eat memory like crazy.

Wait, can someone tell me where to download this doesn't-eat-memory-like-crazy version of Chrome? Activity Monitor is showing me 2GB of Chrome processes right now and that's even with The Great Suspender having paused almost all my tabs.

yebyen · on April 13, 2017

I saw a trick where you can run Chrome and give it less memory, and it uses less memory. This is done using cgroups.

The blog post is somewhat old[1] and not in sync with the version that is in Git[2], you might find a way to do this without Docker (I was using an old version of Docker and kernel that couldn't get it to work. But I need the old version of Docker for reasons.)

Chrome will aggressively consume any memory you give it (up to a point?) to "make your browsing experience better" somehow. You're not wrong. But there is modern technology that can make it better. If you have a fast SSD, then Chrome can still use Swap to make your experience better. The later version in the Dockerfile linked on Git also leverages swapaccount with the seccomp setting.

This may be one great use of Docker for people that wouldn't yet have been convinced to use Docker for any serious reason.

[1]: https://blog.jessfraz.com/post/docker-containers-on-the-desk... [2]: https://github.com/jessfraz/dockerfiles/blob/master/chrome/s...

tayo42 · on April 13, 2017

You can just use cgroups on their own without docker

tyingq · on April 13, 2017

I would guess ulimit -v might work as well, so you may not even need cgroups.

yebyen · on April 13, 2017

Hah! Something new is also something old. Thanks for that.

But won't you run into issues with child processes, of which Chrome tends to spawn a zillion? (I'm reading that each one gets its own limit under ulimit...)

And potentially get your browser killed when it hits the limit? I haven't tested this and I don't really know how ulimit works, but I think it's a less effective solution than the cgroups for at least one reason.

tyingq · on April 13, 2017

Yes, it is per process, so it would only limit memory-per-tab. I believe you get the "aw snap" window in chrome if it hits the limit.

acdha · on April 13, 2017

Disable your extensions one by one until you identify the one leaking memory.

tokenizerrr · on April 13, 2017

Or just look at the chrome process manager. Shift+Esc

acdha · on April 13, 2017

That doesn't tell you if an extension is injecting code which causes a page to bloat significantly.

tokenizerrr · on April 13, 2017

Ah right, good point.

Noseshine · on April 13, 2017

I'm not sure that the results should be expected to be the same in GUI and in headless mode. I don't know - I'm just saying this is not clear without a test or clarification from someone who knows how it all works.

kzisme · on April 13, 2017

I'm assuming they are referring to: https://www.chromestatus.com/features/5678767817097216

Or something pretty close to it.

robmcm · on April 13, 2017

Is it Chrome eating memory of the websites you have visited? Suspended tabs suggests issues, which may not free up till you close them.

tempodox · on April 13, 2017

Wow, that was a quick reaction.

Thanks to the maintainers for all the good work!

To me, it's always a sad occasion to see diversity diminished. Nothing against Chromium, but I hope it won't be the one browser to rule them all. It's always good to have alternatives.

gus_massa · on April 13, 2017

They know since June about the project. This is an interesting conversation:

https://groups.google.com/forum/#!topic/phantomjs-dev/S-mEBw...

( posted by askmike in https://news.ycombinator.com/item?id=14105613 )

mrfusion · on April 13, 2017

Naive question here. What makes headless mode so difficult?

pavlov · on April 13, 2017

It's a very good question. One might imagine that the browser renders everything into a buffer at some point, and you could simply ask the engine to give you a pointer to that data.

The reality is very different. WebKit/Blink rendering is intimately tied with the graphics system of each platform, in particular through the use of native widgets and native window system compositors.

For example, on the Mac, a lot of compositing within the browser window is done using Core Animation layers. This is a really good idea for performance, because it leverages the work done by Apple to improve their GUI performance.

The downside is that capturing the output becomes very tricky when the browser doesn't do the final compositing. Previously this didn't really matter because 99.99% of browser rendering is for end users and they don't need to capture the output (or if they do, they would just use platform GUI functionality like screen capture).

An increasing demand for headless rendering has effectively forced browser engine teams to rethink some of the internal APIs so that a pipeline can be built to capture the final rendering.

tpaschalis · on April 13, 2017

It's the same order of magnitude of work, as it is maintaining a fully-fledged browser. It's sad to see open-source projects shut down, but being a sole developer is a lot different than having the resources of a giant like Google, for example.

tal_berzniz · on April 13, 2017

Thank you for PhantomJS! Been using it for testing, generating PDFs and screenshot.

amelius · on April 13, 2017

Yes, thanks PhantomJS!

I'm wondering, are there any examples out there for generating screenshots on headless Chrome?

throwaway2016a · on April 13, 2017

From the docs it sounds like it is largely compatible with Selenium https://chromium.googlesource.com/chromium/src/+/lkgr/headle...

If that is true, the takeScreenshot() should work as normal hopefully.

veidr · on April 13, 2017

I can confirm that. Here's an example of how we take screenshots when automated tests fail, using Selenium + Chrome:

https://gist.github.com/masonmark/2332c1238a2fa70b5e4fcfffdc...

JepZ · on April 13, 2017

I am happy for the guy as he seems to be able to let go without letting anybody down (which seems to be important to him). At the same time, it is sad when people have such a pressure for something they probably started as a fun project.

_lwad · on April 13, 2017

Question for those of you more involved with such headless tasks. Do you think that chromium and firefox supporting headless will induce a surge in bots crawling the open web from now on?

heipei · on April 13, 2017

Right off the bat: No. The reason for that is that crawling using a proper browser (i.e. Chrome) is a lot more resource-intensive than using a dedicated tool which only gets the top resource and maybe tries to parse some additional resources. With these kinds of tools you're limited by available bandwidth and IO speeds if you want to store things. If you're looking at a browser, you'll be limited more by things like memory consumption and CPU time, so you'd need a bigger box or more of them to drive the same amount of traffic. There is also not the same amount of ready made applications which take care of crawling, storing and maybe even indexing your data, so not something you can do without actually implementing a lot of things yourself.

Of course, that is only talking about wide-scale scanning. If you're only looking to scrape a single target, for whatever reason, then having an instrumented headless browser will greatly simplify things. Headless Chrome should be more efficient than running it in a (virtual) framebuffer. Plus the whole setup for a powerful crawler is reduced to "install Chrome, start it, point $crawler at the API endpoint". My guess is that we might see turnkey crawling / automation tools appear where you supply a list of URLs and the library + Chrome does the rest. Then, browser-based large scale scanning will be within everyone's reach, only limited by their resources.

Background: I created https://urlscan.io which will simply visit a website and record HTTP interactions (annotating the resources with some helpful meta data). I've been preaching the power of headless instrumented browsers for the better part of a year now ;)

uitgewis · on April 13, 2017

Not a whole lot more than they are at the moment.

kyriakos · on April 13, 2017

With chrome headless we still need an api like phantomjs or slimerjs to have the same functionality.

malinens · on April 13, 2017

My colleague from automatic testing says that phantomjs actually is much more stable than chrome...

onion2k · on April 13, 2017

That's true, but in my experience the instability of Chrome comes from opening and closing it's windows repeatedly as a large test suite often does - occasionally it doesn't seem to like opening a new instance while another is closing. Headless mode should resolve that problem.

holtalanm · on April 13, 2017

There was also the issue a few years ago (last time I wrote automated scripts) where the Chrome driver for Selenium would go too fast for the browser to keep up, causing false failures.

I had to implement a "wait between actions" feature to handle it, while Phantom had no such problems. I'm assuming this will not be an issue with headless Chrome, since I think half of the problem was due to graphical rendering.

lambduh · on April 13, 2017

It's both sad to see an incredibly useful project be sunsetted and exciting that it's no longer needed. I remember a project that used phantomjs to scrape an old government camping site to build a compatibility layer on top.

Thank you, if you're reading.

tomphoolery · on April 13, 2017

Good riddance! PhantomJS is a non-stop firehose of random errors and productivity breakdowns. It's also a way better JS driver than anything else out there. I'm glad Google is following in their footsteps and integrating Phantom's features directly into Chrome, where it will be supported by a large team and (hopefully) headless use of the Blink engine will be standardized so your test integrity doesn't depend on a patch version upgrade of your underlying JS implementation.

So, cheers to you Vitaly and anyone else who's helped make Phantom & Poltergeist into my favorite Capybara web driver!

est · on April 13, 2017

This is sad, phantomjs is better stripped than Chromium headless, if you ever try to install Chromium on servers without X, it requires shit ton of dependencies, while phantomjs was properly modified requires only minimal library.

akx · on April 13, 2017

I don't really see the problem. After all, you just install those deps once in your Dockerfile, right? ;)

est · on April 13, 2017

chromium-browser requires these

    chromium-browser chromium-browser-l10n chromium-codecs-ffmpeg-extra cpp
    cpp-4.8 fontconfig fontconfig-config fonts-dejavu-core hicolor-icon-theme
    libasound2 libasound2-data libatk1.0-0 libatk1.0-data libatomic1
    libavahi-client3 libavahi-common-data libavahi-common3 libcairo2
    libcloog-isl4 libcups2 libdatrie1 libdrm-intel1 libdrm-nouveau2
    libdrm-radeon1 libfile-basedir-perl libfile-desktopentry-perl
    libfile-mimeinfo-perl libfontconfig1 libfontenc1 libgdk-pixbuf2.0-0
    libgdk-pixbuf2.0-common libgl1-mesa-dri libgl1-mesa-glx libglapi-mesa
    libgmp10 libgnome-keyring-common libgnome-keyring0 libgraphite2-3
    libgtk2.0-0 libgtk2.0-bin libgtk2.0-common libharfbuzz0b libice6 libisl10
    libjasper1 libjbig0 libjpeg-turbo8 libjpeg8 libllvm3.4 libmpc3 libmpfr4
    libnspr4 libnss3 libnss3-nssdb libpango-1.0-0 libpangocairo-1.0-0
    libpangoft2-1.0-0 libpciaccess0 libpixman-1-0 libsm6 libspeechd2
    libthai-data libthai0 libtiff5 libtxc-dxtn-s2tc0 libx11-xcb1 libxaw7
    libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0 libxcb-render0
    libxcb-shape0 libxcb-shm0 libxcb-sync1 libxcomposite1 libxcursor1
    libxdamage1 libxfixes3 libxft2 libxi6 libxinerama1 libxmu6 libxpm4
    libxrandr2 libxrender1 libxshmfence1 libxss1 libxt6 libxtst6 libxv1
    libxxf86dga1 libxxf86vm1 x11-common x11-utils x11-xserver-utils xdg-utils

phantomjs:

    fontconfig-config fonts-dejavu-core libfontconfig1 libjpeg-turbo8 libjpeg8

akx · on April 20, 2017

Err, that's... simply not right. The difference isn't that dramatic.

  $ docker run -it ubuntu:16.04 sh -c 'apt-get update && apt-get install phantomjs'
    134 newly installed. After this operation, 339 MB of additional disk space will be used.
  $ docker run -it ubuntu:16.04 sh -c 'apt-get update && apt-get install chromium-browser'
    202 newly installed. After this operation, 615 MB of additional disk space will be used.

tlhunter · on April 14, 2017

I'm assuming a lot of those dependencies (x11, libgtk) will disappear once installing a special headless version of chrome.

yeukhon · on April 13, 2017

Not everyone is using Docker.

Hernanpm · on April 13, 2017

Besides testing, my team uses phantomjs to convert pages to PDF and also to convert javascript generated charts to images. This is sad news, I'm don't think chrome eventually will add support for.

egeozcan · on April 13, 2017

There's wkhtmltopdf[1] when you need the PDF functionality. It is licensed with LGPL, so you can make it interop with your commercial product.

[1]: https://wkhtmltopdf.org/

petercooper · on April 13, 2017

And there's its sister project wkhtmltoimage for rendering to images as well :) Sadly, though, the browser seems way behind the times so I think Chrome will win the next round. I built a prototype library for rendering pages to animated GIFs using the Chrome debugging protocol, but headless will make it even easier: https://github.com/peterc/chrome2gif

eknkc · on April 13, 2017

While it's much better than PhantomJS, still causes a lot of issues. We run a paid pet project [1] for html to pdf conversion and most of our customers have stories beginning with PhantomJS then moving to wkhtmltopdf then finally going for something else due to issues with both.

Headless Chrome might solve this issue once and for all though.

[1] https://restpack.io/html2pdf

egeozcan · on April 13, 2017

For me the only issue was kerning with some specific fonts on windows servers. Do you have any other examples?

Globz · on April 13, 2017

I have been using it for almost 2 years on a production server to generate html orders to PDF on the fly, such an amazing tool.

throwaway2016a · on April 13, 2017

Second wkhtmltopdf ... I love PhantonJS but wkhtmltopdf is a better tool for PDFs.

Hernanpm · on April 13, 2017

did you have luck with SPA to PDF? I recall I tried it but with no luck for good javascript support.

throwaway2016a · on April 13, 2017

SPA as in Single Page Application?

We are using it for reports and we are tailoring our HTML specifically to the reports so we haven't had that issue. If I needed Javascript it may not be the best solution.

I like that it generates full PDFs (with real text objects) though not just a static image. I'm not sure if you can generate a PDF like that with PhantomJS. I haven't tried.

HarveyKandola · on April 13, 2017

We recently ended up using NightmareJS with xvfb in Debian Docker image to produce high fidelity PDF's.

Seems to work well so far.

Flenser · on April 13, 2017

Chrome will have headless save to PDF fairly soon: https://bugs.chromium.org/p/chromium/issues/detail?id=603559

masklinn · on April 13, 2017

You can use WebDriver to take screenshots. OTOH I don't think there's any way to do PDF generation without fucking around with injecting window.print() and trying to go from there.

djd20 · on April 13, 2017

Hey - what do you use for the pdf conversion specifically? We've been looking for something like this. Thanks in advance.

tannhaeuser · on April 13, 2017

Many thanks to the maintainer for his work. I think this isn't unexpected, and actually encourage other unpaid maintainers to follow. Reason I'm thinking this is that the current state of voluntary support is unsustainable anyway, and by letting it go we maybe could make the market for dev tools economically viable again.

ohitsdom · on April 13, 2017

PhantomJS is a great tool, I implemented it as a PDF report generating system. But will Chromium be able to replace it in this regard? Will Chromium have paging features? Will it be able to repeat table headers when a table body content extends to the next page?

jffry · on April 13, 2017

I can't answer all of your questions, but this thread may interest you: https://news.ycombinator.com/item?id=14102248

ohitsdom · on April 13, 2017

Huge help, thank you!

sitepodmatt · on April 13, 2017

You deserve a nobel peace prize.

I recall fondly informally introducing colleagues to chrome dev tools, injecting jquery via a booklet, and querying the dom like xml xpath. Then taking this headless (server-side) with almost minimal wrapping due to your work.

Hail you and damn regexp.. :)

nashashmi · on April 13, 2017

Anybody know how to port code from using PhantomJS to headless Chrome? I have been using CasperJS that wraps to PhantomJS. PhantomJS had its own set of commands. headless chrome will have to be different one way or another.

JimTheDev · on April 13, 2017

I wrote a bit about how you can get started here using Node: https://objectpartners.com/2017/04/13/how-to-install-and-use...

aantix · on April 13, 2017

Looks like it's already usable?

"Use headless chromium with capybara and selenium webdriver - today!" -- March 30th

http://blog.faraday.io/headless-chromium-with-capybara-and-s...

rogerwang · on April 14, 2017

NW.js v0.23 will support headless with Chromium 59. I've been collecting feature requests and sharing the plan. Please see https://github.com/nwjs/nw.js/issues/769#issuecomment-259867... and https://github.com/nwjs/nw.js/issues/769#issuecomment-294064...

elchief · on April 13, 2017

I found PhantomJS pretty unusable for my use-case. It always spits logs to stdout and there's no way to change that. Filed a bug and he said "fix it yourself"...sorry I don't know Qt.

scottydelta · on April 13, 2017

As a developer, I wonder how hard can it be to disable console logging on any project if the source code is provided.

cheeze · on April 13, 2017

Welcome to open source software. Somebody has to fix it.

Better than it being closed source and you being told to go away.

__s · on April 13, 2017

You don't need to know Qt to comment out printf

logn · on April 13, 2017

I maintain a web driver too (Selenium-based) and have been wondering about how it would compare to Selenium ChromeDriver headless. Mine is built in Java and uses Java's embedded WebKit. If anyone has feedback this is where I'm discussing it, https://github.com/MachinePublishers/jBrowserDriver/issues/2...

aaossa · on April 13, 2017

I think is a wise decision too. As an OSS collaborator is hard to explain how important/demanding this work is. I really understand his feeling, and I hope that more people like him could collaborate in OSS projects. Thanks for everything!

kyledrake · on April 13, 2017

PhantomJS was the first thing we used for Neocities screenshots and I've always had a special affection for this project for that reason. Neocities really wouldn't have been possible without the ability to do screenshots.

eric_b · on April 13, 2017

This is a great move. I appreciate what Phantom (and the maintainer) were trying to do, but I have always loathed PhantomJS. It has never worked well. In fact, I'd been away from it for some time, but just last night needed to install it to run some tests and it caused massive frustration.

I pulled a Node repo that ran tests using Karma (why people use Karma is a complete mystery to me). I pulled the repo, ran `npm install` and then `npm test`. Sure enough Karma explodes out of the gate.

Phantom can't start. I'm on Windows 8.1. I debug for an hour, eventually finding a magic custom binary Ariya created. I then have to copy this binary to the `/node_modules/karma-phantom-launcher/node_modules/phantomjs2-ext/bin` directory.

All this to run some Jasmine specs.

If Chrome headless support is really as good as "works just like Chrome without the GUI" then I will be one happy camper.

joshuaduffy · on April 13, 2017

I personally use Chrome at the moment as on larger projects PhantomJS as non forgiving with syntax errors and tests will simply fail.

bluepeter · on April 13, 2017

Where's Ariya in all this? He seems to have also completely abandoned Phantom...?

nrook · on April 13, 2017

Is there any way to donate to the PhantomJS project? This seems like a good time to throw some money their way, in thanks for what the maintainers (mostly Vitaly over the past few months, at least, it looks like) have done.

novaleaf · on April 14, 2017

I run phantomjscloud.com, I guess this is the writing on the wall and I better start building another [chrome] back end soon. Probably a name change is in order too!

stretchwithme · on April 13, 2017

I appreciate all the hard work.

In my last job, I used PhantomJS with highcharts to provide a web service for generating charts. And used it with the poltergeist gem for headless testing.

runnr_az · on April 13, 2017

Hey dude... thanks from a grateful dev in Scottsdale, AZ. Your hard work enabled a lot of really cool stuff for us! Good luck in your future adventures!

sillero · on April 13, 2017

Thank you for your work Vitaly, it's truly inspiring, you've made a difference for the community. Good luck on your next project.

amiga-workbench · on April 13, 2017

Thanks for the work, PhantomJS made generating screenshots to show different UI states in bulk effortless.

It has saved me days of effort over the last year.

gildas · on April 13, 2017

Thanks Vitaly for your work! SEO4Ajax would certainly not exist without PhantomJS. It helped us to deliver the service efficiently at the time.

Unfortunately, we had quite a few compatibility issues with it leading us to migrate to Chrome (with xvfb) one year ago. Since then, we must confess that we are very happy of this choice. Chrome is indeed very stable, fast and more importantly for us, always up-to-date.

m4k · on April 13, 2017

Have been using PhantomJS since couple of years for data scrapping. It's a really good project.

frik · on April 13, 2017

That's quite sad.

etatoby · on April 13, 2017

It might be for the best. One of the many companies using PhantomJS to make money could go ahead and employ Vitaly to work on the project full time.

redsummer · on April 13, 2017

I'm curious - What is the utility of headless browsers?

Are there people who earn money by getting it to automatically fill out forms, enter competitions etc?

avaer · on April 13, 2017

Testing, rendering, scraping, streaming, botting, and many more.

Think "what could I accomplish with a browser, with the slow human replaced by a fast program" and let your imagination run wild.

The space is interesting enough that people have jumped through a lot of hoops to make it work in the past; this makes it one less hoop.

Oh, and if you ever wondered why web captchas are a thing, one of the reasons is headless browsers.

demonshalo · on April 13, 2017

A great example would be PDF generation for things like invoices. Rather than generating a pdf with something like PHP or Java. Render a regular html page with all the css you want (super easy compared to drawing a PDF using PHP) and then proceed to use a pdf printer on that page.

You could run such a thing as a microservice using a headless browser or PhantomJS. There are probably better ways to do this but that's one of the first things that popped into my head!

tjelen · on April 13, 2017

The Webkit/PhantomJS PDF export actually supports SVG embedding (as real vectors), Webfonts and many other things.

It's possible to create pretty advanced layouts with maps, graphs using that. Even embedding IFrames with e.g. Google Maps works.

asdfgadsfgasfdg · on April 13, 2017

This sounds horrible.. In general all these html->pdf ways of generating pdfs sounds horrible -- why don't people use latex for this?

SigmundA · on April 13, 2017

Because you probably already did the layout work in HTML to display on screen to the user and now just want a PDF version of it.

Or you can redo the layout in latex and maintain two layouts.

The full print css is actually pretty complete, problem is the only browser that fully supports it is PrinceXML. None of the major browsers seem to care much about print layout.

asdfgadsfgasfdg · on April 13, 2017

But HTML layout is very different to page based layout. HTML is responsive and has no concept of pagination. PDF is paginated and has no concept of responsiveness.

IneffablePigeon · on April 13, 2017

CSS2 has the concept of paged layouts: https://www.w3.org/TR/CSS2/page.html

demonshalo · on April 13, 2017

It is NOT horrible. However, Latex is another way of doing things. Perhaps your requirements should dictate which of the two you should go for!

tzs · on April 13, 2017

A few little personal things I've done with PhantomJS:

• A script that would go to Comcast's TV schedule for my area and make a list of all movies upcoming in the next two weeks on all channels that are included in my subscription. I could then grep that for a list of movies I've been looking for.

I couldn't just grab the page with curl and parse it, because JavaScript does most of the work. JavaScript fetches the listings, and when you advance the listings it it fetches the new listings and replaces the old ones on the page.

• A script that goes to the FCCs license information site and gets a list of all ham radio callsigns issued recently [1].

• A script that given a URL to a tactics problem on lichess gets the FEN for the position. I'd use this if I was doing tactics training there on my iPad and did not understand why my answer was wrong or why their answer was right.

I'd mail myself a link to the problem, and then later on my desktop I'd give that URL to this script and it would go to lichess to the problem page, and then from there to the board editor page for that position, and grab and give me the FEN, which I could then use to set up the position in Stockfish to analyze.

(This is no longer useful. They have made some changes at lichess and now they have a browser-based version of Stockfish on the problem pages, so I can answer my questions right there).

• A script that goes to everquest.com and gets the server population levels from the server population display on that page.

I don't think that there was anything in this one that actually needed a headless browser. As far as I recall it could have all been done with getting the page with curl and parsing it. It was just easier to do it in JavaScript using the DOM. (The lichess one may also have been that way).

[1] https://github.com/tzs/todays_hams

praptak · on April 13, 2017

Web scraping for sites that discourage simple robots by checking for JS or serving content via JS.

Also, the darker stuff like click fraud and all the other kinds of fraud where you pretend there are humans doing something when in fact there's just a bot.

tomp · on April 13, 2017

I wrote a PhantomJS script to download data from my bank accounts. They offer no API and disfunctional text-based exports, their websites are ridden with "good" (=terrible) web & security practices, like 3-characters-of-the-whole-password authentication, single-tab sessions, frames, etc. that makes it pretty much impossible to scrape with Python but relatively easy with a fully-fledged browser (although that still requires a lot of bank-specific boilerplate code).

slig · on April 13, 2017

If your bank has a mobile app, it might be easier to MITM and figure out their API and use it directly.

cr0sh · on April 13, 2017

That actually sounds like it could run you into legal issues (or worse), depending on your location (ie - access to a computer system without permission; they give you permission to use the app on the phone, but maybe not to use the API directly). YMMV.

sjtgraham · on April 13, 2017

Who do you bank with?

tomp · on April 13, 2017

NatWest, Halifax and AMEX (not a bank, but I want my account data from there as well).

sjtgraham · on April 13, 2017

I can't help you with Halifax or AMEX (yet), but my company (https://teller.io/) has a Natwest API in production (private beta). If you would like access, please ping me. sg -at- teller.io

acdha · on April 13, 2017

I use https://github.com/bfirsh/needle/blob/master/README.md for automated UI regression testing. Using a headless browser means your test suite can run faster and with fewer dependencies.

etatoby · on April 13, 2017

Downloading any kind of web page where a simple wget or curl turns out empty, for instance anything made with React or other advanced JS frameworks.

mvitorino · on April 13, 2017

Server side automation of multiple kinds (not just tests): screenshots, advanced crawlers, etc.

dec0dedab0de · on April 13, 2017

At work I was given permission by a vendor to screen scrape their site while they worked on building a real API. This site was extremely dependent on javascript. Including doing some really complex token passing between multiple domains that the company owned. Not to mention all of their js was minified and uglified so I had a very hard time understanding what it was doing.

It was the first time I wasn't able to successfully reverse engineer a site enough to scrape what I needed with just requests/beautifulsoup. I was however able to get it working just fine using phantomjs via selenium via splinter. It was a fun exercise, but part of me still feels like it was cheating.

wfunction · on April 13, 2017

While we're on the topic, does anyone know where one might find scripts to scrape bank statements so you don't have to download them manually every month? (This is one thing I would find headless browsers useful for...)

RandomBookmarks · on April 13, 2017

>Scrape bank statements...

There is Kantu: https://www.a9t9.com/kantu/web-automation

It uses screenshots and OCR to automate web browsing and scraping, to you do not even have to "touch" the DOM. What you do is you simply draw a frame around the areas that you need to have extracted and OCR'ed. It also works with PDFs.

thibaut_barrere · on April 13, 2017

Boobank (http://weboob.org/applications/boobank) is such a collection of scripts (althought mostly centered on french banks).

vidarh · on April 13, 2017

If your authentication is basic enough, it should be a problem to write one. The problem with a lot of banks is things like two-factor auth to do anything. If I didn't have a mortgage locked in at a crazily great rate, I'd consider changing banks just to get one that'll let me automate statement downloads. The alternative is to build a device to press buttons on their 2FA device... Come to thing of it that might be a fun hack.

wfunction · on April 13, 2017

2FA isn't my issue here. Actually downloading the statements is. Lots of banks use JS, some go through really weird hoops getting you the PDF that are difficult for non-experienced people to automate. Stuff like JS in embedded iframes that generate the link on the fly and open a new tab that you have to navigate. It's hard to accurately detect all the links and handle things like "Next Page" and so on, especially for more than one bank. It's quite nontrivial.

vidarh · on April 13, 2017

If you don't have 2FA issues, then while I agree it's non-trivial, it's certainly doable with a headless browser. But yes, I'd love for there to be simpler ways to do this in general.

sjtgraham · on April 13, 2017

Who do you bank with? I might be able to help.

sjtgraham · on April 13, 2017

For UK banks check out https://teller.io/ for an API to your bank account (Disclosure: my company)

pricechild · on April 13, 2017

Every UK bank I've used allows some format of csv/qif/ofx export.

Are you in the US?

vidarh · on April 13, 2017

They generally do. But many UK banks make it hard to automate things. Either insisting on 2FA in all cases, or having a secondary login without 2FA that only gives very limited access.

Some way of authorising API access to read-only access to things like statements would be fantastic, to the extent that I'd consider changing banks over it, if you know of any UK banks that offer it.

thibaut_barrere · on April 13, 2017

As well, most bank OFX/CSV exports I've dealt with are truncated in some ways (e.g. truncated labels), which make it harder to really leverage sometimes.

nsteel · on April 13, 2017

Monzo will be offering current accounts soon.

https://monzo.com/blog/2017/04/05/banking-licence/

pricechild · on April 17, 2017

Ahh of course. Apologies for missing that.

wfunction · on April 13, 2017

The format isn't the issue. The problem is I want it to be API-friendly so I don't even have to think about it; my system should download it automatically.

But yes, I'm talking about the US.

lightbyte · on April 13, 2017

My bank offers that, but it costs I believe ~$15 a month.

ble · on April 14, 2017

I used PhantomJS as part of a report generation pipeline which served no HTTP requests and contacted no outside servers. We made PDFs and ready-to-email, single-file HTML reports with some minimal interactive features. (Ready-to-email, single file == all images turned into data URIs, styles inlined, for HTML files sent as attachments)

PhantomJS loaded up an HTML file written earlier in the pipeline. The HTML consisted of a big slug of JSON containing all the relevant data (which would vary from one run to another) and a bunch of scripts and templates (which were fixed for any given report type). The scripts built into the HTML file would chew up the JSON slug and build up the DOM required for the report. Then the PhantomJS script would identify all the images in the DOM and replace all of them with data URIs, strip out the JSON slug to prevent giving away more data than contained in the DOM, and strip out all of the templating JavaScript, leaving behind only the JavaScript needed for the interactive features, which was inlined.

We went with PrinceXML for PDF generation. I was briefly nervous because I saw people praising PhantomJS' pdf generation capabilities... but then I saw the people saying, "we used PhantomJS for pdf generation, we used wkhtmltopdf, then we just paid some money to get something that wouldn't produce weird output some of the time." CSS Paged Media Module FTW, y'all.