Pix2code: Generating Code from a GUI Screenshot (github.com/tonybeltramelli)
290 points by visarga on May 26, 2017 | 66 comments

Oh, man, if only the designs I got were this simplistic and only used standard controls with the standard look and feel.

This is a very interesting start, but it's a long, long way from being able to represent, even in a simple layout, what I am asked to do on a regular basis as an iOS dev.

Maybe combine this with a PaintCode back-end and add markup for layout behaviors?

"Design hubris" is such a common problem for mobile/front end developers and the source of so much wasted time I'm surprised it's not talked about more commonly. Shit's expensive, man!

I am thankful that these days I get to work with talented designers who know what a UITableView is and understand the limitations of the medium.

It's also funny when the design has some short placeholder text somewhere. Then the actual real text ends up being longer and won't fit there, or will look like crap.

I wrote a system with this concept back in the early 00s, to let designers drag/drop a design on-screen and then read the positioning of the dropped elements to compute the HTML that would produce their design... it was a dismal failure. What I found was that designers who wanted to work with the web didn't want an easy-to-use tool. They either wanted to dig in and learn HTML themselves, or just keep using Photoshop. And end users building personal sites didn't want to innovate on their designs, they wanted to use pre-made templates and just fill in their own text, change colors and images, etc.

Maybe the pervasiveness of computing has increased to the point where there would now be a market for such things... it has been 15 years since I failed to find an audience, after all. I wouldn't hold back on developing the technology, but I'd put a decent amount of effort into finding product fit before getting too excited.

As an engineer, I'd be interested in the quality of the code generated by this tool. I have too many nightmares from the early days of web dev where generated code was a mess because FrontPage.

It can be done, but it's rare. I actually really enjoyed working with Expression Blend (Microsoft's front end xaml wysiwyg design program). The code that it outputted was usable, which after seeing the things that dreamweaver made was a breath of fresh air.

Agreed. Dreamweaver and FrontPage committed acts of mass monstrosity with the code generated.

I submitted this to some subreddits and a couple of people asked some quite similar questions that I'd like to see a video for, in case OP here is the author or in case the author sees this.

>What would happen if I put something other than a clear mockup as input? Replications?

>What happens if you try to scan console output [1] or this mess [2]?

1: http://i.imgur.com/LJqJUnF.png

2: http://i.imgur.com/OMbZsgd.png

Side rant: the page has a bibtex entry for arxiv. Look, it's all well and good to publish early and often, but I find I still have some reservations.. I mean, if I use this as a basis for some future work, obviously I will cite it, but I don't particularly feel good about citing an arxiv publication. It completely side-steps the peer review process, and I feel that in the long run, that is bad.

You might say, well working code is working code, and sure I've cited software in the past, and having an article to go with it is even better, but it's getting to the point that people are using arxiv not as a preprint service but as a publishing platform. I find this frustrating for two reasons: 1) like I said, it skips peer review and even allows people to cite rejected papers (for better or for worse), and 2) it makes the race to publish that much more severe -- now if I wait until a conference or journal publishes my work, I'm 6 months behind the guy who just uploads it to arxiv and is already getting dozens of citations in current work.

So, perhaps this is not the place for this debate, but putting aside the fact that preprint does seem like a useful way to "pre-publish", do you think it's appropriate to cite preprint papers and work? What are the implications for computer science as a research field down the road, since "free for all" seems to be taking over as a publication medium?

I know this will come off as being old fashioned, but I'm really worried about where research publication is going in this field. It feels like a knee-jerk reaction to first-to-publish pressure, rather than something that is a well thought-out solution.

In reality, the competition isn't between "posting to arXiv" and "getting peer-reviewed". The competition is between "posting to arXiv" and "posting to your blog". If you succeed in manufacturing a stigma against arXiv papers, then you're just going to encourage people to make blog posts instead, which can and do go down at any time.

> If you succeed in manufacturing a stigma against arXiv papers

That's not exactly my intention, and your statement is in line with what I said..

> sure I've cited software in the past, and having an article to go with it is even better

where "software" could also include blog posts. It's just that, yeah, when it comes to it if I have to choose between citing something some guy wrote on a blog or citing Arxiv, I'll choose the latter every time.

But my question was not about whether arxiv is an appropriate medium for publishing ideas, I'm certainly not arguing that it should go away, but rather whether it's an appropriate thing to be citing in a scientific context. i.e. in derived work.. should it be "okay" for people to publish to arxiv and just.. leave it there and not put it in a conference or journal? Should such work be validated by citation?

It's a legitimate question that I don't know the answer to. If you get an idea from there, you can't just.. not cite it.. and yet, I feel like arxiv should be used only as long as the work will eventually get a proper publication. Which more and more it seems is not a given.

I'm really just responding to the statement on the posted link, which is just a header, "Citation", and an Arxiv-bibtex entry. No "submitted to X..", "in press", etc., or anything.

Again, it's not that I'm totally against the idea, but I think there must be some happy medium between "peer reviewed work" and "well-written but unreviewed article". Typically this used to be conferences, but conferences are expensive, and with people uploading their pre-conference unaccepted publications and those getting citations, I mean.. where will this end?

I was a bit shocked recently while writing an ML article to find that the "official" original reference, that you see cited everywhere, for the whole concept of "style transfer" appears to be an arxiv paper [1]. My reservation doesn't so much come from the fact that the authors put their work there, but more that this is what people are actively choosing to cite over their peer-reviewed conference publication [2]. When does "pre-print" become just "print"?

(Currently Google Scholar reports 216 citations for the former and 74 citations for the latter. Of course they were published a few months apart so it's not a great comparison but still, just an example..)

[1]: https://arxiv.org/abs/1508.06576 [2]: http://www.cv-foundation.org/openaccess/content_cvpr_2016/ht...

I think the reason for this might be because a lot of machine learning research and practice takes place outside of academia. Maybe there isn't the same pressure to publish in official sounding places.

And people outside of academia often don't have access to journal publications. Who wants to spend $30 per paper? That's just obscene. The arxiv link is accessible everywhere, will never go away (probably), and can be updated as the authors revise the paper. Why not cite it?

Your argument is ridiculous.

Conferences and journals are merely marketing venues, and there is no reason to slow down the field by clinging to a flawed review process. If you adopt this attitude, you will only find your co-authors and students abandoning you for fear of getting scooped.

In the case of building a nuclear reactor, the knowledge one relies on must be vetted by experts.

In the case of building web apps and such, sure, why not put it out there ASAP?

Huh. I guess I didn't succeed in getting my point across. I suppose it is a bit subtle, so I shouldn't be surprised. What I was expressing skepticism about was not whether or not things should be uploaded early to arXiv, but whether arXiv should subsequently be considered the reference for that work, rather than preferring a peer-reviewed path.

I wrote a similar system and plan to release the source for mine over the next few weeks. It produces working code that can run on Linux, Mac, and Windows.

Did we collectively give up on building UI-building/organization tools that are easy enough for designers to use?

I understand that you can't control what tools designers use (whether Sketch or Photoshop or MS Paint), but building a UI tool they don't hate using seems like the way simpler solution. There are already mockup apps, with even basic functionality included, that designers use...

Microsoft Expression Blend is very good.

It's best to make it so easy for a designer to create responsive (and even adaptive) layouts, for both native mobile and web, that this AI-based contraption, which is guaranteed to produce wrong outcomes some of the time, becomes unnecessary for any practical purpose. (Unless the goal is the automated copying of web pages, which is a much bigger problem than guessing the markup, given how much complex dynamic behavior is built into pages these days.)

I've taken an attempt at simplifying the task of building responsive and adaptive React Native apps with the following little library.


Looking forward to when the code actually becomes available.. until then, it is a cool demo video :)

Awesome! Before I got into web design and became a "full stack developer" I wanted this same thing... I asked about it on a forum. I'll have to read the article/check it out to see if it literally does something like pixel mapping, or OpenCV (throw that in there to be safe), or "bit mapping?"

edit: I didn't ask if this was for web or app; it looked like it was for applications... I wouldn't know how to do layout on those, but I can do it with HTML/CSS. Maybe with Electron you could do that for desktop apps; not sure about Android/iOS though.

The demo video shows two examples, an iOS UI and a web UI; it looks like it's using Bootstrap for the web UIs.

Ahh thanks for the clarification, I probably should have watched it full screen.

I suppose once you know how layout works with XML (Android) it wouldn't be hard to translate... assuming you've got the layout down. It would be interesting to see its decisions on whether things are grouped together or positioned independently, like a menu icon. Unless it's absolutely positioned and not responsive/dynamic.

In any case, thanks.

Neat concept, but it wouldn't work for creating modern GUIs, where half the controls are invisible until moused over (desktop), or hidden behind hamburger menus and swipe gestures (mobile).

Not only that, but all these magical solutions solve only the easy part of development, which is laying down the widget-based layout. The actual hard parts (different behavior based on context such as viewport size/orientation, the actual UI business logic) are an afterthought.

Impressive work, yet misguided in many ways in my opinion.

It is a cool concept. However, what it doesn't seem to cover, and what would fix the issue you are describing, is coding from a set of images. Given a set of wireframes, with the expected post-effects and interactivity hints interleaved, I would expect this generator to build code that covers all the specs, balancing the spec's needs and holding the stack in memory until all the known elements are accounted for before writing the final code.

Interesting; I wonder how it handles much more complex setups. I find a big part of taking a design from Photoshop and getting it running on iOS/Android is often about working out what the constraints are: does an element sit at an absolute distance from a screen edge, does it sit a specific distance from another element, does it expand to fill an area? You obviously have to do this to ensure it works at different screen resolutions. These would certainly be hard decisions for an AI to get right on a relatively complex screen, but then again, maybe with enough training it could actually do really well and solve these problems in a completely different way to how a human would. There is also the question of whether any of the information is dynamic (text being received from a server), in which case elements need to be able to adapt to different sizes, etc. Again, that's hard for an AI to glean from a Photoshop image alone.

I think this is just creating the layout files, all the work of wiring it up still has to be done by the programmer. This is really cool and probably a glimpse of the future, but honestly it would probably be easier to train the designer on how to use the drag and drop UI builders that already come with these platforms.

That assumes the designers want to learn them. Sure, some of them are happy to, but this negates the need for them to do anything other than concentrate on their area of concern.

Why train someone to do grunt work when the machine can do it?

Translating picture designs into html files is not really that much fun.

Now the next step (probably more difficult) is that you need a way to let programmers "hook" into the generated output in order to further tweak or customize it, but _without_ having to modify the generated files.

_That_ would make this technology really viable.
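One pattern that can provide such hooks (a hypothetical sketch, not anything pix2code actually implements; all class and method names below are made up): the generator emits a base class containing no-op extension points, and hand-written code subclasses it, so regeneration never clobbers the customizations.

```python
# --- generated_view.py: would be regenerated, never edited by hand ---
class GeneratedLoginView:
    def render(self):
        # Generated markup for the mocked-up login form.
        parts = ['<form>', '<input name="user">', '</form>']
        return self.post_process('\n'.join(parts))

    def post_process(self, html):
        # Extension point: no-op by default, override to customize.
        return html

# --- my_view.py: hand-written, survives regeneration ---
class LoginView(GeneratedLoginView):
    def post_process(self, html):
        # Tweak the generated output without touching the generated file.
        return html.replace('<form>', '<form class="branded">')

print(LoginView().render())
```

This is essentially the role partial classes play in the C#/XAML toolchain mentioned elsewhere in the thread: generated and hand-written halves of the same type live in separate files.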

It looks like a very good first step. I wonder how structured/semantic the generated HTML would be, for engineers to "hook" the generated code up to a backend.

Very cool, I wonder if the web one ends up responsive given a bootstrap-y model.

A tool like this might produce a UI that looks right on the surface. But what about things like accessibility (e.g. for blind users with screen readers) that even many human developers don't get right?

This is just image-to-layout. The code here is really just a nested tree structure. Even though it is still a very interesting result, calling it pix2code might be a little misleading.
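To make that point concrete, here is a toy sketch (with made-up token names, not the paper's actual DSL) of how a flat token stream from a decoder maps onto a nested layout tree:

```python
# Toy parser: a pix2code-style DSL is a flat token stream with braces,
# so recovering the layout tree is a simple bracket-matching parse.
def parse_dsl(tokens):
    """Parse tokens like ['stack', '{', 'btn', '}'] into nested
    (name, children) tuples."""
    def parse_block(i):
        children = []
        while i < len(tokens) and tokens[i] != '}':
            name = tokens[i]
            i += 1
            if i < len(tokens) and tokens[i] == '{':
                sub, i = parse_block(i + 1)
                i += 1  # consume the closing '}'
                children.append((name, sub))
            else:
                children.append((name, []))  # leaf widget
        return children, i

    tree, _ = parse_block(0)
    return tree

tokens = ['stack', '{', 'row', '{', 'btn-green', 'btn-red', '}', '}']
print(parse_dsl(tokens))
# → [('stack', [('row', [('btn-green', []), ('btn-red', [])])])]
```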

This reminds me of a tool AutoHotkey had that let you rip existing UIs. It would read the window and rebuild the UI with the exact layout.

SmartGUI I think.

This would address a huge need in mobile UI if we could upload different screenshots for different screen sizes (e.g. iPad and iPhone, or different iPhone orientations) and it would autogenerate an efficient set of dynamic Auto Layout constraints, content hugging, etc. to achieve both UIs.

What about PSDs? PSD-to-HTML is a big market, and a product like this could be a good fit.

PSDs are layered, and a correctly organized PSD file with some layer naming conventions should contain enough information to generate layout code without needing to use machine learning.
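As a sketch of that idea (the naming convention here is hypothetical, not from any actual product): with layered input, generation can be a deterministic mapping from layer names to markup, no vision model needed.

```python
# Hypothetical convention: a layer named "btn:Submit" becomes a button,
# "txt:Hello" a paragraph, "img:logo" an image. Given layered files,
# code generation is plain string mapping -- no machine learning required.
LAYER_MAP = {
    "btn": "<button>{}</button>",
    "txt": "<p>{}</p>",
    "img": '<img alt="{}">',
}

def layer_to_html(layer_name):
    kind, _, label = layer_name.partition(":")
    return LAYER_MAP.get(kind, "<!-- unknown layer -->").format(label)

print([layer_to_html(n) for n in ["btn:Submit", "txt:Hello", "img:logo"]])
# → ['<button>Submit</button>', '<p>Hello</p>', '<img alt="logo">']
```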

Altia has a product (PhotoProto) that can import PSDs to their tool which can then generate C code for embedded systems.


By generating the "code", is it merely inferring/representing the UI layout as some kind of tree? E.g. each button/panel/textbox gets detected, and an RNN/LSTM generates a tree structure using attention.

"merely" is not the word I would use.

If you are somewhat familiar with the DL literature, you will see that while this paper has a very interesting angle, the underlying architecture is a standard encoder-decoder network, with a CNN encoder and an LSTM decoder. This kind of application has been studied before:


The above paper shows nice results turning images into LaTeX expressions, and images into HTML.
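For readers unfamiliar with the pattern, here is a shape-level sketch in NumPy of the data flow being described. Random, untrained weights stand in for the CNN and LSTM (so the emitted tokens are meaningless), and the vocabulary is made up, but the encode-then-greedily-decode loop is the same idea.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<start>", "btn", "row", "<end>"]  # toy DSL vocabulary

img = rng.random((64, 64, 3))               # the input screenshot
enc_w = rng.random((64 * 64 * 3, 16))
features = img.reshape(-1) @ enc_w          # "CNN" stand-in: 16-d feature vector

dec_w = rng.random((16 + len(VOCAB), len(VOCAB)))
token, out = "<start>", []
for _ in range(5):                          # "LSTM" stand-in: greedy decode
    onehot = np.eye(len(VOCAB))[VOCAB.index(token)]
    logits = np.concatenate([features, onehot]) @ dec_w
    token = VOCAB[int(np.argmax(logits))]   # pick the most likely next token
    if token == "<end>":
        break
    out.append(token)
print(out)
```

A real model conditions the decoder on a learned recurrent state rather than a single matrix multiply, but the interface is the same: image in, token sequence out.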

How does this compare to WYSIWYG editors?

I seriously hope that AI can soon take over the majority of the (usually mundane) work involved in creating CRUD applications, leaving the programmer to simply customize certain parts or write very specific business or validation logic.

It might drive down the wages of _some_ programmers but it will at the same time free us to work on more interesting problems.

I'm hoping that it would just free us from work entirely ;-)

Maybe OT, but "free from work" is kind of an oxymoron.

You can't be free if you don't produce. You will be a slave to people who do produce.

If you lack the imagination to consider such a future, learn from the best and read the Culture series of novels.

Imagination is easy. Doesn't mean it's viable (or desirable).

You cannot imagine your way out of human nature.

You'd need, at minimum, the entire population to be of sufficiently high IQ and high agreeableness (and low aggressiveness). This kind of population does not exist, and probably will not exist in the foreseeable future. If it did exist, it would probably be run over by another population of high aggressiveness. So unless the population of the entire planet is like that, this will continue to be mere imagination.

Mate, my grapes need attention that I cannot give because I'm working.

My home-made spumante takes time that I have to take from other tasks, such as laying at the beach.

My craft beers won't make themselves

I have tomatoes, potatoes, beans and fruit to grow

And of course programming as a hobby.

I would have so much to do, if I didn't have to work…

AI will take care of your grapes and your craft beers will indeed make themselves, as will your tomatoes, potatoes, and beans.

Laying at the beach will get boring after two weeks probably.

I think there was a comment on HN yesterday that is relevant now: oh, the wondrous things we could do with Haskell if we neither had nor needed jobs.

> AI will take care of your grapes and your craft beers will indeed make themselves

That's even better!

I love watching others doing things

> Laying at the beach will get boring after two weeks probably.

You really lack imagination then…

Maybe your internal robot is still too strong

You will get it, eventually

> Laying at the beach will get boring after two weeks probably.

Learn to surf and change your mind.

You cannot imagine your way out of human nature, but you should imagine that you might not know exactly what that human nature is.

You don't have to know it exactly. You already know enough. There are enough people on the planet with low IQ and/or high aggressiveness to make this infeasible.

so low IQ and high aggressiveness is human nature?

I'm always wondering why this kind of task can't be largely automated already. It's not like you need AI to generate what largely amounts to default plumbing code.

It shouldn't be that difficult to generate back-end code for say a Bootstrap template. It should even be possible to generate both a Bootstrap template and the corresponding back-end code from something like a Balsamiq mockup.

The problem with generated code though used to be that it tended to both break quickly when confronted with non-default requirements and to make customisation by (human) developers much more difficult and cumbersome.

So perhaps the route to largely automatically generated CRUD apps isn't paved so much by the code generation process itself (AI-involved or not) but by how easily the generated code can be extended afterwards. I envision never having to touch the actual CRUD code; I'd rather have that CRUD code provide extension points for additional services or functions to tack onto.
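The "default plumbing" point can be made concrete with a toy generator (the spec format below is hypothetical, not from any real tool): emitting CRUD route stubs from a declarative model description is plain templating, no AI involved.

```python
# Toy spec-driven CRUD generation: one declarative model description
# mechanically expands into the five standard REST routes.
SPEC = {"model": "article", "fields": ["title", "body"]}

def generate_routes(spec):
    m = spec["model"]
    return [f"{verb.upper()} /{m}s" + path
            for verb, path in [("get", ""), ("post", ""),
                               ("get", "/<id>"), ("put", "/<id>"),
                               ("delete", "/<id>")]]

print(generate_routes(SPEC))
# → ['GET /articles', 'POST /articles', 'GET /articles/<id>',
#    'PUT /articles/<id>', 'DELETE /articles/<id>']
```

The hard part, as the comment above notes, is not this expansion but leaving clean seams for the non-default requirements that always follow.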

It won't drive down wages, because of Jevons paradox: a commodity that is used more efficiently becomes more in demand.

It will drive down the wages of less experienced programmers: relatively less skilled programmers who still make more than the average wage of the general population.

gRPC already automates a huge chunk of the routine of creating CRUD apps.

The video says the datasets will be available on the Github repository, but I don't see anything...

The GitHub repo says it will be available on the repository later this year.

`To foster future research, our datasets consisting of both GUI screenshots and associated source code for three different platforms (ios, android, web-based) will be made freely available on this repository later this year. Stay tuned!`

Mapping screenshots to code is not hard. Having the model simply memorize the screenshot-to-code mappings of the training data can give you almost 100% accuracy (for some demo). What is hard is, given a new screenshot, how well the model generalizes. Having something work for mobile is a much easier task than having something work for other, more complex UIs, though. Looking forward to seeing more updates on this!

Yep; for example, more dynamic UI such as tables, lists of components (for example, kanban swim lanes), etc.
