Pix2code: Generating Code from a GUI Screenshot (github.com/tonybeltramelli)
290 points by visarga on May 26, 2017 | 66 comments

Oh, man, if only the designs I got were this simplistic and only used standard controls with the standard look and feel.

This is a very interesting start, but it's a long, long way from being able to represent, even in a simple layout, what I am asked to do on a regular basis as an iOS dev.

Maybe combine this with a PaintCode back-end and add markup for layout behaviors?

"Design hubris" is such a common problem for mobile/front end developers and the source of so much wasted time I'm surprised it's not talked about more commonly. Shit's expensive, man!

I am thankful that these days I get to work with talented designers who know what a UITableView is and understand the limitations of the medium.

It's also funny when the design has some short placeholder text somewhere. Then the actual real text ends up being longer and won't fit there, or will look like crap.

I wrote a system with this concept back in the early 00s, to let designers drag/drop a design on-screen and then read the positioning of the dropped elements to compute the HTML that would produce their design... it was a dismal failure. What I found was that designers who wanted to work with the web didn't want an easy-to-use tool. They either wanted to dig in and learn HTML themselves, or just keep using Photoshop. And end users building personal sites didn't want to innovate on their designs, they wanted to use pre-made templates and just fill in their own text, change colors and images, etc.

Maybe the pervasiveness of computing has increased to the point where there would now be a market for such things... it has been 15 years since I failed to find an audience, after all. I wouldn't hold back on developing the technology, but I'd put a decent amount of effort into finding product fit before getting too excited.

As an engineer, I'd be interested in the quality of the code generated by this tool. I have too many nightmares from the early days of web dev where generated code was a mess because FrontPage.

It can be done, but it's rare. I actually really enjoyed working with Expression Blend (Microsoft's front end xaml wysiwyg design program). The code that it outputted was usable, which after seeing the things that dreamweaver made was a breath of fresh air.

Agreed. Dreamweaver and FrontPage committed acts of mass monstrosity with the code generated.

I submitted this to some subreddits and a couple of people asked some quite similar questions that I'd like to see a video for, in case OP here is the author or in case the author sees this.

>What would happen if I put something other than a clear mockup as input? Replications?

>What happens if you try to scan console output [1] or this mess [2]?

1: http://i.imgur.com/LJqJUnF.png

2: http://i.imgur.com/OMbZsgd.png

Side rant: the page has a bibtex entry for arxiv. Look, it's all well and good to publish early and often, but I find I still have some reservations.. I mean, if I use this as a basis for some future work, obviously I will cite it, but I don't particularly feel good about citing an arxiv publication. It completely side-steps the peer review process, and I feel that in the long run, that is bad.

You might say, well working code is working code, and sure I've cited software in the past, and having an article to go with it is even better, but it's getting to the point that people are using arxiv not as a preprint service but as a publishing platform. I find this frustrating for two reasons: 1) like I said, it skips peer review and even allows people to cite rejected papers (for better or for worse), and 2) it makes the race to publish that much more severe -- now if I wait until a conference or journal publishes my work, I'm 6 months behind the guy who just uploads it to arxiv and is already getting dozens of citations in current work.

So, perhaps this is not the place for this debate, but putting aside the fact that preprint does seem like a useful way to "pre-publish", do you think it's appropriate to cite preprint papers and work? What are the implications for computer science as a research field down the road, since "free for all" seems to be taking over as a publication medium?

I know this will come off as being old fashioned, but I'm really worried about where research publication is going in this field. It feels like a knee-jerk reaction to first-to-publish pressure, rather than something that is a well thought-out solution.

In reality, the competition isn't between "posting to arXiv" and "getting peer-reviewed". The competition is between "posting to arXiv" and "posting to your blog". If you succeed in manufacturing a stigma against arXiv papers, then you're just going to encourage people to make blog posts instead, which can and do go down at any time.

> If you succeed in manufacturing a stigma against arXiv papers

That's not exactly my intention, and your statement is in line with what I said..

> sure I've cited software in the past, and having an article to go with it is even better

where "software" could also include blog posts. It's just that, yeah, when it comes to it if I have to choose between citing something some guy wrote on a blog or citing Arxiv, I'll choose the latter every time.

But my question was not about whether arxiv is an appropriate medium for publishing ideas, I'm certainly not arguing that it should go away, but rather whether it's an appropriate thing to be citing in a scientific context. i.e. in derived work.. should it be "okay" for people to publish to arxiv and just.. leave it there and not put it in a conference or journal? Should such work be validated by citation?

It's a legitimate question that I don't know the answer to. If you get an idea from there, you can't just.. not cite it.. and yet, I feel like arxiv should be used only as long as the work will eventually get a proper publication. Which more and more it seems is not a given.

I'm really just responding to the statement on the posted link, which is just a header, "Citation", and an Arxiv-bibtex entry. No "submitted to X..", "in press", etc., or anything.

Again, it's not that I'm totally against the idea, but I think there must be some happy medium between "peer reviewed work" and "well-written but unreviewed article". Typically this used to be conferences, but conferences are expensive, and with people uploading their pre-conference unaccepted publications and those getting citations, I mean.. where will this end?

I was a bit shocked recently while writing an ML article to find that the "official" original reference, that you see cited everywhere, for the whole concept of "style transfer" appears to be an arxiv paper [1]. My reservation doesn't so much come from the fact that the authors put their work there, but more that this is what people are actively choosing to cite over their peer-reviewed conference publication [2]. When does "pre-print" become just "print"?

(Currently Google Scholar reports 216 citations for the former and 74 citations for the latter. Of course they were published a few months apart so it's not a great comparison but still, just an example..)

[1]: https://arxiv.org/abs/1508.06576 [2]: http://www.cv-foundation.org/openaccess/content_cvpr_2016/ht...

I think the reason for this might be because a lot of machine learning research and practice takes place outside of academia. Maybe there isn't the same pressure to publish in official sounding places.

And people outside of academia often don't have access to journal publications. Who wants to spend $30 per paper? That's just obscene. The arxiv link is accessible everywhere, will never go away (probably), and can be updated as the authors revise the paper. Why not cite it?

Your argument is ridiculous.

Conferences and journals are merely marketing venues, and there is no reason to slow down the field by clinging to a flawed review process. If you adopt this attitude, you will only find your co-authors and students abandoning you for fear of getting scooped.

In the case of building a nuclear reactor, the knowledge one relies on must be vetted by experts.

In the case of building web apps and such, sure, why not put it out there ASAP?

Huh. I guess I didn't succeed in getting my point across. I suppose it is a bit subtle, so I shouldn't be surprised. What I was expressing skepticism about was not whether or not things should be uploaded early to arXiv, but whether arXiv should subsequently be considered the reference for that work, rather than preferring a peer-reviewed path.

I wrote a similar system and plan to release the source for mine over the next few weeks. It produces working code that can run on Linux, Mac, and Windows.

Did we collectively give up on building UI-building/organization tools that are easy enough for designers to use?

I understand that you can't control what tools designers use (whether Sketch or Photoshop or MS Paint), but building a UI tool they don't hate using seems like the way simpler solution. There are already mockup apps, with even basic functionality included, that designers use...

Microsoft Expression Blend is very good.

It's best to make it so easy for a designer to create responsive (and even adaptive) layouts, for both native mobile and web, that this AI-based contraption, which is guaranteed to produce wrong outcomes some of the time, becomes unnecessary for any practical purpose. (Unless the goal is the automated copying of web pages, which is a much bigger problem than guessing the markup, given how much complex dynamic behavior is built into pages these days.)

I've taken an attempt at simplifying the task of building responsive and adaptive React Native apps with the following little library.


Looking forward to when the code actually becomes available.. until then, it is a cool demo video :)

Awesome! Before I got into web design and became a "full stack developer" I wanted this same thing... I asked about it on a forum. I'll have to read the article/check it out to see if it literally does something like pixel mapping, or OpenCV (throw that in there to be safe), or "bit mapping?"

edit: I didn't ask if this was for web or app; it looked like it was for applications... I wouldn't know how to do layout on those, but I can do it with HTML/CSS. Maybe with Electron you could do that for desktop apps; not sure about Android/iOS though.

The demo video shows two examples, an iOS UI and a web UI; it looks like it's using Bootstrap for the web UIs.

Ahh thanks for the clarification, I probably should have watched it full screen.

I suppose once you know how layout works with XML (Android) it wouldn't be hard to translate... assuming you've got the layout down. It would be interesting to see its decisions on whether things are grouped together or positioned independently, like a menu icon. Unless it's absolutely positioned and not responsive/dynamic.

In any case, thanks.

Neat concept, but it wouldn't work for creating modern GUIs, where half the controls are invisible until moused over (desktop), or hidden behind hamburger menus and swipe gestures (mobile).

Not only that, but all these magical solutions solve only the easy part of development, which is laying down the widget-based layout. The actual hard parts (different behavior based on context such as viewport size/orientation, the actual UI business logic) are an afterthought.

Impressive work, yet misguided in many ways in my opinion.

It is a cool concept. However, what it doesn't seem to cover, and what would fix the issue you are describing, is coding from a set of images. Given a set of wireframes, with the expected post-effects and interactivity hints interleaved, I would expect this generator to build code that covers all the specs, balancing the spec's needs and holding the stack in memory until all the known elements are accounted for before writing the final code.

Interesting; I wonder how it handles much more complex setups. I find a big part of taking a design from Photoshop and getting it running on iOS/Android is often about working out what the constraints are: does an element sit at an absolute distance from a screen edge, does it sit a specific distance from another element, does it expand to fill an area? You obviously have to do this to ensure it works at different screen resolutions. These would certainly be hard decisions for an AI to get right on a relatively complex screen, but then again, maybe with enough training it could actually do really well and solve these problems in a completely different way to how a human would. There is also the question of whether any of the information is dynamic (text being received from a server), in which case elements need to be able to adapt to different sizes, etc. Again, that's hard for an AI to glean from a Photoshop image alone.

I think this is just creating the layout files, all the work of wiring it up still has to be done by the programmer. This is really cool and probably a glimpse of the future, but honestly it would probably be easier to train the designer on how to use the drag and drop UI builders that already come with these platforms.

That assumes the designers want to learn them. Sure, some of them are happy to, but this negates the need for them to do anything other than concentrate on their area of concern.

Why train someone to do grunt work when the machine can do it?

Translating picture designs into html files is not really that much fun.

Now the next step (probably more difficult) is that you need a way to let programmers "hook" into the generated output in order to further tweak or customize it, but _without_ having to modify the generated files.

_That_ would make this technology really viable.
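One pattern that can provide such hooks (a hypothetical sketch, not anything pix2code actually implements; all class and method names below are made up): the generator emits a base class containing no-op extension points, and hand-written code subclasses it, so regeneration never clobbers the customizations.

```python
# --- generated_view.py: would be regenerated, never edited by hand ---
class GeneratedLoginView:
    def render(self):
        # Generated markup for the mocked-up login form.
        parts = ['<form>', '<input name="user">', '</form>']
        return self.post_process('\n'.join(parts))

    def post_process(self, html):
        # Extension point: no-op by default, override to customize.
        return html

# --- my_view.py: hand-written, survives regeneration ---
class LoginView(GeneratedLoginView):
    def post_process(self, html):
        # Tweak the generated output without touching the generated file.
        return html.replace('<form>', '<form class="branded">')

print(LoginView().render())
```

This is essentially the role partial classes play in the C#/XAML toolchain mentioned elsewhere in the thread: generated and hand-written halves of the same type live in separate files.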

It looks like a very good first step. I wonder how structured/semantic the generated HTML would be, for engineers to "hook" the generated code up to a backend.

Very cool, I wonder if the web one ends up responsive given a bootstrap-y model.

A tool like this might produce a UI that looks right on the surface. But what about things like accessibility (e.g. for blind users with screen readers) that even many human developers don't get right?

This is just image-to-layout. The code here is really just a nested tree structure. Even though it is still a very interesting result, calling it pix2code might be a little misleading.
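To make that point concrete, here is a toy sketch (with made-up token names, not the paper's actual DSL) of how a flat token stream from a decoder maps onto a nested layout tree:

```python
# Toy parser: a pix2code-style DSL is a flat token stream with braces,
# so recovering the layout tree is a simple bracket-matching parse.
def parse_dsl(tokens):
    """Parse tokens like ['stack', '{', 'btn', '}'] into nested
    (name, children) tuples."""
    def parse_block(i):
        children = []
        while i < len(tokens) and tokens[i] != '}':
            name = tokens[i]
            i += 1
            if i < len(tokens) and tokens[i] == '{':
                sub, i = parse_block(i + 1)
                i += 1  # consume the closing '}'
                children.append((name, sub))
            else:
                children.append((name, []))  # leaf widget
        return children, i

    tree, _ = parse_block(0)
    return tree

tokens = ['stack', '{', 'row', '{', 'btn-green', 'btn-red', '}', '}']
print(parse_dsl(tokens))
# → [('stack', [('row', [('btn-green', []), ('btn-red', [])])])]
```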

This reminds me of a tool AutoHotkey had that let you rip existing UIs. It would read the window and rebuild the UI with the exact layout.

SmartGUI I think.

This would address a huge need in mobile UI if we could upload different screenshots for different screen sizes (e.g. iPad and iPhone, or different iPhone orientations) and it would autogenerate an efficient set of dynamic Auto Layout constraints, content hugging, etc. to achieve both UIs.

What about PSDs? PSD-to-HTML is a big market, and a product like this could be a good fit.

PSDs are layered, and a correctly organized PSD file with some layer naming conventions should contain enough information to generate layout code without needing to use machine learning.
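As a sketch of that idea (the naming convention here is hypothetical, not from any actual product): with layered input, generation can be a deterministic mapping from layer names to markup, no vision model needed.

```python
# Hypothetical convention: a layer named "btn:Submit" becomes a button,
# "txt:Hello" a paragraph, "img:logo" an image. Given layered files,
# code generation is plain string mapping -- no machine learning required.
LAYER_MAP = {
    "btn": "<button>{}</button>",
    "txt": "<p>{}</p>",
    "img": '<img alt="{}">',
}

def layer_to_html(layer_name):
    kind, _, label = layer_name.partition(":")
    return LAYER_MAP.get(kind, "<!-- unknown layer -->").format(label)

print([layer_to_html(n) for n in ["btn:Submit", "txt:Hello", "img:logo"]])
# → ['<button>Submit</button>', '<p>Hello</p>', '<img alt="logo">']
```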

Altia has a product (PhotoProto) that can import PSDs to their tool which can then generate C code for embedded systems.


By generating the "code", is it merely inferring/representing the UI layout as some kind of tree? E.g. each button/panel/textbox gets detected, and an RNN/LSTM generates a tree structure using attention.

"merely" is not the word I would use.

If you are somewhat familiar with the DL literature, you will see that while this paper has a very interesting angle, the underlying architecture is a standard encoder-decoder network, with a CNN encoder and an LSTM decoder. This kind of application has been studied before:


The above paper shows nice results turning images into LaTeX expressions, and images into HTML.
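For readers unfamiliar with the pattern, here is a shape-level sketch in NumPy of the data flow being described. Random, untrained weights stand in for the CNN and LSTM (so the emitted tokens are meaningless), and the vocabulary is made up, but the encode-then-greedily-decode loop is the same idea.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = ["<start>", "btn", "row", "<end>"]  # toy DSL vocabulary

img = rng.random((64, 64, 3))               # the input screenshot
enc_w = rng.random((64 * 64 * 3, 16))
features = img.reshape(-1) @ enc_w          # "CNN" stand-in: 16-d feature vector

dec_w = rng.random((16 + len(VOCAB), len(VOCAB)))
token, out = "<start>", []
for _ in range(5):                          # "LSTM" stand-in: greedy decode
    onehot = np.eye(len(VOCAB))[VOCAB.index(token)]
    logits = np.concatenate([features, onehot]) @ dec_w
    token = VOCAB[int(np.argmax(logits))]   # pick the most likely next token
    if token == "<end>":
        break
    out.append(token)
print(out)
```

A real model conditions the decoder on a learned recurrent state rather than a single matrix multiply, but the interface is the same: image in, token sequence out.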

How does this compare to WYSIWYG editors?

I seriously hope that AI can soon take over the majority of the (usually mundane) work involved in creating CRUD applications, leaving the programmer to simply customize certain parts or write very specific business or validation logic.

It might drive down the wages of _some_ programmers but it will at the same time free us to work on more interesting problems.

I'm hoping that it would just free us from work entirely ;-)

Maybe OT, but "free from work" is kind of an oxymoron.

You can't be free if you don't produce. You will be a slave to people who do produce.

If you lack the imagination to consider such a future, learn from the best and read the Culture series of novels.

Imagination is easy. Doesn't mean it's viable (or desirable).

You cannot imagine your way out of human nature.

You'd need, at minimum, the entire population to be of sufficiently high IQ and high agreeableness (and low aggressiveness). This kind of population does not exist, and probably will not exist in the foreseeable future. If it did exist, it would probably be run over by another population of high aggressiveness. So unless the population of the entire planet is like that, this will continue to be mere imagination.

Mate, my grapes need attention that I cannot give because I'm working.

My home-made spumante takes time that I have to take from other tasks, such as laying at the beach.

My craft beers won't make themselves

I have tomatoes, potatoes, beans and fruit to grow

And of course programming as a hobby.

I would have so much to do, if I didn't have to work…

AI will take care of your grapes and your craft beers will indeed make themselves, as will your tomatoes, potatoes, and beans.

Laying at the beach will get boring after two weeks probably.

I think there was a comment on HN yesterday that is relevant now: oh, the wondrous things we could do with Haskell if we neither had nor needed jobs.

> AI will take care of your grapes and your craft beers will indeed make themselves

That's even better!

I love watching others doing things

> Laying at the beach will get boring after two weeks probably.

You really lack imagination then…

Maybe your internal robot is still too strong

You will get it, eventually

> Laying at the beach will get boring after two weeks probably.

Learn to surf and change your mind.

You cannot imagine your way out of human nature, but you should imagine that you might not know exactly what that human nature is.

You don't have to know it exactly. You already know enough. There are enough people on the planet with low IQ and/or high aggressiveness to make this infeasible.

so low IQ and high aggressiveness is human nature?

I'm always wondering why this kind of task can't be largely automated already. It's not like you need AI to generate what largely amounts to default plumbing code.

It shouldn't be that difficult to generate back-end code for say a Bootstrap template. It should even be possible to generate both a Bootstrap template and the corresponding back-end code from something like a Balsamiq mockup.

The problem with generated code though used to be that it tended to both break quickly when confronted with non-default requirements and to make customisation by (human) developers much more difficult and cumbersome.

So perhaps the route to largely automatically generated CRUD apps isn't paved so much by the code generation process itself (AI-involved or not) but by how easily the generated code can be extended afterwards. I envision never having to touch the actual CRUD code; I'd rather have that CRUD code provide extension points for additional services or functions to tack onto.
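The "default plumbing" point can be made concrete with a toy generator (the spec format below is hypothetical, not from any real tool): emitting CRUD route stubs from a declarative model description is plain templating, no AI involved.

```python
# Toy spec-driven CRUD generation: one declarative model description
# mechanically expands into the five standard REST routes.
SPEC = {"model": "article", "fields": ["title", "body"]}

def generate_routes(spec):
    m = spec["model"]
    return [f"{verb.upper()} /{m}s" + path
            for verb, path in [("get", ""), ("post", ""),
                               ("get", "/<id>"), ("put", "/<id>"),
                               ("delete", "/<id>")]]

print(generate_routes(SPEC))
# → ['GET /articles', 'POST /articles', 'GET /articles/<id>',
#    'PUT /articles/<id>', 'DELETE /articles/<id>']
```

The hard part, as the comment above notes, is not this expansion but leaving clean seams for the non-default requirements that always follow.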

It won't drive down wages, because of Jevons paradox: a commodity that is used more efficiently becomes more in demand.

It will drive down the wages of less experienced programmers: relatively less skilled programmers who still make more than the average wage of the general population.

gRPC already automates a huge chunk of the routine of creating CRUD apps.

The video says the datasets will be available on the Github repository, but I don't see anything...

The GitHub repo says it will be available on the repository later this year.

`To foster future research, our datasets consisting of both GUI screenshots and associated source code for three different platforms (ios, android, web-based) will be made freely available on this repository later this year. Stay tuned!`

Mapping screenshots to code is not hard. Having the model simply memorize the screenshot-to-code mappings of the training data can give you almost 100% accuracy (for some demo). What is hard is, given a new screenshot, how well the model generalizes. Having something work for mobile is a much easier task than having something work for other, more complex UIs, though. Looking forward to seeing more updates on this!

Yep; for example, more dynamic UI such as tables, lists of components (for example, kanban swim lanes), etc.
