This is a neural network that takes an image and predicts very simple blocks (like BODY, TEXT, BTN-GREEN in the bootstrap example) and then uses a mapping to convert them to well-formed HTML. While I think it's a great learning example, it's important to note that this does NOT generalize at all to other websites -- you are NOT going to replace an actual person writing HTML with anything like this.
You can see the mapping here: https://github.com/emilwallner/Screenshot-to-code-in-Keras/b...
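For anyone who doesn't want to click through: the "compiler" step is essentially a lookup table from predicted tokens to HTML snippets. A minimal sketch of the idea (the token names and markup below are illustrative, not the repo's actual mapping):

```python
# Hypothetical token-to-markup table in the spirit of the repo's compiler.
# The real mapping lives in the linked repo; these entries are made up.
TOKEN_TO_HTML = {
    "header": '<div class="header">{}</div>',
    "btn-green": '<button class="btn btn-success">{}</button>',
    "text": "<p>{}</p>",
}

def compile_tokens(tokens):
    """Turn a flat token sequence into (very naive) HTML."""
    return "\n".join(TOKEN_TO_HTML[t].format("...") for t in tokens)

print(compile_tokens(["header", "btn-green", "text"]))
```

The point being: all the "well-formedness" comes from this hand-written table, not from the network, which is why it can't generalize beyond the token set it was trained on.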
I don't think it's clear, or likely, that this can extend to all possible html input tokens. As you add more tokens, it becomes more difficult for the network to choose among them accurately. Additionally, as the token set becomes more fine-grained the size of the output space will grow exponentially and the network will likely struggle to learn from the training examples as well as output valid structure.
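To put a rough number on that: if the network emits a sequence of length L from a vocabulary of size V, the output space is on the order of V**L, so even a modest increase in vocabulary blows the space up. A back-of-the-envelope illustration (the vocabulary sizes are illustrative guesses, not measured from the repo):

```python
# Sequence space size V**L for vocabulary size V and output length L.
def output_space(vocab_size, seq_len):
    return vocab_size ** seq_len

# A simple bootstrap-style DSL might have ~20 tokens; a finer-grained
# HTML token set might have hundreds (numbers here are made up).
print(output_space(20, 50))
print(output_space(200, 50))  # 10**50 times larger for the same length
```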
I think you can compare this to approaches that take an image as input and produce a caption as output. It works surprisingly well in simple cases but is nowhere near fully functional or actually capable of understanding all inputs.
I agree that this might be a feasible approach toward automatic UI code generation eventually, but this is several significant levels of complication away from that.
I think one big problem with image captioning is the lack of high-quality training data, whereas in this case we can generate lots of good training data. Whether we can generate enough good data, and have enough compute to train on it, is something we still need to find out.
Playing Go was considered a problem too complex to solve a couple of years ago, but it's now a solved problem. So I am hoping we can get a breakthrough on this sooner than we think.
However, it also makes me think that:
A: Maybe developers (and software engineers in general) should stop thinking their own jobs are necessarily safe from automation, since something like this could be the first step to the field going the same way lorry and taxi driving will in the future.
B: That agencies might start seeing their basic web development work dry up in the not-too-distant future.
Dreamweaver and FrontPage have existed for over 20 years. The use-case listed here isn't all that useful - if you just want a quick prototype website built from a WYSIWYG graphical editor, there are already very good tools for that, starting with the above two and going through web services like SquareSpace, Wix, and Weebly. I found the discussion of his approach fascinating, though, because it could be applied to lots of other, unsolved problems. Imagine being able to publish a website by taking a photo of a poster using your smartphone.
A. No. Good developers can move up the abstraction chain to keep providing value at a higher level. Most good devs' fundamental value prop isn't "I can make something in HTML".
If anything, this will mean that more people can do the redundant stuff faster.
As a frontend guy, I can’t wait for the day that a tool can generate presentational stuff.
Think back to CS in the 70s - AFAIK, the majority of the field as it was understood then is now covered before an undergraduate gets to their upper-division classes. We're now expected to understand things at a much higher level of abstraction, but we ALSO need to know how logic gates and machine code work in order to fully understand what we're doing.
The 'incubation period' to produce a good dev is going to keep growing. The ones that make it to full dev maturity are going to be more productive than ever before, but fewer people will make it there.
Beside them will be "nurse" developers – people who are just as professional, but less learned in the full range of computer science ideas.
Even if writing, practicing and administering software legally required a degree and license, I think this still detracts. Maybe useful once software powers organic, higher-order life forms, at which point I imagine our doctors and nurses will be entirely software driven anyway.
Hopefully the sword will be more Excalibur and less Sword of Damocles.
^^ I have no idea what that means, but it seemed relevant. :)
Doctor is a long-running process that requires a callback from nurse.
Qualified agreement from this sometimes-front-end guy, but that's assuming the generated presentational stuff is of equal or higher quality than what other front-end devs produce.
Being the front-end guy who has to debug the auto-generated code that doesn't quite work in a corner case or two sounds potentially worse than the status quo.
Squarespace is simultaneously worse than what _any-decent-designer_ would put together AND better than an average marketing website.
Sure, the days in which the web guy was viewed as a special guru won’t last forever. But even with auto-generated mockups, we’re still a long way from computers taking over everything.
Programmers will use these tools the way we always have: To automate the simple work on the way to building a finished product that's more complex, and done more quickly, than would have been possible previously.
Some markets die, yes. It's no longer possible to make a business out of selling Unix clones for commodity hardware, like you still could in the 1980s. It's no longer possible to make a living just making static HTML websites, like you could in the 1990s.
Ultimately, what programmers trade in is a specific kind of problem-solving design sense. Not just solving this single problem as fast as we can, but creating a more general solution, which composes well and can be understood and changed later on. Once programs can do that... well, we'd better be willing to offer them citizenship and recognize their civil rights as thinking beings.
Wouldn't the agencies most affected by this type of image-to-website tech already be affected by WYSIWYG operations (Squarespace, Wix, WordPress, etc.)? I am curious how many agencies are developing HTML/CSS simple enough that this type of thing would dry up their work.
As far as point A: I don't know if there will ever be a "final state" of web development. For every automation fix introduced to the web dev space, 10 new technologies are developed that provide an added layer of complexity.
I think the web is a bit different from traditional jobs being taken over by automation in that (as far as we know) there isn't a hard limit on how far we can push web technology and use. If anything, the people who would be replaced by these automation services are (presumably) the ones who would be developing the automation systems and integrations to begin with.
- As a communications/marketing/PR agency who provides the website as a component of a larger campaign. Many of these folks are already using platforms like Wix, Squarespace, etc., or Wordpress with purchased templates.
- Doing more complex site development that requires a lot of IA work and institutional change management. For example: migrating ancient federal HTML websites (yes some still exist) to Drupal.
Personally, I think bespoke web application development will be a strong area of innovation for many years to come as browsers and devices continue to increase in power. I'm more worried about keeping up with the fast pace of change than the entire industry drying up.
I've seen hundreds of products that automate these kinds of tasks, and all of them failed one by one. The simple answer is that if you had software that could replace coding, it would have a learning curve as complex as learning to code, and would therefore be worthless to learn.
Many years ago, I was neck deep in frontend coding. I got so good that when given a mockup, I could code the skeleton of the page "blind" without viewing it in the browser. I would load the page at the end and grade myself on how accurate I was. I've always wanted to do a contest with other frontend coders to see who could get closest to a complex layout—like the NYTimes—in one go.
One day, this type of skill will become a curiosity—a relic of the past similar to horseback riding. I would be happy to see the automation of pure translation of boxes into code.
A large audience, smoke machine and lots of lasers, hard techno, and free beer. They did an amazing job setting the mood. They even have their own IDE that everyone competing is required to use (with combo-counter and cool rave-y visual effects).
But, one thing I know that I'll never do for fun is debugging IE6 browser compatibility issues. I'll never get that part of my brain back again.
edit: someone else found it!
There are some promising moves that direction (such as responsive resizing in Sketch and projects to automatically convert sketch files to React components), but we're not there yet—most designers I know are still producing largely static designs that have to be put through development cycles.
And I know you can live design with HTML and CSS, skipping the static mockup phase. That's my preferred method at the moment, but I'm here to tell you it's still pretty slow and tedious and makes me long for better web design tooling.
More broadly, to me, that combo is one of the tools that makes me feel most connected to the output, as the results are near instantaneous.
There's a reason Visual Basic was so popular in the 90s. That sort of development kind of got forgotten when the web overtook everything. And it's why Alan Kay made Smalltalk a visual environment, and not just raw code in an editor, because human beings are faster designing UIs with visual tools.
One thing I've learned is that it's extremely hard to get designers to think about structure and hierarchies within a user interface. I don't see an easy solution.
Maybe machine learning that suggests reasonable structural transformations during the design process is the way? I know that Airbnb's design tool team (Jon Gold etc.) are exploring something along those lines.
What I actually get out of this is not much better than embedding the original image in an img tag; in fact it might be worse, because it creates technical debt that I now have to maintain.
I believe it would be a lot more effective if designers used tools that generated clean code directly, instead of Photoshop. If it's a SPA, then something with the actual, integrated React/Vue components for consistency and reuse.
While I'm not familiar with any such tool that's actually good enough for professional use (most WYSIWYG designers for the web I tried were terrible), I wonder why we haven't seen much success in this area yet, as we have with desktop app design since VB6.
I am not a huge fan of front-end dev, so I am really excited about the great opportunities behind this work/research. It will be another major step in the democratization of UI, with less work spent maintaining web views.
My biggest concern is model capacity. My question is: in your opinion, will it be able to handle very long HTML pages, not to mention pages with a lot of CSS or JS added? Even with more layers, will the model be able to generate correct syntax for really long pages? Will it need additional modules (not only layers) to handle this? Maybe a different problem formulation?
The tougher part is integrating JS or adding hover effects in CSS. In theory, this can be done with an attention layer, but I haven’t seen any papers on it.
When I owned a web design firm, if the UI wasn't polished, clients had a hard time wrapping their head around the design or what the final product might look like. In every case I remember we had to make it pixel perfect in the design phase and then again in the development phase.
It was a huge pain and an area ripe for innovation, but I think the problem, as other posters have mentioned, is a lack of quality responsive design tools that are better than simple HTML/CSS design in the browser.
Then again, if you used some additional inputs like "readable code", it theoretically could optimize both the resulting application and the source code.
I'm okay with this though. Creating websites is tedious: it's a learnable skill, but one most analogous to cranking a handle. After a while it becomes clear you're just banging out characters and seeing what changed on the screen this time.
Bear in mind I'm only talking about implementing the layout, colors, text and responsiveness. Any dynamic functionality such as forms, buttons, dialogs, integrations and talking to servers is still potentially a very complex and difficult undertaking.
Actually, it's poorly thought-out web technologies that don't allow for reasonable WYSIWYG editing.
I'm not smart enough to take the code and get it running...
git clone https://github.com/emilwallner/Screenshot-to-code-in-Keras
pip install -U floyd-cli
(login to floydhub.com, a super simple platform for cloud GPUs)
floyd init picturetocode
floyd run --gpu --env tensorflow-1.4 --data emilwallner/datasets/imagetocode/2:data --mode jupyter
Then, open floydhub/Bootstrap/test_model_accuracy.ipynb in your Jupyter Notebook and click Cell > Run All
Now you can see the prediction and the correct markup for all the evaluation images. Ping me on Twitter (@emilwallner) if you get stuck.
Thanks a lot for a great write-up. Truly a good read for anybody learning neural nets.
Also, who is to say that it won't be able to do more in the future? More features could be added.
You need to understand what has been designed and the goals of the project to know which tags to use. I need to actually read the content to know if I should use <article> or <section>. There's not much time saved if I have to go back and readjust everything, I may as well have written it the way I wanted it in the first place.
It won't be able to do more in the future, computers can't understand things.
To me, it's still up in the air whether we understand things, or whether we're just applying a lot of simple, low-level information processing networks. Recognizing objects in vision, for example, turns out to be a series of clever organizational and optimization tricks around simple mathematics.
So in order for a computer to understand something, it has to also understand something else; otherwise it can only regurgitate back what you've given it. If I tell a computer "there is snow covering a hill" and it responds "a hill is covered in snow", I am not sure it really understands what I said. But if it responds "snow blankets the hill"... Ohh! Now it understands!
This raises the question: what was the first thing we understood? I don't know! Maybe we're born with a certain level of understanding? Maybe our biology imparts a baseline understanding of things? If that's the case, then computers can never understand something; they have nothing to build off of, only what we tell them to regurgitate.
What do you think about the classic example from computational linguistics? Is that approaching understanding?
That seems like a neat trick, not actual comprehension, but I don't know why.
King - Man + Woman = Queen, see https://cacm.acm.org/news/192212-king-man-woman-queen-the-ma...
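The "king - man + woman ≈ queen" trick is nothing but vector arithmetic plus a nearest-neighbor search. A toy illustration with hand-made 2-d vectors (real word embeddings like word2vec have hundreds of dimensions learned from text; these numbers are invented so the analogy comes out):

```python
import numpy as np

# Invented 2-d "embeddings": axis 0 ~ royalty, axis 1 ~ gender.
vecs = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
    "apple": np.array([-0.5, 0.0]),
}

def nearest(target, exclude):
    """Word whose vector has the highest cosine similarity to `target`."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], target))

result = nearest(vecs["king"] - vecs["man"] + vecs["woman"],
                 exclude={"king", "man", "woman"})
print(result)  # queen
```

Whether landing near "queen" in a learned vector space counts as understanding is, of course, exactly the question being debated here.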
A <p> tag will be read out by a screen reader as "paragraph", but a div tag will not read out anything but the text inside. I only use a <p> tag if I want my accessible users to know that this is a paragraph (of a larger body of text).
I suppose you could make similar arguments for the other semantic tags. You use them when the content explicitly matches the intent of the tag so that other software can pick up and read it with a bit of... ahem understanding.
However, it's an interesting technical problem to adjust the code. Does anyone have a rough idea how to implement it?
It's popular to apply AI to healthcare, transportation and legal work. However, these fields are heavily regulated, require domain expertise and data is hard to get. The nature of front-end dev and other digital skills make them more ripe for automation. I'm surprised there is so little progress in this area.
Props for the marketing though.