
Turning web design mockups into code with Deep Learning - narenst
https://blog.floydhub.com/turning-design-mockups-into-code-with-deep-learning/
======
chrisfosterelli
A lot of people in this thread seem to think that this is a neural network
that takes an image and produces HTML, when that's not the case here at all.

This is a neural network that takes an image and predicts very simple blocks
(like BODY, TEXT, BTN-GREEN in the bootstrap example) and then uses a map to
convert them to well-formed HTML. While I think it's a great learning example
I think it's important to note that this does not generalize at all to other
websites -- you are NOT going to replace an actual person writing HTML with
anything like this.

You can see the mapping here: [https://github.com/emilwallner/Screenshot-to-code-in-Keras/b...](https://github.com/emilwallner/Screenshot-to-code-in-Keras/blob/master/floydhub/Bootstrap/compiler/assets/web-dsl-mapping.json)
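
To make the distinction concrete, here is a rough sketch of that compile step. The token names and templates are illustrative stand-ins, not the actual contents of web-dsl-mapping.json:

```python
# Minimal sketch of the compile step: each predicted token is looked up in a
# hand-written mapping and expanded into well-formed HTML. The token names and
# templates below are illustrative, not the actual JSON contents.
DSL_MAPPING = {
    "header": "<header>{}</header>",
    "btn-green": '<button class="btn btn-success">{}</button>',
    "text": "<p>Lorem ipsum dolor sit amet.</p>",
}

def compile_tokens(tokens, mapping=DSL_MAPPING):
    """Expand a flat token sequence into an HTML string."""
    html = []
    for token in tokens:
        template = mapping.get(token, "")
        # Fill each slot with placeholder text; the real compiler nests
        # children recursively instead of flattening like this.
        html.append(template.format("Placeholder") if "{}" in template else template)
    return "\n".join(html)

print(compile_tokens(["header", "btn-green", "text"]))
```

The point being: the network never sees HTML at all, only these token IDs. All the HTML knowledge lives in the hand-written mapping.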

~~~
houqp
You are correct. But like the other comment mentioned, the cool part about
this is it automatically learns the mapping from image to sequence of tokens.
To handle arbitrary HTML, we just need to extend it a little and convert
all possible HTML input into tokens. I think the takeaway is this might be a
feasible approach toward automatic UI code gen.

~~~
chrisfosterelli
Automatically learning the mapping from an image to a sequence of tokens is a
very fundamental task for CNNs and not particularly new.

I don't think it's clear, or likely, that this can extend to all possible html
input tokens. As you add more tokens, it becomes more difficult for the
network to choose among them accurately. Additionally, as the token set
becomes more fine-grained the size of the output space will grow exponentially
and the network will likely struggle to learn from the training examples as
well as output valid structure.
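
A back-of-envelope calculation illustrates the combinatorics (both vocabulary sizes here are made up for illustration):

```python
# The number of possible output sequences grows as vocab_size ** seq_len, so a
# finer-grained token set blows up the space the network has to cover.
def output_space(vocab_size, seq_len):
    return vocab_size ** seq_len

coarse = output_space(18, 10)   # a small bootstrap-style DSL, 10 tokens deep
fine = output_space(100, 10)    # a (still tiny) finer-grained tag set
print(coarse, fine, fine // coarse)
```

Even this modest jump in vocabulary multiplies the 10-token output space by tens of millions.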

I think you can compare this to approaches that receive an image as input and
provide a caption of the image as output. It works surprisingly well in simple
cases but is nowhere near fully functional or capable of understanding
all inputs.

I agree that this _might_ be a feasible approach toward automatic UI code
generation eventually, but this is several significant levels of complication
away from that.

~~~
houqp
Agree with you that this is by no means a complete solution and there is a
long way to go to make it actually usable.

I think one big problem with image captioning could be the lack of high-quality
training data, whereas in this case we can generate lots of good training data.
Whether we will be able to generate enough good data and have enough compute
power to train on them is something that we need to find out.

Playing Go was considered a problem too complex to solve a couple of years ago,
but it's now a solved problem. So I am hoping we can get a breakthrough on this
sooner than we think.

~~~
fnl
Not wanting to dampen your high hopes, but the rules of Go seem a lot simpler
than the rules and grammar of the current hypertext markup "language",
particularly if you take the "browser dialects" into account, which are crucial
for professional pages...

------
CM30
Well, this seems like it could be useful for the times you just want a quick
prototype mocked up and can't be bothered to code it, or when you're dealing
with sites that don't need much in the way of dynamic functionality.

However, it also makes me think that:

A: Maybe developers (and software engineers in general) should stop thinking
their own jobs are necessarily safe from automation, since something like this
could be the first step to the field going the same way as lorry or taxi
driving will in the future.

B: That agencies might start seeing their basic web development work start
drying up in the not too distant future.

~~~
iambateman
B. Yes.

A. No. Good developers can run up the abstraction chain to keep providing
value at a higher level. Most good devs’ fundamental value prop isn’t “I can
make something in HTML”.

If anything, this will mean that more people can do the redundant stuff
faster.

As a frontend guy, I can’t wait for the day that a tool can generate
presentational stuff.

~~~
macintux
I agree with your response to A in the short term. In the long term, however,
much like on a mountain in a flood, people can only scramble so high before
they run out of oxygen or run out of room.

~~~
iambateman
To take your metaphor a bit too far, it’s fortunate that we have more than one
mountain. ;)

Sure, the days in which the web guy was viewed as a special guru won’t last
forever. But even with auto-generated mockups, we’re still a long way from
computers taking over everything.

------
jamesjyu
This is really great.

Many years ago, I was neck deep in frontend coding. I got so good that when
given a mockup, I could code the skeleton of the page "blind" without viewing
it in the browser. I would load the page at the end and grade myself on how
accurate I was. I've always wanted to do a contest with other frontend coders
to see who could get closest to a complex layout—like the NYTimes—in one go.

One day, this type of skill will become a curiosity—a relic of the past
similar to horseback riding. I would be happy to see the automation of pure
translation of boxes into code.

~~~
janneklouman
Sorry if this is OT, but these types of contests exist! I went to one of
these[1] maybe two years ago in Stockholm and I had a blast. I think the
format was 32 people competing, 8 on stage at a time being shown a design
which they had 15 minutes to mimic without previewing, then the crowd voted
for winners using their smartphones. Two winners from each group of 8 went on
to the finals. The competitors’ screens were mirrored on displays facing the
crowd, so everyone could see how people tackled the problem in real-time. One
guy did an ASCII representation of the design, which the crowd enjoyed enough
to send him to the finals.

A large audience, smoke machine and lots of lasers, hard techno, and free
beer. They did an amazing job setting the mood. They even have their own
IDE[2] that everyone competing is required to use (with combo-counter and cool
rave-y visual effects).

[1] [http://codeinthedark.com/](http://codeinthedark.com/)

[2]
[https://github.com/codeinthedark/editor](https://github.com/codeinthedark/editor)

~~~
jamesjyu
That is incredible! I figured there was an off chance that someone had put
something like this on, but it looks like they went all in.

------
toddmorey
I'm still waiting for the generation of tools that are native to the
medium—fluid, responsive, interactive, and aware of the final context for the
design. So we don't start with static mockups that have to be converted, but
rather build web-native from the beginning.

There are some promising moves in that direction (such as responsive resizing
in Sketch and projects to automatically convert Sketch files to React
components), but we're not there yet—most designers I know are still producing
largely static designs that have to be put through development cycles.

And I know you _can_ live design with HTML and CSS, skipping the static mockup
phase. That's my preferred method at the moment, but I'm here to tell you it's
still pretty slow and tedious and makes me long for better web design tooling.

~~~
oliv__
Honestly, I actually find it really really fast. I don't know how long you've
been doing this and I suspect that may be the issue here, but for me it just
comes so naturally now that I can't think of any other tool someone could use
to outrun me using CSS/HTML.

More broadly, to me, that combo is one of the tools that makes me feel most
connected to the output, as the results are near instantaneous.

~~~
goatlover
Have you ever watched a good designer mock something up using InDesign,
PowerPoint or a WordPress theme like Divi? They're super fast with the visual
tools. I doubt you can match that in CSS/HTML.

There's a reason Visual Basic was so popular in the 90s. That sort of
development kind of got forgotten when the web overtook everything. And it's
why Alan Kay made Smalltalk a visual environment, and not just raw code in an
editor, because human beings are faster at designing UIs with visual tools.

------
albertgoeswoof
While cool, I am not sure this is genuinely useful. I can now take my design
and convert it to HTML. Now what?

I still have to integrate my api, handle responsiveness, add JavaScript and
other animations/actions etc. So I’ll probably end up rewriting most of it
anyway.

What I actually get out of this is not much better than embedding the
original image in an img tag; in fact it might be worse, because it creates
technical debt that I now have to maintain.

~~~
juliushuijnk
I think the potential is in converting images (sketches) into UX source files
that you can edit immediately and then convert to HTML, PDF, DOCS, etc.
I can see that the HTML would more likely be used for a clickable prototype
than the real thing, but it could still be a neat way to save some time. I'm
working on a (text-command) UX tool and have been thinking about such a
machine-learned import feature, but lack the skills and time at the moment
to build it.

------
ReDeiPirati
Wow, really cool article!

I am not a huge fan of front-end dev, so I am really excited about the great
opportunities behind this work/research. It will be another major step in the
democratization of UI, and less work maintaining web views.

My biggest concern is the model capacity. My question is: in your opinion,
will it be able to handle very long HTML pages? Not to mention when a lot of
CSS or JS is added. Even with more layers, will the model be able to generate
correct syntax for really long pages? Will it need additional modules (not
only layers) to handle this? Maybe a different problem formulation?

~~~
emilwallner
The model uses 48 tokens at a time to make the next prediction. As long as it
can keep track of where it is on the design image, it’s not a problem. In the
final version, it has roughly the same accuracy (~97%) on short and long
websites.
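
For anyone curious what that 48-token context looks like in practice, here is a hedged sketch of the data preparation. The names and the padding scheme are illustrative, not the repo's actual code:

```python
# Teacher-forced training pairs: the model conditions on the image features
# plus a fixed window of the most recent markup tokens, and predicts the next
# token until an end marker is produced.
WINDOW = 48

def training_pairs(markup_tokens, window=WINDOW, pad="<pad>"):
    """Yield (context, next_token) pairs with left padding."""
    padded = [pad] * window + markup_tokens
    for i in range(len(markup_tokens)):
        yield padded[i:i + window], markup_tokens[i]

tokens = ["<start>", "header", "btn-green", "text", "<end>"]
for ctx, nxt in training_pairs(tokens, window=4):
    print(ctx, "->", nxt)
```

Because the window is fixed, sequence length only affects how many pairs a page produces, not how hard each prediction is, which fits the accuracy being similar on short and long websites.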

The tougher part is integrating JS or adding hover effects in CSS. In theory,
this can be done with an attention layer, but I haven’t seen any papers on it.

------
gojomo
Another step closer to my dream: a cross-compiler that takes as input a
Powerpoint presentation, and outputs the enclosed-described mobile app, cloud-
hosted website/backend, and Delaware C Corporation.

~~~
snaky
Or just ICO whitepaper.

------
everdev
I'm not sure about the usefulness of coding UI prototypes.

When I owned a web design firm, if the UI wasn't polished, clients had a hard
time wrapping their head around the design or what the final product might
look like. In every case I remember we had to make it pixel perfect in the
design phase and then again in the development phase.

It was a huge pain and an area ripe for innovation, but I think the problem,
as other posters have mentioned, is a lack of quality responsive design tools
that are better than simply designing in HTML/CSS in the browser.

------
sekou
This is a cool idea. I think this technology could potentially assist front-
end developers and the designers who work with them even though there's still
a fair amount of craft involved. Browser compatibility has gotten better on
the desktop in recent years but the mobile appearance of websites is as
diverse as ever. It might be interesting to explore "mobile-first" or
"progressive enhancement" applications of this technology.

~~~
tinymollusk
Did you see any of the output source? I wonder if it's clean enough to be
understood by humans, or if it's one of those situations where it's been
optimized to work but not be understood.

Then again, if you used some additional inputs like "readable code", it
theoretically could optimize both the resulting application and the source
code.

~~~
sekou
Yeah I believe it's about 2/3rds down the page under the heading "Links to
generated websites." It's a far cry from the old Dreamweaver days. I could see
running into some complexity with CSS. Things like which units to use (em,
vh/vw, %, px, rem, etc), building a readable selector hierarchy, or using
newer CSS technologies like CSS grid or animations.

------
wrangler99
In the short term, this approach will struggle to compete against WYSIWYG
editors. But as soon as models can match their output, they'll improve
exponentially faster. WYSIWYG editors have a ton of code to maintain, while a
model is simple to improve.

------
tonybeltramelli
It's awesome to see how people have picked up and built on top of pix2code
(original author here). Very exciting time for front-end development in
general!

------
huac
If the goal is to recreate the _layout_ of a site then the content text isn't
really important. You could represent each letter with a filler character e.g.
. Then, you only need as many tokens as you have words with distinct numbers
of characters. This approach (similar to how we use lorem ipsum for
prototyping) would dramatically reduce the complexity of the model.
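
A minimal sketch of that idea, assuming 'x' as the filler character:

```python
import re

def to_layout_text(text, filler="x"):
    """Replace every letter/digit with a filler character, keeping layout."""
    return re.sub(r"\w", filler, text)

def layout_vocab(text):
    """Distinct word-length tokens needed to represent the text."""
    return sorted({len(w) for w in to_layout_text(text).split()})

sample = "Deep Learning turns mockups into code"
print(to_layout_text(sample))
print(layout_vocab(sample))  # only four distinct word lengths survive
```

The model then only has to distinguish "a 7-character word" from "a 4-character word", which is a far smaller vocabulary than actual text.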

~~~
rburhum
Why wouldn't you be able to train it to recognize expected behavior from a
mockup? You could easily create a nomenclature of symbols to define standard
behavior (⇑ could mean a draggable upload area, for example). Even if it only
gets you 80% of the way, that would be huge.

------
sebringj
When an AI does create a web page successfully, in terms of completely
replacing what a human can do, it will probably make the code in such a way
that it is unmaintainable, because why make it maintainable? We would not have
to maintain it anyway. It would be like spaghetti code, yet would look amazing
on the presentational side.

No one would give a damn anymore about coding styles or elegant frameworks
like React; you could just bark at it and it would morph around. Sites/apps
would just be thoughts you could construct on a whim, changing frequently or
autonomously to produce the maximum desired result. Meanwhile we're sucking
down nutrient packs, floating in stasis in our travel pods, spending the
majority of our time in virtual consciousness.

~~~
raisspen
That is just a further step in the training that could be undertaken. Many of
the advances in machine vision come piecemeal for instance. One team advances
one idea that does some thing(s) very well, but has some drawback(s) which
another team comes up with a solution for at a later date. Maintainability of
code could be something the AI is eventually trained to take into account.

~~~
serpix
Maintainability is irrelevant if the whole site can be regenerated in seconds
with the new changes required.

I'm okay with this though. Creating websites is tedious and is most analogous
to cranking a handle: it is a learnable skill, and after a while it is clear
it is just banging out characters and seeing what changed on the screen this
time.

Bear in mind I'm only talking about implementing the layout, colors, text and
responsiveness. Any dynamic functionality such as forms, buttons, dialogs,
integrations and talking to servers is still potentially a very complex and
difficult task.

------
romaniv
_> Currently, the largest barrier to automating front-end development is
computing power._

Actually, it's poorly thought-out web technologies that don't allow for
reasonable WYSIWYG editing.

------
ethbro
Stack Overflow answer iterator + genetic build algorithm = 90% of software
development

~~~
y4mi
You'd be right, because most software development creates more code dept than
features.

~~~
trendia
I think you mean "debt" \-- I first read your comment as "code department" and
was trying to figure out what that was.

------
ShirsenduK
I have worked mostly on frontend development and have to say that the quality
of pixel-perfect code being generated will make many a designer happy!

Super awesome!

------
andegre
I know a lot of you smart ones think this isn't that great, but I think this
is incredible. I'd love to see someone host this someplace so I can just
upload an image and see what it spits out.

I'm not smart enough to take the code and get it running...

~~~
emilwallner
It's not rocket science, I think you can figure it out. To get started:

git clone [https://github.com/emilwallner/Screenshot-to-code-in-Keras](https://github.com/emilwallner/Screenshot-to-code-in-Keras)

pip install -U floyd-cli

(login to floydhub.com, a super simple platform for cloud GPUs)

cd Screenshot-to-code-in-Keras/floydhub

floyd init picturetocode

floyd run --gpu --env tensorflow-1.4 --data
emilwallner/datasets/imagetocode/2:data --mode jupyter

Then, open floydhub/Bootstrap/test_model_accuracy.ipynb in
your Jupyter Notebook and click Cell > Run All

Now you can see the prediction and the correct markup for all the evaluation
images. Ping me on Twitter (@emilwallner) if you get stuck.

~~~
andegre
Thank you sir, I'll try that out!

------
huula
Cool work! I'm very interested in this topic. Just wondering, how well does it
generalize beyond your training data, rather than just memorizing a strict
input-output mapping?

~~~
emilwallner
The bootstrap version generalizes with 97% accuracy on a new image. Because
the vocabulary is limited, you can train the model overnight. To make the
model generalize to all HTML/CSS markup, you need significantly more compute.

------
rossdavidh
This is, approximately speaking, useless. Which is not to say it shouldn't
have been done; you have to make a lot of useless prototypes with any new
technology before you can actually make something useful. So long as one
considers it in this light, it's cool. Just don't get too excited about not
having to do this work yourself (if you need it done) for at least the next
several years (perhaps decades).

~~~
erAck
I don't think it will be decades. AI and DL have moved at a very fast pace
over the last 2-3 years, and progress is increasing exponentially.

------
bbayer
This is a great example of deep learning applied to a real-world problem.
I am just curious whether this could be done more easily and more robustly
using simple image-processing algorithms. Box detection and OCR work well and
may produce better results with different types of mockups. Sometimes I feel
like we make problems even more complicated by trying to solve them with
popular concepts.

------
sova
"Nothing made sense until I understood the input and output data. The input,
X, is one screenshot and the previous markup tags. The output, Y, is the next
markup tag. When I got this, it became easier to understand everything between
them. It also became easier to experiment with different architectures."

Thanks a lot for a great write-up. Truly a good browse for anybody learning
Neural Nets.

------
jbob2000
You didn't turn anything into "code", you converted an image to markup; you've
just "described" the image. This doesn't work for any sort of web app. A flat
image from a designer does not convey enough information to build a proper
interface. What of responsiveness? Or hooking it up to an API? Or
accessibility? Or animations?

~~~
anfilt
I would agree. However, most websites start out as such a design image.
Getting the markup for the design, and adding onto the generated HTML/CSS
template, could speed things up.

Edit: Also, who is to say that it will not be able to do more in the future?
More features could be added.

~~~
jbob2000
Markup is more than how it looks. Sometimes a <p> tag should actually be
a <span> or a <div>. Sometimes a button should be an <a> tag, sometimes it
should be a <button>. I wouldn't use the <article> tag if the content wasn't
standalone (a blog post, newspaper article, etc.).

You need to _understand_ what has been designed and the goals of the project
to know which tags to use. I need to actually read the content to know if I
should use <article> or <section>. There's not much time saved if I have to go
back and readjust everything, I may as well have written it the way I wanted
it in the first place.

It won't be able to do more in the future; computers can't _understand_
things.

~~~
tinymollusk
Do humans understand things, or does our inner voice merely tell us we do?
Without more concrete answers to some very vague/often infuriating questions,
it's a very bold and unsupported position to suggest that computers cannot
understand something.

To me, it's still up in the air whether we understand things, or if we're just
applying a lot of simple, low-level information-processing networks.
Recognizing objects in vision, for example, turns out to be a series of
clever organizational and optimization tricks around simple mathematics.

~~~
jbob2000
That's a fair point! Part of understanding something is being able to explain
it in different terms. We tend to create metaphors to display our
understanding of things. Merely rearranging the words in a sentence does not
mean you understand it, my grade school teacher would dock me marks for that.

So in order for a computer to understand something, it has to also understand
_something else_ , otherwise it can only regurgitate back what you've given
it. If I tell a computer "there is snow covering a hill" and it responds "a
hill is covered in snow", I am not sure it really understands what I said. But
if it responds "snow blankets the hill"... Ohh! Now it understands!

This begs the question, what was the first thing we understood? I don't know!
Maybe we're born with a certain level of understanding? Maybe our biology
imparts a baseline understanding-of-things? If that's the case, then computers
can never understand something; they have nothing to build off of, only what
we tell it to regurgitate.

~~~
tinymollusk
So, if I understand you correctly, being able to generate context-aware
comparisons from similar knowledge is understanding? (I promise I'm not trying
to trap you -- I think this is interesting and would like to explore more.)

What do you think about the classic example from computational linguistics[0]?
Is that approaching understanding?

That seems like a neat trick, not actual comprehension, but I don't know why.

[0] King - Man + Woman = Queen, see [https://cacm.acm.org/news/192212-king-man-woman-queen-the-ma...](https://cacm.acm.org/news/192212-king-man-woman-queen-the-marvelous-mathematics-of-computational-linguistics/fulltext)
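
For the curious, the arithmetic behind the linked example can be sketched with hand-made toy vectors. Real word2vec embeddings are learned from large corpora and have hundreds of dimensions; the three dimensions here are purely illustrative:

```python
import math

vecs = {
    # dims: (royalty, maleness, femaleness) -- hand-picked toy values
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def nearest(target, exclude):
    """Closest vocabulary word to the target vector, by cosine similarity."""
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cosine(vecs[w], target))

# king - man + woman ~= queen
target = [k - m + w for k, m, w in zip(vecs["king"], vecs["man"], vecs["woman"])]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

Whether this kind of analogy arithmetic amounts to understanding is exactly the open question in the thread above.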

------
sedachv
Really cool. I wonder what a differentiable HTML renderer (akin to
[https://www.researchgate.net/publication/270158331_OpenDR_An...](https://www.researchgate.net/publication/270158331_OpenDR_An_Approximate_Differentiable_Renderer))
would look like, since it could be used in a similar manner.

------
peterchon
I'm curious about how it handles design changes. Will it just rewrite from
scratch, or adjust the code it has already written?

~~~
emilwallner
This approach rewrites everything from scratch.

However, it's an interesting technical problem to adjust the code. Does anyone
have a rough idea how to implement it?

------
iluvmylife
This is great to see!

It's popular to apply AI to healthcare, transportation and legal work.
However, these fields are heavily regulated, require domain expertise and data
is hard to get. The nature of front-end dev and other digital skills makes
them more ripe for automation. I'm surprised there is so little progress in this
area.

~~~
emilwallner
I agree, researchers tend to focus on more academic problems and industry goes
where the money is, leaving some areas without innovation. I’ve come across
few papers that cover front-end development.

------
toisanji
I worked on a similar project but was generating graphics code with both java
and ruby:
[http://www.jtoy.net/projects/sketchnet/](http://www.jtoy.net/projects/sketchnet/)

~~~
emilwallner
This is rad, thanks for sharing it.

------
ydmitry
It's just a loud title. The generated templates consist of ready-made
components in the examples. It's quite simple to do with minimal knowledge of
code, and usually useless, because people need more from an interface.

------
vadimberman
So... it recognizes text (OCR), detects style (bold, italic, etc.), maybe font
face, and outputs the result? No pictures, nothing else, of course.

Props for the marketing though.

------
macawfish
From reading the comments, it looks like a lot of people are bored of mundane
web design. I'm right there with you. Give me something interesting to do!

------
fredley
If this can get one step further and generate a layout from a hand-drawn
sketch, it would really be a gamechanger!

~~~
gregoire
Check out [https://airbnb.design/sketching-
interfaces/](https://airbnb.design/sketching-interfaces/)

------
jlebrech
This would go well with a template language that maps objects to markup,
rather than having to put template code within the markup.

------
ben_jones
Can you make the page "pop" more?

------
tkyjonathan
Would be nice if this was in some Adobe product. Then you could go from
illustration straight to website.

------
smpetrey
This is phenomenal.

------
itissid
Didn't Dropbox also recently blog about this?

------
foobaw
I love this! Hope no one turns this into some SaaS startup though :(

~~~
anfilt
You know business people of course someone probably will try. That recurring
revenue stream is what they drool over.

------
dh-g
HTML is not code, it's markup.

~~~
Letmesleep69
It is code, it is not a programming language.

------
tanilama
But in today's workflow, it is trivial for a designer to generate the code or
even animations with their mockup tools, right? This is useful if you only
have an image of the design, like when quickly copying a competitor's work,
but it's not really that groundbreaking for a company's internal design
workflow. The hard part there is figuring out what good design is, not
translating a given layout into markup.

