
Using Deep Learning to model personal visual aesthetics - shackenberg
https://devblogs.nvidia.com/parallelforall/personalized-aesthetics-machine-learning/
======
nefitty
Woah, figures 7 and 8 blew my mind. This reminds me of the various Pinterest
boards I've curated in the past. I would set a theme and try to collect visual
items in that theme.

Sometimes I could not clearly describe the theme I had in mind, for example
"foresty-earthy suburban adolescent feelings with little-to-no ruggedness but
with a bit of a punk edge". Of course, no single image could fulfill the
entirety of that theme (probably), so it's fascinating to wonder how aesthetic
preferences emerge in the mind, though it's possible that, given a description
like that, another person could filter images to match it.

Are we combining various specific preferences (the color green, for example),
or are we driven by the emotional flavor of a whole aesthetic object (a haze-
covered mountain range evoking nostalgia for childhood hikes with siblings,
leading to a specific preference for pine trees, leading to a specific
preference for the color green, etc.)? Basically top-down, bottom-up, or a
combo? Just some thoughts...

~~~
bbctol
Yeah, this is applicable to huge swathes of Internet culture. Tons of
Pinterest boards, tumblrs, and subreddits are just collections of images that
fit a particular aesthetic. What will the web look like when almost all
curation can be automated?

~~~
erichocean
> _What will the web look like when almost all curation can be automated?_

Better?

------
gallerdude
Man, this is crazy awesome stuff. I wonder if you could use a DeepDream-type
technique to make an image more like your own style; that'd be next level.
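
Something like this, maybe: DeepDream-style gradient ascent on the pixels, but
instead of maximizing a layer activation you maximize a ranker trained on your
own taste. Totally hypothetical PyTorch sketch; `cnn_features` and `ranker`
stand in for the CNN embedding and the personal-preference model from the
article, and none of the names or numbers come from the post:

    import torch

    def nudge_toward_my_style(image, cnn_features, ranker, steps=50, lr=0.05):
        # DeepDream-style loop: tweak the pixels so the personal
        # preference ranker scores the image higher.
        img = image.clone().requires_grad_(True)  # (1, 3, H, W), values in [0, 1]
        opt = torch.optim.Adam([img], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            score = ranker(cnn_features(img)).squeeze()  # "how much do I like this?"
            (-score).backward()  # ascend the preference score
            opt.step()
            with torch.no_grad():
                img.clamp_(0.0, 1.0)  # keep the result a valid image
        return img.detach()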

------
aantix
Total side note: did anyone check out the EyeEm website (apparently the
authors of the article)?

Their curation algorithms are doing a pretty good job! Their "selected" Galaxy
photos look amazing.
[https://www.eyeem.com/en/pictures/galaxy](https://www.eyeem.com/en/pictures/galaxy)

------
andreyk
Summary of approach: they embed the photos (convert photo -> vector of numbers
using t-SNE or CNNs; the details are actually in
[https://devblogs.nvidia.com/parallelforall/understanding-aesthetics-deep-learning/](https://devblogs.nvidia.com/parallelforall/understanding-aesthetics-deep-learning/))
and then train a small-ish classifier (three fully connected layers) on top of
it to capture a user's preference. A pretty obvious approach, and a basic
version should be doable in a hackathon, but a cool result nonetheless.

"We chose a three-layer multilayer perceptron (MLP) network as a good ranker,
since it is able to capture the inherent non-linear shift in distribution
between the user’s choices and the original training set. Notably, an MLP can
be trained rapidly by leveraging GPU computation to obtain near-real-time
results. This is important because it enables us to build interactive
interface, as we’ll explain. We typically precompute a set of negative
features (about 40,000 negative samples) and extract the positive features
from the user-provided input."
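
For anyone who wants to try the hackathon version, here's roughly what that
looks like as a PyTorch sketch. Only the three fully connected layers and the
~40,000 precomputed negatives come from the article; the feature dimension,
layer widths, and training loop are my guesses:

    import torch
    import torch.nn as nn

    FEATURE_DIM = 2048  # assumed size of the precomputed CNN features

    # Three fully connected layers, as in the quote above.
    ranker = nn.Sequential(
        nn.Linear(FEATURE_DIM, 512),
        nn.ReLU(),
        nn.Linear(512, 128),
        nn.ReLU(),
        nn.Linear(128, 1),  # single logit: does this user like the photo?
    )

    def train_ranker(pos_feats, neg_feats, epochs=10, lr=1e-3):
        # pos_feats: CNN features of the user's liked photos.
        # neg_feats: the precomputed negative features (~40k samples).
        x = torch.cat([pos_feats, neg_feats])
        y = torch.cat([torch.ones(len(pos_feats)), torch.zeros(len(neg_feats))])
        opt = torch.optim.Adam(ranker.parameters(), lr=lr)
        loss_fn = nn.BCEWithLogitsLoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(ranker(x).squeeze(1), y)
            loss.backward()
            opt.step()

    # To rank new photos: scores = ranker(new_feats).squeeze(1), then sort descending.

Full-batch training over ~40k feature vectors is cheap on a GPU, which is
presumably how they get the near-real-time retraining they mention.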

