Jurgen Schmidhuber has an interesting perspective on the rewarding aspects of art and music.
To paraphrase, "Artists (and observers of art) get rewarded for making (and observing) novel patterns: data that is neither arbitrary (like incompressible random white noise) nor regular in an already known way, but regular in a way that is new with respect to the observer's current knowledge, yet learnable (that is, after learning fewer computational resources are needed to encode the data)".
In other words, enjoyment of art is about learning (easy) patterns. Schmidhuber likens "fun" to the improvement of an observer's ability to compress a scene.
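A toy way to see the compression idea, offered as a rough sketch rather than Schmidhuber's actual formalism. The assumptions here are mine: zlib stands in for the observer's compressor, a preset dictionary stands in for what the observer has already learned, and the names `encoded_size`, `noise` and `scene` are made up for illustration.

```python
import random
import zlib


def encoded_size(data: bytes, prior: bytes = b"") -> int:
    """Bytes needed to encode `data`, optionally given prior knowledge.

    Assumption: zlib plays the observer's compressor, and a preset
    dictionary (`zdict`) plays what the observer has already learned.
    """
    if prior:
        compressor = zlib.compressobj(level=9, zdict=prior)
    else:
        compressor = zlib.compressobj(level=9)
    return len(compressor.compress(data) + compressor.flush())


rng = random.Random(0)
# Arbitrary data: pseudo-random bytes that zlib cannot shrink.
noise = bytes(rng.getrandbits(8) for _ in range(4096))
# A made-up "scene" the observer has not encountered before.
scene = b"Sunlight falls across an empty chair by the window, dust in the air."

noise_size = encoded_size(noise)
before = encoded_size(scene)              # first encounter
after = encoded_size(scene, prior=scene)  # same scene once it is known
progress = before - after                 # toy "compression progress"

print(noise_size, before, after, progress)
```

The noise stays as large as it started, while the scene gets cheaper to encode once it is known. Note that the dictionary trick here is closer to memorization than to learning a regularity, so treat `progress` only as a loose analogy for the "fun" signal.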
I wonder how much our sense of aesthetics has to do with the perceived scarcity of a work or the effort needed to create it. I remember the first HDR photos looked absolutely mind-blowing to me, but now that the process has been automated and is ubiquitous, they just look tacky.
Here goes: Aesthetic Theory is a posthumous book by German philosopher Theodor Adorno and is one of the most important works on aesthetics in 20th-century philosophy. In it, he combines elements from Kant, Hegel and Marx. From Kant's 1790 work, Critique of Judgement, he draws on the idea of the Sublime and the notion that art has formal autonomy (which Adorno modifies). From Hegel, he takes a dialectical understanding of the ultimate goal or aim of art, and from Marx he approaches art as being inseparable from society as a whole.
He shows how art has a semi-autonomous 'truth-content' (a la Kant), but one that is always the product and embodiment of unresolved contradictions within the larger social fabric (a la Hegel, and more so Marx).
Returning to the NVidia article at the root of this thread, this passage pops out as problematic (certainly for Adorno, but also in general): "In our case, we use supervised machine learning, with a dataset of photographs pre-categorized as aesthetically pleasing or not." There is a real sense of question-begging going on here. And Adorno would say that this approach forecloses the very fact that the definitional boundaries of these categories are constantly shifting and lack any real social stability.
Author here. Just to clarify, the motive of this work is to ease curation, given the massive amount of content being created; it is by no means an attempt at creativity or originality.
The work is not at all contradictory to Adorno, especially in the sense that it is explicitly trying to be as non-reductionist as possible, and it assumes the notion of aesthetics is a dynamic entity.
There is a finite pattern in the dataset; more interestingly, it has its share of subtleties (as opposed to, for example, an image classification problem), and the technological question is whether we can capture these.
But there is another interesting data question. For our work, we curated our training set with the help of expert curators. But the dataset itself is a metamorphosing entity; i.e., it is subject to revision (it is a continuous process for us at the moment), and, more interestingly, it is a chance for open debate between our curators. In some sense, technology allows us to codify and challenge our notion of aesthetics at a given point in time (especially as our training sets evolve).
Thanks for the followup. I enjoyed reading the article and it is a very interesting project and a great effort. I did read at the end that it was about how to amplify the efforts of the human curators -- a great problem to tackle.
A better title would have been "Using ML to rank and classify images according to aesthetics." Which in itself is quite impressive. But nowhere do we see indications that these algorithms "understand" the images, in any meaningful sense.
The author here. I used the term "understanding" not as in machines understanding the images, but more as a scientific attempt at understanding aesthetics. (Snippet from the text: "empowering me to develop systems for understanding images from a computational and scientific perspective".)
Aesthetics is a game of cat and mouse. Artists create some new things. Then critics and theorists observe the patterns of composition, color, proportion, etc., that are popular. These rules are canonized in books. Then artists challenge the rules.
The comparisons to music theory in this thread are apt. Music theory is always behind music production.
How can you understand aesthetics without understanding creativity?
I believe that there are factors that go beyond visible composition. Just a thought experiment: I imagine that the brain would evaluate the aesthetics of two similar images differently depending on whether the image shows an object it recognizes or not. When evaluating the image with a recognized object, other qualities of the object (that are not necessarily visible in the image) will be taken into account.
Yeah, exactly. No good critic of photography thinks that subject matter is irrelevant, that you can understand pictures as if they were abstract compositions of light and color. You might as well try to read a poem in an unknown language. This algorithm might learn to identify certain cliches, but it'll never learn what makes a picture powerful.
You have to wonder, though. Is it impossible that there's a "music theory" for images/paintings/art that explains the mechanics of what makes them more compelling vs less compelling? I suspect there is, at least to some degree.
Obviously images convey much more information than music, so any theory that doesn't encompass the semantics of the subject will miss most of the signal. But is there a theory for the presentation and composition of the subject? To some degree, I'm confident there is.
Some of the methods used to debug the deep learning of images already do a fair job of showing the locus of focus in the image where the DNN found maximum information. I can see such a technique discovering many of the techniques used by artists and photographers to direct the observer's eye or juxtapose objects that conflict.
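For anyone curious what such a debugging method looks like, here is a minimal sketch of one of them, occlusion sensitivity: blank out one tile of the image at a time and record how much the model's score drops. Everything named here is an assumption for illustration: `toy_score` is a hypothetical stand-in for a trained network (it just rewards brightness near the centre), and the image is a made-up 6x6 grid.

```python
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(image: Image, score: Callable[[Image], float], patch: int = 2) -> Image:
    """Score drop caused by blanking each patch x patch tile of the image."""
    height, width = len(image), len(image[0])
    baseline = score(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            occluded = [row[:] for row in image]  # copy, then blank one tile
            for y in rows:
                for x in cols:
                    occluded[y][x] = 0.0
            drop = baseline - score(occluded)
            for y in rows:
                for x in cols:
                    heat[y][x] = drop  # large drop = region the score relies on
    return heat


def toy_score(image: Image) -> float:
    """Hypothetical stand-in for a trained model: rewards brightness near the centre."""
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2, (width - 1) / 2
    return sum(
        value / (1.0 + (y - cy) ** 2 + (x - cx) ** 2)
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )


image = [[1.0] * 6 for _ in range(6)]  # made-up uniform 6x6 "photo"
heat = occlusion_map(image, toy_score)
print(heat[2][2] > heat[0][0])  # the central tile matters more than the corner
```

With a real network in place of `toy_score`, the resulting heat map is the kind of "locus of focus" picture mentioned above.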
Perhaps it's not quite analogous to music theory, but what you're describing in the first paragraph would be referred to as the formal elements of art or simply the elements of art.
Analysis of these elements (form, line, space, color, and texture) is usually a part of the sort of art criticism you'd find in academic studio art, art history, or even just the New York Times art section.
The visual design field has a similar, extended set of elements for describing the formal elements of a design piece.
In both art and design, works are usually considered effective if they use the formal elements of art/design to support what you refer to as the semantics of the subject. That's a broad generalization, but you see it in practice a lot, so it seems like a fair thing to say.
Academic art history is starting to feel the influence of machine learning and computer vision precisely because computers can be trained to recognize the formal elements of art and associate their use with movements and historical periods. There are way more detailed articles than this one, but this will get you started if you're interested in this sort of thing:
The problem is that you'll hit a wall when it comes to understanding "what makes art." You can do all the theory you want, and people do, of course. You can analyze all that has ever been done, and come up with rules for describing and even generating music and art. But there is no guarantee that these will allow you to predict what makes future art. Just like with financial markets, in art, what happened in the past is not a good predictor of the future. That is the mistake that "art theorists" tend to make, have made for decades and decades, and are carrying over rather simplistically to statistical analysis via machine learning.
This is particularly challenging in art (as compared e.g. to financial markets) because much of what defines new art is specifically what makes it different from what has come before it. That is to say, art, by its nature, will always beat any rules you try to design, because that is what it does, indeed, what it must do.
The proof is in the pudding: machine learning systems can be designed to learn the statistical trends in a body of works and then generate similar art, and this has been done since at least the 80s if not earlier, yet the result evokes the very definition of the derogatory term "cookie cutter art." "Good" art, then, by contrast, is exactly that art that does not fit into such a model -- plus "something".
Surely it is that "something" we'd like to find, but I am afraid that using rule-based or statistical analysis to help curators sort through art, even with the stated aim of helping them find "diamonds in the rough", will generate an echo chamber in which the next diamond, which by definition is quite different from the diamonds that came before it, remains undiscovered, buried in a pile of sorted spam.
It is for this reason that I believe that, despite the advances in machine learning, nothing will ever replace the pastime of "crate digging" for finding gems. The DJ's job will never completely die.
... I will add: That is not to say that tools for automatically understanding and measuring aspects of a photo or piece of music are not useful for artists as a way of judging their own work and making decisions. But it is exactly those artists that will look at the "goodness indicator" drop one notch while they make a change, and say, "I'm fine with that", who will produce the next important work.
> You can analyze all that has ever been done, and come up with rules for describing and even generating music and art.
No, you cannot recognize anything as art before you have laid out the rules to describe art.
>But there is no guarantee that these will allow you to predict what makes future art
Prediction from samples is covered by the sampling theorem, which strictly holds only for band-limited signals and an infinite number of samples. In practice, though, the output from my soundcard is rather fine, and facial recognition software works, too.
We listen to foreign music a lot and still find pleasure in the voices. Although I'm mostly speaking from my experience with English before I had learned it, and English is rather close to German, so YMMV.
>No True Scotsman thinks that subject matter is irrelevant
No idea what the No True Scotsman fallacy has to do with this; it involves dismissing an example, but no example was offered. You're welcome to offer one, if you'd like.
We listen to foreign music and find pleasure in the voices because a voice itself is expressive, independent of language: a cry, a laugh, an imprecation. The same is true of light, shape, etc., but artists who work with these qualities independent of their reference to objects tend to choose media other than photography, for obvious reasons.
I agree. Although one could imagine that concepts conveyed in a photograph could be extracted and abstracted as vectors -- just like word2vec and its successors. Of course, there is a long way to go before we hit "human understanding" parity, but I think ideas from [1], [2] and [3] could be extrapolated in doing just that.
[1] Deep Visual-Semantic Alignments for Generating Image Descriptions - cs.stanford.edu/people/karpathy/cvpr2015.pdf
[2] Deep Learning for Content-Based Image Retrieval - www.research.larc.smu.edu.sg/mlg/papers/MM14-fp336-hoi.pdf
[3] Deep Learning for Content-Based Image Retrieval - www.cs.rutgers.edu/~elgammal/pub/MTA_2014_Saleh.pdf
I don't think computers would be able to understand aesthetics. It is a really high-level concept. Plus, I think deep learning is marketing mumbo-jumbo and does not perform much better than a linear SVM.
Then why are we using deep convolutional networks for state-of-the-art vision and speech when we could just plug in an SVM with handcrafted features? From what I know, error rates in vision dropped from 25% to less than 5% since deep learning. That's no trifle, especially at the higher end of the accuracy scale. It's very hard to conquer those last few percent.
Schmidhuber's own write-up is worth the read: http://people.idsia.ch/~juergen/creativity.html