Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack (meta.com)
52 points by mfiguiere on Sept 27, 2023 | 20 comments



They managed to do the thing, but missed the point. The sample results feel straight out of Kinfolk[1] or Cereal[2]. They have that "over-curated" quality we associate with these publications.

1. https://www.kinfolk.com/

2. https://www.readcereal.com/


Looking at the paper and then these websites, they don't look that aligned to me. The paper's images are much more saturated and cutesy (which, to be clear, I don't consider better).

Regardless, does it matter? I would assume you could choose any aesthetic you like. The interesting bit is the training process, not the aesthetic chosen.


I've never heard of these. What are they supposed to be? Just looks like random pictures of places and people?


They are two lifestyle magazines. I don't expect the HN crowd to be familiar with them, but they are exemplars of the style that Emu has been aligned with.


What exactly is a lifestyle magazine? I'm curious. Like that cereal site, it's just a bunch of unsorted pictures. What kind of lifestyle is it supposed to represent? Is it kinda like a Flickr for the analog world?


Fair question. Lifestyle magazines generally cover topics in travel, fashion, trends, food, and pop culture. They are one of the most popular types of magazines, and their role in generating consumer demand is a big part of why they matter.

Readers of, say, a cooking lifestyle magazine might be looking for new recipes to try, while readers of a travel lifestyle magazine might be looking for ideas of where to journey next, or perhaps they simply subscribe for the joy they take in imagining themselves there. Some publications, as you've noticed, are aligned on other dimensions, but share common features, like aesthetic.

I chose the two sites as examples because they have a particular curation that matches the output of the model. I'm not a lifestyle magazine champion or anything, but I think HN likes poking outside its wheelhouse on occasion, and I suspected this in particular wouldn't be something most people here are familiar with - which, hopefully, is what gives the comparison some value as a perspective.


I see, thanks for the explanation! I knew there were magazines about activities (cooking and whatnot), but didn't realize there were some that go for "look" alone. Guess I'm not artsy enough :)


How is this the most upvoted thread? Yuck. Read the paper! It's not those magazines, and the story of the alignment and how they did it is one of the most interesting parts.


Don’t yuck their yum. Some of us see the other story here of the value-laden use of the words ‘highly aesthetic’ and ‘visually appealing’. PP is questioning what standard the model is being aligned to.


I don't know, I think it's pretty pretentious to nitpick what a "good" picture is, when it's really irrelevant to the topic at hand and quite frankly not an interesting conversation, since it is very subjective. The link isn't to an art contest.


I dunno, I thought that was the most interesting part of this story, personally -- the question of what style algorithms should aim for.

Having skimmed the paper, the results didn't seem that different to me from asking Midjourney to produce "blah blah blah, photorealistic, fine art style, 4k". It just seems to target a particular aesthetic. In their experiments, they put their version next to another model's output and asked people to select which one they "preferred". I'm sure the fine-art version will look vaguely "better" to someone, especially if they're judging dozens of pictures at a time with no real direction.
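
To make that concrete, a side-by-side eval like that basically reduces to tallying pairwise votes into a win rate. A rough sketch of my own (not code or numbers from the paper):

    from collections import Counter

    def win_rate(votes, model="emu", baseline="other"):
        # votes: list of dicts like {"prompt": ..., "preferred": "emu" | "other" | "tie"}
        counts = Counter(v["preferred"] for v in votes)
        decisive = counts[model] + counts[baseline]
        return counts[model] / decisive if decisive else 0.0

    votes = [
        {"prompt": "a cabin by a lake at dawn", "preferred": "emu"},
        {"prompt": "a cabin by a lake at dawn", "preferred": "other"},
        {"prompt": "street food stall at night", "preferred": "emu"},
        {"prompt": "street food stall at night", "preferred": "tie"},
    ]
    print(f"{win_rate(votes):.0%}")  # 67% of decisive votes favor the quality-tuned model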

But that doesn't mean the style/look is appropriate for every situation. Sometimes you want it to look like amateur manga, or like a candid photo, or whatever artistic expression you want it to have.

There isn't just one single "good" style of pictures, and I think that's a fascinating question, regardless of what training sets and algorithms they've used. Maybe you disagree, and that's fine, but y'know... people have different interests!


> There isn't just one single "good" style of pictures, and I think that's a fascinating question, regardless of what training sets and algorithms they've used. Maybe you disagree, and that's fine, but y'know... people have different interests!

It's not that I disagree that it's an interesting question - I disagree that it's relevant to the paper. It is not a question that the paper really attempts to answer or comment on. It feels unfair to claim the paper is doing art wrong, when it's not trying to do "art"; it's trying to make the AI pick images that are slightly prettier to the average person, and describing a process to optimize for that. That's a far cry from doing high art in my mind. This is more like creating an autofocus feature to make "better" images. So if the criticism is that this isn't fine art, my response is basically: no duh.


I see. That's fair!


The paper is about generating art. The paper makes claims about the subjective value of the art.

Seems central. If the art is not good, but commercial kitsch, then that matters.

Now that software engineers are in the business of making software that generates art, get ready to have these conversations, pretentious or not, whatever is meant by that.


That's right.

The authors operationalized "highly aesthetic quality" and grounded it in composition, lighting, color and contrast, subject and background, and "additional subjective assessments."
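
Concretely, that kind of operationalization amounts to a checklist gate over annotator ratings - something like the sketch below (made-up field names and thresholds, not the authors' actual pipeline):

    CRITERIA = ("composition", "lighting", "color_contrast", "subject_background")

    def passes_quality_gate(scores, threshold=4):
        # scores: annotator ratings per criterion, say on a 1-5 scale;
        # an image survives filtering only if every criterion clears the threshold
        return all(scores.get(c, 0) >= threshold for c in CRITERIA)

    candidate = {"composition": 5, "lighting": 4, "color_contrast": 4, "subject_background": 3}
    print(passes_quality_gate(candidate))  # False: subject/background separation rated too low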

There are zero citations in this section related to assessing artistic quality, operationalization, or grounding.

A LOT of programmers are going to hate this, but if you're doing soft-sciences work (which this is), you should be knowledgeable in that area and cite some sources that indicate you've at least considered what you're doing.

The lack of citations in this area is an unknowing admission of that ignorance. I don't care if they cite a paper just to disagree with it, but not knowing that they don't know is simply unacceptable. This is basic undergrad material in other relevant fields.


People have been reasoning about this stuff for thousands of years and SWEs act like they can just make shit up. Same for psychology of mind and consciousness.

AI has thrust the tech industry into fields it has been ignoring, and now the tendency is to keep ignoring them, pretending they don't exist, and filling the void with naive garbage.

Soft isn't so soft anymore -- it never was; it always has been, and always will be, hard.


In what way did they miss the point?


Spectacular work, best paper I've read in a while, maybe since the DALL-E 2 paper (and I've been kicking around since 2019). Easy to understand, really special results. Looks like a recipe for Midjourney.


Is there an API, GitHub repo, or web app where we can use this today?

I think they said it was eventually going to be in a bunch of apps, but I'm not sure about right now.


How is this any different from how SD was fine-tuned on LAION-Aesthetics after being pretrained on plain old LAION?
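
For reference, the LAION-Aesthetics side of that is automatic: every image-caption pair gets a score from a learned aesthetic predictor (a small model over CLIP image embeddings), and you keep only the rows above some cutoff. A minimal sketch with illustrative numbers:

    def filter_by_aesthetics(rows, cutoff=6.0):
        # rows: iterable of (image_url, caption, predicted_aesthetic_score) tuples;
        # keep only the pairs whose predicted score clears the cutoff
        return [(url, cap) for url, cap, score in rows if score >= cutoff]

    rows = [
        ("https://example.com/a.jpg", "misty forest at sunrise", 6.8),
        ("https://example.com/b.jpg", "blurry parking lot", 3.1),
    ]
    print(filter_by_aesthetics(rows))  # keeps only the first pair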



