
I’m fascinated by how much this is exactly like working with a human artist who doesn’t really understand the domain you want to represent with an image. Iterate, iterate, iterate.

It seems like the most valuable thing this could do is get some of that early exploration out of the way faster and more easily than a human can, get to two or three concepts that feel like they’re in the neighborhood, and then let a human expert take over and turn it into something of final quality. That’s pretty cool.



Agreed.

At the end of the article I also described a bit how I would see the evolution of such a tool, and it looks like we're thinking very similarly.

---

Though I think the real breakthrough will come when DALL-E gets 10-100x cheaper (and faster). I would then envision the following process of working with it (which is really just an optimization on top of what I’ve been doing so far):

1. You write a phrase.

2. You are shown a hundred pictures for that phrase, preferably from very different regions of the latent space.

3. You select the ones best matching what you want.

4. Go back to 2 and repeat 4-5 times, getting better results every time.

5. Now you can write a phrase for what you would like to change (edit), and the original image would be used as the baseline. Go back to 2 until happy. (A rough code sketch of this loop follows below.)
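
A minimal sketch of that loop, assuming the openai Python client's image endpoints of the era (Image.create for generation, Image.create_variation for refining a kept image). The selection helper is just an illustrative stand-in, and DALL-E doesn't actually expose "regions of the latent space" directly, so variations of kept images are used here as an approximation:

    import openai                          # assumes OPENAI_API_KEY is set in the environment
    from urllib.request import urlopen

    def pick_favorites(urls):
        # Crude stand-in for step 3: list the candidates, let the user type indices to keep.
        for i, u in enumerate(urls):
            print(i, u)
        keep = input("indices to keep (comma-separated): ")
        return [urls[int(i)] for i in keep.split(",") if i.strip()]

    # Steps 1-2: one phrase, a batch of candidate images.
    phrase = "flat vector logo of an octopus reading a book, minimal, two colours"
    batch = openai.Image.create(prompt=phrase, n=8, size="256x256")
    candidates = [d["url"] for d in batch["data"]]

    # Step 4: a few rounds of keep-the-best, then ask for variations of the keepers.
    for _ in range(4):
        keepers = pick_favorites(candidates)
        candidates = []
        for url in keepers:
            png_bytes = urlopen(url).read()
            more = openai.Image.create_variation(image=png_bytes, n=4, size="256x256")
            candidates += [d["url"] for d in more["data"]]

Step 5 would then map onto the edit endpoint (Image.create_edit with an image, a mask, and a new phrase), which is the same in-painting feature used in the article to fix part of the logo.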


I see this happening in all areas. Everything would be prompt-driven.

Do you like this? What about this? You simply nod at the solutions you want and reject the ones you don't.

Pretty soon somebody's expertise and experience are not going to be enough to continue paying them what they used to get before this magic black box appeared.

One day enterprises will realize they can just outsource that expert who's been reduced to simply typing prompts and nodding yes or no.

I am worried that the middle class is rapidly disappearing. "We will own nothing and be happy" seems quite ominous. The question, then, is which fields are safe from advances in AI?

The only fields I can think of are doctors, lawyers, executives, and buy-side money managers. Even their jobs will be partially automated, but they will be safe as long as they generate revenue.


You don’t need nodding, or really any conscious reaction, I think. It should be possible to point a camera at the user's face, hooked up to another AI that catches slight changes in pupil dilation, or other changes imperceptible to the naked eye, and registers when something looks interesting to the user. You could then quickly show a stream of variations, pick the tagged ones, and use them to improve the guesses. I imagine something like this might one day become a preferred way of interacting with computers/AI.
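
A crude sketch of what that feedback channel could look like, assuming OpenCV with its bundled Haar eye cascade; counting dark pixels inside detected eye regions is only a rough stand-in for real pupillometry, and the 1.3x threshold is arbitrary:

    import cv2

    cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")
    cap = cv2.VideoCapture(0)              # default webcam

    baseline = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        eyes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        dark = 0
        for (x, y, w, h) in eyes:
            roi = gray[y:y + h, x:x + w]
            dark += int((roi < 40).sum())  # very dark pixels as a crude "pupil area" signal
        if baseline is None and dark:
            baseline = dark
        elif baseline and dark > 1.3 * baseline:
            print("interest spike - tag the image currently on screen")
        if cv2.waitKey(1) == 27:           # Esc quits
            break
    cap.release()

In practice you'd need controlled lighting and a proper gaze/pupil tracker, but the loop structure is the interesting part: stream variations, tag whatever triggers a reaction, and feed the tags back into generation.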


But, if everyone's jobs are automated, nobody is making any money, so nobody has any money to pay doctors, lawyers, executives, money managers, etc. You would think that if these types were thinking rationally, they would be fighting to expand the middle class so more people can pay for their services.


In the past, eliminating humans from one set of jobs has been balanced by a new set of opportunities for humans in different jobs. Usually, the new jobs are more valuable.

That's not utopianism. The new jobs can't always be filled by the people kicked out of jobs. It really sucks to be them.

But it does mean that it's not irrational for people to want to automate other people's jobs. The net amount of stuff generated increases, rather than decreases.

This pattern may not last forever. There's already some thought that we've generated more than enough stuff to guarantee a decent standard of living to everybody (at least in the developed world) without working, and plenty more for luxuries if people choose to work. Even if we haven't reached it, we appear to be heading in that direction sooner rather than later.

That may cause a radical re-think at some point. And it won't be seriously delayed by making sure cartoonists have jobs.


Jobs are plentiful as long as wealth is well distributed.

In the past, fast automation has led to badly distributed wealth, and job loss. This situation has lasted until the unemployable people died off (yep, that was part of it), and enough wealth was redistributed through violent means.

Today we know better, and have really no reason to repeat the violent means of our previous revolutions. But it's really looking like the people in power want to repeat them.


> enough wealth was redistributed through violent means.

There were no instances of violent redistribution of wealth that ended up better for the average person than before; only a different group of people ended up with the wealth.

Automation makes stuff cheaper, even for people who didn't obtain any of the financial wealth via redistribution, because more than just financial wealth gets created with automation: new availability of services and goods (think of today's internet, a kind of wealth that couldn't have existed before, and one you can benefit from even if you are poor today).


> enough stuff to guarantee a decent standard of living to everybody

It's not a zero-sum game. There's still growth in us. We'll go to space and expand 1000x more; space has plenty of resources, and humans will have jobs working together with AI.


> There's still growth in us. We'll go to space and expand 1000x more; space has plenty of resources, and humans will have jobs [..]

Q: Am I the only one thinking of Golgafrinchan Ark Fleet Ship B?


We'll have to automate childcare to make that happen. Otherwise, the birthrates of the rest of the world will follow the countries with the highest standards of living on a wild plunge into unsustainability.


> Everything would be prompt-driven.

Just like in Star Trek. They really knew what the end goal was, didn't they?

> enterprises will realize they can just outsource that expert who's been reduced to simply typing prompts and nodding yes or no

Tbf, a program that simply averages the market demonstrably gives better returns than most of the financial industry, yet the industry still exists. Just because we can automate something doesn't mean we will, usually for pointless emotional reasons.

But on the other hand, it's hard to say whether in 100 years humans will still be employable in any practical capacity for literally anything.


>Pretty soon somebody's expertise and experience are not going to be enough to continue paying them what they used to get before this magic black box appeared.

Every art director at an ad agency just shrieked!


I doubt it, because the process of thinking of phrases to feed DALL-E is really the hard bit.

This is OK for a logo like this, where it's fair to say the base-level expectation is not super creative. This logo is cool, but it doesn't really stand out or make the product very distinctive. If I am running a hobby or OSS project, that's fine, but if I were investing a lot in sales/marketing, then paying a real artist to make something interesting and novel is a rounding error.


> This logo is cool, but it doesn't really stand out or make the product very distinctive. If I am running a hobby or OSS project, that's fine, but if I were investing a lot in sales/marketing, then paying a real artist to make something interesting and novel is a rounding error.

Q: Are there really logos out there that are "interesting and novel" and that "stand out or make the product [..] distinctive"? Which ones?

EDIT: (perhaps more importantly) are there interesting, novel, distinctive logos that actually contribute to profitability?


tbf I think when it comes to big company branding it's the opposite.

A lot of DALL-E iterations of the design have left the article author with something quirkier than your average logo, but it also looks like clip art and probably doesn't scale up or down well or work in monochrome. Which is fine for OSS. (He might get more users from blog traffic about using DALL-E to design his logo than he ever could from any other logo anyway.)

But when it comes to bigger companies, the design agency are the people who sit in meetings with execs, persuading them that a well-chosen font and a silhouette of a much-simplified octopus will work much better ("but maybe the arms could interact with some of the letters, etc.; now let's discuss colours"). The actual technical bit of drawing it is the part that's already relatively cheaply and easily outsourced, and plenty of corporate logos are wordmarks that don't even need to be drawn...


Doctors are very vulnerable. Most of dermatology is simple pattern recognition. I can easily see AI lawyers beating human lawyers in litigation, too. An AI lawyer will have read every single case and know the outcomes, and can fine tune arguments for specific parameters like which judge etc.


> Most of dermatology is simple pattern recognition.

I have a few qualms with this app:

1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.

2. It doesn't actually replace a USB drive. Most people I know e-mail files to themselves or host them somewhere online to be able to perform presentations, but they still carry a USB drive in case there are connectivity problems. This does not solve the connectivity issue.

3. It does not seem very "viral" or income-generating. I know this is premature at this point, but without charging users for the service, is it reasonable to expect to make money off of this?


What on earth are you referring to? I assume it’s some sort of implicit joke but I don’t get it :)

Edit: Ahh, it’s the Dropbox comment of HN fame. Never mind.


This workflow reminds me of a generative art program from the early 1990s, but I just can't remember its name. It was a DOS or Windows program that had a very curvy, fluid GUI with different graphics sliders. It would show you some random tiles and you choose one to guide the algorithm's next generation of tiles.


Kai's Power Tools.


I wonder if Kai Krause lurks here at HN. I'd love to know how he's doing. Apparently he's still living in his castle, which he bought around 1999 [0].

Sometime in the '00s I read an article about him saying that he was putting advanced networking gear into the castle and intended to start something like a "think tank" (doesn't really fit, but I don't know what else I'd call it) where he and others would hang around and code stuff.

I found the article [1] from July 2002, "Lord of the Castle Kai Krause presents Byteburg II".

> So that's Kai Krause's long-cherished plan: now the software guru has finally opened a center for founders and developers from the IT and software industry in Hemmersbach Castle near Cologne -- the Byteburg II

I really wonder what he's up to these days. His plug-ins were legendary, as was the user interface for Bryce [2].

[0] https://de.wikipedia.org/wiki/Burg_Rheineck

[1] https://www.heise.de/newsticker/meldung/Schlossherr-Kai-Krau...

[1, google translate] https://www-heise-de.translate.goog/newsticker/meldung/Schlo...

[2] https://en.wikipedia.org/wiki/Bryce_(software)


Your comment really intrigued me, so I googled this interesting person I had never heard of before. This may well not be news to you, but Kai has a not-a-blog blog that I stumbled upon here: http://kai.sub.blue/en/sizemo.html

Some really interesting reads. I especially appreciated his articles on the passing of Douglas Adams (apparently a close friend of his!) and Then vs Zen.


F’n LEGEND! I spent hours per day tweaking his filters for my thesis animation.


Hunh, I’ll be in that neck of the woods next week. Need to look into this…


Please follow up, and tell us - even a Show HN


https://news.ycombinator.com/item?id=27288454

Love him or hate him (and I do both), Kai was all about cultivating his adulating cult of personality and dazzling everyone with his totally unique, breathtakingly beautiful, bespoke UIs! How can you possibly begrudge him and his fans that simple pleasure? ;)

In the modest liner notes of one of the KPT CD-ROMs, Kai wrote a charming, rambling story about how he was once passing through airport security, and the guard immediately recognized him as the User Interface Rock Star that he was: the guy who made Kai's Power Tools and Power Goo and Bryce!

Kai's Power Goo - Classic '90s Funware! [LGR Retrospective]:

https://www.youtube.com/watch?v=xt06OSIQ0PE&ab_channel=LGR

>Revisiting the mid 1990s to explore the world of gooey image manipulation from MetaTools! Kai Krause worked on some fantastically influential user interfaces too, so let's dive into all of it.

>"Now if you're like me, you must be thinking, ok, this is all well and good, sure, but who the heck is Kai? His name's on everything, so he must be special. OH HE IS! Say hello to Kai Krause. Embrace his gaze! He is an absolute legend in certain circles, not just for his software contributions, but his overall life story." [...]

>"... and now owns and resides in the 1000 year old tower near Rieneck Castle in Germany that he calls Byteburg. Oh, and along the way, he found time to work on software milestones like Poser, Bryce, Kai's Power Tools, and Kai's Super Goo, propagating what he called "Padded Cell" graphical interface design. "The interface is also, I call it the 'Padded Cell'. You just can't hurt yourself." -Kai

But all in all, it's a good thing for humanity that Kai said "Nein!" to Apple's offer to help them redesign their UI:

http://www.vintageapplemac.com/files/misc/MacWorld_UK_Feb_20...

>read me first, Simon Jary, editor-in-chief, MacWorld, February 2000, page 5:

>When graphics guru Kai Krause was in his heyday, he once revealed to me that Apple had asked him to help redesign the Mac's interface. It was one of old Apple's very few pieces of good luck that Kai said "nein"

>At the time, Kai was king of the weird interface - Bryce, KPT and Goo were all decidedly odd, leaving users with lumps of spherical rock to swivel, and glowing orbs to fiddle with just to save a simple file. Kai's interfaces were fun, in a Crystal Maze kind of way. He did show me one possible interface, where the desktop metaphor was adapted to have more sophisticated layers - basically, it was the standard desktop but with no filing cabinet and all your folders and documents strewn over your screen as if you'd just turned on a fan to full blast and aimed it at your neatly stacked paperwork.

The Interface of Kai Krause’s Software:

https://mprove.de/script/99/kai/index.html

>Bruce “Tog” Tognazzini writes about Kansei Engineering:

>»Since the year A.D. 618 the Japanese have been creating beautiful Zen gardens, environments of harmony designed to instill in their users a sense of serenity and peace. […] Every rock and tree is thoughtfully placed in patterns that are at once random and yet teeming with order. Rocks are not just strewn about; they are carefully arranged in odd-numbered groupings and sunk into the ground to give the illusion of age and stability. Waterfalls are not simply lined with interesting rocks; they are tuned to create just the right burble and plop. […]

>Kansei speaks to a totality of experience: colors, sounds, shapes, tactile sensations, and kinesthesia, as well as the personality and consistency of interactions.« [Tog96, pp. 171]

>Then Tog comes to software design:

>»Where does kansei start? Not with the hardware. Not with the software either. Kansei starts with attitude, as does quality. The original Xerox Star team had it. So did the Lisa team, and the Mac team after. All were dedicated to building a single, tightly integrated environment – a totality of experience. […]

>KPT Convolver […] is a marvelous example of kansei design. It replaces the extensive lineup of filters that graphic designers traditionally grapple with when using such tools as Photoshop with a simple, integrated, harmonious environment.

>In the past, designers have followed a process of picturing their desired end result in their mind, then applying a series of filters sequentially, without benefit of undo beyond the last-applied filter. Convolver lets users play, trying any combination of filters at will, either on their own or with the computer’s aid and advice. […] Both time and space lie at the user’s complete control.« [Tog96, pp. 174]

METAMEMORIES:

https://systemfolder.wordpress.com/2009/03/01/metamemories/

>Anyone who has been using Macs for at least the last ten years will surely remember Viewpoint Corporation’s products. No? Well, Viewpoint Corporation was previously MetaCreations. Still doesn’t ring a bell? Maybe MetaTools will. Or the name Kai Krause. Or, even better, the names of the software products themselves — Kai’s Power Tools, Kai’s Power Goo, Kai’s Photo Soap, Bryce, Painter, Poser… See? Now we’re talking.

Macintosh Garden: KPT Bryce 1.0.1:

https://macintoshgarden.org/apps/bryce-1

>Experienced 3D professionals will appreciate the powerful controls that are included, such as surface contour definition, bumpiness, translucency, reflectivity, color, humidity, cloud attributes, alpha channels, texture generation and more.

>KPT Bryce features easy point-and-click commands and an incredible user interface that includes the Sky & Fog Palette, which governs Bryce's virtual environment; the Create Palette, which contains all the objects needed to create grounds, seas and mountains; an Edit Palette, where users select and edit all the objects created; and the Render Palette, which has all the controls specific to rendering, such as setting the size and resolutions for the final image.

MACFormat, Issue 23, April 1995, p. 28-29:

https://macintoshgarden.org/sites/macintoshgarden.org/files/...

https://macintoshgarden.org/sites/macintoshgarden.org/files/...

>He intends to challenge everything you thought you knew about the way you use computers. 'I maintain that everything we now have will be thrown away. Every piece of software -- including my own -- will be complete and utter junk. Our children will laugh about us -- they'll be rolling on the floor in hysterics, pointing at these dinosaurs that we are using.

>'Design is a very tricky thing. You don't jump from the Model T Ford straight to the latest Mercedes -- there's a million tiny things that have to be changed. And I'm not trying to come up with lots of little ideas where afterwards you go, "Yeah, of course! It's obvious!"

>'Here's an easy one. For years we had eight-character file names on computers. Now that we have more characters, it seems ludicrous, an historical accident that it ever happened.

>'What people don't realize is that we have hundreds more ideas that are equally stupid, buried throughout the structure of software design -- from the interface to the deeper levels of how it works inside.'


Please don’t just repost walls of copy-pasta


How interesting! Thanks for posting.


This was some really interesting reading, thank you internet stranger :)


+1 what a great program


Given the stochastic way it works I wonder how the randomness is seeded for a certain phrase.

In other words, if another person needed a logo and used the same phrase how long on average until they get a duplicate of your image?


The model starts from a 64x64 8-bit RGB image of noise (random pixels), so technically there are 256^(64*64*3) possible starting images, an astronomically large number, although many of them will look very close to each other since small colour differences won't matter much. The image is then further upsampled by two other models, which will change some details but shouldn't affect the general composition of the image.


Maybe I'm wrong, but with these diffusion models there is randomness in every sampling step too, not just in the initialization, and they can take up to 1000 steps to generate a single image.


Ah, good point. That would introduce more variation even if the initial noise is close. But if the initial noise is exactly the same, it probably means the generation was initialized with the same seed, and the rest of it will then be identical, since the pseudorandom number generators are deterministic.
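
A toy illustration of that determinism argument: not the actual DALL-E sampler, just a stand-in denoising loop in PyTorch, assuming torch is installed:

    import torch

    def fake_sample(seed, steps=50):
        g = torch.Generator().manual_seed(seed)
        x = torch.randn(3, 64, 64, generator=g)          # "initial noise"
        for _ in range(steps):
            # stand-in for a denoising step that injects fresh noise every step
            x = 0.9 * x + 0.1 * torch.randn(3, 64, 64, generator=g)
        return x

    a = fake_sample(42)
    b = fake_sample(42)
    c = fake_sample(43)
    print(torch.equal(a, b))   # True: the same seed reproduces every step exactly
    print(torch.equal(a, c))   # False: a different seed diverges from the first step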


Since the image is 1024x1024 RGB, and the random seed is a full image of noise (as it is for diffusion models), I'd guess it would take a very long time on average.


It will get cheaper. In 5 years it will run on your phone.


Yeah, my first thought was, "OK, but you are going to need to involve a graphic artist to actually make use of that logo." You probably want a vector version, and you definitely need simplified versions for smaller sizes. But then I stopped and realized how amazing this actually is. It "saved" (I know, it cost $30, but that's a steal for something like this) all the time and money you would have paid for iteration after iteration, and it let the author quickly home in on what they wanted.

As someone who is incredibly terrible at graphic design but knows what they like, this could be a game changer as iterations of this technology progress. I can imagine going further than images and having AI/ML generate full HTML layouts in this iterative way, where you start to define your vision for a website or even an app, and it spits out ideas/concepts; you "lock" the parts you like and let it regenerate the rest.

I'm not downplaying designers' role at all; I'd still go to one of them for the final design. But being able to wireframe using words/phrases and walk away with a good idea of what I want would be amazing, especially for freelance/side projects.


Honestly, though, the hard part is the actual design, which is already done here. Learning to vectorize a raster is something that can be done in a weekend with Inkscape; there's no reason to involve an actual graphic designer in this anymore.
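
For what it's worth, a minimal sketch of that kind of auto-tracing, assuming Pillow plus the potrace CLI (the same engine behind Inkscape's Trace Bitmap); the file names are illustrative, and a real logo usually needs manual cleanup of the resulting paths:

    import subprocess
    from PIL import Image

    SRC = "logo_1024.png"    # hypothetical DALL-E output
    BMP = "logo_1024.bmp"
    SVG = "logo.svg"

    # potrace wants a 1-bit bitmap, so threshold the raster first.
    gray = Image.open(SRC).convert("L")
    gray.point(lambda p: 255 if p > 128 else 0).convert("1").save(BMP)

    # -s selects SVG output; --turdsize drops specks below the given pixel area.
    subprocess.run(["potrace", "-s", BMP, "-o", SVG, "--turdsize", "5"], check=True)
    print("wrote", SVG)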


> Learning to vectorize a raster is something that can be done in a weekend with Inkscape; there's no reason to involve an actual graphic designer in this anymore.

If you lined up 100 resulting images, 99 from weekend beginners and 1 from an actual artist, I guarantee you would pick out the artist's every time.

It might be simple to trace over an image, but you are probably better off getting an artist to spend 2 hours on it; it will most likely look better than 2 weeks of tracing.


Time value of money. The optimal use of money and time would be getting the ML model to iterate until you have the finished concept, then getting a designer to vectorise it and fix it up. That way you pay the designer for one iteration, and the time you would have spent iterating with the designer you spend iterating with the ML model instead.


I think you might be underestimating how much work goes into the last mile of a design. A lot of refinement work goes into typography in particular, a domain Dall-E isn’t yet proficient in at all.


Nice ideas, great enthusiasm.

I think your art/design/craft is pretty good. Some people use pencils, some use Adobe products, you have gone out there and tried the new Dall-E medium.

Glad you thought out the usage. I am sure that when the novelty wears off, you will have that neat-as-Octocat logo sorted out.

I appreciate that you appreciate the value that highly skilled designers bring to a product with their visual expertise.

However, I would like to see you A/B test the Dall E logo versus the winning designer logo. You could show odd IP addresses one logo and even addresses the other.

I think the designer would edge out the robot for what you need (a logo); however, the proof is in the pudding and the conversion rate.


Plus there is no reason why someone couldn't build a specialised AI model to do vectorisation and another to generate simplified versions of vectors.

People are already doing this by combining DALL-E 2 with GFPGAN for face restoration. So there may be a role in understanding how to combine these tools effectively.
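
A short sketch of that combination for the face-restoration half, based on the gfpgan package's documented interface; the exact constructor arguments and the pretrained weight path vary by release, so treat this as illustrative rather than definitive:

    import cv2
    from gfpgan import GFPGANer

    # Assumes the GFPGAN weights have been downloaded locally first.
    restorer = GFPGANer(model_path="GFPGANv1.3.pth", upscale=2,
                        arch="clean", channel_multiplier=2, bg_upsampler=None)

    img = cv2.imread("dalle_output.png", cv2.IMREAD_COLOR)   # hypothetical DALL-E 2 render
    _, _, restored = restorer.enhance(img, has_aligned=False,
                                      only_center_face=False, paste_back=True)
    cv2.imwrite("dalle_output_restored.png", restored)

A hypothetical vectorisation or logo-simplification model could slot into the same kind of post-processing chain.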


Yes! It gives powerful tools for someone with a concept to get much closer to visualization of their idea.

DALL-E 2 is like a low-code or no-code tool in that way.

The outcome may not be a "finished" product, especially as viewed by a professional designer (or web dev). However, it's a heck of a lot better than a tersely written spec.

And in some cases, the product will work well enough to unblock the business, get customer feedback and generally keep things moving forward.


I think this is more powerful than a simple exploration tool. It took the author a long time to find a query format that generated logo-like images. Once they had that part down, they were quickly able to iterate on their query to find an image they liked. They were even able to fix part of the logo using the fill-in tool. I'm not sure why you'd bring a human into the mix, especially if you're on a budget.


Ehnnnnnnnnn...

An experienced human designer, right away, is going to ask how you want the logo to be used. That's going to have a major impact on how it's designed.

So yeah, this may be like working with a doodler, but, as the author intimated, this is far from an ideal experience in getting a professionally designed logo. This is more like "Hey, you, drawing nerd, make this thing."

Nevertheless, astonishing technology in its own right.


Nah, people will leave out the professional. It's the same wild-west "grab whatever you can" approach: steal and plunder to the detriment of artists, writers, etc. And when the legislation finally arrives, it will already be too late, accidentally.


Why should there be legislation? Do you want to restrict what people can do, just to force them to employ artists and writers? We could also forbid people from filling the gas tanks in their own cars, to protect the job of gas station attendant, but nobody wants to live in New Jersey.


You remember the concept of dumping, i.e., flooding a market with below-cost product to drive out competing businesses? This is dumping for creatives.

Edit: not that it's intentional, but these things will have the same effect: way too much product, even for creative works. No one will be able to make money off the product, only off the tools.


Is it below cost though? It might just be very cheap to run.


"Why should there be legislation?" Lol. Read the uber files.



