DALL·E 2 vs. $10 Fiverr Commissions (simonberens.me)
310 points by sberens on Oct 8, 2022 | 194 comments



I am a guy who does those kinds of Fiverr commissions. Not the $10 ones, but plenty of $50-100 ones. I have a lot of thoughts and concerns about the impact of Dalle-2 on visual arts as a whole, but I see no threat at all to my Fiverr business.

90% of my clients couldn't do anything without a human in chat who walks them through all the steps. There's no possible interface simple enough for them to do everything without my help. They can't figure out which files they want or what to do with them once they get them. If there's any possible customisation option - they will use it to make the pre-made template uglier, and then will ask me if I could do something to make it look good again. That's what they are paying me for.


I'm working at a small food startup and we needed a logo. On the one hand I was ordering a commission from Upwork, and on the other I was trying to create a logo on my own using DALL-E 2. It took a lot of tries and I eventually ended up with 2 good candidates. Unfortunately neither was perfect, and I don't have time to try to edit them despite having some graphic design experience. The commission, on the other hand, allowed me much more control over the situation, as I could ask for incremental changes and then see how I felt about them. DALL-E 2 is really lacking. If you wanna do things DIY you may find it useful, but it'll still take some extra work. If you just want a good product that is ready for market, you need an artist.


When I've worked with graphic artists, if the idea is tough to explain I usually include some rather awful mocks. In that regard, I'd postulate that Dalle could be used as an intermediary step to create visually appealing mocks for graphic designers to realise and expand upon.


This is what I thought as soon as I saw the blog post, but in reverse: as a tool for graphic artists.

My wife ended up turning her artistic abilities into a greeting card / wedding stationery business because her social anxiety and low self-esteem make it extremely difficult for her to work through the process of figuring out what the customer actually wants and how much she should charge for a commission. The way she describes it, many customers think that they can give you a one-sentence request and get back exactly what's inside their head, except that there is nothing inside their head at all, just a very loose idea. Essentially, they want to flip through an infinite set of mock-ups (that they don't pay for) until they finally stab one with their finger and say "THIS!", but they have no idea in advance what "this" is. When they finally come to payment, they only want to pay for the time it took you to produce the final result, which is "just a simple design!"

In fact, the red-flag customers sound like this: "Hello. I'm looking for the simplest thing in the world and it probably won't take an amazing artist like you 15 minutes to make. It'll be used as a logo at our business so it would be great publicity for you!"

Person doesn't value your skill and will try to low-ball you. Ask them to clarify their one-sentence request and they say "Oh, you know, just a simple logo with something nautical on it". Tell them you'll charge for every set of mock-ups as you slowly figure out what they want, and they disappear.

I think that tools like these could be the first step in your journey with a customer. They have to explain to the AI what they want, and refine their statement to the point where it produces "mock-ups" somewhere in the right ballpark. Then you can take their top 3 results and talk through them.


Hah. That could be a great way to use those models. "Talk to the AI until you know what you want, then I'll make it for you".

I'm totally with your wife, btw, the attitude of her customers sounds horrible. On the other hand, my experience is that one artist took my $25 and has still not produced what he agreed to three months later, and yet asked me if I had more work for him. Another guy offered to do it for free and did it for free in a few days and then refused to accept my money when I explained that I was already paying another guy for the same task so it was only fair that I paid him, too. This was some cover art for a vanity project of mine and I was asking for free contributions but also paid the first artist because he was evidently trying to become a professional. Fat chance of that. Bottom line, if you want good art you have to find the people who are passionate about it.

Oh and image models can't create the art I want, because it's text-based art. Even if they could generate the images I want, they couldn't output them in ASCII or ANSI. In fact I tried and they give me kind of pixelated results, but not recognisably text-character based.


In general, I think that in cases where intentionality around specific details is required humans are going to outperform AIs for quite some time, in any creative domain. Conversely, when I don't really know what I want beyond vague direction, Stable Diffusion's results have been good.

I guess I'd say, the less specified your prompt, the more it seems the AI is able to "read your mind." An interesting little tidbit in the world of human/machine interaction. It's like the results make you say, "Yes, that IS what I was thinking of!" But as soon as you have a really specific idea in mind it kind of stumbles a bit for me.


Right now dalle is lacking. I feel very strongly that we will see the tech improve exponentially in a short time. Img2img strategies, for example, might allow you to ask for modifications to a previous output. Machine learning tech is already there, it just needs to be put in the right package. Add in future advancements in AI, and we are likely to see high quality products built on this within a couple years.


Could a webform be used to ask the questions that you ask (based on a very large decision tree, of course), and then either (1) format the results in a way that AI can generate an appropriate image, or (2) have a human look at the results and in 4 minutes use AI to generate an appropriate image?

Do you think that people in your line of work, or adjacent lines of work, will use AI to offload brainstorming or to get inspiration?

My guess (as a complete outsider) is that the skill of drawing will remain important, but that there will emerge a new skill: an AI translator, who serves as a midwife for the creation of AI art.


Many services like that already exist - at this moment they are using stock images. Once AI generated images become as good and predictable as stock images - I have no doubt that the services will switch to them.

But, stock images have existed for many years. They are considerably cheaper than custom work, are as professional looking and are available immediately. Sounds like an absolute game changer, but in reality the market for custom design work didn't die.

I am not sure that I understand all of the reasons why people pay extra for custom design work in a world where automated stock services exist. Some of my guesses are

- People don't trust their visual taste and want a trusted human to make those decisions for them

- Discovery problem. People are simply unaware of such services and their benefits

- People are willing to pay premium for the knowledge that their design has a human author.

- Last mile problem. Even if the image looks 99% like what you want, you might still need a guy to save it / fix it / crop it / format it because you don't know how to do it yourself.

I am sure that there are more factors. And even if AI images bridge the quality gap to human-made stock images, all of this will still apply to them. Many services and technologies have been trying to solve those problems for many years. AI will add to that process, but I don't see a reason for a dramatic change in the near future.


> - Last mile problem. Even if the image looks 99% like what you want, you might still need a guy to save it / fix it / crop it / format it because you don't know how to do it yourself.

That's the next step for AI generation. The AI image will be almost what you want but you will hire someone to fix it


I could imagine a service that creates logos/designs via an interactive process that lets the user/customer resize various elements, change colors/shapes along the way. This would be sort of similar to the way you can use AI tools to infill different parts of images.


Bingo. Artists have a new creative tool with new constraints, tricks, and prerequisite skills. I think the idea it'll destroy other forms of art (and art-derived commerce) is an unlikely one since it can be used to fuel many of the creative arts that currently exist, but I suppose we'll find out soon enough.


Midwaif


Forgive the dumb question but what kind of output do you make?

Do people ask for graphs on Fiverr like the article? (I can only imagine the sort of "must have a PowerPoint ready for 9am in Tokyo" sort of thing. I know that's a real industry, even if that industry always seemed to me like everyone gathering round a fake painting with everyone knowing it's a fake.)

Anyway - always interested.


I sell logos and vector illustrations. I have tons of pre-made ones in my portfolio. If someone likes one of my logos for their business - they can just buy it outright. They don't even have to contact me and can just buy the file package generated by the website.

If the client likes the logo but can't figure out the interface, or wants me to apply some changes - he contacts me. I talk to him, do everything that he requests, and at the end I sell him the logo plus a premium for my time and additional custom work.

Usually people ask me for things similar to what they saw in my portfolio. Rarely do I get unusual requests like this graph. If I get a request that I can't do - I will say no or refer them to the graph guy. But if I am feeling creative - I tell them an unreasonably high price. Sometimes they agree and it turns out that I was a graph guy all along


That makes sense - thank you.


AI is moving so fast that we might be a couple of years prior to having a program capable of conversing with a dumb and indecisive person like this and outputting what they want


Dec 2016 - "These 20 companies are racing to build self-driving cars in the next 5 years." [0]

Oct 2022 - "Even after $100 billion, self-driving cars are going nowhere." [1]

People have been historically notoriously bad at predicting how good AI/technology will be in 5-10 years time. If the predictions from 2015 were right, the roads would have been filled with level 4 and 5 autonomous vehicles for years now.

[0] https://www.businessinsider.com/companies-making-driverless-...

[1] https://www.bloomberg.com/news/features/2022-10-06/even-afte...


I remember watching a panel discussion with some CEOs and some industry engineering veterans, I think hosted by NVIDIA, in 2016. The CEOs were saying we'd have self driving cars all over the roads by 2019, and the engineering veterans were saying we'd maybe have partial deployments by 2023. It's interesting that the veterans seem to have made accurate predictions.

I think what we see are CEOs looking to raise funds, and news organizations looking to sell an interesting story that will say "revolutionary tech is just around the corner", but this is motivated reasoning. You're right that this is the same with AI technology, where some people say AGI is just around the corner, whereas some veterans say it may well be decades still, and the truth is we don't know.

So anyway I guess I agree with what you are saying, which is that AI development is difficult to predict and many people make bad predictions. I just wanted to point out that it tends to be people with a motivation to predict rapid growth that tend to produce a lot of these errors. These errors get propagated widely because technology press is one of those groups with this bias. However not everyone makes such bad predictions.


With Stable Diffusion & co. I've had the opposite sensation. I was completely floored as to how it blew past all of my expectations.


Don't get me wrong, Stable Diffusion & co are incredibly impressive. I'm using NovelAI image generation for a project I'm working on, so it's already useful to me as more than just a toy, even. It is absolutely a massive technological step change.

But NovelAI and Stable Diffusion both have limitations. It's nearly impossible to generate two different specified characters, much less specify two characters interacting in a certain way. For NovelAI, common/popular art styles are available, but you can't use the style of an artist with ~200 pictures. (Understandable, given how the AI works technically, but still a shortcoming from a user's perspective.) Both are awful at anything that requires precision, like a website design or charts (as shown in the article). And, as most people know by now, human hands and feet are more miss than hit.

People are extrapolating the initial, enormous step change as a consistent rate of change of improvement, just like what was done with self-driving cars. People are handwaving SD's current limitations away; "it just needs more training data" or "it just needs different training data." That's what people said about autonomous vehicles; it just needed more training data, and then it would be able to drive in snow and rain, or be able to navigate construction zones. Except $100 billion of training data later, these issues still haven't been resolved.

It'd be awesome if I were wrong and these issues were resolved. Maybe a version of SD or similar that lets me describe multiple characters in a scene performing different actions is right around the corner. But until I actually see it, I'm not assuming that its capabilities are going to move by leaps and bounds.


I think you're wrong here.

My partner works in design and her design teams have jumped all in on using Stable Diffusion in their workflows, something that is effectively in "version 1." For concept art especially it is incredibly useful. They can easily generate hundreds to thousands of images per hour and yes, while SD is not great at hands and faces, if you generate hundreds or thousands of images, you get MANY which have perfect hands and faces. Additionally it's possible to chain together Stable Diffusion with other models like GFPGAN and ESRGAN, for up-ressing, fixing faces, etc.

Self driving cars are completely different, no one was using "version 1" of self driving cars within weeks of the software existing. Stable Diffusion and similar models are commercially viable right now and are only getting better in combination with other models and improved training sets.

I think you're shifting the goalposts to what success is here to be quite frank. "The model needs me to be able to specify multiple characters in a scene all performing different actions."

The truth is, if I had to ask art professionals on Fiverr for "beautiful art photography of multiple characters doing different actions", it would be difficult and expensive for them too! And worse, you would get one set of pictures for your money and if you weren't satisfied, you're shit out of luck! On my PC, Stable Diffusion can crank out > 1000 unique pictures per hour until I'm satisfied.


> My partner works in design and her design teams have jumped all in on using Stable Diffusion in their workflows, something that is effectively in "version 1." For concept art especially it is incredibly useful.

I do agree if you are coming from the angle of "I need concept art of a surreal alien techbase for a sci-fi movie[0]" then SD&co are super useful. I'm not saying they don't have their uses. But those uses are a lot more limited than people seem to appreciate.

> I think you're shifting the goalposts to what success is here to be quite frank. "The model needs me to be able to specify multiple characters in a scene all performing different actions."

Having multiple, different characters in a picture/scene interacting in some way is not an uncommon, unrealistic requirement.

[0] high res, 4k, 8k frostbite engine, by greg rutkowski, by artgerm, incredibly detailed, masterpiece.


As far as I can tell, it is possible to draw such a scene by adding in the pieces and using the tools to paper over the boundaries and integrate those elements. It takes much more work than just generation but maybe one fiftieth to one hundredth of the work necessary for classic illustration.
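For anyone curious what that workflow looks like in practice, here's a minimal sketch using the diffusers inpainting pipeline. The model id is the public runwayml inpainting checkpoint; the file names and prompt are hypothetical placeholders, not a tested recipe:

    # Rough sketch of composing a scene by inpainting one region at a time.
    # Assumes the diffusers library; file names and prompt are hypothetical.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    scene = Image.open("scene_with_first_character.png").convert("RGB").resize((512, 512))
    mask = Image.open("mask_where_second_character_goes.png").convert("L").resize((512, 512))

    # Regenerate only the masked region; repeat with new masks to paper over seams.
    result = pipe(
        prompt="a knight in silver armour drawing a sword, matching the lighting of the scene",
        image=scene,
        mask_image=mask,
    ).images[0]
    result.save("composed_scene.png")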


It reminds me of one scene in I, Robot (2004)

https://www.youtube.com/watch?v=KfAHbm7G2R0


I have also been floored with their output, but it's because of that that the comparison to self-driving vehicles is so relevant. Even if we saw impressive growth over 5 years, it doesn't mean that growth will continue for another 5.

It's possible that Stable Diffusion, or minor improvements of, is our peak for the next few decades.


I think the future will involve “layering” different AIs for art. One for backgrounds, one for human poses, one for facial expressions, one that can combine them. That sort of thing.


The self driving car analogy isn't applicable here as the contexts are way different: operating conditions aside (roads not built for self driving cars, random unexpected situations, etc.) a single accident can result in one or more fatalities, which calls for extreme caution before wider adoption.


People tend to overestimate progress in the near future, and underestimate progress in the long term future.


Perhaps an unpopular opinion, but I think the tech is more than good enough that most cars should be autonomous already. However, the reason I think they aren't is public perception, regulation, the difficulty of changing tradition, and the level of safety people will accept.

It seems like most would rather wait until autonomous cars are way better than human drivers, while not truly acknowledging that most human drivers are awful. Sure, I don't want people hurt or killed, but I think it could have made more progress in prod, so to speak.


> However, the reason I think they aren't is public perception, regulation, the difficulty of changing tradition, and the level of safety people will accept.

No, the reason is that for city driving there is no system that is even close to navigating typical driving problems that humans encounter multiple times on a daily basis. There are plenty of videos of self driving cars flummoxed by basic road obstacles.

What people like you call “edge cases” are actually common occurrences.

If you think any non geofenced system is close to average human level competence you are simply deluded.


I don’t agree with the gp, but humans, in my tiny village, drive into shit every single day. We just don’t accept that from self driving cars, but we do from humans because it’s normal.


Do they make catastrophic errors like mistaking the back of a semi for an underpass?

Do they stop in front of a cardboard box and just stand there for minutes?


Human drivers drive into other people all the time, whether due to intoxication, tiredness, or just outright not paying attention. I know two people that have gotten rear-ended at a stoplight by another driver going >40mph. One of them was drunk. The other claimed to not be paying attention and otherwise seemed sober.

Likewise, plenty of people just stop paying attention and read their phones ... idling at intersections much much longer than necessary. Or drive stoned and drive around at ridiculously slow speeds.


Your assumptions are just wrong. Currently, self driving cars are much worse than humans. Invest some time and do the research. It's appalling how misleading sources like Tesla PR are.


> People have been historically notoriously bad at predicting how good AI/technology will be in 5-10 years time.

Taking the people who are most incentivized to overhype things to get clicks and/or funding as the consensus view is maybe not the best take here.

If you looked at people in general or engineers in general and looked at the median predicted timeframe, it would've probably been much more conservative.


This is like asking a caricature artist to design a bridge. DALL-E is not a graphing tool, so it's weird to see it treated as one. A better version of this article might explore the differences between DALL-E and Fiverr-designed characters, to contrast how AI and humans approach visual storytelling.


Almost feel like this was intentionally framed this way to build more engagement (via comments where it's posted). It's pretty well known dalle and stable diffusion are bad at text and precise vector-style graphics. Do this on a professional art piece and let's see how much $10 gets you.


Can you elaborate on how it looks like I framed it to build more engagement?


I think it's just that it's such a strange comparison to make, like making an article entitled, "Who's better at doing donuts in the parking lot: helicopters or planes?"

The image generation models weren't trained on chart images, everyone already knows they're gonna be bad at that. Fiverr artists will obviously be better, though even then, who the hell is paying people on fiverr to draw generic charts?

If you wanted to compare them, it would make more sense to compare them based on how they're actually used (especially in the case of the AI models): to make art.

Though if your title was more specific, ala "DALL-E 2 vs $10 Fiverr Commissions: Who's Better at Charts?" you'd probably get somewhat fewer complaints. Having the title be generic implies that you're gonna be looking at common/primary use cases.


> The image generation models weren't trained on chart images, everyone already knows they're gonna be bad at that.

Stable Diffusion was trained on images of charts and graphs. It knows what a powerpoint presentation and even an excel spreadsheet look like.

Here:

https://imgur.com/a/V4a6W4I

It just doesn't know how to generate a graph like the one it's asked to.


It's still stupid. This is like asking DALL-E to generate an image that solves a math equation step by step. Of course this is easier for a human to do.

Try getting a landscape in the style of Vincent van Gogh for $10 on Fiverr though. AI will give you that in seconds easily, and that's what's amazing about it.


I was in a meeting on Cognitive AI at the Royal Society in London last week where a gentleman from Stanford presented work where GPT-3 was prompted to solve math equations step-by-step and did well (better than I would have expected). Point being, if GPT-3 can do it, DALL-E should also be able do it, and testing whether that is the case is not stupid, but interesting.

The big question with systems like those image generation models is to what extent their generation can be controlled, and how much sense it makes. This is exactly the kind of testing that has to be done to answer such questions. Just flooding social media with cherry-picked successes doesn't help answer any questions at all. Because cherry-picking never does.

To be honest, I don't get the defensiveness of the comments in this thread. Half the comments are trying to call foul by invoking some rule they made up on the spot, according to which "that's not how you should use it". The other half pretend they knew all along what the result would be, and yet they're still upset that someone went and tried it, and posted about it. That kind of reaction is not coming from a place of inquisitiveness, or curiosity, that is for sure. It's just some kind of sclerotic reaction to novelty, people throwing their toys because someone went and did something they hadn't thought about.

> Try getting a landscape in the style of Vincent van Gogh for $10 on Fiverr though.

In another comment posted in this thread I tried to get Stable Diffusion to give me a graph with three lines in the style of van Gogh and other famous artists. I'd be very curious to see what that would look like and I can't imagine it easily. I'm left wondering, because Stable Diffusion can't do it. Maybe I should ask someone on fiverr.


I'm not saying there were zero such images, but it obviously wasn't the focus compared to art-type images.


What you said was that they weren't trained on chart images, not that they weren't the focus:

> The image generation models weren't trained on chart images, everyone already knows they're gonna be bad at that.

I have no idea how you could even know what was, or wasn't in those models training sets. Yet you posted with conviction as if you were sure you knew. What's the point of that?

Edit - Also, what do you mean "it obviously wasn't the focus"? The focus of what? The focus of training, or the focus of presenting the results on social media?


This is absurdly silly. These data sets contain millions of images at a bare minimum from web crawls, often billions, so of course there will be a non-zero number of charts in them. If you want to be pedantic about it be my guest I guess.

You could probably find a few driver's ed teachers who taught their students to do doughnuts too, but saying "driver's ed teachers don't teach their students to do doughnuts" would nonetheless be largely accurate.


Silly yourself. If there were simply a "non-zero" number of charts in them, the model wouldn't have, you know, modelled them. That the model can reproduce graphs is clear evidence that it saw enough graphs to reproduce them.

And don't call me silly just because you used imprecise language to try to make a vague point with great conviction as if you absolutely knew what you're talking about, when you absolutely didn't. Show some respect to the intellect of your interlocutor, will you?

And, seriously, you haven't answered my question: the focus of what? What do you mean by "it obviously wasn't the focus"?

I think you were emboldened by the downvoting of my comment and assumed you don't need to make sense, but I think the downvoters were downvoting something else than what you refuse to answer.


My man just look at the title, I clicked wondering if dall-e made better anime characters than $10 fiverr artists. But all I got was plots, who in their right mind asks for plots on fiverr.

Am I being engaged right now? Was your comment also to generate engagement? Hm.


I disagree and I downvoted you because I think you're being condescending and uncharitable.

I don't think OP chose graphs because they're "obviously" going to make AI look bad; I think he chose it because it's an incredibly simple image - extremely so. If the AI can't do this, how can you trust it to generate something complex? If it literally can't yet draw basic lines as described, how can it illustrate a story or any form of media where specifics matter?

And I don't think his post title implies that he was going to use some complex art prompt, either. Not in any way.


His description of the graph was not simple. I had to read it twice to understand. It's multiple sentences that are referential in complex ways.

It's also an entirely different task than the one these AIs were designed to solve. It's like judging a fish by its ability to fly.


I agree that it's complex relative to what AI can currently handle (clearly) but I don't agree that it's complex in general. For a human, it's a simple description. You or I could draw it freehand correctly given 5 minutes, with no training or preparation.

I don't see how this isn't the task these AIs are supposed to solve. They are meant to take a text description and output a corresponding visual result. This just demonstrates the narrow limits on the complexity of the input they can take.

If you're saying they're not designed to deal with inputs more complex than one sentence, then sure, I guess I agree. But this post goes to show that if you require specificity in your desired visual output, then you need more than one sentence's worth of complexity, and therefore the current generation of AIs are not yet broadly usable.

It's about illustrating the current limitations. This post is not implying that the technology is a failure or that it isn't enormous progress.


> For a human, it's a simple description. You or I could draw it freehand correctly given 5 minutes, with no training or preparation.

In the blog post, the humans drew it incorrectly as well (although they got closer). If it was as simple as you say it is, I would not expect the humans to err as well.

> If you're saying they're not designed to deal with inputs more complex than one sentence, then sure, I guess I agree.

Indeed. I would further say it's not designed for someone to use it as a text-directed paintbrush. This is not surprising, since human graphic artists don't work that way either, or at least get very pissed off when they are micromanaged in that fashion.

That said, I think it's also fairly obvious that these systems are not replacements for graphic artists in general. The human element is important for a lot of reasons; graphic artists don't just "draw pictures". I don't think people seriously familiar with these systems have ever seriously suggested it was a full replacement for graphic artists, although in fairness random internet commentators certainly have been having a moral panic over it.

Not to mention it's entirely possible that an AI more designed for this task would do better.

> But this post goes to show that if you require specificity in your desired visual output, then you need more than one sentence's worth of complexity, and therefore the current generation of AIs are not yet broadly usable.

I don't really agree that this post showed that, but I would agree that these AIs are not the best tools if you have very specific objective requirements.

AIs are tools, not magic; there are things they are good at, but they aren't good at everything and still need to be used with thought.

> It's about illustrating the current limitations. This post is not implying that the technology is a failure or that it isn't enormous progress.

I think the objection is that this article doesn't really demonstrate a meaningful limitation that wasn't obvious. It feels like a strawman. If DALL-E or Stable Diffusion actually succeeded at the task, I would be very impressed and consider it much more impressive than most of the pretty pictures everyone shows off.


If he didn't do it deliberately to make DALL-E look bad, then he did it out of ignorance of DALL-E's strengths and weaknesses. Your evaluation of what is "simple" and what is "more complex" isn't in line with what DALL-E is capable of.

DALL-E isn't good with symbols like letters and numbers. It can't do much logical / mathematical reasoning at all. So a graph is one of the worst choices.

What it can do is make aesthetically pleasing images that match basic descriptions. So there are more "complex" images that DALL-E can produce than basic graphs.


Apparently the human artists also had trouble understanding the prompt, which is why they kept missing details he specified.


> Okay, well maybe it was impossible for anyone to deduce what I was trying to convey.

Don't know if this is evidence of "framing for more engagement," but this line irks me. The latent diffusion models are pretty powerful, but I don't think there's anyone claiming that today's diffusion models are able to interpret complicated queries better than humans. The interesting part of diffusion models is that they can produce good results at all, not that they are better than humans. We're not in AGI territory. Even text models are still limited in many ways, and latent diffusion is highly reliant on the text model to produce good results. Even simpler queries can run into quite a lot of problems, that's exactly why a lot of people have been trying to figure out the best prompts to improve results.


You could have asked those tools to create images like the ones found in AI catalogs like https://lexica.art/ and https://www.krea.ai/ and then compared with what you can get for $10. This would be a comparison more favorable to AI


But how do we know if the Fiverr provider is using Stable Diffusion?


Oh wow. There’s a genius but unethical short-lived market idea. BRB.


That’s not a fair analogy. If anything, you could say “this is like asking a caricature artist to draw a bridge”. Sure, their bridge might not end up being architecturally or structurally correct, but it will mostly look like a bridge.

These image generation tools are being discussed as something that could replace graphic designers (didn’t OpenAI refuse to open source DALLE-2 at least partially due to this concern?). So it is absolutely a reasonable idea to compare image generation vs a human designer.

Saying that, the prompt the author chose to use was hard to parse even to humans, I am not surprised the tools failed so badly.


> didn’t OpenAI refuse to open source DALLE-2 at least partially due to this concern?

If they did claim this concern, I think we can safely assume that was a lie. Their business model depends on having the models closed so they can more easily charge for access.


DALL-E is for text-to-image, not text-to-art. This experiment is valid in terms of benchmarking the extent to which LLM understands a series of texts. For an AI researcher, this gives more fuel for the next iteration.


I’d love to try and get a detailed renaissance style painting of SpongeBob and Patrick done for $10 on Fiverr.


We already know what those models are good at. Everybody keeps posting their cherry-picked good results.

Why not get the chance to see some failures, too? Isn't it interesting to know what those models are bad at? There are too few examples of that around, so that is definitely a valuable thing to know.


Right? It seemed like the author's premise was that the AI generators should do a good job with this prompt, but my expectation is solidly that they wouldn't. So then when the results met my expectation, the only confusing thing was the author's tone about it.


What a ridiculous and poorly thought-out experiment. Visual art can be incredibly detailed, complex, and imaginative. None of these qualities are captured by trying to create... plots... Literally anyone can do that using Excel or Google Sheets. You can't call this 'art' and you certainly can't call this science. No attempt has been made to conduct any kind of objective analysis of the results beyond 'lol, not too shabby.' What a half-assed post. I'd love to see someone like Gwern take on such a task.


I think the point was that it wasn’t a made up experiment. It was a real world problem.

For all the hype around these systems currently, it’s nice to see some places where it doesn’t work.


These AI systems effectively create dreamscapes. Apparently dream characters can't do math in lucid dreams.

I think these systems are best understood as similar to the 9 portrait drawings by that guy on LSD from the 1950s. They seem to be able to simulate some forms of consciousness well such as deep sleep and psychosis and be completely oblivious to others


Text to graph is a great idea, but you don’t want an image generation AI like DALL-E. You want a natural language-to-code model like GPT-3/Codex that is able to accurately translate your requirements into code that programmatically generates the image using a good graph library.

Wouldn’t be surprised if this is already possible with today’s tech, and just waiting to be built.

edit: just tried OP’s prompt with Codex and Colab and generated this image: https://i.imgur.com/OyxJCbz.png

Not quite accurate, but shows the potential for a better language model or some prompt engineering to encourage fidelity to the prompt
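For the curious, here's a minimal sketch with made-up data (not the actual Codex output) of the kind of matplotlib code such a model would need to emit for a prompt like the author's: three wavy lines plus a black line that stays between two of them.

    # Minimal sketch with made-up data; not the author's prompt or the Codex output.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 10, 200)
    line1 = 1 + 0.8 * x + 0.3 * np.sin(2 * x)        # starts near the bottom, trends up
    line2 = 4 + 0.5 * x + 0.3 * np.sin(2 * x + 1)    # starts above the first
    line3 = line2 - 1.5 + 0.2 * np.sin(3 * x)        # starts below the second and follows it
    black = (line1 + line3) / 2                      # always between the first and third

    plt.plot(x, line1, "r", label="first")
    plt.plot(x, line2, "b", label="second")
    plt.plot(x, line3, "g", label="third")
    plt.plot(x, black, "k--", label="black line")
    plt.legend()
    plt.savefig("graph.png")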


Good thinking. Yes, it's not quite accurate, but that's not because Codex failed, it's because the author's graph is made up nonsense that doesn't actually represent any concrete properties/data.


Ugh. This could have been so much better.

It's not at all surprising that an AI is bad at drawing graphs, and it is also not surprising that even a non-artist human can draw graphs pretty well.


It's not even that "AI is bad at drawing graphs"-- these models were specifically trained on "aesthetic images".


Why is it not surprising? I don't see any fundamental reason for it. I think these models will be able to produce sensible graphs fairly soon.

You could equally say "it's not surprising that DALL-E can't draw words"... except that Imagen seems to be pretty good at it.

I think the real reason it's not surprising to you is that you've already seen enough DALL-E results to understand its limitations. It's not surprising that DALL-E can't draw graphs.


I do see a fundamental reason: The current crop of AI tools are horrible at logic. It's a complete inversion of how we think of computers.

If I want to convey happy emotions in the style of Rembrandt, SD or DALL-E will do brilliantly. If I want an apple BELOW a table, or worse, a geometric shape like a triangle, they'll crash-and-burn.

GPT-3 is also really empathetic, but struggles with simple logic (and especially mathematics).

Graphs are like the horror case for these.

I can think of ways to make them better at this, but it's not a weekend of hacking.


We don't know if it's surprising, the author never tells us their hypothesis. They don't state any particular reason for the prompt they used, they don't explore the contents or qualities of the prompt compared to other AI-generated art, and they don't run multiple trials. Because of that, we can't conclude anything useful from this article. There's no frame of reference or scientific inquiry involved. If you find it entertaining, that's fine. As a scientific comparison, this verges on parody.


This is not a scientific experiment. The author compared results of a specific prompt.

Why are so many people overthinking this?


> Why are so many people overthinking this?

From reading the comments here, they're overthinking it because they seem to be taking this as a pre-planned "attack" on AI art generation, rather than just an interesting anecdote on the limitations of these tools.

As someone who has not played with said tools, it was an outcome I found interesting to know: DALL-E et al. can't do specific graphs or even specific logical things very well a lot of the time. That's good to know, and I didn't previously!


I played with these tools a lot, so expected bad results as soon as I read the prompt.

Still found the post really interesting as it explores a very realistic use case. A client needs something simple designed for a blog post. Should they use AI or a human designer?

I read somewhere in the comments here that these tools are very bad at counting. Which is an interesting limitation with far reaching implications.


> I read somewhere in the comments here that these tools are very bad at counting. Which is an interesting limitation with far reaching implications.

It may not make much difference, but it's not so much that they're bad at counting, as that they don't even try. The way the prompt is parsed and diffused doesn't allow for that sort of logic.

All a "two people" prompt or some such provides, is a hint to push the AI towards that section of latent space where training-set images titled "two people" exists.

That's not "counting", and it would be truly amazing if any sense of math emerged de-novo from this training process. Doesn't mean it can't be done — it means we aren't trying.


> it means we aren't trying

It'll be pretty exciting times once we do!


I think it's just a bad experiment. The author must have been truly ignorant about the capabilities and utilities of DALL-E if he thought this experiment would yield interesting results.


I am personally annoyed because I expected something more sensible from an article called “DALL·E 2 vs. $10 Fiverr Commissions”.

The author could also compare how well DALL·E draws text, but what would be the point of that? Is not being a scientific article a good defense for posting nonsense?


The most obvious reason would be that they're probably largely trained on art-type images, not charts and graphs.


It is still an interesting counterexample to the numerous really impressive AI drawings that have been shared here.


It’s not really, it’s a clickbait article for cheap views.


I was hoping for a chance to actually compare things like artistic license, stylistic choices, etc. But instead the author chose an absolutely terrible prompt. AI image generation is not intended to generate graphs, and I'm surprised it was even able to do anything passable given how few it was probably trained on (if anything I'm more impressed with the AI than I was expecting to be).

Please do this again with a better prompt.


Stop prompt blaming this guy. It's a legit experiment in my opinion


And it's a legit criticism. There are three major issues I see here:

1) The prompt uses fairly complex grammar which is incompatible with a token-based parser. In particular, symbolic references like "The third […] starts below the second, and generally follows the second" are going to be lost on it.

2) The prompt includes details which a generative network is spectacularly unlikely to be able to handle, like asking for text labels with words like "prosecution" which are unlikely to be present in its training material. (Generally speaking, image generation models can only output short words which they've seen many times, like "STOP" or "PIZZA", and even those can be iffy.)

3) Speaking of training material, most of the training material given to image generation models consists of photographs and artwork. Technical diagrams are much less common, and when they do encounter those images, they're unlikely to be paired with the sorts of detailed descriptions that would be required to produce them on demand.


I've gotten a complete 11 word sentence generated in an image in Midjourney so far. It seems somewhat better at text than the other models somehow.


An experiment is only a single part of the scientific method, and one can easily neglect the rest of the steps. This article doesn't start with inquiry or a hypothesis from the author. We just get data and "I told you so" at the bottom, which doesn't illustrate anything.

It's funny that what we don't see is a shorter prompt. If you ran this experiment with just "A graph with 3 slightly wavy lines", maybe the difference between AI and human results would be closer. Maybe that's the basis for a legitimate research project, but it's frustrating that the author takes the ball to the 80-yard-line and just gives up.


Hmm did you read to the end of the post? Because I included a section showing the results on "a graph with 3 lines."


You're right, I shouldn't have bailed after the first few paragraphs. Sorry about that :p


An experiment to prove what? That humans are better at drawing graphs than computers? Well, Excel would like a word. That a neural net trained on photographs and artwork is bad at drawing graphs? Nobody expected otherwise. This "experiment" sets out to prove a hypothesis nobody had any doubts about.

A far more interesting blog post would have been looking at Fiverr artists vs AI when it came to producing unique character artwork for games, or logos, or almost anything except what was done instead.


Prompt shaming is a thing now, sigh…

And it’s like a five-minute job in Inkscape, where he could’ve just recreated the paper drawing and been done.


Please explain why you think that, because I can’t possibly imagine what your justification is


Not GP, but I did the same thing when trying to design a t-shirt… how is this not relevant? We’re trying to assess various tools to get our jobs done, not trying to create a peer reviewed scientific paper.


This post might as well be called “Check out what happens when you use a wrench to put in a screw”.

It’s just the wrong tool for the job.


How is this not the right tool? Where on the DALL-E 2 website does it say that it should not be used for artistic graphs?

Yes, we all have seen badly generated graphs from DALL-E 2 before, so it feels like this is an obvious limitation of AI image generation tools. But why should this be such an obvious thing to absolutely everyone?


Doesn't change the fact that it's a bad experiment. Anyone familiar with DALL-E would predict this result. He should have taken some time to understand what DALL-E can do and proposed an actually interesting experiment.


I don’t think we should limit experiments only to those areas where DALLE-2 succeeds. Seeing failures is also valuable.


Cool article! On a more general note, I think it underlines what I have been thinking for some time now: most people, including on HN, get this generative art stuff totally wrong.

Yes, this will be the end for some artists but not for others. DALLE2 et al. are merely new tools for a new generation of artists. And we are still figuring out how to use these tools effectively.

In other words: The “AI” is a tool that we humans will use to get things done faster/better etc. Nothing less, but also not much more.


To your point, I read in a board game group the other day of an artist using AI generation for a first pass with new clients to save themselves time. "Is this close to what you want?" then does the actual art with alterations by hand. If the client says no or flakes out, a lot less effort was lost than before.


While this is true to a certain extent, you're probably underselling it a bit. A lot of "evocative" art (e.g. art on Magic The Gathering cards) can now be done by complete non-artists playing around with prompts for a little while, in less time than it would take a professional artist to make the art manually.

Now, if you're actually Wizards of the Coast, you probably wanna spend the money with real artists anyway, but for any smaller teams, I can see the appeal of just using AI for that kind of use case now.


I think people overlook the fact that there's more to a photo than shutter speed, more to a comic than the drawing... There's message, composition, design, focused iteration, etc, etc. I really enjoy using DALLE for simulating photographs as it forces me to think outside of just the viewfinder.


The more worrisome aspect for me isn’t that it’s already going to replace some artists.

I always thought in my head that this level of creativity would remain our domain for centuries. Even as of like two or three years ago I thought that.

It’s insane to me that today some artists feel they’re going to be replaced soon. The idea of centuries is completely shattered for me and now I don’t know if we’re a year or 50 years away from AI replacing humans entirely in the creative domain. I spent the other day completely in an existential crisis, tbh.


>I always thought in my head that this level of creativity would remain our domain for centuries.

From what I've seen these networks are rehashing learning set images into something matching some criteria to produce visually pleasing results. Not to belittle the results - it's impressive - but the stuff I'm not seeing here is understanding of generated material - nonsensical z-order, scale/proportions, configuration.

Fantasy images are an easy target because it's all about visually pleasing nonsense.


From what I've seen, these tools aren't making _new_ styles (yet? I guess they will eventually, now that would be existentially fascinating/horrifying) -- so my worry with them is that they'll basically lock us in to what we have today.

But then that's sort of a self-limiting factor: it means there's still space for human creativity in creating new things, new styles (as not every style exists yet!) -- at least until said new style gets loaded into the model, I suppose?

Fascinating stuff, really.


The only reason an artist would think they’re being replaced is someone told them they would. So the solution is to not tell them that, as it’s not true.

(The main instigator on Twitter is a guy who draws “realistic Pokemon” and hates that an AI may have stolen the art he already stole from The Pokemon Company.)


> Yes, this will be the end for some artists

This isn't art. It's a graph meant to represent data.


Now I am somewhat sleep deprived but I found the description of the graph incomprehensible. "Starts near the bottom and goes up" I interpreted as, it is a vertical line, and that its direction would be expressed in the graph as a vector or something. (The horizontal position of this vertical line appeared to be unspecified, which puzzled me.)

In fact, my first urge was to ask you to just draw the dang thing already, so I am very glad you included the sketch later!

This might say more about me than your prompt, though, but I thought I'd share the data point.

Perhaps I would have been more successful if I read the instructions with pencil in hand, sketching it out as I went along instead of trying to fit the whole instructions in my head first and then visualize it.


Agreed. What a contrived, unnecessarily verbose way of describing a 3-line graph. You don't need any skill beyond basic motricity to draw a back-of-the-envelope illustration of what you need, especially in this situation that involves just a few abstract strokes, as opposed to, say, subjects/objects/animals.


In my experiments with these tools I came to a conclusion that they are not very good at understanding very detailed clear directions. Which is fine!

I have a lot of fun treating AI as an absurdist philosophical visualizer. Feeding it very abstract prompts and getting back bizarre results that somehow make sense!


Ha. This is great. One part that caught my eye was in regards to the first Fiverr drawing. The author says:

> However, it seems they didn’t catch the part where I said the black line should go between the first and third lines.

To me it seems the Fiverr person did attempt this part, but misinterpreted it. The black line is behind the blue line, but in front of the green and red lines. Does that count as "between" on the z-axis?


I found this comparison interesting! DALLE-2 was being discussed as a potential end of the graphic design industry. So seeing how bad it is at interpreting and visualizing this particular use case was great.

The prompt used by the author was hard to parse - I had to re-read it several times. Not surprised both some humans and AI failed.


Why didn't he specify that he wanted a y-axis? Or vertical axis on the left side? This is the issue with prompt engineering, you need to know the concepts and their names in order to elicit the response/image that you have in your mind.


I noticed this too! I think their result is a completely valid interpretation of the original prompt.


These text-to-graph problems seem like a good candidate for someone to create a training-dataset/benchmark of.

Bear in mind that the training data for these models has been mostly images and their alt text, scraped off the web. There is a good chance that there's nothing remotely like the examples given here in the training data. (People don't caption their graphs like that.) These models are undoubtedly good at doing what they have been trained to do - but I think no-one disagrees that there's plenty of room for improvement.

(And bear in mind that these text2image models only released this year, and that this tech in general has only been invented in the last couple of years, so it's very early days...)


Excel or Numbers, backed with data, would have been the right tool for this job.

However, once the wow-factor for text to image AI wears off, you start to realize:

a) “AI” doesn’t understand anything about the real world (physics, proportions, shapes, etc.) You curate and pick the image that makes most sense. You start to settle.

b) TTI “AI” can never evolve and become better than the data it’s been trained on. It can combine things for sure, but it won’t start evolving and become a master artist.

c) Limited applicability - there’s always going to be an edge case in every image that’s going to look off (strange blends, non-perfect circles, coloring…). You need to be a creative and know how to use photoshop to fix this stuff.

Right now it’s cool for creating album art, but that’s easy [1]. Generally I just hope that creative people aren’t too worried about “AI”, otherwise we’ll run out of training data.

[1] https://www.intheknow.com/post/album-cover-challenge-tiktok/


Thanks for sharing. Interestingly, I wrote a blog post about a similar topic: what would happen if ML builders and domain experts had co-ownership of the data and the model.* I am planning to generate the first training seed images by using Fiverr and giving the logo designers ownership rights of the data/model/profits.

https://blog.barac.at/a-business-experiment-in-data-dignity

* vs the current trend of training diffusion models on 400M images from the Internet (many of them being garbage) with mixed licenses and letting the user take responsibility for the generated images' licensing issues.


Drawing graphs is probably one of the worst comparisons one can do in terms of evaluating these models. They seem to be trained to generate either photorealistic or stylized images.


I had personally never seen this side of the models, so I wanted to share this finding.

I agree that they were trained more on artistic images, but I was still surprised by how badly they generalized to a more theoretical(?) context.


It's not about it being theoretical; it's more that the language model is still far more simplistic than our own, and struggles with anything but the most basic relations between nouns. The "horse riding an astronaut" post is a good example of this.[0]

[0] https://garymarcus.substack.com/p/horse-rides-astronaut


See also “Shirt without stripes” https://github.com/elsamuko/Shirt-without-Stripes


It’s funny to see this comment on HN, because it has been refuted so many times by now that you should be able to punch it into Google and find it.

https://twitter.com/Plinz/status/1529013919682994176


This is what it should look like:

https://www.bdaddik.com/en/comics-collectible-postcards/2967...

Modulo s/Lucky Luke/astronaut/g

Note that the image above should be in Dall-E's training set. So it's seen how a horse rides a human. No excuses there.


The comment you're replying to shows Dall-E doing exactly this. Did you click the link?


Isn't yours just a bad faith comment? Why would I not have clicked the link?


I just asked Stable Diffusion to generate 10 images of "horse riding an astronaut" and 10/10 were of an astronaut riding a horse.
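For reference, a minimal sketch of how such a batch can be generated with diffusers; the library, model id and settings here are assumptions, not the exact setup used:

    # Generate a batch of images for one prompt; model id and settings are
    # assumptions, not the setup used in the comment above.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    images = pipe(
        "horse riding an astronaut",
        num_images_per_prompt=10,
        num_inference_steps=30,
    ).images

    for i, img in enumerate(images):
        img.save(f"horse_riding_astronaut_{i}.png")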


Did you even read the linked article? They even cite the picture you link, and it still proves their point: these models have no real understanding of language, whatever Google claimed.


I think the path forward for something like this is models that learn to execute python code and incorporate the results into their outputs. There are already projects that can generate correct matplotlib calls for prompts like yours, but I don't think we are to the point where those python outputs can be automatically combined with a diffusion model for style or whatever.


One thing they're not very good at is deducing spatial relationships. Concepts like "above", "inside" and "behind", or "after". I'd say the prompts you gave make sense to a human who is thinking of a visual progression from left to right.

I bet you could write a few copilot prompts to generate code which would draw a graph like this, though.


I was surprised while reading your article how good the AIs did. I think it's fascinating that your intuition was that these tools would be able to do a good job of drawing a graph based on a description of it...


After reading your comment I asked Stable Diffusion to create photorealistic images of a graph with three lines (similar to the smallest prompt in the article).

Here's the results of three attempts with slightly different prompts:

https://imgur.com/a/FqQT2Mk

As you'll see, Stable Diffusion

a) is perfectly capable of drawing graphs, and,

b) completely incapable of drawing a simple graph with three lines as prompted.


Funny timing: just yesterday I finished a little app I was hacking on, and the lack of a somewhat decent-looking logo was blocking the release. Instead of trying my luck in Sketch and doodling around, I went to DALL E, and with my first prompt I was able to generate better logos than I could have drawn. I was immediately unblocked and super happy with the results.

It's just amazing that non-design people like me can conjure up decent-looking, usable stuff with AI. I will definitely use DALL E much more going forward for creative work.

The logos are a bit noisy and need redrawing in a proper vector tool, but they're a great starting point for trying out different ideas immediately.

(The results: https://twitter.com/dvcrn/status/1578710631838289922)


One missing dimension for this comparison is how long it takes to get results.

I've been using Imagen/Parti/Stable Diffusion already as a replacement for "clip art" because it takes ~15 seconds to get results and they are free. Fiverr takes at least 100 times that long and costs $10.

For tasks where the exact content isn't important and you can't invest more than a few seconds or wait more than a few seconds for results, generative models are already a great solution.


Wait, 'Fiver[r]' costs $10? I assumed it was.. a fiver. Did it start off making sense and inflate?


I believe the creator sets their own rates on fiverr.


Some can be had free, but I quickly run up against usage limits and end up paying for it.


There are no usage limits if you happen to have a gaming PC.


It also takes less than 10 minutes on PC.


You can’t use Imagen/Parti unless you work at Google, so it sounds like he’s getting paid for it.


Would love to see a similar comparison of fiverr vs AI but for clip art!


Ignoring the obvious misapplication of AI art generators: What an absolutely baffling graph idea. That thing makes no sense whatsoever


But this kind of drawing doesn't require any artistic skills; it's a graph.

It seems fastest to just draw it yourself; even the pencil drawing was already decent; and you can buy color pens for less than $12.


Right, “I don’t need a designer, I can draw a logo myself”.


For what it’s worth:

I needed graphics for my personal never to be published game project. I’m baaaaad at graphics. I was going to use Fiverr but really didn’t want to spend money on what might be a project I never return to. Years later Dalle2 came around and last week I spent an evening using up all my free credits and got all the art I needed.

I’m confident humans can do a better job. But for what I needed, getting about fifty images for about two hours of work was pretty amazing.


Complicated generative models are the wrong tool for the job here. And arguably Fiverr commissions are too; these graph prompts look like they would take about as long to do in a vector art program as it took to write the prompt, once you have some beginner skills in one. To me this is almost like asking it to graph functions and comparing it to Excel's graphing tools.


I think this is more of a showcase of these models' poor graphing skills than a comparison between these models and Fiverr artists.


The prompt he did is almost a captcha, lol, in terms of difficulty for AI vs. human. Try a painting... Do video game concept art or a Magic: The Gathering card. See how long it takes a human vs. the AI. The results tilt so far toward the AI that he might find it cost-prohibitive to commission people on Fiverr to do it for a blog post.


To demonstrate the flaws in DALL-E don't you just have to get it to draw anything that includes hands?


It's a lot better now, just generated this using SD:

https://imgur.com/lVyUQFb

https://imgur.com/KJRPHi9

But it's still not perfect. Here are a few more examples in a grid; as you can see, hands are still a problem:

https://imgur.com/ugpvE4a


Maybe it's better when hands are a central element, but I don't think I've ever seen it draw hands that aren't weird when they're just a peripheral element of an image. But I haven't used it that much, and those may be better now too.


AIs (at least currently) have problems counting. So anything containing a number ("3 lines") will be difficult for them.

(My bias: I'm generally for art to be created by artists. I find AI generated images to be a fun game, though. Exploring the minds of those AIs, in a way.)


Google staff aren't allowed to share Imagen results with you, unfortunately.


I just tried the prompts with Imagen and Parti. They are similar to Dalle-2, with a bit more "variety" but none reproducing the author's specific prompt the way the author wants. For the prompt "a graph with 3 lines" both produce graphs with 3 lines at least 1/6 of the time.


Curious. I've played with quite a few of these models and one of the very consistent "tells" is that they're extremely bad at counting things. A friend of mine likes tarot and I tried a few prompts... great results for the major arcana, but good luck with "ten of cups"... without capability to edit & re-prompt, the only viable strategy appears to be "ask it to draw a bunch of cups repeatedly until you've collected all the numbers."

Getting 3 of something 1/6 of the time doesn't really sound like it groks the request.


Sure, but it doesn't have to count to be useful. When I run this locally on a desktop GPU, I get 16 results in a few seconds. I can visually select the 2-3 that match what I want and pick one of them. I can try again ten times for 20-30 options, and it still takes less time than Fiverr.

I would not use these models for graphs yet, but for cool-looking tarot-inspired "clipart" or background images, I think they are already usable.
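For anyone curious, a rough sketch of that local batch workflow, assuming the Hugging Face diffusers library, a CUDA GPU, and a Stable Diffusion checkpoint you have access to (the model id and prompt below are just examples):

    # Rough sketch: generate a small batch locally and pick the best by eye,
    # rather than relying on the model to count or compose precisely.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    prompt = "ten of cups tarot card, ornate illustration"
    images = pipe(prompt, num_images_per_prompt=4).images
    for i, img in enumerate(images):
        img.save(f"option_{i}.png")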


The inability to count has some interesting knock-on effects. My favorite being eyes and hands. Especially the hands. The closer you look, the creepier it gets. That thumb has fingers of its own?! Thus far, the greatest utility I've seen has been on par with B-movies.


I am somewhat surprised at how bad these tools are at generating hands and feet. Is it just a matter of not having enough images to ingest?

The faces often look very good and they also have symmetric complexity and individual elements that come in a specific quantity (2 eyes, etc). Lower quality models do generate fly-like multi eye faces, but newer ones are so much more precise!


y not?


You are asked to not share the images when you use it.


Heh, remember when you signed up and still bought into Google's mission to organize the world's information and make it universally accessible and useful?

It's kinda sad that Google desperately wants the cool points for having its own DL models, but you can only see them in the form of a store window display.


> Heh, remember when you signed up and still bought into Google's mission to organize the world's information and make it universally accessible and useful?

They probably also remember signing an NDA, and maybe taking some training about how not all of the world's information should be made universally accessible and useful to everyone. For instance, the contents of a user's inbox.


There are almost no companies that operate with a full public view into their in-progress projects.


I'm teasing about Google's 'so great you can't even see it' effort to build hype for its own offering.


This is a great idea executed very poorly. I would love to see a larger sample size of AI vs Fiverr with a wider range of prompts. Graphs are difficult for current models and that was already well understood.


This is not surprising, for several reasons. One is that Dalle simply can't count: asking it for 3 of anything will give questionable results. Dalle also doesn't understand relations between objects; it will fail on anything of the form "A on top of B". Indeed, the order of words is usually irrelevant to the output. Lastly, features at certain length scales (such as lines thinner than some threshold) are always garbled. Try generating the standard face cards in a deck of playing cards to see what I mean.


OK, so people complain that Stable Diffusion and friends are trained on art so it's not fair to ask them to produce graphs because graphs are not art. So I asked them to produce artistic graphs.

Here are the results of the prompts "a graph with 3 lines in the style of X" where X in {Rembrandt, jackson pollock, studio ghibli, escher, van gogh}:

https://imgur.com/a/JW5c4in

So now we're producing art!

Art that's nothing like the prompt.


I have used Fiverr for web assets before, and I really had to make clear that what I wanted were SVG assets with transparent backgrounds. Nevertheless, this was routinely not understood, and I always had to ask for a redo to correct it. Also, in the same manner as this author, I found that word prompts for images were inferior to me simply doodling out what I wanted and having the more artistic, Adobe-inclined person create a polished version with small variations.


This is on the level of comparing the performance of a fast-food employee taking a weird order to trying to enter it on a touch-screen panel.


I personally don't love that you can commission art for $10. I know that this price point is more or less the 'trial' or 'promotion' price, but still - it's so hard for artists to pay their bills, and this sort of creates a perception that their work should be available for really cheap


I'm pretty sure the author spent more time writing and tweaking their prompt than it would've taken them to simply draw the graph they wanted. This isn't merely a matter of illustrating prompt engineering in humans vs machines.

I get the point the author is trying to make, but I really wish the example felt less contrived.


The actual timeline is reversed: I started with the sketch, submitted it to Fiverr, realized I wanted to make a comparison to Dalle, and only then did I try to come up with a prompt that could encapsulate the whole image.

I can see how it felt contrived, but I hoped to make an apples-to-apples comparison on a real use case. Then, to reduce the complexity, I tried a much simpler prompt.


You make graphs using data, not aesthetics.

Please don't make any more graphs not based on data.

You're not making the world a better place.


> A fair criticism of this post would be that I spent no time on optimizing my prompts, which has been shown to make a big difference in the quality of the output

How long before someone lists "DALL-E Prompt Optimization" as a skill on LinkedIn?


The labels in the charts are strangely reminiscent of Madoka Runes: https://wiki.puella-magi.net/File:Runes_chart_expanded.gif


The only substantive comment is that this is the nerdiest, most useless comparison, far divorced from any utility in this trending topic.

Let's show all the ways these AIs obliterate $10 Fiverr non-AI commissions; that's what people want to see.


I want to make a series of profile images all in the same theme (e.g. 5 penguin members of a red-clothed theocracy in the style of Cristiano Ronaldo). Can DALL-E or any of its competitors do this yet?


I agree with the other posters: this is an absolutely inspired idea, which was then executed in the worst way possible by choosing a blind spot in AI generation.

The comment by nextaccountic is spot on.


I'm astounded that OP needed outsourcing for this simple task. Take a tablet with a stylus, launch Concept, adjust the smoothness, and you will be finished in no time at all.


Those tools are for generating images and art, not precise schematics. The post is criticizing them for failing at something they aren't meant to do.


> Those tools are for generating images and art, not precise schematics.

Who says that? Where does it say that Dall-E and Stable Diffusion are only for generating images and art? Why are graphs not images? And why can't they be art?

Those are models that generate images from textual prompts. Aren't you just moving the goalposts by saying they can't generate specific kinds of images?


The pictures on the homepage are astronauts and flamingos and stuff.

The homepage defines the process: "starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of that image"

So based on that description and context clues, I wouldn't expect it to generate a precise schematic.


If I understand your interpretation correctly, you wouldn't expect it to generate anything else precise, either. For instance, you wouldn't expect it to be able to generate the precise contours of a face, correct?


Does anyone know what software/font was used to create the chart in section "Fiverr Person #3"? Thanks.


I spent maybe over $100 on DALL-E trying to recreate a math t-shirt I used to have that I liked. Very much unsuccessful.


This post is (intentional?) rage bait for nerds - as everyone else says this is the wrong comparison.


Wait, why is this the wrong comparison? Where does it say that those models can't generate graphs? Why is everyone so sure that this is not one of the intended applications of image generation?

Who made up those rules that drive nerds to rage when they're broken? Where are those rules written down? Can you point at them? Or have people just made up those rules in response to that post?


I certainly didn't intend this as rage bait, but I can see how some people would have expected comparisons for typical dalle prompts.

I had never seen someone try to generate graphs with dalle so I thought it was worth sharing.


Now get a $10 Fiverr guy to draw an anime picture as good as SD could... lol


"Anime character wearing a t-shirt with a graph with three lines on it"

Checkmate, robots.


AI Whisperer is totally going to be a job


wtf, this is the worst possible use of DALL-E, it's not intended for that at all


how long before 90% of the work on fiverr is AI generated?


You got it.



