Hacker News
Ask HN: How to get into AI generation (images, text)?
52 points by divdev_ on Nov 5, 2022 | hide | past | favorite | 13 comments
Hey! I'm a software engineer with extensive experience building full-stack applications, and lately I've been really fascinated by the tools popping up that use AI to solve common day-to-day problems: generating blog outlines from a couple of lines of text, creating realistic avatars of yourself in different settings, generating art from text prompts. I've never even had touch points with these technologies, so it's quite overwhelming in terms of where to start!

Do I need to know the basics? Should I just use existing solutions like GPT-3, OpenAI's APIs, and Stable Diffusion and build applications with them? Can I tailor those tools to my use cases (model training), or should I build something similar from scratch?

Looking for advice!




For the image generation side of things (or even image indexing with the CLIP interrogator), I recommend just installing the AUTOMATIC1111 repo (https://github.com/AUTOMATIC1111/stable-diffusion-webui). It's a web UI with pretty much every variant of Stable Diffusion you could want to try out: txt2img, img2img, inpainting, outpainting, fine-tuning (both textual inversion and DreamBooth), style customization, CLIP interrogation, etc. Most importantly, there are about 1000 YouTube tutorials on how to do each of these things with it, so you can pick your areas of interest and just try it out without having to understand all the details first.

From there, if you're interested in how it works, I highly recommend the last 4 videos on Jeremy Howard's youtube channel: https://www.youtube.com/user/howardjeremyp/videos

He's currently teaching a class on stable diffusion from the ground up and these lectures give a really good introduction to how it all works.
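To give a taste of what those lectures cover: diffusion models are trained to undo gradually added noise. Here is a toy, stdlib-only sketch of the forward (noising) process for a single "pixel" (illustrative only; real models work on tensors with learned noise schedules, and the function name and values here are made up):

```python
import math
import random

def forward_noise(x0, alpha_bar, rng):
    """Noise a clean value x0 at a timestep whose cumulative signal
    fraction is alpha_bar: x_t = sqrt(a)*x0 + sqrt(1-a)*eps."""
    eps = rng.gauss(0.0, 1.0)  # standard Gaussian noise
    return math.sqrt(alpha_bar) * x0 + math.sqrt(1.0 - alpha_bar) * eps

rng = random.Random(0)
x0 = 0.8  # a clean pixel value
# As alpha_bar shrinks (later timesteps), the signal drowns in noise.
for alpha_bar in (0.99, 0.5, 0.01):
    print(round(forward_noise(x0, alpha_bar, rng), 3))
```

The trained network learns to predict `eps` from the noised value, which is what lets it reverse the process starting from pure noise at generation time.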


Thanks, quite an extensive guide really appreciate it!


1. It depends. If you just want to use some model and call APIs, you don't have to learn any ML theory; you just have to learn to use libraries by following their GitHub README instructions. Get a Colab Pro+ subscription or run Kaggle Notebooks for free. You can also simply use GUIs built on top of open source models.

2. Learn to use the Hugging Face libraries, and use their stuff in your notebooks.

3. Learn some ML theory so you can understand hyperparameters better and tweak them more effectively.
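For step 2, a minimal sketch of the Hugging Face `transformers` pipeline API. The model named here is a deliberately tiny test checkpoint (so the download is fast; its output is gibberish) — swap in "gpt2" or a larger model for real text:

```python
from transformers import pipeline

# "sshleifer/tiny-gpt2" is a tiny test checkpoint, chosen only so the
# download is quick; use "gpt2" or bigger for meaningful completions.
generator = pipeline("text-generation", model="sshleifer/tiny-gpt2")

result = generator("AI-generated art is", max_new_tokens=10)
print(result[0]["generated_text"])
```

The same one-line `pipeline(...)` pattern covers many tasks ("summarization", "image-to-text", etc.), which is why you can get surprisingly far before touching any theory.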

____

If you want to get into training models yourself from scratch, you have to go deeper, and you cannot skip learning ML theory in depth.

____

The most obvious ways would be:

1. Looking into stuff that John Whitaker does [0] and his elaborate free course on AI Art [1].

2. Learning ML from scratch starting from Andrew Ng ML, then going to DL, then learning about GANs.

3. Learning from fast.ai through their two-part course on Deep Learning, where Stable Diffusion is now being taught. Then learn PyTorch from another place like Sebastian Raschka's book.

4. Watching the old Stanford CS231n videos from when Karpathy was a TA and taught the class. Back then, DeepDream was the standard.

_____

If you are a responsible, mature person, are in it for the long term, and have deep pockets, buy some GPUs. 2x RTX 3090 is reasonable and should be enough.

____

Let me know if you have any further questions.

[0]: https://datasciencecastnet.home.blog/

[1]: https://youtube.com/playlist?list=PL23FjyM69j910zCdDFVWcjSIK...


Thanks a ton! Another great answer, which I will use as a roadmap! Back to the GPU-buying question: why not rent one from some cloud provider? I have a Mac, so I'm not sure how an external GPU fits into the picture here.


Libraries and end-user-facing projects are beginning to support the GPUs in Macs.

Renting on cloud platforms is $$$. But sure, you can start by trying there. After the crypto crash, GPU prices have dropped a lot, too.

Also, building a rig with GPUs and SSHing into it from a MacBook is pretty common.

Also, be wary of roadmaps. Every person is different, and their roadmap should be as well. Build your own roadmap through trial and error by trying out many sources; I've told you which ones I think are the best.

_____

Forgot to mention my Kaggle notebook [0] on the Hugging Face API, through which you can start generating text today using their pretrained models, which are available for download.

[0]: https://www.kaggle.com/code/truthr/a-gentle-introduction-to-...


Would also recommend Paperspace Gradient if you want decent access to a GPU at $8/month for a Pro subscription. It's been great for learning before diving head first into buying one.


Where do the OpenAI APIs fit into this?


For search, i.e. e-commerce search: when a person searches for "long-sleeve" or "ripped jeans", tensor search helps categorize text vectors, image vectors, etc. I would recommend, and actively use, the Marqo repo @ https://search-the-way-you-th.ink/3FNq2lG . Super handy if you are focused on search and want to implement it into current projects. I've only just started using it, so I can't comment much further, but so far it's awesome!
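The core idea behind this kind of tensor/vector search, stripped down: embed both the query and the catalogue items as vectors, then rank items by cosine similarity. A toy stdlib-only sketch with made-up 3-dimensional "embeddings" (real systems use learned embeddings with hundreds of dimensions, e.g. from CLIP, and an index instead of a linear scan):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-d embeddings for a few catalogue items.
catalogue = {
    "long-sleeve shirt": [0.9, 0.1, 0.0],
    "ripped jeans":      [0.1, 0.9, 0.1],
    "wool scarf":        [0.2, 0.1, 0.9],
}

query = [0.8, 0.2, 0.1]  # pretend this embeds the query "long sleeve"
best = max(catalogue, key=lambda item: cosine(query, catalogue[item]))
print(best)  # → long-sleeve shirt
```

Because text and images can be embedded into the same vector space, the same ranking step works for text-to-text, text-to-image, and image-to-image search.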


If you want to train the model, you can try Dreambooth-Stable-Diffusion. https://github.com/XavierXiao/Dreambooth-Stable-Diffusion


Awesome! Thanks for the suggestion!


As said elsewhere in this thread, AUTOMATIC1111 provides a local web frontend for a lot of the Stable Diffusion variants.

There is a pull request to integrate DreamBooth into the web UI, which I am already a heavy user of - https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull... . It makes it all push-button simple, and it's integrated with all the other AUTOMATIC1111 Stable Diffusion scripts.

While the release of Stable Diffusion has driven a lot of innovation in text-to-image AI over the past two months, there are problem sets other than that, and Python libraries like scikit-learn ( https://scikit-learn.org/stable/tutorial/machine_learning_ma... ) apply machine learning in other domains as well.
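To give a flavour of that other side of ML, a minimal scikit-learn sketch using one of its bundled toy datasets (assumes scikit-learn is installed; the dataset and model choices here are just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Classic non-generative ML: classify iris flowers from four measurements.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```

The fit/predict/score pattern is the same across nearly all scikit-learn estimators, which makes it a gentle on-ramp to ML concepts like train/test splits and hyperparameters.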


If you were to use Google Colab, I'd recommend [fast-stable-diffusion](https://github.com/TheLastBen/fast-stable-diffusion). If not, I'm working on a [fork](https://github.com/askiiart/universal-fast-stable-diffusion). However, it's GPU-specific (and not functional yet), so I'd recommend checking out what other commenters say instead.


I've noticed a lot of the online services wrap OpenAI and sell a specific feature set. I'm also interested to see what I could build with GPT-3 without having to pay for it. But if I have to pay, I will, as I don't have a lot of free time to learn.



