
In theory there shouldn't be — LLMs are pretty robust to typos and usually infer the intended meaning regardless.

  Location: London, UK
  Remote: Yes
  Willing to relocate: No
  Technologies: TypeScript, React, Next.js, PHP/Laravel, Generative AI, Photoshop/Sketch/After Effects
  Website: https://dvy.io
  LinkedIn: https://linkedin.com/in/dvyio
  Email: david@davidbarker.me
I'm a multidisciplinary designer-developer with deep curiosity and a passion for building intuitive, human-centered products, particularly those leveraging generative AI.

My professional roles have typically involved much more than just coding, spanning product design, strategy, marketing, and customer support. I thrive in small, ambitious teams where I can make a tangible impact.

Outside of work, I've built successful AI-driven side projects, including Balance, a free web app that anonymously helps people with acute anxiety, and AI Autotagger, an Eagle plugin currently processing over a million images and videos per month. (All projects listed on my personal website: https://dvy.io.)

I also have a Master's degree in Chemistry, with a specialization in computational chemistry.

Right now, I’m selectively seeking roles at companies building meaningful products that integrate cutting-edge AI thoughtfully and creatively.


Claude 3.5 Sonnet is great, but on a few occasions I've gone round in circles on a bug. I gave it to o1 pro and it fixed it in one shot.

More generally, I tend to give o1 pro as much of my codebase as possible (it can take around 100k tokens) and then ask it for small chunks of work which I then pass to Sonnet inside Cursor.

Very excited to see what o3 pro can do.


If you do `/cost` it will tell you how much you've spent during that session so far.


If it's useful to anyone, I made a VS Code/Cursor extension that combines all open files into one big text document.

I use it with ChatGPT's o1 pro (which can handle around 100,000 tokens).

1. Open all of the files I think are relevant

2. Use the extension to combine them

3. Copy and paste into ChatGPT

https://marketplace.visualstudio.com/items?itemName=DVYIO.co...
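If anyone wants to roll their own, the core of it is only a handful of VS Code API calls. A minimal sketch (not the published extension; the command ID is made up) that concatenates every open text tab and copies the result to the clipboard:

  import * as vscode from "vscode";

  // Minimal sketch: concatenate every open text document and copy it to the clipboard.
  export function activate(context: vscode.ExtensionContext) {
    const command = vscode.commands.registerCommand("demo.combineOpenFiles", async () => {
      const parts: string[] = [];
      for (const group of vscode.window.tabGroups.all) {
        for (const tab of group.tabs) {
          if (tab.input instanceof vscode.TabInputText) {
            const doc = await vscode.workspace.openTextDocument(tab.input.uri);
            const relativePath = vscode.workspace.asRelativePath(doc.uri);
            parts.push(`// ===== ${relativePath} =====\n${doc.getText()}`);
          }
        }
      }
      await vscode.env.clipboard.writeText(parts.join("\n\n"));
      vscode.window.showInformationMessage(`Combined ${parts.length} open files.`);
    });
    context.subscriptions.push(command);
  }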


I’ll be using this, thank you!


It disappoints me when otherwise intelligent people take him at his word at this point. Even ignoring his descent into political madness and conspiracy, he's simply not trustworthy.

Fool me once, shame on Elon. Fool me 194 times, shame on me.


It's trickier than I thought it would be.

Here are a few in Degas style I made after training for 2,500 steps. I'd love to hear what you think of them. To my (untrained) eye, they seem a little too defined, perhaps?

https://imgur.com/a/sqsQLPg


Yep, absolutely nothing like Degas. Well, I take that back: I think it picked up some favorite colors/tones. But it has no concept of the materials, poses, or composition. So plasticky! Compare to https://images.app.goo.gl/JiDRYNNKUP9tczkQ7


I suspect it really needs more training examples. The problem I found when I looked for images to use was that 60% were of dancers, and from past experience, it will end up trying to fit a dancer into every image you create. But of course, there are only a (small) finite number of Degas images that you can train with.

A possible solution may be to incorporate artificial images in the training data. So, create an initial LoRA with the original Degas images and generate 500 images. From those generated images, pick the ones that most resemble Degas. Add those to the training set and train again. Repeat until (hopefully) it learns the correct style.
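Roughly what I have in mind, as a sketch. The helpers here are hypothetical stand-ins for whatever trainer, generator, and curation step you'd actually use (e.g. Replicate/fal calls plus a manual pick):

  // Hypothetical helpers: plug in your actual trainer, generator, and curation step.
  type TrainFn = (images: string[]) => Promise<string>;            // returns a LoRA URL
  type GenerateFn = (lora: string, n: number) => Promise<string[]>;
  type CurateFn = (images: string[]) => Promise<string[]>;          // keep the Degas-like ones

  async function bootstrapStyleLora(
    degasImages: string[],
    train: TrainFn,
    generate: GenerateFn,
    curate: CurateFn,
    rounds = 3,
  ): Promise<string> {
    let trainingSet = [...degasImages];
    let lora = await train(trainingSet);
    for (let round = 0; round < rounds; round++) {
      const generated = await generate(lora, 500); // sample candidate images
      const keepers = await curate(generated);     // keep the most Degas-like ones
      trainingSet = [...degasImages, ...keepers];  // originals + accepted synthetics
      lora = await train(trainingSet);             // retrain on the grown set
    }
    return lora;
  }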


Out of curiosity, what do you think of these? https://imgur.com/a/8p7RlMe


Significantly better. They feel like watercolor more than Degas, but if that's Flux, I'm impressed!


Unfortunately, not Flux. They're from Midjourney, using a few Degas as a style reference.

Whatever they're doing at Midjourney is still impressive. No training needed and a better result.


I'm curious to give this a go. I've been training a lot of LoRAs for FLUX dev recently (purely for fun). I'm sure there must be a way to get this working.

Here are a few I've recently trained: https://civitai.com/user/dvyio


This looks really good! What's your process for getting this kind of high-quality LoRA?


Thank you!

A reasonable number of training images (50 or so), and then I train for 2,000-ish steps for a new style.

Many of them work well with Flux, particularly if they're illustration-based. Some don't seem to work at all, so I didn't upload those!


How long does this take, and on what equipment? It's amazing to me that you can do this from just 50 images, I would have thought tens of thousands.


It's very impressive. I aim for around 50 images if I'm training a style, but only 10 to 20 if training a concept (like an object or a face).

I have a MacBook Air so I train using the various API providers.

For training a style, I use Replicate: https://replicate.com/ostris/flux-dev-lora-trainer/train

For training a concept/person, I use fal: https://fal.ai/models/fal-ai/flux-lora-fast-training

With fal, you can train a concept in around 2 minutes and only pay $2. Incredibly cheap. (You could also use it for training a style if you wanted to. I just found I seem to get slightly better results using Replicate's trainer for a style.)
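If you'd rather kick off a Replicate training from code than from their web form, something like this works with their Node client. (The version ID and input field names below are placeholders/assumptions; copy the real ones from the trainer page.)

  import Replicate from "replicate";

  const replicate = new Replicate(); // reads REPLICATE_API_TOKEN from the environment

  // Placeholder version ID and input fields; check the trainer page for the current schema.
  const training = await replicate.trainings.create(
    "ostris",
    "flux-dev-lora-trainer",
    "VERSION_ID_FROM_TRAINER_PAGE",
    {
      destination: "your-username/your-style-lora", // model that receives the weights
      input: {
        input_images: "https://example.com/style-images.zip", // ~50 images for a style
        trigger_word: "MYSTYLE",
        steps: 2000,
      },
    },
  );

  console.log(training.status);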


$2 for 2 minutes? Can't you get an hour for less than $2 using GPU machines from providers like Runpod or AirGPU? I found Replicate and fal a bit expensive after 10 minutes of prompting.

I haven't used Runpod or AirGPU, and I'm not affiliated with them.


Yes, renting raw compute via Runpod and friends will generally be much cheaper than renting a higher level service that uses that compute e.g. fal.ai or Replicate. For example, an A6000 on fal.ai is a little over $2/hr (they only show you the price in seconds, perhaps to make it more difficult to compare with ordinary GPU providers); on Runpod an A6000 is less than half that, $0.76/hr in their managed "Secure Cloud." If you're willing to take some risk of boxes disappearing, and don't need much security, Runpod's "Community Cloud" is even cheaper at $0.49/hr.

Similar deal with Replicate: an A100 there is over $5/hr, whereas on Runpod it's $1.64/hr.

And if you use the "serverless" services, the pricing becomes even more astronomical; as you note, $1/minute is unreasonably expensive: that's over 20x the cost of renting 8xH100s on Runpod's "Secure Cloud" (and 8xH100s are extreme overkill for finetuning image generators: even 1xH100 would be sufficient, meaning it's actually 160x markup).


Wow, fantastic, thanks! I thought it would be much, much more expensive than this. Thanks for the info!


Happy to help! It's a lot of fun. And it becomes even more fun when you combine LoRAs. So you could train one on your face, and then use that with a style LoRA, giving you a stylised version of your face.

If you do end up training one on yourself with fal, it should ultimately take you here (https://fal.ai/models/fal-ai/flux-lora) with your new LoRA pre-filled.

Then:

1. Click 'Add item' to add another LoRA and enter the URL of a style LoRA's SafeTensor file (on Civitai, go to any style you like and copy the URL from the download button; you can also find LoRAs on Hugging Face)

2. Paste that SafeTensor URL as the second LoRA, remembering to include the trigger word for yourself (you set this when you start the training) and the trigger word for the style (it tells you on the Civitai page)

3. Play with the strength for the LoRAs if you want it to look more like you or more like the style, etc.
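
You can do the same thing from code with fal's client, too. A rough sketch (the exact input schema may differ from this, and the URLs/trigger words are placeholders):

  import { fal } from "@fal-ai/client";

  // FAL_KEY is read from the environment.
  // Placeholders throughout: swap in your face LoRA URL, a style LoRA URL from
  // Civitai/Hugging Face, and the two trigger words.
  const result = await fal.subscribe("fal-ai/flux-lora", {
    input: {
      prompt: "MYFACE person, MYSTYLE illustration, waist-up portrait",
      loras: [
        { path: "https://example.com/my-face-lora.safetensors", scale: 1.0 },
        { path: "https://example.com/style-lora.safetensors", scale: 0.8 },
      ],
    },
  });

  console.log(result.data);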

-----

If you want a style LoRA to try, this one of SNL title cards I trained actually makes some great photographic images. https://civitai.com/models/773477/flux-lora-snl-portrait (the download link would be https://civitai.com/api/download/models/865105?type=Model&fo...)

-----

There's a lot of trial and error to get the best combinations. Have fun!


Have you tried img2text when training a style?

I want to make a LoRA of Prokudin-Gorskii photographs from the Library of Congress collection, and they have thousands of photos, so I’m curious whether that’s effective for auto-generating captions for the images.


It's funny you should ask. I recently released a plugin (https://community-en.eagle.cool/plugin/4B56113D-EB3E-4020-A8...) for Eagle (an asset library management app) that allows you to write rules to caption/tag images and videos using various AI models.

I have a preset in there that I sometimes use to generate captions using GPT-4o.

If you use Replicate, they'll also generate captions for you automatically if you wish. (I think they use LLaVA behind the scenes.) I typically use this just because it's easier, and seems to work well enough.
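For reference, the GPT-4o captioning itself is just a vision call. A minimal sketch with the OpenAI Node SDK (the prompt and image URL are placeholders):

  import OpenAI from "openai";

  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Write a one-sentence training caption for this image." },
          { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
        ],
      },
    ],
  });

  console.log(response.choices[0].message.content);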


That’s awesome! Thank you for the Replicate link too. I didn’t know they also did LoRA training. They’ve been kind of hitting it out of the park lately.


Thanks for all this! I had created an SD LoRA of my face back in the day; time for another one!


Awesome! :)


I was impressed by Upstash's approach to something similar with their "Semantic Cache".

https://github.com/upstash/semantic-cache

  "Semantic Cache is a tool for caching natural text based on semantic similarity. It's ideal for any task that involves querying or retrieving information based on meaning, such as natural language classification or caching AI responses. Two pieces of text can be similar but not identical (e.g., "great places to check out in Spain" vs. "best places to visit in Spain"). Traditional caching doesn't recognize this semantic similarity and misses opportunities for reuse."


I strongly advise not relying on embedding distance alone for it because it'll match these two:

1. great places to check out in Spain

2. great places to check out in northern Spain

Logically the two are not the same, and they could in fact be very different despite their semantic similarity. Your users will be frustrated and will hate you for it. If an LLM validates the two as being the same, then it's fine, but not otherwise.


I agree, a naive approach to approximate caching would probably not work for most use cases.

I'm speculating here, but I wonder if you could use a two-stage pipeline for cache retrieval (kinda like the distance search + reranker model technique used by lots of RAG pipelines). Maybe it would be possible to fine-tune a custom reranker model to only output True if two queries are semantically equivalent rather than just similar. So the hypothetical model would output True for "how to change the oil" vs. "how to replace the oil" but would output False in your Spain example. In this case you'd do distance-based retrieval first using the normal vector DB techniques, and then use your custom reranker to validate that the potential cache hits are actual hits.
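
Something like this, as a sketch. vectorSearch and isEquivalent are hypothetical stand-ins for your vector DB lookup and the fine-tuned reranker (or an LLM yes/no check):

  type CachedEntry = { query: string; response: string; score: number };

  async function cachedAnswer(
    query: string,
    vectorSearch: (q: string, k: number) => Promise<CachedEntry[]>,
    isEquivalent: (a: string, b: string) => Promise<boolean>,
    generate: (q: string) => Promise<string>,
  ): Promise<string> {
    // Stage 1: cheap recall by embedding distance.
    const candidates = await vectorSearch(query, 5);

    // Stage 2: strict validation, so "Spain" doesn't match "northern Spain".
    for (const candidate of candidates) {
      if (await isEquivalent(query, candidate.query)) {
        return candidate.response; // cache hit
      }
    }

    return generate(query); // cache miss: fall through to the LLM
  }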


Any LLM can output it, but yes, a fine-tuned LLM can get by with a shorter prompt.


A hybrid search approach might help, like combining vector similarity scores with e.g. BM25 scores.

Shameless plug (FOSS): https://github.com/jankovicsandras/plpgsql_bm25 (Okapi BM25 search implemented in PL/pgSQL for Postgres).
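
One simple way to combine the two result lists is reciprocal rank fusion, which merges rank positions rather than raw scores (so you don't have to normalize BM25 against cosine similarity). A small sketch; the document IDs and the k constant are illustrative:

  // Reciprocal Rank Fusion: merge rank positions from two result lists
  // (e.g. BM25 hits and vector-similarity hits) into one combined ranking.
  function reciprocalRankFusion(rankings: string[][], k = 60): [string, number][] {
    const scores = new Map<string, number>();
    for (const ranking of rankings) {
      ranking.forEach((docId, rank) => {
        scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank + 1));
      });
    }
    return [...scores.entries()].sort((a, b) => b[1] - a[1]);
  }

  const fused = reciprocalRankFusion([
    ["doc3", "doc1", "doc7"], // BM25 top hits
    ["doc1", "doc3", "doc9"], // vector-similarity top hits
  ]);
  console.log(fused);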


Thanks. I used the Reddit share link but I guess they use that as a way to try and get people to register.

Unfortunately I can't change the original link. Perhaps dang can.


Email such suggestions/requests to the mods at hn@ycombinator.com

