
> We have not received any unsolicited advice asking us to rewrite everything in Rust or Nim (yet).

One wonders if that’s changed since 2020…


Been using it for a couple of hours and it seems much better at following the prompt. Right away the quality seems worse compared to some SDXL models, but I’ll reserve judgement until I’ve had a couple more days of testing.

It’s fast too! I would reckon about 2-3x faster than non-turbo SDXL.


I'll take prompt adherence over quality any day. The machinery otherwise isn't worth it, i.e. the ControlNets, OpenPose and depth maps just to force a particular look or to achieve depth. The solution becomes bespoke for each generation.

Had a test of it and my opinion is that it's an improvement when it comes to following prompts, and I do find the images more visually appealing.


Can we use its output as input to SDXL? Presumably it would just fill in the details, and not create whole new images.


I was thinking exactly that. You could use the same trick as the hires-fix for an adherence-fix.


Yeah chain it in comfy to a turbo model for detail


A turbo model isn't the first thing I'd think of when it comes to finalizing a picture. Have you found one that produces high-quality output?


For detail, it'd probably be better to use a full model with a small number of steps (something like the KSampler Advanced node with 40 total steps, but starting at step 32-ish). Might even try using the SDXL refiner model for that.

Turbo models are decent at getting passable results in very few iterations, but not so much at adding fine details to a mostly-done image.
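
Roughly the same idea outside ComfyUI, as a sketch using diffusers' SDXL img2img pipeline with the refiner checkpoint (file names and the prompt are placeholders): 40 total steps at strength 0.2 means only the last ~8 steps actually run, which matches "start at step 32-ish".

    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline
    from PIL import Image

    # 40 total steps with denoising starting around step 32 is roughly
    # img2img with strength ~0.2 (only the final ~8 steps are executed).
    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-refiner-1.0",
        torch_dtype=torch.float16,
    ).to("cuda")

    base = Image.open("first_pass.png").convert("RGB")  # output of the first model
    refined = pipe(
        prompt="same prompt as the first pass",
        image=base,
        num_inference_steps=40,
        strength=0.2,
    ).images[0]
    refined.save("refined.png")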


How much VRAM does it need? They mention that the largest model uses 1.4 billion parameters more than SDXL, which in turn needs a lot of VRAM.


There was a leak from Japan yesterday, prior to this release, and it suggested 20 GB for the largest model.

This text was part of the Stability Japan leak (the 20 GB VRAM reference was dropped in today's release):

"Stages C and B will be released in two different models. Stage C uses parameters of 1B and 3.6B, and Stage B uses parameters of 700M and 1.5B. However, if you want to minimize your hardware needs, you can also use the 1B parameter version. In Stage B, both give great results, but 1.5 billion is better at reconstructing finer details. Thanks to Stable Cascade's modular approach, the expected amount of VRAM required for inference can be kept at around 20GB, but can be reduced even further by using smaller variations (as mentioned earlier, this (which may reduce the final output quality)."


Thanks. I guess this means that fewer people will be able to use it on their own computer, but the improved efficiency makes it cheaper to run on servers with enough VRAM.

Maybe running stage C first, unloading it from VRAM, and then doing B and A would make it fit in 12 or even 8 GB, but I wonder if the memory transfers would negate any time saving. Might still be worth it if it produces better images, though.


Sequential model offloading isn’t too bad. It adds about a second or less to inference, assuming it still fits in main memory.
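
A minimal sketch of the ordering, with hypothetical load_stage_c / load_stage_b loaders standing in for whatever the release actually ships; the only point is evicting one stage from VRAM before the next comes on.

    import torch

    def generate(prompt, load_stage_c, load_stage_b, device="cuda"):
        # Stage C (the big 3.6B-parameter prior) goes on the GPU first.
        stage_c = load_stage_c().to(device)
        with torch.no_grad():
            latents = stage_c(prompt)
        stage_c.to("cpu")              # keep it in system RAM for the next call
        torch.cuda.empty_cache()       # free VRAM before loading Stage B

        # Stage B / A only need the latents from Stage C.
        stage_b = load_stage_b().to(device)
        with torch.no_grad():
            image = stage_b(latents)
        return image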


Sometimes I forget how fast modern computers are. PCIe 4.0 x16 has a transfer speed of 31.5 GB/s, so theoretically it should take less than 100 ms to transfer stages B and A. Maybe it's not so bad after all; it will be interesting to see what happens.
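
The back-of-the-envelope arithmetic for the larger Stage B variant alone, using the parameter count from the leaked text above (Stage A isn't counted here):

    # FP16 weights over PCIe 4.0 x16 (~31.5 GB/s)
    stage_b_params = 1.5e9              # larger Stage B variant
    weight_bytes = stage_b_params * 2   # 2 bytes per FP16 parameter = 3 GB
    print(weight_bytes / 31.5e9)        # ~0.095 s, i.e. just under 100 ms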


If you're serious about doing image gen locally you should be running a 24GB card anyway, because honestly Nvidia's current generation 24GB is the sweet spot price to performance. The 3080's RAM is laughably the same as the 6-year-old 1080 Ti's, and the 4080's is only slightly more at 16GB while costing about 1.5 times a second-hand 3090.

Any speed benefit of the 4080 is going to be worthless the second it has to cycle a model in and out of RAM vs the 3090 in image gen.


> because honestly Nvidia's current generation 24GB is the sweet spot price to performance

How is the halo product of a range the "sweet spot"?

I think nVidia are extremely exposed on this front. The RX 7900 XTX is also 24GB and under half the price (in the UK at least: £800 vs £1,700 for the 4090). It's difficult to get a performance comparison on compute tasks, but I think it's around 70-80% of the 4090 from what I can find. Even a 3090, if you can find one, is £1,500.

The software isn't as stable on AMD hardware, but it does work. I'm running an RX 7600 (8GB) myself, and happily doing SDXL. The main problem is that exhausting VRAM causes instability. Exceed it by a lot and everything is handled fine, but if it's marginal... problems ensue.

The AMD engineers are actively making the experience better, and it may not be long before it's a practical alternative. If/When that happens nVidia will need to slash their prices to sell anything in this sphere, which I can't really see themselves doing.


>How is the halo product of a range the "sweet spot"?

Because it’s actually a bargain second hand (got another for £650 last week, Buy It Now on eBay) and cheap for the benefit it offers to any professional who needs it.

The 3090 is the iPhone of AI; people should be ecstatic it even exists, not complaining about it.


> because honestly Nvidia's current generation 24GB is the sweet spot price to performance

You're aware the 3090 is not the current generation? You can see why I would think you were talking about the 4090?


I think it's weirder you assumed I meant the 4090 when it doesn't really offer enough benefit over the 3090 to justify its cost, and you mentioned an incorrect price for the 3090 anyway, so it's not like you weren't talking about the 3090.


> If/When that happens nVidia will need to slash their prices to sell anything in this sphere

It's just as likely that AMD will raise prices to compensate.


You think they're going to say "Hey, compute became competitive but nothing else changed performance-wise, therefore... PRICE HIKE!"? They don't have the reputation to burn in this domain for that, IMHO.

Granted, you could see a supply/demand-related increase from retailers if demand spiked, but that's the retailers capitalising.


The price hike would come proportionally with each new hardware generation as their software stack becomes competitive with CUDA, since it will most likely take them multiple generations to catch up. It doesn't make sense for them to sell hardware with comparable capability at a deep discount. Plenty of companies will want to diversify hardware vendors just to have a bargaining chip against Nvidia.


If it worked I imagine large batching could make it worth the load/unload time cost.


There shouldn't be any reason you couldn't do a ton of Stage C work on different images, and then swap in Stage B.
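
As a sketch (same kind of hypothetical loaders as above, nothing official): run Stage C over the whole batch, park the latents in system RAM, then pay the model swap once before Stage B.

    import torch

    def generate_batch(prompts, load_stage_c, load_stage_b, device="cuda"):
        # Pass 1: run Stage C on every prompt, parking latents in system RAM.
        stage_c = load_stage_c().to(device)
        with torch.no_grad():
            latents = [stage_c(p).to("cpu") for p in prompts]
        stage_c.to("cpu")
        torch.cuda.empty_cache()

        # Pass 2: one swap, then Stage B decodes the whole batch.
        stage_b = load_stage_b().to(device)
        with torch.no_grad():
            return [stage_b(l.to(device)) for l in latents]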


It should use no more than 6GiB for FP16 models at each stage. The current implementation is not RAM optimized.


The large C model uses 3.6 billion parameters which is 6.7 GiB if each parameter is 16 bits.


The large C model has a fair bit of its parameters tied to text-conditioning, not to the main denoising process. Similar to how we split the network for SDXL Base, I am pretty confident we can attribute a non-trivial number of parameters to text-conditioning, and hence load fewer than 3.6B parameters during the denoising process.
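
A toy illustration of that split, with stand-in nn.Linear modules rather than the real architecture: the text-conditioning runs once per prompt and gets evicted, so only the denoiser's weights need to stay in VRAM during the sampling loop.

    import torch
    import torch.nn as nn

    # Tiny stand-ins; the real Stage C text encoder and denoiser are far larger.
    text_encoder = nn.Linear(77, 1024)
    denoiser = nn.Linear(1024 + 16, 16)

    def sample(tokens, steps=20):
        device = "cuda" if torch.cuda.is_available() else "cpu"

        # Text-conditioning: computed once, then moved off the GPU.
        text_encoder.to(device)
        with torch.no_grad():
            cond = text_encoder(tokens.to(device))
        text_encoder.to("cpu")
        torch.cuda.empty_cache()

        # Denoising loop: only these weights need to be resident.
        denoiser.to(device)
        x = torch.randn(tokens.shape[0], 16, device=device)
        with torch.no_grad():
            for _ in range(steps):
                x = x - 0.1 * denoiser(torch.cat([cond, x], dim=-1))
        return x

    # e.g. sample(torch.randn(1, 77))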


What's more, they can presumably be swapped in and out like the SDXL base + refiner, right?


Can one run it on CPU?


Stable Diffusion on a 16-core AMD CPU takes me about 2-3 hours to generate an image, just to give you a rough idea of the performance. (On the same AMD's iGPU it takes 2 minutes or so.)


WTF!

On my 5900X, so 12 cores, I was able to get SDXL down to around 10-15 minutes. I did do a few things to get there.

1. I used an AMD Zen-optimised BLAS library, in particular the AMD BLIS one, although it wasn't that different from the Intel MKL one.

2. I preload the jemalloc library to get better aligned memory allocations.

3. I manually set the number of threads to 12.

This is the start of my ComfyUI CPU invocation script.

    # cap OpenMP threads at the physical core count
    export OMP_NUM_THREADS=12
    # AMD BLIS (from AOCL) as the BLAS backend
    export LD_PRELOAD=/opt/aocl/4.1.0/aocc/lib_LP64/libblis-mt.so:$LD_PRELOAD
    # jemalloc for better-aligned allocations
    export LD_PRELOAD=/usr/lib/libjemalloc.so:$LD_PRELOAD
    export MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:60000,muzzy_decay_ms:60000"

Honestly, 12 threads wasn't much better than 8, and more than 12 was detrimental. I think I was memory-bandwidth limited, not compute limited.


Even older GPUs are worth using then I take it?

For example, I pulled a (2GB I think, 4 tops) 6870 out of my desktop because it's a beast (in physical size and power consumption) and I wasn't using it for gaming or anything; I figured I'd be fine with just the Intel integrated graphics. But if I wanted to play around with some models locally, would it be worth putting it back and figuring out how to use it as a secondary card?


One counterintuitive advantage of the integrated GPU is that it has access to system RAM (instead of a dedicated and fixed amount of VRAM). That means I'm able to give the iGPU 16 GB of RAM. For me SD takes 8-9 GB of RAM when running. The system RAM is slower than VRAM, which is the trade-off here.


Yeah I did wonder about that as I typed, which is why I mentioned the low amount (by modern standards anyway) on the card. OK, thanks!


2GB is really low. I've been able to use A1111 Stable Diffusion on my old gaming laptop's 1060 (6GB VRAM) and it takes a little less than a minute to generate an image. You would probably need to try the --lowvram flag on startup.


No, I don't think so. I think you would need more VRAM to start with.


SDXL Turbo is much better, albeit kinda fuzzy and distorted. I was able to get decent single-sample response times (~80-100s) from my 4-core ARM Ampere instance, good enough for a Discord bot with friends.


SD Turbo runs nicely on an M2 MacBook Air (as does Stable LM 2!)

Much faster models will come


Which AMD CPU/iGPU are these timings for?


AMD Ryzen 9 7950X 16-Core Processor

The iGPU is gfx1036 (RDNA 2).


If that is true, then the CPU variant must be a much worse implementation of the algorithm than the GPU variant, because the true ratio of GPU to CPU performance is many times smaller than that.


Not if you want to finish the generation before you have stopped caring about the results.


You can run any ML model on a CPU. The question is the performance.


Interesting, which carrier?

I was told by AirTel transferring my eSIM is not supported. I had to go to a store to get a new SIM (physical this time).


Mine is AirTel and I've transferred my eSIM successfully quite a few times.


The PlanetScale blog consistently puts out such high quality posts. I can highly recommend following their YouTube channel as well; I’ve learned a ton from the incredibly well-made videos Aaron puts out!


Yep! Aaron is great, isn't he!!? I make it a point to watch all the videos he puts out. We appreciate your support!


Thanks for the recommendation.

https://www.youtube.com/@PlanetScale

Subscribed.


Taken down? I see a 404 there now


Yep! We have a static recap on bfcm.stripe.dev and a text-based companion here in case you're curious: https://stripe.com/newsroom/news/bfcm2023.


> One of the unfortunate things that happened is I was put in a situation where I had to choose between doing what was needed to save the company and the businesses who relied on it or listen to the daily change in priorities from inexperienced non-technical leadership.

Learning to navigate this has been a skill in and of itself. Glad the intense effort worked out for the author


Uplause | https://uplause.io | React Native Developer | Full-Time | Hyderabad, India

At Uplause, we want to be the de-facto platform for creators to develop multiple monetisation streams.

We are looking for a React Native Engineer to join our mobile team to work on our suite of products; the Uplause Creator + Fan Apps (a total of 4). We've built our current portfolio of mobile apps with just a lean team of 3. We're pre-seed at this stage, so you'll be an influential part of our team, culture and direction.

Our architecture:

- Our backend uses Node.js, with detailed API docs for all interactions with the client

- Mobile apps (Creator + Fan apps for both iOS and Android) are built with React Native to optimise our velocity of shipping features. You can find them on both the App Store and Play Store

- We make heavy use of TypeScript to eliminate whole classes of bugs all around our codebases.

Due to the high degree of autonomy and responsibility of this role, we will not be able to consider applicants without prior experience with React Native.

You'll be working closely with our design team to build new features, work on optimising UX and have a high degree of autonomy in whatever you do. You can find more details here: https://rentry.co/uplause-me

Please feel free to reach out to me with your résumé at zaidi[at]uplause[dot]io.


o/

I find inexplicable joy in the social experience of listening to music together. One weekend I threw together a PoC with Mediasoup + socket.io and I was amazed at how easy it was to get started.

In my very amateur testing, I observed < 100ms of de-sync between 7 different devices, each on a different network. I'm still not too happy with the exact implementation, so I'm hoping some experienced eyes on the project could guide me in the right direction.

Source: https://github.com/obviyus/chanson.live


I've checked out your profile and Uplause; it looks like you're also making an OnlyFans for India? Not sure what the value add is.


We get that comparison a lot, which is fair considering OnlyFans is what comes to mind for a lot of folks when they think “exclusive content”.

We’ve been in conversation with > 150 local creators and the requirement isn’t just another paywalled Instagram. What about the ability to book a 1-on-1 consultation with a French tutor, along with integrated video calls on the same platform? What about the ability to sell courses? As it stands today, each requirement calls for adding another platform/tool to your dashboard, which in turn piles on its own fixed costs. In time, we plan to cater to a few select feature sets and do it well, all with simple, transparent pricing.

If OnlyFans/Patreon exist, what’s our value add? From what I can tell, neither of those platforms allows payments other than credit cards (for context, fewer than 3% of Indians have a credit card), neither caters to the language barrier, and neither has regional pricing ($5/mo is a lot here!). Above all, their revenue cut is higher than it needs to be (18% for Patreon, 20% for OnlyFans).

We’ve been hard at work on developing these, so the marketing website has been a bit neglected. We should have an improved version up soon, that explains what we focus on a lot better.


Love to see more WebRTC! Shame to see FTL go, I contributed a few PRs to Lightspeed[1] which relied on FTL to achieve sub-second latencies.

On a slightly off-topic note, I’ve been working on a simple WebRTC radio project[2] for a social listening experience. In my limited testing with a few friends, I was able to get < 100ms of lag (audio de-sync) between different players on different networks. It has been an absolute joy to use. The social experience of listening to music together somehow really appeals to me.

It was pleasantly simple to get it up and running with socket.io + Mediasoup as the SFU. I plan to flesh it out a lot more shortly but I’m a bit of a novice. Would love to have some more experienced eyes on the project :D

[1]: https://github.com/GRVYDEV/Lightspeed-ingest

[2]: https://github.com/obviyus/radio


Big fan of FTL also; it is a shame that it was orphaned. In a world where Mixer lived on, I could see FTL being in a lot more places. ftl-sdk was nice to work with since it was just a single C library, and it only concerned itself with the streaming client/server.

The way I see it, WebRTC is the best way forward to enable people to build interesting things. I was so excited to itemize all the use cases that would be added to OBS by merging WebRTC support. I also wrote a little bit about 'Why WebRTC': https://pion.ly/blog/why-webrtc/. The big advantage WebRTC has over FTL is that it does more. Broadcast is a small part of WebRTC. I think WebRTC's superpower is that it can be used for so many different use cases and industries. Even if another protocol is better at a single use case, it is hard to beat the breadth of WebRTC.

WebRTC isn't without its issues. I think that is fixed by better education + more implementations and owners. We shall see what the future holds :)


A few years back I also used WebRTC with different backends (Kurento, Puppeteer, Mediasoup) to try to make an audio-based social network. I got pretty far, but I was on my own and a grad student at the same time, so the project died. A few years later Clubhouse went viral; it was essentially the same thing I was doing.

I still use WebRTC for my personal projects, but the difficulty of compiling and embedding it in a backend makes it really hard for the masses to use. My next project is going to use Pion, and I'm going to make a static build, just distribute the executable, and have the main service use IPC to communicate with the Pion module.


That sounds like a really interesting project!

If there is anything I can do to help please tell me, always happy to help :) Either email me sean @ pion.ly or join https://pion.ly/slack.

Slack is better, but some prefer email!


OBS is heavily architected around plugins; many of the built-in features, including FTL output, are plugins, so even if it is removed it would likely be possible to turn FTL support into a separate plugin that users can install. For reference, the FTL code is in the plugin called obs-outputs (part of the main OBS repo). The only real limitation I know of is that it is likely not possible to make the UI exactly the same as it currently is.


Has anyone who has used PlanetScale in production commented on their experience? I was evaluating a few options a couple of weeks ago but ended up going with just RDS due to the lack of feedback about PlanetScale here on HN.


I left Aurora Serverless (v1; v2 pricing was insane) for PS and I've been extremely happy. It was way cheaper and less of a headache than RDS, and my actual usage was way less than I anticipated (since it's hard to think in terms of row reads/writes when working at a high level). With PS I get a dev/qa/staging/prod DB for $30/mo vs spinning up multiple RDS instances. Even with Aurora Serverless (v1), where you can spin down to 0, it was cheaper to go with PS. 1 DB unit on Aurora Serverless (v1) cost like $45/mo (for my 1 prod instance), so for $15 less I got all my other environments without having to wait for them to spin up after they went to sleep.

My usage is way under some of my sibling comments', but it's been a joy to use, and $360/yr to not have to worry about scaling my DB, backups, schema migrations, and now caching is a steal for me. Could I run my DB on a $5/mo DO box (or similar)? Probably, though I'd want at least the $10/$15 size box for when my software gets a little more load. Even if I knew for sure I could run on the $5 box, I'd still rather pay $30/mo to never worry about my DB, and the schema migration tool is awesome.


We have been running PlanetScale as our production database for about 6 months, having migrated from Aurora Serverless. I love it; their query insights tool has been a game changer for us and has allowed us to optimize a ton of queries in our application. Their support is always available and highly technical.

For a sense of scale, we have ~150GB of data running around 5 trillion row reads + 500 million row writes per month.


Were you using the Aurora Serverless data APIs? Curious if there is something equivalent on PlanetScale.



I was not; we are a Laravel PHP backend, using the standard PHP stuff for connection management.


From what I understand, your webserver and PHP implementation are hosted on different servers from PlanetScale's DBs(?)

Just wondering: how are the DB queries from your PHP implementation to the PlanetScale DBs affected by network latency (hops and distance between servers) as well as bandwidth (the query results returned by the PlanetScale DBs)?

Thanks! :-)


We looked at it, but it was a little "different" and we didn't want the learning curve, so we went with ScaleGrid instead.

This caching does look cool, perhaps I'll revisit PlanetScale later on my own time.

