m11a's comments | Hacker News

Perhaps a silly question, but should an event loop actually be multithreaded?

My understanding was that tasks in an event loop should yield after they dispatch IO tasks, which means the event loop should be CPU-bound right? If so, multithreading should not help much in theory?


If your workload is actually CPU-bound after you've deferred IO to the background, that's exactly when multithreading can help.

I've seen code that spends disproportionate CPU time on, e.g., JSON (de)serializing large objects, or converting Postgres result sets into native data structures, but sometimes it's just plain ol' business logic. And with enough traffic, any app gets too busy for one core.

Single-threaded langs get around this by deploying multiple copies of the app on each server to use up the cores. But that's less efficient than a single, parallel runtime, and eliminates some architectural options.
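As a toy illustration of the "defer the CPU-heavy part" idea in Python's asyncio, one common pattern is to push the (de)serialization onto an executor so the loop stays responsive. The payload and sizes here are made up; under a GIL this mainly helps when the work releases the GIL, while a process pool or a free-threaded build gives true parallelism:

```python
import asyncio
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical payload standing in for a large Postgres result set.
PAYLOAD = json.dumps({"rows": [{"id": i} for i in range(1000)]})

async def handle_request(pool):
    loop = asyncio.get_running_loop()
    # Deserializing a big document is pure CPU work; doing it inline
    # would stall every other coroutine on the loop. Pushing it onto a
    # pool keeps the event loop free to service other tasks meanwhile.
    data = await loop.run_in_executor(pool, json.loads, PAYLOAD)
    return len(data["rows"])

async def main():
    with ThreadPoolExecutor(max_workers=4) as pool:
        return await asyncio.gather(*(handle_request(pool) for _ in range(4)))

print(asyncio.run(main()))  # [1000, 1000, 1000, 1000]
```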


You are correct, async IO is cooperative. This seems to be an attempt to make these cooperative async-IO tasks work more like goroutines in Go. The Go runtime can start more OS threads if it "thinks" the workload needs more CPU.


This was exactly my question. Why do you even need an event loop? If awaits are just thread joins then what is the event loop actually doing? IO can just block, since other coroutines are on other threads and are unaffected.

Which is to say, why even bother with async if you want your code to be fully threaded? Async is an abstraction designed specifically to address the case where you're dealing with blocking IO on a single thread. If you're fully threaded, the problems async addresses don't exist anymore. So why bother?


Looking at the article, he's not implementing `Task` with `Thread`; he's round-robinning `Task`s through a simple `ThreadPool`. So instead of a single `Thread` making continuous progress on the work in the event loop, he has a set of `Thread`s making progress _in parallel_ on work in the event loop. This is very much Java 21's approach to virtual threads (as well as in-language task runners of the kind you find in Scala libraries like ZIO, Monix, Cats, and the venerable Scalaz).


How is that materially different than just making async invocations thread forks and awaits into joins? I understand what the code is doing, I just don't understand what the point is, when it seems like the net effect is the same as just writing threaded code.


The difference is that you can spin up only so many OS threads, but you can run several orders of magnitude more "green threads" / "tasks" like this, round-robined onto the system threads that comprise your event loop executor. The key thing to understand is that `await` doesn't block the backing thread; it simply suspends the current task (the backing thread moves on to the next ready task in the queue and runs it to its next await point).
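A minimal sketch of this N-tasks-on-M-threads scheduling, with generators standing in for tasks and `yield` standing in for an await point (all names and counts here are invented for illustration):

```python
import threading
import queue

N_TASKS, N_THREADS = 100, 4
ready = queue.Queue()   # tasks that are ready to run
results = []
remaining = N_TASKS
lock = threading.Lock()

def green_thread(name):
    # A "task" is just a generator; each `yield` is an await point.
    for _ in range(3):
        yield                       # "await": hand the backing thread back
    results.append(name)

def worker():
    global remaining
    while True:
        task = ready.get()
        if task is None:            # sentinel: all tasks done, exit
            return
        try:
            next(task)              # run to the next await point...
            ready.put(task)         # ...then requeue it and move on
        except StopIteration:       # task ran to completion
            with lock:
                remaining -= 1
                if remaining == 0:  # wake every worker so it can exit
                    for _ in range(N_THREADS):
                        ready.put(None)

for i in range(N_TASKS):
    ready.put(green_thread(i))

threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()

print(len(results))  # 100
```

The 4 worker threads never block on any one task; they interleave all 100, which is the "backing thread moves on at each await point" behavior described above.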


If I understand correctly, it sounds like the idea is to map N tasks to M threads.

I suppose it’d only really be useful if you have more tasks than you can have OS threads (due to the memory overhead of an OS thread), then maybe 10,000 tasks can run in 16 OS threads.

If that’s the case, then is this useful in any application other than when you have way too many threads to feasibly make each task an OS thread?


The idea is to map N tasks to M threads. This is useful for more than just when you need more threads than the OS can spin up. As you scale up the number of threads, you increase context-switching and CPU scheduling overhead. Scheduling a large number of tasks onto a small number of threads can reduce this overhead.


Having too many threads all running at the same time can also cause a performance hit, and I don't mean hitting the OS limit on threads. The more threads you have running in parallel (remember, this is considering a GIL-less setup), the more you need to context switch between them. Running the event loop on only a few threads lets you manage many events with little switching overhead, for example by setting the number of event-loop threads to the number of cores on the CPU.


> Companies like Oracle, Deloitte, McKinsey, etc. are experts at extracting large sums of money from large dysfunctional organizations.

I wonder, how do they do it? How do they sell sub-par products/services that a company arguably doesn't need at a premium price?


A lot of people don't care about spending someone else's money efficiently. In some cases spending more will make them look better and lead to better opportunities. This happens at every level; a lot of employees just don't care, and those who might probably focus on the wrong things.

And no, software people are no better: going for expensive tools, expensive cloud spending, or just the next shiny thing is often justified with some time-to-market or future-scaling excuse. Or just wanting to do something different.


Paul Graham’s essays are well regarded, and I think they include all of the advice others have mentioned here from other books.

https://paulgraham.com/articles.html


Nice thing about git feature flags: the state of your system is all in one place. It changes only when you do a code release, which includes feature flag values, or an infrastructure release. You can easily see, just off git history, when the state of the system changed (and what changed), which makes incident debugging much easier.

With DB feature flags, there's one more source of truth for changes to production infra.

(downside, of course, is that changing feature flag values is much slower using git vs DB)
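As a toy illustration of the git-backed pattern (the file name and helpers here are hypothetical), the app just reads whatever flag values were committed at the release, so `git log flags.json` becomes the audit trail:

```python
import json
from pathlib import Path

# Hypothetical flags file committed alongside the code. A deploy ships
# whatever values exist at the release commit, so every flag change is
# a commit you can see (and revert) in git history.
FLAGS_FILE = Path("flags.json")

def load_flags(path=FLAGS_FILE):
    """Read the flag state shipped with this release; empty if absent."""
    if path.exists():
        return json.loads(path.read_text())
    return {}

def is_enabled(flags, name, default=False):
    """Look up a flag, falling back to a default for unknown names."""
    return bool(flags.get(name, default))
```

Rolling back a bad flag change is then a `git revert` of the flag commit plus a redeploy, which is the slowness trade-off mentioned above.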


This! 100%. All the same benefits of config as code and infra as code. And with feature flags, if something goes wrong it's a simple `git revert` to get back to the previous state.

Another benefit is you can easily replicate the current (or previous) state of production/staging/etc flags locally just by doing a `git clone` and then running our self-hosted version locally. It's a single binary, can be installed with curl or homebrew, and can read the flag state from your local filesystem.

This allows you to test your code locally or in CI with the same state as production.


Maybe I’m not fully understanding what this product is doing. Something like configs/feature flags should, IMO, be dynamic.

I was thinking that git (in a separate repo from main code) would be used to store the changes to configs, but then you would still need a system that tails the git changes and distributes them to clients.

That’s the way config serving was done at Facebook - a mercurial repo with all configs and tooling to edit configs which creates mercurial commits. Then, the mercurial repo is continuously tailed and values are saved to ZooKeeper, and then client libraries read config data from zookeeper / subscribe to updates / etc.


This is actually how it works.

Flipt is live tailing the repository and serving this dynamically to the clients.

The repo with flag configuration can be solely for flags, or sit alongside other infra configuration in more of a monorepo. You decide how you want it set up.

Obviously, if it lives alongside code, you may have to contend with CI in order to validate a change. But rules in CI or other monorepo tooling can adjust what runs and when, to improve the time it takes for a configuration change to become live.

Once a configuration change is integrated into a target branch in the repo, then it becomes readable for Flipt and servable once fetched.


Although I agree with all the issues you raise, I think the confusing nature of the pricing is by design.

Stripe's fee model probably makes serious bank, because it's quite non-transparent and basically nothing is provided for free. Want even the slightest extra feature? +1% of your revenue (and repeat)


It's worth noting that most customers with any volume will be on custom contracts that will either be cheaper, include more features, or both. At my previous workplace we had this with relatively modest revenue and would renegotiate every year or two.


Is there a process for getting an account manager with Stripe? I think it would be useful for other reasons but cheaper fees is certainly good too.


I'd do it like this:

- Have a large enough transaction volume, say, above $100k / mo.

- Use the contact form.

Note that if your transaction volume is $100k/mo, and Stripe takes 3% of it, it's $3k/mo for them. Maybe a customer that's worth paying extra attention to, but of course by spending a minimal amount of time.

I suspect that if your transaction volume were, say, $1M/mo, they would have contacted you already.


There's a "Contact Sales" link right next to the sign-in button on stripe.com.


“Everyone gets cheaper pricing except anyone you know.”


Ah, the "we're out of ideas and have nothing, time to squeeze until the company dies, to chase stock price" phase of a business.

Like your cable company!


It's worked for 45+ years for cable companies.


Negotiating for medium and high volume sales isn’t exactly unheard of…

It’s kind of the rule rather than the exception in B2B. Hell, a lot of businesses will give you better terms just for opening a business account.


Yeah well, this article is called “I hate Stripe.” People hate Comcast too.


Stripe is also one of the most expensive payment services - enough so that I worked at a company where we had an in-house payments frontend for Asia which was basically a common integration over about 6 different major regional banks. The reason was pretty obvious: Stripe was something like a 2.5% surcharge, our system 0.2-0.5%. That was a huge difference in revenue.


How did you get the visa and mastercard interchange fees down so low? Just the interchange fees for non debit cards can be as high as 2.5% (premium travel cards for example), not even mentioning Stripe's fee then.


AFAIK it was basically just minimising Stripe's cut, so the overall cost may have been higher than that.


> Stripe's fee model probably makes serious bank, because it's quite non-transparent and basically nothing is provided for free.

Whatever it is making, it is apparently not enough to IPO. I would temper my expectations.


Stripe ['s leadership] doesn't want to IPO, and its balance sheet doesn't need to. There's ridiculous demand for a Stripe IPO, whether it could is not at all in question.


An IPO is, if anything, a bad thing for a company. I wouldn't consider it a black mark that they aren't selling stock publicly and subjecting themselves to the "must grow every quarter at all costs" insanity.


You can make plenty of money and still be private. Look at Steam.


I’d say it’s not enough to satisfy early investors, who probably want a 10~50x return before IPO. They could definitely go public if they wanted to, and do ‘well’.


Does Stripe need to IPO? Are they just privately held, or do they have VCs or others that need to be paid back?


They have raised at least $6.5 Billion.


I think some contributing factors are:

1) A lot of questions have already been answered. How often do you ask something that's answered very well by a 2010-2014 answer? IME, often.

2) Programming isn't as niche as it used to be. These days, a lot of SO is just asking for free labour on very niche questions, like an exception someone is getting, which was never the point of SO.

There's a wealth of historical answers on SO, which are great. But I don't remember the last time I used SO to answer a specific question about a recent software tool, or relied on any post-2021 question/answer.


Could you use the notify to wake workers, and then have each worker try to lock the job row? Whichever worker gets the lock acts on the message.


That's exactly what we do, but taking a lock takes 1 RTT to the database, which means about 100ms. It limits the number of events receivers can handle. If you have too many events, receivers will spend most of their time just trying to take a lock.


Off the top of my head, you could attach a UUID or sequence number to the emitted event. Then, based on the UUID or sequence, you can let one or the other event consumer pick it up.

Ex. you have two consumers: if the sequence number is odd, A picks it up; if it's even, B does.
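One way to generalize the odd/even idea without any coordination is to hash the event id modulo the number of consumers; every consumer computes the same answer from the id alone, so no lock (and no extra DB round trip) is needed. A sketch with invented names:

```python
import hashlib
import uuid

def assigned_worker(event_id: str, num_workers: int) -> int:
    # Hash the event id so the assignment is deterministic: all
    # consumers agree on who owns each event without talking to
    # each other or to the database.
    digest = hashlib.sha256(event_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_workers

# Each worker filters the notification stream down to its own share:
my_index, num_workers = 0, 2
events = [str(uuid.uuid4()) for _ in range(10)]
mine = [e for e in events if assigned_worker(e, num_workers) == my_index]
```

Hashing also spreads load more evenly than odd/even when ids aren't sequential.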


Great observation; I worded that a little wrongly. What we ideally want is guaranteed delivery to one random free worker. The UUID strategy is better than locking, but it could mean that if one worker gets a long job, all the other jobs assigned to that worker are delayed even if another worker is free.


I think the concept you mentioned would be called sharding? It's kind of required for Apache Kafka and others.


Without more logic than this, does this mean that any workers being busy or down means you lose jobs?


We use a garbage collector that restarts a job with an error if it is not served within a specified amount of time.
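A rough sketch of what that lease/timeout reclaim could look like. The field names and the in-memory structure are invented; in a real system this would be something like an `UPDATE ... WHERE locked_at < now() - interval` on the jobs table:

```python
import time

LEASE_SECONDS = 30

def claim(jobs, worker_id, now=None):
    """Claim one job: never-locked jobs, or jobs whose lease expired.

    jobs: dict of job_id -> {"locked_by": ..., "locked_at": ...}
    """
    now = now if now is not None else time.time()
    for job_id, job in jobs.items():
        locked_at = job.get("locked_at")
        # A job is claimable if it was never locked, or if its lease
        # expired (the worker died or got stuck) -- the "garbage
        # collector" case described above.
        if locked_at is None or now - locked_at > LEASE_SECONDS:
            job["locked_by"] = worker_id
            job["locked_at"] = now
            return job_id
    return None
```

So a busy or crashed worker delays its jobs by at most one lease interval instead of losing them.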


Maybe it's just me, but I think it's purely rational?

A lot of these OSS projects provided a commercial offering in the form of SaaS. Yet AWS/GCP/Azure can just take the OSS project, not contribute anything, and reap all the profit.

AFAICS, these licenses are only intended to defend against the cloud providers, not against companies just using the product commercially and internally.


Amazon, Google, and Microsoft can do anything, snuffing out some mid company is the least of their crimes.

> not contribute anything, and reap all the profit

Yeah, it's called capitalism. What's their stance on Lina Khan breaking up the monopolies? Have they written letters criticising Reid Hoffman for pressuring Kamala?


> This is exactly a great point. When data size goes to a billion rows, Postgres is tough. MongoDB just works without issue.

Personally, I've not seen any application that seriously needs a billion rows in a single table. (except at truly massive scale, but then you're not using Mongo)

The real solution is implementing archiving to a file store like S3 and/or ship it off to a data warehouse. You don't need billions of rows in a `record_history`/`user_audit` table going back 5 years in your production database. Nobody queries the data.
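As a toy sketch of that retention split (the table shape and 90-day window are made up), an archiving job just partitions rows at a cutoff, keeps the recent ones hot, and ships the rest off to S3 or a warehouse:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # hypothetical retention window

def split_for_archive(rows, now=None):
    """Partition rows from a hypothetical record_history table.

    Rows older than the retention window go to the archive batch
    (destined for S3 / a warehouse); the rest stay in the hot table.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    keep = [r for r in rows if r["created_at"] >= cutoff]
    archive = [r for r in rows if r["created_at"] < cutoff]
    return keep, archive
```

Run periodically, this keeps the production table bounded instead of growing forever.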


Maybe we're the odd one out here, but we need that data at millisecond latency (no, those are not logs; we use ClickHouse for that).

Just wanted to put here that it's possible to scale Mongo to this level.


I suppose the question is: at what cost?

e.g. on RDS, they'll give you instances with 1TB of RAM (a `db.r6idn.32xlarge`) at the nice price of $75/hr ($54k/mo). Not to mention that, in a microservices architecture, assuming you're not sharing a database, you might be multiplying that figure a few times.

So just because it's possible for it to fit in RAM doesn't mean it's economical. RAM isn't exactly getting exponentially cheaper or more spacious anymore. The hope was flash memory would be the solution, but not sure how far that's getting these days.

