I'm not sure of the exact threshold, but the pathological case seemed to be (1) many tasks in the backlog, (2) many workers, and (3) workers long-polling the task tables at approximately the same time. This would consistently produce very high CPU spikes and a runaway deterioration of the database: high CPU leads to slower queries and more contention, which leads to higher connection overhead, which leads to higher CPU, and so on. There are a few threads online documenting very similar behavior, for example: https://postgrespro.com/list/thread-id/2505440.
Those other points are mostly unrelated to the core queue, and more related to helper tables for monitoring, tracking task statuses, etc. But it was important to optimize these tables because unrelated spikes on other tables in the database could start getting us into a deteriorated state as well.
To be more specific about the solutions here:
> buffered reads and writes
To run a task through the system, we need to write the task itself, write the instance of that retry of the task to the queue, write an event each time the task is queued, started, completed/failed, etc. Generally one task will correspond to many writes along the way, not all of which need to be extremely latency-sensitive. So we started buffering items coming from our internal queues and flushing them once every 10ms, which helped considerably.
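Roughly the shape of that buffering loop — a minimal sketch, not the actual implementation; the flush function, channel, and batch type here are all illustrative:

    package main

    import (
        "fmt"
        "time"
    )

    // flush stands in for the real batched INSERT.
    func flush(batch []string) {
        fmt.Printf("flushing %d rows in one statement\n", len(batch))
    }

    // bufferWrites accumulates incoming writes and flushes them every
    // 10ms, so one task's many writes become a handful of batched ones.
    func bufferWrites(writes <-chan string) {
        ticker := time.NewTicker(10 * time.Millisecond)
        defer ticker.Stop()
        var pending []string
        for {
            select {
            case w, ok := <-writes:
                if !ok {
                    if len(pending) > 0 {
                        flush(pending) // final drain on shutdown
                    }
                    return
                }
                pending = append(pending, w)
            case <-ticker.C:
                if len(pending) > 0 {
                    flush(pending)
                    pending = nil
                }
            }
        }
    }

    func main() {
        writes := make(chan string, 1024)
        done := make(chan struct{})
        go func() { bufferWrites(writes); close(done) }()
        for i := 0; i < 100; i++ {
            writes <- fmt.Sprintf("event %d", i)
        }
        close(writes)
        <-done
    }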
> switching all high-volume tables to use identity columns
We had originally combined some of our workflow tables with our monitoring tables -- this table was called `WorkflowRun`, and it was used both for concurrency queues and for serving the API. It used a UUID as the primary key, because we wanted UUIDs over the API instead of auto-incrementing IDs. The UUIDs caused some headaches down the line when trying to delete batches of data and prevent index bloat.
IMHO, with this type of issue it is more often a case of blowing through the multixact cache, or the query planner reverting to a seqscan due to the number of locks, or mxact ID exhaustion, etc. It is most likely not a WAL flush problem that commit_delay would help with.
From the above link [1]:
> I found that performing extremely frequent vacuum analyze (every 30 minutes) helps a small amount but this is not that helpful so problems are still very apparent.
> The queue table itself fits in RAM (with 2M hugepages) and during the wait, all the performance counters drop to almost 0 - no disk read or write (semi-expected due to the table fitting in memory) with 100% buffer hit rate in pg_top and row read around 100/s which is much smaller than expected.
Bullet points 2 and 3 from here [2] are what first came to mind, due to the 100% buffer hit rate.
Note that vacuuming every 30 min provided "minor improvements", but consider the worst case:

    25,000 tps * 60 sec * 30 min * 250 rows == 11,250,000,000 IDs
    (assuming the worst case of every client locking conflicting rows)

Even the transaction count alone,

    25,000 tps * 60 sec * 30 min == 45,000,000

is only two orders of magnitude away from blowing through the 32-bit transaction IDs (4,294,967,296).
But XID exhaustion is not as hidden as MXID exhaustion, and it blocks all writes, while the harder-to-see MXID exhaustion only blocks some.
IMHO, if I were writing this, and knowing that you are writing an orchestration platform, getting rid of the long-term transactions in favor of just a status column would be better; row-level locks write to the row anyway, twice actually.
Long-lived transactions are always problematic for scaling, and that status column would allow for more recovery options etc...
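The status-column version of the queue pop, sketched in Go with invented table and column names — the claim is one short autocommit statement, so no transaction stays open while the task runs:

    package queue

    import "database/sql"

    // claimTask flips one queued row to 'running' in a single short
    // statement instead of holding SELECT ... FOR UPDATE open for the
    // task's whole runtime.
    func claimTask(db *sql.DB) (int64, error) {
        var id int64
        err := db.QueryRow(`
            UPDATE tasks
               SET status = 'running', claimed_at = now()
             WHERE id = (SELECT id FROM tasks
                          WHERE status = 'queued'
                          ORDER BY id
                          FOR UPDATE SKIP LOCKED
                          LIMIT 1)
            RETURNING id`).Scan(&id)
        return id, err // sql.ErrNoRows means the queue is empty
    }

Completion is then a second short UPDATE to a terminal status, and a reaper can reset stale 'running' rows — those are the extra recovery options a status column buys you.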
But to be honest, popping off the left of a red-black tree like the Linux scheduler does is probably so much better than fighting this, IMHO.
This opinion assumes I am reading this right from the linked-to issue [1]:
> SELECT FOR UPDATE SKIP LOCKED executes and the select processes wait for multiple minutes (10-20 minutes) before completing
There is an undocumented function, pg_get_multixact_members() [3], that can help troubleshoot this; since many people are using hosted Postgres, the tools to look into the above problems can be limited.
It does appear that Amazon documents a bit about the above here [4].
I’m curious about the async aspect of this. I was under the impression that PDF processing like OCR is purely CPU-bound. OS file I/O interfaces are sync, so async does not help there. With the GIL, i.e. effectively single-threaded Python, I can’t see how async improves performance for the PDF use case. Only parallelism helps here, not concurrency. When would it yield back to the event loop while it’s busy number crunching?
It just litters perfectly reasonable python code with async/await. Maybe they are preparing for something we don't know, like a parallel async executor which can be set up to use native threads without changing code and somehow protects you if it detects shared state.
Caveat: I have looked at neither the API nor the implementation of Kreuzberg; this is purely from personal work.
Even with CPU bound code in Python, there are valid reasons to be using async code. Recognizing that the code is CPU bound, it is possible to use thread and/or process pools to achieve a certain level of parallelism in Python. Threading won't buy you much in Python, until 3.13t, due to the GIL. Even with 3.12+ (with the GIL enabled), it's possible (but not trivial) to use threading with sub interpreters (that have their own, separate GIL). See PEP 734 [0].
I'm currently investigating the use of sub interpreters on a project at work where I'm now CPU bound. I already use multiprocessing & async elsewhere, but I am curious if PEP 734 is easier/faster/slower or even feasible for me. I haven't gotten as far as to actually run any code to compare (I need to refactor my code a bit with the idea of splitting the work up a bit differently to account for being CPU instead of just IO bound).
Will it hold the GIL if you use a thread executor with asyncio for a native C/FFI extension? If it doesn’t, that would also add to the benefits of asyncio.
> It just litters perfectly reasonable python code with async/await
Yeah. As an API consumer I would not expect a PDF API to do IO, and hence be async. Make the library sans-io, the interfaces sync, and have callers from async code handle IO on their end, offloading to IO threads.
Async is also referred to as “best practice”, but it’s just a tool, for specific use cases. And I say that as an “async fan”!
That said, perhaps it’s easier nowadays to just do async by default, as you say. The real world is async anyway, so why not program closer to that reality.
Async is great when you truly need it, but it can overcomplicate things when misused. Having both sync and async options seems like the best approach: it lets devs choose based on their needs rather than forcing one paradigm.
It is probably not worth the complexity currently, but considering they are using small local CPU models for OCR like Tesseract, if they add support for reading files over the web then I wouldn't be so sure about the CPU-bound aspect.
The example you gave is the most trivial one possible. There is 0 reason to write that code over PrintAnything(item Stringer). Go doesn't even let you do the following:
    auto foo(auto& x) { return x.y; }
The equivalent Go code would be
    package main

    import "fmt"

    func foo[T any, V any](x T) V {
        return x.y
    }

    type X struct {
        y int
    }

    func main() {
        xx := X{3}
        fmt.Println(foo[*X, int](&xx))
    }
which does not compile, because T (i.e. any) does not contain a field called y. That is not duck typing; the Go compiler does not substitute T with *X in foo's definition like a C++ compiler would.
Not to mention Go's generics utterly lack metaprogramming too. I understand that's almost like a design decision, but regardless it's a big part of why people use templates in C++.
Interesting, thank you for the example. I'm mostly used to how Rust handles this, and in its approach individual items such as functions need to be "standalone sane".
    func foo[T any, V any](x T) V {
        return x.y
    }
would also not fly there, because T and V are not usefully constrained to anything. Go is the same then. I prefer that model, as it makes local reasoning that much more robust. The C++ approach is surprising to me, never would have thought that's possible. It seems very magic.
Lots of C++ is driven by textual substitution, the same mess which drives C macros. So, not magic, but the resulting compiler diagnostics are famously terrible since a compiler has no idea why the substitution didn't work unless the person writing the failed substitution put a lot of work in to help a compiler understand where the problem is.
    package main

    import "fmt"

    type Yer[T any] interface {
        Y() T
    }

    func foo[V any, X Yer[V]](x X) V {
        return x.Y()
    }

    type X struct {
        y int
    }

    func (x X) Y() int { return x.y }

    func main() {
        xx := X{3}
        fmt.Println(foo(&xx))
    }
It is not equivalent because, per the monomorphization discussion above, putting an interface in there means that you incur the cost of a virtual function call. The C++ code will compile down to simply accessing a struct member once inlined while the Go code you wrote will emit a ton more instructions due to the interface overhead.
Depending on which implementation you use, the following may produce the same instructions as the example above. Go on, try it!
    package main

    import "fmt"

    func foo(x *X) int {
        return x.y
    }

    type X struct {
        y int
    }

    func main() {
        xx := X{3}
        fmt.Println(foo(&xx))
    }
Now, gc will give you two different sets of instructions from these two different programs. I expect that is what you are really trying and failing to say, but that is not something about Go. Go allows devirtualizing and monomorphizing of the former program just fine. An implementation may choose not to, but the same can be said for C++. Correct me if I'm wrong, but from what I recall devirtualization/monomorphization is not a requirement of C++ any more than it is of Go. It is left to the discretion of the implementer.
Tried it out in Godbolt; yes, you are right that with the above example gc is able to realize that

    func foo[V any, X Yer[V]](x X) V

can be called with only one type (X) and therefore manages to emit the same code in main_main_pc0. It all falls apart when you add a second struct which satisfies Yer [1], which leads the compiler to emit a virtual function table instead; you can see it in the instructions emitted once a second implementation of Yer is added.
Was it really necessary to try in gc...? We already talked about how it produces different instructions for the different programs. Nice of you to validate what I already said, I suppose, but this doesn't tell any of us anything we didn't already know.
The intent was for you to try it in other implementations to see how they optimize the code.
Your first paragraph is a great description of what I am currently feeling as well. There just isn’t anything there. I’m familiar with Python, Rust, C# and other random languages. Go just has nothing to offer meanwhile. I am struggling with nil and zero values every day, the lack of enums, and the bizarre error handling (wrapping etc.).
I’m waiting for it to click but nothing yet. Channels in combination with effortless (uncolored) asynchrony are maybe it. I haven’t reached that point yet though as I’m still struggling with just expressing my domain, without having invalid and primitive values, constructed and passed around by design, all over. I suppose it is not possible. Every time I see “url string” in a function or struct I die a little inside.
I kind of like the error handling, just because it's a little bit of character from a language that otherwise seems afraid to express itself. It has succeeded in ensuring that I don't have the error case as an afterthought, even if it's ugly as sin.
But it puts me in this mode where I occasionally want to write a function which doesn't return an err, and I think it ought to be doable: I've declared a type for this string, and I happen to know it's only one of three values. But the lack of enums means the type checker doesn't know it's one of those three values, so I've got all this extra runtime code checking for error cases which could've been caught at compile time.
I think you're right that the concurrency story is the key to making it feel real: go routines, channels, etc. I haven't had occasion to use those things yet though, so it's pretty bland so far. When I hear people talk about them though it seems like they'd be better served by Erlang and have just settled for Go because it's popular.
Yes. The concept of references (which includes that of null pointers) and that of optionality (which includes that of nothingness) are orthogonal. The former is just an artefact of computer architecture, the latter is everyday business logic. Go mixes the two into a single concept, which is painful.
As a newcomer to Go, I find myself struggling to express business logic all the time. Zero-valued types make the situation even worse. You’re constantly constructing invalid values of types (nil/zero-valued), by design. Go is “so easy”, but when you inevitably run into one of the footguns it’s “yeah just don’t do that”.
One of the primary reasons for people to dislike Python is its dynamic typing. When Go came out, that was totally fair. But since then, Python has evolved and improved massively. It/mypy now supports type-safe structural pattern matching, for example. It’s very expressive, and safely so.
Meanwhile, Go has barely evolved. Generics landed only recently. They’re only now experimenting with iteration, lifting it from a purely magic, compiler-intrinsic concept. And still no enums, of course. The “type system” is structs, or very leaky type wrappers (nowhere near the safety of Rust newtypes, for example). People are obsessed with primitives.
I can see the appeal of a simple, stable platform, but Go really ran too far with that idea.
Basically, if the Jump function takes an Animal interface, the package that defines the Jump function is also the one that defines the Animal interface and its Jump method. So if you provide it with any entity that has this method, it will just work. You don’t have to define the Animal interface in your Cat package. But if your cat does not Jump, it won’t be accepted. Of course, you can also pass in a box that Jumps.
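A minimal sketch of that (the Animal/Jump/Cat/Box names are from the comment above; MakeItJump is invented for illustration):

    package main

    import "fmt"

    // The consuming package defines the interface it needs...
    type Animal interface {
        Jump()
    }

    func MakeItJump(a Animal) { a.Jump() }

    // ...and any type with a Jump method satisfies it implicitly,
    // without ever naming Animal.
    type Cat struct{}

    func (Cat) Jump() { fmt.Println("cat jumps") }

    type Box struct{}

    func (Box) Jump() { fmt.Println("box jumps") }

    func main() {
        MakeItJump(Cat{})
        MakeItJump(Box{})
    }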
Most C# education will teach you to always make an interface for everything for some reason. Even in academia they’ll teach CS students to do this and well… it means there is an entire industry of people who think that over-engineering everything with needless abstractions is best practice.
It is what it is though. At least it’s fairly contained within the C# community in my part of the world.
Isn't that "for some reason" in C# that it's the standard way of doing dependency injection and being able to unit test/mock objects?
I've found it easier to work in C# codebases that just drank the Microsoft Kool-Aid with "Clean Architecture" than in Frankenstein-esque C# projects that decided they could do it better, or didn't care, or didn't know better.
Abstraction/design patterns can be abused, but in C#, "too many interfaces" doesn't seem that problematic.
I agree with you on this, my issue is mainly when they bring this thinking with them into other languages. I can easily avoid working with C# (I spent a decade working with it and I’d prefer to never work with it again), but it’s just such a pain in the ass to onboard developers coming from that world.
It may be the same for Java, as GP mentioned it along with C#, but Java devs tend to stay within their own little domain in my part of the world. By contrast, C# is mostly used by mid-sized stagnant-to-failing companies, which means C# developers job-hop a lot. There are also a lot of them, because mid-sized companies that end up failing love the shit out of C# for some reason, and there are soooo many of those around here. Basically, we have to un-learn almost everything a new hire knows about development if they’ve solely worked with C#.
> I've found it easier to work in C# codebases that just drank the Microsoft Kool-Aid with "Clean Architecture" than in Frankenstein-esque C# projects that decided they could do it better, or didn't care, or didn't know better.
I agree, for the most part. There's a little bit of a balance: if you just drink the kool-aid for top level stuff, but resist the urge to enter interface inception all the way down, you can get a decent balance.
E.g. on modern .NET Core, literally nothing is stopping you from registering factory functions for concrete types without an interface, using the out-of-the-box dependency injection setup. You keep the most important part, inversion of control: `services.AddTransient<MyConcreteClass>(provider => new MyConcreteClass(blah, blah, blah));`
I created NS records on example.com to delegate all of home.example.com to a wholly different DNS provider. That provider then manages (all of) that zone, but nothing more (important records such as MX remain on example.com).
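The delegation itself is just NS records at the cut point in the parent zone — illustrative names below, and the exact syntax depends on the provider's UI:

    ; in the example.com zone
    home.example.com.    IN NS    ns1.other-provider.example.
    home.example.com.    IN NS    ns2.other-provider.example.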
That way you can never take lock B if you have not received a guard (i.e. the lock) from lock A first. Ensured at the type level.
I suppose doing this at scale is a real challenge.
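A rough sketch of that guard pattern in Go — unlike Rust, Go can't force you to drop the guard after unlocking, but the ordering itself is enforced by the signatures (all names invented):

    package locks

    import "sync"

    type LockA struct{ mu sync.Mutex }
    type LockB struct{ mu sync.Mutex }

    // GuardA is the proof that A is held; B's Lock demands one.
    type GuardA struct{ a *LockA }

    func (a *LockA) Lock() *GuardA {
        a.mu.Lock()
        return &GuardA{a: a}
    }

    func (g *GuardA) Unlock() { g.a.mu.Unlock() }

    // Lock can only be called with proof that A was taken first,
    // so the A-before-B ordering can't be violated by a caller.
    func (b *LockB) Lock(proof *GuardA) { b.mu.Lock() }

    func (b *LockB) Unlock() { b.mu.Unlock() }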