I'd just call the function once by avoiding the global; construct your database access object at the start of your asynchronous main method and dependency-inject it into the tasks that need it.
His asyncpg example doesn't make much sense to me. What if there was a config change with a bad password? I would like to know this immediately on startup, else my rolling deploy is going to bring down all the previously well configured instances, and by the time we lazily try to connect to postgres it's too late.
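For what it's worth, connecting eagerly is easy to arrange once the pool is built in the async main. A minimal sketch, assuming asyncpg (the DSN and `serve` are placeholders, not from the article):

    import asyncio
    import asyncpg

    async def main():
        # Connect eagerly: a bad password fails here, at startup,
        # instead of at the first lazy query mid-deploy.
        pool = await asyncpg.create_pool(dsn="postgres://...")  # placeholder DSN
        try:
            await serve(pool)  # hypothetical app entry point; inject the pool
        finally:
            await pool.close()

    asyncio.run(main())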
I'm not a big Python user, but I do find it kind of surprising there isn't an awaitable, thread-safe mutex in the stdlib.
Python is highly opinionated towards single-threading, and the infamous GIL makes "true" multi-threading hard. Async and multi-threading have only recently gotten some love at all in the language and stdlib.
I'd say single-threading is the right call for 99% of Python's use-cases and users. C extensions are available and widely used for core functionality that needs to be fast and/or parallel; e.g. large parts of numpy are implemented as C extensions.
Can you clarify what you mean by dependency injection in python? Did you mean a DI framework or something more informal?
I've seen DI frameworks in python but not really used them. At a glance they don't strike me as pythonic. Rolling your own kind of inversion of control can result in unruly "config" or "context" objects that bring difficulties as well.
Indeed. If you want it to be cleaner, a discipline of constructor arguments for injection and __call__ arguments for semantic input is really all you need.
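A minimal sketch of that discipline (all names here are illustrative, not from the thread):

    class ReportJob:
        def __init__(self, db, mailer):
            # Collaborators are injected through the constructor...
            self.db = db
            self.mailer = mailer

        async def __call__(self, user_id):
            # ...while semantic input arrives through __call__.
            rows = await self.db.fetch_report(user_id)  # hypothetical API
            await self.mailer.send(user_id, rows)       # hypothetical API

Wiring then happens once near the async main (`job = ReportJob(db, mailer)`), and call sites just do `await job(user_id)`.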
I must say I haven't done any dependency injection in Python. Did so in Scala and JS though.
So I assume he means: initialize it once at program entry, and pass a reference to the objects that need it via the constructor or a function call.
Do you think there's a pythonic way of doing these things?
Been coming across a lot of these issues. Asyncio requires a slightly different thought process.
As soon as you have an `await` anywhere in the code, you've got to assume that your code will be re-entered. Lots of asyncio.Locks all over the place for me.
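A contrived sketch of the hazard (`asyncio.sleep(0)` stands in for any real await; note that pre-3.10, module-level asyncio primitives also have the loop-binding caveat discussed elsewhere in this thread):

    import asyncio

    counter = 0
    counter_lock = asyncio.Lock()

    async def increment():
        global counter
        async with counter_lock:    # without this, another task can interleave
            current = counter       # between the read here...
            await asyncio.sleep(0)  # ...any await is a potential re-entry point...
            counter = current + 1   # ...and the write here, losing updates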
Glad people are bringing this up. I had to learn this on my own.
> As soon as you have an `await` anywhere in the code, you've got to assume that your code will be re-entered.
At least the re-entry points are explicitly marked with `await`. IMO that's the main benefit of async-await (stackless coroutines) over stackful coroutines or threads, which allow your code to be suspended and re-entered almost anywhere.
Of course the drawback of async-await is the "function color" issue [0], in which it's difficult for functions that don't suspend to call those which do.
Every time I find something that seems unnecessarily awkward in asyncio, I eventually find out there's a good reason. But plenty of things written with it aren't using it exactly right.
> Unfortunately this has a serious downside: asyncio locks are associated with the loop where they were created. Since the lock variable is global, maybe_initialize() can only be called from the same loop that loaded the module. asyncio.run() creates a new loop so it’s incompatible.
I work on several async projects, but I never had to use multiple event loops. What are use cases for using multiple event loops?
There may be other use cases, but it can be a useful pattern for mixing async code into a non-async project. In the specific places where using async for some task makes sense, you would just spawn a thread with an event loop, then push work into the new loop from non-async code using run_coroutine_threadsafe.
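Roughly like this, as a sketch (error handling and shutdown elided):

    import asyncio
    import threading

    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()

    async def do_async_work(x):
        await asyncio.sleep(1)  # placeholder for real async work
        return x * 2

    # From ordinary synchronous code:
    future = asyncio.run_coroutine_threadsafe(do_async_work(21), loop)
    print(future.result())  # blocks this thread until the coroutine finishes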
There is more than one way to make awaitables in asyncio -- at the core, this is about sharing a single future, for which there's a joyfully boring native standard constructor.
For example, when working w/ immutable GPU dataframes to represent our users' datasets, we often hit situations where loading a dataset takes a while, so multiple services request it before ETL is done. So, we want to only trigger the parser once per file and have any subsequent calls wait on the first one:
    datasets = {}

    async def load_once(name):
        if name not in datasets:                              # sync, many
            fut = asyncio.get_running_loop().create_future()  # sync, once
            datasets[name] = fut                              # sync, once
            fut.set_result(await load(name))                  # async, once
        return await datasets[name]                           # async, many
Unfortunately, this naive method is buggy, I have had to debug and fix this exact code in production :)
The issue is with exception safety - first, this does not handle exceptions in load() properly, but that is a trivial fix.
The more insidious problem is due to the fact that Python futures are cancellable - and exceptions cancel futures.
What this means is that if two callers call load_once() in parallel, and the first caller encounters an exception (eg. from calling something else in parallel), the load() future will be cancelled for _all_ callers (eg. the second one), and will remain in a permanently wedged state.
So load() needs a try/except, where the except clause either kills the process, retries, clears the entry, or cancels, and the other loader should also expect an exception depending on that choice. All of that is app/use-case dependent.
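For illustration, a hedged sketch of the clear-and-propagate choice, reusing `datasets` and `load` from the snippet above (the right strategy really is app-dependent):

    import asyncio

    async def load_once(name):
        if name not in datasets:
            fut = asyncio.get_running_loop().create_future()
            datasets[name] = fut
            try:
                fut.set_result(await load(name))
            except BaseException as e:    # includes CancelledError on 3.8+
                datasets.pop(name, None)  # clear the entry so later calls retry
                if not fut.done():        # a cancelled waiter may have cancelled fut
                    fut.set_exception(e)  # wake current waiters with the error
                raise
        return await datasets[name]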
Usefully, Futures and async/await natively represent most of these. The case we find missing is around back pressure, retry, etc., but I haven't seen a good lang construct for that. We used to do a lot of Rx for that, and now a lot of HTTP headers/libs when a remote call, but it all feels messy. This is somewhat orthogonal to futures for load once tho.
We have trio somewhere in our stack and it is on the list of to-delete, but mostly as part of continued elimination of maintenance and consistency burdens
I tried skimming that article, but it comes off as a long and hard to read rant, which suggests the author needs to understand their idea better or pick an explanation/analogy that is more direct. Maybe something like the ocap reasoning for promises vs futures, or say the issues of coloring, may help...
How about we just use actors instead? Preemptable actors are the only good concurrency model I've ever come across. Everything else has massive problems
Actors aren’t a panacea either - your logic ends up more spread out. You’re still able to shoot yourself in the foot quite easily too, e.g. when deciding whether to use a “pull” or “push” model for concurrency.
I found async testing in Python to be annoying, although I found a couple of libraries to make it nicer (pytest-asyncio and I forget the name of the other).
Async/await scales well to codebases with millions of lines and thousands of developers. As a result, large companies and ecosystems have mostly adopted async/await, and the tooling and runtimes in those languages are now much more mature.
If you're using CPython since Python 3.2, you don't need to lock. You can use `dict.setdefault` or another similar method that is guaranteed to be atomic.
Something like this should work, no? Only run the coroutine if you won the race.
    D = {}

    async def maybe_initialize():
        this_setup = asyncio.ensure_future(one_time_setup())
        actual_setup = D.setdefault('initializer', this_setup)
        if this_setup is not actual_setup:
            this_setup.cancel()
        await actual_setup
I think something might be off in your mental model of python coroutines. They work via cooperative multitasking. Atomic operations aren't really necessary or relevant in such a system where you are in complete control of all yield points (i.e., calls to `await`).
If you read the full blog post, the author solves the problem without setdefault, and without having to cancel any futures.
I understand them quite well, thank you.
You don't need setdefault if you only call the initializer from one thread. My code can be made thread safe (as long as you only use one event loop per thread).
It's way easier with a single thread. You don't need a lock or atomic operations.
    initializer = None

    async def maybe_initialize():
        global initializer
        if initializer is None:
            initializer = asyncio.ensure_future(one_time_setup())
        await initializer
Maybe it's me not grokking async, but the code seems right to me. `once` is a decorator, so `future` is "instantiated" once per decorated function, which is then accessed as nonlocal in the wrapper. So in the last code sample, every call to `one_time_setup()` is accessing the same `future`.
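For reference, a minimal sketch of what such a `once` decorator could look like (not the article's exact code, and it deliberately keeps the naive single-loop assumption the article goes on to fix):

    import asyncio
    import functools

    def once(coro_fn):
        future = None  # one cell per decorated function

        @functools.wraps(coro_fn)
        async def wrapper(*args, **kwargs):
            nonlocal future
            if future is None:
                future = asyncio.ensure_future(coro_fn(*args, **kwargs))
            return await future

        return wrapper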
You have to make it clear you're speaking of CPU atomics, because the problems atomic operations are trying to solve (mainly "read then write" sequences) can still easily cause issues within asynchronous code.
RuntimeError: Task <Task pending coro=<maybe_initialize() running at test.py:20> cb=[_run_until_complete_cb() at /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py:158]> got Future <Task pending coro=<one_time_setup() running at test.py:11> cb=[<TaskWakeupMethWrapper object at 0x10cafdeb8>()]> attached to a different loop
This can be a lot simpler. Just set "one_time_setup" to a single instance of the method, and all calls are waiting for the exact same run.
If that doesn't work, then set it to an `asyncio.Event`, run one_time_setup "in the background" (create_task), and have it mark the event as set when it's done.
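A sketch of that Event variant, assuming a single loop (`one_time_setup` is the coroutine from the thread; pre-3.10 an Event also binds to a loop, the same caveat the article describes):

    import asyncio

    setup_done = asyncio.Event()
    setup_started = False

    async def maybe_initialize():
        global setup_started
        if not setup_started:
            setup_started = True

            async def runner():
                await one_time_setup()
                setup_done.set()  # wakes every waiter at once

            asyncio.create_task(runner())
        await setup_done.wait()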
Go offers this out of the box via sync.Once. Do other languages? Kind of surprised Python doesn't, as this sort of pattern is common in applications dealing with concurrency.
Erlang has features for this baked in. What's more, if initialization of any subcomponent fails (say one of its dependencies hadn't finished booting yet due to a race condition) and the author made it throw, the dependent subcomponent will automatically restart itself and try again. There are also one-line strategies for trying again later, etc., so you don't even have to worry about blocking to prevent those race conditions.
> Kind of surprised python doesn’t as this sort of pattern is common in applications dealing with concurrency
Apple's Grand Central Dispatch concurrency library has dispatch_once [0], which does something similar. It relies on non-standard "block" extensions to C, which are a way of defining lambda functions, and in practice you only see it used on Apple platforms.
Lazy init in Kotlin and Scala is essentially the same thing.
The good thing with Go's sync.Once is that it's implemented as a library rather than in the language itself, so it's easy for a curious user to see how it's actually implemented. They even have comments there pointing out wrong implementations; I've seen people make exactly those mistakes during code reviews (in other languages).
I would add a note that if you are running in a cluster environment like Kubernetes, this won't work, because your containers could be running on different machines. In those scenarios you need a separate service just for the locks.
On k8s, for example with multiple parallel jobs that need to initialize only once, Redis Redlock worked for me (there are multiple implementations around). The first job takes the lock while initializing; the rest just wait for the release, then start working on the items prepared by the first.
On asyncio caches, we used a lock to prevent dogpiling on cache initialization, i.e. to prevent multiple tasks from caching the same thing in parallel.
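That's essentially a double-checked lock. A sketch, where `fetch` is a placeholder for the real loader:

    import asyncio

    cache = {}
    cache_lock = asyncio.Lock()

    async def get_cached(key):
        if key in cache:              # fast path, no lock needed
            return cache[key]
        async with cache_lock:
            if key not in cache:      # re-check: another task may have won
                cache[key] = await fetch(key)
        return cache[key]

One lock serializes all misses; a per-key lock (or the shared-future trick discussed above) is finer-grained.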