
Exactly-Once Initialization in Asynchronous Python - ingve
https://nullprogram.com/blog/2020/07/30/
======
alexchamberlain
I'd just call the function once by avoiding the global; construct your
database access object at the start of your asynchronous main method and
dependency inject it to other tasks.

~~~
np_tedious
Can you clarify what you mean by dependency injection in python? Did you mean
a DI framework or something more informal?

I've seen DI frameworks in python but not really used them. At a glance they
don't strike me as pythonic. Rolling your own kind of inversion of control can
result in unruly "config" or "context" objects that bring difficulties as
well.

~~~
fulafel
DI is just a convoluted synonym for passing in arguments.

~~~
alexchamberlain
Indeed. If you want it to be cleaner, a discipline of constructor arguments
for injection and __call__ arguments for semantic input is really all you
need.
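
A minimal sketch of that discipline, no framework required (all names here,
like `SaveUser` and `FakeDB`, are made up for illustration):

```python
import asyncio

class FakeDB:
    """Hypothetical stand-in for a real database access object."""
    def __init__(self):
        self.rows = {}

    async def put(self, key, value):
        self.rows[key] = value

class SaveUser:
    """Dependencies via the constructor, semantic input via __call__."""
    def __init__(self, db):
        self.db = db                      # injected collaborator

    async def __call__(self, user_id, name):
        await self.db.put(user_id, name)  # use the injected dependency
        return user_id

async def main():
    db = FakeDB()              # constructed once at the start of async main
    save_user = SaveUser(db)   # dependency injection, no framework needed
    await save_user(1, "ada")  # semantic input at call time
    return db.rows

rows = asyncio.run(main())
```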

------
cheez
Been coming across a lot of these issues. Asyncio requires a slightly
different thought process.

As soon as you have an `await` anywhere in the code, you've got to assume that
your code will be re-entered. Lots of asyncio.Locks all over the place for me.

Glad people are bringing this up. I had to learn this on my own.
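
A toy illustration of the point (the counter example is made up): every
`await` is a spot where another task can interleave, so a read-modify-write
that spans one loses updates unless it's guarded by a lock.

```python
import asyncio

counter = 0

async def unsafe_increment(lock):
    global counter
    current = counter
    await asyncio.sleep(0)       # suspension point: other tasks run here
    counter = current + 1        # may overwrite their updates

async def safe_increment(lock):
    global counter
    async with lock:             # critical section survives the await
        current = counter
        await asyncio.sleep(0)
        counter = current + 1

async def run(worker, n=100):
    global counter
    counter = 0
    lock = asyncio.Lock()        # created inside the running loop
    await asyncio.gather(*(worker(lock) for _ in range(n)))
    return counter

lost = asyncio.run(run(unsafe_increment))   # far fewer than 100
safe = asyncio.run(run(safe_increment))     # exactly 100
```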

~~~
pansa2
> _As soon as you have an `await` anywhere in the code, you've got to assume
> that your code will be re-entered._

At least the re-entry points are explicitly marked with `await`. IMO that's
the main benefit of async-await (stackless coroutines) over stackful
coroutines or threads, which allow your code to be suspended and re-entered
almost anywhere.

Of course the drawback of async-await is the "function color" issue [0], in
which it's difficult for functions that don't suspend to call those which do.

[0] [http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/)

~~~
cheez
That's a good perspective.

------
OrangeTux
> Unfortunately this has a serious downside: asyncio locks are associated with
> the loop where they were created. Since the lock variable is global,
> maybe_initialize() can only be called from the same loop that loaded the
> module. asyncio.run() creates a new loop so it’s incompatible.

I work on several async projects, but I never had to use multiple event loops.
What are use cases for using multiple event loops?

~~~
itayperl
There may be other use cases, but it can be a useful pattern for mixing async
code into a non-async project. In the specific places where using async for
some task makes sense, you would just spawn a thread with an event loop, then
push work into the new loop from non-async code using
run_coroutine_threadsafe.
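
A minimal sketch of that pattern (the `fetch` coroutine is a made-up stand-in
for real async work):

```python
import asyncio
import threading

# Dedicated event loop running in a background thread.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def fetch(x):
    await asyncio.sleep(0.01)    # stand-in for real async I/O
    return x * 2

# Plain synchronous code pushes work into the loop and blocks on the result.
future = asyncio.run_coroutine_threadsafe(fetch(21), loop)
result = future.result(timeout=5)       # a concurrent.futures.Future

loop.call_soon_threadsafe(loop.stop)    # tidy shutdown when done
```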

------
lmeyerov
There is more than one way to make awaitables in asyncio -- at the core, this
is about sharing a single future, for which there's a joyfully boring native
standard constructor.

For example, when working w/ immutable GPU dataframes to represent our user's
datasets, we often get into variants where loading a dataset may take a bit
and thus get multiple services requesting it before ETL is done. So, we want
to only trigger the parser once per file and have any subsequent calls wait on
the first one:

    
    
      datasets = {}
      async def load_once(name):
        if name not in datasets:                            # sync,  many
          fut = asyncio.get_running_loop().create_future()  # sync,  once
          datasets[name] = fut                              # sync,  once
          fut.set_result(await load(name))                  # async, once
        return await datasets[name]                         # async, many
    

And then throw in an async lru.. :)

~~~
jaen
Unfortunately, this naive method is buggy, I have had to debug and fix this
exact code in production :)

The issue is with exception safety - first, this does not handle exceptions in
load() properly, but that is a trivial fix.

The more insidious problem is that Python futures are cancellable - and
exceptions cancel futures.

What this means is that if two callers call load_once() in parallel, and the
first caller encounters an exception (eg. from calling something else in
parallel), the load() future will be cancelled for _all_ callers (eg. the
second one), and will remain in a permanently wedged state.

Fixing that is, well, quite a bit more code...

~~~
lmeyerov
Yep, we see the same, good to point out!

So load() needs a try/except, and except is either kill process / retry /
clear / cancel, and the other loader should also expect an exn depending on
that choice. All of that is app/use-case dependent.

Usefully, Futures and async/await natively represent most of these. The case
we find missing is around back pressure, retry, etc., but I haven't seen a
good lang construct for that. We used to do a lot of Rx for that, and now a
lot of HTTP headers/libs when a remote call, but it all feels messy. This is
somewhat orthogonal to futures for load once tho.
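
One way to sketch the "clear on failure, let later callers retry" choice
(`load` here is a made-up stand-in; the cancellation case from the parent
comment is deliberately left out, since handling it well takes more code):

```python
import asyncio

datasets = {}
load_calls = []                  # counts how many real loads happen

async def load(name):
    load_calls.append(name)
    await asyncio.sleep(0)       # stand-in for slow parsing / ETL
    if name == "bad":
        raise ValueError(name)
    return f"parsed:{name}"

async def load_once(name):
    if name not in datasets:
        fut = asyncio.get_running_loop().create_future()
        datasets[name] = fut
        try:
            result = await load(name)
        except Exception as exc:
            del datasets[name]        # clear, so a later call can retry
            fut.set_exception(exc)    # wake current waiters with the error
            raise
        fut.set_result(result)
    return await datasets[name]

async def main():
    a, b = await asyncio.gather(load_once("x"), load_once("x"))
    errs = await asyncio.gather(load_once("bad"), load_once("bad"),
                                return_exceptions=True)
    return a, b, errs

a, b, errs = asyncio.run(main())
```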

~~~
infinite8s
Have you seen trio
([https://trio.readthedocs.io/en/stable/](https://trio.readthedocs.io/en/stable/))
and more generally the notion of structured concurrency?
([https://vorpus.org/blog/notes-on-structured-concurrency-or-g...](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/))

~~~
lmeyerov
We have trio somewhere in our stack and it is on the list of to-delete, but
mostly as part of continued elimination of maintenance and consistency burdens

I tried skimming that article, but it comes off as a long and hard to read
rant, which suggests the author needs to understand their idea better or pick
an explanation/analogy that is more direct. Maybe something like the ocap
reasoning for promises vs futures, or say the issues of coloring, may help...

------
smabie
How about we just use actors instead? Preemptable actors are the only good
concurrency model I've ever come across. Everything else has massive problems

~~~
mgraczyk
Async await scales well to codebases with millions of lines and thousands of
developers. As a result, large companies and ecosystems have mostly adopted
async/await, and the tooling and runtimes in those languages are now much
more mature.

~~~
pansa2
> _Async await scales well to codebases with millions of lines_

Interesting - do you think there's a better solution than async-await for
smaller codebases?

------
mgraczyk
If you're using CPython 3.2 or later, you don't need to lock. You can use
`dict.setdefault` or another similar method that is guaranteed to be atomic.

    
    
        initialized = D.setdefault('initialized', True)
        ...

~~~
sicromoft
dict.setdefault doesn’t solve the problem that he’s using the lock for
(atomicity is not the problem).

~~~
mgraczyk
Something like this should work, no? Only run the coroutine if you won the
race.

    
    
        D = {}
        async def maybe_initialize():
            global D
            this_setup = asyncio.ensure_future(one_time_setup())
            actual_setup = D.setdefault('initializer', this_setup)
            if this_setup is not actual_setup:
                this_setup.cancel()
            await actual_setup
            return
    

complete example:
[https://gist.github.com/mgraczyk/e251443bccfe54505e75b652655...](https://gist.github.com/mgraczyk/e251443bccfe54505e75b6526550c113)

~~~
sicromoft
I think something might be off in your mental model of python coroutines. They
work via cooperative multitasking. Atomic operations aren't really necessary
or relevant in such a system where you are in complete control of all yield
points (i.e., calls to `await`).

If you read the full blog post, the author solves the problem without
setdefault, and without having to cancel any futures.

~~~
mgraczyk
I understand them quite well, thank you. You don't need setdefault if you only
call the initializer from one thread. My code can be made thread safe (as long
as you only use one event loop per thread).

It's way easier with a single thread. You don't need a lock or atomic
operations.

    
    
        initializer = None
        async def maybe_initialize():
          global initializer
          if initializer is None:
            initializer = asyncio.ensure_future(one_time_setup())
          await initializer
          return

~~~
jwilk
This is the same as the solution from the article, except that:

\- Chris forgot to declare the variable as global;

\- Chris used asyncio.create_task() instead of asyncio.ensure_future().

~~~
andreareina
Maybe it's me not grokking async, but the code seems right to me. `once` is a
decorator, so `future` is "instantiated" once per decorated function, which is
then accessed as nonlocal in the wrapper. So in the last code sample, every
call to `one_time_setup()` is accessing the same `future`.
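
For readers without the article open, a sketch of the decorator shape being
described - names like `once_wrapper` are assumed, and this uses
`ensure_future` per the grandparent comment (the article itself reportedly
uses `create_task`):

```python
import asyncio
import functools

def once(coro_fn):
    future = None                     # one slot per decorated function
    @functools.wraps(coro_fn)
    async def once_wrapper(*args, **kwargs):
        nonlocal future
        if future is None:            # first caller kicks off the work
            future = asyncio.ensure_future(coro_fn(*args, **kwargs))
        return await future           # every caller awaits the same future
    return once_wrapper

runs = []

@once
async def one_time_setup():
    runs.append(1)                    # the real work happens once
    await asyncio.sleep(0)
    return "ready"

async def main():
    return await asyncio.gather(one_time_setup(), one_time_setup())

results = asyncio.run(main())
```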

------
nhumrich
This can be a lot simpler. Just set "one_time_setup" to a single instance of
the method, and all calls are waiting for the exact same run.

If that doesn't work, then set it to an `asyncio.Event`, run the
one_time_setup "in the background" (create_task), and when it's done it marks
the event as complete.
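
A sketch of that Event variant (all names here are made up):

```python
import asyncio

ready = None
runs = []

async def one_time_setup():
    runs.append(1)               # the real work happens once
    await asyncio.sleep(0)
    ready.set()                  # mark initialization complete

async def maybe_initialize():
    global ready
    if ready is None:            # first caller schedules the setup
        ready = asyncio.Event()
        asyncio.create_task(one_time_setup())   # run "in the background"
    await ready.wait()           # everyone waits on the same event

async def main():
    await asyncio.gather(maybe_initialize(), maybe_initialize())
    return len(runs)

count = asyncio.run(main())
```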

------
waterside81
Go offers this out of the box via sync.Once. Do other languages? Kind of
surprised Python doesn’t, as this sort of pattern is common in applications
dealing with concurrency

~~~
dnautics
Erlang has features for this baked in. What's more, if initialization of any
subcomponent fails (say one of its dependencies hadn't completed booting yet
due to a race condition) and the author made it throw, the dependent
subcomponent will automatically restart itself and try again. There are also
one-line strategies for trying again later, etc., so you don't even have to
worry about blocking to prevent those race conditions.

> Kind of surprised Python doesn’t, as this sort of pattern is common in
> applications dealing with concurrency

Well yeah, python was not designed for that.

------
natch
Trying out the last snippet. What am I doing wrong here? Python 3.7.7

[https://pastebin.com/E9KWCmky](https://pastebin.com/E9KWCmky)

~~~
zeronone
The indentation is wrong on the statement `return once_wrapper`.

~~~
natch
Duh, thanks! Now I have a new one but I think it just means the script needs
to stay running longer so the coroutine can have time to do its thing.

    
    
        ./once.py:29: RuntimeWarning: coroutine 'once.<locals>.once_wrapper' was never awaited
          one_time_setup()

~~~
prashnts
Your main function needs to be async as well, since one_time_setup is async.
Linking this SO answer, which also links to the docs (may come in handy!).

[https://stackoverflow.com/questions/57399157/runtimewarning-...](https://stackoverflow.com/questions/57399157/runtimewarning-coroutine-main-was-never-awaited)

Edit: Also tangentially relevant is the article linked by pansa2 in another
thread, which was also discussed on HN a few months ago:
[http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/)

~~~
natch
Thanks!

------
reedwolf
>"global"

Please write classes, people!

~~~
zbentley
Why? To hide the fact that something is global behind mutation by reference?

"global" is a fine way to do that when you need it. Simple and says what it
means.

------
rburhum
I would add a note that if you are running in a cluster environment like
Kubernetes this won’t work, because your containers could be running on
different machines. In those scenarios you would need another service just
for the locks.

~~~
jordic
On k8s, for example when running multiple parallel jobs that need to
initialize only once, Redis redlock worked well for me (there are multiple
implementations around). The first job takes the lock while initializing, and
the rest just wait for the release before starting to work on the items
prepared by the first. For asyncio caches, we used a lock to prevent
dogpiling on cache initialization, i.e. to prevent multiple tasks caching the
same thing in parallel.

