A really good way to learn about concurrency is to write a SOCKS5 proxy. Start with a simple one using Python threads, then Python asyncio, and if you're brave, C sockets with poll/select. The C parts are covered in Beej's Guide to Network Programming. The SOCKS5 spec is very simple and has an RFC (RFC 1928).
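To give a flavor of how simple the spec is, here is a sketch of the first handshake step (method selection) from RFC 1928. `negotiate_method` is a hypothetical helper name; a full proxy would follow this with the CONNECT request and a byte-relay loop, one client per thread.

```python
import socket

def negotiate_method(conn: socket.socket) -> bool:
    """Handle the SOCKS5 method-selection handshake (RFC 1928, section 3).

    The client sends VER (0x05), NMETHODS, then NMETHODS method bytes.
    We reply 05 00 ("no authentication required") if that method was
    offered, else 05 FF ("no acceptable methods").
    """
    header = conn.recv(2)
    if len(header) < 2 or header[0] != 0x05:
        return False
    methods = conn.recv(header[1])
    if 0x00 in methods:              # 0x00 = no authentication
        conn.sendall(b"\x05\x00")
        return True
    conn.sendall(b"\x05\xff")
    return False
```

A threaded proxy would call this once per accepted connection before parsing the client's request.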
The GIL is not there to prevent data corruption on shared objects - it only protects the interpreter's internal state. The fact that you can sometimes get away with it is an accident of the GIL's implementation, not a feature anyone should rely on. It also means you cannot rely on that behavior not changing in successive versions of CPython.
The only safe ways to share state between threads in CPython are explicit synchronization: locks, mutexes, message passing, etc. Even something as simple as incrementing an integer is absolutely not made thread-safe by the GIL.
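A minimal sketch of the point above: `counter += 1` compiles to a read, an add, and a write, and the GIL can switch threads between those steps, so concurrent updates can be lost. The function names here are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n: int) -> None:
    # A read, an add, and a write per iteration; the GIL may release
    # between them, so concurrent updates can be silently lost.
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:          # explicit synchronization, not the GIL
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 every time with the lock; with unsafe_increment, often less
```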
To say concurrency is broken in Python because of the GIL seems to be a reiterated adage without actual wisdom behind it.[1]
Are you writing an IO-bound task? Then asyncio[2] is a great way to achieve performant concurrency (even if viewed just as an interface: e.g., you can make it even more performant by swapping in uvloop[3]). As Raymond Hettinger likes to point out, even if you throw away the GIL, you still have to implement locks on shared resources (unless you're writing embarrassingly parallel code, in which case you can invoke the pieces separately, use the multiprocessing module to great effect, or see how I do it easily below). By the time you get all those error-prone locks implemented in your GIL-less world, you've given up the performance you thought you'd gain by throwing away the GIL.
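A minimal sketch of the IO-bound case: `fetch` is a hypothetical stand-in for a real network call, with `asyncio.sleep` playing the role of waiting on a socket. (uvloop, mentioned above, can be swapped in via `uvloop.install()` without changing this code.)

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a real network call; awaiting asyncio.sleep yields
    # to the event loop the same way awaiting a socket read would.
    await asyncio.sleep(delay)
    return name

async def main():
    # Three 0.2 s "requests" run concurrently, so total wall time is
    # about 0.2 s rather than 0.6 s.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```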
Are you writing a CPU-bound task? I've only encountered this in scientific computing, as I'm a scientist, and for this (as another commenter mentions) I either use numpy[4] (strictly within a numpy call, the GIL is not held) or numba[5] (by default, the entirety of a numba-compiled function does not hold the GIL). I can therefore use Python threads, asyncio, or Numba's built-in parallelism[6,7] to run these numerical routines in parallel (in summary: some numpy routines for working with arrays of values are auto-parallelized when called within a numba-compiled function, and you can explicitly parallelize your own loops via `for i in numba.prange(N):`; in either case, decorate the function with `@numba.njit(parallel=True)`). I easily get 100% of my CPU's threads doing work this way.
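A sketch of the prange pattern described above; the function name is illustrative, and a pure-Python fallback is included so the example still runs if numba isn't installed (with numba present, the loop is compiled and the iterations are split across threads).

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Pure-Python fallback so the sketch runs without numba; with numba
    # installed, the decorated loop is compiled and run in parallel.
    def njit(**kwargs):
        def decorate(func):
            return func
        return decorate
    prange = range

@njit(parallel=True)
def sum_of_squares(x):
    # numba recognizes `total += ...` inside a prange loop as a
    # reduction and combines the per-thread partial sums safely.
    total = 0.0
    for i in prange(x.shape[0]):
        total += x[i] * x[i]
    return total

x = np.arange(100_000, dtype=np.float64)
print(sum_of_squares(x))
```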
I have parallelized numba-compiled functions both via numba.prange and via Python threads (yes, threads) in the past few weeks to speed up code that ran for 20-60 minutes. Even though I've done this before, I was still very pleasantly surprised at how easy it was, and I got very nearly a 16x speedup on my 16-thread laptop. That made my code practical for interactive data exploration and for building up simulations to help me understand some things I was observing, which is where Python really shines for me.
Also note that writing a routine in numba provides a straightforward path to putting your numerical code on a GPU[8,9], too, if that model makes sense for your algorithm (though I've primarily used PyCUDA for this, which worked well).
Are you writing something that needs distributed (e.g., cluster-based) resources? Then dask (dask.distributed) is a great way to go, with an interface similar to the Python standard library's `concurrent.futures`. It runs all the same code and you get the same forms of parallelism, but distributed across computers.
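To show the futures interface being referred to, here is a sketch using the stdlib executor; with dask this would be `client = dask.distributed.Client(...)` and `client.submit(...)` against the same submit/result pattern. `simulate` is a hypothetical stand-in for an expensive numerical routine.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def simulate(seed: int) -> int:
    # Stand-in for an expensive numerical routine.
    return seed * seed

# With dask.distributed, a Client plays the role of this executor and
# the submit/result calls below look essentially the same.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(simulate, s) for s in range(8)]
    results = sorted(f.result() for f in as_completed(futures))

print(results)
```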
In any case, do you get to FORTRAN or C speeds? Probably not, but close enough, and with the benefits (and costs) of using Python code.
This has been discussed plenty of times and it basically boils down to "just get f*cked and work around it". In my case, I just use different programming languages for performance sensitive tasks.
But it's amazing how people can come up with all sorts of justifications for a broken system that they happen to love. And then people wonder how there are people who can support Trump :D.
To make this less personal and more specific for someone not enlightened on the topic: please point out specifically how I am incorrect, or at least provide a source or point to a discussion that shows how I am incorrect.
I am not a professional programmer, but I get a lot of work done very productively in Python (or at least I thought I was being productive; given how you speak to me, it sounds like I'm actually not, and that I'm instead living in some world of fake insinuations that what I'm doing is productive...).
* Threads are useful not just for speedups, but also for concurrent execution (storage/network I/O, graphics, ...)
* If you really care about speed and would like to parallelize CPU-bound code, you'll probably do better using another language like C/C++ (or Java, Go, D etc.) for this particular task
* In general, threads are used instead of processes because of the overhead involved in creating the latter
* Python's multiprocessing module isn't the most efficient anyway, for example when passing large objects between processes, because of the way they're serialized (pickled)
* It also has some other quirks, so I would definitely not advise using it for "everything".
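A quick sketch of the serialization cost mentioned in the last two bullets: multiprocessing pickles arguments and results, so a large object pays a full serialize/deserialize round trip on every process crossing, while threads share it for free. The numbers here are illustrative, not a benchmark.

```python
import pickle
import time

# A "large" object: a million integers in a list.
big = list(range(1_000_000))

start = time.perf_counter()
blob = pickle.dumps(big, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
elapsed = time.perf_counter() - start

# This round trip is roughly what multiprocessing pays each time the
# object crosses a process boundary; threads just share the reference.
print(f"{len(blob) / 1e6:.1f} MB pickled+unpickled in {elapsed:.3f} s")
```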