> If you ever doubt it, go check out Google Takeout. You'll be shocked at the amount of data you see there.
I sign in browser-wide and I do takeouts regularly. I don't see my browsing data.
> It doesn't ship with most essential apps, including a Phone app. In previous versions of Android, all of these were a part of AOSP.
And back when they were part of AOSP I never saw these example apps in the wild. Every vendor ships their own phone app. Every single one.
There are some "hey, we compile an extremely old and vulnerable version of AOSP"-style Android distributions, mainly advertised for built-in su/Magisk or "degoogling", which did use these example apps, though.
That's unfalsifiable conjecture. I could just as easily assert that dang is building secret dossiers on all of us from our IP-request logs - we just can't see it
> It doesn't matter that you can cryptographically verify that a package came from a given commit if that commit has accidentally-vulnerable code, or someone just gets phished.
If that commit has accidentally-vulnerable code, or someone gets phished and an attacker adds malicious code to the repository with their credentials, it is visible.
However, if the supply chain was secretly compromised so that the VCS repo was always clean and only the release contains malware, then good luck finding that out.
We all witnessed this earlier this year in the xz incident: while the (encrypted) malicious code was present in the source code repo as part of test data, the code to load and decrypt it only ever existed in the release tarballs.
People have different definitions of what "freely iterate to build new things" means. For me, having only a binary does not prevent me from doing so.
For example, Minecraft was never distributed with source code; it was binary-only from day one. But the modding community would strongly disagree if you said there was no way to "freely iterate to build new things" - in Gen Z terms, probably a "skill issue" :p
> Python is never really going to be 'fast' no matter what is done to it because its semantics make most important optimizations impossible
The scientific computing community has a bunch of code calling numpy and the like. That code is pretty fast because, well, numpy isn't written in Python. However, there is a scalability issue: it can only drive so many threads (more than 1, but not many) in a process due to the GIL.
Okay, you may ask, why not just use a lot of processes and message passing? That's how people have historically worked around the GIL. However, you then either have to swallow the cost of serializing data over and over again (pickle is quite slow, and even when it isn't, it wastes precious memory bandwidth), or do a very complicated dance with shared memory.
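A minimal sketch of those two workarounds, with illustrative sizes and helper names: plain multiprocessing pickles every array it sends to a worker, while multiprocessing.shared_memory avoids the copy at the price of managing the buffer yourself.

    import numpy as np
    from multiprocessing import Pool, shared_memory

    def scale_copy(arr):
        # arr arrives pickled: a full copy per task, burning memory bandwidth
        return float((arr * 2.0).sum())

    def scale_shared(args):
        # only the segment name, shape and dtype cross the process boundary
        name, shape, dtype = args
        shm = shared_memory.SharedMemory(name=name)
        view = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
        total = float((view * 2.0).sum())
        del view    # drop the buffer reference before closing the segment
        shm.close()
        return total

    if __name__ == "__main__":
        data = np.ones(10_000_000)
        with Pool(4) as pool:          # message passing: `data` is pickled for each task
            pool.map(scale_copy, [data] * 4)
        shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
        np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)[:] = data
        with Pool(4) as pool:          # shared memory: no copies, but more ceremony
            pool.map(scale_shared, [(shm.name, data.shape, data.dtype)] * 4)
        shm.close()
        shm.unlink()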
This isn't a problem for web app bois, who can just write TypeScript.
This is misleading. Most of the compute intensive work in Numpy releases the GIL, and you can use traditional multithreading. That is the case for many other compute intensive compiled extensions as well.
No. Like the sibling comments said: say you have a program which spends 10% of its time in Python code between these numpy calls. The code is still not scalable, because you can run at most 10 such threads in a Python process before you hit the hard limit imposed by the GIL.
There is no need to eliminate the 10% or shrink it to 5% or whatever; people happily pay 10% overhead for convenience. But being limited to 10 threads is a showstopper.
Python has no hard limit on threads or processes, and the pure Python code in between calls into C extensions is irrelevant because you would not apply multithreading to it. Even if you did, the optimal number of threads would vary based on workload and compute. No idea where you got 10.
Because of the GIL, there can be at most one thread in a Python process running pure Python code at any moment. If I have a computation which spends 10% of its time in pure Python and 90% in C extensions, I can launch at most 10 threads and still expect mostly linear scalability, because 10 * 10% = 100%.
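The back-of-the-envelope arithmetic, with the 10% figure just as the running example:

    # If each thread must hold the GIL for a fraction f of its runtime,
    # at most 1/f threads can make progress before the GIL is saturated.
    f = 0.10                   # 10% of the work is pure Python (GIL held)
    max_useful_threads = 1 / f
    print(max_useful_threads)  # 10.0 -- threads beyond this just queue on the GIL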
> the pure Python code in between calls into C extensions is irrelevant because you would not apply multithreading to it
No. There is a very important use case where the entire computation, driven by Python, is embarrassingly parallel and you'd want to parallelize that, instead of having internal parallelization in each of your C extension calls. So the pure Python code in between calls into C extensions MUST BE SCALABLE. The C extension code may not launch any threads at all.
This is your original comment, which as stated is simply incorrect
> numpy isn't written in Python. However, there is a scalability issue: they can only drive so many threads (not 1, but not many) in a process due to GIL.
Now you have concocted this arbitrary example of why you can't use multithreading that has nothing to do with your original comment or my response.
> instead of having internal parallelization in each your C extensions call ... C extensions code may not launch thread at all.
I don't think you understood my comment - or maybe you don't understand Python multithreading. If a C extension is single threaded but releases the GIL, you can use multithreading to parallelize it in Python. e.g. `ThreadPool(processes=100)` will create 100 threads within the current Python process and it will soak all the CPUs you have -- without additional Python processes. I have done this many times with numpy, numba, vector indexes, etc.
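A minimal sketch of that pattern, with illustrative array sizes and thread count; numpy's ufuncs on numeric arrays run in C with the GIL released, so plain threads in a single process can keep many cores busy:

    import numpy as np
    from multiprocessing.pool import ThreadPool

    def work(seed):
        rng = np.random.default_rng(seed)
        a = rng.standard_normal(5_000_000)
        # np.sin runs single-threaded in C with the GIL released,
        # so the worker threads overlap across cores.
        return float(np.sin(a).sum())

    with ThreadPool(processes=16) as pool:   # threads, despite the kwarg name
        results = pool.map(work, range(64))
    print(len(results))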
Even for your workload, using multithreading for the GIL-free code in a hierarchical parallelization scheme would be far more efficient than naive multiprocessing.
> This is your original comment, which as stated is simply incorrect
I apologize if I can't make you understand what I said. But I still believe I said it clearly and it is simply correct.
Anyway, let me try to mansplain it again: "numpy isn't written in Python", and numpy releases the GIL. So as long as Python code calls into numpy, the thread running the numpy call can proceed without the GIL, and the GIL can be taken by another thread to run Python code. I can easily have 100 threads running inside numpy without any GIL issue. BUT, they eventually need to return to Python and retake the GIL. Say that for each such thread, out of every 1 second there are 0.1 seconds where it needs to run pure Python code (and must hold the GIL). Please tell me how to scale this beyond 10 threads.
> Now you have concocted this arbitrary example of why you can't use multithreading that has nothing to do with your original comment or my response.
The example is not arbitrary at all. This is exactly the problem people are facing TODAY in ANY DL training written in PyTorch.
I have a Python thread driving GPUs in an async way; it barely runs any Python code. All good, no problem.
Then I need to load and preprocess data [1] for the GPUs to consume. I need very high velocity changing this code, so it looks like a stupid script: it reads data from storage, then does some transformation using numpy / again whatever shit I decided to call. Unfortunately, since dealing with the data is largely where the magic happens in today's DL, the code spends non-trivial time (say, 10%) in pure Python manipulating bullshit dicts in between all the numpy calls.
Compared to what happens on GPUs this is pretty lightweight, it is not latency sensitive, I just need good enough throughput, and there are always enough CPU cores alongside the GPUs. So, ideally, I just dial up the concurrency. And then I hit the GIL wall.
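A hypothetical sketch of the kind of per-sample preprocessing I mean (field names and transforms are made up): the numpy calls do their heavy lifting with the GIL released, but the dict juggling around them is pure Python and has to hold it, and that fraction is what caps how many such threads you can run.

    import numpy as np

    def preprocess(record: dict) -> dict:
        # pure Python: GIL held while we pick fields apart
        features = {k: v for k, v in record.items() if k.startswith("feat_")}
        label = record["label"]
        # numpy: the heavy lifting runs in C with the GIL released
        stacked = np.stack([np.asarray(v, dtype=np.float32) for v in features.values()])
        stacked = (stacked - stacked.mean()) / (stacked.std() + 1e-6)
        # pure Python again: GIL held to rebuild the output dict
        return {"x": stacked, "y": label}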
> I don't think you understood my comment - or maybe you don't understand Python multithreading. If a C extension is single threaded but releases the GIL, you can use multithreading to parallelize it in Python. e.g. `ThreadPool(processes=100)` will create 100 threads within the current Python process and it will soak all the CPUs you have -- without additional Python processes. I have done this many times with numpy, numba, vector indexes, etc.
I don't think you understood what you did before and you already wasted a lot of CPUs.
I never talked about doing any SINGLE computation. Maybe you are one of those HPC gurus who care, and only care, about solving single very big problem instances? Otherwise I have no idea why you are even talking about hierarchical parallelization after I already said that a lot of problems are embarrassingly parallel and they are important.
[1] Why don't I simply do it once and store the result? Because that's where actual research is happening and is what a lot of experiments are about. Yeah, not model architectures, not hyper-parameters. Just how you massage your data.
Your original comment as written claims that numpy cannot scale using threads because of the GIL. You admit that is wrong, but somehow can't read your comment back and understand that it says that. What you really meant was that combinations of pure Python and numpy don't scale trivially using threads, which is true but not what you wrote. You were actually just thinking of your PyTorch specific use case, which you evidently haven't figured out how to scale properly, and oversimplified a complaint about it.
> I don't think you understood what you did before and you already wasted a lot of CPUs.
No CPUs were wasted lol. You are clearly confused about how threads and processes in Python work. You also don't seem to understand hierarchical parallelization, which is simply a pattern that works well in cases where you can better maximize parallelism using a combination of processes and threads.
There are probably better ways to address your preprocessing problem, but I get the impression you're one of those people only incidentally using Python, out of necessity, to run PyTorch jobs, and you're either frustrated or haven't yet come to the realization that you need to learn how to optimize your Python compute workload, because PyTorch doesn't do everything for you automatically.
It’s an Amdahl’s law sort of thing: you can extract some of the parallelism with scikit-learn, but what’s left is serialized. That hits particularly hard for those interactive jobs where you might write plain ordinary Python snippets that could get a 12x speedup (string parsing for a small ‘data lake’).
Insofar as it is all threaded, for C and Python alike, you can parallelize it all with one paradigm that also makes a mean dynamic web server.
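For reference, Amdahl's law with a parallel fraction p spread over n workers (the numbers below are illustrative, not taken from the comment above):

    def amdahl_speedup(p: float, n: int) -> float:
        # speedup = 1 / ((1 - p) + p / n); the serial part (1 - p) sets the ceiling
        return 1.0 / ((1.0 - p) + p / n)

    print(amdahl_speedup(p=0.95, n=32))  # ~12.5x here; the limit as n grows is 1/(1-p) = 20x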
Numpy is not fast enough for actual performance-sensitive scientific computing. Yes, threading can help, but at the end of the day the single-threaded perf isn't where it needs to be, and it is held back too much by the Python glue between Numpy calls. This makes interprocedural optimizations impossible.
Accelerated sub-languages like Numba, Jax, PyTorch, etc. or just whole new languages are really the only way forward here unless massive semantic changes are made to Python.
These "accelerated sub-languages" are still driven by, well, Python glue. That's why we need free-threading and faster Python. We want the glue to be faster because it's currently the most accessible glue to the community.
In fact, Sam Gross, the man behind free-threading, works on PyTorch. From my understanding he decided to explore nogil because the GIL is holding back DL training written in PyTorch. Namely, the PyTorch DataLoader code itself and almost all data loading pipelines in real training codebases are a hopeless bloody mess, just because of all the IPC/SHM nonsense.
What you are talking about sounds like claiming that the US is not a country built by immigrants because there are Indians.
I can understand why you insist on such ideas if you are, well, one of the so-called "Taiwanese indigenous peoples". If you are not, and your family was largely brought to Taiwan in the 1940s with the KMT, then idk, maybe think harder?
Either way, I don't believe it's wise to continue the talk, as your view - that the only connection between Taiwan and whatever definition of China is that Taiwan was occupied by the Qing, and then the KMT - isn't popular or accepted at all outside Taiwan. Not even in the Anglosphere. Citing a lot of facts while conveniently leaving out others doesn't help either.
CUDA is not merely software control. Let's face it: it's tied to their architecture and evolves with their architecture. Ignore whatever patents and software copyrights you are talking about; they are simply the best at implementing this programming model.
If you port CUDA over and want high performance, you must build very similar GPUs. And can you beat NVIDIA at building their own GPU architecture, without much room for innovation?
And yes, this does mean that NVIDIA itself is also facing increasingly absurd constraints, and after a few generations CUDA as a programming model may no longer be sustainable.
I've always been running a headless Linux VM wherever I do any dev work, Windows or Mac. Works very well.
Glad to see that WSL2 and OrbStack promote this workflow.
For now my only complaint is that I prefer full-blown VMs over a WSL-like setup (so that I can work with the same Linux kernel my code targets), but client virtualization support on Apple platforms is pretty meh, definitely not at Hyper-V's level yet. OrbStack went out of their way to fix a lot of this, but they are not interested in making a VM product.
I'd add that a lot of the described advantages come from culture. For web applications, manual memory management is 100% a friction rather than a relief. But the culture of the Rust community in general, at least for the past ten years or so, is to encourage a coding style with inherently fewer bugs and more reusable, maintainable code, to the point of consistently wanting something to not happen if they aren't sure they got it right (one may argue that this is counter-productive short-term).
It is this culture that makes adopting Rust for web apps worthwhile - it counters the drawback of manual memory management.
If you hire an engineer already familiar with Rust, you can be sure you are getting someone who is sane. If you onboard someone with no Rust background, you can be pretty sure that they are either going to learn the right way (tm) to do everything or fail to make any meaningful contribution, instead of becoming a -10x engineer.
If you work in a place with a healthy engineering culture that trains people well and has good infra, it doesn't really matter; you may as well use C++. But for those of us not so lucky, Rust helps a lot, and it is not about memory safety at all.
I haven’t worked at a place that checks the above boxes for making C++ a great choice for bulletproof code. There seems to be large variation in C++ styles and quality across projects. But it seems to me that for orgs that indeed do C++ well, thanks to the supporting aspects above, moving to Rust might make things even smoother.
I agree with the other critics; they are toxic.