rnmmrnm's comments | Hacker News

Reminds me how, the other day, I circumvented disabled SSH port forwarding by using https://github.com/nwtgck/yamux-cli plus the ncat already on the machine, with something like `yamux -l lport | ssh server -- run.sh`, where run.sh is roughly `ncat --proxy-type=http -klp proxyport & yamux localhost proxyport`.

Good thing I only needed HTTP access, or else I'd have had to go find that SOCKS5 server that actually works again.


I've been trying to read up on TCP/UDP multiplexing, and I'm having a hard time understanding what the use cases are, or the advantages over just using regular connections, sockets, and application logic.

Do you have a good resource recommendation to learn this more?


Well, here I tunneled the traffic of multiple TCP connections over a single input/output stream (the SSH shell's IO stream).

Basically, when you have a single communication channel and it's expensive to create more channels, yet you want to tunnel multiple TCP streams where each connection gets its own channel, that is when you reach for the muxer.
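
To make it concrete, here's a toy sketch in Python of the core idea (illustrative only, and definitely not yamux's actual wire format): tag every chunk with a stream ID and a length, and any number of logical streams can share one byte pipe.

    import struct

    def mux_frame(stream_id: int, data: bytes) -> bytes:
        # frame = 4-byte stream id + 4-byte length + payload
        return struct.pack(">II", stream_id, len(data)) + data

    def demux(buf: bytes):
        # yield (stream_id, payload) pairs back out of the single byte stream
        while buf:
            stream_id, length = struct.unpack(">II", buf[:8])
            yield stream_id, buf[8:8 + length]
            buf = buf[8 + length:]

    wire = mux_frame(1, b"GET / HTTP/1.1\r\n") + mux_frame(2, b"SSH-2.0-hi\r\n")
    for sid, payload in demux(wire):
        print(sid, payload)

A real muxer like yamux adds per-stream flow control, open/close handshakes, and backpressure on top, but the framing trick is the heart of it.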

I.e., here I probably could have done it without yamux if I had instead used an SSH control socket (ControlMaster) and created many shell IO streams, but using yamux makes it cleaner. Also, the server I'm working on is very buggy and you really get like a 50% success rate logging in.

I suggest you download this `yamux` utility and play around with it really :)


Nice! It's so good when you can cobble together various tools to achieve something like that.


I also need elaboration on this. I presumed calling Lua from C could be used to implement user-defined callbacks. Launching a thread for Lua plus locking seems like it would be even slower?


pretty fun though.


I haven't finished it yet, but I was thinking he's talking about memorizing the algorithm to calculate sines in your head.


A bit off topic: I just wrote a bunch of Lua C debug API code (lua_sethook) to pre-emptively run and context-switch multiple Lua "coroutines" (well, not so "co").

Is this library offering a Lua implementation better designed for this use case? I have all this code to unload the coroutine stack, store it, and reload it before continuing later. Does having C bindings to this library make sense?


I think yes?

My understanding is that the "stackless" concept here means that it does not store its execution state in the "C runtime stack" (or Rust, in this case).

So, there is some blob of memory describing that info, managed by Piccolo, rather than it residing on the "real" OS execution stack.
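
If a Python analogy helps (a loose sketch of the concept, not how Piccolo actually implements it): a generator's paused execution state lives in a heap object you can hold onto, rather than on the OS/C call stack.

    def task():
        x = 1
        yield          # suspend: the local x survives in the generator
                       # object, not on the C stack
        x += 1
        yield x

    g = task()         # g is the "blob of memory" holding the paused state
    next(g)            # run to the first yield
    print(next(g))     # resume later: prints 2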

In particular, for call chains like: Lua -> C -> Lua -> C (or so), it is normally hard to save/restore the "C" parts of that chain, since you need to peek into the not-normally-accessible and platform-specific C runtime stack details. I wonder: how are you doing it in your system?

In Piccolo, I imagine it would be easier, since even the "-> C ->" portions of the stack are being managed by Piccolo, not the base C runtime. I don't know what the APIs to access that stuff actually look like, though.

Aside: have you looked at Pluto/Eris for Lua serialization of coroutines?

----

EDIT: Yes, it seems like the "Big Lie" section is right up your alley (:


Well, the only thing that's really itching me is the fact that the whole Lua debug.* section is documented as

>You should exert care when using this library. Several of its functions violate basic assumptions about Lua code (e.g., that variables local to a function cannot be accessed from outside; that userdata metatables cannot be changed by Lua code; that Lua programs do not crash) and therefore can compromise otherwise secure code (from the manual)

which kinda creeps me out.


If you find a debugger that can't ruin a function's local variables then that debugger sucks.


Obviously, but I'm not so keen on turning it into a pre-emptive scheduler either, lol.


Reading beyond the first paragraphs, I see this is practically what he advocates doing :P


She, not he. kyren is a woman.


Terribly sorry!


Sane error handling in Go is more productive any day.


Calling `if` boilerplate sane is an oxymoron.


TBH, only like 50% of my `if err != nil { return err }` checks are mindless. The rest of the time, the fact that error handling is explicit and in my face has helped me a lot during my time with Go.


The day the Go codebase throws random panics is the day I quit the company.


So you quit the day encoding/json was written?


You wrap those properly. Believe me, waking up in the middle of the night because someone abused panic to throw state around, and it ended up being called from a new goroutine, just enough to crash the entire pod, sucks even more.


You are probably thinking about (proto)reflect.


No, I am thinking of encoding/json. It uses Go's panic/recover mechanism to pass errors around internally, much like the code above.


Forgive me as I'm not experienced in Go. I had a look at the API reference for encoding/json[1] and performed a (very hasty) search on the source code[2].

The API reference doesn't state that panics are a part of the expected error interface, and the source code references seem to be using panics as a way to validate invariants. Is that what you're referring to?

I'm not entirely sure if the panics are _just_ for sanity, or if input truly can cause a panic. If it's the latter, then I agree - yikes.

[1] - https://pkg.go.dev/encoding/json

[2] - https://cs.opensource.google/search?q=panic&sq=&ss=go%2Fgo:s...


The use doesn't cross package boundaries, but you will find it within the package itself, which is part of the Go codebase spoken of earlier.


And he didn't even get into how Mac docking stations fucking suck and are full of graphics bugs (everybody in my company actually turned off Chrome GPU acceleration, because the display sometimes went crazy while rendering).


My company was too scared to roll their own X.509 signing code, so here we are.


More than seeing it land in main, I'm happy about the "Python threads are slow" meme officially going away now.


I doubt that Python will ditch the meme. The fundamental model of dynamic dispatch using dictionaries on top of a bytecode interpreter is pretty slow. I wouldn't expect it to get within 2x of JavaScript.
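
For anyone wondering what "dispatch using dictionaries" means in practice, here's a toy illustration (plain CPython semantics):

    class Dog:
        def speak(self):
            return "woof"

    d = Dog()
    # every d.speak() is a chain of runtime dictionary lookups...
    print(Dog.__dict__["speak"](d))   # equivalent to d.speak()
    # ...and those dictionaries can be mutated at any moment, which is
    # what defeats naive ahead-of-time optimization:
    Dog.speak = lambda self: "meow"
    print(d.speak())                  # meow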


JavaScript may not have an official bytecode, but is it not also based on the same concept of using dictionaries to dispatch code, and slow as a result? I certainly had always filed it away as "about as fast as Python" in my head. Why else would it rely on evented I/O?


You are correct, but they have (1) all of the money in the world as the fundamental programming language of the internet, and as a result (2) a state-of-the-art tiered JIT for dynamic languages. The blood of countless PhD students flows through V8. I don't know if Python will get the same treatment.


OK, but given enough cores, even Python code will run into memory bandwidth problems rather than be bottlenecked by memory latency.


I still hear the "Java slow" meme from time to time... Memes are slow to die, sadly. Some people just won't catch on to the fact that Java has had just-in-time compilation for like 15 years now (it was one of the first major platforms to get it), has had a fully concurrent garbage collector for a number of releases (ZGC since Java 11), and can be slimmed down a lot (jlink).

I work on low-latency stuff, and we routinely get server-side latencies on the order of single to low double-digit microseconds.

If Python ever becomes fully concurrent (Python threads being free of any kind of GIL), we'll still see the "Python slow" meme for a number of years... It also doesn't help that Python gets updated very, very slowly in industry (although things are getting better).


I think java being slow has less to do with the implementation (which is pretty good) and more to do with the culture of overengineering (including in the standard library). Everything creates objects (which the JIT cannot fully eliminate, escape analysis is not magic), cache usage is abysmal. Framework writers do their best to defeat the compiler by abusing reflection. And all these abstractions are far from zero cost, which is why even the JDK has to have hardcoded special cases for Streams of primitives and ByteBuffers.

Of course, if you have a simple fast path you can make it fast in any language with a JIT, and latency is also generally not an issue anymore. Credit where credit is due: Java GCs are light years ahead of everything else.

Regarding jlink: my main complaint is that everything requires java.base, which is already 175 MB. And that's not counting the VM, etc. But I don't actively work with Java anymore, so please correct me if there is a way to get smaller images.


I feel Java deserves better. When Python finally gets true thread concurrency, JIT compilation (Numba and the like), comprehensive static analysis (type hints), some sophisticated GC, and better performance, people will realise Java has had them all this time.


GraalVM is a pretty magical tool


Well, technically it still won't be able to use the full power of threads in many situations because (I assume) it doesn't have shared memory. It'll presumably be like Web Workers / isolates, so Go, C++, Rust, Zig, etc. will still have a fundamental advantage for most applications even ignoring Python's inherent slowness.

Probably the right design though.


Why would you think it's not shared memory? Maybe I'm wrong here, but by default Python's existing threading implementation uses shared memory.

AFAIK we're just talking about removing the global interpreter lock. I'm pretty sure the threading library uses system threads. So running without the GIL means actual parallelism across system threads with shared memory access.
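
For what it's worth, here's a minimal sketch of what shared memory already means for CPython threads (the lock is my addition, to keep the increment safe with or without a GIL):

    import threading

    counter = 0
    lock = threading.Lock()

    def work(n):
        global counter
        for _ in range(n):
            with lock:
                counter += 1

    threads = [threading.Thread(target=work, args=(100_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)   # 400000: all four OS threads mutated the same heap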


Yeah I think you're right actually. Seems like they do per-object locking instead.


I wish I had your optimism. Thoughtless bandwagon-y "criticism" is extraordinarily persistent.


There's no need to pretend Python has virtues which it lacks. It's not a fast language. It's fast enough for many purposes, sure, but it isn't fast, and this work is unlikely to change that. Faster, sure, and that's great.


Although true, it doesn't mean they can't improve its performance.

Working with threads is a pain in Python. If you want to spawn 10-20+ threads in a process, it can quickly become way slower than running a single thread.

Removing the GIL and refactoring some of the core will unlock levels of concurrency that are currently not feasible with Python. And that's a great deal, in my opinion. Well worth the trouble they're going through.
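
If you want to see the problem directly, here's a minimal CPU-bound benchmark (timings will vary by machine and interpreter build; on a GIL build the threaded version is typically no faster, and often slower, due to lock contention):

    import threading
    import time

    def burn(n):
        while n:
            n -= 1

    N = 10_000_000

    t0 = time.perf_counter()
    burn(N)
    print("1 thread :", time.perf_counter() - t0)

    t0 = time.perf_counter()
    workers = [threading.Thread(target=burn, args=(N // 4,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print("4 threads:", time.perf_counter() - t0)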


Working with threads is a pain regardless of which language you use.

Some might say: "Use Go!" Alas: https://songlh.github.io/paper/go-study.pdf

After a couple of decades of coding, I can say that threading works best when it's tightly controlled, limited to the tight parallelism of an algorithm.

Where it doesn't work is in a generic worker pool where you need to put mutex locks around everything -- and then prod randomly deadlocks in ways the developer boxes can't recreate.


> After a couple of decades of coding, I can say that threading works best when it's tightly controlled, limited to the tight parallelism of an algorithm.

This may be a case of violent agreement, but there are a few clear cases where multithreading is easily viable. The best case is some sort of parallel-for construct, even if you include parallel reductions, although there may need to be some smarts around how to do the reduction (e.g., different methods for reduce-within-thread versus reduce-across-threads). You can extend this to heterogeneous parallel computations: a general, structured fork-join form of concurrency. But in both cases, you essentially have to forbid inter-thread communication between the fork and join points. There's another case you might be able to make work, where you have a thread act as an internal server that runs all requests to completion before attempting to take on more work.
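
In Python terms, that parallel-for-plus-reduction shape might look roughly like this (a standard-library sketch; under the GIL it only pays off when the per-chunk work releases the GIL, e.g. native code or IO, while a free-threaded build could run even plain Python chunks in parallel):

    from concurrent.futures import ThreadPoolExecutor

    def chunk_sum(chunk):
        # reduce-within-thread: no communication until the join
        return sum(x * x for x in chunk)

    data = list(range(1_000_000))
    chunks = [data[i::8] for i in range(8)]

    with ThreadPoolExecutor(max_workers=8) as pool:
        partials = pool.map(chunk_sum, chunks)   # fork
        total = sum(partials)                    # join + reduce-across-threads
    print(total)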

What the paper you link to is pointing out, in short, is that message passing doesn't necessarily free you from the burden of shared-mutable-state-is-bad concurrency. The underlying problem is largely that communication between different threads (or even tasks within a thread) can only safely occur at a limited number of safe slots, and any communication outside of that is risky, be it an atomic RMW access, a mutex lock, or waiting on a message in a channel.


> Working with threads is a pain regardless of which language you use.

That's not true at all. F#, Elixir, Erlang, LabVIEW, and several other languages make it very easy. Python makes it incredibly tough.


> Python makes it incredibly tough.

I disagree, Python makes it incredibly easy to work with threads in many different ways. It just doesn't make threads faster.


In what way? Threading, asyncio, tasks, event loops, multiprocessing, etc. are all complicated and interact poorly, if at all. In other languages, these are effectively the same thing, lighter weight, and actually use multiple cores.

If I launch 50 threads with runaway while loops in Python, it takes minutes to launch and barely works after. I can run hundreds of thousands, even millions, of runaway processes in Elixir/Erlang; they launch very fast and keep chugging along just fine.


> If I launch 50 threads with runaway while loops in Python, it takes minutes to launch and barely works after. I can run hundreds of thousands, even millions, of runaway processes in Elixir/Erlang; they launch very fast and keep chugging along just fine.

I'm not sure that argument helps your position on threading. I once saw a Java program spin off 3000 threads doing god knows what. Debugging the fucking thing was impossible.


The point there is that processes in Elixir and Erlang are effectively like functions, in that you do not need to "manage" them in any sort of way. They are automatically distributed across all cores, pre-emptively scheduled, killable, have a built-in inbox, etc. One doesn't need to worry about what concurrency library to use nor manually create mailboxes using queues or whatever else. It just works, and you fire them off to do whatever you need. So there is no ceremony. Threads in many other languages and in Python in particular, require a huge amount of ceremony and management.


> require a huge amount of ceremony and management

I think Java made it quite easy to spin off threads, and again, it doesn't help the argument. It just made the f'ing thing worse. Race conditions are still f'ing hard to solve, particularly when shared mutable state exists outside of the program.


The whole purpose of threads is to improve overall speed of execution. Unless you're working with a very small number of threads (single digits), that's a very hard goal to achieve in Python. I wouldn't count this as easy to use. It's easy to program, yes, but not easy to get working with reasonably acceptable performance.


And the Python people would just point to multiprocessing... which works pretty well.
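
For example, a minimal multiprocessing version of a CPU-bound job (each worker is a separate interpreter with its own GIL, at the cost of pickling arguments and results across process boundaries):

    from multiprocessing import Pool

    def burn(n):
        while n:
            n -= 1
        return n

    if __name__ == "__main__":
        # four interpreters, four GILs: this genuinely runs in parallel
        with Pool(processes=4) as pool:
            pool.map(burn, [2_500_000] * 4)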


Which has its own set of challenges and yet another implementation of queue.


Yes, but the shared-mutable-state issue goes away.


It's not such a big pain in every language. And certainly not as hard to get working with acceptable performance in many languages.

Even if you have zero shared resources, zero mutexes, no communication whatsoever between threads, it's a huge pain in Python if you need 10-ish or more threads going. And many times the GIL is the bottleneck.


This is where Python's GIL bit me: I was more than familiar with how to shoot myself in the foot using threads in other languages, and careful to avoid those traps. Threads spun up only in situations where they had their own work to do and well-defined conditions for how both failure and success would be reported back to the thread that requested it, along with a pool that wouldn't exceed available resources.

Like every other language I've used this approach with, nothing bad happened - the program ran as expected and produced correct results. Unlike every other language, spreading calculations across multiple cores didn't appreciably improve performance. In some cases, it got slower.

Eventually scrapped it all, and went with an approach closer to what I'd have done with C and fork() decades ago... Which, to Python's credit, was fairly painless and worked well. But it caught me off-guard, because with asyncio for IO-bound stuff, it didn't seem like threads really have much of a purpose in Python, other than to be a tripwire for unwary and overconfident folks like myself!


Not disagreeing. The only case for threading in Python is spinning something up to handle IO.

But now with async even that goes away.
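
E.g., a minimal asyncio sketch of the IO fan-out people used to reach for threads for (asyncio.sleep stands in for a blocking socket read):

    import asyncio

    async def fake_io(name, delay):
        await asyncio.sleep(delay)   # pretend this is a network round-trip
        return f"{name} done"

    async def main():
        # hundreds of concurrent waits on one OS thread
        results = await asyncio.gather(*(fake_io(f"req{i}", 0.1) for i in range(100)))
        print(len(results), "finished in ~0.1s wall time, not 10s")

    asyncio.run(main())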


Concurrency with rayon in Rust isn't painful, I'd say. It's basically hidden away from the user.


> If you want to spawn 10-20+ threads in a process, it can quickly become way slower than running a single thread.

As you know, that's mostly threads in general. Any optimisation has a drawback, so you need to choose wisely.

I once made a horror of a thing that synced S3 with another S3-like, but not quite, object store. I needed to move millions of files, but on the S3-like store every metadata operation took 3 seconds.

So I started with async (pro tip: it's never a good idea to use async; it's basically gotos with two dimensions of surprise: 1. when the function returns, 2. when you get an exception). I then moved to threads, which got a tiny bit of extra performance but much easier debuggability. Then I moved to multiprocess pools of threads (fuck yeah, super fast), but then I started hitting network IO limits.

So then I busted out to an Airflow-like system with operators spawning 10 processes with 500 threads.

It wasn't very memory-efficient, but it moved many thousands of files a second.
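
Roughly this shape, if anyone wants the pattern (a sketch; the names, the counts, and the stubbed-out S3 calls are illustrative, not the original code):

    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def copy_object(key):
        ...  # placeholder for the two S3 clients and the ~3s metadata calls

    def worker_process(keys):
        # inside each process, fan out over a big thread pool; the threads
        # spend their time blocked on the network, so the GIL barely matters
        with ThreadPoolExecutor(max_workers=500) as threads:
            list(threads.map(copy_object, keys))

    def sync(all_keys, n_procs=10):
        chunks = [all_keys[i::n_procs] for i in range(n_procs)]
        with ProcessPoolExecutor(max_workers=n_procs) as procs:
            list(procs.map(worker_process, chunks))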


This is entirely fair, and I wish I'd been a little less grumpy in my initial reply (I assign some blame to just getting over an illness). Thank you for the gentle correction!

That said - I think it's fair to be irritated by people who write Python off as entirely useless because it is not _the fastest_ language. As you rightly say - it's fast enough for many purposes. It does bother me to see Python immediately counted out of discussions because of its speed when the app in question is extremely insensitive to speed.


It’s all about values.

I have been on teams where Python-based approaches were discounted due to “speed” and “industry best practice”, and then had the very same engineers create programs that are slow by design in a “fast” language and introduce needless complexity (and bugs) through “faster” database processes.

Like you said, it’s the thoughtless criticism. The meme. I am happy for Python to lose in a design analysis because it’s too slow for what we are building; I am loath to let it lose because whoever is doing the analysis with me has heard it’s slow.

Which is to say, I get what you’re saying. I think people have been a little ungenerous with your comment.


> I think people have been a little ungenerous with your comment.

Eh - I engaged with a fraught topic in a snarky way without clarifying that I meant the unintuitive-but-technically-literally-accurate interpretation of my words. Maybe some people have been less generous than they could have been, but I don't begrudge it - if I look sufficiently like a troll, I won't complain when I get treated like one. Not everyone has the time and mental fortitude to treat everyone online with infinite patience and kindness - I know I sure don't.

Thank you for the support, though!


In some ways the weakness was even a virtue. Because Python threads are slow, Python has incredible toolsets for multiprocess communication, task queues, job systems, etc.


"Faster, sure" seems unnecessarily dismissive. That's the whole point of all this work.


Maybe it'll shut up "architects" who hack up a toy example in <new fast language hotness>, drop it on a team to add all the actual features, tests, deployment strategy, and maintain, and fly away to swoop and poop on someone else. Gee thanks for your insight; this API serves maybe 1 request a second, tops. Glad we optimized for SPEEEEEED of service over speed of development.


You seem to be implying that there is something inherently slow about Python. What?

This topic is an example: it's a detail of one particular implementation, since the GIL is definitely not inherent to the language. Is it just the usual worry about looseness of types?


There are worse hills to die on than this. But the Python ecosystem is very slow. It's a cultural thing.

The biggest impact would be completely redoing package discovery. Not in some straightforward sense of "what if PyPI showed you a performance measurement?" No, that's symptomatic of the same problem: harebrained and simplistic stuff for the masses.

But who's going to get rid of PyPI? Conda tried, and it sucks; it doesn't change anything fundamental, and they're too small and poor to matter.

Meta should run its own package index and focus on setuptools. This is a decision PyTorch, maybe the most exciting package in Python today, has already taken, and for all the headaches that decision causes, look: torch "won." It is high-performance Python with a vibrant high-performance ecosystem.

These same problems exist in npm too. It isn't an engineering or language problem. Poetry and Conda are not solutions; they're symptoms. There are already too many ideas. The ecosystem already has too much manic energy spread way too thinly.

Golang has "fixed" this problem as well as it could for non-commercial communities.


The "Python ecosystem" includes packages like numpy, pytorch & derivatives which are responsible for a large chunk of HPC and research computing nowadays.

Or did you mean to say the "Python language"?


> The "Python ecosystem" includes packages like numpy, pytorch & derivatives which are responsible for a large chunk of HPC and research computing nowadays.

The "& derivatives" part is the problem! Torch does not have derivatives. It won. You just use it and its extensions, and you're done. That is what people use to do exciting stuff in Python.

It's the manic developers writing manic derivatives that make the Python ecosystem shitty. I mean, I hate ragging on those guys, because they're really nice people who care a lot about X, but if only they could focus all their energy on working together! Python has like 20 ideas for accelerated computing. They all abruptly stopped mattering because of Torch. If the Numba and NumPy and scikit-learn and Polars and pandas and... all those people would focus on working on one package together, instead of reinventing the same thing over and over again (high-level cross-compilers or an HPC DSL or whatever), the ecosystem would be so much nicer and performance would be better.

This idea that it's a million little ideas incubating and flourishing is cheerful and aesthetically pleasing, but it isn't the truth. CUDA has been around for a long time, and it was obviously the fastest per-dollar and per-watt HPC approach throughout its whole lifetime, so most of those little flourishing ideas were DOA. They should have all focused on Torch from the beginning instead of getting caught up in little manic compiler projects. We have enough compilers and languages and DSLs. I don't want another DataFrame DSL!

I see this in new, influential Python projects made even now, in 2024. Library authors are always, constantly, reinventing the wheel, because development is driven by one person's manic energy more than anything else. Just go on GitHub and look at how many packages are written by one person. GitHub, Git, and PyPI are just not adequate ways to coordinate the energies of these manic developers on a single valuable task. They don't merge PRs, they stake out pleasing names on PyPI, and they complain relentlessly about other people's stuff. It's NIH syndrome at the 1M+ repository scale.


Yeah, like xkcd 927 to the nth degree.


CPython is slow. That's not really something you can dispute.

It is a non-optimizing bytecode interpreter and it makes no use of JIT compilation.
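
You can inspect the bytecode it interprets with the standard library's dis module (the exact opcodes vary by CPython version):

    import dis

    def add(a, b):
        return a + b

    dis.dis(add)
    # prints something like:
    #   LOAD_FAST     a
    #   LOAD_FAST     b
    #   BINARY_OP     + (add)
    #   RETURN_VALUE

Every one of those opcodes is dispatched and type-checked at runtime; that per-instruction overhead is what a JIT would eliminate.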

JavaScript with V8 or any other modern JIT JS engine runs circles around it.

Go, Java, and C# are an order of magnitude faster, but they have type systems that make optimizing compilation much easier.

There's no language-inherent reason why Python can't be at least as fast as JavaScript.


I've read that it can't even be as fast as JS, because everything is monkey-patchable at runtime. Maybe they can optimize for the case when that doesn't happen, but it remains to be seen.


I've heard similar claims, but I don't think they're true.

JavaScript is just as monkey-patchable. You can reassign class methods at runtime. You can even reassign an object's prototype.

Existing Python JIT runtimes and compilers are already pretty fast.


Python is probably much more monkey-patchable. Almost any monkey patching that JavaScript supports also works in Python (e.g., modifying a class prototype = assigning class methods), but there are a few things that only Python can do: accessing local variables as a dict, accessing other stack frames, modifying function bytecode, reading/writing closure variables, and patching builtins to change how the language works (__import__, __build_class__). Many of these can make a language hard to optimize.
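
Two of those, concretely (a small demo of CPython's introspection hooks; note that sys._getframe is a CPython implementation detail):

    import sys

    def outer():
        secret = 41
        def inner():
            return secret
        return inner

    f = outer()
    # read a closure variable from outside the function
    # (writable via cell_contents too, since Python 3.7)
    print(f.__closure__[0].cell_contents)    # 41

    def callee():
        # walk up the stack and read the caller's locals as a dict
        print(sys._getframe(1).f_locals["x"])

    def caller():
        x = "local to caller"
        callee()

    caller()   # prints: local to caller

An optimizer has to assume any call might do this sort of thing, which forces it to keep locals and frames materialized.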


You can always use optimistic optimization strategies where you profile the fast path and optimize that. When someone does something slow, you tell them to stop doing it if they want better performance.


JavaScript doesn't have to contend with a plethora of native extensions (which, to be fair, are generally a workaround for Python slowness).


JavaScript, at least on the Node.js side, makes plenty of use of native extensions written in C++: https://nodejs.org/api/addons.html

In any case, that should be irrelevant to getting a reasonably performant JIT running. Lots of AOT and JIT compiled languages have robust FFI functionality.

The native extensions are more relevant when we talk about removing the GIL, since lots of Python code may call into non thread safe C extension code.


Python is inherently slow. That’s why people tend to rewrite bits that need high performance in C/C++. Removing the GIL is a massively welcome change, but it isn’t going to make C extensions go away.


It isn't thoughtless. I'm working in Python after having come from more deliberately designed languages, and concurrency in Python is an absolute nightmare. It feels like using a language from the '60s. An effectively single-threaded language in 2024! That's really astonishing.


If your criticism isn't thoughtless, then that's not what I'm complaining about. Specifically, I'm annoyed about people who _just_ say "Python isn't fast enough, therefore it's not suitable to our use-case", when their use-case doesn't require significant speed or concurrency. If you thoughtfully discount Python as being unsuitable for a use-case that it's _actually_ unsuitable for, then good luck to you!


Python has too often been just a *bit* too slow for my use cases; the ability to throw a few cores at problems more easily is not going to eliminate this criticism from me, but it's sure going to diminish it by a large factor.


Most software doesn't need multithreading. Most of the time, people cry about Python's performance and then write trivial shit programs that take milliseconds to run in Python as well.


Nearly every time I've interacted with Python, its execution speed has absolutely been an issue.


Please do give an example.

What I see is people crying about how Python is slow and then using a "proper fast" programming language to write code that gets executed so few times that even if Python were 100x slower it wouldn't matter, or the program is so trivial that Python's speed definitely isn't an issue.

I have even sometimes seen people stop using a tool when they find out it was written in Python: now, all of a sudden, it is unusably slow. Then they try to justify it by writing some loop in their favourite "proper fast" language and telling me how fast that tight loop is, or they claim that some function is X times faster, but when I actually compile it and run something like hyperfine on it and on the Python version, the difference is hardly ever X, since there is already so much other overhead in the real world.


Python being slow, and working to speed Python programs up, helped me immensely in building a mental model of what makes programs slow. After learning C in school, when I first learned how Python was implemented, I was shocked that it was even usable.

