What do you mean by non-English text? I don't think "Ä" will be more efficient in utf16 than in utf8. Or do you mean utf16 wins in cases of non-Latin scripts with variable width? I always had the impression that utf8 wins on the vast majority of symbols, and that for very complex variable-width character sets it depends on whether utf16 can accommodate them. On a tangent, I wonder if emojis would fit that bill too.
I am not sure if you mean me, as I just asked a question. I wonder what the best way is to handle this disparity for international software. It seems like either you punish the Latin alphabets, or the others.
> I wonder what the best way is to handle this disparity for international software. It seems like either you punish the Latin alphabets, or the others.
there are over a million codepoints in unicode, of which only a few thousand cover latin; the rest are other scripts, language-agnostic symbols, emoji, etc. utf-8 is designed to be backwards compatible with ascii, not to efficiently encode all of unicode. utf-16 is a reasonably efficient compromise for native unicode applications, hence it being the internal string format in C# and SQL Server and such.
the folks bleating about utf-8 being the best choice make the same mistake as the "utf-8 everywhere manifesto" guys: stats skewed by a web- and American-centric bias. sure, utf-8 is more efficient when your text is 99% markup and generally devoid of non-latin scripts, but that's not my database and probably not most people's
> sure utf-8 is more efficient when your text is 99% markup and generally devoid of non-latin scripts, that's not my database and probably not most peoples
I think this website's audience begs to differ. But if you develop for South Asia, I can see the pendulum swinging to utf-16. Even then you have to account for this:
«UTF-16 is often claimed to be more space-efficient than UTF-8 for East Asian languages, since it uses two bytes for characters that take 3 bytes in UTF-8. Since real text contains many spaces, numbers, punctuation, markup (for e.g. web pages), and control characters, which take only one byte in UTF-8, this is only true for artificially constructed dense blocks of text. A more serious claim can be made for Devanagari and Bengali, which use multi-letter words and all the letters take 3 bytes in UTF-8 and only 2 in UTF-16.»¹
In the same vein, with reference to³:
«The code points U+0800–U+FFFF take 3 bytes in UTF-8 but only 2 in UTF-16. This led to the idea that text in Chinese and other languages would take more space in UTF-8. However, text is only larger if there are more of these code points than 1-byte ASCII code points, and this rarely happens in real-world documents due to spaces, newlines, digits, punctuation, English words, and markup.»²
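The byte-count claims quoted above are easy to check directly. A few lines of Python (the sample strings are my own, chosen for illustration) compare encoded sizes:

```python
# Compare UTF-8 vs UTF-16 byte counts for a few sample strings.
# utf-16-le is used to measure the raw payload without a BOM.
samples = {
    "ascii":   "Hello, world! 1234",
    "german":  "Äpfel und Öl",
    "chinese": "你好，世界",
    "mixed":   '<p class="intro">你好, user #42</p>',
}

for name, text in samples.items():
    u8 = len(text.encode("utf-8"))
    u16 = len(text.encode("utf-16-le"))
    print(f"{name:8s} utf-8={u8:3d} bytes  utf-16={u16:3d} bytes")
```

Pure CJK text is indeed smaller in UTF-16 (10 vs 15 bytes for the Chinese sample), while anything dominated by ASCII markup, digits, and spaces flips the result the other way.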
The .net ecosystem isn't happy with utf-16 being the default, but it is there in .net and Windows for historical reasons.
«Microsoft has stated that "UTF-16 [..] is a unique burden that Windows places on code that targets multiple platforms"»¹
> The goal is to transparently replace dedicated developer workstation
Isn't there a less convoluted way of making the best engineers leave? I am half serious here. If you want your software to run slow, IT could equally well install corporate security software on developer laptops. Oops, I did it again. Oh well, in all seriousness, I have never seen any performance problem being solved by running it on Azure's virtualization. I am afraid you are replacing the hardware layer with a software layer of ungodly complexity, which you can be sure will be functionally incomplete.
Are you sure they don't have to fix the build pipeline first? Tens of thousands of vCPUs for a single compilation run, or to accommodate 100 developers who try to compile their own changes?
> I have never seen any performance problem being solved by running it on Azure's virtualization
Sorry, I wasn't clear. I am not virtualizing the workspace. I'm using `recc` which is like `distcc` or `ccache` in that it wraps the compiler job. Every developer keeps their workstation. It just routes the actual `clang` or `gcc` calls to a Kubernetes cluster which provides distributed build and cache.
> Isn't there a less convoluted way of making the best engineers leave?
We have 7000+ compiler jobs in a clean build because it is a big codebase. People are waiting hours for CI.
I'm sure that drives attrition and bringing that down to minutes will help retain talent.
> Tens of thousands of vCPUs for a single compilation run, or to accommodate 100 developers who try to compile their own changes?
Because it uses remote execution, it will ideally do both. My belief is that an individual developer launching 6000 compiler jobs because they changed a header will smooth out over 300 developers who generally do incremental builds. Likewise, this'll eliminate redundant recompilation when git pulling, since this also serves as a cache.
Thanks for expanding on it, now it's more clear what you want to achieve. If I see things like this, it seems Linus was on to something in banning C++. That sounds like a nasty compilation scheme, but I guess the org has painted itself too deep into that corner to get out of it.
This makes absolutely no sense to me. Are you really recompiling 6000 things each time a dev in the company needs to add a line somewhere in the codebase?
Have you thought about splitting that giant thing in smaller chunks?
> Are you really recompiling 6000 things each time a dev in the company needs to add a line somewhere in the codebase?
It happens when someone modifies a widely included header file, of which there are a lot thanks to our use of templates. And this is just our small team of 300 people.
> Have you thought about splitting that giant thing in smaller chunks?
Yes. We've tried, but it's not scaling. Unfortunately, we've banned tactics like pImpl and dynamic linking that would split up the codebase, unless they're profiled not to be on a hot path. Speed is important because I'm writing tests for a semiconductor fab, and test time is more expensive there than in any other kind of factory on Earth.
I tried stuff like precompiled headers, but the fact that only one precompiled header can be used per compilation job meant it didn't scale to our codebase.
Thanks for the detailed breakdown. The template header cascade problem makes total sense, I underestimated how bad it gets at scale with heavy template usage.
The semiconductor fab constraint is interesting too. When test time costs that much per minute, banning pImpl on hot paths is a pretty obvious call, even if it makes compile times painful.
Appreciate the real-world context.
Don't do the AC thing; it is a stupid trope among blogfluencers. There are no restrictions (besides positioning the outer unit in such a way that it causes your neighbors to lose sleep). As summers get more extreme in Europe, more residents decide that getting one is starting to pay off, so you see more ACs, but many people think they are doing fine without.
Yeah, never heard of such a thing. The restrictions concern placing the units in common areas of the building; in that case you need permission, and external walls are usually common parts. Placing them on the façade may carry additional restrictions.
But, if anything, energy efficiency standards for new construction are so strict that heat is becoming less of a problem.
> Why do 90% of Americans have AC while only 20% of Europeans do?
Maybe because the majority of Europe is closer to Canada, latitude-wise, than to Phoenix, AZ, and there is simply less demand? Less wealth is certainly a factor, too, especially considering how the warmest nations in Europe all tend to be weaker economically.
> Why does US have ~4 heat related deaths per million while Europe has ~235 per million?
Maybe it's just the higher life expectancy increasing susceptibility? Everyone has to die of something at some point.
It is a statistic, 'treacherous' is a word often lurking around the corner.
I am sorry to tell you, but no healthy person suddenly dies from heat. If that were the case, everyone would be as panicked as you are. Europe has comparatively older demographics, and heat risk mainly affects infants and the elderly.
Most EU countries have free health care, so even people who don't take care of themselves will have a comparatively higher chance of surviving into old age. But those who didn't die from a bad lifestyle are also part of this demographic. Like I said, treacherous, because you should look at this demographic and start asking how many hours of life expectancy are lost. Healthcare workers keep finding that elderly people simply don't drink enough during these warm days.
I guess that if you want to win back those hours, you have to convince the elderly to install AC or get them to drink enough on hot days. At that age people have a certain inflexibility of mind, complicated by the fact that heat waves these days really are more severe than in their lived past.
Let me assure you: if people think it is too hot for them at home and they don't see an alternative, they will install AC. It is affordable enough.
But there might be a cultural difference: people don't think of AC as the first line of defense against hot days. Environmental awareness is higher, and ACs contribute to global warming. Anecdotally, looking around I see a preference for sun protection over AC.
Most of Europe simply doesn't need AC. Spain, the south of Italy, the south of France, and parts of the Balkans do. But in countries like the UK, the Nordics, Germany, etc., you'd need something more than "open windows" for only a few days a year, if that. The people who live in the places that need AC usually have AC. It's actually pretty damn simple.
Could be a lot of reasons.
Older European cities with high-density stone buildings and less green space often trap heat more effectively than typical U.S. suburban layouts.
Europe has a larger proportion of elderly residents (aged 80+), who are the most susceptible to heat stress.
You just picked a data point and are trying to fit your narrative on top of it without really considering all possible aspects.
> Older European cities with high-density stone buildings and less green space often trap heat more effectively than typical U.S. suburban layouts.
Doesn't that mean that they would need AC, then? At least for those specific buildings.
However, as a European living in Paris, one of the densest cities in the world, I only feel the need for AC maybe 2-3 weeks a year. I think the issue is that most people dying of heat are already very old and much more sensitive to it.
But if you live in any kind of shared building, you can't just go and set up a split unit. If it is outside the building, you need permits, both from the architects, so that you don't deface your ugly concrete building, and from your fellow residents, who usually vote "no" by default.
You make a lot of great points. You know what would be great for helping those elderly residents prone to heat strokes living in high-density stone buildings with less green space? Air conditioners! In fact, I think the EU should mandate air conditioners in every home.
I would indeed be cautious about attributing economic downturn to holiday spending, but I don't think Las Vegas can breathe freely now. It could be a canary in the coal mine. Some might say, the death of a canary is a rounding error. Others might say: what else is at risk?
St Patrick's Day folds into March Madness folds into the NBA/NHL playoffs folds into Memorial Day folds into Independence Day; Vegas is about to get slammed.
Those people weren't coming to the USA anyway, for starters; NYC has been crazily expensive for years.
There are many reasons people might have, none are good. There is for instance also a risk factor of being harassed and detained by ICE. Cruelty and incompetence are a feature of authoritarian governance, not a coincidence. So anyone going there takes a kind of risk. As has been shown, even Europeans aren't safe from the whimsical paramilitary.
EDIT: I don't think that tourism is a big factor, but as I said elsewhere, it could well be the proverbial canary in the coal mine.
I think you can safely omit 'maybe'. OOP is harder and requires more design experience to achieve good results than functional programming. I welcome you to look at OOP code from people who don't get the patterns.
OOP can be wonderful, but people who aren't able to step up a level in conceptual abstraction really should not touch it. Remember, for many years languages like Java didn't have any concept of lambdas or higher-order functions, so design patterns were essential for elegant solutions. As they say, a design pattern is a symptom of the language not being expressive enough. In other words, many design patterns in OOP languages express the same thing that first-class language features in the functional paradigm would, Visitor vs fold for instance.
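The Visitor-vs-fold correspondence is concrete enough to show in a few lines. A toy example of my own, evaluating a small expression tree both ways:

```python
from dataclasses import dataclass

# A tiny expression tree: literals and additions.
@dataclass
class Lit:
    value: int

@dataclass
class Add:
    left: object
    right: object

# Functional style: a fold takes one function per constructor.
def fold(expr, on_lit, on_add):
    if isinstance(expr, Lit):
        return on_lit(expr.value)
    return on_add(fold(expr.left, on_lit, on_add),
                  fold(expr.right, on_lit, on_add))

# OOP style: the Visitor pattern encodes the same dispatch as a class,
# one visit_* method per node type.
class EvalVisitor:
    def visit(self, expr):
        if isinstance(expr, Lit):
            return self.visit_lit(expr)
        return self.visit_add(expr)
    def visit_lit(self, lit):
        return lit.value
    def visit_add(self, add):
        return self.visit(add.left) + self.visit(add.right)

tree = Add(Lit(1), Add(Lit(2), Lit(3)))
print(fold(tree, lambda v: v, lambda a, b: a + b))  # 6
print(EvalVisitor().visit(tree))                    # 6
```

The fold needs two lambdas; the Visitor needs a whole class hierarchy to say the same thing, which is the point being made above.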
Add to that that there is, on the tactical level, a severe unpreparedness for the new battlefield. A recent exercise with the Ukrainians only underlined that once more, as in: their drone team completely obliterated the western teams. Luckily, it was just an exercise.
It used to be possible in openssh to use -c none and skip the overhead of encryption for the transport (while retaining the protection of rsa keys for authentication). Even the deprecated blowfish-cbc was often faster than aes-ni for bulk transfers. I remember cutting off hours of wait time in backup jobs using these options.
Sadly it appears those days are gone now. 3des is still supported, probably for some governmental environments, but it was always a slower algorithm. Unless there are undocumented hacks I think we're stuck with using proper crypto. Oh darn.
If blowfish was faster than AES, then certainly either the CPU did not support the AES instructions (AES-NI = AES New Instructions), or the ssh program, at either the client or the server, was compiled without AES instruction support.
Blowfish is many times slower than AES on any CPU with AES instructions. Old CPUs with AES support needed around 1 clock cycle per byte, but many modern CPUs need only around 0.5 clock cycles per byte, and some are even twice as fast as that.
0.5 clock cycles per byte means 10 GB/s = 80 Gb/s encryption throughput per core at 5 GHz, so only for 100 Gb/s or faster Ethernet might you need to distribute encryption across multiple cores to reach full network link throughput.
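For the skeptical, the arithmetic behind that claim checks out:

```python
# Sanity-check the throughput arithmetic: 0.5 cycles/byte on a 5 GHz core.
cycles_per_byte = 0.5
clock_hz = 5e9  # 5 GHz

bytes_per_second = clock_hz / cycles_per_byte  # bytes encrypted per second
bits_per_second = bytes_per_second * 8

print(f"{bytes_per_second / 1e9:.0f} GB/s = {bits_per_second / 1e9:.0f} Gb/s")
# 10 GB/s = 80 Gb/s
```

Whether a real system sustains 0.5 cycles/byte end to end is a separate question (see the benchmark discussion below in the thread), but the unit conversion itself is right.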
For full AES speed, one must not use obsolete modes of operation like CBC or obsolete authentication methods like HMAC. For maximum speed, one must use either aes128-gcm@openssh.com or aes128-ctr + umac-64@openssh.com.
For increased security at lower speed, one can use aes256-gcm@openssh.com or aes192-ctr or aes256-ctr with umac-128@openssh.com.
In general, one should never use the default configuration of ssh for cipher and MAC algorithms, but one should delete all obsolete algorithms and allow only the few without problems, unless one has to make connections to legacy systems.
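For illustration, pinning those algorithms in `~/.ssh/config` might look like this (a sketch with a hypothetical host name; the exact algorithm list available depends on your OpenSSH version and build options):

```
# ~/.ssh/config (hypothetical host entry)
Host fast-backup-host
    # fast AEAD cipher first, CTR as fallback
    Ciphers aes128-gcm@openssh.com,aes128-ctr
    MACs umac-64@openssh.com
```

Note that when a GCM cipher is negotiated, the MAC setting is ignored, since the cipher authenticates the data itself; the `MACs` line only matters for the CTR fallback.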
I don't remember which generation of Sparc CPU we were working with, but yes, Blowfish was faster. I made a matrix of relevant options and benchmarked all combinations.
Do you have citations for sustained 0.5 core-cycles/byte (80 Gb/s)? The benchmarks I have seen are closer to ~20-30 Gb/core-s though I have heard claims of 40-50 Gb/core-s.
It is a bottleneck for multiple files, but will it speed up a single file? This is how we sent files for decades: archive, transfer, unarchive. So I'm wondering what the point is.
It depends on the size of the file, of course. For copying your 90 line .bashrc, probably not noticeable in the noise. For copying an 800GB database? Um, yeah. :-)
I see this project's main value in turning loose the power of multiple cores on a filesystem full of directories, backed by flash-based storage that only runs optimally at queue depth >1 (which is most of it). On spinning rust this will probably just thrash the heads.
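The multi-core/queue-depth idea is simple to sketch. A toy version of my own (not the project's actual implementation): walk a tree and hand the copies to a thread pool, so the storage device always has several requests in flight:

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def parallel_copy(src_dir: Path, dst_dir: Path, workers: int = 8) -> int:
    """Copy every file under src_dir to dst_dir using a thread pool.

    Returns the number of files copied. With several workers, a flash
    device sees queue depth > 1 instead of one request at a time.
    """
    files = [p for p in src_dir.rglob("*") if p.is_file()]

    def copy_one(p: Path) -> None:
        target = dst_dir / p.relative_to(src_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(p, target)  # copy data + metadata

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(copy_one, files))  # list() propagates any errors
    return len(files)
```

Threads are enough here because the work is I/O-bound; the GIL is released during file reads and writes. On a single spinning disk the same code would just make the seek pattern worse, as noted above.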
Hmmm. I wonder how 2 or 3 threads perform with zfs and a reasonable sized ARC?
We definitely need more information. 9 or 10 years ago, under Solaris/Sparc with 1000BaseT connections, things were quite different than even the most boring and average Linux environment today.
This discussion should drive home the importance of not listening to conventional wisdom and presuming optimal performance without actually testing your options. And particularly, don't presume that a blog post (or an LLM that scraped it) knows what's right for your particular use case.