Linux 5.6 is the most exciting kernel in years (phoronix.com)



It's great to see all these new features entering the kernel. Here are a couple of other things I've seen in recent changelogs:

- The 5.4 kernel can use CIFS as a root filesystem, meaning CIFS can replace NFS for diskless boot.

- The 5.3 and 5.4 kernels both include prep for merging PREEMPT_RT into mainline. This will be a fantastic addition that benefits embedded Linux nerds like me.


For more info on the second point: https://en.wikipedia.org/wiki/Linux_kernel#Preemption seems to have something to do with real-time operating systems, but I didn't dive into it.

Edit: Thanks for the downvotes without any comment on what I should do differently next time.


Currently the Linux kernel disables preemption inside spinlocks and interrupt handlers, and those code sections are not necessarily written to be short and constant-time. So in theory you can end up in a situation where every core is running spinlock or interrupt-handler code on behalf of low-priority threads for an arbitrarily long time, preventing higher-priority realtime threads from running (which in practice means that whatever hardware you are controlling goes uncontrolled, and thus your self-driving car, rocket, plane or industrial machine might be destroyed).

The PREEMPT_RT patch solves this by reimplementing kernel spinlocks as fully preemptible mutexes with priority inheritance, and interrupt handlers as preemptible threads.

The result is that if you have a set of real-time processes/threads that do not make system calls, it's guaranteed that within a bounded amount of time the highest-priority ones will be running.

If you do make system calls, however (or even cause page faults because you didn't call mlockall() and then touched non-locked memory), then, with the exception of some system calls that provide better guarantees, you can still get kernel lock contention with other threads that may hold the same lock for an unbounded amount of time. So the system is more limited than dedicated real-time OSes that are careful about this.
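
To make the "no system calls, no page faults" part concrete, the usual userspace setup for a real-time task looks roughly like this (a minimal sketch, not from any particular project; the priority value is arbitrary, and the call needs CAP_SYS_NICE/root or a suitable rtprio rlimit to succeed):

  #include <sched.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>

  int main(void)
  {
      /* Lock all current and future pages so the task never page-faults. */
      if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
          perror("mlockall");
          return 1;
      }

      /* Ask for a fixed real-time priority under the SCHED_FIFO policy. */
      struct sched_param sp;
      memset(&sp, 0, sizeof(sp));
      sp.sched_priority = 80;               /* 1 (lowest) .. 99 (highest) */
      if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
          perror("sched_setscheduler");
          return 1;
      }

      /* ... time-critical loop goes here: pre-allocate everything up front
         and avoid system calls that can contend on kernel locks ... */
      return 0;
  }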


About PREEMPT_RT merge prep: are we talking about the patchset that leads to ultra-low-latency audio-friendly kernels tagged -rt, like https://aur.archlinux.org/packages/linux-rt/ or https://aur.archlinux.org/packages/linux-rt-bfq/ ? If so, I use these for pro audio (for live dsp / effects under 5ms), and have a few questions:

- Will these kernels become obsolete/unnecessary with the merge?

- What about BFQ, is it going to be merged at the same time too?

- This patchset has existed for a long time. What prevented an earlier merge?

- Is the merge already scheduled to happen for a specific kernel release, or "when it's ready"?


> Will these kernels become obsolete/unnecessary with the merge?

If it merges, sure. It's nice to see it may be getting close.

> What about BFQ, is it going to be merged at the same time too?

BFQ was merged a couple of years ago.

> This patchset has existed for a long time. What prevented an earlier merge?

It's been scary in the amount of things it touches and the amount of semantics it changes. It's a big change in mindset. Many drivers have been broken or simply unsupported at times on the preempt-rt branches.

More and more of the underlying infrastructure has made it in over the past couple years.

> Is the merge already scheduled to happen for a specific kernel release, or "when it's ready"?

Nope... the most we can say is it is "close".


Thanks.

> "BFQ merged a couple of years ago."

Okay, got it. My question came from confusion about what exactly the package https://aur.archlinux.org/packages/linux-rt-bfq/ was. Reading the wiki on GitHub now ( https://github.com/sirlucjan/bfq-mq-lucjan/wiki ), it's clear:

> Development version of BFQ

> The development version of BFQ [...] differs from the production version in that:

> - it contains commits not available for that kernel version;

> - it contains a lot of consistency checks to detect possible malfunctions.

So, linux-rt-bfq is linux-rt sprinkled with extra BFQ fixes that haven't yet been merged to mainline.


It aims to turn Linux into a real-time OS. The preempt-rt patchset aims to give faster response times, but the main focus is to remove all unbounded latencies - for example, situations where the delay can vary depending on other factors.


It's not strictly true that the rt-patchset is meant to give faster response times. It provides bounded response times. You get a guarantee that some code can run within the time frame that you specify. That's not necessarily faster.


I was at a meeting with people who were doing signal processing in "embedded" environments. They said the patch slowed them down a lot. They got better performance, and even more determinism, by pinning processes to cores using prior knowledge.


This would kill most use cases for QNX+POSIX.


You'll still need QNX if you want to run a full GUI Unix-like OS from a single floppy disk.


That hasn't been the case for several years now, has it? Perhaps since QNX v4. If not, you'll need a time machine as well.

http://toastytech.com/guis/qnxdemo.html

http://qnx.puslapiai.lt/qnxdemo/qnx_demo_disk.htm


Recent QNX has removed Photon and encourages the use of Qt instead, which, while it can be made small, certainly does not fit in 1.44 MB, especially in 64-bit.


Yeah, the usual size for a minimal Qt with either QWidgets or Qt Quick / QML is about 15 MB.


Do people still have this requirement in 2020?


Usually such use cases require a certain level of certification and security that Linux doesn't offer.


Having preemption throughout the whole kernel helps with everything low-latency, whether on desktop, mobile or server systems; it's not limited to the traditional real-time use case, although it obviously helps there as well to some extent.


It's essentially just applying these patches: https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt...

The above is for kernel 5.4, as the patchset for 5.5 doesn't exist at the time of this post.

After applying the patches, you'll want to run "make nconfig" or "make menuconfig", search for CONFIG_PREEMPT_RT, enable it, and you should be good to go. The only downside in my experience is that VirtualBox doesn't work with the -rt kernel as of right now, so I can't use it as my daily driver. It is my preferred kernel, though, for gaming and overall desktop use. I use a custom low-latency kernel when I'm not using the -rt one, and it's been working well for me.


[flagged]


I didn't (and can't, because you replied to me) downvote, but since the downvoters are not sharing their reason, I guess I'll share my thoughts on it: I think the "quite some people at HN just enjoy downvoting" line is potentially offensive as well as an overgeneralization, and they probably don't like either or both.

Aaaand downvotes for this as well, so I guess that means this is not it?


It's because discussing downvotes is against the HN Guidelines[1]:

> Please don't comment about the voting on comments. It never does any good, and it makes boring reading.

Folks downvote comments about downvoting because it's boring to read, and to discourage others from doing it.

[1]: https://news.ycombinator.com/newsguidelines.html


Well, you're now discussing "downvoting", and by that logic you're breaking the HN Guidelines too.


Note that your original comment is not downvoted anymore. That's the best reason not to complain about downvotes; usually it will be reversed soon enough if the comment has merit.

Early downvotes can happen through accident or randomness; all it takes is one or two people to misread your comment, or mistakenly tap the down arrow. Or they might have a valid reason for disagreeing and are in the process of writing a comment putting a different point of view.

But getting offended and defensive and editing your comment to complain about it only makes the original comment worse, and leads to an off-topic subthread like this.

Sure, edit the comment if you need to, but try to make it clearer and more informative; that's what I do if I get an early downvote and realise the comment could have been better-worded.


The downvotes were because you clearly didn't understand what preemption is (or that this is really about a realtime flag) and didn't add much to the conversation (mentioning anything at all that might add something would have helped, even a bad question: "I heard of people using Raspberry Pis for embedded, would this help?"). The downvotes on the other guy are because asking for feedback on downvotes is discouraged.

To contribute something: PREEMPT_RT has been around as a set of patches for a long time (if not this specific one, then at least some version of making Linux a realtime OS). If you need to know that something will run within a specified amount of time, realtime matters.

EDIT: Don't use this thinking it will help performance; generally it kills throughput.


You don't talk about fight club here. I don't know who the smartass was who decided that, but here we are. Stupid rule, but it's their place.

And adding "offensive" every other sentence [1] suggests (to me at least; for others maybe not so much) that you see us as crybabies who are easily offended. I almost downvoted you for positioning yourself as a purveyor of ethics with the implied assumption that everybody will agree with you.

But in general people around here indeed _are_ easily offended - see for example 'rzv' in this thread being downvoted for talking about this very problem. Depends of course on the thread and on the self-selected audience.

So - what can you learn from a downvote? For me it's: don't debate here; HN is not a platform suitable for debating ideas.

[1] not by _you_ - but in general


Random, feedbackless downvoting is increasingly being abused to censor seemingly random, perfectly acceptable comments. The fact that talking about it is against the site guidelines only makes the problem worse. HN needs to deal with this.

PG seems to understand that a community censorship mechanism needs safeguards. From http://www.paulgraham.com/hackernews.html :

> I think it's important that a site that kills submissions provide a way for users to see what got killed if they want to. That keeps editors honest, and just as importantly, makes users confident they'd know if the editors stopped being honest. HN users can do this by flipping a switch called showdead in their profile.

How can editors be "kept honest" if a bunch of their minions can just downvote users for no reason and any complaining is forbidden?


it's not "minions." Way way way back in the early days of the board we actually discussed what downvotes should mean. Argument 1 was that a vote should only indicate whether or not you thought a comment contributed to the discussion- and not indicate agreement or disagreement. This gave an indicator with a single purpose. Argument 2 was that a vote could also be used to indicate disagreement. This made one indicator mean more than one thing. Some suggested a second vote- in order to keep the indicators clear on what they were reporting.

pg eventually came down on the side of Argument 2, so voting can be either "I disagree" or "your comment is not useful."

I do agree that adding a comment when downvoting is useful to the downvotee, but alas sometimes these come off as mean or supercilious and leave a bad tone in general, spoiling the conversation (or igniting a flame war).


More and more often recently, I find downvoted comments that I thought were all perfectly useful and reasonable contributions to the discussion. In fact, I thought the downvotes themselves were neither "useful" nor "contributing to the discussion". I would rather like to downvote the downvoters to reduce their karma and their power to perform these downvotes, but the only mechanic I have on HN to express my preference is to re-upvote the comment, which is not in fact its original intended purpose.

Rather, I would like to ignore all downvotes made by particular downvoters because I believe their judgements are highly flawed, independent of the quality of the original comment that was downvoted.

https://abstrusegoose.com/527 is highly relevant.


Agreed. And great link. The “your” seals it.


I wasn't able to find documentation for using CIFS as / - is there a guide somewhere?

What are the advantages of this as compared with NFS?


I can only speak from personal experience, so don't take this as true for everybody.

But for me, NFS has always been a colossal pain to use. The server has to run in kernel-space. Shares have to be enumerated in /etc. There are a couple of userspace options, but I've never been able to make them work reliably. Once I do get it working, it hangs all the time for no easy-to-debug reason. Also it needs multiple ports to be open, and it expects UID and GID to be the same on client and server.

CIFS has its problems for sure. But it's been pretty straightforward for me every time I've had to use it. If I was trying to set up a production-line machine to flash Linux-based devices, I'd choose CIFS every time because it's so much less hassle. And now that it's rootfs-capable, I just might be able to do it.


  https://www.kernel.org/doc/Documentation/filesystems/cifs/cifsroot.txt


>A CIFS root mount currently requires the use of SMB1+UNIX Extensions which is only supported by the Samba server. SMB1 is the older deprecated version of the protocol but it has been extended to support POSIX features (See [1]). The equivalent extensions for the newer recommended version of the protocol (SMB3) have not been fully implemented yet which means SMB3 doesn't support some required POSIX file system objects (e.g. block devices, pipes, sockets).

As a result, a CIFS root will default to SMB1 for now but the version to use can nonetheless be changed via the 'vers=' mount option. This default will change once the SMB3 POSIX extensions are fully implemented.

Who thought re-enabling uses of SMB1 was a good idea?



It's not just CIFS root that needs SMB1.

SMB1 has to be used any time you need the POSIX extensions, with Samba at the server side and Linux at the client side.

I find it comes up reasonably often, because Samba is so configurable. For example remapping user ids, or mapping user-group permission bits; these are hard or impossible to do in NFS, depending on available NFS server version.


I think the real question is: Is SMB1 less secure than NFS?


Probably not, since NFS to my recollection barely supports anything resembling transport encryption, but it allows authentication if you like Kerberos.


NFS has supported transport encryption since as long as I can recall. It's enabled by the sec=krb5p mount option.


It can also be secured using IPsec (or another host-to-host VPN that supports per-protocol security associations).


SMB doesn't require me to set up a whole VPN connection (with its own problems) just to get secure transport going.


True. But neither does NFS, as the Kerberos comment you replied to described :)

A third way to do this with NFS is to forward the TCP connection over stunnel, ssh forwarding or other similar thing.


As mentioned, if you like Kerberos. It's not the nicest way to do anything. Kerberos is also only supported if you (can) use NFSv4; NFSv3 doesn't support Kerberos on all clients.


NFSv3 is very dead.

I like Kerberos a good bit and I think the complexity of running an LDAP/Kerberos infrastructure is greatly overestimated, but it is disappointing that none of the theorized alternatives ever really appeared. Last I read, LIPKEY was the only serious contender and there were some security concerns that got it nixed.


And if you don't use Kerberos, NFS has no authentication. For extra credit, it's generally paired with NIS, so everyone can see everyone else's password hashes. What's not to like for an attacker?



CIFS is, I think, the protocol used by Windows and Samba.


Yes, ish. CIFS is the protocol that came before SMB, but people still use the term to describe modern Windows SMB or Samba.


SMB was first. CIFS is a rebranding that happened when Microsoft was forced to release documentation for the protocol.


Yes, but CIFS was a rebranding of SMB1; successive versions of SMB have just been called SMB. So first there was SMB, and then CIFS, and then (and now) it's all been just SMB. The only ones still using the term "CIFS" at this point are the Linux/Samba folks.


It's a case of running the Common Internet File System on top of the Server Message Block protocol.

SMB isn't just for filesystems. It is also used for printing, among other things. CIFS is the filesystem.

Outside of Linux/Samba folks, neither term is popular. Users say "Windows share" or "shared drive" or "network folder" or something like that.


Don't see someone call SMB "modern" every day...


I'm really excited about the NFS client cache stuff.

It's crazy that the whole system grinds to a near halt if it loses connection to the NFS server, from an end user perspective.

I look forward to some time in the future when Debian incorporates this kernel. I prefer to use stock kernels. I used to enjoy messing with my distros, but these days I prefer stability.


You will be able to use this via buster-backports:

https://backports.debian.org


> It's crazy that the whole system grinds to a near halt if it loses connection to the NFS server, from an end user perspective.

Indeed. The only solution that works consistently, if the NFS server is not coming back up anytime soon: enable an NFS server on localhost; add the address of the failed NFS server to a local adapter; wait for the retry to finally get an error answer; then kill the IP address and the local NFS server.

Client side caching will only delay the inevitable.


I’m not an NFS expert, and I’ve often wondered about client caching of NFS to improve performance. Does this exist at all currently?


Sure, the kernel caches everything in RAM by default (as with any other file system, not just NFS).

If you also want to cache on local disk, there's "cachefilesd", which does exactly that. You can specify a certain percentage of the disk that should be kept empty, and cachefilesd will use the rest of the available space for caching.

(It works very well, but is broken on kernel 5.x for me (it just doesn't read from the local cache, even though everything looks fine). But I just mention it off-hand, I don't have time to diagnose this in more detail, I just remain on 4.15 for the time being.)


I think this was what I tried to use for one of those shared file system setups for a web app where we needed shared files.

And the daemon responsible for this would end up hard-locking the VPS it was running on.

Oh yeah, here's the issue I found that mirrored my experience: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=751933

That was a rough experience since it would run fine until it hit a certain level of load.


NFS has had caching modes since the beginning. Or at least as long as I have been using it. Its behavior can be controlled using the async or sync options.

NFSv4 is actually quite nice. Anything earlier should be avoided.


> NFSv4 is nice

Wow, never thought I would see this. Are you saying that based on experience or just looking at its feature set?

I used to work on NFS performance on NetApp filers at NetApp, and NFSv4(.1,.2) were god awful. I used to see almost 1/3 of v3 performance in simple read and write ops. On top of that, I needed to align all the planets to get it to not fail with EIO during the test. Is that not the case anymore? Did somebody do some magic I missed in the last couple of years?


Isn't that more a problem with a specific product (NetApp)?


Could you elaborate on why it’s nice? I only ask because the OpenBSD maintainer, Theo de Raadt, seems to believe, quite strongly, the opposite: http://openbsd-archive.7691.n7.nabble.com/nfsv4-td18690.html

I don’t understand NFS well enough to know what to believe.


I've seen that thread before but just read it through in its entirety. All I got from it was "OpenBSD people are assholes, and not in the Torvalds 'constructive asshole' way, just in the 'being an asshole for the sake of being an asshole' way".


Wow, that's impressive in a bad way. Recommended reading for everyone who thought Linus was exceptionally rude and wants some nuance.


I think Theo is notorious for this.


it is the reason openbsd exists


There are a few strong words here and there, but nothing more injurious than what you would hear by standing in a school for 5 minutes. Once you understand the writers' liking for such words, it's easy to mentally replace them with a more politically correct version if it bothers you. Moreover, they just criticize tech, not people, and that's an important distinction imo.


The big problem with that thread isn't profanity or insults or tone. It's that the developers are repeatedly refusing to provide any detail or justification for their less-than-polite assertions that failed to answer the question. Some degree of insults and profanity can be tolerated if there's also useful content, but Henning Brauer and Theo de Raadt didn't make any useful contributions to that thread.


Sorry. No. There's no "political correctness" issue here.

Theo's comments in particular seem to start at

"NFSv4 is a gigantic joke on everyone."

This isn't politically incorrect, it's just noise. Nothing is gained from this comment. Even Linus with his infamous "rants" would have actual commentary laced in, Theo is just being dismissive

After someone reaffirms that they would like to use it, he replies

"Hahahahaha. That's a good one."

"I guess by "all the other protocols" you must be rejecting all the rest of your network traffic as "not protocols" or "not services"."

Again. Not helpful, just noise


Well, to be fair there was at least one poster who suggested that OP use SFTP instead of NFS... I mean why bother with a vegetable peeler when you could use a chisel instead?

So yeah, like you said--absolutely no useful information was conveyed in that thread. Besides any lurking bystanders learning those "in the group" are complete assholes.

http://openbsd-archive.7691.n7.nabble.com/nfsv4-tp18690p1871...


Well, there was this prediction: "In about a decade the people who actually start auditing it are going to see all the mistakes that it hides."

So, since he wrote that in 2010, we can look forward this year to seeing if he was right.


Wow. What a toxic community. I'd never build a business or platform around a community like that--any time I had a question I'd be constantly worried all the maintainers would bite my head off. Coupled with the fact that I can't be alone in thinking that, I'd also worry about the long term success of a project.

And yeah, OpenBSD has been around forever.... but you'd be hard pressed to call them anything more than a tiny little niche operating system. Probably because of how toxic they are.



Wow. That guy can go fuck himself. In fact every snarky worthless reply to that poor original poster can go fuck themself. Man. That is some hot garbage...


I've always had better luck with CIFS/SMB myself... but my exposure is relatively limited.


i don't really understand this: "The AMD k10temp driver finally starts reporting voltage/current for Zen CPUs and numerous thermal reporting improvements. This is a big step forward thanks to the community but unfortunate these Zen/Zen2 thermal/power reporting bits have taken so long and there are still some mysteries that remain."

why wouldn't AMD set aside a couple of engineers (or at least one) to make sure their processors are supported here? it's not like this would be a huge drain on their manpower resources and the company itself should have an immense interest in the best possible linux support.


Zen CPUs are notoriously broken on Linux and AMD doesn't seem to care. If you ever try to install Linux on a first-gen Ryzen box and want an uptime longer than 3 days, you'll want to keep this handy:

https://github.com/qrwteyrutiyoup/ryzen-stabilizator


Yet everyone on here praises these chips and AMD as the (second?) coming of Christ.

I don’t get it, it seems like there are technical leaps but sloppy execution.

In the end I still trust Intel (even with Spectre/Meltdown etc.) more than AMD because at least my machine:

a) boots

b) stays up long enough to have me even worry about an attack


Maybe a lot of people don’t run Linux on their Zen/Ryzen machine? Since as you’ve said, it has issues.


I have a Linux machine with a 1st gen Ryzen 1700. It locked up sometimes in the first couple of months after release, but has been stable for a good while. Not sure about newer gens.


You have to disable a key security feature (ASLR) to have a stable AMD-based linux system??

That should have been treated as an all-hands-on-deck emergency by AMD.


In my case I only had to use the "Power Supply Idle Control" workaround. Apparently some BIOSes have a setting for this but mine doesn't.


Does Windows have to disable ASLR on ryzen? Why not?


I’m not sure if I’m some weird exception, but I’ve never had issues running Linux at long uptimes on my R5 1600 box.


It wouldn't surprise me if the quality and stability of the motherboard is a factor here.

R5 1600, never had any of these problems, and I have friends running Threadrippers that don't see these problems either.


Seems plausible. The build my R5 1600 is in has a reasonably high-end Asrock X370 mobo. Is your setup similar?


It's an Asrock B450 Pro4.


Hah! I almost bought that board, but ended up going with another for built-in wifi. Asrock has been a really solid brand for me.


My Pro4 has been solid for me. I upgraded about a year ago from an AM3+ FX-8350 build on another Asrock motherboard. That was back when Microcenter was selling the R5 1600 for $79 with $30 off a qualifying motherboard.


Impressed that an 8350 lasted you that long at a reasonable performance level. An R5 1600 for $79 is definitely an awesome deal; mine still crushes pretty much any work I throw at it.


I have a Ryzen 1700 and haven't noticed any issues other than the GCC segfaulting one, but that's a hardware problem sadly.


Does this also apply to Threadrippers? I have a 1950x and it seems to sometimes suffer from stability issues, and wondering whether it’s related.


For some more information on the various features in this release, check out the 5.6-rc1 LWN[1] article. As usual, that article only covers the first week of the merge window (Corbet is probably writing the second part as we speak).

[1]: https://lwn.net/SubscriberLink/810780/ae4429af6a4ba40d/


Why are you posting a SubscriberLink to an LWN article on HN? That's not really the point of those links. They're for sharing in small groups, not for giving away paid content to thousands of people.


LWN are happy to see them posted: https://news.ycombinator.com/item?id=1966033


There are a number of new io_uring opcodes appearing with 5.6 as well: fallocate, openat, close, statx, fadvise, madvise, openat2, non-vectored read/write and send/recv.
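
For anyone curious what driving one of the new opcodes looks like from userspace, here's a rough sketch using liburing (assuming a liburing version new enough to expose io_uring_prep_openat; error handling mostly omitted):

  #include <fcntl.h>
  #include <liburing.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      struct io_uring ring;
      if (io_uring_queue_init(8, &ring, 0) < 0)
          return 1;

      /* Queue an asynchronous openat() instead of calling the syscall directly. */
      struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
      io_uring_prep_openat(sqe, AT_FDCWD, "/etc/hostname", O_RDONLY, 0);
      io_uring_submit(&ring);

      struct io_uring_cqe *cqe;
      io_uring_wait_cqe(&ring, &cqe);
      printf("openat returned %d\n", cqe->res);   /* fd, or -errno on failure */
      if (cqe->res >= 0)
          close(cqe->res);
      io_uring_cqe_seen(&ring, cqe);

      io_uring_queue_exit(&ring);
      return 0;
  }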


> Year 2038 work beginning to wrap-up for 32-bit systems

I wonder how many of these vulnerable systems will be upgraded to kernel 5.6?


... I’m struggling to think there was ever a reason to store epoch seconds as signed.

Was there a use case for “we need to pretend to have 65 years worth of system time before 1970”?


From a favourite Reddit thread:

- Why in the hell would a date value-- particularly one that's an offset-- be signed?

- - How were you planning on indicating dates prior to 1970-01-01?

- - - What date before 1970? We just all assume those do not exist.

- - - - "Now you've made me feel old." "How old are you?" "Let me put it this way: when I was born, time didn't exist."

https://old.reddit.com/r/programming/comments/2b4kpg/conspir...

The full context is ... kind of fun:

https://old.reddit.com/r/programming/comments/2b4kpg/conspir...


Well if we bite the bullet and use a 512 bit unsigned int we can represent time zero as the Big Bang and we should be good until the heat death of the universe, based on my napkin math. But I'm no physicist or kernel contributor so unsure of the implications.


No way. Do we know exactly when the Big Bang was, or that a second has always been a second since then?

We don't want time implementations based on some definition of the Big Bang, version 1.1.3, and all the other versions as physicists change the definition of time since the Big Bang.


If the big rip theory is correct, then 64 bits will suffice.


What happens when we revise the moment of the Big Bang? Maybe need signed 1024 bit just in case.


Might as well just make it signed at that point. We'll never need more than a kilobit to represent time.


Instead we now assume dates past 2038 don't exist...
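
For anyone who hasn't seen the failure mode spelled out, here's a tiny C sketch that simulates the 32-bit counter explicitly (so it also runs on 64-bit hosts) and shows where the "nonexistent" dates end up:

  #include <stdint.h>
  #include <stdio.h>
  #include <time.h>

  static void print_utc(const char *label, int32_t seconds)
  {
      time_t t = (time_t)seconds;               /* widen to the host's time_t */
      struct tm *tm = gmtime(&t);
      char buf[40] = "(not representable here)";
      if (tm)
          strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S UTC", tm);
      printf("%s %s\n", label, buf);
  }

  int main(void)
  {
      /* The last second a signed 32-bit time_t can represent... */
      print_utc("last 32-bit second:", INT32_MAX);   /* 2038-01-19 03:14:07 */
      /* ...and what the counter becomes one tick later, when it wraps. */
      print_utc("one tick later:    ", INT32_MIN);   /* 1901-12-13 20:45:52 */
      return 0;
  }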


Time is an illusion.

Lunchtime doubly so.


Huh, just realized - time ends at pi-o'clock.


This is how I usually compute pi.


I would bet they were more confident someone would want to represent data from the recent past than that someone would still be using their research lab OS in 60 years.


Well, now we should use 64-bit timestamps, and for 64-bit timestamps I think signed is sensible, since there's no real need for them to be unsigned.


“In 30 years from now everything will probably be 512 bit and flying cars, right?”


That’s the wrong thing. It’s not “will this computer still be running” it’s “does the life of this format work over the time people will use it”. If in 2000 we had 512bit and flying cars, Unix epoch still could be the standard.

The first epoch seconds computers aren’t running things anymore. But, it lives on as a standard.

So the issue isn’t “was it a good idea to count seconds starting at 1970”, but rather “why the hell would anyone use signed for a value that can only be positive? ... and then keep doing it once they realized the problem!?”

2105 or 2106 won't be an issue for unsigned. No one will use a 32-bit standard much longer, let alone in the 30 years leading up to the problem date.


Drooling over that picture of a dozen AMD EPYCs. Those processors are seriously awesome. The newest sr.ht server, tenshi.sr.ht, was provisioned to run git.sr.ht and has an EPYC 7402 24-core @ 3.35 GHz at its heart. It can compile the Linux kernel x86_64 defconfig from scratch on an NVMe in under 30 seconds. Our main build server is also EPYC, a 7281.


Just bought three Dell 7xxx series with Epyc 7402s. Our IT consultant recommended against AMD in favor of Intel. I didn’t like their reasoning.

Side take... I sized our servers to be adequate on the assumption that virtual cores won't be turned on, just assuming more Spectre and Meltdown discoveries. And while AMD has fared better, it's not impossible they have their own demons.


Curious, what was their reasoning?


Not OP. I purchased 8 Cascade Lake servers for HPC after testing AMD's latest EPYC. One of the reasons is the Intel compiler! We use it in scientific software to get extra performance, and it shits all over AMD processors. I saw the same software take approx. 30 times longer to run because it was not an Intel CPU (worst case; with the GCC compiler, not so much). This is not AMD's fault. Just saying there are some of us who are stuck with the devil that is Intel.


Doesn't the Intel compiler let you disable the runtime CPU detection and generate an executable that unconditionally uses the instruction sets you explicitly enable? I know they also provide environment variable overrides for at least some of their numerical libraries.


It is a bit willy-nilly to get it to do the right thing. We compile with arch-specific settings, and add features we'd like as well, but in spite of that it does not look like it is using all the facilities available (an unverified claim based on perf outcomes). I guessed it ignored our flags once it did not see "GenuineIntel" in the vendor field. To be honest, I had to weigh the cost-benefit of trading my time to figure this out against the savings I get from going AMD. Two things made us stop and buy Intel:

1. Our major cost in the BOM is memory, not the CPU. So a 30% savings in CPU cost is not 30% off the bill, but much less.

2. Even if we found a way to tip the scale in AMD’s favour, our binaries still need to run in the rest of the intel servers without significant perf hit. So, our liberty to change is limited.

It’s sad, but the reality is that we had to buy more Intel. But luckily, their prices are far lower than at our last purchase before AMD lit a fire under their asses. So there is that.


Maybe you have a problem that can use AVX-512 trivially in the compiler, in which case yes, Intel is hugely better. We are all very lucky to have the crazy high-end hardware we have. I can't wait: in a few years I should be able to get a fairly cool Mac Mini-class machine with 32 cores so that a JavaScript test suite can take less than 10 minutes to run...


There was an HN discussion a couple of weeks ago about whether the Intel CPU detection “feature” is an evil money grab, or a legitimate way to prevent unexpected runtime behavior on AMD CPUs.


I'm not sure which discussion you're referring to; I've seen the topic come up many times. But I haven't seen a reasonably non-evil explanation for why the compiler should preemptively assume that AMD's CPU feature flags cannot be trusted, while Intel's can. Detecting known-buggy CPU models is fine, but assuming that future AMD CPUs are more likely to introduce bugs to AVX-whatever than future Intel CPUs is not something that I have seen justified.
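
Just for reference on the mechanism being discussed: the vendor string and the feature flags live in separate CPUID leaves, so a dispatcher is free to key off either one. A small sketch using GCC/Clang's cpuid.h (AVX here is just an example feature bit):

  #include <cpuid.h>
  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      unsigned eax, ebx, ecx, edx;

      /* Leaf 0: the vendor string comes back in EBX, EDX, ECX (in that order),
         e.g. "GenuineIntel" or "AuthenticAMD". */
      char vendor[13] = {0};
      __get_cpuid(0, &eax, &ebx, &ecx, &edx);
      memcpy(vendor + 0, &ebx, 4);
      memcpy(vendor + 4, &edx, 4);
      memcpy(vendor + 8, &ecx, 4);

      /* Leaf 1: feature flags; ECX bit 28 advertises AVX on either vendor. */
      __get_cpuid(1, &eax, &ebx, &ecx, &edx);
      int has_avx = (ecx >> 28) & 1;

      printf("vendor: %s, AVX: %s\n", vendor, has_avx ? "yes" : "no");

      /* Dispatching on the feature bit works for any vendor; gating the fast
         path on the vendor string is what the complaints above are about. */
      return 0;
  }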


Okay, so there was another article recently which talked about a new fuzzing tool implemented by Google that revealed thousands of bugs in Safari and open source projects.

I assume the exploitable edge cases are so numerous and so hard to have 100% test coverage on (is it even possible?) that it is hard enough for Intel to deal with correct execution on their own platform.


Were you able to bench AOCC, or was that compiler not a feasible option?


I didn’t press but it was along the lines of “everyone uses Intel” and surely would have ended with “no one was ever fired for choosing IBM”.

In reality, with AMD the cost was better, the RAM/bus was faster, and I liked the specs and options of the 7xxx vs the 7xx series better.

They couldn’t give me a reason not to chose AMD, only that “Dell always uses Intel for a reason”


> “Dell always uses Intel for a reason”

That reason sometimes being that Intel paid them off to block AMD from competing in the market:

https://www.nytimes.com/2010/07/23/business/23dell.html

https://www.extremetech.com/computing/184323-intel-stuck-wit...


There are 2 main valid reasons larger companies won't touch AMD for servers:

1) You don't know if a given Linux kernel/other software will work unless you test it ... for each future version

2) The firmware updates for Intel and AMD are different.

Additionally, the excellent Intel C compiler focuses on their own processors.

The above doesn't mean you can't choose AMD, but don't assume they're interchangeable CPUs.

Disclosure: I worked for Transmeta, whose entire DC was based on AMD servers. The reason was that Intel was a larger competitor for their code-morphing CPUs than AMD was.

Coincidentally, Linus Torvalds entered the USA on a work visa from Transmeta after DEC bailed on his job offer.

I bought CS22 at Transmeta's wind-down auction, which I will donate to the Computer Museum. Several large CPU designs during that era were verified on it because it was a 4 CPU Opteron with 64 GB RAM, and 32 GB RAM wasn't enough.

Aside from Apple's A-series, that was the end of Silicon Valley being about silicon. (Many of the chip engineers on my last project ended up at Apple on the A-series.)


>Additionally, the excellent Intel C compiler focuses on their own processors

This is a new and creative use of the word "excellent". Intel are so dishonest that they have been caught using their compiler as a malware delivery mechanism: it makes /your/ compiled binary test for an Intel CPU when being run by /your/ customer, and if it finds your executable running on a competitor, e.g. AMD, it makes the code take every slow path, despite the optimised code running fast on that CPU.

Wildly dishonest. "Malware delivery mechanism" is a somewhat more traditional use of the English language to describe the Intel compiler.

You cannot trust Intel. They've earned that reputation all by themselves.


Malware? Are we just redefining words when we don’t like something?

> malware (n)

> software that is specifically designed to disrupt, damage, or gain unauthorized access to a computer system.

How is a dispatch system (which GCC supports) malware? Yes, Intel “cripples” AMD by requiring an Intel processor, but it’s not malware.


It's sneaky, it behaves badly and counter to the user's interests, and because it's a compiler, it propagates that bad behavior (though not in a self-reproducing viral fashion). It's fairly mild on the scale of malware—I'd rank it slightly less bad than adware, but roughly on par with AV software that deletes any tools it deems to be for piracy purposes.


I call stealing your customers' CPU cycles without permission for marketing purposes malware. If you don't, that's ok. We can disagree.


I literally posted the definition of malware. Where is it gaining unauthorized access?


It says 'or'.

If it disrupts, that fits the definition you gave.

Or do you think a trojan that deletes your boot sector isn't malware?


Your definition isn't the only reasonable way to define the term, and you seem to be parsing it incorrectly anyways.


Seems pretty disruptive to my layman eyes to force code to run slower on a competitor's hardware.


Oh for sure. I’m quibbling over the use of the word “malware”


> 1) You don't know if a given linux kernel/other software will work unless you test it ... for each future version

Huh? Sure, some software may break, but there's more than enough AMD out there to make sure that linux and other common software won't break.

> Additionally, the excellent Intel C compiler focuses on their own processors.

IME it's actually not that commonly used outside of benchmarking (among other reasons, it's fairly buggy - perhaps somewhat of a chicken/egg issue).


Actually, if you want to run Wayland and a more powerful GPU than Intel's integrated stuff, AMD has much better support, to the point that running Wayland isn't even an option on anything Nvidia (and Intel CPU + Radeon dGPU is relatively rare). Though I'm a bit confused about whether Wayland (as an experimental option) in the upcoming Ubuntu 20.04 LTS is even supported. It should be, because X.org's universal trackpad driver sucks compared to what was available in Ubuntu 16.04, and overall gnome-shell feels clunky and like a regression compared to Unity. Having just set up a ThinkPad E495 (Ryzen) over the weekend, I'm impressed with the easy out-of-the-box installation, but also so seriously disappointed with gnome-shell and the state of Wayland that I'm considering alternatives to it.


> Though I'm a bit confused about whether Wayland (as an experimental option) in upcoming Ubuntu 20.04 LTS is even supported.

I've been using Wayland out of the box on 19.04 and 19.10 to get fractional scaling and independent DPIs on multiple monitors (ThinkPads of various ages with Intel GPUs). If it's experimental, they've certainly hidden that well. It was just a login option on the display manager with no warnings about it during install or later.


Appreciate the experience report. But non-LTS releases are by some definitions all "experimental" - I had the impression Canonical pushed for Wayland in 18.04, but then walked it back a bit.

Hm, Wayland by default in 17.10, then back to optional in 18.04 - and so it might stay:

https://www.omgubuntu.co.uk/2018/01/xorg-will-default-displa...

https://www.phoronix.com/scan.php?page=news_item&px=No-Wayla...

I'm a little surprised; not being the default for 18.04 made a lot of sense, but I'm not sure why 20.04 won't see a switch.


Not being experimental and being the default option are still two different things though. Even in 19.10, while it is installed as part of the default install without experimental warnings, it still isn't the default session option.

It is still a very slightly rougher experience than xorg - mainly due to some 3rd party apps not fully handling it yet. But the scaling options more than make up for it with me. One of those features (either fractional scaling or independent DPIs) was still regarded as experimental enough to require a CLI command to enable it though.

So, not perfect, but good enough for me.


That's encouraging to hear - I'll give it a try.


Does Intel work without testing?


Most kernel devs have Intel processors, and anecdotally, it does seem like you see more AMD-specific patches coming in the changelogs as people with the chips get new kernel versions and find new breakages.

Another side effect of Intel's market penetration is that the Intel implementation of any given featureset is targeted first. Things like nested virtualization may work mostly-OK on Intel by now but are still in their infancy on AMD; for example, it appears that MS still blacklists AMD from nested virtualization. [0]

[0] https://github.com/MicrosoftDocs/Virtualization-Documentatio...


> and anecdotally, it does seem like you see more AMD-specific patches coming in the changelogs as people with the chips get new kernel versions and find new breakages.

You have to factor in how stagnant Intel's chips have been for many years. There's simply not much new stuff showing up on Intel platforms, and half of the new features are fundamentally incompatible with Linux anyways and thus will never lead to upstreamable patches. AMD catching up to Intel on feature support also necessarily means AMD is adding features at a faster rate that requires more feature enablement patches over the same time span.


That will change though.


>“Dell always uses Intel for a reason”

Which is just as likely to be more to do with commercial arm twisting and incentives from Intel than anything technical.


For one, AMD killed OpenCL support for Ryzen. Like wtf... I've got 24 hyperthreads and OpenCL doesn't work!!


If Dell went all in on AMD, could they produce enough chips to satisfy demand?


The chips are actually produced by TSMC. So my guess would be yes.


Depends on what kind of pricing and lead time Dell gives AMD, because TSMC 7nm production capacity is definitely limited and their required lead time for wafer orders hit 6 months as of last fall. AMD has already experienced some shortages due to their supply being relatively unresponsive.


Shouldn't be a huge problem right now since Apple is moving to 5nm.


In fairness, so has Intel themselves.


Sure, but people expect that of Intel because they own their fabs and have to build a new fab if they want more capacity. AMD's only buying a slice of TSMC output, but that doesn't mean they're able to suddenly buy a much larger share of that output.


Another question... what exactly is an IT consultant's job description?


you get more for less money and their commission is smaller


I usually don't mind some on-topic advertising, but this is a little too bold for my taste.


This is not some large corporate-driven, heavily marketed project. It is real, community FLOSS.


I seriously like these CPUs, and the context in which I evaluated them is sourcehut. Sorry if this comes across as spam.


Software for people, not for salesmen, is an exception.


First time I am seeing sourcehut and it seems cool. Is it basically a git, CI, and mailing list platform? Or is there more to it?


It's basically that, yes. Also has bug tracking, pastes, and wiki services, and a few more in the works.

https://sourcehut.org


I highly recommend following Drew's work and blog in general. We don't always agree, but I usually come away smarter.


I'm sure Drew can provide a much more cogent answer, but Sourcehut also offers consulting services and sponsors open source developers. I recommend checking out the blog at https://sourcehut.org and seeing what's been going on lately.


Are you running in a colo? Did you bootstrap with a cloud VPS or have you always been on your own hardware?


Running in a colo. The legacy services were on a VPS but what you know as SourceHut today was always on owned hardware.


Cool, thanks


Any indication whether this will make it into Ubuntu 20.04?


They are staying with 5.4 but will backport WireGuard.


Yikes, that feels like a mistake. 5.4 has awful issues with Intel integrated graphics crashing.


The timing is a little unfortunate. 5.6 will be too late to make it in for 20.04, but 5.5 isn't an LTS kernel and 20.04 is an LTS distribution, so I understand not wanting to use 5.5.


The issue of a rigid release schedule! April means April whether you miss out on something or not.


You still end up having to make compromises no matter what.


One method allows for you to determine which compromises are acceptable, while the other does not.


Not really. Let's consider an alternate flexible version that stretches up to 2-3 months outside of emergencies. Versus a non-flexible version that stretches 0 months outside of emergencies.

If you compare the flexible version targeted at June, and the non-flexible version targeted at August, you'll find that they're making almost the exact same compromises.

Nothing ever stops you from using an earlier version if it's more stable. So both schedules get to choose from multiple versions. Maybe the flexible one chooses from 6-month-old code to code 3 months in the future, via delaying. But the non-flexible one can choose from 9-month-old to 0-month-old code. It works out the same, and the only difference is how you label it.


I'm kind of curious why Canonical doesn't release a non-LTS release alongside the LTS release, to continue the flow of non-LTS releases. It seems like it'd make gearing up for the "LTS + 6mo" non-LTS release easier. Plus, then there'd be a clear non-LTS release to attribute any backported update packages to (that show up before they cut the "LTS + 6mo" non-LTS release).


They do keep latest kernel PPAs for those who want to experiment. https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.5/


When I was involved we did LTS, and then half the distro team was focused on the LTS .1 release which happens 3 months later. The reality of software is that no amount of internal testing is the same as the amount of bug reports you get after the release. And, for an LTS as it's going to be around for a long time getting fixes in is more important - with a standard release you can choose to leave some that you'll resolve in the next six month release.

The reality of doing LTS, plus LTS .1 mean that the next standard release didn't get as much attention. So in reality the first six month release was often very rough, with a lot of new things in it that would be straightened out over the course of the entire cycle.

As someone mentions later in the thread Canonical started doing HWE releases, which means you can run a later kernel on the LTS. So I'm running 18.04.4 (I think) which has a HWE kernel that is newer than the one that originally shipped - not just security fixes but newer hardware compatibility.


ITT: why Manjaro's nerfed rolling is best model while point releases are a server meme.


Eh, just as Canonical will backport Wireguard, I'd expect them to backport stabilization fixes to address issues like that.


The first time I put Ubuntu on a laptop (2004 or so), there was an Intel graphics regression so bad I put Windows back on it... This past October was the first time in a very long time that Linux went on as my main OS. And running a new X570 board with an RX 5700 XT was painful until much later, when support got dramatically better.

Still have an issue where, when I come back from the blank screen and log in, the login screen won't go away... all the top/side bars show, but it covers apps... I wind up hitting Ctrl+Alt+F3 to log in to a terminal and reboot. I should really figure out the commands to kill and restart GNOME, but it's really been a pain. Not sure if KDE might be better.


(sudo) pkill Xorg should likely do the trick, but check e.g. htop (hit F5 to get a tree view) for the name of the root process of the graphics session.


switched over to KDE after writing that... will see if it's still an issue in a day or two.


Have the issues been fixed in 5.5 or 5.6?


Yes, it was fixed in 5.5. There are a few corners of the internet explaining the issue... this one is good enough: https://bbs.archlinux.org/viewtopic.php?id=250765

I have a newish all-Intel setup, and 5.4 crashes to the point of insanity. I gave up and installed 5.5 and it went away. It's a serious issue for an LTS release.


Really? Do you have a source for that? (I'm not doubting you, I just haven't heard anything about that, and would be very happy if this is true).



Sorry, I meant the part about backporting wireguard. I did find a phoronix article on it: https://www.phoronix.com/scan.php?page=news_item&px=Ubuntu-2...


https://www.phoronix.com/scan.php?page=news_item&px=Ubuntu-2... is one source but I remember seeing it elsewhere.


After Ubuntu 20.10 is released, 20.04 will probably get a later kernel version in the linux-hwe package.


It's gonna be 5.4


Nothing forbids you from installing 5.6 on Ubuntu 20.04 or another older version.


Any reason for choosing Ubuntu over Arch for personal computers? Or any rolling release distro, for that matter?


Ubuntu takes all of about 10 minutes to install a full blown desktop and Just Works™ based on my recent experience. Not to take anything away from the amazing tweakability and customizability of Arch, but the target audience is completely different.


I use the Anarchy distro to help me through the installation. It doesn't change the fact that you need to understand the underlying installation process, but it reduces the installation time to several minutes. There is also ArcoLinux for a more user-friendly interface.


The out of the box crowd can always use Manjaro.


Python packages don't install into /usr/local on Arch, which makes it easy to wreck your installation; many projects and scripts out there only support Ubuntu and not Arch, etc.

And I say this as an Arch fan who would rather avoid Ubuntu...


Interesting, I never looked at where the Python AUR packages were installed.


This isn't AUR-specific, it's an issue that comes up with sudo pip install.


pip with sudo is always discouraged. Did you encounter use cases that absolutely require you to install packages with `sudo pip`?


I don't even encounter use cases that "absolutely require" me to use Arch in the first place.


I assume no, then; so just stop misusing the Python package manager and stop blaming distros for it.


You can assume whatever you want, and I'm not blaming anyone, just telling you what reasons people have for using one distro over the other. You're arguing with the wrong person for the wrong reason.


Sorry I came down to arguing like that, but I really, honestly wanted to know what problems there were with the scenario you proposed. I am in devops and those are the kind of nuances I need to account for. That is why I asked for examples and felt your answer was condescending. But anyway, I can see from this thread that asking questions is somewhat frowned upon here when the question itself is of a nature that displeases people, so I'll refrain from participating like this in the future.


virtualenv?


Before I try to respond, you have to ask yourself: what do you imagine is the reason why Ubuntu folks have gone out of their way to patch Python for /usr/local? Surely it's not out of ignorance for virtualenv?


Really, folks, I know the downvote system is useful, but on a question? I mean, do I have to know the answer beforehand so I don't get downvoted? I am really tempted not to use this platform anymore.


Downvoters are aggressive here yes. It's a shame, though your question isn't particularly pertinent to this article.

Kind of a double-edged sword that keeps the discussion on target.


It's interesting to see how the addition of features correlates with the quality of software. I hadn't had problems with wifi on Linux since 2012 (when I came back to it from OS X; I had left in 2006 with huge problems, before NetworkManager was introduced, etc.). Then, two or three months ago, after a kernel update, my laptop started dropping the connection to my 5 GHz network, but only at home. It seems to be a common problem with the Qualcomm chip driver, which had worked nicely on this laptop for almost two years.

I like progressiveness a lot, and also new features, etc., and I found a workaround (switched off 5 GHz at home), so it's ok. I can always install an older kernel too, which is the freedom of free software; I wasn't able to do stuff like that on OS X.


Quality regressions are common when running bleeding-edge software; this is why going with something LTS is recommended for actual productive use. Now if only average non-enterprise folks could get a proper LTS version of Windows 10...


Ah! I should try this!

Arch on a Dell XPS, fine for two years, and now I have some issue where the network buffer doesn't flush (from a cursory investigation). I've been fairly sure it's a post-update issue. Thanks for the idea :)


Yeah, mine is an XPS 13 with Arch on it.


The Intel iGPU driver (i915) is broken in the 5.4 release; I think there are still big issues with QA.


Looking forward to multipath TCP.

Anyone tried it out yet, how does it look so far?


Looks good to me, but the Google guys are trying very hard to move the web over to UDP based HTTP/3 instead of MPTCP.
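
For anyone who wants to poke at it once a 5.6 kernel is running: the upstream implementation is opted into per socket by passing IPPROTO_MPTCP at socket creation. A minimal sketch (older libc headers may not define the constant yet, hence the fallback define; the kernel returns EPROTONOSUPPORT if MPTCP isn't built in):

  #include <netinet/in.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <unistd.h>

  #ifndef IPPROTO_MPTCP
  #define IPPROTO_MPTCP 262   /* not yet present in older libc headers */
  #endif

  int main(void)
  {
      /* Ask the kernel for an MPTCP socket; plain TCP would use IPPROTO_TCP. */
      int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_MPTCP);
      if (fd < 0) {
          perror("socket(IPPROTO_MPTCP)");
          return 1;
      }
      puts("MPTCP socket created; connect()/bind() as usual from here");
      close(fd);
      return 0;
  }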


Wireguard!! Woot!

I'll be putting this kernel on all production systems, without testing, on Sunday night while everyone is off.


Phoronix loves clickbait titles. This kernel has not even been released, so if you use it in production, you will have to use the testing versions.



Why not Friday night so that everyone can get three full nights of rest before checking to see if anything's on fire?


Because Sunday still means they get three full nights of rest before checking to see if anything's on fire.


Brilliant.


But does it need ODBC?


See you on LinkedIn!


What product are you launching tomorrow morning?


It's a new blockchain-based analytics database which uses SOAP as the wire protocol.


IPO/ICO. Now.


InstaVPN, it mines your traffic and gives you ICO tokens as a reward!


You're probably joking, but there are several "let us use your connection to ''' mine ''' coin" services, and as far as I know people are just being used as open proxies for malicious purposes.


[flagged]


Parent got your point and is joking along


Oh heck. I woooshed myself.


Don't be a party pooper. Do it on Friday at 7pm. Just after everyone thinks they have a nice relaxing weekend to look forward to.


> Intel MPX support is completely removed.

A bit sad about this one; Intel should talk to Oracle, ARM and the Cambridge Computer Laboratory about how to implement this kind of feature properly.


The Year 2038 stuff is cool. Do we have projections for how much unpatched stuff might still be around in 2038 to cause issues? The benefit of getting the fixes merged 18 years ahead of time is so those fixes proliferate, but I have to imagine there's a nonzero amount that won't/can't upgrade... Curious if there's any idea of how big an issue this could be, even if we solved everything today.


Rc1 was released just 30 minutes ago! I’m compiling away!


There's certainly a lot in there!

However, there's not much that is of interest to me. Given that, plus the mammoth amount of changes in the kernel, I think that I'll be delaying updating to this kernel for a good, long time. Just in case.


crap there's a USB4 already?


Already compiling... :)


I've rolled back to 4.19 as the 5.x kernels work very badly on my setup. Any IO makes the system unresponsive. The box is an i7, integrated video, 32 GB RAM, 512 GB SSD, running Debian testing. Run "apt update" and the mouse stops, music skips. 4.19 runs flawlessly under any reasonable load on this system.


What options do you see in /proc/cmdline when you are on 5.x?


And could you add the output of 'lspci -vv', perhaps on pastebin?


I'm seeing the same thing (and have also rolled-back).


My machine wouldn't boot when I tried it with a version 5 kernel.


This sounds like a lot of impressive work. I wonder if there would be any benefit to the Linux kernel adopting more of an agile release cycle. This waterfall-like release surely has to increase the risk surface.

Perhaps the mindset of web software and operating system kernels don’t overlap enough for this to be reasonable?


Linux development is hardly waterfall development.

Take io_uring. It started with a very small subset of syscalls and they are now adding new ones at every release at every stage.

Take bpf. It was meant for networking, and it is slowly transforming into a brand new kind of OS.

Even before a feature lands in mainline, we're far from waterfall. Usually a developer will start an RFC thread with a quick-and-dirty implementation of a feature, as a basis for discussion and so people can start to test and tinker with it to see what can be done with it.


Not sure why I'm getting downvoted. Is this an ignorant question? Or maybe people are just not fans of the word agile? Oh well. Guess I'll just have to ask someone who contributes to the kernel in person the next time I meet one.


It's a pretty ignorant question; it could also be taken as flamebait trying to start up an agile vs waterfall argument.

New stable kernel branches are released every 2-3 months after weekly release candidates, with bugfix releases in between new stable branches and a new LTS branch every year or so. If you want anything "more agile" than that for OS kernel development, then you're probably prioritizing agile dogma over the realities of trying to not break the most fundamental component of the operating system.


Agile is just as much about small features as time. IMO this kernel comes with too many changes; smaller steps might be preferable.


It sounds like you're just not accustomed to paying attention to a project with such broad scope. Most of these changes have fairly narrow impact on just a single driver or subsystem. It would be silly to say "we're going to postpone release of this new GPU driver because we have too many new network drivers in this release", because there's no reason to expect such a policy to actually improve the quality of kernel releases, and it would obviously be detrimental to the pace of feature additions.

Actual far-reaching systemic changes are a pretty small portion of this and most other stable kernel releases.


No one actually agrees what Agile is.



