His arguments seem really weak. Apparently the AMD team believes that the kernel should accept this code despite it not meeting their standards because:
* Otherwise they'll write angry blog posts about how mean the kernel maintainers are.
* They're a primarily Windows shop.
* They don't have the resources to do it right.
* They don't have the time to do it right.
* They're 'trying' to do the right thing.
* They believe that AMD is a big enough company to get special treatment.
* "But Andrrroid gets special treatment"
* There are lots of people with unreleased hardware that desperately need driver support.
* They're doing it wrong, but the current kernel maintainers are doing it wrong in a different way, so it should be okay.
* Graphics drivers are what's preventing the 'year of the Linux desktop'.
> There's only so much time in the day; we'd like to make our code perfect, but we also want to get it out to customers while the hw is still relevant.
And this has always been the damn problem with AMD's drivers, even in their wheelhouse, Windows. They just have a slipshod attitude toward the software end of their core business.
I think you have answered your own question here :)
Driver support on platform -> Games released on platform -> Gamers using the platform.
The upshot is, kernel developers have long complained about nVidia for the route they took. When I install Linux on a reasonably recent desktop I still have to fiddle with boot options and/or blacklist the nouveau driver for some nVidia cards. And now nVidia's approach has been completely vindicated.
ATi was legendary for their terrible drivers. The fglrx drivers took literally 5 years to not suck on first-generation Radeon cards. By "not suck", I mean that the system would not have kernel panics at least once a day because of them. While they were piling on support for newer cards in their proprietary stack, they were leaving triaged bugs open for years. When AMD bought them out, they didn't fire every developer they had and replace them with better talent, which is why we're seeing this LKML thread today.
I attempted to use ATi/AMD video cards in systems every time they'd make a big linux announcement, to attempt to support them for their decisions. Every time, I would have new linux converts telling me they were going back to windows due to how unstable their linux systems were, all thanks to these terrible bug-ridden drivers.
I and many other linux zealots now refuse to use ATi/AMD video cards to this day for that reason. It's clear that AMDGPU is only slightly improved from that situation, so I don't anticipate myself changing that opinion any time soon. The open radeon/radeonsi/radeonhd drivers are substantially more stable than the proprietary ones, adhere to kernel standards, and are now starting to reach feature/performance-parity with the proprietary drivers, with barely any help from ATi/AMD.
The Linux kernel will be just fine without ATi's terrible code infecting the tree. Eventually, the vendors start playing ball once they realize the kernel doesn't budge on code quality/standards, as the net80211 debacle proved. If they had just released all the specs without substantial NDA's in the way, ATi could rely on the volunteer kernel hackers to produce a driver that outperformed their windows equivalents, just like how Intel is currently benefitting.
I don't have an AMD card handy but could you tell me more about the open radeon drivers? And if we do have those, why did this discussion start in the first place?
Are you talking about the Intel firmware or the OpenGL Mesa drivers? I would say it has improved, but the one remaining pain of tearing when playing videos and panning windows makes me really sad.
This discussion (which if you read the dri-devel thread, ended with the ATi guys saying "we're sorry. we'll do better") is happening because AMD's marketing department dictates that they keep as much of the driver closed and developed in-house as possible. They're completely opening the kernel driver, but every layer above that is planned to be closed. There will be a free Mesa/Xorg/Wayland stack developed by volunteers in parallel with the proprietary drivers, for those that actually want to use their computer and not experience kernel panics (at a performance cost, due to not having full documentation of the card). Oh yeah, the proprietary drivers haven't even begun support for Wayland yet, and probably won't for another year or two.
This is a step up from before, mind you; AMD's previous setup under fglrx was a closed-source kernel driver with binary blobs, which would usually require a recompile every time you upgraded your kernel. In addition, because their software team was insane, they would take a snapshot once a year of the mainline kernel/X11, and build against that instead of merging to the latest tree. This meant that all distros would have to pin their kernel/Xorg packages to a specific version (again, a year out of date), otherwise you just plain wouldn't have accelerated video. Arch (and other rolling distros) got so fed up with it that they stopped building fglrx in their repository, forcing people to use the open radeon/si/hd drivers. You had to do a rather immense amount of voodoo (add a third-party repository, pin xorg/kernel to ancient versions included only in that repository, downgrade packages) just to get them working. If there's an exploit in the kernel or xorg, tough. If you want to have system uptime measured in days/weeks/months, tough. If you want Xinerama (good multi-monitor that plays nice with xrandr) support, tough.
In comparison, this new situation is a lot better. AMD is at least playing nice with the kernel developers, and the development team has fought enough with the marketing department to embrace a semi-open development model, which is where this dri-devel discussion comes in. We also got a pretty good inside view of the way that AMD develops drivers/hardware, and some amusing commentary by the dri-devel maintainers (including Intel employees) about how screwed-up AMD's internal culture is to produce these sorts of problems. For instance, their software team cannot communicate with the hardware team, because by the time the drivers are started, the hardware team has moved on to another card, and all the knowledge is pretty much lost/changed/irrelevant. This bears repeating: the hardware and drivers are developed separately and at different times.
The end result of the thread is that they're going to split this 100,000 lines of hardware abstraction into much smaller chunks and merge it piece by piece. Hopefully this means that AMDGPU will be worth using in Kernel 4.10 or 4.11, depending on how long it takes.
I'm still using intel's mesa/kernel drivers in my chromebook. Yes, tearing/panning is gross, but if I had to pick between Intel and AMD's code on my machine, it's a no-contest.
Except for the annoying tearing. I'm considering a RX480 for my next GPU thanks to it.
Example: Ubuntu cannot consistently ship a version of NetworkManager that supports reconnecting to a wifi network after suspend without being manually restarted.
I suspect that in dollar terms there is almost no point, but I don't have access to enough data to prove it.
Your experience is limited, and tainted. wpa-supplicant by itself can accomplish what you desire; NetworkManager (poorly) adds three or four extra layers of abstraction on top of that.
Meanwhile, Wicd and connman do exactly what you need, and don't constantly crash while doing it.
Blame ubuntu for following Red Hat's lead, and using their "solutions" to the problem.
I can understand AMD wanting to unify their codebase. I agree with the parent that they shouldn't worry so much about upstreaming into the kernel. I think their HAL approach is the right engineering one, given the constraints that their team is working under. The team seems to be trying to do the best with what they've been given.
Also, hasn't the graphics division been spun-off into a separate company now (Radeon Graphics Group)? Or is it just a more focused internal division within AMD?
Their paid developers are being outclassed by volunteers who write more solid drivers. Somehow this doesn't sink in.
Well, as far as I understood, that's exactly what the entire discussion is about: their HAL, which would allow them to reuse large parts of their well-optimized and tested Windows code, and the attempt to integrate that into the kernel.
Later in the dri-devel thread, they even admit that DC isn't even being used in the Windows drivers, because by the time the software devs get to write drivers, the hardware devs they should be hooking up with have already moved to the latest/greatest thing, and can't be bothered with legacy.
AMD's drivers are far from "well-optimized", and their internal developer culture perpetuates this problem.
That being said, the current situation is bleak, and no, they never made a ridiculous amount of money. They've had to go to pretty extreme lengths not to go under already (eg, selling off their headquarters building, then leasing it back, just to scrape together some quick cash). They're very, very cash strapped. So much so that the news they signed a somewhat nebulous licensing deal with a Chinese company to help them make servers caused their share price to jump 52%, just because they'd be getting ~$300m in licensing fees.
How can you lose billions of dollars and still be in business? Are they borrowing the money from someone? Where is the money they don't have coming from?
Each year you spend $100 million on salaries, rent, and materials, and you earn $500 million in sales, leaving you with $400 million in the bank at the end of year 1, $800 million after year 2, $1.6 billion after year 4, etc.
Pretty good, right? Not really.
You're cash flow positive to the tune of $400 million/year, but you're not profitable. You spent $9 billion on that fab; since it'll last for 15 years that means each year costs about $600 million. Or to put it another way, at the end of 15 years you'll have $6 billion in the bank, but you started with $9 billion. Turning $9 billion into $6 billion is the opposite of a profit. And since it's not enough to build a new fab, it's also the opposite of "having a functional business".
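The arithmetic above can be sketched in a few lines, using the same (utterly hypothetical) numbers and assuming simple straight-line depreciation of the fab over its 15-year life:

```python
# Cash-flow-positive but unprofitable: the parent's hypothetical fab numbers.
FAB_COST = 9_000_000_000       # up-front cost of building the fab
FAB_LIFETIME_YEARS = 15        # useful life of the fab
REVENUE = 500_000_000          # sales per year
OPERATING_COSTS = 100_000_000  # salaries, rent, materials per year

annual_cash_flow = REVENUE - OPERATING_COSTS            # +$400M/year in the bank
annual_depreciation = FAB_COST / FAB_LIFETIME_YEARS     # $600M/year of fab "used up"
annual_profit = annual_cash_flow - annual_depreciation  # -$200M/year: a loss

cash_after_15_years = annual_cash_flow * FAB_LIFETIME_YEARS
print(annual_cash_flow)     # 400000000
print(annual_profit)        # -200000000.0
print(cash_after_15_years)  # 6000000000 -- less than the $9B you started with
```

Cash goes up every single year, yet over the fab's whole life you've destroyed $3 billion of value, which is exactly the "opposite of a profit" described above.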
Another example might be selling off a profitable business for an injection of cash. The cash helps you pay salaries and keep the lights on, but if that's all you do with it you're now even less profitable than when you started. Or as in AMDs case, you could sell off your headquarters, then lease it back. You get a pile of cash initially, but you then have to pay it all back and more just to keep using your headquarters, and the increased costs will lower profits.
Similarly, if you can convince people to keep investing, you can keep running at a loss without running out of cash.
(All numbers utterly hypothetical. I'm also simplifying a lot.)
Accounting has several layers:
- Cash flow: this is what you'd look at for your lemonade stand. Actual money comes in and goes out (either "cash cash" or you bank balance, both is "cash" in this regard)
But that layer isn't the most important one for incorporated companies. Yes, running out of cash is a problem. But what usually / actually happens is failure on the "value" level:
- Your company has a value, of which cash is only one, usually small, part. Stuff you own, like buildings and patents and brands, is another. So is debt your customers have with you. On this level, you can spend money without any effect on the value: if you buy a skyscraper in Manhattan, you may spend $2 billion in cash, but you get a $2 billion building in return. You can also increase the value ("make a profit") without actually getting any money: if you sell the skyscraper for $4 billion on December 20th, 2016, you've made a $2B profit in 2016, even though the money will only arrive in 2017.
The reasoning is that this system results in a more accurate picture of a company's finances.
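The skyscraper example can be made concrete by tracking the two layers separately, assuming the simplified two-layer model described above:

```python
# Cash accounting vs accrual accounting for the hypothetical skyscraper deal.
from collections import defaultdict

cash_by_year = defaultdict(int)    # cash layer: when money actually moves
profit_by_year = defaultdict(int)  # value layer: when value is created/destroyed

# 2016: buy the building for $2B cash -- cash leaves, but company value is unchanged
cash_by_year[2016] -= 2_000_000_000

# Dec 20, 2016: sell it for $4B -- the $2B profit is recognized now...
profit_by_year[2016] += 4_000_000_000 - 2_000_000_000

# ...but the buyer's money only arrives in 2017
cash_by_year[2017] += 4_000_000_000

print(cash_by_year[2016], profit_by_year[2016])  # -2000000000 2000000000
print(cash_by_year[2017], profit_by_year[2017])  # 4000000000 0
```

So 2016 shows a huge profit with negative cash flow, and 2017 shows a huge cash inflow with zero profit, which is why the value layer gives the more accurate picture.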
Looking at the quarterly data, as of Sept 2016, you can see in the Balance Sheet, there is a "Capital Surplus" line showing they have raised $8.2B of equity over the life of the Corp. Now look at the "Retained Earnings", they have lost $7.7B of it. They also have $1.6B in debt.
AMD's existence prevents Intel from having to face prosecution for monopoly status.
Additionally, think about Google, Amazon, MS Azure and all the other big players who push large volume on Intel's higher end SKUs, what do you think they will do if Intel becomes their sole source vendor? I'd predict all 3 will take their current dabbling in ARM servers and amp it up, since being captured by a single vendor is a serious issue for all of them.
Correct Link: https://en.wikipedia.org/wiki/Advanced_Micro_Devices,_Inc._v....
1 - http://www.theverge.com/2014/6/12/5803442/intel-nearly-1-and...
Even that is misleading... More recently, AMD lost $406 million between June to Sept 2016.
The reply below, however, is an appalling response, really:
I realize you care about code quality and style, but do you care about stable functionality? Would you really merge a bunch of huge cleanups that would potentially break tons of stuff in subtle ways because coding style is that important? I'm done with that myself. I've merged too many half-baked cleanups and new features in the past and ended up spending way more time fixing them than I would have otherwise for relatively little gain. The hw is just too complicated these days. At some point people want support for the hw they have and they want it to work. If code trumps all, then why do we have staging?
Wut? If you want stable functionality, then you follow the kernel conventions, practices and coding styles.
This whole screed equates to: "We've got some code we've put together in our silo, you amateurs who want perfection are wrong and our unified layers and push to get our processors to market mean you can't comment on our code."
The bit about cleanups makes little sense to me, they seem to have some fundamental issues baked into their code that will necessitate these very cleanups...
From a hardware and driver development standpoint, I think that makes a decent amount of sense. Hardware is often weird and quirky, so making sure you handle all the edge cases in two different codebases is a lot harder than doing it once. Certainly too much work for a handful of people who just want to plug their code into Linux.
Stability here is on the driver/hardware interaction, not the kernel/driver boundary, which is what the kernel folks seem to care about, naturally.
I think not. So no, it didn't help.
And it's not even true what AMD says - that to commit to the kernel you need to be a funky part-time hacker or an immense behemoth. There are tons of small hardware shops out there with drivers in the kernel, written by paid developers. True, a graphics card is much more complex than any other appliance, but the characterisation is extremely wrong - the most active part of the kernel is the drivers, and they're not written by "redhat + movie-style hackers", but largely by the companies that make the hardware.
I don't believe he is speaking on behalf of AMD. He is speaking for himself and trying very desperately to shift his blame onto anything he could think of at that time.
He clearly tried to blame the problems caused by his technical decisions on subjective and social reasons.
I don't believe that an AMD PR employee would be so stupid as to ever think of saying such nonsense, particularly in such an egregious manner.
The developer claims he has been working for AMD for 10 years, and yet he still is unable to write a driver, let alone get one accepted. Facts speak for themselves.
I loved that line from his post, in particular. "We have the resources to design a billion-transistor GPU, but not to write a Linux driver for it."
Poor management on AMD's part does not constitute an emergency on the Linux kernel's part.
Linux gaming is a blip when you compare it to (a) the total gaming market, (b) the professional creative market and (c) the big data/deep learning market. All of which AMD is selling graphics cards into. And it's not like Linux is even some massive growth market that justifies some risky investment.
This is not true for the animation/CGI/effects industry where Linux not only looms large but is growing.
I'm also sceptical of Linux having a small share of big data/machine learning, since this is adjacent to servers, where Linux dominates. Would you mind sharing your sources? Nvidia's CUDA is cross-platform, and this sets a baseline for whatever response AMD is planning.
the sooner fglrx dies, the better off the planet will be.
Today's 3rd spot on Top500 is occupied by a GPU-based supercomputer.
...which is equipped with NVidia cards.
And... I am not sure AMD has resources to design good GPUs either, the GPU I own, the 380X, is hot, power-hungry and buggy.
The RX480, launched recently to replace it, draws too much power from the PCI-E slot and can damage people's motherboards and riser cards/cables.
So I am not entirely sure they have the resources for their hardware division either; in fact, the recent card launches from them all looked "rushed" in some way, and undertested. Some bugs have been haunting their cards for 3 years now, and they don't even bother putting them in the "known issues" list anymore, because they have no idea how to fix them, despite having a 300-page+ thread on their support forums about it with people contributing lots of information.
But that only works when you're in the programming equivalent of a 20 member socialist commune. It doesn't work at scale, and the Linux kernel is the effing king of large open-source projects. At scale, you must have standards to ensure the value added is higher than the cost of maintenance. Who's going to set the standards if not the kernel maintainers? The contributors? That's anarchy, leading to collapse. The wider community, in some democratic fashion? Easiest way to kill momentum and thus kill the project.
Honestly I hear this quite often from big companies and it always is the biggest load of BS ever..
Perhaps it's just internal politics: the company did in fact allocate the resources, the people put in charge failed to do their job, and once their screwup went public they opted to shift the blame elsewhere.
AMD has the resources. The problem seems to be getting them allocated to this particular issue. I also find your phrasing interesting. Doing it "right" is always my top priority, and every compromise from that is considered and balanced. In my experience, not doing it right is almost always more costly, but sometimes necessary to appease someone. It's interesting that Dave is standing his ground against a multi-billion dollar company for their own long-term good - or at least what he perceives it to be.
I, as a full-time Debian user for the past few years, get it. AMD's Linux devs heavily dislike the direct rendering manager and want to provide all the cool crap their proprietary driver does to Mesa, because DRM is legacy tech and they have over 1000 SKUs of cards to support, competitors breathing down their neck on the GPU, CPU, and SoC sides, and they want to get this shipped.
I hope I've illuminated the state of the situation for you. I do think the right decision was made not to merge this code, but be realistic about AMD's position: they are so far in the hole financially that customers are having chips made to order after paying, meaning it's at least 3 months from the time of order until you get your chip. They spend nearly all that time making the silicon, then packaging it so it can go onto your motherboard.
What was that 2001? I can probably dig up an old slashdot and find it...
One recent example is that I was debugging a rogue DMA issue in a driver that uses the LinuxKPI shims. I wanted to use the Intel DMAR to try to catch the DMA, but because of the use of the LinuxKPI shims, the driver would not work at all with the DMAR. We had to improve the LinuxKPI shims to do busdma, rather than just use pmap_kextract() to convert kernel virtual addresses to physical addresses (and this was a hack, because there is a gigantic impedance mismatch between dma_map_single and busdma). And, as soon as we had the driver working with the DMAR, we caught the rogue DMA.
Weeks of my time could have been saved if, instead of writing to Linux, the vendor driver had included a full HAL that supported busdma. Instead, they wrote to the LinuxKPI.
And I fully blame the Linux kernel maintainers for this. They're on top of the world and can dictate to hardware vendors to remove their portability shims. Meanwhile, other OS projects get the dregs.
Companies don't write FreeBSD drivers because there's no ROI for the hardware companies. Making the drivers doesn't help them sell more hardware. Likewise, the LK devs made these decisions because their vision of the kernel doesn't involve HALs, not to spite FreeBSD.
On the other hand, I'd really love to see the Linux team getting the same treatment that Microsoft gets whenever they encourage lack of portability, even where portability would be irrelevant. A whole thread about this, and not once was the word "EEE" uttered.
I'm sure business had nothing to do with the Linux team's decision here, I'm just a little pissed at our double standards ("our" as in the open source/free software community of users and developers). AMD's criticism is not without valid points. Getting drivers to work (let alone in upstream) while the hardware is still relevant is difficult and requires a lot of maintenance, hacking and testing due to things like API changes, undocumented/shifting ad-hoc conventions and so on. Driver development on Linux is very much unlike what you expect with a Windows background; the sheer fact that they managed to convince a largely Windows-only shop to let them do it, with an eye to the future, is amazing.
I'd have expected to see questions like "why did these guys write the whole fsckin thing, all 100,000 lines of it, and only found out it's not upstreamable now". I've been hearing of AMD trying to get their Linux drivers in good shape for a long time now. "We don't do the thing that is most fundamental to your architecture" looks like the kind of problem that could have surfaced within, I don't know, two emails?
Edit: I do think that the LK maintainer was right not to merge this. What bothers me is that everyone's focusing on everything except the examination of the technical issues and what would benefit Linux users.
AMD was told 6 months ago that it wouldn't be merged if they didn't follow certain guidelines. Then they didn't follow the guidelines.
What bothers me here, is the way AMD's management has handled the whole thing.
It seems like management demanded it be a certain way, and the coders were forced to build something they knew would be unmergable. And then management chucked a hissy fit.
There's been really good work here, and management has got upset, rather than follow guidelines, or nVidia's example.
It reflects really badly on the company, which is sad considering the space left for AMD by nVidia's Optimus kerfuffle.
There is a market for GPUs here, but it does need to show some professionalism, which they (management) haven't.
It may seem -- and may well be -- a sub-optimal solution, but it's not worse than what we have now, and AMD looks willing to commit to the long-term support of the HAL and the drivers. This is likely something that they want to do not just because they're lazy and would rather spend the money on something else -- it's likely that their management genuinely sees the development and maintenance of an entirely unabstracted set of drivers for Linux as inefficient, especially when you look at how much money they make out of it. And they aren't entirely wrong.
Deucher's remark about the Red Hat silo may look malicious and abrasive, but it has a glimpse of truth. I could make a really cool photo album by taking snapshots of developers and managers who are only familiar with Windows and hear about the challenges involved in writing (and upstreaming) a non-trivial Linux driver.
I'm not saying that the driver should have been merged as it is just because there's no alternative. I do think, however, that it's a little presumptuous to think its architecture is the way it is just because managers are stupid. Maybe a third option, that's not HAL but also addresses the concerns and requirements of AMD exists.
This essentially gives a nod to the way Nvidia's been treating Linux.
Additional shit in mainline increases the maintenance burden. If the code isn't putting upstream maintainability above all else in its implementation, then it's generally not getting merged unless someone wasn't paying attention or some other forces compelled an exception.
Alex's first reply reads like he's willfully ignoring that aspect of the NACK. It's not a coding style issue, it's a HAL issue. Upstream isn't interested in maintaining a HAL and living with the impedance mismatch from day zero. The driver can stay out of tree. This is just about code getting into mainline.
I'm personally very happy to see the gpu subsystem maintainers paying attention and having the maturity to know a fool's errand when they see it.
We're talking about having a constant up-to-date driver on par with Windows for a major GPU card manufacturer. A driver which has caused a large amount of desktop users to return to Windows due to its historical issues.
While I get the reasoning for rejection, the typical OSS rejection attitude is also problematic. I kinda don't see a dialog happening here on how to resolve problems both sides have, just a repeat of posturing which has historically brought us awesome things like binary blob drivers that only work with single version of kernels (yes, I'm looking at you every ARM GPU ever!).
Nvidia manages to do this without any conflict with the kernel maintainers by simply not mainlining the code. If functionality trumps code quality, AMD can release kernel modules.
Perhaps the best current solution for all parties is for AMD to publish the sources on GitHub and release kernel modules while accepting PRs from the community to bring the code up to the kernel's standards. When the time comes, it can be duly mainlined.
> Nvidia has been the single worst company we've ever dealt with.
> - Linus
Kernel devs are antagonizing the only two GPU makers that matter.
Besides the kernel, there has been some flame going between NVIDIA and Wayland devs too.
Open source devs are immature men who do not understand the word compromise.
This had nothing to do with the kernel and everything to do with the lack of Optimus support (years later, it's still shit).
> Kernel devs are antagonizing the only two GPU makers that matter
Maybe the 2 should ask Intel for some pointers on how to contribute to the kernel the right way.
> Open source devs are immature men who do not understand the word compromise.
I suspect that's part of the reason the kernel is stable, and for that I'm thankful. Not compromising on code quality is something I wish more projects would do, if they had the well-earned political/social capital the Linux kernel has.
Intel is not waiting for them to ask:
> This is something you need to fix, or it'll stay completely painful forever. It's hard work and takes years, but here at Intel we pulled it off. We can upstream everything from a _very_ early stage (can't tell you how early). And we have full marketing approval for that. If you watch the i915 commit stream you can see how our code is chasing updates from the hw engineers debugging things.
(More specific advice follows.)
If AMD manages to reach that level of out-of-the-box working graphics driver, that will definitely reflect positively on their brand of graphics cards, and might give them an advantage over Nvidia.
I suspect that working with the kernel developers and maintainers will also be beneficial to the driver itself. It seems to me that for cooperating you gain valuable feedback on your code from people well-versed in kernel and driver code.
I extend my sympathy to anyone paid to contribute code to Linux.
Most people who are paid to contribute to Linux, including myself, are very happy to do so, and no, it's not a case of Stockholm syndrome.
I keep seeing already cash-strapped projects go off the rails because someone decides that they need a gender-oriented outreach program, complete with elaborate gatherings and whatnot.
And when that crashes and burns, their excuse for the cash bonfire is that the FOSS world is misogynistic...
In late October they dropped a steaming pile of 100k lines of code with a HAL and are whining it isn't getting merged.
The solution is AMD keeps code out of mainline or obeys kernel code standards.
Now we're up to 100K+
Who's to say its replacement will be any better? If the hygiene isn't there, maybe the quality isn't either.
Kernel developers do not care. They care about good software.
Define "good." Is quality a thing just defined by code style and properness, or is it defined as fitness for a purpose, connection to human use and usefulness, usability or function, or lack of defects to the end user?
Define "software." Is software just code, or does it also include the experience the code generates for the user? Is software only considered in terms of what developers of software are interested in, or does it include what users need and want?
Of course Linux kernel developers are only interested in narrow definitions of those things, and that's surely part of the problem. It's good to think about what "good software" really means, in my opinion.
That being said, I do think Dave probably made the right call, but I also think he may need to compromise on this one. AMD, for its part, should strategically try to get their GPUs supported. Couple that with merging an OpenCL version of TensorFlow or something, and AMD would probably be more valuable than Nvidia. I think both groups could benefit and should probably sit down and try to work it out in person.
I think we’d all have some sharp words we’d like to use in that situation even if the better part of our natures might counsel us to keep them to ourselves.
It was as unrealistic six months ago as it is now. And yet, I still don't see a constructive debate from the Linux (or AMD) side on how to sync the goals. All I see is Linux people posturing about how they don't want to be the second platform, and AMD refusing to rebuild the driver just for them.
These kinds of attitudes bring us awful situations like Android Linux devices each having a broken fork of the Linux kernel to accommodate closed proprietary blobs, because neither side is ready to step together.
Certainly, from what I've seen the first few stable releases after every new kernel version are often full of fixes for modesetting-related regressions.
Which ends up in the same situation as older catalyst.
Maybe for you; every kernel upgrade I have done in the past would break my desktop for a day or two until a patch was released. Not to mention the oddities and artifacts I get from time to time. Things I would find unacceptable and embarrassing if they happened on Windows or Mac, I have had to learn to live with on Linux.
I am probably out of luck, but I think I would start to see if FreeBSD would be a good replacement for Linux.
Reconfiguration/rebuild on every kernel update is definitely a hassle, and the NVIDIA scripts which try to automate the process still occasionally fail (IME, generally due to conflicts with the system package manager constantly deciding it knows best and trying to displace them with its own versions). But that's essentially an unavoidable consequence of combining out-of-tree drivers with Linux's extreme no-stable-driver-ABI-ever design, which requires all drivers to be rebuilt directly against the exact current kernel source: you always have to rebuild the bindings on kernel upgrade. The bindings can still be maintained by the distro vendor and its package manager, so they're centrally rebuilt (and fixed if needed), and updated in lock step with the kernel. This works well in Ubuntu, at least, these days. But absent that, the Linux model fundamentally requires local rebuild automation (or full manual rebuilds), which is prone to breakage, like any other "just download and rebuild from scratch against a new version" you've ever done, kernel/driver or otherwise.
Of course, if your distro tracks the latest kernel releases closely (e.g. Arch Linux), then it's more problematic, since API changes in the kernel will actually cause compilation failures. But actually making a rebuild happen? Easy. Debian derivatives tend to keep stable kernel versions for years and years, so DKMS works quite well there for out-of-tree modules (another example is ZFS on Linux).
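To make the DKMS point concrete: an out-of-tree module only needs a small `dkms.conf` for the distro tooling to rebuild it automatically against each newly installed kernel. This is a minimal sketch with made-up names (`examplegpu` is purely illustrative, not AMD's or ZFS's actual packaging):

```sh
# dkms.conf -- hypothetical out-of-tree GPU module (all names illustrative)
PACKAGE_NAME="examplegpu"
PACKAGE_VERSION="1.0"
BUILT_MODULE_NAME[0]="examplegpu"
DEST_MODULE_LOCATION[0]="/kernel/drivers/gpu/drm"
AUTOINSTALL="yes"   # rebuild against each newly installed kernel
```

With the source dropped under /usr/src/examplegpu-1.0/, `dkms install examplegpu/1.0` builds it against the running kernel, and `AUTOINSTALL="yes"` lets the kernel package hooks rebuild it on every upgrade, which is exactly the automation the manual rebuild-from-scratch workflow lacks.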
Don't you agree that it is very silly to try to force cross-platform code into the very source of a very specific platform?
After all, the people working for that platform need to actually maintain that code.
Or open-source it and develop it along with upstream, but don't make it part of upstream.
I'll start with a short answer - AMD wants to have at minimum a reasonably good baseline functionality on linux, out of the box. If that answer doesn't make much sense, please read on for the complex bits.
To provide some historical context, Linus gave a VERY public, VERY brutal rant on NVIDIA and their drivers four years ago. At the time, NVIDIA's closed drivers were an opaque blob which basically reimplemented the entirety of OpenGL. These collided with everything else.
The open drivers (nouveau) were slow and far behind in features. The situation was bad enough that you couldn't necessarily even install a linux distro on a system with an NVIDIA chip, because the open drivers wouldn't work well enough for X and/or the desktop environment to initialise properly, let alone remain up and functional.
In effect, if you wanted to run linux, you were best off without NVIDIA.
Fast forward a year and a half. NVIDIA had come out and committed to improving the linux driver situation. They still couldn't open source their current-generation drivers, but they had realised that the horrible reputation of their hardware on linux was going to be a persistent PR nightmare, and thus an existential threat to their growing mobile division, a space where they were up against Imagination Technologies. (Don't get me started on ImgTec, please...)
NVIDIA needed to get their hardware and software processes aligned in a way that they would work reliably out of the box on linux, everywhere. Even if the user only needed to keep the freshly installed or upgraded system up long enough for them to download and install the closed drivers, the system really should not break. Incidentally this meant that even the open drivers should be "good enough".
Over the past 3+ years, the situation has improved. NVIDIA has managed to shed their reputation of being completely broken on linux, they have worked closely with kernel folks to get their more recent hardware supported sensibly out of the box and in the process (I believe) they have managed to reduce the code delta between their windows drivers and linux drivers.
It helps that during the same 4 years we have had OpenGL on mobile drive the separation of duties between EGL and GLES (v2+). These changes, with all the refactorings, have also provided a cleaner split on the desktop, to the point that it is no longer absolutely necessary to provide a full OpenGL implementation. You can, for the most part, expect that EGL just works; that DRM and KMS both just work; and that your highly optimised GLES implementation can happily live on top of these layers.
As a result, your closed driver offering has less to override. Of course it's going to be less unstable!
Disclosure: in my previous job, I helped integrate a couple of EGL+GLES driver stacks with Wayland on mobile systems. I learned to hate mobile GPU drivers with a passion.
P.S.: I haven't read enough about Vulkan to know if it improves things or not.
Nvidia hasn't improved performance or contributed features to nouveau.
Their changes are limited to code that is mobile-chip-specific, and from time to time they contribute a reliability fix that affects non-mobile chips, but only if it benefits mobile.
Since Maxwell 2 (9xx+), their chips are designed to be hostile to nouveau by requiring firmware loaded by the driver to be signed by Nvidia (the hardware refuses to load firmware that wasn't signed by them). It means that without Nvidia's blessing, nouveau can't, for example, change the fan speed (but can still change clocks! How ridiculous is that?).
Nvidia contributed signed firmware loading for non-mobile chips (so-called SecureBoot), but only because it's also required by mobile. And they still have not released enough firmware images for desktop cards to be usable...
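The gating mechanism described above can be sketched as a toy signature check. To be clear, this is purely illustrative and is NOT Nvidia's actual scheme (which is implemented in hardware, with keys that never leave the vendor); the names and the use of HMAC here are my own assumptions for the sake of the sketch:

```python
import hashlib
import hmac

# Toy model of the vendor-signed firmware gate: the "hardware" only runs
# firmware whose signature verifies against a key only the vendor holds.
# Purely illustrative -- the real scheme is not HMAC-based.
VENDOR_KEY = b"vendor-only-secret"

def vendor_sign(firmware: bytes) -> bytes:
    """Signing step only the vendor can perform (they hold the key)."""
    return hmac.new(VENDOR_KEY, firmware, hashlib.sha256).digest()

def hardware_load(firmware: bytes, signature: bytes) -> bool:
    """The chip refuses any firmware whose signature doesn't verify."""
    expected = hmac.new(VENDOR_KEY, firmware, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

fan_fw = b"fan-control microcode"
blessed = vendor_sign(fan_fw)

print(hardware_load(fan_fw, blessed))                     # True: vendor-blessed blob loads
print(hardware_load(b"patched " + fan_fw, blessed))       # False: modified blob is refused
```

This is why nouveau is stuck on those chips: it can write a replacement firmware image, but without the vendor's signature the hardware simply won't run it.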
> Nvidia hasn't improved performance or contributed features to nouveau.
Fair point, I didn't realise it could be read that way. Thank you.
Nvidia has contributed enough fixes to make their hardware sort of respond out of the box. Yes, primarily for devices in the mobile space, and occasionally for non-mobile ones when the same fixes happen to apply. I believe this is a direct result of the same hardware designs being used across the board.
I didn't mean NVIDIA were being particularly nice, although I didn't know about the active hostility against nouveau. (IIRC their driver employees are contractually prevented from contributing to nouveau, but I can't find a reference. At least there I can understand the reason.)
Depends on your choices. I guess if you throw Intel in there as an option, then maybe. But as of 4 years ago, if you were running a desktop -- Nvidia was pretty much the only acceptable choice if you wanted discrete graphics on Linux of any form, IMO, and driver quality was a massive part of why this is true.
Optimus, though -- that's what was really the debbie-downer for Linux/Nvidia, and Optimus is what inspired the question which led to Linus's "Fuck you" rant, because Optimus support was so bad, and is still bad.
But the mainstream cards have worked fine for a long time, and, at least for me and everyone I know -- were the only acceptable ones for Linux, until relatively recently, and most certainly as of ~5yrs ago. As opposed to AMD, where I don't think I ever heard of a single instance of fglrx ever being anything but a nightmare.
Perhaps it was strategically important to them to be upstream (marketing point, driver ubiquity)? It may not be the year of Linux on the Desktop, but GNOME and the like are well placed to grow in usage over the next 5 or so years. As others here have said, other than for average desktop users, I don't see this as being as important to more technical Linux users (including corporate).
Speaking as an ATI user, having the drivers in the kernel means that the GPU just works right out of the box.
No need to mess around with drivers. Just install a distro and you're good to go.
"I brought up the AMD culture because either
one of two things have happened here, a) you've lost sight of what
upstream kernel code looks like, or b) people in AMD aren't listening
to you, and if its the latter case then it is a direct result of the
AMD culture, and so far I'm not willing to believe it's the former
(except maybe CGS - still on the wall whether that was a good idea or
a floodgate warning)."
And, Dave's absolutely key point:
"Code doesn't trump all, I'd have merged DAL if it did. Maintainability
Or you can run CPU only - if you have an eternity to let your training loop run.
It seems to me that they made no threat. They simply endorsed NVidia's product line for anyone interested in running Linux on the desktop.
That is only relevant if any third-party has any interest in taking up the challenge of developing an alternative driver.
If you aren't going to take the trouble to pull out short quotes to respond to, please just delete the message altogether. All our clients have excellent support for threading these days, and we can find the message you're responding to without needing it repeated in full every. single. time.
(This criticism also applies to the message this was responding to: https://news.ycombinator.com/item?id=13136426)
The kernel is not AMD's kitchen sink :) The maintainers are always fighting to keep holes like this out. Maintainers become maintainers because the community thinks they have the required capabilities. The best way is to argue on the technical side: for example, why you need such complexity, and how you will keep that complexity under control. A non-technical response makes no sense and lowers your reputation in the community.
I understand why AMD wants to have one code base, but the kernel needs to have one consistent code base and style and can't afford to do otherwise. There's nothing stopping AMD distributing the code, it just won't be upstreamed.
The net result of attitudes like these is that I can't have a working multi-monitor desktop Linux setup, because everything is hopelessly broken and requires many hours of tinkering just to kind-of sometimes work.
I find it sad.
Apple has been excelling at this recently, too, with the forced move to USB-C and the dropping of the headphone jack, to "make progress" at my expense, where I am held hostage in dongle-world as companies "move to new standards".
I take issue with the held hostage hyperbole, though. Yeah, you're inconvenienced by the OS/hardware options you have available. Your freedom of movement is not at stake. Your life is not at risk. I know it's fashionable to take poetic license and raise the stakes for everyday inconveniences, but let's keep things in perspective. Nothing's preventing you from using your old phone, or switching to another platform, or buying a couple of inexpensive dongles (very inexpensive relative to the hardware you've chosen to buy into).
KDE usually screws up when I plug a monitor into my notebook, but Xfce doesn't.
It's not a kernel issue.
KDE has a guy who is single-handedly awesome (though of course a bit strange when it comes to interacting with mortals) and who works on these things: https://blog.martin-graesslin.com/blog/2016/09/to-eglstream-...
And note that it's NVIDIA that's currently holding back the year of the multi-monitor Linux desktop, because they're not supporting Wayland like the others (Intel, AMD, Android).
Sadly, the latter people are now in control of "X" development, or perhaps I should say post-X development, thanks to Wayland.
Intel got their sh*t together; why can't AMD?
The KMS/DRI driver itself has certainly had _major_ unsolved issues basically forever. The KMS driver for Broadwell graphics was trashing the screen with massive horizontal flicker until kernel 4.8. That's not a long time ago. It's still not solved, BTW; it just got less frequent. Random pipeline stalls happen on an hourly basis. Tons of graphical glitches and random performance issues with the "glamor" accel path. I've stopped reporting issues on their tracker, as they just get ignored.
The performance of the KMS driver is also inferior to their existing Xorg driver in a number of very important scenarios (XRender is particularly affected).
Sure, I do have Vulkan drivers as first class, but my screen flickers, I get graphical corruption, and the driver hangs the entire system with certain shaders. I can see Inkscape repainting before my eyes like it's '84. Wow. And I'm trying that with 4.9-rc8.
I've been using laptops with integrated graphics for almost 10 years. The moment you can get the intel driver to work half-decently, they're already rewriting it. I wish I was kidding.
Intel does have the money. They're actually doing worse in my mind.
> The reason the toplevel maintainer (me) doesn't work for Intel or AMD or any vendors, is that I can say NO when your maintainers can't or won't say it.
And telling them to sit in a corner and really THINK about what you've done, young man:
> I'd like some serious introspection on your team's part on how you got into this situation and how even if I was feeling like merging this (which I'm not) how you'd actually deal with being part of the Linux kernel and not hiding in nicely framed orgchart silo behind a HAL. I honestly don't think the code is Linux worthy code
All pretty mild, really, but I'm not the one who got the email telling me my months of work was not Linux-quality and would not be merged.
(I am giggling at the idea of "Linux quality" being held up as an ideal)
See, the thing is that they were told previously -- back in February? March? -- that this wasn't gonna happen. Instead of trying to work out a better way forward, AMD effectively ignored that, went back to work, and just now came back and said "here's 90k LOC, please merge".
Dave replied -- rightfully, in my opinion -- "no".
Certainly pretty mild considering the source, though I understand why the second quote would stir things up a bit, especially making it into an "Us vs Them" situation.
Much more constructive interactions, I think.
Could someone here fill in the details for the uninitiated? Do the kernel devs feel a pressure for the kernel to stay relevant in the Android world?
Which is clearly the reason why there aren't actually 14,000 different Android kernels and 2,000,000 different kernels for your $50 router running around. Because they all get involved, of course, from being so buddy buddy.
Naturally, this attitude costs Linux developers nothing at all for the most part, and keeps their lives easy (no legal shit, no hard times) -- while absolutely hurting users who can't get the source to their devices, and completely eviscerating the social/political capital of a license like the GPL, and all the people who use it.
When literally the biggest GPL success story can't get off their ass and prosecute license violators, who actually will care when you try to use it as a tool, one which actually has teeth to back it up? Why use a license if its major champion treats it like a complete piece of trash, a worthless bargaining chip, a chip which is only possible because of an effectively unique, lightning-in-a-bottle position?
If the kernel developers just don't give a shit about proprietary vendor kernel forks (I really, really don't think they really do, at least nobody with actual meaningful, large scale influence cares at all), and want to force involvement by "getting them in the cycle" and being buddy-buddy and just not-giving-a-fuck about people outside the source tree, making sure constant churn is how people have to keep up -- they should just use the BSD license.
It seems to have worked out pretty OK for LLVM, and this is basically their operating philosophy, too. At least then, maybe another project can arise that actually takes its own license terms halfway seriously...
Android is important, because it's successful, despite how crude it is. And the Kernel maintainers - basically as a modern-day abstracted version of self-preservation - want to see the mainline kernel in/on Android devices, and so they are thinking about __why__ Android had to fork. (Had to? Or were they just that stubborn and time-limited?)
Proposed driver code: https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/d...
The file with the atomic functions: https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/d...
I'm glad that AMD is still not giving up on Linux!
(on litigation against companies that contribute)
> Anyone in the company that pushed to use Linux is now seen as "wrong" and instantly is pissed off that external people just messed up their employment future.
> - Anyone in the company that resisted the use of Linux (possibly in ways that caused the code not to be released over the objection of the previously mentioned people) are vindicated in their opinion of those "hippy" programmers who develop Linux.
> Now, even if, after many years of work on your part, you do get that code, what is the end result? You have made an enemy for life from the people who brought Linux into the company, you have pissed off the people who didn't like Linux in the first place as they were right and yet you "defeated" them. Both of those groups of people will now work together to never use Linux again as they don't want to go through that hell again, no matter what.
(source and context: https://lists.linuxfoundation.org/pipermail/ksummit-discuss/... )
I think this is the worst part of this SNAFU: Dave has just proven every naysayer in AMD right that Linux is not worth supporting. He cut the feet out from under the team that has supported and fought for equivalent opensource driver support on Linux, to be released in lockstep with Windows releases. He has also proven right everyone at nVidia who argued against opensourcing their drivers and for keeping them as a horrendous binary blob. And he has pissed off his greatest ally inside AMD.
All with a single "no". Not "We can't accept this, let's talk about how to make us both happy". With that he gave new ammunition to every manager at every large hardware corporation fighting against opensourcing their drivers, and made every Linux-supporting team lead at such corporations less likely to push the opensource world forward. It seems we're stuck with shitty Android kernel forks and shitty GPU binary blobs for the near future, with only Windows as a proper contender for good 3D performance.
EDIT: To be clear, I'm not blaming Dave for refusing the patch on technical grounds. I AM blaming him for refusing it so flatly and not actively working more with AMD to get the situation fixed. This is not a minor thing: having stable AMD drivers in the kernel would really push the Linux desktop forward, ensure Linux is compatible with several Macs among other machines, and put pressure on nVidia to opensource theirs. But you don't get there by belittling contributors and cultivating an "us vs. them" mentality.
He's been giving the "let's talk about how to make us both happy" answers for months and months. That's not a single "no." That's not flat. What that is, is a lot of design review and "soft nos." (I do still agree that it's ammunition. People who are looking for a reason not to contribute to the kernel will have no problem using this email out of context either.)
Other than eventually "taking one for the team," caving, and merging what AMD wants, what level of support are you talking about here? As far as I can tell, your post takes the stance that a "hard no" to any patchset is going too far.
He gave them that answer 6 months ago, and they ignored it. This was not an abrupt rejection out of nowhere.
It was as unreasonable then as it is now.
If I tell you "You need to rewrite your product in Brainfuck or I'll kick you out in 6 months.", the fact that I told you that doesn't make it any less insane.
To merge a driver into the kernel is to accept responsibility for maintaining it, which they recognize they are not prepared or even interested in doing.
Why is that so unreasonable?
If I recall correctly, Google tried the same thing some time ago and it didn't go well for them either.
Linux is no longer in the weak position it once was. It has won the war for server market share (at least to a degree where using it has nothing to do with being a "hippy" programmer), and it has lost the desktop war so thoroughly that at this point it really doesn't matter anymore.
In this position, Linux can live without AMD drivers, probably more so than AMD can.
Now, if Linux only wants to be a server OS, that is fine, but if it wants to try and grow then it needs either AMD or Nvidia. I think that for now Nvidia will continue to do what they have always done (binary blob), but if AMD gains market share by being more open, then Nvidia will respond in turn.
Anyone that is serious about graphics programming is on Windows and Mac.
I learned the hard way that FOSS religion and the graphics programming industry don't mix (lost a few job opportunities due to that).
As for the movie industry, they basically have heavily customised GNU/Linux workstations, using the workflows they ported from SGI workstations into GNU/Linux.
The sooner they realize that, the faster they can actually work together with Linux kernel maintainers.
FWIW, a proprietary "binary blob" driver built out of tree has already been available for a while, but it constantly has to be updated for the latest kernel and doesn't release a version for every single kernel. That makes it very difficult to use unless you either 1) run the exact kernel version the driver requires (which is ridiculous; you should be able to dictate which kernel version you want) or 2) use a well-known and well-supported repo (which limits your choice as well). Ultimately, the best way to get driver support into the kernel is to go through the standard kernel channels.
And also FWIW, the kernel does care about how drivers are written; even if a driver is out of tree, its possible shittiness reflects back (totally unfairly, I know) on the kernel. By having a gatekeeper for device drivers, they can ensure that the kernel is as stable as possible for as many users as possible, and that's a laudable end goal. It's also orthogonal to wide hardware support: all that was required in this situation was for AMD to adhere to the guidelines properly. They didn't, so they have no right to be upset that their code got rejected.
There is no excuse for AMD. The response reads like a total whine to me.
It's not the kernel maintainers fault that your company does not give you proper resources or approach Linux in a correct way.
If you had proper opensource drivers I'm sure the "in" crowd hackers you speak of would take it from there...
I've personally had horrible experiences with AMD drivers on both Windows and Linux. I had a GPU that worked well for my purposes, and even ran games well enough for me, but you dropped driver support for it, so I was left with no option but to buy another card...
An unmaintainable mess with lots of users is not a victory unless users are paying you for shitty work.
And there was a function with the name "validate" that didn't, well, validate. In a bit of code like this, that rang alarm bells.
> And by following that pattern (and again you can store whatever you want in your own private dc_surface_state) it makes it really easy for others to quickly check a few things in your driver, and I wouldn't have made the mistake of not realizing that you do validate the state in atomic_check.
So what? You still don't seem to grasp the concept of plugins. Plugin = the third-party developer can do whatever they want, and it doesn't hurt the core product.
The point that has been made over and over in these threads is that if you want to develop Linux code, you can't just set a development team to work in complete isolation from everyone else in the Linux development community and expect to be able to submit grand unifying architectures, designed to make things easier for your company, that make them harder for everyone else.
If you want to do this, then you really need to work within this particular community to effect change. For instance, there apparently are some standard idioms that have emerged from within the atomic code. The way AMD have done things is different enough to confuse the core maintainer, and he has reasonably said that he doesn't want to accept a commit like this. Hence his comments about the HAL and a massive middleware layer.
The bottom line is: AMD want to merge this into the kernel's main tree. But to do that, they have to get through the maintainers, and the maintainers have to consider the whole picture and not just one team, no matter how hard that team has worked on its code.
The AMD team seem to have worked in a silo, not released to the CI servers, and, from what I'm reading, broke stuff that others then fixed. So when the AMD guys did a big release all at once like this, they got told - politely! - that their code wasn't up to scratch.
> The point that has been made over and over in these threads is that if you want to develop Linux code then you can't just stick a development team to work in complete isolation from everyone else
that's a problem for Linux
> The bottom line is: AMD want to merge this into the kernel's main tree
No, AMD wants to have working AMD drivers on Linux. It's more than likely that they were told to do it this way and this way sucks. A lot.
But hell, maybe Linux devs think that Linux is so important now that they can pressure AMD devs into doing whatever they want from them. Maybe it works, maybe it won't.
"I literally don't even know what I'm talking about at all, I'll admit it -- but definitely, trust me and my immediate assessment of the situation, it's accurate"
>> The bottom line is: AMD want to merge this into the kernel's main tree
> No, AMD wants to have working AMD drivers on Linux.
Are you even reading the words you type? AMD _already has working drivers_. They're right there. You can go look at the code right now, 'git pull' it and install it on your machine. What's stopping you? Your inability to read, apparently?
No, it is literally -- by the definition of the above email -- the case that they want to merge already existing code upstream, into the kernel, and have upstream share the maintenance burden. That's part of the deal -- if AMD code goes upstream, everyone helps maintain it, and in turn, they help maintain everyone else's.
But it turns out, upstream doesn't want their code in its current state. Of course, they don't have to merge it upstream -- they just want to. They don't even have to merge it upstream now or "soon", but they would have liked that. They could easily ship the AMDGPU driver as an external module using DKMS or something, just as things like ZFS-on-Linux do, and start ironing out problems for upstreamability while actually shipping drivers to people.
They have drivers. The drivers work already, in fact. Having them upstream is totally different. Try reading the article and doing some digging through this thread to understand the context.
> But hell, maybe Linux devs think that Linux is so important now that they can pressure AMD devs into doing whatever they want from them. Maybe it works, maybe it wont.
You realize that given AMD's history -- it's entirely possible AMD needs Linux more than Linux needs AMD, right? Linux doesn't need to win the desktop or win over AMD, it thrives in its own market and has been surviving perfectly well without them.
Windows presents an API to drivers that's very painful for them to change. This is a problem that Linux can avoid by not treating drivers as black boxes.
Typical AMD bullshit. They do have the resources to develop garbage like the 'Gaming Evolved App', just to abandon it 6 months later, though.
Not that this is the crux of the problem, just curious.
After reading the arguments, I'm kind of on AMD's side. I get what Dave wants, but it seems extremely idealistic.
AMD is perfectly within their rights and abilities to ship an out-of-tree driver. As I understand it, DKMS exists to make that use case easier for end users. The popular consumer-facing distros would probably make it easy as dirt to install too.
The difference between that and upstreaming into the kernel is primarily shifting some of the maintenance burden onto the kernel developers. That comes with the condition of adding stakeholders to the driver design, not just using upstream as a code dumping ground.
"Totalitarian" seems a bit over the top.
> near-totalitarian level of power.
I don't think you meant to use that word. Linus and the kernel contributors together own the copyrights on the kernel. It's not totalitarian for them to dictate its future. It's their creative work. It's downright egalitarian that they let you use the kernel however you like, provided that you give the source in turn to whoever you hand it to.
I'm not sure if that's a good thing or a bad thing, to be honest, it's just... particularly interesting about the way Linux works.
Impartial expert judges are usually considered a good thing. Note that Dave Airlie is not acting on a whim here; the Linux development community has discussed HALs for probably 20 years and has come to the conclusion that they are a net negative. For better or worse, the Linux development process does not care about democracy or market share or vendor relations. It's a somewhat unusual way to build software, but the result pretty much speaks for itself.
Which are running a severely modified kernel because the process of going through mainline to get it mobile ready would be far too painful.
The questionable part is CPU companies not pushing for upstream integration, so you get e.g. Qualcomm Linux. In fact, the process is not pushing, it is pulling (some people have started working on it, but it's still a long way off).
It can be done, as shown by efforts by TI, ARM and many more...
The funny part is that on Windows Phone, Microsoft learned from the kernel development process and made Samsung, LG, HTC, et al. upstream their drivers, whereas Google, in the same position with the same vendors, has not done this, and thus Sony is the only OEM pushing drivers upstream.
The kernels used by actual Allwinner-based Android devices have almost nothing in common with the upstream support. They use a completely different mechanism for describing the hardware configuration, a completely different set of drivers, and are based on a kernel that predates upstream support.
I would not expect much from Allwinner directly, but at this point there is a sizable ecosystem and mainline support for most of their chips, which can't be said for most other ARM vendors. Another aspect of this is that Allwinner sells most of their chips through multiple business units, with the silicon being exactly the same; only what is silkscreened on the top of the package differs.
1 - http://www.cnx-software.com/2015/11/10/allwinner-a64-datashe... and https://forum.armbian.com/index.php/topic/1917-armbian-runni...
2 - http://www.cnx-software.com/2016/08/17/allwinner-h5-is-a-qua...
3 - https://forum.armbian.com/index.php/topic/2099-crypto-engine...
They want it in kernel, however, and that means they have to play by the in kernel rules.
A year ago AMD said they would just use what is currently in the kernel, with a few minor modifications, to get their driver working in userspace. Apparently they did not follow through on that, though.
> Given the choice between maintaining Linus' trust that I won't merge 100,000 lines of abstracted HAL code and merging 100,000 lines of abstracted HAL code I'll give you one guess where my loyalties lie.
He is acting in a way he thinks will keep Linus' trust. Also, the maintainers usually have far superior technical knowledge of their areas compared to Linus. I mean, Linus is just one guy, and each maintainer has their own specialty.
> After reading the arguments, I'm kind of on AMD's side.
And that's why stuff like this doesn't get put to a vote.
I just want a system that will work properly without them. Not 60FPS on Ultra settings in <game> work, I mean boot and function as a normal user's desktop.
Of course companies had an agenda to improve it to the point they wouldn't need to keep paying Sun, SGI, HP, IBM, Compaq, Unisys,...