
Safe and Secure Drivers in High-Level Languages [video] - DyslexicAtheist
https://media.ccc.de/v/35c3-9670-safe_and_secure_drivers_in_high-level_languages
======
saagarjha
Personally, I'd call this "Safer and More Secure Drivers in High-Level
Languages", because there are still unsafe operations going on (for DMA,
etc.): [https://github.com/ixy-languages/ixy.swift/search?q=Unsafe](https://github.com/ixy-languages/ixy.swift/search?q=Unsafe)

While it's great that you get improved safety (and often code that is nicer
and easier to reason about) by using something other than C, you can still
have memory safety issues from cavalier or incorrect use of unsafe APIs, since
they undermine the guarantees the language provides with regard to correctness.
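
For illustration, here's a minimal Rust sketch (hypothetical offsets, not the
actual ixy code) of the kind of access a user space driver has to perform; get
the offset or the mapping's lifetime wrong and the language can no longer help
you:

    // Hypothetical MMIO register write, similar in spirit to what a user
    // space driver must do; `base` points at a memory-mapped device BAR.
    unsafe fn set_reg32(base: *mut u8, offset: usize, value: u32) {
        // Nothing checks that `offset` lies within the mapped region or that
        // `base` is still valid -- a wrong offset here is plain memory
        // corruption, no matter how safe the rest of the language is.
        let reg = base.add(offset) as *mut u32;
        std::ptr::write_volatile(reg, value);
    }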

Also, unrelated: does anyone actually have the slides (you know, the
presentation file with text in it, rather than the mp4 that CCC is offering
me) for this presentation? It's really annoying to scrub through a video to
find stuff on a slow internet connection :(

~~~
emmericp
I've added them to the git repo: [https://github.com/ixy-languages/ixy-languages/blob/master/s...](https://github.com/ixy-languages/ixy-languages/blob/master/slides/35C3.pdf)

~~~
edwintorok
Direct link for download (GitHub won't render the entire PDF by default):
[https://github.com/ixy-languages/ixy-languages/raw/master/sl...](https://github.com/ixy-languages/ixy-languages/raw/master/slides/35C3.pdf)

------
duneroadrunner
Another option is to use a memory safe subset of C++ [1]. It should be less
work to migrate existing C drivers as (reasonable) C code maps directly into
the safe C++ subset. And the migration can be done incrementally with
corresponding incremental safety benefits.

[1] shameless plug:
[https://github.com/duneroadrunner/SaferCPlusPlus](https://github.com/duneroadrunner/SaferCPlusPlus)

~~~
pjmlp
Only if the team plays along with static analysers and with treating compiler
warnings as errors.

~~~
newnewpdro
Wouldn't you simply enforce this with automation if you were making a serious
effort? It's already quite common for GitHub PRs to require a myriad of CI
tests to pass before anything can be merged... those can incorporate static
analysis and warnings-as-errors.

~~~
pjmlp
It still needs buy-in from the team.

Try being that clever guy who puts such gates into place without having the
team on the same wavelength.

GitHub is a bubble; there are tons of software projects out there using a
myriad of build infrastructures, or even just doing plain old IDE builds (yes,
I know, but it is what it is).

~~~
newnewpdro
Convincing your team to switch languages is infinitely more difficult than
adding infrastructure to enforce good hygiene, so I don't really see your
point; it's moot.

~~~
pjmlp
This whole thread started about enforcing behaviors that are largely ignored
by enterprise developers outside the HN bubble.

At no point was there any mention of switching languages.

~~~
newnewpdro
Ah, this had left a lingering impression in my mind:

"Why do that, if there are languages that do it by design?"

But I see it wasn't your comment, my bad :)

------
emmericp
Code on GitHub: [https://github.com/ixy-languages/ixy-languages/](https://github.com/ixy-languages/ixy-languages/)

Discussion about the C version on GitHub in 2017:
[https://news.ycombinator.com/item?id=16014307](https://news.ycombinator.com/item?id=16014307)

~~~
guerby
Great work! Do you have by any chance the C driver latency measurements? It
would be nice to have them on the same graph as those for Rust and other
languages.

~~~
emmericp
It would look the same as Rust; I haven't run the full measurement with all
the packets, but sampling 1k packets/second yields the same result for C and
Rust.

You can't really get faster than Rust here; the only minor thing that could be
improved is the worst-case latency, by isolating the core that the driver is
running on (the isolcpus kernel option) to prevent other random interrupts or
scheduling weirdness. But that optimization is the same for all languages and
should get rid of the (tiny) long tail.
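
As a rough sketch of that setup (assuming a core isolated with something like
isolcpus=3 on the kernel command line; this is not from the ixy code), pinning
the driver thread with the libc crate could look like:

    // Pin the calling thread to `core`, which we assume was isolated via the
    // `isolcpus` boot parameter so the scheduler keeps other tasks off it.
    fn pin_to_core(core: usize) {
        unsafe {
            let mut set: libc::cpu_set_t = std::mem::zeroed();
            libc::CPU_ZERO(&mut set);
            libc::CPU_SET(core, &mut set);
            // pid 0 means "the calling thread"
            libc::sched_setaffinity(0, std::mem::size_of::<libc::cpu_set_t>(), &set);
        }
    }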

~~~
guerby
Thanks!

------
mpweiher
Great direction! One of the lesser known but really nifty bits of NeXTStep was
DriverKit:

[http://www.nextcomputers.org/NeXTfiles/Software/OPENSTEP/Dev...](http://www.nextcomputers.org/NeXTfiles/Software/OPENSTEP/Developer/DriverKit/DriverKit.pdf)

Objective-C was a great match for driver development; devices tended to have a
very natural OO flavour to them and sorted naturally into classes.

Putting things in user space seemed a natural extension that sadly didn't
happen at the time, even though Mach had the hooks for it (we never got user-
level pagers either, which would have rocked together with garbage-collected
languages). There certainly didn't seem to be a good reason why I had to
reboot the machine and wait for fsck when there was a minor bug in the little
driver I was writing to talk to an EISA printer controller that had nothing to
do with the basic functioning of the system...

(Why would a printer controller be on an EISA controller, you ask? It directly
drove a Canon Laser Copier, so, yes!)

Oh, and I'm not surprised by the abysmal Swift performance. Alas, Apple's
marketing of Swift as a "high-performance" language has been very successful
despite all the crushing evidence to the contrary.

~~~
saagarjha
> Objective-C was a great match for driver development, devices tended to have
> a very natural OO flavour to them and naturally sorted into classes.

How do you feel about the current state of driver development on macOS, with
Objective-C basically being replaced with Embedded C++ with partial
reflection?

~~~
mpweiher
It was part appeasement of the "never Objective-C" crowd (hello
CoreFoundation, hello CocoaJava, hello Swift) and part the exact sentiment
discussed here, that you cannot possibly do kernel development in a higher
level language.

What I heard (quite some time ago) is that this move is now seen as a mistake.

~~~
mrpippy
ObjC was definitely seen as a dead end at the time: it would either be
replaced with Java, or maybe Mac developers would just stick with Carbon and
C/C++. Either way, all driver development (on classic Mac OS, Windows, Unix)
was in C, and C++ would be much more familiar than the "weird obsolete square-
brackets NeXT language".

I made a post a few years ago discussing the issue:

[https://news.ycombinator.com/item?id=10006411](https://news.ycombinator.com/item?id=10006411)

------
pjmlp
As an addition to the talk:

Android Things userspace drivers are done in Java, and since Treble, it is
also possible to write Android drivers in Java.

MicroEJ and Windows IoT Core also allow for such capabilities.

Loved the talk.

------
henrikeh
It is disappointing that Ada seems to be completely shunned in all discussions
of safe high-level languages, despite having both a track record (several
successful safety-critical systems) and a unique feature set (multiple
compilers, a provably correct subset, ranged subtypes).

Is it purely a matter of branding?

~~~
agumonkey
Ada also suffered from bad roots and bad timing. We're coming off a decade of
simple dynamic languages, and Ada seems like an old and immense ruin. Maybe
there will be a Julia/Rust-style equivalent for Ada.

~~~
henrikeh
And it is very true -- Ada never truly gained a foothold in anything but
situations where it truly delivered on a requirement.

If anything, I can take solace in the fact that it is very much alive despite
the exaggerated rumors of its death. I'd encourage everyone to try it out and
steal all the ideas.

------
Santosh83
> Our drivers in Rust, C#, go, and Swift are completely finished, tuned for
> performance, evaluated, and benchmarked. And all of them except for Swift
> are about 80-90% as fast as our user space C driver and 6-10 times faster
> than the kernel C driver.

Interesting. What is the reason for the higher performance of the user space C
driver (and the other user space drivers, for that matter) compared to the
kernel C driver? Will this hold for all driver types, or is it a rather
uncommon property of this particular kind of driver?

~~~
usaphp
> A main driver of performance for network drivers is sending/receiving
> packets in batches from/to the NIC. Ixy can already achieve a high
> performance with relatively low batch sizes of 32-64 because it is a full
> user space driver. Other user space packet processing frameworks like netmap
> that rely on a kernel driver need larger batch sizes of 512 and above to
> amortize the larger overhead of communicating with the driver in the kernel.
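
As a sketch of why the batch size matters (hypothetical API, loosely modelled
on the ixy drivers, not their actual interface): the fixed cost of poking the
NIC's queue registers (or, for kernel-backed frameworks, of crossing into the
kernel) is paid once per batch rather than once per packet.

    // Hypothetical device trait, just to illustrate the shape of a batched API.
    struct Packet; // placeholder for a DMA'd packet buffer

    trait Device {
        fn rx_batch(&mut self, queue: u16, bufs: &mut Vec<Packet>, max: usize) -> usize;
        fn tx_batch(&mut self, queue: u16, bufs: &mut Vec<Packet>) -> usize;
    }

    const BATCH_SIZE: usize = 32;

    fn forward<D: Device>(rx_dev: &mut D, tx_dev: &mut D) {
        let mut bufs: Vec<Packet> = Vec::with_capacity(BATCH_SIZE);
        loop {
            // One call pulls up to 32 packets and touches the queue's tail
            // register once, so the fixed per-call overhead is amortized.
            rx_dev.rx_batch(0, &mut bufs, BATCH_SIZE);
            tx_dev.tx_batch(0, &mut bufs);
            bufs.clear();
        }
    }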

~~~
mpweiher
Interesting!

As our devices have gotten faster, the user-space/kernel boundary is becoming
more and more of an issue. I was shocked when my supposedly super-fast MacBook
Pro SSD (2+GB/s) was only giving me around 250MB/s.

It turned out that mkfile(8), which I was using without thinking much about
it, only uses 512-byte buffers...

[https://blog.metaobject.com/2017/02/mkfile8-is-severely-sysc...](https://blog.metaobject.com/2017/02/mkfile8-is-severely-syscall-limited-on.html)
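
As a rough sketch of the effect (not mkfile itself): for the same amount of
data, a 512-byte buffer costs thousands of times more write(2) calls than a
multi-megabyte one, and the syscall overhead dominates long before the SSD
does.

    use std::fs::File;
    use std::io::Write;

    // Write `total` zero bytes using `buf_size` bytes per write(2) call.
    // With buf_size = 512 the syscall count becomes the bottleneck; with a
    // few MiB per call the device speed dominates instead.
    fn fill(path: &str, total: usize, buf_size: usize) -> std::io::Result<()> {
        let mut f = File::create(path)?;
        let buf = vec![0u8; buf_size];
        let mut written = 0;
        while written < total {
            let n = buf.len().min(total - written);
            f.write_all(&buf[..n])?;
            written += n;
        }
        Ok(())
    }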

~~~
zozbot123
The main thing affecting performance at the userspace/kernel boundary these
days is Spectre and Meltdown mitigations, FWIW...

~~~
vlovich123
No, userspace/kernel transitions always have been and always will be slow.
Every time it happens you've got to do a context switch, which is super
expensive and cache-unfriendly. You also pay a penalty for keeping the kernel
mapped at all times in terms of more pressure on the TLB, but due to Spectre
and Meltdown mitigations the kernel has actually been unmapped, hurting the
performance of switching into the kernel even further, although this will be
undone eventually.

------
emmericp
Previous discussions about our Rust and Go drivers:

[https://news.ycombinator.com/item?id=18405515](https://news.ycombinator.com/item?id=18405515)

[https://news.ycombinator.com/item?id=18399389](https://news.ycombinator.com/item?id=18399389)

------
heyjudy
What would make device drivers safer:

\- Microkernel OSes - one driver failing is okay, because it runs mostly in
user space. The big gotcha in microkernel design is transactions that touch
multiple components. Some sort of across-driver transaction API is needed
(start, commit, rollback) in order to undo changes across several userspace
subsystems (see the sketch after this list).

\- A standard language (like the talk suggests) and shipped as portable
bytecode to run on a VM or compile to native, so that drivers are portable and
runnable without knowing the architecture.

\- Devices themselves containing OS-signed drivers rather than each OS having
a kitchen-sink installation of all drivers. Each bus would have an
interrogation call to fetch the driver.
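
A hypothetical sketch of what such an across-driver transaction API might look
like (all names invented for illustration, not from any existing microkernel):

    // Each participating userspace subsystem would implement this so that a
    // failed multi-component operation can be undone across all of them.
    trait DriverTransaction {
        type Handle;
        type Error;

        fn start(&mut self) -> Result<Self::Handle, Self::Error>;
        fn commit(&mut self, txn: Self::Handle) -> Result<(), Self::Error>;
        fn rollback(&mut self, txn: Self::Handle) -> Result<(), Self::Error>;
    }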

------
fulafel
Is there a plan for letting applications that use the normal networking APIs
benefit from this, or is it currently just for "process raw ethernet packets
in userspace" kinds of apps?

(The latter is a great thing to have built, of course, just thinking aloud
about how this might replace existing drivers)

~~~
emmericp
Yeah, I'd love to port a Go TCP stack like
[https://github.com/google/netstack](https://github.com/google/netstack) to
use our driver to build a microkernel-style service offering network
connectivity. Running taps
([https://datatracker.ietf.org/wg/taps/about/](https://datatracker.ietf.org/wg/taps/about/))
on top of that would be ideal for a modern setup. But that's a lot of work...

------
pjmlp
Swift is losing heavily due to its reference counting GC instead of a tracing
GC.

Now I have a talk to point to for those who always trumpet that reference
counting GCs are so much better than tracing ones.

~~~
tom_mellior
Their notion of "better" may differ from yours. Do you actually know people
who claim that reference counting is _faster_ than a marking GC?

Reference counting can be better in terms of ease of implementation, cross-
language interoperability, and reclaiming memory immediately when the last
reference to it disappears.

~~~
pjmlp
Reference counting is only better in terms of ease of implementation.

Hence why it is usually one of the earliest chapters in any CS book about GC
algorithms.

Reclaiming memory immediately only works for simple data structures. Naive
reference counting implementations have similar stop-the-world effects when
releasing relatively big data structures, which can even lead to stack
overflows if the destructor calls happen to be nested.
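
A minimal Rust sketch of that nested-destructor problem: dropping the head of
a long enough singly linked list recursively drops every node, freeing the
whole chain in one long burst and, if the list is long enough, overflowing the
stack.

    // Each node owns the next one, so dropping the head drops the entire chain.
    struct Node {
        _payload: u64,
        next: Option<Box<Node>>,
    }

    fn main() {
        let mut head = None;
        for i in 0..1_000_000u64 {
            head = Some(Box::new(Node { _payload: i, next: head }));
        }
        // The destructor calls are nested a million levels deep here:
        // a long pause at best, a stack overflow at worst.
        drop(head);
    }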

In the case of Objective-C and Swift, the Objective-C tracing GC project
failed due to the underlying C semantics, so they took the second-best
approach of enforcing Cocoa retain/release patterns via the compiler, which
only applies to a small set of Objective-C data types.

Swift naturally had to support the same memory management model, as a means to
keep compatibility with the Objective-C runtime.

~~~
vlovich123
Eh. In practice reference counting solves 90% of the problem and keeps memory
usage low at all times. It's part of why Java programs are so hard to tune for
performance and end up eating so much RAM. If you don't believe it, compare
Android and iOS: even though Android has enormous financial and competitive
pressure to improve performance, it still ends up requiring 2x the amount of
RAM that iOS does, which is partially driven by the choice of Java.

~~~
pjmlp
People keep referring to Java to talk bad about tracing GCs.

The fact is that Java isn't the only game in town, and all GC enabled system
programming languages do offer multiple ways to manage memory.

Value types, traced GC memory references, global memory allocation, stack
values, RAII, untraced memory references in unsafe code blocks.

Not every OOP language is Java, nor every GC is Java's GC.

Additionally, not every Java GC is like OpenJDK's; there are plenty to choose
from, including soft real-time ones for embedded deployments.

As for Android, it is a fork still catching up with what Java toolchains like
PTC/Aonix are capable of, all because Google decided it could do better while
screwing Sun in the process.

~~~
zozbot123
> and all GC enabled system programming languages do offer multiple ways to
> manage memory.

Since "GC-enabled system programming languages" is an oxymoron, a claim about
what such languages may or may not include is just not very useful. But it's
definitely the case that properly combining, e.g. "traced GC memory
references" and RAII including deterministic deallocation for resources is
still a matter of ongoing research, e.g.
[https://arxiv.org/abs/1803.02796](https://arxiv.org/abs/1803.02796) That may
or may not pan out in the future, as may other things such as _pluggable_ ,
lightweight GC for a subset of memory objects, etc., but let's stop putting
lipstick on the pig that is obligate tracing GC.

~~~
pjmlp
An oxymoron only in the minds of the anti-tracing-GC hate crowd.

\- Mesa/Cedar at Xerox PARC

\- Algol 68 at UK Navy computing center

\- Modula-2+ at Olivetti DEC

\- Modula-3 at Olivetti DEC/Compaq/HP and University of Washington

\- Oberon, Oberon-2, Active Oberon, Oberon-07 at ETHZ

\- Oberon-07 at Astrobe

\- Component Pascal at Oberon microsystems AG

\- Sing#, Dafny and System C# (M#) at Microsoft Research

\- Java when running AOT compiled on bare metal embedded systems like PTC Perc
and Aicas Jamaica

\- D by Digital Mars

\- Go at Google (Fuchsia) and MIT (Biscuit)

Let's stop pretending reference counting is the best of all GC algorithms, in
spite of the fact that it is quite basic and does not scale on modern multi-
core NUMA architectures.

~~~
vlovich123
No one is saying that reference counting is the best. What I am saying is that
reference counting tends to offer a good set of advantages (predictable memory
performance, no hogging of memory, no pauses) for a minimal cost (more
frequent GC, more overhead to store reference counts).

The comment about "does not scale in multi-core NUMA" only applies if you have
objects that are shared between threads, because otherwise there are no
atomics involved. For example, Rust has generic ref-count types: one that uses
atomic operations for objects that might be shared between threads and one
that does plain arithmetic, with the compiler preventing the non-atomic one
from crossing a thread boundary. Non-atomic refcounts are most likely also
going to be faster than any other global GC algorithm. Other languages require
explicit differences but are still able to offer the same thing.
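
In Rust terms, a minimal sketch of that distinction:

    use std::rc::Rc;
    use std::sync::Arc;
    use std::thread;

    fn main() {
        // Rc uses plain, non-atomic refcount updates; the compiler rejects
        // sending it to another thread, so single-threaded code never pays
        // for atomics.
        let local = Rc::new(vec![1, 2, 3]);
        let local2 = Rc::clone(&local); // cheap non-atomic bump

        // Arc uses atomic refcounts and may cross thread boundaries.
        let shared = Arc::new(vec![1, 2, 3]);
        let shared2 = Arc::clone(&shared);
        thread::spawn(move || println!("{:?}", shared2)).join().unwrap();

        println!("{:?} {:?} {:?}", local, local2, shared);
    }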

The fact of the matter is that the majority of objects do not require
expensive GC of any kind and can live on the stack or have explicit ownership
guarantees. Choosing defaults where everything might be shared is not a good
default for systems languages, as it pessimizes memory usage and CPU
performance to a drastic degree.

That being said, GC does have its place in all manner of applications and has
other advantages, like making developers more productive, which isn't a bad
thing; but these are domain-specific decisions. There are plenty of techniques
- reference counting, memory pools, static memory allocation, various GC
algorithms, etc. Each has tradeoffs, and every single GC system I've
encountered means variable latency/stop-the-world pauses and greedy memory
usage (optimized for the one application). That's valid in some domains but
certainly isn't universally desirable. If there were an awesome GC system like
you claim that could perform that well, it would have been deployed already to
innumerable applications by all the Java vendors, JavaScript VMs, C#, etc.
It's an extremely complex problem.

Most of your links are niche commercial systems or even pure academic research
systems. They're not proof of anything other than GC being possible to
implement for various languages/machines, which isn't a claim that's been
disputed at all.

> Go at Google (Fuchsia)

AFAIK Fuchsia does not use Go for any systems-level portions. Those are
written in C/C++/Rust last time I checked (with Rust being the official
default going forward). Do you have any links to the contrary?

