
Linux 3.13 - edwintorok
http://kernelnewbies.org/Linux_3.13
======
wtbob
> [http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.g...](http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6e9fa2c8a630e6d0882828012431038abce285b9)

I really, really, _really_ wish that the Linux CSPRNG would quit having its
flaws papered over. A fellow submitted a patch to implement the Fortuna CSPRNG
_years_ ago, and it wasn't accepted because of a misguided belief in entropy
estimation.

I'm not saying that Fortuna is the One True CSPRNG—it's not—but _any_ clean
design would be preferable to the current Rube Goldberg mechanism. I'm pretty
sure that /dev/random as it currently stands is secure enough, but 'pretty
sure' isn't very reassuring.

~~~
forgottenpass
Have you brought any of this up on lkml?

~~~
wtbob
Jean-Luc Cooke tried to submit a patch back in September 2004 (yes, a decade
ago) and was shot down. I don't know if anyone has tried since.

------
dded
One significant user-visible feature of 3.13 is nftables:
[https://lwn.net/Articles/564095/](https://lwn.net/Articles/564095/)

~~~
esbranson
I've been wanting to implement a dynamic ARP filter (DHCP snooping + ARP
filtering) for ages now, but arptables/ebtables just didn't cut it. Hopefully
this will be easier and actually viable now, because I still don't think ArpON
sniffs DHCP leases for its mapping (it intercepts and replays ARP requests or
something), and it doesn't filter rogue DHCP servers. I'm just amazed ARP spoofing/ARP
cache poisoning is still a viable attack vector on home networks in 2014.

~~~
adekok
Please see the "master" branch of FreeRADIUS.
[https://github.com/FreeRADIUS/freeradius-server/](https://github.com/FreeRADIUS/freeradius-server/)

It can accept both DHCP and ARP protocols, and will decode them into
attribute-value pairs. Those can then be referenced in a policy language, and
stored to / read from a database.

I'm the author. :) It's no longer just a RADIUS server. I've been looking for
a DHCP / ARP checker for a while, and couldn't find anything useful. Rather
than writing something from scratch, I decided it was easier to just add ~2K
LoC to FreeRADIUS. I could then leverage the policy language and database
integration, so I didn't have to re-write all of that, either.

~~~
esbranson
Excellent. :) I will look into this.

I am targeting embedded devices on OpenWRT, which means it needs to be as
simple and small as possible, so I hope the code is tight.

But on the other hand, I would prefer to not reinvent the wheel.

~~~
adekok
The code is tiny. It already runs on OpenWRT, so there's no issue there.

------
bjackman
Great summaries, thanks! I love it when kernel news is made as accessible as
this (thanks also to LWN).

Looks like a pretty major release. I only wish I had a rack full of
cutting-edge SSDs to try out the new block layer on!

------
arielweisberg
Great to see the NUMA balancing land. My question has always been: what
workloads require NUMA balancing in the first place? If I present the kernel
with the same number of threads as cores and keep all data local to a thread,
would the existing approach of allocating on the NUMA node the thread is
running on have been enough?

~~~
fsaintjacques
When your process uses more than half of the memory, e.g. DBs.

~~~
arielweisberg
The kernel will balance loaded threads across cores, so with a policy of
allocating off the local NUMA node you actually end up with balanced
allocations in practice if you run shared-nothing, thread-per-core.

In-memory databases are my day job, so I am pretty interested in cases where
things go south because memory isn't balanced. To date it appears that no
special actions were necessary; stuff just ends up balanced across nodes.

That's why it would be great if someone could characterize when balancing is
necessary outside of obvious cases like allocating an entire buffer pool from
one thread.
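
To be concrete, the pattern I'm describing is roughly the sketch below (a
minimal example using libnuma; the buffer size and the one-thread-per-CPU
mapping are just illustrative). Each worker binds itself to the node of its
CPU and only ever touches node-local memory, so allocations end up spread
across nodes without any explicit balancing:

    /* Minimal sketch: shared-nothing, thread-per-core with node-local
     * allocation via libnuma (link with -lnuma -pthread).
     * Buffer size and thread->CPU mapping are illustrative. */
    #include <numa.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define BUF_SIZE (64UL * 1024 * 1024)  /* per-thread working set (example) */

    static void *worker(void *arg)
    {
        int cpu = (int)(long)arg;
        int node = numa_node_of_cpu(cpu);

        numa_run_on_node(node);                 /* keep this thread on that node */
        void *buf = numa_alloc_local(BUF_SIZE); /* allocate from the local node  */
        if (buf == NULL)
            return NULL;

        /* ... all work happens against buf; it is never shared ... */

        numa_free(buf, BUF_SIZE);
        return NULL;
    }

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA not supported on this system\n");
            return 1;
        }

        int ncpus = numa_num_configured_cpus();
        pthread_t *tids = calloc(ncpus, sizeof(*tids));

        for (long i = 0; i < ncpus; i++)
            pthread_create(&tids[i], NULL, worker, (void *)i);
        for (int i = 0; i < ncpus; i++)
            pthread_join(tids[i], NULL);

        free(tids);
        return 0;
    }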

~~~
pmenon
You should take a look at "OLTP on Hardware Islands" in VLDB 2012:
[http://vldb.org/pvldb/vol5/p1447_danicaporobic_vldb2012.pdf](http://vldb.org/pvldb/vol5/p1447_danicaporobic_vldb2012.pdf)

~~~
MichaelGG
It applies to more than DBs. Running VoIP software, we found that just by
setting CPU affinity, we got a major increase in performance. The software in
question, FreeSWITCH, is inanely threaded in the misguided belief that "more
threads = more performance" (well, that, and it's also just easier to program).
When there are thousands of threads going, keeping them and their data local
to one NUMA node or less really makes a huge difference.
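
The affinity part itself is tiny. Per thread it's roughly the sketch below
(using pthread_setaffinity_np; the CPU number is illustrative, and you can get
the same effect externally with taskset or cpusets):

    /* Sketch: pin the calling thread to one CPU so it (and, with first-touch
     * allocation, its memory) stays on one node. Compile with -pthread;
     * CPU 2 is an arbitrary example. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    static int pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main(void)
    {
        if (pin_to_cpu(2) != 0) {
            fprintf(stderr, "failed to set CPU affinity\n");
            return 1;
        }
        /* ... run the latency-sensitive work from here on ... */
        return 0;
    }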

------
sandGorgon
3.13 is the first release with fully open-source Intel Broadwell drivers -
switched off by default, though.

So by the time Broadwell actually lands in Q3-Q4, the kernel should have
stabilized nicely. Perfect for a cheap Steambox.

~~~
higherpurpose
I doubt Broadwell will have any performance improvements over Haswell at the
same price points. You're probably better off buying a cheaper Haswell, then,
if you want a cheap Steambox. Intel has pretty much given up on improving the
overall performance of its chips. IVB was only a 10 percent improvement over
SNB, and Haswell only 5 percent over IVB, while the difference in price
between new-gen and last-gen is probably more like 30 percent.

~~~
sandGorgon
Intel’s Ben Widawsky, who works on Intel’s Linux graphics driver efforts, says
that “Broadwell graphics bring some of the biggest changes we’ve seen on the
execution and memory management side of the GPU… [the changes] dwarf any other
silicon iteration during my tenure, and certainly can compete with the likes
of the gen3->gen4 changes.”

This, combined with the fully open-source Linux driver for Broadwell, means
there is a very _high_ chance it will perform significantly better.

~~~
wmf
Broadwell will perform better, but Intel may decide to charge more for it.
That's what they did with Ivy Bridge and Haswell.

------
pshc
Anyone heard any news on Google's user mode thread[1] kernel syscalls? I was
really excited for that when it was announced but haven't heard a peep about
it since.

[1][http://youtube.com/watch?v=KXuZi9aeGTw](http://youtube.com/watch?v=KXuZi9aeGTw)

~~~
gcr
Is this similar to the old "Scheduler Activations" idea?
[http://homes.cs.washington.edu/~tom/pubs/sched_act.pdf](http://homes.cs.washington.edu/~tom/pubs/sched_act.pdf)

I'd love to see that idea wind up in a mainstream kernel.

~~~
twic
It was in FreeBSD!

[http://www.freebsd.cz/kse/index.html](http://www.freebsd.cz/kse/index.html)

For some reason it didn't work out, and FreeBSD switched back to conventional
threads around 7.0, I think.

------
saosebastiao
I noticed a couple of commits regarding btrfs. Can anybody summarize them for
someone who doesn't know anything about kernels and knows very little about
file systems?

~~~
dsr_
The big improvements to btrfs came in 3.12. The 3.13 changes are rather minor:

a mount option to specify the maximum delay before committing writes to
storage (default 30 seconds, no maximum, warning for 300+ seconds -- make sure
you have battery coverage for whatever this is set to plus a few seconds, and
try not to crash...)

a mount option for emergency use that will force the rebuild of the UUID tree

userspace tools that read FIEMAP_EXTENT_SHARED can now use that on btrfs; no
functionality change, really, just making the info available in the same way
that ocfs2 does it.
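
For the FIEMAP item, the sketch below is roughly what a userspace tool would
do (the file path and extent count are illustrative): ask for the file's
extent map and check each extent's FIEMAP_EXTENT_SHARED flag, which btrfs now
reports for shared (reflinked/snapshotted) extents.

    /* Sketch: query a file's extent map and look for FIEMAP_EXTENT_SHARED.
     * The path and MAX_EXTENTS value are illustrative; error handling is minimal. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>
    #include <linux/fiemap.h>

    #define MAX_EXTENTS 32

    int main(void)
    {
        int fd = open("/mnt/data/somefile", O_RDONLY);  /* illustrative path */
        if (fd < 0) { perror("open"); return 1; }

        struct fiemap *fm = calloc(1, sizeof(*fm) +
                                   MAX_EXTENTS * sizeof(struct fiemap_extent));
        fm->fm_start = 0;
        fm->fm_length = ~0ULL;              /* map the whole file */
        fm->fm_extent_count = MAX_EXTENTS;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FIEMAP"); return 1; }

        for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
            if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_SHARED)
                printf("extent %u is shared\n", i);

        free(fm);
        close(fd);
        return 0;
    }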

------
contingencies
For the block layer update, before anyone gets excited like I did: the paper
actually suggests it's not useful to most people at all with current-era
hardware.

 _In this paper, we have established that the current design of the Linux
block layer does not scale beyond one million IOPS per device. This is
sufficient for today's SSDs, but not for tomorrow's. We proposed a new design
for the Linux block layer. This design is based on two levels of queues in
order to reduce contention and promote thread locality. Our experiments have
shown the superiority of our design and its scalability on multi-socket
systems. Our multiqueue design leverages the new capabilities of NVM-Express
or high-end PCI-E devices, while still providing the common interface and
convenience features of the block layer._

~~~
mbjorling
This statement should be seen as: there's no way to scale the old block layer
to new devices. For current SSDs it's already useful, in that it decreases
latency and CPU usage for the current generation of drives.

It's currently only enabled for the virtio-blk driver, but there's work
underway to make the SCSI layer and all the other drivers use it (there are
already patches out for the mtip and nvme drivers).

~~~
contingencies
Thanks for the clarification. IIRC you are a co-author of the paper, so
perhaps you can answer a follow-up question.

What kind of latency or CPU usage change should a typical modern SSD on an
amd64 class multicore processor observe when using the new block layer?

Also correct me if I'm wrong but since Linux aggressively caches already and
SSDs are already way faster than older drives for normal (ie. ~random access)
loads, plus RAM is cheap and plentiful these days, I am guessing that very few
applications will honestly be IO-bound enough to see that benefit.

~~~
mbjorling
One thread issuing IOs: a reduction of 2x in the IO path latency isn't
unusual. The overhead of the code path drops from 5us to around 2us. When
there are multiple IO threads, the gain is much higher (up to 38x in the
8-socket setup). Thus, the more complex the workload, the better the
performance.

I don't have any up-to-date numbers on CPU usage. When we did the experiments
on the mtip drive, it was around 20% less CPU usage when performing roughly
the same IOs.

For a typical workstation workload, the SSD's access times are still too high
to feel the reduced latency. A typical modern SSD takes around 50-100us for an
IO access. The win there will be the lower CPU usage, which frees up resources
for other things.

Applications are still bound by the round-trip time of getting IOs. Even
though we get more memory, we still have to persist data at intervals to
prevent data loss, and everything that helps decrease the overhead is a win.

------
shimon_e
Great, finally some drivers I've wanted have been merged.
[http://kernelnewbies.org/Linux_3.13-DriversArch](http://kernelnewbies.org/Linux_3.13-DriversArch)

------
netcraft
Do we know what the next LTS kernel release will be and when it is expected?

~~~
sp332
Greg Kroah-Hartman hasn't mentioned it on his blog. He's doing one per year,
but he didn't announce last year's until it had been out for a month already.
[http://www.kroah.com/log/blog/2013/08/04/longterm-kernel-3-d...](http://www.kroah.com/log/blog/2013/08/04/longterm-kernel-3-dot-10/)

Edit: more official link
[https://www.kernel.org/category/releases.html](https://www.kernel.org/category/releases.html)

