
A lot of people have absolutely no problem with proprietary software until it breaks. The problem is you never know when it will break, and what it'll take down with it.

I'm not saying 'go full Stallman'. I'm just saying that whenever you hand over your data to a private company, think about whether they consider it as important as you do.




It's like the news that the majority of the world's ATMs run on Windows XP or earlier. Or lab equipment that's air-gapped because it only works on some obsolete OS that's horribly insecure.

Proprietary software is fine, but if long-lasting hardware is dependent on it, bad things happen when the software company decides it's no longer worth supporting.


The problem with air-gapped test equipment (I work on this kind of stuff professionally) is that redeveloping real-time software for lab and calibration equipment is often very expensive.

It's easy to think, "Why don't they use [New Hotness Software]?", which on the surface seems like a good idea. Then you absolutely need sub-millisecond precision I/O, and you kinda start to cry when you realize how hard precise timing in computers is.

If you use lab equipment on, say, Linux, BSD, OS X, or Windows, you're using a time-shared OS, not a real-time one. So your I/O events aren't seen when the event happens, but when the scheduler is damn well ready to let you know the event happened.

The easiest example is some timing equipment I was using to count digital pulses from a quartz crystal. On a 'modern' secure OS I couldn't really get below a 0.1% margin of error, which wasn't low enough for our uses. I fell back to an older, insecure real-time platform and got down to 0.005%.
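To make the scheduling problem concrete, here's a rough sketch (not from the actual bench setup, just an illustration) that measures how late nanosleep() wakeups arrive on a time-shared OS. The lateness you see is the same scheduler jitter that limits any userland attempt at precise I/O timing:

  /* Sketch: measure how far nanosleep() wakeups drift from their requested
   * times on a time-shared OS. Build: gcc -O2 jitter.c -o jitter
   * (add -lrt on older glibc for clock_gettime). */
  #include <stdio.h>
  #include <time.h>

  int main(void)
  {
      struct timespec req = { 0, 1000000 };   /* ask to sleep exactly 1 ms */
      long worst_ns = 0;

      for (int i = 0; i < 1000; i++) {
          struct timespec t0, t1;
          clock_gettime(CLOCK_MONOTONIC, &t0);
          nanosleep(&req, NULL);
          clock_gettime(CLOCK_MONOTONIC, &t1);

          long elapsed = (t1.tv_sec - t0.tv_sec) * 1000000000L
                       + (t1.tv_nsec - t0.tv_nsec);
          long late = elapsed - 1000000L;      /* how late the wakeup was */
          if (late > worst_ns)
              worst_ns = late;
      }
      printf("worst wakeup lateness: %ld us\n", worst_ns / 1000);
      return 0;
  }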

Security is great. I attend security conferences in my spare time and try to stay up to date on the topic. The main problem is that once you get into most of this computing, everything isn't running glibc and Win32. Hacking isn't very easy unless you know the system to start with.


Wouldn't it have been better to just stick in a specialized piece of hardware/FPGA to read directly from the crystal, buffer it, and pass it on? In fact that's what the timing equipment usually does for you. There's no reason to give up security when a few minor changes in architecture will give you performance, security, ease of maintenance, and interconnectability.


You're acting like the read/write time to the hardware/FPGA is negligible, which it isn't.

When you access hardware on, say, a PCI bus (which you would in this scenario), your call to the PCI bus does not take place WHEN you call for it to take place. You call the kernel, which calls the scheduler, which calls the hardware manager, which calls the driver, which finally processes your request.

Once your request is processed, all of this is dumped and something else runs while the processor waits to hear back from the PCI bus with the response, because this takes ages in processor time.

Finally an interrupt arrives, is made sense of, the appropriate driver is called, it gives your information back to your process, and you're back on your merry little way.

:.:.:

The problem is that while you called the OS to start this long chain of events, the real world didn't stop. Your real-time module is still counting the 32,600,000 pulses per second.

This is where you'll get errors, because it's easy to think things in a computer happen instantly, or so blindingly fast that you don't care about order or speed.

The situation you described is what originally gave me 0.1% error. Eventually I switched to a more aggressive tack: polling asynchronously in a separate thread, and when a process called for the time, responding with the latest received value.
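A minimal sketch of that approach, with a hypothetical read_device_counter() standing in for the real blocking read from the timing hardware:

  /* Sketch of the "poll in a background thread, hand back the latest value"
   * approach. read_device_counter() is a hypothetical stand-in for the real
   * blocking read from the timing hardware; the consumer never waits on the
   * device, so the remaining error is the staleness of the last sample.
   * Build: gcc -O2 poller.c -o poller -pthread */
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdint.h>

  static _Atomic uint64_t latest_count;        /* last value seen by the poller */

  /* Hypothetical stand-in; the real call would block on the hardware. */
  static uint64_t read_device_counter(void)
  {
      static uint64_t fake;
      return ++fake;
  }

  static void *poller(void *arg)
  {
      (void)arg;
      for (;;)                                 /* spins; pace it as needed */
          atomic_store(&latest_count, read_device_counter());
      return NULL;
  }

  /* Called from any thread; returns immediately with the newest sample. */
  uint64_t current_count(void)
  {
      return atomic_load(&latest_count);
  }

  int main(void)
  {
      pthread_t tid;
      if (pthread_create(&tid, NULL, poller, NULL) != 0) return 1;
      /* a real caller would now query current_count() whenever it likes */
      return 0;
  }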

This got me down to 0.07% error. Still not acceptable.

:.:.:

It's nice to be starry-eyed and think there is no reason to give up security. But sometimes secure software can't do what you need it to do. The more crap you put between you and the metal, the more time it'll take to execute.

This is logically provable.

Take processes A and B, both implementing the optimal way to do something; there is no faster way to do the task, so A's and B's execution times are equal.

Yet B operates in a secure, sandboxed environment on a time-sharing OS, so B's true execution time is B + C + D, where C and D are the sandboxing and scheduling overheads.

We know A = B, but for A to equal B + C + D, both C and D would have to be zero, which they can never be in the real world.


I get what you're saying about how it's definitely going to be slower, but I don't understand why the read/write time to the hardware matters if the hardware is buffering the last 100 timings or so. What I'm trying to point out is that you don't have to make your entire system down to the keyboard real time - you only need to make the tiny piece that is doing the physical process real time (with some very simple logic gates that can run far faster than any general purpose CPU), and then send the results over the PCI/whatever bus in batches later to be processed.


The only real-time component of the software stack is the kernel. If you want another real-time module you're screwed, because you need to run it in kernel space, but if you have a kernel you can't.

Or you run a real-time OS, which may have problems because it's developed with I/O timing, not security, in mind.

It's a fundamental flaw of time-shared OSes.

:.:.:

Second, security works in a simple way.

Cost to secure vs money lost.

Lab equipment is expensive. The loss of an entire calibration bench could run into the $250,000 to $1 million and beyond range.

But redeveloping an entire OS to do this? You're talking about spending 20 to 100x MORE on security than your losses. That's idiotic at best.


    > The only real-time component of the software stack is
    > the kernel. If you want another real-time module you're
    > screwed, because you need to run it in kernel space, but
    > if you have a kernel you can't.
I think you're confused about what RyanZAG is saying. If I'm reading correctly, he's saying "don't run the real-time stuff on the CPU." Have that stuff run on a much simpler piece of hardware that is real-time, and runs the real-time code, then have the non-real-time userland on the CPU talk to it in not-real-time.

To take your example:

    > When you access hardware on, say, a PCI bus (which you would in
    > this scenario), your call to the PCI bus does not take place
    > WHEN you call for it to take place. You call the kernel, which
    > calls the scheduler, which calls the hardware manager, which
    > calls the driver, which finally processes your request.
Design the PCI card to have its own, smaller CPU (or FPGA, or whatever) that does the real-time interaction with the "32,600,000 pulses per second." Don't have the real-time bits depend in any way on the code running on the CPU. Have it buffer the data. Then, when the PCI card is accessed by the userland program on the CPU, it dumps the buffer onto the PCI bus. The userland would obviously have to be fast enough that the buffer doesn't fill up, but that speed is much less than "real time." You can then work with the data in userland, running in non-real time.
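Something like the following, assuming a hypothetical /dev/pulsecard character device whose driver returns whatever the card has buffered since the last read (none of these names come from real hardware):

  /* Sketch of the batched-readout idea: the card does the real-time counting
   * and buffering; userland only has to drain the buffer before it fills,
   * which is a much softer deadline than per-pulse timing.
   * /dev/pulsecard and its read() semantics are hypothetical. */
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      uint64_t batch[100];                     /* "the last 100 timings or so" */
      int fd = open("/dev/pulsecard", O_RDONLY);
      if (fd < 0) { perror("open"); return 1; }

      for (;;) {
          ssize_t n = read(fd, batch, sizeof batch);   /* one buffered batch */
          if (n <= 0)
              break;
          size_t count = (size_t)n / sizeof batch[0];
          printf("got %zu buffered timestamps\n", count);
          /* ...process the batch here, in non-real time... */
      }
      close(fd);
      return 0;
  }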


      >don't run the real-time stuff on the CPU.
You have to, is what I'm saying.

      >that does the real-time interaction with the data...
I already gave an example where I literally said I do this. What you're not understanding is that the time to poll and respond is part of this real-time system.

As I said before, the amount of time between "Kernel, I need this time stamp" and "PID 1337, here is your time stamp" is not instant, and is not constant. There are several stages of blocking I/O, which are not always given priority over other threads. For 100% accuracy that time needs to be both instant and constant. This part, the collection and storage, needs to ALSO take place in real time, BUT BEING IN USERLAND, it can't.

So to outline:

      Topic: UserLand   Kernel                      RealTimeCard
     Stage
      1)     Request                                Counting Pulses (You Want This)
      2)                Unknown time                Counting Pulses (Additional Error)
      3)                Spent Doing I/O             Counting Pulses (Additional Error)
      4)                Changing Tasks              Counting Pulses (Additional Error)
      5)                etc.                        Counting Pulses (Additional Error)
      7)                                            Near Instant Response 
      8)                Unknown time (+Error)
      9)                Changing Tasks (+Error)
     10)                Managing Memory (+Error)
     11)                Higher Priority Threads (+Error)
     12)     Data received
What this example boils down to is that you get a time stamp 8 'cycles' after you thought you'd get it, but that actual time stamp is really 5 'cycles' off from what you should have gotten.

Those two unknowns between your real-time hardware and userland are where your error comes from. You have both pre-call and post-call error added. Neither is avoidable.

No matter what's on the other end of your bus, unless your bus moves faster than light.


It would be so nice if programs could request a dedicated core for these kinds of things.


Under Linux you can. You can boot the kernel with the maxcpus argument set to 1 (or however many cores you want the Linux kernel to use), and then on a quad-core machine you have 3 cores available all the time, set up so that Linux won't ever need to handle interrupts on them. Then you just start your application and set its affinity to an unused core.
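The pinning half of that is one call; a minimal sketch (the boot-side isolation would come from the maxcpus trick above, or the isolcpus= parameter that's often used for the same purpose):

  /* Sketch: pin the calling process to core 3 with sched_setaffinity().
   * Assumes that core was kept clear of other work at boot (maxcpus as
   * described above, or isolcpus=3 on the kernel command line).
   * Build: gcc -O2 pin.c -o pin */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void)
  {
      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(3, &set);                        /* core 3: reserved for us */

      if (sched_setaffinity(0, sizeof set, &set) != 0) {  /* 0 = this process */
          perror("sched_setaffinity");
          return 1;
      }
      /* From here on, the scheduler will only run this process on core 3. */
      return 0;
  }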

You can extend this further by doing things like mmapping in 4GB of memory through the kernel's hugepage support and locking the physical-to-virtual address map so the kernel can't touch the physical block of RAM you just allocated. Then you can do things like talk directly to a PCI device like a network card and set up DMA directly from the NIC into a buffer in your application's memory.
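The hugepage part looks roughly like this (a sketch; sizes and the DMA handoff are elided, and huge pages have to be reserved first):

  /* Sketch: map and lock a large buffer backed by huge pages so the kernel
   * won't swap it or move the underlying physical pages. Reserve pages first:
   *   echo 512 > /proc/sys/vm/nr_hugepages   (512 x 2 MB = 1 GB here)
   * Build: gcc -O2 hugebuf.c -o hugebuf */
  #define _GNU_SOURCE
  #include <stdio.h>
  #include <sys/mman.h>

  #define BUF_SIZE (1UL << 30)                 /* 1 GB for the sketch */

  int main(void)
  {
      void *buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
      if (buf == MAP_FAILED) { perror("mmap"); return 1; }

      if (mlock(buf, BUF_SIZE) != 0) {         /* pin it to physical RAM */
          perror("mlock");
          return 1;
      }
      /* A userspace driver would now hand addresses from this region to the
       * NIC for DMA; that part is device-specific and omitted here. */
      return 0;
  }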

All of this is done completely in userspace, but you get all the performance benefits of implementing everything as if it were running in Ring 0, and the kernel is not involved in anything apart from the initial setup and teardown. You can build an extremely high-performance application basically running on bare metal, but with the Linux kernel still running on a different core to handle anything that doesn't directly involve your application, and there needn't be any syscalls between the two to service requests.


What I'm confused about (I'll likely just have to play around with this feature at some point; I knew about the APIC, but not about completely sandboxing cores) is this: if you're doing bare-metal operations, do you still have access to kernel functionality a la stdio and the libc libraries? Normally when you hit bare metal you're on your own. I'm just wondering, because the idea of writing my own threading and memory management libraries excites me to no end </sarcasm>.

Also, if you can call these functions like you're in userland, do they block until execution has completed on the other 'kernel' cores? And if you're creating pthreads elsewhere but not managing their execution on the 'non-kernel' core, what happens?

>userspace but you get all the performance benefits of implementing everything like it was running in Ring 0

Can you give any literature on this? These terms are contradictory.


Sorry for the delay, I'm probably not the best person to answer this and I know just enough on the subject to be dangerous so with that in mind I'll give it a shot.

>If you're doing bare-metal operations, do you still have access to kernel functionality a la stdio and the libc libraries? Normally when you hit bare metal you're on your own.

That's just it: your process is just another Linux process. The difference is that the scheduler will put it on, let's say, core 2, while everything else has an affinity for core 1 and interrupts are also handled by core 1, meaning your application is never interrupted on core 2. You still get every feature you normally get in Linux.

>Also, if you can call these functions like you're in userland, do they block until execution has completed on the other 'kernel' cores?

You are in userland, normal userland. Implementation details of syscalls are black magic as far as I'm concerned, so take this with a grain of salt, but apart from kthreads the kernel isn't running in some other thread waiting for a syscall to service. A syscall is just your program calling int 0x80, which jumps into the interrupt handler in kernel mode on the same core that was just running your code, does its work figuring out which syscall you're making, and finishes handling the interrupt. So basically yes, your thread "blocks" while the syscall is in progress on your special isolated core, not on core 1 like everything else running on the system.
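For the curious, the int 80 path is small enough to poke at directly; a sketch for 32-bit x86, where getpid is syscall number 20 (on x86-64 the native path is the syscall instruction, and int 0x80 only works there through the 32-bit compat layer):

  /* Sketch: invoke getpid directly via int 0x80 on 32-bit x86 Linux.
   * Build 32-bit: gcc -m32 -O2 int80.c -o int80 */
  #include <stdio.h>

  int main(void)
  {
      long pid;
      /* eax holds the syscall number going in and the return value coming out */
      __asm__ volatile ("int $0x80" : "=a"(pid) : "a"(20L));
      printf("getpid via int 0x80: %ld\n", pid);
      return 0;
  }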

>And if you're creating pthreads elsewhere but not managing their execution on the 'non-kernel' core, what happens?

I'm not entirely sure what you mean by this, specifically "managing their execution on the 'non-kernel' core". It's just a thread like a normal Linux thread, but at first a new thread is going to have an affinity for only core 1, which you can change to core 2.
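Moving such a thread onto the isolated core is one call after creation; a minimal sketch (core numbers are just for illustration):

  /* Sketch: create a pthread and move it onto the isolated core (core 2 in
   * the example above) with pthread_setaffinity_np(). New threads inherit
   * the parent's CPU mask, so without this they stay wherever the parent runs.
   * Build: gcc -O2 pinthread.c -o pinthread -pthread */
  #define _GNU_SOURCE
  #include <pthread.h>
  #include <sched.h>
  #include <stdio.h>

  static void *worker(void *arg)
  {
      (void)arg;
      /* the latency-sensitive work runs here, undisturbed on its own core */
      return NULL;
  }

  int main(void)
  {
      pthread_t tid;
      cpu_set_t set;

      if (pthread_create(&tid, NULL, worker, NULL) != 0) return 1;

      CPU_ZERO(&set);
      CPU_SET(2, &set);                        /* core 2: the isolated one */
      if (pthread_setaffinity_np(tid, sizeof set, &set) != 0)
          perror("pthread_setaffinity_np");

      pthread_join(tid, NULL);
      return 0;
  }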

>Can you give any literature on this? These terms are contradictory.

What I meant was that generally, if you want to do certain low-level things like talk directly to hardware, you need to be running in kernel mode. But really you don't need to be in kernel mode all of the time, just initially, to allow normal user-mode code to talk to the hardware instead of having to use the kernel like one big expensive proxy. As for why a user-mode driver for a network card would be such a huge performance gain, there are a number of reasons: every syscall is a context switch; whatever data you're sending to or receiving from the network card gets needlessly copied to or from a kernel buffer instead of being read and written directly; you have to go through the entire Linux TCP/IP stack, with tons of functionality you might not need but have to have, so it's just wasted cycles; and the list goes on.

I did manage to find an old Hacker News comment on the subject for further reading from someone much more well versed on the topic than I am. https://news.ycombinator.com/item?id=5703632

Also of interest might be Intel's DPDK which is basically what we're talking about, moving the data plane out of the kernel completely for extreme scalability.


Even that wouldn't work. There are a lot of parts of the CPU, and all of them work together. RAM access, cache access, interrupts, and memory management are handled globally, not on a per-core basis. You'd run into the problem of needing multiple north bridges (do we even have those anymore, or are those on-chip now?), which you couldn't have.

You have to build an entire OS with real-time I/O at heart. They do exist, and some are secure (BlackBerry's platform), but they aren't deployed in the test industry. The most likely reason is licensing fees; nobody writes data acquisition software for them.


>lab equipment that's air-gapped because it only works on some obsolete OS that's horribly insecure.

Just to emphasize how prevalent this is, I work in a 5 story life sciences research center with hundreds of personnel. All of our lab computers are airgapped, many of them stuck on OS versions >10 years old.

I used OS X 10.1 the other day. That was certainly an experience.


Mac OS X 10.1 is roughly as old as Windows XP, isn't it?


To play devil's advocate, one could say the same for a lot of open source software: it's great until it breaks. Your claim implies that broken proprietary software is bad because you can't access the source code, while mine implies that broken open source software is bad because there might be no company or organization dedicated to providing support.


But dealing with real-life probabilities, the likelihood of that event occurring (unless you're approaching the fringe of newer/locked-down hardware) is slim.


Irrelevant; you can fix (or pay/trade/beg someone else to fix) FLOSS. You can't say the same for closed source.


For many things, that's only true as a technicality. Having access to the source code of a complicated piece of software doesn't instantly make it easy, or even remotely feasible, to fix it yourself.

For example, back when I ran a Linux desktop and had problems with my video card or video settings, there wasn't a chance that I could fix it myself. The odds of me being able to do so were roughly equivalent to the odds of being able to fix a broken closed-source video driver by opening the binary up in a hex editor. Technically possible, yes, but not remotely feasible.


You didn't have access to the source of the video driver. That's not an example of having the source and it still being difficult to fix.


How so? You could similarly beg or pay Apple to fix the issue.

Consider OP's use case. Let's say he's a non-programmer and Keynote is open source, but similar in complexity to WebKit. Now he comes across the error: what can he do? Learning to program, or finding someone with the time and know-how of Keynote's (or WebKit's) inner workings, may take months.

On the time scale of months, he could also bitch enough at Apple that they may release a tool.

However, in both cases the most time-efficient solution is to download Keynote '09.


You can also patch the binaries of proprietary software: technically possible, but exactly as unfeasible as your solution when you want to get stuff done.


> [...] technically possible [...]

But often not legally possible?


Practically, it's more of a "not legally possible to distribute".


In a similar vein, I'm now accelerating my plans to move to something like this: http://phpmygpx.tuxfamily.org/phpmygpx.php because Google has changed Maps so that I can no longer just paste a URL to a KMZ on another server and have it pop up in Maps, shareable with friends and family. It's not lock-in, but it's a similar symptom of relying on software that you have no control over (closed source). I distinctly remember similar grumblings when Google shut down Reader . . . .


s/proprietary//

While I like non-proprietary software, there is nothing inherent about it that makes much of a difference for the lay person. As the complexity of the software goes up, so does the level of what is considered a 'lay person'. For any sufficiently complex software and non-pervasive problem, you're SOL in both cases. Standards help more than proprietary vs. non-proprietary. Lower complexity also helps more.


For example, if you had your own compiler which generated a.out output, then the Linux 1.2 switch to ELF would have broken the code in a similar backward incompatible way, despite there being no proprietary code involved.

(Standards wouldn't have helped either.)


I'm calling your bluff:

https://plus.google.com/115250422803614415116/posts/hMT5kW8L...

Specifically, Alan Cox's comment that he can still run an a.out rogue binary from 1992 on a 3.6 version of the Linux kernel. Linux 1.2 was released March 1995.


Bluff called, I fold .. and sweet!

Looking around, it requires a bit of work (see for example http://www.ru.j-npcs.org/usoft/WWW/www_debian.org/FAQ/debian... ).

Here's a recent (2014) report of success: http://www.linuxquestions.org/questions/slackware-14/running...

rogue is text only. According to a report from 2009 at https://www.complang.tuwien.ac.at/anton/linux-binary-compati... :

> A Mosaic binary from 1994 just segfaults, as well as all the other ZMAGIC binaries (1994-1995) I have lying around.

but OTOH that may be because of the setting of noexec and other flags.


Uh.

  # modprobe binfmt_aout
  # uname -rv
  3.2.0-60-generic-pae #91-Ubuntu SMP Wed Feb 19 04:14:56 UTC 2014
Standards help a lot, that's why a.out still works on modern Linux.


I'd imagine that if it doesn't work, though, in contrast to the OP's situation, you could spin up a virtual machine with a pre-1.2 version of Linux and run the code without having to pay anything to do so. Also, this is quite a different situation from reading a document.


I'm having trouble locating a (floppy?) image that old but I'd love to try - AFAICT VirtualBox was written with kernel 2.6+ in mind.

Edit:

Debian: http://archive.debian.org/debian/dists/Debian-0.93R6/disks/

RedHat: http://archive.download.redhat.com/pub/redhat/linux/1.0/en/o...



