You could just as easily take the same facts and say:
"I took a flight on a 777 where there actually was a cabin fire (or at least the crew thought so). But our pilot never lost contact with the ground and we landed safely -- we certainly didn't disappear for over a week, fly far beyond our emergency landing opportunity, or perform bizarre evasive maneuvers. Just sayin'."
> I've worked for a company that wanted a tool that could automate any task, and that anyone could use. They ended up writing a large, clunky program with a custom configuration format.
Lots of people design terrible, overly complicated systems. This is an argument for good design, not for ad hoc shell scripts for everything.
Would you write a build system in shell? Why not? Because "make" is far better suited to the job. (Even "make" has a lot of room for improvement, but that's a story for another day).
As another example, even Debian (known for its conservatism) decided that systemd was a net benefit over sysvinit, where the latter could quite accurately be described as a pile of shell scripts. Abstractions that suit the problem domain are far superior to ad hoc imperative bags of commands.
And my argument was that a simple design is good design, and in my opinion, better than something that isn't simple.
You could definitely write a build system in shell. In my opinion, "the job" is subjective and you shouldn't have to pick one tool for it. But I would prefer a simple one where I can see pretty much everything it's doing up front and not have to download source code or search through docs.
Debian picked up systemd because it was a popular decision, not because it was better. As far as I'm aware there was nothing broken in sysvinit that systemd fixed; there was just a vocal minority that wanted some new features and insisted it be shipped as default.
And in defense of shell scripts, nearly every single operating system you can think of other than Windows uses scripts to initialize (hell, even Windows has some startup scripts), and that is an abstraction that's suited to the problem domain. It's not like for the past 50 years people have been too stupid to come up with something better than shell scripts. The whole point of init scripts is to let anyone be able to modify their operating system's start-up procedures quickly and easily. If we didn't need to modify them to suit our individual cases, they'd be statically compiled C programs or assembly (which, by the way, I have done in the past; it's not a good idea).
> You could definitely write a build system in shell. In my opinion, "the job" is subjective and you shouldn't have to pick one tool for it. But I would prefer a simple one where I can see pretty much everything it's doing up front and not have to download source code or search through docs.
You likewise have to read docs or the source to know what your shell is really doing. Or search through sysvinit docs to know which shell scripts it will run and when. I have to look up how to use shell conditionals every time, and it's extremely easy to get it subtly wrong such that it's not doing what you think it's doing. There are subtle distinctions between shell built-ins and programs of the same name, and it's very unclear in some cases which is being called. The shell is far from being a beautifully simple and transparent abstraction. Perhaps you know it too well to be able to appreciate this.
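(A tiny example of the builtin-vs-program ambiguity: on a typical Linux box with bash and coreutils -- paths may differ on your system -- you get

    $ type -a [
    [ is a shell builtin
    [ is /usr/bin/[

and a script normally gets the builtin, but nothing in the script's text tells you that.)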
The shell and sysvinit are abstractions over writing C programs that make POSIX library calls directly. It's all abstractions unless you are writing machine code directly (and even then you have to read docs to know what the instructions really do). So given that we're all using abstractions, we ought to use the best ones: the ones that give us the most advantages and the fewest disadvantages for the problem we are trying to solve, and the greatest combination of simplicity, transparency, and expressiveness.
If you want to make sure that only one copy of a program is running, is it a better fit to write a shell script that creates a pidfile, checks to make sure there's a running process that matches it, and that handles a million cases about how the two can fall out of sync? Or is it better to be able to express directly "I want one copy of this program running" and then let the abstraction handle the hard cases?
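Here is a minimal sketch of the pidfile approach (the daemon name and paths are made up), just to show how much it has to do by hand:

    #!/bin/sh
    # Start "mydaemon" only if it isn't already running.
    PIDFILE=/var/run/mydaemon.pid
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null ; then
        echo "mydaemon is already running" >&2
        exit 0
    fi
    # Pidfile missing or stale: (re)start and record the new pid.
    mydaemon &
    echo $! > "$PIDFILE"

And even this ignores the race between the check and the start, pid reuse, and cleaning up the pidfile when the process dies.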
You might say "well the shell script is easier to debug, because I wrote it and I understand it." Well of course, you always understand the things you wrote yourself. But if I had to debug either a pidfile-making shell script or a well-designed babysitter tool, I'll take the specific tool every time. If it's mature, it's also a lot less likely to have bugs and race conditions.
> Debian picked up systemd because it was a popular decision, not because it was better. As far as I'm aware there was nothing broken in sysvinit that systemd fixed; there was just a vocal minority that wanted some new features and insisted it be shipped as default.
I don't think you're very well-informed about this. Your claim of a "vocal minority" is particularly suspect, considering the decision was made by a vote of the technical committee. Though there was disagreement, almost no one was in favor of sysvinit. And the position paper in favor of systemd identifies specific things that are lacking in sysvinit: https://wiki.debian.org/Debate/initsystem/systemd
What you're arguing for is essentially the difference between a shell script and a more complicated shell script [some tool designed to "express directly" what you want to do]. You ask, why wouldn't we use something more complicated if it does exactly what we want to do? Because unless you really need to do some exact thing, it adds unnecessary complexity which leads to many additional problems. And if you need the added functionality, you can always add a feature or use a pre-existing tool.
Your debugging argument is bonkers. You claim shell scripting is too hard, then say it must be easy to troubleshoot a "well-designed babysitter tool", which requires WAY more domain-specific knowledge of debugging tools! If you don't know how to write bash scripts, you sure as hell aren't going to have an easy time figuring out why your package manager's borking on an update of openssl.
Did you even read the executive summary of the position paper? "Systemd is becoming the de facto standard" .. "Systemd represents a leap in terms of functionality" .. "People are starting to expect this functionality [..] and missing it could [..] make Debian lose its purpose." They only want it because it has new features, and people are starting to expect it. It's a popularity contest, and systemd won.
> What you're arguing for is essentially the difference between a shell script and a more complicated shell script
You seem to have a really shell-script-centric view of the world, as if the shell is somehow a fundamental and "pure" tool, and everything else is a more complicated version of a shell script.
What you are missing is that the shell is just another tool, and does an awful lot behind the scenes to achieve the functionality that appears "simple" to you. Bash is tens of thousands of lines of C and its manpage is hundreds of thousands of words. Using the shell is not "free", complexity-wise. The shell is a programming language, and not a particularly well-designed one at that. Shell script programming introduces its own peculiarities, and it is known not to scale in complexity very well.
> Your debugging argument is bonkers. You claim shell scripting is too hard
No, I claim that the shell is not a particularly simple tool. There is a difference.
I write JIT compilers for fun. I don't shy away from things that are hard. But my brain has only so much room for uninteresting and arbitrary facts. The very peculiar way that conditionals work in the Bourne Shell is not something my brain can be bothered to remember.
> Did you even read the executive summary of the position paper?
You have edited out precisely the parts that contradict your position: "It replaces the venerable SysV init with a clean and efficient design [...] It is better than existing alternatives for all of Debian’s current use cases." Yes, the momentum of systemd was clearly a factor, but your claim that it is nothing but a popularity contest is not a conclusion that a dispassionate observer would reach.
Is it really so hard to remember these three forms of conditionals?
if ! grep foo /some/file ; then
if [ $INTEGER -eq 0 ] ; then
if [ "one" = "two" ] ; then
Those are the only conditionals I ever use. I don't really use conditionals in any other way, and it's really not that complicated to see how they work. Sure, there are more complicated forms, and forms that other shells use (what you see above are fairly compatible, common forms of conditionals, except for the -eq). But if you go back to the original shells and how they did scripting, that should work for most if not all other shells today.
Also, those parts you quote don't contradict anything. It's saying systemd has a "better design", which means it is a shinier, fancier new toy to play with. But the paper never once points out any flaw in sysvinit. But that's obvious; Debian had been chugging along for over two decades with sysvinit without any problems. If your argument is that suddenly, after 20 years, someone realized sysvinit was some horribly flawed design that needed to be replaced, and it just so happens that systemd came along right when they realized it, I don't buy it. What the paper does spell out, though, is all the advantages of systemd for things other than system init. Basically it says "Hey, we want all these new features, and we need systemd to replace init for it all to work, so please just go along with it because it's a much better design."
Here are some things that seem reasonable but don't work:
if [ $FOO = bar ] ; then
    echo "equal!"
fi
This errors out with:
test.sh: line 4: [: =: unary operator expected
This is because the shell is based on text substitution. So once it's replaced "$FOO" with nothing, it ceases to be an actual token, and the expansion of:
if [ = bar ] ; then
...is an error. This is terrible.
One solution you sometimes see to this is:
if [ x$FOO = xbar ] ; then
    echo "equal!"
fi
This handles the empty string and other simple strings, but once $FOO has any spaces or shell metacharacters in it, it will break also. This is also terrible.
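For the record, the fix that actually works is to quote the expansion, so the test still sees exactly one word even when $FOO is empty or contains spaces:

    if [ "$FOO" = bar ] ; then
        echo "equal!"
    fi

But the fact that the obvious form is wrong, and the workaround people reach for is also wrong, is rather the point.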
> It's saying systemd has a "better design", which means it is a shinier, fancier new toy to play with.
You seem dismissive of new technology. If you want to keep using CVS while browsing the web with lynx and sending your email with pine, more power to you (after all, graphical web browsing is just a "new feature"). But the rest of us are moving on.
It's not new technology that bothers me. Me having to do more work bothers me. Systemd is going to make my job more difficult in terms of troubleshooting and maintenance - way more difficult than remembering that an operator requires two operands to evaluate.
What's really funny about systemd is I think that all its features have tons of value, and I would definitely use them. But I also think its creators are completely fucking batshit insane for making it mandatory to replace huge chunks of the operating system just to get those features. You should be able to just run systemd as a normal user process and still maintain the same level of functionality, but for some fucked up reason somebody thought it would be a great idea to make it a completely non-backwards-compatible non-portable operating system requirement. It's a stupendously bad idea, and the only reasoning anyone can come up with for why they designed it that way is "It's Advanced!" Of course, I should add the caveat that I don't care at all about boot times, and so people who are obsessed with short boot times will find systemd very refreshing, in the way an Apple user finds replacing their old iPhone with a new iPhone very refreshing.
> Debian picked up systemd because it was a popular decision, not because it was better. As far as I'm aware there was nothing broken in sysvinit that systemd fixed; there was just a vocal minority that wanted some new features and insisted it be shipped as default.
I don't feel that's a fair summary of the lengthy debate that was had about this. There is a long page listing the reasons for selecting systemd, and also a very good summary by Russ Allberry of the different init systems being suggested (including staying with sysvinit). Debian has rarely been accused of taking a major technical decision because it is hip.
Shell scripts are not simple. They rely on mutable state and loads of it. For every bit (in the binary sense) of mutable state you double the number of possible states your system can be in. This rapidly expands to ridiculous levels of complexity, the vast majority of which the programmer is ignorant of. This is not good design.
It may not be "good design" in the functional-programming sense, but functional in the sense of usable by human beings it very much is: it's essentially a flat program that can be interpreted by non-developers, edited on the fly, is incredibly flexible and customizable, and has 40 years of backwards compatibility. It's actually pretty damn useful. But I can see how someone might not like just getting things done and would rather design an immutable-state functional program to start their ssh daemon.
It's not about getting things done, it's about reliability and repeatability. When you deploy large numbers of nodes in a system you don't want little bits of state causing random failures. You want everything to be as homogeneous and clean as possible.
Who says scripts have to be ad-hoc? They can be very well designed and tested. In fact, in some ways I agree with the parent: simple well tested scripts are a very powerful and often underutilized tool.
This reminds me of a particularly devious C preprocessor trick:
#define if(x) if ((x) && (rand() < RAND_MAX * 0.99))
Now your conditionals work correctly 99% of the time. Sure it's possible for them to fail, but unlikely.
Now you might object that C if() statements are far more commonly executed than "apt-get install". This is true, but to account for this you can adjust "0.99" above accordingly. The point is that there is a huge difference between something that is strongly reliable and something that is not.
Things that are unreliable, even if failure is unlikely, lead to an endless demand for SysAdmin-like babysitting. A ticket comes in because something is broken, the SysAdmin investigates and finds that 1 out of 100 things that can fail but usually doesn't has in fact failed. They re-run some command, the process gets unstuck, and they close the ticket with "cron job was stuck, kicked it and it's succeeding again." Then they go back to their lives and wait for the next "unlikely but possible" failure.
Some of these failures can't be avoided. Hardware will always fail eventually. But we should never accept sporadic failure in software if we can reasonably build something more reliable. Self-healing systems and transient-failure-tolerant abstractions are a much better way to design software.
That difference goes away at the point where other risk factors are higher. How high is my confidence that there isn't a programming bug in Nix? Above 99%, perhaps, but right now it's less than my confidence that apt-get is going to work.
Most of us happily use git, where if you ever get a collision on a 128-bit hash it will irretrievably corrupt your repository. It's just not worth fixing when other failures are so much more likely.
The point of my post wasn't "use Nix", it was "prefer declarative, self-healing systems."
Clearly if Nix is immature, that is a risk in and of itself. But all else being equal, a declarative, self-healing system is far better than an imperative, ad hoc one.
Other risk factors don't make the difference "go away", because failure risks are compounding. Even if you have a component with a 1% chance of failure, adding 10 other components with a 0.1% chance of failure will still double your overall rate of failure to 2%.
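(To spell out the arithmetic: the chance that everything succeeds drops from 0.99 to 0.99 x 0.999^10 ≈ 0.980, i.e. from a 1% to roughly a 2% overall failure rate.)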
This is not to mention that many failures are compounding; one failure triggers other failures. The more parts of the system that can get into an inconsistent state, the more messed up the overall picture becomes once failures start cascading. Of course at that point most people will just wipe the system and start over, if they can.
File hashes are notable for being one of the places where we rely on probabilistic guarantees even though we consider the system highly reliable. I think there are two parts to why this is a reasonable assumption:
1. The chance of collision is so incredibly low, both theoretically and empirically. Git uses SHA1, which is actually a 160-bit hash (not 128), and the odds of it colliding are many, many orders of magnitude lower than the odds of other failures. It's not just that it's less likely, it's almost incomparably less likely.
2. The chance of failure isn't increased by other, unrelated failures. Unlike apt-get, which becomes more likely to fail if the network is unavailable or if there has been a disk corruption, no other event makes two unrelated files more likely to have colliding SHA1s.
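To put a rough number on point 1 (back-of-the-envelope, assuming SHA1 behaves like a random function): even with a billion objects in a repository, the birthday bound puts the collision probability around

    (10^9)^2 / (2 x 2^160) ≈ 10^18 / 3x10^48 ≈ 3x10^-31

which is incomparably smaller than the chance of, say, a disk or memory error corrupting the repository.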
It always amazes me that after all these years, Linux still hasn't fixed this.
In my experience, any program that overloads I/O will make the system grind to a halt on Linux. Any notion of graceful degradation is gone and your system just thrashes for a while.
My theory about this has always been that any I/O related to page faults is starved, which means that every process spends its time slice just trying to swap in its program pages (and evicting other programs from the cache, ensuring that the thrashing will continue).
I've never gotten hard data to prove this, and part of me laments that SSDs are "fast enough" that this may never actually get fixed.
Can anyone who knows more about this comment? It seems like a good rule inside Linux would be never to evict pages that are mapped executable if you can help it.
In its heyday, Solaris was outstanding in terms of being responsive while simultaneously doing large amounts of I/O. (Or at least that's my perhaps clouded recollection; I haven't used Solaris in over 5 years.)
Just after the OS boots, Dropbox needs to index 120 GB of files, and while it does, any other program that wants the disk is uselessly slow. It takes about 10 minutes for Dropbox to finish and for my mail and IDEs to open.
Interesting. How many files do you have? I recorded a trace of Dropbox executing on my Windows machine (mostly flat folder hierarchy, ~500 MiB, ~1000 files) and the file I/O for querying all my data took 71542.070 μs (0.07 s). I believe Dropbox also does some extra things (reading the NTFS journal, its own file cache-journal, updating hashes, etc.), so the total file I/O cost was around 2944815.431 μs (2.9 s). Note that the I/O happened sporadically, and the wall-clock time is higher as expected (it didn't block the scheduler from scheduling other processes).
I assume that since my data was already synced and didn't need to be indexed all over again, I got some savings there. Maybe your Dropbox configuration data is corrupted and that's why it needs to index everything again.
> There are also 255 representations of almost all representable numbers. [...] Aside from the fact that you're wasting an entire byte of your representation
How is this different than any other floating-point representation? I'm pretty sure IEEE floating-point has the same redundancy, though numbers are normalized so comparisons are cheaper as you note. But IEEE doubles "waste" even more bits due to the 2^52 representations of NaN.
> For most comparisons [...] it will take around FIFTY INSTRUCTIONS TO CHECK IF TWO NUMBERS ARE EQUAL OR NOT.
Good point, sounds like a notable weakness and barrier to adoption.
> Crockfords bugaboo with IEEE 754 floating-point is bizarre, verging on pathological.
IEEE 754 doesn't waste any bits – there is only a single representation of each value (except for multiple NaNs). In this proposal, there are 255 representations of most values, which means that it has almost an entire byte of redundancy. The waste is bad, but the lack of a canonical representation of each value is worse.
I personally think that the way to handle floating-point confusion is better user education. However, if you really want a decimal standard, then, as I mentioned above, there already is one that is part of the IEEE 754 standard. Not only do there exist hardware implementations, but there are also high-quality software implementations.
A better approach to making things more intuitive in all bases, not just base 10, is using rational numbers. The natural way is to use reduced pairs of integers, but this is unfortunately quite prone to overflow. You can improve that by using reduced ratios of – guess what – floating-point numbers.
> There are also 255 representations of almost all representable numbers. For example, 10 is 1 x 10^1 or 10 x 10^0 – or any one of 253 other representations.
You are not correct. The representation of 10 with the smallest significand is 1 x 10^1, and you can't go any further into positive exponents. At the other end, a signed 56-bit significand allows 10 000 000 000 000 000 (10^16) as the largest integer power of 10, for which the exponent is -15. So there are exactly 17 representations of 10 (exponents +1 down through -15), and that's the worst it gets. All other numbers except powers of 10 have fewer representations, and most real-world data affected by noise has a single representation, because it uses the full precision of the significand and can't be shifted right or left without overflow or loss of precision.
So the redundancy is much less than you think: one in 10 representable values has two representations, one in 100 has three, and so on. This is common for other decimal formats and not that big of a problem; detecting zero is a simple NOR gate over all the significand bits.
The real problem with this format is the very high price in hardware (changing the exponent requires recomputing the significand) and complete unsuitability for any kind of numerical problem or scientific number crunching. Because designing a floating point format takes numerical scientists and hardware designers, not assembly programmers and language designers.
Heck, the only reason he put the exponent in the lower byte and not the upper byte, where it would have ensured perfect compatibility with most positive integers, is that x64 assembly does not allow direct access to the upper byte.
In one format, IEEE 754 has 24576 possible representations of zero, which fits your definition of "wasted bits". Some of your other criticisms might be valid, but at this point I'd like to see an accurate technical comparison between DEC64 and the decimal formats of IEEE 754.
This is why decimal floating-point formats are kind of a disaster in general and are only implemented in relatively rare hardware intended for financial uses. In many of those applications, using a decimal fixed-point representation is better – i.e. counting in millionths of dollars (you can still count up to ±9 trillion dollars with 64 bits). But yes, a technical comparison of different decimal formats would definitely be interesting. I suspect that despite the occasional failure of intuitiveness, we're far better off with binary formats and better programmer education.
Nobody who does anything with numbers believes that! Even if all you can do is count on your fingers, you believe in the difference between integers and floats. They have different algebraic properties entirely, and it takes a whole hell of a lot of work to get from one to the other -- there's even a whole class (fractions) in between.
I'm not sure what that has to do with it. Even if you are ok with the idea that integers and "decimal numbers" are different, it's still confusing that 0.1 + 0.2 != 0.3.
It's confusing because it is very difficult to look at a decimal number and know whether it can be represented exactly as base-2 floating point. It's especially confusing because you get no feedback about it! Here is a Ruby session:
The precise value of double(0.1) is 0.1000000000000000055511151231257827021181583404541015625. That is precise, not an approximation.
If you know of a program in any of these languages that will print this value for "0.1" using built-in functionality, please let me know because I would love to know about it.
Likewise the precise value of double(1e50) is 100000000000000007629769841091887003294964970946560. Anything else is an approximation of its true value.
In another message you said that what's really important is that the string representation uniquely identifies the precise value. While that will help you reconstruct the value later, it does not help you understand why 0.1 + 0.2 != 0.3.
It helps because 0.1 + 0.2 produces 0.30000000000000004 for 64-bit floats – so at least you can see that this value isn't the same as 0.3. In Ruby you just get two values that print the same yet aren't equal, which is way more confusing. I agree that printing the minimal number of digits required for reconstruction does not help with explaining why 0.1, 0.2 and 0.3 in 64-bit floats aren't the real values 1/10, 2/10 and 3/10.
rasky at monocle in ~
Python 2.7.5 (default, Sep 2 2013, 05:24:04)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
rasky at monocle in ~
Python 3.3.3 (default, Dec 24 2013, 13:54:32)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
It used to, but it was changed to reduce users' confusion.
We may be speaking across each other here. Ruby is lying in the sense that there are multiple distinct float values that it will print as 0.3 – in particular, confusion ensues when two values look the same but are unequal. These other languages print each distinct float value differently, using just enough decimal digits to reconstruct the exact binary value. Ruby doesn't give you enough digits to reconstruct the value you have. Nobody actually prints the full correct value because it's fifty digits long and is completely redundant given that you know you're dealing with a 64-bit float.
Yeah, that's my point. 0.1 and 0.2 do add exactly to 0.3, but in any finite representation of real numbers you'll get rounding errors on some equations like this. If you use an infinite representation then equality is no longer computable.
Decimal floating point would hardly affect accuracy compared to binary floating point -- with the exception of domains where the inputs and output are strictly human-readable and human-precise constructs (e.g. banking... is there a better term for this?).
Binary floating point can accurately represent fractions of the form 1/(2^n). Decimal floating point can accurately represent fractions of the form 1/((2^n)*(5^m)). Either can only approximate fractions with other prime factors in the denominator (1/3, 1/7, 1/11, ...).
In terms of a programmer having to be concerned with accumulated errors due to approximations in the representation, I'd assert that decimal floating point in no way changes the scope of concerns or approach to numeric programming compared to binary floating point. I'd guess even in a domain with a very precise need of fractional decimals, a programmer would still need to reason about and account for numeric representation errors in decimal floating point such as overflow and underflow.
> Decimal floating point would hardly affect accuracy compared to binary floating point
The larger the radix, the more of the mantissa is wasted. Given fixed storage, base 2 floats will have higher precision over their range than higher bases, like 10 and 16.
The difference is easy to illustrate with base 16 and base 2, since we can convert between the two easily. Converting a base 16 float to base 2 will result in leading zeroes some of the time, which could have been used for storing data. The same is true with base 10, but you have to do more math to demonstrate it.
The thing that remains most compelling about decimal encodings is in the types of errors seen and how human-comprehensible they are.
For example, arithmetic operations in binary floating point are notorious for producing mystery meat error quantities, because the actual error quantity is in binary and only later represented as a decimal amount. And when coding it is most natural to account for floating point errors in decimal terms, even though this is a false representation.
So the main thrust of any decimal float proposal comes from "this is more in tune with the way humans think" and not from the specific performance and precision constraints. That said, I have no special insights into Crockford's proposal.
I imagine it's because binary exponents correspond very well to binary arithmetics, while decimal exponents don't.
Imagine you have a floating-point format which has an 8-bit mantissa (i.e. 8 bits to store the digits, not counting the exponent). You're trying to calculate 200 + 200. In binary, that's
0b11001000 + 0b11001000 = 0b110010000
However, to represent the result you would need 9 bits, which you don't have, so you instead represent it as 400 = 200 * 2 = 0b11001000 * 2 ^ 1. Notice how the resulting mantissa is just the result (0b110010000) shifted one bit.
If your exponent is a decimal exponent, you would instead have to represent 400 as 400 = 40 * 10 = 0b101000 * 10 ^ 1. In this case, the resulting mantissa has to be calculated separately (using more expensive operations), as it is not simply a shifted version of the binary result of the operation.
Because division/multiplication by powers of two is a simple bitshift and can be implemented very easily on silicon. Division/multiplication by ten is complicated and needs many more gates and more time.
It's not inherently slower; however, you have to add extra logic to correctly handle overflows and the like. Given the size of modern floating-point units, I doubt it's a massive overhead. Basically they would need to convert back and forth between the native zeros and ones of the hardware and the decimal representation, and that extra logic might mean slower hardware in certain circumstances.
Basically try to implement a BCD counter in verilog and you'll see where the overhead appears compared to a "dumb" binary counter.
In practice it would be slow because not a whole lot of CPU architectures natively handle BCD. If this "standard" goes mainstream maybe the vendors will adapt and make special purpose "DEC64 FPU" hardware.
I'm not really sure what's the point of using this floating point format outside of banking and probably a few other niche applications. For general purpose computing I see absolutely no benefits.
It's not inherently slower. It's a question of economics: how much are people willing to spend on a CPU to make it fast? With IEEE 754 math, there's a lot of monetary incentive because a lot of code uses 754. With a new decimal class, there is much less incentive.
There is no eye contact or body language possible between the pilot and the communicator in the SR-71. What makes that story so perfect is not just the speed aspect, not just the smugness of the other planes, but the interpersonal relationship between the pilot and his crew member. It's just perfect story-telling.
It's the same way with ICBMs. Once they've gone ballistic they look up at the stars to get their bearings. Then they switch to their gyroscope to guide themselves in. It makes sense, in a chilling way: you can't rely on radio signals like GPS, because in a nuclear exchange your satellites are likely to be destroyed or jammed. You can't rely on surface features because in a nuclear apocalypse those might be changing too. The only things that can be trusted to work are the gyroscope and the stars. And the payload.
(In case you're wondering, I know this because I was wiki-ing around after my iphone's accelerometer went nuts to figure out how state-of-the-art gyroscopes worked. Turns out that there's a special type of gyroscope, the Ring Laser Gyroscope, which uses relativity and interference patterns to achieve ~0 drift. RLGs were developed for ICBMs.)
Lack of Turing-completeness can be a feature. Take PDF vs PostScript. The latter is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first.
By limiting expressiveness you also gain static analysis and predictability. It's not about limiting the potential of computers, it's about designing systems that strike the right balance between the power given to the payload and the guarantees offered to the container/receiver.
For example, it is only because JSON is flat data and not executable that web pages can reasonably call JSON APIs from third parties. There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.
>There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.
Of course there's a "better way": running the code in a sandbox. You could do so using js.js, for example. (Of course, replacing a JSON API with sandboxed JS code is likely to be a bad idea. But it is possible.)
You're right inasmuch as I shouldn't have implied that unsandboxed interpretation is the only option.
But my larger point still stands; the fundamental tradeoff is still "power of the payload" vs "guarantees to the container." Even in the case of sandboxed execution, the container loses two important guarantees compared with non-executable data formats like JSON:
1. I can know a priori roughly how much CPU I will spend evaluating this payload.
2. I can know that the payload halts.
This is why, for example, the D language in DTrace is intentionally not Turing-complete.
I agree 100% with you, but #1 isn't completely true. The counterexample is the ZIP bomb (http://en.wikipedia.org/wiki/Zip_bomb). Whenever you unzip anything you got from outside, you should limit the time spent and the amount of memory written.
1. imposing CPU limits incurs an inherent CPU overhead and code complexity.
2. if those limits are hit, you can't tell whether the code just ran too long or whether it was in an infinite loop.
So now if we fully evaluate the options, the choice is between:
1. A purely data language like JSON: simple to implement, fast to parse, decoder can skip over parts it doesn't want, etc.
2. A Turing-complete data format: have to implement sandboxing and CPU limits (both far trickier security attack surfaces), have to configure CPU limits, when CPU limits are exceeded the user doesn't know whether the code was in an infinite loop or not, maybe have to re-configure CPU limits.
I have to agree here. General Turing-completeness was known from the beginning to imply undecidable questions -- about its structure, running time, memory and so on. I don't think this has a place as the 'data'.
Abstractions exist for a reason -- this is analogous to source/channel coding separation or internet layers. They don't have to be that way, but are there for a reason.
Someone could change my opinion, though. Provide me a data format which proves certain things about its behavior and that would be a nice counterexample.
If you have a nice data format like s-exprs, it's a fairly simple matter to just aggressively reject any code/data that can't be proven harmless. For example, if you're loading saved game data, just verify that the table contains only tables with primitive data; if there's anything else, throw an error. Then you can safely execute it in a turing-complete environment and be sure it won't cause problems.
Speaking for myself, in my ideal world this sort of schema-checking and executing is ubiquitous and easy. Obviously that's not the world today. While there are tools for checking JSON schemata, there doesn't seem to be a standard format. I wonder how hard it would be to implement a Lua schema-checker.
While you can register custom handlers for specific tags, properly implemented readers can read unknown types without requiring custom extensions.
The motivating use case behind EDN was enabling the exchange of native data structures between Clojure and ClojureScript, but it's not Clojure specific -- implementations are starting to pop up in a growing number of languages (https://github.com/edn-format/edn/wiki/Implementations).
Here's the InfoQ video and a few threads from when it was announced:
I've looked at EDN a bit, even started a sad little C# parser. I don't see what it has to do with my previous comment, which is all about how schemas are potentially useful. I'm trying to say that after you check the schema, you don't just read the data, you execute it, and that has the effect of applying the configuration or just constructing the object.
Pronouns are fine. Substituting 'the first' and 'the second' would be an improvement. It's specifically 'former' and 'latter' that should be deprecated. I'd be interested in seeing a study comparing readers' comprehension of the various phrasings. What cost in clarity would you be willing to pay?
Back on topic: The reason for PDF's existence is to be a non-turing complete subset of postscript. Features like direct indexing to a page are why Linux has switched to PDF as the primary interchange format.