
This is a good point, but I feel that discouraging this type of approach is not the way to go.

I apologise in advance for ranting... I hope this is not too off-topic, but instead a "zoom out" on the issue.

This touches on something deep and wrong about how we use computers these days. Computers are really good at being computers, and the amplification of intellectual capabilities they afford is tremendous, but this is reserved for a limited few that were persistent enough and learned enough to rediscover the raw computer buried underneath, and what it can do.

For example, I dream of a world where everything communicates through s-expressions, all code is data and all data is code. Everything understandable all the way down. Imagine what people from all fields could create with this level of plug-ability and inter-operability. We had a whiff of that with the web so far, but it could be so much more powerful, so much simpler, so much more elegant. All the computer science is there, it's just a social problem.

I understand the security issues, but surely limiting the potential of computers is not the solution. There has to be a better way.




Lack of Turing-completeness can be a feature. Take PDF vs PostScript. The latter is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first.

By limiting expressiveness you also gain static analysis and predictability. It's not about limiting the potential of computers, it's about designing systems that strike the right balance between the power given to the payload and the guarantees offered to the container/receiver.

For example, it is only because JSON is flat data and not executable that web pages can reasonably call JSON APIs from third parties. There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.
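
To make that concrete, here's a minimal sketch (Python, purely illustrative) of the difference between treating a payload as data versus treating it as code:

  import json

  payload = '{"user": "alice", "balance": 42}'

  # Treating the payload as data: the worst it can do is fail to parse.
  data = json.loads(payload)
  print(data["balance"])

  # Treating the payload as code: whatever it says, your process does.
  # eval('__import__("os").system("do something awful")')  # never with untrusted input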


If you have a nice data format like s-exprs, it's a fairly simple matter to just aggressively reject any code/data that can't be proven harmless. For example, if you're loading saved game data, just verify that the table contains only tables with primitive data; if there's anything else, throw an error. Then you can safely execute it in a Turing-complete environment and be sure it won't cause problems.

Speaking for myself, in my ideal world this sort of schema-checking and executing is ubiquitous and easy. Obviously that's not the world today. While there are tools for checking JSON schemata there doesn't seem to be a standard format. I wonder how hard it would be to implement a Lua schema-checker.
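
For what it's worth, the kind of check I mean is only a few lines. A rough sketch in Python (the same idea carries over to Lua tables; the function name is mine, not any standard library):

  # Reject anything that isn't nested containers of primitive data.
  PRIMITIVES = (str, int, float, bool, type(None))

  def check_plain_data(value, depth=0, max_depth=32):
      if depth > max_depth:
          raise ValueError("structure nested too deeply")
      if isinstance(value, PRIMITIVES):
          return value
      if isinstance(value, list):
          return [check_plain_data(v, depth + 1) for v in value]
      if isinstance(value, dict):
          return {check_plain_data(k, depth + 1): check_plain_data(v, depth + 1)
                  for k, v in value.items()}
      raise ValueError("non-primitive value in save data: %r" % (value,))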


Have you checked out EDN yet (https://github.com/edn-format/edn)?

It's a relatively new data format designed by Rich Hickey that has versioning and backward-compatibility baked in from the start.

EDN stands for "Extensible Data Notation". It has an extensible type system that enables you to define custom types on top of its built-in primitives, and there's no schema.

To define a type, you simply use a custom prefix/tag inline:

  #wolf/pack {:alpha "Greybeard" :betas ["Frostpaw" "Blackwind" "Bloodjaw"]}
While you can register custom handlers for specific tags, properly implemented readers can read unknown types without requiring custom extensions.
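
A toy illustration of what "reading unknown types" means in practice (Python-ish sketch of a reader's tag dispatch, not the actual EDN implementations; TaggedValue and the handler table are my own names):

  # When a reader hits "#some/tag value", it looks for a registered handler.
  # If none exists, it still returns the value, just wrapped with its tag,
  # so unknown types round-trip instead of breaking the parse.
  class TaggedValue:
      def __init__(self, tag, value):
          self.tag, self.value = tag, value

  handlers = {"wolf/pack": lambda v: v}  # optional, app-specific

  def read_tagged(tag, value):
      handler = handlers.get(tag)
      return handler(value) if handler else TaggedValue(tag, value)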

The motivating use case behind EDN was enabling the exchange of native data structures between Clojure and ClojureScript, but it's not Clojure specific -- implementations are starting to pop up in a growing number of languages (https://github.com/edn-format/edn/wiki/Implementations).

Here's the InfoQ video and a few threads from when it was announced:

https://news.ycombinator.com/item?id=4487462, https://groups.google.com/forum/#!topic/clojure/aRUEIlAHguU, http://www.infoq.com/interviews/hickey-clojure-reader


I've looked at EDN a bit, even started a sad little C# parser. I don't see what it has to do with my previous comment, which is all about how schemas are potentially useful. I'm trying to say that after you check the schema, you don't just read the data, you execute it, and that has the effect of applying the configuration or just constructing the object.
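
In case it helps, here's roughly what I mean by "execute", using Python literals as a stand-in for s-exprs/Lua tables (just a sketch; the names are mine):

  import ast

  # The saved data is written as a literal expression. literal_eval both
  # checks that it contains only plain data and "executes" it, yielding
  # the live structure in one step.
  save_text = '{"player": {"name": "alice", "hp": 10}, "level": 3}'
  save = ast.literal_eval(save_text)

  # "Executing" the checked data then amounts to applying it:
  class Player:
      def __init__(self, name, hp):
          self.name, self.hp = name, hp

  player = Player(**save["player"])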


>There really is no "better way" -- if JSON was executable then calling such an API would literally be giving it full control of your app and of the user's computer.

Of course there's a "better way": running the code in a sandbox. You could do so using js.js[1], for example. (Of course, replacing a JSON API with sandboxed JS code is likely to be a bad idea. But it is possible.)

[1] https://sns.cs.princeton.edu/2012/04/javascript-in-javascrip...


You're right inasmuch as I shouldn't have implied that unsandboxed interpretation is the only option.

But my larger point still stands; the fundamental tradeoff is still "power of the payload" vs "guarantees to the container." Even in the case of sandboxed execution, the container loses two important guarantees compared with non-executable data formats like JSON:

1. I can know a priori roughly how much CPU I will spend evaluating this payload.

2. I can know that the payload halts.

This is why, for example, the D language in DTrace is intentionally not Turing-complete.


I agree 100% with you, but #1 isn't completely true. The counterexample is the ZIP bomb (http://en.wikipedia.org/wiki/Zip_bomb). Whenever you unzip anything you got from outside, you should limit the time spent and the amount of memory written.
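
For example, a rough sketch of capping how much you're willing to inflate (Python's zipfile; the limit is an arbitrary placeholder):

  import zipfile

  MAX_TOTAL = 50 * 1024 * 1024  # refuse to inflate more than ~50 MB

  def safe_extract_bytes(path):
      total = 0
      out = {}
      with zipfile.ZipFile(path) as zf:
          for info in zf.infolist():
              with zf.open(info) as f:
                  chunks = []
                  while True:
                      chunk = f.read(64 * 1024)
                      if not chunk:
                          break
                      # Count bytes actually produced, not what the headers claim.
                      total += len(chunk)
                      if total > MAX_TOTAL:
                          raise ValueError("archive inflates past limit")
                      chunks.append(chunk)
              out[info.filename] = b"".join(chunks)
      return out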


If excess CPU/non-halting behavior is the issue, you could run the code with a timeout.
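
E.g. the crudest version of that, sketched in Python: run the untrusted code in a separate process and kill it after N seconds (the script path and limit are placeholders):

  import subprocess

  def run_untrusted(path, timeout_s=5):
      try:
          # A separate process so a runaway payload can be killed cleanly.
          return subprocess.run(
              ["python3", path],
              capture_output=True, timeout=timeout_s, check=False)
      except subprocess.TimeoutExpired:
          # Note: at this point you can't tell "slow" from "never halts".
          return None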


You could. But that has downsides also:

1. imposing CPU limits incurs inherent CPU overhead and adds code complexity.

2. if those limits are hit, you can't tell whether the code just ran too long or whether it was in an infinite loop.

So now if we fully evaluate the options, the choice is between:

1. A purely data language like JSON: simple to implement, fast to parse, decoder can skip over parts it doesn't want, etc.

2. A Turing-complete data format: have to implement sandboxing and CPU limits (both far trickier security attack surfaces), have to configure those limits, and when they are exceeded the user can't tell whether the code was in an infinite loop or just needs the limits raised and the run retried.

Sure, sometimes all the work involved in (2) is worth it, that's why we have JavaScript in web browsers after all. But a Turing-complete version of JSON would never have taken off like JSON did for APIs, because it would be far more difficult and perilous to implement.


I have to agree here. General Turing-completeness was known from the beginning to imply undecidable questions -- about its structure, running time, memory and so on. I don't think that has a place in the 'data'.

Abstractions exist for a reason -- this is analogous to source/channel coding separation or internet layers. They don't have to be that way, but are there for a reason.

Someone could change my opinion, though. Show me a data format that proves certain things about its behavior and that would be a nice counterexample.



Join me on my crusade to eliminate the use of 'former' and 'latter' in any writing unless the goal is obfuscation. It's error-prone, almost always requires rereading, and is never the clearest choice.

Here's my attempt at a clearer version:

"Take Postscript vs PDF. Postscript is Turing-complete and therefore you cannot jump to an arbitrary page or even know how many pages the document has without running the entire thing first."


That's like campaigning to eliminate pronouns. "Former" and "latter" are just like "it", except they are for when you've mentioned two things.

Repeating the proper noun doesn't achieve the goal of emphasizing the fact that you're referring to something you just mentioned.


Pronouns are fine. Substituting 'the first' and 'the second' would be an improvement. It's specifically 'former' and 'latter' that should be deprecated. I'd be interested in seeing a study comparing readers' comprehensions of the various phrasings. What cost in clarity would you be willing to pay?


Did you even read his comment? That is precisely what he said!

"Take PDF vs PostScript. The latter is Turing-complete"


I think that is what haberman meant.

Back on topic: The reason for PDF's existence is to be a non-Turing-complete subset of PostScript. Features like direct indexing to a page are why Linux has switched to PDF as the primary interchange format.


In a world where users will willingly enter malicious code into their computers if they believe it will do something they want[1], can there really be a better way?

[1] https://www.facebook.com/selfxss


When it comes right down to it, you can't fully protect people from themselves. Even in 'meat space', which the general population is presumably experienced with, people talk others into doing things that they should not all the time. Anything from social engineering to bog-standard scam artists masquerading as door-to-door salesmen.


But in 'meat space', it is way harder and more expensive to do evil against large numbers of people. For example, phishing as done electronically (throw a very, very wide net, and hope) doesn't make economic sense if one had to do it manually.

Also, if, for example, cars and airplanes and banks and nuclear submarines would accept executable code as input, some people would do damage on a gargantuan scale.

Clearly, being liberal in what you accept must end somewhere. I argue that it should end very, very soon. Even innocuous things such as "let's allow everybody to read the subject of everyone's mail messages", if available at scale and cheaply, would invite criminal behavior, for example by people mining them for signs that you are away from home.

Does anybody know how RMS thinks about passwords nowadays?


I'm not saying that scams in "cyberspace" don't present a greater threat than scams in "meatspace". I'm just pointing out that if you cannot protect people from themselves in "meatspace", then doing it in "cyberspace" is futile. You can fight it and cut back on it, but you will never actually win that fight. Technological solutions to what is ultimately a sociological problem only go so far, and we should be careful to not obsess over them to a fault.


And yet, in "meatspace", chainsaws come with more safety mechanisms than butter knives because the damage they can do is so much larger. Yes, we can't win that fight; people will die from chainsaw accidents, but I disagree that we shouldn't be more vigilant about chainsaws than about butter knives.


This seems like exactly the problem the parent post is complaining about, though: the people being tricked by selfxss aren't the limited group the parent talks about; they're the people who don't have the technical knowledge to understand what the developer console does and why pasting in random JS might be a bad idea. So the phenomenon of selfxss reinforces the point.


Perhaps modern operating systems (or hardware?) need two modes - "Safe mode", where everything is sanitised, checked, limited and Secure Boot-style verified, and "Open mode" where it's not; where experts and enthusiasts can work without limit and without DRM.


That only adds an extra step to the social engineering process.


Safe mode is called iOS


When my computer has the potential to erase my bank account, I want to limit its potential.


You're discouraging it in the wrong place.

Learning to swim is not done by throwing a kid in the deep end of a pool. Learning to code is not done by encouraging bad security practices.


On the other hand, taking the easy way out when this kind of security problem comes up leads to having a machine that's just an appliance and not a computer. If you've closed every local code execution vulnerability, you've probably rendered your system completely non-programmable and erected a monumental barrier to learning how to hack.



