Hacker News
Jeeves – A Language for Automatically Enforcing Privacy Policies (mit.edu)
67 points by yiedyie on Feb 26, 2014 | 34 comments

Link to the actual project: http://projects.csail.mit.edu/jeeves/

This article tells nothing useful.

In retrospect, I think it would have been better to post the project site.

No examples on the home page. Nor in the 'code' section. Or in the README on github. You'd think people would learn that most readers on the internet give up looking after 3 seconds.

There are some examples here: https://github.com/jeanqasaur/jeeves/wiki/A-Quick-Introducti...

Thanks for the feedback and sorry they were hard to find!

Your project looks super interesting. Unlike bjz_, though, examples aren't the first thing I wanna see on a new DSL's website; I want a link to the paper :)

Every time I see something like "Removes Human Error", I think of Nassim Taleb's Antifragile. This drive to remove errors is likely flawed from the start since we, as humans, are error prone. It's interesting from a theoretical standpoint but there will be human error, regardless of the language. Errors are part of human existence.

This drive to remove errors is likely flawed from the start since we, as humans, are error prone.

What? The drive to remove errors is with machines, not humans. Machines can be built which detect and fix human errors. Why is this a flawed premise? One need only look at the history of programming language innovations to see all the successful examples of errors being eliminated.

But that's the thing: the errors keep getting eliminated, but we keep getting new errors. There is no utopia where all the errors are gone, especially not from top-down language design. Can we get better? Sure, and we have been. That's awesome. But I'm fairly sure we cannot "remove errors" as in "make things error free".

I think you're being overly pedantic. When the author uses the phrase "remove errors" I take it to mean "remove a class (or classes) of errors". How is this not a worthy goal?

I don't believe it's pedantic to see a difference between the words "reduce" and "remove". We think we can design top-down systems that are error free. We are wrong. Systems are never error free, and thinking they could be is actually quite bad for us. By doing everything we can to remove errors completely, we just ensure that the errors that remain will be much worse in scope. Far better to build a redundant system that is prepared for errors than an efficient system that expects none.

It is certainly a worthy goal to reduce errors. To not do so would be silly. But expecting errors to go to zero is not a worthy goal because there are always errors.

I don't believe it's pedantic to see a difference between the words "reduce" and "remove".

No, but it's pedantic to generalize "remove errors" to "remove all errors". Actually, scratch that. It's just plain wrong.

Far better to build a redundant system that is prepared for the errors instead of building an efficient system expecting no errors.

Redundant systems only deal with certain classes of errors. You're making the same mistake you accuse the author of making.

I don't think "remove errors" means "remove all errors", it just means some unspecified number or category of errors will be removed. Given that, "reduce errors" is probably more clear.

Edit: oops - I meant to reply to the parent...

From the papers section: "A Language for Automatically Enforcing Privacy Policies" http://projects.csail.mit.edu/jeeves/papers/popl088-yang.pdf (I hope you like the lambda calculus!)

This is a better (video) introduction for programmers: http://projects.csail.mit.edu/jeeves/talks.php

This paper is the most up-to-date on the semantics of the current implementation: http://projects.csail.mit.edu/jeeves/papers/plas07-austin.pd...

@jeanyang - This is really amazing work, and I'm very interested in it.

One question for you: on slide 10 you show "state of the art" as a bunch of conditional logic guarding the returned values, and "Jeeves" as just a simple property lookup.

Do you know if Facebook's "state of the art" is as you are describing it in this slide, or do you think they have a proprietary framework similar to Jeeves that they use internally to abstract away all of these details (and potential liability)?

It seems that they would, but I thought you might have some first-hand knowledge to share, given the background research you must have done to get to this point.

Thanks for your interest! This is a great question. I've worked at Facebook on backend privacy, so let me think about what I can say without violating my NDA.

It's my understanding that large companies will create proprietary frameworks that help manage privacy policies on data. Programmers will typically be required to follow a certain coding discipline when working with sensitive values, so that they're calling library functions to manage policies. To my knowledge, however, these libraries deal with access control (who can access a specific piece of information) rather than information flow (how information may flow through a system).

Now here's the difference between access control and information flow--and why we need a language (or at least a DSL). When you only have access control, you're trusting the programmer to tell you correctly, at one point, where a piece of data is going. Even if a sensitive location value is used in a bunch of search queries, the result of which is shared as a status (that becomes visible to many people with different levels of access), the programmer is responsible for asking for the right level of access when accessing that location. With the complex policies we're starting to see in modern applications, managing this is becoming increasingly burdensome for developers. That's why we're looking at how to automatically handle information flow: the system tracks how sensitive values are used in order to make sure the values--and resulting computations--flow only to those with appropriate permissions. While it's relatively simple to hook access control into existing programming models, automatically handling information flow requires enhancing the language semantics (especially for conditions and function calls) to track additional information.
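To make the access-control vs. information-flow distinction concrete, here is a minimal Python sketch. Everything in it (`read_location`, `Labeled`, `FRIENDS`, the addresses) is hypothetical and is not the Jeeves API; it only illustrates why a single check at the read site does not cover values derived later, while a policy that travels with the value does.

```python
from dataclasses import dataclass
from typing import Callable

Viewer = str
Policy = Callable[[Viewer], bool]

FRIENDS = {"bob"}  # hypothetical: Alice shares her exact location with Bob only


# Access control: one check, at the point where the data is read.
def read_location(viewer: Viewer, location: str) -> str:
    return location if viewer in FRIENDS else "unknown"


# Information flow: the policy travels with the value and everything derived from it.
@dataclass(frozen=True)
class Labeled:
    high: str        # precise view, for viewers the policy allows
    low: str         # coarse view, for everyone else
    policy: Policy

    def derive(self, f: Callable[[str], str]) -> "Labeled":
        # A derived value keeps the original policy, so the flow is tracked.
        return Labeled(f(self.high), f(self.low), self.policy)

    def show(self, viewer: Viewer) -> str:
        return self.high if self.policy(viewer) else self.low


# Access-control version: the location is read "as Bob" for a search query,
# but the resulting status is then posted where everyone can see it.
status_ac = f"Coffee near {read_location('bob', '42 Main St')}"

# Information-flow version: the status inherits the location's policy.
loc = Labeled("42 Main St", "Cambridge", lambda viewer: viewer in FRIENDS)
status_if = loc.derive(lambda place: f"Coffee near {place}")

print(status_ac)               # posted publicly, so Eve sees: Coffee near 42 Main St
print(status_if.show("bob"))   # Coffee near 42 Main St
print(status_if.show("eve"))   # Coffee near Cambridge
```

In the access-control version the mistake is invisible to the library: the check passed, and the leak happens in ordinary string code afterwards.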

Automatically managing information flow the way Jeeves does significantly relieves programmer burden, but can be computationally expensive. Much of our research these days is about how to make this more efficient so that companies can one day put this sort of mechanism into their production systems.

Happy to answer additional questions!

@jeanyang - Thanks, this is a terrific answer!

"When you only have access control, you're trusting the programmer to tell you correctly at one point where a piece of data is going."

Right, that pretty much sums it all up, and I have my suspicions (independent of your response) that something like Jeeves could add value even to a "state of the art" company like Facebook :)

I'll be watching your project with great interest and may even follow up with you "offline" once I dig in some more, because I think you're really onto something of great importance. Any sufficiently complex enterprise system eventually needs something not too different from what you're working on here...

Happy to answer questions!

Can you give a high-level overview of the implementation? I read the README, and if I had to guess, it seems enforcement is determined through wrapped values, lambda functions that determine access based on numeric thresholds, and roles that are assigned numbers in that range. Is that close?

Yes, pretty close. In Jeeves, sensitive values are essentially pairs of values guarded by a label: <k ? vHigh : vLow>, where k is the label, vHigh is the high-confidentiality "view" (corresponding to, for instance, my sensitive GPS location), and vLow is the low-confidentiality "view" (corresponding to, for instance, some coarser-grained version of my location--such as what country I'm in). Rather than roles, we have flexible policies that can be arbitrary lambda functions taking a "viewer" argument. Each policy function is associated with a label and returns "true" if the label can be "high." The implementation evaluates each part of a multi-faceted value in order to return multi-faceted results. These results are "concretized" according to who is looking at them: given the viewer, we can use the policy functions to determine what the viewer should see. The implementation overloads Python operators and does dynamic source transformation (via the @jeeves macro) in order to support computations on these multi-faceted values.
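A minimal Python sketch of the faceted values described above, under the simplest possible assumptions: `Facet`, `lift2`, `concretize`, and the `POLICIES` table are hypothetical names (not the Jeeves API), and policies are plain predicates on the viewer. Overloading `+` makes a computation on faceted inputs produce a faceted result, which is concretized per viewer only at the end.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical policy table: label -> predicate on the viewer.
# True means the viewer may see that label's "high" facet.
POLICIES: Dict[str, Callable[[str], bool]] = {}


def lift2(op: Callable[[Any, Any], Any], a: Any, b: Any) -> Any:
    """Apply a binary op facet-wise, so the result is itself faceted."""
    if isinstance(a, Facet):
        return Facet(a.label, lift2(op, a.high, b), lift2(op, a.low, b))
    if isinstance(b, Facet):
        return Facet(b.label, lift2(op, a, b.high), lift2(op, a, b.low))
    return op(a, b)


@dataclass(frozen=True)
class Facet:
    """A sensitive value <label ? high : low>."""
    label: str
    high: Any
    low: Any

    def __add__(self, other: Any) -> Any:
        return lift2(operator.add, self, other)

    def __radd__(self, other: Any) -> Any:
        return lift2(operator.add, other, self)


def concretize(value: Any, viewer: str) -> Any:
    """Walk nested facets, picking the branch each label's policy allows."""
    while isinstance(value, Facet):
        value = value.high if POLICIES[value.label](viewer) else value.low
    return value


POLICIES["loc"] = lambda viewer: viewer == "bob"
POLICIES["age"] = lambda viewer: viewer in {"bob", "carol"}

loc = Facet("loc", "42 Main St", "USA")
age = Facet("age", "34", "30s")
profile = loc + " / " + age  # computed once; carries both labels

for viewer in ("bob", "carol", "eve"):
    print(viewer, "->", concretize(profile, viewer))
# bob -> 42 Main St / 34
# carol -> USA / 34
# eve -> USA / 30s
```

Operator overloading alone stops at control flow: `if loc:` would just treat the `Facet` object as truthy and take one branch, which fits the point above that conditions and function calls need extra machinery such as source transformation.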

Thanks for the detailed reply!

It's not a programming language, but a DSL.

Ah yes. From the README on github:

> Jeeves is a programming language for automatically enforcing privacy policies. We have implemented it as an embedded domain-specific language in Python.

I'm guessing the PR person writing the original post thought "Programming Language" sounded better.

Honestly, I'm glad it's only a DSL. It means I can still use the programming language that I love but get the benefits of some obscure special-purpose language.

Please explain the difference.

A language exists on its own, with its own syntax and semantics. It may interoperate with other languages, but it usually takes some work to get a program written in one language to talk to a program written in another. (Examples include C/OCaml interop and all the languages that execute on the .NET runtime.)

An embedded domain-specific language is a language with its own semantics that has been grafted onto another language. For instance, Jeeves is embedded in Python. We can use Jeeves as a Python library, but when we're using Jeeves functions, the programs behave like Jeeves programs rather than vanilla Python programs. (In this case, it means that the runtime tracks different possible views of sensitive values and computations done on them.) When programmers use the @jeeves decorator and the Jeeves API, the programs look like Python programs and can even use Python built-in functions and libraries, but the Jeeves library is doing work behind the scenes to make the programs behave differently.
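One way an embedded DSL can make ordinary-looking Python behave differently is to intercept function calls. The sketch below uses a hypothetical `@faceted` decorator that splits on faceted arguments and runs the plain function once per branch, so the body can use normal `if` statements and built-ins. This is a simpler technique than the source transformation the real `@jeeves` decorator is described as doing; `Facet`, `POLICIES`, `faceted`, and `concretize` are illustrative names, not the Jeeves API.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Facet:
    label: str
    high: Any
    low: Any


# Hypothetical policy table: label -> "may this viewer see the high facet?"
POLICIES: Dict[str, Callable[[str], bool]] = {}


def faceted(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: run `fn` once per facet branch (positional args only)."""
    @wraps(fn)
    def wrapper(*args: Any) -> Any:
        for i, arg in enumerate(args):
            if isinstance(arg, Facet):
                # Split on the first faceted argument and recurse on each branch.
                hi_args = args[:i] + (arg.high,) + args[i + 1:]
                lo_args = args[:i] + (arg.low,) + args[i + 1:]
                return Facet(arg.label, wrapper(*hi_args), wrapper(*lo_args))
        return fn(*args)  # no facets left: an ordinary Python call
    return wrapper


def concretize(value: Any, viewer: str) -> Any:
    while isinstance(value, Facet):
        value = value.high if POLICIES[value.label](viewer) else value.low
    return value


@faceted
def status(location: str) -> str:
    # Plain Python control flow on a possibly sensitive argument.
    if "St" in location:
        return "Alice is at " + location
    return "Alice is somewhere in " + location


POLICIES["loc"] = lambda viewer: viewer == "bob"
msg = status(Facet("loc", "42 Main St", "USA"))
print(concretize(msg, "bob"))  # Alice is at 42 Main St
print(concretize(msg, "eve"))  # Alice is somewhere in USA
```

Because the function body runs once per branch, n faceted arguments with distinct labels can mean up to 2^n calls, a small illustration of why this kind of enforcement can be computationally expensive.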

> Please submit the original source. If a blog post reports on something they found on another site, submit the latter.

In defense of the OP, it technically IS the original source. Technology Review is a product of MIT, and they were merely reposting, word for word, their press release.

Very true, though I think the OP should have used some judgement. In this case, the submitted article is a short paragraph, followed by a link to the interesting content. It seems like a no-brainer to submit the interesting content rather than the submitted article.

Thanks for pointing that out. I'll reread the guidelines, more thoroughly this time; they aren't that long anyway.

I am sure Facebook's oversharing is due to a programmer error. /s

Very interesting. Removing human error is maybe overstating the benefit, since you can still introduce bugs in your use of the DSL, but it's very interesting. The writeup is pretty opaque to me, but I'd love to see some libraries build on this.
