

Jeeves – A Language for Automatically Enforcing Privacy Policies - yiedyie
http://projects.csail.mit.edu/jeeves/

======
dj-wonk
From the papers section: "A Language for Automatically Enforcing Privacy
Policies"
[http://projects.csail.mit.edu/jeeves/papers/popl088-yang.pdf](http://projects.csail.mit.edu/jeeves/papers/popl088-yang.pdf)
(I hope you like the lambda calculus!)

~~~
dj-wonk
This is a better (video) introduction for programmers:
[http://projects.csail.mit.edu/jeeves/talks.php](http://projects.csail.mit.edu/jeeves/talks.php)

~~~
jeanyang
This paper is the most up-to-date on the semantics of the current
implementation:
[http://projects.csail.mit.edu/jeeves/papers/plas07-austin.pd...](http://projects.csail.mit.edu/jeeves/papers/plas07-austin.pdf)

------
ptwobrussell
@jeanyang - This is really amazing work, and I'm very interested in it.

One question for you: On slide 10 you show "state of the art" as a bunch of
conditional logic guarding the returned values and "Jeeves" as just a simple
property lookup.

Do you know if Facebook's "state of the art" is as you are describing it in
this slide, or do you think they have a proprietary framework similar to
Jeeves that they use internally to abstract away all of these details (and
potential liability)?

Seems that they would, but I thought you might have some first-hand knowledge
you could share given the background research you must have done in order to
get to this point.

~~~
jeanyang
Thanks for your interest! This is a great question. I've worked at Facebook on
backend privacy, so let me think about what I can say without violating my
NDA.

It's my understanding that large companies will create proprietary frameworks
that help manage privacy policies on data. Programmers will typically be
required to follow certain coding discipline when working with sensitive
values so that they're calling library functions to manage policies. To my
knowledge, however, these libraries deal with access control (who can access a
specific piece of information) rather than information flow (how information
may flow through a system).

Now here's the difference between access control and information flow--and why
we need a language (or at least a DSL). When you only have access control,
you're trusting the programmer to tell you correctly at one point where a
piece of data is going. Even if a sensitive location value is used in a bunch
of search queries, the result of which is shared as a status (that becomes
visible to many people with different levels of access), the programmer is
responsible for asking for the right level of access _when accessing that
location_. With the complex policies we're starting to see in modern
applications, managing this is becoming increasingly burdensome for
developers. That's why we're looking at how to automatically handle
_information flow_ : the system tracks how sensitive values are used in order
to make sure the values--and resulting computations--are flowing only to those
with appropriate permissions. While it's relatively simple to hook access
control into existing programming models, automatically handling information
flow requires enhancing the language semantics (especially for conditions and
function calls) to track additional information.
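
To make the contrast concrete, here is a minimal Python sketch. It is not
Jeeves code and not any company's framework: `Labeled`, `combine`, `reveal`,
and `read_with_access_check` are hypothetical names invented for illustration,
and the "policy" is just a set of allowed viewers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labeled:
    """Hypothetical wrapper: a value plus the viewers allowed to see it."""
    value: object
    allowed: frozenset

    def combine(self, other, fn):
        # A derived value is visible only to viewers allowed to see both inputs.
        return Labeled(fn(self.value, other.value), self.allowed & other.allowed)

    def reveal(self, viewer, default=None):
        # The check happens at output time, on whatever the value has become.
        return self.value if viewer in self.allowed else default


def read_with_access_check(record, viewer):
    """Access control only: a single check at the point of access."""
    return record["value"] if viewer in record["allowed"] else None


# Access control: after the one check, the raw string can flow anywhere.
record = {"value": "42.36,-71.09", "allowed": {"alice"}}
status = "Checked in at " + read_with_access_check(record, "alice")
shown_to_bob = status  # nothing ties this string back to the policy

# Information flow: the restriction travels with every derived value.
location = Labeled("42.36,-71.09", frozenset({"alice"}))
prefix = Labeled("Checked in at ", frozenset({"alice", "bob"}))
tracked_status = prefix.combine(location, lambda a, b: a + b)
```

In the first half, `shown_to_bob` ends up holding the full coordinates. In the
second half, `tracked_status.reveal("bob")` returns `None`, because the status
inherited the location's restriction when it was computed.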

Automatically managing information flow the way Jeeves does significantly
relieves programmer burden, but can be computationally expensive. Much of our
research these days is about how to make this more efficient so that companies
can one day put this sort of mechanism into their production systems.

Happy to answer additional questions!

~~~
ptwobrussell
@jeanyang - Thanks, this is a terrific answer!

"When you only have access control, you're trusting the programmer to tell you
correctly at one point where a piece of data is going."

Right, that pretty much sums it all up, and I have my suspicions (independent
of your response) that something like Jeeves could add value even to a "state
of the art" company like Facebook :)

I'll be watching your project with great interest and may even follow up with
you "offline" once I dig in some more, because I think you're really onto
something of great importance. Any sufficiently complex enterprise system
eventually needs something not too different from what you're working on
here...

------
jeanyang
Happy to answer questions!

~~~
mratzloff
Can you give a high-level overview of the implementation? I read the README,
and if I had to guess, enforcement is done through wrapped values and lambda
functions that determine access based on numeric thresholds, with roles
assigned numbers in that range. Is that close?

~~~
jeanyang
Yes, pretty close. In Jeeves, sensitive values are essentially pairs of values
guarded by a label: <k ? vHigh : vLow>, where k is the label, vHigh is the
high-confidentiality "view" (corresponding to, for instance, my sensitive GPS
location), and vLow is the low-confidentiality "view" (corresponding to, for
instance, some coarser-grained version of my location--such as what country
I'm in). Rather than roles, we have flexible policies, which can be arbitrary
lambda functions that take a "viewer" argument. Each policy function is
associated with a label and returns "true" if the label can be "high." The
implementation evaluates each part of a multi-faceted value in order to return
multi-faceted results. These results are "concretized" according to who is
looking at them: given the viewer, we can use the policy functions to
determine what the viewer should see. The implementation overloads Python
operators and does dynamic source transformation (via the @jeeves macro) in
order to support computations on these multi-faceted values.
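
A rough Python sketch of the idea described above, under stated assumptions:
this is not the Jeeves API. `Facet`, `lift`, and `concretize` are hypothetical
names, only `+` is overloaded, and nested facets with the same label are not
simplified.

```python
import operator


class Facet:
    """Hypothetical faceted value <label ? high : low>."""

    def __init__(self, label, high, low):
        self.label, self.high, self.low = label, high, low

    # Overloaded operators compute on both views instead of picking one.
    def __add__(self, other):
        return lift(operator.add, self, other)

    def __radd__(self, other):
        return lift(operator.add, other, self)


def lift(op, a, b):
    """Apply op to every combination of views, producing a faceted result."""
    if isinstance(a, Facet):
        return Facet(a.label, lift(op, a.high, b), lift(op, a.low, b))
    if isinstance(b, Facet):
        return Facet(b.label, lift(op, a, b.high), lift(op, a, b.low))
    return op(a, b)


def concretize(value, viewer, policies):
    """Pick the view that each label's policy grants to this viewer."""
    while isinstance(value, Facet):
        value = value.high if policies[value.label](viewer) else value.low
    return value


# The policy for label "k" returns True when the viewer may see the high view.
policies = {"k": lambda viewer: viewer == "jean"}
location = Facet("k", "42.36,-71.09", "USA")
message = "Jean is at " + location  # still faceted: both views are computed
```

With these definitions, `concretize(message, "jean", policies)` yields the
string with the precise coordinates, and `concretize(message, "bob", policies)`
yields the string with only the country.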

~~~
mratzloff
Thanks for the detailed reply!

------
jmnicolas
It's not a programming language, but a DSL.

~~~
judk
Please explain the difference.

~~~
jeanyang
A language exists on its own, with its own syntax and semantics. It may
interoperate with other languages, but it usually takes some work to get a
program written in one language to talk to a program executing in another
language. (Examples include C/OCaml and all the languages that execute on the
.NET runtime.)

An embedded domain-specific language is a language with its own semantics that
has been grafted onto another language. For instance, Jeeves is embedded in
Python. We can use Jeeves as a Python library, but when we're using Jeeves
functions, the programs behave like Jeeves programs rather than vanilla Python
programs. (In this case, it means that the runtime tracks different possible
views of sensitive values and computations done on them.) When programmers use
the @jeeves decorator and the Jeeves API, the programs look like Python
programs and can even use Python built-in functions and libraries, but the
Jeeves library is doing work behind the scenes to make the programs behave
differently.
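
As a small illustration of the embedding idea (a sketch only, not the actual
@jeeves decorator, which does source transformation): the hypothetical
`lifted` decorator below lets an ordinary Python function, built-in string
methods included, run once per view of a faceted argument. `Facet` and
`lifted` are made-up names.

```python
import functools


class Facet:
    """Hypothetical faceted value <label ? high : low>."""

    def __init__(self, label, high, low):
        self.label, self.high, self.low = label, high, low


def lifted(fn):
    """Hypothetical decorator: evaluate fn under every view of its arguments."""
    @functools.wraps(fn)
    def wrapper(*args):
        for i, arg in enumerate(args):
            if isinstance(arg, Facet):
                # Split on the first faceted argument and recurse on each view.
                high = wrapper(*args[:i], arg.high, *args[i + 1:])
                low = wrapper(*args[:i], arg.low, *args[i + 1:])
                return Facet(arg.label, high, low)
        return fn(*args)  # no facets left: plain Python semantics
    return wrapper


@lifted
def greeting(name, place):
    # Reads like vanilla Python; the decorator changes how it executes.
    return f"{name} is in {place.upper()}"


result = greeting("Jean", Facet("k", "Cambridge", "the US"))
```

Called with plain strings, `greeting` behaves like any Python function. Called
with a faceted `place`, it returns a `Facet` whose `high` and `low` fields hold
the two possible greetings.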

------
ape4
I am sure Facebook's oversharing is due to a programmer error. /s

------
taybin
Very interesting. Removing human error is maybe overstating the benefit, since
you can still introduce bugs in your use of the DSL, but it's very
interesting. The writeup is pretty opaque to me, but I'd love to see some
libraries built on this.

------
dj-wonk
Github project:
[https://github.com/jeanqasaur/jeeves](https://github.com/jeanqasaur/jeeves)

------
milliams
Link to the actual project:
[http://projects.csail.mit.edu/jeeves/](http://projects.csail.mit.edu/jeeves/)

This article tells nothing useful.

~~~
bjz_
No examples on the home page. Nor in the 'code' section. Or in the README on
github. You'd think people would learn that most readers on the internet give
up looking after 3 seconds.

~~~
jeanyang
There are some examples here:
[https://github.com/jeanqasaur/jeeves/wiki/A-Quick-Introducti...](https://github.com/jeanqasaur/jeeves/wiki/A-Quick-Introduction-to-Jeeves)

Thanks for the feedback and sorry they were hard to find!

~~~
pera
Your project looks super interesting. And unlike _bjz__ , "examples" are not
the first thing I wanna see in a new DSL website but a link to the paper :)

------
scotch_drinker
Every time I see something like "Removes Human Error", I think of Nassim
Taleb's _Antifragile_. This drive to remove errors is likely flawed from the
start since we, as humans, are error prone. It's interesting from a
theoretical standpoint but there will be human error, regardless of the
language. Errors are part of human existence.

~~~
chongli
_This drive to remove errors is likely flawed from the start since we, as
humans, are error prone._

What? The drive to remove errors is with machines, not humans. Machines can be
built which detect and fix human errors. Why is this a flawed premise? One
need only look at the history of programming language innovations to see all
the successful examples of errors being eliminated.

~~~
scotch_drinker
But that's the thing, the errors keep getting eliminated but we keep getting
new errors. There is no utopia where all the errors are gone, especially not
from a top down driven language design. Can we get better? Sure and we have
been. That's awesome. But I'm fairly sure we cannot "remove errors" as in
"make things error free".

~~~
chongli
I think you're being overly pedantic. When the author uses the phrase "remove
errors" I take it to mean "remove a class (or classes) of errors". How is this
not a worthy goal?

~~~
scotch_drinker
I don't believe it's pedantic to see a difference between the words "reduce"
and "remove". We think we can design top down systems that are error free. We
are wrong. Systems are never error free and thinking they could be is actually
quite bad for us. By doing everything we can to remove errors completely, we
actually just ensure that the errors that remain will be much worse in scope.
Far better to build a redundant system that is prepared for the errors instead
of building an efficient system expecting no errors.

It is certainly a worthy goal to reduce errors. To not do so would be silly.
But expecting errors to go to zero is not a worthy goal because there are
always errors.

~~~
chongli
_I don't believe it's pedantic to see a difference between the words "reduce"
and "remove"._

No, but it's pedantic to generalize "remove errors" to "remove all errors".
Actually, scratch that. It's just plain _wrong_.

_Far better to build a redundant system that is prepared for the errors
instead of building an efficient system expecting no errors._

Redundant systems only deal with certain classes of errors. You're making the
same mistake you accuse the author of making.

------
untothebreach

      Please submit the original source. If a blog post reports on something they found on another site, submit the latter.
    

[http://ycombinator.com/newsguidelines.html](http://ycombinator.com/newsguidelines.html)

~~~
inkovic
In defense of the OP, it technically IS the original source. Technology Review
is a product of MIT and they were merely reposting, word for word, their press
release.

~~~
untothebreach
Very true, though I think the OP should have used some judgement. In this
case, the submitted article is a short paragraph, followed by a link to the
interesting content. It seems like a no-brainer to submit the interesting
content rather than the submitted article.

