
Ask HN: How would you implement a verifiable open-source web application? - gvido
Wasn't really sure how to make the title descriptive enough without putting the full question there.

Anyway, I have been toying with the idea of creating an open-source web application. Open-source in the sense that I would host the product on my own servers, but the full source code would be available on GitHub, except, of course, authentication info, tokens, and other things which shouldn't be under source control anyway. Anyone would also be able to set up a self-hosted version of the product, and if anyone wanted to contribute, I would accept pull requests, etc.

The idea behind that was that, since it would be targeted mostly at a tech-savvy crowd and deals with personal information, I would like to introduce some level of trust that I'm not doing anything sneaky or unexpected behind the scenes (like storing information I shouldn't be).

So basically I started wondering: is it possible to implement a way for people to verify that the same code they see in the GitHub repo is the code that's also running on the live hosted site? I will be working with node.js, but I don't think the tech stack is too relevant here.

The client-side part, as I imagined it, would be relatively trivial to verify - run the build locally and compare the fingerprint of the live code with the local version.

But my idea got stuck on the server side. There, even if I made a server-side endpoint that returns the fingerprint of the live code, there is no way for anyone to check what is actually going on - I could just as well return a static file with a hardcoded fingerprint.

I'm sure someone has dealt with or thought about similar problems before, and I would be happy to hear some insights. Feels like this might be a pipe dream, but at least some level of verification would be nice to achieve.
======
lmm
The only way you can do this is if the server is not fully under your control
but partially controlled by the remote client.

We've been here before: this is Trusted Computing. You need a Trusted Platform
Module on your servers (thankfully you're picking the hardware, so you can
make that a hard requirement). Your users can inspect and sign your code with
their keys, that they generate and keep on the client side (you never see
them). Or more likely, they sign that they trust a particular third-party
auditor. Either way, their data is uploaded encrypted with their keys and only
code they have signed will ever be allowed to decrypt it.

It won't be easy. You'll have to keep old versions of your code around in case
users haven't signed the new versions. The TPM-handling libraries are
immature, though they get better every day. But it's possible, particularly
since you only need to make it work with one particular model of TPM.

Good luck!

~~~
ethbro
I think I picked this up off Hacker News originally, but there's apparently
new Intel stuff (aka SGX) coming out to help with this.
[http://theinvisiblethings.blogspot.com/2013/08/thoughts-
on-i...](http://theinvisiblethings.blogspot.com/2013/08/thoughts-on-intels-
upcoming-software.html)

Unfortunately, I think the reason most open source people have a knee-jerk
aversion to trusted platforms is that they've historically been designed to
serve only the interests with the most money (read: the government and/or
content industry).

There's nothing inherently anti-open source about the schemes, and they would
provide innumerable benefits to increasing security confidence in a networked
world.

However, when you can rattle off enough failed or botched hardware-backed
encryption initiatives to fill one hand off the top of your head (CSS, AACS,
HDCP, UEFI/Secure Boot, FairPlay), confidence is not inspired...

~~~
mindslight
Remote attestation is most certainly anti-Free software.

Bank: "For your security, you may only access our website with an officially
supported browser"

~~~
sweis
If you remotely attest your own software, how is that anti-freedom?

I use remote attestation to verify that my firmware, kernel, initrd, and
configuration were booted as expected. It's a tool you can use for your own
benefit.

What you are describing is someone else attesting that their software booted
on your computer. That was the scary scenario people were afraid of when
trusted computing rolled out, but it never materialized. Nobody is using TXT
for DRM.

~~~
mindslight
Like secure boot, it ultimately comes down to who has the keys.

If you generated the signing key and loaded it onto the hardware (or generated
on-chip but it's signed by nothing else), then I don't see the problem.

If the hardware has a factory-generated private key that you cannot get at and
the corresponding public key can be verified through some well known trust
root, then a third party can ask you to attest to what software is running on
your hardware and you _cannot lie_. This custom hasn't materialized yet, but
it's not too hard to imagine it catching on after support winds its way up the
software stack.

What are the specifics of your setup?

------
rainmaking
This is a noble goal. I would like to do this, too.

It appears to me, however, to be theoretically impossible to achieve. The code
is running on your hardware; there is simply at the moment no known way to
give any kind of assurance about remotely running code.

What would work, of course, would be to implement the entire app as a
JavaScript application that only asks the backend for information when
necessary. The blockchain.info bitcoin wallet does something to this effect.
So does the web version of the LastPass vault. There is, of course, still no
assurance that you will not occasionally inject compromising code when it is
difficult to spot, but this is the closest that I have seen.

Richard Stallman is also advocating something to this effect:
[https://www.gnu.org/philosophy/javascript-
trap.html](https://www.gnu.org/philosophy/javascript-trap.html)

~~~
MichaelGG
Stallman is just recommending a way to have scripts self-identify as free
software. There's nothing that'll help there.

Implementing it as an API doesn't help much, and in fact might just make it
easier to fake, because there is a reduced surface area to check and modify.

It's theoretically impossible if you assume full control of hardware. But most
people are not capable of controlling hardware, so secure enclaves and remote
attestation are likely to be a legitimate win if feasible.

~~~
rainmaking
The first statement is untrue: Stallman is also advocating measures to replace
obfuscated JavaScript web apps with free versions:

"Browser users also need a convenient facility to specify JavaScript code to
use instead of the JavaScript in a certain page. (The specified code might be
total replacement, or a modified version of the free JavaScript program in
that page.) Greasemonkey comes close to being able to do this, but not quite,
since it doesn't guarantee to modify the JavaScript code in a page before that
program starts to execute."

As for the second claim, actually implementing as an API does indeed help,
because most of the code is then running in the browser and can be audited.

As for the third, there are no known ways to implement secure enclaves and
remote attestation; that is what the questioner is asking. If you know of any,
do share them.

~~~
MichaelGG
OK, but unminimized JS still has no bearing on trusted computing.

What about Intel TXT (maybe?) and the upcoming SGX? Although I've not seen
details on how the key system works with SGX. But assuming each processor has
a unique ID/public key signed by Intel, and assuming we trust Intel and assume
it's not profitable/plausible for a darknet to undo Intel's hardware
protection, SGX seems to be _exactly_ what the OP is asking for.

~~~
rainmaking
Yes it does; the current state of the art of trusted computing is indeed "run
open source on your own hardware", and unminified JS does that. It's only, as
of now, impractical, because the browser does not help with verification.

I was unaware of Intel SGX; sounds okay in principle, but I would consider the
jury out until it's released and the security community has weighed in.

------
chubot
This is the goal of CloudProxy:
[http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-13...](http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-135.html)

which is open source:
[https://github.com/jlmucb/cloudproxy](https://github.com/jlmucb/cloudproxy)

It relies on TPMs (trusted platform modules, a hardware root of trust).

What confused me about the naming is that CloudProxy is an OS, not a proxy
server. It's a distributed OS that provides attestation of the identity of
remote code. To do this you need secure boot and key management.

If anyone dives further into it, let me know :) I'm curious how deployable it
is from the Github repo. I guess you can run it on Linux, but I'm not sure how
the kernel is involved in the chain of trust. I would have thought you needed
your own OS.

 _The CloudProxy Tao (henceforth, “the Tao”) is a recipe for creating secure,
distributed, cloud-based services by combining ingredients that are already
available in many cloud data centers. The Tao is realized as an interface that
can be implemented at any layer of a system. CloudProxy implements multiple
layers of the Tao and provides means for_

\- _protecting the confidentiality and integrity of information stored or
transmitted by some hosted program,_

\- _establishing that the code executed as a hosted program in a cloud is the
expected code and is being run in the expected environment, and_

\- _authenticating requests to the hosted program to check that they come from
a client executing some expected program in an expected environment, either
remotely or locally in the cloud._

 _CloudProxy is the first implemented, fully fleshed-out system providing
these properties along with key management and an appropriate trust model for
all principals._

------
kazinator
There is no way to know that some remote blackbox is genuine, except to feed
all the inputs that you are feeding to that blackbox also to a trusted
blackbox, and verify that the outputs are identical. This breaks down as soon
as random numbers are involved (session keys, etc).
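
That comparison idea can be sketched as a differential test; the helper names here are illustrative, not from any library.

```javascript
// Feed the same inputs to a trusted reference implementation and to the
// blackbox under test, and require identical outputs.
function differentialTest(trusted, underTest, inputs) {
  return inputs.every(
    (input) => JSON.stringify(trusted(input)) === JSON.stringify(underTest(input))
  );
}

const reference = (x) => x * 2;

console.log(differentialTest(reference, (x) => x * 2, [1, 2, 3])); // true
// Any nondeterminism (random session keys, timestamps) breaks the comparison:
console.log(differentialTest(reference, (x) => x * 2 + Math.random(), [1]));
```
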

Also, even this does not assure you that the blackbox isn't doing something
bad via a side channel. For instance, if we trust the genuine blackbox not to
transmit personal data somewhere, how do we know that the BUT (blackbox-under-
test) isn't doing that? We have to isolate the blackbox, and then destroy it
at the end of the test (so it can't store something and transmit later). You
can't isolate a remote box. A remote box has inputs and outputs unknown to you
that you cannot monitor.

Trust is a big problem with the SaaS model. When you're trusting a SaaS
application, you're really trusting the provider who is hosting that
application. There is no way around that.

I think, however, that there is an intermediate worthwhile goal: to have
assurance that you're trusting _only_ the host of the application!

That is to say, that the SaaS application doesn't contain malware which was
not intentionally injected by the provider, but somehow got there via an
upstream code source or through some exploit or whatever.

If you can provide a way that the SaaS provider itself can check that it's
running the code that it thinks it is running, that is valuable. If the SaaS
is rogue and customizes the code to do things that the users don't agree with,
that's a separate concern.

------
jewel
The answer is cryptography. You need to have the client encrypt everything
before sending it to the server, removing even the option for wrongdoing. This
only works if the client is a program, app, or browser extension that the
users can compile themselves. There's not (yet) a way to verify the client
code that is running in the browser.

If the server needs to do stuff with the data, then what you want is probably
not possible. (It depends on the exact thing that needs to be done, as there are
things like homomorphic encryption.) Instead, you should focus on non-
technical assurances that you are acting in good faith and that promote trust.

This would include things like having a privacy policy on the site with strong
guarantees about what can be changed in the future and having a physical
address. You could put funds in escrow that would pay out to the users if you
violated the policy. You could have outside auditors come and verify your
procedures.

Honestly, you can't guarantee that you won't have a security breach or that
the government will give you a national security letter. I'd focus on building
your service and making it useful enough that users deem the risk a worthwhile
trade-off.

~~~
kancer
Homomorphic encryption is the solution when you want the server to process the
data, if it ever becomes efficient.

------
bcg1
This is not a technical solution, but you could release your code under AGPL
and take code contributions.

By doing this, you are essentially telling the world that not only do they
need to trust you, but that you are willing to make yourself legally culpable
if you surreptitiously run different code on your servers (i.e., you would be
violating the copyright terms of your contributors whose code is licensed
under AGPL and of which your application is a derivative).

It would be nice to have a technical solution as well, I'm not discounting any
of the other suggestions here. Just saying that adding a legal dimension could
help counterbalance the possibility that a technical protection could be
defeated.

------
LukeShu
As rainmaking said, I don't believe that there is a way you can fingerprint
the running code in a non-spoofable way. Whatever protections you put in the
design could be bypassed by having a proxy layer dispatch two requests for
each request received: one to the actual application, and one to
evil_application.

I believe that the best route to take that is most in-line with your goals
would be to design it such that the server-side is untrusted from a security
standpoint. Have the client process the data, and only give encrypted or
sanitary data to the server side. Don't trust the server with anything other
than availability.

~~~
arethuza
"Don't trust the server with anything other than availability."

You probably shouldn't even be trusting any one server even with that!

------
random28345
> I would host the product on my own servers

Don't do that. If you're targeting your product for the tinfoil hat crowd,
that's simply not going to work. Instead, you create a build script that will
generate your application from source, and (for example) generate a docker
image. This image could be run on your servers, on a third-party server (AWS,
DO, etc.) or on the user's own hardware, depending on the level of
inconvenience/security tradeoff they are willing to endure.

I know you're probably looking for the consistent revenue streams of a SaaS,
but unless user data can be completely encrypted during storage (e.g. email,
backup, etc.), the truly paranoid don't want to trust their information to a
3rd party.

------
stefanks
Ethereum is a distributed virtual machine, based on blockchain technology. It
appears to be what you are looking for.
[https://ethereum.org/](https://ethereum.org/)

~~~
teraflop
I've seen Ethereum mentioned before, but it's really really hard to figure out
whether it's a proposal or a thing that actually exists.

Are there any applications actually using Ethereum yet? Almost everything
linked to from the home page consists of "roadmaps" or "coming soon" pages. Is
there a public network that's up and running? How many nodes does it have?

~~~
drdeca
There is a testnet that works right now, but it should not be used in
production yet (I think it gets reset regularly? Not sure.). Within a few
months, the first version of "the real thing" is expected to be up, but I
don't think putting things on it right away would necessarily be a good idea
(it probably depends on what one is considering putting on it). It is set up
so that after a certain time, there will be a change to a new version, and
while all balances will be carried over across that change, other parts of the
state of the system won't be, so anything built on it would have to be set up
again. (Which could just be a matter of sending the same message again, so it
might not be that much of a problem to set it up before the change, I guess.)

Also, it's probably important to note that a contract is run by every miner
(or every miner on a certain part of the network, depending on how the
scalability thing ends up being done?), and as such, you will want to keep any
computation done by a contract computationally cheap, so that it will be cheap
to use. Instead of having the contract do the computation that needs to be
done, it may in some cases work better to just have the contract verify the
accuracy of computations that have been done elsewhere.

However, now that I think of it, I'm not sure that Ethereum would solve all of
the problems OP gives, because one of the problems OP wanted to solve was that
they wanted to demonstrate that they were not storing information that they
shouldn't be. But with Ethereum, because the contracts are executed by
"everyone", everyone has to have access to the data the contracts are using to
run, and there is no way to ensure that they don't hold onto that data.

One way to solve this could maybe be doing computations on shared secrets (as
talked about in one of the Ethereum blog posts, but which is not something
Ethereum is to have), but this might require more messages to be sent over the
network than one is willing to use. Still much more practical than
homomorphic encryption, though, I think. (If one was using the shared secret
thing, I'm not sure one would do it with Ethereum.)

Ethereum could/would solve the problem of ensuring that the program being run
is exactly as claimed, but it might not keep certain information private.
Depending on the specific problem at hand, there might be ways around that
though?

EDIT: ok, so, something suggested that homomorphic encryption might have
gotten an improvement to the point of practicality recently? I wasn't aware of
that. May make this post slightly out of date.

------
decasia
Neat question. I guess there are really two separate things a user could want
to verify:

* the application isn't broken/still fulfills its API contract

* the application isn't compromised in a malicious way

As far as the first point, I wonder if it could be possible for users to run
some sort of a test suite against the public API? Like a crowd-sourced test
suite that verifies that production server behavior is still as advertised.
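
A crowd-sourced contract check could be as small as a function that validates the shape of a live response; the endpoint and field names here are hypothetical.

```javascript
// Verify that a response body from a (hypothetical) /api/profile endpoint
// still matches the advertised contract, and doesn't leak fields it shouldn't.
function checkProfileContract(body) {
  const errors = [];
  if (typeof body.name !== 'string') errors.push('name should be a string');
  if (typeof body.createdAt !== 'string') errors.push('createdAt should be a string');
  if ('passwordHash' in body) errors.push('response leaks passwordHash');
  return errors; // empty array means the contract still holds
}

console.log(checkProfileContract({ name: 'alice', createdAt: '2015-01-01' })); // []
```
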

The second point, I think, can only be partially addressed, since it's
impossible to guarantee that some sneaky compromise hasn't happened. But you
could allow outside auditing, let people have some form of read-only access to
the directory tree that stores the code (if it's separate from the config),
etc.

------
dfragnito
Decouple the application from the DB. Use a DB as a service that has an HTTP
API. The application encrypts/decrypts all data with public/private key pair.
The user (your client) generates their own key pair.

This is the scenario we are testing out with
[http://schemafreedb.com/](http://schemafreedb.com/)

You do not have the private key, so you cannot see the data, but you can still
offer the data portion as a service. Your client does need to host the
application; depending on your target client, node may be a good choice, or if
you want to go mainstream, go with something like PHP.

Whenever you have an update to the code, you can provide a diff of the changes.

------
jdiez17
One way of doing this (for your particular use case, but not for the general
case) would be for your "business logic" to be implemented on the client side,
which, as you said, can be verified easily. Then make your backend a dumb data
store.

For example, if you want to store people's names and date of birth, you would
encrypt those on the client-side and only ever send ciphertext to the server.

The encryption key could be derived from a passphrase composed of a user name
and a password. Of course, this means that if someone loses their credentials,
they lose their data forever.

------
mappu
Trust comes back to a trust anchor.

If you could create e.g. a publicly available AMI of your application and
prevent further runtime modifications to it (e.g. disallow SSH access), then
maybe Amazon could offer an interface to verify that your application was
running based on the trusted AMI.

Essentially Amazon would issue a statement connecting a certain IP address to
a certain application.

Substitute AMI and Amazon for your choice of other technology as appropriate
(docker container and docker hosting provider - this sounds like a competitive
advantage for hosting providers hint hint)

------
MichaelGG
So you saw the news, how even major dark markets are unreliable, and figure,
hey, apparently anyone can get in on that action? Basically want to run a dark
market, but are aware of the trust implications? I'd imagine just being open
source would be a good start.

The closest to actually trusted computing is, well, trusted computing. Remote
attestation allows you to verify you're executing specific bits of code. But I
don't know if the tech is exposed in a useful enough way. With DRM apps, the
idea is that the hardware will be able to verify it's running known code. Then
you provide encrypted input to that processor, knowing the only way to decrypt
is to be in the trusted code. Still many remaining issues, such as extending
the trusted code to cover all access to data storage and wallets.

You might look at other aspects, such as somehow creating community
signatures. Like, I dunno, if main wallets needed sign off from 10% of the
user population to do major changes. But that is rife with problems.

And really, users don't seem sophisticated enough to bother with this stuff
anyways. You could just paste some JavaScript "verification" code and I'd
guess you'd get loyal defenders that don't know the difference. The market
that just went down did multisig, right? But since it's too difficult, people
just ignored the option, no?

Forgive me if my main assumption about the motive is incorrect.

------
mangeletti
I thought of doing this with the last web app I was running, but I decided not
to, and here's why:

1) Anyone, not just nice people, can view source code on GitHub

2) Source code can be used to find vulnerabilities (which is of course one of
the great values of _using_ open source code - vulnerabilities are usually
spotted more quickly by a larger group)

3) A single vulnerability that allows access to private data OR can lead to
corruption or loss of data could put your company out of business

~~~
akerl_
You're simultaneously claiming that open source code is great because large
groups of people can look at it to spot vulnerabilities and that it's not
great because large groups of people can look at it to spot vulnerabilities.

There are people on both sides of that fence, but you do need to be on one
side or the other.

~~~
eevilspock
> you do need to be on one side or the other

binary, black and white thinking.

------
kephra
You need a second trusted but independent person, who writes an application
that runs on your webserver, checking its own signature and the hash sums of
your code, after this trusted person has started it with a password only he
knows to access the encrypted database of hash keys and other metadata.

The drawback is that you can show the site is not compromised directly after a
reboot, but you need to call your friend to log in and give the password for
his validity checker. Once his app runs, others can connect to it and use the
app's public key to check whether your own app is OK.

The problem is to find someone who is independent of you, so your community
trusts him, and you also need to trust him, as his code is running on your
server.

------
ianbicking
I think there's value in being able to say: here's the code I am claiming to
use on this service, and the only way it isn't is if I have deliberately and
actively lied.

That means that if someone hot-edits the files on the server, the resulting
edits should be visible, and/or the site is clearly unverified. If you deploy
from a branch someone doesn't know about, it should be clear. If you just
don't document that you made a deployment, someone should be able to figure
that out.

Of course that can be spoofed, but spoofing a solid claim about what is
running is very different from not making any claim about the code that is
running, so you've made yourself accountable.

~~~
mitchtbaum
Ah.. Now I get it.

So, if each branch's code was signed and contained an embedded key and chosen
encryption algorithm, then if the app used those during processing and users
received verifiable transmissions, that app's output could be verified by
users as having come from that advertised branch.

~~~
mitchtbaum
[http://en.wikipedia.org/wiki/Code_signing](http://en.wikipedia.org/wiki/Code_signing)
seems relevant. This case is more about having the software itself sign its
own output. Relevant search terms:

* software "sign its own output"

* software "encrypt its own output"

* software "encrypt its output"

* software "sign its output"

Some interesting results:

* Computer scientists develop 'mathematical jigsaw puzzles' to encrypt software (UCLA) #comment by zblaxell [http://lwn.net/Articles/562113/](http://lwn.net/Articles/562113/)

* Cryptographic Verification of Test Coverage Claims [http://www.cs.ucdavis.edu/~devanbu/doc.ps](http://www.cs.ucdavis.edu/~devanbu/doc.ps)

* Study of Security in Multi-Agent Architectures §3.4 [http://www.ecs.soton.ac.uk/~lavm/papers/sec.pdf](http://www.ecs.soton.ac.uk/~lavm/papers/sec.pdf)

------
BoppreH
Simple, cute solution that anybody can understand: record yourself setting it
up.

A video feed from a camera and another from the screen itself. Starting from a
fresh system, install the dependencies, download the source, verify the
hashes, and run the server. Then show the server response from another
machine.

Sure, it doesn't guarantee 100%. The OS image could have been tampered with,
or you could have mucked with the network and intercepted requests. But it's
easy to record, easy to understand, and much more secure than a black-box
server.

~~~
ChristianBundy
Couldn't you go back in and change it at any time though?

------
kyled
You can't. When someone else hosts the program the remote system effectively
becomes a black box. The only way would be to give the code to the user so
they can inspect, compile, and run it for themselves. Even allowing the user
to inspect the code on the remote machine isn't good enough. The methods by
which the user inspects the data can be hijacked to make it look like the user
is looking at legitimate data.

------
niche
I love this idea: I think you can extend it further.

I would build this idea on the Blockchain; every transaction is tracked and
publicly available while details are secure

PTE: Publicly transparent entities (like a non profit, without the bullshit)

Also love the idea of having a computerized "gatekeeper"; a dangerous
proposition, query-less tables that only computers can read; make it
unreadable, and it is truly unhackable

------
fweespeech
As interesting as this would be, and even if you found a solution...

> I would like to introduce some level of trust that I'm not doing anything
> sneaky or unexpected behind the scenes (like storing information I shouldn't
> be).

All you have to do is take something like OpenResty, use it to proxy the
traffic and terminate SSL between the app and the rest of the internet, and
you can do all the nefarious things you wanted.

The ability to use proxies in such a transparent manner guarantees that this
isn't possible, regardless of whether or not you actually succeed in the
stated aim of verifiable open source code.

Tbh, the closest viable solution is for a reliable 3rd party auditor with
professional credentials to perform regular audits to match your production
environment to what you tell the general public. Otherwise, you can circumvent
whatever safeguards you create by simply using a separate application to proxy
traffic.

At this point, you are in the realm of 3rd party software audits and that is
an established field.

------
nickysielicki
This is funny, I was just daydreaming about this yesterday during class.

What I was thinking is allowing anonymous SSH access with a VERY locked-down
shell (/bin/rbash [1]) and letting users see with their own eyes that
everything is in order.

You still REALLY can't make a system that the user can conclusively see isn't
just for show. You could be shoving them into a jail/chroot that provides the
illusion of transparency while serving the real site from elsewhere.

I think that this might be the closest you'll ever come to user software
freedom on hardware that they don't own. There are a lot of security concerns,
though, so I think it's out of the question for any type of production
environment. I'd love to see a proof of concept from someone though.

[1]: [https://www.gnu.org/software/bash/manual/html_node/The-
Restr...](https://www.gnu.org/software/bash/manual/html_node/The-Restricted-
Shell.html)

~~~
eli
Not only is this possible to fake, it's probably _easier_ to fake than to
implement for real.

------
mpnordland
Essentially anti cheat software in reverse. Instead of the client proving it
is unmodified to the server, the server must prove itself unmodified to the
client. You may be able to get somewhere by studying how open source network
games prove they aren't modified to the server if they do that sort of thing.

~~~
icebraining
Which they don't; they just make it harder to pretend they aren't modified.
Cheating is very much still possible, and even more so if you wrote it all
yourself.

------
NoMoreNicksLeft
> So basically I started wondering if it is possible to implement a way people
> could verify that the same code they see on the Github repo is the code
> that's also running on the live hosted site?

If Github wanted to get into the hosting business, they could offer this...
you'd be trusting what they say when they tell users that the code is
identical in both.

I can't think of any clever way to prove it otherwise. Though, if you could,
it would have broader implications... imagine Microsoft handing the code to an
app over (for viewing), and then being able to prove that the shipped version
of the app was the same. They could verifiably claim that their software has
no backdoors (save those that are also in the source code, but obfuscated...
those are rare, but exist apparently).

This is an idea worth exploring. Good luck.

~~~
towelguy
It doesn't necessarily need to be GitHub itself; any hosting provider with a
reputation could host your software and verify which git commit it is pointing
to.

~~~
elithrar
> and verify which git commit it is pointing to

You'd need to go a step even further. The "application" code is only one thing
- what about other applications, processes, DB logic, HTTP front-ends?

All of those can modify requests, copy data, etc. - even if you could "100%
prove" that the server is running that particular git revision, there are so
many side channels as to make it useless.

------
chj
I know little about trust computing, but here's a thought.

To know whether an open-source program on a server has been modified, we send
a differently customized executable copy every time, with one-time-use
secrets. So when the program starts, it has to answer questions correctly and
quickly (to protect against reverse engineering) to prove it's a genuine copy;
then we can send it our encrypted access key. The access key will never be
written to disk by a genuine copy, so a restarted program won't be able to
access our data without asking for a key again, and then we will know
something is wrong.

The copies we upload to server functions exactly like open source one, but the
user is responsible for adding secret parts to it so that it's closed source.
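A minimal sketch of the challenge-response part of this idea (the names and
the HMAC construction are my own assumptions, not a worked-out attestation
scheme; a modified server that extracts the baked-in secret can still answer
correctly):

```python
import hashlib
import hmac
import os
import time

# One-time secret baked into the customized copy before upload
# (hypothetical; a real scheme would need to resist extraction).
BAKED_SECRET = os.urandom(32)

def answer_challenge(secret: bytes, nonce: bytes) -> bytes:
    """Server side: prove possession of the baked-in secret."""
    return hmac.new(secret, nonce, hashlib.sha256).digest()

def verify(secret: bytes, nonce: bytes, answer: bytes, elapsed: float,
           deadline: float = 0.5) -> bool:
    """Client side: check the answer and that it arrived quickly,
    as a (weak) defence against on-the-fly reverse engineering."""
    expected = hmac.new(secret, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, answer) and elapsed <= deadline

nonce = os.urandom(16)
start = time.monotonic()
answer = answer_challenge(BAKED_SECRET, nonce)
elapsed = time.monotonic() - start
print(verify(BAKED_SECRET, nonce, answer, elapsed))  # True for a genuine copy
```

The timing check is the weakest link: network jitter alone can exceed any
deadline tight enough to rule out a man-in-the-middle.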

------
thingification
Have you considered sandstorm.io?

[https://sandstorm.io/](https://sandstorm.io/)

I realize this doesn't meet your requirement re hosting it yourself -- but
since I don't know the full background, perhaps that doesn't matter to you.

~~~
popham
You can host Sandstorm yourself. The most recent relevant Dev Group post that
I see on the subject of self-hosting is at
[https://groups.google.com/forum/#!topic/sandstorm-
dev/ff3oZG...](https://groups.google.com/forum/#!topic/sandstorm-
dev/ff3oZGQ7xr4).

------
hmsimha
I was trying to figure something like this out myself a while ago. The best
solution I could come up with was a transparent Platform-as-a-Service
provider with an option letting the person(s) with access allow anyone to
inspect the slug and application settings (minus any secret tokens).
Essentially, any user of the application could verify that the DNS resolves
to the PaaS provider's servers, and from there look up the code deployed to
the application, the Procfile, etc.

This would still require placing trust in a centralized entity, though, and
would still let administrators manipulate or dump DB info without the users
knowing.

------
jonjacky
Possibly pertinent: Quark, a web browser with a formally verified kernel.
The page says, "the specification implies several security properties..."

[http://goto.ucsd.edu/quark/](http://goto.ucsd.edu/quark/)

------
astrocat
I've always wondered this myself. The claim of "we're open source so you can
trust us" has always seemed a bit off to me for the exact reasons you give -
sure, you publish this codebase, but who says that's what you're actually
running on your servers?

And from the answers so far, it's disheartening to hear that there doesn't
seem to be any great way to guarantee fair play. But I suppose it generalizes
- we don't have any way to guarantee that the people we interact with on a
daily basis have our best interests at heart, and we do a lot of trusting just
because it makes life easier.

------
towelguy
The only way I can think of is giving trusted auditors periodical access to
the server and have them make fingerprints of the code/data for others to
compare. I think this is what some bitcoin exchanges do, for example.

------
VikingCoder
It would be possible for Amazon, Azure, or Google Compute Engine to provide a
service like this.

"We certify that the code running on X machine came directly from Source Code
Y at Version Z of Branch W."

I think it's a great idea.

------
rcthompson
There's no point in trying to verify that your app is running the server-side
code it claims to be running. That still doesn't prevent you from logging into
your own server as root and taking everyone's data. If you have people's data
unencrypted on your server, they are ultimately trusting _you_ with it, not
your app. Part of that is trusting your app not to have a security hole that
leaks data to third parties, and verification could help with that. But it
won't help with securing user data against a malicious sysadmin.

------
zokier
Heck, I think even enforcing trusted computing locally is challenge enough;
doing it remotely with attestation sounds like quite a feat.

Does anyone know what the state of executable signing on Linux is these days?
I found the unmaintained DigSig project, and some noise about SecureBoot-
related patches from a couple of years back. And that would be just a start:
I haven't heard of anything that would allow enforcing code signing for
"dynamic" code (like JS or Python).

------
oneeyedpigeon
You could convince me that your server has a checked-out copy of a given body
of code by a) giving me push access to a single 'throwaway' file in your
repository and b) generating a fingerprint of the codebase and serving it.
You couldn't then just return a hardcoded fingerprint, but this still doesn't
guarantee that you aren't running other things in addition to that code.
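A sketch of that fingerprint step (function name and layout are made up):
hash every file in the checkout, including the verifier's freshly pushed
throwaway file, so the result can't be precomputed:

```python
import hashlib
from pathlib import Path

def fingerprint_tree(root: str) -> str:
    """Deterministic digest over the relative paths and contents of a
    checkout. Because the verifier just pushed a throwaway file whose
    content only they know, the server can't serve a stale, precomputed
    hash - it has to hash a tree that actually contains that file."""
    h = hashlib.sha256()
    root_path = Path(root)
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        h.update(path.relative_to(root_path).as_posix().encode())
        h.update(b"\0")
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()
```

The same function run over a local clone plus the throwaway file should
match what the server serves - though, as noted, nothing proves the server
computed this over the code it is actually executing.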

~~~
ethbro
How do you know that the pull the server made & hashed is actually the running
code?

Again, this requires a root of trust on the server; otherwise, anything
returned from the server _including any information you would need to verify
the code the server is running_ could be spoofed.

------
popham
Have you discovered [https://unhosted.org](https://unhosted.org)? For the
server-side, you could let users provide their own backend
([https://unhosted.org/practice/31/Allowing-the-user-to-
choose...](https://unhosted.org/practice/31/Allowing-the-user-to-choose-the-
backend-server.html)).

------
ghgr
Maybe the best way to ensure you don't do anything sneaky with the data is
not having the data in the first place (at least not unencrypted). Do you
know the Mylar project [1]? They are doing pretty interesting stuff with
zero-trust (even compromised) servers.

[1] [https://css.csail.mit.edu/mylar/](https://css.csail.mit.edu/mylar/)

------
pjc50
You can't make it 100% watertight. You can't even prove to _yourself on your
own system_ that nothing is going on behind the scenes, unless you're on very
restricted hardware with an open-source BIOS and no system management
processor, and have verified all the PCI cards, USB devices, etc.

------
sinaa
Hi. I've been meaning to build something very similar to this (with some
differences that would also make it enterprise-friendly), but I've been
looking around to find a team.

If you -- or anyone in this thread -- want to build it together or to discuss
ideas, please get in touch.

------
vbezhenar
Client code could be hard to verify too. Browsers don't support this
functionality, and there are no popular add-ons I'm aware of that verify
JavaScript. You can't just download script.js with curl and assume the server
will serve the same file to your browser.
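To make the gap concrete, here is the naive check (names and the example URL
are illustrative): it only proves what *this one request* received, not what
the server hands to a logged-in browser, a different IP, or the next request:

```python
import hashlib
import urllib.request

def digest(data: bytes) -> str:
    """SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()

def script_digest(url: str) -> str:
    """Fetch a script and fingerprint the bytes this request received."""
    with urllib.request.urlopen(url) as resp:
        return digest(resp.read())

# Comparing against a hash of the local open-source build proves nothing
# about other requests - the server can serve different bytes per client:
# local = digest(open("dist/script.js", "rb").read())
# assert script_digest("https://example.com/script.js") == local
```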

------
frik
Do it like AAA games that disallow modifications: they ship with a text file
that contains every file and its hash (crc32, md5, sha1, etc.), and the game
executable checks the hash of the text file and then the hashes of the files
listed in it.
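A sketch of that manifest scheme (my own naming, and SHA-256 in place of the
weaker crc32/md5): build a map of file paths to hashes, then report any file
whose current hash no longer matches:

```python
import hashlib
from pathlib import Path

def build_manifest(root: str) -> dict:
    """Map each file's relative path to its SHA-256 hash - the role the
    shipped 'text file' plays in the AAA-game scheme described above."""
    root_path = Path(root)
    return {
        p.relative_to(root_path).as_posix():
            hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root_path.rglob("*")) if p.is_file()
    }

def check_manifest(root: str, manifest: dict) -> list:
    """Return the paths that were added, removed, or modified."""
    current = build_manifest(root)
    return sorted(p for p in set(current) | set(manifest)
                  if current.get(p) != manifest.get(p))
```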

~~~
towelguy
The software runs on your server, so you could spoof the checker.

------
dguido
Ironclad Apps from MSR:

[https://www.usenix.org/conference/osdi14/technical-
sessions/...](https://www.usenix.org/conference/osdi14/technical-
sessions/presentation/hawblitzel)

------
parfe
Why not a conventional desktop application? Those still exist and work quite
nicely; I work on one every day. What benefit does a user get from trusting
code running on your machine?

------
sacheendra
This would allow users to verify that a SaaS provider is running unmodified
software - provided, of course, that they open-source their whole stack.

------
josh2600
Just make a snapshot on terminal.com; then the state of your web application
can be distributed at an instant in time. You can verify it's exactly what
you say it is, since the snapshot is bound to your user name.

You can make a snapshot for free right now if you want to. This should solve
your problem.

Let me know if you have any questions, but using terminal's snapshot feature
you can distribute your web app at a known state.

Edit: it's like git versioning for machine state.

~~~
MichaelGG
While it seems cool, this doesn't solve anything. At _best_ it just shifts
trust onto Terminal. And really, there's nothing stopping a malicious VM
owner from "fixing" things up to a good-looking state, showing off that
snapshot, then reverting and continuing.

A solid example to keep in mind is MtGox. How can we run something and know
that no invalid trades are added, no fake password resets are processed, etc.?

------
Kinnard
This is what Ethereum is for, but it's much lower level than 'web
applications'.

------
oldmanjay
I don't know the answer but if you figure it out you'll be a trillionaire.

------
scardine
Let's start a startup certifying that the code running at some service is the
same as what's on the public repo.

A small bitcoin fee would be charged to the privacy nuts willing to confirm
it.

