Ask HN: How would you implement a verifiable open-source web application?
160 points by gvido on Mar 19, 2015 | 83 comments
Wasn't really sure how to make the title descriptive enough without putting the full question there.

Anyway, I have been toying with the idea of creating an open-source web application. Open-source in the way that I would host the product on my own servers but the full source code, except, of course, authentication info, tokens and things which shouldn't be under source control anyway, would be available on Github. Anyone would also be able to set up a self-hosted version of the product, and if anyone wanted to contribute, I would accept pull requests, etc.

The idea behind that was that, since it would be targeted mostly at a tech-savvy crowd and deals with personal information, I would like to introduce some level of trust that I'm not doing anything sneaky or unexpected behind the scenes (like storing information I shouldn't be).

So basically I started wondering if it is possible to implement a way people could verify that the same code they see on the Github repo is the code that's also running on the live hosted site? I will be working with node.js, but I don't think the tech stack is too relevant here.

The client-side part, as I imagined it, would be relatively trivial to verify - run the build part locally, compare fingerprint of the live code and the local version.
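
A minimal sketch of that client-side check, assuming the build is reproducible (the same source always yields byte-identical files). The helper name `fingerprint` and the file names are hypothetical; in practice the "live" files would be downloaded from the hosted site rather than written inline.

```python
import hashlib

def fingerprint(files: dict[str, bytes]) -> str:
    """Return one SHA-256 digest covering every file name and its contents."""
    digest = hashlib.sha256()
    for name in sorted(files):  # sort so the result does not depend on dict order
        content_hash = hashlib.sha256(files[name]).digest()
        digest.update(name.encode("utf-8") + b"\0" + content_hash)
    return digest.hexdigest()

# Hypothetical build output produced locally from the GitHub source.
local_build = {"app.js": b"console.log('hello');", "index.html": b"<html></html>"}
# Hypothetical copies of the same files as served by the live site.
live_site = {"index.html": b"<html></html>", "app.js": b"console.log('hello');"}
# The same site with one extra tracking call injected.
tampered_site = {**live_site, "app.js": b"console.log('hello');track();"}

print(fingerprint(local_build) == fingerprint(live_site))
print(fingerprint(local_build) == fingerprint(tampered_site))
```

Hashing each file separately before combining keeps file boundaries unambiguous, so moving bytes from one file to another changes the result.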

But my idea got stuck on the server side. Even if I made a server-side endpoint that returns the fingerprint of the live code, there is no way for someone to check what is actually going on; I could just as well return a static file with a hardcoded fingerprint.

I'm sure someone has dealt with or thought about similar problems before, and I would be happy to hear some insights. Feels like this might be a pipe dream, but at least some level of verification would be nice to achieve.

The only way you can do this is if the server is not fully under your control but partially controlled by the remote client.

We've been here before: this is Trusted Computing. You need a Trusted Platform Module on your servers (thankfully you're picking the hardware, so you can make that a hard requirement). Your users can inspect and sign your code with their keys, that they generate and keep on the client side (you never see them). Or more likely, they sign that they trust a particular third-party auditor. Either way, their data is uploaded encrypted with their keys and only code they have signed will ever be allowed to decrypt it.

It won't be easy. You'll have to keep old versions of your code around in case users haven't signed the new versions. The TPM-handling libraries are immature, though they get better every day. But it's possible, particularly since you only need to make it work with one particular model of TPM.

Good luck!

I think I picked this up off Hacker News originally, but there's apparently new Intel stuff (aka SGX) coming out to help with this. http://theinvisiblethings.blogspot.com/2013/08/thoughts-on-i...

Unfortunately, I think the reason most open source people have a knee-jerk aversion to trusted platforms is that they've historically been designed to only serve the interests with the most money (read: the government and/or content industry).

There's nothing inherently anti-open source about the schemes, and they would provide innumerable benefits to increasing security confidence in a networked world.

However, when you can rattle off enough failed or botched encryption initiatives involving a hardware component to fill one hand just off the top of your head (CSS, AACS, HDCP, UEFI/SecureBoot, FairPlay), confidence is not inspired...

I'm fine with "Trusted Computing". I'd just like the private keys, please.

Oh wait, I can't do that? Hmm, so who are you trusting against? Me, you say?

Nope nope nope.

Yes. If Intel can let you control the keys and guarantees in some way that it's not compromising your trusted module, then SGX would be perfect. Unfortunately, right now, Intel wants to keep the SGX keys for itself.

Remote attestation is most certainly anti-Free software.

Bank: "For your security, you may only access our website with an officially supported browser"

If you remotely attest your own software, how is that anti-freedom?

I use remote attestation to verify that my firmware, kernel, initrd, and configuration were booted as expected. It's a tool you can use for your own benefit.

What you are describing is someone else attesting that their software booted on your computer. That was the scary scenario people were afraid of when trusted computing rolled out, but it never materialized. Nobody is using TXT for DRM.

Like secure boot, it ultimately comes down to who has the keys.

If you generated the signing key and loaded it onto the hardware (or generated on-chip but it's signed by nothing else), then I don't see the problem.

If the hardware has a factory-generated private key that you cannot get at and the corresponding public key can be verified through some well known trust root, then a third party can ask you to attest to what software is running on your hardware and you cannot lie. This custom hasn't materialized yet, but it's not too hard to imagine it catching on after support winds its way up the software stack.

What are the specifics of your setup?

On the other hand, it allows trusting otherwise untrusted third parties. For instance, you could use it to run a verifiably safe bitcoin tumbler. (Assuming you can trust Intel directly and against attackers.)

So does designing a proper protocol that doesn't rely on tamper-proof hardware as a cornerstone.

The essential idea behind software freedom is that your computer runs code wholly of your choosing and functions as your agent. The parties to a transaction voluntarily meet together by adhering to a mutually beneficial protocol.

Allowing other parties to know exactly what code you're running lets the more powerful party dictate that your code works for their benefit, effectively leaving you without a computer.

Wait, so you "sign" the version of code you're okay with using, and if you haven't "signed" the new version off, you're actually served by the old version of the code (the "newest" version you've "signed")?
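
The selection rule described here can be sketched as follows. This is an illustration only: the release identifiers are hypothetical, and the `signed` set stands in for whatever signature check a real system would perform on the user's approvals.

```python
from typing import Optional

def newest_signed_release(releases: list[str], signed: set[str]) -> Optional[str]:
    """Pick the most recent release whose hash the user has signed.

    `releases` is ordered oldest to newest; `signed` holds the hashes the
    user approved. Returns None when the user has signed nothing on offer.
    """
    for release_hash in reversed(releases):
        if release_hash in signed:
            return release_hash
    return None

# Hypothetical release hashes, oldest first.
releases = ["hash-v1", "hash-v2", "hash-v3"]

# This user has not signed v3 yet, so v2 keeps handling their data.
print(newest_signed_release(releases, {"hash-v1", "hash-v2"}))
print(newest_signed_release(releases, set()))
```

This is also why old versions have to stay deployable: any release still present in some user's signed set may be asked to serve that user.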

Wicked. :)

Perhaps it's a stupid question, but how can a web client confirm that code is really running from inside the TPM? Since the source code is freely available from GitHub, isn't there a chance that whoever controls the server (hacker or malicious owner) can simply override the TPM at some point in the future, and run the unsigned, possibly altered, code directly, circumventing all the restrictions? As far as I understand the mechanics of TPM, for this setup to work you need the server owner to be trusted and actively monitoring and protecting the server setup like banks do, to detect any breach. Then it makes sense: you trust the owner, the owner creates the proper setup, and the TPM protects it from any future unauthorized changes. If the owner of the server cannot be trusted this will not work, since he/she can just change the back-end setup and do as he/she pleases, and you don't have any way to detect this on the client side?

TPMs cannot, AFAIK. But Intel SGX can. It allows you to remotely verify code running on the processor and provide inputs that can only be decrypted by the secure enclave. How I'm guessing this would play out is:

1. Site publishes its hardware public key, allows users to verify it can sign on behalf of an Intel processor.

2. Site publishes source and reproducible build, so everyone can agree on a hash of acceptable bits.

3. Users submit requests encrypted to that public key (there's also something missing, where the key is actually a combination of the public key plus the hash of the executable code. Maybe the processor signs another cert for a specific proc+code combo).

4. Server can only decrypt when it has access to the matching private key, which is only available after entering the secure enclave.

5. If the server could decrypt the request and sign a response, the user knows it was handled by the right bits.
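
The steps above can be modelled with a toy sketch. It is not the SGX API and makes loud simplifying assumptions: an HMAC with a shared `HARDWARE_SECRET` stands in for the processor's asymmetric key, and `verify` stands in for a manufacturer-backed check. What it does illustrate is the missing piece noted in step 3: the working key is derived from the hardware secret plus the hash of the code, so modified code cannot produce a response that verifies against the published hash. All names are hypothetical.

```python
import hashlib
import hmac

HARDWARE_SECRET = b"toy-secret-fused-into-the-cpu"  # assumption: never leaves the chip

def measure(code: bytes) -> bytes:
    """Hash of the code loaded into the enclave (the hash agreed on in step 2)."""
    return hashlib.sha256(code).digest()

def enclave_key(code: bytes) -> bytes:
    """Key bound to both the hardware and the exact code that is running."""
    return hmac.new(HARDWARE_SECRET, measure(code), hashlib.sha256).digest()

def enclave_respond(code: bytes, request: bytes) -> tuple[bytes, bytes]:
    """Run inside the enclave: answer the request and tag the answer."""
    response = b"handled:" + request
    tag = hmac.new(enclave_key(code), request + response, hashlib.sha256).digest()
    return response, tag

def verify(expected_measurement: bytes, request: bytes, response: bytes, tag: bytes) -> bool:
    """Stand-in for the manufacturer-backed check a user would rely on."""
    key = hmac.new(HARDWARE_SECRET, expected_measurement, hashlib.sha256).digest()
    expected_tag = hmac.new(key, request + response, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected_tag)

published_code = b"def handle(req): return store_nothing(req)"
modified_code = b"def handle(req): log_everything(req); return store_nothing(req)"
published_measurement = measure(published_code)  # what the reproducible build yields

honest = enclave_respond(published_code, b"ping")
sneaky = enclave_respond(modified_code, b"ping")

print(verify(published_measurement, b"ping", *honest))
print(verify(published_measurement, b"ping", *sneaky))
```

The whole scheme rests on the secret staying inside the hardware; anyone who extracts it can derive the key for any code hash.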

This still has many problems, the main one being that users are not going to really verify anything anyways. Also the data storage and all important handling needs to be done with encryption, so an admin can't just change the data.

But in theory, assuming no one can break the secure enclave/trust chain, it's a pretty nifty solution.

The web client can confirm that the code is running in what the holder of a TPM key attests is a secure environment (remote attestation). The whole point of the TPM is that its private key is stored in tamper-resistant hardware and never exposed to the outside. Of course no hardware is perfectly tamperproof, and I imagine a sufficiently smart or patient attacker could compromise one, but we're talking liquid nitrogen and electron microscopes territory here.

Is it known how exploit prevention is done in a remote trusted computing scenario? It is well known that TCM implementations on behalf of Microsoft are being circumvented left and right.

This is a noble goal. I would like to do this, too.

It appears to me, however, to be theoretically impossible to achieve. The code is running on your hardware; there is simply at the moment no known way to give any kind of assurance about remotely running code.

What would work, of course, would be to implement the entire app as a JavaScript application that only asks the backend for information when necessary. The blockchain.info bitcoin wallet does something to this effect. So does the web version of the LastPass vault. There is, of course, still no assurance that you will not occasionally inject compromising code when it is difficult to spot, but this is the closest that I have seen.

Richard Stallman is also advocating something to this effect: https://www.gnu.org/philosophy/javascript-trap.html

Stallman is just recommending a way to have scripts self identify as free software. There's nothing that'll help there.

Implementing as an API doesn't help much, and in fact might just make it easier to fake. This is because you have a reduced surface area that you need to check and modify.

It's theoretically impossible if you assume full control of hardware. But most people are not capable of controlling hardware, so secure enclaves and remote attestation are likely to be a legitimate win if feasible.

The first statement is untrue: Stallman is also advocating measures to replace obfuscated JavaScript web apps with free versions:

"Browser users also need a convenient facility to specify JavaScript code to use instead of the JavaScript in a certain page. (The specified code might be total replacement, or a modified version of the free JavaScript program in that page.) Greasemonkey comes close to being able to do this, but not quite, since it doesn't guarantee to modify the JavaScript code in a page before that program starts to execute."

As for the second claim, actually implementing as an API does indeed help, because most of the code is then running in the browser and can be audited.

As for the third, there are no known ways to implement secure enclaves and remote attestation; that is what the questioner is asking. If you know of any, do share them.

OK, but unminimized JS still has no bearing on trusted computing.

What about Intel TXT (maybe?) and the upcoming SGX? Although I've not seen details on how the key system works with SGX. But assuming each processor has a unique ID/public key signed by Intel, and assuming we trust Intel and assume it's not profitable/plausible for a darknet to undo Intel's hardware protection, SGX seems to be exactly what the OP is asking for.

Yes it does; the current state of the art of trusted computing is indeed "run open source on your own hardware", and unminified JS does that. It's only, as of now, impractical, because the browser does not help with verification.

I was unaware of Intel SGX; sounds okay in principle, but I would consider the jury out until it's released and the security community has weighed in.

This is the goal of CloudProxy: http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-13...

which is open source: https://github.com/jlmucb/cloudproxy

It relies on TPMs (trusted platform modules, a hardware root of trust).

What confused me about the naming is that CloudProxy is an OS, not a proxy server. It's a distributed OS that provides attestation of the identity of remote code. To do this you need secure boot and key management.

If anyone dives further into it, let me know :) I'm curious how deployable it is from the Github repo. I guess you can run it on Linux, but I'm not sure how the kernel is involved in the chain of trust. I would have thought you needed your own OS.

The CloudProxy Tao (henceforth, “the Tao”) is a recipe for creating secure, distributed, cloud-based services by combining ingredients that are already available in many cloud data centers. The Tao is realized as an interface that can be implemented at any layer of a system. CloudProxy implements multiple layers of the Tao and provides means for

- protecting the confidentiality and integrity of information stored or transmitted by some hosted program,

- establishing that the code executed as a hosted program in a cloud is the expected code and is being run in the expected environment, and

- authenticating requests to the hosted program to check that they come from a client executing some expected program in an expected environment, either remotely or locally in the cloud.

CloudProxy is the first implemented, fully fleshed-out system providing these properties along with key management and an appropriate trust model for all principals.

There is no way to know that some remote blackbox is genuine, except to feed all the inputs that you are feeding to that blackbox also to a trusted blackbox, and verify that the outputs are identical. This breaks down as soon as random numbers are involved (session keys, etc).
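
A small sketch of that comparison, with hypothetical stand-in functions for the two boxes. It shows both halves of the point: identical deterministic code matches, a box that misbehaves on a tested input is caught, and the same code compared against itself appears to differ once each call generates a fresh random session key.

```python
import secrets

def outputs_match(trusted, remote, inputs) -> bool:
    """Feed identical inputs to both boxes and compare every output."""
    return all(trusted(x) == remote(x) for x in inputs)

def reference_box(x: int) -> int:
    return x * 2

def honest_remote(x: int) -> int:
    return x * 2

def rogue_remote(x: int) -> int:
    return x * 2 if x != 7 else 0  # misbehaves on one input only

def session_box(x: int) -> tuple[int, str]:
    return x * 2, secrets.token_hex(16)  # fresh random session key per call

inputs = range(10)
print(outputs_match(reference_box, honest_remote, inputs))
print(outputs_match(reference_box, rogue_remote, inputs))
print(outputs_match(session_box, session_box, inputs))  # same code, still differs
```

Note that the rogue box is only caught because the input 7 happens to be in the test set; a check over `range(5)` would pass it.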

Also, even this does not assure you that the blackbox isn't doing something bad via a side channel. For instance, if we trust the genuine blackbox not to transmit personal data somewhere, how do we know that the BUT (blackbox-under-test) isn't doing that? We have to isolate the blackbox, and then destroy it at the end of the test (so it can't store something and transmit later). You can't isolate a remote box. A remote box has inputs and outputs unknown to you that you cannot monitor.

Trust is a big problem with the SaaS model. When you're trusting a SaaS application, you're really trusting the provider who is hosting that application. There is no way around that.

I think, however, that there is an intermediate worthwhile goal: to have assurance that you're trusting only the host of the application!

That is to say, that the SaaS application doesn't contain malware which was not intentionally injected by the provider, but somehow got there via an upstream code source or through some exploit or whatever.

If you can provide a way that the SaaS provider itself can check that it's running the code that it thinks it is running, that is valuable. If the SaaS is rogue and customizes the code to do things that the users don't agree with, that's a separate concern.

The answer is cryptography. You need to have the client encrypt everything before sending it to the server, removing even the option for wrongdoing. This only works if the client is a program, app, or browser extension that the users can compile themselves. There's not (yet) a way to verify the client code that is running in the browser.

If the server needs to do stuff with the data, then what you want is probably not possible. (It depends on the exact thing that needs to be done, as there are things like homomorphic encryption.) Instead, you should focus on non-technical assurances that you are acting in good faith and that promote trust.

This would include things like having a privacy policy on the site with strong guarantees about what can be changed in the future and having a physical address. You could put funds in escrow that would pay out to the users if you violated the policy. You could have outside auditors come and verify your procedures.

Honestly, you can't guarantee that you won't have a security breach or that the government will give you a national security letter. I'd focus on building your service and making it useful enough that users deem the risk a worthwhile trade-off.

Homomorphic encryption is the solution when you want the server to process the data, if it ever becomes efficient.
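
To make the idea concrete, here is a toy sketch of the Paillier cryptosystem, which is only additively homomorphic (the server can add encrypted numbers, nothing more) and is not the fully homomorphic encryption the efficiency caveat refers to. The primes are tiny and chosen for illustration, so this is insecure as written; the function names are hypothetical.

```python
import math
import secrets

# Toy parameters: real deployments use primes of 1024 bits or more.
p, q = 1789, 1861
n = p * q
n_squared = n * n
g = n + 1
lam = math.lcm(p - 1, q - 1)  # private
mu = pow(lam, -1, n)          # private; this shortcut is valid because g = n + 1

def encrypt(message: int) -> int:
    """Encrypt an integer 0 <= message < n under the public key (n, g)."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n_squared) * pow(r, n, n_squared)) % n_squared

def decrypt(ciphertext: int) -> int:
    """Recover the plaintext using the private values lam and mu."""
    u = pow(ciphertext, lam, n_squared)
    return ((u - 1) // n * mu) % n

def add_encrypted(c1: int, c2: int) -> int:
    """What the server does: combine ciphertexts without seeing plaintexts."""
    return (c1 * c2) % n_squared

total = add_encrypted(encrypt(1200), encrypt(34))
print(decrypt(total))  # 1234
```

Only the holder of `lam` and `mu` can read the result; the server performing `add_encrypted` needs nothing but the public modulus.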

This is not a technical solution, but you could release your code under AGPL and take code contributions.

By doing this, you are essentially telling the world that not only do they need to trust you, but that you are willing to make yourself legally culpable if you surreptitiously run different code on your servers (ie., you would be violating the copyright terms of your contributors whose code is licensed under AGPL and of which your application is a derivative).

It would be nice to have a technical solution as well, I'm not discounting any of the other suggestions here. Just saying that adding a legal dimension could help counterbalance the possibility that a technical protection could be defeated.

As rainmaking said, I don't believe that there is a way you can fingerprint the running code in a non-spoofable way. Whatever protections you put in the design could be bypassed by having a proxy layer dispatch two requests for each request received: one to the actual application, and one to evil_application.
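
The bypass is simple enough to sketch in a few lines. The function names mirror the comment (`evil_application` is the commenter's term); everything else is a hypothetical stand-in for real network services.

```python
captured: list[str] = []  # what the hidden component quietly collects

def application(request: str) -> str:
    """The published, audited application."""
    return f"ok:{request.upper()}"

def evil_application(request: str) -> None:
    """Hidden component that never answers the client."""
    captured.append(request)

def proxy(request: str) -> str:
    """Forward to the real app, copy to the hidden one, return the real reply."""
    evil_application(request)
    return application(request)

direct_reply = application("my secret")
proxied_reply = proxy("my secret")

print(direct_reply == proxied_reply)  # the client sees identical responses
print(captured)
```

Because the real application produces every response, any fingerprint or self-check it reports stays valid while the copy is taken upstream of it.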

I believe that the best route to take that is most in-line with your goals would be to design it such that the server-side is untrusted from a security standpoint. Have the client process the data, and only give encrypted or sanitary data to the server side. Don't trust the server with anything other than availability.

"Don't trust the server with anything other than availability."

You probably shouldn't even be trusting any one server even with that!

> I would host the product on my own servers

Don't do that. If you're targeting your product for the tinfoil hat crowd, that's simply not going to work. Instead, you create a build script that will generate your application from source, and (for example) generate a docker image. This image could be run on your servers, on a third-party server (AWS, DO, etc.) or on the user's own hardware, depending on the level of inconvenience/security tradeoff they are willing to endure.

I know you're probably looking for the consistent revenue streams of a SaaS, but unless user data can be completely encrypted during storage (e.g. email, backup, etc.), the truly paranoid don't want to trust their information to a 3rd party.

Ethereum is a distributed virtual machine, based on blockchain technology. It appears to be what you are looking for. https://ethereum.org/

I've seen Ethereum mentioned before, but it's really really hard to figure out whether it's a proposal or a thing that actually exists.

Are there any applications actually using Ethereum yet? Almost everything linked to from the home page consists of "roadmaps" or "coming soon" pages. Is there a public network that's up and running? How many nodes does it have?

There is a testnet that works right now, but should not be used in production yet (I think it gets reset regularly? Not sure.). Within a few months, it is expected that the first version of "the real thing" will be up, but I don't think putting things on it quite then would necessarily be a good idea (probably depends on what things one is considering putting on it), because it is set up so that after a certain time, there will be a change to a new version, and while all balances will be carried over across that change to the new version, other parts of the state of the system won't be, so anything built on it would have to be set up again. (which could just be a matter of sending the same message again, so it might not be that much of a problem to set it up before the change I guess.)

Also, it's probably important to note that a contract is run by every miner (or every miner on a certain part of the network, depending on how the scalability thing ends up being done?), and as such you will want to keep any computation done by a contract cheap, so that it will be cheap to use. Instead of having the contract do the computation that needs to be done, it may in some cases work better to just have the contract verify the accuracy of computations that have been done elsewhere.
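
A sketch of the verify-instead-of-compute idea, written in plain Python rather than as an Ethereum contract. Factoring is used purely as an illustrative problem where checking an answer is far cheaper than finding it; the function names are hypothetical.

```python
import math

def find_factors(n: int) -> tuple[int, int]:
    """Expensive path: search for a non-trivial factor by trial division."""
    for candidate in range(2, math.isqrt(n) + 1):
        if n % candidate == 0:
            return candidate, n // candidate
    raise ValueError("n has no non-trivial factors")

def verify_factors(n: int, claimed: tuple[int, int]) -> bool:
    """Cheap path: one multiplication plus range checks."""
    a, b = claimed
    return 1 < a < n and 1 < b < n and a * b == n

n = 10007 * 10009
answer = find_factors(n)          # done once, by whoever submits the result
print(verify_factors(n, answer))  # the only work every node would repeat
print(verify_factors(n, (3, 5)))
```

The search loop runs about ten thousand iterations here, while the check is a single multiplication regardless of how hard the answer was to find.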

However, now that I think of it, I'm not sure that Ethereum would solve all of the problems OP gives, because one of the problems OP wanted to solve was that they wanted to demonstrate that they were not storing information that they shouldn't be. But with Ethereum, because the contracts are executed by "everyone", everyone has to have access to the data the contracts are using to run, and there is no way to ensure that they don't hold onto that data.

One way to solve this could maybe be doing computations on shared secrets (as talked about in one of the Ethereum blog posts, but which is not something Ethereum is to have), but this might require more messages to be sent over the network than one is willing to use. Still much more practical than homomorphic encryption though, I think. (If one was using the shared secret thing, I'm not sure one would do it with Ethereum.)
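
A minimal sketch of computing on shared secrets, using additive secret sharing to add two private numbers. This is one simple technique of that family and not necessarily the scheme the blog post describes; the party count, modulus, and values are assumptions for illustration.

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo this prime

def split(secret: int, parties: int) -> list[int]:
    """Split `secret` into additive shares; any subset short of all looks random."""
    shares = [secrets.randbelow(PRIME) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares: list[int]) -> int:
    """Combine all shares to recover the value they encode."""
    return sum(shares) % PRIME

# Two private inputs, each split among three parties.
input_a = split(52_000, 3)
input_b = split(61_000, 3)

# Each party adds the two shares it holds; no party ever sees an input.
summed_shares = [(a + b) % PRIME for a, b in zip(input_a, input_b)]

print(reconstruct(summed_shares))  # 113000
```

Addition needs no communication between parties, but operations such as multiplication do, which is where the extra network messages come from.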

Ethereum could/would solve the problem of ensuring that the program being run is exactly as claimed, but it might not keep certain information private. Depending on the specific problem at hand, there might be ways around that though?

EDIT: ok, so, something suggested that homomorphic encryption might have gotten an improvement to the point of practicality recently? I wasn't aware of that. May make this post slightly out of date.

I'm a big Ethereum/Eris fan, but another option specifically for attested hosting is Codius from Ripple Labs (codius.org). I'm not sure how far along their development is atm.


Neat question. I guess there are really two separate things a user could want to verify:

* the application isn't broken/still fulfills its API contract

* the application isn't compromised in a malicious way

As far as the first point, I wonder if it could be possible for users to run some sort of a test suite against the public API? Like a crowd-sourced test suite that verifies that production server behavior is still as advertised.
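
One way such a suite could be organised, as a sketch: each contributor supplies a small check function, and a runner reports which ones pass. Here `production_api` is a hypothetical in-process stand-in for HTTP calls to the live service, and the endpoints are invented for the example.

```python
from typing import Callable

Api = Callable[[str, dict], dict]

def production_api(endpoint: str, payload: dict) -> dict:
    """Hypothetical stand-in for a call to the live service."""
    if endpoint == "/echo":
        return {"status": 200, "body": payload}
    return {"status": 404, "body": {}}

def check_echo_returns_payload(api: Api) -> bool:
    return api("/echo", {"x": 1}) == {"status": 200, "body": {"x": 1}}

def check_unknown_endpoint_is_404(api: Api) -> bool:
    return api("/nope", {})["status"] == 404

def run_suite(api: Api, checks: list[Callable[[Api], bool]]) -> dict[str, bool]:
    """Run every contributed check and report pass/fail by check name."""
    return {check.__name__: check(api) for check in checks}

report = run_suite(production_api, [check_echo_returns_payload, check_unknown_endpoint_is_404])
print(report)
```

A suite like this only confirms advertised behavior; it says nothing about what else the server does with the requests it receives.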

The second point I think can only be partially addressed, since it's impossible to guarantee that some sneaky compromise hasn't happened. But you could allow outside auditing, let people have some form of read-only access to the directory tree that stores the code (if it's separate from the config), etc.

Decouple the application from the DB. Use a DB as a service that has an HTTP API. The application encrypts/decrypts all data with public/private key pair. The user, your client generates their own key pair.

This is the scenario we are testing out with http://schemafreedb.com/

You do not have the private key, so you cannot see the data, but you can still offer the data portion as a service. Your client does need to host the application. Depending on your target client, Node may be a good choice, or if you want to go mainstream, go with something like PHP.

Whenever you have an update to the code, you can provide a diff of the changes.

One way of doing this (for your particular use case, but not for the general case) would be for your "business logic" to be implemented in the client side, which as you said, can be verified easily. Then make your backend be a dumb data storage.

For example, if you want to store people's names and date of birth, you would encrypt those on the client-side and only ever send ciphertext to the server.

The encryption key could be derived via a passphrase composed of a user name and a password. Of course, this means that if someone loses their credentials, they lose their data forever.
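
A sketch of that derivation using PBKDF2 from the Python standard library. The choice of PBKDF2, the iteration count, and using a hash of the user name as the salt are all assumptions made for illustration; the derived key would then be passed to an authenticated cipher from a vetted library, which is not shown.

```python
import hashlib

def derive_key(username: str, password: str, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key; the user name acts as the salt."""
    salt = hashlib.sha256(username.encode("utf-8")).digest()
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations, dklen=32)

key = derive_key("alice", "correct horse battery staple")
same_key = derive_key("alice", "correct horse battery staple")
wrong_key = derive_key("alice", "correct horse battery stapler")

print(len(key))          # 32
print(key == same_key)   # True
print(key == wrong_key)  # False
```

The derivation is deterministic, so nothing needs to be stored server-side to recreate the key, and equally nothing exists that could recover it if the credentials are forgotten.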

Trust comes back to a trust anchor.

If you could create e.g. a publicly available AMI of your application and prevent further runtime modifications to it (e.g. disallow SSH access), then maybe Amazon could offer an interface to verify that your application was running based on the trusted AMI.

Essentially Amazon would issue a statement connecting a certain IP address to a certain application.

Substitute AMI and Amazon for your choice of other technology as appropriate (docker container and docker hosting provider - this sounds like a competitive advantage for hosting providers hint hint)

So you saw the news, how even major dark markets are unreliable, and figure, hey, apparently anyone can get in on that action? Basically want to run a dark market, but are aware of the trust implications? I'd imagine just being open source would be a good start.

The closest to actually trusted computing is, well, trusted computing. Remote attestation allows you to verify you're executing specific bits of code. But I don't know if the tech is exposed in a useful enough way. With DRM apps, the idea is that the hardware will be able to verify it's running known code. Then you provide encrypted input to that processor, knowing the only way to decrypt is to be in the trusted code. Still many remaining issues, such as extending the trusted code to cover all access to data storage and wallets.

You might look at other aspects, such as somehow creating community signatures. Like, I dunno, if main wallets needed sign off from 10% of the user population to do major changes. But that is rife with problems.

And really, users don't seem sophisticated enough to bother with this stuff anyways. You could just paste some JavaScript "verification" code and I'd guess you'd get loyal defenders that don't know the difference. The market that just went down did multisig, right? But since it's too difficult, people just ignored the option, no?

Forgive me if my main assumption about the motive is incorrect.

I thought of doing this with the last web app I was running, but I decided not to, and here's why:

1) Anyone, not just nice people, can view source code on GitHub

2) Source code can be used to find vulnerabilities (which is of course one of the great values of using open source code - vulnerabilities are usually spotted more quickly by a larger group)

3) A single vulnerability that allows access to private data OR can lead to corruption or loss of data could put your company out of business

You're simultaneously claiming that open source code is great because large groups of people can look at it to spot vulnerabilities and that it's not great because large groups of people can look at it to spot vulnerabilities.

There are people on both sides of that fence, but you do need to be on one side or the other.

> you do need to be on one side or the other

binary, black and white thinking.

You need a 2nd trusted but independent person, who writes an application that runs on your webserver checking its own signature, and the hash sums of your code, after this trusted person started it with a password only he knows to access the encrypted database of hashkeys and other metadata.

The drawback is that you can show that the site is not compromised directly after a reboot, but you need to call your friend to log in and give the password for his validity checker. Once his app runs, others can connect to it and use the public key of the app to check if your own app is ok.

The problem is to find someone who is independent of you, so your community trusts him, and you also need to trust him, as his code is running on your server.

I think there's value in being able to say: here's the code I am claiming to use on this service, and the only way it isn't is if I have deliberately and actively lied.

That means that if someone hot-edits the files on the server, the resulting edits should be visible, and/or the site is clearly unverified. If you deploy from a branch someone doesn't know about, it should be clear. If you just don't document that you made a deployment, someone should be able to figure that out.

Of course that can be spoofed, but spoofing a solid claim on what is running is very different than not making any claim about the code that is running, so you've made yourself accountable.

Ah.. Now I get it.

So, if each branch's code was signed and contained an embedded key and chosen encryption algorithm, then if the app used those during processing and users received verifiable transmissions, that app's output could be verified by users as having come from that advertised branch.

http://en.wikipedia.org/wiki/Code_signing seems relevant. This case is more about having the software itself sign its own output. Relevant search terms:

* software "sign its own output"

* software "encrypt its own output"

* software "encrypt its output"

* software "sign its output"

Some interesting results:

* Computer scientists develop 'mathematical jigsaw puzzles' to encrypt software (UCLA) #comment by zblaxell http://lwn.net/Articles/562113/

* Cryptographic Verification of Test Coverage Claims http://www.cs.ucdavis.edu/~devanbu/doc.ps

* Study of Security in Multi-Agent Architectures §3.4 http://www.ecs.soton.ac.uk/~lavm/papers/sec.pdf

Simple, cute solution that anybody can understand: record yourself setting it up.

A video feed from a camera and another from the screen itself. Starting from a fresh system, install the dependencies, download the source, verify the hashes, and run the server. Then show the server response from another machine.

Sure, it doesn't guarantee 100%. The OS image could have been tampered with, or you could have mucked with the network and intercepted requests. But it's easy to record, easy to understand and much more secure than a black box server.

Couldn't you go back in and change it at any time though?

You can't. When someone else hosts the program the remote system effectively becomes a black box. The only way would be to give the code to the user so they can inspect, compile, and run for themselves. Even allowing the user to inspect the code on remote machine isn't good enough. The methods in which the user inspects the data can be hijacked and make it look like the user is looking at legitimate data.

I love this idea: I think you can extend it further.

I would build this idea on the Blockchain; every transaction is tracked and publicly available while details are secure

PTE: Publicly transparent entities (like a non profit, without the bullshit)

Also love the idea of having a computerized "gatekeeper"; a dangerous proposition, query-less tables that only computers can read; make it unreadable, and it is truly unhackable

As interesting as this would be, and even if you found a solution...

> I would like to introduce some level of trust that I'm not doing anything sneaky or unexpected behind the scenes (like storing information I shouldn't be).

All you have to do is take something like OpenResty, use it to proxy the traffic and terminate SSL b/t the App and the rest of the internet, and you can do all the nefarious things you wanted.

The ability to use proxies in such a transparent manner guarantees that this isn't possible, regardless of whether or not you actually succeed in the stated aim of verifiable open source code.

Tbh, the closest viable solution is for a reliable 3rd party auditor with professional credentials to perform regular audits to match your production environment to what you tell the general public. Otherwise, you can simply circumvent whatever safeguards you create by simply using a separate application to proxy traffic.

At this point, you are in the realm of 3rd party software audits and that is an established field.

This is funny, I was just daydreaming about this yesterday during class.

What I was thinking is allowing anonymous SSH access with a VERY locked down shell (/bin/rbash [1]) and let them view that everything is in order with their own eyes.

You still REALLY can't make a system that the user can conclusively see isn't just for show. You could be shoving them into a jail/chroot that provides the illusion of transparency, but be serving them elsewhere.

I think that this might be the closest you'll ever come to user software freedom on hardware that they don't own. There are a lot of security concerns, though, so I think it's out of the question for any type of production environment. I'd love to see a proof of concept from someone though.

[1]: https://www.gnu.org/software/bash/manual/html_node/The-Restr...

Not only is this possible to fake, it's probably easier to fake than to implement for real.

Essentially anti cheat software in reverse. Instead of the client proving it is unmodified to the server, the server must prove itself unmodified to the client. You may be able to get somewhere by studying how open source network games prove they aren't modified to the server if they do that sort of thing.

Which they don't; they just make it harder to pretend they aren't modified. Cheating is very much still possible, and more still if you wrote it all yourself.

> So basically I started wondering if it is possible to implement a way people could verify that the same code they see on the Github repo is the code that's also running on the live hosted site?

If Github wanted to get into the hosting business, they could offer this... you'd be trusting what they say when they tell users that the code is identical in both.

I can't think of any clever way to prove it otherwise. Though, if you could, it would have broader implications... imagine Microsoft handing the code to an app over (for viewing), and then being able to prove that the shipped version of the app was the same. They could verifiably claim that their software has no backdoors (save those that are also in the source code, but obfuscated... those are rare, but exist apparently).

This is an idea worth exploring. Good luck.

It doesn't necessarily need to be GitHub itself; any hosting provider with a reputation could host your software and verify which git commit it is pointing to.

> and verify which git commit it is pointing to

You'd need to go a step even further. The "application" code is only one thing - what about other applications, processes, DB logic, HTTP front-ends?

All of those can modify requests, modify or copy data, etc. Even if you could "100% prove" that the server is running that particular git revision, there are so many side-channels as to make it useless.

I know little about trusted computing, but here's a thought.

To know whether an open source program on a server has been modified, we send a different, customized executable copy every time, with one-time-use secrets. When the program starts, it has to answer questions correctly and quickly (to protect against reverse engineering) to prove it's a genuine copy; then we can send it our encrypted access key. The access key will never be written to disk by a genuine copy, so a restarted program won't be able to access our data without asking for a key again, and then we will know something is wrong.

The copies we upload to the server function exactly like the open source one, but the user is responsible for adding secret parts to them so that they are closed source.
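One rough way to picture the "answer questions correctly and quickly" step is an HMAC challenge-response. This is only a sketch under assumptions that are not in the comment above: the one-time secret is a random key baked into each uploaded copy, the "question" is a random challenge, and `make_copy_secret`, `answer` and `Verifier` are hypothetical names. It does not address the hard part, which is keeping the secret hidden from whoever controls the server.

```python
import hashlib
import hmac
import secrets
import time


def make_copy_secret() -> bytes:
    """Generate the one-time secret embedded in a single uploaded copy (assumed scheme)."""
    return secrets.token_bytes(32)


def answer(copy_secret: bytes, challenge: bytes) -> bytes:
    """What a genuine copy would compute when asked a question."""
    return hmac.new(copy_secret, challenge, hashlib.sha256).digest()


class Verifier:
    """Client side: asks one question per secret and enforces a time limit."""

    def __init__(self, copy_secret: bytes, max_seconds: float = 1.0) -> None:
        self._secret = copy_secret
        self._max_seconds = max_seconds
        self._used = False

    def check(self, respond) -> bool:
        # A secret is single-use: a second attempt is always rejected.
        if self._used:
            return False
        self._used = True

        challenge = secrets.token_bytes(32)
        started = time.monotonic()
        response = respond(challenge)  # in practice this would be a network round trip
        elapsed = time.monotonic() - started

        expected = answer(self._secret, challenge)
        # Constant-time comparison, plus the deadline from the comment above.
        return elapsed <= self._max_seconds and hmac.compare_digest(response, expected)
```

Only after `check` returns `True` would the client hand over the access key. A server operator who can read the secret out of the uploaded copy can still answer correctly, so this only raises the bar.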

Have you considered sandstorm.io?


I realize this doesn't meet your requirement re hosting it yourself -- but since I don't know the full background, perhaps that doesn't matter to you.

You can host Sandstorm yourself. The most recent relevant Dev Group post that I see on the subject of self-hosting is at https://groups.google.com/forum/#!topic/sandstorm-dev/ff3oZG....

I was trying to figure something like this out myself a while ago. The best solution I could come up with was a transparent Platform as a Service provider that had an option for the person(s) with access to allow anyone who wants to inspect the slug and application settings (minus any secret tokens) to do so. Essentially, it would be verifiable for any users of the application that the DNS resolves to the servers of the PaaS provider, and then from there they could look up the code deployed to the application, the Procfile, etc.

This would still require placing trust in a centralized entity though, and allow for administrators to manipulate or dump db info without the users knowing.

Possibly pertinent-- Quark : A Web Browser with a Formally Verified Kernel. The page says, "the specification implies several security properties..."


I've always wondered this myself. The claim of "we're open source so you can trust us" has always seemed a bit off to me for the exact reasons you give - sure, you publish this codebase, but who says that's what you're actually running on your servers?

And from the answers so far, it's disheartening to hear that there doesn't seem to be any great way to guarantee fair play. But I suppose it generalizes - we don't have any way to guarantee that the people we interact with on a daily basis have our best interests at heart, and we do a lot of trusting just because it makes life easier.

The only way I can think of is giving trusted auditors periodic access to the server and having them make fingerprints of the code/data for others to compare. I think this is what some bitcoin exchanges do, for example.

It would be possible for Amazon, Azure, or Google Compute Engine to provide a service like this.

"We certify that the code running on X machine came directly from Source Code Y at Version Z of Branch W."

I think it's a great idea.

There's no point in trying to verify that your app is running the server-side code it claims to be running. That still doesn't prevent you from logging into your own server as root and taking everyone's data. If you have people's data unencrypted on your server, they are ultimately trusting you with it, not your app. Part of that is trusting your app not to have a security hole that leaks data to third parties, and verification could help with that. But it won't help with securing user data against a malicious sysadmin.

Heck, I think even enforcing local trusted computing effectively is challenge enough; doing it remotely with attestation sounds like quite a feat.

Does anyone know what the state of executable signing for Linux is these days? I found some unmaintained DigSig project, and some noise about SecureBoot-related patches from a couple of years back. And that would be just a start; I haven't heard of anything that would allow enforcing code signing for "dynamic" code (like JS or Python).

You could convince me that your server has a checked-out copy of a given body of code by a) giving me push access to a single 'throwaway' file in your repository and b) generating a fingerprint of the codebase, including that file, and serving it. You can't then just return a hardcoded fingerprint, but this doesn't guarantee that you aren't running other things in addition to that code.
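A minimal sketch of that throwaway-file idea, with assumptions stated up front: the repository is modelled as an in-memory mapping of path to bytes, the throwaway file is called `CHALLENGE`, and `fingerprint` and `with_challenge` are hypothetical helper names. The user pushes a fresh random value, computes the expected fingerprint from their own checkout, and compares it with what the server reports.

```python
import hashlib
import secrets


def fingerprint(files: dict[str, bytes]) -> str:
    """Hash every path and its contents in a fixed order."""
    digest = hashlib.sha256()
    for name in sorted(files):
        for part in (name.encode(), files[name]):
            # Length-prefix each part so different splits cannot collide.
            digest.update(len(part).to_bytes(8, "big"))
            digest.update(part)
    return digest.hexdigest()


def with_challenge(files: dict[str, bytes], nonce: str,
                   challenge_path: str = "CHALLENGE") -> dict[str, bytes]:
    """Return a copy of the tree with the user's nonce in the throwaway file."""
    updated = dict(files)
    updated[challenge_path] = nonce.encode()
    return updated


# The user's local checkout of the published code (contents are made up).
public_repo = {"server.js": b"console.log('hello');", "package.json": b"{}"}

# The user pushes a fresh nonce and works out what an honest server must report.
nonce = secrets.token_hex(16)
expected = fingerprint(with_challenge(public_repo, nonce))

# An honest server pulls the nonce and hashes its own checkout.
reported = fingerprint(with_challenge(public_repo, nonce))
print(reported == expected)
```

A fingerprint computed before the nonce was pushed will not match, which is what rules out the hardcoded answer. It still says nothing about what else is running next to that checkout.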

How do you know that the pull the server made & hashed is actually the running code?

Again, this requires a root of trust on the server, otherwise, anything returned from the server including any information you would need to verify code the server is running could be spoofed.

Have you discovered https://unhosted.org? For the server-side, you could let users provide their own backend (https://unhosted.org/practice/31/Allowing-the-user-to-choose...).

Maybe the best way to ensure you don't do anything sneaky with the data is not having the data in the first place (at least unencrypted). Do you know the Mylar project [1]? They are doing pretty interesting stuff with zero-trust (even compromised) servers.

[1] https://css.csail.mit.edu/mylar/

You can't make it 100% watertight. You can't even prove to yourself on your own system that there isn't anything going on behind the scenes, unless you are on very restricted hardware with an open-source BIOS and no system management processor, and have verified all the PCI cards, USB devices, etc.

Hi. I've been meaning to build something very similar to this (with some differences that'd also make it enterprise friendly), but I've been looking around to find a team.

If you -- or anyone in this thread -- want to build it together or to discuss ideas, please get in touch.

Client code could be hard to verify too. Browsers do not support this functionality, and there are no popular add-ons to verify JavaScript that I'm aware of. You can't just download script.js with curl and assume that the server will serve the same file to your browser.

Do it like AAA games that disallow modifications. They ship with a text file that lists every file and its hash (crc32, md5, sha1, etc.). The game executable checks the hash of the text file and then checks the hashes of the files listed in it.
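For illustration, that manifest check could look roughly like this. It is a sketch, not how any particular game does it: SHA-256 stands in for the hashes mentioned above, the manifest is one `hash  relative/path` line per file, and `build_manifest` and `verify` are hypothetical names. The expected manifest hash is assumed to be pinned somewhere the checker trusts.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large assets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> str:
    """List every file under root with its hash, in a stable order."""
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        lines.append(f"{sha256_file(path)}  {path.relative_to(root).as_posix()}")
    return "\n".join(lines) + "\n"


def verify(root: Path, manifest: str, expected_manifest_hash: str) -> list[str]:
    """Return a list of problems; an empty list means everything listed matches.

    Files that exist on disk but are not listed in the manifest are not reported.
    """
    # Step 1: check the manifest itself against the pinned hash.
    if hashlib.sha256(manifest.encode()).hexdigest() != expected_manifest_hash:
        return ["manifest hash mismatch"]

    # Step 2: check every file the manifest lists.
    problems = []
    for line in manifest.splitlines():
        expected, rel = line.split("  ", 1)
        target = root / rel
        if not target.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_file(target) != expected:
            problems.append(f"modified: {rel}")
    return problems
```

This works when the checker runs on a machine the verifying party controls. On someone else's server, the checker is just more code the operator can replace.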

The software runs on your server so you could spoof the checker.

Why not a conventional desktop application? Those still exist and work quite nicely. I work on one every day. What benefit does a user receive from trusting code running on your machine?

This would allow users to verify whether a SaaS provider is running unmodified software. Provided, of course, that they open source their whole stack.

Just make a snapshot on terminal.com and then the state of your web application can be distributed at an instant in time. You can verify it's exactly what you say it is since the snapshot is bound to your user name.

You can make a snapshot for free right now if you want to. This should solve your problem.

Let me know if you have any questions, but using terminal's snapshot feature you can distribute your web app at a known state.

Edit: it's like git versioning for machine state.

While it seems cool, this doesn't solve anything. At best it just shifts trust onto Terminal. And really, there's nothing stopping a malicious VM owner from "fixing" things up to a good-looking state, showing off that snapshot, then reverting and continuing.

A solid example to keep in mind is MtGox. How can we run something and know that no invalid trades are added, no fake password resets are processed, etc.?

This is what Ethereum is for, but it's much lower level than 'web applications'.

I don't know the answer but if you figure it out you'll be a trillionaire.

Let's start a startup for certifying that the code running at some service is the same as in the public repo.

A small bitcoin fee would be charged to the privacy nuts willing to confirm it.
