nwf's comments | Hacker News

That's only mostly true; Big CHERI (that is, the 64-bit CHERI systems, not CHERIoT) specifically has support for running legacy binaries within capability confinement. It's true that we think recompiling is generally the better approach, but we can sandbox pre-CHERI libraries, for example, at library-scale granularity.


FWIW...

Morello boards are hard to come by, but there have been efforts to offer cloud-computing style use of them, especially now that bhyve support exists; if you're interested I can try to find out more (I'd offer you time on my cloud-computing Morello cluster from MSR, but it's offline for silly reasons). The "Big CHERI" RISC-V FPGA boards are indeed quite expensive, but CHERIoT-Ibex runs on the Arty A7 or the purpose-built Sonata board, and those are much more reasonable. (I'd still love to see it brought up on cheaper boards, too...)


Morello is definitely what I'd been eyeing for a while. AFAICT those are real systems (somewhat like HiFive Unleashed) and not just embedded chips (although they are embedded chips too). I'm kind of bored of microcomputers (Raspberry Pi et al.).


> it doesn't matter too much whether the emulator is CHERI or not since Rust itself lets me express memory safety in the type system

You might be interested in a very timely blog post: https://cheriot.org/cheri/myths/2024/08/28/cheri-myths-safe-...


> CHERI doesn’t guarantee that your code is free from memory-safety errors, it guarantees that any memory-safety bugs will trap and not affect confidentiality or integrity of your program.

That sounds an awful lot like ensuring your code is free from memory-safety errors. A language which always traps on erroneous memory accesses is a memory safe language, so if CHERI really guarantees what that sentence says, then C on CHERI hardware is memory safe.
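As an analogy (in Python, not CHERI itself — the point is what "trapping" means, not the mechanism): a language that turns an out-of-bounds access into a well-defined error gives the program no way to silently read or corrupt adjacent memory.

```python
# Analogy: in a trapping language, an out-of-bounds access becomes a
# well-defined error instead of silently touching adjacent memory.
def read_index(buf, i):
    try:
        return buf[i]
    except IndexError:
        # The "trap": the bad access is stopped before it can observe
        # or modify anything outside the buffer's bounds.
        return None

buf = [10, 20, 30]
print(read_index(buf, 1))   # in-bounds: 20
print(read_index(buf, 99))  # out-of-bounds: trapped, None
```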


C on CHERI hardware is not magically memory-safe. CHERI just traps on memory-unsafety.


If it traps on everything that would otherwise be memory-unsafe, then it is memory safe. If trapping doesn't count as memory safe, then e.g. Rust isn't memory safe, since it traps on OOB accesses to arrays.


CHERI capabilities are memory-safe, because they trap on any attempted memory unsafety. Safe Rust is memory-safe, assuming all the underlying Unsafe Rust traps on any attempted memory unsafety.

C is not memory-safe, even on CHERI, because it has to be trapped by CHERI; it cannot catch itself.

Safe Rust is memory-safe on its own, because memory unsafety can only be introduced by Unsafe Rust; Safe Rust has no unsafe operations. Assuming the Unsafe Rust is sound, Safe Rust cannot cause memory safety to be violated on its own. (You can do `/proc/self/mem` tricks on Linux, but that's a platform thing...)


I'm not sure if we are talking past each other or what.

1. Non-unsafe Rust is memory-safe because otherwise-unsafe operations (e.g. out-of-bounds accesses on arrays) are guaranteed to trap.

2. A typical C implementation on typical non-CHERI hardware is not safe because various invalid memory operations (e.g. out-of-bounds, use after free) may fail to trap.

3. A typical C implementation on CHERI hardware guarantees that all otherwise memory-unsafe operations trap.

I think we both agree on #1 and #2. Am I wrong about #3? If I'm not wrong about #3, then what makes you say that #3 is not memory-safe?


In #3, the C implementation is not what's memory-safe, that's what I've been trying to say. CHERI is memory-safe, but the C isn't what actually guarantees the memory safety. You can dereference a null pointer on CHERI. The architecture can trap it, but that doesn't change the fact that C actually attempted to do it and therefore would not be memory-safe. Only the system as a whole prevents the memory unsafety.


Aha. I think we agree. I originally said "C on CHERI hardware is memory safe" by which I meant "the system as a whole of (C code + CHERI hardware)" is memory safe, but you seemed to think I meant (C code) is memory safe.


I actually had thought you meant something like "the C language is memory safe when run on CHERI". If you mean that running a C program on CHERI prevents memory safety from actually being broken, then yeah, I guess we do agree. On CHERI, even if the C program alone isn't completely sound, and so attempts to violate memory safety, the attempt won't actually succeed, so memory safety won't actually be violated.


I'm not talking about writing robust software, I'm talking about having fun - Rust's type system already provides the type of fun that I would have gotten from a CHERI emulator. That's why only getting to own physical CHERI hardware would truly pique my interest.

That article indeed is quite timely. I do agree with it. Slightly different angle though.


> I wonder to what extent moving bounds checks into hardware provides the potential for efficient memory safety.

It's great! The CHERI team at U. Cambridge has recently released their initial performance characterization of Morello, Arm's experimental ARMv8 w/ CHERI: https://ctsrd-cheri.github.io/morello-early-performance-resu... . The major take-away there is a little buried, but is:

> The above 1.8% to 3.0% is our current best estimate of the geometric mean overhead that would be incurred for a future optimized design

That seems to be well within people's tolerance for security features, especially as we think having CHERI would also allow us to turn off, and so stop paying for, some existing mitigations.

While there's a wealth of stuff to read about CHERI (https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri...), if you're new to it and want something more presentation flavored than text, you might enjoy my talk from HOPE 2022: https://www.youtube.com/watch?v=dH7QUdXeVrI


You don't sign the whole image as a stream, and you don't sign every block. Recursion is your friend! You sign the Merkle tree root, check it once, and then check O(log n) hashes per block access. You can, of course, amortize the checking of the first several of those hashes as a further optimization that ties in easily with your caching layer.
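To make the shape of that concrete, here's a toy sketch in Python using only hashlib. All the names are illustrative, and real systems domain-separate leaf vs. interior hashes (this skips that): you sign `merkle_root` once, and each block is then checked against it with only O(log n) sibling hashes.

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Build the tree bottom-up; the root is the one thing you sign."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:              # duplicate last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the O(log n) sibling hashes needed to verify one leaf."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1                 # sibling is the paired node
        proof.append((level[sib], sib < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_block(block, proof, root):
    """Check one block against the signed root using only its proof."""
    node = h(block)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

blocks = [b"block-%d" % i for i in range(8)]
root = merkle_root(blocks)              # sign this once, out of band
proof = merkle_proof(blocks, 5)
assert verify_block(blocks[5], proof, root)
assert not verify_block(b"tampered", proof, root)
```

The amortization mentioned above corresponds to caching the interior hashes near the root, since adjacent blocks share most of their proof path.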

There's no such thing as "the block doesn't decrypt" absent MACs/MICs or AEAD schemes -- encryption and decryption are just maps from N bytes to N bytes.
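A toy demonstration of that point, using an XOR keystream as a stand-in stream cipher (not real crypto): flipping a ciphertext bit produces no error at all, just a silently corrupted plaintext, which is exactly why integrity needs a MAC or AEAD on top.

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Toy counter-mode keystream; illustration only, not real crypto.
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same N-byte-to-N-byte map.
    ks = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

ct = xor_crypt(b"key", b"pay Bob $100")
tampered = ct[:8] + bytes([ct[8] ^ 1]) + ct[9:]   # flip one bit
# No error is raised; the plaintext just comes out silently wrong:
print(xor_crypt(b"key", tampered))  # b'pay Bob %100'
```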


You may be interested in reading, if you haven't, our 2011 position paper on Dyna. (The 2012 paper on what I think you would call propagator networks was fun, but is nothing you don't know already, I'm sure.)

http://dyna.org/wiki/index.php/Publications


Indeed. Our old discussions about "omega-continuous semiring homomorphisms" as the way to try to make something half-way between Dyna and the datalog bits I was working on have been very much present in my mind lately. =)


Agitate browser implementers to add support for magnet: URIs and you can do exactly that without needing to add new attributes to HTML. :)

There's a bug for firefox here: https://bugzilla.mozilla.org/show_bug.cgi?id=528148


Would certainly be nice to avoid tons of different href_sha256, href_sha512, href_sha3-512 etc. attributes.

But the browser supporting identifying files via hashes is vastly different from it fetching files via BitTorrent.

I think it makes more sense to pursue each separately. Even though magnet URI support will deprecate any existing implementation of hash support.


First off, let me say that I'm always happy to see people thinking about the robustness of scientific data. It's a thing we do not do well at all at present, and it should be treated as much more urgent, given its importance to the enterprise. However, this work has a number of small problems and mostly seems like rehashing (no pun intended) well-trodden ground.

Like so many similar works, this fails to cite the magnet: URI scheme (see, for starters, http://en.wikipedia.org/wiki/Magnet_URI_scheme) of which trusty URLs and the cited niURI scheme both seem to be small subsets. Introduced in 2002, these already defined a way of stably identifying an immutable object and providing one (or more!) suggestions for retrieval, which the present paper calls "authorities" but are likely better viewed as caches; one cache may be authoritative, but that's optional. The "modules" defined are probably better encoded as MIME types (and could be integrated into a magnet URI as "x.mime=.../..." attributes; the draft standard does not have a field for MIME type, sadly), rather than introducing yet another namespace for describing document types.
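For anyone unfamiliar with the shape of these, here's a hedged sketch of constructing such a URI in Python. The `urn:sha1`/Base32 "exact topic" form is the classic one; the filename, cache URL, and `x.mime` attribute here are made up for illustration:

```python
import base64, hashlib, urllib.parse

def magnet_for(content: bytes, name: str, caches):
    # "xt" (exact topic): Base32-encoded SHA-1 of the content, the
    # stable immutable identifier.
    digest = base64.b32encode(hashlib.sha1(content).digest()).decode()
    params = [("xt", "urn:sha1:" + digest), ("dn", name)]
    # "as" (acceptable source): zero or more retrieval suggestions --
    # the caches; any of them may serve the bytes, none is required.
    params += [("as", url) for url in caches]
    # Hypothetical experimental attribute carrying a MIME type:
    params.append(("x.mime", "application/trig"))
    return "magnet:?" + urllib.parse.urlencode(
        params, safe=":/", quote_via=urllib.parse.quote)

print(magnet_for(b"example data", "data.trig",
                 ["http://cache.example.org/data.trig"]))
```

The retrieved bytes are then verified against the `xt` digest, so it does not matter which cache answered.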

Speaking of caches, the paper's assertion that "any artifact that is available on the web for a sufficiently long time will remain available forever" is extremely worrying; the search engines of the Internet (other than Internet Archive, perhaps) are not altruistic entities out to serve your data forever. Their caches cannot and must not be depended upon by the scientific community; we must host our own data or pay for its archival, as much as that may be painful. There Ain't No Such Thing As A Free Lunch.

The trick for deriving self-reference is analogous to how IP packets carry their own checksum; it's an old trick, dating back to at least RFC 791 (section 3.1, heading Header Checksum; earlier RFCs do not seem to include one) but almost surely earlier, and probably merits a citation of something. The use of the same technique for Skolemization is cute, providing a nice workaround for RDF's poor handling of existentials.
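The mechanics of that trick, sketched in Python (the `hash=` field format is hypothetical; the point is that both sides hash the document with the self-referential field replaced by a fixed placeholder, which is closer to the trusty-URI approach than to IP's algebraically invertible checksum):

```python
import hashlib

PLACEHOLDER = "0" * 64  # stand-in occupying the hash's own position

def embed_self_hash(template: str) -> str:
    """Hash the document with the hash field zeroed out, then write
    the result into that field (IP-checksum-style self-reference)."""
    assert PLACEHOLDER in template
    digest = hashlib.sha256(template.encode()).hexdigest()
    return template.replace(PLACEHOLDER, digest)

def check_self_hash(doc: str) -> bool:
    """Verifier: swap the placeholder back in and recompute."""
    # Hypothetical fixed field format: "hash=<64 hex chars>".
    start = doc.index("hash=") + len("hash=")
    claimed = doc[start:start + 64]
    zeroed = doc.replace(claimed, PLACEHOLDER)
    return hashlib.sha256(zeroed.encode()).hexdigest() == claimed

doc = embed_self_hash("important data\nhash=" + PLACEHOLDER + "\n")
assert check_self_hash(doc)
assert not check_self_hash(doc.replace("important", "tampered"))
```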

The performance numbers are worrying; streaming a search-and-replace pass (to transform out self-references) followed by a SHA256 verification through 177GB of data should not take 29 hours, especially given that the data is already sorted. CheckSortedRdf and CheckLargeRdf both exhibit linear time in figure 3, suggesting that the data being verified is already sorted (which would be consistent with earlier assertions that the existing implementation only generates sorted files); a better comparison would be to show CheckLargeRdf on randomized inputs, as all we see now is the overhead of a pre-processing pass that is, essentially, just verifying the sortedness of input.
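A single-pass version of that pipeline — transform out self-references while feeding SHA-256, no second scan and no full buffering — can be sketched as follows. This assumes the replacement is length-preserving and the reference string does not overlap itself; the carried tail handles references straddling chunk boundaries:

```python
import hashlib, io

def streaming_verify(stream, self_ref: bytes, placeholder: bytes,
                     expected_hex: str, chunk_size=1 << 20) -> bool:
    """One pass: replace self-references with the placeholder while
    hashing; assumes len(placeholder) == len(self_ref)."""
    h = hashlib.sha256()
    tail = b""  # carry-over so refs split across chunks still match
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            h.update(tail.replace(self_ref, placeholder))
            break
        buf = (tail + chunk).replace(self_ref, placeholder)
        # Hold back len(self_ref)-1 bytes in case a ref straddles
        # the boundary into the next chunk.
        tail = buf[-(len(self_ref) - 1):]
        h.update(buf[:-(len(self_ref) - 1)])
    return h.hexdigest() == expected_hex

# Demo on made-up data with a made-up self-reference token:
data = b"abc SELFREF123 def" * 50
ref, ph = b"SELFREF123", b"0000000000"
expected = hashlib.sha256(data.replace(ref, ph)).hexdigest()
assert streaming_verify(io.BytesIO(data), ref, ph, expected, chunk_size=7)
```

A pass like this runs at essentially disk/SHA-256 speed, which is why 29 hours for 177GB seems high.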


(disclaimer: I am an author of the paper)

Thanks for your comments. First off: yes, most (perhaps all) of the applied methods are not novel, some of them have been around for a long time. We only claim novelty on how these existing methods are combined to solve the problem of data availability and integrity on the web.

Yes, the magnet URI scheme is highly related, and we probably should have referred to it in one way or another. However, there are crucial features that magnet links do not provide (as far as I know): you cannot generate a hash that represents content at a more abstract level than byte sequences (MIME types by themselves don't solve that problem), and you cannot have self-references. All of the features from our list of requirements are supported by some approaches, but (to our knowledge) no approach supports all of them at the same time.

In terms of search engines caching research data, I agree! We shouldn't trust existing providers too much but build a dedicated decentralized infrastructure for scientific purposes (this is what I am working on now).

I am sure the performance measures can be improved (incremental cryptography might allow us to get rid of sorting altogether). The shape of the curve is, however, not much affected by whether the statements are already sorted (they are not sorted for TransformRdf and TransformLargeRdf!).

I hope this clarifies some things.


Thanks for your response; it does clarify things.

But, I don't think I understand your concern about abstract hashing and how it would need to be something fundamentally new. Both the order normalization and self-reference are simply preprocessing stages on your data, albeit slightly different forms. The sortedness requirement, I think, is captured by MIME type parameters (the "charset=" in "text/html;charset=UTF-8"), as it does not change the fact that the document is an RDF graph. For the placeholder trick, I think you're right and that you'd want something like a "text/rdf+selfref" MIME type to indicate that it is not in fact valid RDF until preprocessing has been performed. All told, your RDF module would be described in MIME as something like "text/rdf+selfref;sorted=".


Right, I guess you could define everything into a new MIME type, but I think that would be quite a weird thing to do and wouldn't really be faithful to the idea of MIME types. This MIME type would stand for a type that nobody would be directly using for files; it would only stand for some internal intermediate representation (I will not be able to convince people using RDF to switch to my new strange format instead of TriG or N-Quads!). And that means that there would be two MIME types involved for a single file: the actual type (such as application/rdf+xml or application/trig) and then the type for normalization and hash calculation (something like "text/rdf+selfref;sorted="). I think this shows that MIME types are not a straightforward solution to the given problem, and I think this justifies introducing this new level and a new scheme for the trusty URI modules (e.g. "RA").


Unfortunately, the technology you need to do that is not yet finished: https://trac.torproject.org/projects/tor/ticket/9498

ETA: I should have said "one possible technology"; there may be others, but I am pretty sure that an IP-less Tor node requires that you play the Tor Bridge game and stream the Tor protocol over a non-IP link. I have had a prototype of this design running, but have yet to get it to a point I consider robust.


praptak didn't say IP-less, but no public IP. If the server has an IP in the 192.168.1.X range, that doesn't tell an attacker much, supposedly.


This looks like it's unrelated.


As far as I can tell, openvpn with TLS authentication is vulnerable as it just uses the usual TLS suite. If you use PSKs or the (mis-named?) --tls-auth PSK additional MAC, then you are only owned if one of your own legitimate nodes revealed the PSK (or was coopted into performing this attack) in which case you're already owned.

