NeoPG – an opiniated fork of GnuPG 2 (neopg.io)
112 points by sphinxc0re on Jan 17, 2018 | 43 comments

The switch to C++ is incredibly poorly justified. I don't care how much safer you think it is (and it really isn't), the language is way more complicated and error-prone, and rewriting the code is ALWAYS going to introduce many, many more bugs than would be found by simply maintaining the code. This guy also switched on -fpermissive as part of the process, which disables warnings that are there for a reason! He has also done a lot of lovely things like committing half-assed changes to master (the build breaks more often than it passes!) which have clearly undergone no review.

Anyone this person convinces to use and contribute to his fork is going to take time and money (yes, money - the guy put up a Patreon campaign and a BountySource page) away from GnuPG, which sorely needs it, as we all hopefully remember. Please do not use this, and I hope the authors give it up.

Hi, I'm "this guy". Thanks for pointing out -fpermissive, which was needed during the conversion for the legacy code. Since then I fixed all the warnings, so I can remove the flag (https://github.com/das-labor/neopg/commit/5359aab7885628554c...).

I would love to do code review, but I need a second developer for that!

Thank you for your interest and taking the time to write down your criticism.

> C++ is the better C

Oh no it isn't. It does have more features, and for some fields, sure it is better (user interfaces, simulations).

However, this is security software we are talking about. C++ introduces more complexity as the spec is an order of magnitude bigger than, say, C99. More complex language spec, more complex tooling, lots of places for bugs to hide. It can help with classic buffer overflows and the like, sure. But it introduces its own set of issues.

>Many of these issues are well-known or can be easily researched. They have been documented many times, and can be avoided.

Right. The issues can be avoided. You just need to not make mistakes, right?

Also, this is worrying:

> Converting the legacy code base (490,000 lines of code) to C++ was a straightforward, mechanical task that took only a couple of hours

So we are good? The fact that it compiles (and maybe even works) doesn't automatically validate it as a secure rewrite (and it is a rewrite, no matter how similar the common language subset is). The stakes are just much higher for something like GPG.

I do wish this project good luck, though.

> Converting the legacy code base (490,000 lines of code) to C++ was a straightforward, mechanical task that took only a couple of hours

That sounds like they haven't actually converted it to C++ so much as compiled it as C++.

That means they're probably still using raw owning pointers and no RAII, which means you get none of the safety benefits of C++ (which do exist, so long as you use modern C++ practices like smart pointers, and containers instead of C-style arrays).

I'm glad that folks are starting to take an interest in improving GnuPG (and PGP in general). I've been working on an SKS replacement[0] in Rust, and I've had to dive into the OpenPGP message format, pgpdump, GnuPG, etc., and it's definitely a bit of a mess.

0: https://github.com/srct/sks-rs/

Hmmm, I don't think converting a 500k-line C codebase to C++ has a meaningful cost/benefit ratio, especially for security-critical code like this. It might be justified in something like Rust, but at that point I think it's better to move on from GPG entirely. The OpenPGP standard itself is showing its age (e.g. no perfect forward secrecy) and is extremely complicated. I think GPG should be left as-is, and the community should gradually move to something simpler, like OpenBSD's signify[0].

[0] https://www.openbsd.org/papers/bsdcan-signify.html

Writing a competent, simple, yet fully PGP-compatible replacement for GnuPG is one of the things I plan to do in the long run, so I think NeoPG is a good idea, except for the fact that they decided to use C++. I know that type safety is one of the considerations, but OOP always results in ambiguous, unnecessarily complex code, and for me that defeats the purpose of simplicity. Nevertheless, the GnuPG codebase is a mess.

They explicitly address the "Why C++?" question here: https://neopg.io/blog/cplusplus/

Choice quote:

There are many programming languages, and I believe in picking the right tool for the job. In the case of NeoPG, the priorities were:

- Support for strong cryptography.

- Compatibility with C application developers.

- Convert legacy code quickly.

- Tool support for QA.

Everything else is, at this point, a secondary concern. The Sequoia Project uses Rust, and I envy that. But the first thing they had to do was to wrap an existing C crypto library (they chose libnettle), because there is no high-quality crypto library for Rust yet. That is their challenge.

My challenge will be to stay focussed on the parts of C++ that are actually helpful, and not get bogged down by the rest.

Is using a C library or crypto library that hard in Rust? That's the first I've heard of it being a problem. In high-assurance, it was standard to limit unsafety to the stuff behind the interfaces of unsafe modules (incl. FFI) in an otherwise memory-safe systems language. We didn't see this as a detriment, especially if it was C with its wide support.

> wrap an existing C crypto library

I know it's mostly semantics, but calling C from C++ still requires some wrapping (extern "C", integrating build systems, etc.).

I've found with bindgen, unless there's some crazy macro shenanigans going on, it's actually quicker for me to integrate C libraries into Rust than wrangling CMake/Make/etc. The C FFI is very much a first-class citizen in Rust.

Even doubly so if we're talking about a cross-platform library.

NeoPG uses Botan, which is a crypto library written in C++. I recommend reading its source code and comparing it with, for example, libgcrypt.

Is most of the removed code functionality that is now provided by the crypto library being used, Botan?

A blog post on the decision to use an external library might be good. Is there a reason to think that the library will be better maintained than code with similar functionality in GPG?

Great work by the way! Thank you!

C++ has a long and successful history of application development, regardless of project size (from 0 to >100 MLOC; from 1 guy sitting in his attic to thousands of engineers). Much like Java it isn't "hip". Instead people like to use "hip" languages with an underdeveloped ecosystem. Typical example: analysis tools, e.g. code analysers, performance and profiling tools.

Comments like "couldn't they have used $niche-lang.org instead of [C++/Java/...]" bore me, to be honest.

The fact that C++ is so widely used reflects the fact that it’s both old and acceptably designed. It does not even remotely indicate that it’s universally a good tool choice.

I think C++ is not a great choice for complex crypto software because it doesn’t have very good safety-by-construction properties and it has to deal with (extremely) untrusted and probably malicious input. Parsers written in C or C++ are historically one of the biggest attack vectors out there.

Okay, I'll bite. What language would be "a great choice" for complex cryptography applications, in your opinion?

Ada with the SPARK Ada variant and careful refcounting would've been my baseline, given it's all safe by default, before Rust came along. D also had potential. Now Rust with SPARK. I'm keeping SPARK in for semi-automated verification. Just rewriting Skein's C code in SPARK automatically found an error thanks to the prover. If you can afford specialists, then something like F* (used in the miTLS verified stack), Isabelle/HOL, the Jasmin language, and/or imperative ML. All core, trusted functionality done that way.

I choose languages with tight control of memory for crypto-related stuff, since you want to prevent leaks. You'll follow with a covert-channel analysis to be sure. If it's not crypto and you're just stopping code injection, then a memory-safe language with interface checks and input validation will cover most problems. Those have been around since one was deployed in the first business mainframe, the Burroughs B5000.

What do you mean by imperative ML? Using a language from the ML family, but mostly its imperative features? Can you expand on that?

Yeah, I should have. There's been prior work porting linear types and other stuff (think Rust safety) like that to ML, since that's an easy-to-analyze language academics like to work in. Tolmach et al. actually made a converter from it to Ada at one point. ML is also the language that most formal verification extracts to by default. CakeML is a verified compiler for ML. There are flow-analysis and concurrent variants of it. So, ML by itself is a good choice for correct code that will get a lot of analysis. Imperative ML (imperative for the needed efficiency gains), combined with type systems like Rust's to ensure its safety, with numbers mapped to, say, 64-bit integers, can go a long way.

When I wrote it, though, I was mainly thinking of recent work that converts functional, verified specs into imperative specs that extract to ML. I think that has a lot of potential for producing a pile of verified data structures at low cost, similarly to the COGENT language that was used for the ext2 filesystem. Got a link for Imperative/HOL-to-ML below.


I didn't expect anything interesting to come up as replies, but here we are - thanks for those insights.

Depends on the application. I would personally use Rust for embedded applications (since it’s low-overhead and ADTs are basically a requirement for safe parsers) and Haskell for non-embedded. But if neither of those float your boat, there are other safe-by-construction parser-friendly languages available, mostly from the ML family.

At this point Rust is the clear winner, as it's providing higher safety for similar run-time speed.

Comments like these bore me, to be honest.

We all know C++ is a valid design decision. We all know more hip languages have underdeveloped ecosystems. Yet opinions on which language to choose are not devoid of value, because they indicate interest, spark conversations, and may initiate development of the tools needed to make those ecosystems less 'underdeveloped'.

The only thing we don't need is people saying another person's opinion is boring.

EDIT: Also, they said they were disappointed they used C++. Maybe they wish they had stuck with C?

What $LANG would you use if you had the time to dedicate to the project and were rewriting it starting today? Why would you choose $LANG?

(Not an attack -- a genuine question because I'm interested in your choice and reasoning.)

I’m not faulting you at all, but... different people know different tools, place premiums on different features, and so reach different conclusions as to what they wish to use. It's just an opinion, and comparing opinions is such a sterile exercise.

Whatever value you assign to $LANG, you will always encounter a denizen of Hacker News who disparages $LANG and would've used $LANG-prime; but by not actually moving first to start a project that achieves that aim with the tool of their choice, they have kind of ceded their right to criticise.

That’s my point of view, at least: don't criticise the artist's choice of tools. It's rude.

Using C++ by no means implies using OOP.

C++ has an excellent C FFI :-)

Should the NeoPG project mention why GPGME wasn't a suitable compromise?

I understand the author wants to clean up the core and the API, but if there's already a project that cleans up the API isn't that good enough? Especially considering the standard it implements-- PGP-- is still quite complicated.

Not exactly related to NeoPG, but GPGME in general: Doesn't GPGME assume that the user already has GnuPG 2.x installed and running on their system?

This might just be a misconception I have, but I always thought you couldn't make a batteries-included, self-contained, works-out-of-the-box app if you used GPGME. You'd first have to tell the user to go get a GnuPG implementation from somewhere.

I'd be super happy to be wrong on this.

Yes, I should document that. GPGME only exposes a high-level API, and application developers often want more control. For example, you can't inspect key material before importing it, but importing a key is not a reversible operation - so applications sometimes use a temporary HOMEDIR for GPGME/GnuPG to import the key there and inspect it with a keylisting. It can be very cumbersome.

> Especially considering the standard it implements-- PGP-- is still quite complicated.

A standard without multiple independent implementations is a bit unhealthy.

Glad someone is working on this!

One of the reasons we built PGP signing capabilities into Krypton [0] is to make it easy for anyone to sign their Git commits/tags. Almost nobody uses this awesome feature of Git. We even ended up implementing parts of the OpenPGP spec in Swift [1].

0: https://krypt.co/docs/start/code-signing.html

1: https://github.com/kryptco/swift-pgp


It may be opinionated, but it's not a well-thought-out fork.

How is it possible to take GPL 3.0 code, say that it is "more restrictive", fail to mention what would happen if it were not, and then require that any additional code in the project be licensed under a BSD-style license?

The author has some serious misunderstandings about licensing issues.

I did see this during day 4 of 34C3's Lightning Talks. Found the link with time index here: https://media.ccc.de/v/34c3-9258-lightning_talks_day_4#t=284...

Starts about 47:25 in if the time-index doesn't work for you.

Sounds great. I always wondered why GPG was so complex.

GPG is not that complex, but the PGP concepts, key structures, and workflows - those are really hard. And this project has to implement all those hard elements too.

GPG has a terrible user interface, and it's really hard to script too. It takes a complicated workflow and makes it unbearable.

+1, a cleanup of the GPG CLI would do wonders.

Just reorganizing existing commands into subcommands like 'gpg encrypt' instead of 'gpg --encrypt' :)

As for key management, keyrings and trust, I think a proper UI with decent explanations would help a lot too. This isn't rocket science, but it's too hard to guess what various abbreviations mean.

OpenPGP is kind of like git - if you understand the underlying concepts (in the case of OpenPGP, this is RFC 4880) then it is simple. GPG is very old and was created in different times, so it has many quirks at the implementation and UI level, but I find the encoding and structures quite simple (there are some weird choices, of course).

I wonder what do you think is complex? Trust model maybe? Different kinds of signatures? Negotiating algorithms to use?

> I wonder what do you think is complex?

For me it culminates in the way gpg feels really opinionated about key management (with keyrings). Way too often (in relative terms) I end up creating a temp dir, setting GNUPGHOME, then setting some permissions to quiet gpg down, then importing the keys, then actually doing the thing I wanted, and finally cleaning up[1]. I have no doubt the keyring design works wonderfully for gpg's author(s), but for a tool that should really be more generic than that, it feels less than ideal.

gpg's monolithic design is probably the fundamental problem here: in addition to making it unnecessarily cumbersome to use in some cases, it also makes the tool more difficult to learn piecewise (imo).

[1] For one example, see my comment here: https://github.com/keybase/keybase-issues/issues/2230#issuec... That operation should basically be "curl ...|gpg-key --to-ssh", but instead it exploded into a 10-line bash script, complete with parsing gpg output with grep/awk.

Note that, per the blurb at hand, NeoPG is intentionally even more monolithic.

I think that while I probably understand the underlying concepts better (cryptography-wise), there isn't much _practical_ information about GPG in the wild.

Just as much as knowing how git works under the hood doesn't necessarily make you great at managing branches in git. More often than not it's way easier to just go with a sort of ready-made recipe, so that your workflow would be easily accepted by the industry; rather than read the doc.

I think, aside from the cryptography, there are just trust calculations and encoding.

These two articles describe trust in detail:



GPG esoteric options is also a good read:


Besides that... the RFC itself I suppose:


Don't confuse arcane and complex. Although PGP is a 90s crypto standard, so it's more than likely a nightmare to implement.

Nice! I like the ideas from the description on the front page.

