Starting a tech startup with C++ (medium.com)
157 points by n3mes1s on Jan 2, 2016 | 101 comments

I would like to see one other aspect discussed though: quickness to market.

At my job, we are rewriting some pretty terrible VB5 code. It's easy to just WTF a lot and complain that there are truly insane things inside it. The flip side is that this code was built quickly and quickly brought in a revenue stream.

The author mentioned that thrown-away Python code provides no value, but I'm not convinced of that statement. The development time could have been significantly shortened, allowing for a revenue stream while the site was rebuilt in C++. While this may seem like wasted development, it could have created incoming money at three months rather than eight months, helping keep the lights on. Further, there are conceivably problems that would come up during development and impact their design, which they would have to figure out. Solving these problems is independent of the implementation language, and using a dynamic but slower language could help them nail down theoretical issues in algorithms, API, and infrastructure before coding the C++ system.

A counterpoint, though, is that there is often no budget for a rewrite. I wonder whether they would end up constrained by the prototype becoming long-lasting production code, as in our case.

Additionally, Python plays well with C, so perhaps some of the performance killing portions could have been replaced with C.

Not only that. A lot of the modules in the Python standard library are written in C; some may be thin wrappers over the corresponding C library. So when you call that code in your app, it runs at (nearly) the speed of C. Same could apply to other languages like Ruby too, that follow that model.

> Manual memory management is the most popular misconception of C++. Since C++11, it is now recommended to use std::shared_ptr or std::unique_ptr for automatic memory management. There is a small computational cost to maintaining referenced pointers but it’s minuscule and the safety outweighs this cost.

I think another misconception is that you need pointers at all. Smart pointers are still pointers. It's better to use value semantics all the way when you can. And you usually can. Why place something on the heap if it can live on the stack?

Well said! Dynamic memory allocation is a really big performance hit, and in a lot of cases you can pre-allocate data structures on the stack for serious speed improvements. Avoiding DMA was one of the secrets for writing blazing fast real time systems stuff when I was in telecom. I don't think those benefits have gone away.

You and I can't possibly have the same definition of DMA. It is essential to low latency data communication both in the embedded and server domain.

I think he was using a new acronym for Dynamic Memory Allocation rather than Direct Memory Access. I also was confused the first time I read it.

"Dynamic memory allocation", not "direct memory access".

Ah never seen dynamic memory allocation written as DMA before. Maybe obvious from the context but DMA really only has one meaning for me :)

I thought he meant the hardware-related meaning of DMA too, for a second. But the first line of his comment is:

>Well said! Dynamic memory allocation is a really big performance hit

You mean C++ programmers allocate an object on the heap even when its lifetime is the same as the scope of the containing block?

Unless it's such a huge object that a stack overflow is a possibility, I can't imagine why one would do that.

Many programmers have this idea. Oh, this is a big object. I better wrap it in a smart pointer to avoid doing expensive copies when I pass it around. They do it as an optimisation. However because of named return value optimisation and move constructors no expensive copies are done when passing stack values to and from functions. Using smart pointers is actually slower (because of the indirection and reference counting).
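To make the point above concrete, here is a minimal sketch (not from the article; Big, make_big, pass_through and buffer_survives_round_trip are made-up names for illustration). It uses a std::vector as the stand-in "big object" and checks that the heap buffer it owns is the same one before and after being passed into and out of functions by value. Note that the caller has to hand the object over with std::move when it is a named variable; passing a named variable by value without it would copy.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for "a big object": a vector that owns a large buffer.
using Big = std::vector<int>;

// Returning a local by value: the compiler may elide the copy (NRVO),
// and otherwise falls back to the move constructor, not the copy constructor.
Big make_big(std::size_t n) {
    Big result(n, 42);
    return result;
}

// Taking and returning by value: a by-value parameter is moved on return.
Big pass_through(Big value) {
    return value;
}

// Returns true when the buffer allocated in make_big is the very same
// buffer that comes out of pass_through, i.e. no element was copied.
bool buffer_survives_round_trip(std::size_t n) {
    Big big = make_big(n);
    const int* before = big.data();
    Big after = pass_through(std::move(big));  // caller gives up ownership
    assert(after.size() == n);
    return after.data() == before;
}
```

Calling buffer_survives_round_trip(1000000) should return true: only the vector's small internal pointers change hands, and the million ints stay where they were allocated.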

I don't think C++ supports variable length arrays?

So, if you want to allocate N objects, where N is not known at compile time, (I think) you have to use heap allocation anyway. Normally I just use vector<>, and call reserve() if I feel like extra performance-y. Sure, it's much slower than C99-style variable length arrays, but 99% of the time it's still fast enough.

More importantly, it plays nice with other C++ functionalities. For example, zero-copy construction using emplace_back() inside a for loop. (And if you throw an exception in the middle you're guaranteed that destructors are called exactly for those that are already constructed.)
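A rough sketch of the reserve() plus emplace_back() pattern described above. The Account type and make_accounts function are invented for the example; only std::vector, reserve and emplace_back come from the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type, made up for this example.
struct Account {
    std::string owner;
    int id;
    Account(std::string owner_, int id_) : owner(std::move(owner_)), id(id_) {}
};

// Builds n accounts where n is only known at run time.
std::vector<Account> make_accounts(std::size_t n) {
    std::vector<Account> accounts;
    accounts.reserve(n);  // one heap allocation up front, no regrowth in the loop
    for (std::size_t i = 0; i < n; ++i) {
        // emplace_back forwards the arguments to Account's constructor,
        // so each element is constructed directly inside the vector's storage.
        accounts.emplace_back("user" + std::to_string(i), static_cast<int>(i));
    }
    assert(accounts.size() == n);
    return accounts;  // returned by value; moved or elided, not copied
}
```

If a constructor throws partway through the loop, the vector's destructor destroys only the elements that were already constructed, which is the guarantee mentioned above.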

std::array exists, but it isn't variable length. We might get a dynarray in the next standard. I saw a standards committee panel talk where someone asked about it, and Ville Voutilainen promised to write a standards proposal. I don't miss it much. Usually std::array does what I want.

My question was whether it's the norm for C++ programmers to use the heap when the stack would have sufficed. Parent seemed to be hinting that they do.

And yeah, it's disappointing that C99 arrays did not make it into C++. One can use alloca to live a little dangerously, but it's better to wrap it so that it allocates on the heap only if more than a certain size has been asked for. Too bad there is no portable way to find out how much stack space the current process still has. There should have been a system call for that.
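One portable way to sketch the "heap only above a certain size" idea is shown below. To be clear about the swap: this does not use alloca at all. It uses a fixed-capacity std::array member for small requests and falls back to a heap allocation for large ones. ScratchBuffer and its members are hypothetical names, and the inline storage only lives on the stack when the ScratchBuffer object itself is a local variable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical helper: room for up to StackCapacity elements inline,
// heap allocation only when more than that is requested.
template <typename T, std::size_t StackCapacity>
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t n) : size_(n) {
        if (n > StackCapacity) {
            heap_ = std::make_unique<T[]>(n);  // value-initialized elements
        }
    }

    T* data() { return heap_ ? heap_.get() : inline_.data(); }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_ != nullptr; }

    T& operator[](std::size_t i) {
        assert(i < size_);
        return data()[i];
    }

private:
    std::array<T, StackCapacity> inline_{};  // part of the object itself
    std::unique_ptr<T[]> heap_;              // empty unless n > StackCapacity
    std::size_t size_;
};
```

For example, ScratchBuffer<int, 16> small(8) never touches the heap, while ScratchBuffer<int, 16> large(1000) allocates once. The trade-off is that the inline capacity is paid for even when the heap path is taken.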

> I don't think C++ supports variable length arrays?

VLAs are not part of the standard but GCC supports them. Clang however does not, at least not without a compiler flag I think.

You can still use alloca in C++

Yes, and I have at times, but it's living a little dangerously. Even if it returns a non-null pointer, it does not mean you can write there safely.

No one who cares about performance enough to be using C++ would do something like this. Allocating on the heap when not necessary is a sign of a dangerous amateur.

Pointers (smart or raw or whatever) get used when you need reference semantics. It's true that value semantics are usually better but not always (eg inheritance).
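The inheritance case can be illustrated with a small sketch. Shape, Circle and the helper functions are made-up names; the point is only that passing a derived object by value to a base-class parameter slices it, while a reference or a smart pointer keeps the dynamic type.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical class hierarchy, for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// By value: only the Shape part of the argument is copied ("slicing"),
// so the Circle override is lost.
std::string name_by_value(Shape s) { return s.name(); }

// By reference: the original object is used, so virtual dispatch works.
std::string name_by_reference(const Shape& s) { return s.name(); }

// Owning pointer to the base class: also keeps the dynamic type.
std::string name_via_owner() {
    std::unique_ptr<Shape> owner = std::make_unique<Circle>();
    assert(owner != nullptr);
    return owner->name();
}
```

With a Circle c, name_by_value(c) yields "shape", while name_by_reference(c) and name_via_owner() both yield "circle".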

To be sure, though, being able to make proper value types is a strength of C++. Also we should point out most non-trivial value types make extensive (and necessary!) use of the heap under the hood (eg string, vector)

And then, you have Small String Optimization that puts the string data back on the stack ;)

It's true that stack allocation is often better, and should be preferred, but that doesn't mean the heap isn't useful.

Because copying the entire in-memory database you're using on every function call is kind of expensive?

Most often you can pass stack values to and from functions without copying. This is because of move constructors and named return value optimization. Often it's slower to use pointers (because of the indirection).

For instance this does not do any expensive copies:

    std::string getStr() {
        auto huge_str = getDatabaseDump();
        return huge_str;
    }

    void readStr(const std::string& str) {
        // read str..
    }

    int main() {
        auto str = getStr();
        readStr(str);
    }

But std::string is a wrapper for a heap allocation, and a reference also incurs the same indirection cost as a pointer?

Yes, that's true. Internally a lot of library types use heap allocation. Most well-designed libraries avoid it if they can, but sometimes the object is simply too large for the stack. For many library types it's an undocumented implementation detail we don't know about.

We as users of that library type should not additionally put that object on the heap, though, if we don't have to. It's simply unnecessary.

It makes for one extra unnecessary pointer indirection, and it adds unnecessary reference counting if we use smart pointers, or unnecessary manual memory management if we use raw pointers. It complicates the interface for our functions if we have to wrap types in smart pointers.

As for C++ references: yes, that's true. As I understand it, they are implemented as pointers internally by compilers. But in important ways they behave more like values than pointers. If you copy or assign to them, they copy or assign the value (not the pointer). If you take their address, they give the address of the value (not the pointer). Because of this there are not as many pitfalls to using them as there are with pointers. I see them mostly just as aliases for values.
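A tiny sketch of the alias behaviour described above (the function names are invented for the example): assigning through a reference changes the referred-to value rather than rebinding it, and taking its address gives the address of the original object.

```cpp
#include <cassert>
#include <string>

// Modifies the caller's string through a reference parameter.
void append_suffix(std::string& text) { text += "!"; }

// Returns true if the reference behaved like an alias throughout.
bool reference_is_alias() {
    std::string original = "hello";
    std::string& alias = original;

    alias = "goodbye";            // assigns the value; does not rebind the reference
    assert(&alias == &original);  // address-of yields the original object's address

    append_suffix(alias);         // the callee edits `original` in place
    return original == "goodbye!";
}
```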

In the example above I could have passed the string by value (instead of by reference) to the function. It would still not have done an expensive copy, because of the copy elision optimisation. I probably should have done that. It looks a bit funny, and takes some getting used to.

Relevant: https://web.archive.org/web/20140205194657/http://cpp-next.c...

The effects you describe are mainly caused by copy elision (https://en.m.wikipedia.org/wiki/Copy_elision). While move semantics certainly improve performance, moves are still expensive in many cases, and it's very hard to make assumptions about their performance without detailed measurements.

The problem with copy elision is that it's an optional compiler optimization and you have to check if the compiler applies it to your particular piece of code. That doesn't scale for large code bases (nor for small ones imho).
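One way to do that kind of check, sketched here with made-up names (Tracked, make_named, make_temporary, moves_left_after_return), is to count constructor calls. Returning a named local is only a candidate for elision; if the compiler skips it, the move constructor is used instead of the copy constructor. Returning a temporary directly became guaranteed elision in C++17, which is newer than this thread.

```cpp
#include <cassert>

// Hypothetical instrumented type that counts its own copies and moves.
struct Tracked {
    static int copies;
    static int moves;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Named local: elision (NRVO) is permitted here but not required.
Tracked make_named() {
    Tracked t;
    return t;
}

// Temporary returned directly: elision is guaranteed from C++17 on.
Tracked make_temporary() {
    return Tracked{};
}

// Runs the experiment and reports how many moves the compiler left in.
int moves_left_after_return() {
    Tracked::copies = 0;
    Tracked::moves = 0;
    Tracked named = make_named();
    (void)named;
    assert(Tracked::copies == 0);  // a copy is never needed on this path
    return Tracked::moves;         // 0 when NRVO was applied, nonzero otherwise
}
```

The move count depends on the compiler, the language standard and flags such as GCC's -fno-elide-constructors, which is exactly why it has to be measured rather than assumed.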

That isn't about value semantics, what you are describing is a combination of copy elision and ownership.

C++ has perfect move semantics since C++11, and you can always use a reference to pass things around if you need.

I wonder why the choice of language is framed as being between C++ and Ruby/Python? Yes, C++ is known for being fast while suffering from a shortage of easy libraries and being more verbose to write and intellectually challenging to write in, while the mainstream dynamic languages are known for being not so fast but having a vast array of handy, easy to integrate libraries and being fast to develop in. But I think there's a number of choices between the two that seem worth considering.

C#/.NET doesn't get a lot of love from the startup community, but it's mature, stable, fast, full of advanced features, and has good library support.

Java and other JVM languages have similar advantages and better compatibility with Unix-based OSes. Java itself is a little long in the tooth, but there is the option of alternate JVM languages.

I'd probably be most tempted to check out Go if I was working on something that absolutely had to wring the best possible performance out of my hardware. I don't honestly know that much about it, but it has a reputation for getting you most of the performance of C++ without the complexity.

But I will say that if the author has lots of experience in C++ and comfort with the ecosystem, and not much in any of those other languages, then by all means go with C++. Getting your product out there is more important than getting the perfect language.

> I'd probably be most tempted to check out Go if I was working on something that absolutely had to wring the best possible performance out of my hardware. I don't honestly know that much about it, but it has a reputation for getting you most of the performance of C++ without the complexity.

Go isn't billed as a language that is designed to do that. It's faster than Ruby and Python, for sure, but it made many runtime performance sacrifices in the name of ease of use and compilation speed.

The GC has been drastically improved recently and is going to get another improvement in the next release - Go 1.6. Either way, if Python was considered, Go isn't less fit to be considered.

> The GC has been drastically improved recently and going to get another improvement in the next release - Go 1.6.

I wasn't just referring to the GC.

It's sad that you need to justify using one of the most established and longest serving languages/platforms ever.

I don't think it is sad. You explained why it is necessary, C++ has been around for a while, is/was great but there are now languages that can compete in terms of performance with fewer drawbacks in terms of security and undefined behavior.

If you're starting from scratch, it is important to strongly consider which language and platform best suits your needs. Anecdotally, a lot of companies have found success using newer languages because they tend to attract a better proportion of talented people.

> there are now languages that can compete in terms of performance

Such as?

If Rust code is slower than equivalent C++, it's a bug. We track performance bugs, please file them :)

I like the quip but that's barely an answer given OP avoided Rust specifically. There's a difference between a high-performing language in development (and debugging) vs a high-performing language with over a decade of work into its tooling and support. The latter's strengths and weaknesses are known through and through with many implementation and library problems likely worked out long ago.

So, if it was apples to apples, I'd probably cite Ada, Eiffel, Modula-3, or Component Pascal as the closest to C++ in terms of performance and key features while being safer and more readable. Each has had years of work, support from commercial sector (except Modula-3 now), and programs tend to work more than break after a compile.

That said, your Rust work is exciting and I hope it gets in that same category. It's just so new and evolving that it's not in C++'s class in terms of risk or predictability. Not yet.

Absolutely. I was responding more to this conversation than the author; they specifically said "new" languages :)

Your points are all legit. Unfortunately, the only way to get an old language is to start with a new one, and then let time pass.

Last sentence is quotable :)

I agree haha.


Indeed, the given strawman alternative languages (Ruby and Python) are far, far worse for productivity. Python changes are totally unreviewable since one cannot in any way reason about the correctness of a function call without reading the definition of the function itself, and all the functions to which it passes the arguments, all the way down, which of course takes forever. At least with C++ you can reason that if a change compiles it has not made any dramatically stupid type errors. You still need to think about whether any unnecessary temporaries or copies were made, or questionable assignments, but you need only think about that one level deep. It seems to me far more dangerous to suggest writing new code in Python.

> At least with C++ you can reason that if a change compiles it has not made any dramatically stupid type errors.

And with Python you can reason that if code parses it isn't going to have undefined behavior (for example, segfaults randomly thousands of lines away from the actual problem due to heap corruption, or exploitable use-after-free vulnerabilities).

Type safety only has the benefits you describe if the language is actually type safe.

Ummm, unit tests? The compiler offers a false sense of security. Type safety is not correctness.

Not to dredge up this debate for the umpteenth time, but the same can be said of unit tests: they are not a proof of correctness, and they often give you a false sense of security.

The Internet has, indeed, conducted this argument thousands of times. My point here is only about the difficulty of reviewing Python code. I find myself sometimes having to hold up a piece of paper to the screen to see if the indentation has been done right, in addition to the aforementioned function-body-reading chores. Of course my opinion is of no concern to people except those who work with me, from whom I refuse to review Python code. It's just not worth my time.

My point is that there are tools and techniques that will solve these problems in ways that are more valuable than the compiler's type system.

If you seriously "refuse to review Python code" because you feel you need to visually check indentation, I think you might want to take a step back and evaluate your methods.

Type declarations in python have another name: unit tests.

I don't know why this is controversial. In Julia, type annotations on expressions are basically asserts.

Indeed it is. It wouldn't be this way if most hiring companies hired for talent and not buzzwords/trends. As it stands, half of developers don't want to touch a language that won't help them get another job. The other half suck so they have to pad their resumes with the new hotness to even have a chance at getting a job.

I'm curious as to how this plays out at scale. If you can reduce the number of machines needed by a factor of 10, then apart from the hardware/power savings, there are staffing savings too. Managing 1000 machines is different from managing 10,000.

> There is a small computational cost to maintaining referenced pointers but it’s minuscule and the safety outweighs this cost.

Atomic reference counting does not have minuscule costs. See, for example, http://www.hboehm.info/gc/nonmoving/html/slide_11.html

In particular, note that if you use shared_ptr everywhere you will end up with much slower code than you would have if you had a good GC. In other words, you end up slower than Java, with less safety and more verbosity.

Browser engines have used custom non-thread-safe reference counted pointers where possible extensively for this reason.

Furthermore, I always have to say it: shared_ptr is not memory safe.
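A minimal sketch of where the reference-count traffic comes from (count_when_copied and count_when_borrowed are invented names). Copying a shared_ptr bumps the shared count, and in multithreaded programs that increment and the matching decrement are typically atomic operations. Passing by const reference leaves the count alone.

```cpp
#include <cassert>
#include <memory>

// Pass by value: copying the shared_ptr increments the reference count
// on entry and decrements it again when the parameter is destroyed.
long count_when_copied(std::shared_ptr<int> p) {
    assert(p != nullptr);
    return p.use_count();
}

// Pass by const reference: no copy is made, so the count is untouched.
long count_when_borrowed(const std::shared_ptr<int>& p) {
    assert(p != nullptr);
    return p.use_count();
}
```

Starting from a single owner created with std::make_shared<int>(7), count_when_borrowed reports 1 and count_when_copied reports 2, and the count is back to 1 after the call returns. This only shows where the bookkeeping happens; how much it costs in practice is something to measure.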

I like these "You CAN do X with Y!" types of posts - especially when it's not the same drum being banged over and over.

I think there's a really big part of this case study that isn't mentioned (perhaps because there is currently no data): bringing on more engineers. It really seems as if the number of C++ developers ready to be a part of the startup ecosystem is small. That is the impression that sites like HN leave me with at least.

For databases, C/C++ is still the only reasonable choice. For OLAP cubes in particular, you want to keep as much data in memory as possible as efficiently as possible, and drop down to SSE intrinsics where needed. C++ lets you do just that. Frontend can be in any language you like.

There are several other languages that have "ability to keep as much data in memory as possible as efficiently as possible" and "drop down to SSE intrinsics" as features. C and C++ by no means have a monopoly here.

How battle tested are those languages? How many outstanding bugs do they have in their compilers and libraries? How good is tooling support? How easy is it to find competent programmers? How well do they interoperate with existing high performance libraries? That's just a small sample of issues you need to consider when starting a serious project. I maintain that it's hard to compete with C/C++ on all of those criteria.

Moving the goalposts to "every single advantage C++ might possibly possess" isn't very interesting. The idea that using a new language can only be justified if the language surpasses C++ in every conceivable way is very 1995, pre-PL Renaissance thinking.

In reality, you can only choose programming languages by balancing all the factors in question, and C++ has plenty of disadvantages for databases: it's unsafe, it's difficult to learn, compile times are poor, there's no package manager, etc.

Says someone who has quite obviously never worked on database engines. Package manager? LOL. What language are you programming in, Ruby?

pcwalton works on a web browser engine, which I'd say is at least as "complicated" as a database engine. He's already stated the benefits Cargo brings to Servo's development (as compared to working on, say, Gecko) elsewhere. What makes you think a package manager wouldn't be useful for a database engine which may want to, just as an example, provide a web server for queries or administration? [1][2]

(Honestly, I'd rather call Cargo a "project manager" , because it does far more than manage packages -- it builds your project and its dependencies, too. Having a completely standard way to build applications and libraries is an enormous win over C++. The discovery and management of a project's packaged dependencies is of secondary importance to me.)

[1]: http://docs.basho.com/riak/latest/dev/references/http/ [2]: https://www.rethinkdb.com/docs/administration-tools/

Compared to DBs browser code bases are lax and buggy beyond belief. Want to take a second and a gig of RAM opening a page? Go right ahead. Browser crashes? We're "sorry", wait for next release 2 weeks from now. No multitenancy. No having to deal with data of arbitrary size. No transactions. No writing anything complicated to disk. Etc, etc. You're comparing things that really require completely different levels of rigor, sophistication and engineering skill. Last I checked, BTW, all currently shipping browsers are written in C++.

Browsers certainly do have to deal with multitenancy (although we don't call it by that name), data of arbitrary size, transactions, and complicated on-disk data structures.

In any case, you now seem to be primarily interested in arguing that database programmers are smarter than all other programmers, which, needless to say, is about the most uninteresting conversation we could possibly be having.

Not to mention the time saved in testing due to the compiler catching many errors that only unit tests could catch in Python.

I was glad when Python added type annotations, but the Python community is aggressive about preferring dynamic typing. Don't think the chances are that great to use them in a real project.

Perl 6's gradual type system seems interesting and one plus of Go is that it brought some well-deserved attention to concise static typing.

Sounds very inspiring.

Any other good places to start learning about C++ 11/14 for complete beginners with JS and other programming languages?

I'm very much interested in creating a proof of concept for high-performance SaaS services and playing with C++, as it might be useful for building a Node.js extension later.

Programming: Principles and Practice Using C++[1] should be appropriate for beginners. It's a long book at 1000+ pages, but it is designed for novices and explains subjects such as why one needs functions, how to handle errors, GUIs, testing, etc.

The C++ Programming Language (4th ed)[2] is for experienced programmers. A Tour of C++ is the short version of the former.[3]

[1]: http://www.stroustrup.com/programming.html

[2]: http://www.stroustrup.com/4th.html

[3]: http://www.stroustrup.com/Tour.html

Ohh... thnx! Nothing changed since I graduated in 2001 ... I recollect Stroustrup since then. It's still C++ bible.

Many things have changed in the language, and you should look at the books which focus on C++11/14, not the old stuff.

It isn't the same edition, the language has changed a lot ;)

The [1] book is exceptionally good for beginners. He completely rewrote it with C++11 and refocused beginners on some key changes to the language to make it much easier.

Stack Overflow's Definitive C++ Book Collection: http://stackoverflow.com/questions/388242/the-definitive-c-b...

I think the best book for complete beginners is "Jumping into C++" by Alex Allain. From there, move on to "C++ Primer Plus" by Stephen Prata.

I'm more curious as to the decision not to stick with Rust. Sounds like there was a prototype. I'm guessing there were some libraries missing.

Would be cool to know more about that.

Rust's ecosystem is super young, and up until May '15 the language had breaking changes occurring pretty frequently (daily? I wasn't using it back then so I'm not sure). I've been really enjoying using it, but I imagine that this startup began work before Rust hit 1.0, or very nearly afterward.

Sometimes more than once per day :)

Crates.io has 3700 crates in stock. That said, there are still gaps, like with any young ecosystem. It's really progressing nicely though. And given Rust's domain, there are packages for really interesting stuff, like OS dev...

Ditto. I'm also curious if D ever came up. My impression is that it was (until recently) much more stable than rust.

Rust has been remarkably stable since the 1.0 release.

Sure, I was referring to pre-1.0 rust ;)

Every word has its own place, every place has its own word

Starting a startup with language X depends on how that language is used in industry. Starting a hardware startup with Python/Ruby is strange, but for a web startup it's totally fine. Now think about creating a web site in C++ with an average C++ developer.

There is also the cost of talent. If you have an expert C++ developer, then writing a web site in C++ can probably make sense, because you have talent who can manage and fix every bug/feature. Finding professional Python/Ruby web site developers is easier than finding C++ web site developers.

Ah, one of our core systems is in C++, though it is for video processing and needs to work with multiple platforms. I suppose nobody has questioned the C++ choice because it's a more "typical" use case.

Wondering how Swift will play in this arena now that it is open sourced. And performance? According to this (http://www.primatelabs.com/blog/2014/12/swift-performance), for some workloads it does approach C++.

What's the state of Swift for web development? Are web frameworks starting to appear?

There's perfect.org that came out in Nov 2015.

I think you hit the nail on the head early in the post. C++ is what you are most comfortable with. I believe that is the number one reason, at first at least (prototype, MVP stage), for deciding which language/stack to use. Use what you know best as the tech lead; everything else will fall into place.

I don't think that

   auto start = std::chrono::system_clock::now();
   // benchmark something here
   auto end = std::chrono::system_clock::now();
is concise. I think that

   start, end = benchmark do
   #  benchmark something here
is concise.
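For what it's worth, the block-style call can be approximated in C++ with a small helper that takes a lambda. This is only a sketch: benchmark is a made-up name, it returns the elapsed time rather than a start/end pair, and it swaps in std::chrono::steady_clock (monotonic) instead of the system_clock used in the snippet above.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed wall time
// in milliseconds as a floating-point duration.
template <typename Work>
std::chrono::duration<double, std::milli> benchmark(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto end = std::chrono::steady_clock::now();
    assert(end >= start);  // steady_clock never goes backwards
    return end - start;
}
```

A call site then reads roughly like the block version: auto elapsed = benchmark([&] { /* benchmark something here */ }); followed by elapsed.count() to get milliseconds.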

https://www.techempower.com/benchmarks/ also shows the fastest HTTP servers are written in C++.

No one doubts the performance of C++. What I would really like to see is arguments about speed of development, ease of testing, etc. This article argues a lot about language features or libraries, but what about the bigger picture? It is not enough for the HTTP server to be fast. It also needs to do a REST API, auth, etc. How does that compare to, for example, Node.js?

One C++ advantage is, at 40X the horsepower, they didn't have to deal with scaling configuration right away. They can put that off, and deal with proving their product instead.

The argument is that you do not need a load balancer because the performance of one host is good enough. But that host would become a single point of failure. And that's why AWS is so awesome. Start with an autoscaling group when you create your EC2 instance; you don't really need a lot of configuration.

Aren't a lot of startups using dynamic languages because those are more popular? We use JavaScript everywhere since that makes it a lot easier to hire new people.


And then they show how the type can now be inferred with "auto":

    auto start = std::chrono::system_clock::now();

I guess you totally missed the comment "//here's the verbose version" as well as the entire context for that code snippet?

The main problem I imagine is as the platform/codebase matures, there may be compatibility issues as the compiler changes, as the underlying hardware changes, etc. This is something that is well hidden by a language like Python. I remember having to support a cross-database and cross-platform C/C++ server back in the day, and the maintenance was a nightmare. Having the luxury of virtualizing your hardware makes this step a lot easier, that's for sure.

I would find it hard to believe that C++ can be that much faster for many websites, since the bottleneck would be the I/O subsystem. I don't understand how they are getting this 40x speedup. I wonder what the benchmark looks like.

It's not that a single request is 40x faster (it may be no faster), it's that 40x requests can be handled in the same amount of time. The limitation is then the processing of the connections both in userspace and in the kernel, and the cost of all the system calls.

This is easy enough to test with ab (apache bench), siege, or jmeter, by just cranking up the number of concurrent requests. I've seen 100x differences in implementations of web servers before, 40x for python vs c++ is pretty believable.

In an optimized system the bottleneck will be I/O. However, elements like serialization, many function calls, and extraneous memory copies chip away at performance to the point that it matters, and it's difficult to address these problems directly in languages other than C/C++.

Today you can get 15MB/s to S3. A not uncommon MySQL node might do 10MB/s. Good luck trying to push that throughput with Python or Ruby on a single process through an HTTP response. Much faster I/O exists with sequential HD, SSD, and RAM based I/O. Most networks are built with 1Gbit Ethernet allowing 100MB/s, which won't get saturated in the slower languages without going parallel, and that can increase total system complexity and lines of code.

It's more like: with 2GB of RAM and a 2.0GHz CPU, how many requests can you handle? It's not surprising that a language like C++ would win.

Funny you say that, given I'm dusting off an old laptop that has a 1.x GHz Core Duo CPU and 2GB of RAM. It's the next one to be used for performance and stress testing. The reason? Anything that can't run snappy on that box is just bloated or wasting resources. Lean, native code tends to do the job nicely. :)

Note: Also good to do it on cheap throwaways, because this sort of thing burns out the CPUs. Better that box than my main one.

I get down voted for posing a logical question?

I don't see any question.
