C++: The most important complexities (sandordargo.com)
55 points by jandeboevrie on Nov 16, 2023 | 60 comments



Yes... But you don't need to memorize all this, and it leaves out a simple rule: vectors (or arrays) outperform everything else if the dataset is small. Small is usually on the order of 100-300 elements, but can vary wildly.

Also note that all <algorithm>s have built-in fully automatic parallelism via <execution>, a massively underused feature. In typical C++ fashion though, their newer views:: counterparts lack those overloads for the moment.


Never heard of `execution`, great tip. Is there some big caveat or just generally unknown?


You need to be aware that these invocations are going to blast your cores full throttle as you obviously don't have fine grained control. But as long as your data is easily parallelized on a vector with computations that don't depend on each other, it's a game changer. I use it all the time to multithread things with literally a single line of code.
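
For illustration, a minimal sketch of that one-liner (C++17; whether it actually runs in parallel depends on your standard library and, for GCC/Clang, on linking TBB as discussed below):

    #include <algorithm>
    #include <cmath>
    #include <execution>
    #include <vector>

    int main() {
        std::vector<double> v(10'000'000, 1.0);

        // Same call as the serial version, plus one policy argument.
        // std::execution::par lets the implementation spread the work across
        // threads; par_unseq additionally allows vectorization within a thread.
        std::for_each(std::execution::par_unseq, v.begin(), v.end(),
                      [](double& x) { x = std::sqrt(x) * 2.0; });
    }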


I looked into it (my video, probably too long https://youtu.be/9oh66SF91LA?si=azDCSOAJKA9Gpzim), and the general result was that they make sense for non-small datasets and are a solid way to parallelize something without having to pull in OpenMP or something.


They're only supported in MSVC and GCC (for the latter you need to link against Intel's TBB to make it work). Support in libc++ (Clang) is work in progress.


Clang does support parallel stl already (requires either TBB or OpenMP). Our project https://github.com/elalish/manifold made use of this to speed up mesh processing algorithms a lot.


Does it? You mean if you link against libstdc++ instead of libc++?


I remember it works for libc++ (partially, see https://libcxx.llvm.org/Status/PSTL.html), but forgot when I linked against libc++ last time...


Incomplete support from Clang’s STL (especially in Apple Clang).


Yes, but actually small can often be much larger than 100-300 depending on the specifics. Programmers often vastly underestimate how fast cache and prefetching can be compared to complicated data structures.


... Or much smaller. I remember a benchmark for a case I had a few years ago: the cutoff I had for a map being faster than std::vector/array & linear probing was closer to N=10 than 100.

(Not std::map, at the time it must have been something like tsl::hopscotch_map).

Note also that nowadays Boost, for instance, comes with state-of-the-art flat_map and unordered_flat_map containers, which give you both the cache coherency for small sizes and the algorithmic characteristics of the various kinds of maps.
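
A minimal sketch of those Boost containers (header paths assume a recent Boost; unordered_flat_map requires Boost 1.81 or newer):

    #include <boost/container/flat_map.hpp>
    #include <boost/unordered/unordered_flat_map.hpp>
    #include <string>

    int main() {
        // Sorted, contiguous storage: O(log n) lookup, great cache locality.
        boost::container::flat_map<int, std::string> fm;
        fm.emplace(1, "one");
        fm.emplace(2, "two");

        // Open-addressing hash table with contiguous storage.
        boost::unordered_flat_map<int, std::string> ufm;
        ufm.emplace(1, "one");

        return fm.count(2) + ufm.count(1);
    }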


Exactly. I measured vector vs an efficient linear probing hash map very recently and the cutoff was single digit. Even against unordered_map or plain std::map the cutoff was surprisingly low (although in this case I would trust a synthetic benchmark significantly less).


Sure, but the point here is that with all the context-specific details, you're most likely comparing apples to oranges. So simple complexity analysis, or a general rule without benchmarks and a good understanding of the system and how it interacts with your details, is not going to solve your problem.


So you're saying that if I had to store 100 elements in memory, I would be better off using a hash map instead of a vector/array? What type of elements did you use in your experiment, how large were they, and what was your access pattern?


A successful search in a vector will do, on average, 50 comparisons, while the hash map version would hash the key, look up the bucket, typically find a single item in that bucket (with only 100 items in the hash table, hash collisions will be highly unlikely), and do a single comparison.

For an unsuccessful search, the vector version would do 100 key comparisons, and the hash table would do a single hash, look up the bucket, and almost certainly find it empty.

So, if you make the comparison function relatively expensive, I can see the hash map being faster at search.

Even relatively short string keys might be sufficient here, if the string data isn't stored inline. Then, the key comparisons are likely to cause more cache misses than accessing a single bucket.

Of course, the moment you start iterating over all items often, the picture will change.
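
A rough sketch of the kind of measurement being discussed (a toy benchmark with made-up key names; real numbers depend heavily on key type, hash quality and access pattern):

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <string>
    #include <unordered_map>
    #include <vector>

    int main() {
        const int n = 100;
        std::vector<std::pair<std::string, int>> vec;
        std::unordered_map<std::string, int> map;
        std::vector<std::string> queries;
        for (int i = 0; i < n; ++i) {
            std::string key = "key_" + std::to_string(i);
            vec.emplace_back(key, i);
            map.emplace(key, i);
            queries.push_back(key);
        }

        auto bench = [&](const char* name, auto&& lookup) {
            auto t0 = std::chrono::steady_clock::now();
            long long sum = 0;
            for (int rep = 0; rep < 100000; ++rep)
                for (const auto& q : queries) sum += lookup(q);
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                          std::chrono::steady_clock::now() - t0).count();
            std::printf("%s: %lld us (checksum %lld)\n", name, (long long)us, sum);
        };

        // Linear scan: on average ~n/2 key comparisons per successful lookup.
        bench("vector scan", [&](const std::string& k) {
            return std::find_if(vec.begin(), vec.end(),
                                [&](const auto& p) { return p.first == k; })->second;
        });
        // Hash map: one hash, then (usually) a single key comparison.
        bench("hash map   ", [&](const std::string& k) { return map.at(k); });
    }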


Searching the vector is literally incrementing a pointer over the data. The number of instructions needed to do the search is very small - e.g. ~15. This means that it can very easily fit into the CPU uOp cache and also makes it a candidate for the LSD cache. Both of those will be a major factor in hiding the latencies, or getting rid of them, in the CPU frontend fetch-decode pipeline, effectively leaving all the for-loop iterations bound only by the CPU backend, or more specifically, branch mispredictions (aka ROB flushing) and memory latencies.

Given the predictable access pattern of vectors and their contiguous layout in memory, the CPU backend will be able to take advantage of those facts and hide the memory latency (even within the L1+L2+L3 cache) by pre-fetching the data on consecutive cache lines just as you go through the loop. Accessing data that resides in the L1 cache is ~4 cycles.

The "non-branchiness" of such code will make it predictable and as such will make it a good use of BTB buffers. Predictability will prevent the CPU from having to flush the ROB and hence flushing the whole pipeline and starting all over again. The cost of this is one of the largest there are within the CPU and it is ~15 cycles.

OTOH searching an open-addressing hashmap is a super-set of that - almost as if you're searching over an array of vectors. The search code alone is: (1) larger by several factors, (2) much more branchy, (3) less predictable and (4) less cache-friendly.

Algorithmically speaking, yes, what you're saying makes sense, but I think the whole picture can only be drawn once the hardware details are also taken into account. The vector approach will literally be bound only by the number of cycles it takes to fetch the data from the L1 cache, and I don't see that happening for a hash map.


No, benchmark it for your particular type, and decide based on that.


I think you're missing my point. I'm highly suspicious, or let's say intrigued, about under what conditions one can come to such a conclusion. That's why I asked for clarification.


> vectors (or arrays) outperform everything else if the dataset is small. Small is usually in the order of 100-300, but can vary wildly.

This is a very poor way to choose a data structure. How many items you want to store is not what someone should be thinking about.

How you are going to access it is what is important. Looping through it - vector. Random access - hash map. These two data structures are what people need 90% of the time.

If you are putting data on the heap it is already because you don't know how many items you want to store.


Isn't that because a view performs the operation lazily, only when the data is accessed? It's mainly meant so that you don't have to load all this memory in or stall when accessing the view, and if you don't need that capability, the regular <algorithm> STL is better.


IIRC, on MSVC, parallel execution delegates to the OS (Windows) to decide how many threads to create, and it is usually more than the total number of vCPUs, contrary to the usual recommendation.


It's very much implementation-defined, yes. I'm currently using this for something that runs for about 10 seconds, and even music playback and mouse cursor movement get affected.

(But I'm about to move it to GPU)


Not quite. Many other data structures can be shoehorned into contiguous, cache-efficient representations.


I rarely meet a CS major who will accept that small things will always stay small. They will frequently talk you into more complex data structures with the assurance that you are just too stupid to realize your problem will suddenly need to scale several orders of magnitude. Do they teach you in CS-101 to interpret '+' as the exponential operator? It often feels that way.


Are they fresh graduates? It is very important to understand the workload distribution for any optimization. Even if small things can sometimes get large, optimizing for the small case can often yield large gains, as it may occur frequently. And complex data structures are usually worse in the small case...


There’s computer science and there’s software engineering. The best developers are good at both.

…but in order for this to really matter, communication is required, since even the best developers don’t scale.


It's even more true for larger collections. Stroustrup gave a talk on this back at GoingNative 2012. TL;DR: "Use a Vectah" -Bjarne

https://youtu.be/YQs6IC-vgmo


A vector should be people's default data structure, but this presentation is bizarre, because it is based on looping through every element of a vector or a list to find the element you want.

This is never a scenario that should happen, because if you are going to retrieve an arbitrary element it should be in a hash map or sorted map.


I think don't try to memorize the big-Os. Just read through a basic DS&A text, and see the standard structures: vectors, trees, linked lists, hashmaps.

If you can picture them in your mind using blocks and arrows, the big-Os fall out of them naturally. To find an element, do you have to follow one pointer, then another? Do you have a choice at each junction? Is it contiguous? With hash maps you have to convince yourself that lookup is indeed constant time, but once you get it you won't be in doubt.

Do the same for "what if I have to rearrange it, adding or removing an element?" and you get a bunch of other logical big-O answers.

There's a bunch of things made up of these things (LRU cache) and you can similarly logic your way to the answer, but I don't think I've come across an algo problem that isn't just a mash of these basic structures.


Or maybe, if you're lucky enough to be in a market where there are more jobs than developers, avoid the companies that think they are the next Google in their interview process.

Naturally one should have a base knowledge of data structures and algorithms complexity, to the point relevant to the job, and naturally knowing which book to open when needed.


A tricky question in this context: why doesn't vector have a constant-time pop_front, by just incrementing the begin pointer?


One of the first rules to learn and accept as a C++ developer is "Boost has it":

https://www.boost.org/doc/libs/master/doc/html/container/non...


TIL. It also has a (small) list of downsides.


Because there are an infinite number of half-breed data structures that you could think of to balance tradeoffs.

The STL has to restrict itself to somewhat stereotypical data structures that are intuitive to understand, yet can be composed to create such tailored data structures.


Yes, but that's not the point in this case. Naively, a vector should be able to. The trouble lies within the details, in particular allocation.


I think it is precisely the point.

What you suggest is basically a tradeoff for speed of front removal against wasting some memory.

There are an infinite number of such subtle tradeoffs that can be made. Some line needs to be drawn somewhere.

The STL is not intended to contain all possible variants of data structures in existence. It should provide you with a minimal set of containers that are _good enough_ for most use cases. In the case where front insertion is important to you, you can use std::deque. If you want a mix of the pros and cons of deque and vector, then it's fair to say that's on you to implement it.

All the rest is for dedicated libraries / custom containers to implement.


The question the grandparent comment asked was "Explain why pop_front would not be implemented in the current vector in O(1) by simply pointing the front pointer forward one element?"

Answering that with "there are tradeoffs and the STL had to pick one" misses the point of the question; the _premise_ of the question is that there are tradeoffs, the STL picked this one, and we can safely assume they had some reason. The leading question is highlighting an interesting case where the tradeoff isn't trivially obvious: the capacity lying at the front of the vector after an O(1) pop_front is difficult to use.


I guess I still don't clearly see what the leading question is pointing at.

Should that be: _why was it not important to optimize for vector pop front?_

If that is the case, then I feel like the overall philosophy of vector vs deque answers that question.

Vectors are tailored to be good at random access and back insertion/removal, at the expense of wasting capacity. Overall, that means vectors are geared towards append-mostly workloads, which is why they often have an aggressive capacity reallocation factor.

Having a vector double down on unused capacity consumption by allowing constant time front removal was, I guess, deemed useless, since one would most likely be using a deque for such access patterns.


I'm thinking maybe it's "too obvious" for you, and that's what makes you misunderstand the point.

It's just:

- Look at these big-O, see that pop front isn't O(1)

- Here's a proposed pop front that is O(1) (move the pointer forward)

- Reason about why they didn't choose that.

It's already an aha moment for people to realize that capacity left at the front is generally harder to use than unused capacity at the end. In the spirit of a leading question, that aspect is something the reader can reasonably realize after thinking about it, not something that is intended to be obvious before you start thinking.
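
To make the "move the pointer forward" idea concrete, here is a hypothetical sketch (offset_vector is not a standard container, just an illustration of where the awkward front capacity ends up):

    #include <cstddef>
    #include <vector>

    // Hypothetical wrapper, not part of the STL: pop_front just advances an
    // offset, so it is O(1), but the capacity in front of offset_ can only be
    // reclaimed by shifting or reallocating everything anyway.
    template <typename T>
    class offset_vector {
        std::vector<T> data_;
        std::size_t offset_ = 0;
    public:
        void push_back(const T& v) { data_.push_back(v); }
        void pop_front() { ++offset_; }  // O(1), but leaks front capacity
        T& operator[](std::size_t i) { return data_[offset_ + i]; }
        std::size_t size() const { return data_.size() - offset_; }
    };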


Because vectors are not designed for that. When they say "dynamic" they literally just mean contiguous memory; removing the first element would leave an empty space at the beginning. To maintain contiguity, all other elements would need to be shifted one position forward. This operation is linear in time complexity, O(n), as it depends on the number of elements in the vector.

Also, vectors manage their own memory allocation and deallocation. If the begin pointer were incremented without moving the elements, the vector would still be holding onto memory that it's not actually using for storage of active elements. This can lead to inefficient memory usage. Basically speaking: they are designed to modify the end, keep adding, take a bit off, split them at a point, but not really take away from the beginning (and I mean literally the first element).


If an STL container lacks a method, it's a hint that 'here be dragons'. For example, random access of a std::list: it's possible, but you don't want the uninformed to be doing it without considering the consequences simply because it's easy to write.

What do you do about the memory allocated for the first element? That's a decision for the programmer to think about, not for the standard to enforce. A std::deque may deallocate.

Additionally, you can just do:

    std::stack<T, std::vector<T>>
to get pop.


You're arguing that there aren't STL methods for doing inefficient things.

But the parent took that as a given. Their comment translates as, "why doesn't the STL allow efficient removal at the front of a vector" (which would be possible in principle, e.g. python bytearray is similar but supports it). Part of the answer is that it would need an extra internal data member.

BTW the pop on std::stack refers to the back so doesn't help with pop_front (and vector already supports pop_back).


An often-missed C++ container is std::deque, which has O(1) complexity for adding and removing at the front and back, and for accessing random elements. This is missing in the article.
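
For reference, a minimal sketch of those operations:

    #include <deque>

    int main() {
        std::deque<int> d;
        d.push_back(1);    // amortized O(1)
        d.push_front(0);   // amortized O(1), not available on std::vector
        int x = d[1];      // O(1) random access (via an extra indirection)
        d.pop_front();     // O(1)
        d.pop_back();      // O(1)
        return x;
    }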


Which only has a reasonable implementation in one of the three major stdlibs (libc++).


Boost's implementation has a configurable block size.


The libstdc++ one is also pretty good.


> removing an item from a random location: with erase() which has a complexity of O(1) for one element

This is not considering the time you need to get to that element before erasing, which is O(n). The most frequent fallacy about lists.

Someone had to do it, sorry.


I have taken advantage of O(1) removal on linked lists (although typically intrusive ones, not std::list) many times, but I'm pretty sure not once did I do a linear scan specifically to find the element to be removed. Typically you save the iterator in some index on insertion.


But std::list doesn't invalidate iterators, so in situations where you can keep a reference back to whatever item you need, it's quite nice.
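
A minimal sketch of that pattern (the unordered_map index and the integer keys here are just illustrative):

    #include <list>
    #include <string>
    #include <unordered_map>

    int main() {
        std::list<std::string> items;
        // Index from key to list iterator, saved at insertion time.
        std::unordered_map<int, std::list<std::string>::iterator> index;

        index[42] = items.insert(items.end(), "hello");
        index[7]  = items.insert(items.end(), "world");

        // O(1) removal: no linear scan, and the other saved iterators stay
        // valid because std::list never invalidates them on erase.
        items.erase(index[42]);
        index.erase(42);
    }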


In the std::vector section I think std::array is used instead of std::vector:

> std::array is a dynamically sized sequence container


Yup, spotted the same. Looks like a copy-paste / typo


Typo: The section about std::vector starts with "std::array is a dynamically sized sequence container" (lol).


I think it's pretty rare you would ever use std::list and rarer still std::forward_list – the on-paper complexities might be okay but the memory locality is so bad you should almost always be using std::vector or std::deque.


Unless you want to splice lists here and there. I'm quite disappointed the article doesn't mention these features, as for most of my career, they were the only reason to prefer std::list to other containers.
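
For example, std::list::splice moves nodes between lists in O(1) without copying elements or invalidating iterators, which the contiguous containers can't do (a minimal sketch):

    #include <list>

    int main() {
        std::list<int> a{1, 2, 3};
        std::list<int> b{10, 20, 30};

        // Move all of b's nodes to the end of a: O(1), no element is copied,
        // iterators into b's elements stay valid and now refer into a.
        a.splice(a.end(), b);

        // Move a single element (the front of a) to the front of b: also O(1).
        b.splice(b.begin(), a, a.begin());
        return (int)a.size() + (int)b.size();
    }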


I feel that it is also important to know when iterators/references to elements will be invalidated, in addition to knowing the complexities of the operations.


Ummm.. this is pretty basic stuff. If you don’t know this, you probably shouldn’t be claiming to be a c++ developer.


I’ll never understand the aversion of C++ developers to useful learning resources and documentation.


It's nice to know, but the fact is that C++ is the wild west and my domain avoids using the standard library anyway.

A bit of a shame, because C++14 and onward really closed all those gaps that devs used to rely on Boost for, but I guess that's how a culture develops when there is no central package management repo.


And? Were you born knowing C++? Is that a genetic trait?

As you yourself have implied, you have to _start_ somewhere.


There is a very large spectrum of quality for C++ developers.

That's why the yearly salary of a senior C++ dev can vary a lot, usually from 1x to 10x.



