One of its unusual design decisions is that the array's length and capacity are stored next to the array elements themselves. This means that nsTArray stores just one pointer, which makes for more compact DOM objects and so on.
Making this work requires some cooperation with Firefox's allocator (jemalloc, the same one FB uses, although afaik FB uses a newer version). In particular, it would be a bummer if nsTArray decided to allocate space for, say, 4kb worth of elements and then tacked on an 8-byte header, because then we'd end up allocating 8kb from the OS (two pages) and wasting most of that second page. So nsTArray works with the allocator to figure out the right number of elements to allocate without wasting too much space.
We don't want to allocate a new header for zero-length arrays. The natural thing to do would be to set nsTArray's pointer to NULL when it's empty, but then you'd have to incur a branch on every access to the array's size/capacity.
So instead, empty nsTArrays are pointers to a globally-shared "empty header" that describes an array with capacity and length 0.
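The layout idea looks something like this (a minimal sketch, not Mozilla's actual code - the names are invented, and alignment/OOM handling are ignored):

    #include <cstdint>
    #include <cstdlib>
    #include <cstring>
    #include <new>

    // Header stored immediately before the elements, in one allocation.
    struct Header {
      uint32_t length;
      uint32_t capacity;
    };

    // Shared, immutable header that every empty array points at, so an
    // empty array needs no heap allocation and no NULL check on reads.
    static Header sEmptyHeader = {0, 0};

    template <typename T>
    class TinyArray {
      Header* mHdr = &sEmptyHeader;  // the only data member: one pointer

     public:
      uint32_t Length() const { return mHdr->length; }    // branch-free
      uint32_t Capacity() const { return mHdr->capacity; }
      T* Elements() { return reinterpret_cast<T*>(mHdr + 1); }

      void Reserve(uint32_t cap) {
        if (cap <= mHdr->capacity) return;
        // One malloc covers the header plus the elements.
        Header* h = static_cast<Header*>(
            malloc(sizeof(Header) + cap * sizeof(T)));
        h->length = mHdr->length;
        h->capacity = cap;
        memcpy(h + 1, mHdr + 1, mHdr->length * sizeof(T));  // assumes
                                                 // trivially copyable T
        if (mHdr != &sEmptyHeader) free(mHdr);   // never free the shared
        mHdr = h;                                // empty header
      }
    };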
Mozilla also has a class with some inline storage, like folly's fixed_array. What's interesting about Mozilla's version, called nsAutoTArray, is that it shares its structure with nsTArray, so you can cast it to a const nsTArray*. This lets you write a function that takes a const nsTArray& or const nsAutoTArray& without templates.
Anyway, I won't pretend that the code is pretty, but there's a bunch of good stuff in there if you're willing to dig.
GNU libstdc++ does this for std::string, so you get prettier output in the debugger. The object itself only contains a char*.
This assumes a "next-fit" allocator, which is not always the case. I think this is why the expansion factor of 2 was chosen: it's an integer, and it doesn't assume any behaviour of the underlying allocator.
I'm mostly a C/Asm programmer, and dynamic allocation is one of the things I avoid whenever I can - I prefer constant-space algorithms. If it means a scan of the data first to find the right size before allocating, then I'll do that: modern CPUs are very fast "going in a straight line", and realloc costs add up quickly.
Another thing I've done, which I'm not entirely sure would be possible in "pure C++", is to adjust the pointers pointing to the object if reallocation moves it (basically, add the difference between the old and new pointers to each reference to the object). In theory I believe this involves UB, so it might not be "100% standard C" either, but in practice it works quite well.
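Roughly, the fixup pass looks like this (a sketch of what I mean - names are invented, and as said, this is UB territory, since arithmetic on pointers into a freed block is not sanctioned by the standard):

    #include <cstdint>
    #include <cstdlib>

    // After realloc() moves a buffer, rebase every outstanding pointer
    // into it by the old->new delta. Round-tripping through uintptr_t is
    // the usual "works on every implementation I've met" dodge.
    static void rebase(char** refs, size_t nrefs,
                       char* old_base, char* new_base) {
      uintptr_t delta = (uintptr_t)new_base - (uintptr_t)old_base;
      for (size_t i = 0; i < nrefs; ++i)
        refs[i] = (char*)((uintptr_t)refs[i] + delta);
    }

    // Usage: old = buf; buf = (char*)realloc(buf, n);
    //        if (buf && buf != old) rebase(refs, nrefs, old, buf);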
I'm guessing that most customers of std::vector don't really need contiguous memory, they just need something that has fast linear access time. In this sense, std::vector is a poor design.
The implementation of std::deque I just read uses a circular queue, resized to a power of two upon growth. It would exhibit the same bad performance as std::vector.
Libstdc++ describes how it's implemented here: https://github.com/gcc-mirror/gcc/blob/master/libstdc%2B%2B-...
Libc++ doesn't have as descriptive a comment, but it's implemented basically the same way here: https://github.com/llvm-mirror/libcxx/blob/master/include/de...
In the GNU libstdc++ implementation the object itself is pretty large (and they use a block size of 512 bytes):
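The two-level addressing both libraries use boils down to something like this (a simplified sketch, names invented; the real code also tracks iterators and the partially filled edge blocks):

    #include <cstddef>

    // With 512-byte blocks, a deque<int> gets 128 ints per block.
    template <typename T>
    struct DequeShape {
      static const std::size_t kBlockBytes = 512;
      static const std::size_t kPerBlock =
          kBlockBytes / sizeof(T) ? kBlockBytes / sizeof(T) : 1;

      T** map;            // small first level: pointers to fixed blocks
      std::size_t start;  // logical index of the first element

      T& at(std::size_t i) {
        std::size_t j = start + i;
        return map[j / kPerBlock][j % kPerBlock];
      }
    };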
Essentially, instead of direct addressing like V[i], you do VV[log2(i)][i], where log2() is implemented with a single BSR instruction.
The addressing table for the first level is very small, and it can be assumed to always be in cache.
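One way to flesh that out (a sketch under my own naming - the original may pre-offset the block base pointers so the raw VV[log2(i)][i] indexing works; here the in-block offset is written out explicitly):

    #include <cstddef>
    #include <vector>

    // Block b holds 2^b elements, so an element never moves once
    // written: growth just appends a new block.
    template <typename T>
    class LogArray {
      std::vector<T*> blocks_;  // the small first-level table
      std::size_t size_ = 0;

      // floor(log2(x)) -- one BSR/LZCNT on x86 via this GCC/Clang builtin.
      static unsigned log2u(std::size_t x) {
        return 63 - __builtin_clzll(x);
      }

     public:
      T& operator[](std::size_t i) {
        std::size_t j = i + 1;          // shift so block 0 holds 1 element
        unsigned b = log2u(j);          // which block
        return blocks_[b][j - (1ull << b)];  // offset inside the block
      }

      void push_back(const T& v) {
        std::size_t j = size_ + 1;
        if ((j & (j - 1)) == 0)          // j is a power of two:
          blocks_.push_back(new T[j]);   // time for a new block of size j
        (*this)[size_++] = v;
      }

      ~LogArray() { for (T* b : blocks_) delete[] b; }
    };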
Here's the .h/.c source:
Did you use something different?
In short, std::vector has served me well; I don't want it to lose performance.
This has been a well-known problem in the C++ community for years; in particular, Howard Hinnant put a lot of work into it. I believe the fundamental problem has always been that C++ implementations build on the underlying C implementations of malloc and friends, and the C standards committee could not be persuaded to add the necessary primitives.
A few years ago I tried to get a realloc which did not move (and instead returned failure) into glibc and jemalloc, and failed. Glad to see someone else has succeeded.
From what I understand, an "rvalue-reference ready" vector implementation with a good memory allocator should perform at least as well as FBVector.
I.e. if we want fast push_back() behavior, we can use a standard library that knows to construct the element directly inside the vector's backing store rather than creating a temporary object and copying it into the vector.
The reallocation issue isn't fixable this way however...
... also, most types are memmovable.
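For what it's worth, the "construct directly inside the backing store" part is a library feature in C++11 (emplace_back) rather than compiler magic:

    #include <string>
    #include <vector>

    struct Widget {
      std::string name;
      int id;
      Widget(std::string n, int i) : name(std::move(n)), id(i) {}
    };

    int main() {
      std::vector<Widget> v;
      v.push_back(Widget("a", 1));  // builds a temporary, then moves it in
      v.emplace_back("b", 2);       // builds directly in the backing store
    }

As noted above, though, this only helps per-element construction cost; the reallocation issue is untouched.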
The real issue here is that std::allocator doesn't support realloc. (Note that you couldn't really use realloc as spec'd, though: in the case where it fails to reallocate in place, it copies the data itself bitwise and frees the old memory area, which would have incorrect semantics for non-trivial types. This is actually discussed in the article from Facebook as the "notorious design of realloc()", which made the usage of jemalloc required for this task. However, std::allocator could still have features for handling this situation, and implementations based on malloc would simply not support the reallocate functionality. I imagine if more allocators actually supported the idea of reallocation, this could be proposed to the committee.)
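For reference, newer jemalloc versions expose xallocx(), a non-standard call with exactly the missing semantics: grow in place or fail, never move, never free. A sketch assuming that API:

    #include <jemalloc/jemalloc.h>  // non-standard jemalloc API (xallocx)

    // xallocx() returns the resulting real size of the allocation and
    // never moves or frees it, so on failure the caller can fall back
    // to allocate + move + deallocate -- the fallback semantics that
    // plain realloc() can't give you.
    static bool try_grow_in_place(void* p, size_t new_size) {
      return xallocx(p, new_size, 0, 0) >= new_size;
    }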
That said, the statement that std::vector can't grow the chunk in place is still way too heavy-handed: std::vector does grow its size in place, constructing (and destroying) elements within the over-sized memory area it has already allocated. In general, this memory area doubles whenever the capacity is exceeded, which gives vector amortized constant-time insertion of a new element.
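Easy to watch this happen (the doubling is typical of libstdc++/libc++; MSVC grows by 1.5x, and the factor is implementation-defined):

    #include <cstddef>
    #include <cstdio>
    #include <vector>

    int main() {
      std::vector<int> v;
      std::size_t last = 0;
      for (int i = 0; i < 1000; ++i) {
        v.push_back(i);
        if (v.capacity() != last) {  // print only on reallocation
          last = v.capacity();
          std::printf("size %zu -> capacity %zu\n", v.size(), last);
        }
      }
    }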
Actually, the standard says std::allocator<T> just calls the global operator new function. When you call construct(), placement new is then used to construct the objects in place. This makes it easy to replace the memory allocator globally as well as on a per-container basis.
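In skeleton form (the rebind boilerplate is stripped; with C++11 allocator_traits this is close to complete):

    #include <cstddef>
    #include <new>

    // allocate() defers to the global operator new, construct()
    // placement-news the object into already-allocated memory.
    template <typename T>
    struct MyAlloc {
      typedef T value_type;

      T* allocate(std::size_t n) {
        return static_cast<T*>(::operator new(n * sizeof(T)));
      }
      void deallocate(T* p, std::size_t) { ::operator delete(p); }
      void construct(T* p, const T& v) { ::new ((void*)p) T(v); }
      void destroy(T* p) { p->~T(); }
    };

    template <typename T, typename U>
    bool operator==(const MyAlloc<T>&, const MyAlloc<U>&) { return true; }
    template <typename T, typename U>
    bool operator!=(const MyAlloc<T>&, const MyAlloc<U>&) { return false; }

    // e.g. std::vector<int, MyAlloc<int> > v;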
Edit: shamelessly citing my blog entry on this subject: http://blog.httrack.com/blog/2014/04/05/a-story-of-realloc-a...
You can't grow an allocation made with new in place, and because you need to retain the existing data, it's not safe to delete the old buffer, call new again, and hope it returns the previous memory address with the existing data intact.
A realloc implementation can try to grow the buffer if there's contiguous free space, and if it succeeds it doesn't need to deallocate or copy anything. I haven't had a look at realloc implementations, so I don't know whether that, or other optimizations, are done in practice, but I assume realloc's worst case is somewhere around the best case of new, copy and delete.
The copy mechanism in std::vector may also have a significant overhead over realloc if it has to call the copy constructor (or, ideally, the move constructor in C++11) of every object in the vector, although I can imagine a C++ equivalent of realloc doing so too.
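Concretely, a C++-aware "realloc" that can't extend in place has to do something like this sketch (exception safety omitted for brevity):

    #include <cstddef>
    #include <iterator>
    #include <memory>
    #include <new>

    // New buffer, move (or copy) construct each element across, destroy
    // the originals, release the old buffer. A bitwise realloc skips the
    // two middle steps, which is only legal for trivially copyable types.
    template <typename T>
    T* grow(T* old_data, std::size_t size, std::size_t new_cap) {
      T* fresh = static_cast<T*>(::operator new(new_cap * sizeof(T)));
      std::uninitialized_copy(std::make_move_iterator(old_data),
                              std::make_move_iterator(old_data + size),
                              fresh);
      for (std::size_t i = 0; i < size; ++i) old_data[i].~T();
      ::operator delete(old_data);
      return fresh;
    }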
This is occasionally annoying in C, but not a great issue. In C++, just memcpying classes leads to awful things happening.
- if you need dynamic storage, you should use a std::vector instead of new.
- with std::vectors, if you want to save the cost of the copy, you can redefine the vector's allocator to use realloc or even mremap.
Edit @xroche: For example, you can create an allocator that manages an initial chunk of memory (allocated with malloc) and grows it when required using realloc inside the 'allocate' call.
Edit2: So I've implemented an allocator that behaves (almost) as described above, but I admit it's a bit more hacky than I thought. Drop me an email if you're interested.
Edit² @bnegreve: but how do you handle moved blocks, then? And this would involve copying all objects anyway, because std::vector's logic is to allocate+memcpy+deallocate when the capacity is reached. The only solution is to... rewrite std::vector, which is precisely what FB did :)
Besides, realloc can force a memory move anyway. Its behavior depends on the implementation and on memory state.
Solving the equations suggests a Fibonacci-like sequence, seeded by something like "2, 3, 4, 5" and continuing 9, 14, 23, etc.
We could use just 2, 3, 4 as a seed, but we can't use the Fibonacci formula before the fourth element is added. Try some different seeds for yourself; it's trickier than you'd think.
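Taking that seed at face value, each capacity after the seed is the sum of the previous two:

    #include <cstdio>

    int main() {
      // Seeds, then cap[i] = cap[i-1] + cap[i-2]:
      // 2, 3, 4, 5, 9, 14, 23, 37, 60, ...
      unsigned cap[16] = {2, 3, 4, 5};
      for (int i = 4; i < 16; ++i) cap[i] = cap[i - 1] + cap[i - 2];
      for (int i = 0; i < 16; ++i) std::printf("%u ", cap[i]);
      std::printf("\n");
    }

(The article's point about staying near the golden ratio: with a factor of 2, each new block is larger than everything freed so far, so those holes can never be reused.)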
As an aside, there was a great Star Trek novel in which a long-range transporter was invented that accidentally cloned people.
(I think it was "Spock Must Die")
If std::vector is your bottleneck, you have bigger problems, I suspect.
Reminds me a bit of EASTL as well, which is much more comprehensive: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n227...
Most of the time, if performance and allocation are so critical, you're better off not using a vector anyway. A fixed-size array is much more cache friendly, makes pooling quite easy, and avoids other performance costs of std::vector's implementation.
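A sketch of the fixed-array-plus-pool idea (my own naming; assumes default-constructible T):

    #include <array>
    #include <cstddef>
    #include <new>

    // Storage is one contiguous array (cache friendly, zero heap
    // traffic); a free list threads through the unused slots. No
    // reallocation can ever occur.
    template <typename T, std::size_t N>
    class FixedPool {
      union Slot { T obj; Slot* next; Slot() {} ~Slot() {} };
      std::array<Slot, N> slots_;
      Slot* free_ = nullptr;

     public:
      FixedPool() {
        for (std::size_t i = N; i-- > 0;) {
          slots_[i].next = free_;
          free_ = &slots_[i];
        }
      }
      T* acquire() {               // O(1); returns nullptr when full
        if (!free_) return nullptr;
        Slot* s = free_;
        free_ = s->next;
        return new (&s->obj) T();
      }
      void release(T* p) {         // O(1)
        p->~T();
        Slot* s = reinterpret_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
      }
    };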
More to the point, who would use a C++ library from Facebook? Hopefully I don't need to explain the reasons here.
I've no doubt there have been many talented programmers for decades, but what about open-sourcing these "in house" libraries, if they are so great, to save countless other programmers from reinventing the wheel? I think that would be more constructive than bashing Facebook for being late to the party.
As to why people don't open source their libraries - it's a big hassle and doesn't especially benefit them personally. Indeed if other people are going to have to spend time reinventing the wheel I suppose some might even see it as a competitive advantage. And as a general rule, video games companies don't seem to be much into open source anyway.
(Additionally, if the ones I've used are anything to go by, the libraries aren't so great. They attack specific problems with std::vector, and that's fine, but usually not in any general way. No need to do a full extension of the allocator mechanism if you know you can just call your company's allocator's special try_realloc function, for example - but now of course nobody else can use it. And there usually isn't any impetus to "fix" this, because it isn't actually a problem.)
Could you explain them for those of us not in the loop? Does Facebook have a bad reputation for C++?
As for their work on HHVM, it was only necessary because of bad technology choices made at the start. There's very little interesting about this work unless you somehow love PHP, want to make debugging your production applications more difficult, and refuse to address your real problems. I am 100% sure no one outside of the PHP community cares about anything Facebook has done in C++.
Simply having a large company with lots of developers who have good reputations elsewhere, or who are smart, doesn't mean much. Having worked in many places with lots of smart developers, I can tell you stories about too many geniuses in the room. Calling Facebook developers engineers is also about as apt as calling janitors sanitation engineers. We're programmers, or developers, or perhaps software architects at best, depending on the position. I happen to have an EE and CS degree, but given that I do programming for a living, I'd hardly call myself an engineer. But we're way off topic :)
HHVM arose out of Facebook's desire to save on server purchasing and operating costs. Facebook could run perfectly well without it on Plain Old PHP, but they'd have to buy and power more servers.
I'd hardly call PHP a "bad technology choice" given the outstanding financial success of many companies that use it.
I could also speculate about their thinking along the lines of backdoors or something. But Facebook does have a good reputation with open source projects, so I wouldn't be too worried about that (as in, no more worried than about some other random project).
The fact that something has been done in industry 'x' does not negate the value of this article/library.
OTOH, game programmers don't share this kind of knowledge on public fora as freely as Internet companies do. It may be "old news" to you, but it was unpublished. Knowledge is great, but dissemination is even better.
-GDC Slide Decks
-Game Programming Data Structure/Performance Books
-Game Industry Publications - ex: Game Programming Gems
-Game Dev Sites
-Any available engine/game source released over the years should contain more than a few examples
Moreover, this is also a common thing in a lot of embedded and high-performance development in general.
Just because someone didn't make a github project and post it on HN doesn't mean it doesn't exist. This seems to be a trend with web developers - reinventing what already exists without properly checking what is out there, and usually doing so poorly. I say this as someone who has done game development and web development at various points of my career (currently web by day, game by night), so I am not trying to be condescending, just pointing out an unfortunate trend.
Downvote if you want, but I see the first game dev who replied already confirmed what I said.
If this truly was a well-known issue long ago, it should also have been fixed long ago by game developers contributing back to the open-source developers who have made their work possible so that the rest of us didn't have to rediscover these performance issues.
If you want to avoid this kind of criticism, be a part of the solution outside your own industry walls; don't just brag post hoc about how you solved it in your castle for yourselves.
I haven't looked at it myself but it might be an interesting start for those interested.
The guy who tried this at EA didn't last long there.
Paul Pedriana, the man responsible for the lion's share of the EASTL work, started at Maxis before it was acquired by EA in 1997. EASTL has been in continuous development for more than 10 years.
That's a new one. Usually it's rocket scientist or brain surgeon. What exactly does a rocket surgeon do? :)