Hacker News new | comments | show | ask | jobs | submit login
Folly - The Faceboook open source library (facebook.com)
200 points by sidcool on June 2, 2012 | hide | past | web | favorite | 43 comments

This is all pretty great stuff. Make sure you read through the docs; I'm unlikely to use Fb's C++ code, but I'm sure as hell going to look at making my C vector code do some of what FBVector does:


I poured through the FBVector code because of your comment. It has some really interesting tricks. In a normal array allocation, it always goes for the next-largest-size in the jemalloc memory hierarchy. That is, it goes for multiples of 64 bytes, 256 bytes, 4 KB, or 4 MB. This makes the array cache-efficient.

The push_back() semantics features the standard array doubling technique, but only up to 4 KB; jemalloc can't grow in-place anything smaller than this, so a copy is required. Beyond that cut-off, push_back() will instead grow by 1.5 times the capacity to prevent too much "slack" memory from accumulating.

Out of curiosity, are you unlikely to use the code because it's C++ or because it comes from Facebook?


I realise this is a dangerous subject, but I am genuinly curious: given the business you run and the work you do for the Freebsd team, why would you object to using c++, but not c?

I could understand why you would object to both, but why only the lower level language.

Personally, I can't keep all of how C++ works in my brain at once. Most people end up limiting themselves to a "sane subset" of C++ to compensate for spec sprawl. But, everybody picks a different subset, so you can quickly stumble on things you think you know but don't work quite right when working on non-self-originated code.

given the business you run and the work you do for the Freebsd team

Are you getting confused between tptacek and me?

Hey, why don't you use C++? :)

I think Tavis put it best: C is easier to audit because it's transparent. With C++ you can have innocuous-looking source code and have the compiler doing all sorts of crazy things behind your back.

I don't work for the FreeBSD team.


I recently created a private memory allocator so the discussion on the page is somewhat interesting...

Only a tiny minority of objects are genuinely non-relocatable:

Hmm, I'm not exactly what is meant here. Moving a block of memory from here to there in the most general case will leave your pointers dangling and crash you in no short order. Things that pointed to the data you moved just don't any more. If you are disciplined and don't have raw pointers in the block moved, you're good. But as far as I know, that situation requires a very complete understanding of the data in the block you are moving. You can do that. It's just not easy or something that makes sense to stuff "anything" into.

If your object is in a vector, it's already possible for the vector to move the object without updating any pointers to it. The only question is whether it is safe for the vector to move the object with memmove instead of invoking a copy/move constructor, essentially skipping the step of telling the object that it is being moved. This will only fail if the object contains pointers to its own sub-objects (or the move constructor updates external pointers). This is fairly rare, especially for objects that you would store directly in a vector.

std::string is a key exception to this, but fbstring is not.

Could you elaborate at all on what it means to put an object in a vector/why you would want to do that?

Edit: I think the assumption is that an object itself is relocatable inside a container like vector if it marks itself as such. The document itself doesn't actually explain the inner workings.

Yes, I think the point is that they are indeed glossing over important stuff.

That's all good and well, but I'm not sure I understand what it brings compared to std::vector with a good allocator.

In addition, it does not support concurrent growth.

- It adjusts its growth strategy to work well with the properties of the allocator.

- It requires that its members be relocatable and uses memmove to speed up move operations.

1 - Can be done from the allocator; 2 - I don't think it's useful with rvalues references. It also sounds like you want to use std::list...

I'm going to be taking a serious look at this for my own projects.

I like Format: https://github.com/facebook/folly/blob/master/folly/docs/For...

Histogram is going to come in handy. :)

fbstring seems to be exactly what the doctor ordered.

I like the emphasis on cooperating with the memory allocator in fbstring and fbvector. If the entire library does that, that's going to a big win for long-running programs: memory fragmentation can slowly increase program footprint, requiring the use of fancy arena allocators etc.

I had fun reading their vector doc: https://github.com/facebook/folly/blob/master/folly/docs/FBV...

I like dynamic: https://github.com/facebook/folly/blob/master/folly/docs/Dyn...

Modern heaps typically have some low-fragmentation technique built-in, for example, Windows ships with Low Fragmentation Heap, which is turned on by default since Vista.

What is "low-fragmentation heap"? Why would anyone want "high-fragmentation heap"? (since you imply that it's an option)

Low-fragmentation heap puts object of similar size together, so once object is freed, this memory can be reused for other object of similar size without fragmentation. Because of this is has more "slack" - unused memory at the end of the objects that are smaller than their buckets. On other hand, application in steady state is not going slowly increase it's memory use over time.

Also it puts consequently allocated objects (of different size) far away (and thus reduces cache locality), which, in turn may reduce performance for some "allocate a lot of stuff at the beginning and then serve it", etc scenarios, but this is pretty esoteric problem.

Benefits outweigh the concerns, so most apps benefit from the low-fragmentation heaps.

Why would anyone want "high-fragmentation heap"?

For short-lived processes, it's faster and uses less memory. The code is also simpler (important if you're writing a malloc in the mid-1980s).

Aren't most of these already within boost or recent C++ standard ? I just took a look at the format and dynamic cases.

Some of them.

For example Boost.Spirit.Karma is an order of magnitude faster than sprintf (I don't know compared to fbformat), with the drawback of a very long compile time (see here: http://tinodidriksen.com/2010/02/07/cpp-convert-int-to-strin...).

Arena, SmallLocks and AtomicHashMap will have to challenge Intel TBB. I submit TBB will be better, but I haven't done the benchmarks.

As for the rest, some of the stuff is simply outdated.

There's better stuff than Synchronized in the C++11 standard. Foreach doesn't make sense with lambdas.

Some of the headers are just trivial and useless functions (MapUtil.h, IntrusiveList.h, Random.h...), etc.

I'm a bit disappointed.

From the OP: "Our primary aim with this 'foolishness' is to create a solution that allows us to continue open sourcing parts of our stack without resorting to reinventing some of our internal wheels."

And: "we think C++ developers might find parts of this library interesting in their own right."

I think the key thing is that these are some basic pieces that a lot of FB's internal C++ code depends on. These pieces are being made available as a stepping stone to open sourcing more significant work.

I'm Tudor, one of the main folly authors. I wrote format, Arena / ThreadCachedArena, DiscriminatedPtr, GroupVarint, TimeoutQueue, and various other pieces (parts of String.h, ThreadLocal, etc), and I'm pretty well-versed with the entire library so I can reasonably answer questions (or poke the appropriate people to make a HN account and answer). Ask away.

Not necessarily specific to Folly, but I've wondered why vector classes don't have a "short vector optimization" like string classes do. That is, why don't they store 24 bytes or so in place? Is it because the required iterators would take-up too much space anyway?

folly::small_vector does just that (and it lets you use one bit for a mutex, too! -- we have a lot of memory-constrained apps so we had to design data structures for them).

We might unify that with fbvector eventually.

i just attended alexandrescu's talk on some of the optimisations that went into this [rami sayar liveblogged it here: http://ramisayar.com/fb-cpp-conf/]. very interesting and inspiring remarks on instruction-level parallelism [http://en.wikipedia.org/wiki/Instruction_level_parallelism] and ways to help your code achieve it.

I'm surprised they used mixed-case file names when their class names are lowercase. You have to remember both capitalizations. Mixed-case file names can also be problematic when porting to case-insensitive file systems like Windows' or Mac OS X'.

++ for being nicely documented

I suppose I'll have to be the first to express shock that they decided to name it Folly. I feel like Poe's Law is expressing itself.

"Folly" started as an internal codename loosely based on "F"acebook "O"pen source "LL"ibrar"Y". When the time to choose an official name arrived, we found "folly" too funny to not use.

Very nice addition to my C++ toolset.

Is the 3 "o" intentional?

Very nice.

This is a conspiracy theory, but what if there are bugs in it that they can't find and so they're open sourcing it in the hope that someone else will fix it.

That isn't a conspiracy, it's one of the major reasons that anyone open sources anything ever.

Not the main purpose, obviously, but if you find any bugs, I'll be more than happy to take a look.

i think you're just paranoid

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact