Instead of wasting more energy on improving the (overused) linked list, write a little resizable array library that doubles size on each fill-up. It's easy and much faster.
Don't do this.
Doubling the size of your allocation whenever you expand can easily lead to large amounts of wasted storage in common scenarios. "Yes," you may say. "But memory is cheap." Which it is at the small scale, but when you're allocating 1GB to store a 512MB array you have a problem.
A vastly better approach is to use some form of adaptive rescaling which avoids excessive overcommits.
Here, for example, is the code (and explanatory comment) Python uses to determine the new allocation size for an expanding list:
```c
/* This over-allocates proportional to the list size, making room
 * for additional growth. The over-allocation is mild, but is
 * enough to give linear-time amortized behavior over a long
 * sequence of appends() in the presence of a poorly-performing
 * system realloc().
 * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
 */
new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;
if (newsize == 0)
    new_allocated = 0;
```
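For concreteness, here's a minimal sketch (the names are mine, not from any particular library) of how that growth rule might be applied in a C resizable-array append, so the over-allocation stays mild and proportional instead of doubling:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical resizable array of ints using the CPython-style growth rule. */
struct vec {
    int    *data;
    size_t  len;        /* elements in use */
    size_t  allocated;  /* elements allocated */
};

/* Append one element, over-allocating by roughly 12.5% instead of doubling. */
static int vec_push(struct vec *v, int value)
{
    if (v->len == v->allocated) {
        size_t newsize = v->len + 1;
        size_t new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) + newsize;
        int *p = realloc(v->data, new_allocated * sizeof *p);
        if (!p)
            return -1;          /* old buffer is still valid on failure */
        v->data = p;
        v->allocated = new_allocated;
    }
    v->data[v->len++] = value;
    return 0;
}
```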
I'll steal the Python trick, because it's neat, but I don't think worrying about your resize function is a good reason to put off moving away from linked lists, which are mostly evil.
Instead, the trick you mention (allocating multiple elements per node) is really a linked list of fixed-size arrays: you trade a constant factor of extra bookkeeping for storage efficiency. This is a good approach for many use cases, since insertion stays O(1) and the cache locality will likely help compensate for the additional operations you need to perform.
Still, it's worth noting that such a data structure needs a little maintenance: after deletions in the middle, adjacent nodes left holding only a few elements should be "compacted" from time to time while traversing the structure.
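For concreteness, a node of such a "linked list of fixed-size arrays" (an unrolled linked list) might look roughly like this; the chunk size and the names are just placeholders:

```c
#include <stddef.h>

#define CHUNK 16   /* elements per node; tune against cache-line and struct size */

/* One node of an unrolled linked list: a small array plus the usual links. */
struct chunk_node {
    struct chunk_node *next;
    struct chunk_node *prev;
    size_t count;          /* elements actually stored in items[] */
    int items[CHUNK];
};

/* During traversal, two adjacent half-empty nodes can be merged:
 * if dst->count + src->count <= CHUNK, copy src->items into dst->items,
 * unlink src, and free it. Doing this lazily keeps occupancy reasonable
 * after deletions in the middle. */
```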
Of course, you have to be a little careful - if you're doing a `remove_if` loop, you have to remember to process the element that used to be on the end, and you need to make sure you don't iterate past the end of the (now shorter) array.
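In array terms that's the usual swap-with-the-last-element removal. A rough sketch, assuming an unordered int array where element order doesn't matter:

```c
#include <stddef.h>

/* Remove every element equal to `dead` from a[0..*n), by overwriting it
 * with the current last element. Order is not preserved. Note the index
 * is NOT advanced after a removal, so the element swapped in from the
 * end still gets examined, and the loop never runs past the new end. */
static void remove_all(int *a, size_t *n, int dead)
{
    size_t i = 0;
    while (i < *n) {
        if (a[i] == dead)
            a[i] = a[--*n];     /* swap in the old last element, shrink */
        else
            i++;
    }
}
```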
As for insertions in the middle, a vector may still be better than a linked list. If the vector is short you could win on that alone, but even if it's long it can be worth it.
Consider that the program you're writing probably doesn't consist entirely of middle-insertions into these data structures. If you iterate over those data structures about as often as you insert into the middle, you're going to be O(n) there no matter what. The O(n) + O(1) = O(n) of the linked-list iterate/insert may well be far slower than the O(n) + O(n) = O(n) of the vector iterate/insert for any n.
(And with binary search, your O(log n) + O(n) = O(n) may be faster again...)
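As a sketch of that combination (assuming the array already has spare capacity for one more element; the names are mine):

```c
#include <stddef.h>
#include <string.h>

/* Insert `value` into a sorted int array: O(log n) to find the slot,
 * O(n) to shift the tail up, which is often faster in practice than
 * walking a linked list to find the insertion point. */
static void sorted_insert(int *a, size_t *n, int value)
{
    size_t lo = 0, hi = *n;
    while (lo < hi) {                 /* lower_bound-style binary search */
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < value)
            lo = mid + 1;
        else
            hi = mid;
    }
    memmove(&a[lo + 1], &a[lo], (*n - lo) * sizeof a[0]);
    a[lo] = value;
    (*n)++;
}
```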
There are lots of reasons to keep track of things even if finding them is actually rare.
And an array doesn't solve any problem. Searching it is still linear, and you're still tearing up the cache doing so, because in a case like this you're storing pointers in the array, not data, and now you have another memory allocation that can fail. Structs with built-in links can always be added to a list without failing; growing an array, not so much.
Every time a process exits, you have to remove it from the parent's children. If it was the first of 1000 children, that's a lot of memmove.
Also, another disadvantage: Once you start storing data in resizable arrays, you can never, ever, let anyone keep a pointer to an element (such as in a different tree or list). One resize later and boom, that pointer is broken.
The pointer problem is straightforwardly addressed by storing offsets instead of pointers; high(er)-performance encodings of the list (and, esp. tree) ADTs already do that.
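A rough illustration of the offsets-instead-of-pointers idea, with hypothetical names: nodes refer to each other by index into a resizable array, so a realloc that moves the whole block doesn't invalidate anything:

```c
#include <stddef.h>

#define NIL ((size_t)-1)   /* sentinel meaning "no node" */

/* A tree node that refers to its children by index into a resizable array
 * of nodes rather than by pointer. Growing the array with realloc may move
 * every node in memory, but the indices stay valid. */
struct tree_node {
    int value;
    size_t left;    /* index of left child in the node array, or NIL */
    size_t right;   /* index of right child, or NIL */
};

struct tree {
    struct tree_node *nodes;   /* realloc'd as needed */
    size_t len, cap;
    size_t root;               /* index of the root node, or NIL */
};
```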
And you still can't delete anything when using offsets. Setting aside random deletions (which are important), an array works for a stack, but not a FIFO, and FIFOs are at least as common.
If you have a collection where random deletion is central to the design, then by all means, use a list. The process table is perhaps one example.
But just because you might at some point need random deletion doesn't mean it's a win to use a linked list. The process table will constantly have random deletions, but pid 47481 is far more likely to die at any moment than pid 10. Do you know how not-expensive it is to memcpy 1-3 "struct procs"? How often is the process table indexed or traversed? Roughly every 10th scheduling quantum? Are you sure you're optimizing for the right thing?
Incidentally, if you need FIFO behavior, a ring buffer is a nice thing to have in your toolchest.
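Something like this minimal sketch, assuming a fixed, power-of-two capacity and zero-initialized head/tail counters (the names are mine):

```c
#include <stddef.h>
#include <stdbool.h>

#define RING_SIZE 64   /* must be a power of two for the mask trick below */

/* Fixed-capacity FIFO: O(1) push and pop, contiguous storage,
 * no per-element allocation. head and tail are free-running counters;
 * head - tail is the number of elements currently queued. */
struct ring {
    int buf[RING_SIZE];
    size_t head;   /* next slot to write */
    size_t tail;   /* next slot to read */
};

static bool ring_push(struct ring *r, int v)
{
    if (r->head - r->tail == RING_SIZE)
        return false;                          /* full */
    r->buf[r->head++ & (RING_SIZE - 1)] = v;
    return true;
}

static bool ring_pop(struct ring *r, int *out)
{
    if (r->head == r->tail)
        return false;                          /* empty */
    *out = r->buf[r->tail++ & (RING_SIZE - 1)];
    return true;
}
```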
This is a very specific article about optimizing the linked lists as used in the Linux kernel, with wonderful data about real world usage. I find it highly doubtful that the Linux kernel would benefit from dropping linked lists in favor of resizable arrays. Possible of course, but I'd want to see the data before making any recommendations.
In general, it would depend on what's in your list and what you do with it. If each item is the size of a cache line, you're probably not trashing your cache. If the list is large and you are keeping it ordered, you might benefit from easy insertions and deletions. If you've got millions of little lists of non-power-of-2 sizes, maybe doubling the size on a fill up would be problematic.
Obviously you know this and I'm not giving you any new information, but I'm confused why you'd make this recommendation. Is it aimed at Rusty? Or do you truly feel that the linked list has no place in the world?
 Well, maybe, but this depends on how often you need to do a full scan through the list. If you're storing unordered, you probably are scanning the array; if ordered, you'll be shuffling memory a lot on inserts. Again, use case matters.
(In this article, modification is the most frequent operation, but "replace" is the least frequent; for some of those lists, tail insertion might be the most frequent use case).
For me, the question is, what's the right go-to collection for general purposes? For most C programmers, that's the linked list, and I think that's a bad default.
I'm happy to buy the argument that they aren't a great default for cases where you know you will be dealing with a small unordered collection. Storing HTTP headers? Don't use a linked list. Large enough to benefit from binary search? Don't use a linked list.
In general I'd agree with you that a simple table is a better first approach. Brute force and realloc can be surprisingly efficient. Others have made the argument that a hash beats a table even for very small numbers, and thus is a better default: http://news.ycombinator.com/item?id=1860917
The Naggum article (parent of the linked comment) is a fun rant in favor of linked lists over hash tables, although I think Naggum would probably also agree with the argument that you shouldn't use even a linked list when a simple array will do.
At very large sizes, deletions from the interior can be painful, but at very large sizes you need to be customizing for performance (if that matters) anyways. Instead of deleting, for instance, you can invalidate, and then amortize the memcpy across many deletions; a straightforward time/space tradeoff.
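A sketch of the invalidate-then-compact idea, assuming a sentinel value is acceptable as a tombstone (real code would more likely use a separate flag or a key field):

```c
#include <stddef.h>

#define DEAD (-1)   /* assumed sentinel; anything that can't be live data */

/* Logical deletion: mark the slot instead of shifting the tail down. */
static void lazy_delete(int *a, size_t i)
{
    a[i] = DEAD;
}

/* Occasional compaction: one O(n) pass squeezes out all dead slots,
 * amortizing the shifting cost across many deletions. Returns the new length. */
static size_t compact(int *a, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (a[i] != DEAD)
            a[out++] = a[i];
    return out;
}
```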
The big problem I have with "performant" linked lists is that malloc is death to performance. The single biggest profiler payoffs I've ever gotten --- order of magnitude improvements --- have been from getting rid of calls to malloc. Yes, you can custom-alloc a linked list, but you're then getting to a place where lists and vectors are starting to converge on each other.
I think C programmers use linked lists instead of "vectors" because realloc resizing is scary, and because every CS student learns linked-lists in their first month of class.
Intrusive data structures are actually very malloc friendly because you can include multiple lists in structures that are allocated sequentially, and insert/remove at will without any allocation or deallocation whatsoever. This is of course why C programmers use linked lists heavily.
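Roughly what that looks like (a simplified sketch in the spirit of the kernel's embedded list_head, not the actual kernel API): the links live inside the object, so one allocation can sit on several lists at once and be unlinked from any of them without touching the allocator:

```c
#include <stddef.h>

/* An embedded link: next/prev live inside the object itself. */
struct link {
    struct link *next, *prev;
};

/* One allocation, two list memberships: the task can be on the global
 * task list and on its parent's children list at the same time, and can
 * be unlinked from either in O(1) with no malloc/free involved. */
struct task {
    int pid;
    struct link all_tasks;   /* links into the global task list */
    struct link sibling;     /* links into the parent's children list */
};

/* O(1) unlink of a node from whichever circular list this link is on. */
static void unlink_node(struct link *l)
{
    l->prev->next = l->next;
    l->next->prev = l->prev;
}
```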
("realloc is scary"? Really?)
To reduce the memory-allocation time penalty you can use a memory pool.
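For example, a minimal free-list pool sketch (hypothetical names): one up-front allocation, then O(1) node alloc/free with no further calls into the system allocator:

```c
#include <stddef.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* A trivial pool: one malloc up front, then nodes are handed out and
 * returned via a free list, so list insertions never hit malloc. */
struct pool {
    struct node *slab;   /* the single backing allocation */
    struct node *free;   /* head of the free list */
};

static int pool_init(struct pool *p, size_t count)
{
    p->slab = malloc(count * sizeof *p->slab);
    if (!p->slab)
        return -1;
    p->free = NULL;
    for (size_t i = 0; i < count; i++) {   /* thread all nodes onto the free list */
        p->slab[i].next = p->free;
        p->free = &p->slab[i];
    }
    return 0;
}

static struct node *pool_alloc(struct pool *p)
{
    struct node *n = p->free;
    if (n)
        p->free = n->next;
    return n;                /* NULL when the pool is exhausted */
}

static void pool_free(struct pool *p, struct node *n)
{
    n->next = p->free;
    p->free = n;
}
```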
However, if memory size is a problem (maybe another two pointers per element), or if you mostly add and randomly access elements, an array could be a good choice.
Meanwhile: you aren't really saying most people use pool-allocated lists, right? I see people hand-hacking malloc'd lists all the time. I rarely see custom pool allocators. It's what people were taught to do.
As for his test purportedly showing that A is better than B on Linux... It's only right if you assume that Linux can only use one type of list in all cases.
To answer the original question, you decide a list is empty when prev == next.
And if the head lives outside of the list proper, then you would need special-case code to handle deletion of the first element in the list (since the head contains a pointer to it). It wouldn't be branchless.
```c
H->next = A; A->next = H;
H->prev = A; A->prev = H;
```
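For reference, with a dedicated head node the usual convention (as in the Linux kernel's list_empty()) is that the list is empty when the head points at itself; in the one-element case above, head->next and head->prev both point at A instead. A minimal sketch:

```c
struct link { struct link *next, *prev; };

/* Circular doubly-linked list with a dedicated head node H:
 *   empty list:    H->next == H (and H->prev == H)
 *   one element A: H->next == A, A->next == H, H->prev == A, A->prev == H
 * So the emptiness check is simply: */
static inline int list_is_empty(const struct link *head)
{
    return head->next == head;
}
```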