
C# String Builder library, 70% faster and 72% less memory allocation - maverickeye
https://github.com/justinamiller/LiteStringBuilder
======
jbverschoor
I dunno. I still haven’t seen an implementation of a string builder which
leverages the immutability of strings in modern languages.

String building is mainly about concatenating objects. Most of the time they
are actual strings; other times they are objects that get transformed into
strings. As long as the object is immutable, you can do the transformation at
a later stage.

You can keep appending the strings to an array or linked list, and only make
the built string concrete when it’s asked for: in Java that’s toString(), or
when comparing strings.

The benefit is that you don’t have to allocate and reallocate (and fragment)
memory on every append, because by the time you build the string you know
exactly what capacity is required.

You can extend append() to accept any immutable object, but strings and numbers
make the most sense.
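A minimal Python sketch of the lazy builder described above, assuming every appended object is immutable so its conversion to text can safely be deferred. `LazyStringBuilder`, `append`, and `build` are hypothetical names for illustration, not LiteStringBuilder's API. The single final copy relies on CPython's `str.join`, which computes the total length of the parts before allocating the result.

```python
from typing import List


class LazyStringBuilder:
    """Hypothetical sketch: parts are stored as-is and only joined on build().

    Assumes appended objects are immutable, so converting them to text at
    build time gives the same result as converting them at append time.
    """

    def __init__(self) -> None:
        self._parts: List[object] = []

    def append(self, value: object) -> "LazyStringBuilder":
        # Amortized O(1): store a reference, copy no characters yet.
        self._parts.append(value)
        return self

    def build(self) -> str:
        # Deferred conversion: non-strings (e.g. numbers) become text only now.
        texts = [p if isinstance(p, str) else str(p) for p in self._parts]
        # In CPython, str.join sums the part lengths first, allocates the
        # result once, then copies each part into it.
        return "".join(texts)

    def __str__(self) -> str:
        # Plays the role of Java's toString().
        return self.build()
```

Usage: `LazyStringBuilder().append("x = ").append(42).build()` returns `"x = 42"`.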

~~~
mikeeeek
Interesting idea, although it's not obvious this would have the intended effect.
E.g., consider a string search across 10 different string objects (thus 10
different backing arrays) vs. a search across a single array. Cache misses are
much more likely when you hit 10 different arrays at potentially 10 different
locations in memory.

Also, this implies some copy-on-write semantics. At first glance, this makes the
code significantly more complex.

~~~
jbverschoor
You can always make the string concrete for certain operations. Most of the
time, string building is just concatenation.
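One way to read "make the string concrete for certain operations", sketched under assumptions: appends stay lazy, but a search first flattens the parts into one contiguous string, caches it, and collapses the part list so later appends build on a single piece. That way repeated searches scan one buffer rather than many scattered ones. `CachingLazyBuilder` and `find` are hypothetical names; this is one possible design, not a reference implementation.

```python
from typing import List, Optional


class CachingLazyBuilder:
    """Hypothetical sketch: lazy appends, contiguous text for searches."""

    def __init__(self) -> None:
        self._parts: List[str] = []
        self._flat: Optional[str] = None  # cached concrete string, if any

    def append(self, s: str) -> "CachingLazyBuilder":
        self._parts.append(s)
        self._flat = None  # any append invalidates the cached string
        return self

    def _materialize(self) -> str:
        if self._flat is None:
            self._flat = "".join(self._parts)
            # Collapse the parts so the next build starts from one piece.
            self._parts = [self._flat]
        return self._flat

    def find(self, needle: str) -> int:
        # The search runs over one contiguous string, so matches that span
        # part boundaries are found without any special handling.
        return self._materialize().find(needle)

    def __str__(self) -> str:
        return self._materialize()
```

Usage: after appending `"hello, "`, `"wor"`, and `"ld"`, `find("world")` returns `7` even though the match spans two parts.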

------
mikeeeek
Briefly looking at the code:

1. The data structures don't support efficient insertion/removal at arbitrary positions in the string.
2. The "replace" method is O(n), where n is the length of the string.
3. String searching inside replace is suboptimal. E.g., given a buffer of "aaaa", replace("aaab", "xxxx") needs 4 comparisons to determine there is no match.

1 is easily rectified by using a gap buffer, or possibly a rope or a piece
table, but to be honest, a gap buffer is so simple.

For 2, given a more efficient backing store (e.g., gap buffer), you should be
able to eliminate the O(n) buffer copy.
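A minimal gap buffer sketch in Python for points 1 and 2 (a hypothetical class, not code from the library). The text sits in one array with an unused gap at the edit position, so inserting or deleting there is cheap, and `replace` becomes a delete plus an insert whose cost depends on how far the gap moves and on the length of the new text, not on a full-buffer copy. Moving the gap a long distance is still linear in that distance.

```python
from typing import List


class GapBuffer:
    """Hypothetical gap buffer sketch (not code from LiteStringBuilder).

    Layout: buf[:gap_start] is text, buf[gap_start:gap_end] is free space,
    buf[gap_end:] is the rest of the text.
    """

    def __init__(self, capacity: int = 16) -> None:
        self._buf: List[str] = [""] * capacity
        self._gap_start = 0
        self._gap_end = capacity

    def __len__(self) -> int:
        # Text length = buffer size minus gap size.
        return len(self._buf) - (self._gap_end - self._gap_start)

    def _move_gap(self, pos: int) -> None:
        # O(|pos - gap_start|): only the characters between the old and the
        # new gap position are moved across the gap.
        while self._gap_start > pos:
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self, needed: int) -> None:
        # Grow by max(needed, current size) so the buffer at least doubles,
        # keeping the text after the gap at the end of the new buffer.
        extra = max(needed, len(self._buf))
        head = self._buf[: self._gap_start]
        tail = self._buf[self._gap_end :]
        gap = self._gap_end - self._gap_start + extra
        self._buf = head + [""] * gap + tail
        self._gap_end += extra

    def insert(self, pos: int, text: str) -> None:
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        self._move_gap(pos)
        if self._gap_end - self._gap_start < len(text):
            self._grow(len(text))
        for ch in text:
            self._buf[self._gap_start] = ch
            self._gap_start += 1

    def delete(self, pos: int, count: int) -> None:
        if pos < 0 or count < 0 or pos + count > len(self):
            raise IndexError("delete range out of range")
        self._move_gap(pos)
        # Deleted characters are simply absorbed into the gap; nothing is copied.
        self._gap_end += count

    def replace(self, pos: int, count: int, text: str) -> None:
        # Delete + insert at the same position: no full-buffer copy.
        self.delete(pos, count)
        self.insert(pos, text)

    def __str__(self) -> str:
        return "".join(self._buf[: self._gap_start] + self._buf[self._gap_end :])
```

The trade-off is that the gap only makes edits cheap near the previous edit; for scattered edits across a huge text, a rope or piece table avoids the gap-moving cost.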

For 3, you can implement Boyer-Moore string search.
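For point 3, here is a sketch of Boyer-Moore-Horspool, a simplified Boyer-Moore variant that uses only the bad-character shift table (full Boyer-Moore also applies the good-suffix rule). It compares the pattern right to left and counts character comparisons, so the "aaaa" / "aaab" case from point 3 is rejected after 1 comparison instead of 4. `horspool_find` is a hypothetical helper name.

```python
from typing import Dict, Tuple


def horspool_find(text: str, pattern: str) -> Tuple[int, int]:
    """Boyer-Moore-Horspool search (bad-character rule only).

    Returns (index of first match or -1, number of character comparisons).
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0, 0
    # Shift table: for each character in pattern[:-1], the distance from its
    # last occurrence to the end of the pattern. Unlisted characters shift by m.
    shift: Dict[str, int] = {pattern[i]: m - 1 - i for i in range(m - 1)}
    comparisons = 0
    pos = 0
    while pos <= n - m:
        j = m - 1
        # Compare right to left against the current alignment.
        while j >= 0:
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Shift using the text character under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return -1, comparisons
```

Usage: `horspool_find("aaaa", "aaab")` returns `(-1, 1)`.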

Gap Buffer:
[https://en.wikipedia.org/wiki/Gap_buffer](https://en.wikipedia.org/wiki/Gap_buffer)

Rope:
[https://en.wikipedia.org/wiki/Rope_%28data_structure%29](https://en.wikipedia.org/wiki/Rope_%28data_structure%29)

Piece Table:
[https://en.wikipedia.org/wiki/Piece_table](https://en.wikipedia.org/wiki/Piece_table)

Boyer-Moore: [https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-sea...](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm)

------
0xcoffee
I would love to hear the thoughts of some MSFT people who work on the .NET Core
teams about these libraries, because the community seems to have really amazing
things to contribute, such as this string builder library.

.NET Core has seen amazing improvements compared to .NET Framework, yet it often
still falls behind existing community offerings.

E.g., the Utf8Json library: [https://michaelscodingspot.com/the-battle-of-c-to-json-seria...](https://michaelscodingspot.com/the-battle-of-c-to-json-serializers-in-net-core-3/)

It outperforms the new .NET Core JSON library.

HyperLinq: [https://medium.com/@antao.almada/netfabric-hyperlinq-zero-al...](https://medium.com/@antao.almada/netfabric-hyperlinq-zero-allocation-fe5d0dd6b1a6)

It is basically LINQ with zero allocations.

What is blocking MS from integrating these approaches into .NET Core?

~~~
apk-d
There are usually trade-offs when it comes to these highly optimized
libraries. Utf8Json doesn't have nearly as many features as more general-purpose
serializers. HyperLinq requires you to write boilerplate code for each custom
collection you wish to support, and won't work on AOT platforms (such as Unity
targeting the IL2CPP runtime, which happens to be the use case that would
actually benefit most from such optimization).

------
kristianp
I'm interested to know what the benchmark code is and how it works. It sounds
cool that it can beat StringBuilder at such a seemingly simple thing as
buffering string segments.

~~~
pixelbath
Here is the benchmark code:
[https://github.com/justinamiller/LiteStringBuilder/tree/mast...](https://github.com/justinamiller/LiteStringBuilder/tree/master/perf/Benchmark)

