Maybe an additional "lower resolution" index on top of the map could help as a heuristic to skip whole regions, then using the map to double check possible matches. I have a rough idea that I think could possibly work:
Maintain the sum of booked time within each coarser time bucket, e.g. 1 hour or something (ideally a power of two though so you can use shifts to find the index for a timestamp), and keep that in a flat array. If you're looking for a slot that's less than 2x the coarser interval here, e.g. a 90-minute one, look for look for two consecutive buckets with a combined sum <= 30 minutes. 2 hours or more is guaranteed to fully overlap a 1 hour bucket, so look for completely empty buckets, etc. These scans can be done with SIMD.
When candidate buckets are found, use the start timestamp of the bucket (i*1hr) to jump into map and start scanning the map there, up to the end of the possible range of buckets. The index can confidently skip too full regions but doesn't guarantee there's a contiguous slot of the right size. I don't have a great sense of how much this would filter out in practice, but maybe there's some tuning of the parameters here that works for your case.
Updates should be relatively cheap since the ranges don't overlap, just +/- the duration of each booking added/removed. But it would have to deal with bookings that overlap multiple buckets. But, the logic is just arithmetic on newly inserted/deleted values which are already in registers at least, rather than scanning parts of the map, e.g. if you wanted to maintain the min/max value in each bucket and support deletes.
> Maybe an additional "lower resolution" index on top of the map could help as a heuristic to skip whole regions
That sounds a bit like circling back around to a tree-based system (albeit with depth=2) where each parent is somehow summarizing the state of child nodes below it.
Yeah it's definitely similar to having parent nodes summarize their children. The motivation for the flat array structure was to have an entirely fixed-size array that's contiguous in memory to make it friendly for SIMD filtering. Having the data in the tree on the other hand could probably filter a little better since it can summarize at multiple levels, but my bet is that the flat array allows for a faster implementation for the case here with a few thousand intervals. On the one hand, the whole working set should probably fit in L1 cache for both the tree or a flat array, but on the other hand, a tree with pointers needs to do some pointer chasing. The tree could alternatively be represented in a flat array, but then elements need to be shuffled around. I don't know which is faster in practice for this problem, but my gut says the speed of a flat array + SIMD probably outweighs potential asymptotic improvements for the few thousand case
I don't think I've put the same amount of thought into it, but my gut feeling is:
1. If you're summarizing "a contiguous chunk exists from A to B", that would require a mutable rebalancing tree and the borders constantly shift and merge and subdivide.
2. If you're instead just summarizing "the range between constant indices X-Y is [1/0/both] right now", then that can be done with a fixed tree arrangement that can have an exact layout in memory.
A simple example of the latter would be 8 leaves of 01110000, 4 parents of [both, 1, 0, 0], 2 grandparents of [both, 0], etc. The 3-value aspects could also be encoded into two bitmasks, if "has a 1" and "has a 0".
Maintain the sum of booked time within each coarser time bucket, e.g. 1 hour or something (ideally a power of two though so you can use shifts to find the index for a timestamp), and keep that in a flat array. If you're looking for a slot that's less than 2x the coarser interval here, e.g. a 90-minute one, look for look for two consecutive buckets with a combined sum <= 30 minutes. 2 hours or more is guaranteed to fully overlap a 1 hour bucket, so look for completely empty buckets, etc. These scans can be done with SIMD.
When candidate buckets are found, use the start timestamp of the bucket (i*1hr) to jump into map and start scanning the map there, up to the end of the possible range of buckets. The index can confidently skip too full regions but doesn't guarantee there's a contiguous slot of the right size. I don't have a great sense of how much this would filter out in practice, but maybe there's some tuning of the parameters here that works for your case.
Updates should be relatively cheap since the ranges don't overlap, just +/- the duration of each booking added/removed. But it would have to deal with bookings that overlap multiple buckets. But, the logic is just arithmetic on newly inserted/deleted values which are already in registers at least, rather than scanning parts of the map, e.g. if you wanted to maintain the min/max value in each bucket and support deletes.