Hacker News new | past | comments | ask | show | jobs | submit | ashf023's comments login

This immediately popped into my head! I remember it blowing my mind years ago


A few rough ideas for bitsets:

- To find runs of 15+ 1s you can find bytes with the value 255 using SIMD eq

- For windows of width N you could AND the bitmap with itself shifted by N to find spots where there's an opening at index i AND index i+N, then check that those openings are actually contiguous

- You can use count-trailing-zeros (generally very fast on modern CPUs) to quickly find the positions of set bits. For more than 1 searched bit in a word, you can do x = x & (x-1) to unset the last bit and find the next one. This would require you search for 1's and keep the bitmap reversed. You can of course skip any words that are fully 0.

- In general, bitmaps let you use a bunch of bit hacks like in https://graphics.stanford.edu/~seander/bithacks.html


Thanks for the ideas! I did try lzcnt/tzcnt with bitshifts to find contiguous intervals across elements, but I found that it tended to be more expensive than searching the map depending on the density. I think it's a really promising approach though, and I've like to figure out a way to make it work better for my test case.


Gotcha. I wasn't quite sure from your post, are you storing free slots in the map, or booked slots? And do you know, for the cases that aren't fast enough, how much they tend to search through? Like are there just a ton of bookings before the first free slot, or are the slow cases maybe trying to find very wide slots that are just hard to find?


I'm storing free slots right now, but either would be ok if a data structure would benefit from the other representation.

The slow cases usually tend to be trying to find wider slots and skipping through many smaller slots. There are often a lot of bookings up to the first wide enough slot too though, so it's a little bit of both.


Maybe an additional "lower resolution" index on top of the map could help as a heuristic to skip whole regions, then using the map to double check possible matches. I have a rough idea that I think could possibly work:

Maintain the sum of booked time within each coarser time bucket, e.g. 1 hour or something (ideally a power of two though so you can use shifts to find the index for a timestamp), and keep that in a flat array. If you're looking for a slot that's less than 2x the coarser interval here, e.g. a 90-minute one, look for look for two consecutive buckets with a combined sum <= 30 minutes. 2 hours or more is guaranteed to fully overlap a 1 hour bucket, so look for completely empty buckets, etc. These scans can be done with SIMD.

When candidate buckets are found, use the start timestamp of the bucket (i*1hr) to jump into map and start scanning the map there, up to the end of the possible range of buckets. The index can confidently skip too full regions but doesn't guarantee there's a contiguous slot of the right size. I don't have a great sense of how much this would filter out in practice, but maybe there's some tuning of the parameters here that works for your case.

Updates should be relatively cheap since the ranges don't overlap, just +/- the duration of each booking added/removed. But it would have to deal with bookings that overlap multiple buckets. But, the logic is just arithmetic on newly inserted/deleted values which are already in registers at least, rather than scanning parts of the map, e.g. if you wanted to maintain the min/max value in each bucket and support deletes.


> Maybe an additional "lower resolution" index on top of the map could help as a heuristic to skip whole regions

That sounds a bit like circling back around to a tree-based system (albeit with depth=2) where each parent is somehow summarizing the state of child nodes below it.


Yeah it's definitely similar to having parent nodes summarize their children. The motivation for the flat array structure was to have an entirely fixed-size array that's contiguous in memory to make it friendly for SIMD filtering. Having the data in the tree on the other hand could probably filter a little better since it can summarize at multiple levels, but my bet is that the flat array allows for a faster implementation for the case here with a few thousand intervals. On the one hand, the whole working set should probably fit in L1 cache for both the tree or a flat array, but on the other hand, a tree with pointers needs to do some pointer chasing. The tree could alternatively be represented in a flat array, but then elements need to be shuffled around. I don't know which is faster in practice for this problem, but my gut says the speed of a flat array + SIMD probably outweighs potential asymptotic improvements for the few thousand case


I don't think I've put the same amount of thought into it, but my gut feeling is:

1. If you're summarizing "a contiguous chunk exists from A to B", that would require a mutable rebalancing tree and the borders constantly shift and merge and subdivide.

2. If you're instead just summarizing "the range between constant indices X-Y is [1/0/both] right now", then that can be done with a fixed tree arrangement that can have an exact layout in memory.

A simple example of the latter would be 8 leaves of 01110000, 4 parents of [both, 1, 0, 0], 2 grandparents of [both, 0], etc. The 3-value aspects could also be encoded into two bitmasks, if "has a 1" and "has a 0".


But the article talks about how the men are basically doing whatever they want and women are left as single mothers with multiple kids...


Independence achieved at last, for men and women.


What kind of bum doesn't take care of your kids? That's no independence. That's a dereliction of duty.

Pure incel talk. Pure loser talk.

Who wants to go through life being a loser?


Incels with kids? Get your narrative straight before you blow a gasket.


That implies equal choice / responsibility for the kids. If men aren't required to contribute half the work -- or at least some financial equivalent -- then it's not really independence for women at all.

And if the man discourages the woman from developing her own career, then leaves, she's been denied twice.


And so atomized and isolated, we become easier pickings for big capital


There's a biological reality. Which btw, has been greatly tamed with contraceptives and what not, so the only thing left is individual failure.


Financial independence for the feminist movement was always about divesting from a husband, not relying on a husband.


Is this different than existing instruction level parallelism and hyperthreading?

> [CPUs] primary limitation is that as serial rather than parallel processors, they can only do one thing at a time. Of course, they switch that thing a billion times a second across multiple cores and pathways

This is from the article, not the company, but unless I'm misreading it, it's just wrong


Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: