
I'm a bit confused. Are you now storing the IPv4 addresses in a hash table using a 64-bit hash?

Why not just use the 32-bit address as the key and grow the 'blocks', so that if two addresses are only a few digits apart, they get promoted to a /24 block, etc.?




Apologies, maybe I oversimplified the original problem. I'm dealing with IPs (both v4 and v6), subnets, and ranges (which may or may not align to subnets). These map to one or more datacenter numbers.

I could indeed define a data model, parse the data thoroughly, optimize the in-memory data structure, and so on. That requires a rigid data structure, a known access pattern, and an understanding of the problem space. I'm not there yet. Instead, I created this generic tool, which works with any text file, and fell into a rabbit hole of over-optimizing it. That's it.


FWIW, you should be able to represent individual IPs, ranges, and subnets all in CIDR notation, though for ranges you may need multiple CIDR entries to cover the whole range.
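
To illustrate the point about ranges needing several CIDR entries, here is a minimal Python sketch using the standard library's `ipaddress.summarize_address_range`. The helper name `range_to_cidrs` and the sample addresses are made up for illustration, not taken from the tool under discussion.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Return the CIDR blocks that exactly cover first..last (inclusive)."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    # summarize_address_range yields the networks covering the range.
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

# A single IP is just a /32.
single = range_to_cidrs("192.0.2.7", "192.0.2.7")
# A range that lines up with a subnet collapses to one entry.
aligned = range_to_cidrs("192.0.2.0", "192.0.2.255")
# A range that does not line up needs several entries.
unaligned = range_to_cidrs("192.0.2.1", "192.0.2.6")

print(single)     # ['192.0.2.7/32']
print(aligned)    # ['192.0.2.0/24']
print(unaligned)  # ['192.0.2.1/32', '192.0.2.2/31', '192.0.2.4/31', '192.0.2.6/32']
```

The same function accepts IPv6 addresses, as long as both endpoints are the same IP version.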

CIDR for IPv4 consists of the 32-bit address and a 32-bit mask, so with some bit packing you can uniquely represent them in 64 bits without hashing.
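
A rough sketch of that bit packing in Python, assuming the address goes in the high 32 bits and the mask in the low 32 bits. The layout and the names `pack_cidr_v4` / `unpack_cidr_v4` are my own choices for illustration.

```python
import ipaddress

def pack_cidr_v4(cidr):
    """Pack an IPv4 CIDR into one 64-bit integer: address high, mask low."""
    # strict=False clears any host bits, so "10.1.2.3/8" becomes 10.0.0.0/8.
    net = ipaddress.IPv4Network(cidr, strict=False)
    return (int(net.network_address) << 32) | int(net.netmask)

def unpack_cidr_v4(key):
    """Recover the CIDR string from a packed 64-bit key."""
    addr = key >> 32
    mask = key & 0xFFFFFFFF
    # A valid netmask is a run of 1 bits, so counting them gives the prefix length.
    prefixlen = bin(mask).count("1")
    return f"{ipaddress.IPv4Address(addr)}/{prefixlen}"

key = pack_cidr_v4("10.0.0.0/8")
print(hex(key))             # 0xa000000ff000000
print(unpack_cidr_v4(key))  # 10.0.0.0/8
```

Because the mapping is one-to-one, the packed integer can serve directly as a key with no chance of collision.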

The problem you’ll run into there is doing a “contains” check for an origin IP against a list of CIDRs, but I assume you already need to do that since you’re dealing with subnets.
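
For what it's worth, the naive version of that “contains” check is a linear scan with one AND and one compare per entry. A minimal sketch, where the function names and the sample list are hypothetical:

```python
import ipaddress

def parse_cidr(cidr):
    """Turn an IPv4 CIDR string into a (network, mask) pair of integers."""
    net = ipaddress.IPv4Network(cidr)
    return int(net.network_address), int(net.netmask)

def contains(cidrs, ip):
    """Return True if ip falls inside any (network, mask) pair in cidrs."""
    addr = int(ipaddress.IPv4Address(ip))
    # An address is inside a block when masking it yields the network address.
    return any((addr & mask) == network for network, mask in cidrs)

cidrs = [parse_cidr("10.0.0.0/8"), parse_cidr("192.168.1.0/24")]
print(contains(cidrs, "10.20.30.40"))  # True
print(contains(cidrs, "192.168.2.1"))  # False
```

This is O(n) per lookup, which is exactly the cost being pointed out; a trie or per-prefix-length tables would be the usual way to bring it down.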


A 32-bit mask is way too generous; you only need a 6-bit mask length (prefix lengths run from 0 to 32). None of that matters, though, since they have v6 addresses and ranges.


It would save a lot of space to just keep a separate list for each mask length, etc.
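
One way to read that suggestion, sketched in Python for IPv4 only: keep one table per prefix length, so the mask is implied by which table an entry lives in and never has to be stored per entry. The class name `PrefixTables` and the datacenter numbers are invented for illustration.

```python
import ipaddress
from collections import defaultdict

class PrefixTables:
    def __init__(self):
        # prefix length -> {network address as int -> set of datacenter numbers}
        self.tables = defaultdict(dict)

    def add(self, cidr, datacenter):
        net = ipaddress.IPv4Network(cidr)
        table = self.tables[net.prefixlen]
        table.setdefault(int(net.network_address), set()).add(datacenter)

    def lookup(self, ip):
        """Return the datacenters of the most specific block containing ip."""
        addr = int(ipaddress.IPv4Address(ip))
        # Try the longest prefixes first so the most specific match wins.
        for prefixlen in sorted(self.tables, reverse=True):
            mask = (0xFFFFFFFF << (32 - prefixlen)) & 0xFFFFFFFF
            hit = self.tables[prefixlen].get(addr & mask)
            if hit is not None:
                return hit
        return set()

t = PrefixTables()
t.add("10.0.0.0/8", 1)
t.add("10.1.0.0/16", 2)
t.add("10.1.0.0/16", 3)
print(t.lookup("10.1.2.3"))   # {2, 3}
print(t.lookup("10.2.0.1"))   # {1}
print(t.lookup("192.0.2.1"))  # set()
```

A lookup costs at most one hash probe per prefix length in use (33 at most for IPv4), regardless of how many entries each table holds.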



