I'm a bit confused, you are now storing the IPv4 addresses in a hash table using...

majke · on March 2, 2020

Apologies, maybe I oversimplified the original problem. I'm dealing with IPs (both v4 and v6), subnets, and ranges (which may or may not align to subnets). These map to one or more datacenter numbers.

I could indeed define a data model, parse the data thoroughly, optimize the in-memory data structure, and so on. That requires a rigid data structure, knowing the access pattern, and understanding the problem space. I'm not there yet. Instead, I created this generic tool which works with any text files, and fell into a rabbit hole of over-optimizing it. That's it.

taywrobel · on March 2, 2020

FWIW, you should be able to represent individual IPs, ranges, and subnets all in CIDR notation, though for ranges you may need multiple CIDR entries to reflect the whole range.
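
As a rough illustration of the range case, here is a minimal sketch using Python's standard `ipaddress` module. The helper name `range_to_cidrs` is made up for this example; `summarize_address_range` is the stdlib call doing the work.

```python
import ipaddress

def range_to_cidrs(first: str, last: str) -> list[str]:
    """Cover an inclusive IP range with the smallest set of CIDR blocks.

    Hypothetical helper; works for IPv4 and IPv6 as long as both ends
    are the same address family.
    """
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

# A range that does not align to a subnet boundary needs several entries.
print(range_to_cidrs("192.0.2.0", "192.0.2.130"))
# ['192.0.2.0/25', '192.0.2.128/31', '192.0.2.130/32']
```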

CIDR for IPv4 consists of the 32-bit address and a 32-bit mask, so with some bit packing you can uniquely represent them in 64 bits without hashing.
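
One possible packing, sketched in Python: address in the high 32 bits, mask in the low 32 bits. The function names `pack_cidr` and `unpack_cidr` and the bit layout are assumptions for illustration, not something the tool does today.

```python
import ipaddress

def pack_cidr(cidr: str) -> int:
    """Pack an IPv4 CIDR into one 64-bit integer (assumed layout:
    network address in the high 32 bits, netmask in the low 32 bits)."""
    net = ipaddress.IPv4Network(cidr, strict=False)
    return (int(net.network_address) << 32) | int(net.netmask)

def unpack_cidr(key: int) -> ipaddress.IPv4Network:
    """Reverse of pack_cidr."""
    addr = key >> 32
    mask = key & 0xFFFFFFFF
    # A valid netmask is a run of leading ones, so counting the set
    # bits gives the prefix length.
    prefix_len = bin(mask).count("1")
    return ipaddress.IPv4Network((addr, prefix_len))

key = pack_cidr("10.0.0.0/8")
print(hex(key))          # 0xa000000ff000000
print(unpack_cidr(key))  # 10.0.0.0/8
```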

The problem you’ll run into there is doing a “contains” check on an origin IP for a list of CIDRs, but you’ll need to do that currently since you’re dealing with subnets, I assume.
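
The naive version of that "contains" check is a linear scan: mask the origin IP with each entry's netmask and compare against the network address. A minimal sketch, with `contains` being a hypothetical name and IPv4 assumed throughout:

```python
import ipaddress

def contains(cidrs: list[str], ip: str) -> bool:
    """Return True if `ip` falls inside any of the given IPv4 CIDRs.

    Linear scan, O(number of CIDRs) per lookup; fine for a sketch,
    slow for large lists.
    """
    addr = int(ipaddress.IPv4Address(ip))
    for cidr in cidrs:
        net = ipaddress.IPv4Network(cidr)
        if (addr & int(net.netmask)) == int(net.network_address):
            return True
    return False

cidrs = ["10.0.0.0/8", "192.0.2.128/31"]
print(contains(cidrs, "10.20.30.40"))  # True
print(contains(cidrs, "192.0.2.130"))  # False
```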

jsn · on March 2, 2020

A 32-bit mask is way too generous; you only need the mask length, which fits in 6 bits (prefix lengths 0 through 32). It all doesn't matter though, since they have v6 addresses and ranges.

willvarfar · on March 2, 2020

Would save a lot of space to just have separate lists for each mask length etc.
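
A rough sketch of that idea in Python, assuming IPv4 only: one set of network addresses per mask length, so the mask is never stored per entry and a lookup is at most 33 set probes. The class name `PrefixSets` and its methods are made up for illustration.

```python
import ipaddress
from collections import defaultdict

class PrefixSets:
    """One set of IPv4 network addresses per prefix length (hypothetical)."""

    def __init__(self) -> None:
        # prefix length -> set of network addresses stored as ints
        self.by_len: dict[int, set[int]] = defaultdict(set)

    def add(self, cidr: str) -> None:
        net = ipaddress.IPv4Network(cidr)
        self.by_len[net.prefixlen].add(int(net.network_address))

    def contains(self, ip: str) -> bool:
        addr = int(ipaddress.IPv4Address(ip))
        for plen, nets in self.by_len.items():
            # Rebuild the netmask from the length; a /0 yields mask 0.
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            if (addr & mask) in nets:
                return True
        return False

sets = PrefixSets()
sets.add("10.0.0.0/8")
sets.add("192.0.2.130/32")
print(sets.contains("10.1.2.3"))     # True
print(sets.contains("192.0.2.131"))  # False
```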