I've only had a chance to skim so far, but this is an absolutely superb survey of an enormous space. It doesn't just list out a bunch of data structures, it actully helps you build a mental model of all the concerns to think about when building or using data structures in an application.
More recommendations on this topic? Paper's great. I also know about Martin Klepmann's Designing data intensive applications but it's less about data structures and more about databases.
It does not mention Array of Structures and Structure of Arrays dichotomy, which is extremely important if you design a structure to hold data for analytics of any kind.
6.1 talks about tradeoffs and reasons for row vs column oriented storage. So they do discuss it, just not in the array of struct and struct of array terms.
I Introduction 2
II Performance Metrics and Operational Tradeoffs 9
III Dimensions of the Data Structure Design Space 21
IV From Workloads to Data Structures 67
V Adaptivity: Evolving Data Structures to a Workload 93
VI Data Structures for Specific Application Domains 109
VII Challenging Design Considerations 124
VIII Summary 132
(along the way I saw some interesting figures, so, for me, this exercise was worth the time)
Yes I was going to do the same but wanted to learn a general way to do this task along the way too and get more details (second level sections as well).
======================
Step 1:
Use pdftk to extract metadata
> pdftk input.pdf dump_data output > raw.txt
Step 2:
copy relevant sections of metadata "Bookmarks" into a new tetx file.
Step 3:
Upload this text file and ask ChatGPT to format it as you want -- markdown / HTML / plain text.
Took maybe 10 minutes and this was also worth it for a different reason (learning one more bit about pdftk capabilities).
=======================
CONTENTS
1. Introduction (2)
1.1. Data Structures Are Foundational (2)
1.2. Tradeoffs in Data Structure Design (4)
1.3. Audience & Prerequisites (7)
1.4. Learning Outcomes (8)
1.5. Overview of the Book (8)
2. Performance Metrics and Operational Tradeoffs (9)
2.1. Memory Hierarchy (9)
2.2. From Read/Update to RUM: Memory & Space Costs (10)
2.3. RUM Performance Costs (11)
2.4. From RUM to PyRUMID (14)
2.5. Chapter Summary (17)
2.6. Questions (18)
2.7. Further Readings (19)
3. Dimensions of the Data Structure Design Space (21)
3.1. Global Data Organization (24)
3.2. Global Search Algorithms When Not Using an Index (26)
3.3. Search When Using an Index (32)
3.4. Local Data Organization (46)
3.5. Local Search (48)
3.6. Modification Policy: In-place vs. Out-of-place (48)
3.7. Buffering (53)
3.8. Key-Value Representation (55)
3.9. Summary of the Design Space Dimensions (57)
3.10. Data Structure Design Expert Rules (58)
3.11. Chapter Summary (61)
3.12. Questions (61)
3.13. Further Readings (64)
4. From Workloads to Data Structures (67)
4.1. Point and Range Queries, Modifications, but Rare Scans (Traditional B+-trees and Learned Tree Indexes) (68)
4.2. Similar Workload With a Working Set That Fits in Memory (Fractal B+-trees) (70)
4.3. Point and Range Queries, Rare Scans, More Modifications (Insert-Optimized Search Trees) (70)
4.4. Mixed Workload With No Short Range Queries (Hybrid Range Trees) (75)
4.5. Mixed Workload, With Ever Increasing Data Size (Radix Trees) (78)
4.6. Point Queries, Inserts, and Some Modifications (Static Hashing with Overflow Pages) (79)
4.7. Read-mostly With Long Range Queries (Scans with Zonemaps) (81)
4.8. Modification-intensive With Point and Range Queries (LSM-tree) (82)
4.9. Modification-intensive With Point Queries Only (LSM-hash) (84)
4.10. When to Design Heterogeneous Data Structures (85)
4.11. Data Structures in Practice (86)
4.12. Chapter Summary (88)
4.13. Questions (89)
4.14. Further Readings (92)
5. Adaptivity: Evolving Data Structures to a Workload (93)
5.1. Design Dimension: Reorganization Aggressiveness (96)
5.2. Adaptivity for Frequently Accessed Data (96)
5.3. Adaptivity for Value-Organized Data (97)
5.4. Aggressiveness of Adaptivity during Initialization (100)
5.5. Partial Adaptive Indexing (101)
5.6. Adaptive Modifications (102)
5.7. Adaptivity and Concurrency (103)
5.8. Adaptivity Metrics (104)
5.9. Open Topics (105)
5.10. Chapter Summary (106)
5.11. Questions (107)
6. Data Structures for Specific Application Domains (109)
6.1. Data Structures in Relational Database Systems (109)
6.2. File Systems & Memory Management (117)
6.3. Data Structures in Machine Learning Pipelines (118)
6.4. Cross-System Design Considerations and Tradeoffs (120)
6.5. Chapter Summary (121)
6.6. Questions (122)
6.7. Further Readings (123)
7. Challenging Design Considerations (124)
7.1. Concurrency (125)
7.2. Distributed Systems (127)
7.3. Emerging Workload Types (127)
7.4. Hardware Considerations in Data Structure Implementation (128)
7.5.Chapter Summary (129)
7.6. Questions (129)
7.7. Further Readings (130)
8. Summary (132)
Acknowledgments (134)
References (135)