
Reading through a codebase without a specific purpose is a very slow way to get up to speed.

The best advice I've heard is "start from the data structures, they are at the heart of everything".

I like inventing purposes while reading through code if I don't have a real one. I call it "scavenger hunting" and often use it as a programming exercise when teaching. For example, here is a scavenger hunt for CPython and Postgres: which lines of code would you need to touch to achieve each of these purposes?

https://github.com/python/cpython

In json module: add a parameter to load() and loads() that, when True, only allows spaces as whitespace characters in the JSON (and not tabs or newlines)
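To make the target behaviour concrete before hunting for the lines, here is a minimal sketch that wraps json.loads() from the outside instead of changing the module. The name loads_spaces_only is hypothetical, and the real exercise would mean finding where the decoder skips whitespace (in both the Python and the C implementation).

```python
import json


def loads_spaces_only(text):
    """Parse JSON, rejecting tab/newline/CR used as whitespace between tokens.

    Hypothetical helper: it pre-scans the text and then defers to json.loads().
    """
    in_string = False
    escaped = False
    for pos, ch in enumerate(text):
        if in_string:
            # Inside a string literal: only track escapes and the closing quote.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "\t\n\r":
            raise ValueError(f"non-space whitespace {ch!r} at position {pos}")
    return json.loads(text)
```

Calling loads_spaces_only('{"a": [1, 2]}') parses normally, while the same document with a newline between tokens raises ValueError.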

In re module: Python regex has syntax for a digit (\d) and for an alphanumeric or underscore character (\w). Add \l, meaning "any letter".
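One way to pin down what \l should match, without touching the re module, is to rewrite the pattern before compiling it. This is only a sketch: expand_letter_escape is a hypothetical helper, and it assumes "letter" means a word character that is neither a digit nor an underscore.

```python
import re

# Word characters minus digits and underscore, i.e. letters (assumed definition).
LETTER_CLASS = r"[^\W\d_]"


def expand_letter_escape(pattern):
    """Rewrite each unescaped \\l in a pattern to LETTER_CLASS.

    Limitation: a \\l inside a [...] character class is rewritten too and
    will not behave correctly there.
    """
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "\\" and i + 1 < len(pattern):
            # Consume the backslash together with the character it escapes.
            nxt = pattern[i + 1]
            out.append(LETTER_CLASS if nxt == "l" else "\\" + nxt)
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)
```

With this, re.fullmatch(expand_letter_escape(r"\l+\d"), "abc1") matches, while "ab_" does not match r"\l+".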

In re module: Python regex has syntax for hex escapes (e.g. \x0a), octal escapes (e.g. \012), and unicode escapes (e.g. \u000a). Add binary escapes (e.g. \q00001010)
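The intended semantics of the binary escape can be sketched by translating it to a hex escape before compiling. expand_binary_escapes is a hypothetical helper, and the fixed width of exactly eight binary digits is an assumption taken from the \q00001010 example.

```python
import re

# Either an escaped backslash (kept as is) or \q followed by exactly 8 bits.
_BINARY_ESCAPE = re.compile(r"(\\\\)|\\q([01]{8})")


def expand_binary_escapes(pattern):
    """Rewrite \\qBBBBBBBB (8 binary digits, assumed width) as the equivalent \\xHH."""

    def repl(match):
        if match.group(1):
            return match.group(1)  # escaped backslash: leave untouched
        return "\\x%02x" % int(match.group(2), 2)

    return _BINARY_ESCAPE.sub(repl, pattern)
```

For example, expand_binary_escapes(r"a\q00001010b") gives r"a\x0ab", which matches "a", a newline, then "b".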

In python C implementation: Replace the hash function used when inserting a string key to a dictionary

In python C implementation: Add an average() builtin function (similar to max() or sum())
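Before looking for where builtins live in the C source, it helps to write down the behaviour the new function should have. A pure-Python reference sketch, assuming that an empty input raises ValueError the way max() does (the task itself does not specify this):

```python
def average(iterable):
    """Reference behaviour for the proposed builtin: arithmetic mean of an iterable."""
    total = 0
    count = 0
    for value in iterable:
        total += value
        count += 1
    if count == 0:
        # Assumption: mirror max() and reject empty input instead of returning 0.
        raise ValueError("average() arg is an empty iterable")
    return total / count
```

It accepts any iterable, including generators, so average(x for x in (1, 2)) works without building a list first.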

https://github.com/postgres/postgres

Better string hashing algorithm for query plans that involve hashing a column

Better column combination hashing algorithm for query plans that involve hashing a column

Some queries will want a hash on (a, b) which involves calculating hash(a), hash(b), and then combining them into combination_hash(hash(a), hash(b))
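To show what a combination_hash could look like, here is a sketch of one simple scheme: rotate the running value left by one bit and XOR in the next column's hash. The 32-bit width and the rotate-and-XOR choice are assumptions for illustration, not a claim about what the Postgres source does; finding that out is the point of the hunt.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, bits):
    """Rotate a 32-bit value left by the given number of bits."""
    x &= MASK32
    return ((x << bits) | (x >> (32 - bits))) & MASK32


def combination_hash(hashes):
    """Combine per-column hashes into one 32-bit value (rotate-and-XOR sketch)."""
    combined = 0
    for h in hashes:
        combined = rotl32(combined, 1) ^ (h & MASK32)
    return combined
```

The result depends on column order: combination_hash([2, 1]) is 5, while combination_hash([1, 2]) is 0, the same as combination_hash([0, 0]). Collisions like that are the kind of weakness a "better" combiner would aim to reduce.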

Add a new kind of step to execution plans

e.g. a step that logs to a cloud logging service. No need to compile it into queries; just add the data structure for it and the code that runs it when relevant, so that it could be compiled in later

Add gzip compression as a way to avoid page splits

Today, when a page is about to be split, bottom-up deletion and deduplication are attempted to keep the page from splitting. We want to add gzip compression as a third way to try to avoid expensive page splits.

Add a complex number column type

The column should hold a float for the real part and a float for the imaginary part

Add a configuration flag to never use the covered index optimization

Add a new type of index

e.g. an index that runs Principal Component Analysis to index high-dimensional data. Don't worry about the details, just the integration points



