I liked one that involved implementing a graphing library for a memory-mapped bitmap display. It started with drawing a circle of radius R centered at (X, Y), then went through several iterations of optimization. You can get into a lot of interesting performance areas (caching, reusing computed values, loop unrolling, etc.).
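
For anyone curious what a first pass at the circle part might look like, here is a minimal sketch. It assumes a 64x64 display with one byte per pixel and uses a plain `bytearray` as a stand-in for the memory-mapped region. The names `WIDTH`, `HEIGHT`, `make_framebuffer`, `set_pixel`, and `draw_circle` are made up for illustration, and the midpoint circle algorithm is just one common approach, not necessarily what the original exercise expected. It does show the "reuse" idea: each computed point in one octant is mirrored into the other seven.

```python
# Hypothetical display geometry: 64x64 pixels, one byte per pixel.
WIDTH, HEIGHT = 64, 64


def make_framebuffer():
    """Return a zeroed buffer standing in for the memory-mapped display."""
    return bytearray(WIDTH * HEIGHT)


def set_pixel(fb, x, y, color=1):
    """Write one pixel, silently clipping anything off-screen."""
    if 0 <= x < WIDTH and 0 <= y < HEIGHT:
        fb[y * WIDTH + x] = color


def draw_circle(fb, cx, cy, r, color=1):
    """Draw a circle outline with the integer-only midpoint algorithm.

    Only one octant is computed; the other seven come from symmetry.
    """
    x, y = r, 0
    err = 1 - r  # decision variable: sign says whether to step x inward
    while x >= y:
        for dx, dy in ((x, y), (y, x), (-y, x), (-x, y),
                       (-x, -y), (-y, -x), (y, -x), (x, -y)):
            set_pixel(fb, cx + dx, cy + dy, color)
        y += 1
        if err <= 0:
            err += 2 * y + 1
        else:
            x -= 1
            err += 2 * (y - x) + 1


fb = make_framebuffer()
draw_circle(fb, 32, 32, 10)
```

From there the optimization rounds could go in several directions, such as replacing the per-pixel bounds check with a single up-front clip test, or writing horizontal runs instead of single pixels when filling.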

