I built a minimalist AI news aggregator while teaching myself NLP. It's a single static HTML page that updates every 4-6 hours with clustered AI news stories - no accounts, no cookies, no JS frameworks.
Technical approach:
- Python scraper for AI/tech sources
- Custom NLP pipeline with BERT embeddings to filter for AI-specific content
- Hierarchical clustering to group related stories
- ChatGPT API for generating cluster titles and short summaries
- Served as a static HTML page via Cloudflare Pages
- Lightweight analytics with GoatCounter and Umami (still comparing the two to decide which one to keep)
- Experimental JSON-based search (I'll consider a proper search backend if this scales)
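The post doesn't say exactly how the BERT embeddings are used to decide what counts as AI-specific. One common approach, sketched here purely as an assumption, is to compare each article's embedding against a few "prototype" embeddings of known AI stories and keep the article if its best cosine similarity clears a threshold. The embeddings are assumed to be precomputed by a BERT-style encoder; `is_ai_related`, the prototype vectors, and the 0.6 threshold are all hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_ai_related(article_emb: Vector, prototype_embs: Sequence[Vector],
                  threshold: float = 0.6) -> bool:
    """Keep an article if it is close enough to at least one AI 'prototype' embedding.

    `threshold` is a hypothetical default; a real value would be tuned on labeled articles.
    """
    if not prototype_embs:
        return False
    best = max(cosine_similarity(article_emb, p) for p in prototype_embs)
    return best >= threshold


# Toy 3-d vectors standing in for real BERT embeddings (768-d for BERT-base).
prototypes = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0]]
print(is_ai_related([0.9, 0.1, 0.0], prototypes))  # True: close to the first prototype
print(is_ai_related([0.0, 0.0, 1.0], prototypes))  # False: orthogonal to both
```

Raising `threshold` cuts false positives at the cost of dropping borderline stories, which is one concrete form of the coverage-vs-noise trade-off.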
The project started when I realized that, as a PM trying to track AI developments, I was wasting hours every day checking multiple sources. I built this over 3 months between work commitments.
Interesting challenges:
- Finding the right threshold for story similarity (still tuning this)
- Balancing comprehensive coverage with noise filtering
- Keeping the page lightweight while maintaining content density
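The similarity-threshold problem is easier to reason about with a concrete version of the clustering step. The sketch below is an assumption, not the site's actual code: a naive average-linkage agglomerative clustering over cosine distance (1 minus cosine similarity) that keeps merging the closest pair of clusters until no pair is closer than `max_distance`. The function names and threshold values are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0.0 means same direction, 1.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norms == 0.0 else 1.0 - dot / norms


def cluster_stories(embs: Sequence[Vector], max_distance: float) -> List[List[int]]:
    """Average-linkage agglomerative clustering with a distance cut-off.

    Returns clusters as sorted lists of story indices. Roughly O(n^3): fine for a
    few hundred stories per update, too slow for large corpora.
    """
    # Precompute pairwise distances between individual stories (keys have i < j).
    dist = {(i, j): cosine_distance(embs[i], embs[j])
            for i, j in combinations(range(len(embs)), 2)}

    def d(i: int, j: int) -> float:
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters: List[List[int]] = [[i] for i in range(len(embs))]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average pairwise distance.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            avg = sum(d(i, j) for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if best is None or avg < best[0]:
                best = (avg, a, b)
        avg, a, b = best
        if avg > max_distance:
            break  # every remaining pair is too dissimilar to be the same story
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)


# Two pairs of near-duplicate "stories" as toy 2-d embeddings.
stories = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.05, 0.95]]
print(cluster_stories(stories, max_distance=0.2))   # [[0, 1], [2, 3]]
print(cluster_stories(stories, max_distance=0.99))  # [[0, 1, 2, 3]]
```

Too high a cut-off merges unrelated stories into one cluster; too low leaves coverage of the same story split across several clusters, which is why the threshold needs tuning. For real workloads, scikit-learn's `AgglomerativeClustering` (with `n_clusters=None` and `distance_threshold`) or SciPy's `linkage`/`fcluster` with `criterion="distance"` would do the same job much faster.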
Would appreciate feedback on clustering accuracy, false positive/negative rates, and overall UX.
Link: https://currentai.news