I ran into Cloudflare’s Markdown for Agents and thought it was exactly what I needed for LLM web research. Then I realized it only helps when a site is on Cloudflare and has it enabled, so it doesn’t solve “open web” extraction.
I built a simple HTML→Markdown pipeline in Rust that works on any public URL (strip scripts/styles/boilerplate, preserve structure + links). On a 100-URL set it reduced input size by ~70–80% (often close to 80%).
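To make the "strip scripts/styles/boilerplate, preserve structure + links" step concrete, here is a minimal sketch in Python using only the standard library. It is not the Rust implementation from the repo. The `SKIP` tag set, the `MarkdownExtractor` class, and the `html_to_markdown` helper are hypothetical names chosen for illustration, and a real pipeline would need much more robust boilerplate detection than a fixed tag list.

```python
import re
from html.parser import HTMLParser

# Assumption: these tags are treated as non-content and dropped wholesale.
SKIP = {"script", "style", "noscript", "nav", "footer", "aside"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MarkdownExtractor(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # collected Markdown fragments
        self.skip_depth = 0  # > 0 while inside a skipped subtree
        self.href = None     # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif self.skip_depth:
            return
        elif tag in HEADINGS:
            self.out.append("\n\n" + HEADINGS[tag])
        elif tag in ("p", "ul", "ol"):
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            if self.href:
                self.out.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth:
            return
        elif tag == "a" and self.href:
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace so source indentation is not kept.
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    # Allow at most one blank line between blocks.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style>"
        "<script>var x = 1;</script></head><body>"
        "<nav><a href='/home'>Home</a></nav><h1>Title</h1>"
        '<p>See <a href="https://example.com/docs">the docs</a> for details.</p>'
        "<ul><li>one</li><li>two</li></ul>"
        "<footer>copyright</footer></body></html>"
    )
    print(html_to_markdown(page))
```

A fixed skip list like this is the crudest possible form of boilerplate removal, and an unclosed skipped tag would swallow the rest of the page, so treat it as a starting point only.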
Benchmark on the same 100 URLs:
Rust server mode: p50 ~0.4s, p95 ~1.3s, memory stable at ~100MB
Node baseline (JSDOM + Turndown): p50 ~1.2s, p95 ~50s, memory grew from hundreds of MB into the GB range
Scripts + methodology are in the repo: <link>
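For anyone comparing numbers, here is one way p50/p95 can be computed from per-URL latencies. This is a sketch using the nearest-rank definition; the `percentile` helper and the sample latencies are made up for illustration, and the scripts in the repo may use a different method (for example, interpolation).

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Multiply before dividing so integer percentiles avoid float rounding surprises.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # Hypothetical latencies in seconds; one slow page dominates the tail.
    latencies = [0.2, 0.3, 0.4, 0.5, 9.0]
    print("p50:", percentile(latencies, 50))
    print("p95:", percentile(latencies, 95))
```

With small sample sets the p95 is effectively the worst page or two, which is why a single pathological URL can move it so much.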
Curious what others use for boilerplate removal and how you keep p95 tails under control when parsing nasty pages.
Quick concrete example of the “Artifacts” bit (the part I’m most excited about):
I can point Tandem at a folder (docs / logs / research notes) and ask for a single HTML dashboard/report. It writes the file into the workspace, and you can preview it in-app before sharing.
Under the hood it’s “Plan Mode” first (propose steps + files to touch), then execute only after approval.
If you’re curious, which demo would you rather see next?
A) web research refresh → updated report
B) legal research pack → brief + citations
C) script studio → outline + beat sheet