Show HN: HTML to Markdown with CSS selector & XPath annotations for LLM Scraper

  • Posted 9 hours ago by andrew_zhong
  • 3 points
https://github.com/lightfeed/scrapedown
HTML-to-Markdown converters produce clean, readable content for both humans and LLMs — but the DOM structure is lost along the way. You can always feed Markdown to an LLM to extract structured information, but that costs tokens on every page, every time.

What if the LLM could also see where each piece of content lives in the DOM? Then it can generate robust scraping code — stable selectors and XPaths that run without any LLM in the loop, saving tokens and improving accuracy on long or repetitive pages.

Scrapedown does exactly this: it converts HTML to Markdown and annotates each element with its CSS selector and/or XPath, so an LLM can produce precise, reusable scraper code in one shot.

0 comments