We found some great tools online, but none reliably handled the entire process. We wanted an API that took a URL, crawled the pages under that URL, and gave us easy-to-use, up-to-date markdown we could feed into our index.
So, we released an open-source repo and an API that crawl entire websites and turn them into markdown with just a few lines of code.
The API handles:
- Crawling sites without consistent sitemaps
- Infra for running many crawling jobs
- Proxying and hosting headless browsers at scale
- Conversion to clean markdown
- Caching
- Handling images, videos (soon), and tables (soon)
- LLM extraction (soon)
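To give a feel for the markdown-conversion step, here's a toy sketch of an HTML-to-markdown pass using only the Python standard library. It handles just headings, paragraphs, and links; the real conversion pipeline is far more thorough, and this is only an illustration of the idea.

```python
from html.parser import HTMLParser

class TinyMarkdown(HTMLParser):
    """Toy converter: turns a small subset of HTML (h1-h3, p, a) into markdown."""
    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None  # href of the <a> tag currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            # Map heading level to the matching number of '#' characters
            self.out.append("\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            self.out.append(f"]({self.href})")
            self.href = None
        elif tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n")

    def handle_data(self, data):
        self.out.append(data)

def to_markdown(html: str) -> str:
    parser = TinyMarkdown()
    parser.feed(html)
    return "".join(parser.out)

md = to_markdown('<h1>Docs</h1><p>See <a href="https://example.com">here</a>.</p>')
```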
The repo is open source, and the hosted API starts free. It has built-in loaders for both @llama_index and @langchain.
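For a rough idea of what kicking off a crawl job could look like, here's a hypothetical sketch using only the standard library. The endpoint URL, field names, and header shape below are placeholders, not the actual API surface; check the repo for the real client and parameters.

```python
import json
import urllib.request

# Placeholder endpoint -- not the real API URL
API_URL = "https://api.example.com/v1/crawl"

def start_crawl(url: str, api_key: str) -> urllib.request.Request:
    """Build a hypothetical request that would submit a crawl job for `url`."""
    payload = json.dumps({"url": url}).encode()
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

req = start_crawl("https://docs.example.com", "YOUR_API_KEY")
# urllib.request.urlopen(req) would submit the job; the response would
# carry back clean markdown for each crawled page.
```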
Excited to see people try it