Show HN: Side-by-side PDF parser comparison for RAG pipelines

  • Posted 5 hours ago by 2dogsanerd
  • 1 points
https://github.com/2dogsandanerd/rag_pdf_audit
A simple tool to compare how different PDF parsers handle your documents.

Shows naive parsing (pypdf) vs layout-aware parsing (Docling) side-by-side.

Helps spot issues with scans, tables, and multi-column layouts before they cause problems in your RAG system.

Parsers are easy to swap if you want to try alternatives.
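
To illustrate the side-by-side idea and the swappable parsers, here is a minimal sketch. It is not the code from the repo: `compare_parsers`, `parse_with_pypdf`, `parse_with_docling`, and the report fields are hypothetical names chosen for this example. It assumes `pypdf` and `docling` are installed, and that a line starting with `|` in the output is a reasonable proxy for a recovered table row.

```python
from typing import Callable, Dict

# A parser is any callable that takes a PDF path and returns extracted text.
Parser = Callable[[str], str]


def parse_with_pypdf(path: str) -> str:
    """Naive extraction: join the raw text of every page."""
    from pypdf import PdfReader  # lazy import, only needed if this parser is used

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def parse_with_docling(path: str) -> str:
    """Layout-aware extraction: convert the document and export it as Markdown."""
    from docling.document_converter import DocumentConverter  # lazy import

    result = DocumentConverter().convert(path)
    return result.document.export_to_markdown()


def compare_parsers(path: str, parsers: Dict[str, Parser]) -> Dict[str, dict]:
    """Run each parser on the same file and collect rough comparison stats."""
    report: Dict[str, dict] = {}
    for name, parse in parsers.items():
        try:
            text = parse(path)
        except Exception as exc:  # one failing parser should not hide the others
            report[name] = {"error": repr(exc)}
            continue
        lines = text.splitlines()
        report[name] = {
            "error": None,
            "chars": len(text),
            "lines": len(lines),
            # Assumption: Markdown-style rows ("| a | b |") indicate a table.
            "table_rows": sum(1 for line in lines if line.lstrip().startswith("|")),
            "preview": text[:200],
        }
    return report
```

Calling `compare_parsers("report.pdf", {"pypdf": parse_with_pypdf, "docling": parse_with_docling})` would return one entry per parser. A near-empty `chars` count from pypdf on a scanned file, or zero `table_rows` where the PDF clearly has tables, is the kind of gap the comparison is meant to surface. Trying another parser only requires adding another `name: callable` pair to the dict.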
