Show HN: Loclean – Local semantic data cleaning with LLMs and Pydantic

  • Posted 3 hours ago by nxank4
https://github.com/nxank4/loclean
Hi HN, I’m the author of Loclean.

I built this because I work with sensitive data that I can't send to OpenAI, and traditional tools like regex are too brittle for messy real-world inputs such as address typos or inconsistent date formats.

Loclean is a Python library that:
- Runs entirely locally (CPU-friendly) using quantized models via llama-cpp-python.
- Uses Pydantic to enforce strict schemas (no invalid JSON or off-schema fields).
- Works with Pandas/Polars/PyArrow workflows.
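To make the schema point concrete, here is a minimal sketch of the general pattern using plain Pydantic. It is not Loclean's actual API: `CleanedRecord`, its fields, and `parse_model_output` are hypothetical names, and the raw strings stand in for whatever a local model returns.

```python
import json
from datetime import date
from typing import Optional

from pydantic import BaseModel, ValidationError


class CleanedRecord(BaseModel):
    # Hypothetical target schema for one cleaned row (not taken from Loclean).
    city: str
    postal_code: str
    signup_date: date  # ISO strings such as "2024-01-05" are parsed into date objects


def parse_model_output(raw: str) -> Optional[CleanedRecord]:
    """Return a validated record, or None if the text is not JSON or violates the schema."""
    try:
        return CleanedRecord(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError, TypeError):
        # JSONDecodeError: the model returned prose or broken JSON.
        # ValidationError: a field is missing or has the wrong type.
        # TypeError: the JSON was valid but not an object (e.g. a list).
        return None


# Stand-ins for model output; a real pipeline would get these from the local LLM.
good = '{"city": "Hanoi", "postal_code": "100000", "signup_date": "2024-01-05"}'
missing_fields = '{"city": "Hanoi"}'
not_json = "Sure! Here is the cleaned record:"

print(parse_model_output(good))
print(parse_model_output(missing_fields))
print(parse_model_output(not_json))
```

Validation like this guarantees the shape and types of the output, not that the values are correct, so a rejected row would typically be retried or left untouched rather than silently accepted.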

It's designed to be a "middle ground" between rigid regex and expensive/risky cloud LLMs.

Repo link: https://github.com/nxank4/loclean

I’d love to hear your feedback on the API design or use cases you might have for local data scrubbing. Thank you!
