HN – Hybrid local and cloud LLM stack for regulated financial document processing?

I'm scoping a hybrid AI pipeline for a consulting client in a regulated industry (GLBA-covered, NPI involved). Trying to validate the architecture before bringing on an engineer to build it.

The workflow: ingest financial PDFs (bank, brokerage, and retirement statements, plus tax returns), classify by asset type, extract data, apply domain-specific business logic, then populate Excel templates and fillable PDF forms. Compliance constraint: no NPI can hit a cloud API without ZDR-style (zero data retention) controls.

Current architecture sketch:
- Local LLM (Ollama or LM Studio) on dedicated hardware for OCR and first-pass extraction
- Local PII scrubber/tokenizer (Presidio or Skyflow) replaces identifiers with tokens before any cloud call
- Cloud LLM under enterprise terms (Claude API with ZDR, or Bedrock equivalent) for the reasoning layer
- Local de-tokenization and template population
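
To make the tokenize / de-tokenize round trip concrete, here is a minimal sketch of the shape I have in mind. It is stdlib-only and purely illustrative: the regex patterns, the `TokenVault` class, and the `<LABEL_n>` token format are all hypothetical stand-ins, and a real build would swap the regexes for a proper detector such as Presidio or a vendor vault. The cloud call is not shown; the point is only that the mapping never leaves the local machine.

```python
import re

# Hypothetical patterns for illustration only. A production scrubber would
# use a real PII detector, not two regexes.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "ACCOUNT": re.compile(r"\b\d{8,12}\b"),
}


class TokenVault:
    """Keeps the identifier <-> token mapping in local memory only."""

    def __init__(self):
        self._forward = {}  # original value -> token
        self._reverse = {}  # token -> original value

    def _token_for(self, label, value):
        # Reuse the same token for a repeated value so the cloud model
        # can still tell that two mentions refer to the same identifier.
        if value not in self._forward:
            token = f"<{label}_{len(self._forward) + 1}>"
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def tokenize(self, text):
        """Replace every matched identifier with its token."""
        for label, pattern in PATTERNS.items():
            text = pattern.sub(
                lambda m, label=label: self._token_for(label, m.group(0)),
                text,
            )
        return text

    def detokenize(self, text):
        """Restore original values in text that came back from the cloud."""
        for token, value in self._reverse.items():
            text = text.replace(token, value)
        return text


if __name__ == "__main__":
    vault = TokenVault()
    extracted = "Owner SSN 123-45-6789, brokerage account 000123456789."
    safe = vault.tokenize(extracted)      # this is what would go to the cloud LLM
    restored = vault.detokenize(safe)     # done locally before template population
    print(safe)
    print(restored)
```

One design question this raises is whether the mapping should live per case or per document; the sketch keeps one vault per run, which is an assumption rather than a recommendation.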

Questions for anyone who's actually shipped this pattern:
1. What stack did you land on, and what would you do differently?
2. Local model for financial document OCR + structured extraction: is Qwen2.5-VL still the move, or has something better landed?
3. Tokenization layer: roll your own with Presidio, or pay for Skyflow / Private AI?
4. Orchestration: LangGraph, n8n, or custom Python?
5. Is an M4 Max Mac realistic for a single-user workflow at 50-200 PDFs per case, or do I need to plan for proper inference hardware?

Already evaluated turnkey hybrid platforms (LLM.co, PremAI, Petronella) - leaning toward an assembled stack for cost and control reasons, but open to being talked out of it if someone's had a great experience with one of these.

Not looking for "just go fully local" (reasoning quality is important for this build) or "just use the API" (data constraints are real). Production-tested stacks only.
