Ask HN: Why is ML inference still so ad-hoc in practice?

  • Posted 8 hours ago by krish678
  • 4 points
Every place I’ve seen that runs more than a couple of ML models in production ends up with a mess of bespoke inference services: different APIs, different auth, different logging, half-working dashboards, and tribal knowledge holding it all together.

I’ve been building a small side project that tries to standardize just the serving part — a single gateway in front of heterogeneous models (local, managed cloud, different teams) that handles inference APIs, versioning/rollback, auth, basic metrics, and health checks. No training, no AutoML, no “end-to-end MLOps platform”.
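To make the scope concrete, the core of what I mean is roughly the sketch below; everything else (the HTTP layer, metrics, request logging) wraps one dispatch function. Names like ModelBackend and Route are made up for illustration here, not the actual code.

    from dataclasses import dataclass
    from typing import Any, Protocol

    class ModelBackend(Protocol):
        # One adapter per serving flavor: local process, managed cloud endpoint,
        # another team's existing service.
        def predict(self, payload: dict[str, Any]) -> dict[str, Any]: ...
        def healthy(self) -> bool: ...

    @dataclass
    class Route:
        backend: ModelBackend
        version: str            # pinned version; rollback = re-pointing this
        allowed_keys: set[str]  # per-model auth, enforced at the gateway

    # model name -> Route; loaded from config in practice
    REGISTRY: dict[str, Route] = {}

    def infer(model: str, api_key: str, payload: dict[str, Any]) -> dict[str, Any]:
        route = REGISTRY.get(model)
        if route is None:
            raise KeyError(f"unknown model: {model}")
        if api_key not in route.allowed_keys:
            raise PermissionError(f"key not allowed for {model}")
        if not route.backend.healthy():
            raise RuntimeError(f"{model}@{route.version} is failing health checks")
        # metrics and request logging would wrap this single call
        return route.backend.predict(payload)

The point is that rollback becomes re-pointing a route at an older version, and auth, health checks, and logging live in one place instead of in every bespoke service.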

Before I sink more time into it, I’m trying to figure out whether this is:

  • a real gap people quietly paper over with internal glue, or
  • something that sounds useful but collapses under real-world constraints.

For people actually running ML in prod:

  • Do you already have an internal inference layer like this?
  • Where does inference usually go wrong (deployments, versioning, debugging, compliance)?
  • At what scale does it stop being worth abstracting at all?

Not announcing anything — genuinely curious whether this resonates or if I’m just rediscovering why everyone rolls their own.

0 comments