Accurately recognizing the closest font from an image of text has always been a challenge. Even SOTA vision LLMs can detect basic characteristics, like whether a font is serif or sans-serif, but when it comes to picking the actual font file with the most similar styling, they are often far off from the design and style of the original.
Popular font identification solutions on the internet today also have two major shortcomings:
1. Most of them work by identifying particular characters rather than looking at a whole word or image, so they require a manual character-segmentation step.
2. Most online tools identify fonts against a proprietary dataset of commercial fonts in order to monetize. This makes them difficult to build on, because the recognized fonts often can't be used or downloaded easily.
To solve this, I trained the Lens model on a large collection of open-source fonts (mostly Google Fonts, plus a few other freely available ones). It is also trained on a large collection of images, each of which contains a variety of font styles and weights. As far as I can tell, Lens is currently the most accurate model that maps images to fonts in a single step.
I came up with the idea for this model while playing around with vibe-coding tools like Claude Code and Codex. I noticed that when I asked AI to implement a screenshot exactly, one of the main issues was always the fonts it chose. They never matched the screenshot, and swapping the fonts out for a usable alternative was always a pain for me personally.
Under the hood this is a classification model built on top of the ResNet-18 architecture, trained on over 1,000 fonts and 1.5M+ images.
The entire inference stack and model weights are available on GitHub: https://github.com/mixfont/lens. I've also created a playground on my website where the model can be demoed for free: https://www.mixfont.com/models/lens
Blog post on the training process coming soon! Would love to hear what you all think and some potential applications of a model like this. Happy to answer other questions in the meantime - thank you for checking it out!