Ask HN: What do you use for local embeddings?

  • Posted 3 hours ago by asim
  • 4 points
I built this project Reminder (https://github.com/asim/reminder) when I was first learning about LLMs and RAG. It used OpenAI to generate embeddings and Fanar for the LLM. I'd basically index the corpus of text, embed the question, and pass the top results to the LLM for more accurate answers and summarisation. It worked quite well. But there are two problems: one, it depends on OpenAI's API for embeddings, and two, the Fanar model is quite slow.

So I was thinking about switching to full-text search, but then I lose semantic search. So I started looking into different ways to do it. I'm running a fairly low-cost VM with no GPUs. I tried Ollama with nomic-embed-text but it's quite slow. I was thinking about ONNX but wasn't quite sure. I'm just curious what other people are doing. I mean, do you just bite the bullet, leave it as is, and use a more powerful hosted model for the LLM so it's faster?
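For context, the retrieval step I'm describing is just cosine similarity over stored vectors. Here's a minimal sketch with toy 3-d vectors standing in for real embeddings (in practice they'd come from whatever local model you run, e.g. nomic-embed-text via Ollama or an ONNX export):

```python
# Rank indexed chunks by cosine similarity to a query embedding.
# The vectors here are made up for illustration; real embeddings
# would be hundreds of dimensions.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    """index: list of (chunk_text, embedding) pairs."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy index: three chunks with made-up embeddings.
index = [
    ("semantic search", [0.9, 0.1, 0.0]),
    ("full text search", [0.1, 0.9, 0.0]),
    ("keyword matching", [0.0, 0.1, 0.9]),
]
print(top_k([0.85, 0.2, 0.05], index, k=1))  # → ['semantic search']
```

The part I'm stuck on is the step that produces the vectors, not this ranking step.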

I'm just curious to know what people are doing. I know a lot of people run stuff on their Mac mini and things like that, but I don't have a Mac mini.

I've also already had this discussion with Claude, but sometimes it's nice to have a collection of human opinions. Even though Claude effectively has an experience of the internet from the past, it can't really capture the sentiment of people right now, in this moment.

0 comments