As a grad student (and an ADHDer), I had trouble doing literature reviews systematically. To combat this, I made a website that finds similar papers based on the meaning of what I am looking for.

I used MixedBread's [1] embedding model to generate vectors from the abstracts. I store and search the vectors using Milvus [2], and use Gradio [3] to serve the frontend. I update the vector database weekly by pulling the metadata dataset from Kaggle [4].

To speed up search on my free Oracle instance, I binarise the embeddings and use Hamming distance as the metric.

I would love your feedback on the site :)
Happy Holidays!

[1]: https://www.mixedbread.ai/docs/embeddings/mxbai-embed-large-v1
[2]: https://milvus.io/
[3]: https://www.gradio.app/
[4]: https://www.kaggle.com/datasets/Cornell-University/arxiv
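
For anyone curious what "binarise the embeddings and use Hamming distance" can look like, here is a rough NumPy sketch. It is not the site's actual code: the sign threshold at zero, the 1024-dimensional vectors, and the helper names `binarise`, `hamming_distances` and `search` are all assumptions made for illustration, and the real search runs inside Milvus rather than NumPy.

```python
import numpy as np


def binarise(embeddings: np.ndarray) -> np.ndarray:
    """Turn float embeddings into packed bits (1 where a component is > 0).

    Assumption: a simple sign threshold at zero. Each row of 8 * n floats
    becomes n bytes, so 1024 float32 values (4096 bytes) shrink to 128 bytes.
    """
    bits = (embeddings > 0).astype(np.uint8)
    return np.packbits(bits, axis=-1)


def hamming_distances(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and every packed row."""
    xor = np.bitwise_xor(corpus, query)  # set bits mark positions that differ
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


def search(query_vec: np.ndarray, corpus_packed: np.ndarray, k: int = 5):
    """Return indices and distances of the k nearest rows by Hamming distance."""
    q = binarise(query_vec[None, :])[0]
    dists = hamming_distances(q, corpus_packed)
    order = np.argsort(dists, kind="stable")[:k]
    return order, dists[order]


if __name__ == "__main__":
    # Random vectors stand in for real abstract embeddings (hypothetical data).
    rng = np.random.default_rng(0)
    corpus = rng.standard_normal((1000, 1024)).astype(np.float32)
    packed = binarise(corpus)

    # A slightly perturbed copy of row 42 plays the role of a query embedding.
    query = corpus[42] + 0.1 * rng.standard_normal(1024).astype(np.float32)
    ids, dists = search(query, packed, k=3)
    print(ids, dists)
```

The appeal on a small instance is that each comparison becomes an XOR plus a bit count over 128 bytes instead of a 1024-term floating-point dot product, at the cost of some ranking accuracy.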

Show HN: I made a website to semantically search arXiv papers
