Lessons from building retrieval systems for AI assistants

Hybrid search

Semantic search over vector embeddings is good at matching rephrased or paraphrased queries, but it can perform poorly on queries that hinge on rare terms or jargon, such as product names or error codes. In these cases, combining semantic search with more traditional sparse retrieval techniques such as BM25 or TF-IDF often improves retrieval. These techniques score chunks on exact keyword matches, weighted by how often a term appears in the chunk and how rare it is across the corpus. You can combine the two in either of two ways. One is to assign each chunk both scores and rank by a weighted combination of them. In that case, normalize each score to a common range first, since BM25 scores and cosine similarities sit on different scales. The other is to use sparse retrieval as a first-pass filter and then rank the surviving candidates by semantic similarity.
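A minimal Python sketch of both strategies follows. It assumes the dense (embedding) similarity for each chunk has already been computed by some embedding model, so those values are passed in as plain numbers. The function names, the `alpha` weight, and min-max normalization are illustrative choices, not a fixed recipe. BM25 is implemented directly with the typical parameter values `k1=1.5` and `b=0.75`, and the documents and dense scores are made-up toy data.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized doc in `docs` for the tokenized `query`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many docs contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            # IDF with +1 inside the log keeps it non-negative for very common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            f = tf[term]
            # Term-frequency saturation (k1) and document-length normalization (b).
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def min_max(xs):
    """Rescale scores to [0, 1] so sparse and dense scores are comparable."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        return [0.0] * len(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def hybrid_scores(sparse, dense, alpha=0.5):
    """Strategy 1: alpha * dense + (1 - alpha) * sparse, after normalizing both."""
    s, d = min_max(sparse), min_max(dense)
    return [alpha * dv + (1 - alpha) * sv for sv, dv in zip(s, d)]


def filter_then_rerank(sparse, dense, k):
    """Strategy 2: keep the top-k chunks by sparse score, then order them by dense score."""
    candidates = sorted(range(len(sparse)), key=lambda i: sparse[i], reverse=True)[:k]
    return sorted(candidates, key=lambda i: dense[i], reverse=True)


docs = [
    "reset the router using the wps button".split(),
    "how to restart your wireless network device".split(),
    "error code e1234 means firmware mismatch".split(),
]
query = "error e1234".split()
# Hypothetical cosine similarities from an embedding model (not computed here).
dense = [0.62, 0.58, 0.60]

sparse = bm25_scores(query, docs)
fused = hybrid_scores(sparse, dense, alpha=0.5)
print(max(range(len(docs)), key=dense.__getitem__))  # dense only → 0
print(max(range(len(docs)), key=fused.__getitem__))  # hybrid → 2
print(filter_then_rerank(sparse, dense, k=1))  # → [2]
```

In this toy example, ranking by the dense scores alone puts the router chunk first. The exact match on the rare error code lifts the third chunk to the top of the fused ranking. `alpha` controls how much weight the semantic score gets relative to the keyword score, and would normally be tuned on your own queries.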

Reranking – the final step

Once the initial search has retrieved candidate chunks, a final reranking step helps ensure that the most useful information is presented to the user. This matters because a chunk can score as similar to the query, by whatever measure the first-pass search used, without being the most helpful answer to it.

Reranking is done in a few different ways in practice. One approach is to apply heuristics to chunk metadata, such as author, date, or source reliability. Its main benefit is that it is computationally cheap and therefore fast.
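A minimal Python sketch of metadata-based reranking follows. It assumes each chunk carries a first-pass retrieval score plus two metadata fields, `source` and `published`. These field names, the per-source weights in `SOURCE_WEIGHT`, the half-life recency decay, and the choice to multiply the factors together are all hypothetical illustrations of the kind of heuristic described, not a standard formula.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    score: float      # similarity score from the first-pass retrieval
    source: str       # hypothetical metadata field: where the chunk came from
    published: date   # hypothetical metadata field: publication date


# Hypothetical per-source reliability weights; tune these for your own corpus.
SOURCE_WEIGHT = {"official_docs": 1.0, "internal_wiki": 0.8, "forum": 0.5}


def heuristic_rerank(chunks, today, half_life_days=365.0):
    """Return chunks sorted by retrieval score x source weight x recency decay."""
    def adjusted(c):
        age_days = max((today - c.published).days, 0)
        # Recency factor halves every `half_life_days`.
        recency = 0.5 ** (age_days / half_life_days)
        # Unknown sources fall back to a default weight of 0.5.
        weight = SOURCE_WEIGHT.get(c.source, 0.5)
        return c.score * weight * recency
    return sorted(chunks, key=adjusted, reverse=True)


chunks = [
    Chunk("Old forum answer", 0.90, "forum", date(2019, 3, 1)),
    Chunk("Current official guide", 0.80, "official_docs", date(2024, 1, 15)),
    Chunk("Wiki page", 0.85, "internal_wiki", date(2023, 6, 1)),
]
for c in heuristic_rerank(chunks, today=date(2024, 6, 1)):
    print(c.text)
# Current official guide
# Wiki page
# Old forum answer
```

The old forum answer had the highest raw similarity, but its low source weight and age push it to the bottom. Because each factor is a simple lookup or arithmetic on metadata, the cost per chunk is negligible compared with running another model.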
