codebased

Embedded AI search engine for code

MIT License

Downloads: 4.5K
Stars: 17
Committers: 1


codebased - 0.6.0 Latest Release

Published by mhconradt about 1 month ago

  • Improved the default radius argument to include more semantic search results and to account for differences between vector spaces.
  • Respect top k AND radius in semantic search to avoid huge result sets that overload the reranker.
  • Fix parsing of TypeScript (and JavaScript) constants. Thanks to @sridatta for reporting this issue.
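
As a rough illustration of applying both limits together, here is a minimal sketch, not codebased's actual implementation. The function name `semantic_search`, the dictionary of embeddings, and the brute-force scan are all assumptions made for the example; only the idea of keeping results that are within the radius and then truncating to top k comes from the note above.

```python
import math


def semantic_search(query, embeddings, *, top_k, radius):
    """Hypothetical helper: return at most top_k (key, distance) pairs
    whose L2 distance to the query is <= radius, nearest first."""
    scored = [(key, math.dist(query, vec)) for key, vec in embeddings.items()]
    # Radius filter: drop anything that is not a reasonably close match.
    within = [(key, d) for key, d in scored if d <= radius]
    # Top-k cap: bound the result set so a later reranking stage stays small.
    within.sort(key=lambda pair: pair[1])
    return within[:top_k]


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real embedding vectors have far more dimensions.
    toy = {"a": [0.0, 0.0], "b": [0.5, 0.0], "c": [3.0, 0.0], "d": [0.1, 0.0]}
    print(semantic_search([0.0, 0.0], toy, top_k=2, radius=1.0))
```

With only a radius, a broad query could return every nearby object; with only top k, a query with no good match would still return k results. Applying both avoids each failure mode.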
codebased - 0.5.2

Published by mhconradt about 1 month ago

Fix a bug where we would stop a thread before it was started. Also edit the reranking system prompt to filter results less aggressively; previously it would unhelpfully stop at around 10 results.

codebased - 0.5.1

Published by mhconradt about 1 month ago

This is a small bug-fix release. I fixed an issue where the preview was not removed when the result set became empty.

codebased - 0.5.0

Published by mhconradt about 1 month ago

Codebased 0.5.0 introduces two new parameters:

  1. --rerank/--no-rerank (enable/disable reranking): Re-ranking, which uses gpt-4o-mini to re-order / filter results after the initial retrieval stage, is now on by default.
  2. --radius: The maximum L2 distance for semantic search. It defaults to 1, which was chosen pretty informally but seems to work well. Intuitively, the average L2 distance between two random unit vectors in a high-dimensional space is approximately the square root of 2, because such vectors are usually close to orthogonal. In practice, if you randomly sample embedding vectors of objects from the same codebase, their pairwise distances are somewhat normally distributed around a value slightly less than this, for a number of reasons that I'll leave as an exercise for the reader. Previously, semantic search used only "top k", which would include irrelevant results when there was no good match.
    Also, I increased the default top k, which is now used only for full-text search, to 32. This ended up being the number that worked well for me during testing, especially after re-ranking, so I wanted to make it the default.
    In the future, top k might be increased or removed entirely.
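
The square-root-of-2 intuition behind the radius can be checked numerically. For unit vectors a and b, the squared L2 distance is 2 - 2(a · b), so when the dot product is near zero the distance is near sqrt(2) ≈ 1.414. The sketch below uses synthetic random unit vectors, not real embeddings, and the helper names are made up for the example.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Sample a direction uniformly by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mean_pairwise_distance(dim, pairs, seed=0):
    """Average L2 distance over independently sampled pairs of unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = random_unit_vector(dim, rng)
        b = random_unit_vector(dim, rng)
        total += math.dist(a, b)
    return total / pairs


if __name__ == "__main__":
    # The mean approaches sqrt(2) as the dimension grows.
    for dim in (2, 16, 512):
        print(dim, round(mean_pairwise_distance(dim, pairs=200), 3))
    print("sqrt(2) =", round(math.sqrt(2), 3))
```

Embeddings from the same codebase are not uniformly random directions, which is consistent with the observation above that their typical distance sits slightly below this value.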
codebased - 0.4.22

Published by mhconradt about 1 month ago