
What's behind ResearchRabbit's search algorithm?

Learn more about how ResearchRabbit generates the results you see during search.

Written by Digl
Updated over a month ago

👋 This article explains how our search algorithm works. If you're looking to learn how to use search, try this guide instead:

How to search in ResearchRabbit

ResearchRabbit's Search Algorithm

When you start a search in ResearchRabbit, you always supply seed articles. These are used as a starting point for our algorithm, informing the two components of any ResearchRabbit search:

  1. Which articles are candidates to be results

  2. How those results are ranked

How Candidate Articles are Chosen

Rather than searching the entire universe of academic articles, ResearchRabbit identifies a set of candidates to choose from. These candidates are articles which are connected to your seed articles.

In ResearchRabbit, "connected" can mean a few different things:

  • Most commonly, two articles are "connected" when one cites the other

  • Connected articles can also share a reference or citation

  • Connected articles could also have shared authors, or co-authorships

  • Sometimes we'll infer a connection between articles based on semantic similarity of their titles and/or abstracts

When you run a search, ResearchRabbit will look at your seed articles, and, from them, compile a list of valid candidates.
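
To make the idea concrete, here is a minimal sketch of candidate collection on a toy citation graph. It is not ResearchRabbit's actual code: the data, the names `REFERENCES`, `AUTHORS`, `cited_by` and `find_candidates` are all invented for illustration, and it only covers direct citations, shared references and shared authors (shared citations and semantic similarity are left out).

```python
from collections import defaultdict

# Hypothetical toy data: paper id -> ids of the papers it cites.
REFERENCES = {
    "seed": {"a", "b"},
    "c": {"seed"},
    "d": {"a"},
    "e": {"x"},
}

# Hypothetical toy data: paper id -> author names.
AUTHORS = {
    "seed": {"Ng"},
    "a": {"Li"},
    "b": {"Roy"},
    "c": {"Kim"},
    "d": {"Paz"},
    "e": {"Ng"},
    "x": {"Sato"},
}


def cited_by(references):
    """Invert the reference map: paper id -> ids of papers that cite it."""
    citations = defaultdict(set)
    for paper, refs in references.items():
        for ref in refs:
            citations[ref].add(paper)
    return citations


def find_candidates(seeds, references, authors):
    """Return every paper connected to at least one seed."""
    citations = cited_by(references)
    candidates = set()
    for seed in seeds:
        seed_refs = references.get(seed, set())
        # Direct links: the seed cites the paper, or the paper cites the seed.
        candidates |= seed_refs
        candidates |= citations.get(seed, set())
        for paper in authors:
            # Shared reference: both papers cite a common third paper.
            if references.get(paper, set()) & seed_refs:
                candidates.add(paper)
            # Shared author: at least one name appears on both papers.
            if authors[paper] & authors.get(seed, set()):
                candidates.add(paper)
    # Seeds are the starting point, not results.
    return candidates - set(seeds)


print(sorted(find_candidates({"seed"}, REFERENCES, AUTHORS)))
# → ['a', 'b', 'c', 'd', 'e']
```

Paper "x" is left out because nothing links it to the seed directly, which is the point of working from a candidate set rather than the whole corpus.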

Depending on how many seeds you've chosen, this list can get very large: sometimes up to hundreds of thousands! Not all of these will be useful to your research, and the sheer number of them makes the list impractical to browse manually. That's why the next step is very important: ranking candidates.

How Candidates are Ranked

Once the list of candidates has been compiled, we need to rank them according to how relevant they are to your seed articles.

To do this we run a scoring algorithm which weighs how relevant each candidate is to the set of seed articles. Relevance is measured in connectedness – exactly as described above, using a mix of shared citations/references, authors/co-authorships, and semantic similarity.

There's some complex mathsy stuff that happens at this stage too, to make sure articles aren't unfairly promoted or demoted, but we won't go into details on that.
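
As a rough illustration of what "scoring by connectedness" could look like, the sketch below mixes three overlap signals into one number and sorts by it. Everything here is an assumption: the weights in `WEIGHTS` are made up, Jaccard overlap is a stand-in for whatever measures are really used, and word overlap is only a crude proxy for semantic similarity. Averaging over seeds is one simple way to stop a score growing just because more seeds were supplied; the real adjustments are not described in this article.

```python
# Hypothetical weights; the real values are not published.
WEIGHTS = {"citation": 1.0, "author": 0.5, "semantic": 0.8}


def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(candidate, seeds):
    """Average, over all seeds, of a weighted mix of connection signals."""
    total = 0.0
    for seed in seeds:
        total += (
            WEIGHTS["citation"] * jaccard(candidate["refs"], seed["refs"])
            + WEIGHTS["author"] * jaccard(candidate["authors"], seed["authors"])
            # Crude stand-in for semantic similarity: shared title words.
            + WEIGHTS["semantic"] * jaccard(candidate["words"], seed["words"])
        )
    return total / len(seeds) if seeds else 0.0


def rank(candidates, seeds):
    """Sort candidates from most to least relevant."""
    return sorted(candidates, key=lambda c: score(c, seeds), reverse=True)


seed = {"id": "s", "refs": {"r1", "r2"}, "authors": {"Ng"},
        "words": {"fish", "diversity"}}
close = {"id": "c1", "refs": {"r1", "r2"}, "authors": {"Ng"},
         "words": {"fish", "diversity"}}
partial = {"id": "c2", "refs": {"r1"}, "authors": {"Li"}, "words": {"fish"}}
unrelated = {"id": "c3", "refs": {"r9"}, "authors": {"Roy"}, "words": {"steel"}}

ranked = rank([unrelated, partial, close], [seed])
print([c["id"] for c in ranked])
```

The candidate that shares references, authors and title words with the seed comes first, the partially overlapping one second, and the unrelated one last.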

💡 Note that above we've been talking about articles, but the same logic holds true when searching for authors too. We use connection factors like authorships to create lists of candidates, and rank them appropriately.

The ResearchRabbit Algorithm in Practice

We can use the interface in ResearchRabbit's Search Settings panel to see how the algorithm is controlled in practice.

Basic Search Settings

Basic search settings allow you to control how the candidate set is selected.

  • Articles

    • Similar: Candidates are chosen from all types of connection to seeds

    • References: Candidates are only chosen from the references of seeds

    • Citations: Candidates are only chosen from the citations of seeds

  • Authors

    • These Authors: Candidates are authors from within the seed articles' authors

    • Other Related Authors: Candidates are chosen from the author lists of related articles (found using the Articles: Similar approach above)
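
The three Articles settings can be read as three ways of building the candidate pool. The sketch below shows that reading; `ArticleMode`, `candidates_for_mode` and the three lookup tables are hypothetical names, and "other links" simply bundles the author and similarity connections into one table for brevity.

```python
from enum import Enum


class ArticleMode(Enum):
    SIMILAR = "similar"
    REFERENCES = "references"
    CITATIONS = "citations"


def candidates_for_mode(mode, seeds, references, citations, other_links):
    """Pick the candidate pool for one Articles setting.

    references:  paper id -> ids the paper cites
    citations:   paper id -> ids of papers that cite it
    other_links: paper id -> ids connected some other way (assumed table)
    """
    pool = set()
    for seed in seeds:
        # References mode, and Similar (which uses every connection type).
        if mode in (ArticleMode.REFERENCES, ArticleMode.SIMILAR):
            pool |= references.get(seed, set())
        # Citations mode, and Similar.
        if mode in (ArticleMode.CITATIONS, ArticleMode.SIMILAR):
            pool |= citations.get(seed, set())
        # Only Similar looks beyond direct citation links.
        if mode is ArticleMode.SIMILAR:
            pool |= other_links.get(seed, set())
    return pool - set(seeds)


references = {"s": {"a"}}
citations = {"s": {"b"}}
other_links = {"s": {"c"}}

for mode in ArticleMode:
    pool = candidates_for_mode(mode, {"s"}, references, citations, other_links)
    print(mode.value, sorted(pool))
```

Similar returns the union of everything, while References and Citations each return one narrow slice of it.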

How Advanced Search Settings Work

Advanced Search Settings allow you to apply additional filters to constrain which candidate articles are considered valid.

  • Keywords

    • Candidate articles must mention the supplied phrase in their title or abstract

    • Choose from the suggested keywords, or write your own

    • Note that boolean search operators are not available in keyword filters

    • This is a loose keyword match (so "fish diversity" will find both "fish" and "diversity")

  • Publication Date

    • Candidate articles must be published within the given date range

  • SJR Quartiles

    • Candidate articles must have been published in a journal with one of the specified SJR Quartiles

  • Journal H-Index

    • Candidate articles must have been published in a journal within the specified H-Index range.

    • A reminder that this is a Journal H-Index, not an Author H-Index!

  • Open Access

    • Candidate articles must have Open Access PDFs

  • Retractions

    • When selected, candidate articles must either be retracted, or not be retracted.
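
The filters above all act the same way: a candidate stays in only if it passes every filter that is switched on. Here is a small sketch of that logic. The `Article` and `Filters` classes and their field names are invented for this example, publication dates are simplified to years, and the keyword check assumes that "loose" means any word from the phrase may match, which may not be exactly how the real filter behaves.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Article:
    title: str
    abstract: str
    year: int
    sjr_quartile: Optional[str]      # e.g. "Q1"; None if the journal is unranked
    journal_h_index: Optional[int]   # journal H-Index, not author H-Index
    open_access_pdf: bool
    retracted: bool


@dataclass
class Filters:
    keyword: Optional[str] = None
    years: Optional[tuple] = None          # (first_year, last_year), inclusive
    sjr_quartiles: Optional[set] = None
    h_index_range: Optional[tuple] = None  # (low, high), inclusive
    open_access_only: bool = False
    retracted: Optional[bool] = None       # None means "don't filter"


def passes(article, f):
    """True if the article satisfies every filter that is switched on."""
    if f.keyword is not None:
        text = f"{article.title} {article.abstract}".lower()
        # Assumption: "loose" means any word of the phrase may match.
        if not any(word in text for word in f.keyword.lower().split()):
            return False
    if f.years is not None and not (f.years[0] <= article.year <= f.years[1]):
        return False
    if f.sjr_quartiles is not None and article.sjr_quartile not in f.sjr_quartiles:
        return False
    if f.h_index_range is not None:
        h = article.journal_h_index
        if h is None or not (f.h_index_range[0] <= h <= f.h_index_range[1]):
            return False
    if f.open_access_only and not article.open_access_pdf:
        return False
    if f.retracted is not None and article.retracted != f.retracted:
        return False
    return True


paper = Article("Fish stocks in reefs", "We survey reef species.",
                2019, "Q1", 120, True, False)
print(passes(paper, Filters(keyword="fish diversity")))
print(passes(paper, Filters(years=(2020, 2024))))
```

The first check passes because "fish" appears in the title, and the second fails because 2019 falls outside the date range. Filtering happens before ranking, so a filtered-out article never reaches the scoring step.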

💡 Candidate filtering is a simple but powerful concept. Check out the Advanced Search using RR+ guide for examples of the workflows it enables.

Controlling the Result Ranking Algorithm

Result ranking algorithms are complex, involving lots of magic numbers and behavioural adjustments. Adding controls for these individual parameters can dramatically change results in ways that are hard to reason about, and is rarely useful.

Because of this, ResearchRabbit's result ranking process is currently fixed and cannot be fine-tuned. This is something we're open to in future with further development work. If you have ideas about how you'd like to fine-tune the search algorithm, please get in touch!
