
How Search Works in Keenious

Keenious turns your plain-language question into a structured academic search and ranks results transparently using both meaning and scholarly cues.

Overview

Keenious helps students and researchers discover and understand academic literature. This page describes how its search works: how a query is matched against a curated index of ~188 million publications spanning all disciplines and more than 100 languages, and what determines the order of results. The same search engine runs whether you type a query in Search or the AI searches on your behalf in Chat.

How It Differs from a Traditional Search

A Keenious search is made for overview as much as for individual results. Instead of one long ranked list to sift through line by line, it returns a set of the best-matching papers and analyses that set as a whole. The results come grouped into research areas: the strongest matches and a picture of the directions the literature takes, at the same time. Digging deeper happens by opening the research area that interests you, not by paging further down a list.

The matching is semantic: a query is understood by its meaning, whatever its phrasing. Exact requirements can be added on top with Boolean-style operators (quoted phrases, OR groups, exclusions) and filters. The same search works as a loose description of a topic or as a tightly constrained query, depending on the task.

Search matches against titles and abstracts, not full text. A paper can discuss a topic in its body without it being mentioned in the title or abstract; such a paper will not match a query on that topic.

Matching

A query is matched in two ways at once. By meaning: every title and abstract in the index is stored as an embedding (a representation of its meaning), and the papers closest in meaning to the query are found, whatever words they use. For example, the query teenage mental health and social media matches papers about "adolescent well-being and problematic smartphone use". By keyword (BM25, short for "Best Matching 25", a standard keyword-ranking method): the same query's exact words count too, weighted toward distinctive terms (gene names, acronyms, place names) that meaning alone can treat loosely.

Figure: Papers as points in a space of meaning. The query sits among its closest papers, which are retrieved regardless of their wording, while unrelated papers sit far away.
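As a rough illustration of matching by meaning, the sketch below ranks two papers by cosine similarity to a query vector. The three-dimensional vectors are invented for the example; real embeddings come from a model and have many more dimensions. The source does not say which similarity measure Keenious uses, so cosine similarity is an assumption here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy embeddings (hypothetical values, not real model output).
papers = {
    "Adolescent well-being and problematic smartphone use": [0.9, 0.8, 0.1],
    "Soil nitrogen cycling in boreal forests": [0.0, 0.1, 0.95],
}
# Stands in for the query "teenage mental health and social media".
query_vec = [0.85, 0.9, 0.05]

def nearest(query, docs, top_k=1):
    """Return the top_k titles whose vectors are closest to the query vector."""
    ranked = sorted(docs, key=lambda title: cosine(query, docs[title]), reverse=True)
    return ranked[:top_k]

print(nearest(query_vec, papers))
```

The query and the first paper share no words, yet their vectors point in nearly the same direction, which is why the paper is retrieved.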

A paper that scores well on both meaning and keywords ranks highest, with meaning carrying more weight (the merge is a rank fusion). Quoted phrases and filters sit on top of all of this as hard requirements; see Search Syntax.
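One common way to merge two rankings is weighted reciprocal rank fusion, sketched below. The source only says that the merge is a rank fusion with meaning weighted more heavily; the specific formula, the weights 0.7 and 0.3, and the constant k=60 are assumptions chosen for illustration.

```python
def fuse(semantic_ranking, keyword_ranking, w_semantic=0.7, w_keyword=0.3, k=60):
    """Weighted reciprocal rank fusion over two ranked lists of paper ids.

    Each list contributes weight / (k + rank) per paper; the weights and k
    are hypothetical values, not Keenious's actual parameters.
    """
    scores = {}
    for weight, ranking in ((w_semantic, semantic_ranking), (w_keyword, keyword_ranking)):
        for rank, paper_id in enumerate(ranking, start=1):
            scores[paper_id] = scores.get(paper_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda p: (-scores[p], p))

semantic = ["A", "B", "C"]   # best match by meaning first
keyword = ["C", "A", "D"]    # best match by BM25 first
fused = fuse(semantic, keyword)
print(fused)
```

Paper A appears near the top of both lists and ends up first, while D, found only by keyword, ends up last.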

Ranking

Results are ordered by relevance first. Scholarly signals then adjust the order; each is a modest adjustment that reorders similarly relevant papers rather than overriding relevance. The signals apply independently and are listed in no particular order:

  • Citations: more-cited papers rank higher. Citation counts are computed within the curated index, so they can be lower than in databases that count against a broader corpus.

  • Recency: recently published papers receive a small boost that fades with age.

  • Field-weighted citation impact (FWCI): citation performance normalized by field, year, and document type, so papers from low-citation and high-citation fields are compared on the same scale.

  • Peer-review status: venues listed in the Norwegian Scientific Index receive a boost, Level 2 channels more than Level 1. Venues not listed receive no boost, but no penalty either.

  • Venue and publication type: journal and conference publications rank slightly above works without an identifiable venue; review and research articles rank slightly above other document types.

  • Language match: papers written in the query's language are boosted. English-language work dominates global citation counts, so without this signal, queries in other languages would return mostly English results.
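The idea of modest, independent adjustments can be sketched as small multiplicative boosts on top of a relevance score. Everything numeric below is hypothetical: the source does not give formulas or weights, and only three of the signals (citations, recency, language match) are modeled.

```python
import math
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    relevance: float        # fused relevance score, higher is better
    citations: int = 0
    age_years: float = 0.0
    language: str = "en"

def adjusted_score(paper, query_language="en"):
    """Relevance times small independent boosts (all weights are assumptions)."""
    citation_boost = 0.02 * math.log1p(paper.citations)      # grows slowly with citations
    recency_boost = 0.05 * math.exp(-paper.age_years / 5)    # fades with age
    language_boost = 0.05 if paper.language == query_language else 0.0
    return paper.relevance * (1 + citation_boost) * (1 + recency_boost) * (1 + language_boost)

papers = [
    Paper("a", relevance=0.80, citations=0, age_years=10),
    Paper("b", relevance=0.79, citations=500, age_years=1),
    Paper("c", relevance=0.40, citations=5000, age_years=0),
]
ordered = sorted(papers, key=adjusted_score, reverse=True)
print([p.title for p in ordered])
```

Papers a and b are similarly relevant, so the signals swap them; paper c stays last despite its citation count because the boosts are too small to override a large relevance gap.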

The Result Set

A traditional keyword database can report a total ("1,247 results") because a record either matches or it doesn't. Semantic matching has no such boundary: relevance never objectively ends, and any automatic cutoff would be arbitrary. A search therefore returns a fixed-size result set: the 300 best-ranked papers by default, adjustable from 100 to 1,000. The number of results is a setting, not a measurement of how much literature exists. (One exception: when quotes or filters leave fewer matches than the set size, the smaller number returned is a real count.)

Everything that follows operates on this set. Research areas are computed from it, and sorting reorders it: sort by citations and you get the most cited of those 300, not of the index. The fixed set is also what makes sorting meaningful: without a relevance boundary, the most-cited papers only loosely related to the query would dominate. The size trades focus for coverage: a smaller set stays on the core of a narrow topic, while a larger one gives a broader overview and more research areas.
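The effect of sorting within a fixed set can be shown with a toy index. The sketch below is an assumption-laden illustration: the function names and the synthetic citation counts are made up, and only the default size of 300 and the 100 to 1,000 range come from the text above.

```python
def result_set(ranked_by_relevance, size=300):
    """Keep the `size` best-ranked papers; size must be within 100..1,000."""
    if not 100 <= size <= 1000:
        raise ValueError("result size must be between 100 and 1,000")
    return ranked_by_relevance[:size]

def sort_by_citations(papers):
    """Reorder a result set by citation count, most cited first."""
    return sorted(papers, key=lambda p: p["citations"], reverse=True)

# Toy index, already ordered by relevance (position 0 is the best match).
index = [{"id": i, "citations": (i * 37) % 1000} for i in range(2000)]
index[1500]["citations"] = 1_000_000  # heavily cited but only loosely related

results = result_set(index)
most_cited = sort_by_citations(results)[0]
print(len(results), most_cited["id"])
```

The heavily cited paper at position 1500 never enters the set, so sorting by citations cannot pull it to the top.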

For tasks that depend on exhaustive, documented retrieval (a systematic review, for example), this matters: a search returns at most 1,000 papers and makes no claim of completeness. Keenious is a starting point for that kind of work, not the search of record.

Every search is saved with a permanent link, and opening that link always shows the same results. Typing the same query again later is a new search, and a new search can return different results as the index is updated.

Research Areas

The results of a search come grouped into research areas: clusters of related papers within the result set. Papers whose meanings sit close together (by the same embeddings used for matching) form an area, and each paper belongs to exactly one. The number of areas scales with the size of the result set, up to 15.

Figure: Result papers shown as points, colored into three labeled clusters. Each cluster of nearby papers is a research area.
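A minimal sketch of grouping papers so that each belongs to exactly one area, using a tiny k-means on toy two-dimensional points. The source does not name the clustering algorithm, so k-means is an assumption, and the `area_count` rule (one area per 50 papers, capped at 15) is a hypothetical stand-in for "scales with the size of the result set, up to 15".

```python
def area_count(result_size, papers_per_area=50, cap=15):
    """Hypothetical scaling rule: more results give more areas, up to the cap."""
    return max(1, min(cap, result_size // papers_per_area))

def kmeans(points, k, iterations=10):
    """Tiny k-means returning exactly one cluster label per point."""
    centroids = [list(p) for p in points[:k]]  # deterministic init: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its single nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels = kmeans(points, k=2)
print(labels, area_count(300), area_count(1000))
```

Because the labels are computed only from the points in the set, rerunning the function on the same set gives the same grouping.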

Each area is named by a language model for what sets it apart from the rest of the results, not for the query: a search about social media and teenage mental health gets "Cyberbullying and Online Harassment" and "Screen-Time Interventions", not "Social Media Studies". The names are not a fixed taxonomy, and the same paper can sit under a differently named area in another search.

Selecting research areas filters the result list; it does not change the ranking within them. The areas are computed from the result set itself, so the same search always produces the same areas, and changing the search (the query, the filters, or the result size) recomputes them.

Search Syntax

| Example | Effect |
| --- | --- |
| gene editing therapy | Matched by meaning and keywords (default) |
| "CRISPR-Cas9" | Must appear as an exact phrase in the title or abstract |
| "CRISPR" OR "TALEN" | At least one must appear |
| -mice or -"in vitro" | Must not appear in the title or abstract |
| 2021 (a bare four-digit year) | Papers published that year are boosted; others still appear |

Quoted phrases. A quoted phrase is a hard requirement: every result must contain it in its title or abstract. A multi-word phrase must appear as consecutive words in the given order: "gene therapy" matches "a gene therapy trial" but not a paper where gene and therapy only appear in separate sentences. Matching ignores capitalization, accents ("Zurich" matches "Zürich"), and punctuation ("CRISPR-Cas9" also matches "CRISPR Cas9"). It does not ignore word forms: quoted matching has no stemming, so "vaccine" does not match a paper that only writes "vaccines", and "mouse" does not match "mice". Unquoted text is unaffected by this: keyword matching on unquoted words handles word forms, and semantic matching is independent of wording altogether.

A quoted phrase also remains part of the query for semantic and keyword matching; the quotes add the requirement on top rather than replacing the term's role in matching.
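The quoted-phrase rules can be approximated in a few lines: normalize case, accents, and punctuation, then look for the phrase as consecutive whole words with no stemming. This is a sketch of the described behavior, not Keenious's implementation; the function names are made up.

```python
import re
import unicodedata

def normalize(text):
    """Lowercase, strip accents, and split on punctuation; no stemming is applied."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\W_]+", " ", no_accents.lower()).split()

def contains_phrase(phrase, text):
    """True if the phrase's words appear consecutively, in order, in the text."""
    needle, haystack = normalize(phrase), normalize(text)
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )

print(contains_phrase("gene therapy", "a gene therapy trial"))
print(contains_phrase("vaccine", "vaccines for children"))
```

Comparing whole tokens rather than substrings is what keeps "vaccine" from matching "vaccines".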

OR groups. OR operates between adjacent quoted phrases: in "CRISPR" OR "TALEN" "off-target", at least one of CRISPR or TALEN must appear, and off-target must appear. Longer chains work the same way ("mouse" OR "mice" OR "murine"). Between unquoted words, OR has no operator meaning and is read as ordinary text.
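A sketch of how OR groups might be read from a query: each quoted phrase starts a new required group unless it directly follows an OR that itself followed a quoted phrase. The parser below is hypothetical. It treats OR as case-sensitive (the source only shows it in capitals), ignores exclusions, and checks matches by plain lowercase substring, which is looser than the real quoted-matching rules.

```python
import re

TOKEN = re.compile(r'"([^"]+)"|(\S+)')

def required_groups(query):
    """Return a list of groups; each group lists alternative quoted phrases."""
    groups = []
    join_next = False        # True right after an OR that followed a quoted phrase
    prev_was_phrase = False
    for match in TOKEN.finditer(query):
        phrase, word = match.group(1), match.group(2)
        if phrase is not None:
            if join_next and groups:
                groups[-1].append(phrase)   # extend the current OR group
            else:
                groups.append([phrase])     # start a new required group
            join_next, prev_was_phrase = False, True
        else:
            # OR between unquoted words is ordinary text, not an operator.
            join_next = word == "OR" and prev_was_phrase
            prev_was_phrase = False
    return groups

def satisfies(groups, text):
    """Every group needs at least one alternative present (simplified check)."""
    lowered = text.lower()
    return all(any(alt.lower() in lowered for alt in group) for group in groups)

groups = required_groups('"CRISPR" OR "TALEN" "off-target"')
print(groups)
```

With this structure, a paper passes when every group has at least one alternative present, which mirrors "at least one of CRISPR or TALEN must appear, and off-target must appear".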

Exclusions. -term and -"phrase" remove every paper whose title or abstract contains the term, following the same matching rules as quoting, including exact word forms, so -mouse does not remove papers that only write "mice". Excluded terms are stripped from the query before matching: they only remove results, they do not influence what the rest of the query matches.

Years. A bare four-digit year (1900 up to next year) boosts papers published in that year; papers from other years still appear, without the boost. Several years can be given (2023 2024), and the year also remains part of the ordinary query text. A hard cutoff is a job for the year filter, not the query.
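Year detection can be sketched as picking out standalone four-digit tokens between 1900 and next year. The function names and the boost size of 0.1 are assumptions; the source states only the range and that matching years are boosted rather than required.

```python
import re
from datetime import date

def boosted_years(query, today=None):
    """Return bare four-digit years in the query, from 1900 up to next year."""
    today = today or date.today()
    latest = today.year + 1
    years = []
    # A "bare" year is a four-digit token with whitespace (or nothing) around it.
    for token in re.findall(r"(?<!\S)\d{4}(?!\S)", query):
        year = int(token)
        if 1900 <= year <= latest:
            years.append(year)
    return years

def year_multiplier(paper_year, years, boost=0.1):
    """Soft boost: other years keep a multiplier of 1.0 and still appear."""
    return 1.0 + boost if paper_year in years else 1.0

years = boosted_years("remote work 2023 2024", today=date(2025, 6, 1))
print(years, year_multiplier(2023, years), year_multiplier(2019, years))
```

A multiplier of 1.0 for non-matching years is what distinguishes this from the year filter, which removes papers outright.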

Operators combine freely: gene editing "CRISPR-Cas9" -mice matches the concept gene editing semantically, requires CRISPR-Cas9 in the title or abstract, and removes papers mentioning mice. All quoted and excluded terms are matched against titles and abstracts only: a quoted term excludes any paper that does not use it in those two fields, even if the term appears in the paper's full text.
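The combined example can be broken into its parts with a small hypothetical parser: exclusions are stripped from the query text, while quoted phrases are recorded as requirements and also stay in the text used for semantic and keyword matching. The function name and the returned dictionary layout are assumptions, and OR handling is left out for brevity.

```python
import re

def parse_query(query):
    """Split a query into matching text, required phrases, and excluded terms."""
    excluded = []

    def take_exclusion(match):
        excluded.append(match.group(1) or match.group(2))
        return " "  # excluded terms are removed from the matching text

    # A leading "-" only counts at the start of the query or after whitespace,
    # so the hyphen inside "CRISPR-Cas9" is left alone.
    remaining = re.sub(r'(?:(?<=\s)|^)-(?:"([^"]+)"|(\S+))', take_exclusion, query)
    required = re.findall(r'"([^"]+)"', remaining)
    # Quoted phrases stay in the text; only the quote marks are dropped.
    text = " ".join(remaining.replace('"', " ").split())
    return {"text": text, "required": required, "excluded": excluded}

parsed = parse_query('gene editing "CRISPR-Cas9" -mice')
print(parsed)
```

Keeping the quoted phrase in `text` reflects the rule that quotes add a requirement on top of matching rather than replacing the term's role in it.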

Filters (publication year, document type, peer-review status, open access, and others) are hard constraints rather than ranking signals: a filtered-out paper is removed before ranking and will not appear regardless of how well it matches.

Why a Paper Does or Doesn't Appear

A result looks off-topic. Semantic matching retrieves by closeness of meaning, and the matched concept is normally visible in the result's title or abstract. Quoting a term that must appear, or excluding one that shouldn't, narrows the results.

An expected paper is missing. Common causes:

  • The wording only appears in the full text. Titles and abstracts are the searchable fields; a topic that is not visible there does not match.

  • A quoted term is not in the title or abstract. Quoted terms are hard requirements. Without the quotes, the paper can still match semantically.

  • A filter excludes it. A year range or peer-review filter removes everything outside it.

  • It is not in the index. Book chapters, master's theses, meeting abstracts, retracted works, and several source types are left out when the index is built from the OpenAlex dataset.

  • It is outside the result set. A search returns a fixed number of best-ranked papers (see The Result Set). A paper can match without making the cut; a more specific query or a larger result size brings it in.

  • It is very new. The index synchronizes with OpenAlex regularly; a paper published in the last few days may not be indexed yet.

Frequently Asked Questions

Does Keenious search the full text of papers? No. Only titles and abstracts are searched. A paper whose topic is visible only in its body will not match a query on that topic.

Can the AI make up a paper? No. Every result is a publication from the index. AI is used to match queries by meaning and to name research areas, never to generate results.

Why did I get fewer results than the size I chose? Quoted terms or filters left fewer papers than the set size; in that case the number shown is a real count of matches. See The Result Set.

If I run the same search next month, will I get the same results? Not necessarily. Rerunning the same search within a short period gives the same results, though the research area labels may differ. Running it weeks or months later can return different results, because the index is updated with new and corrected records.
