Replacing Elasticsearch with PostgreSQL Alone: Hybrid Search with VectorChord-BM25 and RRF
If you've ever stood up a separate Elasticsearch cluster just to add a search feature, you know how heavy that operational burden can be. Schema synchronization, indexing lag, transaction consistency issues between two systems... honestly, more than once I've wondered why we even need Elasticsearch sitting alongside PostgreSQL when the data is already there. One thing teams that have removed Elasticsearch consistently mention is that "sync bugs disappeared" — all those hours spent checking the indexing pipeline status after every deployment, and debugging which side was correct whenever a transaction mismatch occurred.
In March 2025, the tensorchord team released the vchord_bm25 extension written in Rust, offering a fairly realistic alternative to this pain. It's not just full-text search (BM25) that came along — combined with VectorChord's vector search, it's now possible to complete hybrid search entirely within PostgreSQL.
This article covers how VectorChord-BM25 works, and how to implement hybrid search on a single PostgreSQL stack by combining BM25 keyword search and vector similarity search with RRF (Reciprocal Rank Fusion). We'll walk through working SQL and Python code, a pros-and-cons analysis, and common pitfalls to avoid when getting started. Code examples are based on an English corpus. If you're curious about pg_tokenizer configuration for Korean text, check the upcoming topics preview at the end of this article.
Core Concepts
Why BM25 Is Better Than TF-IDF
BM25 (Best Match 25) is the standard in information retrieval — good enough for Elasticsearch to adopt it as its default ranking algorithm. Like TF-IDF, it combines term frequency (TF) and inverse document frequency (IDF), but there's a critical difference.
Just because a word appears 100 times in a long document doesn't necessarily mean it's more relevant than a word appearing 10 times in a short document. BM25 addresses this with parameters k1 (term frequency saturation) and b (document length normalization strength). A higher k1 increases the impact of term repetition, and a b closer to 1 strengthens document length normalization.
BM25(d, q) = Σ IDF(t) × [TF(t,d) × (k1 + 1)] / [TF(t,d) + k1 × (1 - b + b × |d|/avgdl)]At first glance this formula looks complicated and I assumed it was basically the same as TF-IDF, but in corpora with many long documents, the difference is quite noticeable in practice.
How VectorChord-BM25 Brings BM25 into PostgreSQL
vchord_bm25 is a PostgreSQL extension that provides several key components.
| Component | Role |
|---|---|
bm25vector |
A dedicated type for storing sparse vectors generated by the tokenizer |
| BM25 index | A custom index in the form USING bm25 (embedding bm25_ops) |
<&> operator |
A distance operator that returns BM25 scores (negative values — lower means more relevant) |
| Block WeakAnd | An early termination algorithm that efficiently retrieves only the top K results |
A quick note on the bm25vector type: it is different from pgvector's vector (dense vector). It is a sparse vector where each dimension corresponds to a vocabulary ID and most values are zero. Only the word indices that appear in the document have values, with everything else being 0 — so even with a vocabulary size in the tens of thousands, the actual storage space is much smaller.
There's one design point worth knowing. The reason to_bm25query takes an index name as its first argument is that the BM25 index itself encodes which tokenizer to use. This enforces that the same tokenizer is always used both at document indexing time and at query conversion time. Not knowing this makes it easy to fall into mistake #2 described later.
What is Block WeakAnd? Rather than scanning the entire document corpus, it maintains a top-K candidate set and skips blocks of documents that are guaranteed to score lower than the current threshold. Elasticsearch uses this algorithm internally as well. It allows accurate top results to be retrieved quickly without scanning the full index.
Why Vector Search Alone Is Not Enough
Vector similarity search (Semantic Search) excels at intent-based queries like "warm jacket for winter." But for queries where the exact spelling matters — such as "iPhone 16 Pro 256GB," a specific function name, or an API endpoint — it can surprisingly underperform. "iPhone 16 Pro" and "Galaxy S25 Ultra" can be semantically close in embedding space.
Conversely, BM25 alone for a query like "recommend a good laptop" will find documents containing the words "good" or "recommend," but may miss documents that actually contain laptop reviews. The two approaches are structured to complement each other's weaknesses.
RRF — A Way to Combine Two Scores on Different Scales
BM25 scores and cosine similarities have different units, so directly summing them causes one to overwhelm the other. RRF (Reciprocal Rank Fusion) sidesteps this problem by using only ranks instead of absolute scores.
RRF_score(d) = Σ 1 / (k + rank(d))k=60 is the default value; this constant buffers the score difference between high and low ranks. For example, the difference between rank 1 (1/61 ≈ 0.016) and rank 10 (1/70 ≈ 0.014) in BM25 is small, and a document ranked 1st in vector search but 5th in BM25 will still have a fairly high combined score.
Why k=60? It's the experimentally validated value from the original paper by Cormack et al. (SIGIR 2009). It works well across most domains, but running A/B tests to find the optimal value for your use case is recommended.
Practical Application
Basic Setup: Creating a BM25 Index
There's an all-in-one Docker image you can spin up immediately, so getting started isn't difficult. tensorchord/vchord-suite:pg18-latest includes VectorChord, BM25, and pg_tokenizer all in one.
-- Install required extensions
CREATE EXTENSION IF NOT EXISTS pg_tokenizer CASCADE;
CREATE EXTENSION IF NOT EXISTS vchord_bm25 CASCADE;
-- Create a BERT-based tokenizer (for English)
SELECT create_tokenizer('bert_tok', $$ model = "bert_base_uncased" $$);
-- Create the documents table
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
body TEXT,
embedding bm25vector -- dedicated type for storing sparse vectors
);
-- Tokenize raw text and store as bm25vector
INSERT INTO documents (body, embedding)
SELECT body, tokenize(body, 'bert_tok')::bm25vector
FROM raw_texts;
-- Create BM25 index
CREATE INDEX doc_bm25_idx ON documents USING bm25 (embedding bm25_ops);After creating the index, searching looks like this:
-- BM25 search: scores are negative, so ORDER BY ASC = highest relevance first
SELECT
id,
body,
embedding <&> to_bm25query('doc_bm25_idx', tokenize('database search', 'bert_tok')) AS score
FROM documents
ORDER BY score ASC
LIMIT 10;| Code Element | Description |
|---|---|
<&> |
BM25 distance operator. Returns negative values — lower means more relevant |
to_bm25query |
Converts a search query into a BM25 query vector. The index name as first argument determines the tokenizer |
tokenize(...) |
Converts the query string using the same tokenizer as the index |
RRF Combination Query: Merging BM25 and Vector Search
This is the pattern you'll use most often in production. It assumes embeddings generated by pgvector or VectorChord are already present in a vec_embedding column.
There's one important implementation point. When extracting the top 20 by score inside a CTE, placing ROW_NUMBER() OVER (ORDER BY score) and LIMIT 20 at the same level won't behave as expected. PostgreSQL returns arbitrary rows when it encounters a LIMIT without an ORDER BY, and the window function will then rank only those arbitrary rows. This goes unnoticed in small-scale tests but becomes the cause of non-reproducible RRF results as data grows. To get the actual top 20, you must sort and limit in a subquery first, then apply ranking in the outer query.
WITH bm25_results AS (
-- Extract actual top 20 in subquery first, then assign ranks
SELECT id, ROW_NUMBER() OVER (ORDER BY score) AS rank
FROM (
SELECT id,
embedding <&> to_bm25query('doc_bm25_idx', tokenize(:query, 'bert_tok')) AS score
FROM documents
ORDER BY score
LIMIT 20
) top_bm25
),
vector_results AS (
-- Extract top 20 by vector similarity, assign ranks
SELECT id, ROW_NUMBER() OVER (ORDER BY score) AS rank
FROM (
SELECT id,
vec_embedding <=> :query_vector AS score
FROM documents
ORDER BY score
LIMIT 20
) top_vec
),
rrf AS (
-- FULL OUTER JOIN to include documents that appear in only one set
SELECT
COALESCE(b.id, v.id) AS id,
COALESCE(1.0 / (60 + b.rank), 0) +
COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
FROM bm25_results b
FULL OUTER JOIN vector_results v ON b.id = v.id
)
SELECT d.id, d.body, r.rrf_score
FROM rrf r
JOIN documents d ON d.id = r.id
ORDER BY r.rrf_score DESC
LIMIT 10;The key is using FULL OUTER JOIN. If a document didn't make the BM25 top 20 but ranked 1st in vector search, it gets included rather than discarded. COALESCE(..., 0) handles cases where a document appears in only one result set by treating its score from the missing method as 0.
Applying to a RAG Pipeline
In LLM-based RAG systems, retrieval quality directly impacts generation quality. What surprised me most when I first added hybrid search to this pipeline was how often vector-only search fetched irrelevant documents for queries where exact spelling matters — things like product version numbers or function names. Queries like "v2.3.1 migration issue" would return results mixed with "v2.2 upgrade guide" or "v3.0 changelog." After adding RRF, search results for these cases noticeably improved.
import asyncpg # Python async PostgreSQL driver
import numpy as np
# The output dimension of the embed function must match the dimension of the vec_embedding column
from your_embedding_model import embed
async def hybrid_search(query: str, top_k: int = 10) -> list[dict]:
query_vector = embed(query)
# In production, using asyncpg.create_pool() is recommended
conn = await asyncpg.connect(DATABASE_URL)
results = await conn.fetch("""
WITH bm25_results AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY score) AS rank
FROM (
SELECT id,
embedding <&> to_bm25query('doc_bm25_idx', tokenize($1, 'bert_tok')) AS score
FROM documents
ORDER BY score
LIMIT 20
) top_bm25
),
vector_results AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY score) AS rank
FROM (
SELECT id,
vec_embedding <=> $2 AS score
FROM documents
ORDER BY score
LIMIT 20
) top_vec
),
rrf AS (
SELECT COALESCE(b.id, v.id) AS id,
COALESCE(1.0 / (60 + b.rank), 0) +
COALESCE(1.0 / (60 + v.rank), 0) AS rrf_score
FROM bm25_results b
FULL OUTER JOIN vector_results v ON b.id = v.id
)
SELECT d.id, d.body, r.rrf_score
FROM rrf r JOIN documents d ON d.id = r.id
ORDER BY r.rrf_score DESC
LIMIT $3
""", query, query_vector.tolist(), top_k)
await conn.close()
return [dict(r) for r in results]When you feed the context retrieved this way into an LLM prompt, you can feel the difference in search quality compared to vector-only search — especially for queries involving proper nouns or version numbers.
Pros and Cons Analysis
Advantages
| Item | Details |
|---|---|
| Operational simplicity | Unified management of search + vector + relational data in a single PostgreSQL instance. No Elasticsearch sync pipeline needed |
| ACID guarantees | Transactional consistency by default. Elasticsearch's near-realtime indexing can introduce brief reflection delays |
| BM25 performance | ~3x higher QPS than Elasticsearch based on Block WeakAnd (official benchmark figures measured under specific hardware and data conditions) |
| Vector indexing speed | VectorChord completes a 100-million-vector index in under 20 minutes (official benchmark; reduces indexing time by tens of hours compared to pgvector's HNSW index) |
| SQL-friendly | Naturally combines with existing JOINs, filters, and aggregations. No separate query DSL required |
| Cost reduction | No need to operate a separate search engine cluster |
What is near-realtime? Elasticsearch has a default refresh interval of 1 second before a document becomes searchable after indexing. During this brief window, newly inserted data may not appear in search results. With PostgreSQL + vchord_bm25, data is searchable immediately upon transaction commit.
Disadvantages and Caveats
| Item | Details | Mitigation |
|---|---|---|
| Language support | Primarily validated for English. Non-English languages like Korean require separate tokenizer configuration | Explore multilingual BPE models in pg_tokenizer. Morpheme-analyzer-based options also exist |
| Maturity | Released March 2025; production case studies still accumulating | For high-criticality services, run in parallel with Elasticsearch and gradually shift traffic |
| BM25 semantic limitations | Cannot handle synonyms or contextual nuance | Always pair with vector search. For precision-critical cases, consider adding a cross-encoder reranker after RRF |
| Resource overhead | Maintaining both BM25 and vector indexes simultaneously increases storage and memory | Analyze actual query patterns and selectively configure indexes with low usage frequency |
| Large-scale limits | Scenarios requiring distributed indexing at hundreds of TB scale favor dedicated search engines | For workloads exceeding single-node limits, consider partitioning or dedicated distributed solutions in parallel |
Operations tip —
bm25_limitparameter: For queries that exceed the default result count limit, check thebm25_catalog.bm25_limitGUC parameter and adjust as needed.
The Most Common Mistakes in Practice
-
Forgetting that BM25 scores are negative and querying with
ORDER BY score DESC— The<&>operator returns negative values, soORDER BY score ASCis highest-relevance order. This is especially easy to mix up when using BM25 standalone without RRF. -
Using a different tokenizer for the query versus the index — This is exactly why
to_bm25querytakes an index name as its first argument. The index itself encodes the tokenizer, so using the same index name guarantees the same tokenizer is used at both indexing time and search time. If you pass a different tokenizer name totokenize()and connect it toto_bm25query, the scores become meaningless. -
Setting the LIMIT for RRF too small — If the LIMIT for BM25 and vector search individually is set too small, good results may already be cut off by the time RRF merges them. It is generally safe to retrieve 3–5 times the final desired result count at each stage.
Closing Thoughts
By combining VectorChord-BM25 with RRF, you can complete hybrid search within a single PostgreSQL stack without Elasticsearch. Schema sync pipelines, transaction inconsistency debugging, operational overhead bouncing between two systems — the core value of this combination is reducing all of that down to a single PostgreSQL instance.
Of course, if you need distributed search at hundreds of terabytes scale, or genuinely require Elasticsearch's mature ecosystem, a simple replacement is not feasible. But many production services operate complex dual systems without ever reaching that scale. For those teams, this combination is well worth evaluating.
Here are 3 steps to get started right now:
-
Set up the environment with Docker — Run
docker run --rm -e POSTGRES_PASSWORD=password -p 5432:5432 tensorchord/vchord-suite:pg18-latestto immediately spin up a PostgreSQL instance that includes VectorChord + BM25 + pg_tokenizer. -
Experiment by adding a BM25 index to an existing text column — If you have a PostgreSQL table already in production, you can add an
embedding bm25vectorcolumn, populate it withtokenize(body, 'bert_tok')::bm25vector, create a BM25 index, and directly compare results against your existingLIKEortsvectorsearches. However, running a bulk update afterALTER TABLE ... ADD COLUMNon a production DB can cause table locks, so it's recommended to validate on a test DB first. -
Replace the RRF query in your RAG pipeline — Swap the vector-only RAG retriever with the RRF query from the example above, and you can observe the change in search quality for queries involving proper nouns, version numbers, and code-related terms.
References
- VectorChord-BM25 Official GitHub | tensorchord
- VectorChord-BM25: Revolutionize PostgreSQL Search — 3x Faster than Elasticsearch | VectorChord Blog
- Hybrid Search with Postgres Native BM25 and VectorChord | VectorChord Docs
- Hybrid Search with Postgres Native BM25 and VectorChord | VectorChord Blog
- Bringing Search-Engine Ranking to PostgreSQL with VectorChord-BM25 | VectorChord Blog
- VectorChord-BM25: Introducing pg_tokenizer | VectorChord Blog
- Benchmark with Elasticsearch | VectorChord Docs
- Hybrid Search in PostgreSQL: The Missing Manual | ParadeDB
- BM25 in PostgreSQL: Full-Text Search Without Elastic | Tiger Data
- Elasticsearch's Hybrid Search, Now in Postgres (BM25 + Vector + RRF) | Tiger Data
- Hybrid Search in 100 Lines: BM25 + pgvector with RRF Merge | DEV Community
- BM25 Search in PostgreSQL: The Missing Piece for Hybrid Search | Pedro Alonso
- VectorChord-BM25 Official Documentation | EnterpriseDB
- What is Reciprocal Rank Fusion? | ParadeDB
- pg_tokenizer.rs GitHub | tensorchord