When Does pgvector Hit Its Performance Limits — pgvector vs VectorChord Benchmark Comparison
Reaching for pgvector first when building a RAG pipeline is almost a reflexive choice. It's already running on PostgreSQL. A single CREATE EXTENSION vector line does the job, and it's right there in the LangChain docs. But as data grew from hundreds of thousands toward tens of millions of records, query latency started creeping up. I still remember the surprise when I first checked how much RAM the HNSW index was consuming.
That's when I discovered VectorChord. When I first saw the official benchmarks, my honest reaction was "is this for real?": 16x faster index builds, 14x faster inserts, up to 5x higher QPS. When I tested it myself, the numbers held up. In this post, I'll walk through how each extension works under the hood, which index to choose at the 1-million-record mark, and where the threshold for switching to VectorChord lies, with real code throughout.
This isn't an argument to ditch pgvector immediately. At under 1 million records, pgvector is the far more convenient and safe choice. This post is aimed at developers already using pgvector or building RAG pipelines. Once you understand where to draw the line and how VectorChord moves that line, your next architecture decision becomes much clearer.
Core Concepts
pgvector's Two Index Strategies
Installing pgvector itself is straightforward — PostgreSQL 14 or higher and you're ready to go.
-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Add vector column (1536-dim OpenAI embeddings)
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536)
);
-- IVFFlat index (recommended to create after some data is loaded)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
-- HNSW index (can be created on an empty table immediately)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);
The two indexes have quite different trade-offs.
| Index | Build Speed | Memory Usage | Recall | Characteristics |
|---|---|---|---|---|
| IVFFlat | Fast | Low | Moderate | Divides data into clusters and searches only nearby ones |
| HNSW | Slow | High (RAM-resident) | High | Fast traversal via multi-layer graph, supports incremental inserts |
The HNSW trap: The HNSW index needs to stay resident in memory to perform well. Memory usage can be estimated as vector count × dimensions × 4 bytes × graph overhead (1.2–1.5x). For 5 million vectors at 1536 dimensions alone, that's about 37 GB at minimum. When I saw build times exceeding 3 hours even after raising maintenance_work_mem as high as possible, it forced me to rethink this indexing strategy.
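The estimate above is easy to reproduce. The following is a minimal sketch of that arithmetic; the function name `hnsw_memory_gb` is made up for illustration, and the 1.2 to 1.5 overhead factor is the rough range quoted in this post, not a measured value.

```python
def hnsw_memory_gb(num_vectors: int, dims: int, overhead: float = 1.2) -> float:
    """Estimate HNSW index size in decimal gigabytes for float32 vectors."""
    bytes_per_float = 4
    raw_bytes = num_vectors * dims * bytes_per_float
    return raw_bytes * overhead / 1e9


# 5 million vectors at 1536 dimensions, low and high end of the overhead range
low = hnsw_memory_gb(5_000_000, 1536, overhead=1.2)
high = hnsw_memory_gb(5_000_000, 1536, overhead=1.5)
print(f"{low:.1f} GB to {high:.1f} GB")
```

Plugging in your own vector count and dimensionality before choosing an index gives a quick sanity check on whether the index can realistically fit in the RAM of the instance you plan to run.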
If you're already using pgvector: If IVFFlat and HNSW are already familiar, you can skip ahead to the VectorChord section.
How VectorChord Solves the Problem Differently: RaBitQ
VectorChord's core index type is vchordrq, combining IVF with RaBitQ. RaBitQ (Randomized Binary Quantization) is the game-changer here.
RaBitQ is a quantization technique that compresses high-dimensional vectors into compact bit representations. Instead of 32-bit floats, it stores bit-packed codes and approximates inner products via POPCOUNT bitwise operations during similarity computation — dramatically improving CPU efficiency while maintaining high recall. It's an algorithm published by NTU with theoretical error bound guarantees.
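To make the "bit codes plus POPCOUNT" idea concrete, here is a deliberately simplified sketch. It is not RaBitQ itself: it skips the random rotation and the correction factors that give RaBitQ its error bound, and only shows how the inner product of two sign vectors can be computed with XOR and a population count instead of float arithmetic. All function names are illustrative.

```python
import random


def sign_code(vec):
    """Pack the sign of each component into an int: bit i is 1 when vec[i] >= 0."""
    code = 0
    for i, x in enumerate(vec):
        if x >= 0:
            code |= 1 << i
    return code


def popcount(x: int) -> int:
    """Number of set bits (what a hardware POPCOUNT instruction returns)."""
    return bin(x).count("1")


def sign_inner_product(code_a: int, code_b: int, dims: int) -> int:
    """Inner product of the two +/-1 sign vectors via XOR + popcount.

    Matching signs contribute +1 and differing signs contribute -1,
    so the result is dims - 2 * (number of differing bits).
    """
    return dims - 2 * popcount(code_a ^ code_b)


random.seed(0)
dims = 64
a = [random.gauss(0, 1) for _ in range(dims)]
b = [random.gauss(0, 1) for _ in range(dims)]

fast = sign_inner_product(sign_code(a), sign_code(b), dims)
slow = sum((1 if x >= 0 else -1) * (1 if y >= 0 else -1) for x, y in zip(a, b))
assert fast == slow  # the bitwise path matches the element-wise computation
```

A 64-dimensional float32 vector takes 256 bytes, while its sign code fits in 8, which is where the storage and CPU savings in this family of techniques come from.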
Comparing the query flow makes the difference even clearer.
| Stage | pgvector HNSW | VectorChord (IVF+RaBitQ) |
|---|---|---|
| Storage format | Full 32-bit floats | Compressed bit codes |
| Search method | Graph traversal | IVF cluster routing |
| Similarity computation | Float arithmetic | POPCOUNT bitwise ops |
| Memory requirement | Must reside in RAM | Disk-friendly |
| Index build | Slow | 16x faster than pgvector |
-- Create index after installing VectorChord
CREATE EXTENSION IF NOT EXISTS vchord CASCADE;
-- vchordrq index (IVF + RaBitQ)
CREATE INDEX ON documents USING vchordrq (embedding vector_cosine_ops)
WITH (options = $$
residual_quantization = true
[build.internal]
lists = 1000
$$);
As a note, starting with VectorChord 0.5, a DiskANN index type is also available experimentally alongside vchordrq. DiskANN performs graph traversal directly from disk, and on some datasets it shows higher QPS than IVF+RaBitQ, but production use is not yet recommended. The "DiskANN status" row in the disadvantages table refers to this.
Practical Application
Example 1: RAG System Under 1 Million Records — pgvector HNSW
At this scale, pgvector HNSW is the most comfortable choice. It works out of the box on Supabase, Neon, AWS RDS, and GCP Cloud SQL, and integration docs for LangChain and LlamaIndex are plentiful.
import asyncpg
from pgvector.asyncpg import register_vector
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def semantic_search(query: str, conn: asyncpg.Connection, top_k: int = 5):
    # Register the pgvector type with asyncpg; skipping this causes type
    # conversion errors. In a long-lived app, do this once per connection.
    await register_vector(conn)
    # Embed the query text (text-embedding-3-small returns 1536 dimensions)
    response = await client.embeddings.create(
        input=query,
        model="text-embedding-3-small",
    )
    query_embedding = response.data[0].embedding
    # <=> is cosine distance, so 1 - distance gives cosine similarity
    rows = await conn.fetch(
        """
        SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
        FROM documents
        ORDER BY embedding <=> $1::vector
        LIMIT $2
        """,
        query_embedding,
        top_k,
    )
    return rows
| Code Point | Description |
|---|---|
| register_vector(conn) | Registers the pgvector type with asyncpg. Omitting this causes an error |
| <=> operator | Cosine distance (1 - cosine similarity) |
| <-> operator | L2 (Euclidean) distance |
| <#> operator | Negative inner product (multiply by -1 to get the inner product) |
| $1::vector | Explicit cast of the asyncpg parameter to the vector type |
HNSW search accuracy can be tuned with the ef_search parameter. Higher values improve recall but reduce speed.
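Recall here means the fraction of the true nearest neighbors that the approximate index actually returns. A minimal sketch of how it can be measured, assuming you have the ids from an exact search (for example the same query run with index scans disabled) and the ids from the HNSW index; the id lists below are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k: int) -> float:
    """Fraction of the true top-k neighbors that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(exact_ids[:k])
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth) if truth else 0.0


exact = [11, 42, 7, 93, 58]   # ids from an exact scan (hypothetical)
approx = [11, 7, 42, 15, 58]  # ids from the HNSW index (hypothetical)
print(recall_at_k(approx, exact, k=5))  # 4 of the 5 true neighbors were found
```

Averaging this over a few hundred representative queries at different ef_search values shows where extra search effort stops buying meaningful recall.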
-- Adjust ef_search per session (default 40)
SET hnsw.ef_search = 100;
-- Example with metadata filter
SELECT id, content, embedding <=> $1::vector AS distance
FROM documents
WHERE category = 'tech' AND created_at > '2024-01-01'
ORDER BY embedding <=> $1::vector
LIMIT 10;
Watch out: Filters like WHERE category = 'tech' can cause the index to not be used as expected. I was caught off guard the first time I added one and checked EXPLAIN ANALYZE: the plan was nothing like what I anticipated. Always verify the execution plan when combining metadata filters with vector search.
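Checking the plan can also be automated. The sketch below walks the plan tree that PostgreSQL returns for EXPLAIN (FORMAT JSON) and reports whether a given index appears in it. The index name `documents_embedding_idx` is an assumption (PostgreSQL's default name for an unnamed index on documents.embedding), and the sample plan is hand-written to mirror the JSON shape, not captured from a real run.

```python
def plan_node_types(plan: dict) -> list:
    """Collect (node type, index name) pairs from an EXPLAIN (FORMAT JSON) plan tree."""
    nodes = [(plan.get("Node Type"), plan.get("Index Name"))]
    for child in plan.get("Plans", []):
        nodes.extend(plan_node_types(child))
    return nodes


def uses_index(explain_json: list, index_name: str) -> bool:
    """True if any node in the plan scans the named index."""
    root = explain_json[0]["Plan"]
    return any(name == index_name for _, name in plan_node_types(root))


# Hand-written sample: the filter pushed the planner to a sequential scan plus sort
sample = [{"Plan": {"Node Type": "Limit", "Plans": [
    {"Node Type": "Sort", "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "documents"}]}]}}]

print(uses_index(sample, "documents_embedding_idx"))
```

In practice the input would come from running EXPLAIN (FORMAT JSON) on the filtered query and parsing the returned JSON text with json.loads; a check like this fits well in an integration test that fails when a query silently stops using the vector index.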
Example 2: High-Load Environments with 50 Million+ Records — VectorChord vchordrq
At this scale, HNSW's memory problem becomes serious. Storing 100 million 768-dimensional vectors takes about 307 GB for the raw float32 data alone (100M × 768 × 4 bytes), so an HNSW index runs to hundreds of gigabytes, and because it all needs to be in RAM, instance costs skyrocket. VectorChord breaks through this with a disk-friendly design.
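-- Create VectorChord index (large-scale configuration example)
-- Recommended option values depend on the distance metric and the VectorChord version;
-- verify residual_quantization and spherical_centroids against the official docs for cosine ops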
CREATE INDEX ON documents USING vchordrq (embedding vector_cosine_ops)
WITH (options = $$
residual_quantization = true
[build.internal]
lists = 4096
spherical_centroids = false
$$);
-- Adjust search scope at query time
SET vchordrq.probes = 100; -- Number of clusters to probe (recall vs QPS trade-off)
-- Same SQL interface as pgvector
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;
| Parameter | Role | Recommended Value |
|---|---|---|
| lists | Number of IVF clusters | Around sqrt(total vector count) |
| probes | Clusters to probe per query | lists * 0.05 for 0.95 recall target |
| residual_quantization | Enable residual quantization | true recommended in most cases |
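The rules of thumb in the table can be turned into starting values with a few lines of arithmetic. This is a rough sketch under those two heuristics only (lists near the square root of the vector count, probes at about 5% of lists); the function name is made up, and the results are starting points to tune against your own recall measurements.

```python
import math


def suggest_vchordrq_params(total_vectors: int, probe_fraction: float = 0.05):
    """Rule-of-thumb starting values: lists ~ sqrt(N), probes ~ 5% of lists."""
    lists = max(1, math.isqrt(total_vectors))
    probes = max(1, round(lists * probe_fraction))
    return lists, probes


for n in (1_000_000, 50_000_000, 100_000_000):
    lists, probes = suggest_vchordrq_params(n)
    print(f"{n:>11,} vectors -> lists={lists}, probes={probes}")
```

Raising probes from the suggested value trades QPS for recall, so it is worth sweeping a few values around it rather than treating the 5% figure as fixed.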
Another advantage of VectorChord is hybrid search. Installing VectorChord-BM25 alongside it allows keyword+vector combined search entirely within PostgreSQL — no Elasticsearch required.
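One practical detail when combining the two signals: BM25 scores are unbounded, while cosine similarity stays within a fixed range, so a plain weighted sum can let one side dominate. The sketch below rescales BM25 scores across a candidate set before applying the 0.3 / 0.7 weights used in this section's hybrid query. Min-max scaling is this sketch's choice, not something VectorChord-BM25 prescribes, and the candidate values are hypothetical.

```python
def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(candidates, w_bm25=0.3, w_vec=0.7):
    """Rank (doc_id, bm25_score, cosine_similarity) tuples by a weighted sum."""
    bm25_norm = min_max([c[1] for c in candidates])
    scored = [
        (doc_id, w_bm25 * b + w_vec * sim)
        for (doc_id, _, sim), b in zip(candidates, bm25_norm)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical candidate set: (id, raw BM25 score, cosine similarity)
candidates = [(1, 12.4, 0.62), (2, 3.1, 0.91), (3, 8.0, 0.75)]
for doc_id, score in hybrid_rank(candidates):
    print(doc_id, round(score, 3))
```

The same reranking can be done in SQL, but prototyping it on a small candidate set in application code makes it easier to experiment with the weights first.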
-- Install VectorChord-BM25
CREATE EXTENSION IF NOT EXISTS vchord_bm25 CASCADE;
-- Create BM25 index
CREATE INDEX ON documents USING bm25 (content bm25_ops);
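-- Hybrid search: combining BM25 + vector similarity
-- bm25_score() is illustrative only, not a confirmed vchord_bm25 function;
-- check the official docs for the actual operators, column types, and signatures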
SELECT
id,
content,
bm25_score(content, 'RAG pipeline') * 0.3 +
(1 - (embedding <=> $1::vector)) * 0.7 AS hybrid_score
FROM documents
ORDER BY hybrid_score DESC
LIMIT 10;
Pros and Cons Analysis
Advantages
| Item | pgvector | VectorChord |
|---|---|---|
| Managed services | Full support on AWS RDS, GCP, Azure, Supabase, Neon | Requires self-installation; managed options limited |
| WAL/replication stability | Full PITR and physical replication support | WAL-based index not supported |
| Index build speed | Moderate | 16x faster than pgvector |
| Insert performance | Moderate | 14x faster than pgvector |
| Large-scale QPS | Drops sharply beyond 10M+ vectors | Maintains 131 QPS at 100M 768-dim vectors, 0.95 precision |
| Memory efficiency | HNSW must reside in RAM | Disk-friendly design, significant memory savings |
| Cost efficiency | Memory costs spike as scale grows | Stores 6x more vectors than Pinecone at the same cost |
| Community maturity | High, extensive documentation | Relatively smaller (growing rapidly) |
Disadvantages and Caveats
| Item | Detail | Mitigation |
|---|---|---|
| pgvector: large-scale performance degradation | QPS drops sharply beyond 10M records | Migrate to VectorChord or evaluate pgvectorscale |
| pgvector: HNSW memory | RAM costs spike as data grows | Estimate scale upfront; consider IVFFlat or VectorChord from the start |
| VectorChord: recovery constraints | WAL-based index not supported; index must be rebuilt after PITR | Account for index rebuild time in recovery planning |
| VectorChord: minimum memory requirement | Cannot build index in environments with less than 4 GB | Use a dedicated build instance or import after building externally |
| VectorChord: DiskANN status | Still experimental | Use vchordrq (IVF+RaBitQ) in production |
pgvectorscale: A StreamingDiskANN extension from Timescale built on top of pgvector, offering a middle-ground solution to HNSW's memory problem. Timescale's own benchmark on 50M Cohere embeddings reports 28x lower p95 latency than Pinecone's storage-optimized (s1) index; note that the comparison is against Pinecone, not against plain pgvector. Worth evaluating first if you want improved performance while maintaining managed service compatibility.
The Most Common Mistakes in Practice
- Defaulting to HNSW without forecasting data scale: "Go with HNSW for now and deal with it later" turns into a memory bomb at tens of millions of records. Strongly recommended: estimate your data scale one year out before settling on an index strategy.
- Not checking the execution plan when combining metadata filters with vector search: Filters like WHERE category = 'tech' can cause the index to not be used as expected. Always verify with EXPLAIN ANALYZE.
- Overlooking WAL replication constraints when migrating to VectorChord: In mission-critical environments with PITR requirements, VectorChord indexes must be rebuilt after recovery. This must be reflected in operational planning.
Closing Thoughts
Data scale and operational requirements are the core of index selection, and pgvector and VectorChord are tools that each solve different problems well. I still use pgvector HNSW as the default for small-scale services, and bring up VectorChord in a test environment once data growth projections become clear. The "change it when performance becomes a problem" approach is far more painful than setting your threshold in advance.
Three steps you can take right now:
- Start by writing down your current vector count and projected scale in one year: Under 1 million records, stick with pgvector HNSW. If you expect to exceed 10 million, spin up VectorChord in a test environment. It's quick with Docker: docker run --name vectorchord -e POSTGRES_PASSWORD=yourpwd -p 5432:5432 tensorchord/vchord-postgres:pg17-v0.4.3 (check GitHub Releases for the latest version and image tag)
- The most reliable benchmark is one you run yourself: Use the benchmark scripts from the official VectorChord repository, or compare build times and query latency between the two indexes on a sample of 100K–1M records from your actual service data using EXPLAIN ANALYZE. Seeing how the numbers shift in your own context makes the picture concrete.
- If you need hybrid search, the VectorChord-BM25 combination is the most practical choice: If you want to reduce the cost and complexity of maintaining a separate Elasticsearch cluster, consider installing vchord_bm25 alongside VectorChord to consolidate keyword+vector search into a single PostgreSQL stack.
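The thresholds from the first step can be written down as a tiny decision helper. This is only a sketch of the rough boundaries used in this post (under 1 million, over 10 million); the function name is made up, and the recommendation for the range in between is this sketch's assumption, since the post gives no firm rule there.

```python
def suggest_index(current_vectors: int, projected_vectors: int) -> str:
    """Map the post's rough thresholds to a starting recommendation."""
    scale = max(current_vectors, projected_vectors)
    if scale < 1_000_000:
        return "pgvector HNSW"
    if scale > 10_000_000:
        return "trial VectorChord vchordrq in a test environment"
    # Assumption: between 1M and 10M, measure before deciding
    return "benchmark pgvector HNSW against VectorChord on your own data"


print(suggest_index(current_vectors=300_000, projected_vectors=25_000_000))
```

Using the larger of the current and projected counts reflects the point made above: the one-year projection, not today's row count, should drive the choice.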
References
- pgvector vs VectorChord Benchmark | VectorChord Official Docs
- pgvector Comparison FAQ | VectorChord Official Docs
- Benchmark FAQ | VectorChord Official Docs
- Vector Search over PostgreSQL: A Comparative Analysis of Memory and Disk Solutions | VectorChord Blog
- VectorChord 1.0 — 100x Faster Indexing | VectorChord Blog
- Achieving 10,000 QPS in PostgreSQL | VectorChord Blog
- Store 400K Vectors for $1 | VectorChord Blog
- RaBitQ-Powered DiskANN Index (VectorChord 0.5) | VectorChord Blog
- Introducing VectorChord-BM25 | VectorChord Blog
- GitHub — tensorchord/VectorChord
- GitHub — pgvector/pgvector
- pgvector IVFFlat & HNSW Deep Dive | AWS Blog
- The Trade-offs of pgvector | Qdrant Blog
- Independent Postgres as Vector DB Benchmark | Sean's Blog
- HNSW vs IVFFlat vs IVF_RaBitQ Comparison | kodesage