When Does pgvector Hit Its Performance Limits — pgvector vs VectorChord Benchmark Comparison

Reaching for pgvector first when building a RAG pipeline is almost a reflex. Your data already lives in PostgreSQL, a single CREATE EXTENSION vector; enables it, and it's right there in the LangChain docs. But as the data grew from hundreds of thousands toward tens of millions of records, query latency started creeping up. I still remember my surprise the first time I checked how much RAM the HNSW index was consuming.

That's when I discovered VectorChord. When I first saw its official benchmarks (16x faster index builds, 14x faster inserts, up to 5x higher QPS), my honest reaction was "is this for real?" When I tested it myself, the numbers held up. In this post, I'll walk through how each extension works under the hood, which index to choose at the 1-million-record mark, and where I draw the line for switching to VectorChord, with real code throughout.

This isn't an argument to ditch pgvector immediately. At under 1 million records, pgvector is the far more convenient and safe choice. This post is aimed at developers already using pgvector or building RAG pipelines. Once you understand where to draw the line and how VectorChord moves that line, your next architecture decision becomes much clearer.

Core Concepts

pgvector's Two Index Strategies

Installing pgvector itself is straightforward — PostgreSQL 14 or higher and you're ready to go.

sql

-- Install extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with a vector column (1536-dim OpenAI embeddings)
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536)
);

-- Choose ONE of the two indexes below; building both on the same column only adds write and storage cost.
-- IVFFlat index (build after data is loaded: its clusters are trained on existing rows)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

-- HNSW index (no training step, so it can be created on an empty table)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64);

The two indexes have quite different trade-offs.

Index	Build Speed	Memory Usage	Recall	Characteristics
IVFFlat	Fast	Low	Moderate	Divides data into clusters and searches only nearby ones
HNSW	Slow	High (best when fully cached in RAM)	High	Fast traversal via multi-layer graph, supports incremental inserts
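Here is a toy IVF sketch in Python that makes "searches only nearby clusters" concrete. It rests on simplifying assumptions and is not pgvector's implementation. The two centroids are hand-picked instead of trained with k-means. Real IVFFlat trains its centroids on existing rows, which is why it should be built after loading data. Distances are plain Euclidean, and `build_ivf` and `ivf_search` are hypothetical helper names.

```python
import math


def build_ivf(vectors, centroids):
    # Assign each vector id to the inverted list of its nearest centroid
    lists = {c: [] for c in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
        lists[nearest].append(vid)
    return lists


def ivf_search(query, vectors, centroids, lists, probes=1, top_k=1):
    # Route: rank centroids by distance to the query and keep the closest `probes`
    routed = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:probes]
    # Scan: compute exact distances only for vectors in the routed lists
    candidates = [vid for c in routed for vid in lists[c]]
    candidates.sort(key=lambda vid: math.dist(query, vectors[vid]))
    return candidates[:top_k]


# Hand-picked centroids (a real IVF index learns them with k-means)
centroids = [(0.0, 0.0), (10.0, 10.0)]
vectors = [(0.0, 1.0), (1.0, 0.0), (9.0, 10.0), (10.0, 9.0), (4.9, 4.9)]
lists = build_ivf(vectors, centroids)
query = (5.2, 5.2)

print(ivf_search(query, vectors, centroids, lists, probes=1))  # misses id 4
print(ivf_search(query, vectors, centroids, lists, probes=2))  # [4]
```

With one probe, the query is routed to the far cluster and misses its true nearest neighbor (id 4). Probing both clusters finds it. pgvector's `ivfflat.probes` setting (default 1) controls the same recall-versus-speed trade-off, and so does VectorChord's `vchordrq.probes`.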

The HNSW trap: HNSW performs well only while the whole index stays cached in memory. You can estimate its size as vector count × dimensions × 4 bytes × graph overhead (1.2–1.5x). For 5 million 1536-dimensional vectors, the raw floats alone come to about 30.7 GB, so expect roughly 37–46 GB. Even after raising maintenance_work_mem as high as I could, builds took more than 3 hours, which forced me to rethink my indexing strategy.
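The rule of thumb above is plain arithmetic, so it is easy to rerun with your own numbers. The minimal sketch below applies only this post's formula: raw float32 size times a 1.2 to 1.5 overhead factor. It ignores page and tuple headers and the neighbor lists whose size depends on `m`, so treat the output as an order-of-magnitude check. `estimate_hnsw_memory_gb` is a hypothetical helper name.

```python
def estimate_hnsw_memory_gb(num_vectors: int, dims: int,
                            overhead_low: float = 1.2,
                            overhead_high: float = 1.5) -> tuple[float, float]:
    """Rough HNSW memory range in decimal GB (1 GB = 10**9 bytes)."""
    raw_bytes = num_vectors * dims * 4  # float32: 4 bytes per dimension
    return (raw_bytes * overhead_low / 1e9, raw_bytes * overhead_high / 1e9)


low, high = estimate_hnsw_memory_gb(5_000_000, 1536)
print(f"5M x 1536-dim: {low:.1f}-{high:.1f} GB")   # 5M x 1536-dim: 36.9-46.1 GB
low, high = estimate_hnsw_memory_gb(100_000_000, 768)
print(f"100M x 768-dim: {low:.1f}-{high:.1f} GB")  # 100M x 768-dim: 368.6-460.8 GB
```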

Already using pgvector? If IVFFlat and HNSW are familiar, skip ahead to the VectorChord section.

How VectorChord Solves the Problem Differently: RaBitQ

VectorChord's core index type is vchordrq, combining IVF with RaBitQ. RaBitQ (Randomized Binary Quantization) is the game-changer here.

RaBitQ is a quantization technique that compresses high-dimensional vectors into compact bit codes. Instead of storing 32-bit floats, it stores bit-packed codes and approximates inner products with bitwise operations and POPCOUNT. This dramatically improves CPU efficiency while keeping recall high. The algorithm was published by researchers at Nanyang Technological University (NTU), and it comes with a theoretical error bound on its distance estimates.
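The POPCOUNT idea can be shown with a scheme simpler than RaBitQ. The sketch below uses sign random projections (SimHash). Each vector becomes a bit string, and the angle between two vectors is estimated from the number of differing bits, counted with XOR plus a bit count. This is a swapped-in technique for illustration only. RaBitQ itself normalizes and randomly rotates vectors and uses a dedicated estimator that carries its error bound, and none of that is reproduced here. All function names are hypothetical.

```python
import math
import random


def make_projections(dim: int, n_bits: int, seed: int = 0) -> list[list[float]]:
    # One random Gaussian direction per output bit (fixed seed so codes are comparable)
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec: list[float], projections: list[list[float]]) -> int:
    # Pack one sign bit per projection into a Python int used as a bit string
    code = 0
    for i, direction in enumerate(projections):
        if sum(x * d for x, d in zip(vec, direction)) >= 0:
            code |= 1 << i
    return code


def estimate_cosine(code_a: int, code_b: int, n_bits: int) -> float:
    # XOR marks the differing bits; counting them is the POPCOUNT step
    hamming = bin(code_a ^ code_b).count("1")
    # For random hyperplanes, P(bit differs) = angle / pi
    return math.cos(math.pi * hamming / n_bits)


def exact_cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(42)
a = [rng.gauss(0.0, 1.0) for _ in range(64)]
b = [x + 0.5 * rng.gauss(0.0, 1.0) for x in a]  # a noisy copy of a
proj = make_projections(64, 1024)
print(exact_cosine(a, b), estimate_cosine(encode(a, proj), encode(b, proj), 1024))
```

With 1024 bits, the estimate typically lands close to the exact cosine. The comparison itself needs only XOR and bit counting instead of float multiply-adds.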

Comparing the query flow makes the difference even clearer.

Aspect	pgvector HNSW	VectorChord (IVF+RaBitQ)
Storage format	Full 32-bit floats	Compressed bit codes
Search method	Graph traversal	IVF cluster routing
Similarity computation	Float arithmetic	POPCOUNT bitwise ops
Memory requirement	Should stay cached in RAM	Disk-friendly
Index build	Slow	16x faster than pgvector

sql

-- Create index after installing VectorChord
CREATE EXTENSION IF NOT EXISTS vchord CASCADE;
 
-- vchordrq index (IVF + RaBitQ)
CREATE INDEX ON documents USING vchordrq (embedding vector_cosine_ops)
WITH (options = $$
  residual_quantization = true
  [build.internal]
  lists = 1000
$$);

Note that starting with VectorChord 0.5, a DiskANN index type is also available as an experimental option alongside vchordrq. DiskANN traverses its graph directly from disk, and on some datasets it delivers higher QPS than IVF+RaBitQ. It is not yet recommended for production, though. This is what the "DiskANN status" row in the caveats table refers to.

Practical Application

Example 1: RAG System Under 1 Million Records — pgvector HNSW

At this scale, pgvector HNSW is the most comfortable choice. It works out of the box on Supabase, Neon, AWS RDS, and GCP Cloud SQL, and integration docs for LangChain and LlamaIndex are plentiful.

python

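import asyncpg
from pgvector.asyncpg import register_vector
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def init_connection(conn):
    # Register the pgvector codec once per connection, not on every query.
    # Skipping it causes type conversion errors for the vector column.
    await register_vector(conn)


async def create_pool(dsn: str) -> asyncpg.Pool:
    # init= runs for every new connection the pool opens
    return await asyncpg.create_pool(dsn, init=init_connection)


async def semantic_search(query: str, pool: asyncpg.Pool, top_k: int = 5):
    response = await client.embeddings.create(
        input=query,
        model="text-embedding-3-small",  # 1536 dimensions, matching vector(1536)
    )
    query_embedding = response.data[0].embedding

    async with pool.acquire() as conn:
        return await conn.fetch(
            """
            SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> $1::vector
            LIMIT $2
            """,
            query_embedding,
            top_k,
        )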

Code Point	Description
`register_vector(conn)`	Registers the pgvector type with asyncpg. Omitting this causes an error
`<=>` operator	Cosine distance (1 - similarity)
`<->` operator	L2 Euclidean distance
`<#>` operator	Negative inner product (negated so that smaller values mean closer)
`$1::vector`	Explicit casting of asyncpg parameter to vector type

HNSW search accuracy can be tuned with the ef_search parameter. Higher values improve recall but reduce speed.

sql

-- Adjust ef_search per session (default 40)
SET hnsw.ef_search = 100;
 
-- Example with metadata filter (assumes documents also has category and created_at columns)
SELECT id, content, embedding <=> $1::vector AS distance
FROM documents
WHERE category = 'tech' AND created_at > '2024-01-01'
ORDER BY embedding <=> $1::vector
LIMIT 10;

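Watch out: with approximate indexes, pgvector applies WHERE filters after the index scan. If a filter matches 10% of rows, HNSW with the default ef_search of 40 returns only about 4 matching rows on average. Depending on selectivity, the planner may also skip the vector index entirely. I was caught off guard the first time I added a filter and checked EXPLAIN ANALYZE: the plan was nothing like what I expected. Always verify the execution plan when combining metadata filters with vector search. pgvector 0.8.0 and later also offer iterative index scans (hnsw.iterative_scan) to mitigate this.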

Example 2: High-Load Environments with 50 Million+ Records — VectorChord vchordrq

At this scale, HNSW's memory problem becomes serious. 100 million 768-dimensional vectors take about 307 GB as raw float32 values. An HNSW index over them therefore needs hundreds of gigabytes on its own, and all of it should stay cached in RAM. Instance costs rise accordingly, and the gap is jarring once you price it out. VectorChord sidesteps this with a disk-friendly design.

sql

-- Create VectorChord index (large-scale configuration example)
CREATE INDEX ON documents USING vchordrq (embedding vector_cosine_ops)
WITH (options = $$
  residual_quantization = true
  [build.internal]
  lists = 4096
  spherical_centroids = false
$$);
 
-- Adjust search scope at query time
SET vchordrq.probes = 100;  -- Number of clusters to probe (recall vs QPS trade-off)
 
-- Same SQL interface as pgvector
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;

Parameter	Role	Recommended Value
`lists`	Number of IVF clusters	Around `sqrt(total vector count)`
`probes`	Clusters to probe per query	`lists * 0.05` for 0.95 recall target
`residual_quantization`	Enable residual quantization	`true` recommended in most cases
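The two rules of thumb in the table are easy to compute up front: lists ≈ sqrt(N) and probes ≈ 5% of lists. The minimal sketch below assumes those rules exactly as stated here, rather than any official VectorChord default. `vchordrq_params` is a hypothetical helper name.

```python
import math


def vchordrq_params(num_vectors: int) -> tuple[int, int]:
    """Starting values from this post's rules of thumb (not official defaults)."""
    lists = max(1, math.isqrt(num_vectors))   # about sqrt(N) IVF clusters
    probes = max(1, (lists * 5 + 99) // 100)  # ceil(5% of lists), integer math
    return lists, probes


for n in (1_000_000, 50_000_000, 100_000_000):
    lists, probes = vchordrq_params(n)
    print(f"{n:>11,} vectors -> lists = {lists}, probes = {probes}")
```

For 50 million vectors, this suggests 7,071 lists and 354 probes. That is noticeably higher than the 4096 and 100 used in the configuration example above. Treat both as starting points to tune against measured recall and QPS.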

Another advantage of VectorChord is hybrid search. Installing VectorChord-BM25 alongside it allows keyword+vector combined search entirely within PostgreSQL — no Elasticsearch required.

sql

-- Install VectorChord-BM25
CREATE EXTENSION IF NOT EXISTS vchord_bm25 CASCADE;
 
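-- Create BM25 index (illustrative: check the VectorChord-BM25 docs for the column type bm25_ops expects in your version)
CREATE INDEX ON documents USING bm25 (content bm25_ops);

-- Hybrid search: weighted BM25 + vector similarity.
-- bm25_score() is illustrative and may not match the extension's actual function names; verify in the official docs.
-- Raw BM25 scores are unbounded while cosine similarity lies in [-1, 1], so the 0.3/0.7 weights are scale-dependent.
-- This query scores every row; in practice, pre-select candidates from each index first.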
SELECT
  id,
  content,
  bm25_score(content, 'RAG pipeline') * 0.3 +
  (1 - (embedding <=> $1::vector)) * 0.7 AS hybrid_score
FROM documents
ORDER BY hybrid_score DESC
LIMIT 10;
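Because BM25 scores and cosine similarities live on different scales, fixed weights can let one signal dominate. A common alternative is Reciprocal Rank Fusion (RRF), which combines ranks instead of raw scores. This is a swapped-in technique, not what the query above does. The sketch below assumes you fetch the top-N ids from the BM25 search and the vector search separately. `k = 60` is the constant commonly used with RRF, and the function name is hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Fuse ranked id lists: each id scores sum(1 / (k + rank)), rank starting at 1."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sorted is stable)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical top-3 ids from two separate queries
bm25_ids = [3, 1, 2]    # keyword search, best first
vector_ids = [1, 4, 3]  # vector search, best first
print(reciprocal_rank_fusion([bm25_ids, vector_ids]))  # [1, 3, 4, 2]
```

Documents that rank well in both lists (ids 1 and 3) rise to the top, and no score normalization is needed.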

Pros and Cons Analysis

Advantages

Item	pgvector	VectorChord
Managed services	Full support on AWS RDS, GCP, Azure, Supabase, Neon	Requires self-installation; managed options limited
WAL/replication stability	Full PITR and physical replication support	WAL-based index not supported
Index build speed	Moderate	16x faster than pgvector
Insert performance	Moderate	14x faster than pgvector
Large-scale QPS	Drops sharply beyond 10M+ vectors	Maintains 131 QPS at 100M 768-dim vectors, 0.95 precision
Memory efficiency	HNSW should stay cached in RAM	Disk-friendly design, significant memory savings
Cost efficiency	Memory costs spike as scale grows	Stores 6x more vectors than Pinecone at the same cost
Community maturity	High, extensive documentation	Relatively smaller (growing rapidly)

Disadvantages and Caveats

Item	Detail	Mitigation
pgvector: large-scale performance degradation	QPS drops sharply beyond 10M records	Migrate to VectorChord or evaluate pgvectorscale
pgvector: HNSW memory	RAM costs spike as data grows	Estimate scale upfront; consider IVFFlat or VectorChord from the start
VectorChord: recovery constraints	WAL-based index not supported; index must be rebuilt after PITR	Account for index rebuild time in recovery planning
VectorChord: minimum memory requirement	Cannot build index in environments with less than 4 GB	Use a dedicated build instance or import after building externally
VectorChord: DiskANN status	Still experimental	Use vchordrq (IVF+RaBitQ) in production

pgvectorscale: A StreamingDiskANN extension from Timescale that builds on pgvector, offering a middle-ground answer to HNSW's memory problem. In Timescale's benchmark on 50M Cohere embeddings, it reportedly delivered 28x lower p95 latency than Pinecone's storage-optimized index. It is worth evaluating first if you want better performance while staying on pgvector's vector type. Check whether your managed provider offers it, though.

The Most Common Mistakes in Practice

Defaulting to HNSW without forecasting data scale — "Go with HNSW for now and deal with it later" turns into a memory bomb at tens of millions of records. Strongly recommended: estimate your data scale one year out before settling on an index strategy.
Not checking the execution plan when combining metadata filters with vector search — Filters like WHERE category = 'tech' can cause the index to not be used as expected. Always verify with EXPLAIN ANALYZE.
Overlooking WAL replication constraints when migrating to VectorChord — In mission-critical environments with PITR requirements, VectorChord indexes must be rebuilt after recovery. This must be reflected in operational planning.

Closing Thoughts

Data scale and operational requirements are the core of index selection, and pgvector and VectorChord are tools that each solve different problems well. I still use pgvector HNSW as the default for small-scale services, and bring up VectorChord in a test environment once data growth projections become clear. The "change it when performance becomes a problem" approach is far more painful than setting your threshold in advance.

Three steps you can take right now:

Start by writing down your current vector count and projected scale in one year — Under 1 million records, stick with pgvector HNSW. If you expect to exceed 10 million, spin up VectorChord in a test environment. It's quick with Docker: docker run --name vectorchord -e POSTGRES_PASSWORD=yourpwd -p 5432:5432 -d tensorchord/vchord-postgres:pg17-v0.4.3 (check GitHub Releases for the latest version)
The most reliable benchmark is one you run yourself — Use the benchmark scripts from the official VectorChord repository, or compare build times and query latency between the two indexes on a sample of 100K–1M records from your actual service data using EXPLAIN ANALYZE. Seeing how the numbers shift in your own context makes the picture concrete.
If you need hybrid search, the VectorChord-BM25 combination is the most practical choice — If you want to reduce the cost and complexity of maintaining a separate Elasticsearch cluster, consider installing vchord_bm25 alongside VectorChord to consolidate keyword+vector search into a single PostgreSQL stack.
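The first step above amounts to a quick projection. The sketch below assumes compound monthly growth and encodes this post's thresholds: under 1 million means pgvector HNSW, and over 10 million means testing VectorChord. The label for the range in between is an assumption, since the post gives no fixed rule there. Both function names are hypothetical.

```python
def project_vector_count(current: int, monthly_growth: float, months: int = 12) -> int:
    # Compound growth: current * (1 + monthly_growth) ** months
    return round(current * (1 + monthly_growth) ** months)


def suggest_index(projected: int) -> str:
    # Thresholds from this post; the middle band is an assumption
    if projected < 1_000_000:
        return "pgvector HNSW"
    if projected > 10_000_000:
        return "test VectorChord (vchordrq)"
    return "benchmark both on your own data"


projected = project_vector_count(current=800_000, monthly_growth=0.25)
print(projected, suggest_index(projected))  # 11641532 test VectorChord (vchordrq)
```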


DEV BAK - Tech Blog
