AI Code Review That Reasons Over the Entire Repository Beyond PR Diffs — How Codebase Semantic Graphs Catch Cross-File Bugs

If you've ever used AI for code review, you've probably run into this situation at least once: the AI finds nothing wrong inside the modified function, but after deployment, a type error blows up in a completely different file. This is a structural limitation of diff-based tools. Because they only look at the changed lines, they can't see when the callers of that function break.

Greptile tackles this problem head-on. The approach itself is different. Instead of handing the diff to an LLM when a PR is opened, it builds the entire repository into a knowledge graph of functions, classes, modules, and dependencies — all connected — and then reasons about the ripple effects of changes on top of that graph. This article examines how Greptile models a codebase as a graph, and how that approach concretely differs from traditional diff analysis, with specific code examples. This is the first installment in a series on AI code review tools.

Let's dig into what Greptile actually does differently.

Core Concepts

Viewing the Codebase as a Knowledge Graph, Not a Collection of Files

Traditional diff analysis tools pass the changed lines in a PR to the LLM as context. This is fast and simple, but it says nothing about the role the modified function plays in the system as a whole. It's like cutting a single paragraph out of a book and asking, "What does this paragraph mean in the context of the whole story?"

The Codebase Semantic Graph that Greptile builds transforms the entire repository into a node-edge structure connected by function call relationships, import dependencies, and pattern similarities. When a change occurs, the AI can traverse the graph to reason about how far the change reaches.
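To make the node-edge idea concrete, here is a minimal sketch of typed edges and a lookup over them. The symbol names and the `Edge` / `incoming` helpers are hypothetical illustrations, not Greptile's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str  # "calls", "imports", or "similar_to"
    target: str


# Hypothetical symbols; a real index would hold thousands of nodes.
EDGES = [
    Edge("invoice.createInvoice", "calls", "tax.calculateTax"),
    Edge("invoice.service", "imports", "tax.service"),
    Edge("discount.applyDiscount", "similar_to", "tax.calculateTax"),
]


def incoming(target: str, relation: str) -> list:
    """Return the sources of every edge of one relation type that points at target."""
    return [edge.source for edge in EDGES
            if edge.target == target and edge.relation == relation]


print(incoming("tax.calculateTax", "calls"))
print(incoming("tax.calculateTax", "similar_to"))
```

Keeping the relation type on each edge lets one query ask "who calls this?" while another asks "what looks like this?" over the same structure.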

The Indexing Pipeline: How Code Becomes a Graph

When a repository is first connected, Greptile goes through four stages to build the graph. Each stage exists to address the limitations of the previous one.

Stage 1 — AST Parsing

The entire codebase is transformed into Abstract Syntax Trees (ASTs) and decomposed into functions, variables, and classes. This is the first step toward treating code as structure rather than a blob of text.

python

# Conceptual example (pseudocode) — may differ from the actual tree-sitter API
def parse_to_ast(source_code: str, language: str):
    # Load language-specific parser
    parser = get_parser(language)
    tree = parser.parse(source_code)
    # Result: a structured tree of function names, parameters, return types, and call relationships
    return extract_nodes(tree.root_node)
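As a runnable counterpart to the pseudocode above, the following sketch swaps in Python's built-in `ast` module, so it only handles Python source. The `extract_functions` helper and the sample snippet are assumptions made for illustration; a multi-language indexer would need a parser such as tree-sitter instead.

```python
import ast


def extract_functions(source_code: str) -> list:
    """Return one record per function: name, parameter names, and called names."""
    tree = ast.parse(source_code)
    records = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Collect the plain-name calls made anywhere inside this function.
            calls = sorted({
                child.func.id
                for child in ast.walk(node)
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name)
            })
            records.append({
                "name": node.name,
                "params": [arg.arg for arg in node.args.args],
                "calls": calls,
            })
    return records


sample = '''
def get_tax_rate(region):
    return 0.1

def calculate_tax(amount, region):
    return amount * get_tax_rate(region)
'''

for record in extract_functions(sample):
    print(record)
```

Each record already carries the raw material for call-relationship edges: the function name on one side and the names it calls on the other.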

Stage 2 — Natural Language Conversion

ASTs alone aren't enough. Because each language has different syntax (Python's def versus TypeScript's function, for example), the same logic can end up far apart in embedding space. Greptile absorbs this noise by recursively generating natural language descriptions (docstrings) for each AST node. According to internal measurements published by the Greptile team on their blog, natural language descriptions improve vector embedding similarity by approximately 12% compared to raw code.

Stage 3 — Dense Vector Embedding

The generated natural language summaries are chunked at the function level and converted into embedding vectors. These vectors are stored in a vector database such as Chroma or Pinecone, which is optimized for fast similarity search over high-dimensional vectors.
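The embed-then-search step can be sketched without any external service. In this toy version, word counts stand in for a real embedding model and a dictionary stands in for the vector database; the summaries and the `embed`, `cosine`, and `search` helpers are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical function-level summaries, one chunk per function.
summaries = {
    "calculateTax": "computes the tax amount for an order in a region",
    "validateEmail": "checks whether an email address is well formed",
    "applyDiscount": "computes the discount amount for an order",
}
index = {name: embed(text) for name, text in summaries.items()}


def search(query: str, top_k: int = 2) -> list:
    """Return the names of the top_k summaries most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda name: cosine(query_vector, index[name]), reverse=True)
    return ranked[:top_k]


print(search("tax amount for a region"))
```

With these summaries the query ranks `calculateTax` first and `applyDiscount` second, since both share wording about computing an amount.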

Stage 4 — Graph Construction

Function call relationships and import dependencies are extracted directly from the code structure. Pattern-similarity edges are then added between any two nodes whose Stage 3 embedding vectors have a cosine similarity above a set threshold. These three types of edges are combined to form the final graph structure.
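A minimal sketch of this merge step follows, assuming the call and import pairs have already been extracted and each node already has a vector. The `build_graph` helper, the three-dimensional vectors, and the 0.9 threshold are illustrative assumptions rather than Greptile's actual values.

```python
import math
from itertools import combinations


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_graph(calls: list, imports: list, vectors: dict, threshold: float = 0.9) -> set:
    """Merge structural and similarity edges into one set of (source, relation, target)."""
    edges = {(src, "calls", dst) for src, dst in calls}
    edges |= {(src, "imports", dst) for src, dst in imports}
    # Compare every pair of nodes once and link the ones above the threshold.
    for left, right in combinations(sorted(vectors), 2):
        if cosine(vectors[left], vectors[right]) > threshold:
            edges.add((left, "similar_to", right))
    return edges


edges = build_graph(
    calls=[("createInvoice", "calculateTax")],
    imports=[("invoice.service", "tax.service")],
    vectors={
        "calculateTax": [0.9, 0.1, 0.0],
        "applyDiscount": [0.8, 0.2, 0.0],
        "validateEmail": [0.0, 0.1, 0.9],
    },
)
for edge in sorted(edges):
    print(edge)
```

The pairwise comparison is quadratic in the number of nodes, so a production index would more plausibly rely on the vector database's nearest-neighbor search to propose candidate pairs.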

Graph RAG (Graph Retrieval-Augmented Generation): While standard RAG retrieves similar text chunks via simple vector search, Graph RAG enables multi-hop traversal by following connections between nodes. The key difference is that it can find not just "code similar to this function," but "all code that depends on this function."
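The multi-hop part can be illustrated with a breadth-first walk over reversed dependency edges. The module names and the `dependents_by_hop` helper below are hypothetical; the point is only that each hop follows an explicit edge instead of a similarity score.

```python
from collections import deque

# Hypothetical "depends on" edges: each key calls or imports the listed modules.
DEPENDS_ON = {
    "order.controller": ["invoice.service"],
    "invoice.service": ["tax.service"],
    "report.job": ["invoice.service"],
    "tax.service": [],
}


def dependents_by_hop(target: str, max_hops: int = 3) -> dict:
    """Breadth-first walk over reversed edges; returns {dependent: hop distance}."""
    reverse = {}
    for source, targets in DEPENDS_ON.items():
        for dep in targets:
            reverse.setdefault(dep, []).append(source)
    distance = {}
    queue = deque([(target, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for parent in reverse.get(node, []):
            if parent not in distance and parent != target:
                distance[parent] = hops + 1
                queue.append((parent, hops + 1))
    return distance


print(dependents_by_hop("tax.service"))
```

Here `invoice.service` is one hop away and both `order.controller` and `report.job` are two hops away, which a pure "similar text" search over `tax.service` would have no reason to surface.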

Practical Applications

Example 1: A Payment Logic Change Hiding a Cross-File Contract Violation

This is a situation frequently encountered in practice. A PR modifies the tax calculation function in a payment service, and the logic inside the function looks clean. Judging by the diff alone, it's "LGTM." When I first saw this example, I completely missed the invoice.service.ts side.

typescript

// tax.service.ts — modified function
// Before: calculateTax(amount: number): number
// After: calculateTax(amount: number, region: string): TaxResult
 
interface TaxResult {
  amount: number;
  rate: number;
  breakdown: Record<string, number>;
}
 
export function calculateTax(amount: number, region: string): TaxResult {
  const rate = getTaxRate(region);
  return {
    amount: amount * rate,
    rate,
    breakdown: { base: amount * rate },
  };
}

A diff analysis tool only checks whether the logic inside tax.service.ts is correct. But Greptile traverses the graph to trace every node that calls this function.

typescript

// invoice.service.ts — the caller (not included in the diff)
// ❌ Still assumes the old signature and a number return value, so it produces a wrong result at runtime
const tax = calculateTax(invoice.amount); // missing region, return type mismatch
const total = invoice.amount + tax; // number + TaxResult object: yields a string like "100[object Object]", not a numeric total

Analysis Tool	Error inside `tax.service.ts`	Contract violation in `invoice.service.ts`	Missing parameter in `order.controller.ts`
Diff-based	Detectable	Not detectable	Not detectable
Greptile	Detectable	Detectable	Detectable
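The caller check in the table can be approximated with a small cross-file scan. This sketch uses Python sources and the built-in `ast` module as a stand-in for the TypeScript services in the example; the file names, file contents, and the `required_params` / `stale_callers` helpers are all hypothetical.

```python
import ast

# Hypothetical repository snapshot: file name -> source text.
FILES = {
    "tax_service.py": (
        "def calculate_tax(amount, region):\n"
        "    return {'amount': amount * 0.1, 'region': region}\n"
    ),
    "invoice_service.py": (
        "from tax_service import calculate_tax\n"
        "def total(invoice):\n"
        "    return invoice.amount + calculate_tax(invoice.amount)\n"
    ),
    "order_controller.py": (
        "from tax_service import calculate_tax\n"
        "def preview(order):\n"
        "    return calculate_tax(order.amount)\n"
    ),
    "checkout.py": (
        "from tax_service import calculate_tax\n"
        "def pay(cart):\n"
        "    return calculate_tax(cart.amount, cart.region)\n"
    ),
}


def required_params(function_name: str) -> int:
    """Count positional parameters without defaults for the named function."""
    for source in FILES.values():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef) and node.name == function_name:
                return len(node.args.args) - len(node.args.defaults)
    raise LookupError(function_name)


def stale_callers(function_name: str) -> list:
    """List (file, line) for calls that pass fewer arguments than the definition requires."""
    needed = required_params(function_name)
    findings = []
    for path, source in FILES.items():
        for node in ast.walk(ast.parse(source)):
            if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                    and node.func.id == function_name
                    and len(node.args) + len(node.keywords) < needed):
                findings.append((path, node.lineno))
    return findings


print(stale_callers("calculate_tax"))
```

Only the definition file appears in the diff, yet the scan reports the calls in `invoice_service.py` and `order_controller.py` while leaving the already-updated `checkout.py` alone.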

Example 2: Implicit Interface Change in a Shared Library During a Refactoring PR

A PR described as a "simple refactor" that quietly changes the public interface of a shared library.

typescript

// shared/validators.ts — before refactoring
export function validateEmail(email: string): boolean {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}
 
// After refactoring — error handling changed to throw
export function validateEmail(email: string): void {
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    throw new ValidationError("Invalid email format");
  }
}

The return type changed from boolean to void, and the failure behavior shifted from return false to throw. A diff analysis tool reads the changes within this file just fine. But it can't see the other services consuming this function.

typescript

// auth.service.ts — the caller (not included in the diff)
// ❌ Code that assumes a boolean return: after the refactor the call returns undefined (falsy) or throws
if (validateEmail(input)) {
  await createUser(input); // never runs, even for valid emails; invalid emails now raise an unhandled ValidationError
}
}

Greptile finds all nodes in the graph that consume this function and leaves a comment warning that "the contract has been broken."
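Once a function stops returning a boolean, any caller that branches on its result is suspect. A rough way to find such callers is sketched below, again using Python's `ast` module on a hypothetical Python caller in place of the TypeScript one; `truthiness_uses` is an illustrative helper, not part of any tool.

```python
import ast

# Hypothetical caller module that still treats the validator as a predicate.
CALLER_SOURCE = '''
def register(user_input):
    if validate_email(user_input):
        create_user(user_input)

def audit(user_input):
    validate_email(user_input)
'''


def truthiness_uses(source: str, function_name: str) -> list:
    """Return line numbers where the function's result is used as an if/while condition."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While)):
            test = node.test
            # Unwrap `not f(x)` so negated checks are reported too.
            if isinstance(test, ast.UnaryOp) and isinstance(test.op, ast.Not):
                test = test.operand
            if (isinstance(test, ast.Call) and isinstance(test.func, ast.Name)
                    and test.func.id == function_name):
                lines.append(node.lineno)
    return lines


print(truthiness_uses(CALLER_SOURCE, "validate_email"))
```

The call inside `audit` is not flagged because it ignores the return value, which stays valid after a switch to throwing on failure.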

Example 3: Multi-Hop Investigation with the v3 Agent

Starting with Greptile v3, the agent autonomously runs a multi-step investigation rather than a single graph search. The following is a conceptual diagram of that process.

text

Detected an abnormal discount rate calculation in the PR
  ↓
Trace back through git history → found relevant commit
  ↓
Read original PR description for that commit → "hotfix: per specific client request"
  ↓
Search codebase for similar discount calculation patterns
  ↓
"3 other discount logic instances use a different approach. Possible consistency issue."

Multi-hop Investigation: A traversal approach that starts from a single question and continues the next search based on intermediate results. To answer "why does this function look like this," it autonomously explores git history → PR description → similar patterns in sequence.
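The chained lookups can be sketched as a function where each step consumes the previous step's result. The dictionaries below are hard-coded stand-ins for git history, PR metadata, and code search; the commit id, file locations, and the `investigate` helper are all hypothetical.

```python
# Hypothetical, hard-coded stand-ins for git and code-search tools.
COMMITS = {"discount.py:42": "a1b2c3"}
PR_DESCRIPTIONS = {"a1b2c3": "hotfix: per specific client request"}
SIMILAR_CODE = {"discount": ["cart.py:10", "coupon.py:88", "loyalty.py:5"]}


def investigate(location: str, topic: str) -> list:
    """Run dependent lookups; each step decides whether the next one is worth doing."""
    commit = COMMITS.get(location)
    if commit is None:
        return [f"no history found for {location}"]
    trail = [f"history: {location} last changed in {commit}"]
    description = PR_DESCRIPTIONS.get(commit, "")
    trail.append(f"PR description: {description!r}")
    # Only widen the search when the description hints at a one-off change.
    if "hotfix" in description:
        others = SIMILAR_CODE.get(topic, [])
        trail.append(f"{len(others)} other {topic} implementations to compare")
    return trail


for step in investigate("discount.py:42", "discount"):
    print(step)
```

An actual agent would let the model choose the next tool call; the fixed sequence here only shows how intermediate results steer the following search.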

The Decisive Difference Between Diff Analysis and Semantic Graphs

Through the examples above, you've seen directly what each tool does differently. To summarize:

Item	Traditional Diff Analysis	Semantic Graph (Greptile)
Scope of analysis	Changed lines (line diff)	Entire repository
Cross-file context	Within changed files	Traces entire call chain
Git history utilization	Limited	Multi-hop traceability
Review speed	Fast (seconds)	Relatively slower
False positives	Low	Relatively higher
Initial setup cost	None	Indexing time required

Pros and Cons Analysis

Advantages

The figures below are based on benchmarks self-published by Greptile, measured in internal testing environments.

Item	Details
Bug detection rate	82% vs. competitors' 44–54%. High-risk bugs detected at 100% (competitors: 36–57%)
Cross-file context	Traces entire call chains, import dependencies, and pattern similarities
Architectural regression detection	Catches interface contract violations hidden in "clean" diffs
Git history utilization	Enables judgments that reference the historical context of changes
Ecosystem integration	Provided as an MCP server, directly callable by AI agents such as Claude and Cursor

Disadvantages and Caveats

In practice, the most painful issue was false positives. In the first two weeks, there were so many warnings that team members started muting the review notifications. Similar experiences were shared in the Greptile community, and independent benchmarks put a number on the gap: Greptile logged 11 false positives versus CodeRabbit's 2.

Item	Details	Mitigation
High false positive rate	11 cases in independent benchmarks (CodeRabbit: 2)	Configure team rule-based filters, gradually adjust thresholds
Initial indexing cost	Minutes to hours depending on repository size	Run initial indexing overnight, outside the CI pipeline
Semantic vs. structural dependencies	May miss call relationships where function signatures differ	Use alongside static analysis tools like TypeScript strict mode
Codebase exposure	Entire repository is sent to a cloud service	Use on-premises option, review security policy in advance
Operational cost	Agent-based analysis incurs many LLM calls, raising costs	Set analysis depth by PR size, apply only to critical branches

False Positive: When AI incorrectly identifies code that has no actual problem as a bug or risk. Too many false positives cause developers to start ignoring review notifications, and ultimately even genuinely critical warnings get buried. This is exactly why you need to check the false positive rate alongside the detection rate.
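Reading the two rates side by side is simple arithmetic. The sketch below computes the detection rate (recall) and the share of warnings that were real (precision) for two made-up tools; the counts are purely illustrative and are not taken from any benchmark.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of reported findings that were real problems."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of real problems that were reported (the detection rate)."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0


# Hypothetical review of 10 seeded bugs; these counts are illustrative only.
tool_a = {"tp": 8, "fp": 6, "fn": 2}
tool_b = {"tp": 5, "fp": 1, "fn": 5}

for name, counts in (("tool_a", tool_a), ("tool_b", tool_b)):
    detection = recall(counts["tp"], counts["fn"])
    trust = precision(counts["tp"], counts["fp"])
    print(f"{name}: detection {detection:.2f}, precision {trust:.2f}")
```

In this made-up case tool_a finds more bugs (0.80 versus 0.50) but fewer of its warnings are real (0.57 versus 0.83), which is the tradeoff a team has to weigh.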

The Most Common Mistakes in Practice

1. Abandoning the tool immediately when there are too many false positives. In the early stages, warnings unrelated to team conventions can flood in because the index hasn't yet learned enough of the codebase's implicit rules. After feeding back team rules for about 2–4 weeks, signal quality noticeably improves.
2. Assuming the semantic graph replaces static analysis tools. Structural errors caught by type checkers and linters, and architectural context caught by semantic graphs, are complementary. It's best to use TypeScript strict mode + ESLint + Greptile as a combination where each covers a different layer.
3. Applying it to the entire repository all at once. In monorepos (a pattern where multiple services or packages are managed together in a single repository) or large-scale repositories, the initial indexing cost and false positive volume both grow simultaneously. A practical approach is to first apply it to core domain modules with high change impact, like payments and authentication, and then gradually expand the scope.
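The "start with core modules" advice amounts to a path filter in front of the review step. The sketch below is a generic illustration using Python's `fnmatch`; the patterns, paths, and `in_scope` helper are hypothetical and do not reflect Greptile's configuration format.

```python
from fnmatch import fnmatch

# Hypothetical path patterns for the modules a team chooses to review first.
REVIEW_SCOPE = ["services/payments/*", "services/auth/*"]


def in_scope(changed_files: list) -> list:
    """Keep only changed files that fall under the initial review scope."""
    return [path for path in changed_files
            if any(fnmatch(path, pattern) for pattern in REVIEW_SCOPE)]


changed = [
    "services/payments/tax.service.ts",
    "services/auth/session.ts",
    "docs/README.md",
]
print(in_scope(changed))
```

Expanding the rollout later is then a matter of appending patterns to the list rather than changing the review workflow.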

Closing Thoughts

AI code review is shifting from a problem of "reading changed lines well" to one of "understanding the entire system and reasoning about the ripple effects of changes," and the semantic graph approach is currently one of the most concrete implementations in this direction. Competing tools like CodeRabbit and GitHub Copilot Code Review are also rapidly advancing in the same direction.

There are real-world tradeoffs in false positive rates and indexing costs, but the value of catching cross-file contract violations or architectural regressions before deployment becomes increasingly clear as team size grows.

Which part of your repository would you connect first? Here are three steps you can start on right now.

1. You can check the current plans on the official website and try connecting one side project or staging repository. After installing the GitHub app and completing the initial indexing, it's worth seeing firsthand what kind of cross-file comments appear on your next PR.
2. You can collect past cross-file bug cases and use them as retrospective tests. Having your team jointly verify "would this bug have been caught in a Greptile comment?" gives you the practical evidence you need for an adoption decision.
3. Integrating the MCP server to run codebase queries directly from Claude or Cursor is also a great option. Getting a feel for the practical value of semantic graphs through everyday development tasks, like "find everywhere this function is called," before expanding into review automation is a natural progression.



