Google Docs Chose OT, Figma Chose CRDT — Why the Conflict Resolution Approach Determines Your Entire Real-Time Collaboration Architecture

When you set out to build real-time collaboration yourself, the question "what happens when two people edit the same spot at the same time?" ends up driving every architectural decision. I used to think "last writer wins, right?" — but once I actually implemented it, I spent days chasing a bug where content got mangled by a single index offset.

Google Docs and Figma are both collaboration tools used by millions simultaneously, yet their approaches to resolving conflicts are completely different. One adopted OT (Operational Transformation), the other adopted the ideas behind CRDT (Conflict-free Replicated Data Type). These two approaches aren't answers to the question "which is better?" — they're answers to different questions that naturally diverge based on a product's data structure and infrastructure characteristics. Does it need offline support? Does it handle non-text structures? Does a central server already see every operation? These judgments determine the algorithm choice, and that choice cascades through every subsequent design decision.

If you're planning to build real-time collaboration yourself, or you've already built it and want to understand why you're seeing unexpected state inconsistencies, here's a summary that should help.

Core Concepts

Why Conflicts Happen — The Problem of Shifting Indices

Consider a scenario where two people edit a text document simultaneously.

text

Initial document: "hello"
A: Insert "!" at index 5 → "hello!"
B: Insert "?" at index 5 → "hello?"

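What happens if each client applies its own operation first and then the other's, as-is?

On A's replica: "hello!" → apply B (index 5 unchanged) → "hello?!"
On B's replica: "hello?" → apply A (index 5 unchanged) → "hello!?"
Acceptable result: "hello!?" or "hello?!", as long as every replica ends up with the same one

Either string would be fine on its own. The problem is that the two replicas now disagree: index 5 meant "end of the document" when each operation was created, but it points somewhere else once the other insertion has landed. With insertions in the middle of a document, the same shift silently puts text in the wrong place. How you handle this misalignment is where OT and CRDT diverge.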
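The divergence is easy to reproduce. The following is a minimal sketch, assuming a plain index-based insert with no conflict handling; `NaiveInsert` and `applyInsert` are hypothetical names used only for this illustration.

```typescript
// Index-based insert with no conflict handling (hypothetical helper names).
interface NaiveInsert {
  index: number;
  text: string;
}

function applyInsert(doc: string, op: NaiveInsert): string {
  return doc.slice(0, op.index) + op.text + doc.slice(op.index);
}

const initial = "hello";
const opA: NaiveInsert = { index: 5, text: "!" };
const opB: NaiveInsert = { index: 5, text: "?" };

// Each replica applies its own edit immediately, then the remote one as received.
const replicaA = applyInsert(applyInsert(initial, opA), opB);
const replicaB = applyInsert(applyInsert(initial, opB), opA);

console.log(replicaA); // "hello?!"
console.log(replicaB); // "hello!?"
console.log(replicaA === replicaB); // false: the replicas have diverged
```

Neither replica did anything wrong locally; the indices simply stopped meaning the same thing on both sides.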

What Convergence Means

Before diving deeper, let me clarify a term that comes up frequently. Convergence is the property whereby all clients that have received the same set of operations end up in the same state. Regardless of the order in which operations arrive, and regardless of which client applied them first, the final document must be identical for everyone. OT and CRDT differ in how they guarantee this convergence.

OT — The Server Transforms Operations to Achieve Convergence

The idea behind OT is: "let's transform B's operation to account for the fact that A's operation has already been applied." Positions stay index-based, but each incoming index is shifted to reflect the effects of the operations that were applied before it.

typescript

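// Simplified OT transform function example
interface InsertOp {
  position: number;
  text: string;
  clientId: string;
}

// Returns op2 rewritten so that it can be applied after op1 has already been applied
function transformInsert(op1: InsertOp, op2: InsertOp): InsertOp {
  if (op2.position > op1.position) {
    // op1 inserted earlier in the document, so op2 shifts right by op1's length
    return { ...op2, position: op2.position + op1.text.length };
  } else if (op2.position === op1.position) {
    // Same-position conflict: real OT implementations determine priority
    // using server receive order or logical clocks;
    // clientId is used here as a simplification for illustration
    return op1.clientId < op2.clientId
      ? { ...op2, position: op2.position + op1.text.length }
      : op2;
  }
  // op1 inserted after op2's position, so op2 is unaffected
  return op2;
}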

Here's the flow on the server side.

typescript

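// Node.js server loop in the spirit of ShareDB (illustrative pseudocode, not ShareDB's actual API)
server.on('submit', (agent, op) => {
  // op.v is the document version the client based this op on
  const pendingOps = db.getOpsSince(op.v); // committed ops the client doesn't know about yet
  const transformed = pendingOps.reduce(
    // shift the client's op past each op the server has already applied
    (acc, serverOp) => transformInsert(serverOp, acc),
    op
  );
  db.commit(transformed);  // store the transformed operation
  broadcast(transformed);  // distribute to all clients
});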

Key point: In OT, the server maintains a history of operations. When a new operation arrives, it transforms it against the operations applied in the interim and then broadcasts it. The server always has final authority over ordering.
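To make the server's role concrete, here is a small in-memory sketch of that loop for insert-only text. It is an illustration under simplifying assumptions (no deletes, no persistence, no networking), and `OtServer`, `Insert`, and `transformAgainst` are hypothetical names rather than any library's API.

```typescript
// Minimal in-memory OT server for insert-only text (all names hypothetical).
interface Insert {
  position: number;
  text: string;
  clientId: string;
  baseVersion: number; // server version the client had seen when it created the op
}

// Shift `incoming` so it can be applied after `applied` has taken effect.
function transformAgainst(applied: Insert, incoming: Insert): Insert {
  const appliedGoesFirst =
    applied.position < incoming.position ||
    (applied.position === incoming.position && applied.clientId < incoming.clientId);
  return appliedGoesFirst
    ? { ...incoming, position: incoming.position + applied.text.length }
    : incoming;
}

class OtServer {
  private history: Insert[] = []; // history[i] moved the doc from version i to i + 1
  constructor(public doc: string) {}

  submit(op: Insert): Insert {
    // Everything committed after the version the client saw must be transformed against.
    const missed = this.history.slice(op.baseVersion);
    const transformed = missed.reduce((acc, done) => transformAgainst(done, acc), op);
    this.doc =
      this.doc.slice(0, transformed.position) +
      transformed.text +
      this.doc.slice(transformed.position);
    this.history.push(transformed);
    return transformed; // this is what would be broadcast to every client
  }
}

const otServer = new OtServer("hello");
otServer.submit({ position: 5, text: "!", clientId: "A", baseVersion: 0 });
const rebased = otServer.submit({ position: 5, text: "?", clientId: "B", baseVersion: 0 });

console.log(rebased.position); // 6
console.log(otServer.doc);     // "hello!?"
```

B created its operation against version 0, so the server shifts it past A's insertion before committing it. Clients never have to agree among themselves; they only follow the server's order.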

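To understand why OT transform functions are tricky to implement, you need to know the TP1 and TP2 conditions. TP1 requires that, for two concurrent operations a and b, applying a and then b transformed against a yields the same document as applying b and then a transformed against b. TP2 requires that transforming a third operation against those two equivalent sequences yields the same operation whichever sequence was used. TP2 matters mainly when no single server imposes an order; with a central server like the one above, TP1 is generally enough. Even with just text insertions and deletions, implementing a transform function that fully satisfies these conditions is difficult. ACM CSCW 2020 research found violations of these conditions in existing OT algorithm implementations.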
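TP1 can be checked mechanically for a given pair of operations. The sketch below is an illustration for insert-only operations, not a proof; `Ins`, `applyIns`, `withTieBreak`, `withoutTieBreak`, and `satisfiesTP1` are hypothetical names. It also shows how a transform that skips the same-position tie-break fails the check.

```typescript
// TP1 check for insert-only operations (hypothetical names).
interface Ins {
  position: number;
  text: string;
  clientId: string;
}

// A transform takes an already-applied op and returns the incoming op, adjusted.
type Transform = (applied: Ins, incoming: Ins) => Ins;

const applyIns = (doc: string, op: Ins): string =>
  doc.slice(0, op.position) + op.text + doc.slice(op.position);

// Same-position conflicts are resolved by clientId, so both sides agree on who goes first.
const withTieBreak: Transform = (applied, incoming) => {
  const shift =
    applied.position < incoming.position ||
    (applied.position === incoming.position && applied.clientId < incoming.clientId);
  return shift ? { ...incoming, position: incoming.position + applied.text.length } : incoming;
};

// Always shifting on equal positions looks harmless but is not symmetric.
const withoutTieBreak: Transform = (applied, incoming) =>
  applied.position <= incoming.position
    ? { ...incoming, position: incoming.position + applied.text.length }
    : incoming;

// TP1: a followed by (b transformed against a) must equal b followed by (a transformed against b).
function satisfiesTP1(t: Transform, state: string, a: Ins, b: Ins): boolean {
  const viaA = applyIns(applyIns(state, a), t(a, b));
  const viaB = applyIns(applyIns(state, b), t(b, a));
  return viaA === viaB;
}

const insA: Ins = { position: 5, text: "!", clientId: "A" };
const insB: Ins = { position: 5, text: "?", clientId: "B" };

console.log(satisfiesTP1(withTieBreak, "hello", insA, insB));    // true
console.log(satisfiesTP1(withoutTieBreak, "hello", insA, insB)); // false
```

A real test suite would run this kind of check over many generated operation pairs, including deletes, which is where most of the subtle violations tend to appear.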

CRDT — The Data Structure Embeds the Merge Rules

CRDT takes a fundamentally different approach. It discards the notion of "position" and assigns each character a unique ID.

typescript

// CRDT (RGA, Replicated Growable Array style) character insertion
interface CRDTChar {
  id: string;             // unique identifier (e.g., "clientA:3")
  value: string;
  afterId: string | null; // relationship: "comes after this character"
}
 
// "hello" represented as CRDT
const doc: CRDTChar[] = [
  { id: "s:1", value: "h", afterId: null },
  { id: "s:2", value: "e", afterId: "s:1" },
  { id: "s:3", value: "l", afterId: "s:2" },
  { id: "s:4", value: "l", afterId: "s:3" },
  { id: "s:5", value: "o", afterId: "s:4" },
];
 
// A inserts "!": "insert after s:5" (id: "A:6")
// B inserts "?": "insert after s:5" (id: "B:6")
// When two characters share the same afterId, sort lexicographically by id
// "A:6" < "B:6" → order: o → ! → ? → "hello!?"
 
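function merge(chars: CRDTChar[]): string {
  // Group characters by the id they follow
  const children = new Map<string | null, CRDTChar[]>();
  for (const c of chars) {
    const siblings = children.get(c.afterId) ?? [];
    siblings.push(c);
    children.set(c.afterId, siblings);
  }
  // Break ties at the same position by lexicographic id comparison
  for (const siblings of children.values()) {
    siblings.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  }
  // Depth-first walk from the start of the document (afterId === null)
  const out: string[] = [];
  const visit = (parent: string | null): void => {
    for (const c of children.get(parent) ?? []) {
      out.push(c.value);
      visit(c.id);
    }
  };
  visit(null);
  return out.join('');
}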

Key point: In CRDT, "insert at position 3" doesn't exist. Position is expressed as a relationship ("insert after ID s:5") plus a deterministic tie-break, so every replica that has received the same set of insertions produces the same order, no matter the order in which they arrived.

Core Differences Between the Two Paradigms

Aspect	OT	CRDT
Conflict resolution authority	Server (central coordination)	The data structure itself
Position representation	Index (0, 1, 2...)	Relationship (ID-based)
Server dependency	Required in practice (central ordering)	Not required (converges without one)
Offline support	Hard (long-lived divergence must be transformed on reconnect)	Auto-merges on reconnect
Memory overhead	Low	High (per-character metadata)
Core implementation complexity	Satisfying TP1 and TP2 in transform functions	Merge rules built into the data structure (libraries available)

Real-World Application

Example 1: Why Google Docs Chose OT — The Server Already Sees Everything

Honestly, Google didn't choose OT because "OT is superior." It was because OT was a structurally natural fit given Google's infrastructure characteristics.

Google's servers have to receive every operation anyway. Access control (ACL), version history, rendering, storage: all of it flows through the server. The added cost of transforming operations there is small (reportedly under 5ms), and in return you get a compact document with no metadata bloat.

text

User A (insert "x" at index 3)
         ↓
    Google Server (process A's op: rev 15)
         ↓
User B's op arrives (insert "y" at index 4, based on rev 14)
    → transform("insert y at 4", rev 14→15 diff)
    → transformed to "insert y at 5"
         ↓
    Broadcast to all clients

What if they'd used CRDT? Every character would need an ID and causal metadata attached. In an uncompressed RGA implementation, a 100,000-character document can balloon from 100KB of raw content to several MB of metadata alone. Libraries like Yjs keep this within practical bounds through internal compression, but without that, the overhead is substantial. On top of that, deleted characters persist as tombstones, and memory pressure accumulates as the document grows.
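A rough back-of-the-envelope check of that overhead is sketched below. It assumes a deliberately naive encoding, one JSON object per character with `id`, `value`, and `afterId`; this layout and the names `NaiveChar` and `toNaiveCrdt` are assumptions for illustration, and it is not how Yjs or any production library stores data.

```typescript
// Rough size comparison: raw text vs. a naive per-character CRDT encoding.
// The JSON layout is an assumption for illustration, not a real library's format.
interface NaiveChar {
  id: string;
  value: string;
  afterId: string | null;
}

function toNaiveCrdt(text: string): NaiveChar[] {
  return Array.from(text).map((value, i) => ({
    id: `s:${i + 1}`,
    value,
    afterId: i === 0 ? null : `s:${i}`,
  }));
}

// ASCII-only content, so string length equals byte count.
const rawText = "a".repeat(100000);
const rawBytes = rawText.length;
const crdtBytes = JSON.stringify(toNaiveCrdt(rawText)).length;

console.log(`raw text:   ${rawBytes} bytes`);
console.log(`naive CRDT: ${crdtBytes} bytes`);
console.log(`overhead:   ${(crdtBytes / rawBytes).toFixed(1)}x`);
```

The exact ratio depends entirely on the encoding, which is why compact binary formats and run-length merging of consecutive insertions matter so much in practice.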

Linear text structure, a central server already in place, no need for offline support — when all three conditions overlap, OT has almost no downsides.

Example 2: Why Figma Chose the CRDT Approach — "This Isn't a Text Editor"

Figma initially evaluated OT and abandoned it.

text

Figma document structure (simplified)
└── Frame A
    ├── Rectangle (x: 100, y: 200, width: 300)
    ├── Text "Hello" (font-size: 16, color: #333)
    └── Group
        ├── Circle (r: 50)
        └── Image (src: "...")

OT transform functions were designed for index-based text insertions and deletions. But a Figma document is a nested tree structure with a far wider variety of operations: setting a property, reparenting a node, reordering children, deleting a subtree. Every pair of operation types needs its own transform case (what should happen when one user moves a Rectangle into a Group while another deletes that Group?), so the number of cases to handle to satisfy TP1 and TP2 grows roughly quadratically. That complexity is hard to manage at startup speed.

What Figma adopted was a hybrid that borrows ideas from CRDT.

typescript

// Figma approach simplified — per-property LWW (Last Write Wins)
interface FigmaOperation {
  nodeId: string;
  property: string;
  value: unknown;
  timestamp: number;   // logical timestamp
  clientId: string;
}
 
function mergeProperties(
  ops: FigmaOperation[]
): Record<string, FigmaOperation> {
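  return ops.reduce((acc, op) => {
    const key = `${op.nodeId}.${op.property}`;
    const current = acc[key];
    // Newer timestamp wins; equal timestamps fall back to clientId so that
    // every replica picks the same winner regardless of arrival order
    const opWins =
      !current ||
      current.timestamp < op.timestamp ||
      (current.timestamp === op.timestamp && current.clientId < op.clientId);
    if (opWins) {
      acc[key] = op;
    }
    return acc;
  }, {} as Record<string, FigmaOperation>);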
}

LWW (Last Write Wins): When a conflict occurs on the same property, the value with the most recent timestamp wins. If two people simultaneously change the same layer's color, one person's choice will inevitably overwrite the other's — and in a design tool, "the person who changed it last" is a fairly natural outcome.

It's not a full P2P CRDT. A server still exists and still makes authoritative ordering decisions. However, by simplifying the conflict resolution logic to per-property CRDT-style merge rules, Figma was able to ship multiplayer functionality quickly without writing complex OT transform functions.
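One way to picture "a server that orders, plus per-property merge rules" is the sketch below. It assumes the server stamps each change with a sequence number on arrival and clients keep the highest-numbered change per property. `PropertyServer`, `SequencedChange`, and `applyBroadcast` are hypothetical names, and this is not Figma's actual code.

```typescript
// Server-sequenced per-property last-writer-wins (hypothetical names; not Figma's code).
interface PropertyChange {
  nodeId: string;
  property: string;
  value: unknown;
  clientId: string;
}

interface SequencedChange extends PropertyChange {
  seq: number; // assigned by the server on arrival; defines the single global order
}

class PropertyServer {
  private nextSeq = 1;

  // Accept a change, stamp it, and return what would be broadcast to clients.
  receive(change: PropertyChange): SequencedChange {
    return { ...change, seq: this.nextSeq++ };
  }
}

// Client-side merge: keep the highest seq per property, so delivery order does not matter.
function applyBroadcast(state: Map<string, SequencedChange>, change: SequencedChange): void {
  const key = `${change.nodeId}.${change.property}`;
  const current = state.get(key);
  if (!current || current.seq < change.seq) {
    state.set(key, change);
  }
}

const propServer = new PropertyServer();
const first = propServer.receive({ nodeId: "rect1", property: "fill", value: "#f00", clientId: "A" });
const second = propServer.receive({ nodeId: "rect1", property: "fill", value: "#00f", clientId: "B" });
const width = propServer.receive({ nodeId: "rect1", property: "width", value: 200, clientId: "A" });

// A client that receives the broadcasts out of order still ends in the same state.
const clientState = new Map<string, SequencedChange>();
for (const change of [second, width, first]) {
  applyBroadcast(clientState, change);
}

console.log(clientState.get("rect1.fill")?.value);  // "#00f"
console.log(clientState.get("rect1.width")?.value); // 200
```

Changes to different properties of the same node never conflict here, and changes to the same property need no transform function at all: the later arrival at the server simply wins.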

Building It Yourself: Yjs

The most practical CRDT choice today is Yjs. It is one of the most widely downloaded CRDT libraries on npm and has bindings for ProseMirror, Quill, and Monaco, so it plugs directly into most popular editors.

typescript

import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'
import { QuillBinding } from 'y-quill'
import Quill from 'quill'
 
// Create a CRDT document
const ydoc = new Y.Doc()
 
// Connect peers over WebSocket (server acts as a simple relay)
const provider = new WebsocketProvider(
  'wss://your-server.com',
  'room-name',
  ydoc
)
 
// Create the editor ('#editor' is a placeholder selector) and bind it to the shared text
const quill = new Quill('#editor', { theme: 'snow' })
const ytext = ydoc.getText('quill')
const binding = new QuillBinding(ytext, quill, provider.awareness)
 
// Concurrent edits are handled automatically, and edits made while
// disconnected merge on reconnect as long as this Y.Doc is still in memory
// (surviving a page reload needs a persistence provider such as y-indexeddb)

If you'd implemented OT from scratch, you'd have had to write the transform functions, server-side history management, and version vector handling yourself. For rapid prototyping or small teams, Yjs is the overwhelmingly faster starting point.

Pros and Cons

Here's a side-by-side comparison of OT and CRDT.

Aspect	OT	CRDT
Memory efficiency	Keeps the document at its original size	Can balloon to several times the size due to metadata (mitigated by library compression)
Offline support	Hard (long-lived divergence must be transformed on reconnect)	Auto-merges on reconnect
P2P suitability	Impractical without a server (requires TP2)	Peers can sync directly without a server
Server dependency	Required in practice; the server is a single point of failure	Optional (can be simplified to a relay server)
Implementation complexity	Satisfying TP1 and TP2 in transform functions is hard	Merge rules built in, libraries easy to use
Conflict predictability	Server arrival order decides	Merge rules decide (deterministic, but less intuitive)
Non-text structures	Transform function explosion for tree/graph structures	Extends naturally with LWW etc.

The memory efficiency row in particular is something people tend to underestimate when first evaluating CRDT. Yjs keeps the overhead within practical bounds thanks to internal compression, but the tombstone accumulation from deleted elements is a real concern in long-running systems.

Tombstone: In CRDT, deleted elements aren't actually removed — they're marked as "deleted" and left in place. This is necessary because other peers may still hold references to deleted elements, but it causes document size to grow indefinitely over time. Periodic snapshotting and garbage collection strategies are needed alongside this mechanism.
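The reason tombstones have to stay can be shown in a few lines. The sketch below reuses the simplified "insert after ID" model from earlier with an added `deleted` flag; `Elem` and `render` are hypothetical names, and real libraries use more compact representations.

```typescript
// Tombstone deletion in a simplified RGA-style list (hypothetical names).
interface Elem {
  id: string;
  value: string;
  afterId: string | null;
  deleted: boolean; // tombstone flag: the element stays in the structure
}

function render(elems: Elem[]): string {
  const children = new Map<string | null, Elem[]>();
  for (const e of elems) {
    const siblings = children.get(e.afterId) ?? [];
    siblings.push(e);
    children.set(e.afterId, siblings);
  }
  const out: string[] = [];
  const visit = (parent: string | null): void => {
    const siblings = (children.get(parent) ?? []).sort((x, y) => (x.id < y.id ? -1 : 1));
    for (const e of siblings) {
      if (!e.deleted) out.push(e.value); // tombstones are skipped when rendering
      visit(e.id); // but they still anchor whatever was inserted after them
    }
  };
  visit(null);
  return out.join("");
}

const elems: Elem[] = [
  { id: "s:1", value: "a", afterId: null, deleted: false },
  { id: "s:2", value: "b", afterId: "s:1", deleted: false },
  { id: "s:3", value: "c", afterId: "s:2", deleted: false },
];

// Client A deletes "b" while client B concurrently inserts "X" after "b".
const target = elems.find((e) => e.id === "s:2");
if (target) target.deleted = true;
elems.push({ id: "B:4", value: "X", afterId: "s:2", deleted: false });

console.log(render(elems)); // "aXc"
console.log(elems.length);  // 4: the deleted element is still stored
```

If "b" had been physically removed, B's insertion would have nothing to attach to. That is also why garbage collection is only safe once every peer is known to have seen the deletion.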

The Most Common Mistakes in Practice

Assuming CRDT means you don't need a server — Both Figma and Notion maintain servers. "It can work without a server" is a possibility, not a signal that you can eliminate the server in a real service. Access control, backup, and authentication still require a server.
Choosing OT when your structure isn't text — Trying to implement OT for nested trees, graphs, or object property synchronization causes transform function combinations to explode. For non-text data, CRDT or LWW is far more natural.
Using Yjs but ignoring awareness — Yjs's awareness API shares cursor positions and user presence ("who is currently editing where"). Leaving it out makes a collaboration tool feel like a single-user tool. Setting it up takes five lines of code.

Closing Thoughts

The more important question isn't which algorithm is superior — it's which one naturally fits your product's data structure and infrastructure characteristics.

If I were building a new collaboration feature, here's how I'd approach it:

If offline support is needed, start with Yjs — Starting with pnpm add yjs y-websocket is the fastest path. For editor integration, pick from y-prosemirror, y-quill, or y-codemirror based on your current stack.
If you have a server-centric architecture, evaluate ShareDB — It lets you implement OT-based real-time editing relatively quickly in a Node.js environment, with official MongoDB adapter support (pnpm add sharedb).
For complex object trees or non-text data, try LWW first — Start with per-property timestamp comparisons à la Figma, observe what conflict cases actually arise in the wild, then refine incrementally.



