Google Docs Chose OT, Figma Chose CRDT — Why the Conflict Resolution Approach Determines Your Entire Real-Time Collaboration Architecture
When you set out to build real-time collaboration yourself, the question "what happens when two people edit the same spot at the same time?" ends up driving every architectural decision. I used to think "last writer wins, right?" — but once I actually implemented it, I spent days chasing a bug where content got mangled by a single index offset.
Google Docs and Figma are both collaboration tools used by millions simultaneously, yet their approaches to resolving conflicts are completely different. One adopted OT (Operational Transformation), the other adopted the ideas behind CRDT (Conflict-free Replicated Data Type). These two approaches aren't answers to the question "which is better?" — they're answers to different questions that naturally diverge based on a product's data structure and infrastructure characteristics. Does it need offline support? Does it handle non-text structures? Does a central server already see every operation? These judgments determine the algorithm choice, and that choice cascades through every subsequent design decision.
If you're planning to build real-time collaboration yourself, or you've already built it and want to understand why you're seeing unexpected state inconsistencies, here's a summary that should help.
Core Concepts
Why Conflicts Happen — The Problem of Shifting Indices
Consider a scenario where two people edit a text document simultaneously.
Initial document: "hello"
A: Insert "!" at index 5 → "hello!"
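B: Insert "?" at index 5 → "hello?"
What happens if each client applies both operations as-is, in the order it sees them?
On A's side: own edit first → "hello!" (length 6), then B's op (index 5 unchanged) → "hello?!"
On B's side: own edit first → "hello?", then A's op (index 5 unchanged) → "hello!?"
Expected result: "hello!?" or "hello?!", but the same string on both sides
B wanted to append "?" after "hello", but by the time B's operation reaches A, A's insertion has already filled index 5 with "!", so the index no longer points where B meant. Applied as-is, the two replicas end up with different documents. How you handle this misalignment is where OT and CRDT diverge.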
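A minimal sketch of the scenario above, showing what each client ends up with if it applies its own edit first and then the other client's operation unchanged. The `Insert` type and `applyInsert` helper are names made up for this illustration.

```typescript
// Naive index-based insert with no awareness of concurrent edits
type Insert = { position: number; text: string };

function applyInsert(doc: string, op: Insert): string {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

const base = "hello";
const opA: Insert = { position: 5, text: "!" };
const opB: Insert = { position: 5, text: "?" };

// Replica A applies its own op first, then receives B's op unchanged
const replicaA = applyInsert(applyInsert(base, opA), opB);
// Replica B applies its own op first, then receives A's op unchanged
const replicaB = applyInsert(applyInsert(base, opB), opA);

console.log(replicaA); // "hello?!"
console.log(replicaB); // "hello!?"
console.log(replicaA === replicaB); // false: the replicas have diverged
```

Both strings look acceptable in isolation; the problem is that the two clients no longer agree on the document.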
What Convergence Means
Before diving deeper, let me clarify a term that comes up frequently. Convergence is the property whereby all clients eventually reach the same state. Regardless of the order operations arrive, regardless of which client applied them first, the final document must be identical for everyone. OT and CRDT differ in how they guarantee this convergence.
OT — The Server Transforms Operations to Achieve Convergence
The idea behind OT is: "let's transform B's operation to account for the fact that A's operation has already been applied." Positions stay index-based, but an incoming index is shifted to reflect the effects of the operations that were applied before it.
// Simplified OT transform function example
// InsertOp is a minimal shape assumed for this illustration
interface InsertOp {
  clientId: string;
  position: number;
  text: string;
}
// Returns op2 rewritten so that it can be applied after op1
function transformInsert(op1: InsertOp, op2: InsertOp): InsertOp {
  if (op2.position > op1.position) {
    return { ...op2, position: op2.position + op1.text.length };
  } else if (op2.position === op1.position) {
    // Same-position conflict: real OT implementations determine priority
    // using server receive order or logical clocks;
    // clientId is used here as a simplification for illustration
    return op1.clientId < op2.clientId
      ? { ...op2, position: op2.position + op1.text.length }
      : op2;
  }
  return op2;
}
Here's the flow on the server side.
// Node.js + ShareDB style (simplified)
// server, db, transform and broadcast are placeholders for illustration, not the actual ShareDB API
server.on('submit', (agent, op) => {
  const pendingOps = db.getOpsSince(op.v); // ops the client doesn't know about yet
  const transformed = pendingOps.reduce(
    (acc, serverOp) => transform(acc, serverOp),
    op
  );
  db.commit(transformed); // store the transformed operation
  broadcast(transformed); // distribute to all clients
});
Key point: In OT, the server maintains a history of operations. When a new operation arrives, it transforms it against the operations applied in the interim and then broadcasts it. The server always has final authority over ordering.
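To make that flow concrete, here is a minimal in-memory sketch of the same loop. It is an illustration under stated assumptions, not how ShareDB or Google Docs is implemented: `OtServer`, `Submission`, `applyInsert` and `transformAgainst` are hypothetical names, only inserts are handled, and ties at the same index are resolved in favor of whichever operation the server committed first.

```typescript
// Hypothetical minimal types for this sketch (inserts only)
interface InsertOp {
  clientId: string;
  position: number;
  text: string;
}

interface Submission {
  baseRev: number; // server revision the client had seen when it made the edit
  op: InsertOp;
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

// Rewrite `op` so it can be applied after `committed`.
// Ties go to the already-committed op: the server's receive order decides.
function transformAgainst(op: InsertOp, committed: InsertOp): InsertOp {
  return committed.position <= op.position
    ? { ...op, position: op.position + committed.text.length }
    : op;
}

class OtServer {
  doc: string;
  private log: InsertOp[] = [];

  constructor(initialDoc: string) {
    this.doc = initialDoc;
  }

  get rev(): number {
    return this.log.length;
  }

  submit(submission: Submission): InsertOp {
    // Ops committed after the client's base revision are the ones it has not seen
    const missed = this.log.slice(submission.baseRev);
    const transformed = missed.reduce(transformAgainst, submission.op);
    this.doc = applyInsert(this.doc, transformed);
    this.log.push(transformed);
    return transformed; // a real server would broadcast this to every client
  }
}

const server = new OtServer("abcdef");
// Both clients edited revision 0 concurrently
server.submit({ baseRev: 0, op: { clientId: "A", position: 3, text: "x" } });
const committedB = server.submit({ baseRev: 0, op: { clientId: "B", position: 4, text: "y" } });

console.log(committedB.position); // 5
console.log(server.doc); // "abcxdyef"
console.log(server.rev); // 2
```

B asked for index 4, but because A's insert at index 3 was committed first, the server stores and broadcasts B's operation at index 5.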
To understand why OT transform functions are tricky to implement, you need to know the TP1 and TP2 conditions. TP1 requires that two concurrent operations produce the same document whichever one is applied first, as long as the second is transformed against the first. TP2 requires that transforming an operation against two other concurrent operations gives the same result regardless of which of those two it is transformed against first. Even with just text insertions and deletions, implementing a transform function that fully satisfies these conditions is difficult: ACM CSCW 2020 research found violations of these conditions in existing OT algorithm implementations.
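TP1 can be checked mechanically for a given pair of operations. The sketch below is only an illustration: `InsertOp`, `applyInsert`, `satisfiesTP1` and `naiveTransform` are hypothetical names, and `transformInsert` repeats the simplified clientId tie-break from the earlier example rather than any production OT rule.

```typescript
interface InsertOp {
  clientId: string;
  position: number;
  text: string;
}

type Transform = (op1: InsertOp, op2: InsertOp) => InsertOp;

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.position) + op.text + doc.slice(op.position);
}

// Returns op2 rewritten so it can be applied after op1 (clientId breaks ties)
const transformInsert: Transform = (op1, op2) => {
  const shifted = { ...op2, position: op2.position + op1.text.length };
  if (op2.position > op1.position) return shifted;
  if (op2.position === op1.position) {
    return op1.clientId < op2.clientId ? shifted : op2;
  }
  return op2;
};

// A tempting but broken variant: always shift on a tie
const naiveTransform: Transform = (op1, op2) =>
  op2.position >= op1.position
    ? { ...op2, position: op2.position + op1.text.length }
    : op2;

// TP1: "a, then b transformed against a" must equal "b, then a transformed against b"
function satisfiesTP1(t: Transform, doc: string, a: InsertOp, b: InsertOp): boolean {
  const aThenB = applyInsert(applyInsert(doc, a), t(a, b));
  const bThenA = applyInsert(applyInsert(doc, b), t(b, a));
  return aThenB === bThenA;
}

const opA: InsertOp = { clientId: "A", position: 5, text: "!" };
const opB: InsertOp = { clientId: "B", position: 5, text: "?" };

console.log(satisfiesTP1(transformInsert, "hello", opA, opB)); // true
console.log(satisfiesTP1(naiveTransform, "hello", opA, opB)); // false
```

The naive variant shifts both operations past each other on a tie, so each site ends up with a different order. A check like this covers single pairs only; it says nothing about TP2.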
CRDT — The Data Structure Embeds the Merge Rules
CRDT takes a fundamentally different approach. It discards the notion of "position" and assigns each character a unique ID.
// CRDT (RGA, Replicated Growable Array style) character insertion
interface CRDTChar {
  id: string; // unique identifier (e.g., "clientA:3")
  value: string;
  afterId: string | null; // relationship: "comes after this character"
}
// "hello" represented as CRDT
const doc: CRDTChar[] = [
  { id: "s:1", value: "h", afterId: null },
  { id: "s:2", value: "e", afterId: "s:1" },
  { id: "s:3", value: "l", afterId: "s:2" },
  { id: "s:4", value: "l", afterId: "s:3" },
  { id: "s:5", value: "o", afterId: "s:4" },
];
// A inserts "!": "insert after s:5" (id: "A:6")
// B inserts "?": "insert after s:5" (id: "B:6")
// When two characters share the same afterId, sort lexicographically by id
// "A:6" < "B:6" → order: o → ! → ? → "hello!?"
function merge(chars: CRDTChar[]): string {
  // sort by afterId, breaking ties at the same position by lexicographic id comparison
  // (topoSort is a helper that is not defined in this snippet)
  return topoSort(chars).map(c => c.value).join('');
}
Key point: In CRDT, "insert at position 3" doesn't exist. Position is expressed as a relationship ("insert after ID s:5"), so no matter which client merges in what order, the result is mathematically guaranteed to converge.
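One way the ordering step could look, as a sketch only: `linearize` is a hypothetical stand-in for the ordering helper, and the plain string comparison of ids follows the simplification used in the example above. Production RGA-style implementations typically order siblings by a logical timestamp plus replica id, not by raw string comparison.

```typescript
interface CRDTChar {
  id: string;
  value: string;
  afterId: string | null;
}

// Orders characters by following afterId links; siblings are sorted by id
function linearize(chars: CRDTChar[]): CRDTChar[] {
  // Group every character under the id it was inserted after
  const children = new Map<string | null, CRDTChar[]>();
  for (const c of chars) {
    const siblings = children.get(c.afterId) ?? [];
    siblings.push(c);
    children.set(c.afterId, siblings);
  }
  children.forEach(siblings =>
    siblings.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
  );

  // Depth-first walk: emit a character, then everything inserted after it
  const ordered: CRDTChar[] = [];
  const stack: CRDTChar[] = [...(children.get(null) ?? [])].reverse();
  while (stack.length > 0) {
    const current = stack.pop()!;
    ordered.push(current);
    const kids = children.get(current.id) ?? [];
    for (let i = kids.length - 1; i >= 0; i--) stack.push(kids[i]);
  }
  return ordered;
}

function render(chars: CRDTChar[]): string {
  return linearize(chars).map(c => c.value).join("");
}

const hello: CRDTChar[] = [
  { id: "s:1", value: "h", afterId: null },
  { id: "s:2", value: "e", afterId: "s:1" },
  { id: "s:3", value: "l", afterId: "s:2" },
  { id: "s:4", value: "l", afterId: "s:3" },
  { id: "s:5", value: "o", afterId: "s:4" },
];
const fromA: CRDTChar = { id: "A:6", value: "!", afterId: "s:5" };
const fromB: CRDTChar = { id: "B:6", value: "?", afterId: "s:5" };

// The two replicas receive the concurrent inserts in opposite orders
const onReplicaA = render([...hello, fromA, fromB]);
const onReplicaB = render([fromB, ...hello, fromA]);

console.log(onReplicaA); // "hello!?"
console.log(onReplicaB); // "hello!?"
```

Because the result depends only on the set of characters and their ids, arrival order stops mattering.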
Core Differences Between the Two Paradigms
| | OT | CRDT |
|---|---|---|
| Conflict resolution authority | Server (central coordination) | The data structure itself |
| Position representation | Index (0, 1, 2...) | Relationship (ID-based) |
| Server dependency | Required | Not required (converges without one) |
| Offline support | Difficult (offline edits must be transformed against everything the server committed in the meantime) | Auto-merges on reconnect |
| Memory overhead | Low | High (per-character metadata) |
| Core implementation complexity | Satisfying TP1·TP2 conditions in transform functions | Merge algorithm built-in (libraries available) |
Real-World Application
Example 1: Why Google Docs Chose OT — The Server Already Sees Everything
Honestly, Google didn't choose OT because "OT is superior." It was because OT was a structurally natural fit given Google's infrastructure characteristics.
Google's servers have to receive every operation anyway. Access control (ACL), version history, rendering, storage — all of it flows through the server. The added cost of transforming operations there is under 5ms, and in return you get a compact document with no metadata bloat.
User A (insert "x" at index 3)
↓
Google Server (process A's op: rev 15)
↓
User B's op arrives (insert "y" at index 4, based on rev 14)
→ transform("insert y at 4", rev 14→15 diff)
→ transformed to "insert y at 5"
↓
Broadcast to all clients
What if they'd used CRDT? Every character would need an ID and causal metadata attached. In an uncompressed RGA implementation, a 100,000-character document can balloon from 100KB of raw content to several MB of metadata alone. Libraries like Yjs keep this within practical bounds through internal compression, but without that, the overhead is substantial. On top of that, deleted characters persist as tombstones, and memory pressure accumulates as the document grows.
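The "several MB" figure can be sanity-checked with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not measurements of any real library: one byte of content per character, plus two identifiers per character (its own id and the id it follows) at an assumed 16 bytes each.

```typescript
// Assumed size of one identifier in bytes (illustrative, not measured)
const BYTES_PER_ID = 16;

function estimateUncompressedBytes(
  charCount: number,
  bytesPerId: number = BYTES_PER_ID
): number {
  const contentBytes = 1; // one byte per ASCII character
  const metadataBytes = 2 * bytesPerId; // own id + afterId
  return charCount * (contentBytes + metadataBytes);
}

const rawBytes = 100_000; // 100,000 ASCII characters
const crdtBytes = estimateUncompressedBytes(100_000);

console.log(crdtBytes); // 3300000 (about 3.3 MB)
console.log(crdtBytes / rawBytes); // 33
```

Under these assumptions the metadata outweighs the content by more than an order of magnitude, which is why compression matters so much in practice.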
Linear text structure, a central server already in place, no need for offline support — when all three conditions overlap, OT has almost no downsides.
Example 2: Why Figma Chose the CRDT Approach — "This Isn't a Text Editor"
Figma initially evaluated OT and abandoned it.
Figma document structure (simplified)
└── Frame A
    ├── Rectangle (x: 100, y: 200, width: 300)
    ├── Text "Hello" (font-size: 16, color: #333)
    └── Group
        ├── Circle (r: 50)
        └── Image (src: "...")
OT transform functions were designed for index-based text insertions and deletions. But a Figma document is a nested tree structure with a far wider variety of operations. When "set Rectangle's x to 100" and "set Rectangle's width to 200" arrive simultaneously, the number of cases to handle to satisfy TP1·TP2 in the transform functions explodes. That complexity is unmanageable at startup speed.
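A rough way to see the explosion: a transform function has to cover every ordered pair of operation types, so with n types there are n² cases. The value n = 10 below is a hypothetical count for a design tool, chosen only to show the growth; n = 2 corresponds to a text editor with insert and delete.

```latex
\[
\text{transform cases}(n) = n^2, \qquad
n = 2 \;\Rightarrow\; 4, \qquad
n = 10 \;\Rightarrow\; 100
\]
```

Each of those cases also has to hold up under TP1 (and TP2 where it applies), which is where most of the effort goes.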
What Figma adopted was a hybrid that borrows ideas from CRDT.
// Figma approach simplified: per-property LWW (Last Write Wins)
interface FigmaOperation {
  nodeId: string;
  property: string;
  value: unknown;
  timestamp: number; // logical timestamp
  clientId: string;
}
function mergeProperties(
  ops: FigmaOperation[]
): Record<string, FigmaOperation> {
  return ops.reduce((acc, op) => {
    const key = `${op.nodeId}.${op.property}`;
    const current = acc[key];
    // Newer timestamp wins; equal timestamps fall back to clientId so that
    // every replica picks the same winner regardless of arrival order
    if (
      !current ||
      current.timestamp < op.timestamp ||
      (current.timestamp === op.timestamp && current.clientId < op.clientId)
    ) {
      acc[key] = op;
    }
    return acc;
  }, {} as Record<string, FigmaOperation>);
}
LWW (Last Write Wins): When a conflict occurs on the same property, the value with the most recent timestamp wins. If two people simultaneously change the same layer's color, one person's choice will inevitably overwrite the other's, and in a design tool, "the person who changed it last" is a fairly natural outcome.
It's not a full P2P CRDT. A server still exists and still makes authoritative ordering decisions. However, by simplifying the conflict resolution logic to per-property CRDT-style merge rules, the startup was able to ship multiplayer functionality quickly without writing complex OT transform functions.
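The property that makes per-property LWW attractive is that the merged result does not depend on the order in which writes arrive. A small sketch of that property follows; `PropertyWrite`, `wins` and `mergeWrites` are hypothetical names, and the clientId tie-break is an assumption added for this illustration, not something taken from Figma's implementation.

```typescript
// Hypothetical shape for one property write
interface PropertyWrite {
  nodeId: string;
  property: string;
  value: unknown;
  timestamp: number; // logical clock value, not wall-clock time
  clientId: string;
}

// True when `candidate` should replace `current` for the same property
function wins(candidate: PropertyWrite, current: PropertyWrite): boolean {
  if (candidate.timestamp !== current.timestamp) {
    return candidate.timestamp > current.timestamp;
  }
  // Equal timestamps: fall back to clientId so all replicas agree
  return candidate.clientId > current.clientId;
}

function mergeWrites(writes: PropertyWrite[]): Record<string, unknown> {
  const latest = new Map<string, PropertyWrite>();
  for (const w of writes) {
    const key = `${w.nodeId}.${w.property}`;
    const current = latest.get(key);
    if (current === undefined || wins(w, current)) latest.set(key, w);
  }
  // Emit keys in sorted order so results are directly comparable
  const values: Record<string, unknown> = {};
  Array.from(latest.keys())
    .sort()
    .forEach(key => {
      values[key] = latest.get(key)!.value;
    });
  return values;
}

const writes: PropertyWrite[] = [
  { nodeId: "rect1", property: "x", value: 100, timestamp: 5, clientId: "A" },
  { nodeId: "rect1", property: "fill", value: "#ff0000", timestamp: 7, clientId: "A" },
  { nodeId: "rect1", property: "fill", value: "#0000ff", timestamp: 7, clientId: "B" },
];

const forward = mergeWrites(writes);
const backward = mergeWrites([...writes].reverse());

console.log(JSON.stringify(forward)); // {"rect1.fill":"#0000ff","rect1.x":100}
console.log(JSON.stringify(forward) === JSON.stringify(backward)); // true
```

Note that writes to different properties of the same node (`x` and `fill` here) never compete; only writes to the same property do.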
Building It Yourself: Yjs
The most practical CRDT choice today is Yjs. 9 million weekly downloads, official bindings for ProseMirror, Quill, and Monaco — it plugs directly into most editors.
import Quill from 'quill'
import * as Y from 'yjs'
import { WebsocketProvider } from 'y-websocket'
import { QuillBinding } from 'y-quill'
// Create the Quill editor ('#editor' is a placeholder selector for your page)
const quill = new Quill('#editor')
// Create a CRDT document
const ydoc = new Y.Doc()
// Connect peers over WebSocket (server acts as a simple relay)
const provider = new WebsocketProvider(
  'wss://your-server.com',
  'room-name',
  ydoc
)
// Bind to the Quill editor
const ytext = ydoc.getText('quill')
const binding = new QuillBinding(ytext, quill, provider.awareness)
// Concurrent edits are handled automatically,
// and offline edits merge automatically on reconnect
If you'd implemented OT from scratch, you'd have had to write the transform functions, server-side history management, and version vector handling yourself. For rapid prototyping or small teams, Yjs is the overwhelmingly faster starting point.
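The offline-merge behavior can be tried without any network or editor. The sketch below assumes only the `yjs` package is installed; it uses two in-memory documents as stand-ins for two disconnected clients and exchanges their state by hand, which is roughly what a provider does on reconnect. The shared type name 'note' is arbitrary.

```typescript
import * as Y from 'yjs'

// Two in-memory documents standing in for two clients
const docA = new Y.Doc()
const docB = new Y.Doc()

// Start from a shared state: A writes "hello" and syncs it to B
docA.getText('note').insert(0, 'hello')
Y.applyUpdate(docB, Y.encodeStateAsUpdate(docA))

// Both clients go offline and edit the same position
docA.getText('note').insert(5, '!')
docB.getText('note').insert(5, '?')

// Reconnect: exchange full state in both directions
const updateFromA = Y.encodeStateAsUpdate(docA)
const updateFromB = Y.encodeStateAsUpdate(docB)
Y.applyUpdate(docA, updateFromB)
Y.applyUpdate(docB, updateFromA)

const textA = docA.getText('note').toString()
const textB = docB.getText('note').toString()

console.log(textA === textB) // true
console.log(textA.length) // 7
```

Which of "!" and "?" comes first depends on the client ids Yjs assigns internally, so the sketch checks only that both documents agree.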
Pros and Cons
Here's a side-by-side comparison of OT and CRDT.
| | OT | CRDT |
|---|---|---|
| Memory efficiency | Keeps the document at its original size | Can balloon to several times the size due to metadata (mitigated by library compression) |
| Offline support | Difficult (offline edits must be transformed against everything the server committed in the meantime) | Auto-merges on reconnect |
| P2P suitability | Impossible without a server | Peers can sync directly without a server |
| Server dependency | Required, single point of failure | Optional (can be simplified to a relay server) |
| Implementation complexity | Satisfying TP1·TP2 conditions in transform functions is hard | Merge algorithm built-in, libraries easy to use |
| Conflict predictability | Server decides → deterministic, follows arrival order | Also deterministic, but the winner follows merge rules (ID or timestamp order) rather than arrival order |
| Non-text structures | Transform function explosion for tree/graph structures | Extends naturally with LWW etc. |
The memory efficiency row in particular is something people tend to underestimate when first evaluating CRDT. Yjs keeps it within practical bounds thanks to internal compression, but the tombstone accumulation from deleted elements is a real concern in long-running systems.
Tombstone: In CRDT, deleted elements aren't actually removed — they're marked as "deleted" and left in place. This is necessary because other peers may still hold references to deleted elements, but it causes document size to grow indefinitely over time. Periodic snapshotting and garbage collection strategies are needed alongside this mechanism.
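A small sketch of the tombstone idea and of one possible garbage-collection rule. Everything here is hypothetical: the `Elem` shape, the flat already-ordered list, and the `ackedByAllPeers` set that stands in for whatever mechanism tells you every peer has seen a deletion.

```typescript
// Hypothetical element shape: a flat, already-ordered list keeps the sketch short
interface Elem {
  id: string;
  value: string;
  deleted: boolean;
}

// Deleting only flips a flag; the element stays so peers can still refer to its id
function markDeleted(doc: Elem[], id: string): Elem[] {
  return doc.map(e => (e.id === id ? { ...e, deleted: true } : e));
}

// What the user sees: tombstones are skipped
function render(doc: Elem[]): string {
  return doc.filter(e => !e.deleted).map(e => e.value).join("");
}

// Garbage collection: drop only tombstones that every peer has acknowledged
function compact(doc: Elem[], ackedByAllPeers: Set<string>): Elem[] {
  return doc.filter(e => !(e.deleted && ackedByAllPeers.has(e.id)));
}

const initial: Elem[] = "hello"
  .split("")
  .map((value, i) => ({ id: `s:${i + 1}`, value, deleted: false }));

const afterDelete = markDeleted(initial, "s:3");
console.log(render(afterDelete)); // "helo"
console.log(afterDelete.length); // 5: the tombstone still occupies a slot

const compacted = compact(afterDelete, new Set(["s:3"]));
console.log(render(compacted)); // "helo"
console.log(compacted.length); // 4
```

The visible text is the same before and after compaction; only the stored size changes, which is why compaction has to wait until no peer can still reference the removed id.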
The Most Common Mistakes in Practice
- Assuming CRDT means you don't need a server: Both Figma and Notion maintain servers. "It can work without a server" is a possibility, not a signal that you can eliminate the server in a real service. Access control, backup, and authentication still require a server.
- Choosing OT when your structure isn't text: Trying to implement OT for nested trees, graphs, or object property synchronization causes transform function combinations to explode. For non-text data, CRDT or LWW is far more natural.
- Using Yjs but ignoring awareness: Yjs's awareness API shares cursor positions and user presence ("who is currently editing where"). Leaving it out makes a collaboration tool feel like a single-user tool. Setting it up takes five lines of code.
Closing Thoughts
The more important question isn't which algorithm is superior — it's which one naturally fits your product's data structure and infrastructure characteristics.
If I were building a new collaboration feature, here's how I'd approach it:
- If offline support is needed, start with Yjs: Starting with pnpm add yjs y-websocket is the fastest path. For editor integration, pick from y-prosemirror, y-quill, or y-codemirror based on your current stack.
- If you have a server-centric architecture, evaluate ShareDB: It lets you implement OT-based real-time editing relatively quickly in a Node.js environment, with official MongoDB adapter support (pnpm add sharedb).
- For complex object trees or non-text data, try LWW first: Start with per-property timestamp comparisons à la Figma, observe what conflict cases actually arise in the wild, then refine incrementally.
References
- How Figma's multiplayer technology works | Figma Blog
- Building real-time collaboration applications: OT vs CRDT | TinyMCE
- CRDTs vs Operational Transformation: A Practical Guide | HackerNoon
- Real Differences between OT and CRDT | ACM CSCW 2020
- Collaborative Text Editing without CRDTs or OT | Matthew Weidner (2025)
- Peritext: A CRDT for Rich-Text Collaboration | Ink & Switch
- Yjs Documentation
- About CRDTs | crdt.tech
- Operational Transformation | Wikipedia
- Conflict-free Replicated Data Type | Wikipedia