How to Reduce TTFB from 350ms to 60ms with Next.js RSC + Streaming
While reviewing the Core Web Vitals of a production service, I once discovered pages where TTFB was well over 400ms. Each DB query was individually fast, so I wondered why things were so slow — and it turned out that three independent queries were waiting in series. I figured parallelizing them with Promise.all would solve the problem, but the TTFB barely changed. The issue wasn't a single line of code; it was at the architecture level.
Only after properly understanding RSC + Streaming did the numbers change dramatically. After reading this article, you'll be able to directly apply a page design approach where TTFB equals "the time to render the static shell" rather than "the time the slowest query takes to finish." This is explained in the context of the Next.js App Router, and anyone who has used React and the App Router should be able to follow along right away.
Core Concepts
What's the Difference Between Traditional SSR and Streaming SSR?
Honestly, at first I thought, "How different can streaming really be?" But once I opened the Network tab and watched chunks arriving one by one, my thinking changed completely.
Traditional SSR is a structure where the server collects all data and returns a single completed HTML response. The problem lies in the word "all." If three DB queries take 50ms, 100ms, and 300ms respectively, the user doesn't receive the first byte until at least 300ms have elapsed, even when the queries run in parallel. The slowest query holds the rest hostage.
| Approach | TTFB Determinant | User Experience |
|---|---|---|
| Traditional SSR | Completion time of the slowest data fetch | White screen → full render all at once |
| Streaming SSR | Static shell rendering time | Layout displayed immediately → content appears progressively |
| PPR | CDN edge latency (20–80ms) | Cached shell instantly → dynamic sections streaming |
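TTFB (Time to First Byte): The time from when the browser sends a request to the server until it receives the first byte of the response. TTFB is not itself one of the Core Web Vitals, but Google treats it as a supporting diagnostic metric: its guidance rates 800ms or less as "good," and ideally you want to be well below that, around 200ms or less.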
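To make the table concrete, here is a minimal sketch of the arithmetic, using the 50ms, 100ms, and 300ms queries from above. The function names and the 20ms shell render time are assumptions for illustration, not measurements from the production service.

```typescript
// Toy model of TTFB under each strategy. All numbers are illustrative.
const queryMs: number[] = [50, 100, 300]; // the three queries from the example above
const shellMs = 20; // assumed time to render the static shell (hypothetical)

// Traditional SSR, queries awaited one after another: TTFB waits for the sum.
function ttfbSerial(queries: number[], shell: number): number {
  return queries.reduce((total, q) => total + q, 0) + shell;
}

// Traditional SSR with Promise.all: TTFB still waits for the slowest query.
function ttfbParallel(queries: number[], shell: number): number {
  return Math.max(0, ...queries) + shell;
}

// Streaming SSR: the first byte leaves as soon as the shell is rendered.
function ttfbStreaming(_queries: number[], shell: number): number {
  return shell;
}

console.log(ttfbSerial(queryMs, shellMs));    // 470
console.log(ttfbParallel(queryMs, shellMs));  // 320
console.log(ttfbStreaming(queryMs, shellMs)); // 20
```

The middle line is why Promise.all alone barely moved my numbers: parallelizing removes the sum, but the slowest query still gates the first byte.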
What RSC Actually Sends to the Client
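What RSC (React Server Components) rendering produces is not completed HTML. It is a component tree encoded in a serialization format commonly called the React Flight protocol. On an initial page load, Next.js additionally renders that tree to HTML on the server and streams it, with the Flight payload embedded alongside so the client can hydrate. (The internal structure of this format is not covered in detail here; if you want to go deeper into the streaming side, the renderToPipeableStream documentation is a good starting point.) The client runtime progressively assembles the UI as it receives stream chunks.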
// app/dashboard/_components/MetricsPanel.tsx
// Without 'use client', this is a Server Component by default
async function MetricsPanel() {
  const metrics = await getMonthlyMetrics(); // Direct DB call
  // The db module is completely excluded from the client bundle
  return <MetricsGrid data={metrics} />;
}

This structure brings two benefits. Server-only code like db is excluded from the client bundle, and DB calls are made directly on the server, which removes the extra round trip from the browser to an API endpoint.
How Suspense Boundaries Make Streaming Work
Suspense was originally created for code splitting, but it has taken on a central role in server streaming.
At the HTTP level, it works like this. The server starts a streamed response (Transfer-Encoding: chunked over HTTP/1.1; HTTP/2 and later stream data frames without that header) and immediately flushes the static shell outside the Suspense boundary as the first chunk. While the component inside the boundary waits for data, its fallback UI (such as a skeleton) is sent in its place as part of that first chunk. When the data is ready, the server sends the actual content for that boundary as an additional chunk, and the client runtime swaps it in for the skeleton.
// app/dashboard/page.tsx
import { Suspense } from 'react';

export default function DashboardPage() {
  return (
    <DashboardLayout>
      {/* DashboardLayout is static: included in the first chunk immediately */}
      <Suspense fallback={<MetricsSkeleton />}>
        <MetricsPanel /> {/* Sent as a separate chunk when the internal fetch completes */}
      </Suspense>
    </DashboardLayout>
  );
}

Without Suspense, the server sends nothing until MetricsPanel's data is available. With Suspense, the layout is delivered immediately and the data is filled in later.
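The chunk sequence described above can be modeled without React at all. The following is a toy sketch, not React's actual wire format: the renderToChunks function, the Boundary type, and the template markup are all names I made up for illustration. Each yielded string stands for one chunk of the response.

```typescript
// Toy model of a streamed response: each yielded string stands for one chunk.
// The markup below is invented for illustration; React's real output differs.
type Boundary = { id: string; fallback: string; load: () => Promise<string> };

async function* renderToChunks(shell: string, boundary: Boundary): AsyncGenerator<string> {
  // Start the data fetch first so it runs while the shell is being sent.
  const content = boundary.load();
  // First chunk: the static shell plus the fallback placeholder.
  yield `${shell}<div id="${boundary.id}">${boundary.fallback}</div>`;
  // Later chunk: the real content, tagged so a client script could swap it in.
  yield `<template data-replaces="${boundary.id}">${await content}</template>`;
}

// Hypothetical data source that resolves with a value after a delay.
function delay<T>(ms: number, value: T): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

async function collectChunks(): Promise<string[]> {
  const chunks: string[] = [];
  const stream = renderToChunks("<nav>Dashboard</nav>", {
    id: "metrics",
    fallback: "Loading metrics...",
    load: () => delay(30, "<ul><li>Revenue: 42</li></ul>"),
  });
  for await (const chunk of stream) chunks.push(chunk);
  return chunks;
}

collectChunks().then((chunks) => console.log(chunks.length)); // 2
```

The point of the model is the ordering: the consumer can read the first chunk (shell plus skeleton) before the 30ms load has finished, which is exactly what keeps the slow fetch off the TTFB path.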
Practical Application
Here are three examples arranged in order of complexity. It's best to start with whichever one most closely resembles your current situation.
Example 1: Data-Heavy Dashboard — Parallel Streaming
A dashboard where multiple widgets each require independent data is where RSC streaming pays off most. Wrapping each widget in its own Suspense boundary enables parallel streaming.
// app/dashboard/page.tsx
import { Suspense } from 'react';

export default function DashboardPage() {
  return (
    <DashboardLayout>
      {/* DashboardLayout is static: flushed immediately on request */}
      <Suspense fallback={<MetricsSkeleton />}>
        <MetricsPanel /> {/* DB query A: ~100ms */}
      </Suspense>
      <Suspense fallback={<ChartSkeleton />}>
        <RevenueChart /> {/* DB query B: ~150ms */}
      </Suspense>
      <Suspense fallback={<TableSkeleton />}>
        <RecentTransactions /> {/* DB query C: ~300ms */}
      </Suspense>
    </DashboardLayout>
  );
}

// app/dashboard/_components/MetricsPanel.tsx
// Aggregated data: cacheable, Server Component
async function MetricsPanel() {
  const metrics = await getMonthlyMetrics();
  return <MetricsGrid data={metrics} />;
}

| Component | Role | When Sent |
|---|---|---|
| DashboardLayout | Navigation, sidebar | Immediately on request |
| <MetricsSkeleton /> etc. | Loading placeholders | Together with Layout |
| <MetricsPanel /> | Aggregated metrics | After query A completes |
| <RecentTransactions /> | Transaction list | After query C completes |
Queries A, B, and C run in parallel without waiting for each other. With traditional SSR the TTFB would be 300ms+, but with this structure the layout is delivered immediately. No matter how slow query C is, the other widgets appear first.
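As a rough illustration of that ordering, here is a small simulation. It is a sketch under assumptions: the Widget type and arrivalOrder function are hypothetical, and the delays are the ~100/150/300ms figures scaled down by five so the example runs quickly. The slowest widget is deliberately listed first to show that position in the tree does not decide arrival order.

```typescript
// Simulates three independent boundaries resolving at different times.
type Widget = { id: string; ms: number };

// Hypothetical loader: resolves with the widget id after its delay.
function load(widget: Widget): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(widget.id), widget.ms));
}

async function arrivalOrder(widgets: Widget[]): Promise<string[]> {
  // The shell (with skeletons) goes out before any widget data exists.
  const order: string[] = ["shell"];
  // Start every load at once; each widget records itself when it finishes.
  await Promise.all(
    widgets.map((w) => load(w).then((id) => { order.push(id); })),
  );
  return order;
}

const widgets: Widget[] = [
  { id: "RecentTransactions", ms: 60 }, // query C, slowest, listed first on purpose
  { id: "MetricsPanel", ms: 20 },       // query A
  { id: "RevenueChart", ms: 30 },       // query B
];

arrivalOrder(widgets).then((order) => console.log(order));
// ["shell", "MetricsPanel", "RevenueChart", "RecentTransactions"]
```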
Example 2: E-commerce Product Detail Page — Separating Static/Dynamic with PPR
Product pages have an interesting structure. The product name, images, and description rarely change, but inventory and pricing require real-time data. PPR (Partial Prerendering) separates these two at build time.
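ProductHero and ProductDescription are classified as static because they can be rendered at build time from params.id alone, using no request-time information such as cookies or headers and no uncached fetches. (For a dynamic segment like [id], prerendering at build time also assumes the product IDs are known then, typically via generateStaticParams.) By contrast, LivePrice opts out of caching with noStore(), so it is rendered on the origin server on every request. This difference is the branching point for caching strategy.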
// next.config.ts
export default {
  experimental: { ppr: 'incremental' },
};

// app/product/[id]/page.tsx
import { Suspense } from 'react';

// With ppr: 'incremental', each route has to opt in explicitly
export const experimental_ppr = true;

export default function ProductPage({ params }: { params: { id: string } }) {
  return (
    <>
      {/* Static shell: generated at build time, served from CDN */}
      <ProductHero id={params.id} />
      <ProductDescription id={params.id} />
      {/* Dynamic sections: streamed from origin */}
      <Suspense fallback={<PriceSkeleton />}>
        <LivePrice id={params.id} />
      </Suspense>
      <Suspense fallback={<StockSkeleton />}>
        <StockStatus id={params.id} />
      </Suspense>
    </>
  );
}

// app/product/[id]/_components/LivePrice.tsx
import { unstable_noStore as noStore } from 'next/cache';

async function LivePrice({ id }: { id: string }) {
  noStore(); // Disable caching: fetch the latest price on every request
  const price = await fetchCurrentPrice(id);
  return <PriceDisplay price={price} />;
}

Once this structure is in place, ProductHero is delivered from a CDN edge within 20–80ms, and price and inventory information streams immediately after. Origin processing time for the dynamic sections no longer sits on the TTFB critical path, as long as the hosting platform serves the prerendered shell from its edge.
PPR (Partial Prerendering): A Next.js feature that generates both a static HTML shell and a postponedState blob at build time. On request, the CDN immediately sends the shell while the origin server simultaneously streams only the dynamic sections. It is evolving toward stabilization under the experimental.ppr flag.
Example 3: Preventing Waterfalls — The Promise Early-Start Pattern
I made this mistake for quite a while early on. It's the case where you carefully separate Suspense boundaries, but then create a waterfall inside the component itself. This pattern comes up frequently:
// Bad example: serial waterfall, takes 500ms total
// app/profile/[userId]/page.tsx
async function UserProfilePage({ params }: { params: { userId: string } }) {
  const user = await getUser(params.userId); // Wait 300ms
  const posts = await getUserPosts(user.id); // Then wait another 200ms
  // user.id === params.userId, yet we unnecessarily wait for user before starting posts
  return <Profile user={user} posts={posts} />;
}

Because getUserPosts receives user.id, it appears there's a dependency. In reality, passing params.userId directly makes the two fetches completely independent.

// Good example: Promise early start, takes 300ms total
// app/profile/[userId]/page.tsx
async function UserProfilePage({ params }: { params: { userId: string } }) {
  const userPromise = getUser(params.userId); // Start immediately
  const postsPromise = getUserPosts(params.userId); // Start immediately (no waiting for user)
  const [user, posts] = await Promise.all([userPromise, postsPromise]);
  return <Profile user={user} posts={posts} />;
}

Simply restructuring the code like this saves 200ms. The code change looks small, but the difference is noticeable in real user experience.
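If you want to see the difference without a database, the following sketch logs when each fetch starts and ends. The fakeFetch helper and the 30ms/20ms delays are stand-ins I made up for getUser and getUserPosts, scaled down from the 300ms/200ms above.

```typescript
// Hypothetical stand-in for a data fetch: logs its start and end, resolves after ms.
function fakeFetch(name: string, ms: number, log: string[]): Promise<string> {
  log.push(`start ${name}`);
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push(`end ${name}`);
      resolve(name);
    }, ms),
  );
}

// Waterfall: posts does not start until user has finished.
async function serial(): Promise<string[]> {
  const log: string[] = [];
  await fakeFetch("user", 30, log);
  await fakeFetch("posts", 20, log);
  return log;
}

// Early start: both promises are created before either is awaited.
async function earlyStart(): Promise<string[]> {
  const log: string[] = [];
  const userPromise = fakeFetch("user", 30, log);
  const postsPromise = fakeFetch("posts", 20, log);
  await Promise.all([userPromise, postsPromise]);
  return log;
}

serial().then((log) => console.log(log));
// ["start user", "end user", "start posts", "end posts"]
earlyStart().then((log) => console.log(log));
// ["start user", "start posts", "end posts", "end user"]
```

The thing to look for is the second entry in each log: in the early-start version, posts has already started before user ends, so the total wait is the longer of the two rather than their sum.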
Pros and Cons Analysis
Advantages
| Item | Description |
|---|---|
| Reduced TTFB | Static shell is sent immediately, breaking away from the "slow query = slow TTFB" equation |
| Reduced bundle size | Server Component code is completely excluded from the client bundle |
| Fetching close to the data source | DB and internal APIs are called directly on the server, eliminating network round trips |
| SEO compatibility | Search engines can index content even during streaming |
| Parallel data fetching | Independent Suspense boundaries fetch data in parallel |
Disadvantages and Caveats
The most common problems when first adopting this are reverse proxy buffering and error handling. I personally experienced a situation where streaming that worked fine locally didn't work at all in production because of an Nginx configuration issue. If chunks appear to arrive all at once in the Network tab, the proxy settings are the first thing worth investigating.
| Item | Description | Mitigation |
|---|---|---|
| Blocking fetches | A top-level page await neutralizes streaming | Move data-dependent sections inside Suspense boundaries |
| Error handling complexity | Errors after shell flush cannot change the HTTP status code | ErrorBoundary is required; design upstream error recovery logic |
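| Reverse proxy buffering | Nginx buffers proxied responses by default, so chunks are held back and arrive all at once | Set proxy_buffering off; for the streamed routes (proxy_cache is off by default and should stay off for them), or send an X-Accel-Buffering: no response header |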
| Flight payload bloat | Large Server→Client props can delay hydration | Minimize prop passing, review serialization of large data |
| Environment differences | Streaming behavior may differ between local and production | Real-world measurement via RUM is essential |
| Suspense design overhead | Too granular boundaries can cause dozens of loading UIs to appear simultaneously | Design with both information architecture and data dependencies in mind |
RUM (Real User Monitoring): Performance data collected from real users' browsers. It can be gathered with the @vercel/speed-insights or web-vitals libraries. Local measurements can differ meaningfully from production and are difficult to rely on.
The Most Common Mistakes in Practice
- Blocking global data with await in layout components: Fetching session or user information with await in app/layout.tsx increases the TTFB of all child pages by that duration. If data is truly required in the layout, consider wrapping it in Suspense or moving it to the client side.
- Concluding "streaming doesn't work" while leaving Nginx settings unchanged: If chunks don't appear separated in the Network tab, it's worth checking the proxy_buffering off; proxy_cache off; settings first. It's likely not a code problem.
- Confusion from trying to attach client logic to Server Components: Where you place the 'use client' boundary in the RSC tree significantly affects bundle size. It's important to develop the habit of isolating only the parts that require client state or event handlers.
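For the Nginx case in particular, when I can't easily touch the proxy configuration, one alternative is to have the app itself ask Nginx not to buffer by sending an X-Accel-Buffering: no response header. The following is a minimal sketch of doing that from the Next.js config. The catch-all source pattern is an assumption; in a real project you would probably narrow it to the routes that actually stream.

```typescript
// next.config.ts (sketch)
// Adds "X-Accel-Buffering: no" to responses so that Nginx passes
// streamed chunks through instead of buffering the whole response.
const nextConfig = {
  async headers() {
    return [
      {
        source: "/:path*", // assumed catch-all; narrow to streaming routes if possible
        headers: [{ key: "X-Accel-Buffering", value: "no" }],
      },
    ];
  },
};

export default nextConfig;
```

Other proxies and CDNs have their own buffering behavior, so this header should be treated as an Nginx-specific fix rather than a general one.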
Closing Thoughts
The biggest shift in mindset from applying RSC + Streaming was realizing that performance improvements come from "correct architectural boundaries" rather than "faster code." Optimizing queries is important, but structuring things so that slow queries cannot block fast layouts is what actually moved the TTFB numbers.
Here's the order I applied these changes in, and I was able to see visible improvements at each step.
- The first thing to do is find the slowest data fetch in an existing page and wrap it with <Suspense fallback={<Skeleton />}>. This alone prevents that section from blocking the rest of the rendering. It's worth checking the DevTools Network tab to confirm that chunks are arriving separately.
- If there are serial fetches inside a component, revisit the dependencies. Asking "does this fetch truly need the result of the previous fetch?" often reveals more cases than expected where fetches can be combined with Promise.all.
- To compare metrics before and after changes, it's recommended to add RUM with pnpm add @vercel/speed-insights. Even if things feel fast locally, real user data can tell a different story, and the moment you see TTFB drop from 350ms to 60ms on a graph, you'll be convinced this is the right direction.
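For a sense of what a RUM library is computing when it reports TTFB, here is a simplified sketch based on the browser's Navigation Timing fields. The NavTimingLike interface and both function names are my own; real libraries handle more edge cases, so treat this as an approximation of the idea rather than a replacement for them. The 800ms and 1800ms cutoffs follow Google's published TTFB guidance.

```typescript
// Minimal shape of the fields read from a PerformanceNavigationTiming entry.
interface NavTimingLike {
  responseStart: number;    // ms from navigation start to first response byte
  activationStart?: number; // non-zero only for prerendered pages
}

type Rating = "good" | "needs-improvement" | "poor";

// TTFB in milliseconds, never negative.
function ttfbFromTiming(nav: NavTimingLike): number {
  return Math.max(nav.responseStart - (nav.activationStart ?? 0), 0);
}

// Buckets a TTFB value using the 800ms / 1800ms guidance thresholds.
function rateTtfb(ms: number): Rating {
  if (ms <= 800) return "good";
  if (ms <= 1800) return "needs-improvement";
  return "poor";
}

// In a browser, the entry would come from:
//   const [nav] = performance.getEntriesByType("navigation");
console.log(rateTtfb(ttfbFromTiming({ responseStart: 60 })));   // "good"
console.log(rateTtfb(ttfbFromTiming({ responseStart: 2000 }))); // "poor"
```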