How to Reduce TTFB from 350ms to 60ms with Next.js RSC + Streaming

While reviewing the Core Web Vitals of a production service, I once discovered pages where TTFB was well over 400ms. Each DB query was individually fast, so I wondered why things were so slow — and it turned out that three independent queries were waiting in series. I figured parallelizing them with Promise.all would solve the problem, but the TTFB barely changed. The issue wasn't a single line of code; it was at the architecture level.

Only after properly understanding RSC + Streaming did the numbers change dramatically. After reading this article, you'll be able to directly apply a page design approach where TTFB equals "the time to render the static shell" rather than "the time the slowest query takes to finish." This is explained in the context of the Next.js App Router, and anyone who has used React and the App Router should be able to follow along right away.

Core Concepts

What's the Difference Between Traditional SSR and Streaming SSR?

Honestly, at first I thought, "How different can streaming really be?" But once I opened the Network tab and watched chunks arriving one by one, my thinking changed completely.

Traditional SSR is a structure where the server collects all data and returns a single completed HTML response. The problem lies in the word "all." If three DB queries take 50ms, 100ms, and 300ms respectively, then even when they run in parallel the user doesn't receive the first byte until 300ms have elapsed (run in series, it is 450ms). The slowest query holds the rest hostage.
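To make that arithmetic concrete, here is a minimal sketch of how the first byte is gated under each approach. It is a simplified model that ignores network latency and render cost, and the 10ms shell time is an assumption for illustration, not a measurement.

```typescript
// Latencies from the example above, in milliseconds.
const queryMs: number[] = [50, 100, 300];

// Assumed time to render the static shell; illustrative, not measured.
const shellMs = 10;

// Blocking SSR with queries awaited one after another.
function ttfbSerial(queries: number[]): number {
  return queries.reduce((sum, ms) => sum + ms, 0);
}

// Blocking SSR with queries started together (for example via Promise.all).
function ttfbParallel(queries: number[]): number {
  return Math.max(...queries);
}

// Streaming SSR: the first byte leaves as soon as the shell is rendered.
function ttfbStreaming(shell: number): number {
  return shell;
}

console.log(ttfbSerial(queryMs), ttfbParallel(queryMs), ttfbStreaming(shellMs));
// prints: 450 300 10
```

In this model, parallelizing only lowers the ceiling to the slowest query, while streaming removes the queries from the TTFB calculation altogether.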

Approach	TTFB Determinant	User Experience
Traditional SSR	Completion time of the slowest data fetch	White screen → full render all at once
Streaming SSR	Static shell rendering time	Layout displayed immediately → content appears progressively
PPR	CDN edge latency (20–80ms)	Cached shell instantly → dynamic sections streaming

TTFB (Time to First Byte): The time from when the browser sends a request to the server until it receives the first byte of the response. TTFB is not itself one of the Core Web Vitals, but it feeds directly into them; Google's guidance treats 800ms or less as good, and under 200ms is a sensible stricter target.

What RSC Actually Sends to the Client

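RSC (React Server Components) on its own does not produce completed HTML. Its output is a component tree encoded in a serialization format informally known as the React Flight protocol (the RSC payload). On an initial page load, Next.js renders that tree to HTML on the server and streams it together with the payload; on later client-side navigations only the payload is sent. (The internal structure of the format is not covered in detail here. For the HTML streaming side, the renderToPipeableStream documentation is a good starting point.) The client runtime progressively assembles the UI as it receives stream chunks.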

tsx

// app/dashboard/_components/MetricsPanel.tsx
// Without 'use client', this is a Server Component by default
async function MetricsPanel() {
  const metrics = await getMonthlyMetrics(); // Direct DB call
 
  // The db module is completely excluded from the client bundle
  return <MetricsGrid data={metrics} />;
}

This structure brings two benefits. Server-only code like db is excluded from the client bundle, and DB calls are made directly on the server, eliminating network round trips.

How Suspense Boundaries Make Streaming Work

Suspense was originally created for code splitting, but it has taken on a central role in server streaming.

At the HTTP level, it works like this. The server starts the response without knowing its final length (Transfer-Encoding: chunked on HTTP/1.1, ordinary data frames on HTTP/2) and immediately flushes the static shell outside the Suspense boundary as the first chunk. That first chunk also carries the fallback UI (such as a skeleton) for any boundary whose component is still waiting for data. When the data is ready, the server sends the actual content for that boundary as an additional chunk, and the client runtime swaps it in for the skeleton.

tsx

// app/dashboard/page.tsx
import { Suspense } from 'react';

export default function DashboardPage() {
  return (
    <DashboardLayout>
      {/* DashboardLayout is static: included in the first chunk immediately */}

      <Suspense fallback={<MetricsSkeleton />}>
        <MetricsPanel />  {/* Sent as a separate chunk when the internal fetch completes */}
      </Suspense>
    </DashboardLayout>
  );
}

Without Suspense, the server sends nothing until MetricsPanel's data is available. With Suspense, the layout is delivered immediately and the data is filled in later.
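The flush order described in this section can be modeled without React at all. The sketch below is a deliberately simplified stand-in, not how React encodes its chunks: fetchMetrics, the 30ms delay, and the HTML strings are all hypothetical. It only shows that the shell and fallback leave first and the real content follows once the data resolves.

```typescript
// Simulated slow query; the 30ms delay and the value are illustrative.
function fetchMetrics(): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve("42 active users"), 30));
}

// Yields body chunks in the order a streaming server would flush them.
async function* renderPage(): AsyncGenerator<string> {
  // Start the fetch right away, but do not wait for it yet.
  const metrics = fetchMetrics();

  // Chunk 1: the static shell with the fallback in place of the pending boundary.
  yield '<main><h1>Dashboard</h1><div id="metrics">Loading...</div></main>';

  // Chunk 2: the real content for that boundary, once the data has resolved.
  yield `<template data-replaces="metrics">${await metrics}</template>`;
}

// Collects the chunks so the flush order can be inspected.
async function collectChunks(): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of renderPage()) {
    chunks.push(chunk);
  }
  return chunks;
}

collectChunks().then((chunks) => console.log(chunks));
```

In a plain Node HTTP handler, each yielded chunk would be passed to res.write(chunk), followed by res.end() after the loop, which is what makes the first chunk reach the browser before the query finishes.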

Practical Application

Here are three examples arranged in order of complexity. It's best to start with whichever one most closely resembles your current situation.

Example 1: Data-Heavy Dashboard — Parallel Streaming

A dashboard where multiple widgets each require independent data is the type where RSC Streaming effects are most pronounced. Wrapping each widget in an independent Suspense boundary enables parallel streaming.

tsx

// app/dashboard/page.tsx
import { Suspense } from 'react';

export default function DashboardPage() {
  return (
    <DashboardLayout>
      {/* DashboardLayout is static: flushed immediately on request */}

      <Suspense fallback={<MetricsSkeleton />}>
        <MetricsPanel />          {/* DB query A: ~100ms */}
      </Suspense>

      <Suspense fallback={<ChartSkeleton />}>
        <RevenueChart />          {/* DB query B: ~150ms */}
      </Suspense>

      <Suspense fallback={<TableSkeleton />}>
        <RecentTransactions />    {/* DB query C: ~300ms */}
      </Suspense>
    </DashboardLayout>
  );
}

tsx

// app/dashboard/_components/MetricsPanel.tsx
// Aggregated data — cacheable, Server Component
async function MetricsPanel() {
  const metrics = await getMonthlyMetrics();
  return <MetricsGrid data={metrics} />;
}

Component	Role	When Sent
`DashboardLayout`	Navigation, sidebar	Immediately on request
`<MetricsSkeleton />` etc.	Loading placeholders	Together with Layout
`<MetricsPanel />`	Aggregated metrics	After query A completes
`<RevenueChart />`	Revenue chart	After query B completes
`<RecentTransactions />`	Transaction list	After query C completes

Queries A, B, and C run in parallel without waiting for each other. With traditional SSR the TTFB would be 300ms+, but with this structure the layout is delivered immediately. No matter how slow query C is, the other widgets appear first.
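The "each widget arrives when its own query finishes" behavior can be sketched in plain TypeScript. This is a model under stated assumptions, not React's implementation: the Widget type, resolveAfter, and streamWidgets are hypothetical helpers, and the delays are scaled down from the 100/150/300ms in the example so the sketch runs quickly.

```typescript
type Widget = { label: string; load: () => Promise<string> };
type Arrived = { label: string; html: string };

// Resolves with `value` after `ms` milliseconds; stands in for a DB query.
function resolveAfter<T>(value: T, ms: number): Promise<T> {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Starts every load immediately, then yields results in completion order.
async function* streamWidgets(widgets: Widget[]): AsyncGenerator<Arrived> {
  const pending = new Map<string, Promise<Arrived>>();
  for (const widget of widgets) {
    pending.set(
      widget.label,
      widget.load().then((html) => ({ label: widget.label, html })),
    );
  }
  while (pending.size > 0) {
    const finished = await Promise.race(Array.from(pending.values()));
    pending.delete(finished.label);
    yield finished;
  }
}

// Declared slowest-first on purpose: declaration order should not matter.
const dashboardWidgets: Widget[] = [
  { label: "RecentTransactions", load: () => resolveAfter("12 rows", 60) },
  { label: "MetricsPanel", load: () => resolveAfter("4 metrics", 10) },
  { label: "RevenueChart", load: () => resolveAfter("1 chart", 30) },
];

async function arrivalOrder(): Promise<string[]> {
  const order: string[] = [];
  for await (const arrived of streamWidgets(dashboardWidgets)) {
    order.push(arrived.label);
  }
  return order;
}

arrivalOrder().then((order) => console.log(order.join(" -> ")));
// prints: MetricsPanel -> RevenueChart -> RecentTransactions
```

The slowest widget is listed first but arrives last, which is the property independent Suspense boundaries give the dashboard.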

Example 2: E-commerce Product Detail Page — Separating Static/Dynamic with PPR

Product pages have an interesting structure. The product name, images, and description rarely change, but inventory and pricing require real-time data. PPR (Partial Prerendering) separates these two at build time.

ProductHero and ProductDescription are classified as static because they can be rendered at build time from params.id alone, with no request-time data (this assumes the product ids are known at build time, for example through generateStaticParams). By contrast, LivePrice opts out of caching with noStore(), so it is rendered on the origin server on every request. That difference is where the caching strategy branches.

typescript

// next.config.ts
export default {
  experimental: { ppr: 'incremental' },
};

tsx

// app/product/[id]/page.tsx
import { Suspense } from 'react';

// With ppr: 'incremental', each route opts in to PPR explicitly
export const experimental_ppr = true;

export default function ProductPage({ params }: { params: { id: string } }) {
  return (
    <>
      {/* Static shell: generated at build time, served from CDN */}
      <ProductHero id={params.id} />
      <ProductDescription id={params.id} />

      {/* Dynamic sections: streamed from origin */}
      <Suspense fallback={<PriceSkeleton />}>
        <LivePrice id={params.id} />
      </Suspense>

      <Suspense fallback={<StockSkeleton />}>
        <StockStatus id={params.id} />
      </Suspense>
    </>
  );
}

tsx

// app/product/[id]/_components/LivePrice.tsx
import { unstable_noStore as noStore } from 'next/cache';

async function LivePrice({ id }: { id: string }) {
  // Disable caching so the latest price is fetched on every request.
  // (Next.js 15 recommends connection() from 'next/server' in place of unstable_noStore.)
  noStore();
  const price = await fetchCurrentPrice(id);
  return <PriceDisplay price={price} />;
}

Once this structure is in place, the static shell containing ProductHero is delivered from a CDN edge within 20–80ms, and price and inventory information streams in right after. On a shell cache hit, origin server processing time no longer contributes to TTFB.

PPR (Partial Prerendering): A Next.js feature that generates both a static HTML shell and a postponedState blob at build time. On request, the CDN sends the shell immediately while the origin server streams only the dynamic sections. It is still experimental and is enabled through the experimental.ppr flag.

Example 3: Preventing Waterfalls — The Promise Early-Start Pattern

I made this mistake for quite a while early on. It's the case where you carefully separate Suspense boundaries, but then create a waterfall inside the component itself. This pattern comes up frequently:

tsx

// Bad example: serial waterfall — takes 500ms total
// app/profile/[userId]/page.tsx
async function UserProfilePage({ params }: { params: { userId: string } }) {
  const user = await getUser(params.userId);        // Wait 300ms
  const posts = await getUserPosts(user.id);         // Then wait another 200ms
  // user.id === params.userId, yet we unnecessarily wait for user before starting posts
  return <Profile user={user} posts={posts} />;
}

Because getUserPosts receives user.id, it appears there's a dependency — but in reality, passing params.userId directly makes the two fetches completely independent.

tsx

// Good example: Promise early start — takes 300ms total
// app/profile/[userId]/page.tsx
async function UserProfilePage({ params }: { params: { userId: string } }) {
  const userPromise = getUser(params.userId);        // Start immediately
  const postsPromise = getUserPosts(params.userId);  // Start immediately (no waiting for user)
 
  const [user, posts] = await Promise.all([userPromise, postsPromise]);
  return <Profile user={user} posts={posts} />;
}

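Restructuring the code this way saves 200ms (500ms down to 300ms). The change looks small, but the difference is noticeable to real users. Note that the page still awaits both fetches at the top level, so the first byte is held back for the full 300ms; to keep TTFB low as well, move the awaiting into a component wrapped in Suspense.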
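One way to verify that two fetches really overlap, without relying on wall-clock timing, is to count how many are in flight at once. The sketch below is a self-contained illustration: fakeQuery is a hypothetical stand-in for a DB call, and getUser and getUserPosts here are stubs that only borrow the names from the example above.

```typescript
let inFlight = 0;
let maxInFlight = 0;

// Stand-in for a DB call; tracks how many calls overlap.
async function fakeQuery<T>(result: T, ms: number): Promise<T> {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await new Promise<void>((resolve) => setTimeout(resolve, ms));
  inFlight -= 1;
  return result;
}

// Stubs with illustrative delays.
const getUser = (userId: string) => fakeQuery({ id: userId }, 30);
const getUserPosts = (userId: string) => fakeQuery([`post by ${userId}`], 20);

// Waterfall: the second query starts only after the first resolves.
async function loadSerial(userId: string) {
  const user = await getUser(userId);
  const posts = await getUserPosts(user.id);
  return { user, posts };
}

// Early start: both queries are in flight before anything is awaited.
async function loadEarlyStart(userId: string) {
  const userPromise = getUser(userId);
  const postsPromise = getUserPosts(userId);
  const [user, posts] = await Promise.all([userPromise, postsPromise]);
  return { user, posts };
}

// Runs a loader and reports the peak number of overlapping queries.
async function peakConcurrency(loader: (id: string) => Promise<unknown>): Promise<number> {
  inFlight = 0;
  maxInFlight = 0;
  await loader("u1");
  return maxInFlight;
}

(async () => {
  console.log(await peakConcurrency(loadSerial));     // prints: 1
  console.log(await peakConcurrency(loadEarlyStart)); // prints: 2
})();
```

A peak of 1 means the fetches ran as a waterfall; a peak of 2 confirms they overlapped.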

Pros and Cons Analysis

Advantages

Item	Description
Reduced TTFB	Static shell is sent immediately, breaking away from the "slow query = slow TTFB" equation
Reduced bundle size	Server Component code is completely excluded from the client bundle
Fetching close to the data source	DB and internal APIs are called directly on the server, eliminating network round trips
SEO compatibility	Streamed content is still server-rendered HTML in the same response, so search engines can index it
Parallel data fetching	Independent Suspense boundaries fetch data in parallel

Disadvantages and Caveats

The most common problems when first adopting this are reverse proxy buffering and error handling. I personally experienced a situation where streaming that worked fine locally didn't work at all in production because of an Nginx configuration issue. If chunks appear to arrive all at once in the Network tab, the proxy settings are the first thing worth investigating.

Item	Description	Mitigation
Blocking fetches	A top-level page `await` neutralizes streaming	Move data-dependent sections inside Suspense boundaries
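Error handling complexity	Once the shell has been flushed, the HTTP status code can no longer change, so a late error still returns 200	Add error boundaries (error.tsx in the App Router) and design recovery logic for upstream failures
Reverse proxy buffering	Nginx buffers proxied responses by default, so chunks are held back and arrive together	Set `proxy_buffering off;` for streamed routes and make sure `proxy_cache` is not enabled for them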
Flight payload bloat	Large Server→Client props can delay hydration	Minimize prop passing, review serialization of large data
Environment differences	Streaming behavior may differ between local and production	Real-world measurement via RUM is essential
Suspense design overhead	Too granular boundaries can cause dozens of loading UIs to appear simultaneously	Design with both information architecture and data dependencies in mind
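For the reverse proxy buffering row, an alternative to editing the proxy configuration is to have the app ask Nginx not to buffer, using the X-Accel-Buffering response header that Nginx honors. The sketch below shows the shape such a rule could take in next.config.ts. HeaderRule is a local stand-in type so the snippet stands alone (a real project would import NextConfig from "next"), and the catch-all source pattern is an assumption to adjust for the routes that actually stream.

```typescript
// Minimal local type so the sketch stands alone; a real project would
// import `NextConfig` from "next" instead.
type HeaderRule = {
  source: string;
  headers: { key: string; value: string }[];
};

const nextConfig = {
  // Response headers to attach to every route matching `source`.
  async headers(): Promise<HeaderRule[]> {
    return [
      {
        // Assumed catch-all pattern; narrow it to the streamed routes if needed.
        source: "/:path*",
        // Tells Nginx to pass chunks through instead of buffering them.
        headers: [{ key: "X-Accel-Buffering", value: "no" }],
      },
    ];
  },
};

// In next.config.ts this object would be the default export:
// export default nextConfig;
```

This keeps the streaming requirement next to the code that depends on it, which helps when the Nginx configuration is owned by another team. It only covers Nginx-style proxies; other proxies and CDNs have their own buffering controls.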

RUM (Real User Monitoring): Performance data collected from real users' browsers. It can be gathered with the @vercel/speed-insights or web-vitals libraries. Local measurements can differ meaningfully from production and are difficult to rely on.

The Most Common Mistakes in Practice

Blocking global data with await in layout components: Fetching session or user information with await in app/layout.tsx adds that duration to the TTFB of every child page. If the layout truly needs the data, consider wrapping the part that uses it in Suspense or moving it to the client side.
Concluding "streaming doesn't work" while leaving Nginx settings unchanged: If chunks don't arrive separately in the Network tab, check the proxy first (proxy_buffering off; and no proxy_cache on the streamed routes). It's likely not a code problem.
Placing the 'use client' boundary too high when adding client logic: Where you place the 'use client' boundary in the RSC tree significantly affects bundle size. Make a habit of isolating only the parts that need client state or event handlers.

Closing Thoughts

The biggest shift in mindset from applying RSC + Streaming was realizing that performance improvements come from "correct architectural boundaries" rather than "faster code." Optimizing queries is important, but structuring things so that slow queries cannot block fast layouts is what actually moved the TTFB numbers.

Here's the order I applied these changes in, and I was able to see visible improvements at each step.

The first thing to do is find the slowest data fetch in an existing page and wrap it with <Suspense fallback={<Skeleton />}>. This alone prevents that section from blocking the rest of the rendering. It's worth checking the DevTools Network tab to confirm that chunks are arriving separately.
If there are serial fetches inside a component, you can revisit the dependencies. Asking "does this fetch truly need the result of the previous fetch?" often reveals more cases than expected where fetches can be combined with Promise.all.
To compare metrics before and after changes, it's recommended to add RUM with pnpm add @vercel/speed-insights. Even if things feel fast locally, real user data can tell a different story — and the moment you see TTFB drop from 350ms to 60ms on a graph, you'll be convinced this is the right direction.
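If you want to see what a RUM library is summarizing, the calculation can be sketched by hand. The code below is a minimal illustration, not the implementation used by @vercel/speed-insights or web-vitals: NavTiming mirrors two fields of the browser's PerformanceNavigationTiming entry, and the sample values are made up.

```typescript
// Subset of PerformanceNavigationTiming fields used here.
type NavTiming = { startTime: number; responseStart: number };

// TTFB for one page load, in milliseconds.
function ttfbOf(entry: NavTiming): number {
  return Math.max(entry.responseStart - entry.startTime, 0);
}

// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical samples collected from four page loads.
const samples: NavTiming[] = [
  { startTime: 0, responseStart: 350 },
  { startTime: 0, responseStart: 60 },
  { startTime: 0, responseStart: 80 },
  { startTime: 0, responseStart: 70 },
];

console.log(percentile(samples.map(ttfbOf), 75));
// prints: 80
```

In a browser, the input entry would come from performance.getEntriesByType("navigation")[0]. Comparing the 75th percentile before and after a change is more robust than comparing averages, because a handful of slow loads can dominate the mean.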

