Cloudflare Observability Guide

Last reviewed: 2026-03-22

This guide explains what Cloudflare Observability includes today, how each part works, and how to apply it to your own programs and services.


1. What “Observability” means in Cloudflare

In Cloudflare, Observability is not one single feature. It is a collection of tools for answering questions like:

Depending on which Cloudflare products you use, the Observability or related menus usually include some combination of:

The most important idea is this:

Cloudflare gives you both native dashboards and export/programmatic paths.
Use native dashboards for fast diagnosis, and exports/APIs for long-term operations, alerting, compliance, and cross-system correlation.


2. The Cloudflare observability toolbox at a glance

Need | Best Cloudflare feature
See traffic, status codes, bandwidth, cache behavior | Analytics dashboards / GraphQL Analytics API
Debug a Worker right now | Real-time logs / wrangler tail
Store and query Worker logs in Cloudflare | Workers Logs
Export Worker logs/traces to external platforms | OpenTelemetry export or Workers Logpush
Build custom, per-tenant or per-feature app metrics | Workers Analytics Engine
Monitor origin uptime and latency | Health Checks + Health Checks Analytics
Monitor Cloudflare Tunnel health | cloudflared Prometheus metrics
Export edge/security/HTTP/Zero Trust logs | Logpush
Alert on incidents | Notifications
See who changed account settings | Audit Logs

A quick visual map:

flowchart LR
    A[Cloudflare Observability] --> B[Workers signals]
    A --> C[Edge and platform signals]
    A --> D[Operations and governance]

    B --> B1[Workers Logs]
    B --> B2[Real-time logs / wrangler tail]
    B --> B3[Workers traces]
    B --> B4[OpenTelemetry export]
    B --> B5[Tail Workers]
    B --> B6[Workers Logpush]
    B --> B7[Workers Analytics Engine]
    B --> B8[Source maps]

    C --> C1[Analytics dashboards]
    C --> C2[Cloudflare Logs / Logpush]
    C --> C3[GraphQL Analytics API]
    C --> C4[Health Checks]
    C --> C5[cloudflared metrics]
    C --> C6[Zero Trust logs]
    C --> C7[Web Analytics]

    D --> D1[Notifications]
    D --> D2[Audit Logs]
    D --> D3[Logpush health monitoring]

3. What is included in Cloudflare Observability

3.1 Workers Observability

If your program runs on Cloudflare Workers, this is the most important area.

Cloudflare Workers observability includes:

Cloudflare describes Workers observability as a way to understand application performance, diagnose issues, and inspect request flows, either inside Cloudflare or in your existing observability stack. Newly created Workers have observability (Workers Logs) enabled by default, and Cloudflare documents native support for exporting Workers traces and logs to OpenTelemetry-compatible destinations.
Source docs:

What to use when

Decision flow:

flowchart LR
    A[What are you trying to answer?] --> B{Need live debugging right now?}
    B -->|Yes| C[Use wrangler tail]
    B -->|No| D{Need stored logs or a past time window?}

    D -->|Yes| E[Use Workers Logs]
    D -->|Long-term export / SIEM / object storage| F[Use Workers Logpush or Cloudflare Logpush]
    D -->|Trace request flow and latency| G[Use Workers traces]
    D -->|Send telemetry to Grafana or Datadog| H[Use OTel export]
    D -->|Custom app or tenant metrics| I[Use Workers Analytics Engine]
    D -->|Traffic trends and request analytics| J[Use Analytics dashboards or GraphQL]
    D -->|Origin uptime or latency| K[Use Health Checks]
    D -->|Tunnel health| L[Use cloudflared Prometheus metrics]
    D -->|Who changed config| M[Use Audit Logs]

Worker telemetry flow:

flowchart LR
    U[User request] --> W[Cloudflare Worker]
    W --> M1[Metrics and analytics]
    W --> L1[Workers Logs]
    W --> T1[Real-time tail]
    W --> X1[Workers traces]
    W --> S1[Source maps and stack traces]
    W --> A1[Analytics Engine custom metrics]

    W --> TW[Tail Worker]
    TW --> TW1[Filter]
    TW --> TW2[Redact]
    TW --> TW3[Route to webhook or Slack]

    W --> O1[OTel export]
    O1 --> O2[Grafana / Datadog / Honeycomb / Sentry]

    W --> LP1[Workers Logpush]
    LP1 --> LP2[R2 / S3 / SIEM / log platform]
    LP2 --> LP3[jq, SQL, dashboards, investigations]

3.2 Analytics dashboards

Cloudflare has several built-in dashboards for request/traffic analytics.

These dashboards are good for:

For Workers specifically, Cloudflare documents two graphical sources:

Workers metrics show performance and usage for your Worker. Zone analytics can help you inspect subrequests, bandwidth, status codes, and total requests. Cloudflare also notes that the Workers tab in Analytics is especially useful for spotting origin-side issues such as spikes in 500s and understanding traffic going to origin.
Source docs:

Good use cases


3.3 Workers Logs

Workers Logs is Cloudflare’s persisted log storage and query interface for Worker-emitted logs.

Cloudflare states that Workers Logs automatically collects, stores, filters, and analyzes logging data emitted from Workers, including:

Cloudflare also states:

Source docs:

When to use Workers Logs

Use Workers Logs when:

Basic Worker logging example

export default {
  async fetch(request, env) {
    const start = Date.now();

    console.log(JSON.stringify({
      event: "request_started",
      method: request.method,
      path: new URL(request.url).pathname,
      colo: request.cf?.colo ?? "unknown"
    }));

    try {
      const response = await fetch(request);
      const durationMs = Date.now() - start;

      console.log(JSON.stringify({
        event: "request_finished",
        status: response.status,
        durationMs
      }));

      return response;
    } catch (err) {
      console.error(JSON.stringify({
        event: "request_failed",
        message: String(err)
      }));
      throw err;
    }
  }
};

Best practice

Prefer structured JSON logs instead of free-form text. That makes filtering and later export much more useful.

Control sampling in Wrangler

{
  "observability": {
    "enabled": true,
    "head_sampling_rate": 0.1
  }
}

or:

[observability]
enabled = true
head_sampling_rate = 0.1

Use lower sampling in high-volume production services if log cost or noise becomes a problem.


3.4 Real-time logs and wrangler tail

Cloudflare’s real-time log path is for live debugging.

You can:

Cloudflare documents that real-time logs:

Source docs:

Example

npx wrangler tail

Pipe through jq:

npx wrangler tail | jq .event.request.url

Here, jq is a command-line JSON processor. It reads the JSON output from wrangler tail and extracts only the .event.request.url field, which is the request URL from each tailed event.

How to get only the logs you need

The right approach depends on whether you need live logs or older logs from a time window.

Examples for live filtering:

# Show only request URLs from the live stream
npx wrangler tail | jq -r '.event.request.url'
# Show only request URLs that contain a path fragment
npx wrangler tail | jq -r '.event.request.url' | grep '/api/orders'
# Show only lines that contain a known marker from your app logs
npx wrangler tail | grep 'request_failed'

If logs are already stored in a file or external platform, you can filter by time range there. For example, with newline-delimited JSON logs:

jq 'select(.timestamp >= "2026-03-22T00:00:00Z" and .timestamp < "2026-03-23T00:00:00Z")' worker-logs.ndjson

The exact timestamp field name depends on the dataset or destination, but the general idea is the same: use wrangler tail for live filtering, and use stored logs for time-range filtering.

When to use it

Use real-time logs when:

Do not use it as your main logging system

Because it is not a persisted log store, it is best for debugging sessions, not long-term operations.


3.5 Workers traces

Cloudflare Workers tracing is for request-flow visibility.

Cloudflare documents that Workers tracing follows OpenTelemetry (OTel) standards and that the default trace sampling rate is 1 when tracing is enabled.

Source docs:

When tracing is most useful

Use tracing when you want to answer:

Enable traces in Wrangler

{
  "observability": {
    "traces": {
      "enabled": true,
      "head_sampling_rate": 0.05
    }
  }
}

or:

[observability.traces]
enabled = true
head_sampling_rate = 0.05

Good production pattern


3.6 OpenTelemetry export

This is one of the most important integrations if you already have an observability stack.

Cloudflare documents that Workers can export OTel-compliant telemetry to any destination with an OTel endpoint, including examples like Honeycomb, Grafana Cloud, Axiom, and Sentry-compatible setups. Cloudflare also documents the persist option, which can be set to false if you want to export without storing in Cloudflare’s own dashboard-backed persistence.
Source docs:

Example configuration

{
  "observability": {
    "traces": {
      "enabled": true,
      "destinations": ["tracing-destination-name"],
      "head_sampling_rate": 0.05,
      "persist": false
    },
    "logs": {
      "enabled": true,
      "destinations": ["logs-destination-name"],
      "head_sampling_rate": 0.2,
      "persist": false
    }
  }
}

When you should use OTel export

Use OTel export if:


3.7 Source maps and stack traces

If your Worker is bundled, transpiled, or minified, source maps are critical.

Cloudflare documents that source maps let stack traces point back to your original code, and that Wrangler can upload them automatically during deployment.

Source docs:

Enable source maps

{
  "upload_source_maps": true
}

or:

upload_source_maps = true

Why this matters

Without source maps, production stack traces often point to minified bundles and useless line numbers.
With source maps, incidents become much faster to debug.

Recommendation

Turn this on for every serious Worker application.


3.8 Tail Workers

A Tail Worker receives telemetry about execution of another Worker and can process it.

Cloudflare documents that Tail Workers can process logs for alerts, debugging, or analytics, and that they are available on Workers Paid and Enterprise plans.

Source docs:

When Tail Workers are useful

Use Tail Workers when you want to:

Producer configuration example

{
  "tail_consumers": [
    {
      "service": "my-tail-worker"
    }
  ]
}

or:

[[tail_consumers]]
service = "my-tail-worker"

Conceptual pattern
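As a minimal sketch of that pattern (field access below follows the documented tail event shape, but verify against the Tail Workers docs; ALERT_WEBHOOK_URL and the redaction rule are assumptions for illustration):

export default {
  async tail(events, env, ctx) {
    for (const item of events) {
      // Keep only executions that failed or threw an exception.
      if (item.outcome === "ok" && item.exceptions.length === 0) {
        continue;
      }

      // Redact a hypothetical sensitive field before forwarding anywhere.
      const logs = item.logs.map((log) => ({
        level: log.level,
        message: JSON.stringify(log.message).replaceAll(
          /"authToken":"[^"]*"/g,
          '"authToken":"[REDACTED]"'
        )
      }));

      // ALERT_WEBHOOK_URL is a secret/binding you would configure yourself.
      ctx.waitUntil(fetch(env.ALERT_WEBHOOK_URL, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          script: item.scriptName,
          outcome: item.outcome,
          exceptions: item.exceptions,
          logs
        })
      }));
    }
  }
};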

Good use cases


3.9 Workers Logpush

Workers Logpush is for exporting Worker trace event logs to a supported destination.

Cloudflare documents dashboard setup and cURL/API setup for Workers trace events, including export to R2 and other supported destinations.

Source docs:

When to use Workers Logpush vs OTel export

Use Workers Logpush when:

Use OTel export when:

How to use Workers Logpush in practice

In practice, the flow is:

  1. set logpush = true in the Worker's Wrangler configuration
  2. create an account-scoped Logpush job
  3. choose workers_trace_events as the dataset
  4. send logs to R2, S3, or another supported destination
  5. limit fields or add filter / sample_rate so you only export what you need
  6. read the delivered files from storage or ingest them into your log platform

Example 1: enable it on a Worker

name = "orders-api"
main = "src/index.js"
compatibility_date = "2026-03-22"
logpush = true

Cloudflare documents that Workers with logpush = true are automatically picked up by the Logpush job.

Example 2: send Workers trace events to R2

curl "https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/logpush/jobs" \
  --header 'X-Auth-Key: <API_KEY>' \
  --header 'X-Auth-Email: <EMAIL>' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "workers-logpush-r2",
    "output_options": {
      "field_names": [
        "Event",
        "EventTimestampMs",
        "Outcome",
        "Exceptions",
        "Logs",
        "ScriptName"
      ],
      "timestamp_format": "rfc3339"
    },
    "destination_conf": "r2://<BUCKET_PATH>/{DATE}?account-id=<ACCOUNT_ID>&access-key-id=<R2_ACCESS_KEY_ID>&secret-access-key=<R2_SECRET_ACCESS_KEY>",
    "dataset": "workers_trace_events",
    "enabled": true
  }' | jq .

Use {DATE} in the destination path if you want daily folders. That makes it much easier to retrieve logs for a specific day or deployment window.

Example 3: export only exception events

curl "https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/logpush/jobs" \
  --header 'X-Auth-Key: <API_KEY>' \
  --header 'X-Auth-Email: <EMAIL>' \
  --header 'Content-Type: application/json' \
  --data '{
    "name": "workers-logpush-exceptions-only",
    "output_options": {
      "field_names": [
        "EventTimestampMs",
        "Outcome",
        "Exceptions",
        "ScriptName"
      ],
      "timestamp_format": "rfc3339"
    },
    "filter": "{\"where\":{\"key\":\"Outcome\",\"operator\":\"eq\",\"value\":\"exception\"}}",
    "destination_conf": "r2://<BUCKET_PATH>/exceptions/{DATE}?account-id=<ACCOUNT_ID>&access-key-id=<R2_ACCESS_KEY_ID>&secret-access-key=<R2_SECRET_ACCESS_KEY>",
    "dataset": "workers_trace_events",
    "enabled": true
  }' | jq .

This pattern is useful when you only care about failed requests, uncaught exceptions, or post-deploy regression checks.

Example 4: inspect one day of delivered logs

Once the files are in storage, you can query them by date folder and then filter with jq:

gzip -dc ./2026-03-22/*.gz | jq 'select(.Outcome == "exception") | {ts: .EventTimestampMs, script: .ScriptName, exceptions: .Exceptions}'

This is the Logpush pattern for questions like:

Important operational note

Cloudflare’s Logpush Health Dashboards docs explicitly warn that Logpush cannot backfill dropped data. If logs are dropped because a job is disabled or failing, that data is gone.
Source:

That means Logpush health alerting is not optional for important environments.


3.10 Cloudflare Logs / Logpush (beyond Workers)

Cloudflare Logs is the broader log export family across Cloudflare products.

Cloudflare documents that Logpush delivers logs in batches as quickly as possible and supports many datasets. Dataset availability depends on plan and product. Zone-scoped http_requests is available in both Logpush and legacy Logpull, while most other datasets are Logpush-only.

Source docs:

Examples of useful datasets

Depending on your plan/products, common datasets include:

Use Logpush when you need

Strong recommendation

If your services matter in production, send the important datasets to long-term storage or a SIEM.


3.11 Logpush health dashboards and notifications

Cloudflare provides health dashboards for Logpush jobs.

Cloudflare documents that the dashboard helps you monitor delivery status, diagnose issues, and understand data volume being sent to destinations. It also warns that once Logpush data is dropped, it is permanently lost because Logpush cannot backfill it.

Source docs:

Operational advice

For every important Logpush job:


3.12 GraphQL Analytics API

Cloudflare’s GraphQL Analytics API is the main programmatic analytics interface for many Cloudflare datasets.

Cloudflare states that the GraphQL Analytics API provides data about HTTP requests passing through Cloudflare and data from specific products like Firewall and Load Balancing, and lets you select datasets, filter, aggregate, and integrate results with other applications.

Source docs:

Use GraphQL Analytics API when

Example: Workers metrics via GraphQL

Cloudflare documents querying Workers metrics over a specified period, with metrics like requests, errors, subrequests, and quantiles such as CPU time.

Conceptually, use GraphQL to ask:

Endpoint

Cloudflare documents this GraphQL endpoint:

https://api.cloudflare.com/client/v4/graphql

Example request shape

curl https://api.cloudflare.com/client/v4/graphql \
  -H "Authorization: Bearer <API_TOKEN>" \
  -H "Content-Type: application/json" \
  --data '{
    "query": "query($accountTag: string, $scriptName: string, $datetimeStart: string, $datetimeEnd: string) { viewer { accounts(filter: {accountTag: $accountTag}) { workersInvocationsAdaptive(limit: 100, filter: {scriptName: $scriptName, datetime_geq: $datetimeStart, datetime_leq: $datetimeEnd}) { sum { requests errors subrequests } quantiles { cpuTimeP50 cpuTimeP99 } dimensions { datetime scriptName status } } } } }",
    "variables": {
      "accountTag": "<ACCOUNT_ID>",
      "scriptName": "<WORKER_NAME>",
      "datetimeStart": "2026-03-21T00:00:00Z",
      "datetimeEnd": "2026-03-22T00:00:00Z"
    }
  }'

When GraphQL is the right tool

Use GraphQL when dashboard screenshots are not enough and you need data to drive:


3.13 Notifications

Cloudflare Notifications is the native alerting system for many Cloudflare product events.

Cloudflare documents:

Source docs:

Why notifications matter

Dashboards are passive. Alerts are active.

Use Notifications for:

Example webhook receiver idea

Create a small service or Worker that accepts Cloudflare webhook JSON, summarizes it, and routes it to the right channel.
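A minimal sketch of such a receiver as a Worker, assuming a SLACK_WEBHOOK_URL secret and treating the webhook payload loosely (the exact fields vary by alert type, so inspect a real delivery before depending on them):

export default {
  async fetch(request, env) {
    if (request.method !== "POST") {
      return new Response("Method not allowed", { status: 405 });
    }

    // Cloudflare Notification webhooks deliver JSON describing the alert.
    const alert = await request.json();

    // Forward a short summary to Slack. The "name"/"text" fields are illustrative.
    await fetch(env.SLACK_WEBHOOK_URL, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        text: `Cloudflare alert: ${alert.name ?? "unknown"}\n${alert.text ?? JSON.stringify(alert)}`
      })
    });

    return new Response("ok");
  }
};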


3.14 Audit Logs

Audit Logs are for change tracking, not request telemetry.

Cloudflare documents that Audit Logs v2 is account-based and captures:

Source docs:

When Audit Logs help

Use Audit Logs to answer:

Example API endpoint from docs

https://api.cloudflare.com/client/v4/accounts/{account_id}/logs/audit
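A minimal sketch of pulling recent entries from that endpoint (Node 18+ ESM, using the global fetch; check the Audit Logs docs for the query parameters and response fields your account supports before building on this):

// audit-logs.mjs
const accountId = process.env.CF_ACCOUNT_ID;
const apiToken = process.env.CF_API_TOKEN;

const response = await fetch(
  `https://api.cloudflare.com/client/v4/accounts/${accountId}/logs/audit`,
  { headers: { Authorization: `Bearer ${apiToken}` } }
);

const body = await response.json();

// Print raw entries; inspect a real response before relying on specific field names.
for (const entry of body.result ?? []) {
  console.log(JSON.stringify(entry));
}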

Why every production team should care

A lot of “outages” are actually config changes.
Audit Logs are essential for:


3.15 Health Checks and Health Checks Analytics

Health Checks monitor origin availability from Cloudflare’s edge.

Cloudflare documents that standalone Health Checks monitor an IP or hostname, notify you in near real-time when there is a problem, and support configuration options like response codes, protocol types, intervals, and request path targeting. Health Checks Analytics can show uptime, latency, failure reason, and detailed event logs.
Source docs:

Use Health Checks when

your app is behind Cloudflare but the real failure point may be:

Health Checks are especially useful for

Good configuration pattern
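As a sketch of creating one via the API (zone-level Health Checks endpoint; the field names below are illustrative and should be verified against the current API schema):

// create-health-check.mjs (Node 18+)
const zoneId = process.env.CF_ZONE_ID;
const apiToken = process.env.CF_API_TOKEN;

const response = await fetch(
  `https://api.cloudflare.com/client/v4/zones/${zoneId}/healthchecks`,
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      name: "origin-health",
      address: "origin.example.com",   // the origin host to probe
      type: "HTTPS",
      interval: 60,                    // seconds between checks
      timeout: 5,
      retries: 2,
      consecutive_fails: 2,            // avoid alerting on a single blip
      consecutive_successes: 2,
      http_config: {
        method: "GET",
        path: "/health",               // a cheap, dedicated health endpoint
        expected_codes: ["200"]
      }
    })
  }
);

console.log(await response.json());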

What analytics can tell you

Cloudflare documents:

Important distinction

Health Checks are not the same thing as general app logs or traces. They answer: “Is the origin reachable and responding correctly?”


3.16 Cloudflare Tunnel / cloudflared metrics

If your service is exposed through Cloudflare Tunnel, you also need connector-side metrics.

Cloudflare documents that when cloudflared runs, it starts an HTTP metrics endpoint that exposes metrics in Prometheus format. In non-container environments the default address is typically 127.0.0.1:<PORT>/metrics, with the port chosen from 20241 to 20245, falling back to another available port if those are taken.

Source docs:

Example

cloudflared tunnel --metrics 127.0.0.1:60123 run my-tunnel

Then scrape:

http://127.0.0.1:60123/metrics

When to monitor tunnel metrics

Use these metrics when:


3.17 Zero Trust logs

If you use Cloudflare One / Zero Trust features, Cloudflare documents that Zero Trust logs can be exported with Logpush to third-party storage or SIEMs, and notes that this is Enterprise-only. Cloudflare also notes a dashboard limitation: R2 is not supported as a destination for Zero Trust logs in the dashboard, but can be configured via API.

Source docs:

Use cases


3.18 Web Analytics

Cloudflare Web Analytics is more website analytics than operational observability, but it is still useful for frontend visibility.

Cloudflare documents that Web Analytics is privacy-first and helps you understand the performance of web pages as experienced by visitors, without requiring Cloudflare proxying for all sites.

Source docs:

Use it for

Do not confuse it with


3.19 Workers Analytics Engine

Workers Analytics Engine (WAE) is one of the best tools for observing your own programs when what you actually need is custom application telemetry.

Cloudflare documents that WAE provides unlimited-cardinality analytics at scale, with a built-in API to write data points from Workers and a SQL API to query them. Cloudflare specifically calls out use cases like exposing analytics to your own customers, building usage-based billing, and understanding service health on a per-customer basis.
Source docs:

This is different from logs

Use logs for detailed event-by-event debugging.
Use Analytics Engine for fast aggregate queries like:

Configure a dataset binding

{
  "analytics_engine_datasets": [
    {
      "binding": "ANALYTICS",
      "dataset": "service_metrics"
    }
  ]
}

or:

[[analytics_engine_datasets]]
binding = "ANALYTICS"
dataset = "service_metrics"

Example Worker instrumentation

export default {
  async fetch(request, env) {
    const url = new URL(request.url);
    const started = Date.now();

    let status = 500;

    try {
      const response = await fetch(request);
      status = response.status;

      return response;
    } finally {
      const duration = Date.now() - started;

      env.ANALYTICS.writeDataPoint({
        blobs: [
          url.pathname,
          String(status),
          request.cf?.country ?? "unknown"
        ],
        doubles: [duration, 1],
        indexes: [url.hostname]
      });
    }
  }
};

Query it with SQL API

curl "https://api.cloudflare.com/client/v4/accounts/{account_id}/analytics_engine/sql"   --header "Authorization: Bearer <API_TOKEN>"   --data "SELECT blob1 AS path, blob2 AS status, AVG(double1) AS avg_duration_ms, SUM(double2) AS requests FROM service_metrics GROUP BY path, status ORDER BY requests DESC LIMIT 50"

Why this matters for your programs

WAE is ideal when you want:


4. How to use Cloudflare Observability for your services

This section is the practical answer to:
“How should I use this for my services?”

4.1 If your program runs on Cloudflare Workers

Recommended baseline:

  1. Enable Workers Logs
  2. Turn on source maps
  3. Use real-time logs / wrangler tail during development and incident response
  4. Enable traces
  5. Export to OTel if you already use an external platform
  6. Add Workers Analytics Engine for custom business/application metrics

Example baseline configuration:

{
  "name": "my-worker",
  "main": "src/index.ts",
  "compatibility_date": "2026-03-22",
  "upload_source_maps": true,
  "observability": {
    "enabled": true,
    "head_sampling_rate": 0.2,
    "traces": {
      "enabled": true,
      "head_sampling_rate": 0.05
    }
  }
}

What to log

In your Worker, log:

Do not log secrets, auth tokens, raw cookies, or sensitive personal data.

What to measure with Analytics Engine

Create custom metrics for:


4.2 If your program is an API or web service behind Cloudflare, but runs on your own servers

Recommended stack:

Why this combination works

Cloudflare sees the edge side:

Your origin sees:

You need both.

Example questions this setup answers


4.3 If you use Cloudflare Tunnel (cloudflared)

Recommended stack:

Why this matters

Tunnel issues can look like application issues.
Scraping cloudflared metrics helps you distinguish:

Best practice

Put tunnel metrics into the same Grafana/Prometheus environment as:

That gives you one place to diagnose edge-to-origin path health.


4.4 If you already use Grafana / Datadog / Honeycomb / Sentry / Splunk

Use Cloudflare as a telemetry source, not a silo.

Preferred approach

Suggested mapping

Why this is the strongest setup

You get:


4.5 If you need customer-facing analytics or usage-based billing

Use Workers Analytics Engine.

Why not plain logs?

Because billing and per-customer usage questions are aggregate queries, not raw log inspection.

Good WAE dimensions

Use blobs for:

Use doubles for:

Example data model
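A minimal sketch of such a data model, assuming a dataset bound as USAGE and hypothetical tenant/feature dimensions (how you derive customerId is up to your service: API key lookup, subdomain, JWT claim, and so on):

export default {
  async fetch(request, env) {
    const url = new URL(request.url);

    // Hypothetical tenant and feature identifiers.
    const customerId = request.headers.get("x-customer-id") ?? "unknown";
    const feature = url.pathname.split("/")[1] || "root";

    env.USAGE.writeDataPoint({
      blobs: [customerId, feature],   // labels you group by in SQL (blob1, blob2)
      doubles: [1],                   // billable units for this call (double1)
      indexes: [customerId]           // index on the tenant for per-customer queries
    });

    return new Response("ok");
  }
};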

Then query:


4.6 If you just need a good baseline and do not want to overbuild

Use this starter package:

For Workers

For origin services

For governance

For later growth

This gives you good practical coverage without building a huge telemetry platform immediately.


5. Deployment patterns by service type

Pattern summary:

flowchart LR
    A[Service type] --> B[Small Worker API]
    A --> C[Critical Worker service]
    A --> D[Origin-backed API]
    A --> E[Tunnel-exposed internal app]

    B --> B1[Workers Logs]
    B --> B2[wrangler tail]
    B --> B3[Source maps]
    B --> B4[Low-sample traces]

    C --> C1[Workers Logs]
    C --> C2[Traces]
    C --> C3[OTel export]
    C --> C4[Analytics Engine]
    C --> C5[Notifications]

    D --> D1[Health Checks]
    D --> D2[Cloudflare Logpush]
    D --> D3[Origin APM and logs]
    D --> D4[Audit Logs]

    E --> E1[cloudflared metrics]
    E --> E2[Health Checks]
    E --> E3[App logs]
    E --> E4[Grafana dashboard]
    E --> E5[Notifications]

Pattern A — Small Worker API

Use:

Best for:

Pattern B — Production SaaS on Workers

Use:

Best for:

Pattern C — Traditional app behind Cloudflare

Use:

Best for:

Pattern D — Tunnel-based private service exposure

Use:

Best for:


6. A practical rollout plan

Rollout sequence:

flowchart LR
    A[Phase 1<br/>Visibility basics] --> B[Phase 2<br/>Better incident response]
    B --> C[Phase 3<br/>Centralization]
    C --> D[Phase 4<br/>Custom service analytics]

    A --> A1[Enable Workers Logs]
    A --> A2[Enable source maps]
    A --> A3[Use native dashboards]

    B --> B1[Use wrangler tail during incidents]
    B --> B2[Enable traces]
    B --> B3[Add Notifications]

    C --> C1[Export via OTel]
    C --> C2[Set up Workers Logpush or Cloudflare Logpush]
    C --> C3[Monitor Logpush health]

    D --> D1[Add Analytics Engine metrics]
    D --> D2[Build tenant or billing dashboards]
    D --> D3[Refine filters and sampling]

Phase 1 — Visibility basics

Phase 2 — Better incident response

Phase 3 — Centralization

Phase 4 — Custom service analytics


7. Suggested KPIs and dashboards

For Worker-based APIs

Track:

For origin-backed apps

Track:

For Tunnel deployments

Track:

For governance/security

Track:


8. Best practices

1. Use native dashboards first, exports second

The dashboard is usually the fastest way to answer “what just broke?”
Then use exports/APIs for automation and longer retention.

2. Separate logs, traces, and metrics mentally

3. Turn on source maps for all serious Workers

This is one of the highest-value, lowest-effort improvements.

4. Use structured logs

JSON logs are much easier to search, filter, export, and route.

5. Do not rely on one signal

A healthy dashboard does not guarantee a healthy origin.
Use Cloudflare analytics + app logs + health checks together.

6. Alert on pipeline failure, not just app failure

If Logpush fails, you may silently lose the very data you need during incidents.

7. Keep sensitive data out of logs

Avoid tokens, cookies, raw auth headers, full PII, or customer secrets.

8. Sample intentionally

100% sampling is useful during early development but may be noisy or expensive later.

9. Add custom metrics for your actual business questions

Request counts and 5xx rates are not enough for:

10. Correlate config changes with incidents

Audit Logs often explain mysterious behavior changes.


9. Common mistakes


10. Troubleshooting guide

Symptom: users see errors, but Worker metrics look normal

Check:

Symptom: Worker errors are hard to debug

Check:

Symptom: incident happened but logs are missing

Check:

Symptom: service behind Tunnel is unstable

Check:

Symptom: config changed unexpectedly

Check:


A. Small Worker service

B. Critical Worker service

C. Origin-backed API

D. Tunnel-exposed internal app


If I were setting up Cloudflare observability for a modern production service today, I would do this:

For Workers apps

For origin services behind Cloudflare

For tunnel services

For governance


13. References

Official Cloudflare docs referenced in this guide: