Monitoring

Centralized observability with Prometheus metrics, Grafana dashboards, and configurable alerting — scoped to your environment.

Table of contents

Metrics Architecture
Grafana Dashboards
    1. Canton Network Dashboards
    2. Infrastructure Dashboards
Metrics Collection
    1. Canton Node Metrics
    2. Kubernetes Metrics
Alerting
    1. Notification Channels
    2. Alert Examples
Client Access
    1. What You See
    2. Access Control
Retention and Storage

Metrics Architecture

Metrics flow from your Canton nodes through a centralized collection and visualization pipeline:

graph LR
    subgraph env["Environment Cluster"]
        Validator[Validator<br/>:10013] -->|scrape| Agent[Prometheus<br/>Agent]
        Participant[Participant<br/>:10013] -->|scrape| Agent
        Kubelet[Kubelet /<br/>cAdvisor] -->|scrape| Agent
    end

    Agent -->|remote write| Central[Central<br/>Prometheus]
    Central --> Grafana[Grafana<br/>Dashboards]
    Central --> Alertmanager[Alertmanager]
    Alertmanager -->|notify| Channels[Email / Slack /<br/>PagerDuty]

Prometheus Agent runs on each environment cluster, scraping metrics every 30 seconds
Metrics are shipped via remote write to the central Prometheus instance on the shared cluster
Grafana provides visualization dashboards backed by the central metrics store
Alert rules are evaluated against the central metrics store, and Alertmanager routes the resulting alerts to configured channels

Grafana Dashboards

Pre-configured dashboards provide visibility into your Canton nodes and underlying infrastructure:

Canton Network Dashboards

Dashboard	What It Shows
Health	Overall node health status, uptime, and connectivity indicators
Participant Metrics	Ledger API throughput, latency, active contracts, transaction rates
Sequencer Client	Sequencer connection status, message processing, and lag
Synchronizer Fees (Validator)	Fee collection, reward distribution, and token balances
Validator Licenses	License status and validator network participation

Infrastructure Dashboards

Dashboard	What It Shows
Kubernetes Cluster Overview	Cluster-wide resource utilization, node status, and pod counts
Kubernetes Node Resources	Per-node CPU, memory, disk, and network utilization
Kubernetes Pod Resources	Per-pod resource consumption and limits
Kubernetes PVC	Persistent volume capacity, usage, and growth trends
GCP Infrastructure	Cloud-level metrics from Google Cloud Platform

Metrics Collection

Canton Node Metrics

Both Validator and Participant nodes expose Prometheus metrics on port 10013:

Transaction metrics — submission rates, confirmation times, rejection counts
Ledger metrics — active contract count, ledger end offset, pruning status
Connection metrics — sequencer connectivity, domain connections
JVM metrics — heap usage, garbage collection, thread counts
gRPC metrics — API call rates, latencies, error rates
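
To make the categories above concrete, here is a minimal sketch of reading a payload in the Prometheus text exposition format, which is what a scrape of port 10013 returns. The sample payload and every metric name in it are illustrative assumptions, not the names Canton actually exports, and the parser handles only the simple case of lines without a trailing timestamp or spaces inside label values.

```python
from collections import defaultdict

# Illustrative payload in the Prometheus text exposition format.
# The metric names below are assumptions, not actual Canton metric names.
SAMPLE = """\
# HELP example_jvm_heap_used_bytes Heap in use.
# TYPE example_jvm_heap_used_bytes gauge
example_jvm_heap_used_bytes 1.5e+08
example_grpc_calls_total{method="Submit",code="OK"} 42
example_grpc_calls_total{method="Submit",code="ABORTED"} 3
"""


def parse_samples(text):
    """Return (name, labels_text, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Simplification: assumes no trailing timestamp on the sample line.
        series, value = line.rsplit(" ", 1)
        name, _, labels = series.partition("{")
        samples.append((name, labels.rstrip("}"), float(value)))
    return samples


def totals_by_name(samples):
    """Sum sample values across all label combinations of each metric."""
    totals = defaultdict(float)
    for name, _labels, value in samples:
        totals[name] += value
    return dict(totals)


samples = parse_samples(SAMPLE)
totals = totals_by_name(samples)
```

In practice the Prometheus Agent does this parsing for you; a sketch like this is mainly useful for spot-checking an endpoint by hand.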

Kubernetes Metrics

Standard Kubernetes metrics are collected automatically:

Container resources — CPU, memory, and network per container
Pod lifecycle — restarts, scheduling latency, readiness
Storage — persistent volume utilization and I/O metrics
Cluster health — node conditions, API server responsiveness

Alerting

Alertmanager receives the alerts that fire from the alert rules and delivers notifications through your preferred channels:

Notification Channels

Email — direct notifications to your operations team
Slack — channel-based alerts for team visibility
PagerDuty — incident management integration for critical alerts
Webhooks — custom integrations with your existing systems
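
For the webhook channel, a receiver on your side gets a JSON body from Alertmanager containing a list of alerts, each with a status, labels, and annotations. The sketch below shows one way such a body could be turned into readable lines. The payload is trimmed to the fields used here, and the alert names and label values are made up for illustration; the labels your environment actually sends may differ.

```python
import json

# Trimmed example of the JSON body Alertmanager POSTs to a webhook receiver.
# Alert names and label values are made up for illustration.
PAYLOAD = json.dumps({
    "version": "4",
    "status": "firing",
    "alerts": [
        {"status": "firing",
         "labels": {"alertname": "NodeUnhealthy", "severity": "critical"},
         "annotations": {"summary": "participant pod not ready"}},
        {"status": "resolved",
         "labels": {"alertname": "HighResourceUsage", "severity": "warning"},
         "annotations": {"summary": "CPU back under limit"}},
    ],
})


def summarize(body):
    """Turn a webhook body into one human-readable line per alert."""
    payload = json.loads(body)
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        summary = alert.get("annotations", {}).get("summary", "")
        lines.append("[{}] {} ({}): {}".format(
            alert.get("status", "unknown").upper(),
            labels.get("alertname", "unnamed"),
            labels.get("severity", "none"),
            summary,
        ))
    return lines


lines = summarize(PAYLOAD)
```

Using `.get()` with defaults keeps the handler tolerant of alerts that lack a `severity` label or a `summary` annotation.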

Alert Examples

Alert	Severity	Condition
Node unhealthy	Critical	Validator or participant pod not ready for > 5 minutes
High resource usage	Warning	CPU or memory usage above 80% of configured limits
Database storage low	Warning	PostgreSQL PVC usage above 85%
Certificate expiring	Warning	TLS certificate expires within 14 days
Backup failure	Critical	Scheduled backup job did not complete
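
The conditions in the table can be read as simple threshold checks. The sketch below restates them in Python purely to illustrate the logic. The real alerts are presumably defined as Prometheus alert rules rather than application code, and the `Snapshot` type and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical point-in-time readings for one environment."""
    not_ready_minutes: float     # how long the pod has been not ready
    cpu_ratio: float             # CPU usage / CPU limit
    memory_ratio: float          # memory usage / memory limit
    pvc_usage_ratio: float       # PostgreSQL PVC used / capacity
    cert_days_left: float        # days until the TLS certificate expires
    last_backup_succeeded: bool


def evaluate(s):
    """Return (alert, severity) pairs for every table condition that holds."""
    fired = []
    if s.not_ready_minutes > 5:
        fired.append(("Node unhealthy", "Critical"))
    if s.cpu_ratio > 0.80 or s.memory_ratio > 0.80:
        fired.append(("High resource usage", "Warning"))
    if s.pvc_usage_ratio > 0.85:
        fired.append(("Database storage low", "Warning"))
    if s.cert_days_left <= 14:
        fired.append(("Certificate expiring", "Warning"))
    if not s.last_backup_succeeded:
        fired.append(("Backup failure", "Critical"))
    return fired


# A pod not ready for 7 minutes with memory at 91% of its limit.
fired = evaluate(Snapshot(7, 0.55, 0.91, 0.60, 30, True))
```

With these readings, two alerts fire: "Node unhealthy" (Critical) and "High resource usage" (Warning).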

Client Access

What You See

Each client has access to metrics scoped to their own namespace:

Grafana dashboards filtered to your environment and namespace
Prometheus metrics accessible via authenticated queries
No visibility into other clients’ data or platform internals
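
As an illustration of an authenticated, namespace-scoped query, the sketch below builds a request against the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`). Several details are assumptions: the base URL is a placeholder, bearer-token authentication is only one possible scheme, and `namespace` is assumed to be the label that carries the scoping. The request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query(base_url, token, namespace, metric="up"):
    """Build (but do not send) an instant-query request for one namespace."""
    # Assumption: series carry a `namespace` label identifying the client.
    promql = '{}{{namespace="{}"}}'.format(metric, namespace)
    url = "{}/api/v1/query?{}".format(
        base_url.rstrip("/"), urlencode({"query": promql})
    )
    # Assumption: the endpoint accepts a bearer token.
    return Request(url, headers={"Authorization": "Bearer " + token})


# Placeholder host and token; substitute the values for your environment.
req = build_query("https://prometheus.example.invalid", "TOKEN", "client-a")
```

Passing `req` to `urllib.request.urlopen` would execute the query. Scoping is expected to be enforced on the server side as well, so the label filter here narrows results rather than grants access.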

Access Control

Dashboard access is authenticated through Keycloak OIDC
Metrics scrape endpoints are protected by Istio authorization policies with IP whitelisting, so only authorized IP addresses can reach them
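
The effect of the IP restriction can be pictured as a membership test of the caller's address against a list of allowed ranges. The sketch below is only a model of that check, not the Istio policy itself; the CIDR ranges are documentation-reserved placeholder addresses.

```python
import ipaddress

# Hypothetical allowlist; the real entries live in the Istio authorization policy.
ALLOWED = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.7/32")
]


def is_allowed(source_ip):
    """Return True if source_ip falls inside any allowed range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED)
```

For example, `is_allowed("203.0.113.42")` is true because the address sits inside the `/24` range, while `is_allowed("198.51.100.8")` is false because the `/32` entry matches a single address only.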

Retention and Storage

Tier	Retention	Purpose
Prometheus Agent (per-cluster)	30 minutes	Local buffer before remote write
Central Prometheus (shared cluster)	Configurable	Long-term metrics storage and query
Grafana	N/A	Visualization layer, queries Prometheus on demand
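
A quick calculation shows what the per-cluster buffer holds. With the 30-second scrape interval described under Metrics Architecture and 30 minutes of local retention, each series keeps about 60 samples on the agent. The series count used below is a hypothetical figure for illustration.

```python
SCRAPE_INTERVAL_S = 30      # scrape interval from the Metrics Architecture section
AGENT_RETENTION_MIN = 30    # per-cluster retention from the table above


def buffered_samples(active_series):
    """Approximate number of samples held locally for a given series count."""
    per_series = AGENT_RETENTION_MIN * 60 // SCRAPE_INTERVAL_S
    return per_series * active_series


per_series = buffered_samples(1)      # samples kept per series
total = buffered_samples(50_000)      # 50,000 series is a hypothetical count
```

Here `per_series` is 60 and `total` is 3,000,000. One inference from the table, not a documented guarantee: if remote write is interrupted for longer than the 30-minute window, the central store could end up with a gap for that period.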