Monitoring
Centralized observability with Prometheus metrics, Grafana dashboards, and configurable alerting — scoped to your environment.
Metrics Architecture
Metrics flow from your Canton nodes through a centralized collection and visualization pipeline:
graph LR
subgraph env["Environment Cluster"]
Validator[Validator<br/>:10013] -->|scrape| Agent[Prometheus<br/>Agent]
Participant[Participant<br/>:10013] -->|scrape| Agent
Kubelet[Kubelet /<br/>cAdvisor] -->|scrape| Agent
end
Agent -->|remote write| Central[Central<br/>Prometheus]
Central --> Grafana[Grafana<br/>Dashboards]
Central --> Alertmanager[Alertmanager]
Alertmanager -->|notify| Channels[Email / Slack /<br/>PagerDuty]
- Prometheus Agent runs on each environment cluster, scraping metrics every 30 seconds
- Metrics are shipped via remote write to the central Prometheus instance on the shared cluster
- Grafana provides visualization dashboards backed by the central metrics store
- Alertmanager receives alerts raised by rules evaluated in the central Prometheus and routes notifications to configured channels
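As a rough illustration of the agent side of this pipeline, the sketch below assembles a dictionary that mirrors the layout of a Prometheus configuration file (`global.scrape_interval`, `scrape_configs`, `remote_write`). The helper name `build_agent_config`, the job name, the target host names, and the remote write URL are all placeholders chosen for this example; the platform's actual configuration is not shown in this document and may differ.

```python
import json


def build_agent_config(targets, remote_write_url, scrape_interval="30s"):
    """Return a dict shaped like a Prometheus config file (illustrative only)."""
    return {
        "global": {"scrape_interval": scrape_interval},
        "scrape_configs": [
            {
                "job_name": "canton-nodes",  # hypothetical job name
                "static_configs": [{"targets": list(targets)}],
            }
        ],
        "remote_write": [{"url": remote_write_url}],
    }


config = build_agent_config(
    # Hypothetical service names; only port 10013 comes from this page.
    targets=["validator:10013", "participant:10013"],
    # Placeholder address for the central Prometheus remote write receiver.
    remote_write_url="https://central-prometheus.example/api/v1/write",
)

# Print the structure for inspection.
print(json.dumps(config, indent=2))
```

The 30-second default matches the scrape interval stated above; everything else would need to be replaced with values from your environment.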
Grafana Dashboards
Pre-configured dashboards provide visibility into your Canton nodes and underlying infrastructure:
Canton Network Dashboards
| Dashboard | What It Shows |
|---|---|
| Health | Overall node health status, uptime, and connectivity indicators |
| Participant Metrics | Ledger API throughput, latency, active contracts, transaction rates |
| Sequencer Client | Sequencer connection status, message processing, and lag |
| Synchronizer Fees (Validator) | Fee collection, reward distribution, and token balances |
| Validator Licenses | License status and validator network participation |
Infrastructure Dashboards
| Dashboard | What It Shows |
|---|---|
| Kubernetes Cluster Overview | Cluster-wide resource utilization, node status, and pod counts |
| Kubernetes Node Resources | Per-node CPU, memory, disk, and network utilization |
| Kubernetes Pod Resources | Per-pod resource consumption and limits |
| Kubernetes PVC | Persistent volume capacity, usage, and growth trends |
| GCP Infrastructure | Cloud-level metrics from Google Cloud Platform |
Metrics Collection
Canton Node Metrics
Both Validator and Participant nodes expose Prometheus metrics on port 10013:
- Transaction metrics — submission rates, confirmation times, rejection counts
- Ledger metrics — active contract count, ledger end offset, pruning status
- Connection metrics — sequencer connectivity, domain connections
- JVM metrics — heap usage, garbage collection, thread counts
- gRPC metrics — API call rates, latencies, error rates
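To give a feel for what a scrape of port 10013 returns, here is a minimal sketch that parses a few lines of the Prometheus text exposition format. The metric names in `SAMPLE` are made up for illustration and are not taken from Canton; the parser is deliberately simplified (it ignores timestamps and does not unescape label values).

```python
import re

# Hypothetical sample payload; real metric names on your nodes will differ.
SAMPLE = '''\
# HELP jvm_memory_used_bytes Example gauge (hypothetical metric name).
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap"} 1.5e+08
grpc_server_handled_total{code="OK"} 42
process_cpu_seconds_total 12.5
'''

# metric_name, optional {labels}, then the sample value.
LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair: name="value" (escaped characters are kept as written).
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue  # skip anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

In practice the central Prometheus and Grafana do this parsing for you; the sketch is only meant to show the shape of the data each node exposes.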
Kubernetes Metrics
Standard Kubernetes metrics are collected automatically:
- Container resources — CPU, memory, and network per container
- Pod lifecycle — restarts, scheduling latency, readiness
- Storage — persistent volume utilization and I/O metrics
- Cluster health — node conditions, API server responsiveness
Alerting
Alertmanager receives alerts fired by Prometheus alert rules, groups and deduplicates them, and delivers notifications through your preferred channels:
Notification Channels
- Email — direct notifications to your operations team
- Slack — channel-based alerts for team visibility
- PagerDuty — incident management integration for critical alerts
- Webhooks — custom integrations with your existing systems
Alert Examples
| Alert | Severity | Condition |
|---|---|---|
| Node unhealthy | Critical | Validator or participant pod not ready for > 5 minutes |
| High resource usage | Warning | CPU or memory exceeding 80% of limits |
| Database storage low | Warning | PostgreSQL PVC usage above 85% |
| Certificate expiring | Warning | TLS certificate expires within 14 days |
| Backup failure | Critical | Scheduled backup job did not complete |
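The conditions in the table can be read as simple threshold checks. The sketch below restates four of them as plain Python predicates so the boundaries are explicit; the function names and input shapes are assumptions made for this example, and the real rules are expressed as Prometheus alert rules rather than application code.

```python
from datetime import datetime, timedelta, timezone


def node_unhealthy(not_ready_since, now):
    """Critical: pod has been not ready for more than 5 minutes."""
    if not_ready_since is None:  # None means the pod is currently ready
        return False
    return now - not_ready_since > timedelta(minutes=5)


def high_resource_usage(used, limit):
    """Warning: CPU or memory usage exceeds 80% of the configured limit."""
    return limit > 0 and used / limit > 0.80


def database_storage_low(used_bytes, capacity_bytes):
    """Warning: PostgreSQL PVC usage is above 85% of capacity."""
    return capacity_bytes > 0 and used_bytes / capacity_bytes > 0.85


def certificate_expiring(not_after, now):
    """Warning: TLS certificate expires within 14 days."""
    return not_after - now <= timedelta(days=14)


# Example evaluation against a fixed reference time.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
print(node_unhealthy(now - timedelta(minutes=6), now))
print(high_resource_usage(used=850, limit=1000))
print(database_storage_low(used_bytes=80, capacity_bytes=100))
print(certificate_expiring(now + timedelta(days=10), now))
```

The backup failure alert is omitted because it depends on job completion status rather than a numeric threshold.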
Client Access
What You See
Each client has access to metrics scoped to their own namespace:
- Grafana dashboards filtered to your environment and namespace
- Prometheus metrics accessible via authenticated queries
- No visibility into other clients’ data or platform internals
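As an illustration of an authenticated, namespace-scoped query, the sketch below builds a request against the standard Prometheus HTTP API path `/api/v1/query`. The base URL, the namespace value, the use of a `namespace` label as the scoping mechanism, and bearer-token authentication are all assumptions for this example; check your onboarding details for the actual endpoint and credentials.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_query_request(base_url, promql, token):
    """Build (but do not send) an instant-query request with a bearer token."""
    query_string = urlencode({"query": promql})
    url = f"{base_url.rstrip('/')}/api/v1/query?{query_string}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


request = build_query_request(
    "https://prometheus.example",       # placeholder base URL
    'up{namespace="client-a"}',         # hypothetical namespace label value
    token="<access-token>",             # placeholder credential
)

print(request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the query; that step is left out here because it requires a reachable endpoint and valid credentials.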
Access Control
- Dashboard access is authenticated through Keycloak OIDC
- Metrics endpoints are protected by Istio authorization policies with IP allowlisting, so only authorized IP addresses can reach the metrics scrape endpoints
Retention and Storage
| Tier | Retention | Purpose |
|---|---|---|
| Prometheus Agent (per-cluster) | 30 minutes | Local buffer before remote write |
| Central Prometheus (shared cluster) | Configurable | Long-term metrics storage and query |
| Grafana | N/A | Visualization layer, queries Prometheus on demand |