Case Study: DeFi Infrastructure Hardening & Observability
Overview
A DeFi platform running on Ethereum and sidechains was experiencing intermittent outages, stuck transactions, and unreliable indexers. RPC providers, gas spikes, and fragile node setups made it hard to trust on-chain data.
Webomage worked with the team to harden their infrastructure and observability, so they could operate their protocol with more confidence.
Challenges
- Unreliable RPC and node providers – intermittent errors and rate limits under load.
- Nonce and gas issues – stuck, duplicated, or excessively expensive transactions.
- Fragile indexing stack – shared IPFS and lightly configured indexers made data incomplete or late.
- Limited observability – hard to see where failures occurred in on-chain/off-chain flows.
- High expectations from users and partners – outages directly affected TVL, trading, and user trust.
Approach
- End-to-end flow mapping
- Mapped how transactions flowed from the app through signers, nodes, and smart contracts, into indexers and analytics.
- Identified failure points around RPC, indexing, and third-party dependencies.
- Node & provider strategy
- Introduced a mix of dedicated nodes and high-quality providers for redundancy.
- Tuned connection settings, batching, and retry behaviour to handle spikes.
- Clarified which flows required low latency vs. eventual consistency.
- Indexing & data layer improvements
- Moved from shared/community IPFS to dedicated storage where appropriate.
- Improved The Graph subgraphs and related indexers for critical contracts.
- Ensured key metrics and balances were derived from trustworthy data paths.
- Observability & alerts
- Added monitoring for node health, RPC error rates, queue backlogs, and key contract events.
- Defined alerts around conditions that mattered to users (e.g. withdrawal/claim failures).
- Gave dashboards for protocol operators and support teams.
- Operational playbooks
- Created runbooks for common incident types (RPC provider issues, indexer lag, chain events).
- Documented safe procedures for rolling node upgrades and configuration changes.
Outcomes
- Higher reliability – fewer protocol incidents tied to infra or indexing problems.
- Faster incident response – clearer signals and runbooks when things did go wrong.
- Better partner and user confidence – infra issues no longer dominated the risk story.
- Stronger foundation – the team could keep iterating on product and protocol design without fear of infra collapse.
Relevant capabilities
- Ethereum and sidechain infrastructure operations.
- DeFi application backends and indexers (The Graph and related tooling).
- Monitoring, logging, and alerting for blockchain workloads.
- Secure integrations with price feeds and analytics APIs.
➡️ If your DeFi or Web3 product is limited by infrastructure issues, start a conversation.