State Changes at the Boundary: Lessons From Security Research on Babylon

Written by Qiuhao Li | April 1, 2026

This article presents our security research on Babylon, the largest BTC staking DeFi protocol. We explain how Babylon works, a recurring class of issues we encountered, how we found them, and our thoughts for future work.

Disclaimer: This research was conducted independently in 2025 as an internal initiative and should not be interpreted as a formal security audit. The findings were reported to Babylon Labs, where they were confirmed and passed triage. The goal of this time-boxed research is to share additional, complementary knowledge with the community.

Background

Babylon is a native Bitcoin staking protocol, designed to extend Bitcoin’s security to other networks via a shared-security architecture. In this model, BTC can be staked to help secure other blockchains while maintaining recoverability and supporting relatively fast unstaking, with configurable slashing for provable protocol violations.

At the time of writing, Babylon’s Total Value Locked (TVL) is approximately $3B, according to DefiLlama, making it the largest BTC staking protocol.

Babylon has made a sustained investment in security, including multiple third-party audits and public security contests. You can find the audit and review timeline in the Babylon Audit Reports.

How does Babylon work?

Babylon is a layered system coordinating Bitcoin-side staking transactions with a Cosmos SDK chain (Babylon Genesis), which acts as the control plane for stake activation, voting power, finality, checkpointing, and rewards. The image below provides an overview of the architecture:

Bitcoin-side staking

BTC staking uses Taproot outputs that commit to a script tree with paths for timelock staking, on-demand unstaking, and slashing. Transaction validity is governed by Babylon Genesis parameters and Bitcoin confirmation depth. Compliance hinges on covenant committee (offchain) threshold signatures for certain spends and on pre-signed slashing transactions that become executable if a finality provider misbehaves.

Babylon Genesis chain

Babylon Genesis is a Cosmos chain that implements protocol logic as custom Cosmos SDK modules. Some of the key modules are:

Epoching (x/epoching): partitions the chain into epochs and delays validator-set changes to epoch boundaries.
Checkpointing (x/checkpointing): collects validator signatures and aggregates them into a multi-signature used for Bitcoin checkpoints.
BTC staking (x/btcstaking): verifies/activates BTC delegations, tracks staking parameters, and maintains the active finality provider set and stake-derived voting power.
Finality (x/finality): verifies finality provider votes and finalizes blocks based on stake-derived voting power.

Offchain components

Babylon also depends on offchain components to move data between Bitcoin and Babylon and to operate the staking workflow. The Vigilante suite watches Bitcoin and Babylon and relays key staking/checkpoint information so the system can keep its view of staking events up to date. Other supporting components include staking daemons/services, finality provider software, key management, and the covenant emulator.

A typical staking workflow

Pick a finality provider and prepare a Babylon account that will register the stake and receive rewards.
Fetch the current BTC staking parameters from Babylon Genesis, like staking time/value bounds and confirmation depth requirements. These parameters define what is considered a valid stake.
Construct the required Bitcoin transactions and data according to the staking script format.
Submit a Babylon Genesis registration transaction, without a Bitcoin inclusion proof at this point. The delegation is tracked on Babylon as pending while signatures are collected.
The covenant side submits its verification signatures on Babylon. After enough signatures are recorded, the delegation becomes verified.
Broadcast the Bitcoin staking transaction and wait for it to be sufficiently confirmed.
An offchain watcher submits the Bitcoin inclusion proof back to Babylon Genesis. The delegation becomes active and begins contributing voting power/earning rewards.
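The lifecycle in the steps above can be written down as a small transition table. This is only a sketch of the flow as described in this article: the state and event names (Pending, Verified, Active, CovenantQuorumReached, InclusionProofSubmitted) and the next helper are our own illustrative choices, not Babylon's actual types.

```go
package main

import "fmt"

// State and Event are hypothetical names used only for this illustration.
type State string
type Event string

const (
	Pending  State = "PENDING"
	Verified State = "VERIFIED"
	Active   State = "ACTIVE"

	CovenantQuorumReached   Event = "covenant_quorum_reached"
	InclusionProofSubmitted Event = "inclusion_proof_submitted"
)

// transitions lists the only moves allowed in the workflow described above.
var transitions = map[State]map[Event]State{
	Pending:  {CovenantQuorumReached: Verified},
	Verified: {InclusionProofSubmitted: Active},
}

// next returns the state reached from s on event e, or an error if the
// transition is not part of the table.
func next(s State, e Event) (State, error) {
	if to, ok := transitions[s][e]; ok {
		return to, nil
	}
	return s, fmt.Errorf("invalid transition from %s on %s", s, e)
}

func main() {
	s := Pending
	for _, e := range []Event{CovenantQuorumReached, InclusionProofSubmitted} {
		var err error
		if s, err = next(s, e); err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println("now", s)
	}
}
```

Writing the table out makes it easy to ask which transitions are missing or unexpected, for example what should happen if an inclusion proof arrives while the delegation is still pending.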

Research Plan

We structured the research as a layered manual review focused on untrusted data flows:

Layers: execution layer (custom modules), partial-consensus layer (ABCI++), and offchain components.
Untrusted inputs and boundaries: user-supplied staking data and transactions, validator-supplied votes/proposals, and cross-chain messages.

We built context by:

Reading the codebase with a focus on the key modules.
Reading prior audit reports to understand previously known weaknesses.
Reading GitHub history: PRs starting with “fix:”, issues labeled “security”, and security advisories to understand their root causes and how they were patched.
For known bug patterns, we checked whether they reappear elsewhere or could be adapted into new edge cases.

We occasionally used AI assistance for orientation and pattern search, but most issues discussed here were manually validated with a PoC. Since this was time-limited research, we did not use fuzzing or other large-scale testing.

Findings

Improper Delegation Status Handling Allows “Expired → Active” Event Ordering, Enabling Persistent Voting Power

Babylon’s x/btcstaking and x/finality modules coordinate voting power via “state update events” (e.g., ACTIVE when a delegation becomes valid, and EXPIRED when it stops contributing power). The issue arises from BTCDelegation.GetStatus returning PENDING solely because the covenant quorum is missing, even when the Bitcoin height already implies the delegation is about to expire.

This allows covenant signature submission/validation to proceed and eventually append an ACTIVE event after an earlier EXPIRED event for the same stake. Since the finality power update flow won’t see another later expiry event, a finality provider can retain voting power even after the BTC stake is expired/withdrawn, weakening the intended finality and consensus safety assumptions. The image below shows the process:
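Independently of the diagram, the ordering problem can be illustrated with a simplified model. The sketch below is not Babylon's GetStatus implementation: the Delegation struct, the margin parameter, and both status functions are assumptions made for illustration. It only shows why checking the covenant quorum before the expiry condition can report PENDING for a delegation that is already about to expire, and how checking expiry first avoids that (one possible ordering, not necessarily the fix that was shipped).

```go
package main

import "fmt"

// Status is a simplified delegation status for illustration only.
type Status string

const (
	Pending Status = "PENDING"
	Active  Status = "ACTIVE"
	Expired Status = "EXPIRED"
)

// Delegation is a hypothetical, stripped-down stand-in for a BTC delegation.
type Delegation struct {
	EndHeight    uint32 // BTC height at which the staking timelock ends
	CovenantSigs int    // number of covenant signatures recorded so far
}

// nearExpiry reports whether the BTC tip is within margin blocks of EndHeight.
func nearExpiry(d Delegation, btcTip, margin uint32) bool {
	return btcTip+margin >= d.EndHeight
}

// statusQuorumFirst models the problematic ordering: a missing quorum
// yields PENDING even when the delegation is about to expire.
func statusQuorumFirst(d Delegation, btcTip, margin uint32, quorum int) Status {
	if d.CovenantSigs < quorum {
		return Pending
	}
	if nearExpiry(d, btcTip, margin) {
		return Expired
	}
	return Active
}

// statusExpiryFirst checks expiry before the quorum, so a late covenant
// signature can no longer move an expiring delegation forward.
func statusExpiryFirst(d Delegation, btcTip, margin uint32, quorum int) Status {
	if nearExpiry(d, btcTip, margin) {
		return Expired
	}
	if d.CovenantSigs < quorum {
		return Pending
	}
	return Active
}

func main() {
	d := Delegation{EndHeight: 1000, CovenantSigs: 0}
	fmt.Println(statusQuorumFirst(d, 990, 20, 3)) // PENDING
	fmt.Println(statusExpiryFirst(d, 990, 20, 3)) // EXPIRED
}
```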

Finality Providers Can Bypass Jailing by Toggling Active Status at Sliding-Window Boundaries

Babylon’s finality module requires active finality providers (FPs) to vote in a “finality round” so blocks can be finalized; the liveness rule is meant to jail FPs that miss too many votes in a sliding window (~28 hours). In HandleFinalityProviderLiveness, the jailing condition is evaluated relative to StartHeight, and StartHeight is refreshed whenever an FP becomes active. Thus, by briefly dropping out of the active set and re-entering it near the window boundary, an FP can repeatedly reset the window and avoid being jailed despite chronic non-participation, weakening the intended liveness enforcement.

Please refer to the PoC for more details.
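To make the window reset concrete, here is a toy simulation. It is a sketch under our own assumptions: the signingInfo struct, the shouldJail condition, and the window and threshold numbers are illustrative and do not reproduce the real HandleFinalityProviderLiveness logic. The only property it borrows from the description above is that jailing is evaluated relative to StartHeight and that StartHeight is refreshed on re-activation.

```go
package main

import "fmt"

// signingInfo is a hypothetical, minimal liveness record for one FP.
type signingInfo struct {
	StartHeight int64 // height from which the sliding window is measured
	Missed      int64 // votes missed so far
}

// shouldJail is a simplified condition: jail only once a full window has
// elapsed since StartHeight and too many votes were missed.
func shouldJail(info signingInfo, height, window, maxMissed int64) bool {
	return height-info.StartHeight >= window && info.Missed > maxMissed
}

// simulate runs an FP that never votes. If toggleEvery > 0, the FP leaves
// and re-enters the active set every toggleEvery blocks, which refreshes
// StartHeight. It returns the height at which the FP is jailed, or -1.
func simulate(blocks, window, maxMissed, toggleEvery int64) int64 {
	info := signingInfo{StartHeight: 1}
	for h := int64(1); h <= blocks; h++ {
		if toggleEvery > 0 && h%toggleEvery == 0 {
			info.StartHeight = h // re-activation restarts the window
			continue
		}
		info.Missed++
		if shouldJail(info, h, window, maxMissed) {
			return h
		}
	}
	return -1
}

func main() {
	fmt.Println(simulate(1000, 100, 50, 0))  // 101: jailed after one window
	fmt.Println(simulate(1000, 100, 50, 90)) // -1: never jailed
}
```

In this model the toggling FP misses every vote it is asked for, yet the elapsed-window condition never becomes true because the toggle period (90) is shorter than the window (100).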

Co-Staking Misaccounting When Delegating/Undelegating a Slashed Validator in the Middle of the Epoch

Babylon’s co-staking logic tracks a per-delegator aggregate (ActiveBaby) that affects rewards, and it uses staking hooks to keep this tracker updated as delegations change. Additionally, these hooks and the removal of slashed validators are triggered at the end of the epoch. When a validator is marked slashed, the AfterDelegationModified hook stops updating ActiveBaby and records only delta-shares for later use; however, the BeforeDelegationRemoved hook still subtracts a token amount derived from total shares using pre-slash ratios.

This inconsistency can subtract stake that was never credited to ActiveBaby in the epoch. The result is lower rewards and, when the tracker becomes invalid, a failing CostakerRewardsTracker.Validate check, which causes undelegation-related transactions to fail and freezes funds.

Please refer to the PoC for more details.

Unchecked Type Assertion During Block Proposal Validation Can Trigger Panic in Validators

At the start of each epoch, Babylon expects a special “injected checkpoint transaction” to appear as the first transaction in the block proposal, and validators enforce this during proposal validation. In the epoch-boundary ProcessProposal path, the decoder in ExtractInjectedCheckpoint performs an unchecked Go type assertion when casting the first message to the injected checkpoint type. An adversarial proposer can therefore place a valid, decodable one-message transaction of the wrong type in txs[0], triggering a runtime panic in validators while they validate epoch-boundary blocks.

Please refer to the PoC and fix for more details.
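The underlying Go behaviour is easy to demonstrate in isolation. In the sketch below, MsgInjectedCheckpoint and MsgSend are placeholder types we made up, and the two extract functions are not Babylon's code; they only contrast the single-value type assertion, which panics on a mismatch, with the two-value "comma ok" form, which lets the caller reject the proposal with an error instead.

```go
package main

import "fmt"

// Placeholder message types for illustration only.
type MsgInjectedCheckpoint struct{ Epoch uint64 }
type MsgSend struct{}

// extractUnchecked uses the single-value assertion and panics if the first
// message is not a *MsgInjectedCheckpoint.
func extractUnchecked(msgs []interface{}) *MsgInjectedCheckpoint {
	return msgs[0].(*MsgInjectedCheckpoint)
}

// extractChecked validates the length and uses the comma-ok assertion, so a
// wrong message type becomes an ordinary error.
func extractChecked(msgs []interface{}) (*MsgInjectedCheckpoint, error) {
	if len(msgs) != 1 {
		return nil, fmt.Errorf("expected exactly 1 message, got %d", len(msgs))
	}
	ckpt, ok := msgs[0].(*MsgInjectedCheckpoint)
	if !ok {
		return nil, fmt.Errorf("unexpected message type %T", msgs[0])
	}
	return ckpt, nil
}

// panics reports whether calling f results in a panic.
func panics(f func()) (didPanic bool) {
	defer func() {
		if r := recover(); r != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	wrong := []interface{}{&MsgSend{}}
	fmt.Println(panics(func() { extractUnchecked(wrong) })) // true
	_, err := extractChecked(wrong)
	fmt.Println(err) // unexpected message type *main.MsgSend
}
```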

Why These Issues Are Easy to Overlook

Looking at these findings, two recurring patterns emerged:

Timing boundaries: issues surface at specific time boundaries (first-of-epoch blocks, liveness sliding-window thresholds, stake expiration cutoffs).
Concentrated state changes: issues require the system to observe state transitions in a time window with a specific order.

For example, the co-staking misaccounting issue only surfaces at the epoch boundary, and it depends on a specific operation sequence in the middle of the epoch (delegating to or undelegating from the slashed validator). Each individual code path looks reasonable in isolation, but the invariant breaks when they execute together at the boundary.

Boundary conditions and tightly-coupled state transitions are easy to overlook during implementation and review. They are often best surfaced by explicitly enumerating the state machine and testing adversarial sequences around those boundaries.
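One lightweight way to test adversarial sequences is to enumerate every ordering of a small event set and check an invariant after each. The sketch below applies this to a toy version of the ACTIVE/EXPIRED events from the first finding. The event type, both apply functions, and the invariant ("once an expiry event has been emitted, the stake contributes no power") are our own simplifications, not Babylon's event processing.

```go
package main

import "fmt"

type event string

const (
	active  event = "ACTIVE"
	expired event = "EXPIRED"
)

// applyNaive processes events strictly in arrival order.
func applyNaive(events []event, stake int64) int64 {
	var power int64
	for _, e := range events {
		switch e {
		case active:
			power = stake
		case expired:
			power = 0
		}
	}
	return power
}

// applyGuarded ignores ACTIVE once EXPIRED has been seen for the stake.
func applyGuarded(events []event, stake int64) int64 {
	var power int64
	seenExpired := false
	for _, e := range events {
		switch e {
		case active:
			if !seenExpired {
				power = stake
			}
		case expired:
			seenExpired = true
			power = 0
		}
	}
	return power
}

// permutations returns every ordering of the input events.
func permutations(in []event) [][]event {
	if len(in) <= 1 {
		return [][]event{append([]event(nil), in...)}
	}
	var out [][]event
	for i := range in {
		rest := append(append([]event(nil), in[:i]...), in[i+1:]...)
		for _, p := range permutations(rest) {
			out = append(out, append([]event{in[i]}, p...))
		}
	}
	return out
}

// violations counts orderings that leave non-zero power after an expiry.
func violations(apply func([]event, int64) int64) int {
	n := 0
	for _, order := range permutations([]event{active, expired}) {
		if apply(order, 100) != 0 {
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(violations(applyNaive))   // 1: EXPIRED then ACTIVE keeps power
	fmt.Println(violations(applyGuarded)) // 0
}
```

With larger event sets the number of orderings grows factorially, so in practice this kind of check is most useful when restricted to the handful of events that can cluster around a single boundary.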

How We Found Them

In addition to manual review, one approach we found useful is to draw on prior bugs and fixes, then search for similar patterns or mutate them into new adversarial scenarios.

Example 1: From a Fix Pull Request to a New Bypass

For the FP jailing bypass issue mentioned above, we started by reading “fix: Panic can be triggered in handling liveness” (PR #584), which fixed a liveness issue triggered by an FP becoming active → non-active → active again.

We then reviewed the finality liveness code for other active-status toggling edge cases, especially around the liveness window boundary, which led to the FP jailing bypass issue.

Example 2: From a Security Advisory to a Similar Panic

For the unchecked type assertion issue mentioned above, we first read the GitHub advisory “Nil BlockHash in BLS vote extensions triggers panics”, then used AI to help verify other possible attack methods and searched for similar chain-halting failures. This led us to the unchecked type assertion panic in the injected-checkpoint decoding path. You can read the conversation with the AI agent here for more details.

Example 3: From an Audit Finding to Related New Bugs

Besides the on-chain issues mentioned above, we also found offchain issues while reading prior audit reports. One report documented a pagination bug (“2. Retrieving staking transactions using incorrect page increments results in skipped transactions”) in Vigilante’s fetchStakingTxsByEvent function, where the page index was incremented by batchSize instead of 1, causing pages to be wrongly skipped.

After seeing this, we checked the same function and related pagination logic, and found two new issues that can cause staking delegations to be missed, resulting in a denial of service (DoS):

The RPC server silently clamps per_page to 100 (so requesting 500 returns only 100), and the fetchStakingTxsByEvent termination logic treats “returned < requested” as “done”, causing an early exit.
When extracting hashes in StakingTxHashesByEvent, a single transaction can emit multiple staking events, so “number of hashes” can diverge from “number of returned txs”; this can lead to incorrect page advancement and trigger page validation errors.

See the PoC and fix for more details.
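The first of these two issues can be reproduced with a toy paginated endpoint. Everything in the sketch below is an assumption made for illustration: search emulates a server that clamps the page size to 100 and reports a total count, fetchFlawed uses the “returned < requested” termination rule described above, and fetchByTotal shows one alternative that stops on an empty page or once the reported total is reached. It is not Vigilante's code or its actual fix.

```go
package main

import "fmt"

// serverMaxPerPage models the silent server-side clamp.
const serverMaxPerPage = 100

// search emulates a paginated endpoint (pages are 1-indexed). It clamps
// perPage and returns the page contents plus the total number of items.
func search(items []int, page, perPage int) ([]int, int) {
	if perPage > serverMaxPerPage {
		perPage = serverMaxPerPage
	}
	start := (page - 1) * perPage
	if start >= len(items) {
		return nil, len(items)
	}
	end := start + perPage
	if end > len(items) {
		end = len(items)
	}
	return items[start:end], len(items)
}

// fetchFlawed stops as soon as a page is smaller than the requested batch,
// which happens on the very first page when batch exceeds the clamp.
func fetchFlawed(items []int, batch int) []int {
	var out []int
	for page := 1; ; page++ {
		res, _ := search(items, page, batch)
		out = append(out, res...)
		if len(res) < batch {
			return out
		}
	}
}

// fetchByTotal stops on an empty page or once the reported total is reached.
func fetchByTotal(items []int, batch int) []int {
	var out []int
	for page := 1; ; page++ {
		res, total := search(items, page, batch)
		out = append(out, res...)
		if len(res) == 0 || len(out) >= total {
			return out
		}
	}
}

func main() {
	items := make([]int, 250)
	fmt.Println(len(fetchFlawed(items, 500)))  // 100: early exit, 150 items missed
	fmt.Println(len(fetchByTotal(items, 500))) // 250
}
```

Note that the flawed loop behaves correctly whenever the requested batch is at or below the clamp, which is part of why this kind of bug survives ordinary testing.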

Conclusion & Future Work

This article presents some security findings that share a recurring pattern: state transitions that break invariants at timing boundaries. For future audits of similar systems, it's important to explicitly model the relevant state machines (list states and transitions), and validate boundary behavior under adversarial ordering.

Our findings cover the Babylon Genesis chain and offchain components, but not the Bitcoin-side staking scripts, which are critical for the staking, unstaking, and slashing mechanism. This area might present opportunities to uncover additional issues.

Finally, out-of-the-box AI agents can be helpful for navigation and hypothesis verification, but sometimes they struggle with multi-step, cross-module attack workflows, and can miss key checks in dependencies or parameters in live systems. We may share some research on how to improve AI-assisted bug hunting in the future.

Acknowledgments

We would like to thank Babylon Labs for their professionalism throughout the coordinated disclosure process. They triaged the reports quickly, communicated clearly, and shipped fixes promptly. It’s encouraging to see a project of this scale take security seriously, not just through audits, but in how they engage with external researchers.

During the write-up, Jonas Merhej, Loukas Papachristoforou, Jainil Vora, and Ionut-Viorel Gingu reviewed the draft and provided insightful feedback.

FAQs

What is Babylon, and why did OpenZeppelin research it?

Babylon is the largest Bitcoin staking protocol in terms of Total Value Locked (TVL). OpenZeppelin conducted this research independently as an internal initiative to contribute complementary security knowledge to the ecosystem.

What kinds of vulnerabilities did OpenZeppelin’s research uncover?

The research identified four issues across Babylon’s on-chain components: a delegation status handling flaw that could allow expired stakes to retain persistent voting power; a jailing bypass that lets finality providers evade liveness enforcement; a co-staking accounting inconsistency that could freeze funds at epoch boundaries; and an unchecked type assertion capable of triggering validator panics during block proposal validation. The research also briefly covers two offchain issues in Vigilante’s pagination logic for fetching staking transactions.

What is the recurring pattern behind these findings?

All four on-chain issues share a common root cause: state transitions that break system invariants at timing boundaries. These bugs often surface when multiple code paths execute together in a specific order, particularly around epoch boundaries, sliding-window thresholds, or expiration cutoffs.

How did OpenZeppelin’s researchers find these vulnerabilities?

The team used a layered manual review approach focused on untrusted data flows, complemented by a technique of adapting known bug patterns to identify related new cases. AI-assisted tooling helped accelerate code navigation and hypothesis generation/validation.

What are the limitations of this research, and what comes next?

This was time-boxed research and did not include fuzzing or large-scale automated testing. It also did not cover Bitcoin-side staking scripts, which are considered a high-priority area for future research. Additionally, out-of-the-box AI agents can struggle with complex workflows and miss external checks. We may share how to improve AI-assisted bug hunting in the future.
