Post Mortem: Lido on Ethereum Launchnodes Slashing Incident
Update (November 28, 2023): as of November 16, 2023, the 20 validators in question are withdrawable and have therefore stopped accumulating penalties. The full update is available at https://research.lido.fi/t/slashing-incident-involving-launchnodes-validators-oct-11-2023/5631/4.
Incident Summary and Root Cause
At 15:55 UTC on October 11, 2023, Lido DAO contributors alerted Launchnodes, a Node Operator in the Lido protocol, to an ongoing slashing event that ultimately affected 20 of the validators Launchnodes operates for the protocol. A full list of the affected validators is provided in APPENDIX B below.
Within 10 minutes, the affected clusters were brought offline to mitigate the risk of further slashing, and the Launchnodes team began investigating the root cause. The slashing came down to a poorly executed fallback procedure during datacenter connectivity issues. In an attempt to restore validator connectivity, multiple validator client instances (the initial instance and a manually activated fallback instance) were pointed at a single Web3signer instance. Slashing protection was not enabled at the Web3signer level, and the initial instance had not been blocked from reaching the signer (e.g. via firewall rules). As a result, double votes were signed for the validators loaded on both instances, which led to attester slashings of 20 validators.
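To illustrate why two validator clients sharing one unprotected signer is dangerous, here is a simplified Python model of the consensus-layer rule for slashable attestations, based on the `is_slashable_attestation_data` predicate in the phase0 Ethereum consensus specification. The slots, epochs, and roots are hypothetical values, not data from the incident. The assumption shown is that the two instances' beacon nodes see slightly different chain heads, so they request signatures on different attestation data for the same target epoch, which is a double vote.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    epoch: int
    root: str  # block root, simplified to a string for illustration


@dataclass(frozen=True)
class AttestationData:
    slot: int
    beacon_block_root: str
    source: Checkpoint
    target: Checkpoint


def is_slashable_attestation_data(data_1: AttestationData, data_2: AttestationData) -> bool:
    """Simplified from the phase0 consensus-spec predicate of the same name."""
    # Double vote: different data for the same target epoch.
    double_vote = data_1 != data_2 and data_1.target.epoch == data_2.target.epoch
    # Surround vote: data_1's source/target span strictly surrounds data_2's.
    surround_vote = (
        data_1.source.epoch < data_2.source.epoch
        and data_2.target.epoch < data_1.target.epoch
    )
    return double_vote or surround_vote


# Hypothetical votes for one validator in epoch 100 (slot 3200 = 100 * 32).
vote_from_primary = AttestationData(
    slot=3200,
    beacon_block_root="0xaaa",
    source=Checkpoint(epoch=99, root="0x111"),
    target=Checkpoint(epoch=100, root="0x222"),
)
vote_from_fallback = AttestationData(
    slot=3200,
    beacon_block_root="0xbbb",  # a different head seen by the fallback's beacon node
    source=Checkpoint(epoch=99, root="0x111"),
    target=Checkpoint(epoch=100, root="0x222"),
)

print(is_slashable_attestation_data(vote_from_primary, vote_from_fallback))  # True
```

Signing identical data twice is not slashable (`is_slashable_attestation_data(vote_from_primary, vote_from_primary)` is `False`); the risk comes from two instances producing different data for the same epoch, which a signer with no memory of past signatures cannot catch.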
The fallback validator client was brought online and connected to Web3signer after an attempt had been made to deactivate the nodes attached to the original validator client instance by moving the associated EL node's data container. That attempt did not stop the original validator client, which remained able to reach Web3signer.
A full post mortem from Launchnodes’ perspective is available in APPENDIX A below. A full timeline of the incident can be found in section “4. Timeline” below.
The impact on stakers (stETH holders), in terms of penalties and missed rewards, is analysed below:
Following the incident, Launchnodes shut down multiple clusters totalling 2582 validators (including the 20 slashed) to ensure no further slashing could take place. To prevent the slashing from spreading, Launchnodes destroyed the original node clients and data (EL+CL nodes and validator clients) as well as the original Web3signer instance. Over the following hours, Launchnodes successfully reactivated the remaining 2562 validators, this time with slashing protection enabled on a new Web3signer instance, and no further slashing occurred.
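Slashing protection at the signer level closes the gap because every validator client sharing the signer passes through the same check. The sketch below is a minimal in-memory model of such a check, assuming a conservative "high-watermark" rule per validator key; it is not Web3signer's implementation, which persists signing history in a database. `ProtectedSigner`, `Watermark`, and `may_sign_attestation` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Watermark:
    # Highest source/target epochs already signed for one validator key (-1 = none yet).
    max_source_epoch: int = -1
    max_target_epoch: int = -1


@dataclass
class ProtectedSigner:
    """Hypothetical signer front end that refuses potentially slashable attestations."""

    watermarks: dict[str, Watermark] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def may_sign_attestation(self, pubkey: str, source_epoch: int, target_epoch: int) -> bool:
        # Check and record under one lock so two clients cannot both pass the check.
        with self._lock:
            wm = self.watermarks.setdefault(pubkey, Watermark())
            # Same or older target: a double vote, or a vote an earlier one could surround.
            if target_epoch <= wm.max_target_epoch:
                return False
            # Older source: this vote could surround an earlier one.
            if source_epoch < wm.max_source_epoch:
                return False
            # Record the new high-water marks before allowing the signature.
            wm.max_source_epoch = source_epoch
            wm.max_target_epoch = target_epoch
            return True


# Both validator client instances share this one signer.
signer = ProtectedSigner()
key = "0xvalidator"  # hypothetical public key
print(signer.may_sign_attestation(key, source_epoch=99, target_epoch=100))  # True (first vote)
print(signer.may_sign_attestation(key, source_epoch=99, target_epoch=100))  # False (repeat for epoch 100)
```

Requiring each new target epoch to be strictly higher, and each source epoch no lower, than anything already signed for that key rules out both double and surround votes. The rule is stricter than necessary (it also refuses to re-sign identical data), and the lock only makes the check-and-record step atomic within one process; a shared signer service would need the same guarantee at the storage layer.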
Regarding staker compensation, Launchnodes has already disbursed 25.663 ETH to cover the initial slashing penalties and the rewards missed due to infrastructure downtime, so stakers saw no reduction in rewards on the day of the slashing. Launchnodes has also pledged to compensate for the additional penalties the slashed validators will incur until they are withdrawn from the network.
The follow-up actions, in order, are outlined below:
- Enable Web3signer slashing database (already confirmed as done).
- Launchnodes to work on a plan for setting up new infrastructure on bare metal using updated risk mitigation processes.
- Launchnodes to communicate the plan and the updated risk mitigation and anti-slashing processes to the Lido DAO community.
- Launchnodes to shut down the interim infrastructure and bring the validators up on the bare-metal infrastructure.
Launchnodes Incident Report
Timeline & Root Cause
The root cause was Launchnodes' failure to transition to its 'cold standby' Data Centre, DC2, in an optimal way.
This resulted in nodes being active in two Data Centres simultaneously, a scenario that should not have occurred.
Several actions could have prevented the nodes from being slashed, including:
- Destroying the DC1 node cluster before failing over to DC2.
- Destroying the Web3signer instance before failing over to DC2.
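Both preventive actions share one idea: before DC2 is activated, DC1 must be unable to sign. The sketch below encodes that as a pre-failover gate. All names are hypothetical, and the booleans stand in for checks an operator would actually verify. It treats disrupting only the execution-layer node (the step attempted during the incident) as insufficient, and, as an additional assumption, also requires slashing protection on the signer the fallback will use.

```python
from dataclasses import dataclass


@dataclass
class FailoverState:
    """Hypothetical facts an operator confirms before moving validators from DC1 to DC2."""

    dc1_validator_clients_destroyed: bool
    dc1_blocked_from_signer: bool  # signer destroyed, or DC1 firewalled off from it
    dc1_execution_node_disrupted: bool  # recorded, but deliberately not treated as a fence
    signer_slashing_protection_enabled: bool  # on the signer DC2 will use


class FailoverRefused(RuntimeError):
    """Raised when activating DC2 could leave two sites signing for the same keys."""


def dc1_is_fenced(state: FailoverState) -> bool:
    # DC1 can no longer sign if its validator clients are gone, or if they cannot
    # reach a signer holding the keys. Disrupting the EL node alone does neither.
    return state.dc1_validator_clients_destroyed or state.dc1_blocked_from_signer


def activate_dc2(state: FailoverState, validator_keys: list[str]) -> list[str]:
    """Return the keys to load in DC2, or raise if the failover is unsafe."""
    problems = []
    if not dc1_is_fenced(state):
        problems.append("DC1 may still be signing")
    if not state.signer_slashing_protection_enabled:
        problems.append("signer has slashing protection disabled")
    if problems:
        raise FailoverRefused("; ".join(problems))
    return list(validator_keys)


# A state loosely modelled on the incident: only the EL node was disrupted.
incident_like = FailoverState(
    dc1_validator_clients_destroyed=False,
    dc1_blocked_from_signer=False,
    dc1_execution_node_disrupted=True,
    signer_slashing_protection_enabled=False,
)
try:
    activate_dc2(incident_like, ["0xkey1"])
except FailoverRefused as err:
    print(f"refused: {err}")
```

Gating on what can still sign, rather than on which components were touched, is the design choice here: either listed action satisfies `dc1_is_fenced`, while moving the EL data container does not.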