Post Mortem: Delayed Oracle Report (April 8, 2023)

in Post Mortem by Lido


Intro

On 8 April 2023, the Oracle report was finalised about 6 hours later than the usual ~12pm UTC. The delay was caused by an edge case: the report slot was missed on the Consensus Layer, which prevented the software from collecting the data.

 

Lido contributors prepared an urgent fix for this edge case, which allowed the Oracle holders to finalise the report after upgrading the software. No user tokens were at risk at any point, and the offchain Oracle code for the now-running Lido V2 upgrade handles this edge case correctly.

 

Why Did It Happen?

To generate the report, the oracle code must coordinate the gathering of data from CL (Consensus Layer) and EL (Execution Layer) nodes. Specifically, the Oracle requires information about the EL block corresponding to a particular CL slot.

 

However, on April 8th, 2023, the particular “report slot” was missed, so no EL block existed for that slot. The offchain Oracle was not equipped to handle this edge case, so the report data could not be collected.
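To make the failure mode concrete, here is a minimal Python sketch of the slot-to-EL-block lookup described above. It is not the actual Oracle code. The function name `el_block_for_slot` is hypothetical, and the nested dictionary layout is an assumption modeled on the shape of a post-Merge beacon block, where the execution payload is embedded in the block body. A missed slot is represented as `None`.

```python
from typing import Optional


def el_block_for_slot(beacon_block: Optional[dict]) -> Optional[dict]:
    """Return the EL block reference embedded in a CL block.

    `beacon_block` is None when the slot was missed (no block was proposed).
    The dict layout is an assumption for illustration only.
    """
    if beacon_block is None:
        # Missed slot: there is no CL block, so there is no EL block either.
        return None
    payload = beacon_block["message"]["body"]["execution_payload"]
    return {
        "number": int(payload["block_number"]),
        "hash": payload["block_hash"],
    }


# A slot with a block yields an EL block reference...
proposed = {
    "message": {
        "slot": "100",
        "body": {
            "execution_payload": {
                "block_number": "17000000",
                "block_hash": "0xabc",
            }
        },
    }
}
print(el_block_for_slot(proposed))

# ...while a missed slot yields nothing, which is the case the
# offchain Oracle did not handle on April 8th.
print(el_block_for_slot(None))
```

Code that assumes the lookup always returns a block will fail at exactly this point when the report slot is empty.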

 

 

How Did We Fix It?

On the same day, Lido contributors released an update for the offchain Oracle that included a fix for this edge case. Oracle holders reviewed the release code and updated their offchain Oracles. The updated code now handles missed slots correctly.

 

In such cases, the code iterates backwards through the slots until it finds a slot that was successfully validated, that is, one that actually contains a block. This edge-case handler is implemented in the currently running offchain Oracle for Lido V2.
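The backward iteration can be sketched roughly as follows. This is an illustrative example, not the code that was shipped. The names `find_last_non_missed_slot`, `get_block`, and `max_lookback` are hypothetical, and the lookback limit is an assumption added so that the loop always terminates.

```python
from typing import Any, Callable, Optional, Tuple


def find_last_non_missed_slot(
    report_slot: int,
    get_block: Callable[[int], Optional[Any]],
    max_lookback: int = 32,
) -> Tuple[int, Any]:
    """Walk backwards from `report_slot` to the nearest slot that has a block.

    `get_block(slot)` is assumed to return None for a missed slot.
    """
    # Never look below slot 0, and never further back than `max_lookback` slots.
    stop = max(report_slot - max_lookback, -1)
    for slot in range(report_slot, stop, -1):
        block = get_block(slot)
        if block is not None:
            return slot, block
    raise LookupError(
        f"no block found in the {max_lookback} slots up to {report_slot}"
    )


# Example: slots 99 and 100 were missed, slot 98 has a block.
chain = {97: "block-97", 98: "block-98"}
slot, block = find_last_non_missed_slot(100, chain.get)
print(slot, block)
```

Bounding the search is a design choice in this sketch: a long run of missed slots then surfaces as an explicit error rather than an unbounded loop.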

 

Incident Recap

(All times in UTC, 08 Apr 2023)

 

  • 12:41: alert in Telegram groups that the Oracle report is overdue by 15 minutes.
  • 12:47: incident Zoom call convened, debugging started.
  • 13:00: diagnosed the issue (a missed slot and no code to handle it); started preparing the fix.
  • 13:11: sent a heads-up and started gathering a quorum of Oracle members for the update.
  • 13:29: gathered pre-commitments from 5 Oracles.
  • 13:44: tweets on delayed report sent.
  • 16:17: build ready, tested & shared with Oracles.
  • 16:43: last tx for the report is in, report finalised.
  • 21:09: tweets on the successful report sent.