On September 10, 2025, Sendcloud experienced a 45-minute service disruption that prevented customers from accessing recent orders in the shipping panel and temporarily made our API unresponsive. The incident occurred from 15:14 to 15:59 CEST during what should have been a routine billing operation.
The root cause was an unexpected interaction between our financial processing system and our database locking mechanisms while reversing four large invoices. Although our engineering team quickly identified and resolved the database lock issue, 584 customer orders temporarily disappeared from view and required manual recovery.
We regret any inconvenience caused by the temporary unavailability of certain order information in your panel. This post explains what happened, how we resolved it, and the measures we're implementing to prevent similar incidents.
Time (CEST) Description
15:05 The routine invoice reversal process begins for four large invoices containing nearly 15,000 items in total
15:14 Database locks escalate, causing API responses to slow significantly. IMPACT START
15:17 Automated monitoring detects database deadlock conditions and alerts the engineering team
15:22 Engineering team joins incident response, begins database analysis
15:35 Root cause identified: circular database locks are preventing all transactions from completing
15:42 Problematic database transaction terminated, releasing locks
15:59 All services restored and the order visibility recovery process begins. PRIMARY IMPACT END
16:30 584 affected orders identified and restored to customer view
17:15 Customer support provided with tools and guidance to assist affected merchants
The incident began during a routine financial operation—reversing four large customer invoices that contained nearly 15,000 individual shipping items. This reversal process creates corrective invoice entries and temporarily locks database records to ensure data consistency during the financial adjustments.
Under normal circumstances, these locks would only affect billing-related database tables, allowing shipping operations to continue uninterrupted. However, an unexpected behaviour in our database framework caused the locks to extend beyond billing records to include customer account and shipping location data.
When multiple invoice items referenced the same customers and shipping destinations, the system attempted to lock the same records in different sequences. This created a circular dependency where transactions waited for each other indefinitely—a classic database deadlock scenario. With thousands of items being processed simultaneously, this exhausted all available database connections, making our entire platform unresponsive.
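To make the failure mode concrete, here is a minimal sketch of the pattern involved, using hypothetical model names (InvoiceItem, Customer, ShippingLocation) and a hypothetical reverse_invoice helper rather than our actual code. The relevant detail is that select_for_update() on a queryset that also uses select_related() locks the joined customer and shipping-location rows as well, so two reversals that touch the same customers in different orders can end up waiting on each other.

```python
# Hypothetical sketch of the problematic pattern; model and helper names are
# invented for illustration and are not our production code.
from django.db import transaction

from billing.models import InvoiceItem  # hypothetical app and model


def reverse_invoice(invoice_id: int) -> None:
    """Create corrective entries for every item on one invoice."""
    with transaction.atomic():
        items = (
            InvoiceItem.objects
            .filter(invoice_id=invoice_id)
            .select_related("customer", "shipping_location")  # joins related tables
            # FOR UPDATE without an OF clause locks the selected rows in the
            # joined tables too, i.e. the customer and shipping_location rows,
            # not just the invoice items.
            .select_for_update()
        )
        for item in items:
            create_corrective_entry(item)  # hypothetical helper


# If two reversals run concurrently and their invoices share customers or
# shipping locations, each transaction can hold some of those shared row locks
# while waiting for rows the other already holds: a circular wait, i.e. a
# deadlock.
```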
Customer Impact:
Services Affected:
Recovery Actions Required:
Our automated monitoring systems detected the unusual database activity within minutes and alerted our engineering team. The initial challenge was distinguishing a general performance slowdown from the more serious deadlock condition that was actually occurring.
Once we identified the circular lock pattern, our database specialists quickly located the long-running transaction responsible for the deadlock. Terminating this transaction immediately released all blocked database operations and restored regular service.
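For readers wondering how an offending transaction like this is pinpointed, the sketch below shows the kind of inspection involved on PostgreSQL. It relies only on standard facilities (pg_stat_activity, pg_blocking_pids(), pg_terminate_backend()); the specific query and the decision to terminate are illustrative, not a copy of our runbook.

```python
# Illustrative only: list sessions that are blocked, see which backends block
# them, and (as a last resort) terminate the offending backend.
from django.db import connection

FIND_BLOCKED = """
    SELECT pid,
           pg_blocking_pids(pid) AS blocked_by,
           now() - xact_start    AS xact_age,
           state,
           left(query, 120)      AS query
    FROM pg_stat_activity
    WHERE cardinality(pg_blocking_pids(pid)) > 0
    ORDER BY xact_age DESC;
"""


def list_blocked_sessions():
    with connection.cursor() as cur:
        cur.execute(FIND_BLOCKED)
        return cur.fetchall()


def terminate_backend(pid: int) -> bool:
    # pg_terminate_backend() ends the session and rolls back its transaction,
    # releasing every lock it holds.
    with connection.cursor() as cur:
        cur.execute("SELECT pg_terminate_backend(%s);", [pid])
        return cur.fetchone()[0]
```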
The more complex challenge was addressing the side effects. During the database lock period, some orders were marked as "processed" in our system, despite not having actually completed their shipping workflow. This caused them to disappear from customer views, creating confusion about missing orders.
Our team developed queries to identify these affected orders and restored their visibility, while simultaneously providing customer support with tools and guidance to assist merchants who contacted us about missing shipments.
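Conceptually, the recovery query looked for orders whose recorded status no longer matched reality and reverted it. The sketch below is only an approximation: the Order model, its field names, and the status values are invented for the example, not our schema.

```python
# Conceptual recovery sketch; the model, fields, and statuses are invented.
from datetime import datetime

from shipping.models import Order  # hypothetical model


def find_affected_orders(window_start: datetime, window_end: datetime):
    # Orders flagged as processed during the incident window even though no
    # shipment was ever created for them.
    return Order.objects.filter(
        status="processed",
        updated_at__range=(window_start, window_end),
        shipment__isnull=True,
    )


def restore_visibility(window_start: datetime, window_end: datetime) -> int:
    # Reverting the status makes the orders visible in the panel again;
    # returns the number of rows updated.
    return find_affected_orders(window_start, window_end).update(status="new")
```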
The incident resulted from an unforeseen interaction between our financial processing system and database locking mechanisms:
The four invoices being processed contained an unusually high number of items, creating more simultaneous database locks than our system typically handles. While each invoice was routine on its own, the combined scale pushed our locking mechanisms beyond their tested limits.
Our Django web framework automatically applied database relationship loading (select_related) during the locking process. This caused locks to extend beyond invoice records to include related customer account and shipping location data—significantly more records than our engineers anticipated.
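On PostgreSQL, Django (2.0 and later) provides a direct way to keep those locks from spilling into the tables pulled in by select_related: the of argument to select_for_update(). Continuing the hypothetical example from above, the narrower-locking variant would look roughly like this:

```python
# Sketch: restrict the row locks to the invoice items themselves.
from django.db import transaction

from billing.models import InvoiceItem  # hypothetical model


def reverse_invoice_narrow_locks(invoice_id: int) -> None:
    with transaction.atomic():
        items = (
            InvoiceItem.objects
            .filter(invoice_id=invoice_id)
            .select_related("customer", "shipping_location")
            # of=("self",) restricts the FOR UPDATE clause to the InvoiceItem
            # table; the joined customer and shipping_location rows are read
            # but left unlocked, so other transactions that lock those rows
            # are not held up.
            .select_for_update(of=("self",))
        )
        for item in items:
            create_corrective_entry(item)  # hypothetical helper
```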
With multiple transactions locking overlapping sets of records (invoices, customer accounts, and locations) in different sequences, the database entered a deadlock state where no transaction could complete. This pattern wasn't apparent during normal operations with smaller invoice volumes.
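The cycle itself is a matter of lock order rather than lock volume: if every transaction acquires the shared rows in the same sequence, competing reversals queue up instead of deadlocking. A common mitigation, sketched below under the same hypothetical models, is to lock the shared parent rows explicitly and in a deterministic order before processing the items; the per-item work is elided.

```python
# Sketch: when related rows genuinely must be locked, lock them in a single,
# globally agreed order (ascending pk) so concurrent reversals wait in a
# queue rather than forming a cycle.
from django.db import transaction

from billing.models import Customer, InvoiceItem  # hypothetical models


def reverse_invoice_ordered(invoice_id: int) -> None:
    with transaction.atomic():
        customer_ids = list(
            InvoiceItem.objects
            .filter(invoice_id=invoice_id)
            .values_list("customer_id", flat=True)
            .distinct()
        )
        # Evaluating the queryset acquires the locks in ascending-pk order.
        list(
            Customer.objects
            .filter(pk__in=customer_ids)
            .order_by("pk")
            .select_for_update()
        )
        # ... proceed with the per-item corrective entries here ...
```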
We're implementing comprehensive changes to prevent similar incidents:
Immediate Actions (Completed):
Short-term Improvements (In Progress):
Long-term Architectural Changes (Planned):
This incident highlighted how database framework optimizations designed for performance can inadvertently create broader failure modes during exceptional conditions. The Django ORM's automatic select_related optimization, beneficial for reducing database queries during normal operations, interacted unexpectedly when combined with explicit locking during high-volume processing.
The experience demonstrated the importance of understanding the full implications of framework behaviours, especially when they interact with explicit database control mechanisms like row locking. It also underscored the need to test system behaviour at scales that exceed standard operational patterns.
Database deadlocks in modern web applications often result from subtle interactions between framework optimizations and explicit database control mechanisms. Our incident response provided new insights into how these systems behave under exceptional load conditions.
The incident also underlined the value of rapid database analysis capabilities during service disruptions. The time between detecting performance issues and identifying the specific deadlock condition was crucial for minimizing customer impact.
Most importantly, we learned that financial operations, while logically separate from shipping workflows, can have unexpected technical dependencies that create broader system impacts than anticipated.
We're committed to preventing similar incidents through both technical improvements and operational changes. Our development teams are reviewing all large-scale batch operations to ensure appropriate lock scoping, and we're implementing monitoring specifically designed to detect deadlock conditions before they impact service availability.
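On the monitoring front, PostgreSQL already exposes a per-database deadlock counter in pg_stat_database and can log long lock waits when log_lock_waits is enabled. The health-check sketch below shows the kind of signal we mean; the threshold and the surrounding alerting glue are illustrative.

```python
# Illustrative periodic check: report growth of the cumulative deadlock counter
# and count sessions that have been stuck waiting on locks for too long.
from django.db import connection

DEADLOCK_COUNT = """
    SELECT deadlocks FROM pg_stat_database WHERE datname = current_database();
"""
LONG_LOCK_WAITS = """
    SELECT count(*)
    FROM pg_stat_activity
    WHERE wait_event_type = 'Lock'
      AND now() - state_change > interval '30 seconds';
"""


def check_lock_health(previous_deadlocks: int) -> dict:
    with connection.cursor() as cur:
        cur.execute(DEADLOCK_COUNT)
        deadlocks = cur.fetchone()[0]
        cur.execute(LONG_LOCK_WAITS)
        stuck = cur.fetchone()[0]
    return {
        "new_deadlocks": deadlocks - previous_deadlocks,
        "sessions_waiting_on_locks": stuck,
    }
```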
The billing system decoupling initiative, already in progress, has been accelerated to prevent financial operations from affecting core shipping functionality. This architectural separation will provide better isolation between different aspects of our platform.
We appreciate your patience during this incident and understand the operational challenges caused by temporarily missing orders. Our customer support team remains available to assist with any ongoing concerns related to this incident.
For questions about this incident or its impact on your account, please contact our support team, who have been provided with specific tools and information to assist with any remaining issues.