Sendcloud Panel & API - Issues accessing the Sendcloud Panel & using the Sendcloud API service

Incident Report for Sendcloud

Postmortem

A deep dive into Sendcloud's September 10, 2025, database lock incident

On September 10, 2025, Sendcloud experienced a 45-minute service disruption that prevented customers from accessing recent orders in the shipping panel and temporarily made our API unresponsive. The incident occurred from 15:14 to 15:59 CEST during what should have been a routine billing operation.

The root cause was an unexpected interaction between our financial processing system and our database locking mechanisms during the reversal of several large invoices. While our engineering team quickly identified and resolved the database lock issue, 584 customer orders temporarily disappeared from view and required manual recovery.

We regret any inconvenience caused by the temporary unavailability of certain order information in your panel. This post explains what happened, how we resolved it, and the measures we're implementing to prevent similar incidents.

Timeline

Time (CEST) Description

15:05 The routine invoice reversal process begins for four large invoices containing nearly 15,000 items

15:14 Database locks escalate, causing API responses to slow significantly. IMPACT START

15:17 Automated monitoring detects database deadlock conditions and alerts the engineering team

15:22 Engineering team joins incident response, begins database analysis

15:35 Root cause identified: circular database locks preventing all transactions from completing

15:42 Problematic database transaction terminated, releasing locks

15:59 All services restored; order visibility recovery process begins. PRIMARY IMPACT END

16:30 584 affected orders identified and restored to customer view

17:15 Customer support equipped with tools and guidance to assist affected merchants

What happened

The incident began during a routine financial operation—reversing four large customer invoices that contained nearly 15,000 individual shipping items. This reversal process creates corrective invoice entries and temporarily locks database records to ensure data consistency during the financial adjustments.

Under normal circumstances, these locks would only affect billing-related database tables, allowing shipping operations to continue uninterrupted. However, an unexpected behaviour in our database framework caused the locks to extend beyond billing records to include customer account and shipping location data.

When multiple invoice items referenced the same customers and shipping destinations, the system attempted to lock the same records in different sequences. This created a circular dependency where transactions waited for each other indefinitely—a classic database deadlock scenario. With thousands of items being processed simultaneously, this exhausted all available database connections, making our entire platform unresponsive.
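To make the deadlock mechanics concrete, here is a minimal Django sketch under stated assumptions: the model and helper names are hypothetical, but the locking behaviour shown is standard Django, where select_for_update() locks rows in every table joined into the query, including those added by select_related().

```python
from django.db import transaction

def reverse_invoice(invoice_id):
    # Hypothetical models: InvoiceItem with foreign keys to Customer
    # and ShippingLocation.
    with transaction.atomic():
        items = (
            InvoiceItem.objects
            .select_related("customer", "shipping_location")
            .select_for_update()  # FOR UPDATE spans ALL joined tables
            .filter(invoice_id=invoice_id)
        )
        for item in items:
            create_corrective_entry(item)  # hypothetical helper

# If reversal A locks customer row #1 and then waits for customer row #2,
# while reversal B holds #2 and waits for #1, neither can proceed: a
# circular wait, i.e. a deadlock. At thousands of items per reversal, the
# blocked transactions also pin database connections until none remain.
```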

Impact assessment

Customer Impact:

  • The Sendcloud panel became unresponsive for 45 minutes
  • API endpoints returned timeout errors throughout the lock contention period
  • 584 recent orders temporarily disappeared from customer dashboards
  • Shipping label creation and order management were severely degraded

Services Affected:

  • Sendcloud shipping panel (Web interface)
  • Core API endpoints
  • Order visibility and processing workflows
  • Real-time shipment status updates

Recovery Actions Required:

  • Manual restoration of order visibility for affected customers
  • Customer support assistance for merchants unable to locate recent orders
  • Database integrity verification for all processed transactions

Our response

Our automated monitoring systems detected the unusual database activity within three minutes and alerted our engineering team. The initial challenge was distinguishing between a general performance issue and the more serious deadlock condition that was actually occurring.

Once we identified the circular lock pattern, our database specialists quickly located the long-running transaction responsible for the deadlock. Terminating this transaction immediately released all blocked database operations and restored regular service.

The more complex challenge was addressing the side effects. During the database lock period, some orders were marked as "processed" in our system, despite not having actually completed their shipping workflow. This caused them to disappear from customer views, creating confusion about missing orders.

Our team developed queries to identify these affected orders and restored their visibility, while simultaneously providing customer support with tools and guidance to assist merchants who contacted us about missing shipments.
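The recovery logic can be sketched as follows. The field names are hypothetical, since the schema isn't described here, but the idea is to find orders flagged as processed during the impact window that never produced a shipping label, and return them to view.

```python
from datetime import datetime, timezone

# Impact window from the timeline, converted to UTC (CEST = UTC+2).
INCIDENT_START = datetime(2025, 9, 10, 13, 14, tzinfo=timezone.utc)  # 15:14 CEST
INCIDENT_END = datetime(2025, 9, 10, 13, 59, tzinfo=timezone.utc)    # 15:59 CEST

# Hypothetical Order model: orders marked processed in the window but
# missing a label never actually completed their shipping workflow.
affected = Order.objects.filter(
    processed_at__range=(INCIDENT_START, INCIDENT_END),
    status="processed",
    shipping_label__isnull=True,
)
affected.update(status="new", processed_at=None)  # restore visibility
```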

Root cause analysis

The incident resulted from an unforeseen interaction between our financial processing system and database locking mechanisms:

  1. Invoice Processing Scale

The four invoices being processed contained an unusually high number of items, creating more simultaneous database locks than our system typically handles. While each invoice was routine on its own, the combined scale pushed our locking mechanisms beyond their tested limits.

  2. Framework Lock Propagation

Our Django web framework automatically applied database relationship loading (select_related) during the locking process. This caused locks to extend beyond invoice records to include related customer account and shipping location data—significantly more records than our engineers anticipated.

  3. Circular Lock Dependencies

With multiple transactions locking overlapping sets of records (invoices, customer accounts, and locations) in different sequences, the database entered a deadlock state in which no transaction could complete. This pattern wasn't apparent during normal operations with smaller invoice volumes. A sketch of the conventional remedies follows this list.
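Both failure ingredients have conventional remedies in Django, sketched below with the same hypothetical model names as above. Scoping the lock with select_for_update(of=("self",)), supported on PostgreSQL among other backends, keeps FOR UPDATE off the joined tables, and acquiring row locks in a deterministic order prevents two transactions from taking overlapping locks in opposite sequences.

```python
from django.db import transaction

def reverse_invoice_safely(invoice_id):
    with transaction.atomic():
        items = (
            InvoiceItem.objects
            .select_related("customer", "shipping_location")
            .filter(invoice_id=invoice_id)
            # Lock only the invoice-item rows, not the joined customer
            # and location rows.
            .select_for_update(of=("self",))
            # Deterministic lock acquisition order rules out circular
            # waits between concurrent reversals.
            .order_by("pk")
        )
        for item in items:
            create_corrective_entry(item)  # hypothetical helper
```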

Prevention and improvements

We're implementing comprehensive changes to prevent similar incidents:

Immediate Actions (Completed):

  • Modified invoice processing to use explicit lock scoping, preventing automatic relationship locking
  • Implemented database connection monitoring with automatic circuit breakers during lock contention (a minimal sketch follows this list)
  • Created runbooks for rapid database deadlock identification and resolution
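As a sketch of the circuit-breaker idea (not Sendcloud's actual implementation), the goal is to stop dispatching new database work once lock waits accumulate, rather than letting stalled requests exhaust the connection pool:

```python
import time

class LockContentionBreaker:
    """Trips after repeated lock waits; sheds load during a cool-down."""

    def __init__(self, max_lock_waits=10, cooldown_seconds=30):
        self.max_lock_waits = max_lock_waits
        self.cooldown_seconds = cooldown_seconds
        self.lock_waits = 0
        self.opened_at = None

    def record_lock_wait(self):
        self.lock_waits += 1
        if self.lock_waits >= self.max_lock_waits:
            self.opened_at = time.monotonic()  # open the breaker

    def allow_request(self):
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        if time.monotonic() - self.opened_at >= self.cooldown_seconds:
            self.opened_at = None  # half-open: try again
            self.lock_waits = 0
            return True
        return False  # open: reject work instead of queueing on locks
```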

Short-term Improvements (In Progress):

  • Developing batch processing limits for large invoice operations to prevent lock accumulation (sketched after this list, together with transaction timeouts)
  • Implementing database transaction timeouts specifically calibrated for financial operations
  • Creating automated recovery procedures for orders affected by database lock scenarios
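A combined sketch of the batch limits and transaction timeouts, assuming a PostgreSQL backend (lock_timeout is PostgreSQL-specific) and the same hypothetical models as above: chunking bounds how many rows any one transaction locks, and SET LOCAL lock_timeout turns an indefinite lock wait into a fast, retryable error.

```python
from django.db import connection, transaction

BATCH_SIZE = 500  # illustrative limit, not a tested value

def reverse_invoice_in_batches(invoice_id):
    item_ids = list(
        InvoiceItem.objects
        .filter(invoice_id=invoice_id)
        .order_by("pk")
        .values_list("pk", flat=True)
    )
    for start in range(0, len(item_ids), BATCH_SIZE):
        chunk = item_ids[start:start + BATCH_SIZE]
        with transaction.atomic():
            with connection.cursor() as cursor:
                # SET LOCAL applies only to the current transaction.
                cursor.execute("SET LOCAL lock_timeout = '5s'")
            batch = (
                InvoiceItem.objects
                .filter(pk__in=chunk)
                .select_for_update(of=("self",))
                .order_by("pk")
            )
            for item in batch:
                create_corrective_entry(item)  # hypothetical helper
```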

Long-term Architectural Changes (Planned):

  • Decoupling billing operations from core shipping database tables to prevent cascade failures
  • Designing separate database connection pools for financial vs. operational workloads (one possible shape is sketched below)
  • Implementing distributed locking mechanisms for large-scale financial processing
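One possible shape for the separate pools, sketched as Django settings plus a database router; the alias names and values are illustrative, not Sendcloud's configuration:

```python
# settings.py: two aliases give billing its own connection pool, so a
# saturated billing workload cannot starve panel and API queries.
DATABASES = {
    "default": {  # operational workload: panel, API, shipping
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "sendcloud",
        "CONN_MAX_AGE": 60,
    },
    "billing": {  # financial workload: invoicing, reversals
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "sendcloud",
        "CONN_MAX_AGE": 0,  # short-lived connections for batch jobs
        "OPTIONS": {"options": "-c lock_timeout=5s"},
    },
}

# Registered via DATABASE_ROUTERS = ["path.to.BillingRouter"].
class BillingRouter:
    """Route billing apps to their dedicated alias."""

    billing_apps = {"invoicing"}  # hypothetical app label

    def db_for_read(self, model, **hints):
        if model._meta.app_label in self.billing_apps:
            return "billing"
        return None  # fall through to "default"

    def db_for_write(self, model, **hints):
        return self.db_for_read(model, **hints)
```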

Technical insights

This incident highlighted how database framework optimizations designed for performance can inadvertently create broader failure modes during exceptional conditions. The Django ORM's select_related optimization, which reduces database queries during normal operations, interacted unexpectedly with explicit row locking during high-volume processing.

The experience demonstrates the importance of understanding the full implications of framework behaviours, especially when they interact with explicit database control mechanisms like row locking. It also underscored the value of testing system behaviour at scales that exceed standard operational patterns.

What we learned

Database deadlocks in modern web applications often result from subtle interactions between framework optimizations and explicit database control mechanisms. Our incident response provided new insights into how these systems behave under exceptional load conditions.

The incident also underlined the value of rapid database analysis capabilities during service disruptions. Shortening the time between detecting the performance degradation and identifying the specific deadlock condition was crucial for minimizing customer impact.

Most importantly, we learned that financial operations, while logically separate from shipping workflows, can have unexpected technical dependencies that create broader system impacts than anticipated.

Moving forward

We're committed to preventing similar incidents through both technical improvements and operational changes. Our development teams are reviewing all large-scale batch operations to ensure appropriate lock scoping, and we're implementing monitoring specifically designed to detect deadlock conditions before they impact service availability.

The billing system decoupling initiative, already in progress, has been accelerated to prevent financial operations from affecting core shipping functionality. This architectural separation will provide better isolation between different aspects of our platform.

We appreciate your patience during this incident and understand the operational challenges caused by temporarily missing orders. Our customer support team remains available to assist with any ongoing concerns related to this incident.

For questions about this incident or its impact on your account, please contact our support team, who have been provided with specific tools and information to assist with any remaining issues.

Posted Sep 19, 2025 - 10:56 CEST

Resolved

This incident has been resolved. After closely monitoring the performance of our platform, we can confirm that all systems are now operating as expected.

However, we are aware of a remaining issue where some orders processed during the incident may still appear as missing. A workaround is available to reprocess these orders:

1. Go to the Incoming Orders tab.
2. Click on Add filter > Processing Status, then select Already processed.
3. Locate the missing orders, select them, and click on Create shipping labels.

Thank you for your patience while we worked on this issue.
Posted Sep 11, 2025 - 11:13 CEST

Monitoring

A fix has been implemented, and we are monitoring the results.
Posted Sep 10, 2025 - 16:17 CEST

Investigating

We are currently investigating an issue preventing users from accessing the Sendcloud Panel and using the Sendcloud API service.
Posted Sep 10, 2025 - 15:40 CEST
This incident affected: Sendcloud (Sendcloud Panel), Affected market (GLOBAL), and Integrations (Sendcloud API).