Resolved -
✅ Resolved – Intermittent Errors Due to Database Routing
The earlier issue causing intermittent 503 and 504 errors across the platform has now been fully resolved.
One of our database servers (PX4) was automatically shut down by our cluster due to a fault, but routing systems continued to direct some traffic to it. This caused slow connection timeouts for a portion of users, particularly affecting login and job-related actions.
We apologise for the disruption and appreciate your patience while we resolved this issue.
May 9, 12:52 AEST
Monitoring -
🔍 Identified – Traffic Routing to Offline Database Node
We’ve identified that one of our database servers (PX4) experienced a critical error and was automatically shut down by our replication cluster configuration, as designed. The same event also took our internal monitoring service offline, limiting our ability to detect the root cause immediately.
Because our database routing system (via DNS) was not updated to reflect PX4’s shutdown, a fraction of user traffic continued to be routed to the offline node. Our high-availability proxy layer attempted to reach PX4, resulting in long connection timeouts (503/504 errors) for some users instead of fast failover.
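As a rough illustration of the fast-failover behaviour described above, the sketch below probes each candidate node with a short TCP connect timeout and skips any node that does not answer. It is a minimal example under stated assumptions, not the platform's actual routing or proxy configuration: the names `is_reachable` and `pick_healthy_node`, the 0.5 second timeout, and the node addresses are all hypothetical.

```python
import socket
from typing import Iterable, Optional, Tuple

Node = Tuple[str, int]  # (host, port); hypothetical representation of a database node


def is_reachable(node: Node, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to the node succeeds within `timeout` seconds."""
    try:
        with socket.create_connection(node, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and connect timeouts.
        return False


def pick_healthy_node(nodes: Iterable[Node], timeout: float = 0.5) -> Optional[Node]:
    """Return the first node that accepts a connection, or None if none do."""
    for node in nodes:
        if is_reachable(node, timeout):
            return node
    return None
```

With a check like this in front of the routing decision, a node that has been shut down costs at most one short timeout per request rather than a long wait that surfaces as a 503 or 504.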
PX4 was removed, rebooted, and added back into the configuration. We are monitoring the cluster closely and implementing safeguards to improve routing synchronisation in future.
May 9, 11:49 AEST
Identified -
Identified Problem
May 9, 11:10 AEST
Update -
We are continuing to investigate the issue.
May 9, 10:19 AEST
Investigating -
Incident Summary:
We are investigating an issue affecting some users of the Allotrac 1.0 platform. Drivers may experience intermittent issues logging into the mobile app, and some users may see errors (Gateway Timeout) when performing actions on the web platform.
Impact:
• Driver logins may fail intermittently.
• Some users may encounter errors when using the web platform.
• The platform may appear available, but can become unresponsive when performing actions.
Current Status:
Our team is actively working to identify the root cause, and the issue is under priority investigation.
May 9, 06:41 AEST