Unscheduled Maintenance
Incident Report for Allotrac Status Page
Postmortem

Incident Summary

Allotrac entered an unplanned maintenance window at 12:10 PM AEDT, which caused a temporary outage. Service was fully restored by 12:22 PM. The maintenance was a precautionary measure intended to avoid an outage similar to last week's.

Timeline
• 12:10 PM: Behavioural spike detected, leading to an unscheduled maintenance window.
• 12:15 PM: DevOps team initiated a controlled shutdown of all databases to observe system behaviour and gather diagnostics.
• 12:22 PM: Service restored following comprehensive diagnostics and verification of system stability.

Actions Taken
• All databases were brought down temporarily to closely monitor the system’s response and identify potential root causes.
• Key metrics and system logs were collected to aid in the ongoing investigation of the behavioural spike.

Next Steps
• DevOps will continue to investigate the underlying causes of these behavioural spikes so that long-term preventative measures can be implemented.
• Additional monitoring and alerting will be configured to detect and mitigate similar issues proactively.

Posted Nov 05, 2024 - 12:35 AEDT

Resolved
Posted Nov 05, 2024 - 12:33 AEDT
Update
Service Restored
Posted Nov 05, 2024 - 12:22 AEDT
Update
Service is being restored
Posted Nov 05, 2024 - 12:18 AEDT
Monitoring
Service has been temporarily interrupted
Posted Nov 05, 2024 - 12:14 AEDT
This incident affected: Web App and Database.