Performance Degradation - CiAnywhere / ANZ Region

Incident Report for TechnologyOne

Postmortem

Issue Summary:

On Thursday 11 April 2024, customers experienced performance degradation within CiAnywhere. From 10.30am, a small number of customers reported intermittent performance degradation. From 3pm, more customers reported consistent performance degradation, and at 3.18pm analysis confirmed that all reported issues were related. Status page updates were posted regularly from 3.23pm while the incident was investigated and mitigated, and performance returned to normal for all customers by 9.45pm.

Root Cause Analysis:

The background service used to cache records for users reached its maximum number of connections. It began producing errors and prevented any further users from connecting. The failover service then became overloaded, which caused the DPs to stall. The existing alerts did not flag that the number of connections was reaching its threshold limit.

Corrective Measures:

Updated the configuration to force new user connections to be created and old connections to be dropped. This change did not improve performance.

Recycled every app server. This change did not improve performance.

Two new background caching services were built, and customer data sets were split between them to further balance the required connections.

Preventive Measures:

The alert threshold for errors on the background service has been adjusted, and additional alerts have been created for the background caching service.

The playbook has been updated to reorder the steps to follow should a similar issue occur, and to incorporate the newly built alerts.

Posted Apr 18, 2024 - 16:11 AEST

Resolved

Monitoring throughout the day has confirmed that the actions taken yesterday were successful.
We will follow up with a root cause analysis and postmortem within the next 14 days.

Thank you for your continued support and feedback.

Posted Apr 12, 2024 - 16:47 AEST

Monitoring

We have applied the changes to mitigate the issue and are continuously monitoring.

We sincerely apologise for any inconvenience this disruption may have caused. Your patience and understanding during this time have been greatly appreciated.

Posted Apr 11, 2024 - 22:11 AEST

Update

We've pinpointed an additional change necessary to finalise the fix. We anticipate this will be completed within the next 2 hours.

The next update will be provided within 2 hours or sooner.

Thank you.

Posted Apr 11, 2024 - 19:59 AEST

Update

70% of CiA App servers have been recycled, with the remaining 30% currently in progress.

We'll provide another update within the next 60 minutes.

Thank you.

Posted Apr 11, 2024 - 18:34 AEST

Update

We are continuing to roll out the fix.

We will provide another update within the next 60 minutes.

Thank you.

Posted Apr 11, 2024 - 17:29 AEST

Identified

Our engineers have identified the issue and are in the process of applying a fix.

The next update will be provided within 60 minutes.

Thank you.

Posted Apr 11, 2024 - 16:28 AEST

Investigating

We have identified a selection of customer environments experiencing performance degradation within ANZ.

Our engineers are currently investigating the root cause.

We will provide another update within the next 60 minutes.

Thank you.

Posted Apr 11, 2024 - 15:23 AEST

This incident affected: Software as a Service - Australia & New Zealand (User Experience).