Changelog

Follow up on the latest improvements and updates.


Overview
On 16 March, 23 March, 9 April, and 10 April (outlined in red in the 90-day connection graph below), vCore and other services experienced performance degradation, including elevated response times.
Despite normal CPU utilisation on the database (~50%) and healthy API indicators, the system exhibited:
  • High request latency
  • Increased database commit latency
  • Application-level slowness
A controlled Aurora failover (reader → writer promotion) restored system performance, resulting in the best observed performance baseline in recent periods.
Last 90 days of cluster connections
The graph above shows the last 90 days of concurrent database connections. Spikes in connection count grew gradually until hitting a critical point midway through last week, which triggered a feedback loop of sustained connections and the resulting performance degradation. Note the complete flatline of connections at the far right of the graph (outlined in green): connections are no longer piling up, indicating that the failover has cleared the stale resources and contention.
Previous performance issues (notably Monday 16 March and Monday 23 March) are also outlined in red.
Database connections from early Thursday
Latency and Load during degradation
Latency and Load reduction following failover
(note: AWS metrics were wiped on failover. The latency and load peaks shown above represent the average load for the previous 2 days)
Database connection stacking from before, during, and after the incident
Impact
  1. User Impact:
  • Slow page loads across the web application
  • Intermittent failures/timeouts on user actions
  2. Business Impact:
  • Degraded user experience
  • Increased operational load during incident response
  • Risk to customer trust due to instability
Detection
Elevated response times reported via application monitoring
Aurora metrics indicated:
  • Increased commit latency
  • No corresponding spike in CPU or memory utilisation
Apache metrics showed:
  • Increased request duration
  • Worker saturation symptoms
Timeline
  • ~09:00 Apr 9 Performance degradation begins
  • ~10:00 Elevated latency observed in application
  • ~10:00-10:30 Database metrics reviewed (CPU normal, latency elevated)
  • 11:00 Apache / application restarts attempted (no improvement)
  • 12:00 Query analysis identifies high-cost UPDATE with full table scan
  • 12:30 Indexes added to mitigate query inefficiency
  • 13:00 Connection count to database began to reduce
  • 14:00 Performance degraded again
  • 09:00 Apr 10 Performance remained degraded
  • 14:00 Aurora failover initiated (reader promoted to writer)
  • 14:05 Immediate restoration of performance
  • ~23:00 Database instance size increased
What Happened (Technical Summary)
The system entered a degraded state characterised by high database commit latency and request blocking, despite moderate resource utilisation.
Investigation revealed:
  • A high-frequency UPDATE query performing a full table scan (~800k rows) due to missing indexing
  • Resulting in lock contention and transaction queuing
  • Accumulation of long-lived or blocked transactions
  • Increasing contention within InnoDB internal structures
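The class of problem described above can be reproduced in miniature. The sketch below uses SQLite purely for illustration (production runs Aurora MySQL, and the `jobs` table and column names are hypothetical): an UPDATE filtered on an unindexed column must scan every row, while the same query after adding an index touches only the matching rows.

```python
import sqlite3

# Illustrative stand-in for the behaviour described above; the schema is hypothetical.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT, worker_id INTEGER)")
cur.executemany("INSERT INTO jobs (status, worker_id) VALUES (?, ?)",
                [("pending", i % 100) for i in range(10_000)])

# Without an index on worker_id, the UPDATE must scan the whole table.
plan_before = cur.execute(
    "EXPLAIN QUERY PLAN UPDATE jobs SET status = 'done' WHERE worker_id = 7"
).fetchall()

# After indexing, the planner seeks directly to the matching rows.
cur.execute("CREATE INDEX idx_jobs_worker ON jobs (worker_id)")
plan_after = cur.execute(
    "EXPLAIN QUERY PLAN UPDATE jobs SET status = 'done' WHERE worker_id = 7"
).fetchall()

print(plan_before[0][3])  # a full-table scan
print(plan_after[0][3])   # a search using idx_jobs_worker
```

At high frequency the scanning variant also acquires far more row locks per execution, which is what propagated contention to unrelated queries in this incident.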
Although indexing improvements were applied, the database remained in a degraded state due to residual transactional and locking contention.
A failover reset:
  • Active connections
  • Open transactions
  • Lock queues
  • InnoDB internal state
  • Various other caches
This immediately restored normal performance.
Root Cause
Primary Root Cause
A high-frequency database update query executed without an appropriate index, resulting in:
  • Full table scans
  • Excessive row-level locking
  • Transaction contention under load
Contributing Factors
1. Transaction and Lock Accumulation
  • Blocked and queued transactions accumulated over time
  • Lock contention propagated across unrelated queries due to shared resources
2. Connection Management Characteristics (PHP + Apache Prefork)
  • High number of concurrent database connections
  • Long-lived connections increasing contention footprint
3. InnoDB State Degradation Under Contention
  • Internal structures (lock queues, undo logs, buffer pool efficiency) degraded under sustained load
  • System did not self-recover after contention was introduced
4. Lack of Early Detection Signals
  • No alerting on commit latency, lock wait time, or long-running transactions
  • The issue was detected only after user-visible degradation
5. Delayed Recovery Without Reset
  • Restarting application layers (Apache/PHP) did not clear database-level contention
  • Only a database failover (hard reset of state) resolved the issue
Why It Affected the Entire System
Although the triggering query targeted a specific table, the impact was systemic due to:
  • Shared InnoDB resources (buffer pool, lock manager)
  • Transaction queue contention affecting unrelated queries
  • Connection pool saturation at the application layer
  • Increased commit latency impacting all write operations
Resolution
Immediate mitigation achieved via:
  • Aurora failover (reader promoted to writer)
Performance returned to baseline immediately after failover
Lessons Learned
  • Moderate CPU utilisation does not, on its own, indicate database health
  • Commit latency is a critical early warning signal
  • Database engines can enter degraded states that do not self-recover
  • Failover acts as a reset, not a root cause fix
Follow-Up Actions
Short Term
Confirm all high-frequency queries are properly indexed
Enable and review slow query logging (lower threshold temporarily)
Monitor and alert on:
  • Commit latency
  • Lock wait time
  • Active transactions (innodb_trx)
Add visibility into connection counts and states
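As a sketch of how the alert conditions above might be evaluated, assume a periodic sampler that collects commit latency, lock wait time, and the age of the oldest open transaction. The metric names and threshold values here are illustrative assumptions, not recommended settings:

```python
def check_db_alerts(metrics, *, commit_latency_ms=50, lock_wait_ms=500, max_trx_age_s=60):
    """Return the alert conditions breached by one metrics sample.

    Thresholds are hypothetical placeholders; tune them against your baseline.
    """
    alerts = []
    if metrics["commit_latency_ms"] > commit_latency_ms:
        alerts.append("commit_latency")
    if metrics["lock_wait_ms"] > lock_wait_ms:
        alerts.append("lock_wait")
    if metrics["oldest_trx_age_s"] > max_trx_age_s:
        alerts.append("long_running_transaction")
    return alerts

# A sample resembling this incident: commit latency and transaction age are
# elevated even though nothing else looks alarming.
sample = {"commit_latency_ms": 120, "lock_wait_ms": 30, "oldest_trx_age_s": 900}
print(check_db_alerts(sample))  # ['commit_latency', 'long_running_transaction']
```

The point of the sketch is that these signals fire well before CPU utilisation does, matching the lessons learned above.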
Medium Term
Review connection management strategy (reduce long-lived connections where possible)
Add dashboards for:
  • Transaction age
  • Lock contention
  • Threads running vs connected
Long Term
  • Evaluate architectural changes to reduce high-frequency write contention
  • Introduce backpressure or rate limiting on heavy write paths
  • Consider read/write isolation improvements or workload partitioning
  • Formalise database failover as a controlled operational response (not primary mitigation)
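One common shape for the backpressure item above is a token bucket in front of heavy write paths: bursts are absorbed up to a fixed capacity, and sustained load is smoothed to a steady rate. A minimal sketch (the rate and capacity values are illustrative only):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` writes, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue, retry later, or shed the write

bucket = TokenBucket(rate=10, capacity=5)
results = [bucket.allow() for _ in range(8)]
print(results)  # the first 5 are allowed immediately; later calls depend on refill
```

Placed in front of the high-frequency UPDATE path, a limiter like this converts a runaway feedback loop into bounded queuing.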
Blameless Summary
This incident was caused by a combination of:
  • An inefficient query pattern under load
  • Insufficient observability into database contention signals
  • Expected but unmanaged behaviour of the database under sustained transactional pressure
No single action or individual directly caused the incident.
The system behaved in line with its current design and constraints.
Incident History (Last 90 Days)
* An Outage indicates that the system was completely inaccessible. Performance Degradation indicates that while the system was slow, it was still accessible and most work could be done, albeit at a less efficient rate.
The duration of actual outages within the last 90 days corresponds to an uptime of 99.86%.
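As a quick sanity check on that figure, 99.86% uptime over a 90-day window corresponds to roughly three hours of total outage:

```python
window_hours = 90 * 24                     # 90-day reporting window
uptime = 0.9986                            # stated uptime
downtime_hours = window_hours * (1 - uptime)
print(round(downtime_hours, 2))            # about 3 hours of outage
```

That is broadly consistent with the ~3 hour 15 minute disruption on 16 March being the dominant outage in the window.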
Incident Summary
On Monday 16 March 2026, Visualcare experienced a major service disruption affecting API availability.
The incident was triggered by inefficient database query behaviour within a background worker process responsible for form-related data processing. This resulted in a surge of long-running queries that exhausted available database connections.
As database resources became constrained, API requests were unable to complete, leading to application worker saturation and instability across API nodes, causing significant slowdowns and incomplete queries in both the Visualcare Web Application and Worker Mobile App.
While initial mitigation actions restored partial functionality, the underlying database pressure persisted, resulting in repeated instability until a controlled recovery was completed.
Service access was fully restored at ~13:11 AEST, initially at reduced speed, and the system returned to normal behaviour at ~14:22 AEST as database pressure subsided.
Impact
Customer Impact
  • Intermittent failures and timeouts when accessing the platform
  • Slow response times and request timeouts
  • Periods where the platform was unavailable
Duration
  • Start: ~09:56 AEST
  • Resolved: ~13:11 AEST
  • Total duration: ~3 hours 15 minutes
What Happened
A background worker responsible for processing form data executed queries that scaled poorly under certain data conditions, resulting in significantly longer execution times than expected.
As these queries accumulated:
  • Database connections became heavily utilised
  • API requests began queuing while waiting for available connections
  • Application workers became saturated handling blocked requests
  • API nodes became unstable under sustained load
This, combined with elevated system load at the time, accelerated database resource exhaustion, resulting in request backlogs, application worker saturation, and progressive service degradation.
As API nodes became increasingly unstable, overall platform performance deteriorated significantly. Requests were delayed or failed as application processes remained blocked waiting for database access.
Initial recovery actions (traffic redistribution and application restarts) provided only temporary relief, as they did not address the underlying database load:
  • The worker process continued generating high database activity
  • Resource contention rapidly reoccurred after each recovery attempt
This resulted in a repeating cycle of degradation and partial recovery, significantly extending the duration of the incident.
Detection
The issue was initially identified through customer reports, followed by internal validation of:
  • API responsiveness degradation
  • Elevated database connection usage
  • Application instability
Gap Identified
At the time of the incident, there were limited proactive alerts for:
  • Database connection saturation
  • Long-running query thresholds
Resolution
Service was restored through:
  • Terminating long-running queries across our database shards
  • Controlled recovery of application processes across API nodes
  • Careful management of traffic during recovery to prevent recurrence
  • Allowing the database load to return to normal operating levels
Once database pressure was reduced and application services stabilised, normal service resumed.
Root Cause
Inefficient database query behaviour in a background worker process led to sustained resource consumption, exhausting database connections and causing cascading failure across API services.
What We’re Improving
We are implementing several improvements to prevent recurrence and strengthen system resilience:
  1. Workload Protection
  • Introduce safeguards to prevent excessive resource consumption from any single workload
  • Improve isolation of database usage across different request types
  2. Query Optimisation & Limits
  • Optimise form-related query patterns
  • Enforce execution time limits on database queries
  3. Query Controls
  • Detect and manage long-running database activity more proactively
  4. Observability & Alerting
  • Add alerts for database connection utilisation and query execution duration
Closing Statement
We recognise the impact this incident had and take full responsibility for the disruption.
This event has led to clear improvements in how we:
  • Protect shared system resources
  • Detect abnormal behaviour earlier
  • Maintain stability under load
These changes are already planned and underway to ensure a more resilient and reliable platform moving forward.
Summary
On 12 January 2026, Visualcare experienced a Priority 0 (P0) service incident affecting the Mobile API and related services. The incident was triggered by a sudden and sustained surge of external request traffic, which placed unexpected pressure on backend systems and led to service unavailability.
Core services were restored by 1:35pm AEDT, with degraded performance continuing until 2:25pm AEDT, after which normal service levels were fully re-established.
Incident Classification
  • Priority: P0
  • Detected: 12:45pm AEDT
  • Declared: 12:47pm AEDT
  • Stable: 1:35pm AEDT
  • Normal: 2:25pm AEDT
Customer Impact
During the incident window:
  • The Mobile API was unavailable or intermittently unresponsive
  • Some customers experienced timeouts or slow responses in connected Visualcare services
  • During the recovery phase, services were available but may have exhibited degraded performance
There was no data loss, no unauthorised access, and no impact to data integrity.
What Happened
The incident was caused by a rapid increase in external request volume directed at the Mobile API. The traffic pattern resulted in a significantly higher number of concurrent requests than typically observed.
As request volume increased, backend processing slowed, and active requests accumulated faster than they could be completed. This led to temporary resource saturation and prevented the system from efficiently accepting or completing new requests.
An initial service restart did not immediately restore normal service. Additional controls were subsequently applied to support stable request handling during high traffic conditions, after which the system recovered.
Detection
The issue was detected through a combination of:
  • Internal monitoring indicating elevated load and degraded responsiveness
  • Customer reports of service unavailability
The incident was escalated and formally declared a P0 once widespread impact was confirmed.
Resolution
Service recovery occurred in two phases:
  1. Stabilisation (by 1:35pm AEDT)
  • Protective request-handling controls were applied
  • Services were restarted in a controlled manner
  • Core functionality was restored and customer access resumed
  2. Performance Recovery (1:35pm–2:25pm AEDT)
  • Elevated traffic gradually subsided
  • System performance progressively returned to normal levels
  • No manual intervention was required for downstream systems once stability was achieved
Timeline (AEDT)
  • 12:35pm – Elevated external request volume begins impacting service responsiveness
  • 12:45pm–12:50pm – Mobile API becomes unavailable or severely degraded
  • 12:50pm – Incident declared P0
  • 12:50pm – Initial restart attempted; elevated traffic persists
  • 1:30pm – Additional protective controls applied to manage request load
  • 1:35pm – Core services restored (start of degraded performance window)
  • 2:25pm – Traffic normalises; full-service performance restored; P0 cleared
Root Cause
A sudden and sustained surge of external requests placed unexpected load on the Mobile API, leading to temporary saturation of request processing capacity. This prevented the system from handling new requests efficiently until traffic was regulated and services were stabilised.
Preventative Actions
To reduce the risk and impact of similar events in the future, we are implementing the following improvements:
  • Enhanced controls to better regulate and absorb sudden spikes in request traffic
  • Improved monitoring and alerting to detect abnormal traffic patterns earlier
  • Additional safeguards to ensure services recover more quickly under extreme load
These actions are actively being tracked through our internal delivery process.
Closing
We recognise the operational impact this incident may have caused and appreciate your patience.
Visualcare continues to invest in platform resilience and protection to ensure reliable service, even during abnormal traffic conditions.
If you have any questions or would like further clarification, please contact your Customer Success Manager, or the Head of Customer Success, Maddie Hayes (mhayes@visualcare.com.au).

new

Support at Home

Claiming

Support at Home (SAH) Claiming

We’ve released the first version of Support at Home (SAH) Claiming in Visualcare, including CSV export options. This update also introduces new settings for SAH claiming configuration and user-level permissions.
This release supports providers preparing for their first Support at Home claim and introduces the foundational claiming workflow, aligned to the Aged Care Web Services requirements.
What’s New
SAH Claiming – New Claiming Workspace
A new claiming area is now available:
Timesheets > SAH Claiming
This workspace allows you to:
  • Create draft SAH claims
  • View all SAH claims with real-time status updates (when claiming via API)
  • Reconcile completed or rejected claims
  • Export claims using bulk CSV if you prefer a file-based workflow
Supported statuses include:
  • Draft – items batched and ready to claim
  • Submitted – successfully sent to Services Australia
  • Rejected – items returned with a reason and available for correction
  • Claimed – processed by Services Australia; reconciliation applied
This workflow differs from HCP processes. We strongly recommend familiarising yourself with the new screens and statuses before submitting your first SAH claim.
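As a rough mental model, the statuses above form a small state machine. The transitions below are our reading of the status descriptions in this note, not a documented API:

```python
# Hypothetical sketch of the SAH claim lifecycle implied by the status list.
TRANSITIONS = {
    "Draft": {"Submitted"},               # batched items are sent to Services Australia
    "Submitted": {"Claimed", "Rejected"}, # processed, or returned with a reason
    "Rejected": {"Draft"},                # corrected items can be re-batched
    "Claimed": set(),                     # terminal: reconciliation applied
}

def can_transition(current: str, new: str) -> bool:
    return new in TRANSITIONS.get(current, set())

print(can_transition("Draft", "Submitted"))  # True
print(can_transition("Claimed", "Draft"))    # False
```

Thinking of the workspace this way makes the difference from HCP processes easier to internalise: rejected items loop back to Draft for correction rather than failing silently.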
New SAH Claiming Settings
A new configuration area is available to set your claiming rules:
Settings > Integrations > Manage > SAH Claim Settings
You can now configure:
  • Care Management Claiming options to claim each activity individually, or aggregate Care Management per participant before submission
  • Default Support at Home Claiming Method
  • Rounding Thresholds for SAH service types that use hours as the unit (e.g., Care Management)
  • Option to prevent rounding down to 0 minutes
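The rounding options might behave along these lines. This is a sketch under assumptions: the increment, threshold, and one-increment floor are illustrative placeholders, and the authoritative behaviour is whatever you configure in SAH Claim Settings:

```python
def round_claim_minutes(minutes: int, increment: int = 15, threshold: int = 8,
                        prevent_zero: bool = True) -> int:
    """Round a duration to a claiming increment.

    Hypothetical illustration only: `increment` and `threshold` stand in for
    the configurable rounding threshold settings, not Visualcare defaults.
    """
    remainder = minutes % increment
    rounded = minutes - remainder + (increment if remainder >= threshold else 0)
    if prevent_zero and minutes > 0 and rounded == 0:
        rounded = increment  # never round a real activity down to 0 minutes
    return rounded

print(round_claim_minutes(7))   # below threshold, but floored up instead of 0
print(round_claim_minutes(23))  # rounds up to 30
```

The "prevent rounding down to 0 minutes" option matters for short Care Management touches that would otherwise vanish from a claim.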
User Permissions for SAH Claiming
A new permission setting has been added to control who can access SAH Claiming:
Settings > User Group Security
This allows you to limit SAH claiming to specific roles such as finance, coordinators, or administrators.
Where to Find Everything
SAH Claiming - Timesheets > SAH Claiming
SAH Claim Settings - Settings > Integrations
User Access Control - Settings > User Group Security
Need Help?
Refer to the Support at Home Knowledge Hub for detailed guidance, examples, and setup instructions.
For configuration support or assistance validating your first SAH claim, please contact Support or your Customer Success Manager.
We’ve updated the Expenses Bulk Import feature to support uploading AT-HM expense items in bulk. This enhancement helps providers manage high-volume AT-HM records more efficiently and ensures correct mapping of the new AT-HM fields required under Support at Home.
This update is available now under:
Operations > Import CSV > Expenses
What’s New
AT-HM Bulk Import Support
The Expenses Bulk Import template has been extended to include all AT-HM-related fields. You can now bulk upload AT-HM items with full classification and linkage to the required AT-HM attributes.
Newly supported fields include:
  • AT-HM Parent
  • AT-HM Item / Wraparound
  • AT-HM Item Code
  • AT-HM Prescribed
  • AT-HM First Payment
  • AT-HM Loaned
  • Home Support Item Code
These fields ensure AT-HM expenses are imported with the correct structure, enabling accurate claiming, reporting, and compliance.
How It Works
  1. Download the latest Expenses CSV template from the Import CSV screen.
  2. Populate AT-HM fields where required.
  3. Upload the completed file under Operations > Import CSV > Expenses.
  4. Imported AT-HM items will appear in the participant’s expenses and will be available for claiming (where relevant under SAH).
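For teams generating the file programmatically, the steps above might look like the sketch below. The column names are taken from the field list in this note, but the exact header spelling and accepted values should always come from the downloaded template; the row values are made up for illustration:

```python
import csv
import io

# Columns from the AT-HM field list above; verify against the real template.
fields = ["AT-HM Parent", "AT-HM Item / Wraparound", "AT-HM Item Code",
          "AT-HM Prescribed", "AT-HM First Payment", "AT-HM Loaned",
          "Home Support Item Code"]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerow({
    "AT-HM Parent": "Mobility aids",        # hypothetical values throughout
    "AT-HM Item / Wraparound": "Item",
    "AT-HM Item Code": "EX-0001",
    "AT-HM Prescribed": "Yes",
    "AT-HM First Payment": "Yes",
    "AT-HM Loaned": "No",
    "Home Support Item Code": "HS-0001",
})
print(buf.getvalue())
```

The resulting file is what you would upload under Operations > Import CSV > Expenses in step 3.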
Why This Matters
These enhancements reduce manual entry and improve data consistency across AT-HM records, particularly important as providers transition to Support at Home and manage higher volumes of AT-HM activity.
Need Help?
Refer to the updated AT-HM Bulk Import guide in the Support at Home Knowledge Hub, or contact Support or your Customer Success Manager for assistance.

fixed

improved

Support at Home

Bugs

Incidents

Patch Fixes

  • Participant contributions not saving
    - Fixed an issue where only a small subset of participant contributions were being retrieved due to incorrect API filtering. Contributions are now pulled per participant, ensuring all data is saved correctly.
  • Participant supplements missing
    - Updated validation so supplements are saved even when Services Australia sends partial or inconsistent fields. Supplement records now load and display reliably.
  • Unspent HCP funds not showing
    - Corrected logic for recognising “active” budgets so grandfathered budgets with no end date now appear in the SAH Participant Profile.
  • SIRS reportable flag & comments not persisting
    - Fixed an issue where SIRS records saved with “Reportable” and comments did not reload on edit. Values now persist as expected.
The NDIA has released an update to the Pricing Arrangements and Price Limits.
Visualcare has updated the pricing catalogue to reflect the new prices, effective 24 November 2025.
Action required:
The updated catalogue is now available for bulk upload.
You can access it from:
Maintenance > Services > NDIS Support Catalogue
and apply the latest pricing in your environment.
More information on how to do this can be found in vDocs: Update NDIS Pricing.
We’ve improved the Expenses Import CSV feature to make bulk expense uploads more accurate and efficient.
What’s New
  • Service Code field added
    to the expense CSV import process.
  • CSV template updated
    to include the new Service Code column.
Where to find it
Go to Operations → Import CSV and select Expenses from the dropdown.
How It Works
When preparing your CSV, simply enter the Visualcare Service Code for each expense in the new column. The system will attach the correct service to the expense during upload, as long as that service is active and linked to the client's agreement, ensuring accurate claiming and reporting.
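The matching rule described above can be sketched as follows. The data shapes and field names are hypothetical; the real validation happens inside Visualcare during upload:

```python
def attach_service(expense_row, services):
    """Return the service to attach to an expense, mirroring the stated rule:
    the Service Code must match a service that is active AND linked to the
    client's agreement. Field names here are illustrative placeholders."""
    code = expense_row["Service Code"]
    for svc in services:
        if svc["code"] == code and svc["active"] and svc["on_agreement"]:
            return svc
    return None  # no valid match: the expense cannot be attached to a service

services = [{"code": "SC-01", "active": True, "on_agreement": True},
            {"code": "SC-02", "active": False, "on_agreement": True}]

print(attach_service({"Service Code": "SC-01"}, services))  # matches SC-01
print(attach_service({"Service Code": "SC-02"}, services))  # None: inactive
```

In practice this means checking, before upload, that every code in your CSV belongs to an active service already linked to the relevant agreement.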
We’ve released several key updates to support the Aged Care Web API registration process and Support at Home (SAH) participant verification.
Why this matters
To prepare for Support at Home, providers must register for the Aged Care Web API so Visualcare can securely download participant information from Services Australia. This is a critical step before claims can be made under the new program.
What’s new
You can now complete all Step 4 setup tasks directly in Visualcare, including:
  • Registering for the Aged Care Web API via Services Australia.
  • Establishing a secure API connection between Visualcare and PRODA.
  • Verifying Support at Home participant records against the Aged Care Provider Portal.
How it works
1. Register for the Aged Care Web API
  • Complete the registration process in Services Australia’s Provider Portal.
  • Once approved, you’ll receive confirmation of your registration.
2. Connect the API in Visualcare
  • Go to Settings → Integrations → Services Australia (Aged Care Web API).
  • Enter your organisation’s PRODA details and confirm the connection.
3. Verify Support at Home participants
  • Navigate to Operations → Step 4 Participant Verification.
  • Visualcare retrieves participant information and flags any unmatched or incomplete records.
Where to find it
  • Aged Care Web API Connection: Settings → Integrations
  • Participant Verification: Operations → Step 4 Participant Verification
Who can use it
Users with System Administrator or Organisation Manager permissions who manage PRODA or government integrations.
Get ready now
  • Ensure your organisation’s PRODA details are current.
  • Complete Aged Care Web API registration as soon as possible.
  • Begin participant verification so client data is ready for import and future claiming.

new

Support at Home

Agreements

Bulk Create Agreements from Client List

We’ve added the ability to bulk create Agreements directly from your Client List.
Why this matters
Providers need a fast way to set up SAH agreements at scale. This update lets you create agreement “shells” in bulk and link services.
How it works
Starting from the Clients → Profile view, you can:
  • Filter and select clients from the list to bulk create agreements.
  • Complete Agreement Details (header only at this step) and Create.
  • Agreements are created and ready for service linking.
  • Navigate to Clients → Agreements and use the bulk edit feature to add services to your new agreements.
Where to find it
Go to Clients → Profile to Bulk Create Agreements. Then, use Bulk Edit via Clients → Agreements to add services.
Who can use it
Users with permission to view/edit client profiles and create agreements.
Get ready now
  • Create any missing SAH services using Import CSV (Maintenance → Services → All Services → Import CSV).
  • Identify which clients will need SAH agreements so you can bulk-create them immediately.