Issues with Queue Login and Incoming Calls

Incident Report for Zisson.no

Postmortem

Please see below for the postmortem report about the incident that occurred on the afternoon of November 12th.

Summary
On November 12, 2025, the Zisson Interact platform experienced an outage in the internal message system between 14:20 and 15:55. During this time, agents were unable to log in or out of queues, receive or make calls using the softphone, change their queue status, or process ongoing queue traffic. Temporary fixes implemented from 15:15 gradually improved the situation until full resolution was achieved at 15:55.

Description
At 14:20, monitoring alarms were triggered by a growing queue in the Interact message system. The initial assessment was that a network issue had led to confusion within the message system cluster, resulting in two servers assuming the master role simultaneously (the confirmed cause is described under Root Cause below). This caused a feedback loop in which thousands of messages per second were generated, heavily overloading the system. As a result, agents were unable to log in or out of queues, receive or make calls with the softphone, change their availability, or process incoming queue traffic. The incident was challenging to diagnose because the symptoms were misleading and initially appeared to stem from abnormal traffic levels.

Timeline of Events

  • 14:20: Alarms triggered – message queue congestion detected in Zisson Interact.

  • 14:25: Situation Room initiated – personnel from Operations and Support gathered, troubleshooting started.

  • 14:35: Issue escalated – additional engineers joined the investigation.

  • 14:55: Troubleshooting continued; cause still unclear, extremely high message volume observed. Queue traffic could not be processed.

  • 15:00: External specialists engaged to assist in diagnostics.

  • 15:15–15:30: Temporary mitigation measures implemented – agents gradually regained ability to log in/out of queues, process queue traffic, and handle calls using the softphone.

  • 15:55: Root cause identified and corrected – system returned to stable operation.

Root Cause

Further investigation during the night revealed that a new network configuration was implemented by our hosting partner shortly before the incident. This change caused the RabbitMQ servers within the message system cluster to lose communication with each other, leading each node to operate independently (a “split-brain” condition).

As a result, multiple servers assumed the master role simultaneously, which generated message loops and excessive load on the system. The issue was not caused by an external network event, but by an unintended consequence of the configuration change, which disrupted internal synchronization between the cluster nodes.
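
To illustrate the split-brain condition described above, the following is a minimal sketch of the majority (quorum) rule that clustered systems commonly use to prevent two masters: a node may only act as master while it can see a strict majority of the cluster. This is an illustration under stated assumptions, not a description of how the Zisson Interact cluster is actually configured; the function name and node names are hypothetical.

```python
def has_quorum(node, reachable_peers, cluster):
    """Return True if `node` sees a strict majority of the cluster, itself included."""
    visible = (set(reachable_peers) & set(cluster)) | {node}
    return len(visible) > len(cluster) // 2


# Hypothetical three-node cluster (names are placeholders, not real hosts).
cluster = {"mq-1", "mq-2", "mq-3"}

# Simulated partition: mq-1 is cut off, mq-2 and mq-3 still see each other.
isolated_ok = has_quorum("mq-1", set(), cluster)
majority_ok = has_quorum("mq-2", {"mq-3"}, cluster)

print("mq-1 may act as master:", isolated_ok)
print("mq-2 may act as master:", majority_ok)
```

Under this rule the isolated node steps back instead of promoting itself, so at most one side of a partition keeps a master. With an even split (for example, a two-node cluster), neither side has a majority, which is one reason odd cluster sizes are generally preferred.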

Actions Taken

  • Established Situation Room with Operations, Support, and external experts.

  • Implemented temporary mitigation to restore softphone functionality, queue control, and traffic flow.

  • Identified and resolved master role conflict within the message system.

  • Verified system stability after resolution.

Next Steps

  • Conduct a full review of the message system cluster configuration to prevent double-master conditions.

  • Implement improved monitoring for network drops and role conflicts.

  • Review automatic failover logic to ensure stability during transient network issues.

  • Our hosting partner has updated their procedures for implementing network configuration changes to ensure that such changes are properly reviewed, tested, and coordinated to prevent similar issues in the future.
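
As a rough sketch of what the improved monitoring mentioned above could look like, the check below scans per-node status records and raises an alert when a node is down or reports a partition. It assumes the records carry `name`, `running`, and `partitions` fields, modelled on what the RabbitMQ management HTTP API reports per node; those field names should be verified against the version in use. The function name and sample data are hypothetical, and fetching the records over HTTP is deliberately left out.

```python
def find_cluster_alerts(nodes):
    """Return human-readable alerts for stopped or partitioned nodes."""
    alerts = []
    for node in nodes:
        name = node.get("name", "<unknown>")
        # Assumed field: `running` is False when the node is down or unreachable.
        if not node.get("running", False):
            alerts.append(f"{name}: node not running")
        # Assumed field: `partitions` lists peers this node considers partitioned.
        partitions = node.get("partitions") or []
        if partitions:
            peers = ", ".join(sorted(partitions))
            alerts.append(f"{name}: partitioned from {peers}")
    return alerts


# Hypothetical sample data; not taken from the incident.
sample = [
    {"name": "rabbit@mq-1", "running": True, "partitions": ["rabbit@mq-2"]},
    {"name": "rabbit@mq-2", "running": True, "partitions": ["rabbit@mq-1"]},
    {"name": "rabbit@mq-3", "running": False},
]

for alert in find_cluster_alerts(sample):
    print(alert)
```

A check like this would typically run on a short interval and page on any non-empty result, since a partition can precede the message build-up by several minutes.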

We understand the critical nature of this disruption and sincerely apologize for the impact it had on our customers’ operations.

Zisson Operations Team, 2025-11-12

Posted Nov 13, 2025 - 15:42 CET

Resolved

The issue has now been resolved.
All systems are functioning normally again.
There may still be a slight delay in the queue logs, but this is expected to be fully resolved within the next 30 minutes.
Posted Nov 12, 2025 - 16:45 CET

Monitoring

We are still working to fully resolve the issue.
The situation is mostly stable: it is possible to log in to queues, view logged-in agents, and see calls in queue.
However, there are still issues affecting softphone functionality, and queue logs and statistics remain delayed by approximately 2 hours.
Posted Nov 12, 2025 - 16:04 CET

Update

We are still working to resolve the issue. The situation is starting to stabilize.
It is now possible to log in to queues, view logged-in agents, and see calls in queue.
However, queue logs and statistics are currently delayed by approximately 2 hours.

For softphone users, it may help to log out and in again.
Posted Nov 12, 2025 - 15:34 CET

Identified

We are still working to resolve the issue affecting queue logins and incoming calls.
Our team continues to investigate the root cause and implement a fix.
Next update in approximately 30 minutes.
Posted Nov 12, 2025 - 15:01 CET

Investigating

We are currently experiencing issues with logging into queues, and incoming calls are not being delivered to agents.
We are working to resolve the issue and will provide updates as we progress.
Posted Nov 12, 2025 - 14:28 CET
This incident affected: Zisson Interact.