In November 2018, Microsoft’s Azure Active Directory (AD) multi-factor authentication (MFA) service suffered two global outages. The first occurred on November 19, 2018, when users of services that rely on Azure AD for authentication, including Office 365 and Dynamics, were unable to log in using MFA.
Though this issue was fixed on the same day, an unnoticed bug led to a second Azure AD MFA outage on November 27, 2018, which was fully resolved after five hours. These back-to-back outages came as a shock to companies that have mandated MFA.
What caused the outage?
Microsoft identified three root causes for the outages:
- Latency in the communication between the MFA front end and its cache services, which occurred when the traffic threshold was reached.
- Race condition in processing responses from the MFA back-end server that led to recycling of the MFA front-end server processes, which triggered additional latency.
- An undetected issue in the back-end MFA server that was triggered by the second root cause. This issue caused an accumulation of processes on the MFA back end, leading to resource exhaustion. At this point the back end was unable to process any further requests from the MFA front end, while otherwise appearing healthy in the Office 365 Service Health Dashboard.
How you learn about an outage matters
Have you ever woken up to calls from frustrated users, or found yourself unable to log in to your own account? In either case, you probably suspected a local technical glitch rather than a global outage. Learning about an outage only after it has derailed an employee’s workday is bad for business and for your IT department’s reputation.
During the recent Azure MFA outages, IT admins all over the world probably racked their brains over a problem they could do nothing about, not knowing its cause until they received an official notification from Microsoft.
Assuage your fears with O365 Manager Plus
Office 365 outages aren’t new, and in most cases IT admins can’t do much to rectify the situation. However, imagine if IT admins had been notified the moment this or any other outage occurred. They could have immediately started planning how to notify users, and looked for alternative ways to keep business processes running without much disruption.
O365 Manager Plus is a complete Office 365 reporting, auditing, monitoring, management, and alerting solution. It monitors an Office 365 setup 24×7 and sends out real-time email notifications on service outages, incidents, and advisories. The email notifications and service health dashboard of O365 Manager Plus come with detailed analysis of service outages, which helps in efficient planning.
Receive real-time notifications on service outages
O365 Manager Plus monitors every Office 365 service 24×7, and unlike the native Office 365 portal, it sends out email notifications when a service outage is detected.
You can also configure custom email notifications to get notified about advisories and incidents occurring in your Office 365 setup.
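To make the idea of outage notifications concrete, here is a minimal Python sketch of the underlying logic: compare two snapshots of service status and raise an alert only when a service moves from healthy to unhealthy. This is not how O365 Manager Plus is implemented. The names `Alert`, `detect_new_outages`, and `format_alert`, along with the status labels and the sample data, are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass

# Status labels treated as healthy; anything else counts as a problem.
# These labels are illustrative assumptions for this sketch.
HEALTHY = {"serviceOperational"}


@dataclass(frozen=True)
class Alert:
    """A hypothetical record of one service changing from healthy to unhealthy."""
    service: str
    previous: str
    current: str


def detect_new_outages(previous: dict[str, str], current: dict[str, str]) -> list[Alert]:
    """Return one alert per service that was healthy before and is unhealthy now.

    Services that were already unhealthy in the previous snapshot are skipped,
    so an ongoing incident does not trigger a notification on every poll.
    """
    alerts = []
    for service, status in sorted(current.items()):
        # A service not seen before is assumed to have been healthy.
        before = previous.get(service, "serviceOperational")
        if before in HEALTHY and status not in HEALTHY:
            alerts.append(Alert(service, before, status))
    return alerts


def format_alert(alert: Alert) -> str:
    """Build a one-line subject that an email notification could use."""
    return f"[Outage] {alert.service}: {alert.previous} -> {alert.current}"


if __name__ == "__main__":
    # Sample snapshots; in practice these would come from a service health feed.
    last_poll = {"Azure AD MFA": "serviceOperational", "Exchange Online": "serviceDegradation"}
    this_poll = {"Azure AD MFA": "serviceInterruption", "Exchange Online": "serviceDegradation"}
    for alert in detect_new_outages(last_poll, this_poll):
        print(format_alert(alert))
```

In this sketch only the transition is reported, so the MFA service produces an alert while the already degraded mailbox service does not. Actually sending the email and fetching real status data are left out on purpose, since those details depend on the monitoring tool in use.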