OAKS Service Disruption Analysis

Moving forward, Information Technology will publish an analysis of each high-impact unscheduled service disruption. Each analysis will be published to the campus to increase transparency and accountability and to provide insight into our information technology environment. It will summarize the incident, describe the cause, and outline the steps that IT will take to help prevent a recurrence. We welcome your feedback on which information is most useful to you. Please send any comments directly to Monica Lavin, Director of IT Communications and Customer Advocacy, via email.

April 29, 2014 OAKS Service Disruption Analysis

Summary: At 8:56 p.m., the OAKS (learning management system) web servers began to experience reachability issues. Between 9:00 p.m. and 10:00 p.m., Helpdesk received four reports from students and faculty who could not log in to OAKS. IT support staff for OAKS were notified. To restore service as quickly as possible, all OAKS servers were rebooted, and OAKS was fully functional again by 10:45 p.m. Overall, OAKS access was intermittent for roughly two hours.

Cause: A definitive cause is currently unknown. Two of the six OAKS web servers were experiencing memory problems, and users who happened to reach those two servers were not able to log in. Conversely, users who were sent to one of the normally functioning servers could log in successfully. Monitoring of the systems indicated that the problem appeared to worsen over time. To correct the situation, IT support performed a full restart of OAKS, which cleared the existing login issues.

Prevention: IT will research how the load balancer (the device that distributes traffic across a number of servers) functions and whether it removes, or can be set to remove, unresponsive servers from the pool of available servers, so that users are directed only to responsive web servers where they can log in successfully.
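
To illustrate the behavior IT will be researching, here is a minimal sketch of a server pool that stops routing to servers after repeated failed health checks. It is not the configuration or logic of the actual OAKS load balancer, which this analysis does not describe. The class name ServerPool, the server names, and the three-failure threshold are all hypothetical.

```python
from typing import Dict, List


class ServerPool:
    """Hypothetical pool that routes only to servers passing health checks."""

    def __init__(self, servers: List[str], max_failures: int = 3) -> None:
        self.servers = list(servers)
        # Assumed threshold: consecutive failed checks before removal.
        self.max_failures = max_failures
        self.failures: Dict[str, int] = {s: 0 for s in self.servers}
        self._next = 0

    def record_check(self, server: str, healthy: bool) -> None:
        # A passing check resets the counter; a failing check increments it.
        self.failures[server] = 0 if healthy else self.failures[server] + 1

    def available(self) -> List[str]:
        # Servers at or above the failure threshold are left out of the pool.
        return [s for s in self.servers if self.failures[s] < self.max_failures]

    def pick(self) -> str:
        # Round-robin over the servers that are currently available.
        pool = self.available()
        if not pool:
            raise RuntimeError("no healthy servers available")
        server = pool[self._next % len(pool)]
        self._next += 1
        return server
```

With six servers in such a pool, two of which fail three consecutive checks, pick() would hand out only the remaining four, and a later passing check would return a recovered server to the pool.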

Because it is unlikely that all six OAKS web servers would experience a problem at the same time, Helpdesk should instruct users to clear their browser cache and try again, or to try connecting with a different browser. If a user does not clear the browser cache between login attempts to OAKS, the system will repeatedly try to reach the same server. If that server is having issues, the user will not be able to log in even though other web servers may be functioning normally. IT will monitor the web servers more closely for this error and proactively reboot servers that exhibit cache problems until a long-term solution is identified.
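
The "same server on every attempt" behavior described above can be sketched as follows. The sketch assumes that the load balancer pins each browser to a server using a token stored in the browser. That is an assumption made for illustration only, since this analysis does not say how the pinning actually works. The class name StickyBalancer and the token format are hypothetical.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class StickyBalancer:
    """Hypothetical balancer that pins each browser token to one server."""

    def __init__(self, servers: List[str]) -> None:
        self._round_robin = itertools.cycle(list(servers))
        self._counter = itertools.count(1)
        self.sessions: Dict[str, str] = {}

    def route(self, token: Optional[str]) -> Tuple[str, str]:
        # A known token always goes back to the server it was pinned to.
        if token is not None and token in self.sessions:
            return token, self.sessions[token]
        # No token (for example, after browser data was cleared or when a
        # different browser is used): issue a new token, assign next server.
        new_token = f"session-{next(self._counter)}"
        server = next(self._round_robin)
        self.sessions[new_token] = server
        return new_token, server
```

In this model, a browser that keeps presenting its stored token is sent to the same server each time, while a browser that arrives without a token is assigned the next server in rotation. This is why clearing stored browser data or switching browsers can move a user away from a server that is having issues.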