What is our primary use case?
We use xMatters as our automated on-call engagement system. We use ServiceNow for major incident management and processing for the university’s IT services. When an incident reaches a sufficient level of priority, impact, or urgency, we rely on the integration between ServiceNow and xMatters: xMatters contacts the staff members who are on call to alert them that there’s an issue.
It gets them the information they need to log in and fix whatever might be happening. xMatters can do a lot of other things, but we use it primarily for our major incident response and automated on-call processes.
How has it helped my organization?
In 2019, we embarked on a “lights out” process. We had staff members sitting in our operations center, 24/7/365. They had to watch the screens and make sure that when something went down, there was someone there to pick up the phone and call somebody. In December 2019, we were able to move those staff members back into roles with regular working hours, and to move some of them into other roles entirely. We let the machines do the hard work of notifying people when something goes wrong. xMatters was a big part of that because it allowed our managers to maintain their own rosters, and cell phones no longer had to be handed from one person to another. The process just worked really well.
We also onboarded our institution’s public safety/police department. Before, if everything went down and they couldn’t do anything from their office, they would either call or walk over to the IT building and find somebody in the operations center, who would then call somebody from the networks team. Now, several people from the police station have been onboarded, and they can use the xMatters mobile app to contact our major incident managers directly. They no longer have to physically come over and find us. We were able to replace the manual process that existed prior to 2019 with a fully automated one.
The automation provided by xMatters has helped us respond to incidents. It puts the responsibility for responding on the groups and people who are responsible for providing the service. They get a notification when something happens that meets a certain threshold. That’s in contrast to the subjective process we had previously, where a person in the operations center could decide not to call somebody for whatever reason. Now that it’s automated and everybody plays by the same rules, there have been improvements on the monitoring side and in how systems are architected, because teams know that if something goes down, they’re going to get a call. Having the managers, and the people closer to the process, manage their own rosters results in a bit more accountability, rather than passing it off to the person sitting in the operations center.
The automated notification process has made people understand that they have to fix things before they go “bump” in the night. They know there is no longer a person sitting in our operations center who might decide not to wake somebody up. The machines are going to detect that something has gone wrong and notify xMatters, and xMatters is going to notify the group. Tangentially, that results in people proactively fixing things ahead of time, and with people being more proactive, issues don’t escalate to priority one as often. But when one does, xMatters does its job and gets out of the way really quickly. The targeted notifications have also helped reduce our response times to IT incidents. It doesn’t require a person in the operations center to call five people five times; it handles things synchronously. I would absolutely posit that our response time is quicker than it used to be.
What is most valuable?
In terms of its flexibility, we’ve been using xMatters for close to two years, and we have yet to encounter a situation where somebody couldn’t configure it to work the way we want. We can configure groups to be members of other groups, which lets us nest sequences of rosters, and that has been super-helpful in a number of scenarios. We provided a little bit of training and documentation for the managers who had to manage their rosters and the sequence of calls, and since then we really haven’t had to do much. We give them the URL, tell them to log in, and they can figure it out from there. It’s fairly straightforward to understand how to add a user, add a member to the roster, or add a device. It doesn’t take a lot of administrative overhead, and that’s important for us. We don’t have a lot of people to manage every little thing, so people being able to do it themselves is important.
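To illustrate what nesting rosters means, here is a minimal sketch that expands groups containing other groups into one ordered call list. It is a hypothetical model of the concept, not the xMatters data model or API; the group and user names are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical model (not the xMatters API): a roster entry is either
# ("user", name) or ("group", name), so groups can contain other groups.
Member = Tuple[str, str]


def flatten_roster(group: str, groups: Dict[str, List[Member]]) -> List[str]:
    """Expand nested groups depth-first into one ordered call list.

    Roster order is preserved, each user is listed once (first position
    wins), and a group that was already expanded is skipped, which also
    guards against cycles.
    """
    ordered: List[str] = []
    visited = set()

    def visit(name: str) -> None:
        if name in visited:
            return
        visited.add(name)
        for kind, member in groups[name]:
            if kind == "user":
                if member not in ordered:
                    ordered.append(member)
            else:
                visit(member)

    visit(group)
    return ordered


# Illustrative groups; the names are made up.
GROUPS = {
    "network-oncall": [("user", "alice"), ("group", "network-escalation")],
    "network-escalation": [("user", "bob"), ("group", "managers")],
    "managers": [("user", "carol"), ("user", "alice")],
}

print(flatten_roster("network-oncall", GROUPS))  # ['alice', 'bob', 'carol']
```

Deduplicating on first position mirrors the intuition that someone already called earlier in the sequence should not be called again further down.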
Because we use xMatters primarily for our major incident response and automated on-call processes, the automatic logging that’s built into xMatters, especially the timeline of events, is very helpful because we can figure out why a particular person got a call. We can see, for instance, that it was because an incident showed up in that person’s group and it went to the first person on-call and that person hit skip or ignore. It then went to the next person, called all of their devices, but they never acknowledged anything. Then it went to the next person and that’s who actually picked up.
Having that level of detail built-in makes it really easy for me or the managers to prove that’s what happened, and we can self-serve that information. It gives people the autonomy to know why they got a call. Just click here and you’ll see exactly why the fourth person in the roster got the call instead of the first.
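The escalation behavior described above can be sketched as a walk down an ordered roster that records a timeline. This is a simplified, hypothetical model: real xMatters escalation also involves configured delays and timeouts, which are left out here, and the `escalate` function and response values are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def escalate(
    roster: List[str],
    devices: Dict[str, List[str]],
    responses: Dict[Tuple[str, str], str],
) -> Tuple[Optional[str], List[str]]:
    """Walk an ordered roster and return (responder, timeline).

    responses maps (person, device) to "acknowledge", "skip" or "ignore";
    a missing entry means the device was never answered.
    """
    timeline: List[str] = []
    for person in roster:
        for device in devices.get(person, []):
            timeline.append(f"notified {person} via {device}")
            reply = responses.get((person, device))
            if reply == "acknowledge":
                timeline.append(f"{person} acknowledged")
                return person, timeline
            if reply in ("skip", "ignore"):
                timeline.append(f"{person} chose {reply}")
                break  # stop trying this person's devices; go to the next person
        else:
            # Runs only when the inner loop ended without a break,
            # i.e. every device for this person went unanswered.
            timeline.append(f"{person} did not respond on any device")
    return None, timeline


# The scenario from the review: the first person skips, the second never
# answers any device, and the third picks up. Names are made up.
roster = ["dana", "eli", "fran"]
devices = {"dana": ["mobile app"], "eli": ["mobile app", "voice"], "fran": ["voice"]}
responses = {("dana", "mobile app"): "skip", ("fran", "voice"): "acknowledge"}

responder, timeline = escalate(roster, devices, responses)
print(responder)  # fran
for event in timeline:
    print(event)
```

The timeline list plays the role of the built-in event log: reading it top to bottom explains why the third person got the call instead of the first.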
The integration of xMatters with ServiceNow worked pretty easily. There was a little bit of configuration and coordination with our ServiceNow administrators, but once it was set up it just worked. It does the right thing for us. We don’t want every single incident that ServiceNow handles to generate an on-call notification. We only want priority-one and priority-two incidents, for certain groups, to result in notifications via xMatters. It does that really well, and the integration was super-easy. I have also done some work with the xMatters API to pull information about users, groups, and rosters into a Google Sheet. I used a Google Apps Script to interact with xMatters and pull information out for reporting purposes. That was also really easy.
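The routing rule described above (page only on priority-one and priority-two incidents, and only for certain groups) can be expressed as a simple predicate. This is a hypothetical sketch of the logic, not how the xMatters ServiceNow integration is actually configured; the `Incident` class, group names, and incident numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    number: str
    priority: int          # ServiceNow-style priority, 1 = Critical, 2 = High
    assignment_group: str


# Hypothetical configuration: groups wired to on-call paging, and the
# priorities that should page them.
ONCALL_GROUPS = frozenset({"Network Services", "Identity Services"})
PAGE_PRIORITIES = frozenset({1, 2})


def should_page(incident: Incident) -> bool:
    """Page only P1/P2 incidents assigned to an on-call group."""
    return (incident.priority in PAGE_PRIORITIES
            and incident.assignment_group in ONCALL_GROUPS)


incidents = [
    Incident("INC0001", 1, "Network Services"),
    Incident("INC0002", 3, "Network Services"),
    Incident("INC0003", 2, "Service Desk"),
    Incident("INC0004", 2, "Identity Services"),
]
print([i.number for i in incidents if should_page(i)])  # ['INC0001', 'INC0004']
```

Both conditions must hold: a P3 incident for an on-call group and a P2 incident for a group without on-call coverage are both left to normal working-hours handling.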
We use that information to see how many people are in xMatters and who’s licensed, and to make sure the accounts of people who have left the university are closed. xMatters has also helped us build workflows that meet our needs. Compared with many organizations that use xMatters, our workflows are not complex, but xMatters handles them well and easily. Our simple workflows consist of an incident coming in and the right group being contacted; within that group, xMatters goes through the people in the roster in the right order. It was also very easy to set up another simple workflow that uses Zoom and Google Meet for our bridge process. If somebody isn’t sure about something that is going on, they can send out a “please jump on the bridge” message, using either the xMatters bridge or the Zoom or Google Meet bridges we have set up. That helps us control access and costs because we’re already using Zoom and Google.
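The license and account reporting mentioned above boils down to comparing the exported xMatters user list with the people who are still at the university. The review did this with Google Apps Script against the xMatters API; the Python sketch below instead assumes the data has already been exported, and the record keys (`id`, `licensed`) and the directory set are hypothetical.

```python
from typing import Dict, List, Set, Union

# Hypothetical export format: one dict per xMatters user.
Record = Dict[str, Union[str, bool]]


def license_report(xm_users: List[Record], active_ids: Set[str]) -> Dict[str, object]:
    """Summarize exported xMatters users against the current directory.

    xm_users: one record per xMatters user, with hypothetical keys
    "id" (str) and "licensed" (bool).
    active_ids: IDs of people who are still at the university.
    """
    return {
        "total": len(xm_users),
        "licensed": sum(1 for u in xm_users if u["licensed"]),
        # Accounts whose owner is no longer in the directory should be closed.
        "to_close": sorted(str(u["id"]) for u in xm_users if u["id"] not in active_ids),
    }


users = [
    {"id": "alice", "licensed": True},
    {"id": "bob", "licensed": True},
    {"id": "carol", "licensed": False},
]
print(license_report(users, {"alice", "carol"}))
# {'total': 3, 'licensed': 2, 'to_close': ['bob']}
```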
What do I think about the stability of the solution?
The stability has been great. I can’t think of a time in the last two years that it’s been down when we’ve needed it. They’ve done upgrades, but I can’t remember it ever being down.
What do I think about the scalability of the solution?
The pricing was good, from our perspective, for scaling. It hits the mark. As far as the technology goes, it seems to me that scaling is easy to manage. You start with the ability to put groups inside of groups and have nested rosters. There are workflows that are specific to groups or to particular processes and that makes it fairly easy to configure. I would expect it to be a scalable solution if we decided to roll it out in a significant way.
Currently, we have 105 people licensed, and 102 of them are in central IT. The other three are in the police department. Everybody in IT who is licensed is an active user because they are on call in whatever rotation has been defined. It’s yet to be decided if we will increase our usage. If we had to roll it out to other departments around the university, I don’t see it being an issue. But we are a heavily centralized IT operation here. We don’t have a lot of distributed IT infrastructure or staff. Pretty much everything has to flow through IT.
How are customer service and support?
xMatters customer support is quick. At times they literally respond within minutes of a ticket being submitted. They’ve been great with any support issues we have had, especially early on. We haven’t had an issue in a while, but when we had questions that weren’t bugs, just gaps in our understanding, they got back to us within minutes.
Which solution did I use previously and why did I switch?
We did not have anything similar to xMatters. What we had was an old-fashioned analog method of on-call management, in which people shared a cell phone that was handed from person to person as they went off call. We had staff who sat in our operations center, 24/7/365, with a document on their machines listing the cell phone number to call for each group. So there was a system, but it wasn’t a modern solution.
How was the initial setup?
We did a couple of walkthrough training sessions with xMatters staff. They involved a core group from our side: the people who were going to be the admins or the main people using and configuring xMatters. I then did a handful of walkthroughs with different groups in our IT department. Those were about 45 minutes to an hour long, and I showed them the interface and how to add their devices. We did a little documentation, but not much, about our process as it relates to xMatters. We then rolled it out. We did all of the training within a few weeks, as we got close to the “lights out” deadline at the end of December 2019. In terms of our infrastructure, we just added the module for ServiceNow, filled in some details according to the documentation, and hit save. That was it. As for maintenance, the only thing we’ve had to do is add and remove users. It’s a set-it-and-forget-it solution.
What was our ROI?
There have been savings in process and overhead that we have been able to realize. We no longer need to have our staff looking at a screen overnight, on weekends, and during the day, every day of the year. We repurposed those staff members for higher-value work.
Which other solutions did I evaluate?
We looked at a few alternative solutions. We considered all of them and looked at some demos, but we didn’t get as far as doing a full proof of concept. The main reason we ended up going with xMatters was that a lot of the alternatives seemed to be built on the premise of being the actual incident management tool, not just an on-call management tool. We were very clear that we needed a tool to do on-call management, and that ServiceNow was going to be our incident management tool. We just needed something to bring people together by notifying their mobile devices or by making a phone call to alert them in the middle of the night. xMatters fit that perfectly.
What other advice do I have?
I don’t think I’ve ever had a complaint about xMatters. It just works.