What is Linaro’s primary xMatters use case?
We use AWS CloudWatch to monitor our infrastructure. When CloudWatch detects an anomaly, it sends an alarm to xMatters, which triggers a workflow. Depending on the alarm, the workflow either tries to remediate it automatically, e.g. if a server is running out of disk space, or checks our online dashboard to see whether the affected server is in a maintenance window. If it is, the workflow does nothing further, because an alarm is expected during a maintenance window. If it is not, the workflow generates an incident on the dashboard so that our customers can see that the system is affected, and raises an xMatters alert for the on-call team; xMatters takes care of notifying whoever is currently on call that there’s a problem to be investigated.
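The decision flow described above can be sketched as a small JavaScript function. This is only an illustration under stated assumptions, not Linaro's actual workflow code: the alarm type name `LOW_DISK_SPACE`, the shape of the `alarm` object, and the action labels are all hypothetical, and the maintenance-window lookup is passed in as a boolean rather than fetched from the dashboard.

```javascript
// Hypothetical set of alarm types that have an automatic fix.
// The name "LOW_DISK_SPACE" is illustrative, not a CloudWatch or xMatters identifier.
const AUTO_REMEDIABLE = new Set(["LOW_DISK_SPACE"]);

/**
 * Decide what the workflow should do with an incoming alarm.
 * @param {{type: string, server: string}} alarm - assumed alarm shape
 * @param {boolean} inMaintenance - whether the dashboard reports the
 *   affected server as being in a maintenance window
 * @returns {{action: string, server: string}}
 */
function routeAlarm(alarm, inMaintenance) {
  // Alarms with a known fix are remediated automatically.
  if (AUTO_REMEDIABLE.has(alarm.type)) {
    return { action: "remediate", server: alarm.server };
  }
  // An alarm during a maintenance window is expected, so nothing else happens.
  if (inMaintenance) {
    return { action: "ignore", server: alarm.server };
  }
  // Otherwise: publish an incident on the dashboard and alert the on-call team.
  return { action: "raise_incident", server: alarm.server };
}
```

Keeping the decision in one pure function, separate from the calls that fetch dashboard state or send notifications, makes each branch easy to check in isolation.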
How has xMatters helped Linaro?
We recently released a software-as-a-service (SaaS) platform, and that required us to provide 24/7 support, something the company had never done before. I’d previously been using xMatters just within IT to monitor the systems for us, but not really for alerting us. For this new service, we decided to use xMatters to support our needs. There are teams in the UK, US, and Asia handling support hours, and four weekend teams that it rotates through, so there’s that complexity that it handles for us. We monitor the systems with CloudWatch, which feeds into xMatters to alert whoever is on call. It then notifies the Slack channel so that everyone can see that something happened. Plus, we’ve also got it tied in with Jira Service Desk, so if a customer puts in a high-priority ticket, one that has to be dealt with within four hours, it raises an xMatters incident so on-call staff know they’ve got to deal with it very quickly. We just would not have been able to do that without xMatters.
xMatters helped to automate our incident notification processes. If CloudWatch tells us that something’s gone wrong, the workflow sets up an incident within xMatters and it notifies the people on call. It also notifies the management team just so they’re aware that something’s happened. Within xMatters, there’s an incident template you can use to record the steps you need to take to deal with an incident, so when it’s all dealt with you have everything in one place to create a post-mortem report from.
This automation of incident notification processes has immensely changed how we respond to incidents. It means that we can be on call on a weekend, but not have to sit in front of a computer. We can just go about our weekends, and when the phone goes off with an alert, then we know we’ve got an incident to deal with. It sets up a Slack channel specifically for that incident so that any chatter around what’s gone wrong and how to deal with it is kept in one place and not in the middle of the general conversation, and that’s all done automatically. It has absolutely helped build workflows that meet our needs. I’ve looked at other platforms, and I don’t think I’ve come across anything else that allows you to write code that executes within the workflow, and it has absolutely 100% solved many of our pressing issues.
These workflows also help us address issues proactively. The classic one is the workflow that deals with a server running out of disk space. We have it set up so that if the amount of free space falls below 15%, it triggers an alarm, and the alarm triggers the workflow. The workflow doubles the space, which handles the situation before the server actually runs out of space. That’s helped us a lot as well.
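The threshold rule above can be illustrated with a short JavaScript sketch. It assumes that "doubles the space" means doubling the total volume size, and it only computes the plan; the function name, parameters, and units are hypothetical, and the actual AWS call that resizes the disk is deliberately left out.

```javascript
// 15% free-space threshold, as described in the text.
const FREE_SPACE_THRESHOLD_PERCENT = 15;

/**
 * Decide whether a volume should be expanded and to what size.
 * @param {number} totalGiB - current volume size (assumed unit: GiB)
 * @param {number} freeGiB - current free space (assumed unit: GiB)
 * @returns {{expand: boolean, newSizeGiB: number}}
 */
function planDiskExpansion(totalGiB, freeGiB) {
  if (totalGiB <= 0) {
    throw new RangeError("totalGiB must be positive");
  }
  // Compare as cross-multiplied values to avoid floating-point division:
  // free / total < 15 / 100  is equivalent to  free * 100 < total * 15.
  const belowThreshold = freeGiB * 100 < totalGiB * FREE_SPACE_THRESHOLD_PERCENT;
  if (!belowThreshold) {
    return { expand: false, newSizeGiB: totalGiB };
  }
  // Assumption: "doubles the space" means doubling the total volume size.
  return { expand: true, newSizeGiB: totalGiB * 2 };
}
```

For example, a 100 GiB volume with 10 GiB free is below the threshold and would be planned for 200 GiB, while the same volume with exactly 15 GiB free is left alone because the free space has not fallen below 15%.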
We use coding to expand the flexibility. The disk expansion workflow is 100% JavaScript; there are no xMatters-provided steps in there at all. I wrote it entirely myself because xMatters doesn’t have any built-in support for calling AWS APIs, so I had to work out how to do that. It took quite a bit of work, but it’s something I’ve now made open source, so anyone else who wants to call the AWS APIs from xMatters has it all there to get on with. We can also assign different teams different areas of responsibility, so if an alarm goes off, you target the specific group for that responsibility. It means you’re getting the right person at the right time.
What is most valuable?
One of the things that really attracted me to xMatters is the workflows, where you can write your own custom steps in JavaScript. You are not restricted to the steps that they provide. If you can write it in JavaScript, you can pretty much do anything. It gives me flexibility in ways that other platforms don’t. For example, the online dashboard system we use is not a widely used one, but it has an API. I’m able to write JavaScript steps to do things like check whether a system is in a maintenance window, create an incident on the dashboard, or change the status of an incident. I’m not dependent on the dashboard provider or xMatters creating steps for me.
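As an illustration of the kind of logic such a custom step might contain, here is a minimal JavaScript sketch of a maintenance-window check. The dashboard's API is not described in the interview, so the shape of the `windows` array (with `server`, `start`, and `end` fields holding ISO 8601 timestamps) is purely an assumption, and the HTTP request that would fetch it is omitted.

```javascript
/**
 * Check whether a server is inside any maintenance window at a given time.
 * @param {Array<{server: string, start: string, end: string}>} windows -
 *   assumed shape of the data returned by the dashboard (hypothetical)
 * @param {string} server - name of the affected server
 * @param {Date} [now] - time to check; defaults to the current time
 * @returns {boolean}
 */
function isInMaintenanceWindow(windows, server, now = new Date()) {
  return windows.some(
    (w) =>
      w.server === server &&
      // Window is treated as half-open: start inclusive, end exclusive.
      new Date(w.start) <= now &&
      now < new Date(w.end)
  );
}

// Usage with made-up data:
const exampleWindows = [
  { server: "web-1", start: "2024-01-01T00:00:00Z", end: "2024-01-01T02:00:00Z" },
];
const duringWindow = isInMaintenanceWindow(
  exampleWindows,
  "web-1",
  new Date("2024-01-01T01:00:00Z")
);
```

Passing the time in as a parameter, rather than reading the clock inside the function, keeps the check deterministic and easy to test.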
We have integrated xMatters with CloudWatch and the dashboard. We’ve actually got two different dashboards, depending on which platform we’re monitoring. I’ve also integrated it with Slack and Google Chat. It’s really easy to integrate it with third-party products. You’ve got third-party platforms for data management, but even if they don’t have something out of the box, so long as the product you’re trying to integrate with has an API and you are fairly conversant in JavaScript, you can do it yourself. We also use the REST API, which is really powerful and really strong at helping to customize processes and information. The only shortcoming I would identify is that when they’re rolling out new features, the REST API can take a release or two to catch up. That’s because they firm up the functionality of a feature before allowing you to access it via the API; initially, it’s only handled by built-in steps.
For how long has Linaro used xMatters?
I have been using xMatters for two to three years.
What do you think about the stability of xMatters?
It’s very stable. There are quarterly releases of new features. We’ve never had an outage on xMatters at all. It’s rock-solid from our perspective.
What do you think about the scalability of xMatters?
It’s really scalable. I don’t think they give much away about how it’s running behind the scenes, but they don’t seem to place any constraints on how many workflows you have, what you do in the workflows, how many agents you have, that sort of thing. I don’t remember any limitations that they announced. We’re paying for 15 users at the moment. Most of them are support agents for the SaaS product.
How are xMatters customer service and support?
The xMatters staff are brilliant. When we first started using xMatters, we were on their free plan. The great thing about their free plan is that it only really constrains the number of agents you can have using it. There are no constraints on workflows or anything like that, which is unlike other products. With xMatters, it’s only the number of users, and even then you can get full technical support from them. When I first started writing my own steps in the workflows, they not only helped me but encouraged me. You get really positive feedback from them, and that helps you to feel positive about the changes you’re making.
Which solution did you use previously and why did you switch?
I have experience with a competitor product. The intuitiveness becomes a trade-off. I think that if a system offers a simple level of managing who’s on call and things like that, then it is more intuitive to use, but you are constrained by that simplicity.
What was the ROI?
Initially, we were using it at zero cost and it was 100% meeting our needs, and I can’t say fairer than that. Then, when the department that was setting up this SaaS product asked what I would suggest, I said: use what I’m using. xMatters will 100% meet your needs, and I’ve got the experience of using it. We didn’t even look any further because we knew we had a product that would do what we needed it to do.
What’s my experience with pricing, setup cost, and licensing?
I think it is excellent value for money. I can’t remember what we’re paying now, but the per agent cost is extremely reasonable for what the platform does. It’s entirely agnostic of where you are getting your alarms from. You could even trigger an alarm by email if you want. It’s that open to what triggers an event.
What other advice do you have?
I would rate it a nine out of ten. It’s not perfect, but it’s really close to it. My advice would be to give it a try. It literally costs nothing to try, and xMatters provides a lot of integrations that you can easily add. You don’t have to do coding. You don’t have to know JavaScript. It’s really easy to put the steps onto a workflow and join them together. You can check for results and branch off to do different things depending on what those results are, so there’s a lot you can do without any coding. But if you’re comfortable with JavaScript, then the sky’s the limit. You can really go for it.