Hydra Scratches an Engineering Itch
One of the great things about being a software engineer is that when we have an itch, we have the means to design and build a custom back-scratcher. In our monthly hackathons at xMatters, known as 555 Days, some of the most popular projects have come from an engineer writing a tool that scratches his own itch, only to find that the rest of us had an itch in the same spot. And when one of those engineers teams up with a database administrator who doesn’t have time to scratch everyone’s back for them…
The Itch
The itch we all had was a recurring need for xMatters engineers to query one or more of our hundreds of PostgreSQL databases. Every time a support issue required access to live data to aid in debugging, all query requests would have to funnel through Operations personnel – primarily Bricklen, our database administrator. This introduced delays in resolving support cases and was a source of aggravation for everyone involved.
But it wasn’t feasible to give all engineering staff direct access to the databases, and even if it had been, some issues required queries against dozens or even all of our databases. Logging into databases one by one, retrieving the data, and collecting it in a CSV report would still have been a tremendous time waster. (Just ask Bricklen.)
The back-scratcher we needed was a simple and efficient interface that would allow us to issue the same query on one or more databases and collect all of the results. We found plenty of examples of interfaces that would let us connect to multiple databases, one database at a time, but nothing that would let us query a number of databases in parallel.
The Scratch
Since 555 Day projects may be team efforts, Bricklen and I teamed up to write the tool we needed. Named Hydra after the mythological monster who could spawn an endless number of heads, our application allows engineers and operations staff to select one or more servers from a list of xMatters databases and execute the same query against them all.
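To give a feel for the "same query, many databases" idea, here is a minimal sketch of a parallel fan-out in Node.js. It is not Hydra's actual code: the `queryAll` function, the `runQuery(db, sql)` helper, and the database names are all hypothetical, and a real version would wrap a PostgreSQL client where the stand-in is used.

```javascript
// Hypothetical sketch: run one query against many databases in parallel.
// `runQuery(db, sql)` is an assumed helper that returns a Promise of rows.
async function queryAll(databases, sql, runQuery) {
  // allSettled waits for every database, even if some of them fail.
  const settled = await Promise.allSettled(
    databases.map((db) => runQuery(db, sql))
  );
  // Keep a per-database outcome so one failure does not hide the others.
  return settled.map((outcome, i) => ({
    database: databases[i],
    ok: outcome.status === 'fulfilled',
    rows: outcome.status === 'fulfilled' ? outcome.value : [],
    error: outcome.status === 'rejected' ? String(outcome.reason) : null,
  }));
}

// Stand-in for a real database call, used only to demonstrate the flow.
const fakeRunQuery = async (db, sql) => {
  if (db === 'db-broken') throw new Error('connection refused');
  return [{ db, answer: 42 }];
};

queryAll(['db-a', 'db-broken'], 'SELECT 42 AS answer', fakeRunQuery)
  .then((results) => console.log(JSON.stringify(results, null, 2)));
```

Using `Promise.allSettled` rather than `Promise.all` is a deliberate choice in this sketch: with hundreds of databases, one unreachable server should show up as an error row and not discard every other result.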
The results come back formatted as CSV, HTML, or JSON, and they can be merged into a single CSV file that can be downloaded and further analyzed in Excel.
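Merging per-database results into one CSV could look roughly like the sketch below. The input shape (`[{ database, rows }]`), the function names, and the extra leading `database` column are our assumptions for illustration, not a description of Hydra's real output format.

```javascript
// Quote a value when it contains a comma, quote, or line break (RFC 4180 style).
function csvField(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// `resultSets` is an assumed shape: [{ database: 'name', rows: [{ col: value }] }].
function mergeToCsv(resultSets) {
  // Collect column names in first-seen order across all databases.
  const columns = [];
  for (const { rows } of resultSets) {
    for (const row of rows) {
      for (const key of Object.keys(row)) {
        if (!columns.includes(key)) columns.push(key);
      }
    }
  }
  // Header row, then one line per result row, tagged with its source database.
  const lines = [['database', ...columns].map(csvField).join(',')];
  for (const { database, rows } of resultSets) {
    for (const row of rows) {
      lines.push([database, ...columns.map((c) => row[c])].map(csvField).join(','));
    }
  }
  return lines.join('\r\n');
}

console.log(mergeToCsv([
  { database: 'db-a', rows: [{ id: 1, name: 'Acme, Inc.' }] },
  { database: 'db-b', rows: [{ id: 2, name: 'Bob "B"' }] },
]));
```

Tagging each row with the database it came from is what makes the merged file useful in a spreadsheet, since rows from different servers can then be filtered or pivoted.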
There is even some basic graphing functionality built in for those times when you need a quick visual display of your results. For example: “Graph how many notifications per hour we have sent over the past week.”
Security
Hydra was designed from the ground up with the safety of the databases in mind. The Hydra server is only accessible from within our internal network, and users must sign in with their network user name and password. User access and queries are logged.
Each query is first checked to confirm that it is a SELECT-only query, and the database user that runs the queries is limited to read-only privileges in the databases.
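A SELECT-only check can be sketched in a few lines. The `isSelectOnly` function below is a hypothetical, deliberately conservative illustration and not Hydra's real validator: it rejects some harmless queries (for example, one with a semicolon inside a string literal), and it is no substitute for the read-only database user, which remains the real safeguard.

```javascript
// Hypothetical first-pass filter: accept a single statement that starts with SELECT.
function isSelectOnly(sql) {
  const stripped = sql
    .replace(/\/\*[\s\S]*?\*\//g, ' ') // drop block comments
    .replace(/--[^\n]*/g, ' ')         // drop line comments
    .trim()
    .replace(/;\s*$/, '');             // allow one trailing semicolon

  if (stripped.includes(';')) return false;       // more than one statement
  if (!/^select\b/i.test(stripped)) return false; // must begin with SELECT

  // Reject keywords that write data or change schema, even inside a SELECT
  // (for example SELECT ... INTO, which creates a table).
  const forbidden = /\b(insert|update|delete|drop|alter|create|truncate|grant|revoke|into)\b/i;
  return !forbidden.test(stripped);
}

console.log(isSelectOnly('SELECT id FROM users;'));       // true
console.log(isSelectOnly('SELECT 1; DROP TABLE users'));  // false
```

Checking the text and restricting the database role are complementary: the text check gives users fast, friendly feedback, while the privileges stop anything the check misses.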
To avoid any impact on performance of our production servers, we have limited the number of concurrent queries to any given database cluster and added a timeout function so that queries cannot run for more than a few minutes. Hydra also provides a “cancel query” option to allow users to kill their running queries if they were executed in error or if they’re simply taking longer than expected.
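The two guards described here, a cap on concurrent queries per cluster and a time limit, can be sketched as follows. The `ClusterLimiter` class and `withTimeout` function are hypothetical names of ours, not Hydra's implementation. Note that this `withTimeout` only stops waiting on the client side; actually stopping a query on the server would need something like PostgreSQL's `statement_timeout` setting or `pg_cancel_backend()`, and we are not claiming that is how Hydra does it.

```javascript
// Caps how many tasks run at once for each key (here, a database cluster).
class ClusterLimiter {
  constructor(maxPerCluster) {
    this.max = maxPerCluster;
    this.active = new Map();  // cluster -> number of running tasks
    this.waiting = new Map(); // cluster -> queue of wake-up callbacks
  }

  async run(cluster, task) {
    if ((this.active.get(cluster) || 0) >= this.max) {
      // No free slot: wait until a finishing task hands its slot over.
      await new Promise((resolve) => {
        if (!this.waiting.has(cluster)) this.waiting.set(cluster, []);
        this.waiting.get(cluster).push(resolve);
      });
    } else {
      this.active.set(cluster, (this.active.get(cluster) || 0) + 1);
    }
    try {
      return await task();
    } finally {
      const next = (this.waiting.get(cluster) || []).shift();
      if (next) next(); // pass the slot directly to the next waiter
      else this.active.set(cluster, this.active.get(cluster) - 1);
    }
  }
}

// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`query exceeded ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

A caller would combine the two, for example `limiter.run('cluster-1', () => withTimeout(runQuery(db, sql), 120000))`, where `runQuery` and the two-minute limit are again placeholders.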
Technical Details
Hydra is written for Node.js using the Express framework. Front-end scripting uses jQuery, the Ink user interface library, and D3.js for graphing. Nothing in Hydra is specific to xMatters systems, so we hope to be able to open-source the application soon.
Results
Hydra has become an integral part of our engineering and support activities. Not only has it improved turnaround time in resolving support cases, it has allowed us to make informed decisions that improve the xMatters application. We can test the performance of queries on production servers before they are released into production. We can look at usage patterns in order to design a user interface that supports the common case while allowing for outliers. Do we need to support customers with 10,000 groups? (Yes, we do!) Do we need to support groups with 10,000 members? (Yes, we do!) Is this typeahead query going to be fast enough? Is anyone still using this legacy feature?
Though the first version of Hydra was already fully functional and allowed querying multiple databases, it has turned into an ongoing project, being built out over several 555 Days and getting some engineering attention as part of our Continuous Improvement initiative. Hydra makes it so easy to get answers that we’re thinking up questions we might not otherwise have asked – almost scratching itches before we have them!