Monday, February 25, 2019

DataOps at MAQ Software

For nearly a decade, we have specialized in building, delivering, and operating business intelligence and data analytics solutions. During this time, we have seen significant growth in the volume and complexity of data. The increased complexity results from two primary causes:

1. Metric definitions are increasingly complex and involve calculations with multiple data points (compared to the simple metrics of the past).
2. The variety of data points has increased over time.

We’ve also seen an increasing number of interdependencies between projects. These interdependencies are a result of solutions that are combined or built on top of one another.

Despite the increased volume, complexity, and number of interdependencies, modern business practices demand that developers create and deploy data-based solutions faster than ever. To address the challenges of delivering projects in such a complex environment, we have adopted agile methodologies, DevOps practices, and statistical process control. Together, these processes have come to form what we now call DataOps.

We have embraced agile methodologies since our inception, and we continue to deliver business intelligence and data analytics projects using these methodologies. Incremental delivery—one of the core agile practices—has enabled us to deliver business value to our customers early in the development cycle, allowing them to immediately unlock the potential of their data assets. Agile methodologies have proven practical in projects where early requirements are often difficult to ascertain. Incremental delivery allows our customers to continuously develop their requirements as they begin to better appreciate the story their data can tell. We have found that the close customer collaboration afforded by agile practices is vital in ensuring the success of our data projects.

We have also long embraced DevOps practices. These practices hasten, automate, and streamline the development, validation, integration, and deployment of data solutions. By introducing automation at all stages of the development life cycle, we have shortened the time it takes for data solutions to reach production. This means that we can push changes to production on demand, with minimal human intervention and fewer errors. Automation has significantly reduced the cost of releasing incremental changes. As a result, it is now possible to issue several releases to production every day. From code check-in, to code quality checks, to continuous integration, to automated validations, to automated deployments, automation has streamlined the entire release process. In many cases—due to the ever-increasing complexity and interdependency of our projects—automation is not just a convenience, but a necessity.
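The release process described above is, at its core, a sequence of automated gates that each build must pass before deployment. As a minimal sketch of that idea (the function and gate names here are illustrative, not our actual tooling):

```python
def release(checks, deploy):
    """Run each automated gate in order; deploy only when all pass.

    checks: list of (name, callable) pairs, each returning True on pass.
    deploy: zero-argument callable that pushes the build to production.
    """
    for name, check in checks:
        if not check():
            print(f"gate '{name}' failed; release aborted")
            return False
    deploy()
    return True
```

Because every gate runs automatically on each check-in, a failed quality check or validation stops the release before it reaches production, with no human in the loop.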

More recently, we improved the reliability of live data pipelines by creating ongoing alerts. These monitoring mechanisms are a set of automated test cases that run at each processing stage of the data pipeline. Because data is processed at various stages of the pipeline on an ongoing basis, it is crucial to ensure that the check-gates along the data pipeline prevent incorrect data from flowing through the system. Using statistical process control, we treat missing data, excessive data volumes, and wide variations in the average values of key metrics as red flags that prompt timely alerts to DevOps team members and trigger mechanisms that halt the flow of data through the system. These monitoring and control mechanisms help maintain the quality and integrity of the data in the live data pipelines. Because of these processes, customers can manage their day-to-day business operations with the confidence that they have received accurate insights from their data.
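To illustrate the kind of check-gate described above (the function, thresholds, and alert messages here are hypothetical, not our production code), a stage-level test might combine volume checks with a three-sigma control rule on a key metric:

```python
import statistics

def check_stage(values, history, min_rows=1, max_rows=1_000_000):
    """Check one pipeline stage's key-metric values; return a list of alerts.

    values:  metric values produced by the stage in the current run
    history: per-run averages of the same metric from past runs
    """
    alerts = []
    # Gate 1: missing data.
    if len(values) < min_rows:
        alerts.append("missing data: stage produced no rows")
    # Gate 2: excessive data volume.
    elif len(values) > max_rows:
        alerts.append("excessive data volume")
    # Gate 3: statistical process control -- flag an average that drifts
    # more than three standard deviations from its historical mean.
    if values and len(history) >= 2:
        current = statistics.mean(values)
        center = statistics.mean(history)
        sigma = statistics.stdev(history)
        if abs(current - center) > 3 * sigma:
            alerts.append("metric average outside control limits")
    return alerts
```

An empty alert list lets data flow to the next stage; any alert both notifies the DevOps team and blocks the run until the data is corrected.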

Using agile methodologies to develop data solutions, DevOps to build and deploy solutions, and statistical process control techniques to monitor and operate the data pipelines has led to tremendous benefits for our workflow and—more importantly—our customers. Agile methodologies gave us the flexibility and speed required to compete effectively in dynamic business environments. The ability to incrementally build data solutions, use them early, and use the feedback from their usage to define further requirements has been instrumental in ensuring that solutions remain relevant from conception to deployment. DevOps practices helped teams overcome the traditional bottleneck of deploying solutions to production, shortening the time from conception to deployment and improving the ease of integration and deployment. DevOps practices also resulted in the ability to move small changes to production more frequently, minimizing the risk of regression issues and the resulting downtime. Statistical process control techniques ensured that live solutions continue to operate as expected. Data now flows through these solutions reliably, ensuring the ongoing delivery of value from the data.

The combination of agile methodologies, DevOps, and statistical process control techniques has evolved over time into DataOps. DataOps is the logical combination of highly proven methods of software development, delivery, and operations. DataOps is driven by the need of businesses to unlock the value of their data assets in a timely, reliable, consistent, and continuous manner. With DataOps practices, development time has decreased even as the volume, complexity, and number of interdependencies in modern data solutions have increased.

DataOps, however, is not a revolution, nor is it groundbreaking. It’s what we’ve been doing all along. It is the result of methodologies designed to handle complex data requirements with ever-increasing efficiency. By adopting these methods, we have incrementally improved our processes and increased the value that we deliver to our customers. Whether we think of the workflow as a combination of agile, DevOps, and statistical process control—or as DataOps—the resulting delivery benefits are undeniable. As data processing demands become more complex, we will continue to pursue the most efficient means of data processing, support, delivery, and operations.