Boiled right down, we build and provide a proprietary data platform that predicts and prevents injury in industrial workplaces. To do so, we rely on a few critical principles that are drawn from software engineering and apply across the many different technologies we use.
When it comes to data handling, it’s all in the architecture. Building a platform starts with a vision of what you want the platform to do: How many data points should it hold? How many people does it need to support? What are the limits that we need to consider when building it?
All of this thinking is best done up front. Getting these broad strokes wrong can be costly, and difficult to recover from after the fact.
Robert C. Martin has been at the forefront of helping engineers structure their work and designs. Software design principles, like his lauded SOLID system, help the engineer, designer, or builder create architectures - think: blueprints - that are resilient to failure from the inception of the system through its lifecycle.
SOLID is one of the myriad tools we use to create a stable, reliable platform for data handling. Here, I’ll share how these principles influence the structure of data systems in general, and StrongArm’s more specifically.
Single responsibility: an object within the system has a single responsibility - changes to only one part of the specification should be able to affect the specification of the object.
In the single responsibility principle, a part of the system is well defined in what it can or cannot do. It has a single purpose: “get data,” “write data,” etc. As we manage information from sensors in the field, each piece of the system performs only one task on that piece of data. At the first stage of the pipeline we decompress the data as it arrives. This is done once, and only once, before data is operated upon.
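As a rough illustration of one task per stage, here is a minimal Python sketch. The function names, the JSON payload, and the use of zlib are assumptions made for the example, not a description of StrongArm's actual pipeline.

```python
import json
import zlib


def decompress(payload: bytes) -> bytes:
    """Stage 1: undo compression. Nothing else happens here."""
    return zlib.decompress(payload)


def parse(raw: bytes) -> dict:
    """Stage 2: turn decompressed bytes into a record. No decompression here."""
    return json.loads(raw.decode("utf-8"))


def run_pipeline(payload: bytes) -> dict:
    # Each stage is called exactly once, in order.
    return parse(decompress(payload))


# Hypothetical sensor reading, compressed the way it might arrive.
reading = {"sensor_id": "demo-1", "lift_angle_deg": 42.5}
packet = zlib.compress(json.dumps(reading).encode("utf-8"))
result = run_pipeline(packet)
```

Because decompression lives in exactly one function, a change to the compression scheme touches `decompress` and nothing else.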
Open–closed: “Software entities ... should be open for extension, but closed for modification.”
This principle defines the structure of the thing (object) we are deploying as part of the pipeline. Each object (code, infrastructure, network, device) needs a consistent form so that it is easy to maintain: organize objects consistently with the data they handle, and simplify the infrastructure to a basic structure. Done properly, the working pipeline stays in service as new pieces are added to extend its functionality. If the principle is not followed, the architecture doesn’t permit extensions to the service without breaking the flow of data across the pipeline. Testing suffers too: the pipeline becomes unwieldy and eventually untestable, ultimately requiring a rewrite.
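One way to picture "open for extension, closed for modification" is a pipeline runner that never changes while new stages are registered alongside it. The sketch below is an assumption-laden example: the `stage` decorator, the field names, and the 60-degree threshold are all hypothetical.

```python
import math
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]

_STAGES: List[Stage] = []


def stage(func: Stage) -> Stage:
    """Register a stage. Adding a stage never requires editing run()."""
    _STAGES.append(func)
    return func


def run(record: Record) -> Record:
    # Closed for modification: this loop stays the same as stages are added.
    for step in _STAGES:
        record = step(record)
    return record


@stage
def add_radians(record: Record) -> Record:
    return {**record, "angle_rad": math.radians(record["angle_deg"])}


@stage
def flag_deep_bend(record: Record) -> Record:
    # Hypothetical threshold, chosen only for illustration.
    return {**record, "deep_bend": record["angle_deg"] > 60.0}


out = run({"angle_deg": 90.0})
```

Extending the pipeline means writing one more decorated function; the existing stages and the runner keep working untouched, which also keeps each stage testable on its own.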
Liskov substitution: Objects in a system should be replaceable with instances of their subtypes without altering the correctness of the system.
With a clear definition of the responsibilities of each stage, the data pipeline can and should be modularized. This means taking the problem we’re working on and breaking it into coherent, replaceable pieces. This helps avoid many different problems in the future, such as:
- “No one understands what that part of the pipeline does. People avoid changing anything there, forever stalling the business.”
- “This provider offering cloud services is now charging a LOT of money for a critical feature.” If the technology cannot be swapped, the technology choice holds the company hostage.
- “Spaghetti code” - where the engineer has chosen to build everything into a very tightly coupled system with lots of different features. This object becomes impossible to monitor, manage, and operate for anyone except the original engineer. Undoing this in a production environment becomes a massive challenge, since any change to the code can alter the object’s behavior in unpredictable ways and requires extremely deep testing every time.
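The replaceable pieces described under Liskov substitution can be sketched as storage backends that honor one contract. Everything named here (`RecordStore`, `MemoryStore`, `JsonLinesStore`, `archive`) is hypothetical and stands in for whatever component might need swapping.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import List


class RecordStore(ABC):
    """Contract that every storage backend must honor."""

    @abstractmethod
    def save(self, record: dict) -> None: ...

    @abstractmethod
    def load_all(self) -> List[dict]: ...


class MemoryStore(RecordStore):
    def __init__(self) -> None:
        self._records: List[dict] = []

    def save(self, record: dict) -> None:
        self._records.append(dict(record))

    def load_all(self) -> List[dict]:
        return list(self._records)


class JsonLinesStore(RecordStore):
    def __init__(self, path: str) -> None:
        self._path = path

    def save(self, record: dict) -> None:
        with open(self._path, "a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")

    def load_all(self) -> List[dict]:
        if not os.path.exists(self._path):
            return []
        with open(self._path, encoding="utf-8") as handle:
            return [json.loads(line) for line in handle if line.strip()]


def archive(store: RecordStore, records: List[dict]) -> int:
    """Works with any RecordStore and never checks which one it received."""
    for record in records:
        store.save(record)
    return len(store.load_all())


records = [{"id": 1}, {"id": 2}]
in_memory_count = archive(MemoryStore(), records)
with tempfile.TemporaryDirectory() as tmp:
    on_disk_count = archive(JsonLinesStore(os.path.join(tmp, "r.jsonl")), records)
```

Since `archive` relies only on the contract, moving from one backend to another (or away from an expensive provider) is a one-line change at the call site.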
Interface segregation: "Many client-specific interfaces are better than one general-purpose interface."
Since many different pieces of the system must handle and pass data, it’s best to send it along a pipeline that lends itself to flexibility. This helps keep the system from becoming monolithic - unmovable and unyielding in complexity. The monolith approach does not scale: it is more likely to fail as more data needs to be processed and parallelized to keep up with data flow. In a rush to put features out the door, this becomes an all-too-familiar trap: “We’ll just add a new feature instead of correcting it.”
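A small sketch of client-specific interfaces, using Python's `typing.Protocol`. The interface and class names are made up for the example; the point is only that each consumer asks for the narrow capability it needs.

```python
from typing import Iterable, Iterator, List, Protocol


class ReadingSource(Protocol):
    """All that an analysis step needs: something it can read from."""

    def read(self) -> Iterator[float]: ...


class ReadingSink(Protocol):
    """All that an ingest step needs: something it can write to."""

    def write(self, value: float) -> None: ...


class ReplayBuffer:
    """Happens to satisfy both interfaces; clients see only the one they use."""

    def __init__(self, values: Iterable[float]) -> None:
        self._values: List[float] = list(values)

    def read(self) -> Iterator[float]:
        return iter(self._values)

    def write(self, value: float) -> None:
        self._values.append(value)


def peak(source: ReadingSource) -> float:
    # Depends on read() only, so a read-only source works just as well.
    return max(source.read())


def record_all(sink: ReadingSink, values: Iterable[float]) -> None:
    # Depends on write() only.
    for value in values:
        sink.write(value)


buffer = ReplayBuffer([1.0, 3.0])
record_all(buffer, [7.5])
highest = peak(buffer)
```

Keeping the interfaces small means a new source or sink can join the pipeline without implementing methods it will never use.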
Dependency inversion: “Depend upon abstractions, [not] concretions.”
With data come challenges: how the data is formatted, how it’s addressed in a data pipeline, where it’s sourced, and of course, how it’s been timestamped. If the system is built in a way that relies on data always being in a very closed, specific format, then what occurs when data in the pipeline changes?
The pipeline must be almost entirely rewritten to accommodate the new information. This usually causes cost overruns, because everything is duplicated so that a second pipeline can run in parallel. It can also mean extensive testing time: every stage of the pipeline has to be tested on every release of new data, and that work is repeated for each pipeline deployed. Using abstractions to manage data points makes the system more flexible to change. Be kind to your future self and abstract data into the right data structures and naming conventions.
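To make the idea of abstracting data points concrete, here is a minimal sketch in which downstream logic depends on one `Reading` type rather than on any wire format. Both raw layouts ("v1" and "v2"), the field names, and the threshold are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict


@dataclass(frozen=True)
class Reading:
    """The abstraction the rest of the pipeline depends on."""

    sensor_id: str
    taken_at: datetime
    angle_deg: float


def from_v1(raw: dict) -> Reading:
    # Hypothetical v1 layout: flat keys, epoch-second timestamp.
    taken_at = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return Reading(raw["id"], taken_at, raw["angle"])


def from_v2(raw: dict) -> Reading:
    # Hypothetical v2 layout: nested keys, ISO 8601 timestamp.
    taken_at = datetime.fromisoformat(raw["time"])
    return Reading(raw["sensor"]["id"], taken_at, raw["metrics"]["angle_deg"])


ADAPTERS: Dict[int, Callable[[dict], Reading]] = {1: from_v1, 2: from_v2}


def normalize(raw: dict) -> Reading:
    """Pick the adapter for the declared format version."""
    return ADAPTERS[raw["version"]](raw)


def is_deep_bend(reading: Reading, threshold_deg: float = 60.0) -> bool:
    # Knows nothing about v1 or v2; it sees only the abstraction.
    return reading.angle_deg > threshold_deg


old = normalize({"version": 1, "id": "s1", "ts": 0, "angle": 75.0})
new = normalize({
    "version": 2,
    "sensor": {"id": "s1"},
    "time": "1970-01-01T00:00:00+00:00",
    "metrics": {"angle_deg": 75.0},
})
```

When a third format appears, only a new adapter is added; `is_deep_bend` and every other consumer of `Reading` stay as they are, so there is no second pipeline to build and test.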
By applying the principles of SOLID, StrongArm has architected a low-risk, fast, maintainable platform that powers industrial workplaces with data-derived, predictive analytics to protect Industrial Athletes™ as they move.
But a SOLID foundation is just the first step.