Reef Keeping and IT Automation: A retrospective

April 18, 2018 admin

We hear about automation constantly. In fact, ever since the Industrial Revolution it’s been an almost love-it-or-hate-it truth: if it can be automated, it will be. Short of being Luddites, we generally adapt to these advancements in technology and integrate them into our lives. What was once a time-consuming chore, like washing clothes, is now just the push of a button for most of us. It would seem silly to fight against what seems like common sense, but there are often both risks and rewards, and how we apply automation can change the results greatly.

I’ve worked on the IT side of automation for the past 12 years, and through that journey you learn your environment, your ecosystem, so to speak. What works, what doesn’t. The things that make noise at night are always the ones that get the most attention. But how do you know something is wrong in the first place? By pinpointing specific variables.

Much the same applies to any type of environment or ecosystem. A complex system ultimately comes down to multiple variables, and those variables are constantly changing. Sometimes they change because of consumption or growth; other times it’s us, directly changing the system. And that brings up one of the first important points: some changes happen without our direct intervention.

We can tightly control how we interact with the system. In the IT world, you can talk all day about change control, code management, and so on. Those are the types of changes we have full control over: we are taking a direct action, and with those controls in place, if something breaks, we know exactly why. If you dump some chemicals in your tank and bad things happen right afterwards, you know exactly why. If you make a code change and the system breaks, you know why.

Those types of things are easy wins for automation. They’re literally no longer a problem once automated, short of a failure with that automation system. Instead of manually dumping in chemicals, maybe use a dosing pump. Instead of manually making changes to the system – use a code deployment tool.

I think we all know by now that’s not what’s going to keep us up at night. It’s going to be the weird stuff, the out-of-the-ordinary things. And for that we need not just alerting, but trending. We need to know which variables are changing, and what they are changing from and to, over time. The big debate is whether to act on those alerts automatically or wait for human intervention, and the same applies whether it’s computers or reef tanks. Ultimately, some things might be too risky to act upon automatically. Say you have a sudden pH drop, or a server or service that’s down. Should you add pH buffer automatically, or try to restart the server automatically?

These are the types of things that sometimes take a bit of situational awareness to act upon, and taking the wrong action could even result in further catastrophe. For a simple service, restarting it might be okay. For a complicated database cluster, attempting an automatic restart might result in data loss. Likewise, if the pH drop was due to heavy overfeeding, adding a ton of pH buffer could very well destroy the tank. If it’s due to an increase in coral consumption, it might be the correct action. How do you tell the difference?
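To make that trade-off concrete, here is a minimal sketch of the kind of rule an alerting system could apply before acting on its own. Everything in it is an assumption for illustration: the `Alert` fields, the `decide` function, and the two outcomes are hypothetical names, not part of any real controller or monitoring product, and a real policy would need far more context than two flags.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """A hypothetical alert record; all field names are illustrative."""
    target: str              # e.g. "simple-service" or "display-tank"
    symptom: str             # e.g. "service_down" or "ph_low"
    action_reversible: bool  # can the automatic fix be undone without loss?
    recent_change: bool      # did someone change or add something just before?


def decide(alert: Alert) -> str:
    """Return "auto_remediate" only when acting blindly is unlikely to hurt."""
    # A recent manual change (a deploy, a heavy feeding) is the likely cause,
    # so piling an automatic fix on top of it could make things worse.
    if alert.recent_change:
        return "page_human"
    # A fix that cannot be undone (restarting a database cluster mid-write)
    # deserves a person with situational awareness.
    if not alert.action_reversible:
        return "page_human"
    return "auto_remediate"


if __name__ == "__main__":
    examples = [
        Alert("simple-service", "service_down", True, False),
        Alert("database-cluster", "service_down", False, False),
        Alert("display-tank", "ph_low", True, True),
    ]
    for alert in examples:
        print(alert.target, alert.symptom, "->", decide(alert))
```

The point of the sketch is only that the decision depends on context the alert itself does not carry, which is why the history of the variable matters so much.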

And that’s where trending or monitoring comes in, or to put it even more simply, graphs. With graphs of the variable over time, we should have a good idea of our general rate of consumption. If that variable is suddenly low and doesn’t match our expected rate of consumption, then we next need to determine whether any new variables were introduced that might have caused this. If there were, they are the likely culprit. If there were not, it’s worth investigating further to make sure there are no other problems, and manual intervention may then be needed if this turns out to be the new pattern of growth. In the case of a reef tank, this might mean that over time your corals have been growing well and healthy. As they grow larger, they consume more calcium, alkalinity, and trace elements. For a server, this might mean that your service is growing in usage and needs more resources.
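The reasoning above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names, the tolerance, and the sample readings are made up, and a straight-line fit is just one simple way to estimate a rate of consumption. It works the same whether the variable is alkalinity in a tank or free disk space on a server.

```python
from statistics import mean


def consumption_rate(readings):
    """Least-squares slope of (day, value) readings, in units per day."""
    days = [d for d, _ in readings]
    values = [v for _, v in readings]
    d_bar, v_bar = mean(days), mean(values)
    denom = sum((d - d_bar) ** 2 for d in days)
    if denom == 0:
        raise ValueError("need readings from at least two distinct times")
    return sum((d - d_bar) * (v - v_bar) for d, v in readings) / denom


def classify(history, latest, tolerance, recent_changes):
    """Compare the latest reading with what the historical trend predicts."""
    rate = consumption_rate(history)
    last_day, last_value = history[-1]
    day, value = latest
    expected = last_value + rate * (day - last_day)
    if abs(value - expected) <= tolerance:
        return "on trend"
    if recent_changes:
        # Something new was introduced: that is the likely culprit.
        return "off trend: check recent changes first"
    # Nothing changed on our side: look for other problems, or a new
    # pattern of growth that needs a manual adjustment.
    return "off trend: investigate further"


if __name__ == "__main__":
    # Hypothetical daily readings of some variable that drops 0.2 per day.
    history = [(0, 9.0), (1, 8.8), (2, 8.6), (3, 8.4)]
    print(classify(history, (4, 8.2), 0.15, []))
    print(classify(history, (4, 7.5), 0.15, ["heavy feeding"]))
    print(classify(history, (4, 7.5), 0.15, []))
```

A graph does the same job for a human at a glance; the code just spells out the comparison between what the trend predicts and what was actually measured.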

In the case of growth, there is much work in the IT world to automate even that, through automatic scaling of services and servers. Likewise, with the advent of things like the KH Guardian and the upcoming Trident, even consumption on the reef side may eventually be accounted for through automation. With that, it almost comes full circle, except that even those systems need to be built and maintained. So instead of just maintaining a server or a tank, we’re now maintaining an entire ecosystem of services that maintain it all for us.

Sometimes, though, things are lost through automation, autonomy being the big one. Most of us don’t make our own clothes anymore and don’t have the skills to do so. Many of us do own a cloud-enabled device, though. This is a concept that quite simply didn’t exist 20 years ago. If you had a device, a product, you bought it, and generally it operated autonomously. It didn’t need to connect to the cloud, nor did it receive any “firmware updates” or other such things. If you did need such an update, it often meant replacing an EPROM chip, clearly an option reserved for emergencies. What this meant is that the quality standards a product had to meet before release were that much higher. You couldn’t afford to have a major bug in the code; a recall would be a huge ordeal. With the cloud, you can release a product with bugs, because you can “fix it in the cloud.” That is another big thing often lost through such automation: quality control. And with the cloud, and with automation, it is easy to forget how stranded you will be if the automation ceases to work.

In the IT world, failures of core systems can be disastrous, and backups and disaster recovery are usually a core component of any IT architecture. This is likely one of the most overlooked areas of reef automation. If your aquarium controller “brain” is fried, and your equipment is non-functional without that brain, your system is down until you can replace it. Likewise, an automated ecosystem that relies too heavily on one particular component, or on a component with “feature creep” that tries to do too many things at once, is a risk. It goes back to the UNIX philosophy: make tools that do one thing, do that thing well, and interact well with other tools.

With many systems, in both the IT and reef worlds, there is a tendency to fall into vendor lock-in traps. Oh, we have this cool product, and it does all these things you want! What they don’t tell you is that it requires their full support package, or their cloud-enabled hub, or excludes integration with anything but their brand, or starts overlapping with other products you already use, with no integration between them. For some things, the days when you could just plug in a box and have it work independently are long gone. This can be a scary prospect if that very thing could mean life or death for your system.

All this means that with automation, we need to do our due diligence when vetting products, even more so than before. There will be systems with glaring engineering flaws that still have PR and marketing firms promoting them. It’s always our job as consumers to weed through them and pick the very best tools, whether for automation or otherwise. With the proper use of automation, it’s possible to maintain much more complicated systems with far fewer hands. However, it’s a double-edged sword: when that automation fails, you have a very complicated system to maintain and not enough hands to do it. Always have a plan B. Know how to do it manually too. Pick good vendors. Don’t rely heavily on features that lock you in to one vendor. Prefer tools that interact well over those that don’t. Avoid proprietary interfaces.