If you own a car, you already know the concept of preventive maintenance: change the oil and filters regularly, and the likelihood of engine failure decreases dramatically. Your engine is still fine, there aren’t any weird noises, and it operates as it should, but you still plan some maintenance because otherwise the cost of replacing a broken engine or gearbox will be much higher. And not only that: try to remember a time when your car broke down at any moment other than when you needed it most. All this means that without preventive maintenance, you increase your chances of losing money and time, and maybe of getting a couple of new grey hairs.
Data centers, as you can imagine, require preventive maintenance as well, but the stakes are much higher: not only does downtime cost you money for repairs, but you also lose customers and damage your brand.
So, based on a schedule you build around equipment age or workload importance, be sure to include the following in your checklist:
- Visual inspection to check that all components are clean and function within their designed specifications
- Environmental inspection to check your data center room conditions: temperature, airflow, dust, etc.
- Disk scans to ensure that you don’t have any unexpected data corruption. Add capacity planning to this so that you don’t run out of space
- Event log inspection to see the trends, security threats, or any unusual activity
- Test and then install patches and updates
- Schedule the next maintenance
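The capacity-planning item in the checklist above can be approximated with a few lines of Python. This is only a minimal sketch: it assumes disk usage grows linearly, and the function names and sample numbers are invented for illustration rather than taken from any particular tool.

```python
import shutil


def days_until_full(used_bytes, total_bytes, growth_bytes_per_day):
    """Estimate days until a volume fills up, assuming linear growth (an assumption)."""
    if growth_bytes_per_day <= 0:
        return None  # usage is flat or shrinking, so there is no projected fill date
    free_bytes = max(total_bytes - used_bytes, 0)
    return free_bytes / growth_bytes_per_day


def days_until_path_full(path, growth_bytes_per_day):
    """Same estimate, but reads current usage of a mounted path from the OS."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return days_until_full(usage.used, usage.total, growth_bytes_per_day)


# Hypothetical volume: 10 TB total, 7 TB used, growing by 50 GB per day
TB, GB = 10**12, 10**9
print(days_until_full(7 * TB, 10 * TB, 50 * GB))  # 60.0
```

In practice the growth rate would come from your own monitoring history, and real growth is rarely perfectly linear, so treat the result as a rough early warning.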
What are the benefits of preventive maintenance?
If you stick to your schedule and carefully investigate and fix any problems you find in your data center, here’s what you gain:
- Enhanced efficiency of your equipment. Data centers consume a massive amount of energy, so if you want to lower your operational expenses, make sure your equipment works efficiently.
- Extended server lifespan. Keep in mind that server components running outside their designed specifications degrade faster, so don’t let them overheat, and watch for power spikes, dust contamination, etc.
- Environmental sustainability. Today, this is one of the priorities you should keep in mind. Regular environmental maintenance helps your data center run more efficiently.
How can preventive maintenance reduce business risks?
Unexpected downtime will not only affect your hardware; it will also cost you money. According to the Availability Report, one hour of downtime of a “high-priority” application is estimated to cost around $67,000. You will also lose clients who are not satisfied with your recovery time objectives (RTOs) and recovery point objectives (RPOs).
These failures not only affect your customers but also put additional psychological strain on your employees: extra shifts, stress, and time pressure.
By performing preventive maintenance, you can take care of your equipment, protect your brand reputation, and keep employees happy.
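To put the hourly downtime figure quoted above in perspective, here is a small back-of-the-envelope calculation. It is a sketch that assumes cost scales linearly with outage duration, which is a simplification; the function name is made up for illustration, and the only number taken from the text is the estimate of about $67,000 per hour.

```python
COST_PER_HOUR_USD = 67_000  # estimate quoted in the text for a "high-priority" application


def downtime_cost(minutes, cost_per_hour=COST_PER_HOUR_USD):
    """Rough outage cost, assuming cost grows linearly with duration (an assumption)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes / 60 * cost_per_hour


# A hypothetical 90-minute outage
print(downtime_cost(90))  # 100500.0
```

The estimate covers only the direct hourly cost; lost customers and brand damage come on top of it.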
Differences between preventive and reactive maintenance
This is easy: maintenance can be planned or unplanned. Preventive maintenance is planned and carried out to prevent breakdowns. Reactive maintenance is unplanned: it is the response to a breakdown that has already happened.
My grandma used to say, “Don’t go solving problems that didn’t happen yet.” While that might be good advice in daily life and an easy concept to follow, it’s a bad strategy for a data center. Repairing or replacing equipment only after it breaks might seem like an easy plan, but it’s very costly and doesn’t meet modern RTO and RPO requirements. So, let’s agree that we will use some form of preventive maintenance.
What are the different types of preventive maintenance?
There are a few types of preventive maintenance, but they have quite a lot in common and you can certainly combine them in your data center maintenance strategy.
Time-based maintenance means that you assess your equipment at fixed time intervals. The goal here is to increase the reliability of your servers by regularly checking and cleaning the equipment and fixing any issues. The downside of this type of maintenance is that it’s time-consuming and doesn’t take asset wear into account.
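The interval-based scheduling described above boils down to simple date arithmetic. The following is a minimal sketch; the function names and the 90-day interval are hypothetical, and real intervals depend on your equipment and vendor guidance.

```python
from datetime import date, timedelta


def next_maintenance(last_done: date, interval_days: int) -> date:
    """Return the next due date for a fixed-interval maintenance task."""
    return last_done + timedelta(days=interval_days)


def is_overdue(last_done: date, interval_days: int, today: date) -> bool:
    """True if the task's due date has already passed."""
    return today > next_maintenance(last_done, interval_days)


# Hypothetical task: last done on 15 January 2024, repeated every 90 days
print(next_maintenance(date(2024, 1, 15), 90))            # 2024-04-14
print(is_overdue(date(2024, 1, 15), 90, date(2024, 5, 1)))  # True
```

Note that the due date depends only on the calendar, not on how worn the equipment actually is, which is exactly the limitation mentioned above.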
Predictive maintenance requires purchasing additional condition-monitoring equipment. Special sensors measure your asset wear based on vibrations, infrared heatmaps, ultrasonic scans, etc. Based on this data, you’ll be able to predict when a piece of your equipment is about to fail and fix or replace it in advance. However, this is costly and requires skilled personnel to interpret the sensor data.
A more modern type of predictive maintenance involves figuring out the reason for failures. This type of maintenance happens only when necessary: you gather data about your equipment, and only when you detect a possible failure of a certain component do you apply the required fixes. The main goal of this type of maintenance is to extend your equipment’s lifespan.
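The idea of predicting a failure from sensor data can be illustrated with a very simple trend extrapolation. This is a toy sketch, not how commercial condition-monitoring systems work: it assumes evenly spaced hourly readings, a linear trend, and a known alarm threshold, and the function name and sample values are invented for illustration.

```python
def hours_until_threshold(readings, threshold):
    """Fit a least-squares line to hourly readings and extrapolate to a threshold.

    Returns None when there are too few readings or the trend is not rising.
    """
    n = len(readings)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or improving trend: no predicted crossing
    fitted_last = mean_y + slope * ((n - 1) - mean_x)  # fitted value at the latest reading
    return max((threshold - fitted_last) / slope, 0.0)


# Hypothetical sensor values sampled once per hour, with an alarm threshold of 80
print(hours_until_threshold([60, 62, 64, 66], 80))  # 7.0
```

Real sensor data is noisy and rarely linear, which is one reason the text notes that skilled personnel are needed to interpret it.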
The blog was originally written by Dmitry Kniazev for Veeam Blogs.