
The Art of Maximising Data Centre Uptime


By Avi Sethi, Executive Director EFS Facilities Services (India) Pvt. Ltd

A data centre is a facility that houses an organisation’s critical IT infrastructure, including servers, storage systems, networking equipment and other hardware and software. It is designed to support the storage, processing, and distribution of large amounts of data, and is often used to support mission-critical business processes.

Types of Data Centres

Data centres can be classified into three main categories:
1. Enterprise Data Centres: Owned and operated by a single organisation
2. Colocation Data Centres: Owned by third parties and rented out to multiple organisations
3. Cloud Data Centres: Owned and operated by cloud service providers and used to deliver cloud-based services to customers.

The Role of FM

Data centres are seen as mission-critical facilities worldwide because they support many mission-critical verticals, ranging from Internet-based services such as banking and financial services to hospitals and healthcare and the industrial sector.

Managing these facilities is a complex and ever-evolving task that spans HVAC and fire and safety infrastructure, Power Usage Effectiveness (PUE), disaster planning and security.

Reliability has become the single most critical factor, and because failure to operate is not an option, this adds to the complexity of running a data centre. Managing and operating a data centre is very different from managing a commercial office building or a factory, and the margin for error that separates a well-managed data centre from an inefficiently managed one is growing narrower by the day.

Facility management professionals are responsible for ensuring that all of the systems and equipment in the data centre are operating correctly and efficiently. This includes everything from the servers and storage systems to the power and cooling systems, as well as the building infrastructure and support systems.

To accomplish this, FM professionals must first have a deep understanding of the various systems and equipment in the data centre, as well as the processes and procedures needed to maintain and repair them. This requires a combination of technical expertise and practical experience, as well as strong problem-solving and communication skills.

Maximal Uptime

One of the key challenges that facilities management professionals face in a data centre is the need to maintain high levels of uptime and availability. Data centres are critical to the operations of many businesses, and even a brief outage can have significant consequences.

As such, facilities management professionals must be prepared to respond quickly to any issues that arise and take the necessary steps to restore service as quickly as possible.

To achieve this level of reliability, data centres rely on a number of systems and processes, including redundant power and cooling systems, as well as robust monitoring and alerting systems. Facilities management professionals must be proficient in these systems and be able to identify and troubleshoot issues as they arise.

Energy Efficiency

Another challenge that facilities management professionals face is the need to balance efficiency and cost. Data centres are energy-intensive facilities, and the cost of powering and cooling these facilities can be significant. As such, facilities management professionals must be proactive in identifying opportunities to improve the energy efficiency of the data centre, while also ensuring that the facility remains cost-effective.

This can involve a range of activities, from implementing best practices to managing and optimising the various systems and processes so that they run as efficiently as possible.
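One widely used yardstick for this kind of efficiency work is PUE, the metric named earlier in this article: the total energy drawn by the facility divided by the energy delivered to the IT equipment. The sketch below shows the arithmetic only. The function name and the meter readings are hypothetical, chosen for illustration rather than taken from any real site.

```python
def power_usage_effectiveness(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Return PUE: total facility energy divided by IT equipment energy.

    A value close to 1.0 means almost all energy reaches the IT load;
    higher values mean more energy goes to cooling, power conversion
    losses, lighting and other overheads.
    """
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly meter readings (assumed values, for illustration only).
total_kwh = 1500.0  # everything the facility drew from the grid
it_kwh = 1000.0     # the portion delivered to servers, storage and network gear

pue = power_usage_effectiveness(total_kwh, it_kwh)
print(f"PUE = {pue:.2f}")
```

Tracking this ratio over time, rather than as a single reading, is what lets a facilities team see whether an efficiency measure has actually reduced overhead.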

FM Skilling

In addition to these challenges, facilities management professionals must keep pace with the ongoing evolution of technology. Data centres change constantly as new technologies and systems are developed and adopted, so FM professionals must stay up to date with the latest technologies and industry trends and develop the skills and expertise needed to support them.

In addition to technical expertise, FM professionals in a data centre must also possess strong communication and interpersonal skills. Data centres are complex facilities with many different systems and components, and FM professionals must be able to work effectively with a variety of stakeholders, including IT professionals and multiple clients.

Effectively managing and operating in this type of environment requires facility management companies and their staff to adopt a ‘mission critical mentality’ that focuses on risk mitigation and grasps the interconnectedness of facility and IT systems. A facilities team that embodies this mindset will be in a much better position to implement and manage an effective O&M program.

The facilities team should keep certain focus areas in mind: EHS, people management, performance monitoring and review, business continuity plans (BCP) and response, infrastructure maintenance schedules and calendars, energy management, change management, documentation management, training, quality controls and financial management.

Data Centre O&M is Critical

Operations and maintenance (O&M) is critical to any data centre’s operations, as it ensures that all equipment and systems are running smoothly and efficiently. Regular maintenance helps identify small problems early and prevents them from escalating into bigger issues.

Poor data centre maintenance is one of the main causes of unplanned downtime for businesses, and it can lead to significant financial loss as well as damage to their reputation.

The O&M team plays a critical role in the operation and maintenance of a data centre. They are responsible for ensuring that all equipment and systems are running smoothly and efficiently, and for performing regular maintenance to prevent problems from occurring.

There are several key reasons why O&M is important in a data centre:

Reliability and uptime: O&M is essential for ensuring the reliability and uptime of a data centre. By performing regular maintenance tasks, the O&M team can help to prevent equipment failures and downtime, which can have serious consequences for businesses that rely on the data centre for mission-critical processes.

Efficiency: O&M can help to improve the efficiency of a data centre by ensuring that all equipment and systems are operating at their optimal levels. This can help to reduce energy costs and improve the overall performance of the data centre.

Compliance: O&M is also important for ensuring that a data centre meets all relevant regulatory and compliance standards. This can include standards related to data security, environmental regulations, and more.

Cost savings: Proper O&M can help to reduce costs in a data centre by minimising the need for repairs and replacements, and by improving the efficiency of the data centre’s systems and equipment.

O&M Components

Physical infrastructure: The physical infrastructure of a data centre includes all of the electrical, mechanical, and HVAC systems that are used to support the operation of the data centre.

The O&M team is responsible for monitoring and maintaining these systems to ensure that they are operating safely and efficiently. This can include tasks such as inspecting and cleaning equipment, replacing worn or damaged parts, and performing regular maintenance to prevent problems from occurring.

Networking and security: The networking and security systems of a data centre are critical to the operation of the facility. The O&M team is responsible for managing and maintaining these systems, including routers, switches, firewalls, and other networking equipment.

This can include tasks such as configuring and testing network connections, implementing security measures, and monitoring network performance.

Storage and backup: The storage and backup systems of a data centre are used to store and protect the data that is processed and distributed by the facility.

The O&M team is responsible for managing and maintaining these systems, including hard drives, tape drives, and other storage media. This can include tasks such as configuring and testing backup systems, monitoring storage usage, and managing data retention policies.

Maintenance tasks: In addition to the specific tasks related to the various systems and equipment in the data centre, the O&M team is also responsible for performing regular maintenance tasks to ensure that everything is operating properly.

This can include tasks such as cleaning and replacing parts as needed and conducting regular inspections to identify potential issues before they become problems.

Emergency response: The O&M team is also responsible for responding to emergencies and disruptions to the data centre’s operation. This can include tasks such as identifying and addressing issues with equipment or systems, and coordinating with other teams to ensure that the data centre is able to continue operating smoothly.

Best Practices

The following are some of the most important best practices that support data centre operations and maintenance:

Ensure Uptime by Creating Redundancies

The biggest challenge for data centres is creating alternate pathways for networked equipment and communication channels in case of a failure. These redundancies provide a backup system that allows staff to perform maintenance or install system upgrades without any interruption to service.

Data centres are classified on a tier system, ranked from 1 to 4, which dictates how much uptime customers can expect to receive.

  • Tier 1: Has no redundancy and offers the lowest uptime guarantee, 99.671%, with about 28.8 hours of downtime expected yearly.
  • Tier 2: Includes partial redundancy for powering and cooling critical systems and offers 99.749% uptime, with about 22 hours of downtime expected yearly.
  • Tier 3: Allows maintenance to be carried out while systems remain in operation, with up to 99.982% uptime and about 1.6 hours of downtime expected yearly.
  • Tier 4: Offers full redundancy and guarantees 99.995% uptime, with only about 26.3 minutes of downtime expected yearly.
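The yearly downtime for each tier follows directly from its uptime percentage: it is the unavailable fraction of the 8,760 hours in a non-leap year. For 99.995% uptime, for example, this works out to roughly 26 minutes. The short sketch below reproduces that arithmetic for the uptime percentages listed above. The function and variable names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(uptime_percent: float) -> float:
    """Return the expected yearly downtime, in hours, for an uptime percentage."""
    return (1 - uptime_percent / 100) * HOURS_PER_YEAR


# Uptime percentages as listed for each tier in the article.
TIER_UPTIME = {
    "Tier 1": 99.671,
    "Tier 2": 99.749,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}

for tier, uptime in TIER_UPTIME.items():
    hours = annual_downtime_hours(uptime)
    print(f"{tier}: {uptime}% uptime -> {hours:.2f} h ({hours * 60:.1f} min) per year")
```

The same function can be used to sanity-check any availability figure quoted in a service level agreement before it is signed.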

Keep Indoor Climate Stable

Computers and other IT equipment require pre-defined temperatures and humidity levels in order to function properly and to protect data storage systems and software.

Environmental management systems can offer a solution by using IoT devices to monitor temperature and humidity, identify hot pockets, and raise alerts when filters within HVAC systems need replacing or cleaning.
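As a rough illustration of the monitoring described here, the sketch below checks sensor readings against allowed temperature and humidity ranges and produces alert messages for anything out of range. Everything in it is an assumption made for the example: the limits are placeholders rather than recommended values, the sensor names are invented, and a real environmental management system would take its limits from the equipment specifications and read from live IoT devices.

```python
from dataclasses import dataclass
from typing import List

# Placeholder limits (assumptions for illustration only); real limits
# should come from the equipment manufacturers' specifications.
TEMP_LIMITS_C = (18.0, 27.0)
HUMIDITY_LIMITS_PCT = (20.0, 80.0)


@dataclass
class Reading:
    sensor_id: str        # hypothetical sensor name
    temperature_c: float  # air temperature in degrees Celsius
    humidity_pct: float   # relative humidity in percent


def check_reading(reading: Reading) -> List[str]:
    """Return one alert message per limit the reading violates."""
    alerts = []
    low_t, high_t = TEMP_LIMITS_C
    if not low_t <= reading.temperature_c <= high_t:
        alerts.append(
            f"{reading.sensor_id}: temperature {reading.temperature_c:.1f} C "
            f"outside {low_t}-{high_t} C"
        )
    low_h, high_h = HUMIDITY_LIMITS_PCT
    if not low_h <= reading.humidity_pct <= high_h:
        alerts.append(
            f"{reading.sensor_id}: humidity {reading.humidity_pct:.0f}% "
            f"outside {low_h}-{high_h}%"
        )
    return alerts


# Invented sample readings: the second one represents a hot pocket.
readings = [Reading("rack-a1", 22.0, 45.0), Reading("rack-b7", 31.5, 45.0)]
for reading in readings:
    for alert in check_reading(reading):
        print(alert)
```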

Create Stronger Testing Protocols

There should be an established protocol for regularly testing software updates and any other new technology prior to deployment. Robust testing of anything new that is added to the system is critical to preventing failures.

Implement Predictive Maintenance

Data centre operations and maintenance best practices focus on the goal of continuous operations, setting up strategies to supply sufficient resources and defining roles and responsibilities.

Routine inspections and preventative maintenance tasks are often performed at pre-defined intervals to keep systems and components from failing. Smart monitoring technologies can transform these maintenance tasks by leveraging analytics.

An advanced analytics platform with machine learning capabilities can anticipate maintenance needs by identifying trends and predicting when equipment reaches the thresholds at which it will likely fail.
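To make the idea of predicting when equipment will reach a failure threshold concrete, the sketch below fits a straight line to a series of sensor readings and extrapolates to the point where the line crosses a limit. This is a deliberately simple stand-in, not the machine learning such a platform would use. The function name, the vibration readings and the threshold are all hypothetical.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    hours: Sequence[float], values: Sequence[float], threshold: float
) -> Optional[float]:
    """Estimate hours from the last reading until a rising trend hits the threshold.

    Fits a least-squares straight line to (hours, values). Returns None when
    the trend is flat or falling, because no crossing can be predicted.
    """
    n = len(hours)
    if n < 2 or n != len(values):
        raise ValueError("need at least two readings of equal length")
    mean_t = sum(hours) / n
    mean_v = sum(values) / n
    spread = sum((t - mean_t) ** 2 for t in hours)
    if spread == 0:
        raise ValueError("readings must be taken at different times")
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(hours, values)) / spread
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing_time = (threshold - intercept) / slope
    # A crossing in the past means the threshold has already been reached.
    return max(0.0, crossing_time - hours[-1])


# Hypothetical daily vibration readings (mm/s) from a cooling fan bearing.
sample_hours = [0.0, 24.0, 48.0, 72.0]
vibration = [2.0, 2.5, 3.0, 3.5]
remaining = hours_until_threshold(sample_hours, vibration, threshold=5.0)
print(f"Estimated hours until threshold: {remaining:.0f}")
```

With these sample values the trend rises by 0.5 mm/s per day, so the estimate comes out at about three days after the last reading. That is the kind of lead time that lets a team schedule maintenance instead of reacting to a failure.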

Staff

Employees play an integral role in ensuring that all systems operate at optimum levels, so the staff hired and deployed should be trained to implement best practices, and all tasks and responsibilities should be made clear to them.

Maintaining Cleaning Standards

Modern technology is averse to dust. Along with preventative maintenance, creating a clean environment within a data centre will extend the lifespan of equipment and limit downtime. Providing dust control mats at entrances, banning food and drink in sensitive areas, routinely mopping floors and keeping equipment such as diesel generators, HVAC filters and electrical systems clean all help a data centre run efficiently.

Maintain Emergency Preparedness

Even with the best infrastructure, most capable staff and smart systems, data centres can still encounter some risks. Therefore, preparing for unplanned disruptions, even though they might never occur, will ensure that employees can react to these emergencies more effectively.

Part of this involves developing detailed emergency operating procedures that help staff identify what to do when certain scenarios occur.

This kind of preparedness gives personnel advance knowledge of how to identify and isolate specific faults and then quickly restore services.

Many of these operations can be automated and monitored with the help of IoT sensors, and triggered according to analytics software when needed, while being overseen by competent and well-trained technicians.

Partner with an Expert

With all that is involved in data centre operations and maintenance, best practices play a critical role in protecting critical systems and ensuring continuous service for clients. Partnering with the right service provider makes implementing those best practices easier.

 

Source: Clean India Journal
