Building the System – The High Availability Configuration Spec

A high-availability Lustre file system managed by Integrated Manager for Lustre software requires that your entire storage system configuration and all interfaces comply with the High Availability Configuration Specification presented in this chapter.

If you are creating a Lustre file system that will use OpenZFS as the backend, see the guide Lustre Installation and Configuration using Integrated Manager for Lustre software and OpenZFS.

Overall System Configuration

The high-level configuration of an HA file system managed by Integrated Manager for Lustre software consists of the following. See Figure 1.

Note: After you have completely configured the system and installed Integrated Manager for Lustre software on the manager server, you will be ready to create the Lustre file system using the Integrated Manager for Lustre software. Note that installation consists of installing Integrated Manager for Lustre software on the manager server only. For HA file systems, the software automatically installs required packages on the file system’s servers to support HA. This avoids the need to manually install the Integrated Manager for Lustre software on storage servers and avoids possible errors.

During this physical configuration of your file system hardware, be sure to write down how servers and storage are configured so you can later assign primary and failover servers to each volume (using the Integrated Manager for Lustre software GUI). Also, keep records of how failover power control has been implemented (IPMI or PDUs) as this will be needed later.
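One way to keep these records is in a small structured form that can be checked before you assign servers in the manager GUI. The following is a minimal sketch only: the `VolumeRecord` class, its field names, and the host and volume names are hypothetical and are not part of the Integrated Manager for Lustre software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeRecord:
    """One line of site notes taken during physical configuration (hypothetical format)."""
    volume: str         # label of the volume or LUN as written down during cabling
    primary: str        # server intended to be the primary for this volume
    failover: str       # server intended to take over if the primary fails
    power_control: str  # how failover power control was implemented: "ipmi" or "pdu"


def volumes_without_distinct_failover(records: list[VolumeRecord]) -> list[str]:
    """Return the volumes whose primary and failover entries name the same server."""
    return [r.volume for r in records if r.primary == r.failover]


# Example notes for one server pair; all names are made up.
records = [
    VolumeRecord("ost0", primary="oss1", failover="oss2", power_control="ipmi"),
    VolumeRecord("ost1", primary="oss2", failover="oss1", power_control="ipmi"),
]

for volume in volumes_without_distinct_failover(records):
    print(f"{volume}: primary and failover must be different servers")
```

Keeping the notes in one place like this makes it easier to transfer the primary and failover assignments, and the power control details, when they are requested later.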

The following figure shows the high-level HA system configuration.

lustre-configuration4.png

Note: All references herein to the manager GUI refer to the Integrated Manager for Lustre software graphical user interface.

Manager Server Requirements

The manager server is a dedicated server on which the Integrated Manager for Lustre software is installed. It is distinct from the management server (MGS). Requirements for the manager server are listed next.

Note: Before using the Red Hat or RHEL software referenced herein, please refer to Red Hat’s website for more information, including without limitation, information regarding the mitigation of potential security vulnerabilities in the Red Hat software.

Integrated Manager for Lustre software is supported on:

Management Server and Metadata Server Requirements

The management server (MGS) is configured as a failover server with the metadata server (MDS), and vice-versa, so the MGS and MDS share the same configuration requirements.

Note: The MGS is separate from the independent server running Integrated Manager for Lustre software.

The following figure depicts the configuration, interconnect requirements and targets for the MGS and MDS.

mgt_mdt_config2.png

The MDS and MGS, both independent servers, share the following requirements.

Management Target

Metadata Target

Object Storage Server and Target Requirements

The object storage server (OSS) provides access to one or more object storage targets (OSTs). There is no specific limit to the number of OSSs. For HA, each OSS must have a failover twin, which means that OSSs are deployed in pairs. Each OSS pair can provide access to up to 8 targets or LUNs. The maximum capacity for an OST is 128 terabytes. Figure 3 depicts the configuration and interconnect requirements for HA OSSs and OSTs. See the Lustre 2.12.4 File System Operations Manual, Chapter 5, Setting up a Lustre File System for more information.

oss_config.png
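The limits stated above (paired OSSs, up to 8 targets per pair, and a 128 terabyte maximum per OST) can be checked against a planned layout before hardware is cabled. The following is a minimal sketch under the assumption that you keep the plan as a simple mapping of OST names to sizes; the function name and the data layout are hypothetical.

```python
MAX_TARGETS_PER_PAIR = 8    # from this chapter: up to 8 targets or LUNs per OSS pair
MAX_OST_CAPACITY_TB = 128   # from this chapter: maximum OST capacity in terabytes


def check_oss_pair(pair: tuple[str, str], ost_sizes_tb: dict[str, float]) -> list[str]:
    """Return a list of problems for one planned OSS pair (empty if none found)."""
    problems = []
    first, second = pair
    if first == second:
        problems.append(f"{first}: an OSS cannot be its own failover twin")
    if len(ost_sizes_tb) > MAX_TARGETS_PER_PAIR:
        problems.append(
            f"{first}/{second}: {len(ost_sizes_tb)} targets exceeds {MAX_TARGETS_PER_PAIR}"
        )
    for ost, size_tb in sorted(ost_sizes_tb.items()):
        if size_tb > MAX_OST_CAPACITY_TB:
            problems.append(f"{ost}: {size_tb} TB exceeds {MAX_OST_CAPACITY_TB} TB")
    return problems


# Example plan; server names, OST names, and sizes are made up.
for problem in check_oss_pair(("oss1", "oss2"), {"ost0": 96.0, "ost1": 96.0}):
    print(problem)
```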

Object Storage Server(s) and Target(s) Configuration

Requirements for HA object storage servers and targets are as follows:

Power Control to Support Failover

High availability requires the ability to shut down a failing server so that it will not interfere with file system operations, allowing the backup (failover) server to assume its role. This control can be provided by using power distribution units (PDUs) or IPMI. To comply with this High Availability Configuration Specification, you must use either PDU control or IPMI, but not both.
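The "either PDU control or IPMI, but not both" rule can be checked against your site notes. The sketch below reads the rule as applying across all managed servers of the file system, which is an interpretation of the sentence above; the function name and the mapping format are hypothetical.

```python
def power_control_method(methods_by_server: dict[str, str]) -> str:
    """Return the single power control method in use, or raise ValueError.

    methods_by_server maps a server name to "ipmi" or "pdu" (hypothetical format).
    """
    methods = set(methods_by_server.values())
    if not methods:
        raise ValueError("no power control method recorded")
    unknown = methods - {"ipmi", "pdu"}
    if unknown:
        raise ValueError(f"unknown power control method(s): {sorted(unknown)}")
    if len(methods) > 1:
        raise ValueError("use either PDU control or IPMI, but not both")
    return methods.pop()


# Example notes; server names are made up.
print(power_control_method({"mgs1": "ipmi", "mds1": "ipmi", "oss1": "ipmi", "oss2": "ipmi"}))
```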

Intelligent Platform Management Interface

High availability requires that you configure IPMI or power distribution units to support failover. The Intelligent Platform Management Interface (IPMI) enables server failover support. For this configuration, each managed server requires an IPMI controller that connects directly to the management network via a dedicated Ethernet port. A failing HA server is automatically power-cycled and access to its target storage devices is provided by the backup server. Power-cycling the failed server forces it to relinquish control of its resources and allows administrators to troubleshoot it.

After the failed server is repaired and ready to return to service, it is not automatically brought back online as the primary server (failed back). The administrator performs fail-back manually from the Integrated Manager for Lustre software GUI.

Note: See "Issues Regarding Power Loss to the BMC or PDU" below.

After you have connected and configured IPMI, see Appendix A, IPMI Checks.
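As a quick reachability test from a host on the management network, you can ask each BMC for its chassis power state using the standard `ipmitool` utility. This sketch is not taken from Appendix A and is not necessarily the check described there. It assumes that `ipmitool` is installed, that the BMC accepts the `lanplus` interface, and that the BMC host name and user shown are placeholders for your own values. The password is passed through the `IPMI_PASSWORD` environment variable, which `ipmitool` reads when given `-E`.

```python
import os
import subprocess


def ipmi_power_status_cmd(bmc_host: str, user: str) -> list[str]:
    """Build the ipmitool command line that queries chassis power status."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host,
        "-U", user,
        "-E",  # read the password from the IPMI_PASSWORD environment variable
        "chassis", "power", "status",
    ]


def query_power_status(bmc_host: str, user: str, password: str, timeout_s: float = 10.0) -> str:
    """Run the query and return ipmitool's output; raises if the BMC cannot be reached."""
    env = dict(os.environ, IPMI_PASSWORD=password)
    result = subprocess.run(
        ipmi_power_status_cmd(bmc_host, user),
        env=env, capture_output=True, text=True, timeout=timeout_s, check=True,
    )
    return result.stdout.strip()


# Show the command that would be run; "bmc-oss1" and "admin" are placeholders.
print(" ".join(ipmi_power_status_cmd("bmc-oss1", "admin")))
```

A BMC that does not answer this query from the management network is unlikely to be usable for failover power control.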

Power Distribution Units

High availability requires that you configure IPMI or power distribution units to support failover. Power distribution units (PDUs) can be used to give an HA server control over the power supplied to its peer server. If one server of an HA pair detects the failure of its peer server, the detecting server turns off power to the PDU outlets connected to the failing server. If you choose to use PDUs for power control, be sure to note which PDUs and outlets are connected to which servers. Also, for redundancy, be sure that the primary and backup power outlets connected to each server reside on different PDUs. After configuring PDUs and noting PDU/server assignments, you will later configure these assignments on the Integrated Manager for Lustre software Power Control tab.

Note: See "Issues Regarding Power Loss to the BMC or PDU" below.
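The redundancy rule for PDU outlets can be checked against your recorded PDU/server assignments. The following is a minimal sketch, assuming the assignments are kept as a mapping from each server to a list of (PDU name, outlet number) pairs; the function name, the format, and all names in the example are hypothetical. A server with only one recorded outlet is also flagged, because it has no backup feed.

```python
def servers_lacking_pdu_redundancy(
    outlets_by_server: dict[str, list[tuple[str, int]]]
) -> list[str]:
    """Return servers whose recorded outlets do not span at least two different PDUs."""
    flagged = []
    for server, outlets in sorted(outlets_by_server.items()):
        pdus = {pdu for pdu, _outlet in outlets}
        if len(pdus) < 2:
            flagged.append(server)
    return flagged


# Example assignments; PDU names and outlet numbers are made up.
assignments = {
    "oss1": [("pdu-a", 3), ("pdu-b", 3)],  # primary and backup on different PDUs
    "oss2": [("pdu-a", 4), ("pdu-a", 5)],  # both outlets on the same PDU
}
for server in servers_lacking_pdu_redundancy(assignments):
    print(f"{server}: primary and backup outlets should be on different PDUs")
```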

Issues Regarding Power Loss to the BMC or PDU

Regarding failover, if the method of power control is not functioning (e.g., loss of power to the fencing device, misconfiguration, etc.), HA will be unable to fail the targets from the failed server to its failover server. This is because in order to complete failover, the failover server must be able to guarantee that the failed server can no longer access targets running on it. The only way to be sure this is true is to remove power from the failed server. Thus, the failover server must be able to communicate with the fencing device of the failed server for failover to occur successfully.
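The ordering described above (confirm that the failed server is powered off, and only then move its targets) can be illustrated with a short conceptual sketch. This is not the software's actual HA implementation, which this chapter does not describe. The function and its callback parameters are hypothetical stand-ins for the fencing device and for whatever starts a target on the failover server.

```python
from typing import Callable


def fail_over(
    failed_server: str,
    targets: list[str],
    power_off: Callable[[str], bool],
    start_target: Callable[[str], None],
) -> bool:
    """Move targets to the failover server only after the failed server is fenced.

    power_off should return True only when the fencing device confirms that the
    failed server no longer has power.
    """
    try:
        fenced = power_off(failed_server)
    except OSError:
        fenced = False  # fencing device unreachable, for example because it lost power
    if not fenced:
        return False  # cannot guarantee exclusive access, so the targets are not moved
    for target in targets:
        start_target(target)
    return True


# Example with stand-in callbacks; a real fencing call would talk to a BMC or PDU.
started: list[str] = []
ok = fail_over("oss1", ["ost0", "ost1"], power_off=lambda server: True,
               start_target=started.append)
print(ok, started)
```

The early return is the important part: if fencing cannot be confirmed, starting the targets elsewhere would risk two servers accessing the same storage at once.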

With IPMI, the power for each HA server and its fencing device is coupled together. Accordingly, there are more scenarios in which both may lose power at once (chassis power failure, motherboard failure, etc.). If a server suffers a chassis power failure such that the BMC is not operational, HA will be unable to fail the targets over. The remedy in this situation is to restore power to the chassis of the failed server, which restores the functionality of your file system. If HA coverage for the scenarios just described is important to you, we strongly recommend using smart PDUs rather than IPMI as your fencing device.

Power loss to a PDU will mean that HA will be unable to fail the targets over. As in the above situation, the remedy is to restore power to the PDU to restore the functionality of your file system. We recommend redundant PDUs if availability is critical.
