Introducing Integrated Manager for Lustre software


Enterprises and institutions of all sizes use high performance computing to solve today’s most intense computing challenges. Just as compute clusters exploit parallel processors and development tools, storage solutions must be parallel to deliver sustained performance at the large scales that today’s applications require. The Lustre file system is a distributed, parallel file system designed for high performance computing.

As storage solutions continue to grow in complexity, powerful yet easy-to-use software tools to install, configure, monitor, manage, and optimize Lustre-based solutions become essential. Integrated Manager for Lustre software is purpose-built to simplify the deployment and management of Lustre-based solutions. It reduces management complexity and costs, enabling storage administrators to exploit the performance and scalability of Lustre storage and to accelerate critical applications and workflows.

Integrated Manager for Lustre software greatly simplifies the creation and management of Lustre file systems, using either the graphical user interface (GUI) or a command line interface (CLI). The GUI dashboard lets you monitor one or more distributed Lustre file systems. Real-time storage monitoring lets you track Lustre file system usage, performance metrics, events, and errors at the Lustre level. Plug-ins provided by storage solution providers enable monitoring of hardware-level performance data, disk errors and faults, and other hardware-related information.

Integrated Manager for Lustre software, when integrated with Linux, aggregates a range of storage hardware into a single Lustre file system that is well proven at delivering fast I/O to applications across high-speed network fabrics such as InfiniBand* and Ethernet. An existing Lustre file system that was set up outside of Integrated Manager for Lustre software can be monitored, but not managed, by the manager. In this case, use Lustre commands to manage the metadata and object storage servers in the file system.

Overview of Integrated Manager for Lustre software

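Lustre, the file system that Integrated Manager for Lustre software deploys and manages, uses a global single-namespace architecture that gives many clients parallel access to all the data in the file system across many servers and storage devices. Designed to take advantage of the reliability features of enterprise-class storage hardware, Integrated Manager for Lustre software supports availability features such as redundant servers with storage failover. Metadata and data are stored on separate servers so that each can be optimized for its own workload. The components of a Lustre storage system managed by Integrated Manager for Lustre software include the following: a management server (MGS), which stores file system configuration information on a management target (MGT); one or more metadata servers (MDSs), which store the namespace on metadata targets (MDTs); object storage servers (OSSs), which store file data on object storage targets (OSTs); and Lustre clients, which mount the file system and access its data over the Lustre network (LNet).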

The servers on which the MGT, MDT, or OSTs are located can all be configured as high-availability (HA) servers, so that if a server for a target fails, a standby server can continue to make the target available.
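
The failover rule in the paragraph above can be sketched as follows. This is a minimal model, assuming each target has exactly one primary and one standby server; the server names are hypothetical, and the sketch ignores fencing (STONITH) and the Pacemaker/Corosync machinery that actually moves a target.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Target:
    """A Lustre target (MGT, MDT, or OST) with a primary and a standby server."""
    name: str
    primary: str
    standby: str


def serving_server(target: Target, healthy: set[str]) -> Optional[str]:
    """Return the server that should serve the target, or None if neither is up.

    The primary serves the target while it is healthy; if it fails, the standby
    takes over. Fencing of the failed node is deliberately not modeled here.
    """
    if target.primary in healthy:
        return target.primary
    if target.standby in healthy:
        return target.standby
    return None


# Hypothetical OST served by oss1, with oss2 as its standby.
ost0 = Target("fs1-OST0000", primary="oss1", standby="oss2")
print(serving_server(ost0, healthy={"oss1", "oss2"}))  # oss1
print(serving_server(ost0, healthy={"oss2"}))          # oss2 (after failover)
```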

Lustre Configuration

Key Features

The following entries are key features of Integrated Manager for Lustre software:

GUI-based creation and management of Lustre file systems

The Integrated Manager for Lustre software provides a powerful yet easy-to-use GUI for monitoring the performance of, and managing, multiple Lustre file systems. In future releases, the GUI will support rapid Lustre file system creation, high-availability configuration, and file system expansion.

Graphical charts display real-time performance metrics

Color charts display a variety of real-time performance metrics for one or more file systems, with detailed output for individual servers and targets.

Auto-configured high-availability clustering for server pairs

Pacemaker and Corosync are configured automatically when the system design follows the documented configuration guidance. This removes the need to manually install HA configuration files on storage servers and simplifies high-availability configuration. See High-availability file system support.

PDU configuration and server outlet assignments support automatic failover

The PDU window lets you configure and manage power distribution units (PDUs). In this window, you can add a detected PDU and assign specific PDU outlets to specific servers. When you use this window to associate PDU failover outlets with servers, STONITH (shoot the other node in the head) fencing is configured automatically.

IPMI and BMC Configuration

As an alternative to PDU configuration, Integrated Manager for Lustre software supports Intelligent Platform Management Interface (IPMI) and baseboard management controllers (BMCs) for server monitoring, high-availability configuration, and failover.

Integrated Manager for Lustre client software can be installed and configured to run on Intel® Xeon Phi™ Coprocessor clients. This means that the Intel® Xeon Phi™ Coprocessor clients can directly mount Lustre.

Hierarchical Storage Management

Integrated Manager for Lustre software includes support for hierarchical storage management (HSM). HSM frees up file system storage capacity by archiving less frequently accessed files to secondary, archival storage. You can configure the HSM framework directly from the Integrated Manager for Lustre software GUI.
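
The idea of archiving less frequently accessed files can be illustrated with a small selection policy. This is a sketch only: the `FileInfo` record, the idle-days threshold, and the ordering rule are assumptions made for illustration and do not describe the HSM framework's own policy logic.

```python
import time
from dataclasses import dataclass

SECONDS_PER_DAY = 86400


@dataclass
class FileInfo:
    """Hypothetical per-file record used only by this sketch."""
    path: str
    atime: float          # last access time, seconds since the epoch
    archived: bool = False


def archive_candidates(files, min_idle_days, now=None):
    """Return paths idle for at least min_idle_days and not yet archived,
    least recently accessed first."""
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * SECONDS_PER_DAY
    idle = [f for f in files if not f.archived and f.atime <= cutoff]
    return [f.path for f in sorted(idle, key=lambda f: f.atime)]


now = 1_000_000_000
files = [
    FileInfo("/lustre/a", now - 200 * SECONDS_PER_DAY),
    FileInfo("/lustre/b", now - 10 * SECONDS_PER_DAY),
    FileInfo("/lustre/c", now - 120 * SECONDS_PER_DAY, archived=True),
    FileInfo("/lustre/d", now - 90 * SECONDS_PER_DAY),
]
print(archive_candidates(files, min_idle_days=30, now=now))  # ['/lustre/a', '/lustre/d']
```

On a real system, each selected path could then be handed to Lustre's `lfs hsm_archive` command, or the selection could be left to a policy engine such as Robinhood.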

Robinhood Policy Engine

The Robinhood policy engine has been incorporated into Lustre and is included with Integrated Manager for Lustre software. Integrated Manager for Lustre software provisions the Robinhood agent server through the manager GUI. Robinhood can be used with the HSM capabilities described above to automate HSM archiving and report generation.

Apache Hadoop* adapter software

Integrated Manager for Lustre software works with the Apache Hadoop* adapter software, which is available as a separate download. The adapter allows users who run MapReduce jobs to bypass HDFS and store MapReduce output directly on Lustre. Analytical processes can then access scientific output directly, instead of transferring data from the compute cluster’s storage system to another file system. The shuffle step in MapReduce has also been optimized to take advantage of Lustre’s high-speed network access to data. Many workloads can see a reduction in end-to-end processing time by using the Hadoop adapter with a file system managed by Integrated Manager for Lustre software. For more information, see Hadoop Adaptor for Lustre.

Automated Provisioning of Custom Lustre Service Nodes

This feature lets users create custom profiles for new Lustre client types and, based on a given profile, deploy and install custom code to provide new services. The HSM copytool described above is deployed in this way. Other services might include Samba file services.

Simplified ISO-less installation and automated deployment mechanism streamlines overall installation

The installation strategy removes the need to manually install the software on each server. Integrated Manager for Lustre software is quickly installed on the manager server while required packages are automatically deployed to all storage servers. Storage servers and the manager server can run the same standard operating system as the rest of your estate. Additional software built for CentOS or Red Hat will also work on servers managed by Integrated Manager for Lustre software.

Note: The manager server is the server where the Integrated Manager for Lustre software dashboard is installed.

Support for OpenZFS in Management Mode

Integrated Manager for Lustre software supports ZFS as an alternative back-end file system to ldiskfs. It can configure and manage high-availability Lustre storage solutions and discover and manage ZFS file systems. See Creating and Managing ZFS-based Lustre file systems.

Integrated Manager for Lustre software ZFS Snapshots

The OpenZFS file system provides integrated support for snapshots, a data protection feature that enables an operator to checkpoint a file system volume. In Integrated Manager for Lustre software, Intel® has developed a mechanism in Lustre that leverages ZFS to take a coordinated snapshot of an entire Lustre file system, if all of the storage targets in the file system are formatted using ZFS.
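
The precondition stated above, that every storage target must be formatted with ZFS before a coordinated snapshot is possible, can be expressed as a simple check. The target-to-backend mapping and the function names below are hypothetical; the sketch only validates the precondition and does not take a snapshot.

```python
from __future__ import annotations


def non_zfs_targets(targets: dict[str, str]) -> list[str]:
    """Return, sorted, the names of targets whose backend is not ZFS."""
    return sorted(name for name, backend in targets.items() if backend != "zfs")


def can_take_lustre_snapshot(targets: dict[str, str]) -> bool:
    """A coordinated snapshot is possible only if every target uses ZFS."""
    return bool(targets) and not non_zfs_targets(targets)


fs1 = {"fs1-MDT0000": "zfs", "fs1-OST0000": "zfs", "fs1-OST0001": "ldiskfs"}
print(can_take_lustre_snapshot(fs1))  # False
print(non_zfs_targets(fs1))           # ['fs1-OST0001']
```

Reporting the offending targets, rather than only a yes/no answer, tells the operator which targets prevent a file-system-wide snapshot.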

HPC Job Scheduler integration with MapReduce

Integrated Manager for Lustre software works with HPC job schedulers to integrate MapReduce; the job scheduler integration is a separate download. The HPC job scheduler integration supports Apache Hadoop. This adapter for job schedulers allows you to integrate common resource schedulers into your cluster. You can install either the SLURM (Simple Linux Utility for Resource Management) job scheduler integration or the PBS (Portable Batch System) job scheduler integration.

Hadoop commonly uses YARN to manage MapReduce jobs. Installing more than one job scheduler (such as SLURM and YARN) on a single system can cause problems. The HPC Job Scheduler integration with MapReduce replaces YARN with an interface to the main resource manager for the system. This allows MapReduce applications to be run as normal HPC jobs.

Apache Hive compatibility

Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. Intel® has tested the Hadoop adapter for Lustre provided with Integrated Manager for Lustre software for compatibility with Apache Hive version 2.3.

Apache HBase compatibility

HBase is a non-relational, distributed database modeled after Google’s BigTable and written in Java*. HBase runs on top of HDFS (Hadoop Distributed File System). Intel® has tested the Hadoop adapter for Lustre provided with Integrated Manager for Lustre software for compatibility with Apache HBase version 2.5.

Lustre 2.12.4

This release of Integrated Manager for Lustre software is based on the Intel® Foundation Edition for Lustre 2.12.4 release tree, representing a major update to the underlying Lustre version for the Integrated Manager for Lustre software (as of version 6.3.0.0).

Online Lustre File System Consistency Checks (LFSCK)

LFSCK is an administrative tool that was first introduced in Lustre software release 2.3 for checking and repairing attributes specific to a mounted Lustre file system. LFSCK is similar in concept to an offline FSCK repair tool for a local file system, but LFSCK is implemented to run as part of the Lustre file system while the file system is mounted and in use. LFSCK allows consistency checking and repair by the Lustre software without downtime, and can be run on the largest Lustre file systems with negligible disruption to normal operations.
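
LFSCK runs are started with Lustre's `lctl lfsck_start` command. The sketch below only builds the argument list for such a run; it does not execute anything. It assumes the `-M` (device), `-t` (check types), and `-A` (all targets) options and the check types listed in `VALID_TYPES`; confirm them against the Lustre manual for your release before running the command on a live system. The file system name `fs1` is hypothetical.

```python
import shlex

# Assumption: check types accepted by lctl lfsck_start -t.
VALID_TYPES = {"scrub", "layout", "namespace", "all"}


def lfsck_start_argv(fsname, mdt_index=0, types=("namespace", "layout"), all_targets=True):
    """Build (but do not run) an lctl lfsck_start command line."""
    unknown = set(types) - VALID_TYPES
    if unknown:
        raise ValueError(f"unknown LFSCK type(s): {sorted(unknown)}")
    device = f"{fsname}-MDT{mdt_index:04x}"   # Lustre target indices are 4 hex digits
    argv = ["lctl", "lfsck_start", "-M", device, "-t", ",".join(types)]
    if all_targets:
        argv.append("-A")                     # start LFSCK on all targets
    return argv


print(shlex.join(lfsck_start_argv("fs1")))
# lctl lfsck_start -M fs1-MDT0000 -t namespace,layout -A
```

Building an argument list instead of a shell string avoids quoting problems if the list is later passed to `subprocess.run`.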

Distributed Namespace

Distributed Namespace (DNE) allows Lustre metadata to be distributed across multiple metadata servers. Integrated Manager for Lustre software supports DNE phase 1 (DNE1, as of release 2.3.0.0), which allows the use of multiple MDTs. This enables the size of the Lustre namespace and metadata throughput to scale with the number of metadata servers (MDSs). GUI support for this feature will be added in future releases of Integrated Manager for Lustre software.

DNE II Striped Directories Support (Preview)

Striped directory support (Distributed Namespace, phase 2) is available in Integrated Manager for Lustre software, as of version 3.0, as a technology preview. Striped directories allow operators to shard directory entries across multiple metadata storage targets, scaling both the namespace and metadata performance.
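
Sharding directory entries means that each file name is hashed to pick the MDT stripe that holds its entry. The sketch below uses the 64-bit FNV-1a hash, one of the hash types Lustre offers for striped directories, and a simple modulo to pick the stripe; Lustre's actual name-to-stripe mapping may differ in detail, so treat this as an illustration of the technique. On a live system, striped directories are created with `lfs mkdir -c <mdt_count>`.

```python
from collections import Counter

FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


def stripe_for(name: str, stripe_count: int) -> int:
    """Pick the directory stripe (0 .. stripe_count-1) that holds this name."""
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    return fnv1a_64(name.encode("utf-8")) % stripe_count


# Spread 10,000 hypothetical file names over a directory striped across 4 MDTs.
names = [f"file{i:05d}" for i in range(10_000)]
print(Counter(stripe_for(n, 4) for n in names))
```

Because the hash depends only on the name, any client can compute which MDT holds an entry without asking a central server, which is what allows metadata operations in one directory to spread across servers.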

Single Client Metadata Concurrency

Also referred to as multi-slot last_rcvd, this update to the metadata communications interface between client and server allows multiple metadata RPCs to be in flight in parallel, per client, for both read and write transactions. Previously, any client RPCs that modified file system metadata (for example, creates or unlinks) were sent serially to the server. This update removes that restriction.
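
The difference between serial and parallel metadata-modifying RPCs can be simulated with a semaphore that caps the number of requests in flight. This is a toy model: the RPCs are `asyncio.sleep` calls standing in for network round trips, and the limit parameter is only analogous to the client-side `max_mod_rpcs_in_flight` tunable that Lustre exposes for this feature.

```python
import asyncio


async def peak_in_flight(n_rpcs: int, max_in_flight: int) -> int:
    """Issue n_rpcs simulated modifying RPCs with at most max_in_flight
    outstanding at once, and return the highest concurrency observed."""
    limit = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def one_rpc():
        nonlocal in_flight, peak
        async with limit:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for a network round trip
            in_flight -= 1

    await asyncio.gather(*(one_rpc() for _ in range(n_rpcs)))
    return peak


print(asyncio.run(peak_in_flight(16, 1)))  # 1: one modifying RPC at a time
print(asyncio.run(peak_in_flight(16, 8)))  # 8: up to eight in parallel
```

With a limit of 1 the 16 requests take roughly 16 round trips; with a limit of 8 they complete in about 2.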

Differentiated Storage Services

Differentiated Storage Services (DSS) allows I/O data to be classified, a process sometimes referred to as hinting. These hints pass through Integrated Manager for Lustre software to the storage system, which can then tier and intelligently cache the data. This makes more efficient use of cache space and decreases the likelihood that critical data is evicted when the cache fills. Intel® is working directly with storage and cache vendors to enable DSS hinting in Lustre appliances and to optimize performance for Integrated Manager for Lustre software deployments that mix SSD and traditional storage.
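
How a storage cache might use hints can be sketched as a cache that evicts the lowest-priority class first and, within a class, the least recently used block. The class names and numeric priorities are hypothetical; DSS defines its own classification scheme, and real caches are more elaborate.

```python
from collections import OrderedDict


class HintedCache:
    """LRU cache that evicts blocks from the lowest-priority class first."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # block id -> priority, oldest first

    def access(self, block: str, priority: int) -> None:
        """Record an access; a higher priority marks more important data."""
        if block in self.entries:
            self.entries.move_to_end(block)   # now the most recently used
            self.entries[block] = priority
            return
        if len(self.entries) >= self.capacity:
            self._evict()
        self.entries[block] = priority

    def _evict(self) -> None:
        lowest = min(self.entries.values())
        # The first match in iteration order is the least recently used block
        # of the lowest-priority class.
        victim = next(b for b, p in self.entries.items() if p == lowest)
        del self.entries[victim]


cache = HintedCache(capacity=3)
cache.access("metadata", priority=2)
cache.access("log", priority=0)
cache.access("scratch", priority=0)
cache.access("index", priority=2)   # cache is full: evicts "log"
print(list(cache.entries))          # ['metadata', 'scratch', 'index']
```

A plain LRU cache would have evicted "metadata" here; the hint keeps it resident.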

Support for Intel® Omni-Path Architecture

Intel® Omni-Path fabric support is available for Integrated Manager for Lustre software systems running RHEL 7.7. (Intel® OPA driver support requires RHEL 7.1 or newer, and so is not available for RHEL 6.x based systems.)

LNet Configuration

This feature helps configure LNet on a server’s network interface by setting the LNet network ID for that port. Integrated Manager for Lustre software (IML) supports configuring multiple LNet interfaces.
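
Setting the LNet network for a port determines the network identifier (NID) by which the server is reached on that interface, written as `<address>@<network>` (for example, `10.0.0.11@o2ib0`). The sketch below builds NIDs for a server with two interfaces. The interface names and addresses are hypothetical, and the sketch accepts only IPv4 addresses on `tcp` and `o2ib` networks, a deliberate simplification.

```python
from __future__ import annotations

import ipaddress
import re


def make_nid(address: str, network: str) -> str:
    """Build an LNet NID of the form <address>@<network>."""
    ipaddress.IPv4Address(address)  # raises ValueError for a malformed address
    if not re.fullmatch(r"(tcp|o2ib)\d*", network):
        raise ValueError(f"unsupported LNet network name: {network!r}")
    return f"{address}@{network}"


def server_nids(interfaces: dict[str, tuple[str, str]]) -> dict[str, str]:
    """Map each interface name to its NID on the configured LNet network."""
    return {iface: make_nid(addr, net) for iface, (addr, net) in interfaces.items()}


print(server_nids({"ib0": ("10.0.0.11", "o2ib0"), "eth1": ("192.168.1.11", "tcp1")}))
# {'ib0': '10.0.0.11@o2ib0', 'eth1': '192.168.1.11@tcp1'}
```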

Dynamic LNet Configuration

Dynamic LNet configuration (DLC) is an extension of the LNet software that simplifies system administration tasks for Lustre networking. DLC allows an operator to change the LNet configuration (for example, adding or removing network interfaces, or changing parameters) without removing and reloading the kernel modules. Because parameters can be altered while LNet is running, tuning and optimization can be performed while Lustre is still running on the target node. DLC also applies to LNet routers, so routes can be added, removed, and updated without affecting other Lustre network traffic.
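
DLC changes are made at run time with the `lnetctl` utility. The sketch below only builds command lines for adding a network interface and a route; it does not execute them. It assumes the `lnetctl net add --net ... --if ...` and `lnetctl route add --net ... --gateway ...` forms, and the network names, interface, and gateway NID are hypothetical. Running the commands requires root privileges and a loaded LNet module.

```python
import shlex


def net_add(net: str, iface: str) -> list:
    """Command line that adds an LNet network on a local interface."""
    return ["lnetctl", "net", "add", "--net", net, "--if", iface]


def route_add(remote_net: str, gateway_nid: str) -> list:
    """Command line that adds a route to remote_net through a gateway NID."""
    return ["lnetctl", "route", "add", "--net", remote_net, "--gateway", gateway_nid]


for argv in (net_add("tcp1", "eth1"), route_add("tcp2", "192.168.1.1@tcp1")):
    print(shlex.join(argv))
# lnetctl net add --net tcp1 --if eth1
# lnetctl route add --net tcp2 --gateway 192.168.1.1@tcp1
```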

Kerberos Network Authentication and Encryption

Kerberos provides authentication and authorization of participants on a computer network and can also secure communications through encryption. Integrated Manager for Lustre software applies this functionality to establish trust between Lustre servers and clients and, optionally, to support encrypted network communications.

Management mode versus Monitor-only mode

Management Mode Explained

The Integrated Manager for Lustre software lets you create and manage new HA Lustre file systems from its GUI. For each HA file system, the GUI and dashboard let you create, monitor, and manage all servers and their respective targets. The software lets you define failover servers to support HA. RAID-based fault tolerance for storage devices is implemented independently of Integrated Manager for Lustre software.

To provide robust HA support, Integrated Manager for Lustre software automatically configures Corosync and Pacemaker, and takes advantage of IPMI or PDUs to support server failover.

Note: Managed HA support requires that your entire storage system configuration and all interfaces be compliant with a pre-defined configuration. See The High Availability Configuration Spec for more details.

Note: Management mode is supported in Integrated Manager for Lustre software, versions 6.3.0.0 and later. No claims of support are made for any version of Lustre other than the one shipped with Integrated Manager for Lustre software.

Monitor-only Mode Explained

Monitor-only mode allows you to “discover” an existing Lustre file system using Integrated Manager for Lustre software and then monitor it in the Integrated Manager for Lustre software dashboard. All of the charts that the manager dashboard presents for monitoring performance and statistics are available in monitor-only mode.

Monitor-only mode can be used to establish monitoring for file systems that don’t fully conform to the High Availability Configuration Specification. In this situation, the Corosync and Pacemaker configuration modules provided with Integrated Manager for Lustre software are not automatically deployed. This means that Integrated Manager for Lustre software cannot configure the file system for server failover.

Note: RAID-based fault tolerance for storage devices is implemented independently of Integrated Manager for Lustre software.

Overview of the graphical user interface

This section provides an overview of the Integrated Manager for Lustre software GUI. For a complete description of the GUI, see Graphical User Interface.

The Integrated Manager for Lustre software GUI presents a set of intuitive windows that let you monitor and manage Lustre file systems. File system setup and configuration will be added in future GUI releases. The Dashboard window provides access to these capabilities.

Dashboard

The Dashboard displays a set of charts that provide usage and performance data at several levels in the file systems being monitored. At the top level, this window displays an aggregate view of all file systems. You can also choose to view and monitor individual file systems and servers in the Dashboard.

The following is a partial view of the Dashboard.

Dashboard

Charts

The Dashboard presents several charts that display rich visual information about the current and historical performance of each Lustre file system.

For a description of each available chart, see the overview of the Graphical User Interface.

Management menu

The Management menu provides access to several windows where you can monitor and manage file systems.

The following is a view of the Management menu:

Management menu

Logs

The Logs window displays log information and lets you filter events by date range, host, service, and source (messages from Lustre only, or from all sources). Host names in the Logs window are clickable links.
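
The combination of filters described above can be modeled as a single predicate applied to each log record. The `LogEntry` fields and the sample records are hypothetical and only mirror the filter criteria the Logs window offers; they do not reflect how the manager stores logs internally.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEntry:
    """Hypothetical log record carrying the fields the filters act on."""
    time: datetime
    host: str
    service: str
    message: str
    from_lustre: bool


def filter_logs(entries, start, end, host=None, service=None, lustre_only=False):
    """Return entries within [start, end] that match every filter that is set."""
    return [
        e for e in entries
        if start <= e.time <= end
        and (host is None or e.host == host)
        and (service is None or e.service == service)
        and (not lustre_only or e.from_lustre)
    ]


logs = [
    LogEntry(datetime(2024, 5, 1, 9, 0), "oss1", "kernel", "sample Lustre event 1", True),
    LogEntry(datetime(2024, 5, 1, 9, 5), "oss1", "sshd", "sample sshd event", False),
    LogEntry(datetime(2024, 5, 2, 14, 0), "mds1", "kernel", "sample Lustre event 2", True),
]
hits = filter_logs(logs, datetime(2024, 5, 1), datetime(2024, 5, 1, 23, 59),
                   host="oss1", lustre_only=True)
print([e.message for e in hits])  # ['sample Lustre event 1']
```

Leaving a filter at its default (`None` or `False`) disables it, so the same function covers both the "Lustre only" and "all sources" views.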

Logs

Activities

Activity messages provide information about the functioning and health of a managed file system.

Activities

The counter next to the activity icon reflects the number of active issues associated with the cluster. The color of the icon will change between green, yellow, and red according to the severity of the highest active issue.

There are five types of activity messages, each displayed in a color that represents its severity: gray for running commands, green for successfully executed commands, blue for general information, yellow for warnings, and red for errors. For more information, see the overview of the Graphical User Interface.
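
The activity icon's counter and color, as described earlier in this section, can be sketched as a function of the active issues. This sketch assumes that each active issue carries one of three severity levels and that the icon is green when no issue is active; the `Severity` names are hypothetical, not the GUI's internal representation.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity levels, ordered from least to most severe."""
    INFO = 0
    WARNING = 1
    ERROR = 2


ICON_COLOR = {Severity.INFO: "green", Severity.WARNING: "yellow", Severity.ERROR: "red"}


def activity_icon(active_issues):
    """Return (counter, icon color) for a list of active-issue severities.

    The counter is the number of active issues; the color follows the most
    severe one.
    """
    if not active_issues:
        return 0, "green"
    return len(active_issues), ICON_COLOR[max(active_issues)]


print(activity_icon([Severity.WARNING, Severity.ERROR, Severity.INFO]))  # (3, 'red')
print(activity_icon([Severity.WARNING]))                                 # (1, 'yellow')
```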

Status Page

Access the Dashboard from a smartphone or tablet

You can access the Integrated Manager for Lustre software GUI from a smartphone or tablet. Your device must be running the latest version of the Chrome or Firefox browser. To access the GUI:

  1. Point your device’s browser to the manager server running the Integrated Manager for Lustre software. The window resizes to fit the device’s screen.
  2. To view the menu bar, click the Mobile Button icon.
  3. To hide the menu bar, click the Mobile Button icon again.
