Installing Software Environment Modules on HPC Systems
Follow these instructions and policies for installing and maintaining software environment modules on HPC systems.
Libraries and applications are built around the concept of 'toolchains'; at present a toolchain is defined as a specific version of a compiler and MPI library or lack thereof. Applications are typically built with only a single toolchain, whereas libraries are built with and installed for potentially multiple toolchains as necessary to accommodate ABI differences produced by different toolchains. Workflows are primarily composed of the execution of a sequence of applications which may use different tools and might be orchestrated by an application or other tool. The toolchains presently supported are:
- impi-intel
- openmpi-gcc
- comp-intel (no MPI)
- gcc (no MPI)
Loading one of the above MPI-compiler modules will also automatically load the associated compiler module (currently gcc 4.8.2 and comp-intel/13.1.3 are the recommended compilers). Certain applications may of course require alternative toolchains. If demand for additional options becomes significant, requests for additional toolchain support will be considered on a case-by-case basis.
Here are the steps for building an associated environment module for the installed mysoft software. First, create the appropriate module location:
% mkdir -p /nopt/nrel/apps/modules/candidate/modulefiles/mysoft # Use a directory and not a file.
% touch /nopt/nrel/apps/modules/candidate/modulefiles/mysoft/1.3 # Place environment module tcl code here.
% touch /nopt/nrel/apps/modules/candidate/modulefiles/mysoft/.version # If required, indicate default module in this file.
Next, edit the module file itself ("1.3" in the example). The current version of the HPC Standard Module Template is:
#%Module -*- tcl -*-
# Specify conflicts
# conflict 'appname'
# Prerequisite modules
# prereq 'appname/version....'
#################### Set top-level variables #########################
# 'Real' name of package, appears in help, display message
set PKG_NAME pkg_name
# Version number (eg v major.minor.patch)
set PKG_VERSION pkg_version
# Name string from which enviro/path variable names are constructed
# Will be similar to, but not necessarily the same as, PKG_NAME
# eg PKG_NAME-->VisIt PKG_PREFIX-->VISIT
set PKG_PREFIX pkg_prefix
# Path to the top-level package install location.
# Other enviro/path variable values constructed from this
set PKG_ROOT pkg_root
# Library name from which to construct link line
# eg PKG_LIBNAME=fftw ---> -L/usr/lib -lfftw
set PKG_LIBNAME pkg_libname
######################################################################
proc ModulesHelp { } {
global PKG_VERSION
global PKG_ROOT
global PKG_NAME
puts stdout "Build: $PKG_NAME-$PKG_VERSION"
puts stdout "URL: http://www.___________"
puts stdout "Description: ______________________"
puts stdout "For assistance contact [email protected]"
}
module-whatis "$PKG_NAME: One-line basic description"
#
# Standard install locations
#
prepend-path PATH $PKG_ROOT/bin
prepend-path MANPATH $PKG_ROOT/share/man
prepend-path INFOPATH $PKG_ROOT/share/info
prepend-path LD_LIBRARY_PATH $PKG_ROOT/lib
prepend-path LD_RUN_PATH $PKG_ROOT/lib
#
# Set environment variables for configure/build
#
##################### Top level variables ##########################
setenv ${PKG_PREFIX} "$PKG_ROOT"
setenv ${PKG_PREFIX}_ROOT "$PKG_ROOT"
setenv ${PKG_PREFIX}_DIR "$PKG_ROOT"
####################################################################
################ Template include directories #######################
# Only path names
setenv ${PKG_PREFIX}_INCLUDE "$PKG_ROOT/include"
setenv ${PKG_PREFIX}_INCLUDE_DIR "$PKG_ROOT/include"
# 'Directives'
setenv ${PKG_PREFIX}_INC "-I $PKG_ROOT/include"
####################################################################
################## Template library directories #####################
# Only path names
setenv ${PKG_PREFIX}_LIB "$PKG_ROOT/lib"
setenv ${PKG_PREFIX}_LIBDIR "$PKG_ROOT/lib"
setenv ${PKG_PREFIX}_LIBRARY_DIR "$PKG_ROOT/lib"
# 'Directives'
setenv ${PKG_PREFIX}_LD "-L$PKG_ROOT/lib"
setenv ${PKG_PREFIX}_LIBS "-L$PKG_ROOT/lib -l$PKG_LIBNAME"
####################################################################
- The tags 'pkg_name', 'pkg_version', 'pkg_prefix', 'pkg_root' and 'pkg_libname' should be replaced by the appropriate names for the library or application.
- Specify any module prerequisites and/or conflicts.
- Provide content for the URL and Description in the 'ModulesHelp' procedure.
- Give a one-line description for the 'module-whatis' line. This should reflect pkg_name, pkg_version, and the toolchain used to build.
- If needed, augment or edit the pre-defined environment settings. (Note: the default procedure is to use 'prepend-path' to permit bottom-up construction of an environment stack.)
- The ${PKG_PREFIX}_LIBS variable could require manual changes by developers due to non-standard and/or multiple library names.
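The tag replacement described in the notes above can be scripted. The following is a minimal sketch, not part of the hpc-devel repository: the function name fill_template is hypothetical, and it assumes the template keeps its tags on "set" lines exactly as shown above and that the supplied values contain no '|', '&' or '\' characters.

```shell
# Hypothetical helper (an illustration, not the maintained tooling).
# Fills the five lowercase placeholder tags on the template's "set" lines
# and writes the resulting modulefile text to stdout.
# Arguments: $1 template file, $2 pkg_name, $3 pkg_version, $4 pkg_prefix,
#            $5 pkg_root, $6 pkg_libname
# Assumption: values contain no '|', '&' or '\', which sed treats specially.
fill_template() {
    # '|' is used as the sed delimiter so slashes in pkg_root need no escaping.
    sed \
        -e "s|^\(set[[:space:]]\{1,\}PKG_NAME[[:space:]]\{1,\}\)pkg_name|\1$2|" \
        -e "s|^\(set[[:space:]]\{1,\}PKG_VERSION[[:space:]]\{1,\}\)pkg_version|\1$3|" \
        -e "s|^\(set[[:space:]]\{1,\}PKG_PREFIX[[:space:]]\{1,\}\)pkg_prefix|\1$4|" \
        -e "s|^\(set[[:space:]]\{1,\}PKG_ROOT[[:space:]]\{1,\}\)pkg_root|\1$5|" \
        -e "s|^\(set[[:space:]]\{1,\}PKG_LIBNAME[[:space:]]\{1,\}\)pkg_libname|\1$6|" \
        "$1"
}
```

For example, fill_template modTemplate mysoft 1.3 MYSOFT /path/to/mysoft/1.3 mysoft > 1.3 would produce a starting modulefile (the install path here is a placeholder). The URL, description, conflicts, prerequisites and the ${PKG_PREFIX}_LIBS value still need manual review.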
The current module file template is maintained in a version control repo at [email protected]:hpc/hpc-devel.git. The template file is located in hpc-devel/modules/modTemplate. To see the current file:
git clone [email protected]:hpc/hpc-devel.git
cd ./hpc-devel/modules/
cat modTemplate
Next, specify a default version of the module package. Here is an example of an associated .version file for a set of module files:
% cat /nopt/nrel/apps/modules/candidate/modulefiles/mysoft/.version
#%Module#########################################
# vim: syntax=tcl
set ModulesVersion "1.3"
The .version file is only useful if multiple versions of the software are installed. Where necessary, have the modulefile print notes to stderr so that users know how to use the software correctly and where to find additional pointers.
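Writing the .version file can be scripted in the same way. This is a minimal sketch with a hypothetical function name (write_default_version); it reproduces the three-line layout of the example above and makes no attempt to preserve an existing file.

```shell
# Hypothetical helper: write a .version file that marks one version as default.
# Arguments: $1 package directory inside a modulefiles tree,
#            $2 default version string (e.g. 1.3)
write_default_version() {
    # "%%" prints a literal "%"; the first line is the "#%Module" cookie.
    printf '#%%Module\n# vim: syntax=tcl\nset ModulesVersion "%s"\n' "$2" \
        > "$1/.version"
}
```

A call such as write_default_version /nopt/nrel/apps/modules/candidate/modulefiles/mysoft 1.3 would overwrite any existing .version in that directory.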
NOTE: For modules with more than one level of sub-directory, although the default module as specified above is displayed correctly by the modules system, it is not loaded correctly—if more than one version exists, the most recent one will be loaded by default. In other words, the above will work fine for dakota/5.3.1 if 5.3.1 is a file alongside the file dakota/5.4, but not for dakota/5.3.1/openmpi-gcc when a dakota/5.4 directory is present. In this case, to force the correct default module to be loaded, a dummy symlink needs to be added in dakota/ that points to the module specified in .version.
Example
% cat /nopt/nrel/apps/modules/default/modulefiles/dakota/.version
#%Module########################################
# vim: syntax=tcl
set ModulesVersion "5.3.1/openmpi-gcc"
% module avail dakota
------------------------------------------------------------------ /nopt/nrel/apps/modules/default/modulefiles -------------------------------------------------------------------
dakota/5.3.1/impi-intel dakota/5.3.1/openmpi-epel dakota/5.3.1/openmpi-gcc(default)
dakota/5.4/openmpi-gcc dakota/default
% ls -l /nopt/nrel/apps/modules/default/modulefiles/dakota
total 8
drwxrwsr-x 2 ssides n-apps 8192 Sep 22 13:56 5.3.1
drwxrwsr-x 2 hsorense n-apps 96 Jun 19 10:17 5.4
lrwxrwxrwx 1 cchang n-apps 17 Sep 22 13:56 default -> 5.3.1/openmpi-gcc
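The dummy symlink shown in the listing above can be derived from the .version file rather than typed by hand. The sketch below is an illustration only: the function name link_default_module is hypothetical, and it assumes the .version file uses the one-line set ModulesVersion "..." form shown in the example.

```shell
# Hypothetical helper: create the "default" symlink in a package directory,
# pointing at the module named by ModulesVersion in that directory's .version.
# Argument: $1 package directory, e.g. .../modulefiles/dakota
link_default_module() {
    local pkg_dir="$1"
    local target
    # Extract the quoted value from the line: set ModulesVersion "..."
    target=$(sed -n 's/^set[[:space:]]\{1,\}ModulesVersion[[:space:]]\{1,\}"\(.*\)".*$/\1/p' \
        "$pkg_dir/.version")
    # Refuse to link when .version names nothing or the modulefile is missing.
    if [ -z "$target" ] || [ ! -e "$pkg_dir/$target" ]; then
        echo "no usable default in $pkg_dir/.version" >&2
        return 1
    fi
    # The link target is relative to pkg_dir; -n replaces an existing link
    # instead of following it.
    ln -sfn "$target" "$pkg_dir/default"
}
```

Running link_default_module /nopt/nrel/apps/modules/default/modulefiles/dakota against the layout above would recreate the default -> 5.3.1/openmpi-gcc link.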
Software which is made accessible via the modules system generally falls into one of three categories.
- Applications: these may be intended to carry out scientific calculations, or tasks like performance profiling of codes.
- Libraries: collections of header files and object code intended to be incorporated into an application at build time, and/or accessed via dynamic loading at runtime. The principal exceptions are technical communication libraries such as MPI, which are categorized as toolchain components below.
- Toolchains: compilers (e.g., Intel, GCC, PGI) and MPI libraries (OpenMPI, IntelMPI, mvapich2).
Often a package will contain both executable files and libraries. Whether it is classified as an Application or a Library depends on its primary mode of utilization. For example, although the HDF5 package contains a variety of tools for querying HDF5-format files, its primary usage is as a library which applications can use to create or access HDF5-format files. Each package can also be distinguished as a vendor- or developer-supplied binary, or a collection of source code and build components (e.g., Makefile(s)).
For pre-built applications or libraries, or for applications built from source code, the basic form of the module name should be
{package_name}/{version}
For libraries built from source, or any package containing components which can be linked against in normal usage, the name should be
{package_name}/{version}/{toolchain}
The difference arises from two considerations. For supplied binaries, the assumed vendor or developer expectation is that a package will run either on a specified Linux distribution (and may have specific requirements satisfied by the distribution), or across varied distributions (and has fairly generic requirements satisfied by most or all distributions). Thus, the toolchain for supplied binaries is implicitly supplied by the operating system. For source code applications, the user should not be directly burdened with the underlying toolchain requirement; where this is relevant (i.e., satisfying dependencies), the associated information should be available in module help output, as well as through dependency statements in the module itself.
Definitions:
{package_name}: This should be chosen such that the associated Application, Library, or Toolchain component is intuitively obvious, while concomitantly distinguishing its target from other Applications, Libraries, or Toolchain components likely to be made available on the system through the modules. So, "gaussian" is a sensible package_name, whereas "gsn" would be too generic and of unclear intent. Within these guidelines, though, there is some discretion left to the module namer.
{version}: The base version generally reflects the state of development of the underlying package, and is supplied by the developers or vendor. However, a great deal of flexibility is permitted here with respect to build options outside of the recognized {toolchain} terms. So, a Scalapack-enabled package version might be distinguished from a LAPACK-linked one by appending "-sc" to the base version, provided this is explained in the "module help" or "module show" information. {version} provides the most flexibility to the module namer.
{toolchain}: This is solely intended to track the compiler and MPI library used to build a source package. It is not intended to track the versions of these toolchain components, nor to track the use of associated toolkits (e.g., Cilk Plus) or libraries (e.g., MKL, Scalapack). As such, this term takes the form {MPI}-{compiler}, where {MPI} is one of
- openmpi
- impi (Intel MPI)
and {compiler} is one of
- gcc
- intel
- epel (which implies the gcc supplied with the OS, possibly at a newer version number than that in the base OS exposed in the filesystem without the EPEL module).
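The naming rules above can be expressed as a small check. This is a sketch under stated assumptions: both function names are hypothetical, the check covers only the form of the name and not whether a given MPI and compiler combination is actually installed, and bare compiler names (gcc, intel, epel) are accepted for builds without MPI because the hdf5 listings in this document use them.

```shell
# Hypothetical helpers illustrating the module naming rules.

# Succeeds when $1 is {MPI}-{compiler} or a bare {compiler} (no MPI).
is_toolchain_name() {
    case "$1" in
        openmpi-gcc|openmpi-intel|openmpi-epel) return 0 ;;
        impi-gcc|impi-intel|impi-epel)          return 0 ;;
        gcc|intel|epel)                         return 0 ;;  # assumption: no-MPI builds
        *)                                      return 1 ;;
    esac
}

# Prints {package_name}/{version}, or {package_name}/{version}/{toolchain}
# when a toolchain is given as the third argument.
module_name() {
    if [ -z "${3:-}" ]; then
        printf '%s/%s\n' "$1" "$2"
    elif is_toolchain_name "$3"; then
        printf '%s/%s/%s\n' "$1" "$2" "$3"
    else
        echo "unrecognized toolchain: $3" >&2
        return 1
    fi
}
```

With these definitions, module_name dakota 5.3.1 openmpi-gcc yields the three-part library form, while omitting the third argument yields the two-part form used for pre-built packages and applications.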
For general support, modulefiles can be installed in three top-level locations:
- /nopt/Modules/3.2.10/modulefiles (Modules supplied by HP, not supported by the HPC Modeling & Simulation group)
- /nopt/nrel/apps/modules (majority of modules, most common location)
- /nopt/nrel/ecom/modules (specialized modules for certain groups/applications)
In addition, more specific requests can be satisfied in two other ways:
- /projects/X/modules (Modules useful to a single project with allocated resource)
- /home/$USER/modules (Modules useful to a single user)
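Project and personal module directories are not searched unless they are added to the module search path. The usual way to do this is module use DIR; the sketch below shows the equivalent effect on MODULEPATH directly, with a hypothetical function name (use_module_dir), so the behavior can be seen without the modules system loaded.

```shell
# Hypothetical helper: prepend a modulefile directory to MODULEPATH once,
# mirroring the effect of "module use DIR".
# Argument: $1 directory containing modulefiles
use_module_dir() {
    [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
    case ":${MODULEPATH:-}:" in
        *":$1:"*) ;;                                      # already listed
        *) MODULEPATH="$1${MODULEPATH:+:$MODULEPATH}" ;;  # prepend
    esac
    export MODULEPATH
}
```

For instance, use_module_dir "/home/$USER/modules" in a shell startup file would make personal modules appear in module avail output in later sessions.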
For the '/nopt/nrel/apps' modules location (where most general installations should be made), the following sub-directories have been created:
- /nopt/nrel/apps/modules/candidate/modulefiles
- /nopt/nrel/apps/modules/default/modulefiles
- /nopt/nrel/apps/modules/deprecated/modulefiles
- /nopt/nrel/apps/modules/hpc/modulefiles
to manage how modules are developed, tested and provided for production level use. An example directory hierarchy for the module files is as follows:
[wjones@login2 nrel]$ tree -a apps/modules/default/modulefiles/hdf5-parallel
apps/modules/default/modulefiles/hdf5-parallel
├── .1.6.4
│ ├── impi-intel
│ ├── openmpi-gcc
│ └── .version
├── 1.8.11
│ ├── impi-intel
│ └── openmpi-gcc
└── .version
[wjones@login2 nrel]$ tree -a apps/modules/default/modulefiles/hdf5
apps/modules/default/modulefiles/hdf5
├── .1.6.4
│ └── intel
├── 1.8.11
│ ├── gcc
│ └── intel
└── .version
[wjones@login2 nrel]$ module avail hdf5
------------------------------------------------------- /nopt/nrel/apps/modules/default/modulefiles -------------------------------------------------------
hdf5/1.8.11/gcc hdf5-parallel/1.8.11/impi-intel(default)
hdf5/1.8.11/intel(default) hdf5-parallel/1.8.11/openmpi-gcc
This policy applies to three file paths, each corresponding to a status of modules within a broader workflow for managing modules. (The other module locations are not directly part of the policy.)
- /nopt/nrel/apps/modules/candidate/modulefiles: This is the starting point for new modules. Modules are to be created here for testing and validation prior to production release. Modules here are not necessarily expected to work without issues, and may be modified or deleted without warning.
- /nopt/nrel/apps/modules/default/modulefiles: This is the production location, visible to the general user community by default. Modules here carry the expectation of functioning properly. Movement of modulefiles into and out of this location is managed through a monthly migration process.
- /nopt/nrel/apps/modules/deprecated/modulefiles: This location contains older modules which are intended for eventual archiving. Conflicts with newer software may render these modules non-functional, and so there is not an expectation of maintenance for these. They are retained to permit smooth migration out of the HPC software stack (i.e., users will still have access to them and may register objections/issues while retaining their productivity).
"Modifications" to modules entail:
- Additions to any of the three stages;
- Major changes in functionality for modules in …/default or …/deprecated;
- Archiving modules from …/deprecated; or,
- Making a module "default".
These are the only acceptable atomic operations. Thus, a migration is defined as an addition to one path and a subsequent deletion from its original path.
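A migration in this sense (an addition followed by a deletion) can be sketched as a single function. The name migrate_module and its argument order are hypothetical; the sketch assumes the stage layout described in this document (ROOT/STAGE/modulefiles/MODULE) and does not touch .version files or default symlinks.

```shell
# Hypothetical helper: migrate a modulefile (or module directory) between
# stages as an addition to the destination, then a deletion from the source.
# Arguments: $1 modules root (e.g. /nopt/nrel/apps/modules),
#            $2 source stage, $3 destination stage,
#            $4 module path relative to modulefiles (e.g. mysoft/1.3)
migrate_module() {
    local src="$1/$2/modulefiles/$4"
    local dst="$1/$3/modulefiles/$4"
    [ -e "$src" ] || { echo "missing: $src" >&2; return 1; }
    [ ! -e "$dst" ] || { echo "already exists: $dst" >&2; return 1; }
    mkdir -p "$(dirname "$dst")" || return 1
    # Addition first; the source is removed only if the copy succeeded.
    cp -Rp "$src" "$dst" && rm -rf "$src"
}
```

For example, migrate_module /nopt/nrel/apps/modules candidate default mysoft/1.3 corresponds to a "Move to Production" step, subject to the documentation and announcement requirements of this policy.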
Announcements to users may be one of the following six options:
- Addition to …/candidate: "New Module";
- Migration from …/candidate to …/default: "Move to Production";
- Migration from …/default to …/deprecated: "Deprecate";
- Removing visibility and accessibility from …/deprecated: "Archive";
- Major change in functionality in …/default or …/deprecated: "Modify"; or,
- Make default: "Make default".
Changes outside of these options, e.g., edits in …/candidate, will not be announced, as batching these changes would inhibit our ability to respond nimbly to urgent problems.
A "major change in functionality" is an edit to the module that could severely compromise users’ productivity in the absence of adaptation on their part. So, pointing to a different application binary could result in incompatibilities in datasets generated before and after the module change; changing a module name can break workflows over thousands of jobs. On the other hand, editing inline documentation, setting an environment variable that increases performance with no side effects, or changing a dependency maintenance revision (e.g., a secondary module load of a library from v3.2.1 to v3.2.2) is unlikely to create major problems and does not need explicit attention.
All module modifications are to be documented in the Sharepoint Modules Modifications table prior to making any changes (this table is linked at http://cs.hpc.nrel.gov/modeling/hpc-sharepoint-assets).
Module modifications are to be batched for execution on monthly calendar boundaries, and (a) announced to [email protected] two weeks prior to execution, and (b) added to http://hpc.nrel.gov/users/announcements as a new page, which will auto-populate the table visible on the front page. Endeavor to make this list final prior to the first announcement.
Modules may not be added to or deleted from …/default without a corresponding deletion/addition from one of the other categories, i.e., they may only be migrated relative to …/default, not created or deleted directly.