Caffe-MPI for Deep Learning

Introduction

Caffe-MPI is developed by the AI & HPC Application R&D team of Inspur. It is a parallel version of Caffe for multi-node GPU clusters, based on NVIDIA/caffe (https://github.com/NVIDIA/caffe), which is itself forked from BVLC/caffe (for more details, visit http://caffe.berkeleyvision.org).

Features

(1) The design basics

Caffe-MPI is designed for high-density GPU clusters. The new version supports InfiniBand (IB) high-speed network connections and shared storage systems backed by a distributed file system such as NFS or GlusterFS. Each MPI process reads the training dataset in parallel. Hierarchical communication mechanisms were developed to minimize the bandwidth requirements between computing nodes: intra-node communication is done by NCCL over PCIe, while inter-node communication is done by MPI over IB. GPUDirect RDMA is supported in this version.
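The two-level communication scheme can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions: Caffe-MPI itself uses NCCL and MPI, whereas here plain lists stand in for gradient buffers. The function name `hierarchical_allreduce` and the three-stage layout (intra-node reduce, inter-node reduce among one leader per node, intra-node broadcast) are illustrative and not taken from the Caffe-MPI source.

```python
from typing import List

Vector = List[float]


def elementwise_sum(vectors: List[Vector]) -> Vector:
    """Sum equally sized vectors element by element."""
    return [sum(values) for values in zip(*vectors)]


def hierarchical_allreduce(cluster: List[List[Vector]]) -> List[List[Vector]]:
    """Simulate a two-level sum-allreduce.

    cluster[n][g] is the gradient buffer held by GPU g on node n.
    """
    # Stage 1: intra-node reduction (the role NCCL plays over PCIe).
    node_sums = [elementwise_sum(gpus) for gpus in cluster]
    # Stage 2: inter-node reduction among node leaders (the role MPI plays
    # over IB). Only one buffer per node crosses the network in this stage.
    global_sum = elementwise_sum(node_sums)
    # Stage 3: intra-node broadcast so every GPU holds the same result.
    return [[list(global_sum) for _ in gpus] for gpus in cluster]


# Hypothetical example: 2 nodes x 2 GPUs, gradients of length 3.
cluster = [
    [[1.0, 2.0, 3.0], [1.0, 1.0, 1.0]],
    [[0.5, 0.5, 0.5], [2.0, 0.0, 1.0]],
]
result = hierarchical_allreduce(cluster)
print(result[0][0])  # [4.5, 3.5, 5.5]
```

In this sketch the inter-node stage moves one buffer per node rather than one per GPU, which is the bandwidth saving the hierarchical design aims for.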

(2) High performance and high scalability

The GoogLeNet model has been tested with this new version on a GPU cluster of 4 nodes, each with 4 M40 GPUs. A 13X speedup was achieved on 16 GPUs compared with 1 GPU. The batch size used in the tests is 128, and the dataset is ImageNet.
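To put the reported figure in context, parallel scaling efficiency is commonly computed as speedup divided by the number of GPUs. The short sketch below applies that standard definition to the numbers above; the function name `scaling_efficiency` is hypothetical.

```python
def scaling_efficiency(speedup: float, num_gpus: int) -> float:
    """Return parallel efficiency: measured speedup over ideal linear speedup."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return speedup / num_gpus


# Figures reported above: 13X speedup on 16 GPUs relative to 1 GPU.
efficiency = scaling_efficiency(13.0, 16)
print(f"{efficiency:.2%}")  # 81.25%
```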

(3) Good inheritance and ease of use

Caffe-MPI retains all the features of the original Caffe architecture: the pure C++/CUDA architecture, command-line support, Python interfaces, and various programming methods. As a result, the cluster version of the Caffe framework is user-friendly, fast, modular, and open.

Try your first MPI Caffe

This program requires at least 1 MPI process.

cifar10

Run data/cifar10/get_cifar10.sh to get cifar10 data.
Run examples/cifar10/create_cifar10.sh to convert the raw data to leveldb format.
Run mpi_train_quick.sh to train the net.
Example of the mpi_train_quick.sh script:

mpirun -host node1,node2 -mca btl_openib_want_cuda_gdr 1 --mca io ompio -np 2 -npernode 1 ./build/tools/caffe train --solver=examples/cifar10/cifar10_quick_solver.prototxt --gpu=0,1,2,3
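For clusters of a different size, the same launch line can be assembled programmatically. The helper below is a hypothetical sketch, not part of Caffe-MPI: it uses only the flags that appear in the example, and it assumes that `-np` should equal the number of hosts times the processes per node (consistent with the example: 2 hosts, 1 process per node, `-np 2`).

```python
import shlex
from typing import List, Sequence


def build_mpirun_command(
    hosts: Sequence[str],
    procs_per_node: int,
    gpu_ids: Sequence[int],
    solver: str,
) -> List[str]:
    """Assemble the mpirun argument list used to launch Caffe-MPI training."""
    # Assumption: total process count = hosts x processes per node.
    total_procs = len(hosts) * procs_per_node
    return [
        "mpirun",
        "-host", ",".join(hosts),
        "-mca", "btl_openib_want_cuda_gdr", "1",  # flag copied from the example
        "--mca", "io", "ompio",                   # flag copied from the example
        "-np", str(total_procs),
        "-npernode", str(procs_per_node),
        "./build/tools/caffe", "train",
        f"--solver={solver}",
        "--gpu=" + ",".join(str(i) for i in gpu_ids),
    ]


command = build_mpirun_command(
    hosts=["node1", "node2"],
    procs_per_node=1,
    gpu_ids=[0, 1, 2, 3],
    solver="examples/cifar10/cifar10_quick_solver.prototxt",
)
command_line = shlex.join(command)
print(command_line)
```

With these arguments the printed line matches the example command; changing `hosts` or `gpu_ids` adapts it to another cluster layout.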

Ask Questions

To report bugs, please use the caffe-mpi/issues page or send us an email.
Email address: [email protected]

Author

Shaohua Wu; Shutao Song.
