MPI4Spark 0.2 Features

  • Based on Apache Spark 3.3.0
  • (NEW) Support for the YARN cluster manager
  • Compliant with user-level Apache Spark APIs and packages
  • High performance design that utilizes MPI-based communication
    • Utilizes MPI point-to-point operations
    • Relies on MPI Dynamic Process Management (DPM) features for launching executor processes (see the sketch after this list)
    • (NEW) Relies on the Multiple-Program-Multiple-Data (MPMD) MPI launcher mode for launching executors when using the YARN cluster manager
  • Built on top of the MVAPICH2-J Java bindings for the MVAPICH2 family of MPI libraries
  • Tested with
    • (NEW) OSU HiBD-Benchmarks (GroupBy and SortBy)
    • (NEW) Intel HiBench Suite (Micro Benchmarks, Machine Learning, and Graph Workloads)
    • Mellanox InfiniBand adapters (EDR and HDR 100G and 200G)
    • HPC systems with Intel OPA and Cray Slingshot interconnects
    • Various multi-core platforms
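
As a rough illustration of the launching mechanism referenced above: MPI4Spark itself is built over the MVAPICH2-J Java bindings, but the two MPI features it relies on, Dynamic Process Management (DPM) and point-to-point messaging, can be sketched with mpi4py. In the MPMD launcher mode used with YARN, the driver and executors would instead be started together by the MPI launcher (e.g., mpirun -np 1 driver : -np 4 worker). Everything below (file names, messages, process counts) is illustrative and is not MPI4Spark code.

    # driver.py -- illustrative only; launch with: mpirun -np 1 python3 driver.py
    from mpi4py import MPI

    # DPM: the driver spawns worker (executor-like) processes at runtime.
    workers = MPI.COMM_SELF.Spawn("python3", args=["worker.py"], maxprocs=4)
    n = workers.Get_remote_size()

    # Point-to-point: hand each worker a (hypothetical) task and collect replies.
    for rank in range(n):
        workers.send({"task": "demo", "id": rank}, dest=rank, tag=0)
    replies = [workers.recv(source=r, tag=1) for r in range(n)]
    print(replies)
    workers.Disconnect()

    # worker.py -- each spawned process talks to the driver over the parent
    # intercommunicator.
    from mpi4py import MPI
    parent = MPI.Comm.Get_parent()
    task = parent.recv(source=0, tag=0)
    parent.send({"done": task["id"]}, dest=0, tag=1)
    parent.Disconnect()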

MPI4Dask 0.3 Features

  • (NEW) Based on Dask Distributed 2022.8.1
  • Compliant with user-level Dask APIs and packages
  • Support for MPI-based communication for GPU-based Dask applications
  • Implements point-to-point communication co-routines
  • Efficient chunking mechanism implemented for large messages
  • Built on top of mpi4py over MVAPICH2, MVAPICH2-X, and MVAPICH2-GDR libraries
  • Support for MPI-based communication for CPU-based Dask applications
  • Supports starting execution of Dask programs using Dask-MPI (see the sketch after this list)
  • Tested with
    • (NEW) Mellanox InfiniBand adapters (FDR, EDR, and HDR)
    • (NEW) Various multi-core platforms
    • (NEW) Various benchmarks used by the community (MatMul, Slicing, Sum Transpose, cuDF Merge, etc.)
    • (NEW) Scaling out to hundreds of workers (processes)
    • (NEW) NVIDIA V100 and A100 GPUs
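
A minimal sketch of the Dask-MPI start-up path mentioned above, assuming the dask_mpi and dask.distributed packages are available and the script is launched with an MPI launcher (e.g., mpirun -np 8 python3 dask_program.py); the array sizes and chunking are placeholders, and nothing here is MPI4Dask-specific code:

    from dask_mpi import initialize
    from dask.distributed import Client
    import dask.array as da

    initialize()        # rank 0 becomes the scheduler, rank 1 runs this client
                        # code, and the remaining ranks become Dask workers
    client = Client()   # connect to the scheduler started by initialize()

    # MatMul-style workload, similar in spirit to the tested benchmarks above.
    x = da.random.random((20000, 20000), chunks=(2000, 2000))
    print((x @ x.T).sum().compute())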

RDMA-Apache-Hadoop-3.x 0.9.1 Features

  • Based on Apache Hadoop 3.0.0
  • Compliant with Apache Hadoop 3.0.0 APIs and applications (see the sketch after this list)
  • Support for RDMA Device Selection
  • High performance design with native InfiniBand and RoCE support at the verbs level for HDFS component
  • Supports deploying Hadoop with Slurm and PBS in different running modes (HHH and HHH-M)
  • Easily configurable for different running modes (HHH and HHH-M) and different protocols (native InfiniBand, RoCE, and IPoIB)
  • On-demand connection setup
  • HDFS over native InfiniBand and RoCE
    • RDMA-based write
    • RDMA-based replication
    • Overlapping of different stages of write and replication
    • Enhanced hybrid HDFS design with in-memory and heterogeneous storage (HHH)
      • Supports two modes of operation
        • HHH (default) with I/O operations over RAM disk, SSD, and HDD
        • HHH-M (in-memory) with I/O operations in-memory
      • Policies to efficiently utilize heterogeneous storage devices (RAM Disk, SSD, and HDD)
        • Support for Greedy and Balanced policies
        • Automatic policy selection based on available storage types
      • Hybrid replication (in-memory and persistent storage) for HHH default mode
      • Memory replication (in-memory only with lazy persistence) for HHH-M mode
  • Tested with
    • Mellanox InfiniBand adapters (QDR, FDR, and EDR)
    • RoCE support with Mellanox adapters
    • RAM Disks, SSDs, and HDDs
    • OpenJDK and IBM JDK
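
Because the package is API-compliant with stock Apache Hadoop 3.0.0, everyday HDFS usage stays the same regardless of whether the cluster is configured for the HHH or HHH-M running mode. The sketch below simply drives the standard hdfs dfs command line from Python; the paths are placeholders:

    import subprocess

    def hdfs(*args):
        """Run a standard 'hdfs dfs' sub-command and fail loudly on error."""
        subprocess.run(["hdfs", "dfs", *args], check=True)

    hdfs("-mkdir", "-p", "/user/demo")
    hdfs("-put", "-f", "local_input.txt", "/user/demo/input.txt")
    hdfs("-cat", "/user/demo/input.txt")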

RDMA-Apache-Spark 0.9.5 Features

New features and enhancements compared to the 0.9.4 release are marked as (NEW).

  • Based on Apache Spark 2.1.0
  • (NEW) Built with Apache Hadoop 2.8.0
  • (NEW) Initial support for POWER architecture
  • (NEW) Performance optimization and tuning on OpenPOWER clusters
  • (NEW) Support for various communication thread binding policies
  • (NEW) Support for RDMA Device Selection
  • High performance design with native InfiniBand and RoCE support at the verbs-level for Spark
    • RDMA-based data shuffle
    • SEDA-based shuffle architecture
    • Support for pre-connection, on-demand connection, and connection sharing
    • Non-blocking and chunk-based data transfer
    • Off-JVM-heap buffer management
  • Compliant with Apache Spark 2.1.0 APIs and applications
  • RDMA support for Spark SQL (see the sketch after this list)
  • Integration with HHH in RDMA for Apache Hadoop
  • Easily configurable for native InfiniBand, RoCE, and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
  • Tested with
    • (NEW) Various multi-core platforms (e.g., x86, POWER)
    • (NEW) OpenJDK and IBM JDK
    • Mellanox InfiniBand adapters (DDR, QDR, FDR, and EDR)
    • RoCE support with Mellanox adapters
    • RAM Disks, SSDs, and HDDs
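
To make the compliance and Spark SQL points concrete, here is a minimal, unmodified PySpark 2.1-style application; the aggregation triggers a shuffle, which is the stage the RDMA-based shuffle design targets. Table contents and names are placeholders, and nothing here is RDMA-Spark-specific:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sql-demo").getOrCreate()

    df = spark.createDataFrame(
        [("alice", 3), ("bob", 5), ("alice", 7)], ["user", "score"])
    df.createOrReplaceTempView("scores")

    # The GROUP BY forces a shuffle, the stage targeted by the RDMA design.
    spark.sql("SELECT user, SUM(score) AS total FROM scores GROUP BY user").show()
    spark.stop()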

RDMA-Apache-Kafka 0.9.1 Features

  • Based on Apache Kafka 1.0.0
  • High performance design with native InfiniBand and RoCE support at the verbs-level for Apache Kafka
  • Compliant with Apache Kafka 1.0.0 APIs and applications (see the sketch after this list)
  • Easily configurable for native InfiniBand, RoCE, and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
  • On-demand connection setup
  • Support for RDMA Device Selection
  • Tested with
    • Mellanox InfiniBand adapters (DDR, QDR, FDR, and EDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms
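
Since RDMA-Kafka is compliant with the Apache Kafka 1.0.0 client APIs, a stock client such as kafka-python should work unchanged; the broker address and topic name below are placeholders:

    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(bootstrap_servers="broker-host:9092")
    producer.send("demo-topic", b"hello from a stock client")
    producer.flush()

    consumer = KafkaConsumer("demo-topic",
                             bootstrap_servers="broker-host:9092",
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000)
    for record in consumer:
        print(record.value)
    consumer.close()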

RDMA-Apache-Hadoop-2.x 1.3.5 Features

New features and enhancements compared to the 1.3.0 release are marked as (NEW).

  • Based on Apache Hadoop 2.8.0
  • Compliant with Apache Hadoop 2.8.0 APIs and applications (see the sketch after this list)
  • (NEW) Support for Containers (Docker and Singularity)
  • Initial support for POWER architecture
  • Performance optimization and tuning on OpenPOWER clusters
  • Support for RDMA Device Selection
  • Compliant with Hortonworks Data Platform (HDP) 2.5.0.3 and Cloudera Distribution Including Apache Hadoop (CDH) 5.8.2 APIs and applications
  • High performance design with native InfiniBand and RoCE support at the verbs level for HDFS, MapReduce, and RPC components
  • Plugin-based architecture supporting RDMA-based designs for HDFS (HHH, HHH-M, HHH-L, and HHH-L-BB), MapReduce, MapReduce over Lustre, RPC, etc.
    • Plugin for Cloudera Distribution Including Apache Hadoop (CDH) (tested with 5.8.2)
    • Plugin for Apache Hadoop distribution (tested with 2.7.3)
    • Plugin for Hortonworks Data Platform (HDP) (tested with 2.5.0.3)
  • Supports deploying Hadoop with Slurm and PBS in different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre)
  • Easily configurable for different running modes (HHH, HHH-M, HHH-L, HHH-L-BB, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB)
  • On-demand connection setup
  • HDFS over native InfiniBand and RoCE
    • RDMA-based write
    • RDMA-based replication
    • Parallel replication support
    • Overlapping of different stages of write and replication
    • Enhanced hybrid HDFS design with in-memory and heterogeneous storage (HHH)
      • Supports four modes of operation
        • HHH (default) with I/O operations over RAM disk, SSD, and HDD
        • HHH-M (in-memory) with I/O operations in-memory
        • HHH-L (Lustre-integrated) with I/O operations in local storage and Lustre
        • HHH-L-BB (Burst Buffer) with I/O operations in Memcached-based burst buffer (RDMA-based Memcached) over Lustre
      • Policies to efficiently utilize heterogeneous storage devices (RAM Disk, SSD, HDD, and Lustre)
        • Support for Greedy and Balanced policies
        • Automatic policy selection based on available storage types
      • Hybrid replication (in-memory and persistent storage) for HHH default mode
      • Memory replication (in-memory only with lazy persistence) for HHH-M mode
      • Lustre-based fault-tolerance for HHH-L mode
        • No HDFS replication
        • Reduced local storage space usage
  • MapReduce over native InfiniBand and RoCE
    • RDMA-based shuffle
    • Prefetching and caching of map output
    • In-memory merge
    • Advanced optimization in overlapping
      • map, shuffle, and merge
      • shuffle, merge, and reduce
    • Optional disk-assisted shuffle
    • Automatic Locality-aware Shuffle
    • Optimization of in-memory spill for Maps
    • High performance design of MapReduce over Lustre
      • Supports two shuffle approaches
        • Lustre read based shuffle
        • RDMA based shuffle
      • Hybrid shuffle based on both shuffle approaches
        • Configurable distribution support
      • In-memory merge and overlapping of different phases
  • Support for priority-based local directory selection in MapReduce Shuffle
  • RPC over native InfiniBand and RoCE
    • JVM-bypassed buffer management
    • RDMA or send/recv based adaptive communication
    • Intelligent buffer allocation and adjustment for serialization
  • Tested with
    • Mellanox InfiniBand adapters (DDR, QDR, FDR, and EDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms (e.g., x86, POWER)
    • RAM Disks, SSDs, HDDs, and Lustre
    • OpenJDK and IBM JDK
    • (NEW) Docker 18.03.0-ce
    • (NEW) Singularity 2.4.6
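
Because the package is compliant with stock Apache Hadoop 2.8.0 APIs and applications, an ordinary Hadoop Streaming job runs unchanged, and its shuffle traverses the RDMA-based shuffle path described above. The word-count mapper and reducer below are a generic illustration (not part of this package); the jar path and HDFS paths in the launch command are placeholders that vary by installation:

    # mapper.py -- emit (word, 1) pairs
    import sys
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    # reducer.py -- input arrives grouped and sorted by key, i.e. after the shuffle
    import sys
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

    # Launch (paths are placeholders):
    #   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    #       -files mapper.py,reducer.py \
    #       -mapper "python3 mapper.py" -reducer "python3 reducer.py" \
    #       -input /user/demo/in -output /user/demo/out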

RDMA-Memcached 0.9.6 Features

New features and enhancements compared to the 0.9.5 release are marked as (NEW).

  • (NEW) Based on Memcached 1.5.3
    • (NEW) Compliant with Memcached’s new item chaining feature in in-memory mode
    • (NEW) Compliant with Memcached’s latest LRU maintainer and slab balancer enhancements
  • Based on libMemcached 1.0.18
  • High performance design with native InfiniBand and RoCE support at the verbs-level for Memcached Server and Client
  • High performance design of SSD-assisted hybrid memory
  • (NEW) Runtime selection of HCA device for nodes equipped with multiple InfiniBand/RoCE HCAs
  • (NEW) Enable and disable item chaining through extended server options
  • Compliant with libMemcached 1.0.18 APIs and applications
  • Non-blocking libMemcached Set/Get API extensions
    • APIs to issue non-blocking set/get requests to the RDMA-based Memcached servers
    • APIs to support monitoring the progress of non-blocking requests issued in an asynchronous fashion
    • Facilitating overlap of concurrent set/get requests
  • Support for burst-buffer mode in the Lustre-integrated design of HDFS in RDMA for Apache Hadoop-2.x
  • Support for both RDMA-enhanced and socket-based Memcached clients (see the sketch after this list)
  • Easily configurable for native InfiniBand, RoCE and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
  • On-demand connection setup
  • Tested with
    • Native Verbs-level support with Mellanox InfiniBand adapters (QDR, FDR, and EDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms
    • SATA-SSD, PCIe-SSD, and NVMe-SSD
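
As a quick illustration of the socket-based client path noted above: a stock client such as pymemcache should work unchanged against an RDMA-Memcached server, since the standard Memcached protocol remains supported. The non-blocking set/get extensions are libMemcached C API additions and are not shown here; the host name, key, and value are placeholders:

    from pymemcache.client.base import Client

    client = Client(("memcached-host", 11211))
    client.set("demo-key", b"demo-value", expire=300)
    print(client.get("demo-key"))
    client.close()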

RDMA-Apache-Hadoop-1.x 0.9.9 Features

  • Based on Apache Hadoop 1.2.1
  • High performance design with native InfiniBand and RoCE support at the verbs-level for HDFS, MapReduce, and RPC components
  • Compliant with Apache Hadoop 1.2.1 APIs and applications
  • Easily configurable for native InfiniBand, RoCE, and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
  • On-demand connection setup
  • HDFS over native InfiniBand and RoCE
    • RDMA-based write
    • RDMA-based replication
    • Parallel replication support
  • MapReduce over native InfiniBand and RoCE
    • RDMA-based shuffle
    • Prefetching and caching of map outputs
    • In-memory merge
    • Advanced optimization in overlapping
      • map, shuffle, and merge
      • shuffle, merge, and reduce
  • RPC over native InfiniBand and RoCE
    • JVM-bypassed buffer management
    • RDMA or send/recv based adaptive communication
    • Intelligent buffer allocation and adjustment for serialization
  • Tested with
    • Mellanox InfiniBand adapters (DDR, QDR, and FDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms
    • Different file systems with disks and SSDs

RDMA-Apache-HBase 0.9.1 Features

  • Based on Apache HBase 1.1.2
  • High performance design with native InfiniBand and RoCE support at the verbs-level for Apache HBase
  • Compliant with Apache HBase 1.1.2 APIs and applications (see the sketch after this list)
  • Easily configurable for native InfiniBand, RoCE, and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
  • On-demand connection setup
  • Tested with
    • Mellanox InfiniBand adapters (DDR, QDR, FDR, and EDR)
    • RoCE support with Mellanox adapters
    • Various multi-core platforms
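
Since RDMA-HBase is compliant with the Apache HBase 1.1.2 APIs, standard clients should work unchanged. The sketch below goes through the HBase Thrift gateway using the happybase Python client; it assumes a table named demo with a column family cf already exists, and the host name is a placeholder:

    import happybase

    connection = happybase.Connection("hbase-thrift-host")
    table = connection.table("demo")
    table.put(b"row-1", {b"cf:greeting": b"hello"})
    print(table.row(b"row-1"))
    connection.close()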

OSU HiBD-Benchmarks 0.9.3 Features

New features and enhancements compared to the 0.9.2 release are marked as (NEW).

  • Micro-benchmarks for Hadoop Distributed File System (HDFS)
    • Sequential Write Latency (SWL) Benchmark
    • Sequential Read Latency (SRL) Benchmark
    • Random Read Latency (RRL) Benchmark
    • Sequential Write Throughput (SWT) Benchmark
    • Sequential Read Throughput (SRT) Benchmark
    • Support for benchmarking
      • Apache Hadoop 1.x and 2.x HDFS
      • Hortonworks Data Platform (HDP) HDFS
      • Cloudera Distribution Including Apache Hadoop (CDH) HDFS
  • Micro-benchmarks for Memcached
    • Get Latency Benchmark
    • Set Latency Benchmark
    • Mixed Get/Set Latency Benchmark
    • Non-Blocking API Latency Benchmark
    • Hybrid Memory Latency Benchmark
    • (NEW) Yahoo! Cloud Serving Benchmark (YCSB) Extension for RDMA-Memcached
  • Micro-benchmarks for HBase
    • Get Latency Benchmark
    • Put Latency Benchmark
  • Micro-benchmarks for Spark (see the sketch after this list)
    • GroupBy
    • SortBy
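
For reference, the GroupBy and SortBy micro-benchmarks above exercise RDD operations of the following kind. This is only a minimal PySpark sketch of the measured pattern, not the benchmark code itself; data size and partition count are placeholders:

    import random
    from pyspark import SparkContext

    sc = SparkContext(appName="groupby-sortby-sketch")
    pairs = sc.parallelize(
        [(random.randint(0, 999), random.random()) for _ in range(100000)],
        numSlices=64)

    print(pairs.groupByKey().mapValues(len).take(5))   # GroupBy pattern
    print(pairs.sortByKey().take(5))                   # SortBy pattern
    sc.stop()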