The HiBD team members participated in multiple events during Supercomputing '16. Slides of the presentations are available.


The HiBD team will provide Big Data computing expertise for neuroscience in an NSF BD-Spoke project.


Overview

Welcome to the High-Performance Big Data (HiBD) project, created by the Network-Based Computing Laboratory of The Ohio State University. The HiBD packages are used by more than 200 organizations in 27 countries (Current Users) to accelerate Big Data applications. As of Dec '16, more than 18,850 downloads have taken place from this project's site. The HiBD project contains the following packages:


RDMA-based Apache HBase (RDMA-HBase)

The RDMA for Apache HBase package is a derivative of Apache HBase. This package can be used to exploit the performance of modern clusters with RDMA-enabled interconnects for Big Data applications. Major features of RDMA for Apache HBase 0.9.1 are given below.

  • Based on Apache HBase 1.1.2
  • High-performance design with native InfiniBand and RoCE support at the verbs level for Apache HBase
  • Compliant with Apache HBase 1.1.2 APIs and applications
  • Easily configurable for native InfiniBand, RoCE, and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
  • On-demand connection setup

RDMA-based Apache Spark (RDMA-Spark)

The RDMA for Apache Spark package is a derivative of Apache Spark. This package can be used to exploit the performance of modern clusters with RDMA-enabled interconnects for Big Data applications. Major features of RDMA for Apache Spark 0.9.3 are given below. New features and enhancements compared to the 0.9.1 release are marked as (NEW).

  • Based on Apache Spark 1.5.1
  • High performance design with native InfiniBand and RoCE support at the verbs-level for Spark
    • RDMA-based data shuffle
    • SEDA-based shuffle architecture
    • (NEW) Support for pre-connection, on-demand connection, and connection sharing
    • Non-blocking and chunk-based data transfer
    • Off-JVM-heap buffer management
  • (NEW) RDMA support for Spark SQL
  • (NEW) Integration with HHH in RDMA for Apache Hadoop
  • Compliant with Apache Spark 1.5.1 APIs and applications
  • Easily configurable for native InfiniBand, RoCE, and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
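
The three connection strategies named in the list above (pre-connection, on-demand connection, and connection sharing) can be illustrated with a small connection cache. This is a conceptual sketch only and says nothing about how RDMA-Spark implements them: `ConnectionManager`, `fake_connect`, and the peer names are hypothetical, and a plain Python callable stands in for the real connection setup.

```python
class ConnectionManager:
    """Caches one connection per peer so callers share it (hypothetical)."""

    def __init__(self, connect, peers=(), preconnect=False):
        self._connect = connect      # callable that opens a connection to a peer
        self._connections = {}
        if preconnect:
            # Pre-connection: pay the setup cost for all known peers up front.
            for peer in peers:
                self._connections[peer] = connect(peer)

    def get(self, peer):
        # On-demand: open the connection the first time a peer is needed.
        if peer not in self._connections:
            self._connections[peer] = self._connect(peer)
        # Sharing: later callers reuse the cached connection.
        return self._connections[peer]


opened = []


def fake_connect(peer):
    """Stand-in for an expensive connection setup; records each call."""
    opened.append(peer)
    return f"conn-to-{peer}"


lazy = ConnectionManager(fake_connect)
first = lazy.get("worker-1")    # opens the connection on first use
second = lazy.get("worker-1")   # reuses the cached connection
```

In this sketch, on-demand setup avoids opening connections that are never used, while pre-connection moves the setup cost out of the first data transfer.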

RDMA-based Apache Hadoop 2.x (RDMA-Hadoop-2.x)

The RDMA for Apache Hadoop package is a derivative of Apache Hadoop. This package can be used to exploit the performance of modern clusters with RDMA-enabled interconnects for Big Data applications. Major features of RDMA for Apache Hadoop 2.x 1.1.0 are given below. New features and enhancements compared to the 1.0.0 release are marked as (NEW).

  • (NEW) Compliant with Apache Hadoop 2.7.3, Hortonworks Data Platform (HDP) 2.5.0.3, and Cloudera Distribution Including Apache Hadoop (CDH) 5.8.2 APIs and applications
  • (NEW) Based on Apache Hadoop 2.7.3
  • High-performance design with native InfiniBand and RoCE support at the verbs level for HDFS, MapReduce, and RPC components
  • (NEW) Plugin-based architecture supporting RDMA-based designs for HDFS (HHH, HHH-M, HHH-L, HHH-L-BB), MapReduce, MapReduce over Lustre, RPC, etc.
    • Plugin for Cloudera Distribution Including Apache Hadoop (CDH) (tested with 5.8.2)
    • Plugin for Apache Hadoop distribution (tested with 2.7.3)
    • Plugin for Hortonworks Data Platform (HDP) (tested with 2.5.0.3)
  • Supports deploying Hadoop with Slurm and PBS in different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre)
  • Enhanced hybrid HDFS design with in-memory and heterogeneous storage (HHH)
    • Supports four modes of operation (default, in-memory, Lustre-integrated, and Lustre-integrated with Burst Buffer)
    • Policies to efficiently utilize heterogeneous storage devices (RAM Disk, SSD, HDD, and Lustre)
    • Hybrid replication (in-memory and persistent storage) for HHH default mode
    • Memory replication (in-memory only with lazy persistence) for HHH in-memory mode
    • Lustre-based fault-tolerance for HHH Lustre-integrated mode
  • (NEW) Support for priority-based local directory selection in MapReduce Shuffle
  • High performance design of MapReduce over Lustre
    • Supports two shuffle approaches (Lustre read and RDMA)
    • Hybrid shuffle based on both shuffle approaches
  • Easily configurable for different running modes (HHH, HHH-M, HHH-L, HHH-L-BB, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB)
A complete set of features and supported platforms can be found here.

RDMA-based Apache Hadoop 1.x (RDMA-Hadoop-1.x)

The RDMA for Apache Hadoop package is a derivative of Apache Hadoop. This package can be used to exploit the performance of modern clusters with RDMA-enabled interconnects for Big Data applications. Major features of this package include:

  • Based on Apache Hadoop 1.2.1
  • Compliant with Apache Hadoop 1.2.1 APIs and applications
  • High-performance design with native InfiniBand and RoCE support at the verbs level for HDFS, MapReduce, and RPC components
  • Easily configurable for native InfiniBand, RoCE, and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)

RDMA-based Memcached (RDMA-Memcached)

The RDMA for Memcached/libMemcached package is a derivative of Memcached/libMemcached. This package can be used to exploit the performance of modern clusters with RDMA-enabled interconnects for Memcached-based applications. Major features of RDMA for Memcached/libMemcached 0.9.5 are given below. New features and enhancements compared to the 0.9.4 release are marked as (NEW).

  • Memcached server designs based on Memcached 1.4.24
    • Compliant with Memcached's new core LRU algorithm
  • Memcached client designs based on libMemcached 1.0.18
  • Compliant with libMemcached APIs and applications
  • High performance design with native InfiniBand and RoCE support at the verbs-level for Memcached Server and Client
  • High performance design of SSD-assisted hybrid memory
    • Support for enabling and disabling direct I/O for SSD read/write
  • (NEW) Non-blocking libMemcached Set/Get API extensions
    • (NEW) APIs to issue non-blocking set/get requests to the RDMA-based Memcached servers
    • (NEW) APIs to support monitoring the progress of non-blocking requests issued in an asynchronous fashion
    • (NEW) Facilitating overlap of concurrent set/get requests
  • (NEW) Support for burst-buffer mode in Lustre-integrated design of HDFS in RDMA for Apache Hadoop-2.x
  • Support for both RDMA-enhanced and socket-based Memcached clients
  • Easily configurable for native InfiniBand, RoCE and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)
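
The non-blocking Set/Get extensions listed above let a client issue requests, carry on with other work, and check for completion later. The sketch below illustrates that pattern only and is not the RDMA-Memcached API: it uses Python's standard `concurrent.futures` module with an in-memory dictionary as a stand-in server, and every name in it (`FakeServer`, `NonBlockingClient`, `set_async`, `get_async`) is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


class FakeServer:
    """In-memory stand-in for a Memcached server (hypothetical)."""

    def __init__(self, latency=0.01):
        self._store = {}
        self._latency = latency

    def set(self, key, value):
        time.sleep(self._latency)  # simulate a network round trip
        self._store[key] = value
        return True

    def get(self, key):
        time.sleep(self._latency)
        return self._store.get(key)


class NonBlockingClient:
    """Issues requests without blocking; each call returns a Future."""

    def __init__(self, server, workers=4):
        self._server = server
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def set_async(self, key, value):
        return self._pool.submit(self._server.set, key, value)

    def get_async(self, key):
        return self._pool.submit(self._server.get, key)

    def close(self):
        self._pool.shutdown(wait=True)


def demo():
    client = NonBlockingClient(FakeServer())
    # Issue several sets at once so their latencies overlap.
    sets = [client.set_async(f"k{i}", i * i) for i in range(4)]
    wait(sets)  # block once, after all set requests are in flight
    gets = [client.get_async(f"k{i}") for i in range(4)]
    # The caller could do other work here, polling f.done() to monitor progress.
    values = [f.result() for f in gets]
    client.close()
    return values
```

Calling `demo()` issues four sets concurrently, waits for them once, and then collects the four values with overlapping gets, which is the overlap of concurrent set/get requests that the feature list refers to.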

OSU HiBD-Benchmarks (OHB)

The OSU HiBD-Benchmarks project aims at developing benchmarks for evaluating Big Data middleware. The current version (0.9.2) of OHB consists of micro-benchmarks for Hadoop Distributed File System (HDFS), Memcached, HBase and Spark.

Announcements


(NEW) RDMA-Spark 0.9.3 is available. It is built with Apache Hadoop 2.7.3 and offers RDMA support for Spark SQL, integration with HHH in RDMA-Hadoop, support for the pre-connection mechanism, support for Mellanox EDR HCA, compliance with Spark 1.5.1 APIs and applications, and support for different protocols (native InfiniBand, RoCE, and IPoIB). [more]

(NEW) OSU HiBD Benchmarks (OHB) 0.9.2 with support for Spark micro-benchmarks is available.

RDMA-Apache-Hadoop-2.x 1.1.0 (based on Apache Hadoop 2.7.3) is available. It offers plugin-based designs for Apache Hadoop 2.7.3, Cloudera Distribution Including Apache Hadoop (CDH) 5.8.2, and Hortonworks Data Platform (HDP) 2.5.0.3; a hybrid HDFS design with in-memory and heterogeneous storage (HHH); a Memcached-based burst buffer for MapReduce over Lustre-integrated HDFS; and easy configuration for different running modes (HHH, HHH-M, HHH-L, HHH-L-BB, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB). [more]

Upcoming Tutorials: Accelerating Big Data Processing with Hadoop, Spark and Memcached at Supercomputing 2016 and HPCA 2017. Past tutorials were presented at Hot Interconnects 2016, Field Programmable Logic and Applications (FPL '16), IEEE Cluster 2016, ISCA 2016, CCGrid 2016, and ASPLOS 2016.

RDMA-Memcached 0.9.5 (based on Memcached 1.4.24 and libMemcached 1.0.18) is available. It offers native InfiniBand and RoCE support, a high-performance design of SSD-assisted hybrid memory, non-blocking libMemcached Set/Get API extensions, support for burst-buffer mode in the Lustre-integrated design of HDFS in RDMA for Apache Hadoop-2.x, and easy configuration for InfiniBand-RDMA, RoCE, and sockets. [more]

RDMA-HBase 0.9.1 (based on Apache HBase 1.1.2) is available. It offers a high-performance design for native InfiniBand and RoCE at the verbs level, compliance with Apache HBase 1.1.2 APIs and applications, and support for different protocols (native InfiniBand, RoCE, and IPoIB). [more]

Talk on Big Data Acceleration Presented at Hadoop Summit, Dublin, Ireland.

RDMA-based Apache Hadoop 2.x is available on SDSC Comet for XSEDE users. Visit here for more details.

High Performance Big Data Computing (HPBDC) International Workshop (HPBDC '16) was held in conjunction with IPDPS '16. Copies of Presentations Available.

HiBD in the News