What is MDTM? 

The Problem

Multicore and manycore architectures have become the norm in high-performance computing. These architectures provide advanced features that can be exploited to design and implement a new generation of high-performance data movement tools. To date, numerous efforts have been made to exploit multicore parallelism to speed up data transfers. However, existing data movement tools remain hampered by major inefficiencies when running on multicore systems, for the following reasons:

  • Existing data transfer tools are unable to fully and efficiently exploit multicore hardware under the default OS support, especially on NUMA systems.
  • The disconnect between software and multicore hardware renders network I/O processing on multicore systems inefficient.
  • On NUMA systems, the performance gap between disk and networking devices cannot be effectively narrowed or hidden under the default OS support.
  • Data transfer tools receive only best-effort handling for their process threads. There is no differentiation in service based on transfer characteristics, thread locality needs, or prioritization requirements.

These inefficiencies are fundamental, common problems that data movement tools inevitably encounter when running on multicore systems, and they ultimately result in performance bottlenecks on the end systems. Such end-system bottlenecks, in turn, impede the effective use of advanced networks. The DOE Advanced Networking Initiative (ANI) has deployed a 100-gigabit WAN testbed in support of next-generation distributed extreme-scale data movement; resolving performance issues within end hosts is therefore becoming the critical element in the end-to-end paradigm of distributed extreme-scale data movement.

These inefficiencies and limitations suggest that general-purpose OSes do not support extreme-scale data movement tools well on multicore systems. There are two options for solving this problem. First, a customized OS that fits the needs of extreme-scale data movement tools could be designed and developed. From a performance perspective, this option probably provides the best performance. However, developing and maintaining a customized OS is not only costly and time-consuming but also creates OS type and version dependencies. The second option is to develop a new middleware solution to address the problems above. The middleware would harness multicore parallelism to scale data movement toolkits on end systems and provide generic services and functions that data movement tools can call to ensure efficient resource utilization at end systems. Our research pursues the latter option.

 

A Multicore-Aware Data Transfer Middleware (MDTM)

To address these inefficiencies and limitations, DOE’s Advanced Scientific Computing Research (ASCR) office has funded Fermilab and Brookhaven National Laboratory to collaborate on the Multicore-Aware Data Transfer Middleware (MDTM) project. MDTM aims to accelerate data movement toolkits on multicore systems. Essentially, MDTM consists of two research components (Figure 1):

  • MDTM data transfer applications/tools research and development
  • MDTM middleware research and development


Figure 1: MDTM System Architecture

 

 For the MDTM project, we plan to achieve the following research goals:

  • To develop and optimize ultra-high-speed data transfer applications/tools on modern multicore systems.
  • To investigate, design, and implement generic middleware mechanisms that enable extreme-scale data movement tools to exploit multicore hardware fully and efficiently, especially on NUMA systems. The middleware would have several key features:
    • Data-transfer-centric scheduling and resource management. MDTM schedules and assigns system resources based on the needs and requirements of data transfer applications. Characteristics of individual data transfers that are useful here include the number of threads generated, their thread IDs, the number of flows handled by each thread, and the transfer source and destination. MDTM uses this information as scheduling hints and memory-placement directives, providing optimal resource provisioning for data transfer applications.
    • NUMA topology-aware scheduler. On NUMA systems, MDTM carefully pins data transfer threads to particular cores and places their memory on particular nodes. Threads belonging to the same data transfer are assigned to the same node, and data transfer threads are assigned to cores near the I/O devices they use. This allows data transfer applications to make efficient use of the underlying hardware (see the first sketch following this list).
    • Enabling efficient network I/O and disk I/O on multicore systems.
    • Supporting QoS mechanisms to allow differentiated data transfers (see the second sketch following this list).
  • To deploy, test, and comprehensively evaluate the developed middleware and applications on advanced multicore hosts and over 100 Gbps+ testbed networks.
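
The NUMA-aware placement described above can be illustrated with standard Linux interfaces. The following is a minimal sketch, not MDTM code: it assumes a Linux host with libnuma installed (link with -lnuma) and a NIC named eth0, reads the NIC's local NUMA node from sysfs, pins a transfer thread to the cores of that node, and allocates the transfer buffer on the same node.

    /* Minimal sketch (not MDTM code): pin a transfer thread to a core on the
     * NUMA node local to its NIC and place its buffer on that same node.
     * Assumes Linux, libnuma (compile with -lnuma), and a NIC named "eth0". */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <sched.h>
    #include <numa.h>

    /* Read the NUMA node a network device is attached to from sysfs. */
    static int nic_numa_node(const char *ifname)
    {
        char path[256];
        int node = 0;
        snprintf(path, sizeof(path), "/sys/class/net/%s/device/numa_node", ifname);
        FILE *f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%d", &node) != 1 || node < 0)
                node = 0;   /* -1 means "no NUMA affinity"; fall back to node 0 */
            fclose(f);
        }
        return node;
    }

    static void *transfer_thread(void *arg)
    {
        int node = *(int *)arg;

        /* Pin this thread to the cores of the NIC-local NUMA node. */
        struct bitmask *cpus = numa_allocate_cpumask();
        numa_node_to_cpus(node, cpus);
        cpu_set_t set;
        CPU_ZERO(&set);
        for (unsigned i = 0; i < cpus->size; i++)
            if (numa_bitmask_isbitset(cpus, i))
                CPU_SET(i, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
        numa_free_cpumask(cpus);

        /* Place the transfer buffer on the same node as the NIC. */
        size_t buf_size = 64 * 1024 * 1024;
        void *buf = numa_alloc_onnode(buf_size, node);

        /* ... perform network/disk I/O using buf ... */

        numa_free(buf, buf_size);
        return NULL;
    }

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not supported on this system\n");
            return 1;
        }
        int node = nic_numa_node("eth0");
        pthread_t tid;
        pthread_create(&tid, NULL, transfer_thread, &node);
        pthread_join(tid, NULL);
        return 0;
    }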
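
The QoS feature can likewise be sketched, purely for illustration and not as MDTM's actual interface, using standard POSIX scheduling: a "premium" transfer thread is moved to the SCHED_FIFO real-time class (which requires root or CAP_SYS_NICE) while another transfer keeps the default best-effort policy, so the OS scheduler treats the two differently. The priority value and thread names are illustrative assumptions.

    /* Illustrative sketch (not the MDTM QoS interface): differentiate two
     * transfer threads by scheduling policy. The "premium" transfer runs under
     * SCHED_FIFO if permitted; the other stays best-effort (SCHED_OTHER). */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <pthread.h>
    #include <sched.h>

    static void *transfer(void *name)
    {
        /* ... move data for this flow ... */
        printf("transfer %s running\n", (const char *)name);
        return NULL;
    }

    int main(void)
    {
        pthread_t premium, best_effort;
        pthread_create(&premium, NULL, transfer, "premium");
        pthread_create(&best_effort, NULL, transfer, "best-effort");

        /* Give the premium transfer a real-time class; 10 is an illustrative priority. */
        struct sched_param param;
        memset(&param, 0, sizeof(param));
        param.sched_priority = 10;
        if (pthread_setschedparam(premium, SCHED_FIFO, &param) != 0)
            fprintf(stderr, "SCHED_FIFO not permitted; premium stays best-effort\n");

        pthread_join(premium, NULL);
        pthread_join(best_effort, NULL);
        return 0;
    }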

 

Research Team

FNAL (Lead institution)

  • Liang Zhang, Email: liangz@fnal.gov
  • Lauri Loebel Carpenter, Email: lauri@fnal.gov
  • Phil DeMar, Email: demar@fnal.gov
  • Wenji Wu (PI), Email: wenji@fnal.gov

BNL

  • Shudong Jin, Email: shudong.jin@gmail.com
  • Dantong Yu (Co-PI), Email: dtyu@bnl.gov
