Optimizing Performance and Enhancing Functionality of Distributed Applications using Logistical Networking

 

Micah Beck, Univ. of Tennessee

Jack Dongarra, Univ. of Tennessee

James Plank, Univ. of Tennessee

Rich Wolski, Univ. of California at Santa Barbara

 

Summary

 

Logistical Networking is a new way of synthesizing networking and storage to create a communication infrastructure that provides superior control of the temporal dimension of data movement and state management for distributed applications of all kinds. With Logistical Networking technologies, we are developing the advanced network and middleware services necessary to create a storage-enabled Internet that will enhance the scalability, functionality, and performance of SciDAC’s collaborative applications.

 


Logistical Networking is the global scheduling and optimization of data movement based on a model that takes into account all of the network’s underlying physical resources, including storage and computation. The term is used by analogy with conventional logistics, which focuses on the co-scheduling of the movement, storage, and processing of military and industrial materiel. To put large quantities of sharable storage into the network in a scalable way, our approach applies the same paradigm that the Internet has used so successfully to share transmission bandwidth. Of immediate interest to SciDAC collaborators, such as the TeraScale Supernova Initiative (TSI), is the ease with which Logistical Networking facilitates the transfer of massive data sets.

 

Logistical Networking software already provided to and in use by the SciDAC community includes:

 

1.) Internet Backplane Protocol (IBP): IBP is the key enabling technology for Logistical Networking, providing a low-level mechanism for managing remote storage as a sharable network resource. Modeled on the IP datagram service that is the foundation of the Internet, IBP is designed to make it possible to infuse the network with storage resources that can be shared, scaled up, and exposed for external scheduling just as IP datagram service is. IBP allocations are lightweight and normally time-limited, so that policies set on the storage “depots” can enforce predictable time multiplexing of the resource (disk, RAM, etc.) and thereby increase its sharability. Widespread deployment of IBP depots in the ESnet community will make it possible for any SciDAC application that needs to manage data in the network to benefit from the kind of standardization, interoperability, and scalability that have made the Internet such a powerful communication tool.

 

2.) exNode: Logistical Networking is built on “end-to-end” design principles, which means that storage services with strong properties (reliability, fast access, unbounded allocation, unbounded duration, etc.) must be created in higher layers that aggregate the more primitive IBP byte arrays beneath them. To represent such an aggregation, we have created a single, generalized data structure called an external node, or exNode, which combines byte arrays held in IBP depots into a pseudo-file.

 

3.) Logistical Backbone (L-Bone): The L-Bone software provides directory and resource discovery services for deployments of IBP depots. It maintains a directory of IBP depots and metadata about those depots. IBP depots register themselves with the L-Bone, and clients may then query the L-Bone for depots that have various characteristics, including minimum storage capacity, duration policy, proximity, etc. We have one L-Bone deployment for the academic community (currently more than 8 TB), and are creating a second one just for ESnet and the SciDAC community.

 

4.) Logistical Runtime System (LoRS): The exNode provides the basis for LoRS, which is a set of tools for implementing files and other storage abstractions with strong properties, such as large size, fast access, and reliability.

 

5.) Data Movers: Data Movers are auxiliary depot modules that support customized or special-purpose depot-to-depot transfers of all kinds, including point-to-point, point-to-multipoint, multicast, and non-IP transmission. Since the movement of large data sets is of immediate interest to our SciDAC application collaborators, we are experimenting with depots equipped with high-performance, non-IP data movers for massive, long-haul transfers among remote collaborators (e.g., ORNL and CERN). Such a transfer is depicted in Figure 1.

Figure 1.  IBP depots provide a generic buffer service that facilitates the integration of heterogeneous (e.g. IP and non-IP) networks.
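
The components listed above can be illustrated with a small, self-contained sketch. The Python below is a toy in-memory model, not the IBP, exNode, L-Bone, or LoRS software itself: every class, function, and parameter name in it (Depot, Extent, ExNode, find_depots, upload, download, the distance metric, the explicit "now" clock value) is a hypothetical stand-in chosen for illustration. Under those assumptions, it shows how time-limited allocations on depots, a directory query over depot metadata, and an exNode-like mapping of file extents onto replicated byte arrays might fit together.

```python
from dataclasses import dataclass, field


@dataclass
class Depot:
    """Toy in-memory depot; a hypothetical stand-in, not the real IBP API."""
    name: str
    capacity: int        # total bytes the depot lends out
    max_lease: float     # longest allocation lifetime its policy allows (s)
    distance: float      # assumed proximity metric; smaller means closer
    _store: dict = field(default_factory=dict)   # alloc_id -> (buffer, expiry)
    _next_id: int = 0

    def expire(self, now: float) -> None:
        """Reclaim every allocation whose lease has run out."""
        self._store = {k: v for k, v in self._store.items() if v[1] > now}

    def allocate(self, size: int, lease: float, now: float) -> int:
        """Grant a time-limited byte array, or refuse on policy or space."""
        self.expire(now)
        used = sum(len(buf) for buf, _ in self._store.values())
        if lease > self.max_lease or used + size > self.capacity:
            raise ValueError(f"{self.name}: allocation refused")
        self._next_id += 1
        self._store[self._next_id] = (bytearray(size), now + lease)
        return self._next_id

    def store(self, alloc_id: int, data: bytes, now: float) -> None:
        self.expire(now)
        buf, _ = self._store[alloc_id]      # KeyError if expired or unknown
        if len(data) > len(buf):
            raise ValueError("data larger than allocation")
        buf[:len(data)] = data

    def load(self, alloc_id: int, now: float) -> bytes:
        self.expire(now)
        return bytes(self._store[alloc_id][0])   # KeyError if expired


def find_depots(directory, min_capacity, min_lease):
    """Directory-style query: qualifying depots, closest first."""
    hits = [d for d in directory
            if d.capacity >= min_capacity and d.max_lease >= min_lease]
    return sorted(hits, key=lambda d: d.distance)


@dataclass
class Extent:
    """One byte range of the pseudo-file, held in one depot allocation."""
    offset: int
    length: int
    depot: Depot
    alloc_id: int


@dataclass
class ExNode:
    """Aggregates extents (possibly replicated) into a pseudo-file."""
    extents: list = field(default_factory=list)


def upload(data, depots, chunk, lease, now, copies=2):
    """Stripe data across depots in chunks, keeping `copies` replicas of each."""
    node = ExNode()
    for i, offset in enumerate(range(0, len(data), chunk)):
        piece = data[offset:offset + chunk]
        for c in range(copies):
            depot = depots[(i + c) % len(depots)]
            alloc_id = depot.allocate(len(piece), lease, now)
            depot.store(alloc_id, piece, now)
            node.extents.append(Extent(offset, len(piece), depot, alloc_id))
    return node


def download(node, now):
    """Reassemble the pseudo-file from whichever replicas are still alive."""
    pieces = {}
    for ext in node.extents:
        if ext.offset in pieces:
            continue                      # this range is already recovered
        try:
            pieces[ext.offset] = ext.depot.load(ext.alloc_id, now)
        except KeyError:
            pass                          # replica gone; try the next one
    size = max((e.offset + e.length for e in node.extents), default=0)
    data = b"".join(pieces[o] for o in sorted(pieces))
    if len(data) != size:
        raise OSError("some extents are no longer available")
    return data


directory = [
    Depot("depot-a", capacity=1 << 20, max_lease=3600, distance=5.0),
    Depot("depot-b", capacity=1 << 10, max_lease=3600, distance=1.0),
    Depot("depot-c", capacity=1 << 20, max_lease=60, distance=2.0),
    Depot("depot-d", capacity=1 << 20, max_lease=7200, distance=3.0),
]
chosen = find_depots(directory, min_capacity=4096, min_lease=600)
payload = bytes(range(256)) * 40          # 10,240 bytes of sample data
node = upload(payload, chosen, chunk=4096, lease=600, now=0.0)
restored = download(node, now=100.0)      # leases are still valid at t=100
```

In this sketch the query keeps only the two depots whose capacity and lease policy satisfy the request, and the replicas placed on both let download succeed even if one depot loses its contents. Once every lease has expired the data is simply gone, which is why stronger properties such as long duration would have to be built in a higher layer, for example by refreshing or re-replicating allocations before they expire.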

 

SciDAC Collaborations: We are currently working directly with members of the TSI community to address their data movement problems. As in several other SciDAC projects, the inability to move massive data sets easily within the community is a serious bottleneck to the pace of research. TSI participants are installing Logistical Networking software and are already beginning to see its benefits. Similar work has also been initiated with the Particle Physics Data Grid and Lattice Quantum Chromodynamics communities.

 

Next twelve months:

· Integration with HRM (Hierarchical Resource Manager).

· Deployment of depots across the entire TSI community and at other ESnet locations.

· Testing of Data Movers with Net100/FAST and Fiber Channel over TCP.

· Experimentation with overlay multicast.

· Experimentation with TCP gateways.

· Work with 1 to 10 TB datasets.

· Deployment at CERN, StarLight, and SURFnet, with 10G connections.