Optimizing Performance and Enhancing Functionality of Distributed Applications using Logistical Networking
Micah Beck, Jack Dongarra, James Plank, Rich Wolski
Communication is the foundation for all collaboration, and accordingly the growth of the kind distributed collaboration envisioned by the SciDAC Program has closely tracked the development of high-performance networking. Sustained collaboration, however, inevitably involves communication with both synchronous and asynchronous elements. While current forms of networking address the former, the latter requires uses of storage that are not part of the current network model. Applications that need significant amounts of non-archival, "working storage" for distributed state management, latency hiding, and exposed buffer management, are not well served by traditional models of file and database storage services. By providing best-effort control over substantial shared, untrusted state management resources (RAM and disk), at network intermediate nodes and at shared edge locations, our project on Logistical Networking (LoN) addresses these needs by implementing communication services that IP networking alone have not been able to provide.
As the name suggests, the goal of Logistical Networking is to bring data transmission and storage within one framework, much as military or industrial logistics treat transportation lines and storage depots as coordinate elements of one infrastructure. Achieving this goal requires a form of network storage that is globally scalable, meaning that the storage systems attached to the global data transmission network can expand beyond organizational and geographical boundaries and can grow large enough to be global in scope.
The approach to this problem we developed for SciDAC is based upon an exposed approach to computer service architecture, in which one offers client software a primitive service the semantics of which are closely based on the underlying physical infrastructure. Higher-level tools and protocols with more abstract semantics running on clients compose these primitive services to implement more sophisticated run-time algorithms for applications. In this way the infrastructure of LoN follows the exposed approach embodied in IP datagram service. LoN uses scalable storage resources in the network to model communication in both its synchronous and asynchronous aspects; its fundamental service is the movement of data between buffers on adjacent nodes.
The Internet Backplane Protocol (IBP) supplies LoN with its basic mechanism. Itimplements scalable management of storage in the network using shared physical resources. IBP is implemented by intermediate nodes, called "depots," that offer standard storage operations, including allocate, load, store and third party copy. From a networking perspective, an IBP depot can be viewed as a kind of router that exposes buffers to clients, thereby enabling the implementation of flexible network services, such as overlay multicast with explicit specification of the delivery tree by the source.
To create a shared resource fabric that exposes network storage for general use in this way, we defined a new storage stack (Figure 1), analogous to the Internet stack, using a bottom-up and layered design approach that adheres to the end-to-end principles. IBP is the lowest layer of the storage stack that is globally accessible from the network. Its design is modeled on the design of IP datagram delivery. Just as IP is a more abstract service based on link-layer datagram delivery, so IBP is a more abstract service based on blocks of data (on disk, memory, tape or other media) that are managed as "byte arrays." By masking the details of the storage at the local level - fixed block size, differing failure modes, local addressing schemes - this byte array abstraction allows a uniform IBP model to be applied to storage resources generally. The use of IP networking to access IBP storage resources creates a globally accessible storage service.
As the case of IP shows, however, in order to scale globally the service guarantees that IBP offers must be weakened, i.e. it must present a "best effort" storage service. First and foremost, this means that IBP storage allocations can be time limited. When the lease on an IBP allocation expires, the storage resource can be reused and all data structures associated with it can be deleted. Additionally an IBP allocation can be refused by a storage resource in response to over-allocation, much as routers can drop packets, and such "admission decisions" can be based on both size and duration. Enforcing time limits puts transience into storage allocation, giving it some of the fluidity of datagram forwarding; more importantly, it makes network storage far more sharable, and creates an infrastructure that can scale.
The semantics of IBP storage allocation also assume that an IBP storage resource can be transiently unavailable. Since the user of remote storage resources depends on so many uncontrolled, remote variables, it may be necessary to assume that storage can be permanently lost. Thus, IBP is a "best effort" storage service. To encourage the sharing of idle resources, IBP even supports "soft" storage allocation semantics, where allocated storage can be revoked at any time. In all cases such weak semantics mean that the level of service must be characterized statistically.
Like the higher levels of the traditional network stack, the higher levels of the network storage stack - exNode, Logistical Runtime System (LoRS), Logistical Backbone (L-bone, and the Logistical Files System - implement abstractions and services necessary to aggregate and utilize the generic best effort storage service provided by IBP. A discussion of each of these layers is integrated into the account of our work on Logistical Networking for the SciDAC research community presented below.
For further information please view our SciDAC Project Review (PDF).
This project is supported by the Department of Energy Scientific Discovery through Advanced Computing program under grant #DE-FC02-01ER25465.