DOE/MICS Yearly Project Report: May 14, 2002
Project Title: Optimizing Performance and Enhancing Functionality of Distributed
Applications Using Logistical Networks (DE-FC02-01ER25465)
Project Type: SciDAC
PI: Institution: Micah Beck (lead), University of Tennessee, Knoxville
Jack Dongarra, University of Tennessee Knoxville
James S. Plank, University of Tennessee, Knoxville
Rich Wolski, University of California at Santa Barbara
1. Executive summary
The multi-threaded research project in Logistical Networking for DOE's SciDAC
project aims to create advanced, storage-enabled network services that can provide
reliable, fast, flexible, scalable, and efficient delivery of data to support
distributed and high performance applications of all types.
During the most recent period, meaningful progress toward this goal was made
on all key fronts - research and development, publication and dissemination,
and planning and interaction for productive collaboration with the SciDAC research
community. Highlights include the following:
- Released version 1.2 of the IBP depot, which has improved reliability and includes
the Data Mover plug-in feature to support non-IP data transport, "soft"
storage allocations to promote use free disk space for depots, and a resource
abstraction layer to support memory-only depots for applications that require
very low latency.
- Redesigned the exNode data structure to reflect knowledge gained from early
experimentation and have reimplemented the exNode tools using the new data structure
and including performance enhancements such as multi-threaded download. A Win32
version of the exNode tools was also released.
- Released new beta versions of the Logistical Backbone (L-bone) client and server
software. The full distribution now allows users to set up directory and resource
discovery services for private deployments of depot infrastructure.
- Created and showed (Spring Internet2 meeting) an advanced video IBPster demonstration
that uses our logistical tools to duplicate the kind of advanced delivery of
video otherwise available only with expensive streaming video servers. This
demo with further enhancements will be presented iGrid2002 in Amsterdam in September
and SC2002 in Baltimore.
- Had three new papers accepted for publication, most notable "An End-to-end
Approach to Globally Scalable Network Storage" to be presented at ACM Sigcomm
2002 in August.
- Began collaboration with JLAB team to integrate our Logistical Networking technologies
JLAB middleware support for Particle Physics Data Grids (PPDG) and Lattice Quantum
Chromo-Dynamics (LQCD). Concrete steps for integrating IBP and exNode technologies
with JLab's Storage Resource Manager (SRM) and Reliable Transport Service have
been specified and more expansive plans (e.g. a parallel file system in IBP
to support high performance LQCD computations on clusters) are under development.
A testbed for this work will be deployed at JLab, Univ. of TN, ORNL, and some
JLab academic partner sites during the next 6 mos. New interactions also include
Bill Wing (ORNL), Richard Mount (SLAC), and Bruce Gibbard (BNL).
2. Current accomplishments
Research and development - Our most recent research has expanded
the effectiveness of Logistical Networking in both performance and functionality.
Below are some of our accomplishments:
- Released version 1.2 of the depot that includes the Data Mover plug-in feature,
allowing experimentation with innovative networking technologies in transferring
data between depots. This version implements "soft" storage, which
allows free space on file systems to be used for IBP depot allocations but reclaimed
on demand whenever the file system asks for it. It also improves depot robustness
by providing a mechanism to allow recovery of data after a depot has been down
for some time. Finally, version 1.2 implements a resource abstraction layer
that facilitates the implementation of the basic IBP depot technology over different
underlying physical storage resources, such as RAM or tape; this layer provides
the basis for the memory only depot described below.
- Developed and tested a version of the depot for latency-sensitive applications
that uses only memory resources, rather than disk.
- Created an improved version of the exNode data structure with a new, extensible
metadata mechanism. A version of a new exNode library in C will be in general
release next month and we will release a Java version within 6 months.
- Released beta version 0.6 of the Logistical Backbone (L-bone) client and server
software. The full distribution now allows users to set up directory and resource
discovery services for private deployments of depot infrastructure.
- Developed and began testing libraries that implement end-to-end services such
as security and integrity checking. This work is enabled by the enhanced metadata
capabilities of the new exNode datastructure.
- Released Win32 exNode tool suite, which runs under CYGWIN on Windows 2000.
Enhanced NWS resolution system for exNodes, both in terms of its interface and
its capabilities.
- Improved the performance of uploading and downloading exNode files using multithreading
and smarter scheduling (again based on performance forecasting using the Network
Weather Service). Multithreaded download is now well tested; multithreaded upload
and download will be part of the next release of exNode tools this summer.
- Created and tested a Read-Only File System, which is sufficient for limited
forms of data distribution. This is a significant first step in building a Logistical
File System, i.e. a file system with a traditional interface built on top of
time-limited network storage buffers.
- Experiment with the design of a class of tools utilizing logistical scheduling,
which are intended for released within the next 6 months.
- Developed an alpha version of the IBP protocol specification to be submitted
as an RFC to the IETF
- Submitted a detailed proposal for a tutorial on IBP and Logistical Networking
tools for SC2002.
- Created and showed (Spring Internet2 meeting) an advanced video IBPster demonstration
that uses our logistical tools to duplicate the kind of advanced delivery of
video otherwise available only with expensive streaming video servers. Further
enhancements, possibly including the use of datamover technology, are in preparation
for iGrid2002 in Amsterdam in September and SC2002 in Baltimore.
Earlier Research and Development of Logistical Networking Tools (Prior
to February 2002) -
The Internet Backplane Protocol (IBP)
- Released IBP Server version 1.1.1 for Unix: With this release the IBP
implements many new features. From the Server Sync, that allows applications
to synchronize flows of data through a depot, to increased depot handling
capacity (now 16 exabytes).
- Released IBP shell 0.1 for both Solaris and Linux. With the ibp shell,
users are able to use the full power of IBP directly from the command line,
so that they can experiment with the fastest, easiest and safest ways to
transfer data.
- Released IBP Client Library DLL 1.1.1 for Windows, allowing Win 32 users
to take advantage of the latest features of the IBP server, such as timeouts
and increased handling capacity. Linux and Solaris clients already possess
these features.
External Node (exNode)
- An initial design of the exNode data structure was completed and a corresponding
XML encoding was performed. This version was used to create an alpha version
of the exNode tools (see below), which were used for experimentation and
for the SC2001 demo described below.
- Created an Alpha version of the exNode tools built on top of the L-bone
and the exNode. We not only experimented with the design tools, we used
them successfully in the SC2001 demonstration.
Logistical Backbone (L-Bone)
- Released a new L-Bone Client (1.0beta), allowing users to query the L-bone
server for information about IBP depots on the L-Bone.
Experiments using logistical tools for distributed state management on
the Grid - We are implementing a generalized cache management service
that will support a parameterized class of storage management policies, e.g.
write-back vs. write-through, dynamic vs. preloaded, etc. for use in a numerical
software library setting. These caches will support hierarchical and cooperative
arrangements to create scalable storage management systems for Grid computing
middleware, such as NetSolve. Our experimental work focused on working with
IBP directly, in preparation for integrating exNode technology, which is expected
to mature during the next phase of the effort.
We are investigating three different perspectives to solving the same problem,
Ax = b. The focus here is on the amount of time required for the different approaches
to get the data in place and ready for the application routine PDGESV from ScaLAPACK.
The first case is for the time to allocate, generate, and load the necessary
data structures utilizing a remote memory depot -IBP. It is noteworthy that
the process is started on the user end in a serial environment where the user
gets help generating his structures into the depot and is then freed of all
knowledge of how his problem is solved. The middleware we are developing makes
an inquiry as to the available resources (NWS) and then makes a decision of
the resources to utilize based upon the state of the resources and knowledge
of the specific kernel to be invoked at the application level. Once resources
are selected, the parallel application routine is forked and waited upon by
the middleware. The root node is then responsible for bringing the data incore
from the remote depot and distributing it to the correct processes in the logical
grid. The mapping is from IBP in natural form to a block-cyclic distribution
amongst the processes. At this point the matrix solve is ready to go. Naturally,
the results have to be collected by root and returned to IBP in the natural
ordering the user anticipates. In the next case illustrates where the user is
on a local cluster where the matrix and vector coefficients are generated incore
and then asks for help from the middleware. In this case, the middleware utilizes
the local file system to map the data to files (eventually based on exNodes),
which represent the block cyclic map appropriate for the logical process grid
that was determined in the resource allocation stage. The middleware then forks
the parallel application and waits. The application does a parallel read of
the data incore and proceeds to solve the problem. On solution, processes owning
elements of the solution write these to file. The middleware assembles the solution
and returns it to the user. The expert case is also being investigated to provide
insight as to the overhead involved in relieving the user of solving his problem.
In the future the exNode will be the overarching data structure used in managing
all these activities. This will enable NetSolve to effectively solve large problems
in a Grid setting.
NWS integration - Working together, the UTK and UCSB teams developed
a proximity resolution system for exNodes and the LBONE based on the NWS. The
resolution system consists of three components: a client resolution library,
a resolution server, and the NWS network sensor. Using the library, an exNode
client can determine which of several equivalent depots (that it learns about
from the LBONE) can deliver the data most expeditiously. At SC2001, where the
networks exhibited a great deal of performance variation, NWS forecasts that
were incorporated within the client interface made it possible to make this
choice reliably. (See demo description below) The client library also made it
possible for IBP depots to resolve the performance proximity to the client.
Hence, exNode clients could become temporary "resolution sensors"
while they execute.
IBPster Demo at Internet 2 Member meeting and SC2001 - The power
of using IBP, the L-Bone, the exNode, and NWS in an integrated way has been
showcased in a demonstration called IBPster, which uses these tools together
to perform better and more flexible multimedia delivery. We call this application
IBPster. IBPster is basically a system for storing MPEG files for audio and
video as exNodes in the L-Bone and managing them using exNode-based tools. At
SC01, in both the Tennessee and NPACI booths, we presented a live demonstration
of IBPster (see figure 1) in which MP3 files were uploaded to IBP depots located
on the L-bone all over the United States (and in some cases, in Europe), striped
and replicated according to several scenarios, and then played in real time
by streaming them from the closest storage buffers. At the Internet2 Spring
member meeting we successfully demonstrated Video IBPster (http://apps.internet2.edu/html/spring02-featured-demos.html#HighBandwidthOverIBP).
NWS supplies the network monitoring and performance forecasting necessary to
find the closest buffers
2.1 Publications
M. Beck, T. Moore, and J. S. Plank, "An End-to-end Approach to Globally
Scalable Network Storage," to be presented at ACM Sigcomm 2002, Pittsburgh,
PA, August 19-23, 2002 (to appear).
K. Meyer-Patel and M. Beck, "A Logistical Networking Model for Video-On-Demand,"
to be presented at IEEE International Conference on Multimedia and Expo, Lausanne,
Switzerland, August 26-29, 2002 (to appear).
A. Bassi, M. Beck, J. Gelas, and L. Lefevre, "Logistical Storage in Active
Networking: a promising framework for network services," to be presented
at 3rd International Conference on Internet Computing (IC 2002), Las Vegas,
NV, June 24-27, 2002 (to appear).
Bassi, M. Beck, G. Fagg, T. Moore, J. Plank, M. Swany, and R. Wolski, "The
Internet Backplane Protocol: A Study in Resource Sharing," to be presented
at IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002),
Berlin, Germany, May 21-24, 2002 (to appear).
M. Beck and T. Moore, "Logistical Networking: When Institutions Peer,"
to be presented at 2nd International Workshop on Global and Peer-to-Peer Computing
on Large Scale Distributed Systems, part of CCGrid 2002, Berlin, Germany, May
21-24, 2002 (to appear).
A. Bassi, M. Beck, E. Fuentes, T. Moore, and J. S. Plank, "Logistical
Storage Resources for the Grid," in Proceedings of International Conference
on Computational Science (ICCS 2002), vol. 2330, LNCS. Amsterdam, The Netherlands:
Springer Verlag, 2002.
S. Atchley, S. Soltesz, J. S. Plank, M. Beck, and T. Moore, "Fault-Tolerance
in the Network Storage Stack," presented at Annual Workshop on Fault-Tolerant
Parallel and Distributed Systems (held in conjunction with the International
Parallel & Distributed Processing Symposium), Ft. Lauderdale, FL, USA, April
15-19, 2002.
James S. Plank, Micah Beck and Terry Moore, ``Logistical Networking Research
and the Network Storage Stack,'' USENIX FAST 2002, Conference on File and Storage
Technologies, work in progress report, January, 2002.
M. Beck, T. Moore, and J. S. Plank, "Exposed vs. Encapsulated Approaches
to Grid Service Architecture," presented at 2nd International Workshop
on Grid Computing, Denver, CO, Nov. 12, 2001.
J. S. Plank, A. Bassi, M. Beck, T. Moore, M. Swany, and R. Wolski, "Managing
Data Storage in the Network," IEEE Internet Computing, vol. 5, no. 5, pp.
50-58, September/October, 2001.
M. Beck, T. Moore, and J. S. Plank, "Scalable Sharing of Wide Area Storage
Resources," IEEE Communications (submitted).
2.2 Publications in Process
Bassi, M. Beck, J. Plank, "The Internet Backplane Protocol" to be
submitted to the 10th IEEE International Conference on Network Protocols.
J. Millar and M. Beck, "exDN: Ad Hoc Content Distribution Using Logistical
Networking." Submitted to the 7th International Workshop on Web Caching
and Content Distribution.
M. Beck and E. Fuentes, "Overlay Multicast using the Internet Backplane
Protocol" to be submitted to the Fourth International Workshop on Networked
Group Communication.
3. Future Accomplishments
3.1 Next three months:
- Deploy a JLab Storage Resource Manager (SRM) at ORNL to support their Data
Grid work, in collaboration with Bill Wing and Randy Burris of ORNL and at JLab
project partner universities.
- Visit Brookhaven National Laboratory to discuss applicability of IBP to their
data storage and transfer needs.
- Establish Grid Forum Network Storage Research Group
- Determine concrete steps for integration of Logistical Networking infrastructure
with standard Grid tools and protocols (GSI, MDS, GridFTP).
- Initial release of the Logistical Runtime System (LoRS) library providing end-to-end
services (including security, encryption, compression, reliability, performance
tuning) on top of the exNode and LoRS tools implemented using this library.
- Usability improvements to the Read Only Logistical File System to implement
active services including long-term data maintenance and fault tolerance, caching,
and content distribution, and implementing user authentication and security.
- Improve the fault-tolerance of exNodes by adding error correction in the form
of RAID parity blocks, and Reed-Solomon coding blocks.
3.2 Next six months:
- Make IBP usable as a protocol usable by the JLab Reliable Transport Service.
- Make IBP usable as one of the data storage protocols usable by the JLab SRM.
- Implement a client that can use logistical techniques for movement of data stored
in IBP at a SRM
- to a cluster for computation and
- to tape silo for long terms storage.
- Build a consistent storage image from IBP and the LBONE using NWS technologies
and statistical techniques.
- Demonstration of high bandwidth video delivery from IBP servers at iGrid 2002
and SC2002.
- Initial implementation of a high performance TCP Datamover for IBP using buffer
tuning based on Web/Net100, and of a Datamover using native UDP multicast.
- Initial investigation of a DataMover built on private IP or non-IP data transport
to serve DoE wide area data transfer needs.
- Implementation of improved access control mechanisms for IBP.
- Design of peering mechanism between IBP resource pools.
- Expansion of IBP deployment within ESNET and at partner Universities.
- Development of specific plans for integration of IBP with Data Grids and other
DoE Science projects.
3.3 Next twelve months:
- Optimize JLab Reliable File Transfer (RTF) to move data between SRMs that store
data using IBP.
- Implement a parallel file access utility on top of IBP to support the LQCL application.
- Implement DataMover with alternative to shared IP network.
- Integration of IBP with DoE Science projects.
3.4 Next eighteen months:
- Integrate the IBP parallel file access utility with JLab SRM
- Implement IBP traffic and policy simulator.
- Implementation of peering between IBP resource pools.
3.5 Next 24 months:
- Experimentation with IBP architectures that remove barriers to high-speed data
transfer between disk and network.
- Conduct large scale measurements of IBP traffic and use those as inputs to traffic
simulator.
4. Research interactions
Collaboration with JLAB on middleware and tools for PPDG and LQCD -
We have initiated a collaboration with Chip Watson and his team from JLAB to
integrate our Logistical Networking technologies with their Storage Resource
Management (SRM) tools, their web service infrastructure, and in general with
their software efforts to support the SciDAC projects on a Particle Physics
Data Grid (PPDG) and Lattice Quantum Chromo-Dynamics (LQCD). The concrete steps
for integrating IBP and exNode technologies with JLab's Storage Resource Manager
(SRM) and Reliable Transport Service are detailed above, and more expansive
plans (e.g. a parallel file system in IBP to support high performance LQCD computations
on clusters) are under development. A testbed for this work will be deployed
at JLab, Univ. of TN, ORNL, and some JLab academic partner sites during the
next 6 mos.
Other recent interactions (within the last 3 months) include:
Bill Wing, ORNL
Richard Mount, SLAC
Bruce Gibbard, BNL
- Early and ongoing interactions with within MICS High Performance Networking
projects
Nagi Rao, ORNL
Randy Burris, ORNL
Les Cottrell, SLAC
- We are talking to the Particle Physics Data Grid project and pursuing
collaboration with
Miron Livney, Univ. of Wisconsin
Chip Watson, JLab
- We are collaborating actively with Laurent Lefevre, ENS Lyon, France Cosimo
Anglano, Universita' del Piemonte Orientale, Italy and have a lot of interest
from groups within INRIA
- We are pursuing the possibility of a deployment at ORNL and LBL, connected
by ESNet and potentially using experimental networking technologies.
We are working with the Web100/Net100 project on improved bandwidth between
our systems.
- We are working through the Grid Forum to interact with Grid efforts and
promulgate our ideas. This includes participation in the development of
research groups in network storage and/or data replication.
We developing plans for collaborative, large-scale deployment of logistical
networking technologies with Sun Microsystems and with the storage provider
Yotta Yotta.
5. Remarks
None