DOE/MICS Yearly Project Report: May 14, 2002

Project Title: Optimizing Performance and Enhancing Functionality of Distributed Applications Using Logistical Networks (DE-FC02-01ER25465)
Project Type: SciDAC
PI: Institution: Micah Beck (lead), University of Tennessee, Knoxville
Jack Dongarra, University of Tennessee Knoxville
James S. Plank, University of Tennessee, Knoxville
Rich Wolski, University of California at Santa Barbara

1. Executive summary

The multi-threaded research project in Logistical Networking for DOE's SciDAC project aims to create advanced, storage-enabled network services that can provide reliable, fast, flexible, scalable, and efficient delivery of data to support distributed and high performance applications of all types.
During the most recent period, meaningful progress toward this goal was made on all key fronts - research and development, publication and dissemination, and planning and interaction for productive collaboration with the SciDAC research community. Highlights include the following:

2. Current accomplishments

Research and development - Our most recent research has expanded the effectiveness of Logistical Networking in both performance and functionality. Below are some of our accomplishments:

Earlier Research and Development of Logistical Networking Tools (Prior to February 2002) -

Experiments using logistical tools for distributed state management on the Grid - We are implementing a generalized cache management service that will support a parameterized class of storage management policies, e.g. write-back vs. write-through, dynamic vs. preloaded, etc. for use in a numerical software library setting. These caches will support hierarchical and cooperative arrangements to create scalable storage management systems for Grid computing middleware, such as NetSolve. Our experimental work focused on working with IBP directly, in preparation for integrating exNode technology, which is expected to mature during the next phase of the effort.

We are investigating three different perspectives to solving the same problem, Ax = b. The focus here is on the amount of time required for the different approaches to get the data in place and ready for the application routine PDGESV from ScaLAPACK. The first case is for the time to allocate, generate, and load the necessary data structures utilizing a remote memory depot -IBP. It is noteworthy that the process is started on the user end in a serial environment where the user gets help generating his structures into the depot and is then freed of all knowledge of how his problem is solved. The middleware we are developing makes an inquiry as to the available resources (NWS) and then makes a decision of the resources to utilize based upon the state of the resources and knowledge of the specific kernel to be invoked at the application level. Once resources are selected, the parallel application routine is forked and waited upon by the middleware. The root node is then responsible for bringing the data incore from the remote depot and distributing it to the correct processes in the logical grid. The mapping is from IBP in natural form to a block-cyclic distribution amongst the processes. At this point the matrix solve is ready to go. Naturally, the results have to be collected by root and returned to IBP in the natural ordering the user anticipates. In the next case illustrates where the user is on a local cluster where the matrix and vector coefficients are generated incore and then asks for help from the middleware. In this case, the middleware utilizes the local file system to map the data to files (eventually based on exNodes), which represent the block cyclic map appropriate for the logical process grid that was determined in the resource allocation stage. The middleware then forks the parallel application and waits. The application does a parallel read of the data incore and proceeds to solve the problem. On solution, processes owning elements of the solution write these to file. The middleware assembles the solution and returns it to the user. The expert case is also being investigated to provide insight as to the overhead involved in relieving the user of solving his problem. In the future the exNode will be the overarching data structure used in managing all these activities. This will enable NetSolve to effectively solve large problems in a Grid setting.

NWS integration - Working together, the UTK and UCSB teams developed a proximity resolution system for exNodes and the LBONE based on the NWS. The resolution system consists of three components: a client resolution library, a resolution server, and the NWS network sensor. Using the library, an exNode client can determine which of several equivalent depots (that it learns about from the LBONE) can deliver the data most expeditiously. At SC2001, where the networks exhibited a great deal of performance variation, NWS forecasts that were incorporated within the client interface made it possible to make this choice reliably. (See demo description below) The client library also made it possible for IBP depots to resolve the performance proximity to the client. Hence, exNode clients could become temporary "resolution sensors" while they execute.

IBPster Demo at Internet 2 Member meeting and SC2001 - The power of using IBP, the L-Bone, the exNode, and NWS in an integrated way has been showcased in a demonstration called IBPster, which uses these tools together to perform better and more flexible multimedia delivery. We call this application IBPster. IBPster is basically a system for storing MPEG files for audio and video as exNodes in the L-Bone and managing them using exNode-based tools. At SC01, in both the Tennessee and NPACI booths, we presented a live demonstration of IBPster (see figure 1) in which MP3 files were uploaded to IBP depots located on the L-bone all over the United States (and in some cases, in Europe), striped and replicated according to several scenarios, and then played in real time by streaming them from the closest storage buffers. At the Internet2 Spring member meeting we successfully demonstrated Video IBPster (http://apps.internet2.edu/html/spring02-featured-demos.html#HighBandwidthOverIBP). NWS supplies the network monitoring and performance forecasting necessary to find the closest buffers



2.1 Publications

M. Beck, T. Moore, and J. S. Plank, "An End-to-end Approach to Globally Scalable Network Storage," to be presented at ACM Sigcomm 2002, Pittsburgh, PA, August 19-23, 2002 (to appear).

K. Meyer-Patel and M. Beck, "A Logistical Networking Model for Video-On-Demand," to be presented at IEEE International Conference on Multimedia and Expo, Lausanne, Switzerland, August 26-29, 2002 (to appear).

A. Bassi, M. Beck, J. Gelas, and L. Lefevre, "Logistical Storage in Active Networking: a promising framework for network services," to be presented at 3rd International Conference on Internet Computing (IC 2002), Las Vegas, NV, June 24-27, 2002 (to appear).

Bassi, M. Beck, G. Fagg, T. Moore, J. Plank, M. Swany, and R. Wolski, "The Internet Backplane Protocol: A Study in Resource Sharing," to be presented at IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2002), Berlin, Germany, May 21-24, 2002 (to appear).

M. Beck and T. Moore, "Logistical Networking: When Institutions Peer," to be presented at 2nd International Workshop on Global and Peer-to-Peer Computing on Large Scale Distributed Systems, part of CCGrid 2002, Berlin, Germany, May 21-24, 2002 (to appear).

A. Bassi, M. Beck, E. Fuentes, T. Moore, and J. S. Plank, "Logistical Storage Resources for the Grid," in Proceedings of International Conference on Computational Science (ICCS 2002), vol. 2330, LNCS. Amsterdam, The Netherlands: Springer Verlag, 2002.

S. Atchley, S. Soltesz, J. S. Plank, M. Beck, and T. Moore, "Fault-Tolerance in the Network Storage Stack," presented at Annual Workshop on Fault-Tolerant Parallel and Distributed Systems (held in conjunction with the International Parallel & Distributed Processing Symposium), Ft. Lauderdale, FL, USA, April 15-19, 2002.

James S. Plank, Micah Beck and Terry Moore, ``Logistical Networking Research and the Network Storage Stack,'' USENIX FAST 2002, Conference on File and Storage Technologies, work in progress report, January, 2002.

M. Beck, T. Moore, and J. S. Plank, "Exposed vs. Encapsulated Approaches to Grid Service Architecture," presented at 2nd International Workshop on Grid Computing, Denver, CO, Nov. 12, 2001.

J. S. Plank, A. Bassi, M. Beck, T. Moore, M. Swany, and R. Wolski, "Managing Data Storage in the Network," IEEE Internet Computing, vol. 5, no. 5, pp. 50-58, September/October, 2001.

M. Beck, T. Moore, and J. S. Plank, "Scalable Sharing of Wide Area Storage Resources," IEEE Communications (submitted).

2.2 Publications in Process

Bassi, M. Beck, J. Plank, "The Internet Backplane Protocol" to be submitted to the 10th IEEE International Conference on Network Protocols.
J. Millar and M. Beck, "exDN: Ad Hoc Content Distribution Using Logistical Networking." Submitted to the 7th International Workshop on Web Caching and Content Distribution.

M. Beck and E. Fuentes, "Overlay Multicast using the Internet Backplane Protocol" to be submitted to the Fourth International Workshop on Networked Group Communication.

3. Future Accomplishments

3.1 Next three months:

3.2 Next six months:

3.3 Next twelve months:

3.4 Next eighteen months:

3.5 Next 24 months:

4. Research interactions

Collaboration with JLAB on middleware and tools for PPDG and LQCD - We have initiated a collaboration with Chip Watson and his team from JLAB to integrate our Logistical Networking technologies with their Storage Resource Management (SRM) tools, their web service infrastructure, and in general with their software efforts to support the SciDAC projects on a Particle Physics Data Grid (PPDG) and Lattice Quantum Chromo-Dynamics (LQCD). The concrete steps for integrating IBP and exNode technologies with JLab's Storage Resource Manager (SRM) and Reliable Transport Service are detailed above, and more expansive plans (e.g. a parallel file system in IBP to support high performance LQCD computations on clusters) are under development. A testbed for this work will be deployed at JLab, Univ. of TN, ORNL, and some JLab academic partner sites during the next 6 mos.

Other recent interactions (within the last 3 months) include:

5. Remarks

None