Data Intensive Sciences University Network
DISUN Development & Integration
"An important objective of the proposed effort is to maintain the appropriate balance between making the services developed by DISUN available to other communities in the form of components that are part of the Virtual Data Toolkit (VDT) and effectively supporting the specific needs of the CMS Experiment. Given the time scale of CMS, with data taking expected to start in late 2007, we need to deploy production quality cyberinfrastructure iteratively with increasing functionality. The functionality deployed through this iterative process has to be well matched to the total data volume generated by the CMS detector. In addition to the lessons learned from operating a production cyberinfrastructure, we expect our engagement with other sciences at our respective campuses, as well as within the Open Science Grid, to strongly influence the details of the factorization of the DISUN software. The following describes several phases of development and deployment of the DISUN services. The combined four phases reflect 5.5 years of work (6 months before funding and 5 years after the proposed funding). The existing Tier-2 prototype infrastructure at Caltech, UCSD, and UFL is fully embedded into Grid3. As successful as Grid3 has been for production of simulated CMS data, it requires several additional functionalities for use by CMS physicists as an infrastructure for data analysis. In the following we briefly describe the different functionality and its relationship to existing CMS infrastructure. Most of this functionality is representative of applications with large data sets."
(Excerpt from DISUN proposal Spring 2005)
In the following we state the DISUN goals in form of excerpts from the Spring 2005 proposal, followed by the status towards achieving these goals.
Phase 1 Goals (first 2 years) and Status
Managed Storage
Goal
With the current state of middleware in Grid3, all data are shared within an organization like CMS and are accessible across a cluster through file systems that are neither access nor quota protected. We will work with the U.S. CMS and Fermilab Privilege Project, as well as with the PPDG project, on integrating and deploying SRM-based storage and dynamic account assignments to improve on these major deficiencies. The initial stages of this development are already underway.
Status as of January 2006
DISUN is helping to integrate SRM/dcache into the OSG software stack. This includes testing of the srm client (UCSD), advertising SRM availability at a site via the Generic Information Provider (UFL & UCSD), and helping a new application community (SDSS) to adapt their applications for use of SRM. DISUN (UCSD) is helping OSG to define, and accurately advertise the different types of storage supported on OSG. The latter includes configuration instructions for site administrators, as well as guidance for the application communities. DISUN has deployed role-based authentication for SRM/dCache at its UCSD site. This is now being evaluated as part of the production system before it is packaged for wider deployment.
Managed Data Transfers
Goal
We will develop a user interface to the CMS data transfer system (Physics Experiment Data Export, PhEDEx) for dependable data movements and data set placement across the CMS hierarchy of storage elements. This will allow individual users to request staging of multi-Terabyte datasets to specific location at high data throughputs. These systems will stage data and metadata to a managed storage system at the site, which is provided as part of the U.S. CMS data grid infrastructure.
Status as of January 2006
Two of the four DISUN sites, Caltech and UFL, were part of the winning team, lead by Caltech and the Ultralight project, in the "bandwidth challenge" at Supercomputing 2005. All four sites are connected at 10Gb/sec, and we are beginning work towards understanding how to exploit the large bandwidth of these links for production transfers between our production storage systems.
Supporting Data Parallelism and Fault Tolerance
Goal
In order to analyze large volumes of data conveniently and efficiently, it is essential for the infrastructure to provide reasonably intelligent parallelization of workloads automatically, track completion of workloads, and support resubmission in case of infrastructure - rather than user related - failures. Eventually this will include automatic data movement to collocate data and CPU dynamically, and shared access to resources by interactive and batch based processing systems.
Status as of January 2006
DISUN's work in this area benefits greatly from the close integration of DISUN into the CMS Software & Computing project, as part of the Distributed Computing Tools group. DISUN's contributions in this area are threefold so far:
- DISUN (UFL) is responsible for CMS software installation and validation on all CMS resurces accessible via the OSG. This includes at this point the tier-1 center, all six deployed tier-2 centers, and one tier-3 center. DISUN (UFL) developed an infrastructure for automatic installations based on cmsi, the rpm based CMS software distribution. The installations are validated using MCPS. Once validated, installed software versions are advertized via both the General Information Provider (GIP) as well as the OSG Discovery Service, developed by Caltech as part of PPDG. This activity was started by DISUN, and has already successfully migrated from development to production.
- The existing MC production system is labor intensive, and doesn't scale to large scale production activities on the Worldwide LHC Computing Grid. A complete redesign to benefit from the new Edm and framwork, as well as the new catalogue infrastructure is a main focus of CMS since late Fall 2005. DISUN contributed significantly to the design of the new system (Caltech, UCSD, UFL), and is now working (Caltech) closely with FNAL on the new Monte Carlo Production System within the context of global CMS.
- CMS-INFN is leading the development of CRAB, the CMS Remote Analysis Builder. This INFN effort is focused on the LCG infrastructure. DISUN (UFL) together with FNAL is repsonsible for adaptation of CRAB to OSG. This activity also informs the OSG-LCG interoperability effort. A first prototype of OSG-CRAB was used during service challenge 3 (sc3) to submit more than 18,000 analysis jobs across the six participating CMS tier-2 centers in the US. DISUN (UFL) was responsible for operations of this part of sc3 as described in detail here. In the context of CRAB submissions to OSG via the LCG resource broker, DISUN is driving the OSG-LCG interoperability for USCMS tier-2 sites. First successful submissions to UFL and UCSD via the LCG resource broker occured on February 6th, 2006.
Improving the Interface
Goal
Our experience with Grid3 has shown the limitation of the existing middleware interface. A single user submitting a large number of jobs can bring down a whole cluster due to the lack of resource allocation capabilities for the head nodes in the current architecture of the grid middleware. We will extend this interface to support a large number of users and jobs. These scalability issues are currently being addressed in middleware developments like Condor-C developed by the Wisconsin group, which we will integrate and deploy in the Tier-2 environment.\
Status as of January 2006
The job failure rate for CMS production Monte Carlo submissions on OSG 0.2.1 software stack is roughly 30%.
We identified the dominant causes of failure to be headnode overloads. When the CE of a grid site overloads, all
running jobs on that site are typically lost. We identified the following causes for such overloads:
- CE overloads due to the fact that the pre-ws GRAM models its state in processes on the headnode. As a result, a large number of pending jobs will lead to resource exhaustion on the CE. While Condor has developed a workaround refered to as the "grid monitor" in Condor-g, it is up to the Condor-g client, i.e. the user of the grid infrastructure, to turn on this feature. There is no way that a site in OSG 0.2.1 can enforce the use of the grid monitor, and reject submissions that would not use this feature of Condor-g.
- CE overloads due to the fact that jobmanager-fork allows unscheduled forking of arbitrary number of jobs on the CE hardware.
- CE overloads due to user applications that stream stderr and stdout via the CE to the client's submission host. We find that typical CMS analysis jobs have such large stderr/stdout that a single user turning it on will bring down the CE if that user happens to have a few 10's of jobs running simultaneously on the cluster.
- CE overloads due to deployment of CE and SE on the same hardware, coupled with an SE that does not provide any scheduling of stage-in or stage-out of data. Such unscheduled data movement routinely leads to overloaded SE's, which will bring down the hardware it is deployed on. In OSG 0.2.1 it is customary to deploy both services, CE and SE, on a single piece of hardware.
DISUN (UW & UCSD) have focused on solutions to the last three of these problems. Solutions have been tested in a testbed environment, and are presently being deployed on the production cluster at one site (UCSD) within the context of the OSG 0.4.0 release. As we gain experience with these modifications in a production environment, we will deploy them across all DISUN sites, as well as the other CMS tier-2 sites. Part of the solution to the last of the four problems is the ubiquitous deployment of SRM/dCache. DISUN is working with the Fermilab computing division, and the USCMS tier-1 team in this area as described above. Software modifications are being fed back into VDT for deployment in OSG 0.4.1, scheduled for March 2006. We hope to decrease the failure rate by a factor of three as a result of these changes. To resolve the first of these causes for instability of the CE, we expect to deploy the ws GRAM. DISUN and the CMS tier-2 teams in general (Nebraska & Purdue) are working with the Globus team at ANL, as well as the VDT team at UW on the ws GRAM configuration for OSG. We expect to be operating the improved pre-ws and the GT4 ws GRAM side by side on two seperate pieces of hardware as part of the OSG 0.4.1 infrastructure. This will allow us to gain experience on the reliability of these two different CE implementations under production conditions. The goal for 2006 is to decrease the job failure rates for CMS Monte Carlo production on OSG by an order of magnitude from the 30% we have seen within the last few months. We expect to devote significant effort to this goal.
Providing additional monitoring capabilities
Goal
We plan to integrate and deploy a first implementation of the application monitoring services developed by UCSD for CMS as part of the PPDG project. This will provide users real-time read-only access to their running execution environments in a transparent fashion, including specific user tools to manipulate and get information about the environment (like "ls", "tail", "ps" etc.) providing the familiar paradigm of processes running on the user's desktop.
Status as of January 2006
JobMon is now available as part of VDT. It has been deployed on DISUN within the context of MCPS, a first preliminary prototype of the new Monte Carlo production system for CMS. In addition, it is used by CDF as part of its LCG-CAF infrastructure.
Interactive Analysis Environment
Goal
In addition to the batch-processing infrastructure for large-scale data analysis, we require an interactive environment in which visualization of the data is possible. The dominant tool for this activity in HEP is ROOT [21], including the parallel root facility (PROOF). We expect a first production version of the Proof Enabled Analysis Center (PEAC) [22] to be available. PEAC provides a multi-user interactive ROOT ntuple analysis environment based on PROOF. It is based on Clarens as web services infrastructure, and is being developed as part of the Grid Analysis Environment (GAE) effort by Caltech, MIT, and UCSD. First proof of principle prototypes were presented at SC2003, and at SC2004, supporting applications from the two HEP experiments CMS and CDF. The long-term vision is for PEAC to run parasitically on the batch based services of the SDAF. We expect this to be possible due to the low duty cycle of interactive analysis: physicists spend more time thinking than computing when they work interactively. The Wisconsin group will develop Condor-C in order to integrate the Computing On Demand (CoD) capabilities of Condor with Condor-C.
Status as of January 2006
We do not expect this to be a priority within the first year of DISUN, given the work in other areas.
Expanding the User Community
Goal
During phase 1, we expect to integrate GLOW's resources as well as its user community into DISUN. We expect efforts that expand the DISUN user community beyond physics to account for roughly 0.5 FTE throughout the lifetime of DISUN. We expect the focus of such efforts to shift frequently between the four sites, depending on the expertise required, as well as the locality of the new user communities to be addressed. In general, we expect new user communities to become users of the full Open Science Grid soon after they are operating comfortably within DISUN. DISUN thus lowers the threshold for entry into the OSG.
Status as of January 2006
DISUN has provided targeted support for LIGO, SDSS, CDF, and GLOW for brief periods of time
in order to ease their transition onto the OSG infrastructure in one way or another.
The work with SDSS has centered around helping SDSS to incorporate the use of SRM as part of their workflow in order to
benefit from the large amount of storage available via SRM/dCache within DISUN.
We see SDSS as an early adopter among a larger class of applications that all use VDS, the Virtual Data System. We are
leading a parallel session at the OSG consortium meeting at UFL end of January that was focused on enabling these
communities to benefit from our SRM/dCache deployments within DISUN in particular, as well as the LHC
tier-1 and tier-2 program in general.
The work with LIGO was focused on deploying a new VDS version, and working with them to succeessfully run their first
science application on the OSG.
The work with CDF was focused on deploying the CDF glide-caf portal to benefit from some of the DISUN,
as a first step in the development of the OSG-CAF portal for CDF.
In addition, DISUN is involved in preliminary
discussions on interoperability between OSG and TeraGrid with a focus on understanding how to run CMS applications transparently on
both infrastructures.

