US-CMS Core Application Software Status Report
  • Ian Fisk
  • SCOP Review
  • October 26, 2001

  • Quick Introduction to the Core Application
    Software Project (CAS)
  • Scope of the project
  • Division of labor
  • Status
  • Plans
  • Progress
  • Problems
  • News From Management
  • Milestones and schedule
  • Summary

Introduction to CAS
  • CAS is the US-CMS Core Application Software
    Project. We are involved in 4 main areas
  • WBS 2.1 CMS Software Architecture
  • Core Framework Development
  • Sub-System Architecture Development
  • CAFÉ (CMS Architecture Forum and Evaluation)
  • WBS 2.2 IGUANA
  • Graphical User Interfaces
  • Visualization
  • Data Browsing, plotting, fitting
  • WBS 2.3 Distributed Data Management and
  • Evaluation, Testing, Integration of Grid Tools
  • Distributed Process and Database Management
  • System Services and Load Balancing
  • Development of Production Tools
  • Development of Distributed Production and
    Analysis Prototypes
  • System Simulation and System Scalability
  • WBS 2.4 Support

Inside International CMS
  • CPT is a combination of Computing, Physics,
    and Trigger DAQ. Computing has been divided into
    7 sub-projects. There are 5 cross-project
    groups to handle interactions between projects.

Organization chart
  • CCS (Core Computing Software): 1. Computing Centres; 2. General CMS
    Computing Services; 3. Architecture, Frameworks / Toolkits; 4. Software
    Users and Developers Environment; 5. Software Process and Quality;
    6. Production Processing Data Management; 7. Grid Systems
  • TriDAS (Online Software): 7. Online Filter Software Framework;
    8. Online Farms
  • PRS (Physics Reconstruction and Selection): 9. Tracker / b-tau;
    10. E-gamma / ECAL; 11. Jets, Etmiss/HCAL; 12. Muons
  • Cross-project groups: RPROM (Reconstruction Project Management);
    SPROM (Simulation Project Management); CPROM (Calibration Project
    Management) to be; Cafe (CMS Architectural Forum and Evaluation);
    GPI (Group for Process Improvement) recently
  • Chart label: CAS Work

Introduction to CAS
  • CAS currently employs 8 on-project developers at
    5 institutions (WBS item and percentage of effort)
  • Michael Case, UC Davis: 2.1 75%, 2.4 25%
  • Greg Graham, Fermilab: 2.3 75%, 2.4 25%
  • Iosif Legrand, Caltech (CERN): 2.3 75%, 2.4 25%
  • Vladimir Litvin, Caltech: 2.1 50%, 2.3 25%, 2.4
  • Ianna Osborne, Northeastern (CERN): 2.2 75%, 2.4
  • Natalia Ratnikova, Fermilab: 2.4 50%
  • Lassi Tuura, Northeastern (CERN): 2.1 75%, 2.4 25%
  • Hans Wenzel, Fermilab: 2.1 25%, 2.4 25%
  • Tony Wildish, Princeton (CERN): 2.3 50%, 2.4 50%
  • WBS 2.1: 2.25
  • WBS 2.2: 0.75
  • WBS 2.3: 2.25
  • WBS 2.4: 2.75
  • Totals: 8.0
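
The effort totals can be cross-checked against the per-developer fractions. The following is a minimal Python sketch, assuming the number after each WBS item is a percentage of one FTE. The two WBS 2.4 entries whose percentage is missing in the listing (Litvin and Osborne) are entered as None rather than guessed.

```python
from collections import defaultdict

# Per-developer effort in percent of one FTE, as listed above.
# None marks an entry whose percentage is missing in the listing.
EFFORT = {
    "Case": {"2.1": 75, "2.4": 25},
    "Graham": {"2.3": 75, "2.4": 25},
    "Legrand": {"2.3": 75, "2.4": 25},
    "Litvin": {"2.1": 50, "2.3": 25, "2.4": None},
    "Osborne": {"2.2": 75, "2.4": None},
    "Ratnikova": {"2.4": 50},
    "Tuura": {"2.1": 75, "2.4": 25},
    "Wenzel": {"2.1": 25, "2.4": 25},
    "Wildish": {"2.3": 50, "2.4": 50},
}
QUOTED_TOTALS = {"2.1": 2.25, "2.2": 0.75, "2.3": 2.25, "2.4": 2.75}


def tally(effort):
    """Sum the listed FTE per WBS item and count the missing entries."""
    listed = defaultdict(float)
    missing = defaultdict(int)
    for shares in effort.values():
        for wbs, percent in shares.items():
            if percent is None:
                missing[wbs] += 1
            else:
                listed[wbs] += percent / 100.0
    return dict(listed), dict(missing)


listed, missing = tally(EFFORT)
for wbs, quoted in sorted(QUOTED_TOTALS.items()):
    gap = quoted - listed.get(wbs, 0.0)
    print(f"WBS {wbs}: listed {listed.get(wbs, 0.0):.2f}, "
          f"quoted {quoted:.2f}, unaccounted {gap:.2f} "
          f"over {missing.get(wbs, 0)} missing entries")
```

Under this reading the listed fractions reproduce the quoted totals for WBS 2.1 to 2.3, and the 0.5 FTE not accounted for in WBS 2.4 falls on the two entries with missing percentages.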

Progress and Plans
  • In the past we have focused a lot on technical
    progress during these reviews
  • In the next several years CMS has several high
    level milestones related to the software project
  • DAQ TDR 2002
  • Computing TDR 2003
  • Physics TDR 2004
  • 20% Data Challenge in early 2004
  • Today's talk covers the pieces needed to
    complete the upcoming milestones and the technical
    progress being made toward them
  • Break the discussion into 3 pieces
  • Software for Reconstruction and Simulation
  • Software for Analysis
  • Software for Distributed Computing

Simulation and Reconstruction
  • CMS Software has a data store, a central
    framework, a number of components, and a variety
    of support packages.

Visualization Tools
The Database
  • Final choice currently scheduled for the end of
  • Considerable effort required to make a reasonable
  • Long term commercial viability of Objectivity far
    from assured.
  • ORACLE 9i being investigated by a CMS CERN
    fellow. Preliminary indications are that some of
    the Object handling aspects have not been
    completely productized yet.
  • 50 areas of concern have been submitted to IT to
    determine if there are any show-stoppers
  • Root-IO being examined
  • Workshop on October 10-11, with presentations by
    CAS engineers Tony Wildish and Hans Wenzel
  • This is a key area of concern to CMS

Root IO Workshop conclusions and presentations http//

CMS Software Framework
  • This year CMS reorganized the central framework
    into the COBRA Project
  • Includes the elements of CARF and Utilities
  • More efficient to reuse code
  • SCRAM managed project. Ianna Osborne responsible
    for modularization
  • CMS software packages modified to use the new
    framework: ORCA, OSCAR, IGUANA
  • Investigation over the next half year of how to
    separate the production and file handling aspects
    of COBRA from the event and reconstruction
    aspects, CAS Engineers Tony Wildish and Greg

CMS Reconstruction and Simulation Packages
  • CMSIM: the GEANT3-based Fortran simulation program.
    Workhorse since the first CMS Fortran code.
  • Supported, but expected to be replaced by OSCAR
  • Simple test application for Grid Developers
  • OSCAR: the GEANT4-based simulation. Needed for
    Physics TDR
  • Fully functional software expected by the end of
  • Physics Validation through next year
  • Production Software expected by the end of 2002
  • FAMOS: fast simulation program. Needed to complete the
    Physics TDR and 20% Data Challenge.
  • Currently in proof-of-concept phase. Lack of
  • ORCA: CMS reconstruction code. Advanced. Needed for
    DAQ TDR, Physics TDR, Computing TDR, and Data
  • More information in David Stickland's second talk

Technical Progress
  • Last year OSCAR could not do full detector
  • This year several hundred events can be simulated
  • Big infusion of people. New coordinator, new
    developers, new librarian Hans Wenzel who has
    handled release and configuration
  • Fast Development requires frequent release
  • Need to rapidly progress to reliably being able
    to simulate thousands of events for physics
  • ORCA continues to develop and improve
  • 68 Developers supporting 190k lines of code
  • Most reconstructed objects are stored
    persistently with the release of ORCA 5 including
    tracks with reasonable subset of functionality
  • CAS engineer Vladimir Litvin has been working on
    the framework used in the calorimetry
    reconstruction. Main developer left CMS and code
    has been unsupported since.
  • First phase expected before the end of October

Support Packages
  • Visualization was listed as a support package, a
    useful debugging tool. Part of the IGUANA
    package and will be covered in analysis.
  • CAS Engineer Michael Case has been participating
    in the development of the Detector Description
    Database. A consolidated store of detector
    information used by CMS software clients. Needed
    for consistent geometry input to CMS Software
  • Lots of other supporting packages SCRAM for
    configuration, DAR for distribution, OVAL for
  • Functional prototype Nov. 2001
  • Basic Geometry and Materials
  • Basic Core Functionality
  • XML Schema
  • Fully Functional Prototype April 2002
  • All CMS Required Solids
  • All CMS positioning parameters
  • Numbering Scheme
  • Prototype of XML DDD editor

CMS Architecture CAFE
  • Café cross-project task force set up (end 2000)
    to evaluate, document, and provide feedback for
  • Documentation and evaluation of the existing
    architecture, design, use-cases, scenarios,
  • 4 task forces
  • Online helped to define simulation program
  • Framework gave recommendation for organization
    of sub-projects and the reuse of core software.
  • Analysis is working on a CMS analysis
    requirements document to be fed to IGUANA.
  • Distributed Computing working on CMS
    requirements for the grid projects as well as
    CMS Distributed Production and Distributed
    Analysis Requirements
  • CAS engineer is working on the top level
    description document for the CMS central Software
  • CAFÉ hasn't succeeded in providing the
    evaluations or the documentation that were
    expected. CMS is still trying to determine how
    to revitalize this project.

  • In order to complete high level milestones a lot
    of analyses must be performed
  • 2002 DAQ TDR
  • Physics analysis for high level trigger studies
  • 2003 Computing TDR
  • Physics analysis techniques both local and
  • Verification of on-line trigger routines,
    reconstruction code, required networking, etc
  • 2004 Physics TDR
  • Prototypical Analysis for everything
  • 2004 20% Data Challenge
  • 20% test of the entire system starting from the
    raw detector readout, through triggering,
    reconstruction, and analysis

Analysis and Data Accessibility
  • CMS has increased data accessibility through the
    use of the database. Allows more transparent
    access to many levels of the data
  • Loop over data summary quickly
  • Access lower levels of data for more detailed
  • Possible to visualize even the raw data for small
    set of selected events
  • Unfortunately very few people have been able to take
    advantage of the improved accessibility.
  • Accessing the database directly through the
    framework is possible, but until recently there
    hasn't been a summary format which maintains the
    connections; ntuples, root files, etc. break the
  • Without a workable summary format, one must store
    summary another way or loop over higher levels of
    data for each analysis step
  • This has been painful due to deficiencies in the
    staging system at CERN
  • Led almost all analysis groups to write
    alternative summary formats
  • Muon writes Root files
  • Jet/Met writes ntuples

Summary Format
  • Current front runner for summary formats is the
    use of tags
  • Tags can store small amounts of data like ntuples
    or root files
  • Can be looped over quickly for analysis jobs
  • Can be stored so that connections are maintained
    to more detailed levels of the database
  • Need Physics groups to help define AOD (Analysis
    Object Data).
  • Good example code exists for creating generic
  • Tools used to analyze them are still under
    discussion, whether it's tags or some other summary
Analysis Tools
  • No unique choice satisfying both users
  • Users tend to use any tool with the required
  • Developers worry about quality, integration and
    support issues
  • Idea is to take advantage of existing tools as
    much as possible, but to combine the
  • Create a uniform architecture and interfaces for
    multitude of tools. Interoperable
  • Allows custom functionality to be achieved
    without the manpower required for creating
    completely custom tools
  • Workshop at the next CPT week meeting
  • Explain current ideas and plans to the physics
  • Demonstrations of generic analysis packages
  • Reactions of Physics and Developers
  • Demonstrations of preliminary integrations of
    analysis modules and CMS software.
  • Get input from the PRS groups about desired and
    required functionality

New Analysis Architecture
  • New Analysis Architecture is being written by a
    CAS engineer
  • Relying on a very small, very flexible kernel
  • Uniform architecture and interfaces for multitude
    of tools
  • Interoperable components/plug-ins
  • Consistent with HEP trends (e.g. HEPVis 2001)
  • Consistent with Lizard (C++), JAS (Java),
    Hippodraw (C++/Java)
  • Close links to CMS data without strong-coupling
    of software
  • First implementation of new architecture will be
    released October 2001

  • CAS Engineers Ianna Osborne and Lassi Tuura have
    led the development of Interactive Graphics for
    User ANAlysis
  • Main IGUANA focus - interactive detector and
    event visualisation
  • High-performance 2D/3D graphics
  • Graphical user interfaces
  • Data browsers.
  • Integration of other tools, components
  • The goal is to provide common look and feel for
    the CMS interactive graphical applications
  • Interactive analysis is not considered a primary
    goal. It is assumed that this functionality will
    be provided by other tools (JAS, Hippodraw,
    Lizard, ROOT, or OpenScientist)

ORCA Visualisation with IGUANA
ORCA Visualisation
  • Based on generic IGUANA toolkit
  • with CMS specific extensions for Detector
  • Geant3 detector geometry
  • Reconstruction geometry for the Tracker
  • and Event
  • Muon: DT, CSC, and RPC sim hits; DT and CSC
    track segments; CSC rec hits; reconstructed and
    simulated tracks
  • Tracker-Bt: simulated and reconstructed tracks,
    measurements with directions, sim hits
  • ECAL-Eg: simulated and reconstructed hits
  • HCAL-JetMEt: digits, jets.

Sim Hits and Sim Tracks
Z slice
setenv OO_FD_BOOT cmsuf01/cms/reconstruction/user/jet0501/jet0501.boot InputCollections

OSCAR (GEANT4) Visualisation using IGUANA
IGUANA Viewer displaying OpenInventor scene
Control of arbitrary GEANT 4 tree
Correlated Picking

OSCAR Visualisation Next Step
  • Integration of the detector overlap tool (Martin
  • Extending the scope of the configuration wizard

Example extension (a trivial wizard): queried
from plug-in database, located on request and
bound to IGUANA G4 Run Manager

Software For Distributed Computing

CMS Distributed Computing System
  • Prototypes of several of the dedicated
    Facilities exist
  • Part Time Prototype Tier0 facility at CERN
  • Full Time Prototypical Tier1 Facility at Fermilab
  • Several Full Time Prototype Tier2 Facilities in
    the US and Italy
  • More shared production facilities, many of which
    will eventually be entries in the chart

Production Software
  • First production performed at CERN (by David
    Stickland and Tony Wildish)
  • Production needed to complete TDRs and Physics
    Studies rapidly overwhelms capabilities of CERN
  • Need to take advantage of computing resources
    both dedicated and shared at remote facilities
  • How do you arrange for a lot of people to rapidly
    become production managers?
  • Clone and Distribute David and Tony
  • How do you maintain the consistency of jobs run
    all over the world?
  • How do you transfer the results to central
    facilities for analysis?
  • A few files at a few centers is easy but the
    complexity grows very rapidly
  • Need easy to use common production tools which
    can consistently specify and execute production
  • Flexible and site independent enough to be used
  • Need applications to transfer and manage data
    from remote sites.

IMPALA Production Tools
  • Intelligent Monte Carlo Production and Analysis
    Local Administrator
  • In response to CMS need for reliable production
    tools IMPALA was created
  • Production scripts initially developed by Hans
  • Were ported to CERN and made site independent by
    Greg Graham
  • Now used by almost all CMS Production centers
  • Has resulted in smoother and more reproducible
  • At the time of the last review only CERN and
    Fermilab had successfully run all production
    steps. Now several regional centers have
  • IMPALA is implemented as bash scripts. It allows
    for good functionality, but it is hitting the
    limits of complexity.
  • As more functionality and site independence is
    desired a more flexible implementation is
  • Next set of job specification tools called
    MC_runjob, a joint project between D0 and CMS, is
    implemented in Python.

MC_runjob implementation
  • Currently in initial release
  • Allows specification and chaining of executables
  • Allows production jobs to be templated and
    reduces the manual configuration of production
  • Has a GUI
  • First big test is a 12 million event sample for
    calibration which will be run from generation
    through simulation, reconstruction and analysis
    in a single job
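
Templating a production job and chaining its executables can be sketched in a few lines of Python. This is not the MC_runjob interface: the step names, the command strings and the parameters are hypothetical placeholders, and the sketch only does a dry run.

```python
from string import Template

# Hypothetical step templates; the command names are placeholders.
STEP_TEMPLATES = {
    "generation": Template(
        "generate --events ${events} --seed ${seed} --out ${run}.gen"),
    "simulation": Template("simulate --in ${run}.gen --out ${run}.sim"),
    "reconstruction": Template("reconstruct --in ${run}.sim --out ${run}.rec"),
}

CHAIN = ["generation", "simulation", "reconstruction"]


def build_job(chain, **params):
    """Expand every step template with the same parameters, in chain order."""
    return [STEP_TEMPLATES[step].substitute(params) for step in chain]


def run_job(commands, execute):
    """Run the commands in order and stop at the first failing step."""
    completed = []
    for command in commands:
        if not execute(command):
            break
        completed.append(command)
    return completed


commands = build_job(CHAIN, events=1000, seed=42, run="calib001")

# Dry run: 'execute' records the command instead of launching anything.
log = []
done = run_job(commands, execute=lambda c: log.append(c) is None)
for line in done:
    print(line)
```

Because the parameters are given once and substituted into every step, the output file of one step always matches the input of the next, which removes one source of manual configuration error.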

  • The Grid Data Mirroring Package, developed by
    CMS, PPDG, and EDG, is an example of CMS
    successfully interacting with the Grid Projects.
  • Tools are needed to transfer and manage results
    run at remote centers
  • Easy to handle this manually with a few centers,
    impossible with lots of data at many centers
  • GDMP is based around Globus Middleware and a
    flexible architecture
  • Globus Replica Catalogue Recently implemented to
    handle file format independent replication
  • Formerly could only manage Objectivity Data Files
  • Successfully used to replicate about 1 TB of CMS
    data during tests
  • GDMP Heartbeat monitor proposed by CAS engineer
    Greg Graham
  • Automatically verifies the client and servers are
    working and notifies users when problems occur.
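
A heartbeat check of this kind reduces to probing each endpoint and notifying someone when a probe fails. The sketch below is a generic Python illustration, not the proposed GDMP monitor: the endpoint names are made up, and the probe and notification functions are passed in so that no real network access is assumed.

```python
def heartbeat(endpoints, probe, notify):
    """Probe every endpoint once and report the ones that fail.

    endpoints: iterable of endpoint names (hypothetical labels)
    probe:     callable(name) -> bool, True when the endpoint answers
    notify:    callable(message), used to alert users
    Returns the list of endpoints that failed.
    """
    failed = []
    for name in endpoints:
        try:
            alive = bool(probe(name))
            reason = "no response"
        except OSError as exc:  # a probe that cannot connect counts as a failure
            alive, reason = False, f"probe error: {exc}"
        if not alive:
            failed.append(name)
            notify(f"heartbeat failed for {name}: {reason}")
    return failed


# Stand-in probe: known endpoints report a fixed status, unknown ones
# behave like a refused connection.
STATUS = {"server-a": True, "server-b": False}


def fake_probe(name):
    if name not in STATUS:
        raise OSError("connection refused")
    return STATUS[name]


messages = []
failed = heartbeat(["server-a", "server-b", "client-c"],
                   fake_probe, messages.append)
print(failed)
print(messages)
```

In a real deployment the probe would contact the client or server process and the notification would go to a mail or paging system; both are left abstract here.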

Production Issues
  • Almost immediately pieces that are needed begin
    to appear
  • Information about the job parameters used to run
    the jobs isn't stored in a convenient way, making
    it difficult for the end users to determine
    exactly how jobs were produced
  • Job and Request tracking is done almost entirely
  • Web pages are updated by hand, stored in several
  • It is difficult to determine if production farms
    are running efficiently and to diagnose and solve
  • Solutions Proposed
  • EU DataGrid and CMS developed BOSS system helps
    with the Job specification tracking with the use
    of a simple database
  • US-CMS members are working to include the IMPALA
    specified parameters into the database
  • CAS Engineer Iosif Legrand is working on
    monitoring tools for clusters
  • Variety of technologies are being investigated
  • In the short term this helps the efficiency of the
    production system
  • In the long term monitoring serves information to
    advanced Grid-Service

Agent-Based Distributed System
  • Based on JINI
  • Includes Station Servers (static) that host
    Dynamic Services
  • Servers interconnected dynamically to form a
    fabric in which mobile agents can travel with a
    payload of physics analysis tasks
  • Prototype is highly flexible and robust against
    network outages
  • Amenable to deployment on leading edge and future
    portable devices (WAP, iAppliances, etc.)
  • The ultimate system for the travelling physicist!
  • Design document submitted as part of review
    material (I. Legrand)
  • Studies using the MONARC Simulator (built on SONN

Task Allocation to Sites; Replica and Workflow

System Scalability
  • CMS has an aggressive ramp up of computing
    complexity to reach a full sized system.
  • Target to reach 50% of complexity by 2004
  • T0/T1 with approx 600 cpu boxes
  • (CPU boxes is an inadequate measure of full
  • Double each year
  • End of 2001: 200 boxes
  • End of 2002: 400 boxes
  • Along with the effort to make use of distributed
    computing resources, there is considerable effort
    needed to use large amounts of local resources
    when there are central services.
  • CAS Engineer Tony Wildish has been instrumental
    in achieving the complexity milestones.

Production Now
  • While CMS production has not gone as quickly as
    we would have liked, the use of distributed
    facilities has been successful, with 6 fully
    operational centers and several more expected.
  • Unfortunately has many manual components
  • Production Tool improvements will help some of
  • Manpower intensive, requires a production manager
    at all participating sites
  • The next step is to begin to investigate
    Distributed Production Systems
  • Even for the 20 data challenge, tools to
    automate the use of remote facilities for
    production and reconstruction are needed
  • CMS plans for manpower at the Tier2 centers do
    not allow for full-time production people at
    Tier2 facilities. Long-term need to reduce
    manpower needed for production and eventually

Distributed Production Systems
  • To automate even basic, predictable, schedulable
    production, tools are needed
  • Authentication Modules
  • Distributed Schedulers
  • Automated Tools for Data Replication
  • Job Tracking Tools
  • System Monitoring
  • Production Configuration Tools
  • To get even a little more advanced
  • Resource Discovery Tools
  • Resource Brokers
  • Load Balancing
  • Clearly substantial support is needed from the
    Grid Projects

Distributed Production Prototypes
  • PPDG Developed MOP System
  • Relies on GDMP for file replication
  • Globus GRAM for authentication
  • Condor-G and local queuing systems for Job
  • IMPALA for Job Specification
  • Currently the system is deployed at FNAL, UCSD,
    Caltech, and U. Wisc
  • Allows submission of cmsim jobs from a central
    location, run on remote locations, and return
    results
  • More complicated ORCA production testing
    expected soon.

Prototypes and Plans
  • EU DataGrid Developed TestBed1 will run before
    the end of the year
  • They hope to achieve a similar functionality to
  • Switching from predictable Distributed Production
    Prototypes to chaotic Distributed Analysis
    Prototypes is a significant step in complexity
  • Addition of analysis users
  • More complex authentication
  • Additional security to protect against the
    careless and the malicious
  • Resource Discovery for both data and computing
    resources necessary
  • Load Balancing more complicated
  • Time Estimation Tools required
  • Good Interactions with the Grid Necessary
  • CMS needs to clearly define requirements and
  • Process started, but clearly a lot of work is
  • New CCS Level 2 Task for Grid should define the
    CMS requirements and evaluate the prototypes

Schedule and Milestones
  • CMS Software Schedule is tight
  • CMS Architecture Development WBS 2.1
  • WBS Detector Description Database
  • Has a release of a functional prototype in Nov
  • More critical that the Fully Functional Software
    is released in April (WBS for
    integration with the CMS software packages that
    will use it
  • Development can progress without it, but it's
    important to have a common set of information for
    all software and it would be good to validate
    OSCAR with something close to a final system
  • Important to get users' reactions to the tools
    (Michael Case is scheduled to work with the End
    Cap muon group in California assessing techniques
    and user interface)
  • It has been difficult keeping the remote
    engineers working as efficiently as the people
    resident at CERN.
  • Requires considerable effort on both sides to
    make it work.
  • The most successful examples are at Fermilab
    where there is a large team to work with

  • WBS OSCAR Development is progressing much
    better than last year
  • In order to perform physics validation of GEANT4
    fully functional code must be delivered soon.
  • US-CMS has 0.5 FTE working in this area.
  • WBS Sub-System Architecture Development
  • This was a new effort this year. First release
    was expected end of September
  • Has slipped but next major production is not
    scheduled until Jan. 2002
  • Will be critical if initial modifications are not
    completed in time to validate
  • WBS Analysis Sub-Architecture
  • On schedule for a release at the end of the month
  • WBS Production Architecture is a new task
    for the coming year
  • Primarily a US responsibility with 1 FTE of effort
    identified over 2 people for development

  • WBS 2.2 IGUANA
  • Several of IGUANA's milestones slipped because
    the funds for an additional developer expected at
    the beginning of the year were only made
    available recently. Search progressing to fill
  • Data Browsers for CMS software were pushed into
    the early part of next year.
  • Analysis in general has not gone as smoothly as
    everyone would like.
  • WBS 2.3 Distributed Data Management and
  • WBS 2.3.2 Computing Complexity Progression
  • Involved a tremendous amount of effort to stay on
  • Does not appear to be getting easier and there is
    still a long way to go
  • May require that CMS rethink elements of the
    local computing model to make it simpler and more

  • WBS 2.3.6 Distributed Production Tools
  • Work has generally gone very well
  • A few milestones were delayed, but production in
    spring 2002 should run a lot faster
    and smoother than the Fall 2000 production, which
    took almost a year to complete.
  • Reaching a level of maturity at which we will be able
    to use a lot of Greg's time in the next year for
    other development tasks.
  • WBS System Monitoring Tools
  • Effort recently started this summer by Iosif
    Legrand in cooperation with a CERN CCS developer
    and two Pakistani students
  • First release may be available in time for the
    Spring 2002 Production run
  • WBS 2.3.4, 2.3.7, and 2.3.8 Distributed Data
    Management, Distributed Production Prototyping,
    and Distributed Analysis Prototyping require good
    interactions with the grid projects
  • So far there have been some interesting
  • More formal interactions are needed
  • Better evaluations and requirements from CMS

Milestones 2.1 Architecture
  • Use Case Analysis For New Analysis Architecture: Feb 9, 01; Feb 9, 01
  • Tools for Conversion of XML to GEANT3: March 1, 01; March 15, 01
  • First Release of CAFÉ Documentation Tools: March 1, 01; March 1, 01
  • Top Level CARF Description Document: May 1, 01; Dec 1, 01
  • Assessment of XML Technology: July 1, 01; August 1, 01
  • Release of redesign document: July 3, 01; July 7, 01
  • Release of code for use in production: Sept 27, 01; Oct 15, 01
  • Analysis Architecture kernel defined: Oct 31, 01
  • Release of DDD Prototype: Nov 15, 01
  • Release of Calo Code Phase 1-4: Dec 19, 01

Milestones 2.2 IGUANA
  • Review of baseline technology GUI technologies: Oct 31, 01
  • Review of baseline graphics technologies: Oct 31, 01
  • Several IGUANA milestones from Oct were delayed 6 months due to lack of manpower

DDMP Milestones 2.3
  • File format Replication In GDMP: Feb 12, 01; Aug 01, 01
  • Release of Distributed Production Prototype Design Document: Feb 13, 01; May 25, 01
  • Implementation of Security Protocol in Objectivity: Mar 01, 01; Apr 04, 01
  • Port of FNAL Scripts to CERN: Mar 3, 01; Mar 01, 01
  • Release of Site Independent Scripts: May 2, 01; May 2, 01
  • Test of Distributed Production System between FNAL and U. Wisc: July 4, 01; Oct 2, 01
  • Test of Distributed Production Tier2: Aug 16, 01; Sep 28, 01
  • Tools for Job Specification: Aug 6, 01; Oct 15, 01
  • Tools for Job Specification in BOSS: Sep 25, 01; Nov 1, 01
  • Evaluation Document for MOP: Oct 12, 01; Dec 1, 01
  • Compilation of User Reaction to spec: Nov. 28, 01

  • Lot of progress in a variety of areas, good
    contributions from CAS engineers
  • CMS Software schedule is tight with high level
    milestones approaching quickly
  • Lots of work left to do
  • Choice of viable database solution
  • Validated GEANT4 simulator
  • CMS Complexity Progression very aggressive
  • Production Tool Improvements
  • Production scheduled for Feb. 2002 and needed to
    be completed by summer is as large as the sample
    which recently took a year.
  • Analysis Tools and reaping the benefits of the
    improved data accessibility.