eScience eBusiness eGovernment and their Technologies Introduction - PowerPoint PPT Presentation


Title: eScience eBusiness eGovernment and their Technologies Introduction

e-Science e-Business e-Government and their
Technologies Introduction
  • Bryan Carpenter, Geoffrey Fox, Marlon Pierce
  • Pervasive Technology Laboratories
  • Indiana University Bloomington IN 47401
  • January 12 2004
  • http//

Course Topics: Background/Core
  • Java Programming
  • We will assume basic Java programming proficiency
  • We will cover Java client/server, three-tiered
    and network programming.
  • Ancillary but interesting Java topics to be
    covered include Apache Ant, XML-Beans, and Java
    Message Service
  • XML and XML Schema
  • We will provide introductory material.
  • Necessary to understand Web Service standards
  • XML/Java Programming
  • XML Databases (Xindice, Sleepycat)
  • XPath, XQuery

Course Topics: Web and Grid Services
  • Overview Material
  • Grid and Web Service Architectures
  • Basic Web Service Standards
  • WSDL, SOAP structure and definitions
  • Building services in Java: Apache Axis
  • Advanced Web Services: Emerging capabilities
  • WS-ReliableMessaging, WS-Security, WS-Transaction
  • Computational Grids
  • Globus Toolkit 2
  • Java COG Kit for Globus programming
  • Grids Meet Web Services
  • Open Grid Service Architecture/Infrastructure
  • Implementations: GSX from Indiana University
  • The Semantic Grid: Information Models for
    Describing Resources
  • RDF, DAML+OIL, and OWL

What are we doing
  • This is a semester-long course on Grids (viewed
    as technologies and infrastructure) and their
    application mainly to science but also to
    business and government
  • We will assume a basic knowledge of the Java
    language and then interweave 6 topic areas; the
    first four cover technologies that will be used
    by students
  • Advanced Java including networking, Java Server
    Pages and perhaps servlets
  • XML: Specification, Tools, Linkage to Java
  • Web Services: Basic Ideas, WSDL, Axis and Tomcat
  • Grid Systems: GT3/Cogkit, Gateway, XSOAP, Portlet
  • Advanced Technology Surveys: CORBA as history,
    OGSA-DAI, security, Semantic Grid, Workflow
  • Applications: Bioinformatics, Particle Physics,
    Engineering, Crises, Computing-on-demand Grid,
    Earth Science

Grid Computing: Making The Global Infrastructure
a Reality
  • Based on work done in preparing book edited
    with Fran Berman and Anthony J.G. Hey,
  • ISBN 0-470-85319-0
  • Hardcover 1080 Pages
  • Published March 2003
  • http//

  • See the webcast in an Oracle technology
  • See also the Gap Analysis http//grids.ucs.india
  • We can send you nicely printed versions of this
  • End of this is a good collection of references
    and it gives both a general survey of current
    Grids and specific examples from UK
  • Appendix with more details is http//
  • See also GlobusWorld http//
  • and the Grid Forum http//

e-moreorlessanything and the Grid
  • e-Business captures an emerging view of
    corporations as dynamic virtual organizations
    linking employees, customers and stakeholders
    across the world.
  • The growing use of outsourcing is one example
  • e-Science is the similar vision for scientific
    research with international participation in
    large accelerators, satellites or distributed
    gene analyses.
  • The Grid integrates the best of the Web,
    traditional enterprise software, high performance
    computing and Peer-to-peer systems to provide the
    information technology e-infrastructure for
  • A deluge of data of unprecedented and inevitable
    size must be managed and understood.
  • People, computers, data and instruments must be
  • On demand assignment of experts, computers,
    networks and storage resources must be supported

So what is a Grid?
  • Supporting human decision making with a network
    of at least four large computers, perhaps six or
    eight small computers, and a great assortment of
    disc files and magnetic tape units - not to
    mention remote consoles and teletype stations -
    all churning away. (Licklider 1960)
  • Coordinated resource sharing and problem solving
    in dynamic multi-institutional virtual
  • Infrastructure that will provide us with the
    ability to dynamically link together resources as
    an ensemble to support the execution of
    large-scale, resource-intensive, and distributed
  • Realizing thirty year dream of science fiction
    writers that have spun yarns featuring worldwide
    networks of interconnected computers that behave
    as a single entity.

What is a High Performance Computer?
  • We might wish to consider three classes of
    multi-node computers
  • 1) Classic MPP with microsecond latency and
    scalable internode bandwidth (tcomm/tcalc 10 or
  • 2) Classic Cluster which can vary from
    configurations like 1) to 3) but typically have
    millisecond latency and modest bandwidth
  • 3) Classic Grid or distributed systems of
    computers around the network
  • Latencies of inter-node communication 100s of
    milliseconds but can have good bandwidth
  • All have same peak CPU performance but
    synchronization costs increase as one goes from
    1) to 3)
  • Cost of system (dollars per gigaflop) decreases
    by factors of 2 at each step from 1) to 2) to 3)
  • One should NOT use classic MPP if class 2) or 3)
    suffices unless some security or data issues
    dominate over cost-performance
  • One should not use a Grid as a true parallel
    computer; it can link parallel computers
    together for convenient access etc.
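
The rule of thumb above (use the cheapest class that suffices) can be written as a small selection function. This is only a sketch: the latencies are the orders of magnitude quoted on the slide, the relative costs 1.0 / 0.5 / 0.25 encode "factors of 2 at each step", and the 10% overhead target and the name `cheapest_sufficient` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineClass:
    name: str
    latency_s: float      # typical inter-node latency (order of magnitude from the slide)
    relative_cost: float  # dollars per gigaflop relative to a classic MPP


# Costs halve at each step from MPP to Cluster to Grid, as the slide states.
CLASSES = [
    MachineClass("MPP", 1e-6, 1.0),
    MachineClass("Cluster", 1e-3, 0.5),
    MachineClass("Grid", 1e-1, 0.25),
]


def cheapest_sufficient(compute_time_between_syncs_s, max_overhead=0.1):
    """Pick the cheapest class whose latency is at most max_overhead
    times the compute time between synchronizations (assumed criterion)."""
    budget = max_overhead * compute_time_between_syncs_s
    sufficient = [c for c in CLASSES if c.latency_s <= budget]
    if not sufficient:
        raise ValueError("no class meets the overhead target")
    return min(sufficient, key=lambda c: c.relative_cost)
```

A job that synchronizes every 100 microseconds lands on the MPP, while one that synchronizes every 10 seconds can run on a Grid at a quarter of the cost per gigaflop.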

  • e-Science is about global collaboration in key
    areas of science, and the next generation of
    infrastructure that will enable it. This is a
    major UK Program
  • e-Science reflects growing importance of
    international laboratories, satellites and
    sensors and their integrated analysis by
    distributed teams
  • CyberInfrastructure is the analogous US initiative

Grid Technology supports e-Science and
CyberInfrastructure. It is software (middleware)
built on top of networks

Global Terabit Research Network
  • The Grid software and resources run on top of
    high performance global networks

USA Network

Terabit Networks
  • Network performance will increase faster than
    Moore's law partly because optical fiber has
    almost unlimited bandwidth and partly because
    there are many old networks to be replaced
  • Home dial-ups (56kbit) → DSL/Cable Modem (2
    megabits/sec) → FTTP (Fiber to the Premise at
    gigabit performance)
  • 2006 Goal of Global Terabit Research Network:
    International : National Backbone : Organization
    : Optical Desktop : Copper Desktop is
    1000 : 1000 : 100 : 10 : 1 Gigabit/sec
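
To get a feel for these link speeds, one can compute how long a fixed amount of data takes to move over each tier. The sketch below assumes the 2006 goal reads as 1000 : 1000 : 100 : 10 : 1 Gigabit/sec for international, national backbone, organization, optical desktop and copper desktop links; the dictionary keys and the function name are illustrative only.

```python
# Link speeds in Gbit/s. Dial-up and DSL figures are from the slide;
# the tier values assume the 2006 goal is 1000:1000:100:10:1 Gbit/s.
LINKS_GBIT_PER_S = {
    "dial-up": 56e-6,
    "DSL/cable": 2e-3,
    "copper desktop": 1.0,
    "optical desktop": 10.0,
    "organization": 100.0,
    "national backbone": 1000.0,
    "international": 1000.0,
}


def transfer_seconds(size_bytes, link):
    """Ideal transfer time, ignoring latency and protocol overhead."""
    bits = size_bytes * 8
    return bits / (LINKS_GBIT_PER_S[link] * 1e9)


if __name__ == "__main__":
    one_gigabyte = 1e9
    for link in LINKS_GBIT_PER_S:
        print(f"{link:18s} {transfer_seconds(one_gigabyte, link):12.3f} s")
```

Under these assumptions a gigabyte needs about 8 seconds on a gigabit copper desktop link and more than an hour over a 2 megabit/sec DSL line.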

e-Business and (Virtual) Organizations
  • Enterprise Grid supports information system for
    an organization includes university computer
    center, (digital) library, sales, marketing,
  • Outsourcing Grid links different parts of an
    enterprise together (Gridsourcing)
  • Manufacturing plants with designers
  • Animators with electronic game or film designers
    and producers
  • Coaches with aspiring players (e-NCAA or e-NFL
  • Customer Grid links businesses and their
    customers as in many web sites such as
  • e-Multimedia can use secure peer-to-peer Grids to
    link creators, distributors and consumers of
    digital music, games and films respecting rights
  • Distance education Grid links teacher at one
    place, students all over the place, mentors and
    graders shared curriculum, homework, live

e-Defense and e-Crisis
  • Grids support Command and Control and provide
    Global Situational Awareness
  • Link commanders and frontline troops to
    themselves and to archival and real-time data;
    link to what-if simulations
  • Dynamic heterogeneous wired and wireless networks
  • Security and fault tolerance essential
  • System of Systems: Grid of Grids
  • The command and information infrastructure of
    each ship is a Grid; each fleet is linked
    together by a Grid; the President is informed by
    and informs the national defense Grid
  • Grids must be heterogeneous and federated
  • Crisis Management and Response enabled by a Grid
    linking sensors, disaster managers, and first
    responders with decision support

Classes of Computing Grid Applications
  • Running Pleasingly Parallel Jobs as in United
    Devices, Entropia (Desktop Grid) cycle stealing
  • Can be managed (inside the enterprise as in
    Condor) or more informal (as in SETI@Home)
  • Computing-on-demand in Industry where jobs
    spawned are perhaps very large (SAP, Oracle )
  • Support distributed file systems as in Legion
    (Avaki), Globus with (web-enhanced) UNIX
    programming paradigm
  • Particle Physics will run some 30,000
    simultaneous jobs this way
  • Pipelined applications linking data/instruments,
    compute, visualization
  • Seamless Access where Grid portals allow one to
    choose one of multiple resources with a common

Utility Computing
  • An important business application of Grids is
    utility computing
  • Namely support a pool of computers to be assigned
    as needed to take-up extra demand
  • Pool shared between multiple applications
  • This application is common in academia where
    different simulations share resources
  • Web Servers
  • Financial Modeling
  • Data-mining
  • Simulation response to crisis like forest fire or
  • Architecture is a farm of Grid Services
    connected to the Internet, not a cluster of
    computers connected to each other

  • Computing-on-demand uses dynamically assigned
    (shared) pool of resources to support excess
    demand in flexible cost-effective fashion
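
The saving from a shared pool can be shown with a small calculation: static assignment sizes each application for its own peak, while an on-demand pool is sized for the peak of the combined demand. The demand numbers and function names below are invented purely for illustration.

```python
def static_capacity(demand_by_app):
    """Machines needed when each application owns enough for its own peak."""
    return sum(max(series) for series in demand_by_app.values())


def shared_pool_capacity(demand_by_app):
    """Machines needed when one pool covers the peak of the summed demand."""
    combined = [sum(step) for step in zip(*demand_by_app.values())]
    return max(combined)


# Hypothetical machines-needed per time step for three applications.
DEMAND = {
    "web_servers": [2, 8, 3, 2],
    "financial_modeling": [6, 1, 1, 5],
    "crisis_simulation": [0, 0, 9, 0],
}

if __name__ == "__main__":
    print("static :", static_capacity(DEMAND))
    print("shared :", shared_pool_capacity(DEMAND))
```

Because the peaks in this made-up data do not coincide, the shared pool needs 13 machines where static assignment needs 23.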

  • Figure labels: Static Assignment with
    redundancy; Dynamic on-demand Assignment

Some Important Styles of Grids
  • Computational Grids were the origin of the
    concepts and link computers across the globe;
    high latency stops this from being used as a
    parallel machine
  • Knowledge and Information Grids link sensors and
    information repositories as in Virtual
    Observatories or BioInformatics
  • More detail on next slide
  • Education Grids link teachers, learners, parents
    as a VO with learning tools, distant lectures
  • e-Science Grids link multidisciplinary
    researchers across laboratories and universities
  • Community Grids focus on Grids involving large
    numbers of peers rather than focusing on linking
    major resources; links Grid and Peer-to-peer
    network concepts
  • Semantic Grid links Grid and AI communities with
    Semantic web (ontology/meta-data enriched
    resources) and Agent concepts

Information/Knowledge Grids
  • Distributed (10s to 1000s) of data sources
    (instruments, file systems, curated databases )
  • Data Deluge: 1 (now) to 100s of petabytes/year
  • Moore's law for Sensors
  • Possible filters assigned dynamically (on-demand)
  • Run image processing algorithm on telescope image
  • Run Gene sequencing algorithm on compiled data
  • Needs decision support front end with what-if
  • Metadata (provenance) critical to annotate data
  • Integrate across experiments as in
    multi-wavelength astronomy

Data Deluge comes from pixels/year available
  • 2.4 Petabytes Today

SERVOGrid: Solid Earth Research Virtual
Observatory will link Australia, Japan, USA

SERVOGrid Requirements
  • Seamless Access to Data repositories and large
    scale computers
  • Integration of multiple data sources including
    sensors, databases, file systems with analysis
  • Including filtered OGSA-DAI (Grid database
  • Rich meta-data generation and access with
    SERVOGrid specific Schema extending openGIS
    (Geography as a Web service) standards and using
    Semantic Grid
  • Portals with component model for user interfaces
    and web control of all capabilities
  • Collaboration to support world-wide work
  • Basic Grid tools: workflow and notification

Rolls Royce and UK e-Science Program: Distributed
Aircraft Maintenance Environment
  • Figure labels: In flight data (5000 engines,
    Gigabyte per aircraft per Engine per
    transatlantic flight); Global Network Such as
    SITA; Ground Station; Engine Health (Data)
    Center; Maintenance Centre; Internet, e-mail,
    pager

NASA Aerospace Engineering Grid

Virtual Observatory Astronomy Grid: Integrate
  • Figure labels: Dust Map; Visible X-ray; Galaxy
    Density Map

e-Chemistry Laboratory: Experiments-on-demand
  • Figure labels: Grid-enabled Output Streams; Grid
    Resources

CERN LHC Data Analysis Grid

Typical Grid Architecture

Sources of Grid Technology
  • Grids support distributed collaboratories or
    virtual organizations integrating concepts from
  • The Web
  • Agents
  • Distributed Objects (CORBA, Java/Jini, COM)
  • Globus, Legion, Condor, NetSolve, Ninf and other
    High Performance Computing activities
  • Peer-to-peer Networks
  • With perhaps the Web and P2P networks being the
    most important for Information Grids and Globus
    for Compute Grids

The Essence of Grid Technology?
  • We will start from the Web view and assert that
    basic paradigm is
  • Meta-data rich Web Services communicating via
  • These have some basic support from some runtime
    such as .NET, Jini (pure Java), Apache
    Tomcat+Axis (Web Service toolkit), Enterprise
    JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit
  • These are the distributed equivalent of operating
    system functions as in UNIX Shell
  • Called Hosting Environment or platform
  • W3C standard WSDL defines IDL (Interface
    standard) for Web Services

  • Meta-data is usually thought of as data about
  • The Semantic Web is at its simplest considered as
    adding meta-data to web pages
  • For example, the hospital web-page has meta-data
    telling you its location, phone-number,
    specialties which can be used to automate
    Google-style searches to allow planning of
    disease/accident treatment from web
  • Modern trend (Semantic Grid) is meta-data about
    web-services, e.g. specifying details of
    interface and usage
  • Such as that a bioinformatics service is free or
    bandwidth input is of limited amount
  • Provenance history and ownership of data very

A typical Web Service
  • In principle, services can be in any language
    (Fortran .. Java .. Perl .. Python) and the
    interfaces can be method calls, Java RMI
    Messages, CGI Web invocations, totally compiled
    away (inlining)
  • The simplest implementations involve XML messages
    (SOAP) and programs written in net friendly
    languages like Java and Python
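
A minimal sketch of what such an XML message looks like, using only the Python standard library. The envelope namespace is the SOAP 1.1 one; the application namespace `urn:example:shipping`, the operation name and its parameters are hypothetical and chosen only to echo the shipping example on the slide.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
APP_NS = "urn:example:shipping"  # hypothetical application namespace


def build_request(operation, params):
    """Wrap an operation name and its parameters in a SOAP envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(message):
    """Recover the operation name and parameters on the receiving side."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_NS}}}Body")
    call = body[0]
    operation = call.tag.split("}", 1)[1]
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    print(build_request("ShipOrder", {"orderId": 42}).decode("utf-8"))
```

Note that all values arrive as text; typing them is the job of the interface description (WSDL and XML Schema), which this sketch leaves out.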

  • Figure labels: Payment Credit Card and Warehouse
    Shipping control, each shown as Web Services
    with WSDL interfaces

Services and Distributed Objects
  • A web service is a computer program running on
    either the local or remote machine with a set of
    well defined interfaces (ports) specified in XML
  • Web Services (WS) have many similarities with
    Distributed Object (DO) technology but there are
    some (important) technical and religious points
    (not easy to distinguish)
  • CORBA, Java, COM are typical DO technologies
  • Agents are typically SOA (Service Oriented
  • Both involve distributed entities but Web
    Services are more loosely coupled
  • WS interact with messages; DO with RPC (Remote
    Procedure Call)
  • DO have factories; WS manage instances
    internally, and interaction-specific state is
    not exposed and hence need not be managed
  • DO have explicit state (stateful services); WS
    use context in the messages to link interactions
    (stateful interactions)
  • Claim: DOs do NOT scale; WS build on experience
    (with CORBA) and do scale
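
The "context in the messages" idea can be sketched as a service with a single message-handling entry point: the client never holds a remote object, only a context token that it echoes back. The class name, message fields and the shopping-cart scenario are assumptions made for illustration.

```python
import uuid


class CartService:
    """Web-service style: one entry point, instances managed internally."""

    def __init__(self):
        self._sessions = {}  # internal state, never exposed as remote objects

    def handle(self, message):
        # A message without a context starts a new interaction.
        context_id = message.get("context") or uuid.uuid4().hex
        items = self._sessions.setdefault(context_id, [])
        items.append(message["item"])
        # The reply carries the context so the client can link the next message.
        return {"context": context_id, "count": len(items)}


if __name__ == "__main__":
    service = CartService()
    first = service.handle({"item": "book"})
    second = service.handle({"item": "pen", "context": first["context"]})
    print(second["count"])
```

In a distributed-object design the client would instead ask a factory for its own cart object and call methods on it, so the server would have to track and clean up one exposed object per client.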

Details of Web Service Protocol Stack
  • UDDI finds where programs are
  • remote (distributed) programs are just Web
  • (not a great success)
  • WSFL links programs together (under revision as
  • WSDL defines interface (methods, parameters, data
  • SOAP defines structure of message including
    serialization of information
  • HTTP is negotiation/transport protocol
  • TCP/IP is layers 3-4 of OSI
  • Physical Network is layer 1 of OSI

Classic Grid Architecture
  • Figure labels: Content Access; Middle Tier
    Brokers, Service Providers; Middle Tier becomes
    Web Services; Users and Devices

Grid Services for the Education Process
  • Learning Object XML standards already exist
  • Registration
  • Performance (grading)
  • Authoring of Curriculum
  • Online laboratories for real and virtual
  • Homework submission
  • Quizzes of various types (multiple choice, random
  • Assessment data access and analysis
  • Synchronous Delivery of Curricula including
    Audio/Video Conferencing and other synchronous
    collaborative tools as Web Services
  • Scheduling of courses and mentoring sessions
  • Asynchronous access, data-mining and knowledge
  • Learning Plan agents to guide students and

Grid Learning Model
  • Education and Research Grids share some services
    both for content and process
  • For example collaboration services are largely
  • Research will use much larger simulation engines
    to get high resolution results
  • Maybe a researcher uses a CAVE to visualize;
    education a Macintosh
  • Both can share data services but run them
    through different filters to select for precision
    (research) or pedagogical value (education)
  • Education has digital textbook frontend to
    resources of the research Grid
  • Both use same workflow technologies to link
    services together

Some Observations
  • Traditional Grids manage and share
    asynchronous resources in a rather centralized
  • Peer-to-peer networks are just like Grids with
    different implementations of message-based
    services like registration and look-up
  • Collaboration systems like WebEx/Placeware
    (Application sharing) or Polycom (audio/video
    conferencing) can be viewed as Grids
  • Computers are fast and getting faster. One can
    afford many strategies that used to be
    unrealistic including rich usually XML based
  • Web Services interact with messages
  • Everything (including applications like
    PowerPoint) will be a Web Service?
  • Grids, P2P Networks, Collaborative Environments
    are (will be) managed message-linked Web Services

Peer to Peer Grid
  • Figure labels: Service Facing Web Service
    Interfaces; User Facing Web Service Interfaces

Peer to Peer Grid
  • A democratic organization

System and Application Services?
  • There are generic Grid system services: security,
    collaboration, persistent storage, universal
  • OGSA (Open Grid Service Architecture) is
    implementing these as extended Web Services
  • An Application Web Service is a capability used
    either by another service or by a user
  • It has input and output ports; data is from
    sensors or other services
  • Consider Satellite-based Sensor Operations as a
    Web Service
  • Satellite management (with a web front end)
  • Each tracking station is a service
  • Image Processing is a pipeline of filters which
    can be grouped into different services
  • Data storage is an important system service
  • Big services built hierarchically from basic
  • Portals are the user (web browser) interfaces to
    Web services
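
The statement that image processing is a pipeline of filters which can be grouped into different services can be sketched with plain function composition. The three filters are toy stand-ins operating on a list of pixel values, and the grouping into a "calibration" and a "detection" service is an arbitrary choice for illustration.

```python
from functools import reduce


def compose(*filters):
    """Group filters into one service that applies them left to right."""
    def service(data):
        return reduce(lambda value, f: f(value), filters, data)
    return service


# Toy filters on a list of pixel values.
def remove_noise(pixels):
    return [p for p in pixels if p >= 0]


def normalize(pixels):
    peak = max(pixels)
    return [p / peak for p in pixels]


def threshold(pixels):
    return [1 if p >= 0.5 else 0 for p in pixels]


# The same filters can be packaged as two services or as one big service.
calibration_service = compose(remove_noise, normalize)
detection_service = compose(threshold)
image_processing = compose(calibration_service, detection_service)

if __name__ == "__main__":
    print(image_processing([-1, 2, 4, 1]))
```

Since a composed service has the same shape as a single filter, big services can be built hierarchically from basic ones, which is the point the slide makes.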

Satellite Science Grid Environment

What is Happening?
  • Grid ideas are being developed in (at least) two
  • Web Service: W3C, OASIS
  • Grid Forum (High Performance Computing,
  • Service Standards are being debated
  • Grid Operational Infrastructure is being deployed
  • Grid Architecture and core software being
  • Particular System Services are being developed
    centrally OGSA framework for this in
  • Lots of fields are setting domain specific
    standards and building domain specific services
  • There is a lot of hype
  • Grids are viewed differently in different areas
  • Largely computing-on-demand in industry (IBM,
    Oracle, HP, Sun)
  • Largely distributed collaboratories in academia

OGSA OGSI Hosting Environments
  • Start with Web Services in a hosting environment
  • Add OGSI to get a Grid service and a component
  • Add OGSA to get Interoperable Grid correcting
    differences in base platform and adding key

Technical Activities of Note
  • Look at different styles of Grids such as
    Autonomic (Robust Reliable Resilient)
  • New Grid architectures hard due to investment
  • Critical Services Such as
  • Security build message based not connection
  • Notification event services
  • Metadata Use Semantic Web, provenance
  • Databases and repositories instruments, sensors
  • Computing Submit job, scheduling, distributed
    file systems
  • Visualization, Computational Steering
  • Fabric and Service Management
  • Network performance
  • Program the Grid Workflow
  • Access the Grid Portals, Grid Computing

Issues and Types of Grid Services
  • 1) Types of Grid
  • R3
  • Lightweight
  • P2P
  • Federation and Interoperability
  • 2) Core Infrastructure and Hosting Environment
  • Service Management
  • Component Model
  • Service wrapper/Invocation
  • Messaging
  • 3) Security Services
  • Certificate Authority
  • Authentication
  • Authorization
  • Policy
  • 4) Workflow Services and Programming Model
  • Enactment Engines (Runtime)
  • Languages and Programming
  • Compiler
  • 7) Information Grid Services
  • Integration with compute resources
  • P2P and database models
  • 8) Compute/File Grid Services
  • Job Submission
  • Job Planning Scheduling Management
  • Access to Remote Files, Storage and Computers
  • Replica (cache) Management
  • Virtual Data
  • Parallel Computing
  • 9) Other services including
  • Grid Shell
  • Accounting
  • Fabric Management
  • Visualization Data-mining and Computational
  • Collaboration
  • 10) Portals and Problem Solving
  • 11) Network Services

Technology Components of (Services in) a
Computing Grid
  • 1 Job Management Service (Grid Service Interface
    to user or program client)
  • 2 Schedule and control Execution
  • 3 Access to Remote Computers
  • 5 Data Transfer
  • 6 File and Storage Access
  • 7 Cache Data Replicas
  • 8 Virtual Data
  • 9 Grid MPI
  • 10 Job Status
  • Application WS

  • Build on e-Science methodology and Grid
  • Science applications with multi-scale models,
    scalable parallelism, data assimilation as key
  • Data-driven models for earthquakes, climate,
    environment ..
  • Use existing code/database technology
    (SQL/Fortran/C) linked to Application Web/OGSA
  • XML specification of models, computational
    steering, scale supported at Web Service level
    as we don't need high performance here
  • Allows use of Semantic Grid technology

Why we can dream of using HTTP and that slow stuff
  • We have at least three tiers in computing
  • Client (user portal)
  • Middle Tier (Web Servers/brokers)
  • Back end (databases, files, computers etc.)
  • In Grid programming, we use HTTP (and used to use
    CORBA and Java RMI) in middle tier ONLY to
    manipulate a proxy for real job
  • Proxy holds metadata
  • Control communication in middle tier only uses
  • Real (data transfer) high performance
    communication in back end
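
A rough sketch of this split: the middle tier only creates and updates a small proxy record, while the bulk data is read and written by the back end. The class names, the `gridftp://` style URIs and the in-memory "storage" dictionary are all hypothetical stand-ins, not any toolkit's API.

```python
from dataclasses import dataclass


@dataclass
class JobProxy:
    """Metadata held in the middle tier; it never contains the data itself."""
    job_id: str
    input_uri: str
    output_uri: str
    status: str = "submitted"


class BackEnd:
    """Stand-in for the high-performance data path (here an in-memory dict)."""

    def __init__(self, storage):
        self.storage = storage

    def execute(self, proxy):
        data = self.storage[proxy.input_uri]         # bulk read stays in the back end
        self.storage[proxy.output_uri] = data[::-1]  # placeholder computation
        proxy.status = "done"


class MiddleTier:
    """Handles small control messages only, as HTTP would in the slide."""

    def __init__(self, back_end):
        self.back_end = back_end
        self.proxies = {}

    def submit(self, job_id, input_uri, output_uri):
        proxy = JobProxy(job_id, input_uri, output_uri)
        self.proxies[job_id] = proxy
        self.back_end.execute(proxy)
        return {"job_id": job_id, "status": proxy.status,
                "output_uri": proxy.output_uri}
```

The reply to the client carries only identifiers and status, so the slow protocol never sits on the data path.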

  • The Grid could and sometimes does virtualize
    various concepts; it should do more
  • Location: URI (Uniform Resource Identifier)
    virtualizes URL (WS-Addressing goes further)
  • Replica management (caching) virtualizes file
    location; generalized by GriPhyN virtual data
  • Protocol: message transport and WSDL bindings
    virtualize transport protocol as a QoS request
  • P2P or Publish-subscribe messaging virtualizes
    matching of source and destination services
  • Semantic Grid virtualizes Knowledge as a
    meta-data query
  • Brokering virtualizes resource allocation
  • Virtualization implies all references can be
    indirect and needs powerful mapping (look-up)
    services -- metadata
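
Replica management is a convenient case for seeing what "all references can be indirect" means in practice: clients hold a logical name and a look-up service maps it to one of several physical copies. The catalog class, the logical name, the URLs and the cost numbers below are made up for illustration.

```python
class ReplicaCatalog:
    """Maps a logical file name to physical copies and picks the cheapest."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, physical_url, cost):
        # cost is any comparable measure, e.g. estimated transfer time
        self._replicas.setdefault(logical_name, []).append((cost, physical_url))

    def resolve(self, logical_name):
        candidates = self._replicas.get(logical_name)
        if not candidates:
            raise KeyError(f"no replica registered for {logical_name}")
        cost, url = min(candidates)
        return url


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("lfn:survey/image-001", "http://site-a.example/img001", 40)
    catalog.register("lfn:survey/image-001", "http://site-b.example/img001", 5)
    print(catalog.resolve("lfn:survey/image-001"))
```

Adding or moving a copy changes only the catalog, not the clients, which is why the slide ties virtualization to powerful mapping services.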

Integration of Data and Filters
  • One has the OGSA-DAI Data repository interface
    combined with WSDL of the (Perl, Fortran, Python
    ) filter
  • User only sees WSDL not data syntax
  • Some non-trivial issues as to where the filtering
    compute power is
  • Microsoft says filter next to data

SERVOGrid Complexity Computing Environment
  • Figure labels: Parallel Simulation Service;
    Sensor Service; Middle Tier with XML Interfaces;
    XML Meta-data Service; CCE Control Portal
    Aggregation; OGSA-DAI Grid Services; Analysis
    Control Visualize

SERVOGrid (Complexity) Computing Model
  • This Type of Grid integrates with Parallel
    computing: Multiple HPC facilities but only use
    one at a time; Many simultaneous data sources and
  • Figure labels: HPC Simulation; Grid Data
    Assimilation; Other Grid and Web Services;
    Distributed Filters massage data for simulation

Two-level Programming I
  • The paradigm implicitly assumes a two-level
    Programming Model
  • We make a Service (same as a distributed object
    or computer program running on a remote
    computer) using conventional technologies
  • C, Java or Fortran Monte Carlo module
  • Data streaming from a sensor or Satellite
  • Specialized (JDBC) database access
  • Such services accept and produce data from users,
    files and databases
  • The Grid is built by coordinating such services
    assuming we have solved problem of programming
    the service

Two-level Programming II
  • The Grid is discussing the composition of
    distributed services with the runtime interfaces
    to Grid as opposed to UNIX pipes/data streams
  • Familiar from use of UNIX Shell, PERL or Python
    scripts to produce real applications from core
  • Such interpretative environments are the single
    processor analog of Grid Programming
  • Some projects like GrADS from Rice University are
    looking at integration between service and
    composition levels but dominant effort looks at
    each level separately
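
The composition level can be sketched as a tiny workflow runner: each service is an ordinary callable (the first level), and the workflow is just a dependency table saying whose output feeds whom (the second level). This uses `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later); the service names and the four-step example are invented for illustration.

```python
from graphlib import TopologicalSorter


def run_workflow(services, dependencies):
    """Run each service after the services it depends on.

    services: name -> callable taking a dict of upstream results
    dependencies: name -> set of upstream service names
    """
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        inputs = {dep: results[dep] for dep in dependencies.get(name, ())}
        results[name] = services[name](inputs)
    return results


# Hypothetical services; each could equally be a remote Web Service call.
SERVICES = {
    "sensor": lambda inputs: [3, 1, 2],
    "filter": lambda inputs: sorted(inputs["sensor"]),
    "simulate": lambda inputs: sum(inputs["filter"]),
    "visualize": lambda inputs: f"total={inputs['simulate']}",
}
DEPENDENCIES = {
    "filter": {"sensor"},
    "simulate": {"filter"},
    "visualize": {"simulate"},
}

if __name__ == "__main__":
    print(run_workflow(SERVICES, DEPENDENCIES)["visualize"])
```

The runner knows nothing about what each service does internally, which mirrors the slide's point that most efforts treat the service level and the composition level separately.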

  • Grids are inevitable and pervasive
  • Can expect Web Services and Grids to merge with a
    common set of general principles but different
    implementations with different scaling and
    functionality trade-offs
  • e-Science will grow in importance as Science
    grows as an international team sport; this
    affects scientists and organizations
  • Enough is known that one can start today
  • We will be flooded with data, information and
    purported knowledge
  • One should be learning about Grids, understanding
    relevant Web and Grid standards and developing
    new domain specific standards
  • Note many existing (standards) efforts assume
    client-server and not a brokered service model;
    these will need to change!