1
The Performance of Bags-Of-Tasks in Large-Scale
Distributed Computing Systems
Alexandru Iosup, Ozan Sonmez, Shanny Anoep, and
Dick Epema
Parallel and Distributed Systems Group, TU Delft
ACM/IEEE Intl. Symposium on High Performance
Distributed Computing
2
The VL-e project
Natural gas price ? for grid computing
  • A grid project in the Netherlands (2004-)
  • Natural gas money: VL-e 45 MEuro / 800 MEuro
    total research package
  • Overall aim
    • to design and build a virtual lab for
      (digitally) enhanced science (e-science)
      experiments (no in-vivo or in-vitro, but
      in-silico experiments)
  • Goals
    • create prototypes of application-specific
      e-science environments
    • design and develop re-usable ICT/grid components
    • validate with real-life applications in testbeds

3
The VL-e project: application areas
[Diagram labels: Philips, Unilever, IBM; Data Intensive Science, Medical Diagnosis & Imaging, Bio-Diversity, Bio-Informatics, Food Informatics, Dutch Telescience; Virtual Laboratory (VL) Application Oriented Services; Management of comm. & computing]
4
The VL-e project: application areas
[Same diagram as the previous slide, with the label "Bags-of-Tasks" overlaid.]
5
The VL-e project: application areas
[Same diagram again, with the label "Bags-of-Tasks" overlaid in a different position.]
6
The Challenge
  • Complete scientific work better,
    • User-oriented performance metrics (time is a
      critical performance component)
    • Bags-of-tasks for ease of use
  • in real systems
    • Workloads (now that real traces are available)
    • Information unavailability
  • What to do?
    • Hint: the next 10% improvement won't cut it!

7
The Challenge (contd.)
  • System model: What is a good model for the study
    of large-scale distributed computing systems that
    run bags-of-tasks?
  • Input model: What is a good model for bag-of-tasks
    workloads in large-scale distributed computing
    systems?
  • What is the best setup for such a system/input?
  • How to find the best?
  • If a best is found, can there be another?

8
The Performance of Bags-of-Tasks in Large-Scale
Distributed Computing Systems
  1. Introduction and Motivation
  2. Context: System Model
  3. Workload Model
  4. Design Space Exploration
  5. Conclusion

9
Context: System Model 1/4: Overview
  • System Model
    • Clusters: execute jobs
    • Resource managers: coordinate job execution
    • Resource management architectures: route jobs
      among resource managers
    • Task selection policies: create the eligible set
    • Task scheduling policies: schedule the eligible
      set

10
Context: System Model 2/4: Resource Management
Architectures (route jobs among resource managers)
11
Context: System Model 3/4: Task Selection
Policies (create the eligible set)
  • Age-based
    • S-T: Select Tasks in the order of their arrival.
    • S-BoT: Select BoTs in the order of their arrival.
  • User priority based
    • S-U-Prio: Select the tasks of the User with the
      highest Priority.
  • Based on fairness in resource consumption
    • S-U-T: Select the Tasks of the User with the
      lowest resource consumption.
    • S-U-BoT: Select the BoTs of the User with the
      lowest resource consumption.
    • S-U-GRR: Select the User Round-Robin, all tasks
      for this user.
    • S-U-RR: Select the User Round-Robin, one task for
      this user.
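
The selection policies above can be sketched as functions that order a snapshot of waiting tasks. This is a minimal illustration and not the authors' implementation: the `Task` fields, the function names, the `consumed` mapping, and the tie-breaking rules are all assumptions made for the example. S-U-Prio and S-U-BoT are omitted; they would follow the same pattern as S-U-T.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import zip_longest


@dataclass(frozen=True)
class Task:
    # Hypothetical task record; field names are not from the slides.
    task_id: str
    user: str
    bot_id: str      # bag-of-tasks this task belongs to
    arrival: float   # submission time


def select_s_t(tasks):
    """S-T: tasks in the order of their arrival."""
    return sorted(tasks, key=lambda t: t.arrival)


def select_s_bot(tasks):
    """S-BoT: whole BoTs in the order of BoT arrival.

    Assumption: a BoT arrives when its first task arrives.
    """
    first = {}
    for t in tasks:
        first[t.bot_id] = min(t.arrival, first.get(t.bot_id, t.arrival))
    return sorted(tasks, key=lambda t: (first[t.bot_id], t.bot_id, t.arrival))


def _queues_per_user(tasks):
    # One FIFO queue per user; users are visited in order of first arrival.
    queues = defaultdict(list)
    for t in select_s_t(tasks):
        queues[t.user].append(t)
    return queues


def select_s_u_rr(tasks):
    """S-U-RR: visit users round-robin, one task per visit."""
    rounds = zip_longest(*_queues_per_user(tasks).values())
    return [t for rnd in rounds for t in rnd if t is not None]


def select_s_u_grr(tasks):
    """S-U-GRR: visit users round-robin, all tasks of the user per visit."""
    return [t for queue in _queues_per_user(tasks).values() for t in queue]


def select_s_u_t(tasks, consumed):
    """S-U-T: tasks of the user with the lowest resource consumption first.

    `consumed` maps user -> resources used so far (hypothetical bookkeeping).
    """
    return sorted(tasks, key=lambda t: (consumed.get(t.user, 0.0), t.arrival))
```

On a static snapshot S-U-GRR simply groups tasks per user; in a running system, tasks that arrive later would wait for the user's next turn.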

12
Context: System Model 4/4: Task Scheduling
Policies (schedule the eligible set)
  • Information availability
    • Known
    • Unknown
    • Historical records
  • Sample policies
    • Earliest Completion Time (with Prediction of
      Runtimes) (ECT(-P))
    • Fastest Processor First (FPF)
    • (Dynamic) Fastest Processor Largest Task
      ((D)FPLT)
    • Shortest Task First w/ Replication (STFR)
    • Work Queue w/ Replication (WQR)
13
The Performance of Bags-of-Tasks in Large-Scale
Distributed Computing Systems
  1. Introduction and Motivation
  2. Context: System Model
  3. Workload Model
  4. Design Space Exploration
  5. Conclusion

14
Workload Modeling 101: What Matters
  • Job arrival process, job service time
    • Self-similarity (burstiness) vs. Poisson [Leland,
      Ott ToN94]
  • Job grouping: bags-of-tasks are the dominant application
    type in multi-cluster grids and cycle-scavenging
    systems (the e-Science infrastructure) [IosupJSE
    EuroPar07]
  • Job size: almost always 1 CPU [IosupDELW Grid06]

[Figure: number of packets per time unit over time, shown at time units of 100 s and 0.01 s; annotation: "Longer queues"]
15
A Bag-of-Tasks Workload Model
  • Model
    • Users, Bags-of-Tasks, Tasks
    • Heavy-tailed distributions for inter-arrival
      time and job service time -> can model self-similar
      workloads
    • More details (e.g., parameter values): see
      article
  • Validation data: the Grid Workloads Archive
    • 7 long-term grid traces
    • >5 million tasks
    • >2500 users
    • >40k CPUs
    • Domains: HEP, graphics, AI, math, biomed,
      climate, finance, aero

http://gwa.ewi.tudelft.nl/
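
The users / bags-of-tasks / tasks hierarchy with heavy-tailed inter-arrival and service times can be sketched as a small synthetic workload generator. The Pareto and exponential distributions, the uniform choice of user, and every parameter value below are assumptions chosen for illustration; the article gives the actual distributions and fitted parameters.

```python
import random
from dataclasses import dataclass


@dataclass
class BagOfTasks:
    user: str
    arrival: float        # submission time in seconds
    task_runtimes: list   # one runtime in seconds per task


def generate_workload(n_bots, n_users=5, seed=0, mean_gap=60.0,
                      gap_alpha=1.5, mean_bot_size=10.0,
                      runtime_alpha=2.0, min_runtime=30.0):
    """Generate BoTs with heavy-tailed gaps and runtimes (hypothetical parameters)."""
    rng = random.Random(seed)
    users = [f"user{i}" for i in range(n_users)]
    now = 0.0
    bots = []
    for _ in range(n_bots):
        # paretovariate() returns values >= 1; shifting by 1 lets gaps start
        # at 0, and the scale factor makes the mean gap equal to mean_gap
        # (valid for gap_alpha > 1).
        gap = (rng.paretovariate(gap_alpha) - 1.0) * (gap_alpha - 1.0) * mean_gap
        now += gap
        # At least one task per bag; sizes are exponentially distributed.
        size = 1 + int(rng.expovariate(1.0 / mean_bot_size))
        runtimes = [min_runtime * rng.paretovariate(runtime_alpha)
                    for _ in range(size)]
        bots.append(BagOfTasks(rng.choice(users), now, runtimes))
    return bots
```

With `gap_alpha` below 2 the gap distribution has infinite variance, which produces the long quiet periods and sudden bursts that a Poisson arrival process does not show.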
16
The Performance of Bags-of-Tasks in Large-Scale
Distributed Computing Systems
  1. Introduction and Motivation
  2. Context: System Model
  3. Workload Model
  4. Design Space Exploration
  5. Conclusion

17
Design Space Exploration 1/5: Overview
  • Design space exploration: time to understand how
    our solutions fit into the complete system.
  • Study the impact of
    • The Task Scheduling Policy (s policies)
    • The Workload Characteristics (P characteristics)
    • The Dynamic System Information (I levels)
    • The Task Selection Policy (S policies)
    • The Resource Management Architecture (A policies)

s x 7P x I x S x A x (environment) -> more than 2M design
points
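
The size of such a design space is the product of the sizes of its dimensions, which is why it grows so quickly. The sketch below enumerates a much smaller space built only from names that appear on these slides; the list of system loads and the reading of each information level as a pair over K (known), U (unknown), and H (historical) are assumptions, and the result is not meant to reproduce the 2M figure.

```python
from itertools import product

# Dimension values taken from the slides where possible.
SCHEDULING = ["ECT", "ECT-P", "FPF", "FPLT", "DFPLT", "STFR", "WQR"]
SELECTION = ["S-T", "S-BoT", "S-U-Prio", "S-U-T", "S-U-BoT", "S-U-GRR", "S-U-RR"]
ARCHITECTURE = ["centralized", "separated", "distributed"]
# Assumption: an information level is a pair such as (K, U).
INFORMATION = [(a, b) for a in "KUH" for b in "KUH"]
# Hypothetical sample of system loads, in percent.
SYSTEM_LOAD = [20, 50, 80, 95]


def design_points():
    """Yield every combination of the five dimensions as a tuple."""
    return product(SCHEDULING, SELECTION, ARCHITECTURE, INFORMATION, SYSTEM_LOAD)


n_points = sum(1 for _ in design_points())
print(n_points)  # 7 * 7 * 3 * 9 * 4 = 5292
```

Adding workload characteristics and environment settings as further dimensions multiplies this count again, so exhaustive simulation quickly becomes expensive.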
18
Design Space Exploration 2/5: Experimental Setup
  • Simulator
    • DGSim [IosupETFL SC07], [IosupSE EuroPar08]
  • System
    • DAS + Grid5000 [Cappello, Bal CCGrid07]
    • >3,000 CPUs, relative performance 1-1.75
  • Metrics
    • Makespan (MS)
    • Normalized Schedule Length (NSL), cf. speed-up
  • Workloads
    • Real: DAS + Grid5000
    • Realistic: system load 20-95% (from workload
      model)

19
Design Space Exploration 3/5: Selected Results A:
Design Guidelines for Scheduling Policies
  • Influence of the information type
    • (K,K): best balance between MS (makespan) and
      NSL (normalized schedule length)
    • (*,U),(U,*): from surprisingly good (FPF) to
      surprisingly poor (WQR4x)
    • (*,H),(H,*): poor. Simple runtime predictors
      don't work (see article)
  • Where to invest time?
    • K -> H, K -> U: adapt for the information type with
      lowest variation

[Figure labels: WQR4x, FPF]
20
Design Space Exploration 4/5: Selected Results B:
Task Selection, Only for Busy Systems
  • Not much difference until system load is over 50%.
  • For DAS + Grid5000: no change of task selection
    policy.

[Figure labels: S-BoT, S-T; annotation: "Same performance"]
21
Design Space Exploration 5/5: Selected Results C:
Resource Management Architecture
  • Centralized, separated, or distributed?
  • Centralized is best. Note: job overhead not
    considered.
  • Distributed: good for system load below 50%;
    over 50% it does not finish all
    tasks.

22
The Performance of Bags-of-Tasks in Large-Scale
Distributed Computing Systems
  1. Introduction and Motivation
  2. Context: System Model
  3. Workload Model
  4. Design Space Exploration
  5. Conclusion

23
Conclusion
  • System model: resource management architecture,
    task selection policy, task scheduling policy
  • Information availability framework
  • BoT workload model
  • Design space exploration: the performance of
    bags-of-tasks
Future Work
  • Better predictors
  • (H,H) task scheduling policies
24
Thank you! Questions? Remarks? Observations?
  • Contact: A.Iosup_at_gmail.com (or google "Iosup")
  • Web sites
    • http://www.vl-e.nl (VL-e project)
    • http://www.pds.ewi.tudelft.nl (PDS group:
      articles, software)
  • Help building the Grid Workloads
    Archive: http://gwa.ewi.tudelft.nl

25
What About Other Workloads?
  • (High Performance vs. High Throughput
    Computing) Parallel jobs vs. bags-of-tasks vs.
    workflows
  • We need your traces! We work blindly without
    them.
  • For parallel jobs, the architecture counts much
    more [IosETFL SC07]
  • For workflows, we don't know much about
    performance.