Session D: Tashi

1
Session D: Tashi
2
Tashi
Michael Ryan, Intel
3
Agenda
  • Introduction 8.30-9.00
  • Hadoop 9.00-10.45
  • Break 10.45-11.00
  • Pig 11.00-12.00
  • Lunch 12.00-1.00
  • Tashi 1.00-3.00
  • Break 3.00-3.15
  • PRS 3.15-5.00
  1. Overview
  2. User view
  3. Administration
  4. Installation
  5. Internals
  6. Summary

4
Overview
5
Tashi
  • An infrastructure
  • through which service providers
  • are able to build applications
  • that harness cluster computing resources
  • to efficiently access repositories
  • of Big Data

6
Example Applications
Application             | Big Data                     | Algorithms                                               | Compute Style
Video search            | Video data                   | Object/gesture identification, face recognition          | MapReduce
Internet library search | Historic web snapshots       | Data mining                                              | MapReduce
Virtual world analysis  | Virtual world database       | Data mining                                              | TBD
Earth study             | Ground model                 | Earthquake simulation, thermal conduction                | HPC
Language translation    | Text corpuses, audio archives | Speech recognition, machine translation, text-to-speech | MapReduce, HPC
7
Cluster Computing: A User's Perspective
Job-submission spectrum
Tight environment coupling
Loose environment coupling
Runtime-specific (e.g., Hadoop)
Queue-based (e.g., Condor or Torque)
Virtual Machine-based (e.g., EC2 or COD)
8
Tashi System Requirements
  • Provide high-performance execution over Big Data repositories
    → Many spindles, many CPUs, co-location
  • Enable multiple services to access a repository concurrently
  • Enable low-latency scaling of services
  • Enable each service to leverage its own software stack
    → Virtualization, file-system protections
  • Enable slow resource scaling for growth
  • Enable rapid resource scaling for power/demand
    → Scaling-aware storage

9
Tashi High Level Architecture
Remote cluster users
Cluster Mgr
Remote cluster owners
Logical clusters

Distributed storage system(s)
Note: the Tashi runtime and distributed storage systems do not necessarily run on the same physical nodes as the logical clusters
10
Tashi Components
Services are instantiated through virtual
machines
Most decisions happen in the scheduler, which manages compute/storage in concert
Scheduler
Data location information is exposed to
scheduler and services
Cluster Manager
Cluster nodes are assumed to be commodity
machines
The CM maintains databases and routes messages; decision logic is limited
11
Tashi Operation
The web server converts the query into a parallel
data processing request
answers.opencirrus.net web server running in 1 VM
Acting as a Tashi client, the web server submits a request for additional VMs
Scheduler
Virtualization Service
Storage Service
Cluster Manager
After the data objects are processed, the results
are collected and forwarded to Alice. The VMs
can then be destroyed
12
Why Virtualization?
  • Ease of deployment
  • Boot 100 copies of an operating system in 2
    minutes
  • Cluster lubrication
  • Machines can be migrated or even restarted very
    easily in a different location
  • Overheads are going down
  • Even workloads that tax the virtual memory
    subsystem can now run with a very small overhead
  • I/O intensive workloads have improved
    dramatically, but still have some room for
    improvement

13
User View
14
Tashi in a Nutshell
  • Tashi is primarily a system for managing Virtual
    Machines (VMs)
  • Virtual Machines are software containers that
    provide the illusion of real hardware, enabling
  • Physical resource sharing
  • OS-level isolation
  • User specification of custom software environments
  • Rapid provisioning of services
  • Users will use Tashi to request the creation,
    destruction, and manipulation of VMs

15
Tashi Native Interface
  • Users invoke Tashi actions through a Tashi client
  • The client will have been configured by an
    administrator to communicate with the Tashi
    Cluster Manager
  • Example client actions include
  • tashi createVm
  • tashi destroyVm
  • tashi createMany
  • etc.

16
Tashi AWS-compatibility
  • Tashi also has a client interface that is
    compatible with a subset of Amazon Web Services
  • Parts of the SOAP and QUERY interfaces

17
Tashi AWS-compatibility
Elastic Fox
Client
ec2-api-tools
QUERY
SOAP
VM instance DB
Cluster Manager (CM)
Node Manager DB
18
Tashi Organization
  • Each cluster contains one Tashi Cluster Manager
    (CM)
  • The CM maintains a database of
  • Available physical resources (nodes)
  • Active virtual machines
  • Pending requests for virtual machines
  • Virtual networks
  • Users submit requests to the CM through a Tashi
    Client
  • The Tashi Scheduler uses the CM databases to
    invoke actions, such as VM creation, through the
    CM
  • Each node contains a Node Manager that carries out actions, such as invoking the local Virtual Machine Manager (VMM) to create a new VM, and monitors the performance of VMs

19
Tashi Software Architecture
Site Specific Plugin(s)
Centralized cluster administration
Cluster Manager (CM)
VM instance DB
Scheduling Agent
DFS Proxy
Client
Client API
Node Manager DB
VM
Ganglia
VM
VM
VM
CM-NM API
Node Manager (NM)
Resource Controller Plugins (VMM, DFS, power,
etc.)
VMM
DFS
Sensor Plugins
Legend
DFS Metadata Server
Tashi component
system software
nmd
iptables /vlan
non-Tashi component
sshd
Compute node
20
Tashi Native Client Interface (I)
  • VM Creation/Destruction Calls (Single Version)
  • createVm --userId <value> --name <value> --cores <value> --memory <value> --disks <value> --nics <value> --hints <value>
  • destroyVm --instance <value>
  • shutdownVm --instance <value>
  • VM Creation/Destruction Calls (Multiple Version)
  • createMany --userId <value> --basename <value> --cores <value> --memory <value> --disks <value> --nics <value> --hints <value> --count <value>
  • destroyMany --basename <value>

21
Creating a VM
  • tashi createVm --name mikes-vm --cores 4
    --memory 1024 --disks hardy.qcow2
  • --name specifies the DNS name to be created
  • --disks specifies the disk image
  • Advanced
  • --nics <value>
  • --hints <value>

22
Tashi Instances
  • An instance is a running VM
  • Each disk image may be used for multiple VMs if
    the persistent bit is not set
  • A VM may be booted in persistent mode to make
    modifications without building an entirely new
    disk image

23
getMyInstances Explained
  • tashi getMyInstances
  • This lists all VMs belonging to your userId
  • This is a good way to see what you're currently using

24
getVmLayout Explained
  • tashi getVmLayout
  • This command displays the layout of currently
    running VMs across the nodes in the cluster
    id   name     state    instances                   usedMemory  memory  usedCores  cores
    ---  -------  -------  --------------------------  ----------  ------  ---------  -----
    126  r3r2u42  Normal   'bfly3', 'bfly4'            14000       16070   16         16
    127  r3r2u40  Normal   'mpa-00'                    15360       16070   8          16
    128  r3r2u38  Normal   'xren1', 'jpan-vm2'         15480       16070   16         16
    129  r3r2u36  Normal   'xren3', 'collab-00'        14800       16070   16         16
    130  r3r2u34  Normal   'collab-02', 'collab-03'    14000       16070   16         16
    131  r3r2u32  Drained                              0           16068   0          16
    132  r3r2u30  Normal   'collab-04', 'collab-05'    14000       16070   16         16
    133  r3r2u28  Normal   'collab-06', 'collab-07'    14000       16070   16         16

25
Tashi Native Client Interface (II)
  • VM Management Calls
  • suspendVm --instance <value>
  • resumeVm --instance <value>
  • pauseVm --instance <value>
  • unpauseVm --instance <value>
  • migrateVm --instance <value> --targetHostId <value>
  • vmmSpecificCall --instance <value> --arg <value>

26
Tashi Native Client Interface (III)
  • Bookkeeping Calls
  • getMyInstances
  • getInstances
  • getVmLayout
  • getUsers
  • getNetworks
  • getHosts

27
Creating Multiple VMs
  • tashi createMany --count 10 --basename mikes-vm --cores 4 --memory 1024 --disks hardy.qcow2
  • --basename specifies the base DNS name to be created
  • --disks specifies the disk image
  • Advanced
  • --nics <value>
  • --hints <value>

28
Example cluster Maui/Torque
  • Configure a base disk image from an existing
    Maui/Torque cluster (or setup a new one)
  • We've done this - amd64-torque_node.qcow2
  • Ask the Cluster Manager (CM) to create <N> VMs using this image (see the sketch after this list)
  • Have one preconfigured to be the scheduler and
    queue manager
  • Or set it up once the VMs have booted
  • Or have a separate image
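As a rough illustration only, the request described above could be scripted against the native client shown later in this session; the count, sizing, and basename below are made-up values, and only the image name comes from this slide.

    import subprocess

    # Sketch: ask the CM for N Torque worker VMs from the preconfigured image.
    # The createMany flags follow the native client interface shown on the
    # "Creating Multiple VMs" slide; N and the sizing values are illustrative.
    N = 8
    subprocess.check_call([
        "tashi", "createMany",
        "--count", str(N),
        "--basename", "torque-node",
        "--cores", "2",
        "--memory", "2048",
        "--disks", "amd64-torque_node.qcow2",
    ])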

29
Example cluster Web Service
  • Configure a base image for a web server, and
    whatever other tiers (database, etc) you need for
    your service
  • Variable numbers of each can be created by
    requesting them from the CM
  • Conventional architecture for a web service

30
Example cluster Hadoop
  • Configure a base image including Hadoop
  • Ask the CM to create instances
  • Note: Hadoop wants memory
  • Two options
  • Let HDFS reside in the VMs
  • Not ideal for availability/persistence
  • Use HDFS from the hosts
  • Upcoming topic

31
Appliances
  • Not surprisingly, this set of examples makes one
    think of VM appliances
  • Certainly not a new concept
  • We've built several of these from the software configuration of common systems at our site
  • Configuration of old physical nodes
  • Clean images after an OS install (Ubuntu)

32
Where are we today?
  • Tashi can reliably manage virtual machines spread
    across a cluster
  • In production use for over a year
  • Still some opportunities to add features
  • Security
  • Intelligent scheduling
  • Additional opportunities for research
  • Power management
  • Alternative distributed file systems
  • Other

33
Where are we today? (cont)
  • Our deployment of Tashi has managed 500 VMs
    across 150 hosts
  • Primary access mechanism for the Big Data cluster
  • Maui/Torque and Hadoop have been pulled into VMs
    and are running on top of Tashi

34
Tashi Deployment
  • Intel Labs Pittsburgh
  • Tashi is used on the Open Cirrus site at ILP
  • Majority of the cluster
  • Some nodes run Maui/Torque, Hadoop
  • Primary source of computational power for the lab
  • Mix of preexisting batch users, HPC workloads,
    Open Cirrus customers, and others

35
Storage
36
Storing the Data Choices
Model 1: Separate Compute/Storage
Compute and storage can scale independently; many opportunities for reliability
Model 2: Co-located Compute/Storage
No compute resources are under-utilized; potential for higher throughput
compute/storage servers
37
How is this done currently?
HPC
Amazon EC2/S3
Fine-grained parallelism
Virtualized compute
Separate Compute/Storage
Task(s)
Storage
Compute
See also: Usher, CoD, Eucalyptus, SnowFlock, Hadoop/Google
Tashi
Coarse-grained parallelism
Co-located Compute/Storage
Multiple Cluster Users
Single Cluster User
38
Example cluster hardware
[Diagram: 48-port Gbps switches with 4/8 Gbps uplinks; 30 servers with 2 disks/server, 40 servers with 2 disks/server, and 15 servers with 6 disks/server]
39
Far vs Near
  • With co-located compute/storage
  • Near: data consumed on the node where it is stored
  • Far: data consumed across the network
  • System software must enable near access for good
    performance
  • MapReduce provides near access
  • HPC typically provides far access, unless
    function shipping

40
Far vs Near Analysis
  • Far vs. Near methodology (a rough model is sketched below)
  • Assume an I/O-bound (scan) application
  • One task per spindle, no CPU load
  • In the far system, data is consumed on a randomly selected node
  • In the near system, data is consumed on the node where it is stored
  • Average throughput, no queueing model
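The sketch below is a back-of-the-envelope version of this model in Python, using assumed numbers (80 MB/s per spindle, the rack layout from the "Example cluster hardware" slide, 8 Gbps uplinks); it is not the exact analysis behind the chart on the next slide.

    DISK_BW = 80e6                       # bytes/s per spindle (assumed)
    RACKS = 5
    UPLINK = 8e9 / 8                     # 8 Gbps rack uplink, in bytes/s
    SPINDLES = 30 * 2 + 40 * 2 + 15 * 6  # disks in the example cluster

    # Near: every task scans a local disk, so the spindles are the only bottleneck.
    near = SPINDLES * DISK_BW

    # Far: data lives on a randomly chosen node, so roughly (1 - 1/RACKS) of all
    # bytes must cross a rack uplink; aggregate throughput is capped accordingly.
    cross_rack = 1 - 1.0 / RACKS
    far = min(SPINDLES * DISK_BW, RACKS * UPLINK / cross_rack)

    print(f"near ~ {near / 1e9:.1f} GB/s, far ~ {far / 1e9:.1f} GB/s, "
          f"advantage ~ {near / far:.1f}x")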

Scenario 2: 5 Racks @ 8 Gbps
41
Far vs Near Access Throughput
[Chart: modeled scan throughput for far vs. near access across several cluster scenarios; the relative advantage of near access ranges from roughly 2.4x to 11.3x]
42
Storage Service
  • Many options possible
  • HDFS, PVFS, pNFS, Lustre, JBOD, etc.
  • A standard interface is needed to expose location
    information

43
Data Location Service
    struct blockInfo
        encodingType type
        byteRange range
        list<hostId> nodeList

    list<blockInfo> getBlockInfoByteRange(fileId f, byteRange r)

How do we know which data server is the best?
44
Resource Telemetry Service
    typedef double metricValue

    metricValue getMetric(hostId from, hostId to, metricType t)

    list< list<metricValue> > getAllMetrics(list<hostId> fromList, list<hostId> toList, metricType t)

Example metrics include latency, bandwidth, switch count, fault-tolerance domain, etc.
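Putting the two interfaces together, a location-aware runtime could pick the best data server per block. The sketch below is illustrative: location_svc and telemetry_svc are hypothetical client handles for the two services above, and only the call names come from these slides.

    def pick_best_replica(location_svc, telemetry_svc, file_id, byte_range, me):
        """Choose a host to read each block from, combining the two services.

        location_svc / telemetry_svc are assumed to expose the
        getBlockInfoByteRange() and getMetric() calls shown on these slides.
        """
        plan = []
        for block in location_svc.getBlockInfoByteRange(file_id, byte_range):
            # Lower is better for a latency-style metric; invert for bandwidth.
            best = min(block.nodeList,
                       key=lambda host: telemetry_svc.getMetric(me, host, "latency"))
            plan.append((block.range, best))
        return plan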
45
Putting the Pieces Together
Data Location Service
LA application
LA application
LA runtime
Resource Telemetry Service
LA runtime
Virtual Machines
DFS
DFS
Guest OS
OS
DFS
VMM
VM Runtime
OS
(a) non-virtualized
(b) virtualized
46
DFS Performance
47
Administration
48
Key Configuration Options
  • Tashi uses a series of configuration files
  • TashiDefaults.cfg is the most basic and is
    included in the source tree
  • Tashi.cfg overrides this for site-specific settings
  • Agent.cfg, NodeManager.cfg, ClusterManager.cfg, and Client.cfg override those settings based on which app is launched
  • Files in ~/.tashi/ override everything else
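The override order can be illustrated with Python's standard ConfigParser, which applies files in sequence so later files win; the list below is a sketch of the precedence just described, not Tashi's literal loading code.

    import os
    from configparser import ConfigParser  # the ConfigParser module in Python 2

    # Sketch of the layering described above: defaults, then site settings,
    # then the per-application file, then anything under ~/.tashi/.
    config = ConfigParser()
    config.read([
        "TashiDefaults.cfg",
        "Tashi.cfg",
        "ClusterManager.cfg",   # or Agent.cfg / NodeManager.cfg / Client.cfg
        os.path.expanduser("~/.tashi/ClusterManager.cfg"),
    ])
    print(config.get("ClusterManagerService", "maxCores", fallback="16"))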

49
Key Configuration Options (CM hostname)
  • You need to set the hostname used for the CM by
    Node Managers
  • Some example settings are listed below
  • Tashi.cfg:

    [Client]
    clusterManagerHost = merkabah

    [NodeManagerService]
    clusterManagerHost = merkabah

50
Key Configuration Options (VFS)
  • You need to set the directory that serves disk
    images
  • We're using NFS for this at the moment
  • Some example settings are listed below
  • Tashi.cfg:

    [Vfs]
    prefix = /mnt/merkabah/tashi/

51
Key Configuration Options (DHCP/DNS)
  • If you want Tashi to manage DHCP and DNS, you
    need to have servers that are dynamically
    configurable
  • Some example settings are listed below
  • Agent.cfg:

    [DhcpDns]
    dnsKeyFile = /root/cluster-admin/scripts/Kmerkabah.15736480.private
    dnsServer = 172.16.0.5 53
    dnsDomain = bigdata.research.intel-research.net
    dnsExpire = 60
    dhcpServer = 172.16.0.5
    dhcpKeyName = merkabah
    dhcpSecretKey = ABcdEf12GhIJKLmnOpQrsT
    ipRange300 = 172.17.10.1-172.17.10.254
    ipRange310 = 172.17.20.1-172.17.20.254
    ipRange999 = 172.16.192.1-172.16.255.254
    ipRange1001 = 172.16.1.10-172.16.1.19
    reverseDns = True

52
Intel BigData Cluster - Networking
External IPs
NAT ports 80 and 443 inbound
SSH gateway
Firewall
Firewall
DMZ VLAN
Tashi Client
Tashi CM
Private VLAN
Tashi VM SSH NAT
53
Key Configuration Options (LDAP)
  • Tashi can use LDAP to perform user lookups
  • Some example settings are listed below
  • Tashi.cfg:

    [ClusterManager]
    data = tashi.clustermanager.data.LdapOverride

    [LdapOverride]
    ldapCommand = ldapsearch -x -w AbCdefGH -h 10.212.3.3 -b ou=CCT,dc=research,dc=intel-research,dc=net -D cn=cctldapsearch,cn=Users,dc=research,dc=intel-research,dc=net msSFU30LoginShell -z 0

54
Key Configuration Options (syslog)
  • Tashi can send all log messages to syslog
  • Some example settings are listed below
  • Tashi.cfg:

    [handlers]
    keys = consoleHandler,fileHandler,syslogHandler

    [logger_root]
    handlers = consoleHandler,fileHandler,syslogHandler

    [handler_syslogHandler]
    args = ('/dev/log', 18)

  • 18 is the syslog facility

55
Key Configuration Options (packing)
  • The primitive scheduler can either dense pack VMs
    or spread them out
  • Some example settings are listed below
  • Tashi.cfg:

    [Primitive]
    densePack = True

56
Key Configuration Options (Maui)
  • Tashi can be scheduled with Maui
  • Maui must be compiled with the Wiki interface
  • Some example settings are listed below
  • Tashi.cfg:

    [MauiWiki]
    authuser = mryan3
    authkey = 12345

  • It is necessary to run ./src/tashi/agents/mauiwiki.py and maui directly in order to launch the system

57
Key Configuration Options (General CM)
  • You may want to adjust several CM settings
  • How stale ('decayed') reported state is allowed to be
  • Whether NMs with version mismatches are allowed
  • Necessary if you plan to live-upgrade between
    versions
  • Maximum amount of memory and number of cores that
    can be allocated to a single VM
  • Some example settings are listed below
  • Tashi.cfg:

    [ClusterManagerService]
    allowDecayed = 60.0
    allowMismatchedVersions = True
    maxMemory = 15872
    maxCores = 16

58
Key Configuration Options (General NM)
  • You may want to adjust several NM settings
  • The frequency of collecting stats
  • The amount of time migration is given before
    timing out
  • Some example settings are listed below
  • Tashi.cfg:

    [NodeManagerService]
    statsInterval = 15.0

    [Qemu]
    monitorTimeout = 3600.0
    migrateTimeout = 3600.0
    statsInterval = 15.0

59
Managing the Resources
  • If using MySQL, adding nodes is fairly
    straight-forward
  • If using Pickle, the easiest way to add nodes is
    to use the debug console
  • See http://incubator.apache.org/tashi/documentation-single.html
  • By default, the list of users is fetched through getent
  • LDAP can also be used
  • The debug console can be used if both of these
    mechanisms are turned off

60
Key Cluster Components
  • DHCP and DNS servers
  • Servers that can accept dynamic updates using
    omshell and nsupdate
  • Example: ISC DHCPD and bind9
  • See http://incubator.apache.org/tashi/documentation-cluster.html
  • Disk images
  • Disk images need to be available in the same location on all the machines
  • NFS can be used for this
  • I've seen people use GlusterFS as well

61
Interpreting the Tashi Logs
  • VM startup
  • When a VM goes to the held state and fails to initialize, look for the "QEMU command" log message
  • DHCP/DNS
  • Look for "Adding <hostname> to DHCP" or "Adding <hostname> to DNS"
  • Heartbeats
  • Messages like "Host <hostname> has expired after 1234 seconds" indicate that the NM was unable to send a heartbeat within a certain time window
  • Messages like "State reported as Running instead of Orphaned for instance 1234 on host <hostname>" indicate an NM recovering after failing to send heartbeats
  • I've found that this occurs under some situations of extreme network saturation or disk utilization

62
Installation
63
Installing Tashi
  • Setup
  • Checkout code
  • Configure for your site
  • Get/create a disk image
  • Starting Tashi
  • Cluster manager
  • Node manager
  • Scheduler
  • Launching a VM
  • getHosts
  • createVm
  • Try logging in
  • destroyVm

64
Installing Tashi Details (1)
  • Install Qemu/KVM, Thrift, and rpyc
  • svn co http://svn.apache.org/repos/asf/incubator/tashi/trunk ./tashi
  • cd tashi
  • make
  • mkdir -p /var/tmp/images
  • <Optionally create Tashi.cfg>

65
Installing Tashi Details (2)
  • <Optionally create Tashi.cfg>
  • This config file overrides the defaults in TashiDefaults.cfg
  • You can switch to the MySQL backend or change other configuration parameters
  • Follow the CM setup instructions at http://incubator.apache.org/tashi/documentation-single.html

66
Installing Tashi Details (3)
  • <create /var/tmp/images/i386-hardy.qcow>
  • export PYTHONPATH=`pwd`/src
  • screen
  • [0] DEBUG=1 ./bin/clustermanager.py
  • [1] DEBUG=1 ./bin/nodemanager.py
  • [2] python ./bin/primitive.py
  • [3] ./bin/tashi-client.py getHosts
  • [3] ./bin/tashi-client.py createVm --name foobar --disks i386-hardy.qcow --memory 1024 --cores 1
  • [3] ./bin/tashi-client.py getInstances
  • [3] ps aux | grep <vmId>
  • [3] ssh mryan3@172.20.0.15 <or other known username/IP>
  • [3] ./bin/tashi-client.py vmmSpecificCall --instance foobar --arg startVnc
  • [3] ./bin/tashi-client.py destroyVm --instance foobar

67
Tashi on OpenCirrus
Exclusive PRSs: Systems-level research and services
  • One shared PRS runs a VM-based cloud system,
    managed by the cluster, and shared by many users
  • Users who don't need access to the underlying hardware
  • Remaining exclusive PRSs are managed and owned
    by individual research groups

Shared PRS: App-level research
Open service research
Apps running in a VM mgmt infrastructure (e.g.,
Tashi, Eucalyptus)
Tashi development
Production storage service
Proprietary service research
Open workload monitoring and trace collection
68
Internals
69
Software Architecture
  • Cluster Manager
  • Keeps global system state, accepts user requests,
    listens for heartbeats from nodes
  • Node Manager
  • Manages virtual machines, reports system state,
    sends heartbeats to CM
  • Agents
  • Monitor/manage the system
  • Scheduler is an example
  • Used to keep the CM/NM separate from site-specific logic and potentially complicated (read: unreliable) scheduling logic
  • RPCs
  • Uses RPyC, which has some nice features
  • Handles exceptions and object serialization
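For readers unfamiliar with RPyC, the toy service below shows the style of remote call it provides: exceptions raised in the server propagate to the caller, and arguments/results are serialized automatically. It is a generic RPyC example, not Tashi's actual RPC layer.

    import rpyc
    from rpyc.utils.server import ThreadedServer

    class EchoService(rpyc.Service):
        # Methods prefixed with exposed_ become remotely callable.
        def exposed_echo(self, message):
            if not message:
                raise ValueError("empty message")  # re-raised on the client side
            return message

    if __name__ == "__main__":
        # Server side; a client would do:
        #   conn = rpyc.connect("localhost", 18861)
        #   print(conn.root.echo("hello"))
        ThreadedServer(EchoService, port=18861).start()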

70
Cluster Manager Concepts
  • The CM's main responsibilities:
  • Virtual cluster owner authentication (RPyC)
  • Cluster state maintenance
  • The CM maintains databases of all pertinent
    cluster state, but actuation is typically the
    purview of some other component. The intent is
    to make the CM as robust as possible.
  • E.g. After a client submits a createVm()
    request, the CM only records that fact in the VM
    instance database. The scheduling agent is
    responsible for polling that database, deciding
    where/when to activate the VM, and issuing the
    actuation commands.
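A minimal sketch of that division of labor is shown below. The CM call names (getInstances, getHosts, activateVm) appear on the "Cluster Manager Service" slide later in this section; the loop, the attribute names, and the placement policy are purely illustrative.

    import time

    def scheduling_loop(cm, poll_interval=5.0):
        """Illustrative agent loop: the CM only records createVm() requests;
        this agent polls the instance database and issues actuation calls."""
        while True:
            hosts = cm.getHosts()
            for instance in cm.getInstances():
                if instance.state != "pending":
                    continue
                # Trivial placement: first host with enough free cores. A real
                # agent would also weigh memory, data location, and site policy.
                for host in hosts:
                    if host.usedCores + instance.cores <= host.cores:
                        cm.activateVm(instance.id, host)
                        break
            time.sleep(poll_interval)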

71
Node Manager Concepts
  • The main responsibilities of the node manager
    (NM) are
  • Manage the creation, monitoring, and destruction
    of virtual machines on a particular compute node
  • Monitor the physical resources of the compute
    node (CPU load, memory, etc)
  • Monitor the virtual resources of each VM
  • The NM registers itself with the CM upon
    initialization
  • And sends heartbeats that have state information
    piggy-backed on them
  • Each compute node is also expected to host sshd
    for emergency manual administration
  • Compute nodes may also use facilities such as
    iptables to protect the cluster from untrusted
    VMs
  • We've focused on using VLANs instead, but this approach should work as well
  • vmmSpecificCall is a mechanism to pass options and information directly to the VMM
  • (E.g., startVnc exposes the guest's virtual console over VNC when using Qemu/KVM)

72
Agent/Scheduler Concepts
  • The main responsibilities for the Agent/Scheduler
    are
  • Using activateVm to move VMs from the pending
    queue to an actual physical machine
  • Make any higher-level decisions about system
    management
  • Perform site-specific initialization for VMs
  • Site-specific plugins to the scheduling agent are
    responsible for taking appropriate actions during
    VM creation
  • E.g., before calling activateVm(), the agent can invoke site-specific plugins to prepare entries in the appropriate DHCP/DNS servers
  • Similarly, such hooks could enable VM image composition prior to activation
  • The DFS proxy(ies) enable the scheduling agent to query the DFS metadata server(s) to determine the locations of data needed by virtual machines, if specified
  • This information could inform the placement of
    virtual machines

73
Cluster Manager Service
    // Client-facing RPCs
    instanceId createVm()
    void shutdownVm(instanceId)
    void destroyVm(instanceId)
    void suspendVm(instanceId)
    instanceId resumeVm()
    list<Instance> getInstances()
    list<Host> getHosts()
    list<Network> getNetworks()
    list<User> getUsers()

    // NodeManager-facing RPCs
    i32 registerNodeManager(host, list<instanceId>)
    void vmStateChange(instanceId, oldState, newState)

    // Agent-facing RPCs
    void activateVm(instanceId, host)
    void migrateVm(instanceId, host)
    void pauseVm(instanceId)
    void unpauseVm(instanceId)

74
Node Manager Service
    // ClusterManager-facing RPCs
    i32 instantiateVm(1:Instance instance)
    list<i32> listVms()
    Instance getVmInfo(1:i32 vmId)
    void shutdownVm(1:i32 vmId)
    void destroyVm(1:i32 vmId) throws (1:TashiException e)
    void suspendVm(1:i32 vmId, 2:string destination, 3:string suspendCookie)
    ResumeVmRes resumeVm(1:Instance instance, 2:string source)
    string prepReceiveVm(1:Instance instance, 2:Host source)
    void migrateVm(1:i32 vmId, 2:Host target, 3:string transportCookie)
    i32 receiveVm(1:Instance instance, 2:string transportCookie)
    void pauseVm(1:i32 vmId)
    void unpauseVm(1:i32 vmId)
    string vmmSpecificCall(1:i32 vmId, 2:string arg)

75
VM State Diagram (as maintained by the VM instance DB)
[Diagram] States: pending, activating, running, pausing, paused, unpausing, resuming, suspending, shutting down, destroying, exited. Transitions are triggered by operations such as VMcreate() and VMresume().
Dark green states are static and blue states are transitional. Errors typically occur while a VM is in a transitional state. A VM can be put into the destroying state from any previous state.
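The same state machine can be written down as data; the sketch below lists the states from this diagram, with the transitions the arrows appear to imply (reconstructed, so some edges may be missing or approximate).

    # States from the diagram; the blue (transitional) states are listed separately.
    STATIC = {"pending", "running", "paused", "exited"}
    TRANSITIONAL = {"activating", "unpausing", "resuming", "pausing",
                    "suspending", "shutting down", "destroying"}

    # Edges implied by the diagram's arrows (reconstructed; not exhaustive).
    TRANSITIONS = {
        ("VMcreate()", "pending"),
        ("pending", "activating"), ("activating", "running"),
        ("VMresume()", "resuming"), ("resuming", "running"),
        ("running", "pausing"), ("pausing", "paused"),
        ("paused", "unpausing"), ("unpausing", "running"),
        ("running", "suspending"),
        ("running", "shutting down"), ("shutting down", "exited"),
        ("destroying", "exited"),
    }
    # Per the note above, any state may additionally move to "destroying".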
76
Summary
77
Apache Incubator
  • Environment in which development is taking place
  • Will ultimately determine the success of Tashi as
    a software artifact
  • URLs
  • http://incubator.apache.org/projects/tashi (incubation status)
  • http://incubator.apache.org/tashi/ (project webpage)

78
Apache Incubator Process
  • Mentors
  • These people have volunteered to help guide us
    through the incubation process
  • Craig Russell
  • Matthieu Riou
  • Paul Freemantle
  • Committers
  • These are the people with commit permissions for
    the source repository
  • Michael Ryan (Intel)
  • Mike Stroucken (CMU)
  • Patches
  • Other contributors have to submit changes as patches on tashi-dev@incubator.apache.org
  • Until they become committers
  • It's also possible to open JIRA issues
  • http://issues.apache.org/jira/browse/tashi

79
Apache Incubator Process
  • Go to the incubation status webpage to subscribe
    to the mailing lists
  • Discussion around future features and code will be on tashi-dev@i.a.o
  • What follows is the list of known
    issues/development goals we had going into the
    Incubator that still remain

80
Known Issues (1)
  • Priority 1: DFS interactions
  • We're currently looking at using HDFS on the hosts and exposing the locality information to various parts of the Map/Reduce runtime in Hadoop to provide similar behavior/performance characteristics to physical operations.
  • Additional work is taking place in looking at other file systems (GlusterFS, self-, and others)
  • Priority 2: Physical boot
  • Our current proposed approach to this is to network-isolate any physical machines given to users. This allows us to insert a narrow access path (single VM) to limit traffic coming out of the virtual cluster.
  • Plan: Use current PXE booting schemes to boot VLAN-isolated clusters and work toward creating an automatic conversion mechanism for Tashi disk images
  • Priority 3: Site-specific plugins
  • The current DHCP/DNS plugin has been suitable for at least two very different sites.
  • Plan: Continue to solicit feedback from partners to determine for which steps in VM creation/activation customization is critical

81
Known Issues (2)
  • Priority 4: VM scheduling model
  • Tashi does not currently have a well-integrated scheduler that supports VM priorities, quotas, billing, etc. We have, however, implemented a bridge to use Maui as a scheduler for the project.
  • Plan: Implement features on an as-needed basis; look to leverage existing scheduling mechanisms (e.g., Maui)
  • Priority 5: Multi-VM job control
  • The current scheduling agent activates VMs one at a time. A transactional mechanism needs to be added that only starts a VM group if there is room to accommodate the entire group and enables easy tear-down if any portion of the group fails
  • Plan: Use Maui, if possible, to perform this operation

82
Tashi Software Architecture
Site Specific Plugin(s)
Centralized cluster administration
Cluster Manager (CM)
VM instance DB
Scheduling Agent
DFS Proxy
Client
Client API
Node Manager DB
VM
Ganglia
VM
VM
VM
CM-NM API
Node Manager (NM)
Resource Controller Plugins (VMM, DFS, power,
etc.)
VMM
DFS
Sensor Plugins
Legend
DFS Metadata Server
Tashi component
system software
nmd
iptables /vlan
non-Tashi component
sshd
Compute node
83
Demo