Title: Session D: Tashi
1. Session D: Tashi
2. Tashi
Michael Ryan, Intel
3. Agenda
- Introduction 8.30-9.00
- Hadoop 9.00-10.45
- Break 10.45-11.00
- Pig 11.00-12.00
- Lunch 12.00-1.00
- Tashi 1.00-3.00
- Break 3.00-3.15
- PRS 3.15-5.00
- Overview
- User view
- Administration
- Installation
- Internals
- Summary
4. Overview
5. Tashi
- An infrastructure
- through which service providers
- are able to build applications
- that harness cluster computing resources
- to efficiently access repositories
- of Big Data
6. Example Applications

| Application             | Big Data                      | Algorithms                                              | Compute Style  |
| Video search            | Video data                    | Object/gesture identification, face recognition         | MapReduce      |
| Internet library search | Historic web snapshots        | Data mining                                             | MapReduce      |
| Virtual world analysis  | Virtual world database        | Data mining                                             | TBD            |
| Earth study             | Ground model                  | Earthquake simulation, thermal conduction               | HPC            |
| Language translation    | Text corpuses, audio archives | Speech recognition, machine translation, text-to-speech | MapReduce, HPC |
7. Cluster Computing: A User's Perspective
Job-submission spectrum, from tight to loose environment coupling:
- Runtime-specific (e.g. Hadoop)
- Queue-based (e.g. Condor or Torque)
- Virtual machine-based (e.g. EC2 or COD)
8. Tashi System Requirements
- Provide high-performance execution over Big Data repositories
  → Many spindles, many CPUs, co-location
- Enable multiple services to access a repository concurrently
- Enable low-latency scaling of services
- Enable each service to leverage its own software stack
  → Virtualization, file-system protections
- Enable slow resource scaling for growth
- Enable rapid resource scaling for power/demand
  → Scaling-aware storage
9. Tashi High-Level Architecture
Remote cluster users
Cluster Mgr
Remote cluster owners
Logical clusters
Distributed storage system(s)
Note: the Tashi runtime and distributed storage systems do not necessarily run on the same physical nodes as the logical clusters.
10. Tashi Components
- Services are instantiated through virtual machines
- Most decisions happen in the Scheduler, which manages compute and storage in concert
- Data location information is exposed to the scheduler and to services
- Cluster nodes are assumed to be commodity machines
- The Cluster Manager (CM) maintains databases and routes messages; its decision logic is limited
11. Tashi Operation
- answers.opencirrus.net: a web server running in one VM
- The web server converts the query into a parallel data processing request
- Acting as a Tashi client, it submits a request for additional VMs
- (Components involved: Cluster Manager, Scheduler, Virtualization Service, Storage Service)
- After the data objects are processed, the results are collected and forwarded to Alice; the VMs can then be destroyed
12. Why Virtualization?
- Ease of deployment
  - Boot 100 copies of an operating system in 2 minutes
- Cluster lubrication
  - Machines can be migrated or even restarted very easily in a different location
- Overheads are going down
  - Even workloads that tax the virtual memory subsystem can now run with a very small overhead
  - I/O-intensive workloads have improved dramatically, but still have some room for improvement
13. User View
14. Tashi in a Nutshell
- Tashi is primarily a system for managing Virtual Machines (VMs)
- Virtual Machines are software containers that provide the illusion of real hardware, enabling:
  - Physical resource sharing
  - OS-level isolation
  - User specification of custom software environments
  - Rapid provisioning of services
- Users use Tashi to request the creation, destruction, and manipulation of VMs
15. Tashi Native Interface
- Users invoke Tashi actions through a Tashi client
- The client will have been configured by an administrator to communicate with the Tashi Cluster Manager
- Example client actions include:
  - tashi createVm
  - tashi destroyVm
  - tashi createMany
  - etc.
16. Tashi AWS Compatibility
- Tashi also has a client interface that is compatible with a subset of Amazon Web Services
  - Parts of the SOAP and QUERY interfaces (see the sketch below)
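As a hedged illustration of what that compatibility permits, the sketch below points boto's EC2 Query client at a Tashi Cluster Manager. The endpoint host, port, path, and credentials are placeholders for this example, not values defined by Tashi, and only a subset of EC2 calls is available.

    # Sketch: driving Tashi's EC2-compatible QUERY interface with boto.
    # Endpoint, port, path, and credentials are placeholders for this example.
    import boto
    from boto.ec2.regioninfo import RegionInfo

    region = RegionInfo(name="tashi", endpoint="tashi-cm.example.org")
    conn = boto.connect_ec2(
        aws_access_key_id="YOUR-KEY",
        aws_secret_access_key="YOUR-SECRET",
        is_secure=False,          # plain HTTP inside the cluster
        port=8773,                # placeholder port
        path="/services/Cloud",   # placeholder path
        region=region)

    # DescribeInstances is a reasonable first smoke test.
    for reservation in conn.get_all_instances():
        for instance in reservation.instances:
            print("%s %s" % (instance.id, instance.state))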
17. Tashi AWS Compatibility
Elastic Fox
Client
ec2-api-tools
QUERY
SOAP
VM instance DB
Cluster Manager (CM)
Node Manager DB
18. Tashi Organization
- Each cluster contains one Tashi Cluster Manager (CM)
- The CM maintains a database of:
  - Available physical resources (nodes)
  - Active virtual machines
  - Pending requests for virtual machines
  - Virtual networks
- Users submit requests to the CM through a Tashi Client
- The Tashi Scheduler uses the CM databases to invoke actions, such as VM creation, through the CM
- Each node contains a Node Manager that carries out actions, such as invoking the local Virtual Machine Manager (VMM) to create a new VM, and monitors the performance of VMs
19. Tashi Software Architecture
Site Specific Plugin(s)
Centralized cluster administration
Cluster Manager (CM)
VM instance DB
Scheduling Agent
DFS Proxy
Client
Client API
Node Manager DB
VM
Ganglia
VM
VM
VM
CM-NM API
Node Manager (NM)
Resource Controller Plugins (VMM, DFS, power,
etc.)
VMM
DFS
Sensor Plugins
Legend
DFS Metadata Server
Tashi component
system software
nmd
iptables/vlan
non-Tashi component
sshd
Compute node
20. Tashi Native Client Interface (I)
- VM Creation/Destruction Calls (Single Version)
  - createVm --userId <value> --name <value> --cores <value> --memory <value> --disks <value> --nics <value> --hints <value>
  - destroyVm --instance <value>
  - shutdownVm --instance <value>
- VM Creation/Destruction Calls (Multiple Version)
  - createMany --userId <value> --basename <value> --cores <value> --memory <value> --disks <value> --nics <value> --hints <value> --count <value>
  - destroyMany --basename <value>
21. Creating a VM
- tashi createVm --name mikes-vm --cores 4 --memory 1024 --disks hardy.qcow2
- --name specifies the DNS name to be created
- --disks specifies the disk image
- Advanced:
  - --nics <value>
  - --hints <value>
22. Tashi Instances
- An instance is a running VM
- Each disk image may be used for multiple VMs if the persistent bit is not set
- A VM may be booted in persistent mode to make modifications without building an entirely new disk image
23. getMyInstances Explained
- tashi getMyInstances
- This lists all VMs belonging to your userId
- This is a good way to see what you're currently using
24. getVmLayout Explained
- tashi getVmLayout
- This command displays the layout of currently running VMs across the nodes in the cluster (a sketch that summarizes this output follows below):

| id  | name    | state   | instances                | usedMemory | memory | usedCores | cores |
| 126 | r3r2u42 | Normal  | 'bfly3', 'bfly4'         | 14000      | 16070  | 16        | 16    |
| 127 | r3r2u40 | Normal  | 'mpa-00'                 | 15360      | 16070  | 8         | 16    |
| 128 | r3r2u38 | Normal  | 'xren1', 'jpan-vm2'      | 15480      | 16070  | 16        | 16    |
| 129 | r3r2u36 | Normal  | 'xren3', 'collab-00'     | 14800      | 16070  | 16        | 16    |
| 130 | r3r2u34 | Normal  | 'collab-02', 'collab-03' | 14000      | 16070  | 16        | 16    |
| 131 | r3r2u32 | Drained |                          | 0          | 16068  | 0         | 16    |
| 132 | r3r2u30 | Normal  | 'collab-04', 'collab-05' | 14000      | 16070  | 16        | 16    |
| 133 | r3r2u28 | Normal  | 'collab-06', 'collab-07' | 14000      | 16070  | 16        | 16    |
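A small post-processing sketch of this output, reporting the remaining capacity per host. It assumes a `tashi` client is on the PATH and that the columns appear in the order shown above; adjust the parsing for your version.

    # Sketch: summarize free capacity from `tashi getVmLayout` output.
    # Assumes the column order shown above (id, name, state, instances,
    # usedMemory, memory, usedCores, cores).
    import subprocess

    def free_capacity():
        out = subprocess.check_output(["tashi", "getVmLayout"]).decode()
        for line in out.splitlines():
            fields = line.split()
            if len(fields) < 7 or not fields[0].isdigit():
                continue  # skip header and separator lines
            name, state = fields[1], fields[2]
            used_mem, mem, used_cores, cores = (int(x) for x in fields[-4:])
            if state == "Normal":
                print("%-10s free: %5d MB, %2d cores"
                      % (name, mem - used_mem, cores - used_cores))

    if __name__ == "__main__":
        free_capacity()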
25. Tashi Native Client Interface (II)
- VM Management Calls
  - suspendVm --instance <value>
  - resumeVm --instance <value>
  - pauseVm --instance <value>
  - unpauseVm --instance <value>
  - migrateVm --instance <value> --targetHostId <value>
  - vmmSpecificCall --instance <value> --arg <value>
26. Tashi Native Client Interface (III)
- Bookkeeping Calls
- getMyInstances
- getInstances
- getVmLayout
- getUsers
- getNetworks
- getHosts
27. Creating Multiple VMs
- tashi createMany --count 10 --basename mikes-vm --cores 4 --memory 1024 --disks hardy.qcow2
- --basename specifies the base DNS name to be created
- --disks specifies the disk image
- Advanced:
  - --nics <value>
  - --hints <value>
28. Example cluster: Maui/Torque
- Configure a base disk image from an existing Maui/Torque cluster (or set up a new one)
  - We've done this: amd64-torque_node.qcow2
- Ask the Cluster Manager (CM) to create <N> VMs using this image
- Have one preconfigured to be the scheduler and queue manager
  - Or set it up once the VMs have booted
  - Or have a separate image
29. Example cluster: Web Service
- Configure a base image for a web server, and whatever other tiers (database, etc.) you need for your service
- Variable numbers of each can be created by requesting them from the CM
- Conventional architecture for a web service
30. Example cluster: Hadoop
- Configure a base image including Hadoop
- Ask the CM to create instances
  - Note: Hadoop wants memory
- Two options:
  - Let HDFS reside in the VMs
    - Not ideal for availability/persistence
  - Use HDFS from the hosts
    - Upcoming topic
31. Appliances
- Not surprisingly, this set of examples makes one think of VM appliances
- Certainly not a new concept
- We've built several of these from the software configuration of common systems at our site
  - Configuration of old physical nodes
  - Clean images after an OS install (Ubuntu)
32. Where are we today?
- Tashi can reliably manage virtual machines spread across a cluster
- In production use for over a year
- Still some opportunities to add features
  - Security
  - Intelligent scheduling
- Additional opportunities for research
  - Power management
  - Alternative distributed file systems
  - Other
33. Where are we today? (cont.)
- Our deployment of Tashi has managed 500 VMs across 150 hosts
- Primary access mechanism for the Big Data cluster
- Maui/Torque and Hadoop have been pulled into VMs and are running on top of Tashi
34. Tashi Deployment
- Intel Labs Pittsburgh
- Tashi is used on the Open Cirrus site at ILP
- Majority of the cluster
- Some nodes run Maui/Torque, Hadoop
- Primary source of computational power for the lab
- Mix of preexisting batch users, HPC workloads,
Open Cirrus customers, and others
35. Storage
36. Storing the Data: Choices
- Model 1: Separate compute/storage
  - Compute and storage can scale independently
  - Many opportunities for reliability
- Model 2: Co-located compute/storage (combined compute/storage servers)
  - No compute resources are under-utilized
  - Potential for higher throughput
37. How is this done currently?
- HPC: fine-grained parallelism over separate compute/storage (tasks ship to compute nodes, data lives on storage nodes); single cluster user
- Amazon EC2/S3: virtualized compute over separate compute/storage; multiple cluster users
- Tashi: coarse-grained parallelism over co-located compute/storage; multiple cluster users
- See also: Usher, CoD, Eucalyptus, SnowFlock, Hadoop/Google
38. Example cluster hardware
- 48-port Gbps switches with 4/8 Gbps uplinks
- 30 servers, 2 disks/server
- 40 servers, 2 disks/server
- 15 servers, 6 disks/server
39. Far vs. Near
- With co-located compute/storage:
  - Near: data is consumed on the node where it is stored
  - Far: data is consumed across the network
- System software must enable near access for good performance
  - MapReduce provides near access
  - HPC typically provides far access, unless function shipping is used
40. Far vs. Near: Analysis
- Methodology (a simplified model is sketched below):
  - Assume an I/O-bound (scan) application
  - One task per spindle, no CPU load
  - In the far system, data is consumed on a randomly selected node
  - In the near system, data is consumed on the node where it is stored
  - Average throughput, no queueing model
- Scenario 2: 5 racks @ 8 Gbps
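A back-of-the-envelope version of this methodology in Python, purely illustrative: the per-spindle disk rate, link speeds, and node counts below are assumptions, not the measured configuration behind the chart that follows.

    # Sketch: simplified far-vs-near throughput model for an I/O-bound scan.
    # All hardware numbers are illustrative assumptions, not measurements.
    DISK_MBPS = 80.0        # sequential read rate per spindle (assumed)
    NODE_LINK_MBPS = 125.0  # 1 Gbps node uplink expressed in MB/s

    def near_throughput(nodes, spindles_per_node):
        # Near: every task reads from local disks; the network is not involved.
        return nodes * spindles_per_node * DISK_MBPS

    def far_throughput(nodes, spindles_per_node, racks, inter_rack_gbps):
        # Far: data is consumed on a randomly selected node, so almost all
        # reads cross the network; aggregate throughput is capped by the
        # slowest of disks, node links, and inter-rack links.
        disks = nodes * spindles_per_node * DISK_MBPS
        node_links = nodes * NODE_LINK_MBPS
        rack_links = racks * inter_rack_gbps * 125.0  # Gbps -> MB/s
        return min(disks, node_links, rack_links)

    if __name__ == "__main__":
        near = near_throughput(nodes=85, spindles_per_node=2)
        far = far_throughput(nodes=85, spindles_per_node=2,
                             racks=5, inter_rack_gbps=8)
        print("near: %.0f MB/s, far: %.0f MB/s, ratio: %.1fx"
              % (near, far, near / far))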
41Far vs Near Access Throughput
396
264
352
8.1x
10.3x
11.3x
5.0x
5.8x
6.0x
2.4x
2.8x
2.8x
42. Storage Service
- Many options possible
  - HDFS, PVFS, pNFS, Lustre, JBOD, etc.
- A standard interface is needed to expose location information
43. Data Location Service

    struct blockInfo {
        encodingType type
        byteRange range
        list<hostId> nodeList
    }

    list<blockInfo> getBlockInfoByteRange(fileId f, byteRange r)

How do we know which data server is the best?
44. Resource Telemetry Service

    typedef double metricValue

    metricValue getMetric(hostId from, hostId to, metricType t)

    list< list<metricValue> > getAllMetrics(list<hostId> fromList,
                                            list<hostId> toList, metricType t)

Example metrics include latency, bandwidth, switch count, fault-tolerance domain, etc. (A sketch that combines the two services follows below.)
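A sketch of how a location-aware runtime might combine the two services: for each block it asks the data location service for the replicas and the telemetry service for a distance metric, then picks the cheapest replica. The service handles and the "switchCount" metric name are hypothetical stand-ins for the interfaces above.

    # Sketch: pick the "best" replica for a byte range by combining the data
    # location service with the resource telemetry service. The service
    # objects and metric name are hypothetical stand-ins.
    def pick_replicas(file_id, byte_range, consumer_host,
                      location_svc, telemetry_svc):
        blocks = location_svc.getBlockInfoByteRange(file_id, byte_range)
        plan = []
        for block in blocks:
            # Lower metric value == closer/cheaper (e.g. switch count, latency).
            best = min(block.nodeList,
                       key=lambda host: telemetry_svc.getMetric(
                           consumer_host, host, "switchCount"))
            plan.append((block.byteRange, best))
        return plan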
45. Putting the Pieces Together
Data Location Service
LA application
LA application
LA runtime
Resource Telemetry Service
LA runtime
Virtual Machines
DFS
DFS
Guest OS
OS
DFS
VMM
VM Runtime
OS
(a) non-virtualized
(b) virtualized
46. DFS Performance
47. Administration
48. Key Configuration Options
- Tashi uses a series of configuration files (see the layering sketch below)
- TashiDefaults.cfg is the most basic and is included in the source tree
- Tashi.cfg overrides this for site-specific settings
- Agent.cfg, NodeManager.cfg, ClusterManager.cfg, and Client.cfg override those settings based on which app is launched
- Files in ~/.tashi/ override everything else
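The layering can be pictured with Python's standard ConfigParser, which applies later files over earlier ones. The file list and search order below mirror the description above but are an assumption; the authoritative order lives in Tashi's own config loader.

    # Sketch: how layered .cfg files can be resolved with ConfigParser.
    # The search order below is an assumption based on the list above.
    import os
    try:
        from configparser import ConfigParser   # Python 3
    except ImportError:
        from ConfigParser import ConfigParser   # Python 2

    def load_config(app_name):
        parser = ConfigParser()
        # Later files override earlier ones for any keys they both define;
        # missing files are silently skipped.
        parser.read([
            "TashiDefaults.cfg",                 # shipped defaults
            "Tashi.cfg",                         # site-specific overrides
            "%s.cfg" % app_name,                 # e.g. ClusterManager.cfg
            os.path.expanduser("~/.tashi/%s.cfg" % app_name),  # per-user
        ])
        return parser

    cfg = load_config("ClusterManager")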
49. Key Configuration Options (CM hostname)
- You need to set the hostname used for the CM by Node Managers
- Some example settings are listed below
- Tashi.cfg:

    [Client]
    clusterManagerHost = merkabah

    [NodeManagerService]
    clusterManagerHost = merkabah
50. Key Configuration Options (VFS)
- You need to set the directory that serves disk images
- We're using NFS for this at the moment
- Some example settings are listed below
- Tashi.cfg:

    [Vfs]
    prefix = /mnt/merkabah/tashi/
51. Key Configuration Options (DHCP/DNS)
- If you want Tashi to manage DHCP and DNS, you need to have servers that are dynamically configurable (a sketch of the kind of update performed is shown below)
- Some example settings are listed below
- Agent.cfg:

    [DhcpDns]
    dnsKeyFile = /root/cluster-admin/scripts/Kmerkabah.15736480.private
    dnsServer = 172.16.0.5 53
    dnsDomain = bigdata.research.intel-research.net
    dnsExpire = 60
    dhcpServer = 172.16.0.5
    dhcpKeyName = merkabah
    dhcpSecretKey = ABcdEf12GhIJKLmnOpQrsT
    ipRange300 = 172.17.10.1-172.17.10.254
    ...
    ipRange310 = 172.17.20.1-172.17.20.254
    ipRange999 = 172.16.192.1-172.16.255.254
    ipRange1001 = 172.16.1.10-172.16.1.19
    reverseDns = True
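Conceptually, the DhcpDns plugin performs dynamic updates like the one sketched below when a VM is created; the hostname, address, and key file path here are placeholders, and the DHCP side (driven via omshell) is not shown.

    # Sketch: the kind of dynamic DNS update the DhcpDns agent performs when a
    # VM comes up. Hostname, IP, and key file path are placeholders; server,
    # domain, and TTL correspond to the [DhcpDns] settings shown above.
    import subprocess

    def add_dns_record(name, ip,
                       key_file="/path/to/Kdns-update.private",
                       server="172.16.0.5", port=53,
                       domain="bigdata.research.intel-research.net", ttl=60):
        commands = "\n".join([
            "server %s %d" % (server, port),
            "update delete %s.%s A" % (name, domain),
            "update add %s.%s %d A %s" % (name, domain, ttl, ip),
            "send",
            "",
        ])
        proc = subprocess.Popen(["nsupdate", "-k", key_file],
                                stdin=subprocess.PIPE)
        proc.communicate(commands.encode())
        return proc.returncode

    # add_dns_record("mikes-vm", "172.17.10.23")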
52. Intel BigData Cluster: Networking
External IPs
NAT ports 80 and 443 inbound
SSH gateway
Firewall
Firewall
DMZ VLAN
Tashi Client
Tashi CM
Private VLAN
Tashi VM SSH NAT
53. Key Configuration Options (LDAP)
- Tashi can use LDAP to perform user lookups
- Some example settings are listed below
- Tashi.cfg:

    [ClusterManager]
    data = tashi.clustermanager.data.LdapOverride

    [LdapOverride]
    ldapCommand = ldapsearch -x -w AbCdefGH -h 10.212.3.3 -b ou=CCT,dc=research,dc=intel-research,dc=net -D cn=cctldapsearch,cn=Users,dc=research,dc=intel-research,dc=net msSFU30LoginShell -z 0
54. Key Configuration Options (syslog)
- Tashi can send all log messages to syslog
- Some example settings are listed below
- Tashi.cfg:

    [handlers]
    keys = consoleHandler,fileHandler,syslogHandler

    [logger_root]
    handlers = consoleHandler,fileHandler,syslogHandler

    [handler_syslogHandler]
    args = ('/dev/log', 18)

- 18 is the syslog facility (see the sketch below)
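Facility 18 corresponds to LOG_LOCAL2; the sketch below builds the equivalent handler directly with Python's logging module, which is what the `args` tuple above configures.

    # Sketch: the handler that args = ('/dev/log', 18) configures, written out
    # with plain Python logging. Facility 18 is LOG_LOCAL2.
    import logging
    import logging.handlers

    handler = logging.handlers.SysLogHandler(address="/dev/log", facility=18)
    handler.setFormatter(logging.Formatter("tashi: %(levelname)s %(message)s"))
    logging.getLogger().addHandler(handler)
    logging.getLogger().warning("syslog handler configured")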
55. Key Configuration Options (packing)
- The primitive scheduler can either densely pack VMs or spread them out
- Some example settings are listed below
- Tashi.cfg:

    [Primitive]
    densePack = True
56. Key Configuration Options (Maui)
- Tashi can be scheduled with Maui
- Maui must be compiled with the Wiki interface
- Some example settings are listed below
- Tashi.cfg:

    [MauiWiki]
    authuser = mryan3
    authkey = 12345

- It is necessary to run ./src/tashi/agents/mauiwiki.py and maui directly in order to launch the system
57. Key Configuration Options (General CM)
- You may want to adjust several CM settings:
  - The amount of time things are allowed to be decayed
  - Whether NMs with version mismatches are allowed
    - Necessary if you plan to live-upgrade between versions
  - Maximum amount of memory and number of cores that can be allocated to a single VM
- Some example settings are listed below
- Tashi.cfg:

    [ClusterManagerService]
    allowDecayed = 60.0
    allowMismatchedVersions = True
    maxMemory = 15872
    maxCores = 16
58. Key Configuration Options (General NM)
- You may want to adjust several NM settings:
  - The frequency of collecting stats
  - The amount of time migration is given before timing out
- Some example settings are listed below
- Tashi.cfg:

    [NodeManagerService]
    statsInterval = 15.0

    [Qemu]
    monitorTimeout = 3600.0
    migrateTimeout = 3600.0
    statsInterval = 15.0
59. Managing the Resources
- If using MySQL, adding nodes is fairly straightforward
- If using Pickle, the easiest way to add nodes is to use the debug console
  - See http://incubator.apache.org/tashi/documentation-single.html
- By default, the list of users is fetched through getent
  - LDAP can also be used
  - The debug console can be used if both of these mechanisms are turned off
60. Key Cluster Components
- DHCP and DNS servers
  - Servers that can accept dynamic updates using omshell and nsupdate
  - Example: ISC DHCPD and bind9
  - See http://incubator.apache.org/tashi/documentation-cluster.html
- Disk images
  - Disk images need to be available in the same location on all the machines
  - NFS can be used for this
  - I've seen people use GlusterFS as well
61. Interpreting the Tashi Logs
- VM startup
  - When a VM goes to the held state and fails to initialize, look for the "QEMU command" log message
- DHCP/DNS
  - Look for "Adding <hostname> to DHCP or DNS"
- Heartbeats
  - Messages like "Host <hostname> has expired after 1234 seconds" are related to the NM being unable to send a heartbeat for a certain time window
  - Messages like "State reported as Running instead of Orphaned for instance 1234 on host <hostname>" are related to an NM recovering after failing to send heartbeats
  - I've found that this occurs under some situations of extreme network saturation or disk utilization
62. Installation
63. Installing Tashi
- Setup
  - Check out code
  - Configure for your site
  - Get/create a disk image
- Starting Tashi
  - Cluster manager
  - Node manager
  - Scheduler
- Launching a VM
  - getHosts
  - createVm
  - Try logging in
  - destroyVm
64. Installing Tashi: Details (1)
- Install Qemu/KVM, Thrift, and rpyc
- svn co http://svn.apache.org/repos/asf/incubator/tashi/trunk ./tashi
- cd tashi
- make
- mkdir -p /var/tmp/images
- <Optionally create Tashi.cfg>
65. Installing Tashi: Details (2)
- <Optionally create Tashi.cfg>
  - This config file overrides the defaults in TashiDefaults.cfg
  - You can switch to the MySQL backend or change other configuration parameters
  - Follow the CM setup instructions on http://incubator.apache.org/tashi/documentation-single.html
66. Installing Tashi: Details (3)
- <create /var/tmp/images/i386-hardy.qcow>
- export PYTHONPATH=`pwd`/src
- screen
  - window 0: DEBUG=1 ./bin/clustermanager.py
  - window 1: DEBUG=1 ./bin/nodemanager.py
  - window 2: python ./bin/primitive.py
  - window 3: ./bin/tashi-client.py getHosts
  - window 3: ./bin/tashi-client.py createVm --name foobar --disks i386-hardy.qcow --memory 1024 --cores 1
  - window 3: ./bin/tashi-client.py getInstances
  - window 3: ps aux | grep <vmId>
  - window 3: ssh mryan3@172.20.0.15 <or other known username/IP>
  - window 3: ./bin/tashi-client.py vmmSpecificCall --instance foobar --arg startVnc
  - window 3: ./bin/tashi-client.py destroyVm --instance foobar
67. Tashi on Open Cirrus
- Exclusive PRSs: systems-level research and services
- One shared PRS runs a VM-based cloud system, managed by the cluster, and shared by many users
  - For users who don't need access to the underlying hardware
- Remaining exclusive PRSs are managed and owned by individual research groups
- Shared PRS: app-level research
  - Open service research
  - Apps running in a VM mgmt infrastructure (e.g., Tashi, Eucalyptus)
- Tashi development
- Production storage service
- Proprietary service research
- Open workload monitoring and trace collection
68. Internals
69. Software Architecture
- Cluster Manager
  - Keeps global system state, accepts user requests, listens for heartbeats from nodes
- Node Manager
  - Manages virtual machines, reports system state, sends heartbeats to the CM
- Agents
  - Monitor/manage the system
  - The Scheduler is an example
  - Used to keep the CM/NM separate from site-specific logic and potentially complicated (read: unreliable) scheduling logic
- RPCs
  - Using RPyC; it has some nice features
  - Handles exceptions, object serialization
70. Cluster Manager Concepts
- The CM's main responsibilities:
  - Virtual cluster owner authentication (RPyC)
  - Cluster state maintenance
- The CM maintains databases of all pertinent cluster state, but actuation is typically the purview of some other component. The intent is to make the CM as robust as possible.
  - E.g., after a client submits a createVm() request, the CM only records that fact in the VM instance database. The scheduling agent is responsible for polling that database, deciding where/when to activate the VM, and issuing the actuation commands.
71. Node Manager Concepts
- The main responsibilities of the node manager (NM) are:
  - Manage the creation, monitoring, and destruction of virtual machines on a particular compute node
  - Monitor the physical resources of the compute node (CPU load, memory, etc.)
  - Monitor the virtual resources of each VM
- The NM registers itself with the CM upon initialization
  - And sends heartbeats that have state information piggy-backed on them
- Each compute node is also expected to host sshd for emergency manual administration
- Compute nodes may also use facilities such as iptables to protect the cluster from untrusted VMs
  - We've focused on using VLANs instead, but this approach should work as well
- vmmSpecificCall is a mechanism to pass options and information directly to the VMM
  - (E.g., startVnc allows the VM to expose the guest's virtual console over VNC when using Qemu/KVM)
72. Agent/Scheduler Concepts
- The main responsibilities of the Agent/Scheduler are (a minimal loop is sketched below):
  - Using activateVm to move VMs from the pending queue to an actual physical machine
  - Making any higher-level decisions about system management
  - Performing site-specific initialization for VMs
- Site-specific plugins to the scheduling agent are responsible for taking appropriate actions during VM creation
  - E.g., before using activateVm(), it could enable site-specific plugins to prepare entries in the appropriate DHCP/DNS servers
  - Similarly, such hooks could enable VM image composition prior to activation
- The DFS proxy(ies) enable the scheduling agent to query the DFS metadata server(s) to determine the locations of data needed by virtual machines, if specified
  - This information could inform the placement of virtual machines
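A minimal sketch of that poll-and-activate loop, assuming a client handle `cm` that exposes getInstances, getHosts, and activateVm as in the Cluster Manager service listing on the next slide. The "Pending" state name, the host/instance attribute names, and the least-loaded placement heuristic are all simplifying assumptions.

    # Sketch: a minimal scheduling-agent loop. The state name, attribute
    # names, and placement heuristic are simplifying assumptions.
    import time
    from collections import defaultdict

    def scheduling_loop(cm, interval=2.0):
        while True:
            instances = cm.getInstances()
            load = defaultdict(int)
            for vm in instances:
                if vm.hostId is not None:
                    load[vm.hostId] += vm.cores
            for vm in instances:
                if vm.state != "Pending":
                    continue
                # Site-specific plugins (DHCP/DNS, image composition) would
                # run here, before activation.
                candidates = [h for h in cm.getHosts() if h.up]
                if not candidates:
                    continue
                target = min(candidates, key=lambda h: load[h.id])
                cm.activateVm(vm.id, target)
                load[target.id] += vm.cores
            time.sleep(interval)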
73. Cluster Manager Service
(An RPyC client sketch follows this listing.)

    // Client-facing RPCs
    instanceId createVm()
    void shutdownVm(instanceId)
    void destroyVm(instanceId)
    void suspendVm(instanceId)
    instanceId resumeVm()
    list<Instance> getInstances()
    list<Host> getHosts()
    list<Network> getNetworks()
    list<User> getUsers()

    // NodeManager-facing RPCs
    i32 registerNodeManager(host, list<instanceId>)
    void vmStateChange(instanceId, oldState, newState)

    // Agent-facing RPCs
    void activateVm(instanceId, host)
    void migrateVm(instanceId, host)
    void pauseVm(instanceId)
    void unpauseVm(instanceId)
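Since the RPCs are exposed over RPyC, calling them can look roughly like the sketch below. The hostname, port, and the way the service object is obtained are placeholders rather than the real client wiring, which tashi-client.py and the Client API take care of.

    # Sketch: invoking client-facing RPCs over RPyC. Hostname, port, and the
    # service object path are placeholders, not the real Tashi wiring.
    import rpyc

    conn = rpyc.connect("tashi-cm.example.org", 18000)  # placeholder port
    cm = conn.root                                       # assumed service object

    for host in cm.getHosts():
        print(host)
    for instance in cm.getInstances():
        print(instance)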
74. Node Manager Service

    // ClusterManager-facing RPCs
    i32 instantiateVm(1: Instance instance)
    list<i32> listVms()
    Instance getVmInfo(1: i32 vmId)
    void shutdownVm(1: i32 vmId)
    void destroyVm(1: i32 vmId) throws (1: TashiException e)
    void suspendVm(1: i32 vmId, 2: string destination, 3: string suspendCookie)
    ResumeVmRes resumeVm(1: Instance instance, 2: string source)
    string prepReceiveVm(1: Instance instance, 2: Host source)
    void migrateVm(1: i32 vmId, 2: Host target, 3: string transportCookie)
    i32 receiveVm(1: Instance instance, 2: string transportCookie)
    void pauseVm(1: i32 vmId)
    void unpauseVm(1: i32 vmId)
    string vmmSpecificCall(1: i32 vmId, 2: string arg)
75. VM State Diagram (as maintained by the VM instance DB)
States: pending, activating, running, pausing, paused, unpausing, suspending, resuming, shutting down, destroying, exited
Entry points: VMcreate() and VMresume()
Dark green states are static, and blue states are transitional. Errors typically occur while a VM is in a transitional state. A VM can be put in the destroying state from any previous state. (A transition-table sketch follows below.)
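The same lifecycle can be written down as a small table. The sketch below encodes only what the slide states (which states exist, that some are static and the rest transitional, and that destroying is reachable from any previous state); the static/transitional split is inferred from the diagram, not specified in the text.

    # Sketch: VM lifecycle states from the diagram, with helpers for the two
    # rules the slide states. The static/transitional split is inferred.
    STATIC = {"pending", "running", "paused", "exited"}
    TRANSITIONAL = {"activating", "resuming", "unpausing", "pausing",
                    "suspending", "shutting down", "destroying"}
    ALL_STATES = STATIC | TRANSITIONAL

    def is_transitional(state):
        # Errors typically occur while a VM is in a transitional state.
        return state in TRANSITIONAL

    def may_enter_destroying(state):
        # A VM can be put in the destroying state from any previous state.
        return state in ALL_STATES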
76. Summary
77. Apache Incubator
- Environment in which development is taking place
- Will ultimately determine the success of Tashi as a software artifact
- URLs:
  - Incubation status: http://incubator.apache.org/projects/tashi
  - Project webpage: http://incubator.apache.org/tashi/
78. Apache Incubator Process
- Mentors
  - These people have volunteered to help guide us through the incubation process
  - Craig Russell
  - Matthieu Riou
  - Paul Freemantle
- Committers
  - These are the people with commit permissions for the source repository
  - Michael Ryan (Intel)
  - Mike Stroucken (CMU)
- Patches
  - Other contributors have to submit changes as patches on tashi-dev@incubator.apache.org
    - Until they become committers
  - It's also possible to open JIRA issues
    - http://issues.apache.org/jira/browse/tashi
79. Apache Incubator Process
- Go to the incubation status webpage to subscribe to the mailing lists
- Discussion around future features and code will be on tashi-dev@i.a.o
- What follows is the list of known issues/development goals we had going into the Incubator that still remain
80. Known Issues (1)
- Priority 1: DFS interactions
  - We're currently looking at using HDFS on the hosts and exposing the locality information to various parts of the Map/Reduce runtime in Hadoop to provide behavior/performance characteristics similar to physical operation.
  - Additional work is taking place in looking at other file systems (GlusterFS, self-, and others)
- Priority 2: Physical boot
  - Our current proposed approach to this is to network-isolate any physical machines given to users. This allows us to insert a narrow access path (single VM) to limit traffic coming out of the virtual cluster.
  - Plan: use current PXE booting schemes to boot VLAN-isolated clusters and work toward creating an automatic conversion mechanism for Tashi disk images
- Priority 3: Site-specific plugins
  - The current DHCP/DNS plugin has been suitable for at least two very different sites.
  - Plan: continue to solicit feedback from partners to determine for which steps in VM creation/activation customization is critical
81. Known Issues (2)
- Priority 4: VM scheduling model
  - Tashi does not currently have a well-integrated scheduler that supports VM priorities, quotas, billing, etc. We have, however, implemented a bridge to use Maui as a scheduler for the project.
  - Plan: implement features on an as-needed basis; look to leverage existing scheduling mechanisms (e.g. Maui)
- Priority 5: Multi-VM job control
  - The current scheduling agent activates VMs one at a time. A transactional mechanism needs to be added that only starts a VM group if there is room to accommodate the entire group and enables easy tear-down if any portion of the group fails
  - Plan: use Maui if possible to perform this operation
82. Tashi Software Architecture
Site Specific Plugin(s)
Centralized cluster administration
Cluster Manager (CM)
VM instance DB
Scheduling Agent
DFS Proxy
Client
Client API
Node Manager DB
VM
Ganglia
VM
VM
VM
CM-NM API
Node Manager (NM)
Resource Controller Plugins (VMM, DFS, power,
etc.)
VMM
DFS
Sensor Plugins
Legend
DFS Metadata Server
Tashi component
system software
nmd
iptables/vlan
non-Tashi component
sshd
Compute node
83. Demo