Workload Migration
With the time constraint imposed by a GDC node power outage, migrating the computation out over a gigabit network is relatively feasible. The computation is unlikely to be interrupted, which is preferable to node failover and reboot. Migrating all of the data out of a GDC, by contrast, is unrealistic given the migration time it would require. Moreover, computation resources are relatively small-scale and easy to migrate, but because disk space is cheap and easily expanded to terabytes or petabytes, user data is hard to move. So when we consider hosting workloads in a GDC, computation and data need to be treated separately.
The goal of Task 4 is to characterize the workloads
for which hosting in a POD datacenter would be a good match. Part of
this task includes a detailed study of existing virtual machine
migration technologies and an evaluation of the role virtual machine
migration could play in shifting workloads from one POD datacenter to
another to take advantage of available power. In addition, we would
like to explore a number of other tools and strategies for making POD
datacenter hosting a viable alternative for a larger range of
workloads. We expect that there will be some workloads for which POD
datacenter hosting is a natural fit (e.g. hosting of static content)
and other workloads for which POD datacenter hosting may never be a
good fit (e.g. write-intensive workloads with high availability and
coherency requirements). In addition to virtual machine migration, we
plan to explore other tools and strategies for managing downtime at
one POD datacenter and/or shifting work from one POD datacenter to
another according to available power.
Virtual Machine Migration
The first set of subtasks we have identified
involves conducting a series of quantitative experiments on virtual
machine migration. Most of the popular server-class hypervisor/
virtualization technologies including KVM, VMware, Xen, and Hyper-V
offer a form of virtual machine migration. However, in many
instances, it is assumed that the migration will take place between
machines in the same datacenter and possibly even on the same network
switch. In many cases, the machine initiating the migration must have
access to the same network attached storage device (NAS) as the
machine accepting the migrated VM and only the memory state is
migrated from one machine to another. Some hypervisors also support
live migration with storage migration where the disk state of the VM
is transferred as well as the memory state. Our first task is to
perform a thorough evaluation of these migration technologies in a
wide range of environments – between two machines in the same rack
sharing a NAS, between two machines in the same rack not sharing a
NAS, between two machines located in two different labs on campus
with and without NAS and finally between an on-campus machine and an
off-campus machine with and without NAS.

We are in the process of
collecting a variety of interesting measurements of virtual machine
migration including the total time to accomplish the migration (from
start to finish), the downtime or time the virtual machine is
unresponsive during migration (typically during the last stage of
migration) and the total amount of data transferred to accomplish the
migration. Eventually, we would also like to collect data on the
total power required on the transmitting side to complete the
migration. We will also collect these same measurements of two
extreme configurations – “cold” migration and a high-availability
pair. In cold migration, the VM is suspended, its files are transferred
to the other side, and the VM is resumed. This represents a worst-case
scenario for the amount of time a VM will be unresponsive but makes
the fewest assumptions about the environment (no NAS required for
example). In a high-availability pair, VMs run constantly on both
machines so full VM migration is never required. In some cases, data
will be shipped from one VM to another to keep the two VMs in sync.
The only thing that changes is where requests are sent (to both VMs
if available, to just one, etc.).
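As one illustration of how the downtime measurement can be taken from outside the VM, the sketch below shows a minimal probing loop. It is not our actual measurement harness, and the VM address, service port, probe interval, and observation window are all assumptions; it simply sends timestamped TCP probes to a service inside the migrating VM and reports the longest gap between successful responses along with the total unresponsive time.

```python
import socket
import time

VM_ADDR = ("192.0.2.10", 80)   # hypothetical VM IP and service port
PROBE_INTERVAL = 0.05          # seconds between probes
DURATION = 120                 # observation window in seconds

def probe(addr, timeout=0.5):
    """Return True if a TCP connection to the VM succeeds within the timeout."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

start = time.time()
last_ok = None
longest_gap = 0.0
unresponsive = 0.0

while time.time() - start < DURATION:
    now = time.time()
    if probe(VM_ADDR):
        # Count any gap noticeably longer than the probe interval as downtime.
        if last_ok is not None and now - last_ok > 2 * PROBE_INTERVAL:
            gap = now - last_ok
            longest_gap = max(longest_gap, gap)
            unresponsive += gap
        last_ok = now
    time.sleep(PROBE_INTERVAL)

print(f"longest gap: {longest_gap:.3f} s, total unresponsive: {unresponsive:.3f} s")
```

In practice we expect the longest gap to correspond to the final stop-and-copy stage of live migration, while total migration time and the amount of data transferred can be taken from the hypervisor's own reporting.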
Progress to Date
- Surveyed advertised features/requirements of several major hypervisors and cloud orchestration tools, and added this information to a master spreadsheet
- Set up mini-GDC architecture in the lab with new HP servers and Iomega NAS. We are using this new hardware to repeat measurements taken on some older hardware we had available in the lab over the summer.
- Set up a similar rack of servers in one of the labs in CAMP which can, in part, be used for testing cross-campus migration
- Resolved the problem of tuning network bandwidth by using a Netgear managed switch
- Improved the GDC testbed to be compatible with all kinds of virtualization by removing dependencies on the hypervisor or host OS for monitoring
- Captured a substantial amount of additional data on live migration (both memory migration and storage migration) with KVM and with VMware, using existing (older) hardware in our lab
- Assembled models for using a GDC beyond virtual machine migration, including rolling queries that converge to a final answer (sketched below)
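To make the rolling-query model concrete, the sketch below is a simplified illustration (the data layout, checkpoint file, and aggregate are assumptions, not our actual implementation). It computes a running average over data partitions and checkpoints the partial answer after each partition, so a query interrupted by a power outage can resume where it left off and still converge to the final answer.

```python
import json
import os

CHECKPOINT = "rolling_query.ckpt"   # hypothetical checkpoint file

def load_checkpoint():
    """Resume from the last partial answer if a checkpoint exists."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"next_partition": 0, "count": 0, "total": 0.0}

def save_checkpoint(state):
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)

def rolling_average(partitions):
    """Yield a refined average after each partition; the last value is the final answer."""
    state = load_checkpoint()
    for i in range(state["next_partition"], len(partitions)):
        for value in partitions[i]:
            state["count"] += 1
            state["total"] += value
        state["next_partition"] = i + 1
        save_checkpoint(state)
        yield state["total"] / state["count"]

# Example: three partitions of a larger dataset.
data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
for estimate in rolling_average(data):
    print(f"current estimate: {estimate:.2f}")
```

The same pattern applies to any aggregate that can be folded incrementally; the key property for GDC hosting is that each intermediate estimate remains usable if power runs out before the query finishes.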
Hardware Description
We purchased two HP servers with two AMD Opteron
6220 8-core 3.0 GHz CPUs and 64 GB of memory. These specifications are a
better match for modern datacenter hardware than the older hardware
we used over the summer. On each server we install the virtualization
hypervisor under test to host VMs. Each site is also equipped with an
8-terabyte Network Attached Storage (NAS) device which hosts VM images
on RAID 10. Together, an HP server and a NAS represent a small GDC in
our prototype. Customer services are hosted in VMs and can be relocated
or migrated across the servers. The picture below was taken in the
Clarkson Computer Lab; we thank the lab for allowing us to host our GDC
project in their server room.
We also use two auxiliary machines to carry out these tests. The GDC
workload controller, or monitor machine, mentioned above is an IBM
eServer xSeries 345 running Ubuntu. It is responsible for starting the
migration and measuring the migration time and downtime. When testing
VMware, we also use an additional machine to run VMware vCenter, which
is required to manage the VMware ESX servers. In this case, the
workload controller contacts vCenter and asks vCenter to start the
live migration. For these two additional machines, we were able to
repurpose some existing machines in the lab.
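On the KVM side, the workload controller starts migrations with standard libvirt tooling. The sketch below is a minimal illustration of that step (the domain name and destination URI are placeholders, and our actual controller also coordinates the downtime probe described earlier): it invokes virsh migrate --live and records the wall-clock migration time. For VMware, the equivalent request goes through vCenter instead.

```python
import subprocess
import time

DOMAIN = "customer-vm"                           # hypothetical libvirt domain (VM) name
DEST_URI = "qemu+ssh://gdc2.example.edu/system"  # hypothetical destination hypervisor

# Start the live migration from the controller and measure total migration time.
start = time.time()
result = subprocess.run(
    ["virsh", "migrate", "--live", DOMAIN, DEST_URI],
    capture_output=True, text=True,
)
elapsed = time.time() - start

if result.returncode == 0:
    print(f"live migration of {DOMAIN} completed in {elapsed:.1f} s")
else:
    print(f"migration failed: {result.stderr.strip()}")
```

When the two sites do not share a NAS, virsh can also copy the disk state along with memory (e.g. with the --copy-storage-all flag), which corresponds to the storage-migration experiments described above.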