Isolation
Results from running the Isolation Benchmark Suite

  • Comparison of Full Virtualization, Paravirtualization, and OS-level Virtualization

    Table 1: Percent degradation reported by well-behaved (Good) and misbehaving (Bad) servers

                          VMware Workstation   Xen                OpenVZ
                          Good     Bad         Good     Bad       Good     Bad
    Memory                0 [1]    91.30       0.03     DNR [2]   0        DNR
    Fork                  0        DNR         0.04     DNR       0.01     87.80
    CPU                   0        0           0.01     0         0        0
    Disk Intensive        0        39.80       1.67     11.53     2.52     2.63
    Network Server        0        52.9        0        0.33      21.3     28.97
      Transmits [3]
    Network Server        0        0           0.04     0.03      DNR      DNR
      Receives [4]

    IBM ThinkCentre with Pentium 4 processor, 1 GB DDR RAM, Gigabit NIC
    VMware Workstation 5.5, OpenVZ 2.6.18, Xen 3.0 Stable
    Apache 2.x, PHP 5.x

    Summary of Stress Test Results

    • Memory:
      In both Xen and OpenVZ cases, the misbehaving VM did not report results, but all others continued to report nearly 100% good results as before. In the VMware Workstation case, the misbehaving VM survived to report significantly degraded performance (8.7% average good responses) and the other three servers continued to report 100% good response time, as in the baseline.
    • Fork:
      Under both Xen and VMware Workstation, the misbehaving virtual machine presented no results, but the other three well-behaved containers continued to report 100% (or near 100%) good response time. Under OpenVZ, the well-behaved guests were also protected and even the misbehaving guest survived to report results, but only 12.2% good response times.
    • CPU:
      All three of our virtualization systems performed well on this test – even the misbehaving VMs. We verified on all platforms that the CPU load on the misbehaving server does rise to nearly 100%.
    • Disk IO:
      On VMware Workstation, 100% good performance was maintained on the three well-behaved VMs; however, the misbehaving VM saw a degradation of 40%. For OpenVZ, the well-behaved and misbehaving VMs saw similar degradations of 2.52% and 2.63% respectively; although the degradation is relatively minor, there is no evidence of isolation on this test. On Xen, the situation was mixed: the misbehaving VM saw a degradation of 12%, and the other three VMs were also impacted, showing an average degradation of 1-2%.
    • UDP Transmit:
      Under VMware Workstation, the well-behaved VMs continue to show 100% good response, but the misbehaving VM shows a substantial degradation of 53%. For Xen, the well-behaved VMs also show no degradation, and the misbehaving VM shows a slight but repeatable degradation of less than 1%. For OpenVZ, all VMs experience significant degradation: the misbehaving VM experiences almost 29% degradation, while the well-behaved VMs fare almost as poorly, with an average degradation of 21.3%. Once again, this is evidence of weak isolation.
    • UDP Receive:
      For OpenVZ, none of the four VMs survived to report any results, while on VMware Workstation the opposite occurred: all four VMs retained 100% good response. We did not even see a degradation on the misbehaving VM as we did in the server transmits case. In the OpenVZ case, however, it is clear that the incoming packets are indeed creating interference. For Xen, the misbehaving VM and the well-behaved VMs are similarly affected, with a very slight degradation of 0.03-0.04%.
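    The stress loads referenced above (memory bomb, fork bomb, CPU hog, and so on) are not spelled out in this summary. A minimal, deliberately bounded sketch of the CPU and memory loads might look like the following; the function names and the small caps are our own, and the actual suite grows these loads without limit on the misbehaving VM:

    ```python
    import time

    def cpu_hog(seconds=0.1):
        """Busy-loop for `seconds`, driving one core toward 100% utilization.
        The real CPU stress test runs such a loop indefinitely."""
        end = time.time() + seconds
        iterations = 0
        while time.time() < end:
            iterations += 1
        return iterations

    def memory_hog(megabytes=2):
        """Allocate and touch `megabytes` MB of RAM.
        A real memory bomb keeps allocating until allocation fails or the VM
        dies; here the size is capped so the sketch is safe to run."""
        blocks = []
        for _ in range(megabytes):
            block = bytearray(1024 * 1024)
            for i in range(0, len(block), 4096):
                block[i] = 1  # write to each page so the memory is committed
            blocks.append(block)
        return len(blocks)
    ```

    Uncapping these loops (and adding analogous fork and disk-write loops) reproduces the kind of misbehavior whose effects the tables above measure.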


  • Comparison of Solaris with Resource Controls to without Resource Controls


  • Table 2

                          Solaris (without       Solaris (with
                          resource controls)     resource controls)
                          Good     Bad           Good      Bad
    Memory                DNR      DNR           0.06      0.03 / DNR
    Fork                  DNR      DNR           0.04      DNR
    CPU                   0        0             0.06      0.07
    Disk Intensive        1.48     1.23          1.59      1.13
    Network Server        4.00     3.53          1.00      0.93
      Transmits
    Network Server        1.24     1.67          92.73     92.43
      Receives

    IBM ThinkCentre with Pentium 4 processor, 1 GB DDR RAM, Gigabit NIC
    OpenSolaris 9/26/2005 (without resource controls) and OpenSolaris 4/09/2007 (with resource controls)
    Apache 1.3, PHP 4.x

    Summary of Solaris Stress Test Results

    • Memory:
      Without the resource limits in place, none of the Solaris Containers survived to report results. The test effectively shut down all virtual machines - misbehaving and well behaved.
      With the resource limits in place, we experienced two different situations that together illustrate the nature of the resource limits nicely.
      In both situations, the well-behaved VMs report a trivial degradation of 0.06%, but the result for the misbehaving VM depends on a slight difference in timing: whether we start the memory bomb or the SPECweb test first. (In general, we tried to start them at approximately the same time.) If we start the memory bomb first, no results are reported because the web server's memory requests are denied. If we start the SPECweb test first, it reports little degradation (0.03%), similar to the well-behaved VMs; in this case the web server's memory requests are satisfied, but the memory bomb reports insufficient-memory errors.
    • Fork:
      Without resource controls, results were not reported for any of the four containers.
      With the resource limits in place, the misbehaving VM was still completely unresponsive, but the well-behaved VMs experienced only 0.04% degradation on average.
    • CPU:
      Without resource controls, there is no degradation for any of the VMs. Interestingly, there was a slight degradation with resource controls in place: 0.06% for the well-behaved VMs and 0.07% for the misbehaving VM.
    • Disk IO:
      With and without resource controls, all VMs experience a slight degradation of 1.13-1.59%. (We are unaware of any configuration options for disk resources in Solaris.)
    • UDP Transmit:
      With no resource controls, a degradation of 3.53% to 4% is reported for all VMs.
      With the resource controls in place (but no network-specific controls), the overall degradation is quite low (1.0% for the well-behaved VMs and 0.93% for the misbehaving VM), but there is no evidence of isolation.
    • UDP Receive:
      With no resource controls, a degradation of 1.24% to 1.67% is reported for all VMs.
      With the resource controls in place (but no network-specific controls), the results for the misbehaving and well-behaved VMs are similar to each other; however, a very high degradation of about 92% is reported for all VMs.
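    For reference, the kind of per-container memory and process limits discussed above can be expressed in Solaris through project attributes. The fragment below is a hypothetical illustration; the project name and limit values are ours, not the configuration used in these tests:

    ```
    # /etc/project (illustrative): cap a container's address space at 512 MB
    # and its LWP count at 200, denying requests beyond the limit
    webcontainer:101::::process.max-address-space=(privileged,536870912,deny);project.max-lwps=(privileged,200,deny)
    ```

    Limits of this form would explain the memory and fork results above: the misbehaving guest's allocations or forks are denied while its neighbors continue to be served.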

  • Planned Comparison of Virtualization Systems
    • Microsoft Virtual PC
    • VMware ESX Server
    • Xen Express
    • KVM
    • UML

[1] Percent degradation:
In each test, SPECweb runs three iterations, and each iteration reports percent good, percent tolerable, and percent failed (tolerable + failed = 100% and good <= tolerable). We report the degradation in percent good, in other words 100 - good. Also, for each test there are three well-behaved servers and one misbehaving/bad server; we average the results over all three well-behaved servers.
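As a concrete reading of that definition, the reported numbers could be computed as in this sketch (function names are ours; the per-iteration percent-good values come from the SPECweb client):

```python
def degradation(percent_good_per_iteration):
    """Degradation for one server: 100 minus the mean percent good
    over the three SPECweb iterations."""
    mean_good = sum(percent_good_per_iteration) / len(percent_good_per_iteration)
    return 100.0 - mean_good

def well_behaved_average(per_server_iterations):
    """Average degradation over the three well-behaved servers.
    `per_server_iterations` holds one list of percent-good values per server."""
    degradations = [degradation(it) for it in per_server_iterations]
    return sum(degradations) / len(degradations)
```

For example, three well-behaved servers that all report 100% good in every iteration yield an average degradation of 0, matching the many 0 entries in the tables.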

[2] DNR:
DNR indicates the SPECweb client reported only an error and no actual results because of the unresponsiveness of the server it was testing.

[3] Network Server Transmits:
Network server is an external physical server machine which sends UDP packets to the misbehaving virtual machine.

[4] Network Server Receives:
The misbehaving virtual machine sends UDP packets to an external physical client machine.
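The UDP traffic in both network tests could be generated with a sketch like the one below (hypothetical; the suite's actual packet sizes, rates, and tooling are not specified here):

```python
import socket

def udp_blast(dest_host, dest_port, count=1000, payload_size=1024):
    """Send `count` UDP datagrams of `payload_size` bytes to the destination.
    Run unthrottled with large counts, this saturates the sender's network
    path, which is the point of the misbehaving-VM network stress tests."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    payload = b"\x00" * payload_size
    bytes_sent = 0
    try:
        for _ in range(count):
            bytes_sent += sock.sendto(payload, (dest_host, dest_port))
    finally:
        sock.close()
    return bytes_sent
```

In the server-transmits test an external machine runs this against the misbehaving VM; in the server-receives test the misbehaving VM runs it against an external client.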

Isolation Benchmark Suite Download Resources
  • The version of the paper submitted to the Experimental CS conference is available here: DOWNLOAD
  • The current version of the Isolation Benchmark Suite is available here: DOWNLOAD
  • The current results from running the Isolation Benchmark Suite are available here: DOWNLOAD