Avoid evicting hot pages from the XBZRLE cache in favor of other pages,
which markedly decreases cache misses.
Sample results with the test program quoted from xbzrle.txt, run in a
VM (migration bandwidth: 1GE, xbzrle cache size: 8MB).
The test program:
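#include <stdlib.h>
#include <stdio.h>
int main(void)
{
        /* 4096 pages of 4096 bytes each, zero-initialized */
        char *buf = (char *) calloc(4096, 4096);
        if (!buf) {
            return 1;
        }
        while (1) {
            int i;
            /* dirty one byte every 1024 bytes, i.e. four bytes in each page */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
}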
before this patch:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":1020,"xbzrle-cache":{"bytes":1108284,
"cache-size":8388608,"cache-miss-rate":0.987013,"pages":18297,"overflow":8,
"cache-miss":1228737},"status":"active","setup-time":10,"total-time":52398,
"ram":{"total":12466991104,"remaining":1695744,"mbps":935.559472,
"transferred":5780760580,"dirty-sync-counter":271,"duplicate":2878530,
"dirty-pages-rate":29130,"skipped":0,"normal-bytes":5748592640,
"normal":1403465}},"id":"libvirt-706"}
About 18k pages were sent compressed in 52 seconds.
The cache-miss-rate is 98.7%, so almost every lookup misses.
after optimizing:
virsh qemu-monitor-command test_vm '{"execute": "query-migrate"}'
{"return":{"expected-downtime":2054,"xbzrle-cache":{"bytes":5066763,
"cache-size":8388608,"cache-miss-rate":0.485924,"pages":194823,"overflow":0,
"cache-miss":210653},"status":"active","setup-time":11,"total-time":18729,
"ram":{"total":12466991104,"remaining":3895296,"mbps":937.663549,
"transferred":1615042219,"dirty-sync-counter":98,"duplicate":2869840,
"dirty-pages-rate":58781,"skipped":0,"normal-bytes":1588404224,
"normal":387794}},"id":"libvirt-266"}
About 194k pages were sent compressed in 18 seconds.
The cache-miss-rate decreases to 48.59%.
Signed-off-by: ChenLiang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>

XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
of VM downtime and the total live-migration time of virtual machines.
It is particularly useful for virtual machines running memory-write-intensive
workloads that are typical of large enterprise applications such as SAP ERP
systems, and generally speaking for any application that uses a sparse memory
update pattern.

Instead of sending the changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to be able to calculate the update, the previous memory pages need to
be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache, the better the chances are that the page has already
been stored in the cache.
A small cache will result in a high cache miss rate.
The cache size can be changed before and during migration.

Format
======

The compression format performs a XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).

There can be more than one valid encoding; the sender may send a longer encoding
for the benefit of reducing computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

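The "length" element of the grammar above can be illustrated with a small,
generic ULEB128 codec. This is a minimal sketch and not the QEMU
implementation: the function names uleb128_encode and uleb128_decode are
hypothetical, and the decoder only handles values that fit in 32 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode n as ULEB128 into dst (which must have room for 5 bytes).
 * Returns the number of bytes written.
 */
static size_t uleb128_encode(uint8_t *dst, uint32_t n)
{
    size_t i = 0;

    assert(dst != NULL);
    do {
        uint8_t byte = n & 0x7f;     /* take the low 7 bits */
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;            /* continuation bit: more bytes follow */
        }
        dst[i++] = byte;
    } while (n != 0);
    return i;
}

/*
 * Decode a ULEB128 value from src (at most src_len bytes) into *n.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than 5 bytes.
 */
static size_t uleb128_decode(const uint8_t *src, size_t src_len, uint32_t *n)
{
    uint32_t value = 0;
    unsigned shift = 0;
    size_t i;

    assert(src != NULL && n != NULL);
    for (i = 0; i < src_len && shift < 32; i++, shift += 7) {
        value |= (uint32_t)(src[i] & 0x7f) << shift;
        if ((src[i] & 0x80) == 0) {  /* last byte of this value */
            *n = value;
            return i + 1;
        }
    }
    return 0;
}
```

A run length of 1001, for instance, needs two bytes in this scheme because it
does not fit in 7 bits, while any length below 128 needs one byte.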
On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64 MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.

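The sender-side step described above can be sketched as follows. This is an
illustrative encoder written from the format description only, not the QEMU
code: the names put_length and xbzrle_encode_sketch are hypothetical, and the
choices to omit a trailing zero run and to return 0 for identical pages are
assumptions that match the example in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append n as ULEB128 at dst[*pos]; returns -1 if it does not fit. */
static int put_length(uint8_t *dst, size_t dst_len, size_t *pos, uint32_t n)
{
    do {
        uint8_t byte = n & 0x7f;
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;            /* more bytes follow */
        }
        if (*pos >= dst_len) {
            return -1;
        }
        dst[(*pos)++] = byte;
    } while (n != 0);
    return 0;
}

/*
 * Encode new_page against old_page (both page_len bytes) into dst.
 * Returns the encoded length, 0 if the pages are identical, or -1 if the
 * encoding does not fit in dst_len bytes (an overflow).
 */
static long xbzrle_encode_sketch(const uint8_t *old_page,
                                 const uint8_t *new_page, size_t page_len,
                                 uint8_t *dst, size_t dst_len)
{
    size_t i = 0, d = 0;

    assert(old_page != NULL && new_page != NULL && dst != NULL);
    while (i < page_len) {
        /* zero run: bytes whose XOR is zero, i.e. unchanged bytes */
        size_t start = i;
        while (i < page_len && (old_page[i] ^ new_page[i]) == 0) {
            i++;
        }
        if (i == page_len) {
            break;                   /* trailing unchanged bytes are not sent */
        }
        if (put_length(dst, dst_len, &d, (uint32_t)(i - start)) < 0) {
            return -1;
        }
        /* non-zero run: its length followed by the new data */
        start = i;
        while (i < page_len && (old_page[i] ^ new_page[i]) != 0) {
            i++;
        }
        if (put_length(dst, dst_len, &d, (uint32_t)(i - start)) < 0 ||
            i - start > dst_len - d) {
            return -1;
        }
        memcpy(dst + d, new_page + start, i - start);
        d += i - start;
    }
    return (long)d;
}
```

With a destination buffer of one page, a page in which every second byte has
changed cannot be encoded by this sketch, because each changed byte costs three
encoded bytes; the function then reports an overflow by returning -1.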
This work was originally based on research results published in
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the delta encoder XBRLE from that work was improved further,
and XBZRLE is used instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live migration.

Example
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69

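The receiving side of the example above can be sketched as a decoder that
applies the encoded buffer to the old page in place. This is a minimal sketch
derived from the format description, not the QEMU implementation; the names
get_length and xbzrle_decode_sketch are hypothetical, and on malformed input
the page may be left partially updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a ULEB128 value; returns bytes consumed, or 0 on truncated input. */
static size_t get_length(const uint8_t *src, size_t src_len, uint32_t *n)
{
    uint32_t value = 0;
    unsigned shift = 0;
    size_t i;

    for (i = 0; i < src_len && shift < 32; i++, shift += 7) {
        value |= (uint32_t)(src[i] & 0x7f) << shift;
        if ((src[i] & 0x80) == 0) {
            *n = value;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Apply the encoded delta enc to page, which holds the old content on entry
 * and the new content on success. Returns 0 on success, -1 on malformed input.
 */
static int xbzrle_decode_sketch(const uint8_t *enc, size_t enc_len,
                                uint8_t *page, size_t page_len)
{
    size_t s = 0, p = 0;

    assert(enc != NULL && page != NULL);
    while (s < enc_len) {
        uint32_t run;
        size_t used;

        /* zrun: skip over bytes that did not change */
        used = get_length(enc + s, enc_len - s, &run);
        if (used == 0 || run > page_len - p) {
            return -1;
        }
        s += used;
        p += run;

        /* nzrun: length followed by that many bytes of new data */
        used = get_length(enc + s, enc_len - s, &run);
        if (used == 0) {
            return -1;
        }
        s += used;
        if (run == 0 || run > page_len - p || run > enc_len - s) {
            return -1;
        }
        memcpy(page + p, enc + s, run);
        s += run;
        p += run;
    }
    return 0;
}
```

Bytes after the last non-zero run are left untouched, which is why the encoded
buffer in the example does not need a final zero run for the 3074 trailing
zeros.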
Cache update strategy
=====================
Keeping the hot pages in the cache is an effective way to decrease cache
misses. XBZRLE uses a counter as the age of each page. The counter
increases after each RAM dirty bitmap sync. When a cache conflict is
detected, XBZRLE will only evict pages in the cache that are older than
a threshold.

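The age-based eviction rule described above can be sketched with a
direct-mapped cache. Everything in this sketch is an assumption made for
illustration: the structure layout, the names cache_slot and
cache_insert_sketch, the 4096-byte page size, and the threshold value
SKETCH_PAGE_LIFETIME are hypothetical and not taken from the QEMU sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical threshold: number of bitmap syncs a page stays protected. */
#define SKETCH_PAGE_LIFETIME 2

struct cache_slot {
    bool     used;   /* slot holds a page */
    uint64_t addr;   /* guest address of the cached page */
    uint64_t age;    /* sync counter value when the page was last stored */
};

/* Map a page address to a slot; n_slots is assumed to be a power of 2. */
static size_t slot_index(uint64_t addr, size_t n_slots)
{
    return (size_t)((addr / 4096) & (n_slots - 1));
}

/*
 * Try to store the page at addr, given the current sync counter value.
 * Returns true if the page now occupies its slot, or false if the slot
 * holds a different page that is still younger than the threshold and
 * was therefore kept.
 */
static bool cache_insert_sketch(struct cache_slot *slots, size_t n_slots,
                                uint64_t addr, uint64_t current_age)
{
    struct cache_slot *slot;

    assert(slots != NULL);
    assert(n_slots != 0 && (n_slots & (n_slots - 1)) == 0);
    slot = &slots[slot_index(addr, n_slots)];
    if (slot->used && slot->addr != addr &&
        slot->age + SKETCH_PAGE_LIFETIME > current_age) {
        return false;            /* conflict with a hot page: do not evict */
    }
    slot->used = true;
    slot->addr = addr;
    slot->age = current_age;     /* storing or updating refreshes the age */
    return true;
}
```

In this sketch a page that is rewritten on every sync keeps refreshing its age
and so is never displaced by a conflicting page, while a page that has not been
touched for SKETCH_PAGE_LIFETIME syncs can be replaced.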
Usage
=====
1. Verify the destination QEMU version is able to decode the new format.
    {qemu} info migrate_capabilities
    {qemu} xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on source only). The cache size is in MBytes and
should be a power of 2. The cache default value is 64MBytes.
    {qemu} migrate_set_cache_size 256m

4. Start outgoing migration
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K
    xbzrle overflow : L

xbzrle cache-miss: the number of cache misses to date - a high cache-miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows during encoding, where the delta
could not be compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).

Testing: Testing indicated that live migration with XBZRLE completed in 110
seconds, whereas without it the migration was not able to complete.

A simple synthetic memory r/w load generator:
    #include <stdlib.h>
    #include <stdio.h>
    int main(void)
    {
        /* 4096 pages of 4096 bytes each, zero-initialized */
        char *buf = (char *) calloc(4096, 4096);
        if (!buf) {
            return 1;
        }
        while (1) {
            int i;
            /* dirty one byte every 1024 bytes, i.e. four bytes in each page */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
    }