XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

Using XBZRLE (Xor Based Zero Run Length Encoding) allows for the reduction
of VM downtime and the total live-migration time of virtual machines.
It is particularly useful for virtual machines running memory-write-intensive
workloads that are typical of large enterprise applications such as SAP ERP
Systems, and generally speaking for any application that uses a sparse memory
update pattern.

Instead of sending the changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
In order to be able to calculate the update, the previous memory pages need to
be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache size, the better the chances are that the page has already
been stored in the cache.
A small cache size will result in a high cache miss rate.
Cache size can be changed before and during migration.

Format
======

The compression format performs an XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128).

There can be more than one valid encoding; the sender may send a longer encoding
in order to reduce computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer

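The length encoding used by the grammar above can be sketched in C as follows.
This is a generic ULEB128 routine written for illustration only; the function
names uleb128_encode and uleb128_decode and the 32-bit value limit are
assumptions of this sketch, not a description of the helpers QEMU uses.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode n as ULEB128 into out and return the number of bytes written.
 * Each byte carries 7 payload bits, least significant group first; the
 * high bit is set on every byte except the last one.
 * out must have room for up to 5 bytes.
 */
static size_t uleb128_encode(uint32_t n, uint8_t *out)
{
    size_t i = 0;

    do {
        uint8_t b = n & 0x7f;

        n >>= 7;
        if (n != 0) {
            b |= 0x80;          /* more bytes follow */
        }
        out[i++] = b;
    } while (n != 0);
    return i;
}

/*
 * Decode a ULEB128 value from at most in_len bytes of in into *n.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than the 5 bytes a 32-bit value can need.
 */
static size_t uleb128_decode(const uint8_t *in, size_t in_len, uint32_t *n)
{
    uint32_t value = 0;
    size_t i;

    for (i = 0; i < in_len && i < 5; i++) {
        value |= (uint32_t)(in[i] & 0x7f) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *n = value;
            return i + 1;
        }
    }
    return 0;
}
```

With this scheme a run length below 128 takes one byte, and any run length
that fits in a 4096-byte page takes at most two bytes.
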
On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.

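A minimal sketch of the sender side, assuming a byte-by-byte comparison of the
cached (old) page with the current (new) page. The names put_len and
xbzrle_encode_sketch and the return convention (encoded length, 0 for
identical pages, -1 when the output does not fit) are assumptions made for
this illustration. A real encoder may compare several bytes at a time and may
choose a different, equally valid encoding.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write len as ULEB128; returns bytes written, or 0 if room is too small. */
static size_t put_len(size_t len, uint8_t *dst, size_t room)
{
    size_t i = 0;

    do {
        uint8_t b = len & 0x7f;

        len >>= 7;
        if (len != 0) {
            b |= 0x80;
        }
        if (i >= room) {
            return 0;
        }
        dst[i++] = b;
    } while (len != 0);
    return i;
}

/* Encode new_page as zero/non-zero runs relative to old_page. */
static long xbzrle_encode_sketch(const uint8_t *old_page,
                                 const uint8_t *new_page, size_t page_len,
                                 uint8_t *dst, size_t dst_len)
{
    size_t i = 0, d = 0;

    while (i < page_len) {
        size_t start = i, n;

        /* zero run: bytes whose XOR is zero, i.e. unchanged bytes */
        while (i < page_len && (old_page[i] ^ new_page[i]) == 0) {
            i++;
        }
        if (i == page_len) {
            break;              /* a trailing zero run is not sent */
        }
        n = put_len(i - start, dst + d, dst_len - d);
        if (n == 0) {
            return -1;
        }
        d += n;

        /* non-zero run: its length followed by the new bytes */
        start = i;
        while (i < page_len && (old_page[i] ^ new_page[i]) != 0) {
            i++;
        }
        n = put_len(i - start, dst + d, dst_len - d);
        if (n == 0 || i - start > dst_len - d - n) {
            return -1;          /* output would overflow */
        }
        d += n;
        memcpy(dst + d, new_page + start, i - start);
        d += i - start;
    }
    return (long)d;
}
```

Note that the XOR is only used to find which bytes changed; the non-zero run
carries the new data itself, as the format description states.
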
This work was originally based on research results published at
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the delta encoder XBRLE was improved further by using XBZRLE
instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live migration.

Example
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69

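The example can be read back as: skip 1001 bytes (e9 07), copy 15 new bytes
(0f ...), skip 3 (03), copy 1 (01 67), skip 1 (01), copy 1 (01 69). A minimal
sketch of the receiving side follows, assuming the page buffer already holds
the old content and is patched in place. The names get_len and
xbzrle_decode_sketch and the error convention are assumptions of this
illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a ULEB128 length; returns bytes consumed, or 0 if truncated. */
static size_t get_len(const uint8_t *src, size_t avail, size_t *len)
{
    size_t value = 0, i;

    for (i = 0; i < avail && i < 4; i++) {
        value |= (size_t)(src[i] & 0x7f) << (7 * i);
        if ((src[i] & 0x80) == 0) {
            *len = value;
            return i + 1;
        }
    }
    return 0;
}

/*
 * Apply an encoded delta to page, which holds the old content.
 * Returns 0 on success, -1 if the encoding is malformed.
 */
static int xbzrle_decode_sketch(const uint8_t *src, size_t src_len,
                                uint8_t *page, size_t page_len)
{
    size_t s = 0, p = 0, len, n;

    while (s < src_len) {
        /* zero run: these bytes are unchanged, so just skip them */
        n = get_len(src + s, src_len - s, &len);
        if (n == 0 || len > page_len - p) {
            return -1;
        }
        s += n;
        p += len;

        /* non-zero run: overwrite with the new bytes that follow */
        n = get_len(src + s, src_len - s, &len);
        if (n == 0) {
            return -1;
        }
        s += n;
        if (len > src_len - s || len > page_len - p) {
            return -1;
        }
        memcpy(page + p, src + s, len);
        s += len;
        p += len;
    }
    return 0;               /* bytes after the last run stay unchanged */
}
```
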
Cache update strategy
=====================
Keeping the hot pages in the cache is effective for decreasing cache
misses. XBZRLE uses a counter as the age of each page. The counter will
increase after each ram dirty bitmap sync. When a cache conflict is
detected, XBZRLE will only evict pages in the cache that are older than
a threshold.

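The eviction rule can be sketched as follows, assuming a small direct-mapped
cache indexed by page address. The slot count, the threshold value, the
struct layout and the function names are all hypothetical and chosen only for
illustration; page contents are left out to keep the sketch short. The
argument now stands for the counter that increases after each dirty bitmap
sync.

```c
#include <stdbool.h>
#include <stdint.h>

#define SLOTS          4    /* hypothetical slot count, a power of two */
#define PAGE_SHIFT     12   /* 4096-byte pages */
#define AGE_THRESHOLD  2    /* hypothetical threshold, in sync rounds */

struct slot {
    bool used;
    uint64_t addr;          /* address of the cached page */
    uint64_t age;           /* counter value when the page was last seen */
};

struct cache {
    struct slot slots[SLOTS];
};

/* Map a page address to its only possible slot. */
static struct slot *slot_for(struct cache *c, uint64_t addr)
{
    return &c->slots[(addr >> PAGE_SHIFT) & (SLOTS - 1)];
}

/* Look a page up; a hit refreshes its age so hot pages stay young. */
static bool cache_is_cached(struct cache *c, uint64_t addr, uint64_t now)
{
    struct slot *s = slot_for(c, addr);

    if (s->used && s->addr == addr) {
        s->age = now;
        return true;
    }
    return false;
}

/*
 * Try to insert a page. On a conflict the resident page is evicted only
 * if it is older than the threshold; otherwise the insert is refused.
 */
static bool cache_insert(struct cache *c, uint64_t addr, uint64_t now)
{
    struct slot *s = slot_for(c, addr);

    if (s->used && s->addr != addr && now - s->age <= AGE_THRESHOLD) {
        return false;       /* resident page is still hot, keep it */
    }
    s->used = true;
    s->addr = addr;
    s->age = now;
    return true;
}
```

A page whose insert is refused is simply treated as a cache miss for that
round, which in this sketch is the price paid for not thrashing hot pages.
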
Usage
=====
1. Verify the destination QEMU version is able to decode the new format.
    {qemu} info migrate_capabilities
    {qemu} xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on source only). The cache size is in MBytes
and should be a power of 2. The cache default value is 64MBytes.
    {qemu} migrate_set_cache_size 256m

4. Start outgoing migration
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K
    xbzrle overflow : L

xbzrle cache miss: the number of cache misses to date. A high cache miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows during encoding, where the delta
could not be compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).

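The "every second byte" case can be checked with simple arithmetic: each
isolated changed byte costs a zero-run length, a non-zero-run length and one
data byte. The helper below is a hypothetical illustration of that
calculation, not part of QEMU. It assumes that changes fall on every
stride-th byte starting at offset stride - 1, with stride between 2 and 128,
so that every run length fits in a single ULEB128 byte.

```c
#include <stddef.h>

/*
 * Encoded size in bytes when every stride-th byte of a page changes.
 * Valid for 2 <= stride <= 128 (see the assumptions above).
 */
static size_t encoded_size_every_nth(size_t page_len, size_t stride)
{
    size_t changed = page_len / stride;

    /* per changed byte: zrun length (1) + nzrun length (1) + data (1) */
    return changed * 3;
}
```

For a 4096-byte page and a stride of 2 this gives 2048 * 3 = 6144 bytes, which
is larger than the page itself, so the delta cannot be sent in compressed
form. With a stride of 4 the result is 1024 * 3 = 3072 bytes, which still
fits.
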
Testing: Live migration with XBZRLE completed in 110 seconds, whereas without
it the migration was not able to complete.

A simple synthetic memory r/w load generator:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* 4096 pages of 4096 bytes (16 MiB), zero-initialised */
        char *buf = calloc(4096, 4096);

        if (buf == NULL) {
            return 1;
        }
        while (1) {
            int i;
            /* dirty one byte in every 1 KiB, i.e. 4 bytes per page */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
        }
    }