2d40178a33
Reported-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>

The memory API
==============

The memory API models the memory and I/O buses and controllers of a QEMU
machine.  It attempts to allow modelling of:

 - ordinary RAM
 - memory-mapped I/O (MMIO)
 - memory controllers that can dynamically reroute physical memory regions
   to different destinations

The memory model provides support for:

 - tracking RAM changes by the guest
 - setting up coalesced memory for KVM
 - setting up ioeventfd regions for KVM

Memory is modelled as an acyclic graph of MemoryRegion objects.  Sinks
(leaves) are RAM and MMIO regions, while other nodes represent
buses, memory controllers, and memory regions that have been rerouted.

In addition to MemoryRegion objects, the memory API provides AddressSpace
objects for every root and possibly for intermediate MemoryRegions too.
These represent memory as seen from the viewpoint of the CPU or a device.

Types of regions
----------------

There are four types of memory regions (all represented by a single C type,
MemoryRegion):

- RAM: a RAM region is simply a range of host memory that can be made available
  to the guest.

- MMIO: a range of guest memory that is implemented by host callbacks;
  each read or write causes a callback to be called on the host.

- container: a container simply includes other memory regions, each at
  a different offset.  Containers are useful for grouping several regions
  into one unit.  For example, a PCI BAR may be composed of a RAM region
  and an MMIO region.

  A container's subregions are usually non-overlapping.  In some cases it is
  useful to have overlapping regions; for example a memory controller that
  can overlay a subregion of RAM with MMIO or ROM, or a PCI controller
  that does not prevent cards from claiming overlapping BARs.

- alias: a subsection of another region.  Aliases allow a region to be
  split apart into discontiguous regions.  Examples of uses are memory banks
  used when the guest address space is smaller than the amount of RAM
  addressed, or a memory controller that splits main memory to expose a "PCI
  hole".  Aliases may point to any type of region, including other aliases,
  but an alias may not point back to itself, directly or indirectly.
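
As a rough illustration of the four region types and of the rule that an alias may not point back to itself, the following minimal C sketch uses a hypothetical Region struct, RegionKind enum, alias_init() and alias_chain_has_cycle(); none of these are QEMU's real definitions, and QEMU's struct MemoryRegion looks different. The cycle check walks the alias links with Floyd's tortoise-and-hare algorithm, a technique chosen for this sketch rather than taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of the four region types.
 * This is NOT QEMU's struct MemoryRegion. */
typedef enum {
    REGION_RAM,        /* host memory made available to the guest */
    REGION_MMIO,       /* accesses are forwarded to host callbacks */
    REGION_CONTAINER,  /* groups other regions at various offsets */
    REGION_ALIAS       /* a window onto part of another region */
} RegionKind;

typedef struct Region Region;

struct Region {
    const char *name;
    RegionKind kind;
    uint64_t size;
    const Region *alias;     /* REGION_ALIAS only: the target region */
    uint64_t alias_offset;   /* REGION_ALIAS only: start offset in the target */
};

/* Returns 1 if following alias links from r runs into a cycle
 * (Floyd's tortoise-and-hare), 0 if the chain ends at a non-alias. */
int alias_chain_has_cycle(const Region *r)
{
    const Region *slow = r, *fast = r;

    while (fast != NULL && fast->kind == REGION_ALIAS &&
           fast->alias != NULL && fast->alias->kind == REGION_ALIAS) {
        slow = slow->alias;           /* one link per step */
        fast = fast->alias->alias;    /* two links per step */
        if (slow == fast) {
            return 1;
        }
    }
    return 0;
}

/* Hypothetical alias constructor that enforces the "no pointing back" rule. */
void alias_init(Region *r, const char *name, const Region *target,
                uint64_t offset, uint64_t size)
{
    r->name = name;
    r->kind = REGION_ALIAS;
    r->size = size;
    r->alias = target;
    r->alias_offset = offset;
    assert(!alias_chain_has_cycle(r));
}
```

For example, an alias of an alias of a RAM region passes the check, while two aliases that point at each other are reported as a cycle.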

Region names
------------

Regions are assigned names by the constructor.  For most regions these are
only used for debugging purposes, but RAM regions also use the name to identify
live migration sections.  This means that RAM region names need to have ABI
stability.

Region lifecycle
----------------

A region is created by one of the constructor functions (memory_region_init*())
and destroyed by the destructor (memory_region_destroy()).  In between, a
region can be added to a container, and thereby to an address space, using
memory_region_add_subregion(), and removed using memory_region_del_subregion().
Region attributes may be changed at any point; they take effect once the region
becomes exposed to the guest.

Overlapping regions and priority
--------------------------------

Usually, regions may not overlap each other; a memory address decodes into
exactly one target.  In some cases it is useful to allow regions to overlap,
and sometimes to control which of several overlapping regions is visible to the
guest.  This is done with memory_region_add_subregion_overlap(), which
allows the region to overlap any other region in the same container, and
specifies a priority that allows the core to decide which of two regions at
the same address is visible (highest wins).
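
The priority rule can be sketched as a container that keeps its subregions sorted by descending priority, so that the first match is the visible one. Container, Sub, add_subregion() and visible_at() are hypothetical names used only here; the real memory_region_add_subregion() and memory_region_add_subregion_overlap() take different arguments. The text does not say how equal priorities are resolved, so this sketch simply keeps the region that was added first, and it rejects an overlap unless at least one of the two regions was added with may_overlap set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SUBREGIONS 16

/* Hypothetical subregion record: NOT QEMU's API. */
typedef struct {
    const char *name;
    uint64_t addr;      /* offset within the container */
    uint64_t size;
    int priority;       /* higher wins */
    int may_overlap;    /* set for regions added "with overlap" */
} Sub;

typedef struct {
    Sub subs[MAX_SUBREGIONS];   /* kept sorted by descending priority */
    size_t n;
} Container;

static int ranges_overlap(const Sub *a, const Sub *b)
{
    return a->addr < b->addr + b->size && b->addr < a->addr + a->size;
}

/* Adds s to c; returns 0 on success, -1 if s would overlap a region
 * while neither of the two was added with may_overlap set. */
int add_subregion(Container *c, Sub s)
{
    assert(c->n < MAX_SUBREGIONS);
    for (size_t i = 0; i < c->n; i++) {
        if (!s.may_overlap && !c->subs[i].may_overlap &&
            ranges_overlap(&s, &c->subs[i])) {
            return -1;
        }
    }
    /* Shift lower-priority entries one slot right; equal priorities keep
     * insertion order (an assumption of this sketch). */
    size_t pos = c->n;
    while (pos > 0 && c->subs[pos - 1].priority < s.priority) {
        c->subs[pos] = c->subs[pos - 1];
        pos--;
    }
    c->subs[pos] = s;
    c->n++;
    return 0;
}

/* Returns the subregion visible at addr (highest-priority match), or NULL. */
const Sub *visible_at(const Container *c, uint64_t addr)
{
    for (size_t i = 0; i < c->n; i++) {
        const Sub *s = &c->subs[i];
        if (addr >= s->addr && addr - s->addr < s->size) {
            return s;
        }
    }
    return NULL;
}
```

For instance, adding a "lomem" entry at priority 0 and a "vga-window" entry at 0xa0000 (size 0x20000) at priority 1 with may_overlap set makes visible_at() return vga-window for 0xa0000-0xbffff and lomem for the rest of 0-0xdfffffff.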

Visibility
----------

The memory core uses the following rules to select a memory region when the
guest accesses an address:

- all direct subregions of the root region are matched against the address, in
  descending priority order
  - if the address lies outside the region offset/size, the subregion is
    discarded
  - if the subregion is a leaf (RAM or MMIO), the search terminates
  - if the subregion is a container, the same algorithm is used within the
    subregion (after the address is adjusted by the subregion offset)
  - if the subregion is an alias, the search continues at the alias target
    (after the address is adjusted by the subregion offset and alias offset)
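
The rules above translate fairly directly into a recursive lookup. The C sketch below assumes a hypothetical, simplified region graph (Region, Subregion, lookup()), not QEMU's internal structures, and assumes each container's subregion array is already sorted by descending priority. The text does not say what happens when a container or alias matches the address range but nothing inside it does; this sketch assumes the search then moves on to the next subregion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region graph: NOT QEMU's data structures. */
typedef enum { REGION_RAM, REGION_MMIO, REGION_CONTAINER, REGION_ALIAS } RegionKind;

typedef struct Region Region;

typedef struct {
    uint64_t offset;        /* where the subregion sits in its container */
    int priority;
    const Region *region;
} Subregion;

struct Region {
    RegionKind kind;
    uint64_t size;
    const Region *alias;        /* REGION_ALIAS: target */
    uint64_t alias_offset;      /* REGION_ALIAS: offset into the target */
    const Subregion *subs;      /* REGION_CONTAINER: descending priority */
    size_t nr_subs;
};

/* Resolves addr (relative to r) to a leaf region and sets *leaf_off to the
 * offset within that leaf.  Returns NULL if nothing matches. */
const Region *lookup(const Region *r, uint64_t addr, uint64_t *leaf_off)
{
    assert(r != NULL && leaf_off != NULL);
    if (addr >= r->size) {
        return NULL;                        /* outside this region: discarded */
    }
    switch (r->kind) {
    case REGION_RAM:
    case REGION_MMIO:
        *leaf_off = addr;                   /* leaf: the search terminates */
        return r;
    case REGION_ALIAS:
        /* continue at the alias target, adjusted by the alias offset */
        return lookup(r->alias, addr + r->alias_offset, leaf_off);
    case REGION_CONTAINER:
        for (size_t i = 0; i < r->nr_subs; i++) {
            const Subregion *s = &r->subs[i];
            assert(i == 0 || r->subs[i - 1].priority >= s->priority);
            if (addr < s->offset) {
                continue;                   /* address lies before this subregion */
            }
            /* adjust by the subregion offset; the size check happens on entry */
            const Region *hit = lookup(s->region, addr - s->offset, leaf_off);
            if (hit != NULL) {
                return hit;
            }
            /* Assumption: if nothing matched inside, try the next subregion. */
        }
        return NULL;
    }
    return NULL;
}
```

Because the size check happens on entry to lookup(), a subregion whose range does not contain the address returns NULL and is skipped, which corresponds to the "discarded" rule.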

Example memory map
------------------

system_memory: container@0-2^48-1
 |
 +---- lomem: alias@0-0xdfffffff ---> #ram (0-0xdfffffff)
 |
 +---- himem: alias@0x100000000-0x11fffffff ---> #ram (0xe0000000-0xffffffff)
 |
 +---- vga-window: alias@0xa0000-0xbffff ---> #pci (0xa0000-0xbffff)
 |      (prio 1)
 |
 +---- pci-hole: alias@0xe0000000-0xffffffff ---> #pci (0xe0000000-0xffffffff)

pci (0-2^32-1)
 |
 +--- vga-area: container@0xa0000-0xbffff
 |      |
 |      +--- alias@0x00000-0x7fff  ---> #vram (0x010000-0x017fff)
 |      |
 |      +--- alias@0x08000-0xffff  ---> #vram (0x020000-0x027fff)
 |
 +---- vram: ram@0xe1000000-0xe1ffffff
 |
 +---- vga-mmio: mmio@0xe2000000-0xe200ffff

ram: ram@0x00000000-0xffffffff

This is a (simplified) PC memory map.  The 4GB RAM block is mapped into the
system address space via two aliases: "lomem" is a 1:1 mapping of the first
3.5GB; "himem" maps the last 0.5GB at address 4GB.  This leaves 0.5GB for the
so-called PCI hole, which allows a 32-bit PCI bus to exist in a system with
4GB of memory.

The memory controller diverts addresses in the range 640K-768K to the PCI
address space.  This is modelled using the "vga-window" alias, mapped at a
higher priority so it obscures the RAM at the same addresses.  The VGA window
can be removed by programming the memory controller; this is modelled by
removing the alias and exposing the RAM underneath.
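
To make the alias arithmetic concrete, here is a C sketch that flattens the system_memory level of the example map into a hypothetical table (AliasEntry, system_map, translate()). Addresses and offsets are copied from the map; the pci side is not modelled, and the table is listed in descending priority so that vga-window is checked before lomem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattening of the system_memory level of the example map. */
typedef enum { TARGET_NONE, TARGET_RAM, TARGET_PCI } Target;

typedef struct {
    const char *name;
    uint64_t start, size;      /* range in system_memory */
    Target target;
    uint64_t target_offset;    /* where the range starts in the target */
} AliasEntry;

/* Listed in descending priority: vga-window (prio 1) is checked first. */
static const AliasEntry system_map[] = {
    { "vga-window", 0x000a0000,  0x00020000, TARGET_PCI, 0x000a0000 },
    { "lomem",      0x00000000,  0xe0000000, TARGET_RAM, 0x00000000 },
    { "himem",      0x100000000, 0x20000000, TARGET_RAM, 0xe0000000 },
    { "pci-hole",   0xe0000000,  0x20000000, TARGET_PCI, 0xe0000000 },
};

/* Translates a system address; sets *off to the offset inside the target. */
Target translate(uint64_t addr, uint64_t *off)
{
    assert(off != NULL);
    for (size_t i = 0; i < sizeof system_map / sizeof system_map[0]; i++) {
        const AliasEntry *a = &system_map[i];
        if (addr >= a->start && addr - a->start < a->size) {
            *off = a->target_offset + (addr - a->start);
            return a->target;
        }
    }
    return TARGET_NONE;
}
```

With this table, 0x100000000 (4GB) resolves to RAM offset 0xe0000000, and 0xa0000 resolves to the pci space rather than to RAM because the higher-priority vga-window entry matches first.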

The pci address space is not a direct child of the system address space, since
we only want parts of it to be visible (we accomplish this using aliases).
Of its subregions, vga-area models the legacy VGA window and is occupied by
two 32K memory banks pointing at two sections of the framebuffer.  In
addition, the vram is mapped as a BAR at address 0xe1000000, and an
additional BAR containing MMIO registers is mapped after it.
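
The vga-area banks are aliases as well, so an access passes through two offset adjustments before reaching vram. The following sketch (hypothetical names VgaBank and vga_area_to_vram(), offsets copied from the map) performs that translation for pci-space addresses inside vga-area; since the map places no bank at 0xb0000-0xbffff, it treats that part of vga-area as unmapped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two 32K banks inside vga-area, copied from the example map:
 * offset within vga-area, size, and the vram offset each one points at. */
typedef struct {
    uint64_t start, size, vram_offset;
} VgaBank;

static const VgaBank vga_banks[] = {
    { 0x00000, 0x8000, 0x010000 },
    { 0x08000, 0x8000, 0x020000 },
};

#define VGA_AREA_BASE 0xa0000ULL   /* vga-area position in the pci space */
#define VGA_AREA_SIZE 0x20000ULL   /* 0xa0000-0xbffff */

/* Translates a pci address inside vga-area to a vram offset.
 * Returns -1 for the part of vga-area that no bank covers. */
int64_t vga_area_to_vram(uint64_t pci_addr)
{
    assert(pci_addr >= VGA_AREA_BASE && pci_addr - VGA_AREA_BASE < VGA_AREA_SIZE);
    uint64_t off = pci_addr - VGA_AREA_BASE;    /* adjust by vga-area offset */
    for (size_t i = 0; i < sizeof vga_banks / sizeof vga_banks[0]; i++) {
        const VgaBank *b = &vga_banks[i];
        if (off >= b->start && off - b->start < b->size) {
            /* adjust by the bank offset, then add the alias offset into vram */
            return (int64_t)(b->vram_offset + (off - b->start));
        }
    }
    return -1;
}
```

For example, 0xa8000 is offset 0x8000 within vga-area, which is the start of the second bank, so it lands at vram offset 0x20000.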

Note that if the guest maps a BAR outside the PCI hole, it will not be
visible, because the pci-hole alias clips the visible part of the pci address
space to a 0.5GB range.
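
The clipping amounts to an interval intersection. This sketch, using a hypothetical bar_visible_bytes() helper, counts how many bytes of a BAR remain visible through pci-hole (0xe0000000-0xffffffff in the example map). It ignores the vga-window alias, which separately exposes the 640K-768K part of the pci address space.

```c
#include <assert.h>
#include <stdint.h>

/* pci-hole alias from the example map: pci addresses 0xe0000000-0xffffffff. */
#define PCI_HOLE_START 0xe0000000ULL
#define PCI_HOLE_END   0x100000000ULL   /* exclusive */

/* Number of bytes of a BAR at [bar_start, bar_start + bar_size) that remain
 * visible through the pci-hole alias in this simplified map. */
uint64_t bar_visible_bytes(uint64_t bar_start, uint64_t bar_size)
{
    assert(bar_size <= UINT64_MAX - bar_start);   /* no wrap-around */
    uint64_t end = bar_start + bar_size;
    uint64_t lo = bar_start > PCI_HOLE_START ? bar_start : PCI_HOLE_START;
    uint64_t hi = end < PCI_HOLE_END ? end : PCI_HOLE_END;
    return hi > lo ? hi - lo : 0;
}
```

A BAR placed at 0xd0000000 therefore contributes no visible bytes, while one straddling 0xe0000000 is visible only from 0xe0000000 upwards.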

Attributes
----------

Various region attributes (read-only, dirty logging, coalesced mmio, ioeventfd)
can be changed during the region lifecycle.  They take effect once the region
is made visible (which can be immediately, later, or never).

MMIO Operations
---------------

MMIO regions are provided with ->read() and ->write() callbacks; in addition
various constraints can be supplied to control how these callbacks are called:

 - .valid.min_access_size, .valid.max_access_size define the access sizes
   (in bytes) which the device accepts; accesses outside this range will
   have device and bus specific behaviour (ignored, or machine check)
 - .valid.unaligned specifies that the device being modelled supports
   unaligned accesses; if false, unaligned accesses invoke device and bus
   specific behaviour.
 - .impl.min_access_size, .impl.max_access_size define the access sizes
   (in bytes) supported by the *implementation*; other access sizes will be
   emulated using the ones available.  For example a 4-byte write will be
   emulated using four 1-byte writes, if .impl.max_access_size = 1.
 - .impl.unaligned specifies that the *implementation* supports unaligned
   accesses; if false, unaligned accesses will be emulated by two aligned
   accesses.
 - .old_portio and .old_mmio can be used to ease porting from code using
   cpu_register_io_memory() and register_ioport().  They should not be used
   in new code.
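
To illustrate how the access-size constraints interact, here is a minimal C sketch. Ops, dispatch_write(), WriteLog and log_write() are hypothetical stand-ins, not QEMU's MemoryRegionOps or its dispatch code. It models only .valid.min/max_access_size and .impl.max_access_size, ignores alignment, and assumes little-endian splitting (the lowest address receives the least significant byte); real devices may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the constraints listed above; NOT QEMU's
 * MemoryRegionOps.  Sizes are in bytes. */
typedef struct {
    unsigned valid_min, valid_max;   /* like .valid.min/max_access_size */
    unsigned impl_max;               /* like .impl.max_access_size; >= 1 here */
    void (*write)(void *opaque, uint64_t addr, uint64_t val, unsigned size);
} Ops;

/* Checks an access against the "valid" limits, then splits it into
 * impl_max-sized writes.  Assumes little-endian order: the lowest address
 * receives the least significant bytes.  Returns -1 if rejected. */
int dispatch_write(const Ops *ops, void *opaque, uint64_t addr,
                   uint64_t val, unsigned size)
{
    if (size < ops->valid_min || size > ops->valid_max) {
        return -1;   /* the real behaviour is device and bus specific */
    }
    unsigned step = size < ops->impl_max ? size : ops->impl_max;
    assert(step >= 1 && size <= 8 && size % step == 0);
    uint64_t mask = step >= 8 ? UINT64_MAX : (UINT64_C(1) << (8 * step)) - 1;
    for (unsigned done = 0; done < size; done += step) {
        ops->write(opaque, addr + done, (val >> (8 * done)) & mask, step);
    }
    return 0;
}

/* Example device callback that just records the writes it receives. */
typedef struct {
    unsigned count;
    uint64_t addr[8], val[8];
    unsigned size[8];
} WriteLog;

void log_write(void *opaque, uint64_t addr, uint64_t val, unsigned size)
{
    WriteLog *wlog = opaque;
    assert(wlog->count < 8);
    wlog->addr[wlog->count] = addr;
    wlog->val[wlog->count] = val;
    wlog->size[wlog->count] = size;
    wlog->count++;
}
```

With .impl.max_access_size = 1, a 4-byte write of 0x11223344 at 0x10 reaches the device as four 1-byte writes to 0x10-0x13 carrying 0x44, 0x33, 0x22 and 0x11.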