48960 Commits

Author SHA1 Message Date
Eduardo Habkost
41f3d4d69a target-i386: x86_cpu_load_features() function
When probing for CPU model information, we need to reuse the code
that initializes CPUID fields, but not the remaining side-effects
of x86_cpu_realizefn(). Move that code to a separate function
that can be reused later.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
e9e60febc4 target-i386: Unset cannot_destroy_with_object_finalize_yet
TYPE_X86_CPU now call cpu_exec_init() on realize, so we don't
need to set cannot_destroy_with_object_finalize_yet anymore.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
2a138ec3af target-i386/kvm: cache the return value of kvm_enable_x2apic()
Assume that KVM would have returned the same on subsequent runs.
Abstract the memoizaiton pattern into macros and call it memorize as
adding the r makes it less obscure.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
fb506e701e intel_iommu: reject broken EIM
Cluster x2APIC cannot work without KVM's x2apic API when the maximal
APIC ID is greater than 8 and only KVM's LAPIC can support x2APIC, so we
forbid other APICs and also the old KVM case with less than 9, to
simplify the code.

There is no point in enabling EIM in forbidden APICs, so we keep it
enabled only for the KVM APIC;  unconditionally, because making the
option depend on KVM version would be a maintanance burden.

Old QEMUs would enable eim whenever intremap was on, which would trick
guests into thinking that they can enable cluster x2APIC even if any
interrupt destination would get clamped to 8 bits.
Depending on your configuration, QEMU could notice that the destination
LAPIC is not present and report it with a very non-obvious:

  KVM: injection failed, MSI lost (Operation not permitted)

Or the guest could say something about unexpected interrupts, because
clamping leads to aliasing so interrupts were being delivered to
incorrect VCPUs.

KVM_X2APIC_API is the feature that allows us to enable EIM for KVM.

QEMU 2.7 allowed EIM whenever interrupt remapping was enabled.  In order
to keep backward compatibility, we again allow guests to misbehave in
non-obvious ways, and make it the default for old machine types.

A user can enable the buggy mode it with "x-buggy-eim=on".

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
e6b6af0560 intel_iommu: add OnOffAuto intr_eim as "eim" property
The default (auto) emulates the current behavior.
A user can now control EIM like
  -device intel-iommu,intremap=on,eim=off

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
6333e93c77 intel_iommu: redo configuraton check in realize
* there no point in configuring the device if realization is going to
  fail, so move the check to the beginning,
* create a separate function for the check,
* use error_setg() instead error_report().

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
329460191d intel_iommu: pass whole remapped addresses to apic
The MMIO interface to APIC only allowed 8 bit addresses, which is not
enough for 32 bit addresses from EIM remapping.
Intel stored upper 24 bits in the high MSI address, so use the same
technique. The technique is also used in KVM MSI interface.
Other APICs are unlikely to handle those upper bits.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
267ee35715 apic: add send_msi() to APICCommonClass
The MMIO based interface to APIC doesn't work well with MSIs that have
upper address bits set (remapped x2APIC MSIs).  A specialized interface
is a quick and dirty way to avoid the shortcoming.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Radim Krčmář
2f114315dc apic: add global apic_get_class()
Every configuration has only up to one APIC class and we'll be extending
the class with a function that can be called without an instanced
object, so a direct access to the class is convenient.

This patch will break compilation if some code uses apic_get_class()
with CONFIG_USER_ONLY.

Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
8ca30e8673 target-i386: Move warning code outside x86_cpu_filter_features()
x86_cpu_filter_features() will be reused by code that shouldn't
print any warning. Move the warning code to a new
x86_cpu_report_filtered_features() function, and call it from
x86_cpu_realizefn().

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
9504e7100b qmp: Add runnability information to query-cpu-definitions
Add a new optional field to query-cpu-definitions schema:
"unavailable-features". It will contain a list of QOM properties
that prevent the CPU model from running in the current host.

Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: Michael Mueller <mimu@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Cornelia Huck <cornelia.huck@de.ibm.com>
Cc: Jiri Denemark <jdenemar@redhat.com>
Cc: libvir-list@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
e3c9022b4e target-i386: xsave: Add FP and SSE bits to x86_ext_save_areas
Instead of treating the FP and SSE bits as special cases, add
them to the x86_ext_save_areas array. This will simplify the code
that calculates the supported xsave components and the size of
the xsave area.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
16d2fcaa50 target-i386: Register properties for feature aliases manually
Instead of keeping the aliases inside the feature name arrays and
require parsing the strings, just register alias properties
manually. This simplifies the code for property registration and
lookup.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
fc7dfd205f target-i386: Remove underscores from feat_names arrays
Instead of translating the feature name entries when adding
property names, store the actual property names in the feature
name array.

For reference, here is the full list of functions that use
FeatureWordInfo::feat_names:

* x86_cpu_get_migratable_flags(): not affected, as it just
  check for non-NULL values.
* report_unavailable_features(): informative only. It will
  start printing feature names with hyphens.
* x86_cpu_list(): informative only. It will start printing
  feature names with hyphens
* x86_cpu_register_feature_bit_props(): not affected, as it
  was already calling feat2prop(). Now we can remove the
  feat2prop() calls safely.

So, the only user-visible effect of this patch are the new names
being used in help and error messages for users.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
2fae0d96e6 target-i386: Make plus_features/minus_features QOM-based
Instead of using custom feature name lookup code for
plus_features/minus_features, save the property names used in
"[+-]feature" and use object_property_set_bool() to set them.

We don't need a feat2prop() call because we now have alias
properties for the old names containing underscores.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
54b8dc7c19 target-i386: Register aliases for feature names with underscores
Registering the actual names containing underscores as aliases
will allow management software to be aware that the old
compatibility names are suported, and will make feat2prop() calls
unnecessary when using feature names.

Also, this will help us avoid making the code support underscores
on feature names that never had them in the first place. e.g.
"+tsc_deadline" was never supported and doesn't need to be
translated to "+tsc-deadline".

In other word: this will require less magic translation of
strings, and simple 1:1 match between the config options and
actual QOM properties.

Note that the underscores are still present in the
FeatureWordInfo::feat_names arrays, because
add_flagname_to_bitmaps() needs them to be kept. The next patches
will remove add_flagname_to_bitmaps() and will allow us to
finally remove the aliases from feat_names.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
04d99c3c61 target-i386: Disable VME by default with TCG
VME is already disabled automatically when using TCG. So, instead
of pretending it is there when reporting CPU model data on
query-cpu-* QMP commands (making every CPU model to be reported
as not runnable), we can disable it by default on all CPU models
when using TCG.

Do that by adding a tcg_default_props array that will work like
kvm_default_props.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
ee465a3ef7 target-i386: List CPU models using subclass list
Instead of using the builtin_x86_defs array, use the QOM subclass
list to list CPU models on "-cpu ?" and "query-cpu-definitions".

Signed-off-by: Andreas Färber <afaerber@suse.de>
[ehabkost: copied code from a patch by Andreas:
 "target-i386: QOM'ify CPU", from March 2012]
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Eduardo Habkost
a2f9976ea8 tests: Add test case for x86 feature parsing compatibility
Add a new test case to ensure the existing behavior of the
feature parsing code will be kept.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2016-10-17 15:44:49 -02:00
Peter Maydell
0975b8b823 This pull request contains:
- a patch to add a vdc->reset() handler to virtio-9p
 - a bunch of patches to fix various memory leaks (thanks to Li Qiang)
 - some code cleanups for 9pfs
 -----BEGIN PGP SIGNATURE-----
 
 iEYEABECAAYFAlgE59oACgkQAvw66wEB28IrzgCePLq8RVQHvxJKcx9CO1XQaXCE
 Dp4AoJm+CDVuaBd+cojoUmwGaLmZG8lb
 =8nIw
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/gkurz/tags/for-upstream' into staging

This pull request contains:
- a patch to add a vdc->reset() handler to virtio-9p
- a bunch of patches to fix various memory leaks (thanks to Li Qiang)
- some code cleanups for 9pfs

# gpg: Signature made Mon 17 Oct 2016 16:01:46 BST
# gpg:                using DSA key 0x02FC3AEB0101DBC2
# gpg: Good signature from "Greg Kurz <groug@kaod.org>"
# gpg:                 aka "Greg Kurz <groug@free.fr>"
# gpg:                 aka "Greg Kurz <gkurz@fr.ibm.com>"
# gpg:                 aka "Greg Kurz <gkurz@linux.vnet.ibm.com>"
# gpg:                 aka "Gregory Kurz (Groug) <groug@free.fr>"
# gpg:                 aka "Gregory Kurz (Cimai Technology) <gkurz@cimai.com>"
# gpg:                 aka "Gregory Kurz (Meiosys Technology) <gkurz@meiosys.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 2BD4 3B44 535E C0A7 9894  DBA2 02FC 3AEB 0101 DBC2

* remotes/gkurz/tags/for-upstream:
  9pfs: fix memory leak in v9fs_write
  9pfs: fix memory leak in v9fs_link
  9pfs: fix memory leak in v9fs_xattrcreate
  9pfs: fix information leak in xattr read
  virtio-9p: add reset handler
  9pfs: only free completed request if not flushed
  9pfs: drop useless check in pdu_free()
  9pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch]
  9pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch]
  9pfs: fsdev: drop useless extern annotation for functions
  9pfs: fix potential host memory leak in v9fs_read
  9pfs: allocate space for guest originated empty strings

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17 16:17:51 +01:00
Li Qiang
fdfcc9aeea 9pfs: fix memory leak in v9fs_write
If an error occurs when marshalling the transfer length to the guest, the
v9fs_write() function doesn't free an IO vector, thus leading to a memory
leak. This patch fixes the issue.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Li Qiang
4c1586787f 9pfs: fix memory leak in v9fs_link
The v9fs_link() function keeps a reference on the source fid object. This
causes a memory leak since the reference never goes down to 0. This patch
fixes the issue.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, rephrased the changelog]
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Li Qiang
ff55e94d23 9pfs: fix memory leak in v9fs_xattrcreate
The 'fs.xattr.value' field in V9fsFidState object doesn't consider the
situation that this field has been allocated previously. Every time, it
will be allocated directly. This leads to a host memory leak issue if
the client sends another Txattrcreate message with the same fid number
before the fid from the previous time got clunked.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
[groug, updated the changelog to indicate how the leak can occur]
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Li Qiang
eb68760285 9pfs: fix information leak in xattr read
9pfs uses g_malloc() to allocate the xattr memory space, if the guest
reads this memory before writing to it, this will leak host heap memory
to the guest. This patch avoid this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Greg Kurz
0e44a0fd3f virtio-9p: add reset handler
Virtio devices should implement the VirtIODevice->reset() function to
perform necessary cleanup actions and to bring the device to a quiescent
state.

In the case of the virtio-9p device, this means:
- emptying the list of active PDUs (i.e. draining all in-flight I/O)
- freeing all fids (i.e. close open file descriptors and free memory)

That's what this patch does.

The reset handler first waits for all active PDUs to complete. Since
completion happens in the QEMU global aio context, we just have to
loop around aio_poll() until the active list is empty.

The freeing part involves some actions to be performed on the backend,
like closing file descriptors or flushing extended attributes to the
underlying filesystem. The virtfs_reset() function already does the
job: it calls free_fid() for all open fids not involved in an ongoing
I/O operation. We are sure this is the case since we have drained
the PDU active list.

The current code implements all backend accesses with coroutines, but we
want to stay synchronous on the reset path. We can either change the
current code to be able to run when not in coroutine context, or create
a coroutine context and wait for virtfs_reset() to complete. This patch
goes for the latter because it results in simpler code.

Note that we also need to create a dummy PDU because it is also an API
to pass the FsContext pointer to all backend callbacks.

Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-10-17 14:13:58 +02:00
Greg Kurz
f74e27bf0f 9pfs: only free completed request if not flushed
If a PDU has a flush request pending, the current code calls pdu_free()
twice:

1) pdu_complete()->pdu_free() with pdu->cancelled set, which does nothing

2) v9fs_flush()->pdu_free() with pdu->cancelled cleared, which moves the
   PDU back to the free list.

This works but it complexifies the logic of pdu_free().

With this patch, pdu_complete() only calls pdu_free() if no flush request
is pending, i.e. qemu_co_queue_next() returns false.

Since pdu_free() is now supposed to be called with pdu->cancelled cleared,
the check in pdu_free() is dropped and replaced by an assertion.

Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Greg Kurz
6868a420c5 9pfs: drop useless check in pdu_free()
Out of the three users of pdu_free(), none ever passes a NULL pointer to
this function.

Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Greg Kurz
8440e22ec1 9pfs: use coroutine_fn annotation in hw/9pfs/9p.[ch]
All these functions either call the v9fs_co_* functions which have the
coroutine_fn annotation, or pdu_complete() which calls qemu_co_queue_next().

Let's mark them to make it obvious they execute in coroutine context.

Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Greg Kurz
5bdade6621 9pfs: use coroutine_fn annotation in hw/9pfs/co*.[ch]
All these functions use the v9fs_co_run_in_worker() macro, and thus always
call qemu_coroutine_self() and qemu_coroutine_yield().

Let's mark them to make it obvious they execute in coroutine context.

Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Greg Kurz
bc70a5925f 9pfs: fsdev: drop useless extern annotation for functions
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Li Qiang
e95c9a493a 9pfs: fix potential host memory leak in v9fs_read
In 9pfs read dispatch function, it doesn't free two QEMUIOVector
object thus causing potential memory leak. This patch avoid this.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Li Qiang
ba42ebb863 9pfs: allocate space for guest originated empty strings
If a guest sends an empty string paramater to any 9P operation, the current
code unmarshals it into a V9fsString equal to { .size = 0, .data = NULL }.

This is unfortunate because it can cause NULL pointer dereference to happen
at various locations in the 9pfs code. And we don't want to check str->data
everywhere we pass it to strcmp() or any other function which expects a
dereferenceable pointer.

This patch enforces the allocation of genuine C empty strings instead, so
callers don't have to bother.

Out of all v9fs_iov_vunmarshal() users, only v9fs_xattrwalk() checks if
the returned string is empty. It now uses v9fs_string_size() since
name.data cannot be NULL anymore.

Signed-off-by: Li Qiang <liqiang6-s@360.cn>
[groug, rewritten title and changelog,
 fix empty string check in v9fs_xattrwalk()]
Signed-off-by: Greg Kurz <groug@kaod.org>
2016-10-17 14:13:58 +02:00
Peter Maydell
7bf59dfec4 ppc patch queue 2016-10-17
Highlights:
     * Significant rework of how PCI IO windows are placed for the
       pseries machine type
     * A number of extra tests added for ppc
     * Other tests clean up / fixed
     * Some cleanups to the XICS interrupt controller in preparation
       for the 'powernv' machine type
 
 A number of the test changes aren't strictly in ppc related code, but
 are included via my tree because they're primarily focused on
 improving test coverage for ppc.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJYBDqhAAoJEGw4ysog2bOSVPAP/RlWYnOCTDiKuSCXz7+joHl9
 lY3K+x2r8DbFFmqxk82h+uObBG/dVQ7kcF+o3SD49dMUqoi/iD8rS5UFrDtArFKP
 vPh4h7KpVbQHyMWoHTjo1Zw94Sr2xEqRelQOrZjRA+a794kg6MKs3EIekHo9QdZl
 33qU0aE02LNOmguDwsTqTavrs/19qjcUM+fAyfOMEMiaQHNxdwrQjvldlIeELPka
 Dz/iOS5Lxibq2txmZWMxiMAuBe6lmJQWlcUUy+xXylQ5OE7CQctCfm2hsWWoMpGo
 PnhY+UC+9Ctqvb4TyTzqjllEmEfwK219091+hW7epWIUXGuaWde1RCvwRUr7pFGD
 DN4/75M82aNrpG0ydjzLpqbOwj/h+YvQARdKTTS2/4iJCrDPd5O2rR4Yjkrt1lcq
 jrSSnqsCeYET+4JY4G4h3ZZsPRcLnnpcNrUcF9AwkRCe1ybr4npK8FGSd4AGWAOR
 J7/mZqs7Gne4DjbjIzwfntP8ak5AASPqJKEmwjAO7M8zD0/xm6Ovqbmo8Xte9Vxx
 ge4nBcAhoJJ8y1hiZNLOyy1d87WUiD84MiN/BuSK7UbeBVsfHvYWAcEsVGXFzaP0
 hXwLHxddmflU7gy6hTbQ/f9SQiaobphCaP3uM4fLIOzAn64EIELCDvRwlr6NakRW
 CbJWqMsNz/WblA/ZSlA3
 =tnCe
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.8-20161017' into staging

ppc patch queue 2016-10-17

Highlights:
    * Significant rework of how PCI IO windows are placed for the
      pseries machine type
    * A number of extra tests added for ppc
    * Other tests clean up / fixed
    * Some cleanups to the XICS interrupt controller in preparation
      for the 'powernv' machine type

A number of the test changes aren't strictly in ppc related code, but
are included via my tree because they're primarily focused on
improving test coverage for ppc.

# gpg: Signature made Mon 17 Oct 2016 03:42:41 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg:                 aka "David Gibson (kernel.org) <dwg@kernel.org>"
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.8-20161017:
  spapr: Improved placement of PCI host bridges in guest memory map
  spapr_pci: Add a 64-bit MMIO window
  spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
  spapr_pci: Delegate placement of PCI host bridges to machine type
  libqos: Limit spapr-pci to 32-bit MMIO for now
  libqos: Correct error in PCI hole sizing for spapr
  libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()
  ppc/xics: Split ICS into ics-base and ics class
  ppc/xics: Make the ICSState a list
  spapr: fix inheritance chain for default machine options
  target-ppc: implement vexts[bh]2w and vexts[bhw]2d
  tests/boot-sector: Increase time-out to 90 seconds
  tests/boot-sector: Use mkstemp() to create a unique file name
  tests/boot-sector: Use minimum length for the Forth boot script
  qtest: ask endianness of the target in qtest_init()
  tests: minor cleanups in usb-hcd-uhci-test

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17 12:59:54 +01:00
Peter Maydell
ad728364e3 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
 
 iQEcBAABCAAGBQJYBDKcAAoJEMo1YkxqkXHGUUUH/A3xe/y54QGsQQxLpH0Ek1Vk
 NX1Wzu9Q+J8xqyT8qf1Yp/BtL9fnDQKT2deZTx09zaIRjUKqO9JGdXmOVg/zFFAs
 0JynVVFNwO9kDAEHtsbBvc3pvfpdONKj2muCHdG3KiVxP5YH6MiuW5waQ9ujyUKb
 nrW8+oWM/seoCgouBAC6YtIr7lq25NkXxuIL95SBi/yskAjKiPgpQjn+II8tUDTU
 ZF35KNoeFLAw/CgdMZpu2MVZZwod5HDxpBbY1EHIWsboGX/iO3R+CuTyVdSJ6kna
 B9Oi1U4A58OMVCZATpqRIdW/xxjaWs3b+kM1uMIJIzhR9Sk2kWFPweaznwGHc2Y=
 =FdKf
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/famz/tags/for-upstream' into staging

# gpg: Signature made Mon 17 Oct 2016 03:08:28 BST
# gpg:                using RSA key 0xCA35624C6A9171C6
# gpg: Good signature from "Fam Zheng <famz@redhat.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 5003 7CB7 9706 0F76 F021  AD56 CA35 624C 6A91 71C6

* remotes/famz/tags/for-upstream:
  tests/docker/Makefile.include: add a generic docker-run target
  tests/docker: make test-mingw honour TARGET_LIST
  tests/docker: test-build script
  tests/docker: add travis dockerfile

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17 11:56:18 +01:00
Peter Maydell
4378caf59e migration/next for 20161014
-----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJYAPidAAoJEPSH7xhYctcjgakP/0M4gXnf5HpEPrF2y5fWOrds
 9/65UEdkg466tyshu8gY2YWjNNLh65psNiD1eDwwUbOBydapAai8dJiaqlO4/vvz
 le0ctBMv34FUQ4bTYMpf9ZaKK3gplNQWZp70DXZ7d4LKNNPPbqiCK1vaMwZkT2bM
 3cal8IsWaDVcUnN3C85FlgGuS43IDNmbzMd7/WZKNZX/u9dcn/Fg1LMMmpIMl1IM
 o+i9jJv4MAyBqTQ4FMcvGJZ6uFVHz/GftYVCshCG6acU1Qoeq3KsInk41Ev0N8wO
 RRWOelbcSlug1HshQbKiJcEFajZT/2zIdMMA+xAmwTnCRufH8OMeQAAByRjArQLZ
 bfuZZpdRSQdhQwEt5Tad7t+oPDFXYWkiNiZo8CQFnXRSwMiKP0XCW7clphCUNeCV
 tyG5njvHuM1WqsXG0qhSVXUWvshSk4XBlNgiCn7Et1sM0mZNEnNaKGl6TCJwWaO3
 pbm021zxyY7jJl3FDptnRdqhUJHqLqnmj5A0kIVMzsTTz1LCliyhjSmtVBfpi3bu
 dTPYheUvsvOrWgcFwEBlcFMjXWHRB6UqEHeRm4eKnoB3ewkMPfW2sl5W+gCwAoGj
 aINiBIHPm0plfnWnv+D7AKzWolHsyM5aHdLUjrnFfDqahqZc+zGib63c7A1Kn4Fv
 2+OTc+a9ckGC7VahqS0E
 =Gd97
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20161014' into staging

migration/next for 20161014

# gpg: Signature made Fri 14 Oct 2016 16:24:13 BST
# gpg:                using RSA key 0xF487EF185872D723
# gpg: Good signature from "Juan Quintela <quintela@redhat.com>"
# gpg:                 aka "Juan Quintela <quintela@trasno.org>"
# Primary key fingerprint: 1899 FF8E DEBF 58CC EE03  4B82 F487 EF18 5872 D723

* remotes/juanquintela/tags/migration/20161014:
  docs/xbzrle: correction
  migrate: move max-bandwidth and downtime-limit to migrate_set_parameter
  migration: Fix seg with missing port
  migration/postcopy: Explicitly disallow huge pages
  RAMBlocks: Store page size
  Postcopy vs xbzrle: Don't send xbzrle pages once in postcopy [for 2.8]
  migrate: Fix bounds check for migration parameters in migration.c
  migrate: Use boxed qapi for migrate-set-parameters
  migrate: Share common MigrationParameters struct
  migrate: Fix cpu-throttle-increment regression in HMP
  migration/rdma: Don't flag an error when we've been told about one
  migration: Make failed migration load set file error
  migration/rdma: Pass qemu_file errors across link
  migration: Report values for comparisons
  migration: report an error giving the failed field

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-10-17 10:31:10 +01:00
Alex Bennée
e86c9a64f4 tests/docker/Makefile.include: add a generic docker-run target
This re-factors the docker makefile to include a docker-run target which
can be controlled entirely from environment variables specified on the
make command line. This allows us to run against any given docker image
we may have in our repository, for example:

    make docker-run TEST="test-quick" IMAGE="debian:arm64" \
         EXECUTABLE=./aarch64-linux-user/qemu-aarch64

The existing docker-foo@bar targets still work but the inline
verification has been dropped because we already don't hit that due to
other pattern rules in rules.mak.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Message-Id: <20161011161625.9070-5-alex.bennee@linaro.org>
Message-Id: <20161011161625.9070-6-alex.bennee@linaro.org>
[Squash in the verification removal patch. - Fam]
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17 10:05:48 +08:00
Alex Bennée
86a17cb3f4 tests/docker: make test-mingw honour TARGET_LIST
The other builders honour this variable, so should the mingw build.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20161011161625.9070-4-alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17 10:05:48 +08:00
Alex Bennée
bdecba6e97 tests/docker: test-build script
Much like test-quick but only builds. This is useful for some of the
build targets like ThreadSanitizer that don't yet pass "make check".

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>

Message-Id: <20161011161625.9070-3-alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17 10:05:48 +08:00
Alex Bennée
8b9b3177a2 tests/docker: add travis dockerfile
This target grabs the latest Travis containers from their repository at
quay.io and then installs QEMU's build dependencies. With this it is
possible to run on broadly the same setup as they have on travis-ci.org.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20161011161625.9070-2-alex.bennee@linaro.org>
Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-17 10:05:48 +08:00
David Gibson
357d1e3bc7 spapr: Improved placement of PCI host bridges in guest memory map
Currently, the MMIO space for accessing PCI on pseries guests begins at
1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
chunk of address space in which it places its outbound PIO and 32-bit and
64-bit MMIO windows.

This scheme as several problems:
  - It limits guest RAM to 1 TiB (though we have a limited fix for this
    now)
  - It limits the total MMIO window to 64 GiB.  This is not always enough
    for some of the large nVidia GPGPU cards
  - Putting all the windows into a single 64 GiB area means that naturally
    aligning things within there will waste more address space.
In addition there was a miscalculation in some of the defaults, which meant
that the MMIO windows for each PHB actually slightly overran the 64 GiB
region for that PHB.  We got away without nasty consequences because
the overrun fit within an unused area at the beginning of the next PHB's
region, but it's not pretty.

This patch implements a new scheme which addresses those problems, and is
also closer to what bare metal hardware and pHyp guests generally use.

Because some guest versions (including most current distro kernels) can't
access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB contains the
PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
one PHB each.

This reduces the number of allowed PHBs (without full manual configuration
of all the windows) from 256 to 31, but this should still be plenty in
practice.

We also change some of the default window sizes for manually configured
PHBs to saner values.

Finally we adjust some tests and libqos so that it correctly uses the new
default locations.  Ideally it would parse the device tree given to the
guest, but that's a more complex problem for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:04:15 +11:00
David Gibson
daa2369903 spapr_pci: Add a 64-bit MMIO window
On real hardware, and under pHyp, the PCI host bridges on Power machines
typically advertise two outbound MMIO windows from the guest's physical
memory space to PCI memory space:
  - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
  - A 64-bit window which maps onto a large region somewhere high in PCI
    address space (traditionally this used an identity mapping from guest
    physical address to PCI address, but that's not always the case)

The qemu implementation in spapr-pci-host-bridge, however, only supports a
single outbound MMIO window, however.  At least some Linux versions expect
the two windows however, so we arranged this window to map onto the PCI
memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
windows, the "32-bit" window from 2G..4G and the "64-bit" window from
4G..~64G.

This approach means, however, that the 64G window is not naturally aligned.
In turn this limits the size of the largest BAR we can map (which does have
to be naturally aligned) to roughly half of the total window.  With some
large nVidia GPGPU cards which have huge memory BARs, this is starting to
be a problem.

This patch adds true support for separate 32-bit and 64-bit outbound MMIO
windows to the spapr-pci-host-bridge implementation, each of which can
be independently configured.  The 32-bit window always maps to 2G.. in PCI
space, but the PCI address of the 64-bit window can be configured (it
defaults to the same as the guest physical address).

So as not to break possible existing configurations, as long as a 64-bit
window is not specified, a large single window can be specified.  This
will appear the same way to the guest as the old approach, although it's
now implemented by two contiguous memory regions rather than a single one.

For now, this only adds the possibility of 64-bit windows.  The default
configuration still uses the legacy mode.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
David Gibson
2efff1c0dd spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM
Currently the default PCI host bridge for the 'pseries' machine type is
constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
guest memory space.  This means that if > 1TiB of guest RAM is specified,
the RAM will collide with the PCI IO windows, causing serious problems.

Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
there's a little unused space at the bottom of the area reserved for PCI,
but essentially this means that > 1TiB of RAM has never worked with the
pseries machine type.

This patch fixes this by altering the placement of PHBs on large-RAM VMs.
Instead of always placing the first PHB at 1TiB, it is placed at the next
1 TiB boundary after the maximum RAM address.

Technically, this changes behaviour in a migration-breaking way for
existing machines with > 1TiB maximum memory, but since having > 1 TiB
memory was broken anyway, this seems like a reasonable trade-off.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
David Gibson
6737d9ad79 spapr_pci: Delegate placement of PCI host bridges to machine type
The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
and PAPR guests) to have numerous independent PHBs, each controlling a
separate PCI domain.

There are two ways of configuring the spapr-pci-host-bridge device: first
it can be done fully manually, specifying the locations and sizes of all
the IO windows.  This gives the most control, but is very awkward with 6
mandatory parameters.  Alternatively just an "index" can be specified
which essentially selects from an array of predefined PHB locations.
The PHB at index 0 is automatically created as the default PHB.

The current set of default locations causes some problems for guests with
large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
GPGPU cards via VFIO).  Obviously, for migration we can only change the
locations on a new machine type, however.

This is awkward, because the placement is currently decided within the
spapr-pci-host-bridge code, so it breaks abstraction to look inside the
machine type version.

So, this patch delegates the "default mode" PHB placement from the
spapr-pci-host-bridge device back to the machine type via a public method
in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
can do.

For now, this just changes where the calculation is done.  It doesn't
change the actual location of the host bridges, or any other behaviour.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
David Gibson
8360544a6d libqos: Limit spapr-pci to 32-bit MMIO for now
Currently the functions in pci-spapr.c (like pci-pc.c on which it's based)
don't distinguish between 32-bit and 64-bit PCI MMIO.  At the moment, the
qemu side implementation is a bit weird and has a single MMIO window
straddling 32-bit and 64-bit regions, but we're likely to change that in
future.

In any case, pci-pc.c - and therefore the testcases using PCI - only handle
32-bit MMIOs for now.  For spapr despite whatever changes might happen with
the MMIO windows, the 32-bit window is likely to remain at 2..4 GiB in PCI
space.

So, explicitly limit pci-spapr.c to 32-bit MMIOs for now, we can add 64-bit
MMIO support back in when and if we need it.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
David Gibson
c711369087 libqos: Correct error in PCI hole sizing for spapr
In pci-spapr.c (as in pci-pc.c from which it was derived), the
pci_hole_start/pci_hole_size and pci_iohole_start/pci_iohole_size pairs[1]
essentially define the region of PCI (not CPU) addresses in which MMIO
or PIO BARs respectively will be allocated.

The size value is relative to the start value.  But in pci-spapr.c it is
set to the entire size of the window supported by the (emulated) hardware,
but the start values are *not* at the beginning of the emulated windows.

That means if you tried to map enough PCI BARs, we'd messily overrun the
IO windows, instead of failing in iomap as we should.

This patch corrects this by calculating the hole sizes from the location
of the window in PCI space and the hole start.

[1] Those are bad names, but that's a problem for another time.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
David Gibson
cd1b354ec0 libqos: Isolate knowledge of spapr memory map to qpci_init_spapr()
The libqos code for accessing PCI on the spapr machine type uses IOBASE()
and MMIOBASE() macros to determine the address in the CPU memory map of
the windows to PCI address space.

This is a detail of the implementation of PCI in the machine type, it's not
specified by the PAPR standard.  Real guests would get the addresses of the
PCI windows from the device tree.

Finding the device tree in libqos would be awkward, but we can at least
localize this knowledge of the implementation to the init function, saving
it in the QPCIBusSPAPR structure for use by the accessors.

That leaves only one place to fix if we alter the location of the PCI
windows, as we're planning to do.

Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
2016-10-16 12:03:09 +11:00
Benjamin Herrenschmidt
d4d7a59a7a ppc/xics: Split ICS into ics-base and ics class
The existing implementation remains same and ics-base is introduced. The
type name "ics" is retained, and all the related functions renamed as
ics_simple_*

This will allow different implementations for the source controllers
such as the MSI support of PHB3 on Power8 which uses in-memory state
tables for example.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[ clg: added ICS_BASE_GET_CLASS and related fixes, based on :
       http://patchwork.ozlabs.org/patch/646010/ ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14 16:31:02 +11:00
Benjamin Herrenschmidt
cc706a5305 ppc/xics: Make the ICSState a list
Instead of an array of fixed sized blocks, use a list, as we will need
to have sources with variable number of interrupts. SPAPR only uses
a single entry. Native will create more. If performance becomes an
issue we can add some hashed lookup but for now this will do fine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[ move the initialization of list to xics_common_initfn,
  restore xirr_owner after migration and move restoring to
  icp_post_load]
Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
[ clg: removed the icp_post_load() changes from nikunj patchset v3:
       http://patchwork.ozlabs.org/patch/646008/ ]
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14 16:31:02 +11:00
Michael Roth
672de881e9 spapr: fix inheritance chain for default machine options
Rather than machine instances having backward-compatible option
defaults that need to be repeatedly re-enabled for every new machine
type we introduce, we set the defaults appropriate for newer machine
types, then add code to explicitly disable instance options as needed
to maintain compatibility with older machine types.

Currently pseries-2.5 does not inherit from pseries-2.6 in this
fashion, which is okay at the moment since we do not have any
instance compatibility options for pseries-2.6+ currently.

We will make use of this in future patches though, so fix it here.

Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
[dwg: Extended to make 2.7 inherit from 2.8 as well]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14 15:33:32 +11:00
Nikunj A Dadhania
125a9b2327 target-ppc: implement vexts[bh]2w and vexts[bhw]2d
Vector Extend Sign Instructions:

vextsb2w: Vector Extend Sign Byte To Word
vextsh2w: Vector Extend Sign Halfword To Word
vextsb2d: Vector Extend Sign Byte To Doubleword
vextsh2d: Vector Extend Sign Halfword To Doubleword
vextsw2d: Vector Extend Sign Word To Doubleword

Signed-off-by: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-10-14 10:06:47 +11:00