=====================================
The PDB TPI and IPI Streams
=====================================

.. contents::
   :local:

.. _tpi_intro:

Introduction
============

The PDB TPI Stream (Index 2) and IPI Stream (Index 4) contain information about
all types used in the program. Each is organized as a :ref:`header <tpi_header>`
followed by a list of :doc:`CodeView Type Records <CodeViewTypes>`. Types are
referenced from various streams and records throughout the PDB by their
:ref:`type index <type_indices>`. In general, the sequence of type records
following the :ref:`header <tpi_header>` forms a topologically sorted DAG
(directed acyclic graph). This means that a type record B can only refer to
a type A if ``A.TypeIndex < B.TypeIndex``. There are rare cases where this
property does not hold, particularly with object files compiled with MASM.
Even so, an implementation should try very hard to preserve it, because it
means the entire type graph can be constructed in a single pass.

.. important::

   Type records form a topologically sorted DAG (directed acyclic graph).

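You can check mechanically whether a stream has this property. The following C++ is a minimal sketch and does not come from LLVM or the Microsoft tools. It assumes each record has already been decoded into a hypothetical ``TypeRecordRefs`` value that holds only the type indices the record refers to. It also assumes that every non-simple reference points into the same stream as the referring record. Cross-stream references, such as an IPI record naming a TPI type, are outside its scope.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Hypothetical decoded form of a type record: only the type indices that the
  // record refers to are kept.
  struct TypeRecordRefs {
    std::vector<uint32_t> Referenced;
  };

  // Returns true if every record refers only to simple (reserved) type indices
  // or to records that appear earlier in the same stream, i.e. the records form
  // a topologically sorted DAG that can be resolved in a single pass.
  bool isTopologicallySorted(const std::vector<TypeRecordRefs> &Records,
                             uint32_t TypeIndexBegin) {
    for (std::size_t I = 0; I < Records.size(); ++I) {
      // Record I in the stream has type index TypeIndexBegin + I.
      const uint32_t Self = TypeIndexBegin + static_cast<uint32_t>(I);
      for (uint32_t Ref : Records[I].Referenced) {
        if (Ref < TypeIndexBegin)
          continue; // Simple type: not a record in this stream.
        if (Ref >= Self)
          return false; // Self or forward reference.
      }
    }
    return true;
  }

A consumer that finds this check failing, for example on MASM output, would need a multi-pass or deferred resolution strategy instead of a single forward pass.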
.. _tpi_ipi:

TPI vs IPI Stream
=================

Recent versions of the PDB format (i.e. all versions covered by this document)
have two streams with identical layout, henceforth referred to as the TPI stream
and the IPI stream. The rest of this document describes the on-disk format,
and it applies equally to the TPI stream and the IPI stream. The only
difference between the two is *which* CodeView records are allowed to appear
in each one, as summarized by the following table:

+----------------------+---------------------+
| TPI Stream           | IPI Stream          |
+======================+=====================+
| LF_POINTER           | LF_FUNC_ID          |
+----------------------+---------------------+
| LF_MODIFIER          | LF_MFUNC_ID         |
+----------------------+---------------------+
| LF_PROCEDURE         | LF_BUILDINFO        |
+----------------------+---------------------+
| LF_MFUNCTION         | LF_SUBSTR_LIST      |
+----------------------+---------------------+
| LF_LABEL             | LF_STRING_ID        |
+----------------------+---------------------+
| LF_ARGLIST           | LF_UDT_SRC_LINE     |
+----------------------+---------------------+
| LF_FIELDLIST         | LF_UDT_MOD_SRC_LINE |
+----------------------+---------------------+
| LF_ARRAY             |                     |
+----------------------+---------------------+
| LF_CLASS             |                     |
+----------------------+---------------------+
| LF_STRUCTURE         |                     |
+----------------------+---------------------+
| LF_INTERFACE         |                     |
+----------------------+---------------------+
| LF_UNION             |                     |
+----------------------+---------------------+
| LF_ENUM              |                     |
+----------------------+---------------------+
| LF_TYPESERVER2       |                     |
+----------------------+---------------------+
| LF_VFTABLE           |                     |
+----------------------+---------------------+
| LF_VTSHAPE           |                     |
+----------------------+---------------------+
| LF_BITFIELD          |                     |
+----------------------+---------------------+
| LF_METHODLIST        |                     |
+----------------------+---------------------+
| LF_PRECOMP           |                     |
+----------------------+---------------------+
| LF_ENDPRECOMP        |                     |
+----------------------+---------------------+

The usage of these records is described in more detail in
:doc:`CodeView Type Records <CodeViewTypes>`.

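A producer has to route each record to the right stream. As a minimal sketch, you can encode the table above as a lookup function. The ``LeafKind`` and ``TypeStream`` enums and the ``streamFor`` function are hypothetical. The enumerators deliberately carry placeholder values rather than the real CodeView leaf numbers.

.. code-block:: c++

  #include <cassert>

  // Hypothetical enumeration of the record kinds listed in the table above.
  // Enumerator values are placeholders, not real CodeView leaf numbers.
  enum class LeafKind {
    // Kinds that belong in the TPI stream.
    LF_POINTER, LF_MODIFIER, LF_PROCEDURE, LF_MFUNCTION, LF_LABEL, LF_ARGLIST,
    LF_FIELDLIST, LF_ARRAY, LF_CLASS, LF_STRUCTURE, LF_INTERFACE, LF_UNION,
    LF_ENUM, LF_TYPESERVER2, LF_VFTABLE, LF_VTSHAPE, LF_BITFIELD,
    LF_METHODLIST, LF_PRECOMP, LF_ENDPRECOMP,
    // Kinds that belong in the IPI stream.
    LF_FUNC_ID, LF_MFUNC_ID, LF_BUILDINFO, LF_SUBSTR_LIST, LF_STRING_ID,
    LF_UDT_SRC_LINE, LF_UDT_MOD_SRC_LINE
  };

  enum class TypeStream { TPI, IPI };

  // Returns the stream in which a record of kind K is allowed to appear.
  TypeStream streamFor(LeafKind K) {
    switch (K) {
    case LeafKind::LF_FUNC_ID:
    case LeafKind::LF_MFUNC_ID:
    case LeafKind::LF_BUILDINFO:
    case LeafKind::LF_SUBSTR_LIST:
    case LeafKind::LF_STRING_ID:
    case LeafKind::LF_UDT_SRC_LINE:
    case LeafKind::LF_UDT_MOD_SRC_LINE:
      return TypeStream::IPI;
    default:
      return TypeStream::TPI;
    }
  }

For example, a writer could ``assert(streamFor(Kind) == TypeStream::IPI)`` before appending a record to the IPI stream. Using ``default`` keeps the TPI list implicit. This is only safe while the enum contains nothing but the kinds in the table.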
.. _type_indices:

Type Indices
============

A type index is a 32-bit integer that uniquely identifies a type inside of an
object file's ``.debug$T`` section or a PDB file's TPI or IPI stream. The
value of the type index for the first type record from the TPI stream is given
by the ``TypeIndexBegin`` member of the :ref:`TPI Stream Header <tpi_header>`,
although in practice this value is always equal to 0x1000 (4096).

Any type index with the high bit set is considered to come from the IPI stream.
This appears to be more of a hack, and LLVM does not generate type indices of
this nature. They can, however, occasionally be observed in Microsoft PDBs, so
one should be prepared to handle them. Note that the high bit being set is not
a necessary condition for a type index to come from the IPI stream; it is only
a sufficient one.

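A consumer might apply these rules to a raw 32-bit type index as sketched below. This is an illustration only and not LLVM's ``TypeIndex`` API. ``TypeIndexRef`` and ``classifyTypeIndex`` are hypothetical names. The function also applies the ``TypeIndexBegin`` threshold described in the next paragraph.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Hypothetical result of classifying a raw type index.
  struct TypeIndexRef {
    bool FromIpiHighBit; // High bit was set: the index refers to the IPI stream.
    bool IsSimple;       // Below TypeIndexBegin: a reserved (simple) type.
    uint32_t Index;      // The index with the high bit cleared.
  };

  TypeIndexRef classifyTypeIndex(uint32_t Raw, uint32_t TypeIndexBegin) {
    const uint32_t HighBit = 0x80000000u;
    TypeIndexRef R;
    // The high bit is sufficient, but not necessary, to identify an IPI index.
    R.FromIpiHighBit = (Raw & HighBit) != 0;
    R.Index = Raw & ~HighBit;
    // Indices below TypeIndexBegin (in practice 0x1000) are simple types;
    // anything else names a record in the TPI or IPI stream.
    R.IsSimple = R.Index < TypeIndexBegin;
    return R;
  }

Note that an index without the high bit can still belong to the IPI stream. Which stream it refers to then depends on the context in which the index appears.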
Once the high bit is cleared, any type index >= ``TypeIndexBegin`` is presumed
to come from the appropriate stream. Any type index less than this refers to a
simple (built-in) type, and its value is a bitmask that decomposes as follows:

.. code-block:: none

  .---------------------------.------.----------.
  |           Unused          | Mode |   Kind   |
  '---------------------------'------'----------'
  |+32                        |+12   |+8        |+0

- **Kind** - A value from the following enum:

  .. code-block:: c++

    enum class SimpleTypeKind : uint32_t {
      None = 0x0000,          // uncharacterized type (no type)
      Void = 0x0003,          // void
      NotTranslated = 0x0007, // type not translated by cvpack
      HResult = 0x0008,       // OLE/COM HRESULT

      SignedCharacter = 0x0010,   // 8 bit signed
      UnsignedCharacter = 0x0020, // 8 bit unsigned
      NarrowCharacter = 0x0070,   // really a char
      WideCharacter = 0x0071,     // wide char
      Character16 = 0x007a,       // char16_t
      Character32 = 0x007b,       // char32_t

      SByte = 0x0068,       // 8 bit signed int
      Byte = 0x0069,        // 8 bit unsigned int
      Int16Short = 0x0011,  // 16 bit signed
      UInt16Short = 0x0021, // 16 bit unsigned
      Int16 = 0x0072,       // 16 bit signed int
      UInt16 = 0x0073,      // 16 bit unsigned int
      Int32Long = 0x0012,   // 32 bit signed
      UInt32Long = 0x0022,  // 32 bit unsigned
      Int32 = 0x0074,       // 32 bit signed int
      UInt32 = 0x0075,      // 32 bit unsigned int
      Int64Quad = 0x0013,   // 64 bit signed
      UInt64Quad = 0x0023,  // 64 bit unsigned
      Int64 = 0x0076,       // 64 bit signed int
      UInt64 = 0x0077,      // 64 bit unsigned int
      Int128Oct = 0x0014,   // 128 bit signed int
      UInt128Oct = 0x0024,  // 128 bit unsigned int
      Int128 = 0x0078,      // 128 bit signed int
      UInt128 = 0x0079,     // 128 bit unsigned int

      Float16 = 0x0046,                 // 16 bit real
      Float32 = 0x0040,                 // 32 bit real
      Float32PartialPrecision = 0x0045, // 32 bit PP real
      Float48 = 0x0044,                 // 48 bit real
      Float64 = 0x0041,                 // 64 bit real
      Float80 = 0x0042,                 // 80 bit real
      Float128 = 0x0043,                // 128 bit real

      Complex16 = 0x0056,                 // 16 bit complex
      Complex32 = 0x0050,                 // 32 bit complex
      Complex32PartialPrecision = 0x0055, // 32 bit PP complex
      Complex48 = 0x0054,                 // 48 bit complex
      Complex64 = 0x0051,                 // 64 bit complex
      Complex80 = 0x0052,                 // 80 bit complex
      Complex128 = 0x0053,                // 128 bit complex

      Boolean8 = 0x0030,   // 8 bit boolean
      Boolean16 = 0x0031,  // 16 bit boolean
      Boolean32 = 0x0032,  // 32 bit boolean
      Boolean64 = 0x0033,  // 64 bit boolean
      Boolean128 = 0x0034, // 128 bit boolean
    };

- **Mode** - A value from the following enum:

  .. code-block:: c++

    enum class SimpleTypeMode : uint32_t {
      Direct = 0,        // Not a pointer
      NearPointer = 1,   // Near pointer
      FarPointer = 2,    // Far pointer
      HugePointer = 3,   // Huge pointer
      NearPointer32 = 4, // 32 bit near pointer
      FarPointer32 = 5,  // 32 bit far pointer
      NearPointer64 = 6, // 64 bit near pointer
      NearPointer128 = 7 // 128 bit near pointer
    };

Note that for pointers, the bitness is represented in the mode. So a ``void*``
would have a type index with ``Mode=NearPointer32, Kind=Void`` if built for a
32-bit target, but a type index with ``Mode=NearPointer64, Kind=Void`` if built
for a 64-bit target.

By convention, the type index for ``std::nullptr_t`` is constructed the same
way as the type index for ``void*``, but using the bitless enumeration value
``NearPointer``.

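Combining the diagram and the two enums, you can build and take apart a simple type index with a shift and two masks. This minimal sketch follows the bit positions in the diagram above: Kind in bits 0-7 and Mode in bits 8-11. The helper names are hypothetical, and the sketch repeats only the enumerators the example needs.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  // Subset of SimpleTypeKind; only the values used in the example.
  enum class SimpleTypeKind : uint32_t {
    Void = 0x0003,
    Int32 = 0x0074,
  };

  // Subset of SimpleTypeMode.
  enum class SimpleTypeMode : uint32_t {
    Direct = 0,
    NearPointer = 1,
    NearPointer32 = 4,
    NearPointer64 = 6,
  };

  // Kind occupies bits 0-7 and Mode bits 8-11, as in the diagram.
  constexpr uint32_t KindMask = 0x000000FF;
  constexpr uint32_t ModeMask = 0x00000F00;
  constexpr uint32_t ModeShift = 8;

  // Hypothetical helper: packs a Kind and a Mode into a simple type index.
  constexpr uint32_t makeSimpleTypeIndex(SimpleTypeKind K, SimpleTypeMode M) {
    return (static_cast<uint32_t>(M) << ModeShift) | static_cast<uint32_t>(K);
  }

  // Hypothetical helpers: extract the Kind and Mode fields again.
  constexpr SimpleTypeKind simpleKind(uint32_t TI) {
    return static_cast<SimpleTypeKind>(TI & KindMask);
  }

  constexpr SimpleTypeMode simpleMode(uint32_t TI) {
    return static_cast<SimpleTypeMode>((TI & ModeMask) >> ModeShift);
  }

Following these rules, a 64-bit ``void*`` encodes as ``0x0603``, a 32-bit ``void*`` as ``0x0403``, and ``std::nullptr_t`` as ``0x0103``.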
.. _tpi_header:

Stream Header
=============

At offset 0 of the TPI Stream is a header with the following layout:

.. code-block:: c++

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;

    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;

    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;

    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;

    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };

- **Version** - A value from the following enum.

  .. code-block:: c++

    enum class TpiStreamVersion : uint32_t {
      V40 = 19950410,
      V41 = 19951122,
      V50 = 19961031,
      V70 = 19990903,
      V80 = 20040203,
    };

  Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
  ``V80``, and no other values have been observed. If another value is
  observed, the layout described by this document may not be accurate.

- **HeaderSize** - ``sizeof(TpiStreamHeader)``

- **TypeIndexBegin** - The numeric value of the type index representing the
  first type record in the TPI stream. This is usually the value 0x1000, as
  type indices lower than this are reserved (see
  :ref:`Type Indices <type_indices>` for a discussion of reserved type
  indices).

- **TypeIndexEnd** - One greater than the numeric value of the type index
  representing the last type record in the TPI stream. The total number of
  type records in the TPI stream can be computed as
  ``TypeIndexEnd - TypeIndexBegin``.

- **TypeRecordBytes** - The number of bytes of type record data following the
  header.

- **HashStreamIndex** - The index of a stream which contains a list of hashes
  for every type record. This value may be -1 (``0xFFFF``, since the field is
  16 bits wide), indicating that hash information is not present. In practice
  a valid stream index is always observed. A producer should therefore be
  prepared to emit this stream, to stay compatible with tools that may expect
  it to be present.

- **HashAuxStreamIndex** - Presumably the index of a stream which contains a
  separate hash table. This has not been observed in practice, and it is
  unclear what it might be used for.

- **HashKeySize** - The size of a hash value (usually 4 bytes).

- **NumHashBuckets** - The number of buckets used to generate the hash values
  in the aforementioned hash streams.

- **HashValueBufferOffset / HashValueBufferLength** - The offset and size within
  the TPI Hash Stream of the list of hash values. It should be assumed that
  there are either 0 hash values, or a number equal to the number of type
  records in the TPI stream (``TypeIndexEnd - TypeIndexBegin``). Thus, if
  ``HashValueBufferLength`` is neither 0 nor equal to
  ``(TypeIndexEnd - TypeIndexBegin) * HashKeySize``, we can consider the PDB
  malformed.

- **IndexOffsetBufferOffset / IndexOffsetBufferLength** - The offset and size
  within the TPI Hash Stream of the Type Index Offsets Buffer. This is a list
  of pairs of ``uint32_t`` values. The first value of each pair is a
  :ref:`Type Index <type_indices>`, and the second is the offset of that type
  within the type record data. A binary search over this list, followed by a
  linear search, gives O(log n) lookup by type index.

- **HashAdjBufferOffset / HashAdjBufferLength** - The offset and size within
  the TPI hash stream of a serialized hash table. Its keys are the hash values
  in the hash value buffer, and its values are type indices. This appears to
  be useful in incremental linking scenarios. When a type is modified, an
  entry can be created mapping the old hash value to the new type index. A PDB
  file consumer then always sees the most up-to-date version of the type, and
  the incremental linker does not have to garbage collect and repoint
  references from the old version to the new one.
  The layout of this hash table is described in :doc:`HashTable`.

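You can collect the constraints in the field descriptions above into a validation pass that runs before trusting the rest of the stream. This sketch assumes the header has already been read into a ``TpiStreamHeader`` on a little-endian host. ``validateTpiHeader`` is a hypothetical helper. The checks cover only what this document implies and are not an exhaustive list of what other tools enforce.

.. code-block:: c++

  #include <cassert>
  #include <cstdint>

  struct TpiStreamHeader {
    uint32_t Version;
    uint32_t HeaderSize;
    uint32_t TypeIndexBegin;
    uint32_t TypeIndexEnd;
    uint32_t TypeRecordBytes;

    uint16_t HashStreamIndex;
    uint16_t HashAuxStreamIndex;
    uint32_t HashKeySize;
    uint32_t NumHashBuckets;

    int32_t HashValueBufferOffset;
    uint32_t HashValueBufferLength;

    int32_t IndexOffsetBufferOffset;
    uint32_t IndexOffsetBufferLength;

    int32_t HashAdjBufferOffset;
    uint32_t HashAdjBufferLength;
  };

  // TpiStreamVersion::V80, the only version observed in practice.
  constexpr uint32_t TpiStreamVersionV80 = 20040203;

  // Hypothetical sanity checks derived from the field descriptions;
  // returns false if the header looks malformed.
  bool validateTpiHeader(const TpiStreamHeader &H) {
    // Other versions may use a layout this document does not describe.
    if (H.Version != TpiStreamVersionV80)
      return false;
    if (H.HeaderSize != sizeof(TpiStreamHeader))
      return false;
    if (H.TypeIndexEnd < H.TypeIndexBegin)
      return false;
    const uint64_t NumRecords = H.TypeIndexEnd - H.TypeIndexBegin;
    // Either no hash values at all, or exactly one per type record.
    if (H.HashValueBufferLength != 0 &&
        H.HashValueBufferLength != NumRecords * H.HashKeySize)
      return false;
    // The Type Index Offsets Buffer holds pairs of uint32_t values.
    if (H.IndexOffsetBufferLength % (2 * sizeof(uint32_t)) != 0)
      return false;
    return true;
  }

Rejecting unknown versions outright is a conservative choice. A more lenient reader might only warn and continue.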
.. _tpi_records:

CodeView Type Record List
=========================

Following the header, there are ``TypeRecordBytes`` bytes of data that
represent a variable length array of
:doc:`CodeView type records <CodeViewTypes>`. The number of such records
(i.e. the length of the array) can be determined by computing the value
``Header.TypeIndexEnd - Header.TypeIndexBegin``.

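Because every record carries its own length, you can cross-check the record count against the header by walking the buffer. This sketch assumes, based on the :doc:`CodeView Type Records <CodeViewTypes>` format, a particular record prefix. Each record begins with a 2-byte little-endian length that counts the bytes after the length field, namely the 2-byte record kind plus the payload. ``countTypeRecords`` is a hypothetical helper.

.. code-block:: c++

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // Counts the records in the type record data that follows the header.
  // Returns -1 if a record is truncated or too short to hold its kind.
  // Assumes each record starts with a 2-byte little-endian length that counts
  // the bytes after the length field (the 2-byte kind plus the payload).
  long countTypeRecords(const std::vector<uint8_t> &Data) {
    std::size_t Off = 0;
    long Count = 0;
    while (Off < Data.size()) {
      if (Off + 2 > Data.size())
        return -1; // Not enough bytes left for the length field.
      const std::size_t Len =
          static_cast<std::size_t>(Data[Off] | (Data[Off + 1] << 8));
      if (Len < 2)
        return -1; // Every record must at least contain its kind.
      Off += 2 + Len;
      if (Off > Data.size())
        return -1; // Record runs past the end of the buffer.
      ++Count;
    }
    return Count;
  }

A result different from ``Header.TypeIndexEnd - Header.TypeIndexBegin`` suggests a malformed stream.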
O(log n) lookup by type index is provided by way of the Type Index Offsets
Buffer (if present) described previously.

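The binary-then-linear lookup through the Type Index Offsets Buffer can be sketched as follows. ``TypeIndexOffset`` and ``findRecordOffset`` are hypothetical names. The sketch assumes the buffer's ``(TypeIndex, Offset)`` pairs are sorted by type index. It also assumes, as a premise about the CodeView record prefix, that each record begins with a 2-byte little-endian length counting the bytes that follow it.

.. code-block:: c++

  #include <algorithm>
  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // One (TypeIndex, Offset) pair from the Type Index Offsets Buffer.
  struct TypeIndexOffset {
    uint32_t Type;   // A type index.
    uint32_t Offset; // Offset of that type's record in the type record data.
  };

  // Returns the offset of the record for type index TI within Data, or -1 if
  // TI cannot be located. Assumes Offsets is sorted by type index.
  long findRecordOffset(const std::vector<TypeIndexOffset> &Offsets,
                        const std::vector<uint8_t> &Data, uint32_t TI) {
    // Binary search for the last entry whose type index is <= TI.
    auto It = std::upper_bound(
        Offsets.begin(), Offsets.end(), TI,
        [](uint32_t Value, const TypeIndexOffset &E) { return Value < E.Type; });
    if (It == Offsets.begin())
      return -1; // TI precedes every indexed record.
    --It;
    uint32_t Cur = It->Type;
    std::size_t Off = It->Offset;
    // Linear search: skip forward one record at a time until TI is reached.
    while (Cur < TI) {
      if (Off + 2 > Data.size())
        return -1;
      const std::size_t Len =
          static_cast<std::size_t>(Data[Off] | (Data[Off + 1] << 8));
      Off += 2 + Len;
      ++Cur;
    }
    return Off < Data.size() ? static_cast<long>(Off) : -1;
  }

A linear step follows the binary search, so the buffer presumably does not need an entry for every record. Sparser entries trade a smaller buffer for a longer forward scan.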