=====================================
The PDB DBI (Debug Info) Stream
=====================================

.. contents::
   :local:

.. _dbi_intro:

Introduction
============

The PDB DBI Stream (Index 3) is one of the largest and most important streams
in a PDB file. It contains information about how the program was compiled
(e.g. compilation flags), the compilands (e.g. object files) that
were linked together to produce the program, the source files that were used
to build the program, as well as references to other streams that contain more
detailed information about each compiland, such as the CodeView symbol records
contained within each compiland and the source and line information for
functions and other symbols within each compiland.

.. _dbi_header:

Stream Header
=============
At offset 0 of the DBI Stream is a header with the following layout:

.. code-block:: c++

  struct DbiStreamHeader {
    int32_t VersionSignature;
    uint32_t VersionHeader;
    uint32_t Age;
    uint16_t GlobalStreamIndex;
    uint16_t BuildNumber;
    uint16_t PublicStreamIndex;
    uint16_t PdbDllVersion;
    uint16_t SymRecordStream;
    uint16_t PdbDllRbld;
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerMapSize;
    uint32_t MFCTypeServerIndex;
    int32_t OptionalDbgHeaderSize;
    int32_t ECSubstreamSize;
    uint16_t Flags;
    uint16_t Machine;
    uint32_t Padding;
  };

- **VersionSignature** - Unknown meaning. Appears to always be ``-1``.

- **VersionHeader** - A value from the following enum.

.. code-block:: c++

  enum class DbiStreamVersion : uint32_t {
    VC41 = 930803,
    V50 = 19960307,
    V60 = 19970606,
    V70 = 19990903,
    V110 = 20091201
  };

Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
``V70``, and it is not clear what the other values are for.

- **Age** - The number of times the PDB has been written. Equal to the same
  field from the :ref:`PDB Stream header <pdb_stream_header>`.

- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
  which contains CodeView symbol records for all global symbols. Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **BuildNumber** - A bitfield containing values representing the major and minor
  version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
  program, with the following layout:

.. code-block:: c++

  uint16_t MinorVersion : 8;
  uint16_t MajorVersion : 7;
  uint16_t NewVersionFormat : 1;

For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
If it is ``false``, the layout above does not apply and the reader should consult
the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
further guidance.

- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
  which contains CodeView symbol records for all public symbols. Actual records
  are stored in the symbol record stream, and are referenced from this stream.

- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
  PDB. This does not apply to LLVM, which does not use ``mspdbXXXX.dll``.

- **SymRecordStream** - The stream containing all CodeView symbol records used
  by the program. This is used for deduplication, so that many different
  compilands can refer to the same symbols without having to include the full record
  content inside of each module stream.

- **PdbDllRbld** - Unknown.

- **MFCTypeServerIndex** - The index of the MFC type server in the
  :ref:`dbi_type_server_map_substream`.

- **Flags** - A bitfield with the following layout, containing various
  information about how the program was built:

.. code-block:: c++

  uint16_t WasIncrementallyLinked : 1;
  uint16_t ArePrivateSymbolsStripped : 1;
  uint16_t HasConflictingTypes : 1;
  uint16_t Reserved : 13;

The only one of these that is not self-explanatory is ``HasConflictingTypes``.
Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
If it is passed to ``link.exe``, this field will be set. Otherwise it will
not be set. It is unclear what this flag does, although it seems to have
subtle implications on the algorithm used to look up type records.

- **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
  enumeration. Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86).
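
The ``BuildNumber`` and ``Flags`` bitfields described above can be decoded with
explicit masks rather than compiler-specific bitfield layout. The following is
a minimal sketch, not LLVM's implementation: the type and function names are
hypothetical, and it assumes that the first-declared bitfield member occupies
the least significant bits of the 16-bit value.

.. code-block:: c++

  #include <cstdint>

  // Hypothetical decoded form of DbiStreamHeader::BuildNumber.
  struct DbiBuildNumber {
    uint8_t Minor;
    uint8_t Major;
    bool NewVersionFormat;
  };

  // Hypothetical decoded form of DbiStreamHeader::Flags.
  struct DbiFlags {
    bool WasIncrementallyLinked;
    bool ArePrivateSymbolsStripped;
    bool HasConflictingTypes;
  };

  // ASSUMPTION: MinorVersion is bits 0-7, MajorVersion is bits 8-14 and
  // NewVersionFormat is bit 15.
  inline DbiBuildNumber decodeBuildNumber(uint16_t Raw) {
    DbiBuildNumber B;
    B.Minor = static_cast<uint8_t>(Raw & 0x00FF);
    B.Major = static_cast<uint8_t>((Raw >> 8) & 0x7F);
    B.NewVersionFormat = (Raw & 0x8000) != 0;
    return B;
  }

  // ASSUMPTION: the three documented flags are bits 0, 1 and 2.
  inline DbiFlags decodeFlags(uint16_t Raw) {
    DbiFlags F;
    F.WasIncrementallyLinked = (Raw & 0x0001) != 0;
    F.ArePrivateSymbolsStripped = (Raw & 0x0002) != 0;
    F.HasConflictingTypes = (Raw & 0x0004) != 0;
    return F;
  }

Under these assumptions, a raw ``BuildNumber`` of ``0x8C00`` decodes to major
version 12, minor version 0, with ``NewVersionFormat`` set.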

Immediately after the fixed-size DBI Stream header are ``7`` variable-length
`substreams`. The following ``7`` fields of the DBI Stream header specify the
number of bytes of the corresponding substream. Each substream's contents will
be described in detail :ref:`below <dbi_substreams>`. The length of the entire
DBI Stream should equal ``64`` (the length of the header above) plus the sum
of the following ``7`` fields. Note that the EC substream precedes the optional
debug header stream within the DBI Stream, even though their size fields appear
in the opposite order in the header.

- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.

- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.

- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.

- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.

- **TypeServerMapSize** - The length of the :ref:`dbi_type_server_map_substream`.

- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.

- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
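
Because the substreams are laid out back to back, the offset of each one can be
derived from the seven size fields alone. The sketch below illustrates this
and checks that the sizes add up to the stream length. The struct and function
names are hypothetical; the ordering (with the EC substream before the optional
debug header stream) follows the substream descriptions in this document.

.. code-block:: c++

  #include <cstdint>

  // Length of the fixed-size DbiStreamHeader.
  constexpr uint32_t kDbiHeaderSize = 64;

  // The seven size fields, listed in on-disk substream order.
  struct DbiSubstreamSizes {
    int32_t ModInfoSize;
    int32_t SectionContributionSize;
    int32_t SectionMapSize;
    int32_t SourceInfoSize;
    int32_t TypeServerMapSize;
    int32_t ECSubstreamSize;
    int32_t OptionalDbgHeaderSize;
  };

  // Byte offsets of each substream from the start of the DBI Stream.
  struct DbiSubstreamOffsets {
    uint32_t ModInfo;
    uint32_t SectionContribution;
    uint32_t SectionMap;
    uint32_t SourceInfo;
    uint32_t TypeServerMap;
    uint32_t ECSubstream;
    uint32_t OptionalDbgHeader;
    uint32_t End; // One past the last byte of the final substream.
  };

  // Returns false if any size is negative or if the header plus all
  // substreams does not account for exactly StreamLength bytes.
  inline bool computeSubstreamOffsets(const DbiSubstreamSizes &S,
                                      uint32_t StreamLength,
                                      DbiSubstreamOffsets &Out) {
    const int32_t Sizes[7] = {S.ModInfoSize,       S.SectionContributionSize,
                              S.SectionMapSize,    S.SourceInfoSize,
                              S.TypeServerMapSize, S.ECSubstreamSize,
                              S.OptionalDbgHeaderSize};
    uint32_t *Offsets[8] = {&Out.ModInfo,       &Out.SectionContribution,
                            &Out.SectionMap,    &Out.SourceInfo,
                            &Out.TypeServerMap, &Out.ECSubstream,
                            &Out.OptionalDbgHeader, &Out.End};
    uint64_t Pos = kDbiHeaderSize;
    for (int I = 0; I < 7; ++I) {
      if (Sizes[I] < 0)
        return false;
      *Offsets[I] = static_cast<uint32_t>(Pos);
      Pos += static_cast<uint32_t>(Sizes[I]);
    }
    *Offsets[7] = static_cast<uint32_t>(Pos);
    return Pos == StreamLength;
  }

A reader would typically run a check like this before touching any substream,
since a mismatch suggests a corrupt or unsupported file.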

.. _dbi_substreams:

Substreams
==========

.. _dbi_mod_info_substream:

Module Info Substream
^^^^^^^^^^^^^^^^^^^^^

Begins at offset ``0`` immediately after the :ref:`header <dbi_header>`. The
module info substream is an array of variable-length records, each one
describing a single module (e.g. object file) linked into the program. Each
record in the array has the format:

.. code-block:: c++

  struct ModInfo {
    uint32_t Unused1;
    struct SectionContribEntry {
      uint16_t Section;
      char Padding1[2];
      int32_t Offset;
      int32_t Size;
      uint32_t Characteristics;
      uint16_t ModuleIndex;
      char Padding2[2];
      uint32_t DataCrc;
      uint32_t RelocCrc;
    } SectionContr;
    uint16_t Flags;
    uint16_t ModuleSymStream;
    uint32_t SymByteSize;
    uint32_t C11ByteSize;
    uint32_t C13ByteSize;
    uint16_t SourceFileCount;
    char Padding[2];
    uint32_t Unused2;
    uint32_t SourceFileNameIndex;
    uint32_t PdbFilePathNameIndex;
    char ModuleName[];
    char ObjFileName[];
  };

- **SectionContr** - Describes the properties of the section in the final binary
  which contains the code and data from this module.

  ``SectionContr.Characteristics`` corresponds to the ``Characteristics`` field
  of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
  structure.


- **Flags** - A bitfield with the following format:

.. code-block:: c++

  // ``true`` if this ModInfo has been written since reading the PDB. This is
  // likely used to support incremental linking, so that the linker can decide
  // if it needs to commit changes to disk.
  uint16_t Dirty : 1;
  // ``true`` if EC information is present for this module. EC is presumed to
  // stand for "Edit & Continue", which LLVM does not support. So this flag
  // will always be false.
  uint16_t EC : 1;
  uint16_t Unused : 6;
  // Type Server Index for this module. This is assumed to be related to /Zi,
  // but as LLVM treats /Zi as /Z7, this field will always be invalid for LLVM
  // generated PDBs.
  uint16_t TSM : 8;


- **ModuleSymStream** - The index of the stream that contains symbol information
  for this module. This includes CodeView symbol information as well as source
  and line information. If this field is -1 (``0xFFFF``), then no additional
  debug info will be present for this module (for example, this is what happens
  when you strip private symbols from a PDB).

- **SymByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent CodeView symbol records.

- **C11ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C11-style CodeView line information.

- **C13ByteSize** - The number of bytes of data from the stream identified by
  ``ModuleSymStream`` that represent C13-style CodeView line information. At
  most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero. Modern PDBs
  always use C13 instead of C11.

- **SourceFileCount** - The number of source files that contributed to this
  module during compilation.

- **SourceFileNameIndex** - The offset in the names buffer of the primary
  translation unit used to build this module. All PDB files observed to date
  have this value equal to 0.

- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
  containing this module's symbol information. This has only been observed
  to be non-zero for the special ``* Linker *`` module.

- **ModuleName** - The module name. This is usually either a full path to an
  object file (either directly passed to ``link.exe`` or from an archive) or
  a string of the form ``Import:<dll name>``.

- **ObjFileName** - The object file name. In the case of a module that is
  passed directly to ``link.exe``, this is the same as **ModuleName**.
  In the case of a module that comes from an archive, this is usually the full
  path to the archive.
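
Since ``ModuleName`` and ``ObjFileName`` are variable-length, a reader has to
scan for their terminators to find where one record ends and the next begins.
The following sketch reads the two names that follow the fixed-size portion of
a ``ModInfo`` record (64 bytes, obtained by summing the field widths above).
The helper names are hypothetical, the names are treated as null-terminated
strings, and the rounding of the record length up to a multiple of 4 bytes is
an assumption that is not stated in this document.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <string>

  // Size of the ModInfo fields that precede ModuleName.
  constexpr size_t kModInfoFixedSize = 64;

  struct ModInfoNames {
    std::string ModuleName;
    std::string ObjFileName;
    size_t RecordLength; // Includes the assumed trailing padding.
  };

  // Reads a NUL-terminated string starting at Data[Pos] and advances Pos past
  // the terminator. Returns false if no terminator is found within Size bytes.
  static bool readCString(const uint8_t *Data, size_t Size, size_t &Pos,
                          std::string &Out) {
    if (Pos > Size)
      return false;
    const void *Nul = std::memchr(Data + Pos, 0, Size - Pos);
    if (!Nul)
      return false;
    size_t End = static_cast<const uint8_t *>(Nul) - Data;
    Out.assign(reinterpret_cast<const char *>(Data + Pos), End - Pos);
    Pos = End + 1;
    return true;
  }

  // Data points at the start of one ModInfo record; Size is the number of
  // bytes remaining in the module info substream.
  inline bool readModInfoNames(const uint8_t *Data, size_t Size,
                               ModInfoNames &Out) {
    if (Size < kModInfoFixedSize)
      return false;
    size_t Pos = kModInfoFixedSize;
    if (!readCString(Data, Size, Pos, Out.ModuleName))
      return false;
    if (!readCString(Data, Size, Pos, Out.ObjFileName))
      return false;
    // ASSUMPTION: each record is padded to a 4-byte boundary.
    Out.RecordLength = (Pos + 3) & ~static_cast<size_t>(3);
    return true;
  }

Walking the whole substream would then amount to calling ``readModInfoNames``
repeatedly and advancing by ``RecordLength`` until ``ModInfoSize`` bytes have
been consumed.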

.. _dbi_sec_contr_substream:

Section Contribution Substream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
and consumes ``Header->SectionContributionSize`` bytes. This substream begins
with a single ``uint32_t`` which will be one of the following values:

.. code-block:: c++

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

``Ver60`` is the only value which has been observed in a PDB so far. Following
this is an array of fixed-length structures. If the version is ``Ver60``,
it is an array of ``SectionContribEntry`` structures (this is the nested structure
from the ``ModInfo`` type). If the version is ``V2``, it is an array of
``SectionContribEntry2`` structures, defined as follows:

.. code-block:: c++

  struct SectionContribEntry2 {
    SectionContribEntry SC;
    uint32_t ISectCoff;
  };

The purpose of the second field is not well understood. The name implies that
it is the index of the COFF section, but this also describes the existing field
``SectionContribEntry::Section``.
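
The version value determines the size of each array element, so the number of
entries can be computed from the substream size. The sketch below shows one
way to do this. The function name is hypothetical, and the entry sizes of 28
and 32 bytes are derived by summing the field widths of the structures shown
in this document.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>

  enum class SectionContrSubstreamVersion : uint32_t {
    Ver60 = 0xeffe0000 + 19970605,
    V2 = 0xeffe0000 + 20140516
  };

  constexpr size_t kSectionContribEntrySize = 28;  // SectionContribEntry
  constexpr size_t kSectionContribEntry2Size = 32; // SectionContribEntry2

  // Computes the number of entries that follow the 4-byte version field.
  // Returns false for an unknown version or a size that is not a whole
  // number of entries.
  inline bool sectionContribEntryCount(uint32_t Version, size_t SubstreamSize,
                                       size_t &Count) {
    size_t EntrySize = 0;
    switch (static_cast<SectionContrSubstreamVersion>(Version)) {
    case SectionContrSubstreamVersion::Ver60:
      EntrySize = kSectionContribEntrySize;
      break;
    case SectionContrSubstreamVersion::V2:
      EntrySize = kSectionContribEntry2Size;
      break;
    default:
      return false;
    }
    if (SubstreamSize < sizeof(uint32_t))
      return false;
    size_t Payload = SubstreamSize - sizeof(uint32_t);
    if (Payload % EntrySize != 0)
      return false;
    Count = Payload / EntrySize;
    return true;
  }

Rejecting unknown versions outright is a conservative choice here, since the
element size cannot be known for them.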

.. _dbi_section_map_substream:

Section Map Substream
^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
and consumes ``Header->SectionMapSize`` bytes. This substream begins with a
``4``-byte header followed by an array of fixed-length records. The header and
records have the following layout:

.. code-block:: c++

  struct SectionMapHeader {
    uint16_t Count;    // Number of segment descriptors
    uint16_t LogCount; // Number of logical segment descriptors
  };

  struct SectionMapEntry {
    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
    uint16_t Ovl;           // Logical overlay number
    uint16_t Group;         // Group index into descriptor array.
    uint16_t Frame;
    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
    uint32_t Offset;        // Byte offset of the logical segment within physical segment.
                            // If group is set in flags, this is the offset of the group.
    uint32_t SectionLength; // Byte count of the segment or group.
  };

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,              // Segment is readable.
    Write = 1 << 1,             // Segment is writable.
    Execute = 1 << 2,           // Segment is executable.
    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
    IsSelector = 1 << 8,        // Frame represents a selector.
    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
    IsGroup = 1 << 10           // If set, descriptor represents a group.
  };

Many of these fields are not well understood, so will not be discussed further.
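
Even without a full understanding of every field, the flags can be tested and
the substream size sanity-checked. The following sketch does both. The helper
names are hypothetical, the 20-byte entry size is the sum of the
``SectionMapEntry`` field widths, and the assumption that the array holds
exactly ``Count`` entries is an inference from the header comment rather than
something this document states.

.. code-block:: c++

  #include <cstdint>
  #include <string>

  enum class SectionMapEntryFlags : uint16_t {
    Read = 1 << 0,
    Write = 1 << 1,
    Execute = 1 << 2,
    AddressIs32Bit = 1 << 3,
    IsSelector = 1 << 8,
    IsAbsoluteAddress = 1 << 9,
    IsGroup = 1 << 10
  };

  constexpr uint32_t kSectionMapHeaderSize = 4;
  constexpr uint32_t kSectionMapEntrySize = 20;

  // Tests a single flag bit in a raw SectionMapEntry::Flags value.
  inline bool hasFlag(uint16_t Raw, SectionMapEntryFlags F) {
    return (Raw & static_cast<uint16_t>(F)) != 0;
  }

  // Renders the read/write/execute bits in the familiar "rwx" style.
  inline std::string describeAccess(uint16_t Raw) {
    std::string S = "---";
    if (hasFlag(Raw, SectionMapEntryFlags::Read))
      S[0] = 'r';
    if (hasFlag(Raw, SectionMapEntryFlags::Write))
      S[1] = 'w';
    if (hasFlag(Raw, SectionMapEntryFlags::Execute))
      S[2] = 'x';
    return S;
  }

  // ASSUMPTION: the header is followed by exactly Count entries.
  inline bool sectionMapSizeIsConsistent(uint16_t Count,
                                         uint32_t SubstreamSize) {
    return SubstreamSize ==
           kSectionMapHeaderSize + Count * kSectionMapEntrySize;
  }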

.. _dbi_file_info_substream:

File Info Substream
^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping
from module to the source files that contribute to that module. Since multiple
modules can use the same source file (for example, a header file), this substream
uses a string table to store each unique file name only once, and then has each
module use offsets into the string table rather than embedding the string's value
directly. The format of this substream is as follows:

.. code-block:: c++

  struct FileInfoSubstream {
    uint16_t NumModules;
    uint16_t NumSourceFiles;

    uint16_t ModIndices[NumModules];
    uint16_t ModFileCounts[NumModules];
    uint32_t FileNameOffsets[NumSourceFiles];
    char NamesBuffer[][NumSourceFiles];
  };

**NumModules** - The number of modules for which source file information is
contained within this substream. Should match the corresponding value from the
:ref:`dbi_header`.

**NumSourceFiles** - In theory this is supposed to contain the number of source
files for which this substream contains information. But that would present a
problem in that the width of this field being ``16``-bits would prevent one from
having more than 64K source files in a program. In early versions of the file
format, this seems to have been the case. In order to support more than this, this
field is simply ignored, and the count is computed dynamically by summing up the
values of the ``ModFileCounts`` array (discussed below). In short, this value
should be ignored.

**ModIndices** - This array is present, but does not appear to be useful.

**ModFileCounts** - An array of ``NumModules`` integers, each one containing
the number of source files which contribute to the module at the specified index.
While each individual module is limited to 64K contributing source files, the
union of all modules' source files may be greater than 64K. The real number of
source files is thus computed by summing this array. Note that summing this array
does not give the number of `unique` source files, only the total number of source
file contributions to modules.

**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
here refers to the 32-bit value obtained from summing **ModFileCounts**), where
each integer is an offset into **NamesBuffer** pointing to a null terminated string.

**NamesBuffer** - An array of null terminated strings containing the actual source
file names.
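
Putting the field descriptions together, the per-module file lists can be
recovered by summing ``ModFileCounts`` and then consuming ``FileNameOffsets``
in module order. The following is a minimal sketch under those rules, not
LLVM's implementation: the function names are hypothetical, and values are
assumed to be stored little-endian.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <cstring>
  #include <string>
  #include <vector>

  // Reads a little-endian integer of Bytes bytes (2 or 4) and advances Pos.
  static bool readLE(const uint8_t *Data, size_t Size, size_t &Pos,
                     unsigned Bytes, uint32_t &V) {
    if (Size - Pos < Bytes)
      return false;
    V = 0;
    for (unsigned I = 0; I < Bytes; ++I)
      V |= static_cast<uint32_t>(Data[Pos + I]) << (8 * I);
    Pos += Bytes;
    return true;
  }

  // Fills FilesByModule[M] with the source file names of module M.
  inline bool parseFileInfo(const uint8_t *Data, size_t Size,
                            std::vector<std::vector<std::string>> &FilesByModule) {
    size_t Pos = 0;
    uint32_t NumModules = 0, IgnoredNumSourceFiles = 0;
    if (!readLE(Data, Size, Pos, 2, NumModules) ||
        !readLE(Data, Size, Pos, 2, IgnoredNumSourceFiles))
      return false;

    // Skip ModIndices, which does not appear to be useful.
    if (Size - Pos < 2 * static_cast<size_t>(NumModules))
      return false;
    Pos += 2 * static_cast<size_t>(NumModules);

    // The real file count is the sum of ModFileCounts.
    std::vector<uint32_t> Counts(NumModules);
    uint64_t Total = 0;
    for (uint32_t &C : Counts) {
      if (!readLE(Data, Size, Pos, 2, C))
        return false;
      Total += C;
    }
    if ((Size - Pos) / 4 < Total)
      return false;

    size_t OffsetsPos = Pos;
    size_t NamesPos = Pos + 4 * static_cast<size_t>(Total);
    size_t NamesSize = Size - NamesPos;
    FilesByModule.assign(NumModules, std::vector<std::string>());
    for (size_t M = 0; M < NumModules; ++M) {
      for (uint32_t I = 0; I < Counts[M]; ++I) {
        uint32_t Off = 0;
        if (!readLE(Data, Size, OffsetsPos, 4, Off) || Off >= NamesSize)
          return false;
        const uint8_t *Start = Data + NamesPos + Off;
        if (!std::memchr(Start, 0, NamesSize - Off))
          return false; // Name is not null terminated.
        FilesByModule[M].emplace_back(reinterpret_cast<const char *>(Start));
      }
    }
    return true;
  }

Note that the sketch never consults ``NumSourceFiles``, in line with the advice
above that the field should be ignored.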

.. _dbi_type_server_map_substream:

Type Server Map Substream
^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream`
ends, and consumes ``Header->TypeServerMapSize`` bytes. Neither the purpose
nor the layout of this substream is understood, although it is assumed to
be related somehow to the usage of ``/Zi`` and ``mspdbsrv.exe``. This substream
will not be discussed further.

.. _dbi_ec_substream:

EC Substream
^^^^^^^^^^^^
Begins at offset ``0`` immediately after the
:ref:`dbi_type_server_map_substream` ends, and consumes
``Header->ECSubstreamSize`` bytes. This is presumed to be related to Edit &
Continue support in MSVC. LLVM does not support Edit & Continue, so this
substream will not be discussed further.

.. _dbi_optional_dbg_stream:

Optional Debug Header Stream
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
consumes ``Header->OptionalDbgHeaderSize`` bytes. This substream is an array of
stream indices (``uint16_t`` values), each of which identifies a stream
in the larger MSF file which contains some additional debug information.
Each position of this array has a special meaning, allowing one to determine
what kind of debug information is at the referenced stream. ``11`` indices
are currently understood, although it's possible there may be more. The
layout of each stream generally corresponds exactly to a particular type
of debug data directory from the PE/COFF file. The format of these fields
can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.
If any of these indices is -1 (``0xFFFF``), it means the corresponding type of
debug info is not present in the PDB.

**FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is an
array of ``FPO_DATA`` structures. This contains the relocated contents of
any ``.debug$F`` section from any of the linker inputs.

**Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.

**Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.

**Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.

**Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream
is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
is used for mapping addresses between instrumented and uninstrumented code.

**Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from
the original executable.

**Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not
understood, but it is assumed to be a mapping from ``CLR Token`` to
``CLR Record ID``. Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
for more information.

**Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the
executable.

**Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata``
section from the executable, but that would make it identical to
``DbgStreamArray[1]``. The difference between these two indices is not well
understood.

**New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a
debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. Note that this is different
from ``DbgStreamArray[0]`` in that ``.debug$F`` sections are only emitted by MASM.
Thus, it is possible for both to appear in the same PDB if both MASM object files
and cl object files are linked into the same program.

**Original Section Header Data** - ``DbgStreamArray[10]``. Similar to
``DbgStreamArray[5]``, but contains the section headers before any binary translation
has been performed. This can be used in conjunction with ``DbgStreamArray[3]``
and ``DbgStreamArray[4]`` to map instrumented and uninstrumented addresses.
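
The positional meaning of the array can be captured with an enumeration whose
values are the array slots listed above. The following sketch reads the array
and looks up one slot, treating ``0xFFFF`` and out-of-range slots as "not
present". The enumerator and function names are hypothetical, and the indices
are assumed to be stored little-endian.

.. code-block:: c++

  #include <cstddef>
  #include <cstdint>
  #include <vector>

  // One enumerator per understood slot of DbgStreamArray.
  enum class DbgHeaderType : uint16_t {
    FPO = 0,
    Exception = 1,
    Fixup = 2,
    OmapToSrc = 3,
    OmapFromSrc = 4,
    SectionHdr = 5,
    TokenRidMap = 6,
    Xdata = 7,
    Pdata = 8,
    NewFPO = 9,
    SectionHdrOrig = 10
  };

  // A stream index of -1 means the debug info is not present.
  constexpr uint16_t kInvalidStreamIndex = 0xFFFF;

  // Decodes the substream into a vector of stream indices. A trailing odd
  // byte, if any, is ignored.
  inline std::vector<uint16_t> readDbgStreamArray(const uint8_t *Data,
                                                  size_t Size) {
    std::vector<uint16_t> Result;
    for (size_t I = 0; I + 1 < Size; I += 2)
      Result.push_back(static_cast<uint16_t>(Data[I] | (Data[I + 1] << 8)));
    return Result;
  }

  // Returns true and sets StreamIndex if the requested kind of debug info is
  // present. The array may be shorter than 11 entries.
  inline bool lookupDbgStream(const std::vector<uint16_t> &Array,
                              DbgHeaderType Type, uint16_t &StreamIndex) {
    size_t Slot = static_cast<size_t>(Type);
    if (Slot >= Array.size() || Array[Slot] == kInvalidStreamIndex)
      return false;
    StreamIndex = Array[Slot];
    return true;
  }

Checking the slot against the array length matters because the substream size
comes from ``OptionalDbgHeaderSize`` and is not guaranteed to cover all slots.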