Reading dmp Files (Part 3) - Module



Overview


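Part 2 covered reading the thread stream from a dmp file. The thread information covers every thread of the process at the time of the dump, and each thread carries its thread ID, priority, call stack, registers, and other context. This information is the core input for the call stack analysis later on. This part covers the module stream.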

Module stream overview

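  • The module information records which DLLs the application (exe) had loaded at run time
  • In a dmp, the information about each DLL is called a module
  • The stream type of the module list in a dmp is MD_MODULE_LIST_STREAM, whose enum value is 4
  • Each module is represented by a MinidumpModule object
  • Each module occupies a range of memory; during dmp analysis, the thread's crash address is matched against these ranges to find the corresponding module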

Entry point for reading the module stream

  • The dump's GetModuleList method retrieves the module list from the dmp file
  • The module list that is read is stored in a MinidumpModuleList
ProcessResult MinidumpProcessor::Process(
    Minidump *dump, ProcessState *process_state) {

    ...
    // Read the module information
    MinidumpModuleList *module_list = dump->GetModuleList();
    ...
}

MinidumpModuleList

class MinidumpModuleList : public MinidumpStream,
                           public CodeModules {
 public:
    ...
 private:
  friend class Minidump;

  // Holds the full module list;
  // each module is represented by a MinidumpModule object
  typedef vector<MinidumpModule> MinidumpModules;

  // The stream type for modules is MD_MODULE_LIST_STREAM
  static const uint32_t kStreamType = MD_MODULE_LIST_STREAM;

  // Overrides Read to read the module information
  bool Read(uint32_t expected_size);

  // The largest number of modules that will be read from a minidump.  The
  // default is 1024.
  static uint32_t max_modules_;

  // Each module occupies an address range in memory; this map looks up
  // which module occupies a given address.
  // Access to modules using addresses as the key.
  RangeMap<uint64_t, unsigned int> *range_map_;

  MinidumpModules *modules_;
  uint32_t module_count_;

  DISALLOW_COPY_AND_ASSIGN(MinidumpModuleList);
};

MinidumpModule

  • The core object for each module
  • Its data is stored in an MDRawModule
  • The module name and other information are parsed from the MDRawModule
class MinidumpModule : public MinidumpObject,
                       public CodeModule {
 public:
  ...
  // Overrides Read to read the module's data
  bool Read();

  // Whether this module is valid
  bool              module_valid_;

  // Whether this module has debug information, i.e. whether its pdb file matches
  bool              has_debug_info_;

  // The core module data structure stored in the dmp, copied into this object
  MDRawModule       module_;

  // Module name, resolved through the name RVA stored in MDRawModule
  // (the offset and size of the name in the dmp file)
  const string*     name_;

  // Cached CodeView record - this is MDCVInfoPDB20 or (likely)
  // MDCVInfoPDB70, or possibly something else entirely.  Stored as a uint8_t
  // because the structure contains a variable-sized string and its exact
  // size cannot be known until it is processed.
  vector<uint8_t>* cv_record_;

  // If cv_record_ is present, cv_record_signature_ contains a copy of the
  // CodeView record's first four bytes, for ease of determining the
  // type of structure that cv_record_ contains.
  uint32_t cv_record_signature_;

  // Cached MDImageDebugMisc (usually not present), stored as uint8_t
  // because the structure contains a variable-sized string and its exact
  // size cannot be known until it is processed.
  vector<uint8_t>* misc_record_;
};

MDRawModule

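  • The core data structure holding a module's information
  • Its start address and size are matched against the thread's current position in memory to determine which module the thread crashed in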
typedef struct {

  // Start address of the module in memory
  uint64_t             base_of_image;

  // Size the module occupies in memory
  uint32_t             size_of_image;

  // Checksum
  uint32_t             checksum;         /* 0 if unknown */

  // Timestamp
  uint32_t             time_date_stamp;  /* time_t */

  // Offset (RVA) of the module name in the dmp; the MDString stored there
  // begins with the name's length
  MDRVA                module_name_rva;  /* MDString, pathname or filename */
  MDVSFixedFileInfo    version_info;

  /* The next field stores a CodeView record and is populated when a module's
   * debug information resides in a PDB file.  It identifies the PDB file. */
  MDLocationDescriptor cv_record;

  /* The next field is populated when a module's debug information resides
   * in a DBG file.  It identifies the DBG file.  This field is effectively
   * obsolete with modules built by recent toolchains. */
  MDLocationDescriptor misc_record;

  /* Alignment problem: reserved0 and reserved1 are defined by the platform
   * SDK as 64-bit quantities.  However, that results in a structure whose
   * alignment is unpredictable on different CPUs and ABIs.  If the ABI
   * specifies full alignment of 64-bit quantities in structures (as ppc
   * does), there will be padding between miscRecord and reserved0.  If
   * 64-bit quantities can be aligned on 32-bit boundaries (as on x86),
   * this padding will not exist.  (Note that the structure up to this point
   * contains 1 64-bit member followed by 21 32-bit members.)
   * As a workaround, reserved0 and reserved1 are instead defined here as
   * four 32-bit quantities.  This should be harmless, as there are
   * currently no known uses for these fields. */
  uint32_t             reserved0[2];
  uint32_t             reserved1[2];
} MDRawModule;  /* MINIDUMP_MODULE */

Core code for reading the module stream

  • Calls the generic stream-reading method, passing a MinidumpModuleList pointer as the argument
MinidumpModuleList* Minidump::GetModuleList() {
  MinidumpModuleList* module_list;
  return GetStream(&module_list);
}

The generic stream-reading method

  • GetStream is a function template; its parameter type T is the stream class to read
template<typename T>
T* Minidump::GetStream(T** stream) {
  // stream is a garbage parameter that's present only to account for C++'s
  // inability to overload a method based solely on its return type.

  // First get the stream type of T
  const uint32_t stream_type = T::kStreamType;

  BPLOG_IF(ERROR, !stream) << "Minidump::GetStream type " << stream_type <<
                              " requires |stream|";
  assert(stream);
  *stream = NULL;

  if (!valid_) {
    BPLOG(ERROR) << "Invalid Minidump for GetStream type " << stream_type;
    return NULL;
  }

  // As covered in the earlier articles, once the stream directory has been
  // read, every entry is stored in a map named stream_map_.
  // The map key is the stream type; the value holds the stream object.
  // If the requested type is not in the directory, log and return NULL;
  // otherwise take the value and continue.
  MinidumpStreamMap::iterator iterator = stream_map_->find(stream_type);
  if (iterator == stream_map_->end()) {
    // This stream type didn't exist in the directory.
    BPLOG(INFO) << "GetStream: type " << stream_type << " not present";
    return NULL;
  }

  // Get a pointer so that the stored stream field can be altered.
  MinidumpStreamInfo* info = &iterator->second;

  // If the entry's stream is non-null, this stream was already read:
  // return the result cached by the previous call.
  if (info->stream) {
    // This cast is safe because info.stream is only populated by this
    // method, and there is a direct correlation between T and stream_type.
    *stream = static_cast<T*>(info->stream);
    return *stream;
  }

  // The entry's stream is null, so it has not been read yet.
  // SeekToStreamType moves the dmp file's read pointer to the offset of this
  // stream's data and returns the stream's size in bytes through stream_length.
  uint32_t stream_length;
  if (!SeekToStreamType(stream_type, &stream_length)) {
    BPLOG(ERROR) << "GetStream could not seek to stream type " << stream_type;
    return NULL;
  }

  scoped_ptr<T> new_stream(new T(this));

  // The read pointer now sits at the stream's offset, so with the stream size
  // the stream's full data can be read from the dmp.
  // How these bytes are deserialized into a stream object is the job of the
  // Read method, covered in detail below. Read is virtual, and each stream
  // type provides its own implementation.
  if (!new_stream->Read(stream_length)) {
    BPLOG(ERROR) << "GetStream could not read stream type " << stream_type;
    return NULL;
  }

  // The stream object now holds all of this stream's data; cache it in
  // stream_map_ so later calls return it directly.
  *stream = new_stream.release();
  info->stream = *stream;
  return *stream;
}

The module Read method in detail

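  • The method above showed how the file read pointer is moved to a stream's offset according to its stream type, and how that stream's data is then read. Using the module stream as an example, this section walks through a concrete Read implementation
  • Only the core logic is shown; unimportant code has been removed
  • The method mainly calls each module's Read method and then builds an index from memory addresses to module information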
// Called once the dmp file's read pointer has been moved to the position
// given by the stream directory; reads a block of binary data of the given size
bool MinidumpModuleList::Read(uint32_t expected_size) {

  // First read 4 bytes: the number of modules in the module stream
  uint32_t module_count;
  if (!minidump_->ReadBytes(&module_count, sizeof(module_count))) {
    BPLOG(ERROR) << "MinidumpModuleList could not read module count";
    return false;
  }

  // As before, swap bytes if the dump's byte order differs from the host's
  if (minidump_->swap())
    Swap(&module_count);

  if (module_count != 0) {
    // Based on the module count, first create a list to hold all modules
    scoped_ptr<MinidumpModules> modules(
        new MinidumpModules(module_count, MinidumpModule(minidump_)));

    // Read each module in turn; each module is represented by a
    // MinidumpModule object
    for (unsigned int module_index = 0;
         module_index < module_count;
         ++module_index) {
      MinidumpModule* module = &(*modules)[module_index];

      // MinidumpModule's Read method implements the reading logic for each module
      if (!module->Read()) {
        BPLOG(ERROR) << "MinidumpModuleList could not read module " <<
                        module_index << "/" << module_count;
        return false;
      }
    }

    // Iterate over the modules just read: initialize data and build the
    // address index
    uint64_t last_end_address = 0;
    for (unsigned int module_index = 0;
         module_index < module_count;
         ++module_index) {
      MinidumpModule* module = &(*modules)[module_index];

      // Get each module's start address and size in memory
      uint64_t base_address = module->base_address();
      uint64_t module_size = module->size();
      if (base_address == static_cast<uint64_t>(-1)) {
        BPLOG(ERROR) << "MinidumpModuleList found bad base address "
                        "for module " << module_index << "/" << module_count <<
                        ", " << module->code_file();
        return false;
      }

      // Index the module by its address range, so that a thread's memory
      // address at crash time can later be matched to its module quickly
      if (!range_map_->StoreRange(base_address, module_size, module_index)) {
        // Android's shared memory implementation /dev/ashmem can contain
        // duplicate entries for JITted code, so ignore these.
        // TODO(wfh): Remove this code when Android is fixed.
        // See https://crbug.com/439531
        const string kDevAshmem("/dev/ashmem/");
        if (module->code_file().compare(
            0, kDevAshmem.length(), kDevAshmem) != 0) {
          if (base_address < last_end_address) {
            // If failed due to apparent range overlap the cause may be
            // the client correction applied for Android packed relocations.
            // If this is the case, back out the client correction and retry.
            module_size -= last_end_address - base_address;
            base_address = last_end_address;
            if (!range_map_->StoreRange(base_address,
                                        module_size, module_index)) {
              BPLOG(ERROR) << "MinidumpModuleList could not store module " <<
                              module_index << "/" << module_count << ", " <<
                              module->code_file() << ", " <<
                              HexString(base_address) << "+" <<
                              HexString(module_size) << ", after adjusting";
              return false;
            }
          } else {
            BPLOG(ERROR) << "MinidumpModuleList could not store module " <<
                            module_index << "/" << module_count << ", " <<
                            module->code_file() << ", " <<
                            HexString(base_address) << "+" <<
                            HexString(module_size);
            return false;
          }
        } else {
          BPLOG(INFO) << "MinidumpModuleList ignoring overlapping module " <<
                          module_index << "/" << module_count << ", " <<
                          module->code_file() << ", " <<
                          HexString(base_address) << "+" <<
                          HexString(module_size);
        }
      }
      last_end_address = base_address + module_size;
    }

    modules_ = modules.release();
  }

  module_count_ = module_count;

  valid_ = true;
  return true;
}

MinidumpModule's Read method

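  • MinidumpModule's Read method implements the most basic reading logic for a single module
  • After reading, each module's data is stored in MinidumpModule's MDRawModule member (module_)
  • MDRawModule was introduced earlier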
bool MinidumpModule::Read() {

  // MDRawModule has a fixed size in the dmp: MD_MODULE_SIZE (108 bytes).
  // Reading 108 bytes into MDRawModule deserializes the module's fixed-size
  // record; the variable-length data it points to (the name string, the
  // CodeView record) is read separately.
  if (!minidump_->ReadBytes(&module_, MD_MODULE_SIZE)) {
    BPLOG(ERROR) << "MinidumpModule cannot read module";
    return false;
  }

  // If the dump's byte order differs from the host's, byte-swap every
  // field of MDRawModule
  if (minidump_->swap()) {
    Swap(&module_.base_of_image);
    Swap(&module_.size_of_image);
    Swap(&module_.checksum);
    Swap(&module_.time_date_stamp);
    Swap(&module_.module_name_rva);
    Swap(&module_.version_info.signature);
    Swap(&module_.version_info.struct_version);
    Swap(&module_.version_info.file_version_hi);
    Swap(&module_.version_info.file_version_lo);
    Swap(&module_.version_info.product_version_hi);
    Swap(&module_.version_info.product_version_lo);
    Swap(&module_.version_info.file_flags_mask);
    Swap(&module_.version_info.file_flags);
    Swap(&module_.version_info.file_os);
    Swap(&module_.version_info.file_type);
    Swap(&module_.version_info.file_subtype);
    Swap(&module_.version_info.file_date_hi);
    Swap(&module_.version_info.file_date_lo);
    Swap(&module_.cv_record);
    Swap(&module_.misc_record);
    // Don't swap reserved fields because their contents are unknown (as
    // are their proper widths).
  }

  module_valid_ = true;
  return true;
}