Reading dmp Files (Part 3) - Module
Overview
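Part 2 covered reading the thread stream from a dmp file. The thread information covers every thread in the process at dump time. For each thread it records the thread ID, priority, call stack, registers and other context. This information is the core reference for the call-stack analysis that comes later. This part covers the module stream.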
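Module stream overview
- The module information records which DLLs the application (exe) had loaded at run time
- In a dmp, the information about each DLL is called a module
- The module list stream has stream type MD_MODULE_LIST_STREAM, whose enum value is 4
- Each module is represented by a MinidumpModule object
- Each module occupies a range of memory. When a dmp is analyzed, a thread's crash address is matched against these ranges to find the module it belongs to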
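To make the last point concrete, here is a minimal sketch of the address-to-module match. It assumes a hypothetical ModuleInfo struct that keeps only each module's name, start address and size, and it does a plain linear scan over half-open ranges. The real MinidumpModuleList uses a RangeMap index for this instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, trimmed-down module record with only the fields the lookup needs.
struct ModuleInfo {
  std::string name;
  uint64_t base;  // corresponds to MDRawModule::base_of_image
  uint32_t size;  // corresponds to MDRawModule::size_of_image
};

// A module occupies the half-open range [base, base + size).
// Subtracting first avoids overflow when base + size would exceed 2^64.
bool ContainsAddress(const ModuleInfo& m, uint64_t address) {
  return address >= m.base && address - m.base < m.size;
}

// Linear scan: returns the index of the module that contains `address`, if any.
std::optional<size_t> FindModuleIndex(const std::vector<ModuleInfo>& modules,
                                      uint64_t address) {
  for (size_t i = 0; i < modules.size(); ++i) {
    if (ContainsAddress(modules[i], address)) return i;
  }
  return std::nullopt;
}
```

A linear scan is fine for a handful of modules. With up to 1024 modules and many stack frames to resolve, an ordered index pays off.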
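Entry point for reading the module stream
- The dump's GetModuleList method retrieves the module list from the dmp file
- The module list that is read is stored in a MinidumpModuleList object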
```cpp
ProcessResult MinidumpProcessor::Process(
    Minidump *dump, ProcessState *process_state) {
  ...
  // Read the module information
  MinidumpModuleList *module_list = dump->GetModuleList();
  ...
}
```
MinidumpModuleList
```cpp
class MinidumpModuleList : public MinidumpStream,
                           public CodeModules {
 public:
  ...
 private:
  friend class Minidump;

  // Holds every module in the list;
  // each module is represented by a MinidumpModule object
  typedef vector<MinidumpModule> MinidumpModules;

  // The module list's stream type is MD_MODULE_LIST_STREAM
  static const uint32_t kStreamType = MD_MODULE_LIST_STREAM;

  // Overrides Read to parse the module list
  bool Read(uint32_t expected_size);

  // The largest number of modules that will be read from a minidump. The
  // default is 1024.
  static uint32_t max_modules_;

  // Each module occupies an address range in memory; this index answers
  // "which module owns this address?"
  // Access to modules using addresses as the key.
  RangeMap<uint64_t, unsigned int> *range_map_;

  MinidumpModules *modules_;
  uint32_t module_count_;

  DISALLOW_COPY_AND_ASSIGN(MinidumpModuleList);
};
```
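The range_map_ member is what turns a crash address into a module index. Below is a hedged, simplified stand-in for it. SimpleRangeMap is hypothetical and only borrows the StoreRange/RetrieveRange naming. It stores half-open address ranges in a std::map keyed by each range's last address and refuses any range that overlaps one already stored. That refusal is the failure case MinidumpModuleList::Read has to handle.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Hypothetical, simplified stand-in for RangeMap<uint64_t, unsigned int>.
class SimpleRangeMap {
 public:
  // Stores [base, base + size) -> value. Rejects empty ranges, ranges that
  // wrap past 2^64, and ranges overlapping an already-stored range.
  bool StoreRange(uint64_t base, uint64_t size, unsigned int value) {
    if (size == 0) return false;
    const uint64_t high = base + size - 1;
    if (high < base) return false;  // wrapped around
    // First stored range whose last address is >= base. Stored ranges are
    // disjoint, so only this one can overlap [base, high].
    auto it = map_.lower_bound(base);
    if (it != map_.end() && it->second.base <= high) return false;
    map_.emplace(high, Entry{base, value});
    return true;
  }

  // Returns the value of the range containing `address`, if any.
  std::optional<unsigned int> RetrieveRange(uint64_t address) const {
    auto it = map_.lower_bound(address);  // first range with high >= address
    if (it == map_.end() || it->second.base > address) return std::nullopt;
    return it->second.value;
  }

 private:
  struct Entry {
    uint64_t base;
    unsigned int value;
  };
  std::map<uint64_t, Entry> map_;  // key: last (inclusive) address of a range
};
```

Keying by the last address means a single lower_bound call finds the only candidate range, so both storing and looking up are O(log n).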
MinidumpModule
- The core object for each module
- Its raw data is stored in an MDRawModule
- The module name and other details are parsed from the MDRawModule
```cpp
class MinidumpModule : public MinidumpObject,
                       public CodeModule {
 public:
  ...
 private:
  // Reads this module's raw data; called by MinidumpModuleList::Read
  bool Read();

  // Whether this module is valid
  bool module_valid_;

  // Whether this module has debug info, i.e. whether its PDB file matches
  bool has_debug_info_;

  // The module's core data structure as stored in the dmp, copied into this object
  MDRawModule module_;

  // Module name, resolved from module_name_rva in MDRawModule (the offset of
  // the name within the dmp file; the string stored there carries its own length)
  const string* name_;

  // Cached CodeView record - this is MDCVInfoPDB20 or (likely)
  // MDCVInfoPDB70, or possibly something else entirely. Stored as a uint8_t
  // because the structure contains a variable-sized string and its exact
  // size cannot be known until it is processed.
  vector<uint8_t>* cv_record_;

  // If cv_record_ is present, cv_record_signature_ contains a copy of the
  // CodeView record's first four bytes, for ease of determining the
  // type of structure that cv_record_ contains.
  uint32_t cv_record_signature_;

  // Cached MDImageDebugMisc (usually not present), stored as uint8_t
  // because the structure contains a variable-sized string and its exact
  // size cannot be known until it is processed.
  vector<uint8_t>* misc_record_;
};
```
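The comment on cv_record_signature_ says the first four bytes of the CodeView record identify which structure follows. Here is a small sketch of that check, assuming a little-endian dump. The two constants are meant to match the "RSDS" (PDB 7.0) and "NB10" (PDB 2.0) signatures that breakpad names MD_CVINFOPDB70_SIGNATURE and MD_CVINFOPDB20_SIGNATURE. CvKind and ClassifyCvRecord are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed values: the ASCII tags read as little-endian uint32 values.
constexpr uint32_t kCvPdb70Signature = 0x53445352;  // bytes 'R','S','D','S'
constexpr uint32_t kCvPdb20Signature = 0x3031424e;  // bytes 'N','B','1','0'

enum class CvKind { kTooShort, kPdb70, kPdb20, kOther };

// Decodes the first four bytes as a little-endian uint32 (the value that
// cv_record_signature_ caches) and classifies the record by it.
CvKind ClassifyCvRecord(const std::vector<uint8_t>& cv_record) {
  if (cv_record.size() < 4) return CvKind::kTooShort;
  const uint32_t sig = static_cast<uint32_t>(cv_record[0]) |
                       static_cast<uint32_t>(cv_record[1]) << 8 |
                       static_cast<uint32_t>(cv_record[2]) << 16 |
                       static_cast<uint32_t>(cv_record[3]) << 24;
  if (sig == kCvPdb70Signature) return CvKind::kPdb70;
  if (sig == kCvPdb20Signature) return CvKind::kPdb20;
  return CvKind::kOther;
}
```

Caching the signature means this check runs once. Later code can then branch on the record type without re-decoding the bytes.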
MDRawModule
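- The core data structure describing one module
- Its start address and size are matched against a thread's current instruction address to determine which module the thread crashed in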
```cpp
typedef struct {

  // Start address of the module in memory
  uint64_t  base_of_image;

  // Size the module occupies in memory
  uint32_t  size_of_image;

  // Checksum
  uint32_t  checksum;         /* 0 if unknown */

  // Timestamp
  uint32_t  time_date_stamp;  /* time_t */

  // Offset (RVA) of the module name within the dmp file; the MDString
  // stored there holds the name's length and characters
  MDRVA     module_name_rva;  /* MDString, pathname or filename */
  MDVSFixedFileInfo version_info;

  /* The next field stores a CodeView record and is populated when a module's
   * debug information resides in a PDB file.  It identifies the PDB file. */
  MDLocationDescriptor cv_record;

  /* The next field is populated when a module's debug information resides
   * in a DBG file.  It identifies the DBG file.  This field is effectively
   * obsolete with modules built by recent toolchains. */
  MDLocationDescriptor misc_record;

  /* Alignment problem: reserved0 and reserved1 are defined by the platform
   * SDK as 64-bit quantities.  However, that results in a structure whose
   * alignment is unpredictable on different CPUs and ABIs.  If the ABI
   * specifies full alignment of 64-bit quantities in structures (as ppc
   * does), there will be padding between miscRecord and reserved0.  If
   * 64-bit quantities can be aligned on 32-bit boundaries (as on x86),
   * this padding will not exist.  (Note that the structure up to this point
   * contains 1 64-bit member followed by 21 32-bit members.)
   * As a workaround, reserved0 and reserved1 are instead defined here as
   * four 32-bit quantities.  This should be harmless, as there are
   * currently no known uses for these fields. */
  uint32_t  reserved0[2];
  uint32_t  reserved1[2];
} MDRawModule;  /* MINIDUMP_MODULE */
```
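The alignment comment also explains why the module's Read reads MD_MODULE_SIZE (108) bytes rather than sizeof(MDRawModule). A struct that begins with a uint64_t can be tail-padded to a multiple of 8 bytes (112 instead of 108) on ABIs that align 64-bit members to 8. One way to avoid layout questions entirely is to decode each field from the raw bytes at its fixed offset. This is shown as a sketch, not as breakpad's approach. ModuleRecord, ReadU32LE, ReadU64LE and ParseModuleRecord are hypothetical, version_info is skipped, and the record is assumed to be little-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// On-disk size of one MDRawModule record (breakpad's MD_MODULE_SIZE).
constexpr size_t kModuleRecordSize = 108;

// Hypothetical subset of MDRawModule, decoded field by field.
struct ModuleRecord {
  uint64_t base_of_image;
  uint32_t size_of_image;
  uint32_t checksum;
  uint32_t time_date_stamp;
  uint32_t module_name_rva;
  uint32_t cv_record_size, cv_record_rva;      // MDLocationDescriptor
  uint32_t misc_record_size, misc_record_rva;  // MDLocationDescriptor
};

static uint32_t ReadU32LE(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | static_cast<uint32_t>(p[1]) << 8 |
         static_cast<uint32_t>(p[2]) << 16 | static_cast<uint32_t>(p[3]) << 24;
}

static uint64_t ReadU64LE(const uint8_t* p) {
  return static_cast<uint64_t>(ReadU32LE(p)) |
         static_cast<uint64_t>(ReadU32LE(p + 4)) << 32;
}

// Layout: 8 + 4 + 4 + 4 + 4 = 24 bytes, a 52-byte MDVSFixedFileInfo
// (13 x uint32), two 8-byte location descriptors, 16 reserved bytes:
// 24 + 52 + 16 + 16 = 108.
ModuleRecord ParseModuleRecord(const uint8_t* rec) {
  ModuleRecord m;
  m.base_of_image = ReadU64LE(rec + 0);
  m.size_of_image = ReadU32LE(rec + 8);
  m.checksum = ReadU32LE(rec + 12);
  m.time_date_stamp = ReadU32LE(rec + 16);
  m.module_name_rva = ReadU32LE(rec + 20);
  // rec + 24 .. rec + 75 holds version_info, skipped in this sketch.
  m.cv_record_size = ReadU32LE(rec + 76);
  m.cv_record_rva = ReadU32LE(rec + 80);
  m.misc_record_size = ReadU32LE(rec + 84);
  m.misc_record_rva = ReadU32LE(rec + 88);
  // rec + 92 .. rec + 107: reserved0/reserved1, ignored.
  return m;
}
```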
Core code for reading the module stream
- GetModuleList calls the generic stream-reading method GetStream, passing the address of a MinidumpModuleList pointer
```cpp
MinidumpModuleList* Minidump::GetModuleList() {
  MinidumpModuleList* module_list;
  return GetStream(&module_list);
}
```
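The generic stream-reading method
- GetStream is a function template: T is deduced from the stream argument, and T::kStreamType selects which stream to read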
```cpp
template<typename T>
T* Minidump::GetStream(T** stream) {
  // stream is a garbage parameter that's present only to account for C++'s
  // inability to overload a method based solely on its return type.

  // First, get the stream type associated with T
  const uint32_t stream_type = T::kStreamType;

  BPLOG_IF(ERROR, !stream) << "Minidump::GetStream type " << stream_type <<
                              " requires |stream|";
  assert(stream);
  *stream = NULL;

  if (!valid_) {
    BPLOG(ERROR) << "Invalid Minidump for GetStream type " << stream_type;
    return NULL;
  }

  // As covered in an earlier part, once the stream directory has been read,
  // every entry is stored in a map named stream_map_.
  // The key is the stream type; the value is the MinidumpStreamInfo for that stream.
  // If the requested type is not in the map, log and return NULL;
  // otherwise continue with the entry that was found.
  MinidumpStreamMap::iterator iterator = stream_map_->find(stream_type);
  if (iterator == stream_map_->end()) {
    // This stream type didn't exist in the directory.
    BPLOG(INFO) << "GetStream: type " << stream_type << " not present";
    return NULL;
  }

  // Get a pointer so that the stored stream field can be altered.
  MinidumpStreamInfo* info = &iterator->second;

  // If the entry's stream pointer is already set, this stream was parsed
  // before, so return the cached result
  if (info->stream) {
    // This cast is safe because info.stream is only populated by this
    // method, and there is a direct correlation between T and stream_type.
    *stream = static_cast<T*>(info->stream);
    return *stream;
  }

  // Otherwise the stream has not been read yet. SeekToStreamType moves the
  // file pointer to this stream's offset in the dmp file and returns the
  // stream's size in bytes
  uint32_t stream_length;
  if (!SeekToStreamType(stream_type, &stream_length)) {
    BPLOG(ERROR) << "GetStream could not seek to stream type " << stream_type;
    return NULL;
  }

  scoped_ptr<T> new_stream(new T(this));

  // The file pointer now sits at the start of the stream, so the stream size
  // is all that is needed to read its full data. How those bytes are
  // deserialized into a stream object is covered in the Read sections later.
  // As mentioned before, Read is virtual: each stream type implements its own
  if (!new_stream->Read(stream_length)) {
    BPLOG(ERROR) << "GetStream could not read stream type " << stream_type;
    return NULL;
  }

  // Parsing succeeded: hand the stream object to the caller and cache it in
  // stream_map_ so later calls return it directly
  *stream = new_stream.release();
  info->stream = *stream;
  return *stream;
}
```
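GetStream combines three ideas. It dispatches on a static kStreamType member of the template parameter, it looks the type up in the stream directory, and it caches the parsed object so Read runs at most once per stream. The sketch below isolates that pattern with hypothetical types (StreamBase, FakeModuleList, StreamCache). A std::map from stream type to size stands in for the directory and for SeekToStreamType, and no file I/O is done.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical base class standing in for MinidumpStream.
struct StreamBase {
  virtual ~StreamBase() = default;
  virtual bool Read(uint32_t expected_size) = 0;
};

// Hypothetical stream type; 4 is the MD_MODULE_LIST_STREAM value.
struct FakeModuleList : StreamBase {
  static const uint32_t kStreamType = 4;
  int read_calls = 0;
  bool Read(uint32_t) override {
    ++read_calls;
    return true;
  }
};

class StreamCache {
 public:
  // directory: stream type -> stream size in bytes.
  explicit StreamCache(std::map<uint32_t, uint32_t> directory)
      : directory_(std::move(directory)) {}

  template <typename T>
  T* Get() {
    const uint32_t type = T::kStreamType;  // dispatch on the static member
    auto dir = directory_.find(type);
    if (dir == directory_.end()) return nullptr;  // stream not present
    auto cached = cache_.find(type);
    if (cached != cache_.end())  // parsed earlier: reuse it
      return static_cast<T*>(cached->second.get());
    auto fresh = std::make_unique<T>();
    if (!fresh->Read(dir->second)) return nullptr;  // parse failed, nothing cached
    T* raw = fresh.get();
    cache_[type] = std::move(fresh);
    return raw;
  }

 private:
  std::map<uint32_t, uint32_t> directory_;
  std::map<uint32_t, std::unique_ptr<StreamBase>> cache_;
};
```

Calling Get<FakeModuleList>() twice returns the same pointer and invokes Read only once. This mirrors the info->stream check in GetStream.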
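MinidumpModuleList::Read in detail
- The previous method showed how GetStream uses the stream type to move the file pointer to an offset and read a stream of the given size. Here the module stream serves as an example of a concrete Read implementation
- Only the core logic is shown; less important code has been removed
- Read mainly calls each module's Read method and then builds an index from memory addresses to modules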
```cpp
// By the time this runs, GetStream has used the stream directory to move the
// file pointer to the start of this stream; this method parses the
// expected_size bytes found there
bool MinidumpModuleList::Read(uint32_t expected_size) {

  // The first 4 bytes give the number of modules in the stream
  uint32_t module_count;
  if (!minidump_->ReadBytes(&module_count, sizeof(module_count))) {
    BPLOG(ERROR) << "MinidumpModuleList could not read module count";
    return false;
  }

  // As with other streams, byte-swap if the dump's endianness differs from the host's
  if (minidump_->swap())
    Swap(&module_count);

  if (module_count != 0) {
    // Create a vector sized to the module count to hold every module
    scoped_ptr<MinidumpModules> modules(
        new MinidumpModules(module_count, MinidumpModule(minidump_)));

    // Read the modules one at a time;
    // each module is represented by a MinidumpModule object
    for (unsigned int module_index = 0;
         module_index < module_count;
         ++module_index) {
      MinidumpModule* module = &(*modules)[module_index];

      // MinidumpModule::Read implements the per-module parsing
      if (!module->Read()) {
        BPLOG(ERROR) << "MinidumpModuleList could not read module " <<
                        module_index << "/" << module_count;
        return false;
      }
    }

    // Walk the modules that were read, finish initializing them and build the address index
    uint64_t last_end_address = 0;
    for (unsigned int module_index = 0;
         module_index < module_count;
         ++module_index) {
      MinidumpModule* module = &(*modules)[module_index];

      // Get the module's start address and size in memory
      uint64_t base_address = module->base_address();
      uint64_t module_size = module->size();
      if (base_address == static_cast<uint64_t>(-1)) {
        BPLOG(ERROR) << "MinidumpModuleList found bad base address "
                        "for module " << module_index << "/" << module_count <<
                        ", " << module->code_file();
        return false;
      }

      // Index the module by its address range so that a thread's crash
      // address can later be matched to a module quickly
      if (!range_map_->StoreRange(base_address, module_size, module_index)) {
        // Android's shared memory implementation /dev/ashmem can contain
        // duplicate entries for JITted code, so ignore these.
        // TODO(wfh): Remove this code when Android is fixed.
        // See https://crbug.com/439531
        const string kDevAshmem("/dev/ashmem/");
        if (module->code_file().compare(
                0, kDevAshmem.length(), kDevAshmem) != 0) {
          if (base_address < last_end_address) {
            // If failed due to apparent range overlap the cause may be
            // the client correction applied for Android packed relocations.
            // If this is the case, back out the client correction and retry.
            module_size -= last_end_address - base_address;
            base_address = last_end_address;
            if (!range_map_->StoreRange(base_address,
                                        module_size, module_index)) {
              BPLOG(ERROR) << "MinidumpModuleList could not store module " <<
                              module_index << "/" << module_count << ", " <<
                              module->code_file() << ", " <<
                              HexString(base_address) << "+" <<
                              HexString(module_size) << ", after adjusting";
              return false;
            }
          } else {
            BPLOG(ERROR) << "MinidumpModuleList could not store module " <<
                            module_index << "/" << module_count << ", " <<
                            module->code_file() << ", " <<
                            HexString(base_address) << "+" <<
                            HexString(module_size);
            return false;
          }
        } else {
          BPLOG(INFO) << "MinidumpModuleList ignoring overlapping module " <<
                         module_index << "/" << module_count << ", " <<
                         module->code_file() << ", " <<
                         HexString(base_address) << "+" <<
                         HexString(module_size);
        }
      }
      last_end_address = base_address + module_size;
    }

    modules_ = modules.release();
  }

  module_count_ = module_count;

  valid_ = true;
  return true;
}
```
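The stream layout that MinidumpModuleList::Read walks is simple: a 4-byte module count followed by that many fixed-size MDRawModule records. Below is a hedged sketch that parses such a buffer from memory instead of through minidump_->ReadBytes, assuming little-endian data. ParseModuleList and ModuleRange are hypothetical. The bounds check that every record fits in the buffer is an addition of this sketch, since the excerpt above does not show one. The all-ones base-address check mirrors the one in the excerpt.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr size_t kModuleRecordSize = 108;  // MD_MODULE_SIZE

// Hypothetical: only the address range of each module is kept.
struct ModuleRange {
  uint64_t base;
  uint32_t size;
};

static uint32_t ReadU32LE(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | static_cast<uint32_t>(p[1]) << 8 |
         static_cast<uint32_t>(p[2]) << 16 | static_cast<uint32_t>(p[3]) << 24;
}

// Parses a uint32 count followed by `count` 108-byte module records.
std::optional<std::vector<ModuleRange>> ParseModuleList(
    const std::vector<uint8_t>& stream) {
  if (stream.size() < 4) return std::nullopt;
  const uint32_t count = ReadU32LE(stream.data());
  // Every record must fit in the buffer (division avoids overflow).
  if ((stream.size() - 4) / kModuleRecordSize < count) return std::nullopt;
  std::vector<ModuleRange> modules;
  modules.reserve(count);
  for (uint32_t i = 0; i < count; ++i) {
    const uint8_t* rec = stream.data() + 4 + i * kModuleRecordSize;
    const uint64_t base = static_cast<uint64_t>(ReadU32LE(rec)) |
                          static_cast<uint64_t>(ReadU32LE(rec + 4)) << 32;
    if (base == UINT64_MAX) return std::nullopt;  // bad base address
    modules.push_back({base, ReadU32LE(rec + 8)});  // size_of_image at +8
  }
  return modules;
}
```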
MinidumpModule::Read
- MinidumpModule::Read implements the basic reading logic for a single module
- Each module's raw data is stored in the MinidumpModule's MDRawModule member (module_)
- MDRawModule was introduced above
```cpp
bool MinidumpModule::Read() {

  // MDRawModule has a fixed size in the dmp file, MD_MODULE_SIZE (108 bytes),
  // so deserializing a module is a single 108-byte read into MDRawModule
  if (!minidump_->ReadBytes(&module_, MD_MODULE_SIZE)) {
    BPLOG(ERROR) << "MinidumpModule cannot read module";
    return false;
  }

  // If the dump's byte order differs from the host's, swap the bytes of
  // every field in MDRawModule
  if (minidump_->swap()) {
    Swap(&module_.base_of_image);
    Swap(&module_.size_of_image);
    Swap(&module_.checksum);
    Swap(&module_.time_date_stamp);
    Swap(&module_.module_name_rva);
    Swap(&module_.version_info.signature);
    Swap(&module_.version_info.struct_version);
    Swap(&module_.version_info.file_version_hi);
    Swap(&module_.version_info.file_version_lo);
    Swap(&module_.version_info.product_version_hi);
    Swap(&module_.version_info.product_version_lo);
    Swap(&module_.version_info.file_flags_mask);
    Swap(&module_.version_info.file_flags);
    Swap(&module_.version_info.file_os);
    Swap(&module_.version_info.file_type);
    Swap(&module_.version_info.file_subtype);
    Swap(&module_.version_info.file_date_hi);
    Swap(&module_.version_info.file_date_lo);
    Swap(&module_.cv_record);
    Swap(&module_.misc_record);
    // Don't swap reserved fields because their contents are unknown (as
    // are their proper widths).
  }

  module_valid_ = true;
  return true;
}
```
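The Swap calls above reverse byte order field by field. Here is a minimal sketch of what such helpers do, assuming fixed-width unsigned integers. SwapBytes and LocationDescriptor are hypothetical names, not breakpad's own Swap overloads. The sketch also shows why a composite such as MDLocationDescriptor is swapped member by member: reversing the whole 8-byte struct would also exchange its two fields. The same per-field rule explains why the reserved fields are left alone. Without knowing their widths, there is no correct way to swap them.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the byte order of a 32-bit value in place.
inline void SwapBytes(uint32_t* v) {
  const uint32_t x = *v;
  *v = (x >> 24) | ((x >> 8) & 0x0000ff00u) | ((x << 8) & 0x00ff0000u) |
       (x << 24);
}

// Reverse a 64-bit value: swap each 32-bit half, then exchange the halves.
inline void SwapBytes(uint64_t* v) {
  uint32_t lo = static_cast<uint32_t>(*v);
  uint32_t hi = static_cast<uint32_t>(*v >> 32);
  SwapBytes(&lo);
  SwapBytes(&hi);
  *v = (static_cast<uint64_t>(lo) << 32) | hi;
}

// Hypothetical mirror of MDLocationDescriptor (data_size, then rva).
struct LocationDescriptor {
  uint32_t data_size;
  uint32_t rva;
};

// Composite: swap each member individually, keeping member order intact.
inline void SwapBytes(LocationDescriptor* d) {
  SwapBytes(&d->data_size);
  SwapBytes(&d->rva);
}
```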