Linux CoreDump Mechanism Explained - IT分享知识网

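coredump is mainly used to debug NE (native exception) problems. When a user process hits a native crash, the tombstone captures a basic backtrace. For hard problems such as invalid memory accesses or memory corruption, however, the tombstone does not contain enough information to pinpoint the cause, and a coredump is needed to analyze them.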

2.3 When is a coredump triggered?

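By exception type: a coredump is triggered when a native process performs an out-of-bounds memory access, overflows its stack, dereferences an invalid pointer, and so on (the kernel delivers these faults to the process as signals such as SIGSEGV or SIGBUS).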

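By signal type: a coredump is triggered when a native process receives a signal whose default action is to dump core, such as SIGQUIT, SIGABRT, SIGSEGV or SIGTRAP (others include SIGBUS, SIGFPE, SIGILL and SIGSYS), unless a handler catches the signal and does not re-raise it. A core file is only written if the core size limit and core_pattern allow it (see Chapter 3).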
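To check how these signals actually terminate a process on a given system, the following minimal C sketch may help. It assumes Linux with glibc, and the helper name `terminate_child_with` is hypothetical. It forks a child, lets the child raise the signal under the default disposition, and inspects the wait status with `WTERMSIG` and `WCOREDUMP`. `WCOREDUMP` is a widely available extension rather than strict POSIX, and it only reports true if a core was actually produced.

```c
#define _DEFAULT_SOURCE          /* exposes WCOREDUMP() in glibc */
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that raises `sig` with the default
 * disposition, wait for it, and return the signal that terminated it
 * (-1 if it exited normally or on error). *core_dumped reflects
 * WCOREDUMP(), which also depends on RLIMIT_CORE and core_pattern. */
static int terminate_child_with(int sig, bool *core_dumped)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(sig, SIG_DFL);  /* make sure no handler intercepts it */
        raise(sig);
        _exit(0);              /* reached only if the default action is not fatal */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFSIGNALED(status))
        return -1;
    *core_dumped = WCOREDUMP(status) != 0;
    return WTERMSIG(status);
}
```

For comparison, `terminate_child_with(SIGTERM, &dumped)` returns SIGTERM with `dumped` set to false, because the default action of SIGTERM terminates the process without a core dump.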

3. How to use coredump

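On Android, coredump is disabled by default and must be enabled manually or in code. Once it is enabled, a core file (in ELF format) is generated under the configured path whenever a process exits abnormally. The core file can then be analyzed with gdb; see the demo in Chapter 5.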

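There are two ways to enable coredump. The first sets the core size and the core file path. The second pipes the core data to a helper program (a core_pattern that starts with `|`).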

3.1 Option 1: enable coredump by setting the core size and core file path

3.1.1 Steps

1) Set the core size

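The core size can be set from the shell with ulimit. Note that ulimit only affects the current shell and the processes started from it, not the whole system: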

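```bash
# 1) Check whether coredump is enabled
ulimit -c              # prints 0 if coredump is disabled

# 2) Enable coredump
ulimit -c 1024         # limit core files to 1024 blocks (not bytes; the block size
                       # depends on the shell, e.g. 1024 bytes in bash)
# or
ulimit -c unlimited    # no size limit
```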
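In C, `ulimit -c` corresponds to `getrlimit`/`setrlimit` with `RLIMIT_CORE`. The minimal sketch below raises the calling process's soft limit up to its hard limit, and children created afterwards inherit the new value. The function name `enable_core_dumps_for_self` is hypothetical. An unprivileged process cannot raise the hard limit itself, so this matches `ulimit -c unlimited` only when the hard limit is already unlimited.

```c
#define _XOPEN_SOURCE 700
#include <sys/resource.h>

/* Hypothetical helper: set the soft RLIMIT_CORE of the calling process
 * to its hard limit. Returns 0 on success, -1 on failure. */
static int enable_core_dumps_for_self(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;   /* soft limit up to the hard ceiling */
    return setrlimit(RLIMIT_CORE, &rl);
}
```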

The core size can also be set for an individual process in code, as follows:

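```c
#define _GNU_SOURCE            /* prlimit64() and struct rlimit64 in glibc */
#include <stdint.h>
#include <sys/resource.h>
#include <sys/types.h>

/* Set both the soft and hard RLIMIT_CORE of process `pid` to `size` bytes.
 * Returns 0 on success, or -1 with errno set on failure (for example EPERM
 * without sufficient privileges, or when SELinux denies setrlimit). */
int coreSetLimit(pid_t pid, uint64_t size)
{
    struct rlimit64 rlim64;
    rlim64.rlim_cur = size;
    rlim64.rlim_max = size;
    return prlimit64(pid, RLIMIT_CORE, &rlim64, NULL);
}
```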

2) Set the path where core files are generated

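```bash
# If no path is set, the default pattern "core" applies: the core file is
# written to the crashing process's current working directory.
# %e = executable name, %p = PID, %t = time of the dump
echo "/data/corefile/core-%e-%p-%t" > /proc/sys/kernel/core_pattern
```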
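The kernel expands the `%` specifiers in the pattern when the dump happens. `%e` is the process's comm value (normally the executable name, truncated to 15 characters), `%p` is the PID, and `%t` is the dump time in seconds since the Epoch. The C sketch below is a hypothetical, simplified re-implementation that handles only these three plus `%%`. It only illustrates how a pattern becomes a file name; the real kernel supports many more specifiers.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical, simplified expansion of a core_pattern template.
 * Supports %e, %p, %t and %%. As in the kernel's format_corename(), an
 * unknown specifier is dropped and a trailing lone % is ignored.
 * Returns 0 on success, -1 if `out` is too small. */
static int expand_core_pattern(const char *pat, const char *exe, pid_t pid,
                               time_t t, char *out, size_t outsz)
{
    size_t used = 0;
    if (outsz == 0)
        return -1;
    for (; *pat; pat++) {
        char tmp[32];
        const char *piece = tmp;
        if (*pat != '%') {
            tmp[0] = *pat;
            tmp[1] = '\0';
        } else {
            switch (*++pat) {
            case '\0': goto done;                 /* lone % at the end */
            case '%': strcpy(tmp, "%"); break;
            case 'e': piece = exe; break;
            case 'p': snprintf(tmp, sizeof(tmp), "%d", (int)pid); break;
            case 't': snprintf(tmp, sizeof(tmp), "%lld", (long long)t); break;
            default: tmp[0] = '\0'; break;        /* unknown: drop it */
            }
        }
        size_t len = strlen(piece);
        if (used + len >= outsz)                  /* keep room for '\0' */
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
done:
    out[used] = '\0';
    return 0;
}
```

For example, expanding `/data/corefile/core-%e-%p-%t` with exe `app`, PID 123 and time 1700000000 yields `/data/corefile/core-app-123-1700000000`.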

3.1.2 Drawbacks

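1) Setting the core size of each process requires the SELinux setrlimit permission. With so many processes in the system, granting it to every process is impractical, and some domains have a neverallow rule for setrlimit.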

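2) Even if a process's core size is set successfully, the process still needs SELinux permissions and file read/write permissions for the core file directory (/data/xxx). Configuring these for every process is impractical and easy to get wrong, and for some processes these permissions are also neverallow.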

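Option 2 avoids these SELinux problems. When piping, the kernel ignores the crashing process's RLIMIT_CORE unless it is 1 (see do_coredump below). In addition, the core data is written by a single helper process started by the kernel, so only that helper needs permissions on the core directory.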

3.2 Option 2: enable coredump through a pipe

3.2.1 Steps

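1) Set CONFIG_STATIC_USERMODEHELPER_PATH in the kernel configuration. When CONFIG_STATIC_USERMODEHELPER is enabled, every user-mode helper runs from this fixed path. An empty path disables user-mode helpers, and with them pipe core dumps, as the kernel code in 3.2.3 shows.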

2) Implement the core helper program core_bin in user space

3) Configure user space (Android init.rc commands):

```
mkdir /data/xxx/coredump 0777 root root
chmod 0777 /data/xxx/coredump
restorecon /data/xxx/coredump
write /proc/sys/kernel/core_pattern "|/system/bin/core_bin %e %p"
```

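Writing a pipe command to /proc/sys/kernel/core_pattern registers the user-space helper core_bin. When a process crashes, the Linux coredump code starts the helper and writes the core data into a pipe connected to the helper's standard input. The helper reads the data from the pipe and saves it as a core file in the /data/xxx/coredump directory.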
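The same proc file shows whether the system is currently in file mode (Option 1) or pipe mode (Option 2), because the kernel treats a pattern that starts with `|` as a pipe. A minimal C sketch follows; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read /proc/sys/kernel/core_pattern into buf with
 * the trailing newline removed. Returns 0 on success, -1 on failure. */
static int read_core_pattern(char *buf, size_t size)
{
    FILE *f = fopen("/proc/sys/kernel/core_pattern", "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)size, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* A leading '|' makes the kernel pipe the core to the helper named by the
 * rest of the pattern, instead of writing a file. */
static bool core_pattern_is_pipe(const char *pattern)
{
    return pattern[0] == '|';
}
```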

3.2.2 Basic workflow

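1) When a process crashes, the kernel sends it an exception signal. The Linux coredump code handles the signal, creates a pipe, and launches the user-space helper core_bin via exec.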

2) The kernel collects the coredump data and writes it into the pipe. core_bin reads the data from the pipe and writes it to the target file.

3.2.3 How the kernel sets up and runs the user-space helper

do_coredump

Main role of do_coredump: if core_pattern specifies a pipe, it sets up the pipe and starts the user-mode helper process that receives the core dump data.

```c
// kernel/fs/coredump.c
void do_coredump(const kernel_siginfo_t *siginfo)
{
	struct core_state core_state;
	struct core_name cn;
	struct mm_struct *mm = current->mm;
	struct linux_binfmt * binfmt;
	const struct cred *old_cred;
	struct cred *cred;
	int retval = 0;
	int ispipe;
	size_t *argv = NULL;
	int argc = 0;
	/* require nonrelative corefile path and be extra careful */
	bool need_suid_safe = false;
	bool core_dumped = false;
	static atomic_t core_dump_count = ATOMIC_INIT(0);
	struct coredump_params cprm = {
		.siginfo = siginfo,
		.regs = signal_pt_regs(),
		.limit = rlimit(RLIMIT_CORE),
		/*
		 * We must use the same mm->flags while dumping core to avoid
		 * inconsistency of bit flags, since this flag is not protected
		 * by any locks.
		 */
		.mm_flags = mm->flags,
		.vma_meta = NULL,
	};

	audit_core_dumps(siginfo->si_signo);

	binfmt = mm->binfmt;
	if (!binfmt || !binfmt->core_dump)
		goto fail;
	if (!__get_dumpable(cprm.mm_flags))
		goto fail;

	cred = prepare_creds();
	if (!cred)
		goto fail;
	/*
	 * We cannot trust fsuid as being the "true" uid of the process
	 * nor do we know its entire history. We only know it was tainted
	 * so we dump it as root in mode 2, and only into a controlled
	 * environment (pipe handler or fully qualified path).
	 */
	if (__get_dumpable(cprm.mm_flags) == SUID_DUMP_ROOT) {
		/* Setuid core dump mode */
		cred->fsuid = GLOBAL_ROOT_UID;	/* Dump root private */
		need_suid_safe = true;
	}

	retval = coredump_wait(siginfo->si_signo, &core_state);
	if (retval < 0)
		goto fail_creds;

	old_cred = override_creds(cred);

	// 1. Build the core name and find out whether core_pattern is a pipe
	ispipe = format_corename(&cn, &cprm, &argv, &argc);

	/*
	 * 2. Pipe dump: set up the pipe and start the user-mode helper.
	 *    File dump: open the core file and write to it.
	 */
	if (ispipe) {
		int argi;
		int dump_count;
		char **helper_argv;
		struct subprocess_info *sub_info;

		if (ispipe < 0) {
			printk(KERN_WARNING "format_corename failed\n");
			printk(KERN_WARNING "Aborting core\n");
			goto fail_unlock;
		}

		if (cprm.limit == 1) {
			printk(KERN_WARNING
				"Process %d(%s) has RLIMIT_CORE set to 1\n",
				task_tgid_vnr(current), current->comm);
			printk(KERN_WARNING "Aborting core\n");
			goto fail_unlock;
		}
		cprm.limit = RLIM_INFINITY;

		dump_count = atomic_inc_return(&core_dump_count);
		if (core_pipe_limit && (core_pipe_limit < dump_count)) {
			printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
			       task_tgid_vnr(current), current->comm);
			printk(KERN_WARNING "Skipping core dump\n");
			goto fail_dropcount;
		}

		helper_argv = kmalloc_array(argc + 1, sizeof(*helper_argv),
					    GFP_KERNEL);
		if (!helper_argv) {
			printk(KERN_WARNING "%s failed to allocate memory\n",
			       __func__);
			goto fail_dropcount;
		}
		for (argi = 0; argi < argc; argi++)
			helper_argv[argi] = cn.corename + argv[argi];
		helper_argv[argi] = NULL;

		retval = -ENOMEM;
		// 2.1 Set up the user-mode helper
		sub_info = call_usermodehelper_setup(helper_argv[0],
						helper_argv, NULL, GFP_KERNEL,
						umh_pipe_setup, NULL, &cprm);
		// 2.2 Run the helper and wait until it has been exec'd
		if (sub_info)
			retval = call_usermodehelper_exec(sub_info,
							  UMH_WAIT_EXEC);

		kfree(helper_argv);
		if (retval) {
			printk(KERN_INFO "Core dump to |%s pipe failed\n",
			       cn.corename);
			goto close_fail;
		}
	} else {
		// File dump
		...
	}
	...
	// 3. If the dump has not been interrupted, write the core data
	if (!dump_interrupted()) {
		/*
		 * umh disabled with CONFIG_STATIC_USERMODEHELPER_PATH="" would
		 * have this set to NULL.
		 */
		if (!cprm.file) {
			pr_info("Core dump to |%s disabled\n", cn.corename);
			goto close_fail;
		}
		if (!dump_vma_snapshot(&cprm))
			goto close_fail;

		file_start_write(cprm.file);
		core_dumped = binfmt->core_dump(&cprm);
		/*
		 * Ensures that file size is big enough to contain the current
		 * file postion. This prevents gdb from complaining about
		 * a truncated file if the last "write" to the file was
		 * dump_skip.
		 */
		if (cprm.to_skip) {
			cprm.to_skip--;
			dump_emit(&cprm, "", 1);
		}
		file_end_write(cprm.file);
		free_vma_snapshot(&cprm);
	}
	// 4. Clean up: wait for the helper (only if core_pipe_limit is set),
	//    close the file, drop the dump count, free memory, finish the dump
	if (ispipe && core_pipe_limit)
		wait_for_dump_helpers(cprm.file);
close_fail:
	if (cprm.file)
		filp_close(cprm.file, NULL);
fail_dropcount:
	if (ispipe)
		atomic_dec(&core_dump_count);
fail_unlock:
	kfree(argv);
	kfree(cn.corename);
	coredump_finish(core_dumped);
	revert_creds(old_cred);
fail_creds:
	put_cred(cred);
fail:
	return;
}

static void wait_for_dump_helpers(struct file *file)
{
	// 1. Get the pipe behind the write end used for the dump
	struct pipe_inode_info *pipe = file->private_data;

	// 2. Lock the pipe so no one else changes its state meanwhile
	pipe_lock(pipe);
	// 3. Count the dumping task as an extra reader and drop its writer
	//    reference: with no writers left, the helper's read() returns EOF
	//    once it has drained the pipe
	pipe->readers++;
	pipe->writers--;
	// 4. Wake up everyone waiting on the pipe's read wait queue
	wake_up_interruptible_sync(&pipe->rd_wait);
	// 5. Send SIGIO to readers registered for async notification
	kill_fasync(&pipe->fasync_readers, SIGIO, POLL_IN);
	// 6. Unlock the pipe
	pipe_unlock(pipe);

	/*
	 * We actually want wait_event_freezable() but then we need
	 * to clear TIF_SIGPENDING and improve dump_interrupted().
	 */
	// 7. Sleep (interruptibly) until the reader count drops back to 1,
	//    i.e. the helper has closed its read end, normally by exiting
	wait_event_interruptible(pipe->rd_wait, pipe->readers == 1);

	// 8. Lock the pipe again for the final update
	pipe_lock(pipe);
	// 9. Restore the original reader/writer counts before the file is closed
	pipe->readers--;
	pipe->writers++;
	// 10. Unlock the pipe
	pipe_unlock(pipe);
}

// Set up the pipe (runs as the init callback of the helper process)
static int umh_pipe_setup(struct subprocess_info *info, struct cred *new)
{
	struct file *files[2];
	struct coredump_params *cp = (struct coredump_params *)info->data;
	// 1. Create a pipe: files[0] is the read end, files[1] the write end
	int err = create_pipe_files(files, 0);
	if (err)
		return err;

	// 2. The write end becomes cp->file; the core data is written through it
	cp->file = files[1];

	// 3. Replace the helper's standard input (fd 0) with the read end.
	//    replace_fd() installs the file, fput() drops our own reference
	err = replace_fd(0, files[0], 0);
	fput(files[0]);
	/* and disallow core files too */
	// 4. Set the helper's RLIMIT_CORE to 1 to prevent recursive core dumps
	current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1};

	return err;
}
```
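The `core_pipe_limit` check in do_coredump above caps how many crashing processes may be piped to user space in parallel. The value comes from /proc/sys/kernel/core_pipe_limit, where 0 means unlimited. When the value is non-zero, the kernel also waits in wait_for_dump_helpers() until the helper is done. The counting logic can be mirrored in user space with C11 atomics. This is only an illustrative sketch, and the function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared counter of pipe dumps in progress (core_dump_count in the kernel). */
static atomic_int core_dump_count = 0;

/* Hypothetical: called when a pipe dump starts. Returns false if the dump
 * must be skipped because a non-zero limit would be exceeded; in that case
 * the counter is dropped again, like the kernel's fail_dropcount path. */
static bool pipe_dump_begin(int core_pipe_limit)
{
    int dump_count = atomic_fetch_add(&core_dump_count, 1) + 1; /* like atomic_inc_return() */
    if (core_pipe_limit && core_pipe_limit < dump_count) {
        atomic_fetch_sub(&core_dump_count, 1);
        return false;
    }
    return true;
}

/* Hypothetical: called once the dump (and helper) has finished. */
static void pipe_dump_end(void)
{
    atomic_fetch_sub(&core_dump_count, 1);
}
```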

format_corename

format_corename generates the core file name from the core_pattern template and handles pipe mode. The code is as follows:

```c
static int format_corename(struct core_name *cn, struct coredump_params *cprm,
			   size_t **argv, int *argc)
{
	const struct cred *cred = current_cred();
	const char *pat_ptr = core_pattern;
	int ispipe = (*pat_ptr == '|');
	bool was_space = false;
	int pid_in_pattern = 0;
	int err = 0;

	cn->used = 0;
	cn->corename = NULL;
	if (expand_corename(cn, core_name_size))
		return -ENOMEM;
	cn->corename[0] = '\0';

	// 1. If the pattern starts with '|', allocate and initialize the array
	//    that holds the offsets of the helper's arguments
	if (ispipe) {
		int argvs = sizeof(core_pattern) / 2;
		(*argv) = kmalloc_array(argvs, sizeof(**argv), GFP_KERNEL);
		if (!(*argv))
			return -ENOMEM;
		(*argv)[(*argc)++] = 0;
		++pat_ptr;
		if (!(*pat_ptr))
			return -ENOMEM;
	}

	/* Repeat as long as we have more pattern to process and more output space */
	while (*pat_ptr) {
		/*
		 * Split on spaces before doing template expansion so that
		 * %e and %E don't get split if they have spaces in them
		 */
		if (ispipe) {
			if (isspace(*pat_ptr)) {
				if (cn->used != 0)
					was_space = true;
				pat_ptr++;
				continue;
			} else if (was_space) {
				was_space = false;
				err = cn_printf(cn, "%c", '\0');
				if (err)
					return err;
				(*argv)[(*argc)++] = cn->used;
			}
		}
		// 2. Walk the pattern and expand each specifier (%p, %u, %s, ...)
		//    into the corresponding part of the file name
		if (*pat_ptr != '%') {
			err = cn_printf(cn, "%c", *pat_ptr++);
		} else {
			switch (*++pat_ptr) {
			/* single % at the end, drop that */
			case 0:
				goto out;
			/* Double percent, output one percent */
			case '%':
				err = cn_printf(cn, "%c", '%');
				break;
			/* pid */
			case 'p':
				pid_in_pattern = 1;
				err = cn_printf(cn, "%d",
					      task_tgid_vnr(current));
				break;
			/* global pid */
			case 'P':
				err = cn_printf(cn, "%d",
					      task_tgid_nr(current));
				break;
			...
			default:
				break;
			}
			++pat_ptr;
		}

		if (err)
			return err;
	}
	...
}
```
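In pipe mode, format_corename splits the pattern into helper arguments at spaces before expanding the `%` specifiers. As a result, an expanded value that contains spaces (for example from `%e`) stays a single argument. The user-space sketch below covers just the splitting step and works in place on the unexpanded pattern. `split_pipe_pattern` is a hypothetical name, and unlike the kernel it stores pointers rather than offsets.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical: split a pipe pattern such as "|/system/bin/core_bin %e %p"
 * into argv tokens before any %-expansion. Separators are overwritten with
 * '\0'. Returns the number of tokens (at most max_args), or -1 if the
 * pattern is not a pipe or names no helper. */
static int split_pipe_pattern(char *pattern, char **argv, int max_args)
{
    if (pattern[0] != '|')
        return -1;
    int argc = 0;
    char *p = pattern + 1;
    while (*p && argc < max_args) {
        while (isspace((unsigned char)*p))
            *p++ = '\0';                 /* terminate the previous token */
        if (!*p)
            break;
        argv[argc++] = p;                /* start of a new token */
        while (*p && !isspace((unsigned char)*p))
            p++;
    }
    return argc > 0 ? argc : -1;
}
```

With the pattern from 3.2.1, this yields the three tokens `/system/bin/core_bin`, `%e` and `%p`. Each `%` token would then be expanded on its own.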

call_usermodehelper_setup

call_usermodehelper_setup prepares the execution environment of a user-space helper process:

```c
// kernel/kernel/umh.c
struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv,
		char **envp, gfp_t gfp_mask,
		int (*init)(struct subprocess_info *info, struct cred *new),
		void (*cleanup)(struct subprocess_info *info),
		void *data)
{
	// 1. Allocate the subprocess_info structure
	struct subprocess_info *sub_info;
	sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask);
	if (!sub_info)
		goto out;

	// 2. Initialize the work item that will start the helper
	INIT_WORK(&sub_info->work, call_usermodehelper_exec_work);

	// 3. Record the path, arguments, environment and the init/cleanup
	//    callbacks; with CONFIG_STATIC_USERMODEHELPER the path is forced
	//    to CONFIG_STATIC_USERMODEHELPER_PATH
#ifdef CONFIG_STATIC_USERMODEHELPER
	sub_info->path = CONFIG_STATIC_USERMODEHELPER_PATH;
#else
	sub_info->path = path;
#endif
	sub_info->argv = argv;
	sub_info->envp = envp;

	sub_info->cleanup = cleanup;
	sub_info->init = init;
	sub_info->data = data;
  out:
	return sub_info;
}
```

call_usermodehelper_exec

call_usermodehelper_exec starts a user-space process from kernel space. It is typically used for specific tasks such as receiving a core dump:

```c
int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
{
	// 1. Initialize local state and reject a NULL sub_info->path
	unsigned int state = TASK_UNINTERRUPTIBLE;
	DECLARE_COMPLETION_ONSTACK(done);
	int retval = 0;

	if (!sub_info->path) {
		call_usermodehelper_freeinfo(sub_info);
		return -EINVAL;
	}
	// 2. Account for a running helper (helper_lock) and check whether
	//    user-mode helpers are disabled
	helper_lock();
	if (usermodehelper_disabled) {
		retval = -EBUSY;
		goto out;
	}

	/*
	 * If there is no binary for us to call, then just return and get out of
	 * here.  This allows us to set STATIC_USERMODEHELPER_PATH to "" and
	 * disable all call_usermodehelper() calls.
	 */
	if (strlen(sub_info->path) == 0)
		goto out;

	/*
	 * Set the completion pointer only if there is a waiter.
	 * This makes it possible to use umh_complete to free
	 * the data structure in case of UMH_NO_WAIT.
	 */
	sub_info->complete = (wait == UMH_NO_WAIT) ? NULL : &done;
	sub_info->wait = wait;

	// 3. Queue the work on the system unbound workqueue
	queue_work(system_unbound_wq, &sub_info->work);
	if (wait == UMH_NO_WAIT)	/* task has freed sub_info */
		goto unlock;

	if (wait & UMH_FREEZABLE)
		state |= TASK_FREEZABLE;

	if (wait & UMH_KILLABLE) {
		retval = wait_for_completion_state(&done, state | TASK_KILLABLE);
		if (!retval)
			goto wait_done;

		/* umh_complete() will see NULL and free sub_info */
		if (xchg(&sub_info->complete, NULL))
			goto unlock;

		/*
		 * fallthrough; in case of -ERESTARTSYS now do uninterruptible
		 * wait_for_completion_state(). Since umh_complete() shall call
		 * complete() in a moment if xchg() above returned NULL, this
		 * uninterruptible wait_for_completion_state() will not block
		 * SIGKILL'ed processes for long.
		 */
	}
	// 4. Wait for the helper; with UMH_WAIT_EXEC this returns once the
	//    helper has been exec'd, not when it exits
	wait_for_completion_state(&done, state);

wait_done:
	retval = sub_info->retval;
out:
	call_usermodehelper_freeinfo(sub_info);
unlock:
	helper_unlock();
	return retval;
}
```
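do_coredump calls this function with UMH_WAIT_EXEC, so the kernel waits only until the helper has been exec'd and then starts feeding the pipe; it does not wait for the helper to exit. A similar "wait for exec, not for exit" behavior can be obtained in user space with a different technique: a close-on-exec pipe. A successful exec closes the pipe, so the parent sees EOF. On failure, the child writes errno into the pipe. This is a hypothetical sketch assuming Linux/glibc (`pipe2`), not how the kernel implements it.

```c
#define _GNU_SOURCE              /* pipe2() */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: start `path` and return 0 once exec has succeeded, or
 * -errno if fork/exec failed. The child's PID is stored in *child; the
 * caller is responsible for reaping a successfully started child. */
static int spawn_wait_exec(const char *path, char *const argv[], pid_t *child)
{
    int fds[2];
    if (pipe2(fds, O_CLOEXEC) != 0)
        return -errno;
    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;
        close(fds[0]);
        close(fds[1]);
        return -err;
    }
    if (pid == 0) {
        close(fds[0]);
        execv(path, argv);             /* on success fds[1] closes itself */
        int err = errno;               /* exec failed: report the reason */
        ssize_t w = write(fds[1], &err, sizeof(err));
        (void)w;
        _exit(127);
    }
    close(fds[1]);
    int err = 0;
    ssize_t n;
    do {
        n = read(fds[0], &err, sizeof(err));
    } while (n < 0 && errno == EINTR);
    int read_errno = errno;
    close(fds[0]);
    *child = pid;
    if (n < 0)
        return -read_errno;
    if (n == (ssize_t)sizeof(err)) {   /* child reported an exec error */
        waitpid(pid, NULL, 0);         /* reap it */
        return -err;
    }
    return 0;                          /* EOF: exec succeeded */
}
```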

3.2.4 Demo: a user-space coredump helper

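```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE 4096

/* Started by the kernel via core_pattern "|/system/bin/core_bin %e %p":
 * argv[1] = executable name (%e), argv[2] = PID (%p). The core data
 * arrives on stdin, which is the read end of the kernel's pipe. */
int main(int argc, char *argv[])
{
    char name[256];
    char buf[BUF_SIZE];
    ssize_t numread, numwrite;

    if (argc < 3)
        return 1;
    int result = snprintf(name, sizeof(name),
                          "/data/xxx/coredump/core-%s-%s", argv[1], argv[2]);
    if (result < 0 || (size_t)result >= sizeof(name))
        return 1;                          /* name would be truncated */

    /* O_CREAT is required because the core file does not exist yet;
     * core files contain process memory, so keep them private (0600) */
    int fd = open(name, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return 1;

    while ((numread = read(STDIN_FILENO, buf, BUF_SIZE)) != 0) {
        if (numread < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted: retry the read */
            break;                         /* real read error */
        }
        char *ptr = buf;
        while (numread > 0) {              /* handle short writes */
            numwrite = write(fd, ptr, numread);
            if (numwrite < 0) {
                if (errno == EINTR)
                    continue;
                close(fd);
                return 1;
            }
            ptr += numwrite;
            numread -= numwrite;
        }
    }
    close(fd);
    return 0;
}
```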

4. How coredump works

4.1 Basic principle

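When a user program hits certain errors or exceptions, the Linux kernel catches the exception and sends the process a signal. The signal is handled before the process returns to user space. If the signal's action is to dump core, the kernel's coredump code generates an ELF-format core file and saves it to the configured path.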

4.2 Key code

do_coredump is called to generate the core file:

```c
void do_coredump(const kernel_siginfo_t *siginfo)
{
	...
	binfmt = mm->binfmt;
	if (!binfmt || !binfmt->core_dump)
		goto fail;
	if (!__get_dumpable(cprm.mm_flags))
		goto fail;
	...
	// 1. Generate the core file name
	ispipe = format_corename(&cn, &cprm, &argv, &argc);
	...
	// 2. Create the core file
	cprm.file = file_open_root(&root, cn.corename, open_flags, 0600);
	...
	// 3. Write the process's memory information into the core file
	core_dumped = binfmt->core_dump(&cprm);
	...
}
```

elf_core_dump writes the process's memory state into an ELF-format core file for later debugging and analysis with gdb:

```c
// kernel_platform/msm-kernel/fs/binfmt_elf.c
static int elf_core_dump(struct coredump_params *cprm)
{
	...
	/*
	 * Collect all the non-memory information about the process for the
	 * notes.  This also sets up the file header.
	 */
	// 1. Fill in the ELF header and the notes information
	if (!fill_note_info(&elf, e_phnum, &info, cprm))
		goto end_coredump;

	has_dumped = 1;

	// 2. Compute the space taken by the ELF header, program headers and
	//    notes, and allocate the memory needed
	offset += sizeof(elf);				/* Elf header */
	offset += segs * sizeof(struct elf_phdr);	/* Program headers */
	...
	/* Write program headers for segments dump */
	for (i = 0; i < cprm->vma_count; i++) {
		struct core_vma_metadata *meta = cprm->vma_meta + i;
		struct elf_phdr phdr;

		phdr.p_type = PT_LOAD;
		phdr.p_offset = offset;
		phdr.p_vaddr = meta->start;
		phdr.p_paddr = 0;
		phdr.p_filesz = meta->dump_size;
		phdr.p_memsz = meta->end - meta->start;
		offset += phdr.p_filesz;
		phdr.p_flags = 0;
		if (meta->flags & VM_READ)
			phdr.p_flags |= PF_R;
		if (meta->flags & VM_WRITE)
			phdr.p_flags |= PF_W;
		if (meta->flags & VM_EXEC)
			phdr.p_flags |= PF_X;
		phdr.p_align = ELF_EXEC_PAGESIZE;

		if (!dump_emit(cprm, &phdr, sizeof(phdr)))
			goto end_coredump;
	}

	// 3. Write any extra (architecture-specific) program headers
	if (!elf_core_write_extra_phdrs(cprm, offset))
		goto end_coredump;

	/* write out the notes section */
	// 4. Write the notes
	if (!write_note_info(&info, cprm))
		goto end_coredump;

	/* For cell spufs */
	if (elf_coredump_extra_notes_write(cprm))
		goto end_coredump;

	/* Align to page */
	// 5. Skip to the page-aligned data offset and write the memory
	//    contents of each VMA (the data of the PT_LOAD segments)
	dump_skip_to(cprm, dataoff);

	for (i = 0; i < cprm->vma_count; i++) {
		struct core_vma_metadata *meta = cprm->vma_meta + i;

		if (!dump_user_range(cprm, meta->start, meta->dump_size))
			goto end_coredump;
	}

	// 6. Write architecture-specific extra data and, if e_phnum
	//    overflowed (PN_XNUM), the extended-numbering section header
	if (!elf_core_write_extra_data(cprm))
		goto end_coredump;

	if (e_phnum == PN_XNUM) {
		if (!dump_emit(cprm, shdr4extnum, sizeof(*shdr4extnum)))
			goto end_coredump;
	}

end_coredump:
	free_note_info(&info);
	kfree(shdr4extnum);
	kfree(phdr4note);
	return has_dumped;
}
```
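The offsets computed in elf_core_dump can be checked against the readelf output in 4.4. The small C sketch below uses the `Elf64_Ehdr`/`Elf64_Phdr` types from `<elf.h>`. It assumes a 64-bit core with no PN_XNUM overflow, and the function names are hypothetical. With 138 program headers, the notes start at 64 + 138 * 56 = 7792 = 0x1e70 bytes. After 0x18a8 bytes of notes, the memory data starts at the next 4096-byte boundary, 0x4000. These values match the NOTE offset and the first LOAD offset reported by readelf.

```c
#include <elf.h>
#include <stdint.h>

/* Hypothetical: file offset of the notes in a 64-bit core, i.e. the ELF
 * header followed by `phnum` program headers (PT_NOTE included). */
static uint64_t core_notes_offset(uint64_t phnum)
{
    return sizeof(Elf64_Ehdr) + phnum * sizeof(Elf64_Phdr);
}

/* Hypothetical: offset of the first memory byte (dataoff), i.e. the end
 * of the notes rounded up to the page size. */
static uint64_t core_data_offset(uint64_t phnum, uint64_t notes_size,
                                 uint64_t page_size)
{
    uint64_t end = core_notes_offset(phnum) + notes_size;
    return (end + page_size - 1) / page_size * page_size;   /* round up */
}
```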

4.3 Code sequence

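The following sequence diagram shows the code flow of exception capture, signal handling and core file generation:

[Figure: code sequence of exception capture, signal handling and core file generation]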

4.4 Core file format and contents

The core file captured by coredump is in ELF format and can be loaded into gdb to locate and analyze problems.

The contents of a core file, as displayed by readelf:

```
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              CORE (Core file)
  Machine:                           AArch64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         138
  Size of section headers:           0 (bytes)
  Number of section headers:         0
  Section header string table index: 0

Program Headers:
  Type  Offset             VirtAddr           PhysAddr           FileSiz            MemSiz             Flags  Align
  NOTE  0x0000000000001e70 0x0000000000000000 0x0000000000000000 0x00000000000018a8 0x0000000000000000        0x0
  LOAD  0x0000000000004000 0x000000560ca89000 0x0000000000000000 0x0000000000000000 0x0000000000002000 R      0x1000
  LOAD  0x0000000000004000 0x000000560ca8b000 0x0000000000000000 0x0000000000000000 0x0000000000003000 R E    0x1000
  LOAD  0x0000000000004000 0x000000560ca8e000 0x0000000000000000 0x0000000000001000 0x0000000000001000 R      0x1000
  ...

Displaying notes found at file offset 0x00001e70 with length 0x000018a8:
  Owner    Data size    Description
  CORE     0x00000188   NT_PRSTATUS (prstatus structure)
  CORE     0x00000088   NT_PRPSINFO (prpsinfo structure)
  CORE     0x00000080   NT_SIGINFO (siginfo_t data)
  CORE     0x00000150   NT_AUXV (auxiliary vector)
  CORE     0x00000f6e   NT_FILE (mapped files)
    Page size: 4096
                 Start                 End         Page Offset
    0x000000560ca89000  0x000000560ca8b000  0x0000000000000000
        /system/bin/coredump-test-bin
    0x000000560ca8b000  0x000000560ca8e000  0x0000000000000002
        /system/bin/coredump-test-bin
    ...
  CORE     0x00000210   NT_FPREGSET (floating point registers)
  LINUX    0x00000010   NT_ARM_TLS (AArch TLS registers)
    description data: 00 10 e4 45 7e 00 00 00 00 00 00 00 00 00 00 00
  LINUX    0x00000108   NT_ARM_HW_BREAK (AArch hardware breakpoint registers)
    description data: 06 09 00 00 ... (all remaining bytes are 00)
  LINUX    0x00000108   NT_ARM_HW_WATCH (AArch hardware watchpoint registers)
    description data: 04 09 00 00 ... (all remaining bytes are 00)
  LINUX    0x00000004   Unknown note type: (0x00000404)
    description data: ff ff ff ff
  LINUX    0x00000010   Unknown note type: (0x00000406)
    description data: 00 00 00 00 80 ff 7f 00 00 00 00 00 80 ff 7f 00
  LINUX    0x00000008   Unknown note type: (0x0000040a)
    description data: 0f 00 00 00 00 00 00 00
  LINUX    0x00000008   Unknown note type: (0x00000409)
    description data: 01 00 00 00 00 00 00 00
```

A core file mainly consists of the ELF Header, the Program Headers, the NOTE segment, and the LOAD segments that hold the dumped memory.

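ELF Header: records the basic information and layout of the core file, such as the file type (CORE), the machine, and the number and location of the program headers.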

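Program Headers: describe each memory mapping of the process (virtual address, size in memory and in the file), together with the segment's permissions (R/W/E) and attributes.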

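NOTE segment: records the state of the process at the moment of the crash. This includes the registers (NT_PRSTATUS, NT_FPREGSET), process information (NT_PRPSINFO), signal information (NT_SIGINFO), the auxiliary vector (NT_AUXV) and the mapped files (NT_FILE). With this information, gdb can rebuild the memory layout at the time of the crash, analyze the cause, and help developers pinpoint the problem.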
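To sanity-check a core file programmatically before loading it into gdb, the ELF header and program headers can be parsed directly. The minimal C sketch below assumes the whole file has already been read into memory, that it is a native-endian 64-bit core, and that there is no PN_XNUM overflow. `summarize_core` and `struct core_summary` are hypothetical names.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct core_summary {
    unsigned phnum;          /* number of program headers */
    unsigned loads;          /* PT_LOAD segments (dumped memory) */
    Elf64_Off note_offset;   /* file offset of the PT_NOTE segment */
    Elf64_Xword note_size;   /* size of the notes */
};

/* Hypothetical: verify that buf holds a 64-bit ELF core and summarize its
 * program headers, similar to the first parts of readelf's output.
 * Returns false if the buffer is not a well-formed 64-bit core. */
static bool summarize_core(const unsigned char *buf, size_t len,
                           struct core_summary *out)
{
    Elf64_Ehdr eh;
    if (len < sizeof(eh))
        return false;
    memcpy(&eh, buf, sizeof(eh));            /* avoid unaligned access */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 || eh.e_type != ET_CORE ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        return false;
    if (eh.e_phoff > len ||
        (len - eh.e_phoff) / sizeof(Elf64_Phdr) < eh.e_phnum)
        return false;                        /* headers run past the buffer */

    memset(out, 0, sizeof(*out));
    out->phnum = eh.e_phnum;
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        memcpy(&ph, buf + eh.e_phoff + (size_t)i * sizeof(ph), sizeof(ph));
        if (ph.p_type == PT_LOAD) {
            out->loads++;
        } else if (ph.p_type == PT_NOTE) {
            out->note_offset = ph.p_offset;
            out->note_size = ph.p_filesz;
        }
    }
    return true;
}
```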

5. Demo case

1) Demo program

After the process crashes with an exception, both the tombstone and the core file are captured.

2) Generated tombstone file

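The tombstone only shows the general cause of the crash. It cannot pinpoint the root cause or the line of code that made the process crash, so coredump is needed to capture a core file and locate this kind of problem precisely.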

```
Cmdline: ../../system/bin/coredump-test-bin use-after-free
pid: 11966, tid: 11966, name: coredump-test-b  >>> ../../system/bin/coredump-test-bin <<<
uid: 0
...
backtrace:
      #01 pc 0000000000090088  /system/lib64/libc.so (__vfprintf+10416) (BuildId: 567e41669f1cb528e72fe319cd09033b)
      #02 pc 00000000000ac06c  /system/lib64/libc.so (vsnprintf+192) (BuildId: 567e41669f1cb528e72fe319cd09033b)
      #03 pc 0000000000006afc  /system/lib64/liblog.so (__android_log_print+184) (BuildId: 87ba6a9314f00fab650fb8fad7913d58)
      #04 pc 00000000000010a4  /system/bin/coredump-test-bin (main+80) (BuildId: c97bade065c198c12dcca74f107c513c)
      #05 pc 0000000000048768  /system/lib64/libc.so (__libc_init+96) (BuildId: 567e41669f1cb ...
```

3) Generated core file

With coredump enabled, the core file is captured. It is in ELF format and can be debugged with gdb.

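Load the demo program and the generated core file into gdb with `gdb ./coredump-test-bin ./core-coredump-test-bin-11966-`. gdb then pinpoints the source line where the error occurred (the binary must carry debug information, e.g. be built with -g, for gdb to show source lines):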

```
...
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000000000040053c in square (a=1, b=2) at test.c:7
7         *p = 666;              # the fault is at line 7 of test.c
(gdb) backtrace                  # enter backtrace
#0  0x000000000040053c in square (a=1, b=2) at test.c:7    # line 7 of test.c is where the problem occurred
#1  0x0000000000 in doCalc (num1=1, num2=2) at test.c:14
#2  0x0000000000 in main () at test.c:22
```

6. Risks and mitigations

Enabling coredump carries the following risk:

1) If a native process keeps crashing and restarting (which is very common during development), core files are produced continuously and quickly fill up the disk.

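Mitigation: use the quota mechanism. Assign a project_id to the directory that stores core files and set a quota threshold, i.e. an upper limit on its storage space. The quota caps how much space core files can use. Once the threshold is reached, further writes fail, so old core files must be overwritten or deleted to make room for new ones.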
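Reclaiming space from old core files has to be done in user space, for example by the core helper before it writes a new file. The hypothetical C sketch below shows one simple way to do it. It deletes the oldest `core-*` files in a directory (by modification time) until their total size fits a byte budget. This is an assumed approach for illustration, not the author's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

struct core_entry { char path[512]; off_t size; time_t mtime; };

static int by_mtime(const void *a, const void *b)
{
    const struct core_entry *x = a, *y = b;
    return (x->mtime > y->mtime) - (x->mtime < y->mtime);
}

/* Hypothetical: delete the oldest "core-*" files in `dir` until their total
 * size is at most `budget` bytes. Returns the number of files deleted, or
 * -1 on error. */
static int prune_core_dir(const char *dir, off_t budget)
{
    DIR *d = opendir(dir);
    if (!d)
        return -1;
    struct core_entry *v = NULL;
    size_t n = 0, cap = 0;
    off_t total = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (strncmp(de->d_name, "core-", 5) != 0)
            continue;                        /* only touch core files */
        if (n == cap) {
            cap = cap ? cap * 2 : 16;
            struct core_entry *nv = realloc(v, cap * sizeof(*v));
            if (!nv) { free(v); closedir(d); return -1; }
            v = nv;
        }
        struct core_entry *e = &v[n];
        snprintf(e->path, sizeof(e->path), "%s/%s", dir, de->d_name);
        struct stat st;
        if (stat(e->path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;
        e->size = st.st_size;
        e->mtime = st.st_mtime;
        total += st.st_size;
        n++;
    }
    closedir(d);
    qsort(v, n, sizeof(*v), by_mtime);       /* oldest first */
    int deleted = 0;
    for (size_t i = 0; i < n && total > budget; i++) {
        if (unlink(v[i].path) == 0) {
            total -= v[i].size;
            deleted++;
        }
    }
    free(v);
    return deleted;
}
```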

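Disclaimer: All articles, images and videos on this site are compiled from user submissions and content reposted from the Internet. They do not represent the views of this site, which assumes no legal responsibility for them. Copyright belongs to the original authors or publishers. If you find content on this site that plagiarizes, infringes your rights, or breaks laws or regulations, please contact the webmaster online; it will be removed promptly once verified. This article comes from the Internet; if it infringes your rights, please contact us to have it removed. If you repost it, please cite the source: https://haidsoft.com/118292.html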
