ByteHook is an Android PLT hook library which supports armeabi-v7a, arm64-v8a, x86 and x86_64.
MIT License
Fixed the bug that ELF files without the .so suffix could not be hooked. In previous versions, only ELF files with the .so suffix could be hooked. This restriction is now removed.
The crash protection mechanism of bytehook registers signal handlers for SIGSEGV and SIGBUS, but previous versions did not set the SA_EXPOSE_TAGBITS flag on them. Because bytehook's signal handler runs before ART's sigchain, the tag bits in the fault address were lost, which caused the MTE mechanism to fail.
In previous versions, bytehook's signal handler occupied some signal stack memory (64 bytes on arm32 and 128 bytes on arm64). After this optimization, it no longer occupies any signal stack memory.
Because the signal stack on Android is very limited (roughly 8 KB to 32 KB per thread, depending on the Android version and CPU architecture), any extra use of it can easily increase the risk of signal stack overflow.
Published by caikelun over 1 year ago
Added an API and initialization parameters to enable and disable operation records.
Writing an operation record on every hook and unhook has some performance impact. It is recommended to enable operation records selectively, or by sampling, according to actual needs.
PR: https://github.com/bytedance/bhook/pull/70. Thanks: @cmzy
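One way to follow the sampling recommendation, sketched under assumptions: decide once per install from a stable ID (hashed here with FNV-1a), then pass the result to the switch for operation records. The function name and the percentage scheme are hypothetical, and the ByteHook call itself is left out because its exact name is not given in these notes.

```c
#include <stdbool.h>
#include <stdint.h>

/* FNV-1a hash, used only to map an ID to a stable bucket. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (uint8_t)*s;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: decides whether this install should record operations.
   The same ID always lands in the same bucket, so a sampled device keeps
   producing complete record streams, and raising `percent` only adds devices. */
bool should_enable_records(const char *install_id, unsigned percent) {
    if (percent == 0) return false;
    if (percent >= 100) return true;
    return fnv1a(install_id) % 100u < percent;
}
```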
The root cause of the problem is that the dlopen and dlclose proxy functions inside ByteHook need to know whether they are currently running inside the linker's global mutex lock in order to flush the ELF cache correctly. This was previously implemented with a counter maintained in the dlopen and dlclose proxy functions. However, dlopen and dlclose have to be hooked separately, so the two hooks are not installed atomically, which can leave the counter with an incorrect initial value.
Now we use bionic's internal pthread_mutex_internal_t data structure to determine whether we are currently inside the linker's global mutex lock. After extensive testing and verification in production, we have found this method to be safe and reliable.
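bionic's pthread_mutex_internal_t is private, so the following is only an analogue of the idea on glibc: it reads the owner thread ID that glibc stores in the internal __data.__owner field of pthread_mutex_t to tell whether the calling thread currently holds a mutex. That field is a glibc implementation detail, not a public API, the code will not compile against bionic, and the function name is hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Sketch only: glibc records the owning thread's TID in __data.__owner when a
   mutex is locked and clears it on unlock. Reading it relies on private ABI. */
bool mutex_held_by_current_thread(pthread_mutex_t *m) {
    pid_t tid = (pid_t)syscall(SYS_gettid);
    return __atomic_load_n(&m->__data.__owner, __ATOMIC_RELAXED) == tid;
}
```

Unlike a counter updated by two separately installed hooks, the owner field is maintained by the lock implementation itself, so it is correct no matter when the check starts being used.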
Fixed an ANR problem caused by mmap being inline-hooked. The mmap calls in ByteHook's internal logic have been replaced with direct system calls. This avoids interaction with an external mmap inline-hook proxy function, which could cause ANR problems.
Fixed a potential crash caused by calling localtime_r() in operation records. localtime_r() calls getenv() to access the global environ. If setenv() is called concurrently, a crash may occur, because in bionic, access to environ is not protected by a lock.
What we can currently do is try to avoid calling getenv() and setenv().
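One way to timestamp records without localtime_r() (and therefore without getenv()) is to format UTC with pure integer arithmetic. This sketch uses Howard Hinnant's civil-from-days algorithm; producing UTC rather than local time is an assumption about what a record needs, and format_utc() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Formats a Unix timestamp as "YYYY-MM-DD hh:mm:ss" in UTC. No libc time
   function is called, so neither TZ nor environ is read. */
int format_utc(int64_t epoch, char *buf, size_t len) {
    int64_t days = epoch / 86400;
    int64_t sod = epoch % 86400;                 /* seconds of day */
    if (sod < 0) { sod += 86400; days -= 1; }    /* floor division before 1970 */

    /* Civil-from-days: shift to an era starting on 0000-03-01. */
    int64_t z = days + 719468;
    int64_t era = (z >= 0 ? z : z - 146096) / 146097;
    int64_t doe = z - era * 146097;                                      /* [0, 146096] */
    int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; /* [0, 399] */
    int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);               /* [0, 365] */
    int64_t mp = (5 * doy + 2) / 153;                                    /* March = 0 */
    int64_t d = doy - (153 * mp + 2) / 5 + 1;
    int64_t m = mp < 10 ? mp + 3 : mp - 9;
    int64_t y = yoe + era * 400 + (m <= 2);

    return snprintf(buf, len, "%04lld-%02lld-%02lld %02lld:%02lld:%02lld",
                    (long long)y, (long long)m, (long long)d,
                    (long long)(sod / 3600), (long long)(sod % 3600 / 60),
                    (long long)(sod % 60));
}
```

For example, format_utc(951782400, buf, sizeof buf) yields "2000-02-29 00:00:00".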
abort() will be triggered if a proxy function written for automatic mode is used. This allows such misuse to be detected more clearly and earlier.
In most system versions and situations, ByteHook now uses only one TLS key.
The number of TLS keys available to each process is limited.
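A common way to get by with a single TLS key is to put all per-thread fields into one struct and reach it through one pthread_key_t. The struct fields and function names below are hypothetical; the sketch illustrates the technique, not ByteHook's internal layout.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread state: one struct instead of one key per field. */
typedef struct {
    int depth;             /* e.g. proxy-call nesting depth */
    const char *last_sym;  /* e.g. last symbol handled (not owned) */
} thread_state_t;

static pthread_key_t g_key;
static int g_key_ok = 0;
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;

static void key_init(void) {
    /* free() runs as the destructor when a thread exits. */
    g_key_ok = (pthread_key_create(&g_key, free) == 0);
}

/* Returns this thread's state, allocating it on first use (NULL on failure). */
thread_state_t *thread_state(void) {
    pthread_once(&g_key_once, key_init);
    if (!g_key_ok) return NULL;
    thread_state_t *st = pthread_getspecific(g_key);
    if (st == NULL) {
        st = calloc(1, sizeof(*st));
        if (st != NULL && pthread_setspecific(g_key, st) != 0) {
            free(st);
            st = NULL;
        }
    }
    return st;
}
```

Adding a new per-thread field then means adding a struct member rather than consuming another process-wide key.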
In addition, when only the ELF file is available, the ByteHook version number can also be determined as follows:
llvm-strings libbytehook.so | grep "bytehook version"
Published by caikelun almost 2 years ago
The first LOAD segment of an ELF may be read-only (when linked with the --rosegment linker option), in which case /proc/self/maps may look like this:
75b8d000-75b9f000 r--p 00000000 b3:1c 89884 /data/app-lib/io.hexhacking.xdl.sample-2/libquick.so
75b9f000-75bde000 r-xp 00012000 b3:1c 89884 /data/app-lib/io.hexhacking.xdl.sample-2/libquick.so
75bde000-75be1000 r--p 00051000 b3:1c 89884 /data/app-lib/io.hexhacking.xdl.sample-2/libquick.so
75be1000-75be2000 rw-p 00054000 b3:1c 89884 /data/app-lib/io.hexhacking.xdl.sample-2/libquick.so
In previous ByteHook versions, this type of ELF could not be hooked in Android 4.x.
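A sketch of locating an ELF's load base from /proc/self/maps without assuming the first mapping is executable: it takes the readable mapping whose file offset is 0, so the r--p segment produced by --rosegment is found as well. find_load_base() is a hypothetical helper that works on lines already read into memory, without trailing newlines.

```c
#include <stdio.h>
#include <string.h>

/* Returns the start address of the offset-0 mapping for `path`, whatever its
   permissions, or 0 if there is none. */
unsigned long find_load_base(const char *const *lines, size_t n, const char *path) {
    for (size_t i = 0; i < n; i++) {
        unsigned long start, end, offset;
        char perms[5];
        int path_pos = 0;
        /* start-end perms offset dev inode path */
        if (sscanf(lines[i], "%lx-%lx %4s %lx %*s %*s %n",
                   &start, &end, perms, &offset, &path_pos) != 4 || path_pos == 0)
            continue;
        if (offset == 0 && perms[0] == 'r' && strcmp(lines[i] + path_pos, path) == 0)
            return start;
    }
    return 0;
}
```

Applied to the four libquick.so lines shown above, it returns 0x75b8d000, the start of the read-only first segment.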
Fixed a bug where ByteHook#init() could return an incorrect initialization state when called concurrently. ByteHook might actually still be initializing, yet init() returned a state indicating that initialization had completed.
During initialization, ByteHook needs to obtain several symbol addresses in libc.so through dlopen and dlsym. These operations need to hold the linker's global mutex lock. We moved them into the .init_array of libbytehook.so.
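The release fixes this by moving work into .init_array; a different, general technique for the same property is pthread_once(), which makes every concurrent caller wait until the one-time initializer has finished, so nobody can observe "initialized" too early. The names below are hypothetical, and lib_init_thread() exists only to make the sketch easy to drive from several threads.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_once_t g_once = PTHREAD_ONCE_INIT;
static int g_init_status = -1;  /* written only inside do_init() */
static int g_init_runs = 0;     /* how many times do_init() actually ran */

static void do_init(void) {
    g_init_runs++;
    /* ... real initialization work would go here ... */
    g_init_status = 0;          /* 0 = success */
}

/* Safe to call from many threads at once; each caller returns only after
   do_init() has completed, and then sees its final status. */
int lib_init(void) {
    pthread_once(&g_once, do_init);
    return g_init_status;
}

int lib_init_runs(void) { return g_init_runs; }

/* Demo helper: thread entry point that returns lib_init()'s status. */
void *lib_init_thread(void *arg) {
    (void)arg;
    return (void *)(intptr_t)lib_init();
}
```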
Published by caikelun about 2 years ago
Fixed a native crash in the bh_dl_iterate module on Android 4.x. Android 4.x does not provide the dl_iterate_phdr function, so we traverse the ELF list by scanning maps. Because the previous implementation did not hold the linker's global mutex lock during the traversal, an ELF might already have been unloaded by dlclose when we read its ELF header, causing a crash. The fix is to hold the linker's global mutex lock while scanning maps.
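A sketch of the fixed traversal pattern: scan /proc/self/maps and read each candidate ELF header while a single lock is held for the whole scan. g_linker_lock is a hypothetical stand-in; in this sketch it does not stop anyone from calling dlclose, whereas the real fix holds the linker's own global mutex, which dlclose also takes. The filters (offset 0, readable, file-backed, not under /dev/) are assumptions meant to keep the header read safe.

```c
#include <elf.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the linker's global mutex. */
static pthread_mutex_t g_linker_lock = PTHREAD_MUTEX_INITIALIZER;

/* Counts mapped files whose first page starts with the ELF magic; -1 on error. */
int count_loaded_elfs(void) {
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL) return -1;

    char line[512];
    int count = 0;
    pthread_mutex_lock(&g_linker_lock); /* held for the whole scan, not per line */
    while (fgets(line, sizeof(line), fp) != NULL) {
        unsigned long start, end, offset;
        char perms[5];
        int pos = 0;
        if (sscanf(line, "%lx-%lx %4s %lx %*s %*s %n",
                   &start, &end, perms, &offset, &pos) != 4 || pos == 0)
            continue;
        const char *path = line + pos;
        if (offset != 0 || perms[0] != 'r' || path[0] != '/' ||
            strncmp(path, "/dev/", 5) == 0)
            continue;
        /* Reading the header is only safe while the mapping cannot go away. */
        if (memcmp((const void *)start, ELFMAG, SELFMAG) == 0)
            count++;
    }
    pthread_mutex_unlock(&g_linker_lock);
    fclose(fp);
    return count;
}
```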
Fixed a bug where the callee_path_name used when hooking caused unhook to fail. This was caused by a program logic error.
Published by caikelun over 2 years ago
Fixed an issue where calling dlclose in some proxy functions could cause a deadlock. This is a very rare case: some functions used in ByteHook's own hook operation flow are themselves hooked, and dlclose is called in the proxy function. For example, the call from mmap to mmap64 in libc.so is hooked, and dlclose is called in mmap64_proxy. ByteHook can prevent itself from being PLT-hooked, but cannot prevent other dynamic libraries on the call chain from being hooked.
Added APIs (bytehook_add_ignore and the Java-layer addIgnore) for setting dynamic libraries that should be ignored globally. We may need to ignore some dynamic libraries globally: for example, some hardened third-party dynamic libraries may contain unknown protection errors, and executing hooks on them may cause unknown problems. ByteHook's own internal hooks on dlopen and dlclose are not applied to these libraries either.
Published by caikelun almost 3 years ago
Published by caikelun almost 3 years ago
Fixed an occasional crash caused by reading GOT table data.
In some special cases, a dynamic library calls its own function through the PLT, but the called function is not exported, so it does not appear in .hash or .gnu.hash. In the previous implementation, this kind of PLT call could not be hooked.
In the previous implementation, the first time a thread executed a proxy function, it called mmap and prctl once each.
We have added a module that records hook and unhook operations, along with a corresponding data export interface. You can use this data to compute hook/unhook success rates, analyze the reasons for failed operations, and so on, or analyze it together with the app's crash information.
Published by caikelun about 3 years ago
In manual mode, the caller must first save the original function address in the hooked callback before we can replace the address in the GOT; otherwise, a crash may occur due to timing issues. We therefore added an extra callback in manual mode (with status code BYTEHOOK_STATUS_CODE_ORIG_ADDR) that lets the caller save the original function address.
Thanks to the contributors from iQiyi Video.
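A self-contained simulation of why the original address has to be delivered before the GOT is patched. Everything here is hypothetical (one fake GOT slot, simplified status codes and callback signature) and is not the ByteHook API; it only shows the ordering: report the original address, let the caller store it, then redirect the slot to the proxy, which can call the original immediately.

```c
#include <stddef.h>

/* Hypothetical simulation, not the ByteHook API. */
enum { STATUS_OK = 0, STATUS_ORIG_ADDR = 1 };
typedef int (*add_fn_t)(int, int);
typedef void (*hooked_cb_t)(int status, void *prev_func, void *arg);

static int real_add(int a, int b) { return a + b; }
static add_fn_t got_slot = real_add; /* callers always jump through this slot */
static add_fn_t g_orig = NULL;       /* saved by the caller's callback */

/* Proxy: must reach the original as soon as the slot points here. */
static int proxy_add(int a, int b) { return g_orig(a, b) * 10; }

static void on_hooked(int status, void *prev_func, void *arg) {
    (void)arg;
    if (status == STATUS_ORIG_ADDR)
        g_orig = (add_fn_t)prev_func; /* saved BEFORE the slot is patched */
}

/* Simulated manual-mode hook: report the original first, patch second. */
static void manual_hook(add_fn_t proxy, hooked_cb_t cb, void *arg) {
    cb(STATUS_ORIG_ADDR, (void *)got_slot, arg);
    got_slot = proxy;
    cb(STATUS_OK, NULL, arg);
}

int call_through_got(int a, int b) { return got_slot(a, b); }
void install_demo_hook(void) { manual_hook(proxy_add, on_hooked, NULL); }
```

Swapping the two steps in manual_hook() would let a call through the slot reach proxy_add() while g_orig is still NULL, which is the kind of timing crash described above.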
If dlopen or dlclose is called in .init_array or .fini_array, a deadlock may occur between linker-mutex and dlclose-proxy-rwlock.
Thanks to the contributors from iQiyi Video and Toutiao.
Published by caikelun about 3 years ago
Published by caikelun about 3 years ago
First version.