This is the development tree. Production downloads are at:
jpeg_carved feature extractor to jpeg so that the flag -S jpeg_carve_mode=2 enables carving of all contiguous JPEGs.
bulk_extractor -h to explain all available carve modes.
The digital forensics tool bulk_extractor version 2.1.0 is now available for general use.
Release download point:
https://github.com/simsong/bulk_extractor/releases
GIT repository:
https://github.com/simsong/bulk_extractor
I am pleased to announce the general availability of bulk_extractor version 2.1. This is the first release of bulk_extractor version 2 that is recommended for general use.
Bulk_extractor 2 is a significant rewrite of bulk_extractor. Version 2 significantly improves on the performance and portability of version 1. The rewrite started in 2016 and was largely completed by January 2021.
Details of the rewrite, including a detailed report of the performance improvements and lessons learned, can be found in "Sharpening Your Tools: Updating bulk_extractor for the 2020s," by Simson Garfinkel and Jon Stewart, Communications of the ACM, August 2023.
Bulk_extractor version 2.1 is the first stable version of bulk_extractor version 2 that is recommended for general use. It corrects a problem with the string search scanner that caused bulk_extractor to hang on open-ended regular expressions such as [a-z]*@company.com specified with the -F flag. With version 2.1, we have replaced the C++17 regex compiler with Google's RE2 regex compiler, which avoids backtracking. As a result, these open-ended regular expressions no longer hang.
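To illustrate why avoiding backtracking matters for a pattern like [a-z]*@company.com, here is a minimal C++ sketch. It is not RE2 and not bulk_extractor's code. The function name find_prefixed_literal is hypothetical, and it handles only the narrow case of "[a-z]* followed by a fixed literal", assuming the literal starts with a character outside a-z (such as '@'). A backtracking engine can retry the same long run of lowercase letters from every start position; this sketch instead walks the input once and remembers where the current lowercase run began.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper for illustration only (not part of bulk_extractor or RE2).
// Finds non-overlapping matches of "[a-z]*" followed by `literal` in one
// left-to-right pass, with no backtracking over the lowercase run.
std::vector<std::string> find_prefixed_literal(std::string_view text,
                                               std::string_view literal) {
    std::vector<std::string> hits;
    if (literal.empty()) return hits;

    std::size_t run_start = 0;  // start of the [a-z] run that ends just before i
    std::size_t i = 0;
    while (i < text.size()) {
        // Does the literal begin at position i?
        if (text.compare(i, literal.size(), literal) == 0) {
            // The match is the lowercase run plus the literal itself.
            hits.emplace_back(text.substr(run_start, i - run_start + literal.size()));
            i += literal.size();
            run_start = i;
            continue;
        }
        // Any character outside a-z ends the current run.
        const char c = text[i];
        if (c < 'a' || c > 'z') run_start = i + 1;
        ++i;
    }
    return hits;
}
```

Calling find_prefixed_literal("mail alice@company.com and bob@company.com", "@company.com") yields the two addresses. On a long run of lowercase letters with no '@', the work grows linearly with the input length for a fixed literal, because each position is visited once.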
BEViewer is not included in this release. Although it works with Version 2, it is not yet officially supported.
scan_outlook and scan_hiberfile are now disabled by default because they did not have unit tests. These scanners can be re-enabled by specifying -eoutlook and -ehiberfile on the command line.
scan_aes no longer scans for 192-bit AES keys by default, although this behavior can be re-enabled.
The RAR decompressor does not reliably decompress all RAR files and only supports RAR v1, v2, and v3.
The RAR scanner will not reliably name carved RAR file components that contain UTF-8 characters in their name.
We are looking for help to implement the following algorithms:
WkdmDecompress - http://www.opensource.apple.com/source/xnu/xnu-1456.1.26/iokit/Kernel/WKdmDecompress.c
xz, 7zip, and LZMA/LZMA2 decompression
lzo decompression
BZIP2 decompression
CAB decompression
Scanning for the start of BitLocker protected volumes.
NTFS decompression
Better handling of MIME encoding
Process more data with -e xor and look for CCN hits. Most will be false positives.
Demonstration of bulk_extractor running on a grid (how fast can it run?)
Python Bridge - run multiple copies of python to let scanners be written in python
scan_pipe - runs every sbuf through an external program.
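For the XOR item in the list above, the idea of decoding data and then filtering candidate credit card numbers can be sketched as follows. This is an illustration under stated assumptions, not bulk_extractor's scanner code: the helper names xor_decode and luhn_valid are hypothetical, the 0xFF mask is an assumed example value, and the Luhn checksum is used here only as one common way to discard digit strings that cannot be valid card numbers.

```cpp
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helpers for illustration; not bulk_extractor's implementation.

// XOR every byte with a single-byte mask. The 0xFF default is an assumed
// example mask; applying the same mask twice restores the original bytes.
std::string xor_decode(std::string_view data, std::uint8_t mask = 0xFF) {
    std::string out(data);
    for (char& c : out) {
        c = static_cast<char>(static_cast<std::uint8_t>(c) ^ mask);
    }
    return out;
}

// Luhn checksum over a string of ASCII digits. Returns false for empty
// input or for any non-digit character.
bool luhn_valid(std::string_view digits) {
    if (digits.empty()) return false;
    int sum = 0;
    bool double_it = false;  // the rightmost digit is not doubled
    for (auto it = digits.rbegin(); it != digits.rend(); ++it) {
        if (*it < '0' || *it > '9') return false;
        int d = *it - '0';
        if (double_it) {
            d *= 2;
            if (d > 9) d -= 9;
        }
        sum += d;
        double_it = !double_it;
    }
    return sum % 10 == 0;
}
```

A candidate found in XOR-decoded data would be passed through luhn_valid before being reported. Passing the checksum does not prove a hit is real, since roughly one in ten random digit strings also passes, so it reduces false positives without eliminating them.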
Published by simsong 10 months ago
Minor packaging updates.
Published by simsong over 1 year ago
Version 2.0.3 is released. However, please note:
Published by simsong over 2 years ago
Release 2.0.0 of bulk_extractor, a high-performance digital forensics tool that works like a "find evidence" button, pulling actionable intelligence out of disk images, files, memory dumps, network traffic, and just about anything else.
Note: we recommend using the bulk_extractor-2.0.0.tar.gz file attached, which is a proper release, rather than cloning the repo and all of the sub-repos and then using automake to create the configure script.
Published by simsong almost 3 years ago
bulk_extractor: a high-performance digital forensics tool that scans a disk image, a file, or a directory of files and extracts information such as email addresses, JPEGs and JSON snippets without parsing the file system or file system structures. Written in C++ and highly parallelized.
This beta:
Please report bugs to https://github.com/simsong/bulk_extractor/issues
Published by simsong almost 3 years ago
bulk_extractor is a high-performance C++ program that scans a disk image, a file, or a directory of files and extracts information such as email addresses, JPEGs and JSON snippets without parsing the file system or file system structures.
This beta:
Download from: bulk_extractor-2.0.0-beta2.tar.gz
Published by simsong about 3 years ago
bulk_extractor is a high-performance digital forensics tool that finds data including JPEG images, email addresses, social security numbers, and other kinds of "known formats" in files and on raw disk partitions, even if the data are compressed, BASE64 encoded, or transformed using other well-known algorithms.
After six years, we have a new release of bulk_extractor! This version now requires C++17, includes a substantial test suite with significant code coverage, and is designed for systems with high numbers of CPU cores. Tested on Ubuntu, macOS, and Fedora.
Published by simsong about 10 years ago
Release 1.5.3 corrects minor bugs that were found in version 1.5.0, and represents a significant improvement over release 1.4.0.
Published by simsong almost 11 years ago
The official 1.4.0 release. Reasonably well tested.
Published by simsong over 11 years ago
Please let us know how it works. We are especially interested in feedback on the XOR scanner.