forensic_excavator

readme

# Forensic Excavator

**Interactive forensic excavation framework for metadata, document artifacts, redaction failures, and provenance analysis.**

Forensic Excavator is a Linux-native, analyst-driven forensic triage framework designed to excavate **metadata, hidden text, content streams, embedded artifacts, entities, and timelines** from documents and datasets at scale.

---

## What Forensic Excavator Does

Forensic Excavator performs **read-only forensic analysis** against individual files or entire datasets. It automatically applies the appropriate forensic tooling based on file type and surfaces findings **both in the terminal and as preserved artifacts on disk**.

It is built to expose:

* Hidden and residual metadata
* Improper and incomplete PDF redactions
* Content-stream text surviving visual redaction
* Embedded objects and binary artifacts
* Named entities (people, organizations, locations, dates)
* Timeline and provenance indicators
* Raw strings leaked in binary structures
* Sanitization and structural failures in PDFs
* Cryptographic hashes for chain of custody

This is forensic triage, not guesswork. If the data exists, this framework is designed to surface it.
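The chain-of-custody hashing listed above can be illustrated with a minimal sketch. This is not the code from `excavator.py`: the function name `sha256_of_file` and the chunk size are assumptions made for illustration. It streams a file through Python's standard `hashlib` module so that large evidence files do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks.

    Hypothetical helper for illustration; not taken from excavator.py.
    """
    digest = hashlib.sha256()
    # Open in binary read mode so the evidence file is never modified.
    with Path(path).open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Recomputing the digest later and comparing it with the stored value is one way to show that a file has not changed since it was first examined.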

---

## Capabilities

### Universal (All File Types)

* File type identification
* SHA-256 hashing for chain of custody
* Deep metadata extraction via ExifTool
* Raw string extraction from binary data
* Filesystem timestamp capture for timeline reconstruction
* Terminal output of key forensic findings

---

### PDF-Specific Forensics

* Structural analysis (`pdfinfo`)
* Layout-preserving text extraction (`pdftotext`)
* Content-stream text recovery using PyMuPDF
* Detection of improperly redacted text objects
* Automatic bad-redaction flagging
* PDF sanitization and structural validation via `qpdf`
* Separation of visible text vs underlying content
* Entity extraction from recovered and visible text

---

### Image & Binary Analysis

* Embedded payload and artifact discovery using `binwalk`
* Identification of hidden or appended data in image files

---

### Entity Extraction

* Automated extraction of:
  * Persons
  * Organizations
  * Locations
  * Dates
* Uses spaCy with a locally installed language model

---

### Timeline Reconstruction

* Filesystem creation and modification timestamps
* Metadata-derived document timestamps
* Per-file timeline artifacts suitable for correlation and reporting

---

### Dataset Support

* Recursive directory analysis
* Mixed file-type handling
* Clean separation of forensic artifacts by category
* Scales from single documents to mounted evidence volumes

---

## Installation

### 1. System Dependencies

```bash
sudo apt update
sudo apt install -y exiftool poppler-utils binutils coreutils binwalk qpdf python3-pip
```

---

### 2. Python Dependencies (Required)

Forensic Excavator **requires spaCy and its language model**.

```bash
sudo pip3 install spacy pymupdf --break-system-packages
sudo python3 -m spacy download en_core_web_sm --break-system-packages
```

---

### 3. Clone the Repository

```bash
git clone https://github.com/ekomsSavior/forensic_excavator.git
cd forensic_excavator
```

---

### 4. Permissions

```bash
chmod +x excavator.py
```

---

## Usage

Forensic Excavator is fully interactive. No arguments. No flags. No shortcuts.

Run:

```bash
python3 excavator.py
```

You will be prompted:

```
[?] Enter file or directory path to excavate:
```

Examples:

```bash
/home/kali/Documents/redacted.pdf
/home/kali/Datasets/case_files/
/mnt/evidence_drive/
```

All files under the specified path will be analyzed recursively.

---

## How to Use This to Maximum Effect

### PDF Redaction Failure Detection

Compare:

* `output/pdf/<file>.text.txt`
* `output/pdf/unredacted/<file>.unredacted.txt`

If recovered text contains information not visible in the rendered document, the redaction failed.

The framework will automatically flag suspected redaction failures during analysis.

---

### Entity Intelligence

Review:

```bash
output/entities/<file>.entities.txt
```

Use this to:

* Identify individuals, organizations, and locations
* Correlate entities across multiple documents
* Build investigative leads rapidly

---

### Metadata & Provenance Analysis

Inspect:

```bash
output/exif/*.exif.txt
```

Focus on:

* Creator and Producer inconsistencies
* Editing tools and workflows
* Timestamp conflicts
* XMP and document history remnants

---

### Embedded & Binary Artifact Discovery

Review:

```bash
output/strings/
output/binwalk/
```

Look for:

* Embedded file signatures
* Residual filenames and paths
* URLs, identifiers, and object references

---

### Chain of Custody

Use:

```bash
output/hashes/*.sha256
```

Hashes allow you to:

* Verify integrity
* Reproduce findings
* Defend analysis in adversarial settings

---

## Disclaimer

This tool is provided **as-is** for lawful forensic analysis, investigative research, journalism, and security testing.

You are solely responsible for:

* Authorization to analyze the data
* Legal and ethical use
* Interpretation of findings

The author assumes no liability for misuse.
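As an illustration of the manual comparison described under "PDF Redaction Failure Detection", the following is a minimal sketch that lists lines present in the recovered text but absent from the visible text. It is not the framework's own flagging logic, which this README does not describe. The helper names (`hidden_lines`, `compare_outputs`) are hypothetical, and line-level matching after whitespace normalization is an assumed simplification.

```python
from pathlib import Path


def _normalized_lines(text):
    """Collapse runs of whitespace and drop empty lines to ignore layout differences."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def hidden_lines(visible_text, recovered_text):
    """Return lines found in the recovered text but not in the visible text."""
    visible = set(_normalized_lines(visible_text))
    return [line for line in _normalized_lines(recovered_text) if line not in visible]


def compare_outputs(visible_path, recovered_path):
    """Read the two output files and return candidate redaction failures.

    Hypothetical helper: the paths are whatever the tool wrote for a given
    input file, e.g. output/pdf/<file>.text.txt and
    output/pdf/unredacted/<file>.unredacted.txt.
    """
    visible = Path(visible_path).read_text(encoding="utf-8", errors="replace")
    recovered = Path(recovered_path).read_text(encoding="utf-8", errors="replace")
    return hidden_lines(visible, recovered)
```

Because the two extraction methods may break lines differently, a line-level comparison like this can produce false positives. Treat its output as leads to verify against the rendered document, not as findings.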

license

MIT License

Copyright (c) 2026 ek0mssavi0r

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

USE AT YOUR OWN RISK. NO WARRANTY PROVIDED.