1. Match Legal Database(s) Hash Codes (optional)
SHA-1, MD5 and CRC32 hash code values are calculated and matched against the entries in any EnCase, Hashkeeper, NIST NSRL or compatible legal hash code databases that the user has provided. Read the included fihash.txt document for details on using third party hash databases. This option moves only the third party databases to the first stage, not the File Investigator Hash database. By default this stage is included with the later Hash Code Matching stage, but it can be moved to the front when forensic investigators need to eliminate ‘Known Good’ files before spending the time to identify them.
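As a rough illustration of the hash calculation described above, the following Python sketch computes the three values in a single pass over a file. The function names, the chunk size and the use of plain sets as a stand-in for a hash database are assumptions made for this example; they do not reflect how File Investigator or the fihash.txt format actually work.

```python
import hashlib
import zlib


def file_digests(path, chunk_size=65536):
    """Return SHA-1, MD5 and CRC32 values for a file, reading it in chunks."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            sha1.update(chunk)
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
    return {
        "sha1": sha1.hexdigest(),
        "md5": md5.hexdigest(),
        "crc32": format(crc & 0xFFFFFFFF, "08x"),
    }


def match_known_files(path, known_sha1, known_md5):
    """Hypothetical lookup: True if either digest is in the supplied sets."""
    digests = file_digests(path)
    return digests["sha1"] in known_sha1 or digests["md5"] in known_md5
```

A file that matches a ‘Known Good’ entry could then be skipped before any of the slower identification stages run.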
2. Match File Header/Magic #
The first 32 bytes of each file are read and matched against the entries in the File Investigator Pattern database. This is typically the first stage used.
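Header matching of this kind can be sketched in Python as follows. The pattern table is a small hypothetical stand-in built from a few widely published signatures; the real File Investigator Pattern database and its format are not shown in this document.

```python
HEADER_LENGTH = 32  # the stage reads the first 32 bytes of each file

# Hypothetical pattern table: (signature bytes, offset in header, description).
HEADER_PATTERNS = [
    (b"\x89PNG\r\n\x1a\n", 0, "PNG image"),
    (b"%PDF-", 0, "PDF document"),
    (b"PK\x03\x04", 0, "ZIP archive"),
    (b"ftyp", 4, "ISO base media file"),
]


def match_header(header):
    """Return the description of the first matching pattern, or None."""
    for signature, offset, description in HEADER_PATTERNS:
        if header[offset:offset + len(signature)] == signature:
            return description
    return None


def identify_by_header(path):
    """Read only the header of the file and try to match it."""
    with open(path, "rb") as handle:
        return match_header(handle.read(HEADER_LENGTH))
```

Because only 32 bytes are read per file, a stage like this is cheap, which fits its typical position as the first stage.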
3. Match Inter-File Pattern/Signature/Magic #
Several methods are used to match patterns from the File Investigator Pattern Database to patterns located deeper inside each file. This stage is used when the Header Matching stage fails.
Methods:
- Seek offset from end of file
- Seek offset, read new offset, Seek new offset (LH)
- Seek offset, read new offset, Seek new offset (HL)
- Find text string before offset (case sensitive)
- Find text string before offset (case insensitive)
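The methods listed above can be illustrated with a short Python sketch. It assumes that "LH" means low byte first (little-endian) and "HL" means high byte first (big-endian), and that the stored offset is a 4-byte unsigned integer; neither detail is stated in this document, and all function names are hypothetical.

```python
import io
import struct


def read_from_end(stream, back, length):
    """Seek `back` bytes before the end of the file and read `length` bytes."""
    stream.seek(-back, io.SEEK_END)
    return stream.read(length)


def read_indirect(stream, offset, length, byte_order):
    """Seek to offset, read a new offset there, seek to it and read.

    byte_order is "LH" (assumed little-endian) or "HL" (assumed big-endian).
    The 4-byte pointer width is an assumption for this example.
    """
    fmt = "<I" if byte_order == "LH" else ">I"
    stream.seek(offset)
    (target,) = struct.unpack(fmt, stream.read(4))
    stream.seek(target)
    return stream.read(length)


def text_before_offset(stream, text, limit, case_sensitive=True):
    """Report whether `text` occurs within the first `limit` bytes."""
    stream.seek(0)
    window = stream.read(limit)
    if not case_sensitive:
        window, text = window.lower(), text.lower()
    return text in window
```

Each helper works on any seekable binary stream, such as a file opened with `open(path, "rb")` or an `io.BytesIO` buffer.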
4. Match Byte Value Distribution Pattern

5. Interpret & Validate Identification
Several methods are used to interpret and validate the results of the previous identification stages. Each time a potential identification is made, this stage is used to decide whether the identification is accurate. If it is found to be inaccurate, then the stage responsible for the pattern match is instructed to continue looking for a better match.
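The validate-and-retry behaviour described above might look roughly like the following Python sketch. The candidate list stands in for whatever a matching stage would propose, and the two validators are deliberately simplistic; the actual validation methods are not described in this document.

```python
def identify(data, candidates, validators):
    """Return the first candidate name that passes validation, else None.

    candidates: iterable of type names, best match first (a stand-in for
                the matches a pattern stage would propose).
    validators: dict mapping a type name to a function(data) -> bool.
    """
    for name in candidates:
        check = validators.get(name)
        # In this sketch a candidate without a validator is accepted as-is.
        if check is None or check(data):
            return name
        # Validation failed, so move on to the next proposed match.
    return None


def looks_like_zip(data):
    # A ZIP local file header starts with these four bytes; real validation
    # would inspect much more of the structure.
    return data.startswith(b"PK\x03\x04")


def is_ascii(data):
    return all(byte < 0x80 for byte in data)


VALIDATORS = {"ZIP archive": looks_like_zip, "ASCII text": is_ascii}
```

For example, `identify(b"plain words", ["ZIP archive", "ASCII text"], VALIDATORS)` rejects the first candidate and settles on the second.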
6. Match Hash Codes
SHA-1, MD5 and CRC32 hash code values are calculated and matched against the entries in the File Investigator Hash Database (first) and any legal hash code databases that the user has provided. Read the included fihash.txt document for details on using third party hash databases. This option does not use the third party hash databases if they were already used as the first identification stage.
7. Floating Header Match (Secondary)
Same as the previous “Match Inter-File Pattern/Signature/Magic #” stage, but records the resulting FI Description Database index value(s) separately in the “Numbers Metadata” field. This optional stage is intended to catch files with floating headers that are stored within a different type of file that looks innocent. This stage is performed even when the file has already been identified by a previous stage. “Floating Header” support was added in response to a whitepaper located at http://www.securityelf.org/magicbyte.html.
8. Match Hash Codes (Secondary, Legal DB(s) only)
9. Match File Extension
If a file is not identified by the previous pattern matching and interpretation methods, this option allows the file to be matched against the list of known file extensions. This is the method that MS Windows uses, and it produces a LOW accuracy rating in File Investigator.
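Matching by file extension can be sketched in a few lines of Python. The extension table and the returned tuple are hypothetical and chosen only for illustration; the sketch shows why an identification based on the file name alone deserves a LOW accuracy rating.

```python
import os

# Hypothetical extension table; the real list of known extensions is not
# shown in this document.
KNOWN_EXTENSIONS = {
    ".txt": "Text file",
    ".jpg": "JPEG image",
    ".pdf": "PDF document",
}


def match_extension(path):
    """Return (description, accuracy) based on the file name alone, or None."""
    extension = os.path.splitext(path)[1].lower()
    description = KNOWN_EXTENSIONS.get(extension)
    if description is None:
        return None
    # The name says nothing about the actual content, hence the LOW rating.
    return description, "LOW"
```

Renaming a file changes the result of this lookup without changing a single byte of its content, which is why it only makes sense as a fallback.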
10. Read Metadata
Once a file is identified, an attempt will be made to read its Metadata values.
Any of the above stages can be disabled when the user wants to speed up the process.
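One simple way to picture disabling stages is an ordered list of stage functions with a set of names to skip, as in the Python sketch below. The stage names and the stop-at-first-hit rule are assumptions for this example and simplify the real behaviour, in which some stages run even after a file has been identified.

```python
def run_stages(data, stages, disabled=()):
    """Run enabled stages in order; return (stage name, result) for the first hit.

    stages:   list of (name, function) pairs, where each function takes the
              file data and returns a description or None.
    disabled: names of stages the user has switched off.
    """
    for name, stage in stages:
        if name in disabled:
            continue  # a disabled stage costs no processing time
        result = stage(data)
        if result is not None:
            return name, result
    return None


# Hypothetical stages, for illustration only.
EXAMPLE_STAGES = [
    ("header", lambda data: "PDF document" if data.startswith(b"%PDF-") else None),
    ("fallback", lambda data: "Unknown data"),
]
```

With these example stages, `run_stages(b"%PDF-1.7", EXAMPLE_STAGES, disabled={"header"})` skips the header check and falls through to the fallback stage.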