||ITL Bulletin -- November 2001|
Technical comments: email@example.com
Website comments: firstname.lastname@example.org
|Updated on: 2001-11-21
COMPUTER FORENSICS GUIDANCE
The NSRL provides a set of reference data that can be used to reduce the number of files that have to be reviewed or examined during an investigation. This can significantly cut down on the amount of time required to gather and verify evidence in crimes involving computers.
The objective of the CFTT is to provide a measure of assurance that the tools used in computer forensics investigations produce accurate results. Since the computer forensics arena is relatively new in terms of computer history, there are few standards available to define how these tools should operate and perform. The task of the CFTT is to develop specifications and test methods for these tools. The results provide the information necessary for users to make informed choices about acquiring and using computer forensics tools.
This ITL Bulletin presents some of the results of the work on these two projects and provides recommendations on how the results can be applied.
The National Software
Reference Library Project
The objective of the NSRL is to provide a foundation for automating the examination process by separating these files into those that are relevant to the investigation and those that are not. This is accomplished by computing a unique identifier for each file based on the file’s contents. These identifiers can be used as file signatures or “fingerprints” for the associated files. The identifiers can then be compared to entries in a database of known fingerprints. If a file’s fingerprint matches one in the database, it is a known file and can be eliminated automatically from examination. If the fingerprint does not match anything in the database, then the file is unknown and should be examined further. How this process is accomplished is described below.
NIST has been collecting software from various sources over the past 18 months. This software is recorded as the original source for known files and stored as a permanent part of the NSRL. The software in each package is stored on CD, diskette, magnetic tape, or other electronic storage medium. For example, one CD may contain a full implementation of an operating system. Another may contain a newer version of the same operating system. Other packages contain database management software, photo editors, word processors, image libraries, network browsers, compilers, accounting packages, and many other types of software. The concept is to collect as many different examples, versions, and updates of software as possible in order to generate file signatures for as many known files as possible.
Each file within a package, which may contain many thousands of individual files, is “fingerprinted” by passing the file through a program that computes a hash code. A hash code is a large number computed from the entire string of bits that form the file. The hash code is computed in such a way that if one bit in the file is changed, a completely different hash code is produced. To minimize the possibility that two different files may generate the same hash code, a sufficiently large hash value is computed.
The primary hash value used in the NSRL Reference Data Set (RDS) is the Secure Hash Algorithm (SHA-1) specified in Federal Information Processing Standard (FIPS) 180-1. SHA-1 is a 160-bit hashing algorithm, meaning that the hash value derived from the algorithm is 160 bits in length. The probability that two different files will produce the same SHA-1 value is 1 in 1080, a very small probability. Several other standard hash values also are computed for each file. These include Message Digest 4 (MD4), Message Digest 5 (MD5), and a 32-bit Cyclical Redundancy Checksum (CRC32). These allow the SHA-1 values to be cross-referenced by other products that depend on different hash values. Additionally, this further ensures that no two files will have the same set of hash values. The hash values, file name, file size, and information identifying the source of the file are stored in the RDS. A separate, parallel, and independent process is used to validate the results of the primary RDS implementation. This ensures that the hashes computed can be verified to identify specific files in the RDS. Once verified and validated, the RDS is written to a master CD, duplicated, and distributed through NIST’s Standard Reference Data Office as Special Database #28 (http://www.nist.gov/srd/nistsd28.htm).
When a computer hard disk, CD, or other storage medium becomes part of an investigation, the files stored on it can be “fingerprinted” using SHA-1, MD4, or MD5 through separate software that may be acquired from other sources. These fingerprints can be compared to the known file fingerprints in the RDS. Those files that have matching hash values can be discarded from the investigation without further examination; those that do not match the database should be examined further.
Uses of the Reference
Organizations are looking at the RDS for other purposes. For example, the RDS can be used in intellectual property investigations to find pirated software on a specific computer. If the pirated software is on the computer, i.e., a specific file or set of files exists on the computer, hash values of those files will match known files in the RDS.
Alternately, a perpetrator may try to hide a pornographic image by renaming it as a nondescript operating system file, e.g., renaming a .JPG image as an .EXE file. The hash value derived from the image will not match that from the known operating system file and will thus be uncovered.
Additionally, the RDS can be used to find camouflaged software, such as executable files with changed names that may blend in with other data files, e.g., a .EXE file renamed with a .JPG extension. A hash value of the camouflaged file will still match the RDS since the file’s contents were not changed, only its name.
Expected files may be missing if they do not show up in the known files list. This may indicate that files were deleted to cover up illegal activity and may prompt the investigator to pursue other means of investigating the file system.
The project website is http://www.nsrl.nist.gov.
The Computer Forensics
Tool Testing Project
As a side effect of this work, the computer forensics arena will also have a means for defining different sets of tool requirements in a standard framework. This framework will allow different subject area focus groups to work in specific domains without duplicating effort.
Initial work on the framework focused on the classification of tools by functional capabilities. Similar or related types of functionality were grouped to form broad classifications. The capabilities required in a classification were defined by focus groups of technical experts and practitioners that use specific types of tools.
There are two focus groups at present: disk imaging and write blocker. Other focus groups are planned in the future. The tasks of these focus groups are to produce a detailed specification of each functional classification and to review test assertions based on the specification. NIST then defines and prototypes test methods and test cases that can be used to exercise relevant functions of various tools purporting to provide the functionality required. At each step in the process, the set of requirements, test assertions, test cases, and procedures is distributed to subject area experts and posted to the CFTT website (http://www.cftt.nist.gov) for comment and peer review. Test results will be published by the National Institute of Justice.