

The National Software Reference Library (NSRL) collects software and incorporates computed file profiles into a Reference Data Set (RDS).

There is a physical library with shelves of software. Hashes must be traceable back to the original media to be admitted into court.

The original system was monolithic. We distributed the work across old PCs pulled from excess.

Media are read at batching stations using a web interface and copied to the file server by several Perl scripts.

The contents of the media are hashed automatically by computers in a hashing constellation. This process is implemented almost entirely in Perl.
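The hashing step can be illustrated with a minimal sketch. It is written in Python for readability and is not the NSRL Perl code. It assumes a media copy is a directory tree on the file server, and the names `hash_file`, `hash_media_copy`, and `media_id` are hypothetical. Each record carries the media identifier, so a hash stays traceable to the original media.

```python
import hashlib
import zlib
from pathlib import Path


def hash_file(path, chunk_size=1 << 20):
    """Return SHA-1, MD5, and CRC32 digests of one file, read in chunks."""
    sha1 = hashlib.sha1()
    md5 = hashlib.md5()
    crc = 0
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            sha1.update(chunk)
            md5.update(chunk)
            crc = zlib.crc32(chunk, crc)
    return {
        "sha1": sha1.hexdigest().upper(),
        "md5": md5.hexdigest().upper(),
        "crc32": format(crc & 0xFFFFFFFF, "08X"),
        "size": Path(path).stat().st_size,
    }


def hash_media_copy(root, media_id):
    """Walk a copied media tree and yield one profile record per file.

    media_id is a hypothetical identifier linking each record back to
    the physical item on the library shelf.
    """
    root = Path(root)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        record = hash_file(path)
        record["media_id"] = media_id
        record["relative_path"] = path.relative_to(root).as_posix()
        yield record
```

Because each media copy is hashed independently, a job of this shape can be handed to any free machine in the constellation.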

After verification, hash results are placed in a reference database, which is used to generate the NSRL RDS that is mailed to subscribers quarterly.

We can batch up to 70 CDs each day with current staff, and the system needs to be able to hash faster than we can batch.
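The throughput requirement can be put as a back-of-the-envelope calculation. The sketch below is illustrative only: the 70 CDs per day comes from the notes above, while the disc size, the hours of hashing per day, and the per-machine hashing rate are assumptions supplied by the caller, not measured NSRL figures.

```python
import math


def required_hash_rate_mb_per_s(discs_per_day, mb_per_disc, hashing_hours_per_day):
    """Aggregate hashing rate (MB/s) needed to keep up with batching."""
    total_mb = discs_per_day * mb_per_disc
    return total_mb / (hashing_hours_per_day * 3600)


def hashers_needed(required_rate, per_machine_rate):
    """Smallest number of machines whose combined rate meets the requirement."""
    return math.ceil(required_rate / per_machine_rate)


# Assumed inputs: full 700 MB discs, hashing around the clock.
rate = required_hash_rate_mb_per_s(discs_per_day=70, mb_per_disc=700,
                                   hashing_hours_per_day=24)
# Assumed per-machine rate of 0.2 MB/s for an old PC (hypothetical).
machines = hashers_needed(rate, per_machine_rate=0.2)
print(f"required rate: {rate:.3f} MB/s, machines needed: {machines}")
```

Sizing the constellation above this minimum leaves headroom for days with larger media or machine failures.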

Throughput matters because each quarterly release should add a useful increment to the data set.

Spinoffs: National Archives, Tripwire