Last update: Aug. 17, 2006 - corrected comments in code in tarfile.
The NSRL is on the verge of collecting block hashes. We have the capability to capture the SHA-1 and MD5 of each 512-byte block of every file we process, and we can capture every 4096-byte block in a rolling window from every file.
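As an illustration of what block hashing involves, here is a minimal Python sketch. It is not the code shipped in the tarfile. The function names, the choice to hash a short final block as-is, and the 512-byte step of the rolling window are all assumptions made for the example; the text above does not specify them.

```python
import hashlib


def fixed_block_hashes(data: bytes, block_size: int = 512):
    """Yield (offset, sha1_hex, md5_hex) for consecutive, non-overlapping blocks.

    Assumption: a final block shorter than block_size is hashed as-is,
    without padding.
    """
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        yield (offset,
               hashlib.sha1(block).hexdigest(),
               hashlib.md5(block).hexdigest())


def rolling_block_hashes(data: bytes, window: int = 4096, step: int = 512):
    """Yield (offset, sha1_hex, md5_hex) for each full window of the data.

    Assumption: the window advances by `step` bytes (hypothetical value),
    and windows that would run past the end of the data are skipped.
    """
    for offset in range(0, len(data) - window + 1, step):
        block = data[offset:offset + window]
        yield (offset,
               hashlib.sha1(block).hexdigest(),
               hashlib.md5(block).hexdigest())
```

Calling `list(fixed_block_hashes(open(path, "rb").read()))` on a file yields one SHA-1 and one MD5 per 512-byte block, which gives a sense of how quickly the volume of hash data grows compared with one hash per file.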
Storing this block hash information is an issue, and distributing it will be one as well. We posit that Bloom filters can allow the set of known hashes to be distributed in a compact, easily updated format that will bring greater filtering speed to practitioners.
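For readers unfamiliar with Bloom filters, the following Python sketch shows the general idea: a fixed-size bit array plus several hash-derived bit positions per item. It is a generic illustration only. The class name, the parameters, and the use of SHA-256 with double hashing to derive positions are assumptions for the example, and it does not describe the layout of the .bin files listed below.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter sketch (not the NSRL file format)."""

    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed eight to a byte.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Assumption: derive k bit positions from one SHA-256 digest using
        # double hashing, position_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force h2 to be odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: bytes) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))
```

A practitioner would add each known hash with `bf.add(digest)` and then test a candidate with `digest in bf`. The filter never reports a known hash as absent, but it can report an unknown hash as present. For n items, m bits, and k positions per item, the false-positive rate is approximately (1 - e^(-kn/m))^k, so the filter size trades off directly against accuracy.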
NIST has no plans to discontinue the current RDS product. This line of research is in anticipation of an orders-of-magnitude increase in the data used to identify digital objects.
bloom_code.tar.gz - 5 MB (includes README.txt)
rds213-md5.bin - 170 MB
rds213-sha1.bin - 220 MB