Future Work
Have a publicly available filename/hash value lookup service (Dec. 03?)
Release more hashsets for free download (2,400 now)
Have the hashing code generate hashes for 512-byte blocks
Need to write some papers on the following topics:
Automated media hashes compared to installed hashes
- how much could they identify?
- how much do they identify?
- which method finds how much?
Automated recursive hashing of archive files
- explain how we do it
- what does it miss?
Uniquely identifying every file in the NSRL shelves
- what data is collected?
- how does it identify every file?
Collecting file information (batching) via the web for the NSRL
- show how the web forms collect everything for the paper above
- how do we know we batch all the files from the media?
Applying hashing technology and the RDS to NARA data
- identification rate
- duplication rate
- identifying the original software package
Applying installation hashes to NARA data
- is it applicable?
- if so, how? SHA hashes or filenames?
Better explanation of the math behind hashes
- can it be shown graphically?
- can it be easily and correctly explained to a jury?
Using virtual machines for installation
- show repeatability
- how does this compare with physical machines?
"Golden nuggets"
- what files change on (Windows) reboot?
- how are files modified on (Windows) installation?
- what changes, how, and when?
Tracking the static areas of dynamic files
- relates to golden nuggets
- relates to install hashes
- relates to NARA file identification
How to build the NSRL environment in open source
- write a cookbook to go with our LAMPS work
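
Sketch: hashing 512-byte blocks
The block-hashing item near the top of this list can be illustrated with a minimal Python sketch. It is not the NSRL hashing code. The names block_hashes and BLOCK_SIZE are hypothetical, SHA-1 is an assumed choice of algorithm, and hashing a short final block without padding is also an assumption.

```python
import hashlib
import io
from typing import BinaryIO, Iterator, Tuple

BLOCK_SIZE = 512  # bytes; the block size named in the list above


def block_hashes(stream: BinaryIO,
                 block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, str]]:
    """Yield (byte_offset, SHA-1 hex digest) for each block of the stream.

    Assumption: a final block shorter than block_size is hashed as-is,
    with no padding. A real tool might pad or skip it instead.
    """
    offset = 0
    while True:
        block = stream.read(block_size)
        if not block:  # an empty read means end of stream
            break
        yield offset, hashlib.sha1(block).hexdigest()
        offset += len(block)


if __name__ == "__main__":
    # One full 512-byte block followed by a 100-byte partial block.
    sample = io.BytesIO(b"A" * 512 + b"B" * 100)
    for offset, digest in block_hashes(sample):
        print(offset, digest)
```

To hash a file on disk, open it in binary mode, for example open(path, "rb"), and pass the file object to block_hashes. The sketch assumes a buffered stream, which returns full blocks until end of file.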