
Benchmarking

I read about "sexeger" - reversed regexes - after YAPC::Europe::2001
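The idea behind a reversed regex can be sketched as follows. This is a minimal Python illustration, not the patterns used in this work: the function names and the sample path are made up for the example. When the interesting text sits near the end of a string, reversing the input lets a simple pattern find it on the first attempt, instead of anchoring at the end and retrying from every earlier candidate.

```python
import re


def last_number_forward(text):
    """Find the last run of digits with an end-anchored forward pattern."""
    # The engine tries each digit run from the left and fails until the last one.
    match = re.search(r"(\d+)\D*$", text)
    return match.group(1) if match else None


def last_number_reversed(text):
    """Find the last run of digits by reversing, matching, and reversing back."""
    # In the reversed string the last digit run is the first one encountered.
    match = re.search(r"\d+", text[::-1])
    return match.group(0)[::-1] if match else None


sample = "disk12/setup_v45.exe"  # hypothetical file name
print(last_number_forward(sample), last_number_reversed(sample))
```

Both functions return the same answer; whether the reversed form is actually faster depends on the pattern and the input, so it is worth timing on real data.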

Log files from previous work contained timing information

Had 92 pieces of media with timings, so rehashed them using code with sexeger improvements

Old Consistency Checks:

Hashes performed: 812049
Average number of hashes per media: 8826
Effective hashes/s: 74
Code section      Seconds          % of run time
initialization:   18               0%
walker hash:      11010            7%
consistency:      80044            56%
find/extract:     49124            34%
total time:       140984 (39h)     100%

At 56% of run time, the consistency checks were the obvious first target for performance tuning

New Consistency Checks (with sexeger):

Hashes performed: 812049
Average number of hashes per media: 8826
Effective hashes/s: 75
Code section      Seconds          % of run time
initialization:   18               0%
walker hash:      10795            11%
consistency:      35830            37%
find/extract:     50127            51%
total time:       96770 (27h)      100%

With sexeger, consistency check time dropped 55% (80044 s to 35830 s) and overall run time dropped 31% (140984 s to 96770 s)
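The reductions can be recomputed from the seconds reported in the two tables. The snippet below is a small arithmetic check; the helper name is an assumption introduced for the example.

```python
def percent_reduction(old_seconds, new_seconds):
    """Return the percentage by which new_seconds is smaller than old_seconds."""
    return 100.0 * (old_seconds - new_seconds) / old_seconds


# Seconds taken from the old and new timing tables.
consistency_reduction = percent_reduction(80044, 35830)
total_reduction = percent_reduction(140984, 96770)

print(f"consistency: {consistency_reduction:.1f}%")
print(f"total run:   {total_reduction:.1f}%")
```

With the totals shown, the consistency figure works out to about 55.2% and the overall figure to about 31.4%.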

During April and May 2002, we rehashed the entire NSRL software collection in 6 weeks

New hashes calculated were verified against the previous hashes
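A verification pass of this kind might look like the sketch below. It assumes each run produces a mapping from file path to hex digest; the function name, the paths, and the choice of SHA-1 are illustrative assumptions, not details taken from the actual tooling.

```python
import hashlib


def compare_hashes(previous, current):
    """Compare two {path: digest} maps; return (mismatched, missing, extra) paths."""
    shared = previous.keys() & current.keys()
    mismatched = sorted(p for p in shared if previous[p] != current[p])
    missing = sorted(previous.keys() - current.keys())  # hashed before, absent now
    extra = sorted(current.keys() - previous.keys())    # new in this run
    return mismatched, missing, extra


# Hypothetical data: digests are computed here rather than copied from any real set.
old_run = {"a.dll": hashlib.sha1(b"alpha").hexdigest(),
           "b.exe": hashlib.sha1(b"beta").hexdigest()}
new_run = {"a.dll": hashlib.sha1(b"alpha").hexdigest(),
           "b.exe": hashlib.sha1(b"changed").hexdigest(),
           "c.txt": hashlib.sha1(b"gamma").hexdigest()}

print(compare_hashes(old_run, new_run))
```

An empty result in all three lists would indicate that the rehash reproduced the earlier values exactly.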

Total Collection:

Hashes performed: 6423307
Average number of hashes per media: 4057
Effective hashes/s: 34
Code section      Seconds          % of run time
initialization:   306              0%
walker hash:      184724           11%
consistency:      620288           37%
find/extract:     872811           52%
total time:       1656605 (460h)   100%

Timing each major section of the hashing run let us identify which section performed worst
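Per-section timing of this kind can be sketched with a small accumulator. This is one possible approach in Python, not the instrumentation actually used; `timed`, `report`, and `section_seconds` are names invented for the example, and the section labels are borrowed from the tables.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named section.
section_seconds = defaultdict(float)


@contextmanager
def timed(section):
    """Add the elapsed time of the enclosed block to the section's total."""
    start = time.perf_counter()
    try:
        yield
    finally:
        section_seconds[section] += time.perf_counter() - start


def report(totals):
    """Return (section, seconds, percent) rows, slowest section first."""
    total = sum(totals.values())
    rows = [(name, secs, 100.0 * secs / total if total else 0.0)
            for name, secs in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Stand-in workloads; real code would wrap the actual hashing stages.
with timed("consistency"):
    sum(range(100_000))
with timed("find/extract"):
    sum(range(1_000))

for name, secs, pct in report(section_seconds):
    print(f"{name:15s} {secs:10.6f} s {pct:5.1f}%")
```

Sorting the report by seconds puts the next tuning candidate at the top, which is how the tables above were read.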


Pinyan, J. (2000). "sexeger". http://www.perlmonks.org/index.pl?node=sexeger

Sergeant, P. (2001). "Reversing Regular Expressions". http://www.perl.com/pub/a/2001/05/01/expressions.html