I read about "sexeger" - reversed regexes - after YAPC::Europe::2001
Log files from previous work contained timing information
We had timings for 92 pieces of media, so we rehashed them using code with the sexeger improvements
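For readers unfamiliar with the idea, here is a minimal sketch of the sexeger technique: reverse the input, match a reversed pattern, then reverse the result. It is written in Python for illustration only and is not the code used in this work. The function names and the "last number in a string" task are hypothetical examples.

```python
import re

def last_number_forward(text):
    """Find the last run of digits by scanning forward.

    The engine has to try every digit run from the left and check
    whether only non-digits follow it up to the end of the string.
    """
    m = re.search(r"(\d+)\D*\Z", text)
    return m.group(1) if m else None

def last_number_sexeger(text):
    """Find the last run of digits the sexeger way.

    Reverse the input, match the (here trivially) reversed pattern,
    and reverse the match back. The first digit run in the reversed
    string is the last digit run in the original.
    """
    m = re.search(r"\d+", text[::-1])
    return m.group(0)[::-1] if m else None

sample = "disk 12 of 345 total"
print(last_number_forward(sample), last_number_sexeger(sample))
```

Both functions return the same answer. The reversed form lets the regex engine stop at the first match instead of repeatedly scanning to the end of the string, which is where the savings come from on long inputs.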
Old Consistency Checks:
Hashes performed: 812049
Average number of hashes per media: 8826
Effective hashes/s: 74
| Code section | Seconds | % of run time |
|---|---|---|
| initialization: | 18 | 0% |
| walker hash: | 11010 | 7% |
| consistency: | 80044 | 56% |
| find/extract: | 49124 | 34% |
| total time: | 140984 (39h) | 100% |
The times above clearly identify the consistency checks as the initial location for attempting performance tuning
New Consistency Checks (with sexeger):
Hashes performed: 812049
Average number of hashes per media: 8826
Effective hashes/s: 75
| Code section | Seconds | % of run time |
|---|---|---|
| initialization: | 18 | 0% |
| walker hash: | 10795 | 11% |
| consistency: | 35830 | 37% |
| find/extract: | 50127 | 51% |
| total time: | 96770 (27h) | 100% |
After sexeger, we observed a 55% reduction in consistency check time and a 31% reduction in overall execution time
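The reductions can be recomputed directly from the two tables. The short sketch below does that arithmetic; the helper name `percent_reduction` is a hypothetical name chosen for illustration, and the input numbers are the consistency and total-time rows from the tables above.

```python
def percent_reduction(before, after):
    """Return how much smaller `after` is than `before`, in percent."""
    return 100.0 * (before - after) / before

# Seconds taken from the "consistency" and "total time" table rows.
consistency_old, consistency_new = 80044, 35830
total_old, total_new = 140984, 96770

consistency_cut = percent_reduction(consistency_old, consistency_new)
total_cut = percent_reduction(total_old, total_new)

print(f"consistency: {consistency_cut:.1f}%  overall: {total_cut:.1f}%")
```

Both differences come to the same 44214 seconds, which indicates that essentially all of the overall saving came from the consistency checks.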
During April and May 2002, we rehashed the entire NSRL software collection in 6 weeks
New hashes calculated were verified against the previous hashes
Total Collection:
Hashes performed: 6423307
Average number of hashes per media: 4057
Effective hashes/s: 34
| Code section | Seconds | % of run time |
|---|---|---|
| initialization: | 306 | 0% |
| walker hash: | 184724 | 11% |
| consistency: | 620288 | 37% |
| find/extract: | 872811 | 52% |
| total time: | 1656605 (460h) | 100% |
Measuring the time spent in each major section of the hashing run allowed us to identify the worst-performing section
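One way to collect this kind of per-section timing is sketched below. It is a Python illustration under assumed names (`timed`, `section_seconds`, `report`), not the instrumentation used in the original hashing code: it accumulates wall-clock seconds per named section and reports each section's share of the total.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per named code section (hypothetical store).
section_seconds = defaultdict(float)

@contextmanager
def timed(section):
    """Add the time spent inside the `with` body to the named section."""
    start = time.perf_counter()
    try:
        yield
    finally:
        section_seconds[section] += time.perf_counter() - start

def report(totals):
    """Format each section as 'name: N s (P%)' relative to the summed total."""
    total = sum(totals.values())
    lines = []
    for name, secs in totals.items():
        share = 100.0 * secs / total if total else 0.0
        lines.append(f"{name}: {secs:.0f} s ({share:.0f}%)")
    return lines

# Usage: wrap each major phase of a run, then print the breakdown.
with timed("consistency"):
    sum(range(100000))  # stand-in for real work
print("\n".join(report(section_seconds)))
```

Logging these totals for every run is what makes a later before/after comparison possible without re-instrumenting the code.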
Pinyan, J. (2000) "sexeger". http://www.perlmonks.org/index.pl?node=sexeger
Sergeant, P. (2001) "Reversing Regular Expressions". http://www.perl.com/pub/a/2001/05/01/expressions.html