Thursday, June 7, 2012

Adventures in perlsembly

Some Debian packages are missing important optimizations. This one turned up when comparing monkeyiq's openssl benchmarks with results from a PandaBoard (OMAP 4430). With its Cortex-A9, the PandaBoard should clearly have been as fast as or faster than the N9 (OMAP 3630) in the benchmark. And it was, except in the AES benchmarks. Since AES is quite important, that seemed a bit odd. It turns out the Debian/Ubuntu package was missing some hand-crafted ARM assembler optimizations. Enable them, and the results from the openssl speed benchmark are quite nice:
The numbers are in thousands of bytes processed per second (the rsa row, marked v/s, is in operations per second instead).

benchmark    debian        with patch  +%
sha1         55836.67k  -> 73599.08k   +31.811%
aes-128 cbc  18451.11k  -> 36305.34k   +96.765%
aes-256 cbc  13552.30k  -> 27108.31k   +100.027%
sha256       20092.25k  -> 43469.45k   +116.349%
sha512       8052.74k   -> 37194.28k   +361.884%
rsa 1024     1904.2v/s  -> 3650.5v/s   +91.708%
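
For anyone who wants to check the +% column, here is a minimal sketch of the arithmetic. The function name percent_gain and the results dictionary are just illustrative names, not part of any openssl tooling; the figures are copied from the table above.

```python
def percent_gain(before: float, after: float) -> float:
    """Relative speedup of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# (debian, with patch) pairs copied from the benchmark table.
# Hash/cipher rows are in 1000s of bytes per second; rsa is in v/s.
results = {
    "sha1": (55836.67, 73599.08),
    "aes-128 cbc": (18451.11, 36305.34),
    "aes-256 cbc": (13552.30, 27108.31),
    "sha256": (20092.25, 43469.45),
    "sha512": (8052.74, 37194.28),
    "rsa 1024": (1904.2, 3650.5),
}

for name, (before, after) in results.items():
    print(f"{name:12s} +{percent_gain(before, after):.3f}%")
```

The gain is measured relative to the stock Debian figure, so a doubling in throughput shows up as roughly +100%.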
Curiously, the assembler code actually lives in Perl files that output assembler code. This kind of code is affectionately called "perlsembly". A bug with a patch has been filed; hopefully it gets applied soon.
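
To give a feel for the idea of a script that prints assembler, here is a toy sketch. It is written in Python rather than Perl, and it is not taken from OpenSSL: the routine name xor_block, the function emit_xor_block and the register choices are all made up for illustration. The point is only that a loop in the generator turns into unrolled instructions in the output.

```python
def emit_xor_block(words: int) -> str:
    """Return ARM assembly text that XORs `words` 32-bit words
    read from the address in r1 into the buffer at r0 (hypothetical routine)."""
    lines = [".text", ".global xor_block", "xor_block:"]
    for _ in range(words):
        # Each generator iteration emits one unrolled load/xor/store step.
        lines += [
            "    ldr r2, [r0]",       # current destination word
            "    ldr r3, [r1], #4",   # source word, then advance r1
            "    eor r2, r2, r3",     # xor them together
            "    str r2, [r0], #4",   # store result, then advance r0
        ]
    lines.append("    bx lr")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The printed text would be fed to an assembler in a real build.
    print(emit_xor_block(4), end="")
```

Generating the text this way lets the script handle unrolling and other repetitive variations, instead of someone maintaining each copy of the assembler by hand.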