O-Matrix 6 Performance Enhancements
O-Matrix version 6.1 provides significant performance
improvements over previous releases. O-Matrix
intrinsic functions, script files, and language operators were all reviewed
for performance, and enhancements have been made to exploit
the multiprocessing capabilities of newer CPUs, such as Intel processors with
Hyper-Threading (HT) and the Intel Core Duo.
The following table lists the functions and operators with the greatest
performance gains, together with the functions we found to be most commonly
used in actual end-user applications.
| Benchmark | O-Matrix 6.1 | O-Matrix 5.82 |
|---|---|---|
| fft(); 2^19 element vector | 0.0688 | 0.2874 |
| dft(); 2^13+1 element vector | 0.0016 | 1.610 |
| rand(); 200000x20 matrix | 0.0156 | 0.1250 |
| snormal(); 200000x20 matrix | 0.0953 | 0.3420 |
| vector transpose; 1000000 elements | 0.0022 | 0.0032 |
| sort(); 1000000 element vector | 0.2470 | 0.3062 |
| colmedian(); 50000x40 matrix | 0.4109 | 0.5250 |
| colstd(); 100000x40 matrix | 0.0141 | 0.0812 |
| colmean(); 100000x40 matrix | 0.0034 | 0.0194 |
| colnorm(); 100000x40 matrix | 0.0037 | 0.0328 |
| sum(); 100000x40 matrix | 0.0034 | 0.0088 |
| 100000x40 matrix ^2 | 0.0128 | 0.0531 |
| gamma(); 50000x40 matrix | 0.7438 | 0.9190 |
| floor(); 100000x40 matrix | 0.0907 | 0.3453 |
| ceil(); 100000x40 matrix | 0.0890 | 0.3454 |
| abs(); 100000x40 matrix | 0.0131 | 0.0144 |
| for loops/second | 16,229,000 | 12,820,000 |
| fill(); 200000x10 matrix | 0.0022 | 0.0043 |
All timings are in seconds (lower is better), measured on a 3 GHz Pentium 4. The for-loop row is a rate in loops per second (higher is better).
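To make the gains easier to compare, the timings above can be turned into speedup factors (old time divided by new time). This is a small Python illustration, not part of O-Matrix; the numbers are transcribed from the table, and the names `timings` and `speedup` are hypothetical.

```python
# Speedups for the intrinsic-function benchmarks (illustrative sketch only).
# Timings are transcribed from the table: (O-Matrix 6.1, O-Matrix 5.82), seconds.
timings = {
    "fft(); 2^19 element vector": (0.0688, 0.2874),
    "dft(); 2^13+1 element vector": (0.0016, 1.610),
    "rand(); 200000x20 matrix": (0.0156, 0.1250),
    "snormal(); 200000x20 matrix": (0.0953, 0.3420),
    "vector transpose; 1000000 elements": (0.0022, 0.0032),
    "sort(); 1000000 element vector": (0.2470, 0.3062),
    "colmedian(); 50000x40 matrix": (0.4109, 0.5250),
    "colstd(); 100000x40 matrix": (0.0141, 0.0812),
    "colmean(); 100000x40 matrix": (0.0034, 0.0194),
    "colnorm(); 100000x40 matrix": (0.0037, 0.0328),
    "sum(); 100000x40 matrix": (0.0034, 0.0088),
    "100000x40 matrix ^2": (0.0128, 0.0531),
    "gamma(); 50000x40 matrix": (0.7438, 0.9190),
    "floor(); 100000x40 matrix": (0.0907, 0.3453),
    "ceil(); 100000x40 matrix": (0.0890, 0.3454),
    "abs(); 100000x40 matrix": (0.0131, 0.0144),
    "fill(); 200000x10 matrix": (0.0022, 0.0043),
}


def speedup(new_s, old_s):
    """Return how many times faster the new release is (old time / new time)."""
    return old_s / new_s


# Print the benchmarks from largest to smallest speedup.
for name, (new_s, old_s) in sorted(
    timings.items(), key=lambda item: speedup(*item[1]), reverse=True
):
    print(f"{name:<36} {speedup(new_s, old_s):8.2f}x")

# The for-loop row is a rate (loops/second), so its ratio is new / old.
loops_61, loops_582 = 16_229_000, 12_820_000
print(f"for loops: {loops_61 / loops_582:.2f}x faster, "
      f"{1e9 / loops_61:.1f} ns vs {1e9 / loops_582:.1f} ns per iteration")
```

Because the for-loop benchmark is a rate rather than a time, its ratio is taken the other way round; inverting the rate gives the average time per loop iteration.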
During development of the version 6 release, we extensively
profiled actual user applications, both at the O-Matrix
executable level and at the application level. The results showed
that a large percentage of the execution time of real-world user applications
was spent in matrix operators. The following table illustrates
some of the matrix operator performance improvements in
O-Matrix 6.
| Benchmark | O-Matrix 6.1 | O-Matrix 5.82 |
|---|---|---|
| matrix*scalar; 100000x20 matrix | 0.0045 | 0.0079 |
| matrix%matrix; 100000x20 matrix | 0.0087 | 0.0092 |
| scalar*scalar | 0.0051 | 0.0076 |
| vector dot product; 100000 element vector | 0.0011 | 0.0050 |
| matrix/scalar; 100000x20 matrix | 0.0044 | 0.0267 |
| matrix/matrix; 100000x20 matrix | 0.0070 | 0.0269 |
| scalar\matrix; 100000x20 matrix | 0.0045 | 0.0269 |
| matrix+matrix; 100000x20 matrix | 0.0063 | 0.0105 |
| scalar+matrix; 100000x20 matrix | 0.0064 | 0.0080 |
| matrix-matrix; 100000x20 matrix | 0.0064 | 0.0092 |
| matrix-scalar; 100000x20 matrix | 0.0063 | 0.0066 |
All timings are in seconds (lower is better), measured on a 3 GHz Pentium 4.
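One way to summarize the operator table in a single number is the geometric mean of the per-operator speedups. This Python sketch is only an illustration: the timings come from the table, but the choice of summary statistic and the names `operator_timings`, `ratios`, and `overall` are our own assumptions, not figures published for O-Matrix.

```python
from statistics import geometric_mean

# Matrix-operator timings from the table: (O-Matrix 6.1, O-Matrix 5.82), seconds.
operator_timings = {
    "matrix*scalar; 100000x20 matrix": (0.0045, 0.0079),
    "matrix%matrix; 100000x20 matrix": (0.0087, 0.0092),
    "scalar*scalar": (0.0051, 0.0076),
    "vector dot product; 100000 element vector": (0.0011, 0.0050),
    "matrix/scalar; 100000x20 matrix": (0.0044, 0.0267),
    "matrix/matrix; 100000x20 matrix": (0.0070, 0.0269),
    r"scalar\matrix; 100000x20 matrix": (0.0045, 0.0269),
    "matrix+matrix; 100000x20 matrix": (0.0063, 0.0105),
    "scalar+matrix; 100000x20 matrix": (0.0064, 0.0080),
    "matrix-matrix; 100000x20 matrix": (0.0064, 0.0092),
    "matrix-scalar; 100000x20 matrix": (0.0063, 0.0066),
}

# Speedup factor per operator: old time / new time.
ratios = {name: old / new for name, (new, old) in operator_timings.items()}

# The geometric mean is the usual way to average ratios: a 2x gain and a
# 2x loss cancel out, which an arithmetic mean of ratios does not guarantee.
overall = geometric_mean(ratios.values())
best = max(ratios, key=ratios.get)

print(f"geometric-mean speedup: {overall:.2f}x")
print(f"largest gain: {best} ({ratios[best]:.2f}x)")
```

The raw string `r"scalar\matrix; ..."` keeps the backslash in the operator name from being read as an escape sequence.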
The STSA Exerciser is a script we use internally
to solve moderate-size problems with the
O-Matrix Time Series Analysis Toolbox.
It is
a typical "real world" application that uses a broad range
of O-Matrix intrinsic functions, script files, and
language operators.
On a 3 GHz Pentium 4, the execution time of
this script was reduced from
19 seconds with version 6.0 to 11 seconds with version 6.1.
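The change from 19 to 11 seconds can be read two ways: as the share of run time removed and as a speedup factor. A short Python check of the arithmetic follows; the helper name `run_time_change` is chosen only for illustration.

```python
def run_time_change(old_s, new_s):
    """Return (fraction of run time eliminated, speedup factor)."""
    reduction = (old_s - new_s) / old_s
    speedup = old_s / new_s
    return reduction, speedup


# STSA Exerciser on a 3 GHz Pentium 4: 19 s with version 6.0, 11 s with 6.1.
reduction, speedup = run_time_change(19.0, 11.0)
print(f"{reduction:.1%} less run time, {speedup:.2f}x faster")
# prints "42.1% less run time, 1.73x faster"
```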