O-Matrix version 6.1 provides significant performance
improvements over previous releases. O-Matrix
intrinsic functions, script files, and language operators were all reviewed
to improve performance. Enhancements have also been made to take advantage of
the multiprocessing capabilities of newer CPUs, such as Intel processors with
Hyper-Threading (HT) and the Intel Core Duo.

The following table lists timings for the functions and operators that
showed the greatest performance gains, along with the functions we
found to be most commonly used in actual end-user applications.

Benchmark                             O-Matrix 6.1    O-Matrix 5.82
------------------------------------  --------------  --------------
fft(); 2^19 element vector            0.0688          0.2874
dft(); 2^13+1 element vector          0.0016          1.610
rand(); 200000x20 matrix              0.0156          0.1250
snormal(); 200000x20 matrix           0.0953          0.3420
vector transpose; 1000000 elements    0.0022          0.0032
sort(); 1000000 element vector        0.2470          0.3062
colmedian(); 50000x40 matrix          0.4109          0.5250
colstd(); 100000x40 matrix            0.0141          0.0812
colmean(); 100000x40 matrix           0.0034          0.0194
colnorm(); 100000x40 matrix           0.0037          0.0328
sum(); 100000x40 matrix               0.0034          0.0088
100000x40 matrix ^2                   0.0128          0.0531
gamma(); 50000x40 matrix              0.7438          0.9190
floor(); 100000x40 matrix             0.0907          0.3453
ceil(); 100000x40 matrix              0.0890          0.3454
abs(); 100000x40 matrix               0.0131          0.0144
for loops/second                      16,229,000      12,820,000
fill(); 200000x10 matrix              0.0022          0.0043

All timings are in seconds, except the "for loops/second" row, which is a
rate (higher is better). Benchmarks were run on a 3 GHz Pentium 4.
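To read the timings as speedup factors, divide the O-Matrix 5.82 time by the O-Matrix 6.1 time. The short Python sketch below does this for a few rows copied from the table. The helper name `speedup` is hypothetical and is used only for illustration; it is not part of O-Matrix.

```python
# Timings in seconds copied from the table: (O-Matrix 6.1, O-Matrix 5.82).
timings = {
    "fft(); 2^19 element vector": (0.0688, 0.2874),
    "rand(); 200000x20 matrix": (0.0156, 0.1250),
    "sort(); 1000000 element vector": (0.2470, 0.3062),
    "colmean(); 100000x40 matrix": (0.0034, 0.0194),
}


def speedup(new_seconds, old_seconds):
    """Return how many times faster the new timing is than the old one."""
    return old_seconds / new_seconds


for name, (new, old) in timings.items():
    print(f"{name}: {speedup(new, old):.1f}x faster")

# The for-loop row is a rate (loops/second), not a time, so the ratio is
# inverted: higher is better and the speedup is new_rate / old_rate.
loop_speedup = 16_229_000 / 12_820_000
print(f"for loops: {loop_speedup:.2f}x faster")
```

Note that the for-loop row has to be handled separately, because a larger number there means better performance.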

During development of the version 6 release we extensively
profiled actual user applications, both at the O-Matrix
executable level and at the application level. The profiles showed
that a large percentage of the execution time of real-world user applications
was spent in matrix operators. The following table illustrates
some of the matrix operator performance improvements in
O-Matrix 6.

Benchmark                                   O-Matrix 6.1    O-Matrix 5.82
------------------------------------------  --------------  --------------
matrix*scalar; 100000x20 matrix             0.0045          0.0079
matrix%matrix; 100000x20 matrix             0.0087          0.0092
scalar*scalar                               0.0051          0.0076
vector dot product; 100000 element vector   0.0011          0.0050
matrix/scalar; 100000x20 matrix             0.0044          0.0267
matrix/matrix; 100000x20 matrix             0.0070          0.0269
scalar\matrix; 100000x20 matrix             0.0045          0.0269
matrix+matrix; 100000x20 matrix             0.0063          0.0105
scalar+matrix; 100000x20 matrix             0.0064          0.0080
matrix-matrix; 100000x20 matrix             0.0064          0.0092
matrix-scalar; 100000x20 matrix             0.0063          0.0066

All timings are in seconds. Benchmarks were run on a 3 GHz Pentium 4.
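One way to see which operators gained the most is to rank the rows by the ratio of the 5.82 time to the 6.1 time. The Python sketch below does this with the values from the operator table. The ranking is only a reading aid derived from the published numbers, not an additional measurement.

```python
# Operator timings in seconds copied from the table:
# (O-Matrix 6.1, O-Matrix 5.82).
operator_timings = {
    "matrix*scalar": (0.0045, 0.0079),
    "matrix%matrix": (0.0087, 0.0092),
    "scalar*scalar": (0.0051, 0.0076),
    "vector dot product": (0.0011, 0.0050),
    "matrix/scalar": (0.0044, 0.0267),
    "matrix/matrix": (0.0070, 0.0269),
    "scalar\\matrix": (0.0045, 0.0269),
    "matrix+matrix": (0.0063, 0.0105),
    "scalar+matrix": (0.0064, 0.0080),
    "matrix-matrix": (0.0064, 0.0092),
    "matrix-scalar": (0.0063, 0.0066),
}

# Compute old/new for every operator and sort from largest gain to smallest.
ranked = sorted(
    ((name, old / new) for name, (new, old) in operator_timings.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, factor in ranked:
    print(f"{name:20s} {factor:.2f}x faster")
```

Ranked this way, the division operators and the vector dot product show the largest gains, while matrix-scalar and matrix%matrix are nearly unchanged.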

The STSA Exerciser is a script that we use internally to solve
moderate-size problems with the O-Matrix Time Series Analysis Toolbox.
It is a typical "real world" application that uses a broad range
of O-Matrix intrinsic functions, script files, and
language operators.
On a 3 GHz Pentium 4 machine, the execution time of
this script was reduced from
19 seconds for version 6.0 to 11 seconds for version 6.1.
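The STSA Exerciser result can be expressed either as a percentage reduction in run time or as a speedup factor. A small Python sketch of the arithmetic, using only the two run times quoted above (the variable names are illustrative):

```python
old_seconds = 19.0  # STSA Exerciser run time under version 6.0
new_seconds = 11.0  # STSA Exerciser run time under version 6.1

# Fraction of the original run time that was eliminated.
reduction = (old_seconds - new_seconds) / old_seconds

# How many times faster the script now runs.
speedup_factor = old_seconds / new_seconds

print(f"run time reduced by {reduction:.0%} ({speedup_factor:.2f}x faster)")
# prints "run time reduced by 42% (1.73x faster)"
```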