NEW ALGORITHMS OF FAST DIAGONALIZATION OF REAL SYMMETRIC MATRICES.
Author: Yurii F. Sigolaev (St. Petersburg). E-mail: y_sigolaev@hotmail.com
Last revision: May 25, 2007
I have developed a new complex of algorithms of diagonalization of dense real symmetric matrices. This complex (SDIAG) is realized in C and has a number of advantages compared to other known packages realizing diagonalization algorithms:
- noticeable increase in the computation speed;
- considerable saving of RAM.
These results were attained thanks to the solution of a basic problem of numerical analysis of tridiagonal matrices, development of new algorithm for vector multiplication and improvement of the block algorithms for packed matrices. Some of the algorithms use the IEEE-arithmetic support and use SSE2 and SSE3 for efficiency. For a comparative analysis, I used Intel MKL package because of the common source of the code, LAPACK package.
EM64T technology has no restrictions on the size of RAM. For example, on the computer with 4 GB RAM it is possible to diagonalize matrix up to 31000×31000 in size. In my algorithm, the RAM is used only for storing the information only slightly exceeding in size a half of the square matrix. Diagonalization programs which use one of the most known and probably the best for now algorithm The divide and conquer permit to diagonalize the matrix not more then 13000×13000 on the above mentioned computer. The use of additional registers permits to do optimization of the programs with the aim of increase in their speed. At so doing the increase in speed of the programs of diagonalization of Intel MKL is more remarkable then that of SDIAG because problems connected with absence of additional registers in IA32 were dissolved in SDIAG in part (additional registers permit to get good results using trivial programming). Nevertheless the difference in the speed is still large enough. For example, in the Tables shown below the speed of SDIAG on the single core is not worse then that of DSYEVD Intel MKL on two cores.
Description of the new algorithms and the references are given on the page devoted to P4 processor: http://www.thesa-store.com/products/.
Please address all remarks and suggestions to: y_sigolaev@hotmail.com
Tel/Fax: 7(812)2691898
Comparison results.
I compared the results I obtained with those from the known Intel MKL package (Package ID: w_mkl_p_9.0.017). Hardware configuration: Core 2 Duo 2.66 GHz processor (Processor E6700, Speed: 3.00 GHz (BIOS setup: Overclock Options [FSB1200/DDR2-800])), Data cache size : L1 32 KB, L2 4096 KB, L3 0 KB, 1066 MHz FSB, Motherboard: P5WDG2 WS Professional, 1066 MHz FSB, DDR2 800 MHz (4 GB), OS Windows XP Professional x64 Edition SP1.
Consider a packed symmetric Toplitz matrix:
Aij = arcsin(1 / (i - j + 1))/(i - j + 1)3 (i , j = 1, 2, ..., n; i >= j)
Application of the algorithms described above in Sections 1.1, 2 and 3.1 on the page devoted to P4 CPU to the solution of full algebraic problem for this matrix gave results showing that the DSYEVD Intel MKL procedure (currently the only procedure combining calculation speed and accuracy but requiring a huge memory resource) is not effective. The comparative results obtained using one coure (OMP_NUM_THREADS=1) and two cores (OMP_NUM_THREADS=2) are given in Tables 1 and 2, respectively.
TABLE 1
| n | t (sec) | Δt (%) | |
| SDIAG |
DSYEVD Intel MKL 9.0 |
||
|
1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 |
0.72 4.7 14.5 32.7 61.5 104 162 239 337 457 606 782 992 |
0.92 6.4 20.6 47.0 89.5 153 241 356 503 685 906 1170 1522 |
28 36 42 44 46 47 49 49 49 50 50 50 53 |
TABLE 2
| n | t (sec) | Δt (%) | |
| SDIAG |
DSYEVD Intel MKL 9.0 |
||
|
1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 |
0.45 3.0 9.6 22.3 42.6 72.9 115 169 240 326 434 560 712 |
0.66 4.8 15.1 33.8 64.2 109 171 251 353 483 636 825 1046 |
47 60 57 52 51 50 49 49 47 48 47 47 47 |