NEW ALGORITHMS OF FAST DIAGONALIZATION OF REAL SYMMETRIC MATRICES.

 

Author: Yurii F. Sigolaev (St. Petersburg). E-mail: y_sigolaev@hotmail.com

Last revision: May 25, 2007

 

     I have developed a new complex of algorithms of diagonalization of dense real symmetric matrices. This complex (SDIAG) is realized in C and has a number of advantages compared to other known packages realizing diagonalization algorithms:

 

-        noticeable increase in the computation speed;

-        considerable saving of RAM.

 

     These results were attained thanks to the solution of a basic problem of numerical analysis of tridiagonal matrices, development of new algorithm for vector multiplication and improvement of the block algorithms for packed matrices. Some of the algorithms use the IEEE-arithmetic support and use SSE2 and SSE3 for efficiency. For a comparative analysis, I used Intel MKL package because of the common source of the code, LAPACK package.

     EM64T technology has no restrictions on the size of RAM. For example,  on the computer with 4 GB RAM it is possible to diagonalize matrix up to 31000×31000 in size. In my algorithm, the RAM is used only for storing the information only slightly exceeding in size a half of the square matrix. Diagonalization programs which use one of the most known and probably the best for now algorithm The divide and conquer permit to diagonalize the matrix not more then 13000×13000 on the above mentioned computer. The use of additional registers permits to do optimization of the programs with the aim of increase in their speed. At so doing the increase in speed of the programs of diagonalization of Intel MKL is more remarkable then that of SDIAG because problems connected with absence of additional registers in IA32 were dissolved in SDIAG in part (additional registers permit to get good results using trivial programming). Nevertheless the difference in the speed is still large enough. For example, in the Tables shown below the speed of SDIAG on the single core is not worse then that of DSYEVD Intel MKL on two cores.

  Description of the new algorithms and the references are given on the page devoted to P4 processor:  http://www.thesa-store.com/products/.

   

Please address all remarks and suggestions to: y_sigolaev@hotmail.com

Tel/Fax: 7(812)2691898

   Comparison results.

    

I compared the results I obtained with those from the known Intel MKL package (Package ID: w_mkl_p_9.0.017). Hardware configuration: Core 2 Duo 2.66 GHz processor (Processor E6700, Speed: 3.00 GHz (BIOS setup: Overclock Options [FSB1200/DDR2-800])), Data cache size : L1 32 KB, L2 4096 KB, L3 0 KB, 1066 MHz FSB, Motherboard:  P5WDG2 WS Professional, 1066 MHz FSB, DDR2 800 MHz (4 GB), OS Windows XP Professional x64 Edition SP1.

 Consider a packed symmetric Toplitz matrix:

Aij = arcsin(1 / (i - j + 1))/(i - j + 1)3 (i , j = 1, 2, ..., n; i >= j)

Application of the algorithms described above in Sections 1.1, 2 and 3.1 on the page devoted to P4 CPU to the solution of full algebraic problem for this matrix gave results showing that the DSYEVD Intel MKL procedure (currently the only procedure combining calculation speed and accuracy but requiring a huge memory resource) is not effective. The comparative results obtained using one coure (OMP_NUM_THREADS=1) and two cores (OMP_NUM_THREADS=2) are given in Tables 1 and 2, respectively.

TABLE 1

n t (sec)  Δt (%)
SDIAG DSYEVD
Intel  MKL 9.0

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

11000

12000

13000

0.72

4.7

14.5

32.7

61.5

104

162

239

337

457

606

782

992

0.92

6.4

20.6

47.0

89.5

153

241

356

503

685

906

1170

1522

28

36

 42

 44

46

47

49

49

49

50

50

50

53

TABLE 2

n t (sec)  Δt (%)
SDIAG DSYEVD
Intel  MKL 9.0

1000

2000

3000

4000

5000

6000

7000

8000

9000

10000

11000

12000

13000

0.45

3.0

9.6

22.3

42.6

72.9

115

169

240

326

434

560

712

0.66

4.8

 15.1

33.8

64.2

109

171

251

353

483

636

825

1046

47

60

 57

 52

51

50

49

49

47

48

47

47

47