Two CPU Comparisons are given:
1) Electronic-structure SCF run
2) Vectorized electronic-structure convolution integral
  Details:
I have run my old KKR-CPA code on several platforms to compare performance.
I have done one iteration of FCC NM Fe90S (requiring lots of CPA).
Code is a mixed cluster-real-space + BZ integration method for speed.
Cluster method used 6 real-space shells in FCC
BZ method is a ray-prism method (similar to special directions).
(Note: the cluster method is more scalar, whereas the fast BZ method was written
to be highly vectorized for the CRAY. Much more time is spent in the BZ method,
but more energies (E's) are done with the cluster method, so there is a large averaging out.)
Machine                    Time (secs)   Compiler Options
SUN Hyper Sparc            2580          -dalign -native -O
J-CRAY                     1045          -a static -dp
SUN Ultra 170 MHz           936          -dalign -native -O3
HP 120 MHz                  870          +E7 -K -O
CRAY YMP                    553          -a static -dp
IBM 590                     565          -O3 -qdpc -qstrict -qtune=pwr2 -qarch=pwr2 -lesslp2
DEC ALPHA 267 MHz WS        354          -O -tune host -fast
DEC ALPHA 400 MHz WS        243          -O -tune host -fast
DEC ALPHA 500 MHz P-AU      184          -O -tune host -fast
DEC ALPHA 600 MHz

  A first-principles concentration-wave calculation is a KKR-CPA-based approach that evaluates a complex
convolution integral [i.e., integrate over k: T(k+q; E)T(k; E)]. The code is highly vectorized, so the CRAY
C90 should be by far the fastest machine. (Timings are for the F77 compiler, 1 q, and 1 energy point.)
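
To illustrate why an integral of this form vectorizes well, here is a minimal Python/NumPy sketch, not the F77 code that was timed. It assumes a periodic 1-D k-grid, a q that falls on a grid point, and that T(k; E) at fixed E is a small complex matrix whose product is averaged over k; the real code's grid, matrix size, and exact contraction are not given here, and the function names are hypothetical.

```python
import numpy as np

def convolution_loop(T, q_shift):
    """Scalar-style reference: (1/N) * sum_k T(k+q) T(k), one k at a time."""
    n_k = T.shape[0]
    total = np.zeros(T.shape[1:], dtype=complex)
    for k in range(n_k):
        # Periodic grid: k+q wraps around (an assumption of this sketch).
        total += T[(k + q_shift) % n_k] @ T[k]
    return total / n_k

def convolution_vectorized(T, q_shift):
    """Same sum with the k loop expressed as whole-array operations."""
    # T_shifted[k] == T[(k + q_shift) % n_k]
    T_shifted = np.roll(T, -q_shift, axis=0)
    # Matrix product at every k, summed over k in one call.
    return np.einsum("kij,kjl->il", T_shifted, T) / T.shape[0]

# Made-up test data: 64 k-points, 4x4 complex matrices.
rng = np.random.default_rng(0)
T = rng.standard_normal((64, 4, 4)) + 1j * rng.standard_normal((64, 4, 4))
assert np.allclose(convolution_loop(T, 5), convolution_vectorized(T, 5))
```

The loop over k has no dependence between iterations, which is the property a vector machine exploits.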

Machine                        Time (min:sec)   Compiler Options
SUN Sparc 10/50                8:03             -dalign -native -O3
SUN Hyper Sparc                4:06             -dalign -native -O3
SUN Ultra 170 MHz              2:00             -dalign -native -O3
HP J2-10 120 MHz               2:27             -O
SGI RS-8000 Power Challenge    1:37             -O3 -static (one part w/ -O1)
IBM 590                        1:24             -O3 -qdpc -qstrict
DEC ALPHA 267 MHz WS           1:24             -O -tune host -fast
DEC ALPHA 400 MHz WS           1:02             -O -tune host -fast
DEC ALPHA 500 MHz P-AU         0:46             -O -tune host -fast
DEC ALPHA 600 MHz
CRAY YMP-C90                   0:23             -dp -static
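
To put the convolution timings on a common scale, the short Python sketch below converts each min:sec entry to seconds and divides by the CRAY YMP-C90 time. The timings are copied from the table above; the helper name `to_seconds` and the choice of the C90 as reference are only for illustration. The DEC ALPHA 600 MHz is left out because no time is listed for it.

```python
def to_seconds(mmss):
    """Convert a 'min:sec' string such as '8:03' to whole seconds."""
    minutes, seconds = mmss.split(":")
    return 60 * int(minutes) + int(seconds)

# Timings (min:sec) for the convolution-integral test, from the table.
timings = {
    "SUN Sparc 10/50": "8:03",
    "SUN Hyper Sparc": "4:06",
    "SUN Ultra 170 MHz": "2:00",
    "HP J2-10 120 MHz": "2:27",
    "SGI RS-8000 Power Challenge": "1:37",
    "IBM 590": "1:24",
    "DEC ALPHA 267 MHz WS": "1:24",
    "DEC ALPHA 400 MHz WS": "1:02",
    "DEC ALPHA 500 MHz P-AU": "0:46",
    "CRAY YMP-C90": "0:23",
}

reference = to_seconds(timings["CRAY YMP-C90"])

# Ratio > 1 means that many times slower than the C90.
ratios = {m: to_seconds(t) / reference for m, t in timings.items()}

for machine, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{machine:30s} {ratio:5.1f}x")
```

On these numbers the fastest workstation listed (DEC ALPHA 500 MHz) takes twice as long as the C90, and the Sparc 10/50 takes 21 times as long.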