Table 1
Accuracy and performance of Genefer transform implementations. CPU hardware details: ¹ Intel Core i7-4750HQ, ² Intel Xeon X5650, ³ Intel Xeon E5-2670, ⁴ Intel Xeon E5620, ⁵ Intel Xeon E5-1650v2.
| Implementation | b limit | ms per mul | b limit | ms per mul |
|---|---|---|---|---|
| x87¹ | 30,770,000 | 30.8 | 16,490,000 | 288 |
| Default¹ | 945,000 | 14.5 | 505,000 | 133 |
| SSE2¹ | 945,000 | 6.17 | 505,000 | 58.3 |
| SSE4¹ | 945,000 | 5.49 | 505,000 | 51.9 |
| AVX¹ | 945,000 | 3.66 | 505,000 | 35.8 |
| FMA3¹ | 945,000 | 3.35 | 505,000 | 32.6 |
| CUDA (NVIDIA Tesla C2050)² | 855,000 | 1.34 | 485,000 | 8.50 |
| OpenCL (NVIDIA Tesla C2050)² | 915,000 | 0.89 | 505,000 | 7.79 |
| CUDA (NVIDIA Tesla K20m)³ | 825,000 | 1.05 | 480,000 | 6.09 |
| OpenCL (NVIDIA Tesla K20m)³ | 915,000 | 0.54 | 505,000 | 4.19 |
| OpenCL (AMD FirePro V7800)⁴ | 895,000 | 1.25 | 505,000 | 12.9 |
| OpenCL (AMD FirePro D700)⁵ | 870,000 | 0.67 | 500,000 | 5.81 |
