[3] R. Mueller, J. Teubner, and G. Alonso, “Sorting networks on FPGAs,” The International Journal on Very Large Data Bases, vol. 21, no. 1, pp. 1-23, 2012. http://dx.doi.org/10.1007/s00778-011-0232-z
[4] S. Che, J. Li, J. W. Sheaffer, K. Skadron, and J. Lach, “Accelerating compute-intensive applications with GPUs and FPGAs,” in 2008 Symposium on Application Specific Processors, 2008, pp. 101-107. http://dx.doi.org/10.1109/SASP.2008.4570793
[5] D. J. Greaves and S. Singh, “Kiwi: Synthesis of FPGA circuits from parallel programs,” in 16th Int. Symp. on Field-Programmable Custom Computing Machines, 2008, pp. 3-12.
[6] R. D. Chamberlain and N. Ganesan, “Sorting on architecturally diverse computer systems,” in 3rd Int. Workshop on High-Performance Reconfigurable Computing Technology and Applications, 2009, pp. 39-46. http://dx.doi.org/10.1145/1646461.1646466
[8] X. Ye, D. Fan, W. Lin, N. Yuan, and P. Ienne, “High performance comparison-based sorting algorithm on many-core GPUs,” in 2010 IEEE Int. Symp. on Parallel & Distributed Processing, 2010.
[9] N. Satish, M. Harris, and M. Garland, “Designing efficient sorting algorithms for manycore GPUs,” in 2009 IEEE Int. Symp. on Parallel & Distributed Processing, 2009. http://dx.doi.org/10.1109/IPDPS.2009.5161005
[10] D. Cederman and P. Tsigas, “A practical quicksort algorithm for graphics processors,” in 16th Annual European Symp. on Algorithms, 2008, pp. 246-258. http://dx.doi.org/10.1007/978-3-540-87744-8_21
[11] G. Capannini, F. Silvestri, and R. Baraglia, “Sorting on GPU for large scale datasets: A thorough comparison,” Information Processing and Management, vol. 48, no. 5, pp. 903-917, 2012. http://dx.doi.org/10.1016/j.ipm.2010.11.010
[12] P. Kipfer and R. Westermann, “Improved GPU sorting,” in GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation, M. Pharr and R. Fernando, Eds. Addison-Wesley, 2005. Available: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter46.html.
[13] A. R. Brodtkorb, T. R. Hagen, and M. L. Sætra, “GPU programming strategies and trends in GPU computing,” Journal of Parallel and Distributed Computing, vol. 73, no. 1, pp. 4-13, 2013. http://dx.doi.org/10.1016/j.jpdc.2012.04.003
[14] C. Grozea, Z. Bankovic, and P. Laskov, “FPGA vs. multi-core CPUs vs. GPUs,” in Facing the Multicore-Challenge, R. Keller, D. Kramer, and J. P. Weiss, Eds. Springer-Verlag, 2010, pp. 105-117. http://dx.doi.org/10.1007/978-3-642-16233-6_12
[15] M. Edahiro, “Parallelizing fundamental algorithms such as sorting on multi-core processors for EDA acceleration,” in 14th Asia and South Pacific Design Automation Conference, 2009, pp. 230-233. http://dx.doi.org/10.1109/ASPDAC.2009.4796485
[16] B. Cope, P. Y. K. Cheung, W. Luk, and L. Howes, “Performance comparison of graphics processors to reconfigurable logic: A case study,” IEEE Transactions on Computers, vol. 59, no. 4, pp. 433-448, 2010. http://dx.doi.org/10.1109/TC.2009.179
[17] J. Gonzalez and R. C. Núñez, “LAPACKrc: Fast linear algebra kernels/solvers for FPGA accelerators,” Journal of Physics: Conference Series, vol. 180, 2009. http://dx.doi.org/10.1088/1742-6596/180/1/012042
[18] S. Koehler, J. Curreri, and A. D. George, “Performance analysis challenges and framework for high-performance reconfigurable computing,” Parallel Computing, vol. 34, no. 4-5, pp. 217-230, 2008. http://dx.doi.org/10.1016/j.parco.2008.01.008
[19] N. Moore, M. Leeser, and L. S. King, “VForce: An environment for portable applications on high performance systems with accelerators,” Journal of Parallel and Distributed Computing, vol. 72, no. 9, pp. 1144-1156, 2012. http://dx.doi.org/10.1016/j.jpdc.2011.07.014
[20] M. Santarini, “Zynq-7000 EPP sets stage for new era of innovations,” Xcell Journal, no. 75, 2011. [Online]. Available: http://www.eetimes.com/design/programmable-logic/4217069/Zynq-7000-EPP-sets-stage-fornew-era-of-innovations.
[22] I. Skliarova, V. Sklyarov, and A. Sudnitson, Design of FPGA-based Circuits using Hierarchical Finite State Machines. TUT Press, 2012. http://dx.doi.org/10.1109/IranianCEE.2013.6599683
[23] V. Sklyarov, I. Skliarova, D. Mihhailov, and A. Sudnitson, “Implementation in FPGA of address-based data sorting,” in 21st Int. Conf. on Field-Programmable Logic and Applications, 2011, pp. 405-410. http://dx.doi.org/10.1109/FPL.2011.81
[24] V. Sklyarov and I. Skliarova, “Modeling, design, and implementation of a priority buffer for embedded systems,” in 7th Asian Control Conf., 2009, pp. 9-14.