Device 6: AMD FX(tm)-9590 Eight-Core Processor Specified 1 device IDs: 6 Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 22.5772 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 22.5452 GB/sec Running benchmark MaxFlops result for maxspflops: 77.5734 GFLOPS result for maxdpflops: 51.9592 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 0.6866 GB/s result for gmem_readbw_strided: 46.7156 GB/s result for gmem_writebw: 0.3582 GB/s result for gmem_writebw_strided: 43.4828 GB/s result for lmem_readbw: 61.6795 GB/s result for lmem_writebw: 56.0024 GB/s result for tex_readbw: 13.0108 GB/sec Running benchmark KernelCompile result for ocl_kernel: 0.0229 sec Running benchmark QueueDelay result for ocl_queue: 0.0076 ms Running benchmark BFS result for bfs: 0.1950 GB/s result for bfs_pcie: 0.1758 GB/s result for bfs_teps: 8027410.0000 Edges/s Running benchmark FFT result for fft_sp: 3.4051 GFLOPS result for fft_sp_pcie: 2.8570 GFLOPS result for ifft_sp: 3.4385 GFLOPS result for ifft_sp_pcie: 2.8805 GFLOPS result for fft_dp: 2.5918 GFLOPS result for fft_dp_pcie: 2.0090 GFLOPS result for ifft_dp: 2.6343 GFLOPS result for ifft_dp_pcie: 2.0344 GFLOPS Running benchmark GEMM result for sgemm_n: 14.7719 GFLOPS result for sgemm_t: 9.7866 GFLOPS result for sgemm_n_pcie: 14.6351 GFLOPS result for sgemm_t_pcie: 9.7541 GFLOPS result for dgemm_n: 14.9615 GFLOPS result for dgemm_t: 9.1538 GFLOPS result for dgemm_n_pcie: 14.6656 GFLOPS result for dgemm_t_pcie: 9.0292 GFLOPS Running benchmark MD result for md_sp_flops: 5.0568 GFLOPS result for md_sp_bw: 3.8754 GB/s result for md_sp_flops_pcie: 3.3798 GFLOPS result for md_sp_bw_pcie: 2.5902 GB/s result for md_dp_flops: 4.2400 GFLOPS result for md_dp_bw: 5.6911 GB/s result for md_dp_flops_pcie: 3.0611 GFLOPS result for md_dp_bw_pcie: 4.1088 GB/s Running benchmark MD5Hash result for md5hash: 0.0947 GHash/s Skipping non-opencl benchmark NeuralNet 
Running benchmark Reduction result for reduction: 0.5583 GB/s result for reduction_pcie: 0.4694 GB/s result for reduction_dp: 1.1003 GB/s result for reduction_dp_pcie: 0.8218 GB/s Running benchmark Scan result for scan: 0.0327 GB/s result for scan_pcie: 0.0321 GB/s result for scan_dp: 0.0654 GB/s result for scan_dp_pcie: 0.0635 GB/s Running benchmark Sort result for sort: 0.0003 GB/s result for sort_pcie: 0.0003 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 1.0817 Gflop/s result for spmv_csr_scalar_sp_pcie: 0.3013 Gflop/s result for spmv_csr_scalar_dp: BenchmarkError result for spmv_csr_scalar_dp_pcie: BenchmarkError result for spmv_csr_scalar_pad_sp: 1.0843 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 0.2592 Gflop/s result for spmv_csr_scalar_pad_dp: BenchmarkError result for spmv_csr_scalar_pad_dp_pcie: BenchmarkError result for spmv_csr_vector_sp: 0.9701 Gflop/s result for spmv_csr_vector_sp_pcie: 0.2920 Gflop/s result for spmv_csr_vector_dp: BenchmarkError result for spmv_csr_vector_dp_pcie: BenchmarkError result for spmv_csr_vector_pad_sp: 0.9709 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 0.2521 Gflop/s result for spmv_csr_vector_pad_dp: BenchmarkError result for spmv_csr_vector_pad_dp_pcie: BenchmarkError result for spmv_ellpackr_sp: 0.4540 Gflop/s result for spmv_ellpackr_dp: BenchmarkError Running benchmark Stencil2D result for stencil: 0.5359 GFLOPS result for stencil_dp: 0.5063 GFLOPS Running benchmark Triad result for triad_bw: 3.1222 GB/s Running benchmark S3D result for s3d: 0.8412 GFLOPS result for s3d_pcie: 0.8336 GFLOPS result for s3d_dp: 1.3738 GFLOPS result for s3d_dp_pcie: 1.3540 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: hpcmanage.umassd.edu Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 'NVIDIA CUDA': 1 Device 0: Tesla M2050 Device selection not specified: defaulting to device #0. 
Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 6.0616 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 6.6318 GB/sec Running benchmark MaxFlops result for maxspflops: 1007.4300 GFLOPS result for maxdpflops: 509.3640 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 94.9039 GB/s result for gmem_readbw_strided: 14.1468 GB/s result for gmem_writebw: 101.9090 GB/s result for gmem_writebw_strided: 3.9093 GB/s result for lmem_readbw: 392.2110 GB/s result for lmem_writebw: 469.6430 GB/s result for tex_readbw: 79.9080 GB/sec Running benchmark KernelCompile result for ocl_kernel: 0.0001 sec Running benchmark QueueDelay result for ocl_queue: 0.0062 ms Running benchmark BFS result for bfs: 1.0238 GB/s result for bfs_pcie: 0.8990 GB/s result for bfs_teps: 40213900.0000 Edges/s Running benchmark FFT result for fft_sp: 72.5060 GFLOPS result for fft_sp_pcie: 14.2845 GFLOPS result for ifft_sp: 70.9108 GFLOPS result for ifft_sp_pcie: 14.2215 GFLOPS result for fft_dp: 18.3457 GFLOPS result for fft_dp_pcie: 5.9901 GFLOPS result for ifft_dp: 17.3496 GFLOPS result for ifft_dp_pcie: 5.8799 GFLOPS Running benchmark GEMM result for sgemm_n: 413.3400 GFLOPS result for sgemm_t: 407.6030 GFLOPS result for sgemm_n_pcie: 365.2660 GFLOPS result for sgemm_t_pcie: 360.7690 GFLOPS result for dgemm_n: 179.7060 GFLOPS result for dgemm_t: 170.3270 GFLOPS result for dgemm_n_pcie: 146.1620 GFLOPS result for dgemm_t_pcie: 139.8970 GFLOPS Running benchmark MD result for md_sp_flops: 31.1427 GFLOPS result for md_sp_bw: 23.8668 GB/s result for md_sp_flops_pcie: 15.2648 GFLOPS result for md_sp_bw_pcie: 11.6984 GB/s result for md_dp_flops: 18.0563 GFLOPS result for md_dp_bw: 24.2363 GB/s result for md_dp_flops_pcie: 11.0300 GFLOPS result for md_dp_bw_pcie: 14.8051 GB/s Running benchmark MD5Hash result for md5hash: 0.8812 GHash/s Skipping non-opencl benchmark NeuralNet Running benchmark Reduction result for 
reduction: 94.0757 GB/s result for reduction_pcie: 5.6738 GB/s result for reduction_dp: 94.6165 GB/s result for reduction_dp_pcie: 5.6770 GB/s Running benchmark Scan result for scan: 25.6739 GB/s result for scan_pcie: 2.7951 GB/s result for scan_dp: 20.6908 GB/s result for scan_dp_pcie: 2.7235 GB/s Running benchmark Sort result for sort: 0.2753 GB/s result for sort_pcie: 0.2530 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 1.4512 Gflop/s result for spmv_csr_scalar_sp_pcie: 0.6036 Gflop/s result for spmv_csr_scalar_dp: 1.3024 Gflop/s result for spmv_csr_scalar_dp_pcie: 0.4544 Gflop/s result for spmv_csr_scalar_pad_sp: 1.4590 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 0.5763 Gflop/s result for spmv_csr_scalar_pad_dp: 1.3669 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 0.4651 Gflop/s result for spmv_csr_vector_sp: 7.4612 Gflop/s result for spmv_csr_vector_sp_pcie: 0.9078 Gflop/s result for spmv_csr_vector_dp: 5.1155 Gflop/s result for spmv_csr_vector_dp_pcie: 0.6140 Gflop/s result for spmv_csr_vector_pad_sp: 8.2046 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 0.8534 Gflop/s result for spmv_csr_vector_pad_dp: 5.5225 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 0.6251 Gflop/s result for spmv_ellpackr_sp: 7.2196 Gflop/s result for spmv_ellpackr_dp: 5.5801 Gflop/s Running benchmark Stencil2D result for stencil: 77.6329 GFLOPS result for stencil_dp: 38.4687 GFLOPS Running benchmark Triad result for triad_bw: 5.8195 GB/s Running benchmark S3D result for s3d: 46.6696 GFLOPS result for s3d_pcie: 38.9817 GFLOPS result for s3d_dp: 22.1899 GFLOPS result for s3d_dp_pcie: 18.6963 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: rps Platform selection not specified, default to platform #0 Error collecting device info. Make sure that you are running in the SHOC install root directory (or set -bindir) and any hostfile you set is correct. 
UMDAR+gkhanna@rps:~/shoc-master/bin$ ./shocdriver -s 4 -cuda --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: rps Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 : 1 Device 0: 'Tesla K40c' Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 10.5497 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 10.5584 GB/sec Running benchmark MaxFlops result for maxspflops: 3743.9700 GFLOPS result for maxdpflops: 1422.6300 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 177.4980 GB/s result for gmem_readbw_strided: 18.2029 GB/s result for gmem_writebw: 173.6630 GB/s result for gmem_writebw_strided: 7.2354 GB/s result for lmem_readbw: 908.4730 GB/s result for lmem_writebw: 1136.6500 GB/s result for tex_readbw: 210.2710 GB/sec Skipping non-cuda benchmark KernelCompile Skipping non-cuda benchmark QueueDelay Running benchmark BFS result for bfs: 1.2082 GB/s result for bfs_pcie: 1.0493 GB/s result for bfs_teps: 71060400.0000 Edges/s Running benchmark FFT result for fft_sp: 530.8750 GFLOPS result for fft_sp_pcie: 53.1111 GFLOPS result for ifft_sp: 530.5440 GFLOPS result for ifft_sp_pcie: 53.2969 GFLOPS result for fft_dp: 265.1410 GFLOPS result for fft_dp_pcie: 26.5918 GFLOPS result for ifft_dp: 265.2750 GFLOPS result for ifft_dp_pcie: 26.6117 GFLOPS Running benchmark GEMM result for sgemm_n: 3115.4900 GFlops result for sgemm_t: 3127.9100 GFlops result for sgemm_n_pcie: 2170.4000 GFlops result for sgemm_t_pcie: 2176.4200 GFlops result for dgemm_n: 1167.7400 GFlops result for dgemm_t: 1234.1700 GFlops result for dgemm_n_pcie: 754.3290 GFlops result for dgemm_t_pcie: 781.4980 GFlops Running benchmark MD result for md_sp_flops: 216.7030 GFLOPS result for md_sp_bw: 166.0740 GB/s result for md_sp_flops_pcie: 41.4332 GFLOPS result for 
md_sp_bw_pcie: 31.7532 GB/s result for md_dp_flops: 73.2770 GFLOPS result for md_dp_bw: 98.3569 GB/s result for md_dp_flops_pcie: 29.3095 GFLOPS result for md_dp_bw_pcie: 39.3410 GB/s Running benchmark MD5Hash result for md5hash: 2.5847 GHash/s Running benchmark NeuralNet result for nn_learning: BenchmarkError result for nn_learning_pcie: BenchmarkError Running benchmark Reduction result for reduction: 159.2420 GB/s result for reduction_pcie: 9.8453 GB/s result for reduction_dp: 171.1130 GB/s result for reduction_dp_pcie: 9.8886 GB/s Running benchmark Scan result for scan: 48.5570 GB/s result for scan_pcie: 4.7416 GB/s result for scan_dp: 43.2257 GB/s result for scan_dp_pcie: 4.6834 GB/s Running benchmark Sort result for sort: 3.0600 GB/s result for sort_pcie: 1.9360 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 2.4467 Gflop/s result for spmv_csr_scalar_sp_pcie: 1.2389 Gflop/s result for spmv_csr_scalar_dp: 2.1263 Gflop/s result for spmv_csr_scalar_dp_pcie: 0.9452 Gflop/s result for spmv_csr_scalar_pad_sp: 2.9054 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 1.3572 Gflop/s result for spmv_csr_scalar_pad_dp: 2.5846 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 1.0301 Gflop/s result for spmv_csr_vector_sp: 18.4959 Gflop/s result for spmv_csr_vector_sp_pcie: 2.2118 Gflop/s result for spmv_csr_vector_dp: 17.0999 Gflop/s result for spmv_csr_vector_dp_pcie: 1.5477 Gflop/s result for spmv_csr_vector_pad_sp: 20.2005 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 2.2613 Gflop/s result for spmv_csr_vector_pad_dp: 18.4636 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 1.5672 Gflop/s result for spmv_ellpackr_sp: 17.5968 Gflop/s result for spmv_ellpackr_dp: 14.5100 Gflop/s Running benchmark Stencil2D result for stencil: 128.8050 GFLOPS result for stencil_dp: 57.6011 GFLOPS Running benchmark Triad result for triad_bw: 13.6654 GB/s Running benchmark S3D result for s3d: 97.9084 GFLOPS result for s3d_pcie: 83.3313 GFLOPS result for s3d_dp: 51.4110 GFLOPS result 
for s3d_dp_pcie: 43.4497 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: hpe-blade2 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 : 1 Device 0: 'Quadro M6000' Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 11.2694 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 13.1822 GB/sec Running benchmark MaxFlops result for maxspflops: 6793.2800 GFLOPS result for maxdpflops: 217.0310 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 291.2580 GB/s result for gmem_readbw_strided: 58.4460 GB/s result for gmem_writebw: 272.5860 GB/s result for gmem_writebw_strided: 16.3956 GB/s result for lmem_readbw: 2829.3800 GB/s result for lmem_writebw: 3243.7100 GB/s result for tex_readbw: 345.6680 GB/sec Skipping non-cuda benchmark KernelCompile Skipping non-cuda benchmark QueueDelay Running benchmark BFS result for bfs: 2.0963 GB/s result for bfs_pcie: 1.7253 GB/s result for bfs_teps: 123822000.0000 Edges/s Running benchmark FFT result for fft_sp: 768.9280 GFLOPS result for fft_sp_pcie: 56.7081 GFLOPS result for ifft_sp: 768.1620 GFLOPS result for ifft_sp_pcie: 56.7452 GFLOPS result for fft_dp: 119.9170 GFLOPS result for fft_dp_pcie: 25.1116 GFLOPS result for ifft_dp: 114.1290 GFLOPS result for ifft_dp_pcie: 24.8580 GFLOPS Running benchmark GEMM result for sgemm_n: 6022.9100 GFlops result for sgemm_t: 6198.8500 GFlops result for sgemm_n_pcie: 3399.6800 GFlops result for sgemm_t_pcie: 3455.0300 GFlops result for dgemm_n: 198.9010 GFlops result for dgemm_t: 199.3710 GFlops result for dgemm_n_pcie: 183.3050 GFlops result for dgemm_t_pcie: 183.7040 GFlops Running benchmark MD result for md_sp_flops: 296.4370 GFLOPS result for md_sp_bw: 227.1800 GB/s result for md_sp_flops_pcie: 27.7914 GFLOPS result for md_sp_bw_pcie: 21.2985 GB/s 
result for md_dp_flops: 77.7643 GFLOPS result for md_dp_bw: 104.3800 GB/s result for md_dp_flops_pcie: 21.4853 GFLOPS result for md_dp_bw_pcie: 28.8389 GB/s Running benchmark MD5Hash result for md5hash: 10.2087 GHash/s Running benchmark NeuralNet result for nn_learning: BenchmarkError result for nn_learning_pcie: BenchmarkError Running benchmark Reduction result for reduction: 246.1450 GB/s result for reduction_pcie: 11.2635 GB/s result for reduction_dp: 281.6780 GB/s result for reduction_dp_pcie: 11.3334 GB/s Running benchmark Scan result for scan: 80.9602 GB/s result for scan_pcie: 5.6934 GB/s result for scan_dp: 75.7911 GB/s result for scan_dp_pcie: 5.6646 GB/s Running benchmark Sort result for sort: 6.2741 GB/s result for sort_pcie: 3.0609 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 13.0318 Gflop/s result for spmv_csr_scalar_sp_pcie: 2.4129 Gflop/s result for spmv_csr_scalar_dp: 9.4197 Gflop/s result for spmv_csr_scalar_dp_pcie: 1.6237 Gflop/s result for spmv_csr_scalar_pad_sp: 16.1822 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 2.3207 Gflop/s result for spmv_csr_scalar_pad_dp: 12.8174 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 1.5838 Gflop/s result for spmv_csr_vector_sp: 34.1806 Gflop/s result for spmv_csr_vector_sp_pcie: 2.6876 Gflop/s result for spmv_csr_vector_dp: 27.6185 Gflop/s result for spmv_csr_vector_dp_pcie: 1.8295 Gflop/s result for spmv_csr_vector_pad_sp: 35.4663 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 2.5173 Gflop/s result for spmv_csr_vector_pad_dp: 29.1775 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 1.6988 Gflop/s result for spmv_ellpackr_sp: 38.3223 Gflop/s result for spmv_ellpackr_dp: 25.6143 Gflop/s Running benchmark Stencil2D result for stencil: 277.4940 GFLOPS result for stencil_dp: 82.5234 GFLOPS Running benchmark Triad result for triad_bw: 15.8217 GB/s Running benchmark S3D result for s3d: 95.8164 GFLOPS result for s3d_pcie: 83.9426 GFLOPS result for s3d_dp: 57.5268 GFLOPS result for s3d_dp_pcie: 
49.1857 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: centos7-ppc64le Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 : 2 Device 0: 'Tesla V100-SXM2-16GB' Device 1: 'Tesla V100-SXM2-16GB' Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 37.5824 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 38.8395 GB/sec Running benchmark MaxFlops result for maxspflops: 15516.8000 GFLOPS result for maxdpflops: 7837.7700 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 888.3320 GB/s result for gmem_readbw_strided: 479.0020 GB/s result for gmem_writebw: 742.7190 GB/s result for gmem_writebw_strided: 59.8676 GB/s result for lmem_readbw: 9453.9000 GB/s result for lmem_writebw: 10179.5000 GB/s result for tex_readbw: 1512.2300 GB/sec Skipping non-cuda benchmark KernelCompile Skipping non-cuda benchmark QueueDelay Running benchmark BFS result for bfs: 10.5773 GB/s result for bfs_pcie: 7.3547 GB/s result for bfs_teps: 378866000.0000 Edges/s Running benchmark FFT result for fft_sp: 2278.6600 GFLOPS result for fft_sp_pcie: 175.6960 GFLOPS result for ifft_sp: 2260.2600 GFLOPS result for ifft_sp_pcie: 176.1480 GFLOPS result for fft_dp: 1137.5700 GFLOPS result for fft_dp_pcie: 87.9158 GFLOPS result for ifft_dp: 1128.5800 GFLOPS result for ifft_dp_pcie: 88.1465 GFLOPS Running benchmark GEMM result for sgemm_n: 14643.4000 GFlops result for sgemm_t: 14347.2000 GFlops result for sgemm_n_pcie: 8729.9400 GFlops result for sgemm_t_pcie: 8623.8000 GFlops result for dgemm_n: 6207.9300 GFlops result for dgemm_t: 6213.2300 GFlops result for dgemm_n_pcie: 3372.9900 GFlops result for dgemm_t_pcie: 3374.5600 GFlops Running benchmark MD result for md_sp_flops: 912.5020 GFLOPS result for md_sp_bw: 699.3130 GB/s result for md_sp_flops_pcie: 
132.5160 GFLOPS result for md_sp_bw_pcie: 101.5560 GB/s result for md_dp_flops: 820.0680 GFLOPS result for md_dp_bw: 1100.7500 GB/s result for md_dp_flops_pcie: 125.5660 GFLOPS result for md_dp_bw_pcie: 168.5420 GB/s Running benchmark MD5Hash result for md5hash: 34.7245 GHash/s Running benchmark NeuralNet result for nn_learning: BenchmarkError result for nn_learning_pcie: BenchmarkError Running benchmark Reduction result for reduction: 325.9520 GB/s result for reduction_pcie: 34.3715 GB/s result for reduction_dp: 577.3380 GB/s result for reduction_dp_pcie: 35.9960 GB/s Running benchmark Scan result for scan: 198.6420 GB/s result for scan_pcie: 17.0488 GB/s result for scan_dp: 201.1580 GB/s result for scan_dp_pcie: 16.9802 GB/s Running benchmark Sort result for sort: 21.3906 GB/s result for sort_pcie: 9.9923 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 68.7036 Gflop/s result for spmv_csr_scalar_sp_pcie: 6.3301 Gflop/s result for spmv_csr_scalar_dp: 51.8053 Gflop/s result for spmv_csr_scalar_dp_pcie: 4.9438 Gflop/s result for spmv_csr_scalar_pad_sp: 77.8411 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 6.6088 Gflop/s result for spmv_csr_scalar_pad_dp: 60.6291 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 5.2722 Gflop/s result for spmv_csr_vector_sp: 164.1990 Gflop/s result for spmv_csr_vector_sp_pcie: 6.6855 Gflop/s result for spmv_csr_vector_dp: 119.5780 Gflop/s result for spmv_csr_vector_dp_pcie: 5.2050 Gflop/s result for spmv_csr_vector_pad_sp: 172.6370 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 6.9219 Gflop/s result for spmv_csr_vector_pad_dp: 125.1400 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 5.5174 Gflop/s result for spmv_ellpackr_sp: 90.2127 Gflop/s result for spmv_ellpackr_dp: 75.5094 Gflop/s Running benchmark Stencil2D result for stencil: 690.8430 GFLOPS result for stencil_dp: 372.8490 GFLOPS Running benchmark Triad result for triad_bw: 36.0896 GB/s Running benchmark S3D result for s3d: 462.9920 GFLOPS result for s3d_pcie: 
363.5560 GFLOPS result for s3d_dp: 237.8210 GFLOPS result for s3d_dp_pcie: 185.8300 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: openpower8 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 : 4 Device 0: 'Tesla P100-SXM2-16GB' Device 1: 'Tesla P100-SXM2-16GB' Device 2: 'Tesla P100-SXM2-16GB' Device 3: 'Tesla P100-SXM2-16GB' Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 32.4056 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 33.9957 GB/sec Running benchmark MaxFlops result for maxspflops: 10474.1000 GFLOPS result for maxdpflops: 5316.8900 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 575.3930 GB/s result for gmem_readbw_strided: 99.0885 GB/s result for gmem_writebw: 437.2800 GB/s result for gmem_writebw_strided: 26.4217 GB/s result for lmem_readbw: 4262.8600 GB/s result for lmem_writebw: 5458.3900 GB/s result for tex_readbw: 660.8260 GB/sec Skipping non-cuda benchmark KernelCompile Skipping non-cuda benchmark QueueDelay Running benchmark BFS result for bfs: 4.3473 GB/s result for bfs_pcie: 3.1295 GB/s result for bfs_teps: 218685000.0000 Edges/s Running benchmark FFT result for fft_sp: 1498.2000 GFLOPS result for fft_sp_pcie: 162.4770 GFLOPS result for ifft_sp: 1494.1400 GFLOPS result for ifft_sp_pcie: 162.7580 GFLOPS result for fft_dp: 748.5540 GFLOPS result for fft_dp_pcie: 81.4628 GFLOPS result for ifft_dp: 746.1060 GFLOPS result for ifft_dp_pcie: 81.5918 GFLOPS Running benchmark GEMM result for sgemm_n: 9482.2300 GFlops result for sgemm_t: 9600.8100 GFlops result for sgemm_n_pcie: 6668.8200 GFlops result for sgemm_t_pcie: 6727.2600 GFlops result for dgemm_n: 4660.4800 GFlops result for dgemm_t: 4675.1000 GFlops result for dgemm_n_pcie: 2745.6200 GFlops result for dgemm_t_pcie: 2750.6800 GFlops 
Running benchmark MD result for md_sp_flops: 479.0120 GFLOPS result for md_sp_bw: 367.1000 GB/s result for md_sp_flops_pcie: 122.1050 GFLOPS result for md_sp_bw_pcie: 93.5773 GB/s result for md_dp_flops: 396.1840 GFLOPS result for md_dp_bw: 531.7830 GB/s result for md_dp_flops_pcie: 112.5230 GFLOPS result for md_dp_bw_pcie: 151.0350 GB/s Running benchmark MD5Hash result for md5hash: 16.3441 GHash/s Running benchmark NeuralNet result for nn_learning: BenchmarkError result for nn_learning_pcie: BenchmarkError Running benchmark Reduction result for reduction: 272.0530 GB/s result for reduction_pcie: 27.4714 GB/s result for reduction_dp: 439.0800 GB/s result for reduction_dp_pcie: 30.1435 GB/s Running benchmark Scan result for scan: 101.2760 GB/s result for scan_pcie: 13.8531 GB/s result for scan_dp: 132.6110 GB/s result for scan_dp_pcie: 14.1119 GB/s Running benchmark Sort result for sort: 12.7440 GB/s result for sort_pcie: 7.1887 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 22.4800 Gflop/s result for spmv_csr_scalar_sp_pcie: 5.7300 Gflop/s result for spmv_csr_scalar_dp: 17.1413 Gflop/s result for spmv_csr_scalar_dp_pcie: 4.0754 Gflop/s result for spmv_csr_scalar_pad_sp: 26.2656 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 5.5985 Gflop/s result for spmv_csr_scalar_pad_dp: 20.3773 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 3.9544 Gflop/s result for spmv_csr_vector_sp: 60.8922 Gflop/s result for spmv_csr_vector_sp_pcie: 6.8191 Gflop/s result for spmv_csr_vector_dp: 47.6407 Gflop/s result for spmv_csr_vector_dp_pcie: 4.8071 Gflop/s result for spmv_csr_vector_pad_sp: 65.8285 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 6.4194 Gflop/s result for spmv_csr_vector_pad_dp: 51.2282 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 4.4776 Gflop/s result for spmv_ellpackr_sp: 50.6938 Gflop/s result for spmv_ellpackr_dp: 40.7505 Gflop/s Running benchmark Stencil2D result for stencil: 445.6380 GFLOPS result for stencil_dp: 273.8350 GFLOPS Running benchmark 
Triad result for triad_bw: 31.4049 GB/s Running benchmark S3D result for s3d: 287.5210 GFLOPS result for s3d_pcie: 244.9500 GFLOPS result for s3d_dp: 158.6450 GFLOPS result for s3d_dp_pcie: 133.2810 GFLOPS node0011.inband --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: node0011 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 : 4 Device 0: 'Tesla V100-SXM2-32GB' Device 1: 'Tesla V100-SXM2-32GB' Device 2: 'Tesla V100-SXM2-32GB' Device 3: 'Tesla V100-SXM2-32GB' Specified 1 device IDs: 0 Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 70.2028 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 70.4283 GB/sec Running benchmark MaxFlops result for maxspflops: 15603.6000 GFLOPS result for maxdpflops: 7837.3000 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 787.9340 GB/s result for gmem_readbw_strided: 467.8230 GB/s result for gmem_writebw: 718.3410 GB/s result for gmem_writebw_strided: 54.2259 GB/s result for lmem_readbw: 9519.6500 GB/s result for lmem_writebw: 10555.3000 GB/s result for tex_readbw: 1535.2600 GB/sec Skipping non-cuda benchmark KernelCompile Skipping non-cuda benchmark QueueDelay Running benchmark BFS result for bfs: 10.9037 GB/s result for bfs_pcie: 8.6029 GB/s result for bfs_teps: 440881000.0000 Edges/s Running benchmark FFT result for fft_sp: 2275.7500 GFLOPS result for fft_sp_pcie: 312.4370 GFLOPS result for ifft_sp: 2255.0500 GFLOPS result for ifft_sp_pcie: 313.7840 GFLOPS result for fft_dp: 1133.7300 GFLOPS result for fft_dp_pcie: 157.8540 GFLOPS result for ifft_dp: 1120.8200 GFLOPS result for ifft_dp_pcie: 158.2440 GFLOPS Running benchmark GEMM result for sgemm_n: 13994.9000 GFlops result for sgemm_t: 14051.3000 GFlops result for sgemm_n_pcie: 10544.4000 GFlops result for sgemm_t_pcie: 10576.4000 GFlops result for dgemm_n: 6390.1000 GFlops result for 
dgemm_t: 6382.2000 GFlops result for dgemm_n_pcie: 4346.1100 GFlops result for dgemm_t_pcie: 4342.4600 GFlops Running benchmark MD result for md_sp_flops: 917.2320 GFLOPS result for md_sp_bw: 702.9380 GB/s result for md_sp_flops_pcie: 220.4670 GFLOPS result for md_sp_bw_pcie: 168.9590 GB/s result for md_dp_flops: 834.2070 GFLOPS result for md_dp_bw: 1119.7200 GB/s result for md_dp_flops_pcie: 208.0400 GFLOPS result for md_dp_bw_pcie: 279.2450 GB/s Running benchmark MD5Hash result for md5hash: 34.5590 GHash/s Running benchmark NeuralNet result for nn_learning: BenchmarkError result for nn_learning_pcie: BenchmarkError Running benchmark Reduction result for reduction: 302.6460 GB/s result for reduction_pcie: 55.2395 GB/s result for reduction_dp: 511.3640 GB/s result for reduction_dp_pcie: 59.7797 GB/s Running benchmark Scan result for scan: 172.7680 GB/s result for scan_pcie: 28.1934 GB/s result for scan_dp: 187.2430 GB/s result for scan_dp_pcie: 28.6058 GB/s Running benchmark Sort result for sort: 20.1942 GB/s result for sort_pcie: 12.6884 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 60.9526 Gflop/s result for spmv_csr_scalar_sp_pcie: 10.6094 Gflop/s result for spmv_csr_scalar_dp: 44.6487 Gflop/s result for spmv_csr_scalar_dp_pcie: 7.9400 Gflop/s result for spmv_csr_scalar_pad_sp: 71.8809 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 11.7643 Gflop/s result for spmv_csr_scalar_pad_dp: 55.8149 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 8.4869 Gflop/s result for spmv_csr_vector_sp: 148.0370 Gflop/s result for spmv_csr_vector_sp_pcie: 11.8155 Gflop/s result for spmv_csr_vector_dp: 109.1130 Gflop/s result for spmv_csr_vector_dp_pcie: 8.8681 Gflop/s result for spmv_csr_vector_pad_sp: 151.2460 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 12.8455 Gflop/s result for spmv_csr_vector_pad_dp: 113.3470 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 9.2008 Gflop/s result for spmv_ellpackr_sp: 78.8599 Gflop/s result for spmv_ellpackr_dp: 65.4746 Gflop/s 
Running benchmark Stencil2D
result for stencil: 604.0910 GFLOPS
result for stencil_dp: 352.6900 GFLOPS
Running benchmark Triad
result for triad_bw: 76.5243 GB/s
Running benchmark S3D
result for s3d: 424.6950 GFLOPS
result for s3d_pcie: 373.8320 GFLOPS
result for s3d_dp: 224.3360 GFLOPS
result for s3d_dp_pcie: 196.3560 GFLOPS

------------------------------------------------------------
Sender: LSF System
Subject: Job 1483: in cluster Done

Job was submitted from host by user in cluster at Sat Dec 7 07:04:37 2019
Job was executed on host(s) , in queue , as user in cluster at Sat Dec 7 07:04:38 2019
was used as the home directory.
was used as the working directory.
Started at Sat Dec 7 07:04:38 2019
Terminated at Sat Dec 7 07:19:21 2019
Results reported at Sat Dec 7 07:19:21 2019

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -L /bin/bash
#BSUB -J "shoc"
#BSUB -o "shoc.%J"
#BSUB -e "shoc_e.%J"
#BSUB -W 24:00
#BSUB -n 1
#BSUB -R "select[type==any]"
#BSUB -gpu "num=4"
#BSUB -q "normal"
#BSUB -x
export LD_LIBRARY_PATH=/usr/mpi/gcc/openmpi-4.0.2a1/lib:$LD_LIBRARY_PATH
export PATH=/usr/mpi/gcc/openmpi-4.0.2a1/bin:$PATH
cat $LSB_DJOB_HOSTFILE
cd /home/khannag/shoc/bin
./shocdriver -s 4 -d 0 -cuda #-hostfile $LSB_DJOB_HOSTFILE
------------------------------------------------------------

Successfully completed.

Resource usage summary:

CPU time : 879.88 sec.
Max Memory : 260 MB
Average Memory : 30.11 MB
Total Requested Memory : -
Delta Memory : -
Max Swap : 3 MB
Max Processes : 8
Max Threads : 12
Run time : 888 sec.
Turnaround time : 884 sec.

The output (if any) is above this job summary.

PS: Read file for stderr output of this job.
--- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: node6 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 : 4 Device 0: 'Tesla P100-PCIE-16GB' Device 1: 'Tesla P100-PCIE-16GB' Device 2: 'Tesla P100-PCIE-16GB' Device 3: 'Tesla P100-PCIE-16GB' Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 11.7534 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 13.1998 GB/sec Running benchmark MaxFlops result for maxspflops: 9324.5400 GFLOPS result for maxdpflops: 4736.8100 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 573.8520 GB/s result for gmem_readbw_strided: 94.8123 GB/s result for gmem_writebw: 432.6690 GB/s result for gmem_writebw_strided: 26.7780 GB/s result for lmem_readbw: 4026.9600 GB/s result for lmem_writebw: 4931.1900 GB/s result for tex_readbw: 587.9940 GB/sec Skipping non-cuda benchmark KernelCompile Skipping non-cuda benchmark QueueDelay Running benchmark BFS result for bfs: 4.0938 GB/s result for bfs_pcie: 2.8384 GB/s result for bfs_teps: 225749000.0000 Edges/s Running benchmark FFT result for fft_sp: 1499.8700 GFLOPS result for fft_sp_pcie: 64.2650 GFLOPS result for ifft_sp: 1497.3100 GFLOPS result for ifft_sp_pcie: 64.3394 GFLOPS result for fft_dp: 750.8760 GFLOPS result for fft_dp_pcie: 32.1353 GFLOPS result for ifft_dp: 748.2570 GFLOPS result for ifft_dp_pcie: 32.1525 GFLOPS Running benchmark GEMM result for sgemm_n: 8542.1600 GFlops result for sgemm_t: 8564.1300 GFlops result for sgemm_n_pcie: 4236.3600 GFlops result for sgemm_t_pcie: 4241.7600 GFlops result for dgemm_n: 4155.9100 GFlops result for dgemm_t: 4167.4400 GFlops result for dgemm_n_pcie: 1551.6500 GFlops result for dgemm_t_pcie: 1553.2500 GFlops Running benchmark MD result for md_sp_flops: 478.3100 GFLOPS result for md_sp_bw: 366.5620 GB/s 
result for md_sp_flops_pcie: 50.8975 GFLOPS result for md_sp_bw_pcie: 39.0062 GB/s result for md_dp_flops: 396.7170 GFLOPS result for md_dp_bw: 532.4980 GB/s result for md_dp_flops_pcie: 47.9109 GFLOPS result for md_dp_bw_pcie: 64.3090 GB/s Running benchmark MD5Hash result for md5hash: 14.5623 GHash/s Running benchmark NeuralNet result for nn_learning: BenchmarkError result for nn_learning_pcie: BenchmarkError Running benchmark Reduction result for reduction: 256.4620 GB/s result for reduction_pcie: 11.1231 GB/s result for reduction_dp: 417.1400 GB/s result for reduction_dp_pcie: 11.3156 GB/s Running benchmark Scan result for scan: 97.5881 GB/s result for scan_pcie: 5.8826 GB/s result for scan_dp: 122.8610 GB/s result for scan_dp_pcie: 5.9627 GB/s Running benchmark Sort result for sort: 12.2828 GB/s result for sort_pcie: 4.1531 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 22.4475 Gflop/s result for spmv_csr_scalar_sp_pcie: 2.3961 Gflop/s result for spmv_csr_scalar_dp: 17.0155 Gflop/s result for spmv_csr_scalar_dp_pcie: 1.6899 Gflop/s result for spmv_csr_scalar_pad_sp: 25.9598 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 2.5409 Gflop/s result for spmv_csr_scalar_pad_dp: 19.9898 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 1.7320 Gflop/s result for spmv_csr_vector_sp: 57.6101 Gflop/s result for spmv_csr_vector_sp_pcie: 2.5633 Gflop/s result for spmv_csr_vector_dp: 45.3832 Gflop/s result for spmv_csr_vector_dp_pcie: 1.8013 Gflop/s result for spmv_csr_vector_pad_sp: 61.9079 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 2.6934 Gflop/s result for spmv_csr_vector_pad_dp: 48.6586 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 1.8251 Gflop/s result for spmv_ellpackr_sp: 48.5013 Gflop/s result for spmv_ellpackr_dp: 38.5877 Gflop/s Running benchmark Stencil2D result for stencil: 442.4320 GFLOPS result for stencil_dp: 259.1490 GFLOPS Running benchmark Triad result for triad_bw: 15.7742 GB/s Running benchmark S3D result for s3d: 296.4230 GFLOPS result for 
s3d_pcie: 203.4860 GFLOPS result for s3d_dp: 158.8710 GFLOPS result for s3d_dp_pcie: 106.7830 GFLOPS [gkhanna@hsw229 bin]$ ./shocdriver -cuda -s 4 --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: hsw229 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 : 4 Device 0: 'Tesla V100-PCIE-16GB' Device 1: 'Tesla V100-PCIE-16GB' Device 2: 'Tesla V100-PCIE-16GB' Device 3: 'Tesla V100-PCIE-16GB' Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 12.4530 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 13.1830 GB/sec Running benchmark MaxFlops result for maxspflops: 14015.2000 GFLOPS result for maxdpflops: 7046.7800 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 893.3680 GB/s result for gmem_readbw_strided: 433.4710 GB/s result for gmem_writebw: 748.5130 GB/s result for gmem_writebw_strided: 61.4618 GB/s result for lmem_readbw: 8344.1700 GB/s result for lmem_writebw: 9246.9500 GB/s result for tex_readbw: 1340.6300 GB/sec Skipping non-cuda benchmark KernelCompile Skipping non-cuda benchmark QueueDelay Running benchmark BFS result for bfs: 9.9041 GB/s result for bfs_pcie: 4.8504 GB/s result for bfs_teps: 462258000.0000 Edges/s Running benchmark FFT result for fft_sp: 2303.7600 GFLOPS result for fft_sp_pcie: 67.6669 GFLOPS result for ifft_sp: 2288.6000 GFLOPS result for ifft_sp_pcie: 67.7517 GFLOPS result for fft_dp: 1148.4700 GFLOPS result for fft_dp_pcie: 33.8672 GFLOPS result for ifft_dp: 1141.0500 GFLOPS result for ifft_dp_pcie: 33.8951 GFLOPS Running benchmark GEMM result for sgemm_n: 13155.4000 GFlops result for sgemm_t: 12939.4000 GFlops result for sgemm_n_pcie: 4909.0400 GFlops result for sgemm_t_pcie: 4878.6400 GFlops result for dgemm_n: 5611.5600 GFlops result for dgemm_t: 5617.9200 GFlops result for dgemm_n_pcie: 
1637.5100 GFlops result for dgemm_t_pcie: 1638.0600 GFlops Running benchmark MD result for md_sp_flops: 889.3100 GFLOPS result for md_sp_bw: 681.5390 GB/s result for md_sp_flops_pcie: 56.4029 GFLOPS result for md_sp_bw_pcie: 43.2254 GB/s result for md_dp_flops: 792.8760 GFLOPS result for md_dp_bw: 1064.2500 GB/s result for md_dp_flops_pcie: 53.8825 GFLOPS result for md_dp_bw_pcie: 72.3245 GB/s Running benchmark MD5Hash result for md5hash: 31.1718 GHash/s Running benchmark NeuralNet result for nn_learning: BenchmarkError result for nn_learning_pcie: BenchmarkError Running benchmark Reduction result for reduction: 311.3070 GB/s result for reduction_pcie: 11.8760 GB/s result for reduction_dp: 549.5340 GB/s result for reduction_dp_pcie: 11.2510 GB/s Running benchmark Scan result for scan: 190.3640 GB/s result for scan_pcie: 6.1441 GB/s result for scan_dp: 186.8280 GB/s result for scan_dp_pcie: 6.1644 GB/s Running benchmark Sort result for sort: 20.5054 GB/s result for sort_pcie: 4.8689 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 72.4667 Gflop/s result for spmv_csr_scalar_sp_pcie: 2.8033 Gflop/s result for spmv_csr_scalar_dp: 48.7981 Gflop/s result for spmv_csr_scalar_dp_pcie: 1.9085 Gflop/s result for spmv_csr_scalar_pad_sp: 78.7012 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 2.8672 Gflop/s result for spmv_csr_scalar_pad_dp: 58.2138 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 1.9355 Gflop/s result for spmv_csr_vector_sp: 157.5750 Gflop/s result for spmv_csr_vector_sp_pcie: 2.8627 Gflop/s result for spmv_csr_vector_dp: 112.7430 Gflop/s result for spmv_csr_vector_dp_pcie: 1.9520 Gflop/s result for spmv_csr_vector_pad_sp: 164.5800 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 2.9227 Gflop/s result for spmv_csr_vector_pad_dp: 118.9950 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 1.9689 Gflop/s result for spmv_ellpackr_sp: 85.0935 Gflop/s result for spmv_ellpackr_dp: 62.0023 Gflop/s Running benchmark Stencil2D result for stencil: 648.2180 GFLOPS 
result for stencil_dp: 364.6100 GFLOPS Running benchmark Triad result for triad_bw: 15.6118 GB/s Running benchmark S3D result for s3d: 454.7370 GFLOPS result for s3d_pcie: 270.9480 GFLOPS result for s3d_dp: 233.7540 GFLOPS result for s3d_dp_pcie: 138.3770 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: fusion2 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 'AMD Accelerated Parallel Processing': 3 Device 0: Spectre Device 1: Hawaii Device 2: AMD A10-7850K APU with Radeon(TM) R7 Graphics Device selection not specified: defaulting to device #0. Using size class: 3 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 9.4583 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 10.0571 GB/sec Running benchmark MaxFlops result for maxspflops: 733.9700 GFLOPS result for maxdpflops: 45.9488 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 23.3226 GB/s result for gmem_readbw_strided: 15.3657 GB/s result for gmem_writebw: 17.1887 GB/s result for gmem_writebw_strided: 6.8982 GB/s result for lmem_readbw: 266.7080 GB/s result for lmem_writebw: 294.7120 GB/s result for tex_readbw: 63.3353 GB/sec Running benchmark KernelCompile result for ocl_kernel: 0.0378 sec Running benchmark QueueDelay result for ocl_queue: 0.0415 ms Running benchmark BFS result for bfs: 0.6879 GB/s result for bfs_pcie: 0.6375 GB/s result for bfs_teps: 13731100.0000 Edges/s Running benchmark FFT result for fft_sp: 49.0251 GFLOPS result for fft_sp_pcie: 17.1015 GFLOPS result for ifft_sp: 54.5175 GFLOPS result for ifft_sp_pcie: 17.7244 GFLOPS result for fft_dp: 10.5905 GFLOPS result for fft_dp_pcie: 5.8503 GFLOPS result for ifft_dp: 10.3274 GFLOPS result for ifft_dp_pcie: 5.7691 GFLOPS Running benchmark GEMM result for sgemm_n: 170.2290 GFLOPS result for sgemm_t: 150.3310 GFLOPS result for sgemm_n_pcie: 156.6420 GFLOPS result for 
sgemm_t_pcie: 140.1260 GFLOPS result for dgemm_n: 31.9007 GFLOPS result for dgemm_t: 33.6603 GFLOPS result for dgemm_n_pcie: 30.0337 GFLOPS result for dgemm_t_pcie: 31.4858 GFLOPS Running benchmark MD result for md_sp_flops: 17.1391 GFLOPS result for md_sp_bw: 13.1349 GB/s result for md_sp_flops_pcie: 12.1380 GFLOPS result for md_sp_bw_pcie: 9.3022 GB/s result for md_dp_flops: 7.8838 GFLOPS result for md_dp_bw: 10.5821 GB/s result for md_dp_flops_pcie: 6.5560 GFLOPS result for md_dp_bw_pcie: 8.7998 GB/s Running benchmark MD5Hash result for md5hash: 0.9040 GHash/s Skipping non-opencl benchmark NeuralNet Running benchmark Reduction result for reduction: 13.2883 GB/s result for reduction_pcie: 5.4244 GB/s result for reduction_dp: 12.9653 GB/s result for reduction_dp_pcie: 5.3985 GB/s Running benchmark Scan result for scan: 6.1895 GB/s result for scan_pcie: 2.6390 GB/s result for scan_dp: 6.3924 GB/s result for scan_dp_pcie: 2.6887 GB/s Running benchmark Sort result for sort: 0.2309 GB/s result for sort_pcie: 0.2199 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 0.3282 Gflop/s result for spmv_csr_scalar_sp_pcie: 0.2838 Gflop/s result for spmv_csr_scalar_dp: 0.2655 Gflop/s result for spmv_csr_scalar_dp_pcie: 0.2253 Gflop/s result for spmv_csr_scalar_pad_sp: 0.3101 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 0.2725 Gflop/s result for spmv_csr_scalar_pad_dp: 0.2688 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 0.2277 Gflop/s result for spmv_csr_vector_sp: 2.8263 Gflop/s result for spmv_csr_vector_sp_pcie: 1.2039 Gflop/s result for spmv_csr_vector_dp: 1.7109 Gflop/s result for spmv_csr_vector_dp_pcie: 0.7959 Gflop/s result for spmv_csr_vector_pad_sp: 2.9978 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 1.2850 Gflop/s result for spmv_csr_vector_pad_dp: 1.8055 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 0.8162 Gflop/s result for spmv_ellpackr_sp: 3.7735 Gflop/s result for spmv_ellpackr_dp: 2.4520 Gflop/s Running benchmark Stencil2D result for stencil: 
14.4530 GFLOPS result for stencil_dp: 9.4046 GFLOPS Running benchmark Triad result for triad_bw: 5.8931 GB/s Running benchmark S3D result for s3d: 8.0729 GFLOPS result for s3d_pcie: 7.9218 GFLOPS result for s3d_dp: 3.6208 GFLOPS result for s3d_dp_pcie: 3.5417 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: PrimoChillHasher Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 'AMD Accelerated Parallel Processing': 4 Device 0: Tahiti Device 1: Tahiti Device 2: Tahiti Device 3: AMD Sempron(tm) 145 Processor Specified 1 device IDs: 1 Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 2.8530 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 3.3703 GB/sec Running benchmark MaxFlops result for maxspflops: 2924.2200 GFLOPS result for maxdpflops: 1111.3100 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 241.2070 GB/s result for gmem_readbw_strided: 53.5042 GB/s result for gmem_writebw: 142.7710 GB/s result for gmem_writebw_strided: 7.0469 GB/s result for lmem_readbw: 1509.8500 GB/s result for lmem_writebw: 1620.5500 GB/s result for tex_readbw: 219.2110 GB/sec Running benchmark KernelCompile result for ocl_kernel: 0.0487 sec Running benchmark QueueDelay result for ocl_queue: 0.0251 ms Running benchmark BFS result for bfs: 4.1851 GB/s result for bfs_pcie: 1.9291 GB/s result for bfs_teps: 127772000.0000 Edges/s Running benchmark FFT result for fft_sp: 507.5920 GFLOPS result for fft_sp_pcie: 8.3855 GFLOPS result for ifft_sp: 509.1450 GFLOPS result for ifft_sp_pcie: 8.3859 GFLOPS result for fft_dp: 198.4020 GFLOPS result for fft_dp_pcie: 4.1721 GFLOPS result for ifft_dp: 197.5000 GFLOPS result for ifft_dp_pcie: 4.1717 GFLOPS Running benchmark GEMM result for sgemm_n: 1636.1700 GFLOPS result for sgemm_t: 373.6770 GFLOPS result for sgemm_n_pcie: 791.4800 GFLOPS result for 
sgemm_t_pcie: 300.4890 GFLOPS result for dgemm_n: 269.9670 GFLOPS result for dgemm_t: 451.8860 GFLOPS result for dgemm_n_pcie: 157.7230 GFLOPS result for dgemm_t_pcie: 205.9680 GFLOPS Running benchmark MD result for md_sp_flops: 118.7090 GFLOPS result for md_sp_bw: 90.9753 GB/s result for md_sp_flops_pcie: 10.8540 GFLOPS result for md_sp_bw_pcie: 8.3182 GB/s result for md_dp_flops: 66.5612 GFLOPS result for md_dp_bw: 89.3426 GB/s result for md_dp_flops_pcie: 9.0214 GFLOPS result for md_dp_bw_pcie: 12.1091 GB/s Running benchmark MD5Hash result for md5hash: 5.5687 GHash/s Skipping non-opencl benchmark NeuralNet Running benchmark Reduction result for reduction: 210.5410 GB/s result for reduction_pcie: 2.8790 GB/s result for reduction_dp: 219.1540 GB/s result for reduction_dp_pcie: 2.8880 GB/s Running benchmark Scan result for scan: 46.7588 GB/s result for scan_pcie: 1.5212 GB/s result for scan_dp: 57.3697 GB/s result for scan_dp_pcie: 1.5326 GB/s Running benchmark Sort result for sort: 0.7080 GB/s result for sort_pcie: 0.4880 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 2.1216 Gflop/s result for spmv_csr_scalar_sp_pcie: 0.4278 Gflop/s result for spmv_csr_scalar_dp: 2.0856 Gflop/s result for spmv_csr_scalar_dp_pcie: 0.3223 Gflop/s result for spmv_csr_scalar_pad_sp: 2.1259 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 0.5402 Gflop/s result for spmv_csr_scalar_pad_dp: 2.0701 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 0.3884 Gflop/s result for spmv_csr_vector_sp: 17.8587 Gflop/s result for spmv_csr_vector_sp_pcie: 0.5202 Gflop/s result for spmv_csr_vector_dp: 11.4729 Gflop/s result for spmv_csr_vector_dp_pcie: 0.3689 Gflop/s result for spmv_csr_vector_pad_sp: 18.7726 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 0.6973 Gflop/s result for spmv_csr_vector_pad_dp: 11.9912 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 0.4597 Gflop/s result for spmv_ellpackr_sp: 14.7010 Gflop/s result for spmv_ellpackr_dp: 10.0615 Gflop/s Running benchmark Stencil2D 
result for stencil: 194.2900 GFLOPS result for stencil_dp: 100.8090 GFLOPS Running benchmark Triad result for triad_bw: 2.9971 GB/s Running benchmark S3D result for s3d: 31.7470 GFLOPS result for s3d_pcie: 25.0674 GFLOPS result for s3d_dp: 16.4525 GFLOPS result for s3d_dp_pcie: 13.7010 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: fusion2 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 'AMD Accelerated Parallel Processing': 3 Device 0: Spectre Device 1: Hawaii Device 2: AMD A10-7850K APU with Radeon(TM) R7 Graphics Specified 1 device IDs: 1 Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 12.9956 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 14.0332 GB/sec Running benchmark MaxFlops result for maxspflops: 5770.2800 GFLOPS result for maxdpflops: 723.1400 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 304.3200 GB/s result for gmem_readbw_strided: 91.0439 GB/s result for gmem_writebw: 227.5090 GB/s result for gmem_writebw_strided: 7.4868 GB/s result for lmem_readbw: 2280.4600 GB/s result for lmem_writebw: 2124.5500 GB/s result for tex_readbw: 270.2890 GB/sec Running benchmark KernelCompile result for ocl_kernel: 0.0370 sec Running benchmark QueueDelay result for ocl_queue: 0.0321 ms Running benchmark BFS result for bfs: 6.3234 GB/s result for bfs_pcie: 3.4146 GB/s result for bfs_teps: 169200000.0000 Edges/s Running benchmark FFT result for fft_sp: 733.7130 GFLOPS result for fft_sp_pcie: 35.4249 GFLOPS result for ifft_sp: 726.8800 GFLOPS result for ifft_sp_pcie: 35.4088 GFLOPS result for fft_dp: 133.3150 GFLOPS result for fft_dp_pcie: 16.2466 GFLOPS result for ifft_dp: 130.9270 GFLOPS result for ifft_dp_pcie: 16.2106 GFLOPS Running benchmark GEMM result for sgemm_n: 2178.6500 GFLOPS result for sgemm_t: 609.6360 GFLOPS result for sgemm_n_pcie: 1617.8900 GFLOPS 
result for sgemm_t_pcie: 554.1110 GFLOPS result for dgemm_n: 289.3780 GFLOPS result for dgemm_t: 425.1440 GFLOPS result for dgemm_n_pcie: 241.4730 GFLOPS result for dgemm_t_pcie: 329.3820 GFLOPS Running benchmark MD result for md_sp_flops: 197.0630 GFLOPS result for md_sp_bw: 151.0230 GB/s result for md_sp_flops_pcie: 31.4837 GFLOPS result for md_sp_bw_pcie: 24.1281 GB/s result for md_dp_flops: 76.2366 GFLOPS result for md_dp_bw: 102.3300 GB/s result for md_dp_flops_pcie: 25.4630 GFLOPS result for md_dp_bw_pcie: 34.1780 GB/s Running benchmark MD5Hash result for md5hash: 7.2531 GHash/s Skipping non-opencl benchmark NeuralNet Running benchmark Reduction result for reduction: 223.4750 GB/s result for reduction_pcie: 11.0730 GB/s result for reduction_dp: 265.3600 GB/s result for reduction_dp_pcie: 11.1914 GB/s Running benchmark Scan result for scan: 45.1651 GB/s result for scan_pcie: 5.5692 GB/s result for scan_dp: 64.1836 GB/s result for scan_dp_pcie: 5.7749 GB/s Running benchmark Sort result for sort: 0.6890 GB/s result for sort_pcie: 0.6195 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 2.2487 Gflop/s result for spmv_csr_scalar_sp_pcie: 0.9859 Gflop/s result for spmv_csr_scalar_dp: 2.2081 Gflop/s result for spmv_csr_scalar_dp_pcie: 0.7480 Gflop/s result for spmv_csr_scalar_pad_sp: 2.3250 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 0.9900 Gflop/s result for spmv_csr_scalar_pad_dp: 2.2542 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 0.8888 Gflop/s result for spmv_csr_vector_sp: 25.7542 Gflop/s result for spmv_csr_vector_sp_pcie: 1.6436 Gflop/s result for spmv_csr_vector_dp: 14.4429 Gflop/s result for spmv_csr_vector_dp_pcie: 1.0489 Gflop/s result for spmv_csr_vector_pad_sp: 27.1675 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 1.6212 Gflop/s result for spmv_csr_vector_pad_dp: 15.0622 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 1.3373 Gflop/s result for spmv_ellpackr_sp: 16.3054 Gflop/s result for spmv_ellpackr_dp: 11.3079 Gflop/s Running 
benchmark Stencil2D result for stencil: 229.4680 GFLOPS result for stencil_dp: 96.6270 GFLOPS Running benchmark Triad result for triad_bw: 8.2560 GB/s Running benchmark S3D result for s3d: 102.7550 GFLOPS result for s3d_pcie: 78.2243 GFLOPS result for s3d_dp: 49.1849 GFLOPS result for s3d_dp_pcie: 38.0105 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: fusion2 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 'AMD Accelerated Parallel Processing': 2 Device 0: Fiji Device 1: AMD A10-7850K APU with Radeon(TM) R7 Graphics Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 12.9545 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 14.1924 GB/sec Running benchmark MaxFlops result for maxspflops: 8563.4200 GFLOPS result for maxdpflops: 537.4960 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 487.6300 GB/s result for gmem_readbw_strided: 114.6660 GB/s result for gmem_writebw: 447.9930 GB/s result for gmem_writebw_strided: 12.5633 GB/s result for lmem_readbw: 3436.8000 GB/s result for lmem_writebw: 3412.9000 GB/s result for tex_readbw: 286.1490 GB/sec Running benchmark KernelCompile result for ocl_kernel: 0.0364 sec Running benchmark QueueDelay result for ocl_queue: 0.0268 ms Running benchmark BFS result for bfs: 5.8897 GB/s result for bfs_pcie: 3.9657 GB/s result for bfs_teps: 152044000.0000 Edges/s Running benchmark FFT result for fft_sp: 892.9480 GFLOPS result for fft_sp_pcie: 35.9890 GFLOPS result for ifft_sp: 812.3850 GFLOPS result for ifft_sp_pcie: 35.8458 GFLOPS result for fft_dp: 121.8150 GFLOPS result for fft_dp_pcie: 16.1560 GFLOPS result for ifft_dp: 118.4490 GFLOPS result for ifft_dp_pcie: 16.0954 GFLOPS Running benchmark GEMM result for sgemm_n: 3256.5400 GFLOPS result for sgemm_t: 774.3610 GFLOPS result 
for sgemm_n_pcie: 2156.6100 GFLOPS result for sgemm_t_pcie: 689.8400 GFLOPS result for dgemm_n: 489.1720 GFLOPS result for dgemm_t: 494.4170 GFLOPS result for dgemm_n_pcie: 368.4550 GFLOPS result for dgemm_t_pcie: 371.4430 GFLOPS Running benchmark MD result for md_sp_flops: 288.9330 GFLOPS result for md_sp_bw: 221.4300 GB/s result for md_sp_flops_pcie: 40.4597 GFLOPS result for md_sp_bw_pcie: 31.0071 GB/s result for md_dp_flops: 133.7290 GFLOPS result for md_dp_bw: 179.5000 GB/s result for md_dp_flops_pcie: 30.7099 GFLOPS result for md_dp_bw_pcie: 41.2208 GB/s Running benchmark MD5Hash result for md5hash: 10.3697 GHash/s Skipping non-opencl benchmark NeuralNet Running benchmark Reduction result for reduction: 161.5130 GB/s result for reduction_pcie: 11.7531 GB/s result for reduction_dp: 278.9370 GB/s result for reduction_dp_pcie: 12.0743 GB/s Running benchmark Scan result for scan: 42.1121 GB/s result for scan_pcie: 5.6862 GB/s result for scan_dp: 66.9224 GB/s result for scan_dp_pcie: 5.9863 GB/s Running benchmark Sort result for sort: 0.7462 GB/s result for sort_pcie: 0.6660 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 4.0236 Gflop/s result for spmv_csr_scalar_sp_pcie: 1.2893 Gflop/s result for spmv_csr_scalar_dp: 3.7863 Gflop/s result for spmv_csr_scalar_dp_pcie: 1.0391 Gflop/s result for spmv_csr_scalar_pad_sp: 4.4426 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 1.8123 Gflop/s result for spmv_csr_scalar_pad_dp: 4.6987 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 1.4350 Gflop/s result for spmv_csr_vector_sp: 25.8192 Gflop/s result for spmv_csr_vector_sp_pcie: 1.7669 Gflop/s result for spmv_csr_vector_dp: 14.5406 Gflop/s result for spmv_csr_vector_dp_pcie: 1.3038 Gflop/s result for spmv_csr_vector_pad_sp: 27.2547 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 2.7515 Gflop/s result for spmv_csr_vector_pad_dp: 15.1746 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 1.8184 Gflop/s result for spmv_ellpackr_sp: 11.1322 Gflop/s result for 
spmv_ellpackr_dp: 10.2750 Gflop/s Running benchmark Stencil2D result for stencil: 354.8750 GFLOPS result for stencil_dp: 148.9640 GFLOPS Running benchmark Triad result for triad_bw: 10.9797 GB/s Running benchmark S3D result for s3d: 121.4070 GFLOPS result for s3d_pcie: 97.3392 GFLOPS result for s3d_dp: 48.8817 GFLOPS result for s3d_dp_pcie: 41.2434 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: Ryzen-2400G Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 'AMD Accelerated Parallel Processing': 1 Device 0: gfx902 Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 4.5159 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 5.5676 GB/sec Running benchmark MaxFlops result for maxspflops: 1738.0500 GFLOPS result for maxdpflops: 108.8340 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 34.0575 GB/s result for gmem_readbw_strided: 15.7989 GB/s result for gmem_writebw: 35.5090 GB/s result for gmem_writebw_strided: 11.3556 GB/s result for lmem_readbw: 250.8320 GB/s result for lmem_writebw: 189.5720 GB/s result for tex_readbw: 73.5553 GB/sec Running benchmark KernelCompile result for ocl_kernel: 0.0172 sec Running benchmark QueueDelay result for ocl_queue: 0.0850 ms Running benchmark BFS result for bfs: 2.9356 GB/s result for bfs_pcie: 2.6292 GB/s result for bfs_teps: 27254800.0000 Edges/s Running benchmark FFT result for fft_sp: 94.4112 GFLOPS result for fft_sp_pcie: 11.1313 GFLOPS result for ifft_sp: 94.4142 GFLOPS result for ifft_sp_pcie: 11.1313 GFLOPS result for fft_dp: 26.2853 GFLOPS result for fft_dp_pcie: 5.0869 GFLOPS result for ifft_dp: 25.7335 GFLOPS result for ifft_dp_pcie: 5.0659 GFLOPS Running benchmark GEMM result for sgemm_n: 257.1490 GFLOPS result for sgemm_t: 232.2160 GFLOPS result for sgemm_n_pcie: 
232.9940 GFLOPS result for sgemm_t_pcie: 212.8110 GFLOPS result for dgemm_n: 98.0633 GFLOPS result for dgemm_t: 100.4260 GFLOPS result for dgemm_n_pcie: 86.8044 GFLOPS result for dgemm_t_pcie: 88.0993 GFLOPS Running benchmark MD result for md_sp_flops: 35.0808 GFLOPS result for md_sp_bw: 26.8848 GB/s result for md_sp_flops_pcie: 25.3283 GFLOPS result for md_sp_bw_pcie: 19.4109 GB/s result for md_dp_flops: 13.5267 GFLOPS result for md_dp_bw: 18.1564 GB/s result for md_dp_flops_pcie: 11.7190 GFLOPS result for md_dp_bw_pcie: 15.7299 GB/s Running benchmark MD5Hash result for md5hash: 2.2729 GHash/s Skipping non-opencl benchmark NeuralNet Running benchmark Reduction result for reduction: 33.3499 GB/s result for reduction_pcie: 11.7260 GB/s result for reduction_dp: 32.8573 GB/s result for reduction_dp_pcie: 11.6645 GB/s Running benchmark Scan result for scan: 11.3841 GB/s result for scan_pcie: 5.7459 GB/s result for scan_dp: 11.2068 GB/s result for scan_dp_pcie: 5.7569 GB/s Running benchmark Sort result for sort: 0.3766 GB/s result for sort_pcie: 0.3641 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 11.4282 Gflop/s result for spmv_csr_scalar_sp_pcie: 4.6117 Gflop/s result for spmv_csr_scalar_dp: 9.4288 Gflop/s result for spmv_csr_scalar_dp_pcie: 3.3349 Gflop/s result for spmv_csr_scalar_pad_sp: 11.7225 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 4.3330 Gflop/s result for spmv_csr_scalar_pad_dp: 10.3244 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 3.4181 Gflop/s result for spmv_csr_vector_sp: 31.1743 Gflop/s result for spmv_csr_vector_sp_pcie: 6.1967 Gflop/s result for spmv_csr_vector_dp: 18.9608 Gflop/s result for spmv_csr_vector_dp_pcie: 4.0562 Gflop/s result for spmv_csr_vector_pad_sp: 32.2182 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 5.6649 Gflop/s result for spmv_csr_vector_pad_dp: 19.6488 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 4.0552 Gflop/s result for spmv_ellpackr_sp: 26.0593 Gflop/s result for spmv_ellpackr_dp: 17.7164 Gflop/s 
Running benchmark Stencil2D result for stencil: 39.0051 GFLOPS result for stencil_dp: 20.6794 GFLOPS Running benchmark Triad result for triad_bw: 6.9503 GB/s Running benchmark S3D result for s3d: 0.3579 GFLOPS result for s3d_pcie: 0.3578 GFLOPS result for s3d_dp: 0.1649 GFLOPS result for s3d_dp_pcie: 0.1649 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: rps2.cscvr Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 : 2 Device 0: 'TITAN X (Pascal)' Device 1: 'Quadro K420' Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 12.0136 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 13.2086 GB/sec Running benchmark MaxFlops result for maxspflops: 13277.9000 GFLOPS result for maxdpflops: 416.1190 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 269.5980 GB/s result for gmem_readbw_strided: 46.4687 GB/s result for gmem_writebw: 265.9080 GB/s result for gmem_writebw_strided: 11.8630 GB/s result for lmem_readbw: 5158.7400 GB/s result for lmem_writebw: 5911.4700 GB/s result for tex_readbw: 623.6690 GB/sec Skipping non-cuda benchmark KernelCompile Skipping non-cuda benchmark QueueDelay Running benchmark BFS result for bfs: 4.0639 GB/s result for bfs_pcie: 2.8594 GB/s result for bfs_teps: 232000000.0000 Edges/s Running benchmark FFT result for fft_sp: 983.8410 GFLOPS result for fft_sp_pcie: 63.2543 GFLOPS result for ifft_sp: 984.2930 GFLOPS result for ifft_sp_pcie: 63.2756 GFLOPS result for fft_dp: 232.3450 GFLOPS result for fft_dp_pcie: 29.4890 GFLOPS result for ifft_dp: 221.0550 GFLOPS result for ifft_dp_pcie: 29.3255 GFLOPS Running benchmark GEMM result for sgemm_n: 11173.8000 GFlops result for sgemm_t: 11290.9000 GFlops result for sgemm_n_pcie: 4804.7300 GFlops result for sgemm_t_pcie: 4826.2600 GFlops result for 
dgemm_n: 369.0740 GFlops result for dgemm_t: 351.4620 GFlops result for dgemm_n_pcie: 321.3210 GFlops result for dgemm_t_pcie: 307.8890 GFlops Running benchmark MD result for md_sp_flops: 425.4410 GFLOPS result for md_sp_bw: 326.0450 GB/s result for md_sp_flops_pcie: 24.0118 GFLOPS result for md_sp_bw_pcie: 18.4019 GB/s result for md_dp_flops: 154.2430 GFLOPS result for md_dp_bw: 207.0340 GB/s result for md_dp_flops_pcie: 19.6782 GFLOPS result for md_dp_bw_pcie: 26.4133 GB/s Running benchmark MD5Hash result for md5hash: 19.6332 GHash/s Running benchmark NeuralNet result for nn_learning: BenchmarkError result for nn_learning_pcie: BenchmarkError Running benchmark Reduction result for reduction: 335.8420 GB/s result for reduction_pcie: 11.5124 GB/s result for reduction_dp: 381.3280 GB/s result for reduction_dp_pcie: 11.5718 GB/s Running benchmark Scan result for scan: 116.0220 GB/s result for scan_pcie: 5.9663 GB/s result for scan_dp: 109.5280 GB/s result for scan_dp_pcie: 5.9506 GB/s Running benchmark Sort result for sort: 10.1044 GB/s result for sort_pcie: 3.8781 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 12.1854 Gflop/s result for spmv_csr_scalar_sp_pcie: 2.3314 Gflop/s result for spmv_csr_scalar_dp: 8.9481 Gflop/s result for spmv_csr_scalar_dp_pcie: 1.5953 Gflop/s result for spmv_csr_scalar_pad_sp: 14.9000 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 2.4336 Gflop/s result for spmv_csr_scalar_pad_dp: 11.6320 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 1.6721 Gflop/s result for spmv_csr_vector_sp: 60.1334 Gflop/s result for spmv_csr_vector_sp_pcie: 2.7507 Gflop/s result for spmv_csr_vector_dp: 47.0609 Gflop/s result for spmv_csr_vector_dp_pcie: 1.8643 Gflop/s result for spmv_csr_vector_pad_sp: 64.7440 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 2.7827 Gflop/s result for spmv_csr_vector_pad_dp: 51.2351 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 1.8808 Gflop/s result for spmv_ellpackr_sp: 53.5020 Gflop/s result for spmv_ellpackr_dp: 
39.3846 Gflop/s Running benchmark Stencil2D result for stencil: 385.8600 GFLOPS result for stencil_dp: 125.3190 GFLOPS Running benchmark Triad result for triad_bw: 15.7121 GB/s Running benchmark S3D result for s3d: 147.2780 GFLOPS result for s3d_pcie: 121.0070 GFLOPS result for s3d_dp: 85.5443 GFLOPS result for s3d_dp_pcie: 68.3721 GFLOPS --- Welcome To The SHOC Benchmark Suite version 1.1.5 --- Hostname: Tesla-K40 Platform selection not specified, default to platform #0 Number of available platforms: 1 Number of available devices on platform 0 'NVIDIA CUDA': 1 Device 0: Tesla K40c Device selection not specified: defaulting to device #0. Using size class: 4 --- Starting Benchmarks --- Running benchmark BusSpeedDownload result for bspeed_download: 5.5866 GB/sec Running benchmark BusSpeedReadback result for bspeed_readback: 6.7162 GB/sec Running benchmark MaxFlops result for maxspflops: 3747.9300 GFLOPS result for maxdpflops: 1406.8400 GFLOPS Running benchmark DeviceMemory result for gmem_readbw: 175.4230 GB/s result for gmem_readbw_strided: 20.6230 GB/s result for gmem_writebw: 189.0680 GB/s result for gmem_writebw_strided: 7.0236 GB/s result for lmem_readbw: 1095.6400 GB/s result for lmem_writebw: 1069.9200 GB/s result for tex_readbw: 208.7580 GB/sec Running benchmark KernelCompile result for ocl_kernel: 0.0001 sec Running benchmark QueueDelay result for ocl_queue: 0.0048 ms Running benchmark BFS result for bfs: 1.8931 GB/s result for bfs_pcie: 1.4860 GB/s result for bfs_teps: 72544700.0000 Edges/s Running benchmark FFT result for fft_sp: 86.1434 GFLOPS result for fft_sp_pcie: 14.2924 GFLOPS result for ifft_sp: 85.2218 GFLOPS result for ifft_sp_pcie: 14.2668 GFLOPS result for fft_dp: 40.5466 GFLOPS result for fft_dp_pcie: 7.0747 GFLOPS result for ifft_dp: 38.8484 GFLOPS result for ifft_dp_pcie: 7.0211 GFLOPS Running benchmark GEMM result for sgemm_n: 709.8790 GFLOPS result for sgemm_t: 804.8770 GFLOPS result for sgemm_n_pcie: 573.3100 GFLOPS result for 
sgemm_t_pcie: 633.7660 GFLOPS result for dgemm_n: 410.0920 GFLOPS result for dgemm_t: 447.2420 GFLOPS result for dgemm_n_pcie: 264.3660 GFLOPS result for dgemm_t_pcie: 279.3100 GFLOPS Running benchmark MD result for md_sp_flops: 90.9587 GFLOPS result for md_sp_bw: 69.7080 GB/s result for md_sp_flops_pcie: 21.2305 GFLOPS result for md_sp_bw_pcie: 16.2704 GB/s result for md_dp_flops: 42.9679 GFLOPS result for md_dp_bw: 57.6742 GB/s result for md_dp_flops_pcie: 16.3030 GFLOPS result for md_dp_bw_pcie: 21.8829 GB/s Running benchmark MD5Hash result for md5hash: 2.7985 GHash/s Skipping non-opencl benchmark NeuralNet Running benchmark Reduction result for reduction: 159.1570 GB/s result for reduction_pcie: 5.3959 GB/s result for reduction_dp: 170.5970 GB/s result for reduction_dp_pcie: 5.4069 GB/s Running benchmark Scan result for scan: 40.5280 GB/s result for scan_pcie: 2.8359 GB/s result for scan_dp: 40.0624 GB/s result for scan_dp_pcie: 2.8334 GB/s Running benchmark Sort result for sort: 0.5453 GB/s result for sort_pcie: 0.4625 GB/s Running benchmark Spmv result for spmv_csr_scalar_sp: 3.0055 Gflop/s result for spmv_csr_scalar_sp_pcie: 0.8270 Gflop/s result for spmv_csr_scalar_dp: 2.2106 Gflop/s result for spmv_csr_scalar_dp_pcie: 0.5771 Gflop/s result for spmv_csr_scalar_pad_sp: 4.0317 Gflop/s result for spmv_csr_scalar_pad_sp_pcie: 0.9009 Gflop/s result for spmv_csr_scalar_pad_dp: 2.9786 Gflop/s result for spmv_csr_scalar_pad_dp_pcie: 0.6117 Gflop/s result for spmv_csr_vector_sp: 17.2290 Gflop/s result for spmv_csr_vector_sp_pcie: 1.0700 Gflop/s result for spmv_csr_vector_dp: 15.3249 Gflop/s result for spmv_csr_vector_dp_pcie: 0.7431 Gflop/s result for spmv_csr_vector_pad_sp: 18.6977 Gflop/s result for spmv_csr_vector_pad_sp_pcie: 1.0924 Gflop/s result for spmv_csr_vector_pad_dp: 16.7128 Gflop/s result for spmv_csr_vector_pad_dp_pcie: 0.7359 Gflop/s result for spmv_ellpackr_sp: 14.8203 Gflop/s result for spmv_ellpackr_dp: 13.7010 Gflop/s Running benchmark Stencil2D 
result for stencil: 163.3910 GFLOPS
result for stencil_dp: 78.4505 GFLOPS
Running benchmark Triad
result for triad_bw: 5.7070 GB/s
Running benchmark S3D
result for s3d: 91.4162 GFLOPS
result for s3d_pcie: 64.5822 GFLOPS
result for s3d_dp: 42.2205 GFLOPS
result for s3d_dp_pcie: 31.6346 GFLOPS

--- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
Hostname: tegra-ubuntu
Platform selection not specified, default to platform #0
Number of available platforms: 1
Number of available devices on platform 0 : 1
Device 0: 'NVIDIA Tegra X1'
Device selection not specified: defaulting to device #0.
Using size class: 4
--- Starting Benchmarks ---
Running benchmark BusSpeedDownload
result for bspeed_download: 10.4479 GB/sec
Running benchmark BusSpeedReadback
result for bspeed_readback: 10.6209 GB/sec
Running benchmark MaxFlops
result for maxspflops: 497.4530 GFLOPS
result for maxdpflops: 15.8070 GFLOPS
Running benchmark DeviceMemory
result for gmem_readbw: 10.4948 GB/s
result for gmem_readbw_strided: 5.7953 GB/s
result for gmem_writebw: 6.7195 GB/s
result for gmem_writebw_strided: 1.7666 GB/s
result for lmem_readbw: 220.0790 GB/s
result for lmem_writebw: 250.2780 GB/s
result for tex_readbw: 58.5162 GB/sec
Skipping non-cuda benchmark KernelCompile
Skipping non-cuda benchmark QueueDelay
Running benchmark BFS
result for bfs: 0.2199 GB/s
result for bfs_pcie: 0.2127 GB/s
result for bfs_teps: 13256500.0000 Edges/s
Running benchmark FFT
result for fft_sp: 57.8030 GFLOPS
result for fft_sp_pcie: 22.1869 GFLOPS
result for ifft_sp: 57.7692 GFLOPS
result for ifft_sp_pcie: 22.2449 GFLOPS
result for fft_dp: 9.2130 GFLOPS
result for fft_dp_pcie: 4.2913 GFLOPS
result for ifft_dp: 8.7599 GFLOPS
result for ifft_dp_pcie: 4.1936 GFLOPS
Running benchmark GEMM
result for sgemm_n: 406.6730 GFlops
result for sgemm_t: 366.5210 GFlops
result for sgemm_n_pcie: 384.1860 GFlops
result for sgemm_t_pcie: 348.1550 GFlops
result for dgemm_n: 14.9555 GFlops
result for dgemm_t: 14.9957 GFlops
result for dgemm_n_pcie: 14.8290 GFlops
result for dgemm_t_pcie: 14.8685 GFlops
Running benchmark MD
result for md_sp_flops: 5.8807 GFLOPS
result for md_sp_bw: 4.5068 GB/s
result for md_sp_flops_pcie: 5.2318 GFLOPS
result for md_sp_bw_pcie: 4.0095 GB/s
result for md_dp_flops: 4.6373 GFLOPS
result for md_dp_bw: 6.2245 GB/s
result for md_dp_flops_pcie: 3.9653 GFLOPS
result for md_dp_bw_pcie: 5.3224 GB/s
Running benchmark MD5Hash
result for md5hash: 0.7579 GHash/s
Running benchmark NeuralNet
result for nn_learning: BenchmarkError
result for nn_learning_pcie: BenchmarkError
Running benchmark Reduction
result for reduction: 21.8378 GB/s
result for reduction_pcie: 6.9811 GB/s
result for reduction_dp: 21.8140 GB/s
result for reduction_dp_pcie: 6.9246 GB/s
Running benchmark Scan
result for scan: 6.6754 GB/s
result for scan_pcie: 2.8640 GB/s
result for scan_dp: 6.5805 GB/s
result for scan_dp_pcie: 2.3433 GB/s
Running benchmark Sort
result for sort: 0.4502 GB/s
result for sort_pcie: 0.4143 GB/s
Running benchmark Spmv
result for spmv_csr_scalar_sp: 0.6642 Gflop/s
result for spmv_csr_scalar_sp_pcie: 0.5079 Gflop/s
result for spmv_csr_scalar_dp: 0.5428 Gflop/s
result for spmv_csr_scalar_dp_pcie: 0.3253 Gflop/s
result for spmv_csr_scalar_pad_sp: 0.7396 Gflop/s
result for spmv_csr_scalar_pad_sp_pcie: 0.4615 Gflop/s
result for spmv_csr_scalar_pad_dp: 0.6514 Gflop/s
result for spmv_csr_scalar_pad_dp_pcie: 0.3661 Gflop/s
result for spmv_csr_vector_sp: 2.8529 Gflop/s
result for spmv_csr_vector_sp_pcie: 1.2287 Gflop/s
result for spmv_csr_vector_dp: 2.0670 Gflop/s
result for spmv_csr_vector_dp_pcie: 0.5817 Gflop/s
result for spmv_csr_vector_pad_sp: 2.9434 Gflop/s
result for spmv_csr_vector_pad_sp_pcie: 0.8659 Gflop/s
result for spmv_csr_vector_pad_dp: 2.2921 Gflop/s
result for spmv_csr_vector_pad_dp_pcie: 0.6125 Gflop/s
result for spmv_ellpackr_sp: 2.9034 Gflop/s
result for spmv_ellpackr_dp: 2.0906 Gflop/s
Running benchmark Stencil2D
result for stencil: 19.0748 GFLOPS
result for stencil_dp: 4.9999 GFLOPS
Running benchmark Triad
result for triad_bw: 6.6636 GB/s
Running benchmark S3D
result for s3d: 7.1068 GFLOPS
result for s3d_pcie: 7.0163 GFLOPS
result for s3d_dp: 4.3247 GFLOPS
result for s3d_dp_pcie: 4.2269 GFLOPS

--- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
Hostname: tegra-ubuntu
Platform selection not specified, default to platform #0
Number of available platforms: 1
Number of available devices on platform 0 : 1
Device 0: 'GM20B'
Device selection not specified: defaulting to device #0.
Using size class: 4
--- Starting Benchmarks ---
Running benchmark BusSpeedDownload
result for bspeed_download: 10.4686 GB/sec
Running benchmark BusSpeedReadback
result for bspeed_readback: 10.6373 GB/sec
Running benchmark MaxFlops
result for maxspflops: 484.5220 GFLOPS
result for maxdpflops: 15.7751 GFLOPS
Running benchmark DeviceMemory
result for gmem_readbw: 15.1397 GB/s
result for gmem_readbw_strided: 6.0576 GB/s
result for gmem_writebw: 12.7893 GB/s
result for gmem_writebw_strided: 1.9888 GB/s
result for lmem_readbw: 215.4760 GB/s
result for lmem_writebw: 247.0810 GB/s
result for tex_readbw: 45.5643 GB/sec
Skipping non-cuda benchmark KernelCompile
Skipping non-cuda benchmark QueueDelay
Running benchmark BFS
result for bfs: 0.2131 GB/s
result for bfs_pcie: 0.1975 GB/s
result for bfs_teps: 12752000.0000 Edges/s
Running benchmark FFT
result for fft_sp: 56.3586 GFLOPS
result for fft_sp_pcie: 5.0751 GFLOPS
result for ifft_sp: 56.4569 GFLOPS
result for ifft_sp_pcie: 5.3521 GFLOPS
result for fft_dp: 9.2100 GFLOPS
result for fft_dp_pcie: 2.1241 GFLOPS
result for ifft_dp: 8.6632 GFLOPS
result for ifft_dp_pcie: 2.2504 GFLOPS
Running benchmark GEMM
result for sgemm_n: 100.9860 GFlops
result for sgemm_t: 93.6681 GFlops
result for sgemm_n_pcie: 95.2578 GFlops
result for sgemm_t_pcie: 88.7196 GFlops
result for dgemm_n: 5.7714 GFlops
result for dgemm_t: 5.8490 GFlops
result for dgemm_n_pcie: 5.6417 GFlops
result for dgemm_t_pcie: 5.7158 GFlops
Running benchmark MD
result for md_sp_flops: 6.0203 GFLOPS
result for md_sp_bw: 4.6137 GB/s
result for md_sp_flops_pcie: 2.7676 GFLOPS
result for md_sp_bw_pcie: 2.1210 GB/s
result for md_dp_flops: 4.6492 GFLOPS
result for md_dp_bw: 6.2405 GB/s
result for md_dp_flops_pcie: 2.4314 GFLOPS
result for md_dp_bw_pcie: 3.2636 GB/s
Running benchmark MD5Hash
result for md5hash: 0.5145 GHash/s
Running benchmark NeuralNet
result for nn_learning: BenchmarkError
result for nn_learning_pcie: BenchmarkError
Running benchmark Reduction
result for reduction: 16.3416 GB/s
result for reduction_pcie: 6.2787 GB/s
result for reduction_dp: 21.7324 GB/s
result for reduction_dp_pcie: 6.9257 GB/s
Running benchmark Scan
result for scan: 6.5806 GB/s
result for scan_pcie: 1.7787 GB/s
result for scan_dp: 6.1591 GB/s
result for scan_dp_pcie: 2.2345 GB/s
Running benchmark Sort
result for sort: 0.4447 GB/s
result for sort_pcie: 0.4084 GB/s
Running benchmark Spmv
result for spmv_csr_scalar_sp: 0.6716 Gflop/s
result for spmv_csr_scalar_sp_pcie: 0.1879 Gflop/s
result for spmv_csr_scalar_dp: 0.5669 Gflop/s
result for spmv_csr_scalar_dp_pcie: 0.1322 Gflop/s
result for spmv_csr_scalar_pad_sp: 0.7561 Gflop/s
result for spmv_csr_scalar_pad_sp_pcie: 0.5145 Gflop/s
result for spmv_csr_scalar_pad_dp: 0.6641 Gflop/s
result for spmv_csr_scalar_pad_dp_pcie: 0.4186 Gflop/s
result for spmv_csr_vector_sp: 2.9135 Gflop/s
result for spmv_csr_vector_sp_pcie: 0.2397 Gflop/s
result for spmv_csr_vector_dp: 2.2214 Gflop/s
result for spmv_csr_vector_dp_pcie: 0.1599 Gflop/s
result for spmv_csr_vector_pad_sp: 2.9742 Gflop/s
result for spmv_csr_vector_pad_sp_pcie: 1.0407 Gflop/s
result for spmv_csr_vector_pad_dp: 2.3583 Gflop/s
result for spmv_csr_vector_pad_dp_pcie: 0.7621 Gflop/s
result for spmv_ellpackr_sp: 2.6979 Gflop/s
result for spmv_ellpackr_dp: 2.0867 Gflop/s
Running benchmark Stencil2D
result for stencil: 0.3184 GFLOPS
result for stencil_dp: 0.1090 GFLOPS
Running benchmark Triad
result for triad_bw: 6.6658 GB/s
Running benchmark S3D
result for s3d: 7.2195 GFLOPS
result for s3d_pcie: 6.7368 GFLOPS
result for s3d_dp: 4.2222 GFLOPS
result for s3d_dp_pcie: 4.1375 GFLOPS

--- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
Hostname: tegra-ubuntu
Platform selection not specified, default to platform #0
Number of available platforms: 1
Number of available devices on platform 0 : 1
Device 0: 'GP10B'
Device selection not specified: defaulting to device #0.
Using size class: 4
--- Starting Benchmarks ---
Running benchmark BusSpeedDownload
result for bspeed_download: 22.1032 GB/sec
Running benchmark BusSpeedReadback
result for bspeed_readback: 22.2968 GB/sec
Running benchmark MaxFlops
result for maxspflops: 654.8800 GFLOPS
result for maxdpflops: 20.8066 GFLOPS
Running benchmark DeviceMemory
result for gmem_readbw: 28.1349 GB/s
result for gmem_readbw_strided: 6.6103 GB/s
result for gmem_writebw: 24.1361 GB/s
result for gmem_writebw_strided: 2.9126 GB/s
result for lmem_readbw: 292.7010 GB/s
result for lmem_writebw: 330.6710 GB/s
result for tex_readbw: 77.9776 GB/sec
Skipping non-cuda benchmark KernelCompile
Skipping non-cuda benchmark QueueDelay
Running benchmark BFS
result for bfs: BenchmarkError
result for bfs_pcie: BenchmarkError
result for bfs_teps: BenchmarkError
Running benchmark FFT
result for fft_sp: 90.8676 GFLOPS
result for fft_sp_pcie: 34.8526 GFLOPS
result for ifft_sp: 90.7311 GFLOPS
result for ifft_sp_pcie: 35.0640 GFLOPS
result for fft_dp: 11.9958 GFLOPS
result for fft_dp_pcie: 8.6390 GFLOPS
result for ifft_dp: 11.4102 GFLOPS
result for ifft_dp_pcie: 8.3428 GFLOPS
Running benchmark GEMM
result for sgemm_n: 587.0820 GFlops
result for sgemm_t: 587.0280 GFlops
result for sgemm_n_pcie: 562.7050 GFlops
result for sgemm_t_pcie: 562.6560 GFlops
result for dgemm_n: 19.6912 GFlops
result for dgemm_t: 19.7272 GFlops
result for dgemm_n_pcie: 19.5951 GFlops
result for dgemm_t_pcie: 19.6308 GFlops
Running benchmark MD
result for md_sp_flops: 11.0284 GFLOPS
result for md_sp_bw: 8.4518 GB/s
result for md_sp_flops_pcie: 9.8361 GFLOPS
result for md_sp_bw_pcie: 7.5381 GB/s
result for md_dp_flops: 5.7878 GFLOPS
result for md_dp_bw: 7.7688 GB/s
result for md_dp_flops_pcie: 5.4326 GFLOPS
result for md_dp_bw_pcie: 7.2920 GB/s
Running benchmark MD5Hash
result for md5hash: 1.0156 GHash/s
Running benchmark NeuralNet
result for nn_learning: BenchmarkError
result for nn_learning_pcie: BenchmarkError
Running benchmark Reduction
result for reduction: 41.8546 GB/s
result for reduction_pcie: 13.4739 GB/s
result for reduction_dp: 48.6725 GB/s
result for reduction_dp_pcie: 14.2264 GB/s
Running benchmark Scan
result for scan: 13.7223 GB/s
result for scan_pcie: 5.7664 GB/s
result for scan_dp: 9.2918 GB/s
result for scan_dp_pcie: 4.8140 GB/s
Running benchmark Sort
result for sort: 0.7021 GB/s
result for sort_pcie: 0.6562 GB/s
Running benchmark Spmv
result for spmv_csr_scalar_sp: 0.7609 Gflop/s
result for spmv_csr_scalar_sp_pcie: 0.6443 Gflop/s
result for spmv_csr_scalar_dp: 0.5973 Gflop/s
result for spmv_csr_scalar_dp_pcie: 0.4992 Gflop/s
result for spmv_csr_scalar_pad_sp: 0.8855 Gflop/s
result for spmv_csr_scalar_pad_sp_pcie: 0.7205 Gflop/s
result for spmv_csr_scalar_pad_dp: 0.7182 Gflop/s
result for spmv_csr_scalar_pad_dp_pcie: 0.5689 Gflop/s
result for spmv_csr_vector_sp: 2.9958 Gflop/s
result for spmv_csr_vector_sp_pcie: 1.7493 Gflop/s
result for spmv_csr_vector_dp: 2.4578 Gflop/s
result for spmv_csr_vector_dp_pcie: 1.3589 Gflop/s
result for spmv_csr_vector_pad_sp: 3.1523 Gflop/s
result for spmv_csr_vector_pad_sp_pcie: 1.7347 Gflop/s
result for spmv_csr_vector_pad_dp: 2.5794 Gflop/s
result for spmv_csr_vector_pad_dp_pcie: 1.3297 Gflop/s
result for spmv_ellpackr_sp: 3.7574 Gflop/s
result for spmv_ellpackr_dp: 2.5914 Gflop/s
Running benchmark Stencil2D
result for stencil: BenchmarkError
result for stencil_dp: BenchmarkError
Running benchmark Triad
result for triad_bw: 12.9691 GB/s
Running benchmark S3D
result for s3d: 11.9130 GFLOPS
result for s3d_pcie: 11.7802 GFLOPS
result for s3d_dp: 7.2267 GFLOPS
result for s3d_dp_pcie: 7.1265 GFLOPS

--- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
Hostname: tegra-ubuntu
Platform selection not specified, default to platform #0
Number of available platforms: 1
Number of available devices on platform 0 'Portable Computing Language': 2
Device 0: pthread-cortex-a57
Device 1: NVIDIA Tegra X2
Specified 1 device IDs: 1
Using size class: 4
--- Starting Benchmarks ---
Running benchmark BusSpeedDownload
result for bspeed_download: 28.8800 GB/sec
Running benchmark BusSpeedReadback
result for bspeed_readback: 58.9883 GB/sec
Running benchmark MaxFlops
result for maxspflops: 658.6600 GFLOPS
result for maxdpflops: 20.7915 GFLOPS
Running benchmark DeviceMemory
result for gmem_readbw: 40.4261 GB/s
result for gmem_readbw_strided: 6.7821 GB/s
result for gmem_writebw: 28.9945 GB/s
result for gmem_writebw_strided: 2.8652 GB/s
result for lmem_readbw: 290.8710 GB/s
result for lmem_writebw: 330.3380 GB/s
result for tex_readbw: NoResult
Running benchmark KernelCompile
result for ocl_kernel: 0.0867 sec
Running benchmark QueueDelay
result for ocl_queue: 0.0000 ms
Running benchmark BFS
result for bfs: 0.4735 GB/s
result for bfs_pcie: 0.4647 GB/s
result for bfs_teps: 17643400.0000 Edges/s
Running benchmark FFT
result for fft_sp: 35.2717 GFLOPS
result for fft_sp_pcie: 21.8458 GFLOPS
result for ifft_sp: 34.3922 GFLOPS
result for ifft_sp_pcie: 21.5052 GFLOPS
result for fft_dp: 4.7247 GFLOPS
result for fft_dp_pcie: 4.0596 GFLOPS
result for ifft_dp: 4.6318 GFLOPS
result for ifft_dp_pcie: 3.9907 GFLOPS
Running benchmark GEMM
result for sgemm_n: 46.1983 GFLOPS
result for sgemm_t: 107.5090 GFLOPS
result for sgemm_n_pcie: 45.9928 GFLOPS
result for sgemm_t_pcie: 106.4030 GFLOPS
result for dgemm_n: 17.4032 GFLOPS
result for dgemm_t: 17.3438 GFLOPS
result for dgemm_n_pcie: 17.2843 GFLOPS
result for dgemm_t_pcie: 17.2269 GFLOPS
Running benchmark MD
result for md_sp_flops: 6.2788 GFLOPS
result for md_sp_bw: 4.8119 GB/s
result for md_sp_flops_pcie: 5.9031 GFLOPS
result for md_sp_bw_pcie: 4.5240 GB/s
result for md_dp_flops: 5.2738 GFLOPS
result for md_dp_bw: 7.0788 GB/s
result for md_dp_flops_pcie: 4.9928 GFLOPS
result for md_dp_bw_pcie: 6.7016 GB/s
Running benchmark MD5Hash
result for md5hash: 1.0240 GHash/s
Skipping non-opencl benchmark NeuralNet
Running benchmark Reduction
result for reduction: 38.3667 GB/s
result for reduction_pcie: 13.3537 GB/s
result for reduction_dp: 48.3866 GB/s
result for reduction_dp_pcie: 14.5069 GB/s
Running benchmark Scan
result for scan: 11.0291 GB/s
result for scan_pcie: 5.4109 GB/s
result for scan_dp: 8.8093 GB/s
result for scan_dp_pcie: 4.8355 GB/s
Running benchmark Sort
result for sort: 0.1421 GB/s
result for sort_pcie: 0.1402 GB/s
Running benchmark Spmv
result for spmv_csr_scalar_sp: 0.9040 Gflop/s
result for spmv_csr_scalar_sp_pcie: 0.3921 Gflop/s
result for spmv_csr_scalar_dp: 0.6744 Gflop/s
result for spmv_csr_scalar_dp_pcie: 0.2734 Gflop/s
result for spmv_csr_scalar_pad_sp: 1.0416 Gflop/s
result for spmv_csr_scalar_pad_sp_pcie: 0.4221 Gflop/s
result for spmv_csr_scalar_pad_dp: 0.7421 Gflop/s
result for spmv_csr_scalar_pad_dp_pcie: 0.2846 Gflop/s
result for spmv_csr_vector_sp: 3.0800 Gflop/s
result for spmv_csr_vector_sp_pcie: 0.5654 Gflop/s
result for spmv_csr_vector_dp: 2.3304 Gflop/s
result for spmv_csr_vector_dp_pcie: 0.3840 Gflop/s
result for spmv_csr_vector_pad_sp: 3.2543 Gflop/s
result for spmv_csr_vector_pad_sp_pcie: 0.5827 Gflop/s
result for spmv_csr_vector_pad_dp: 2.4832 Gflop/s
result for spmv_csr_vector_pad_dp_pcie: 0.3893 Gflop/s
result for spmv_ellpackr_sp: 4.2795 Gflop/s
result for spmv_ellpackr_dp: 2.6662 Gflop/s
Running benchmark Stencil2D
result for stencil: 25.7711 GFLOPS
result for stencil_dp: 7.0058 GFLOPS
Running benchmark Triad
result for triad_bw: 13.9280 GB/s
Running benchmark S3D
result for s3d: 10.8527 GFLOPS
result for s3d_pcie: 9.9890 GFLOPS
result for s3d_dp: 5.1976 GFLOPS
result for s3d_dp_pcie: 4.7372 GFLOPS

--- Welcome To The SHOC Benchmark Suite version 1.1.5 ---
Hostname: jetson-0423018055044
Platform selection not specified, default to platform #0
Number of available platforms: 1
Number of available devices on platform 0 : 1
Device 0: 'Xavier'
Device selection not specified: defaulting to device #0.
Using size class: 4
--- Starting Benchmarks ---
Running benchmark BusSpeedDownload
result for bspeed_download: 25.3878 GB/sec
Running benchmark BusSpeedReadback
result for bspeed_readback: 25.4305 GB/sec
Running benchmark MaxFlops
result for maxspflops: 921.0190 GFLOPS
result for maxdpflops: 28.8697 GFLOPS
Running benchmark DeviceMemory
result for gmem_readbw: 84.8254 GB/s
result for gmem_readbw_strided: 27.9703 GB/s
result for gmem_writebw: 80.8378 GB/s
result for gmem_writebw_strided: 6.3197 GB/s
result for lmem_readbw: 705.8740 GB/s
result for lmem_writebw: 792.5250 GB/s
result for tex_readbw: 299.6360 GB/sec
Skipping non-cuda benchmark KernelCompile
Skipping non-cuda benchmark QueueDelay
Running benchmark BFS
result for bfs: 0.7755 GB/s
result for bfs_pcie: 0.7267 GB/s
result for bfs_teps: 43635600.0000 Edges/s
Running benchmark FFT
result for fft_sp: 155.2300 GFLOPS
result for fft_sp_pcie: 17.8633 GFLOPS
result for ifft_sp: 155.2330 GFLOPS
result for ifft_sp_pcie: 17.8708 GFLOPS
result for fft_dp: 14.9378 GFLOPS
result for fft_dp_pcie: 5.6325 GFLOPS
result for ifft_dp: 14.1692 GFLOPS
result for ifft_dp_pcie: 5.7866 GFLOPS
Running benchmark GEMM
result for sgemm_n: 917.6240 GFlops
result for sgemm_t: 919.9560 GFlops
result for sgemm_n_pcie: 734.2130 GFlops
result for sgemm_t_pcie: 735.7060 GFlops
result for dgemm_n: 28.9232 GFlops
result for dgemm_t: 28.9233 GFlops
result for dgemm_n_pcie: 27.8288 GFlops
result for dgemm_t_pcie: 27.8288 GFlops
Running benchmark MD
result for md_sp_flops: 29.0007 GFLOPS
result for md_sp_bw: 22.2252 GB/s
result for md_sp_flops_pcie: 9.7580 GFLOPS
result for md_sp_bw_pcie: 7.4782 GB/s
result for md_dp_flops: 12.8433 GFLOPS
result for md_dp_bw: 17.2391 GB/s
result for md_dp_flops_pcie: 6.7017 GFLOPS
result for md_dp_bw_pcie: 8.9954 GB/s
Running benchmark MD5Hash
result for md5hash: 2.0281 GHash/s
Running benchmark NeuralNet
result for nn_learning: BenchmarkError
result for nn_learning_pcie: BenchmarkError
Running benchmark Reduction
result for reduction: 81.2065 GB/s
result for reduction_pcie: 18.0615 GB/s
result for reduction_dp: 79.1035 GB/s
result for reduction_dp_pcie: 16.7297 GB/s
Running benchmark Scan
result for scan: 27.6401 GB/s
result for scan_pcie: 6.4075 GB/s
result for scan_dp: 18.0155 GB/s
result for scan_dp_pcie: 5.6062 GB/s
Running benchmark Sort
result for sort: 1.5446 GB/s
result for sort_pcie: 1.3563 GB/s
Running benchmark Spmv
result for spmv_csr_scalar_sp: 5.6995 Gflop/s
result for spmv_csr_scalar_sp_pcie: 0.6537 Gflop/s
result for spmv_csr_scalar_dp: 2.5858 Gflop/s
result for spmv_csr_scalar_dp_pcie: 0.3971 Gflop/s
result for spmv_csr_scalar_pad_sp: 6.7761 Gflop/s
result for spmv_csr_scalar_pad_sp_pcie: 2.6221 Gflop/s
result for spmv_csr_scalar_pad_dp: 2.8235 Gflop/s
result for spmv_csr_scalar_pad_dp_pcie: 1.5030 Gflop/s
result for spmv_csr_vector_sp: 16.2701 Gflop/s
result for spmv_csr_vector_sp_pcie: 0.7063 Gflop/s
result for spmv_csr_vector_dp: 5.9269 Gflop/s
result for spmv_csr_vector_dp_pcie: 0.4348 Gflop/s
result for spmv_csr_vector_pad_sp: 17.0852 Gflop/s
result for spmv_csr_vector_pad_sp_pcie: 3.4203 Gflop/s
result for spmv_csr_vector_pad_dp: 6.6241 Gflop/s
result for spmv_csr_vector_pad_dp_pcie: 2.1649 Gflop/s
result for spmv_ellpackr_sp: 11.2317 Gflop/s
result for spmv_ellpackr_dp: 5.9354 Gflop/s
Running benchmark Stencil2D
result for stencil: 67.8865 GFLOPS
result for stencil_dp: 14.7572 GFLOPS
Running benchmark Triad
result for triad_bw: 5.0897 GB/s
Running benchmark S3D
result for s3d: 18.6842 GFLOPS
result for s3d_pcie: 17.3286 GFLOPS
result for s3d_dp: 10.2904 GFLOPS
result for s3d_dp_pcie: 10.1299 GFLOPS