一、参数解释

合适的HPL.dat参数设置才能够正常运行以及达到较好的性能。

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
80000       Ns
1            # of NBs
1024         NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1           # of process grids (P x Q)
1           Ps
1           Qs
16.0         threshold
1            # of panel fact
1        PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4          NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
2            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

1、第1、2行为注释说明行,不需要作修改

2、第3行说明如果输出文件的话,文件的名字

3、第4行说明输出结果文件的形式,为“6”时,测试结果输出至标准输出(stdout),为“7”时,测试结果输出至标准错误输出(stderr),为其它值时,测试结果输出至第3行所指定的文件中

4、第5行说明求解问题(矩阵)的个数,也就是第6行要设置的参数的个数

5、第6行要设置矩阵的阶,参数值要与第5行的数值相等。网上大多数都说N的值为N×N×8=系统总内存×80%最优

6、第7行说明求解问题(矩阵)时采用的分块方式的种数,也就是第8行要设置的参数的个数

7、第8行说明每一种分块的大小。为提高数据的局部性,从而提高整体性能,HPL采用分块矩阵的算法。NB值的选择主要是通过实际测试得到最优值。

8、第9行是选择处理器阵列是按列的排列方式还是按行的排列方式。

9、第10-12行说明二维处理器网格(P×Q)。二维处理器网格(P×Q)的要遵循以下几个要求:P×Q=进程数。这是HPL的硬性规定。

10、其他值采取默认即可。

二、单个节点上执行

命令:

./xhpl

结果:

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   80000 
NB     :    1024 
PMAP   : Row-major process mapping
P      :       1 
Q      :       1 
PFACT  :   Crout 
NBMIN  :       4 
NDIV   :       2 
RFACT  :   Crout 
BCAST  :   1ring 
DEPTH  :       2 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR20C2C4       80000  1024     1     1             729.77             4.6774e+02
HPL_pdgesv() start time Fri Jul 17 09:24:43 2020

HPL_pdgesv() end time   Fri Jul 17 09:36:53 2020

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.16389188e-03 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

三、多个节点执行

命令:

第一种方式:
mpirun -np N xhpl  N为进程数


第二种方式:
mpirun -p4pg <p4file> xhpl  需要自己编写配置文件,p4file指定每个进程在哪个节点运行

 

GitHub 加速计划 / li / linux-dash
6
1
下载
A beautiful web dashboard for Linux
最近提交(Master分支:3 个月前 )
186a802e added ecosystem file for PM2 4 年前
5def40a3 Add host customization support for the NodeJS version 4 年前
Logo

旨在为数千万中国开发者提供一个无缝且高效的云端环境,以支持学习、使用和贡献开源项目。

更多推荐