0 Introduction to HPCG
HPCG (High Performance Conjugate Gradients) is a benchmark for evaluating high-performance computing systems. It measures a supercomputer's real-world performance on sparse-matrix, memory-access-intensive workloads, which makes it more representative of many scientific and engineering computing scenarios than the traditional HPL (LINPACK) benchmark.
HPL (LINPACK) emphasizes raw floating-point throughput (FLOPS) and reflects a processor's peak compute capability, but it is biased toward compute-bound workloads.
HPCG instead stresses:
- Sparse matrix access
- Memory bandwidth
- Cache efficiency
- Communication latency
As a result, a system's HPCG score is typically only about 0.3%–4% of its HPL score, which is much closer to real HPC application performance.
HPCG implements the Preconditioned Conjugate Gradient (PCG) method, iteratively solving a discretized three-dimensional Poisson problem (see the sketch after this list). Its kernels are:
- Sparse matrix-vector multiplication (SpMV)
- Vector updates (AXPY/WAXPBY)
- Dot products
- Global communication (MPI_Allreduce)
- A multigrid preconditioner
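For reference, one iteration of the textbook PCG method maps directly onto these kernels (here M is the multigrid preconditioner, and each dot product implies an MPI_Allreduce):

\begin{aligned}
\alpha_k &= \frac{r_k^{T} z_k}{p_k^{T} A p_k} &&\text{(dot products; } A p_k \text{ is an SpMV)} \\
x_{k+1} &= x_k + \alpha_k p_k &&\text{(vector update)} \\
r_{k+1} &= r_k - \alpha_k A p_k &&\text{(vector update)} \\
z_{k+1} &= M^{-1} r_{k+1} &&\text{(multigrid preconditioner)} \\
\beta_k &= \frac{r_{k+1}^{T} z_{k+1}}{r_k^{T} z_k}, \qquad p_{k+1} = z_{k+1} + \beta_k p_k
\end{aligned}

with initialization r_0 = b - A x_0, z_0 = M^{-1} r_0, p_0 = z_0.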
1 Running the Standard Source Code
- Install dependencies, clone the code, and copy the build configuration file
# dnf install -y gcc gcc-c++ make cmake openmpi openmpi-devel
# git clone https://github.com/hpcg-benchmark/hpcg.git
# cd hpcg/setup
# cp Make.Linux_MPI Make.kunpeng
- Edit the build configuration file
# vim Make.kunpeng
#HEADER
# -- High Performance Conjugate Gradient Benchmark (HPCG)
# HPCG - 3.1 - March 28, 2019
# Michael A. Heroux
# Scalable Algorithms Group, Computing Research Division
# Sandia National Laboratories, Albuquerque, NM
#
# Piotr Luszczek
# Jack Dongarra
# University of Tennessee, Knoxville
# Innovative Computing Laboratory
#
# (C) Copyright 2013-2019 All Rights Reserved
#
#
# -- Copyright notice and Licensing terms:
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#
# 1. Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions, and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# 3. All advertising materials mentioning features or use of this
# software must display the following acknowledgement:
# This product includes software developed at Sandia National
# Laboratories, Albuquerque, NM and the University of
# Tennessee, Knoxville, Innovative Computing Laboratory.
#
# 4. The name of the University, the name of the Laboratory, or the
# names of its contributors may not be used to endorse or promote
# products derived from this software without specific written
# permission.
#
# -- Disclaimer:
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
# ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
# OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
# DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#@HEADER
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s -f
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - HPCG Directory Structure / HPCG library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir = .
SRCdir = $(TOPdir)/src
INCdir = $(TOPdir)/src
BINdir = $(TOPdir)/bin
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir = /usr/lib64/openmpi
MPinc = -I$(MPdir)/include
MPlib = -L$(MPdir)/lib -lmpi
#
#
# ----------------------------------------------------------------------
# - HPCG includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPCG_INCLUDES = -I$(INCdir) -I$(INCdir)/$(arch) $(MPinc)
HPCG_LIBS =
#
# - Compile time options -----------------------------------------------
#
# -DHPCG_NO_MPI Define to disable MPI
# -DHPCG_NO_OPENMP Define to disable OPENMP
# -DHPCG_CONTIGUOUS_ARRAYS Define to have sparse matrix arrays long and contiguous
# -DHPCG_DEBUG Define to enable debugging output
# -DHPCG_DETAILED_DEBUG Define to enable very detailed debugging output
#
# By default HPCG will:
# *) Build with MPI enabled.
# *) Build with OpenMP enabled.
# *) Not generate debugging output.
#
HPCG_OPTS = -DHPCG_NO_OPENMP
#
# ----------------------------------------------------------------------
#
HPCG_DEFS = $(HPCG_OPTS) $(HPCG_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CXX = mpicxx
#CXXFLAGS = $(HPCG_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
CXXFLAGS = -O3 -march=armv8-a
#
LINKER = $(CXX)
LINKFLAGS = $(CXXFLAGS)
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
USE_CUDA = 0
#
# ----------------------------------------------------------------------
#
Note that this configuration disables CUDA (USE_CUDA = 0), and HPCG_OPTS also disables OpenMP via -DHPCG_NO_OPENMP, so the build is pure MPI.
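If you want a hybrid MPI + OpenMP build instead, a minimal sketch of the changes to Make.kunpeng (assuming GCC, whose OpenMP flag is -fopenmp) would be:

# leave HPCG_OPTS empty so the default (OpenMP enabled) applies
HPCG_OPTS =
CXXFLAGS = -O3 -march=armv8-a -fopenmp
LINKFLAGS = $(CXXFLAGS)

You would then launch fewer MPI ranks and set OMP_NUM_THREADS so that ranks × threads matches the core count.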
- Build
# cd ..
# make arch=kunpeng
# cd bin
- Prepare the run configuration file
The default configuration file (hpcg.dat):
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
104 104 104
60
Our configuration file (hpcg.dat):
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
128 128 128
300
In hpcg.dat, the first two lines are arbitrary header text, the third line gives the local problem dimensions nx ny nz for each MPI process, and the last line is the target run time in seconds. A run time of at least 1800 s is recommended (and is required for official results); runs under 300 s may give inaccurate numbers.
- Run:
# mpirun --allow-run-as-root --mca pml ob1 -np 64 ./xhpcg
# tail HPCG-Benchmark_3.1_2025-07-08_10-21-09.txt
DDOT Timing Variations::Avg DDOT MPI_Allreduce time=2.58093
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=16.1137
Final Summary::HPCG 2.4 rating for historical reasons is=16.1678
Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
Final Summary::Results are valid but execution time (sec) is=310.259
Final Summary::Official results execution time (sec) must be at least=1800
# cat hpcg20250708T100940.txt
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call [0] Number of Iterations [11] Scaled Residual [1.12102e-13]
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call [1] Number of Iterations [11] Scaled Residual [1.12102e-13]
Call [0] Number of Iterations [2] Scaled Residual [2.79999e-17]
Call [1] Number of Iterations [2] Scaled Residual [2.79999e-17]
Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 7.31869e-10
Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 5.92074e-11
SpMV call [0] Residual [0]
SpMV call [1] Residual [0]
Call [0] Scaled Residual [0.00454823]
Call [1] Scaled Residual [0.00454823]
The key parameters and results are as follows:
This machine's HPCG result: 16.1137 GFLOP/s.
--allow-run-as-root: explicitly tells mpirun to allow running the program as the root user.
By default, many MPI implementations (Open MPI in particular) refuse to let root launch parallel applications, since doing so can be a security risk, especially on shared clusters. If you run mpirun directly as root, the launch fails with a permission-related error unless this flag is given; the flag bypasses that safety check.
In non-production or test environments this flag is acceptable when root really is needed, but on production systems or shared multi-user clusters you should not run compute jobs as root; use a regular user account instead.
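A minimal sketch of the regular-user alternative (the user name and install path here are hypothetical):

# useradd -m hpcguser
# chown -R hpcguser:hpcguser /opt/hpcg
# su - hpcguser -c "cd /opt/hpcg/bin && mpirun --mca pml ob1 -np 64 ./xhpcg"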
--mca pml ob1: an MCA (Modular Component Architecture) option that selects the implementation of the point-to-point messaging layer (PML). ob1 is a commonly used PML component in Open MPI.
Open MPI has a highly modular architecture that lets users choose different components for different functions (point-to-point communication, collectives, process management, and so on). ob1 is the default and most widely used PML component; it communicates over various network interfaces (InfiniBand, Ethernet, etc.).
Explicitly specifying ob1 is usually done to guarantee a particular communication mechanism or to work around compatibility or performance issues with the default PML. In most cases mpirun would pick ob1 anyway, but naming it explicitly makes the behavior deterministic.
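You can list the PML components available in your Open MPI build with ompi_info (the exact output depends on the build):

# ompi_info | grep -i "MCA pml"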
-np 64: the number of MPI processes (ranks) to launch.
An MPI program parallelizes by distributing work across processes; each process has a unique ID (rank) from 0 to np-1.
64 means xhpcg runs as 64 parallel processes. These can be spread across multiple compute nodes or all placed on a single node, depending on your hosts file and the other scheduling options passed to mpirun.
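As an illustration, a sketch of a two-node launch with explicit core binding (hosts.txt and the slot counts are assumptions):

# cat hosts.txt
node1 slots=64
node2 slots=64
# mpirun --allow-run-as-root --mca pml ob1 --hostfile hosts.txt \
    --map-by core --bind-to core -np 128 ./xhpcg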
References
- https://mirrors.huaweicloud.com/kunpeng/archive/HPC/benchmark/
- https://developer.nvidia.com/nvidia-hpc-benchmarks-downloads?target_os=Linux&target_arch=x86_64
- https://github.com/davidrohr/hpl-gpu
- https://catalog.ngc.nvidia.com/orgs/nvidia/containers/hpc-benchmarks
- https://github.com/NVIDIA/nvidia-hpcg
- https://www.amd.com/en/developer/zen-software-studio/applications/pre-built-applications/zen-hpl.html
- https://www.netlib.org/benchmark/hpl/
2 Running with the Phoronix Test Suite
For installing the Phoronix Test Suite, see: https://www.cnblogs.com/testing-/p/18303322
- Install hpcg
# phoronix-test-suite install hpcg
# cd /var/lib/phoronix-test-suite/test-profiles/pts/hpcg-1.3.0
- Edit the configuration file
Contents of test-definition.xml:
<?xml version="1.0"?>
<!--Phoronix Test Suite v10.8.4-->
<PhoronixTestSuite>
<TestInformation>
<Title>High Performance Conjugate Gradient</Title>
<AppVersion>3.1</AppVersion>
<Description>HPCG is the High Performance Conjugate Gradient and is a new scientific benchmark from Sandia National Labs focused on super-computer testing with modern real-world workloads compared to HPCC.</Description>
<ResultScale>GFLOP/s</ResultScale>
<Proportion>HIB</Proportion>
<TimesToRun>1</TimesToRun>
</TestInformation>
<TestProfile>
<Version>1.3.0</Version>
<SupportedPlatforms>Linux</SupportedPlatforms>
<SoftwareType>Benchmark</SoftwareType>
<TestType>Processor</TestType>
<License>Free</License>
<Status>Verified</Status>
<ExternalDependencies>build-utilities, fortran-compiler, openmpi-development</ExternalDependencies>
<EnvironmentSize>2.4</EnvironmentSize>
<ProjectURL>http://www.hpcg-benchmark.org/</ProjectURL>
<RepositoryURL>https://github.com/hpcg-benchmark/hpcg</RepositoryURL>
<InternalTags>SMP, MPI</InternalTags>
<Maintainer>Michael Larabel</Maintainer>
</TestProfile>
<TestSettings>
<Option>
<DisplayName>X Y Z</DisplayName>
<Identifier>xyz</Identifier>
<Menu>
<Entry>
<Name>104 104 104</Name>
<Value>--nx=104 --ny=104 --nz=104</Value>
</Entry>
<Entry>
<Name>144 144 144</Name>
<Value>--nx=144 --ny=144 --nz=144</Value>
</Entry>
<Entry>
<Name>160 160 160</Name>
<Value>--nx=160 --ny=160 --nz=160</Value>
</Entry>
<Entry>
<Name>192 192 192</Name>
<Value>--nx=192 --ny=192 --nz=192</Value>
</Entry>
</Menu>
</Option>
<Option>
<DisplayName>RT</DisplayName>
<Identifier>time</Identifier>
<ArgumentPrefix>--rt=</ArgumentPrefix>
<Menu>
<Entry>
<Name>300</Name>
<Value>300</Value>
<Message>Shorter run-time</Message>
</Entry>
<Entry>
<Name>1800</Name>
<Value>1800</Value>
<Message>Official run-time</Message>
</Entry>
</Menu>
</Option>
</TestSettings>
</PhoronixTestSuite>
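The two Option identifiers defined above (xyz and time) can be preset from the environment to skip the interactive prompts shown in the run below; a sketch assuming PTS's PRESET_OPTIONS variable matches on the entry names:

# PRESET_OPTIONS="hpcg.xyz=104 104 104; hpcg.time=300" phoronix-test-suite benchmark hpcg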
- Run the test
# phoronix-test-suite benchmark hpcg
Evaluating External Test Dependencies .......................................................................................................................
Phoronix Test Suite v10.8.4
Installed: pts/hpcg-1.3.0
High Performance Conjugate Gradient 3.1:
pts/hpcg-1.3.0
Processor Test Configuration
1: 104 104 104
2: 144 144 144
3: 160 160 160
4: 192 192 192
5: Test All Options
** Multiple items can be selected, delimit by a comma. **
X Y Z: 1
1: 300 [Shorter run-time]
2: 1800 [Official run-time]
3: Test All Options
** Multiple items can be selected, delimit by a comma. **
RT: 1
System Information
PROCESSOR: ARMv8 @ 2.90GHz
Core Count: 128
Cache Size: 224 MB
Scaling Driver: cppc_cpufreq performance
GRAPHICS: Huawei Hi171x [iBMC Intelligent Management chip w/VGA support]
Screen: 1024x768
MOTHERBOARD: WUZHOU BC83AMDAA01-7270Z
BIOS Version: 11.62
Chipset: Huawei HiSilicon
Network: 6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892
MEMORY: 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET
DISK: 2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N
File-System: xfs
Mount Options: attr2 inode64 noquota relatime rw
Disk Scheduler: MQ-DEADLINE
Disk Details: Block Size: 4096
OPERATING SYSTEM: Kylin Linux Advanced Server V10
Kernel: 4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314
Display Server: X Server 1.20.8
Compiler: GCC 7.3.0 + CUDA 12.8
Security: itlb_multihit: Not affected
+ l1tf: Not affected
+ mds: Not affected
+ meltdown: Not affected
+ mmio_stale_data: Not affected
+ spec_store_bypass: Mitigation of SSB disabled via prctl
+ spectre_v1: Mitigation of __user pointer sanitization
+ spectre_v2: Not affected
+ srbds: Not affected
+ tsx_async_abort: Not affected
Would you like to save these test results (Y/n): y
Enter a name for the result file: hpcg_45_31
Enter a unique name to describe this test run / configuration:
If desired, enter a new description below to better describe this result set / system configuration under test.
Press ENTER to proceed without changes.
Current Description: ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] on Kylin Linux Advanced Server V10 via the Phoronix Test Suite.
New Description:
High Performance Conjugate Gradient 3.1:
pts/hpcg-1.3.0 [X Y Z: 104 104 104 - RT: 300]
Test 1 of 1
Estimated Trial Run Count: 1
Estimated Time To Completion: 38 Minutes [03:16 CDT]
Started Run 1 @ 02:39:00
X Y Z: 104 104 104 - RT: 300:
69.8633
Average: 69.8633 GFLOP/s
Do you want to view the text results of the testing (Y/n): Y
hpcg_45_31
ARMv8 testing with a WUZHOU BC83AMDAA01-7270Z (11.62 BIOS) and Huawei Hi171x [iBMC Intelligent Management chip w/VGA support] on Kylin Linux Advanced Server V10 via the Phoronix Test Suite.
ARMv8:
Processor: ARMv8 @ 2.90GHz (128 Cores), Motherboard: WUZHOU BC83AMDAA01-7270Z (11.62 BIOS), Chipset: Huawei HiSilicon, Memory: 16 x 32 GB 4800MT/s Samsung M321R4GA3BB6-CQKET, Disk: 2 x 480GB HWE62ST3480L003N + 3 x 1920GB HWE62ST31T9L005N, Graphics: Huawei Hi171x [iBMC Intelligent Management chip w/VGA support], Network: 6 x Huawei HNS GE/10GE/25GE/50GE + 2 x Mellanox MT2892
OS: Kylin Linux Advanced Server V10, Kernel: 4.19.90-52.22.v2207.ky10.aarch64 (aarch64) 20230314, Display Server: X Server 1.20.8, Compiler: GCC 7.3.0 + CUDA 12.8, File-System: xfs, Screen Resolution: 1024x768
High Performance Conjugate Gradient 3.1
X Y Z: 104 104 104 - RT: 300
GFLOP/s > Higher Is Better
ARMv8 . 69.86 |==================================================================================================================================================
Would you like to upload the results to OpenBenchmarking.org (y/n): y
Would you like to attach the system logs (lspci, dmesg, lsusb, etc) to the test result (y/n): y
Results Uploaded To: https://openbenchmarking.org/result/2507083-NE-HPCG4531360
The test results can then be viewed in a browser at the result URL above.
3 NVIDIA HPC Benchmarks
# docker pull nvcr.io/nvidia/hpc-benchmarks:25.04
# vi HPCG.dat
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
128 128 128
300
# docker run --rm --gpus all --ipc=host --ulimit memlock=-1:-1 \
-v $(pwd):/host_data \
nvcr.io/nvidia/hpc-benchmarks:25.04 \
mpirun -np 1 \
/workspace/hpcg.sh \
--dat /host_data/HPCG.dat \
--cpu-affinity 0 \
--gpu-affinity 0
=========================================================
================= NVIDIA HPC Benchmarks =================
=========================================================
NVIDIA Release 25.04
Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: No InfiniBand devices detected.
Multi-node communication performance may be reduced.
Ensure /dev/infiniband is mounted to this container.
HPCG-NVIDIA 25.4.0 -- NVIDIA accelerated HPCG benchmark -- NVIDIA
Build v0.5.6
Start of application (GPU-Only) ...
Initial Residual = 2838.81
Iteration = 1 Scaled Residual = 0.185703
Iteration = 2 Scaled Residual = 0.101681
...
Iteration = 50 Scaled Residual = 3.94531e-07
GPU Rank Info:
| cuSPARSE version 12.5
| Reference CPU memory = 935.79 MB
| GPU Name: 'NVIDIA GeForce RTX 4090'
| GPU Memory Use: 2223 MB / 24082 MB
| Process Grid: 1x1x1
| Local Domain: 128x128x128
| Number of CPU Threads: 1
| Slice Size: 2048
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call [0] Number of Iterations [11] Scaled Residual [1.19242e-14]
WARNING: PERFORMING UNPRECONDITIONED ITERATIONS
Call [1] Number of Iterations [11] Scaled Residual [1.19242e-14]
Call [0] Number of Iterations [1] Scaled Residual [2.94233e-16]
Call [1] Number of Iterations [1] Scaled Residual [2.94233e-16]
Departure from symmetry (scaled) for SpMV abs(x'*A*y - y'*A*x) = 8.42084e-10
Departure from symmetry (scaled) for MG abs(x'*Minv*y - y'*Minv*x) = 4.21042e-10
SpMV call [0] Residual [0]
SpMV call [1] Residual [0]
Initial Residual = 2838.81
Iteration = 1 Scaled Residual = 0.220178
Iteration = 2 Scaled Residual = 0.118926
...
Iteration = 49 Scaled Residual = 4.98548e-07
Iteration = 50 Scaled Residual = 3.08635e-07
Call [0] Scaled Residual [3.08635e-07]
Call [1] Scaled Residual [3.08635e-07]
Call [2] Scaled Residual [3.08635e-07]
...
Call [1501] Scaled Residual [3.08635e-07]
Call [1502] Scaled Residual [3.08635e-07]
HPCG-Benchmark
version=3.1
Release date=March 28, 2019
Machine Summary=
Machine Summary::Distributed Processes=1
Machine Summary::Threads per processes=1
Global Problem Dimensions=
Global Problem Dimensions::Global nx=128
Global Problem Dimensions::Global ny=128
Global Problem Dimensions::Global nz=128
Processor Dimensions=
Processor Dimensions::npx=1
Processor Dimensions::npy=1
Processor Dimensions::npz=1
Local Domain Dimensions=
Local Domain Dimensions::nx=128
Local Domain Dimensions::ny=128
########## Problem Summary ##########=
Setup Information=
Setup Information::Setup Time=0.00910214
Linear System Information=
Linear System Information::Number of Equations=2097152
Linear System Information::Number of Nonzero Terms=55742968
Multigrid Information=
Multigrid Information::Number of coarse grid levels=3
Multigrid Information::Coarse Grids=
Multigrid Information::Coarse Grids::Grid Level=1
Multigrid Information::Coarse Grids::Number of Equations=262144
Multigrid Information::Coarse Grids::Number of Nonzero Terms=6859000
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=2
Multigrid Information::Coarse Grids::Number of Equations=32768
Multigrid Information::Coarse Grids::Number of Nonzero Terms=830584
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
Multigrid Information::Coarse Grids::Grid Level=3
Multigrid Information::Coarse Grids::Number of Equations=4096
Multigrid Information::Coarse Grids::Number of Nonzero Terms=97336
Multigrid Information::Coarse Grids::Number of Presmoother Steps=1
Multigrid Information::Coarse Grids::Number of Postsmoother Steps=1
########## Memory Use Summary ##########=
Memory Use Information=
Memory Use Information::Total memory used for data (Gbytes)=1.49883
Memory Use Information::Memory used for OptimizeProblem data (Gbytes)=0
Memory Use Information::Bytes per equation (Total memory / Number of Equations)=714.697
Memory Use Information::Memory used for linear system and CG (Gbytes)=1.31912
Memory Use Information::Coarse Grids=
Memory Use Information::Coarse Grids::Grid Level=1
Memory Use Information::Coarse Grids::Memory used=0.15755
Memory Use Information::Coarse Grids::Grid Level=2
Memory Use Information::Coarse Grids::Memory used=0.0196946
Memory Use Information::Coarse Grids::Grid Level=3
Memory Use Information::Coarse Grids::Memory used=0.00246271
########## V&V Testing Summary ##########=
Spectral Convergence Tests=
Spectral Convergence Tests::Result=PASSED
Spectral Convergence Tests::Unpreconditioned=
Spectral Convergence Tests::Unpreconditioned::Maximum iteration count=11
Spectral Convergence Tests::Unpreconditioned::Expected iteration count=12
Spectral Convergence Tests::Preconditioned=
Spectral Convergence Tests::Preconditioned::Maximum iteration count=1
Spectral Convergence Tests::Preconditioned::Expected iteration count=2
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon=
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Result=PASSED
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for SpMV=8.42084e-10
Departure from Symmetry |x'Ay-y'Ax|/(2*||x||*||A||*||y||)/epsilon::Departure for MG=4.21042e-10
########## Iterations Summary ##########=
Iteration Count Information=
Iteration Count Information::Result=PASSED
Iteration Count Information::Reference CG iterations per set=50
Iteration Count Information::Optimized CG iterations per set=50
Iteration Count Information::Total number of reference iterations=75150
Iteration Count Information::Total number of optimized iterations=75150
########## Reproducibility Summary ##########=
Reproducibility Information=
Reproducibility Information::Result=PASSED
Reproducibility Information::Scaled residual mean=3.08635e-07
Reproducibility Information::Scaled residual variance=0
########## Performance Summary (times in sec) ##########=
Benchmark Time Summary=
Benchmark Time Summary::Optimization phase=0.017375
Benchmark Time Summary::DDOT=6.03317
Benchmark Time Summary::WAXPBY=6.80771
Benchmark Time Summary::SpMV=58.5598
Benchmark Time Summary::MG=227.166
Benchmark Time Summary::Total=298.585
Floating Point Operations Summary=
Floating Point Operations Summary::Raw DDOT=9.5191e+11
Floating Point Operations Summary::Raw WAXPBY=9.5191e+11
Floating Point Operations Summary::Raw SpMV=8.54573e+12
Floating Point Operations Summary::Raw MG=4.76988e+13
Floating Point Operations Summary::Total=5.81484e+13
Floating Point Operations Summary::Total with convergence overhead=5.81484e+13
GB/s Summary=
GB/s Summary::Raw Read B/W=1200
GB/s Summary::Raw Write B/W=277.327
GB/s Summary::Raw Total B/W=1477.32
GB/s Summary::Total with convergence and optimization phase overhead=1457.89
GFLOP/s Summary=
GFLOP/s Summary::Raw DDOT=157.779
GFLOP/s Summary::Raw WAXPBY=139.828
GFLOP/s Summary::Raw SpMV=145.932
GFLOP/s Summary::Raw MG=209.974
GFLOP/s Summary::Raw Total=194.747
GFLOP/s Summary::Total with convergence overhead=194.747
GFLOP/s Summary::Total with convergence and optimization phase overhead=192.185
User Optimization Overheads=
User Optimization Overheads::Optimization phase time (sec)=0.017375
User Optimization Overheads::Optimization phase time vs reference SpMV+MG time=0.0396317
DDOT Timing Variations=
DDOT Timing Variations::Min DDOT MPI_Allreduce time=0.220609
DDOT Timing Variations::Max DDOT MPI_Allreduce time=0.220609
DDOT Timing Variations::Avg DDOT MPI_Allreduce time=0.220609
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=192.185
Final Summary::HPCG 2.4 rating for historical reasons is=193.058
Final Summary::Results are valid but execution time (sec) is=298.585
Final Summary::Official results execution time (sec) must be at least=1800
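To use several GPUs on one node, launch one MPI rank per GPU and pass colon-separated affinity lists to hpcg.sh; a sketch assuming four GPUs (adapt the CPU mapping to your node's topology):

# docker run --rm --gpus all --ipc=host --ulimit memlock=-1:-1 \
   -v $(pwd):/host_data \
   nvcr.io/nvidia/hpc-benchmarks:25.04 \
   mpirun -np 4 \
   /workspace/hpcg.sh \
   --dat /host_data/HPCG.dat \
   --cpu-affinity 0:1:2:3 \
   --gpu-affinity 0:1:2:3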