
STREAM

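STREAM is a benchmark for measuring memory bandwidth and is widely used to evaluate server memory performance. Its access pattern has good spatial locality, which makes it friendly to both the TLB (Translation Lookaside Buffer) and the cache. STREAM measures memory bandwidth with four kernels: Copy, Scale, Add (vector sum), and Triad (compound vector sum).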

1. STREAM job run reference

1. Command and output:

[root@login1 STREAM-master]# mpirun -np 4 stream
stream.c        stream.f        stream_mpi.c    stream_mpi.exe
[root@login1 STREAM-master]# mpirun -np 4 stream_mpi.exe
-------------------------------------------------------------
STREAM version $Revision: 1.8 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Total Aggregate Array size = 80000000 (elements)
Total Aggregate Memory per array = 610.4 MiB (= 0.6 GiB).
Total Aggregate memory required = 1831.1 MiB (= 1.8 GiB).
Data is distributed across 4 MPI ranks
   Array size per MPI rank = 20000000 (elements)
   Memory per array per MPI rank = 152.6 MiB (= 0.1 GiB).
   Total memory per MPI rank = 457.8 MiB (= 0.4 GiB).
-------------------------------------------------------------
Each kernel will be executed 20 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
The SCALAR value used for this run is 0.420000
-------------------------------------------------------------
Number of Threads requested for each MPI rank = 1
Number of Threads counted for rank 0 = 1
-------------------------------------------------------------
Your timer granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 34650 microseconds.
   (= 34650 timer ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 timer ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
VERBOSE: total setup time for rank 0 = 0.726401 seconds
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:          32276.7     0.041149     0.039657     0.063049
Scale:         32415.2     0.040358     0.039488     0.046934
Add:           34641.0     0.055867     0.055426     0.057078
Triad:         34526.2     0.056714     0.055610     0.069850
-------------------------------------------------------------
VERBOSE: rank 0, AvgErrors 0.000000e+00 0.000000e+00 0.000000e+00
VERBOSE: rank 1, AvgErrors 0.000000e+00 0.000000e+00 0.000000e+00
VERBOSE: rank 2, AvgErrors 0.000000e+00 0.000000e+00 0.000000e+00
VERBOSE: rank 3, AvgErrors 0.000000e+00 0.000000e+00 0.000000e+00
Solution Validates: avg error less than 1.000000e-13 on all three arrays
Results Validation Verbose Results:
    Expected a(1), b(1), c(1): 2.769001 1.144215 3.868538
    Observed a(1), b(1), c(1): 2.769001 1.144215 3.868538
    Rel Errors on a, b, c:     0.000000e+00 0.000000e+00 0.000000e+00
-------------------------------------------------------------
VERBOSE: total shutdown time for rank 0 = 0.074511 seconds
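
The Best Rate column can be reproduced from the Min time column. The sketch below assumes STREAM's usual accounting: Copy and Scale touch two arrays per element, Add and Triad touch three, and 1 MB is 10^6 bytes. The helper name `stream_rate` is hypothetical and exists only for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: recompute a STREAM "Best Rate" in MB/s.
#   $1 = arrays touched per element (2 for Copy/Scale, 3 for Add/Triad)
#   $2 = total number of array elements
#   $3 = bytes per array element
#   $4 = minimum kernel time in seconds
stream_rate() {
    awk -v words="$1" -v n="$2" -v bytes="$3" -v t="$4" \
        'BEGIN { printf "%.1f\n", words * bytes * n / t / 1e6 }'
}

# Values taken from the sample output above
# (80000000 elements, 8 bytes per element).
stream_rate 2 80000000 8 0.039657   # Copy
stream_rate 3 80000000 8 0.055426   # Add
```

The results should land very close to the reported 32276.7 MB/s and 34641.0 MB/s. Small differences are expected because the Min time is printed with only six decimal places.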

2. Input file

job_name=stream
run_time=24:00:00
partition=dell_intel
node_num=3
task_per_node=32

3. Submission script

#!/bin/bash
source /home/wushiming/stream/stream_input

##check input var
time=`date +%m%d_%H%M%S`

if [ "x$job_name" == "x" ];then
    sbatch_job_name="YHPC_$time"
else
    sbatch_job_name=$job_name
fi

if [ "x$partition" == "x" ];then
    sbatch_partition=""
else
    sbatch_partition=$partition
fi

if [ "x$work_dir" == "x" ];then
    mkdir -p /home/yhpc/YHPC_$time
    sbatch_work_dir=/home/yhpc/YHPC_$time
else
    sbatch_work_dir=$work_dir
fi

if [ "x$run_time" == "x" ];then
    sbatch_run_time=03:00:00
else
    sbatch_run_time=$run_time
fi

sbatch_node_num=$node_num
sbatch_task_per_node=$task_per_node

sbatch_err_log=$sbatch_work_dir/%j.err
sbatch_out_log=$sbatch_work_dir/%j.out

### Write basic job information

#echo -e "The start time is: `date +"%Y-%m-%d %H:%M:%S"` \n"
#echo -e "My job ID is: $SLURM_JOB_ID \n"
#echo -e "The total cores is: $total_cores \n"
#echo -e "The hosts is: \n"
#srun -np $node_num -nnp 1 hostname
cat > $sbatch_work_dir/stream.slurm <<EOF
#!/bin/bash
#SBATCH --ntasks-per-node=$sbatch_task_per_node
#SBATCH --job-name $sbatch_job_name
#SBATCH --nodes=$sbatch_node_num
#SBATCH --mail-type=ALL
#SBATCH --partition $sbatch_partition
#SBATCH --chdir=$sbatch_work_dir
#SBATCH -e $sbatch_err_log
#SBATCH -o $sbatch_out_log

ulimit -s unlimited
ulimit -l unlimited

module purge
source /opt/ohpc/pub/apps/intel/setvars.sh
module load intel/mpi-2021.1.1
module load stream/2016-07-28

export I_MPI_OFI_PROVIDER=Verbs
export FI_VERBS_IFACE=team1.282

echo -e "The start time is: \`date +"%Y-%m-%d %H:%M:%S"\`"
echo -e "My job ID is: \$SLURM_JOB_ID"
echo -e "The total cores is: \$SLURM_NPROCS"
echo -e "The \$SLURM_JOB_ID Job info:"
scontrol show job \$SLURM_JOB_ID

mpirun   -genv I_MPI_FABRICS ofi  stream_mpi.exe

echo -e "The end time is: \`date +"%Y-%m-%d %H:%M:%S"\`"
EOF

/usr/bin/sbatch $sbatch_work_dir/stream.slurm
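
The four `if` blocks at the top of the script all follow the same fall-back pattern. A more compact sketch, assuming bash and the same variable names, uses `${var:-default}` parameter expansion. The function name `resolve_defaults` is hypothetical; the default values are copied from the script above. Unlike the original, this sketch does not create the default working directory.

```shell
#!/bin/bash
# Hypothetical sketch: the same fall-back logic as the script above,
# written with ${var:-default} parameter expansion.
resolve_defaults() {
    local stamp
    stamp=$(date +%m%d_%H%M%S)
    sbatch_job_name=${job_name:-YHPC_$stamp}
    sbatch_partition=${partition:-}
    sbatch_work_dir=${work_dir:-/home/yhpc/YHPC_$stamp}
    sbatch_run_time=${run_time:-03:00:00}
}

# Example: job_name and run_time are set, work_dir and partition are not.
job_name=stream
run_time=24:00:00
unset work_dir partition
resolve_defaults
echo "$sbatch_job_name $sbatch_run_time $sbatch_work_dir"
```

Here `job_name` and `run_time` pass through unchanged, while the unset `work_dir` falls back to a time-stamped directory under /home/yhpc.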
