STREAM
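STREAM is a benchmark for measuring memory bandwidth and a widely used tool for assessing server memory performance. It has good spatial locality and is friendly to both the translation lookaside buffer (TLB) and the cache. STREAM measures memory bandwidth with four kernels: Copy, Scale, Add (vector sum), and Triad (scaled vector sum).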
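To make the four kernels concrete, the sketch below applies them to a single array element instead of full arrays. It assumes the kernel definitions and initial values of the reference stream.c (a = 1, b = 2, c = 0, with a doubled once before the timed loop), and takes the scalar 0.42 and the 20 repetitions from the run shown later in this document. The function name `stream_kernels` is hypothetical.

```shell
#!/bin/sh
# Apply the four STREAM kernels to one element, n times, with scalar s.
# Assumption: definitions and initial values follow the reference stream.c.
stream_kernels() {
    awk -v s="$1" -v n="$2" 'BEGIN {
        a = 1.0; b = 2.0; c = 0.0   # initial array values
        a = 2.0 * a                 # done once before the timed loop
        for (k = 0; k < n; k++) {
            c = a                   # Copy
            b = s * c               # Scale
            c = a + b               # Add
            a = b + s * c           # Triad
        }
        printf "%.6f %.6f %.6f\n", a, b, c
    }'
}

stream_kernels 0.42 20
```

Under these assumptions the three printed values should agree, to within rounding in the last digit, with the "Expected a(1), b(1), c(1)" line of the validation output (2.769001 1.144215 3.868538).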
1. STREAM job run reference
1. Command and output:
[root@login1 STREAM-master]# mpirun -np 4 stream_mpi.exe
-------------------------------------------------------------
STREAM version $Revision: 1.8 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Total Aggregate Array size = 80000000 (elements)
Total Aggregate Memory per array = 610.4 MiB (= 0.6 GiB).
Total Aggregate memory required = 1831.1 MiB (= 1.8 GiB).
Data is distributed across 4 MPI ranks
Array size per MPI rank = 20000000 (elements)
Memory per array per MPI rank = 152.6 MiB (= 0.1 GiB).
Total memory per MPI rank = 457.8 MiB (= 0.4 GiB).
-------------------------------------------------------------
Each kernel will be executed 20 times.
The *best* time for each kernel (excluding the first iteration)
will be used to compute the reported bandwidth.
The SCALAR value used for this run is 0.420000
-------------------------------------------------------------
Number of Threads requested for each MPI rank = 1
Number of Threads counted for rank 0 = 1
-------------------------------------------------------------
Your timer granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 34650 microseconds.
(= 34650 timer ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 timer ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
VERBOSE: total setup time for rank 0 = 0.726401 seconds
-------------------------------------------------------------
Function Best Rate MB/s Avg time Min time Max time
Copy: 32276.7 0.041149 0.039657 0.063049
Scale: 32415.2 0.040358 0.039488 0.046934
Add: 34641.0 0.055867 0.055426 0.057078
Triad: 34526.2 0.056714 0.055610 0.069850
-------------------------------------------------------------
VERBOSE: rank 0, AvgErrors 0.000000e+00 0.000000e+00 0.000000e+00
VERBOSE: rank 1, AvgErrors 0.000000e+00 0.000000e+00 0.000000e+00
VERBOSE: rank 2, AvgErrors 0.000000e+00 0.000000e+00 0.000000e+00
VERBOSE: rank 3, AvgErrors 0.000000e+00 0.000000e+00 0.000000e+00
Solution Validates: avg error less than 1.000000e-13 on all three arrays
Results Validation Verbose Results:
Expected a(1), b(1), c(1): 2.769001 1.144215 3.868538
Observed a(1), b(1), c(1): 2.769001 1.144215 3.868538
Rel Errors on a, b, c: 0.000000e+00 0.000000e+00 0.000000e+00
-------------------------------------------------------------
VERBOSE: total shutdown time for rank 0 = 0.074511 seconds
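The reported rates can be cross-checked by hand. STREAM counts 2 array words moved per element for Copy and Scale and 3 for Add and Triad, then divides the total bytes by the best (minimum) time. The sketch below redoes that arithmetic for the run above, assuming 8-byte elements and the aggregate array size of 80000000 from the log. `stream_rate` is a hypothetical helper, and MB here means 10^6 bytes.

```shell
#!/bin/sh
# Bandwidth in MB/s (10^6 bytes/s) = words * 8 bytes * elements / min_time
# Usage: stream_rate WORDS ELEMENTS MIN_TIME_SECONDS
stream_rate() {
    awk -v w="$1" -v n="$2" -v t="$3" 'BEGIN {
        printf "%.1f\n", w * 8 * n / t / 1e6
    }'
}

stream_rate 2 80000000 0.039657   # Copy: read a, write c
stream_rate 3 80000000 0.055426   # Add:  read a and b, write c
```

Because the log prints the minimum time with only six decimals, the recomputed values can differ from the "Best Rate" column in the last digit.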
2. Input file
job_name=stream
run_time=24:00:00
partition=dell_intel
node_num=3
task_per_node=32
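The run script in the next step sources this file, so each setting becomes a shell variable. As a rough illustration of what these values request, the sketch below writes the same settings to a temporary file, sources it, and computes the total number of MPI tasks. The temporary file and the variable `total_tasks` are illustrative assumptions, not part of the original setup.

```shell
#!/bin/sh
# Write the same settings to a throwaway file (stand-in for stream_input).
input_file=$(mktemp)
cat > "$input_file" <<'EOF'
job_name=stream
run_time=24:00:00
partition=dell_intel
node_num=3
task_per_node=32
EOF

# Source it the same way the run script does, then clean up.
. "$input_file"
rm -f "$input_file"

# Total MPI tasks requested = nodes * tasks per node.
total_tasks=$((node_num * task_per_node))
echo "Requesting $total_tasks tasks on $node_num nodes"
```

For the values above this works out to 96 tasks spread over 3 nodes.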
3. Run script
#!/bin/sh
source /home/wushiming/stream/stream_input
## Check input variables
time=`date +%m%d_%H%M%S`
if [ "x$job_name" = "x" ]; then
    sbatch_job_name="YHPC_$time"
else
    sbatch_job_name=$job_name
fi
if [ "x$partition" = "x" ]; then
    sbatch_partition=""
else
    sbatch_partition=$partition
fi
if [ "x$work_dir" = "x" ]; then
    sbatch_work_dir=/home/yhpc/YHPC_$time
else
    sbatch_work_dir=$work_dir
fi
# Make sure the working directory exists before the job file is written into it
mkdir -p "$sbatch_work_dir"
if [ "x$run_time" = "x" ]; then
    sbatch_run_time=03:00:00
else
    sbatch_run_time=$run_time
fi
sbatch_node_num=$node_num
sbatch_task_per_node=$task_per_node
sbatch_err_log=$sbatch_work_dir/%j.err
sbatch_out_log=$sbatch_work_dir/%j.out
### Write basic job information
#echo -e "The start time is: `date +"%Y-%m-%d %H:%M:%S"` \n"
#echo -e "My job ID is: $SLURM_JOB_ID \n"
#echo -e "The total cores is: $total_cores \n"
#echo -e "The hosts is: \n"
#srun -np $node_num -nnp 1 hostname
cat > $sbatch_work_dir/stream.slurm <<EOF
#!/bin/bash
#SBATCH --ntasks-per-node=$sbatch_task_per_node
#SBATCH --job-name $sbatch_job_name
#SBATCH --nodes=$sbatch_node_num
#SBATCH --time=$sbatch_run_time
#SBATCH --mail-type=ALL
#SBATCH --partition $sbatch_partition
#SBATCH --chdir=$sbatch_work_dir
#SBATCH -e $sbatch_err_log
#SBATCH -o $sbatch_out_log
ulimit -s unlimited
ulimit -l unlimited
module purge
source /opt/ohpc/pub/apps/intel/setvars.sh
module load intel/mpi-2021.1.1
module load stream/2016-07-28
export I_MPI_OFI_PROVIDER=Verbs
export FI_VERBS_IFACE=team1.282
echo -e "The start time is: \`date +"%Y-%m-%d %H:%M:%S"\`"
echo -e "My job ID is: \$SLURM_JOB_ID"
echo -e "The total cores is: \$SLURM_NPROCS"
echo -e "The \$SLURM_JOB_ID Job info:"
scontrol show job \$SLURM_JOB_ID
mpirun -genv I_MPI_FABRICS ofi stream_mpi.exe
echo -e "The end time is: \`date +"%Y-%m-%d %H:%M:%S"\`"
EOF
/usr/bin/sbatch $sbatch_work_dir/stream.slurm
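One detail of the script is worth spelling out: the here-document delimiter `EOF` is unquoted, so `$sbatch_...` variables are expanded when the job file is written, while escaped references such as `\$SLURM_JOB_ID` are written literally and only resolved when the job runs. A minimal sketch of that behavior, using a hypothetical variable `demo_job_name` and requiring no Slurm installation:

```shell
#!/bin/sh
demo_job_name=stream   # stand-in for $sbatch_job_name

# Unquoted delimiter: $demo_job_name expands now, \$SLURM_JOB_ID stays literal.
generated=$(cat <<EOF
#SBATCH --job-name $demo_job_name
echo "My job ID is: \$SLURM_JOB_ID"
EOF
)

printf '%s\n' "$generated"
```

Printing the generated text before submitting is a cheap way to confirm which values were fixed at generation time and which are left for the job to resolve.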