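VirtualFlow
VirtualFlow is an open-source, ultra-high-throughput virtual screening platform developed at Harvard Medical School. It uses high-performance computing resources to screen candidate organic compounds in parallel in order to find promising new drug molecules.
Using VirtualFlow, docking one billion molecules on 160,000 CPUs took about 15 hours; with 10,000 CPUs the same screen takes about two weeks.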
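As a rough consistency check on these figures, assuming (as an idealization) that throughput scales linearly with the number of CPUs, the total compute and the per-ligand cost work out as follows:

```latex
\begin{aligned}
160\,000\ \text{CPUs} \times 15\ \text{h} &= 2.4 \times 10^{6}\ \text{CPU-hours} \\
\frac{2.4 \times 10^{6}\ \text{CPU-hours}}{10^{9}\ \text{ligands}} &= 2.4 \times 10^{-3}\ \text{CPU-hours per ligand} \approx 8.6\ \text{CPU-seconds per ligand} \\
\frac{2.4 \times 10^{6}\ \text{CPU-hours}}{10\,000\ \text{CPUs}} &= 240\ \text{h} = 10\ \text{days}
\end{aligned}
```

Under ideal scaling, 10,000 CPUs would therefore need about 10 days, which is of the same order as the quoted figure of about two weeks.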
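The platform supports most freely available molecular docking programs, including AutoDock Vina, QuickVina 2, Smina, AutoDockFR, QuickVina-W, VinaXB and Vina-Carb, and can run several of them on the same ligands at once for cross-validation.
For more information, see the VirtualFlow website.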
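I. Job submission parameters
Users can submit VirtualFlow jobs through the public job template. The VirtualFlow-specific job parameters are:
Parameter | Description |
ligand dir | Directory containing the ligand library to be screened |
todo all | Screening list (the todo.all file of ligand collections) |
receptor | Receptor file to screen against |
method | Docking program used for screening |
method config | Configuration file for the docking program |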
II. Running a VirtualFlow job (reference)
VFVS
1. Download the VFVS tutorial example
wget https://virtual-flow.org/sites/virtual-flow.org/files/tutorials/VFVS_GK.tar
mkdir -p /VFVS_GK
tar -xvf VFVS_GK.tar -C /VFVS_GK
2. Prepare the receptor and ligand files
*Receptor file location:
/VFVS_GK/input-files/receptor/xx.pdbqt
*Ligand file location:
/VFVS_GK/input-files/ligand-library/XX
Example receptor file (PDBQT format):
ATOM 1 N ASN A 5 -57.546 14.070 -17.657 1.00 38.97 0.627 N
ATOM 2 CA ASN A 5 -58.392 14.717 -16.614 1.00 38.40 0.390 C
ATOM 3 C ASN A 5 -57.576 15.000 -15.365 1.00 38.07 0.289 C
ATOM 4 O ASN A 5 -57.211 14.083 -14.616 1.00 37.38 -0.268 OA
ATOM 5 CB ASN A 5 -59.601 13.849 -16.268 1.00 38.92 0.141 C
ATOM 6 CG ASN A 5 -60.712 14.634 -15.585 1.00 39.01 0.277 C
ATOM 7 OD1 ASN A 5 -61.835 14.709 -16.096 1.00 40.81 -0.269 OA
ATOM 8 ND2 ASN A 5 -60.415 15.210 -14.426 1.00 37.42 -0.107 N
ATOM 9 N LEU A 6 -57.321 16.286 -15.147 1.00 37.25 -0.229 NA
ATOM 10 CA LEU A 6 -56.485 16.767 -14.058 1.00 36.96 0.186 C
ATOM 11 C LEU A 6 -57.170 16.681 -12.715 1.00 36.42 0.274 C
ATOM 12 O LEU A 6 -56.533 16.330 -11.714 1.00 36.52 -0.268 OA
ATOM 13 CB LEU A 6 -56.067 18.214 -14.317 1.00 36.63 0.034 C
ATOM 14 CG LEU A 6 -54.638 18.458 -14.794 1.00 37.45 0.002 C
ATOM 15 CD1 LEU A 6 -54.208 17.490 -15.893 1.00 37.69 0.000 C
ATOM 16 CD2 LEU A 6 -54.532 19.906 -15.259 1.00 35.41 0.000 C
ATOM 17 N TYR A 7 -58.465 16.990 -12.700 1.00 35.25 -0.228 NA
ATOM 18 CA TYR A 7 -59.248 16.903 -11.489 1.00 34.76 0.191 C
ATOM 19 C TYR A 7 -59.169 15.483 -10.914 1.00 34.10 0.275 C
ATOM 20 O TYR A 7 -58.901 15.313 -9.724 1.00 34.06 -0.268 OA
ATOM 21 CB TYR A 7 -60.698 17.291 -11.742 1.00 34.94 0.060 C
ATOM 22 CG TYR A 7 -61.565 17.163 -10.513 1.00 35.01 -0.020 A
ATOM 23 CD1 TYR A 7 -61.418 18.040 -9.444 1.00 35.21 -0.002 A
ATOM 24 CD2 TYR A 7 -62.526 16.158 -10.413 1.00 35.82 -0.002 A
ATOM 25 CE1 TYR A 7 -62.208 17.930 -8.312 1.00 35.63 0.027 A
ATOM 26 CE2 TYR A 7 -63.321 16.037 -9.277 1.00 36.38 0.027 A
ATOM 27 CZ TYR A 7 -63.153 16.932 -8.233 1.00 35.69 0.131 A
ATOM 28 OH TYR A 7 -63.931 16.834 -7.105 1.00 36.50 -0.190 OA
ATOM 29 N PHE A 8 -59.380 14.478 -11.765 1.00 33.61 -0.229 NA
ATOM 30 CA PHE A 8 -59.351 13.082 -11.312 1.00 33.25 0.183 C
ATOM 31 C PHE A 8 -57.953 12.592 -10.979 1.00 32.89 0.237 C
ATOM 32 O PHE A 8 -57.772 11.838 -10.010 1.00 32.51 -0.276 OA
ATOM 33 CB PHE A 8 -60.058 12.150 -12.294 1.00 33.60 0.060 C
ATOM 34 CG PHE A 8 -61.537 12.376 -12.355 1.00 34.35 -0.020 A
ATOM 35 CD1 PHE A 8 -62.160 12.648 -13.564 1.00 35.78 -0.004 A
ATOM 36 CD2 PHE A 8 -62.304 12.342 -11.191 1.00 35.53 -0.004 A
ATOM 37 CE1 PHE A 8 -63.525 12.877 -13.628 1.00 35.72 -0.000 A
ATOM 38 CE2 PHE A 8 -63.671 12.562 -11.239 1.00 36.20 -0.000 A
ATOM 39 CZ PHE A 8 -64.283 12.830 -12.461 1.00 35.77 -0.000 A
ATOM 40 N GLN A 9 -56.967 13.003 -11.773 1.00 32.40 -0.208 NA
ATOM 41 CA GLN A 9 -55.566 12.769 -11.429 0.50 32.44 0.139 C
ATOM 42 C GLN A 9 -55.312 13.316 -10.031 1.00 31.96 0.215 C
ATOM 43 O GLN A 9 -54.639 12.687 -9.214 1.00 31.55 -0.277 OA
· · ·
3. Configuration files
*Edit the ligand list file: /VFVS_GK/tools/templates/todo.all
*Edit the control file: /VFVS_GK/tools/templates/all.ctrl
batchsystem=SLURM
partition=X86
docking_scenario_names=vina_rigid_receptor1:smina_rigid_receptor1
docking_scenario_programs=vina:smina_rigid
docking_scenario_inputfolders=../input-files/vina_rigid_receptor1:../input-files/smina_rigid_receptor1
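The docking_scenario_names, docking_scenario_programs and docking_scenario_inputfolders values are colon-separated lists whose entries correspond position by position, so a scenario added to one list must be added to the others. The following is a minimal sketch of a sanity check for that; the function names count_entries and check_scenarios are hypothetical helpers, not part of VirtualFlow, and the sketch assumes one key=value per line with ":" as the separator.

```bash
#!/bin/bash
# Hypothetical helpers (not part of VirtualFlow): check that the three
# docking_scenario_* lists in an all.ctrl-style file have the same length.
# Assumes one "key=value" per line and ':' as the list separator.

# $1 = key name, $2 = control file; prints the number of ':'-separated entries
count_entries() {
    local value
    value=$(grep -m1 "^$1=" "$2" | cut -d'=' -f2-)
    if [ -z "$value" ]; then
        echo 0
        return
    fi
    awk -F':' '{print NF}' <<< "$value"
}

# $1 = control file; returns 0 if all three lists are non-empty and equally long
check_scenarios() {
    local ctrl=$1 n_names n_programs n_folders
    n_names=$(count_entries docking_scenario_names "$ctrl")
    n_programs=$(count_entries docking_scenario_programs "$ctrl")
    n_folders=$(count_entries docking_scenario_inputfolders "$ctrl")
    if [ "$n_names" -gt 0 ] && [ "$n_names" -eq "$n_programs" ] && [ "$n_names" -eq "$n_folders" ]; then
        echo "ok: $n_names docking scenario(s)"
    else
        echo "mismatch: names=$n_names programs=$n_programs inputfolders=$n_folders" >&2
        return 1
    fi
}
```

For example, `check_scenarios /VFVS_GK/tools/templates/all.ctrl` run on the settings above should report two scenarios.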
*Example docking program configuration:
config.txt:
receptor = ../input-files/receptor/4no7_prot.pdbqt
center_x = -8.654
center_y = 2.229
center_z = 19.715
size_x = 24.0
size_y = 26.25
size_z = 22.5
exhaustiveness = 8
cpu = 1
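In this file, center_x/y/z give the center of the docking search box and size_x/y/z its edge lengths, in Å. One common way to choose them, assumed here rather than prescribed by VirtualFlow, is to place the box around a reference ligand (for example a co-crystallized ligand saved as PDBQT). The sketch below uses awk to compute the bounding-box center of the ATOM/HETATM records and adds a margin on each side; box_from_pdbqt is a hypothetical helper name, and it relies on the fixed PDB coordinate columns (31-38, 39-46, 47-54) that PDBQT inherits.

```bash
#!/bin/bash
# Hypothetical helper: print Vina-style center_* / size_* lines for a box
# enclosing all ATOM/HETATM records of a PDBQT file.
# Coordinates are read from fixed columns 31-38 (x), 39-46 (y), 47-54 (z).
# $1 = PDBQT file, $2 = margin in Angstrom added to each side (default 5)
box_from_pdbqt() {
    awk -v margin="${2:-5}" '
        /^(ATOM|HETATM)/ {
            x = substr($0, 31, 8) + 0; y = substr($0, 39, 8) + 0; z = substr($0, 47, 8) + 0
            if (n == 0) { minx = maxx = x; miny = maxy = y; minz = maxz = z }
            if (x < minx) minx = x; if (x > maxx) maxx = x
            if (y < miny) miny = y; if (y > maxy) maxy = y
            if (z < minz) minz = z; if (z > maxz) maxz = z
            n++
        }
        END {
            if (n == 0) { print "no ATOM/HETATM records found" > "/dev/stderr"; exit 1 }
            # center = midpoint of the bounding box (not the atomic centroid)
            printf "center_x = %.3f\ncenter_y = %.3f\ncenter_z = %.3f\n", (minx+maxx)/2, (miny+maxy)/2, (minz+maxz)/2
            # size = extent along each axis plus the margin on both sides
            printf "size_x = %.2f\nsize_y = %.2f\nsize_z = %.2f\n", maxx-minx+2*margin, maxy-miny+2*margin, maxz-minz+2*margin
        }' "$1"
}
```

The printed lines can be pasted into config.txt in place of the existing center_* and size_* entries; the margin is a judgment call and should be adjusted to the binding site.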
Result analysis
1. Download VFTools
wget https://github.com/VirtualFlow/VFTools/archive/master.tar.gz
mkdir -p /VFTools
tar -zxvf master.tar.gz -C /VFTools --strip-components=1
Add VFTools to the PATH:
export PATH="/VFTools/bin:$PATH"
2. Install Open Babel (a widely used tool for converting between chemical file formats)
*Install conda (Miniconda)
mkdir -p /conda
wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -P /conda
chmod +x /conda/Miniconda3-latest-Linux-x86_64.sh
/conda/Miniconda3-latest-Linux-x86_64.sh  # answer "yes" to all prompts
*Create a new Python 3.7.5 virtual environment
conda create -n py37 python=3.7.5
Install Open Babel:
*Activate the environment
conda activate py37
*Install
conda install openbabel -c conda-forge
*Deactivate the environment
conda deactivate
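3. Rank the compounds
*Create a directory for the ranking
mkdir /VFVS_GK/ranking
*From /VFVS_GK/ranking, run the ranking command:
vfvs_pp_ranking_all.sh ../output-files/complete/ 2 meta_tranche
*Inspect the generated files with tree; the ranking is stored in the files ending in .clean: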
.
├── qvina02_rigid_receptor1
│ ├── compounds
│ ├── compounds.energies
│ ├── compounds.energies.uniq.csv
│ ├── firstposes.all
│ ├── firstposes.all.minindex
│ ├── firstposes.all.minindex.sorted
│ └── firstposes.all.minindex.sorted.clean
└── smina_rigid_receptor1
├── firstposes.all
├── firstposes.all.minindex
├── firstposes.all.minindex.sorted
└── firstposes.all.minindex.sorted.clean
*View the top 10 compounds:
head -10 qvina02_rigid_receptor1/firstposes.all.minindex.sorted.clean
GACEBG_00000 Z2624037004_3 -10.3 1
GACEBG_00000 Z2624037004_4 -10.3 1
GACEBG_00000 Z2087256678_1 -9.8 1
GACEBG_00000 Z2087260951_2 -9.8 1
GACEBG_00000 Z2087260951_4 -9.8 1
GACECF_00000 PV-001701895824_1 -9.8 1
GACEBG_00000 PV-001958058751_3 -9.6 1
GACEBG_00000 Z2087256678_2 -9.6 1
GACEBG_00000 Z2087260951_3 -9.6 1
GACEBG_00000 Z2624037004_1 -9.6 1
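In this output several entries share a base compound ID and differ only in the trailing _N suffix (for example Z2624037004_3 and Z2624037004_4), so the top 10 lines cover fewer than 10 distinct compounds. A minimal sketch for listing the best entry per base ID follows; top_unique is a hypothetical helper, and it assumes the file is already sorted best-first (as the .sorted.clean file is) and that the ID is the second column.

```bash
#!/bin/bash
# Hypothetical helper: print the first (best-scoring) line for each base
# compound ID, stopping after N unique compounds.
# Assumes the input is sorted best-first and column 2 is "<baseID>_<suffix>".
# $1 = ranking file, $2 = number of unique compounds to print (default 10)
top_unique() {
    awk -v n="${2:-10}" '
        {
            base = $2
            sub(/_[^_]*$/, "", base)   # strip the trailing _N suffix
            if (!(base in seen)) {
                seen[base] = 1
                print                  # the original line is printed unchanged
                if (++count >= n) exit
            }
        }' "$1"
}
```

Usage: `top_unique qvina02_rigid_receptor1/firstposes.all.minindex.sorted.clean 10`.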
*Save the top 100 compounds to a file named compounds:
head -100 qvina02_rigid_receptor1/firstposes.all.minindex.sorted.clean > qvina02_rigid_receptor1/compounds
4. Extract the structures of the top 100 compounds
*Create a directory for the compound structures
mkdir /VFVS_GK/pose
*From /VFVS_GK/pose/, run the extraction:
vfvs_pp_prepare_dockingposes.sh ../output-files/complete/qvina02_rigid_receptor1/results/ meta_tranch ../ranking/qvina02_rigid_receptor1/compounds dockingsposes overwrite
*After extraction, the structures are found in:
/VFVS_GK/pose/qvina02_rigid_receptor1/dockingsposes.plain/xxx.pdb
5. input file (job parameters read by virtual-flow.sh)
partition=dell_intel
node_num=2
task_per_node=30
run_time=0-08:00:00
work_dir=/home/liupeng/virtual-flow
ligand_dir=/home/wushiming/virtual-flow/input-files/ligand-library
todo_all=/home/wushiming/virtual-flow/tools/templates/todo.all
prceptor=/home/wushiming/virtual-flow/input-files/receptor/4no7_prot.pdbqt
method1=qvina02
method1_config=/home/wushiming/virtual-flow/input-files/qvina02_rigid_receptor1/config.txt
method2=smina
method2_config=/home/wushiming/virtual-flow/input-files/smina_rigid_receptor1/config.txt
method3=
method3_config=
method4=
method4_config=
method5=
method5_config=
method6=
method6_config=
4no7_prot.pdbqt: receptor protein file in PDBQT format
Files in the ligand-library directory:
GA/GACAAD.tar
GA/GACBBF.tar
HA/HACABE.tar ...
6. todo.all file (one ligand collection per line, followed by the number of ligands it contains)
GACAAD_00000 9
GACAAF_00000 22
GACABD_00000 27
GACABF_00000 48
GACACC_00000 1
GACACD_00000 30
GACACF_00000 20
GACADC_00000 2
GACADD_00000 2
GACADE_00000 74
GACADF_00000 6
GACAEE_00000 5
GACAEF_00000 1
GACBAD_00000 69
GACBEF_00000 26
GACBEG_00000 2
GACBFE_00000 1
GACCAD_00000 1
GACDDE_00000 75
GACDEE_00000 14
GACDFE_00000 3
GACDFF_00000 10
GACDFG_00000 55
GACEAG_00000 3
GACEBF_00000 6
GACEBG_00000 71
GACECF_00000 34
GACEEE_00000 1
GACEFF_00000 23
GAFECG_00000 38
GAFEEF_00000 1
GAFEEG_00000 1
GAFEFG_00000 1
GAFFBG_00000 1
GAFFCG_00000 2
GAFFDG_00000 11
GAFFFG_00000 2
HACABE_00000 38
HACABF_00000 20
HACACE_00000 17
HACACF_00000 5
HACADE_00000 4
HACADF_00000 1
HACBAE_00000 10
HACBBD_00000 1
HACBBE_00000 14
HACBBF_00000 40
HACBBG_00000 2
HACBCD_00000 2
HACBCE_00000 75
HACBCG_00000 12
HACBDD_00000 1
HACBDE_00000 45
HACBDF_00000 35
HACBDG_00000 5
HACBED_00000 1
HACBEF_00000 2
HACBFF_00000 1
HACCBE_00000 4
HAFCFE_00000 12
HAFCFG_00000 19
HAFDCG_00000 21
HAFDDG_00000 24
HAFDEF_00000 8
HAFDEG_00000 1
HAFDFF_00000 1
HAFDFG_00000 2
HAFEBG_00000 2
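Before submitting, it can help to check that every collection listed in todo.all has a matching archive in the ligand library. The sketch below is a hypothetical helper (check_todo); it assumes the second column is the collection's ligand count, and it assumes the path mapping suggested by the ligand-library listing above, namely that collection GACAAD_00000 is stored in <ligand_dir>/GA/GACAAD.tar.

```bash
#!/bin/bash
# Hypothetical helper: summarize a todo.all file and report collections whose
# archive is missing from the ligand library.
# Assumed layout (inferred from the listing): GACAAD_00000 -> <libdir>/GA/GACAAD.tar
# $1 = todo.all file, $2 = ligand-library directory; returns 1 if any are missing
check_todo() {
    local todo=$1 libdir=$2 total=0 missing=0 collection count tranche
    while read -r collection count || [ -n "$collection" ]; do
        [ -z "$collection" ] && continue          # skip blank lines
        total=$((total + ${count:-0}))            # assumed: column 2 = ligand count
        tranche=${collection%%_*}                 # GACAAD_00000 -> GACAAD
        if [ ! -f "$libdir/${tranche:0:2}/$tranche.tar" ]; then
            echo "missing archive for $collection: $libdir/${tranche:0:2}/$tranche.tar" >&2
            missing=$((missing + 1))
        fi
    done < "$todo"
    echo "collections: $(grep -c . "$todo"), ligands: $total, missing archives: $missing"
    [ "$missing" -eq 0 ]
}
```

Usage: `check_todo /home/wushiming/virtual-flow/tools/templates/todo.all /home/wushiming/virtual-flow/input-files/ligand-library`.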
7. qvina02 configuration file
/home/wushiming/virtual-flow/input-files/qvina02_rigid_receptor1/config.txt
receptor = ../input-files/receptor/4no7_prot.pdbqt
center_x = -8.654
center_y = 2.229
center_z = 19.715
size_x = 24.0
size_y = 26.25
size_z = 22.5
exhaustiveness = 8
cpu = 1
8. smina configuration file
/home/wushiming/virtual-flow/input-files/smina_rigid_receptor1/config.txt
receptor = ../input-files/receptor/4no7_prot.pdbqt
center_x = -8.654
center_y = 2.229
center_z = 19.715
size_x = 24.0
size_y = 26.25
size_z = 22.5
exhaustiveness = 4
scoring = vinardo
cpu = 1
9. The virtual-flow.sh script
#!/bin/bash
# Requires bash (uses source, ==, {1..6}, echo -n and &>), not plain sh
#set -x
source /home/wushiming/virtual-flow/input
if [ "x$work_dir" == "x" ];then
time=${time:-$(date +%Y%m%d%H%M%S)}   # timestamp for the default work directory if $time is unset
mkdir -p ~/yhpc/YHPC_$time
sbatch_work_dir=~/yhpc/YHPC_$time
else
sbatch_work_dir=$work_dir
fi
if [ "x$prceptor" == "x" ];then
echo "The prceptor (receptor file) parameter cannot be empty."
exit 1
else
prceptor_file=$prceptor
fi
if [ "x$ligand_dir" == "x" ];then
echo "The ligand_dir parameter cannot be empty."
exit 1
else
all_ligand_dir=$ligand_dir
fi
if [ "x$todo_all" == "x" ];then
echo "The todo_all parameter cannot be empty."
exit 1
else
todo_all_file=$todo_all
fi
# Create (or empty) the temporary list files
: > temp
: > temp2
: > temp4
for i in {1..6}
do
method=`echo method$i`
#method_$i=`eval echo \\$$method`
#method_array[i]=`eval echo \\$$method`
if [ "x`eval echo \\$$method`" != "x" ];then
num=$i
mkdir $sbatch_work_dir/input-files/`eval echo \\$$method`_rigid_receptor1 -p
config=`echo method$i\_config`
if [ "x`eval echo \\$$config`" == "x" ];then
echo "configuration file for `eval echo \\$$method` not found."
continue
fi
cp `eval echo \\$$config` $sbatch_work_dir/input-files/`eval echo \\$$method`_rigid_receptor1/config.txt
sed -i '/receptor/c \receptor = '$prceptor_file'' $sbatch_work_dir/input-files/`eval echo \\$$method`_rigid_receptor1/config.txt
#if [ `eval echo \\$$method` == smina ];then
#fi
echo -n `eval echo \\$$method`_rigid_receptor1: >> temp
temp1=`cat temp`
temp1=${temp1%*:}
#echo $temp1
echo -n ../input-files/`eval echo \\$$method`_rigid_receptor1: >> temp2
temp3=`cat temp2`
temp3=${temp3%*:}
#echo $temp3
echo -n `eval echo \\$$method`: >> temp4
else
config=`echo method$i\_config`
if [ "x`eval echo \\$$config`" != "x" ];then
echo "essential to add a method to the file:`eval echo \\$$config`."
continue
fi
fi
done
if [ ! -s temp ];then
echo "at least one method needs to select."
exit 1
fi
# VirtualFlow expects the program name smina_rigid rather than smina
if grep -q smina temp4; then
sed -i 's/smina:/smina_rigid:/' temp4
fi
temp5=`cat temp4`
temp5=${temp5%*:}
#echo $i
mkdir $sbatch_work_dir/input-files/ligand-library -p
cp $all_ligand_dir/* $sbatch_work_dir/input-files/ligand-library/ -R
#cp $prceptor_file $sbatch_work_dir/input-files/receptor/
cp /home/wushiming/virtual-flow/tools $sbatch_work_dir/ -R
cp -f $todo_all_file $sbatch_work_dir/tools/templates/todo.all
sed -i '/timelimit=/c \timelimit='$run_time'' $sbatch_work_dir/tools/templates/all.ctrl
sed -i '/partition=/c \partition='$partition'' $sbatch_work_dir/tools/templates/all.ctrl
sed -i '/steps_per_job=/c \steps_per_job='$node_num'' $sbatch_work_dir/tools/templates/all.ctrl
sed -i '/cpus_per_step=/c \cpus_per_step='$task_per_node'' $sbatch_work_dir/tools/templates/all.ctrl
sed -i '/queues_per_step=/c \queues_per_step='$task_per_node'' $sbatch_work_dir/tools/templates/all.ctrl
sed -i '/docking_scenario_programs=/c \docking_scenario_programs='$temp5'' $sbatch_work_dir/tools/templates/all.ctrl
sed -i '/docking_scenario_names=/c \docking_scenario_names='$temp1'' $sbatch_work_dir/tools/templates/all.ctrl
sed -i '/docking_scenario_inputfolders=/c \docking_scenario_inputfolders='$temp3'' $sbatch_work_dir/tools/templates/all.ctrl
sed -i '/tempdir=/c \tempdir='$sbatch_work_dir'/tempdir' $sbatch_work_dir/tools/templates/all.ctrl
mkdir -p $sbatch_work_dir/tempdir
rm -f temp temp2 temp4
# Prepare the VirtualFlow folders and submit the job
cd $sbatch_work_dir/tools
./vf_prepare_folders.sh <<EOF
y
y
EOF
./vf_start_jobline.sh 1 1 templates/template1.slurm.sh submit 1
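The loop in virtual-flow.sh that turns method1..method6 and their *_config entries into the three colon-separated docking_scenario_* values relies on eval and temporary files. The following is a sketch of the same logic using bash indirect expansion (${!var}) and arrays instead; build_scenarios and the output variables scenario_names, scenario_programs and scenario_inputfolders are hypothetical names chosen for illustration. It keeps the script's behaviour of skipping a method without a configuration file and mapping smina to smina_rigid.

```bash
#!/bin/bash
# Hypothetical sketch: build the docking_scenario_* values from
# method1..method6 / method1_config..method6_config without eval or temp files.
build_scenarios() {
    local i m_var c_var method config names=() programs=() folders=()
    for i in 1 2 3 4 5 6; do
        m_var=method$i
        c_var=method${i}_config
        method=${!m_var}      # indirect expansion: value of $method1, $method2, ...
        config=${!c_var}
        if [ -z "$method" ]; then
            if [ -n "$config" ]; then
                echo "a method must be set for the configuration file: $config" >&2
            fi
            continue
        fi
        if [ -z "$config" ]; then
            echo "configuration file for $method not found." >&2
            continue
        fi
        names+=("${method}_rigid_receptor1")
        folders+=("../input-files/${method}_rigid_receptor1")
        if [ "$method" = smina ]; then programs+=(smina_rigid); else programs+=("$method"); fi
    done
    local IFS=:               # "${array[*]}" joins elements with the first IFS character
    scenario_names="${names[*]}"
    scenario_programs="${programs[*]}"
    scenario_inputfolders="${folders[*]}"
}
```

After sourcing the input file and calling build_scenarios, the three variables could be written into all.ctrl with the same sed commands the script uses for $temp1, $temp5 and $temp3.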