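[Machine Learning] A Complete Guide to Distributed LightGBM
I. Distributed support in LightGBM
Distributed training is run with the lightgbm binary compiled from source.
Workers can communicate over sockets or over MPI (MPI is faster and is the recommended option).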
II. Installing the distributed LightGBM environment
The distributed training environment used here is Ubuntu.
II.1 Socket support
On Linux LightGBM can be built using CMake and gcc or Clang.
Install CMake.
Run the following commands:
    git clone --recursive https://github.com/microsoft/LightGBM
    cd LightGBM
    mkdir build
    cd build
    cmake ..
    make -j4
Note: glibc >= 2.14 is required.
Also, you may want to read gcc Tips.
II.2 MPI support
The default build version of LightGBM is based on socket. LightGBM also supports MPI. MPI is a high performance communication approach with RDMA support.
If you need to run a distributed learning application with high performance communication, you can build the LightGBM with MPI support.
On Linux an MPI version of LightGBM can be built using Open MPI, CMake and gcc or Clang.
1. Installing Open MPI
1) Install OpenSSH
    apt-get update
    apt-get install openssh-server
If MPI communication spans several machines, set up passwordless SSH login between the nodes and set StrictHostKeyChecking=no.
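The passwordless-login step can be scripted. The following is a minimal sketch, not part of the original walkthrough: the helper name `add_ssh_no_hostcheck` is hypothetical, and `user@other-node` is a placeholder. Key generation and distribution are interactive, so they appear only as comments; the helper itself appends the `Host *` block from this step to an SSH config file, and does nothing if the file already contains a StrictHostKeyChecking setting.

```shell
#!/usr/bin/env bash
# Sketch: prepare SSH for MPI between nodes (helper name is hypothetical).
#
# Key generation and distribution are interactive, so they are shown as
# comments only; user and host are placeholders:
#   ssh-keygen -t ed25519
#   ssh-copy-id user@other-node

# Append a "Host *" block that disables strict host key checking,
# unless the file already mentions StrictHostKeyChecking.
# Usage: add_ssh_no_hostcheck [config_path]   (default: ~/.ssh/config)
add_ssh_no_hostcheck() {
    local cfg="${1:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$cfg")"
    touch "$cfg"
    if ! grep -q 'StrictHostKeyChecking' "$cfg"; then
        printf 'Host *\n    StrictHostKeyChecking no\n' >> "$cfg"
    fi
    # ssh expects the config file to be writable only by its owner
    chmod 600 "$cfg"
}
```

Calling `add_ssh_no_hostcheck` with no argument edits `~/.ssh/config`. Disabling host key checking for every host is convenient on a closed training cluster; on a shared network, a narrower `Host` pattern is the safer choice.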
    # Add the following to ~/.ssh/config
    Host *
        StrictHostKeyChecking no
2) Install Open MPI (version 4.1.0 is recommended)
    wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.0.tar.gz
    tar zxvf openmpi-4.1.0.tar.gz
    cd openmpi-4.1.0/
    ./configure --prefix=/usr/local/openmpi
    # Installing directly into /usr/local makes the environment variables below unnecessary:
    # ./configure --prefix=/usr/local
    make -j8
    make install
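3) Configure environment variables (~/.bashrc)
    # Add these two lines to ~/.bashrc
    export PATH=$PATH:/usr/local/openmpi/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/openmpi/lib/
    # Then reload the shell configuration and refresh the linker cache
    source ~/.bashrc
    sudo ldconfig
4) Verify the installation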
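    cd examples
    make
    ./hello_c
    # mpirun -np 8 hello_c
Sample output (the version string reflects whichever release is installed):
    Hello, world, I am 0 of 1, (Open MPI v3.1.0, package: Open MPI root@ssli_centos7 Distribution, ident: 3.1.0, repo rev: v3.1.0, May 07, 2018, 112)
If the example program runs correctly, the installation succeeded.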
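Before building LightGBM with MPI, it can also help to confirm that the required tools are on PATH. The following is a small sketch; the function name `missing_cmds` is hypothetical, and the tool list in the example call is an assumption based on the commands used in this guide.

```shell
#!/usr/bin/env bash
# Print the name of every given command that cannot be found on PATH.
# Prints nothing when all of them are available.
missing_cmds() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Example call (tool list assumed from the steps in this guide):
#   missing_cmds git cmake make gcc mpirun mpiexec
```

If `mpirun` or `mpiexec` is reported as missing right after installation, the PATH setting in `~/.bashrc` is the first thing to check.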
2. Install CMake.
Run the following commands:
    git clone --recursive https://github.com/microsoft/LightGBM
    cd LightGBM
    mkdir build
    cd build
    cmake -DUSE_MPI=ON ..
    make -j4
Note: glibc >= 2.14 is required.
III. Testing the distributed environment
For the data and parameter configuration, see:
https://github.com/microsoft/LightGBM/blob/master/examples/parallel_learning/README.md
Node1:
Node2:
Copy data file, executable file, config file and mlist.txt to all machines.
Note: MPI needs to be run in the same path on all machines.
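The config file and mlist.txt mentioned above can be generated with a short script. This is a sketch under stated assumptions rather than the exact files used in this walkthrough: the function name `write_cluster_files` is hypothetical, the data file name `binary.train` and the objective are placeholders, and the machine list is written in the MPI style of one host per line. `tree_learner` and `num_machines` are the LightGBM parameters that enable distributed training.

```shell
#!/usr/bin/env bash
# Write mlist.txt (one host per line, MPI style) and a minimal train.conf
# into the given directory. Usage: write_cluster_files DIR HOST [HOST...]
write_cluster_files() {
    local dir="$1"
    shift                      # the remaining arguments are the hosts
    mkdir -p "$dir"
    printf '%s\n' "$@" > "$dir/mlist.txt"
    # data and objective are placeholders; adjust them to the real task
    cat > "$dir/train.conf" <<EOF
task = train
objective = binary
data = binary.train
tree_learner = data
num_machines = $#
EOF
}

# Example call with the two node addresses used in this guide:
#   write_cluster_files . 172.200.24.6 172.200.25.10
```

For a socket build the layout differs: each line of mlist.txt carries an IP and a port, and the config additionally needs `machine_list_file` and `local_listen_port`.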
Run the following command on one machine only (there is no need to run it on every machine), replacing your_config_file with the actual config file.
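    mpiexec --machinefile mlist.txt ./lightgbm config=your_config_file
In our experiments (Open MPI 4.1.0), the command above did not place exactly one process on each machine. The following command can be used to start distributed training instead:
    mpiexec -np 2 -H 172.200.24.6,172.200.25.10 lightgbm config=train.conf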
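The explicit `-np`/`-H` form can be derived from mlist.txt so that the host list is kept in one place. The sketch below only prints the command instead of running it. The function name `mpi_launch_cmd` is hypothetical, and it assumes an MPI-style mlist.txt with one host per line. One plausible explanation for the behaviour seen with `--machinefile` (an assumption, not verified here) is that a machine file without slot counts lets Open MPI decide for itself how many processes each host receives.

```shell
#!/usr/bin/env bash
# Print an mpiexec command that starts one process per host in mlist.txt.
# Usage: mpi_launch_cmd MLIST_FILE CONFIG_FILE
mpi_launch_cmd() {
    local mlist="$1" conf="$2"
    local hosts n
    # Drop blank lines, then join the hosts with commas for -H
    hosts=$(grep -v '^[[:space:]]*$' "$mlist" | paste -sd, -)
    # Count the non-blank lines: one process per host
    n=$(grep -cv '^[[:space:]]*$' "$mlist")
    echo "mpiexec -np $n -H $hosts ./lightgbm config=$conf"
}

# Example: review the command first, then run it
#   mpi_launch_cmd mlist.txt train.conf
#   eval "$(mpi_launch_cmd mlist.txt train.conf)"
```

Printing rather than executing keeps the helper safe to run on a machine without MPI and makes the final command easy to inspect.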