Building a Docker image for detectron2

No documentation, no work; any work done must be documented.

Dockerfile

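I started from the OWOD repository and found that it depends on detectron2 v0.2.1.
I then went into the docker directory and checked the Dockerfile, which needed quite a few changes. The modified version is shown below. Note that this Dockerfile builds the complete OWOD project, so: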

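The directory passed to pip install -e [dir] was changed.
The appuser account created by the official Dockerfile is not used. I wanted to attach VS Code to the container for debugging, and after several failed attempts to connect as a non-root user I decided to simply run as root.
NVIDIA's GPG keys are added.
The apt sources are switched to the Aliyun mirror.
pip is upgraded: python3 -m pip install --no-cache-dir --upgrade pip
A specific tag can be cloned into a renamed directory: git clone --branch v0.2.1 --depth 1 https://github.com/facebookresearch/detectron2 detectron2_repo
Because every rebuild would otherwise need network access, downloading the code once and adding it with COPY saves both bandwidth and time.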

FROM nvidia/cuda:11.1.1-cudnn8-devel-ubuntu18.04

RUN apt-key del 7fa2af80 && \
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub && \
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub

ENV DEBIAN_FRONTEND=noninteractive
RUN sed -i 's@/archive.ubuntu.com/@/mirrors.aliyun.com/@g' /etc/apt/sources.list && \
  apt-get update --fix-missing && apt-get install -y \
    python3-opencv ca-certificates python3-dev python3-pip git wget vim openssh-server --fix-missing \
    cmake ninja-build && \
  rm -rf /var/lib/apt/lists/*
RUN ln -sv /usr/bin/python3 /usr/bin/python

RUN python3 -m pip install --no-cache-dir --upgrade pip 

# install dependencies
# See https://pytorch.org/ for other options if you use a different version of CUDA
RUN pip install --user tensorboard cmake   # cmake from apt-get is too old
RUN pip install --user torch==1.10 torchvision==0.11.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html

RUN pip install 'git+https://github.com/facebookresearch/fvcore'
# install detectron2
#RUN git clone --branch v0.4.1 --depth 1 https://github.com/facebookresearch/detectron2 detectron2_repo
COPY code OWOD
# set FORCE_CUDA because during `docker build` cuda is not accessible
ENV FORCE_CUDA="1"
# This will by default build detectron2 for all common cuda architectures and take a lot more time,
# because inside `docker build`, there is no way to tell which architecture will be used.
ARG TORCH_CUDA_ARCH_LIST="Kepler;Kepler+Tesla;Maxwell;Maxwell+Tegra;Pascal;Volta;Turing"
ENV TORCH_CUDA_ARCH_LIST="${TORCH_CUDA_ARCH_LIST}"

#RUN pip install --user -e detectron2_repo
RUN pip install -e OWOD/src/detectron2

# Set a fixed model cache directory.
ENV FVCORE_CACHE="/tmp"
RUN mkdir /var/run/sshd && \
    echo 'root:99521' | chpasswd && \
    sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd && \
    echo "export VISIBLE=now" >> /etc/profile
EXPOSE 22
ENTRYPOINT ["/usr/sbin/sshd", "-D"]

Then run the build from the directory that contains the Dockerfile:

docker build -t owod:v1 .
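
The default TORCH_CUDA_ARCH_LIST in the Dockerfile compiles for every listed architecture, which slows the build. Because it is declared with ARG, it can be narrowed at build time when the target GPU is known. The following is a minimal sketch, assuming PyTorch is installed on the machine that has the GPU; arch_list_for and detect_capabilities are hypothetical helper names, not part of PyTorch or detectron2.

```python
from typing import Iterable, List, Tuple


def arch_list_for(capabilities: Iterable[Tuple[int, int]]) -> str:
    """Turn (major, minor) compute capabilities into a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def detect_capabilities() -> List[Tuple[int, int]]:
    """Ask PyTorch for the compute capability of each visible GPU (empty if none)."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_capability(index)
        for index in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    arch = arch_list_for(detect_capabilities())
    if arch:
        # Print the build command with the narrowed architecture list.
        print(f'docker build --build-arg TORCH_CUDA_ARCH_LIST="{arch}" -t owod:v1 .')
    else:
        print("No GPU detected; keep the default architecture list.")
```

Whether a given architecture can actually be compiled still depends on the CUDA toolkit in the base image and on the PyTorch version, so treat the printed command as a starting point.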

Container

Once the build succeeds, start a container:

docker run --gpus all -d --shm-size=8gb -p 6792:22 --env="DISPLAY" --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" --volume="/mnt/c/users/aikedaer/Desktop/tjl/detectron2:/detectron2" --name=detectron2 owod:v1

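Compared with the original command, I only added two options:

-p 6792:22 maps host port 6792 to the container's SSH port 22
--volume="/mnt/c/users/aikedaer/Desktop/tjl/detectron2:/detectron2" specifies the host directory to mount into the container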

Reinstall locally

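Suppose I want to install a different detectron2 checkout inside the container. The image was built with an editable install (pip install -e), so running pip install -e . again in the new directory runs into an uninstall problem. I therefore deleted the original detectron2 directory inside the container first and then installed from the new directory.

That still failed, because the GPU architecture was not in the list of architectures supported by PyTorch 1.6. One way out is to ignore the GPU and install anyway:

CUDA_VISIBLE_DEVICES= pip install -e .

This installs successfully, but it only suits CPU-only use. What I really want is an environment with GPU acceleration. To test whether the GPU is truly usable, torch.cuda.is_available() alone is not enough; also run torch.tensor([1]).to("cuda"), and if that raises an error the GPU is not supported.

Setting up this environment is unforgiving at every layer: the GPU architecture must be in the list supported by the installed PyTorch version, and that PyTorch version must in turn be one the detectron2 version depends on. Getting the stack to line up from bottom to top is not easy, and the Dockerfile above went through many revisions.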
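
The two-step GPU test described above can be wrapped in a small script and run inside the container. This is a sketch rather than an official diagnostic: gpu_really_works is a hypothetical name, and it assumes that an unsupported architecture surfaces as a RuntimeError when a kernel is launched. torch.cuda.get_arch_list is looked up defensively because older PyTorch releases may not provide it.

```python
from typing import Tuple


def gpu_really_works() -> Tuple[bool, str]:
    """Return (ok, message); ok is True only if a tensor op actually runs on the GPU."""
    try:
        import torch
    except ImportError:
        return False, "torch is not installed"

    if not torch.cuda.is_available():
        return False, "torch.cuda.is_available() returned False"

    major, minor = torch.cuda.get_device_capability(0)
    # Architectures this PyTorch build was compiled for, if the API exists.
    built_for = getattr(torch.cuda, "get_arch_list", lambda: [])()

    try:
        # Moving a tensor to the device and running a kernel is the real test.
        value = (torch.tensor([1]).to("cuda") + 1).item()
    except RuntimeError as exc:
        return False, f"sm_{major}{minor} GPU, build supports {built_for}: {exc}"
    return value == 2, f"sm_{major}{minor} GPU works; build supports {built_for}"


if __name__ == "__main__":
    ok, message = gpu_really_works()
    print("OK" if ok else "FAILED", "-", message)
```

If the script reports a failure after is_available() returned True, the message shows the GPU's architecture next to the list the PyTorch build supports, which makes a mismatch easy to spot.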

SSH debug

To connect from VS Code, add the following to .ssh/config in your home directory:

Host detectron2
    HostName 127.0.0.1
    Port 6792
    User root

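Then install a specific version of the Python extension from the VS Code extensions panel; note that it is actually installed inside the connected container. After selecting the interpreter, you can set breakpoints and debug in the editor.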

Add ssh to a base image

Dockerfile

FROM fuzz4all/fuzz4all:v3

RUN sed -i 's@/archive.ubuntu.com/@/mirrors.aliyun.com/@g' /etc/apt/sources.list && \
  apt-get update && apt-get install -y openssh-server

ENV FVCORE_CACHE="/tmp"
RUN mkdir /var/run/sshd && \
    echo 'root:99521' | chpasswd && \
    sed -i 's/#PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
    sed 's@session\s*required\s*pam_loginuid.so@session optional pam_loginuid.so@g' -i /etc/pam.d/sshd && \
    echo "export VISIBLE=now" >> /etc/profile
EXPOSE 22
ENTRYPOINT ["/usr/sbin/sshd", "-D"]
