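{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 10.3 Combining Object Detection and Semantic Segmentation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal of object detection is to take an image and locate the main objects in it with bounding boxes. The input is a whole image; the output is a set of bounding boxes, each with a label for the object it contains." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](images/RCNN.jpeg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The best-known algorithm for generating these boxes is R-CNN. R-CNN uses Selective Search to propose candidate regions: it looks at the image through windows of different sizes and, at each size, groups adjacent pixels by texture, color, or intensity to find regions that may contain objects. Its individual components were later improved, step by step, toward a pipeline that is learned end to end:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "R-CNN: https://arxiv.org/abs/1311.2524\n", "\n", "Fast R-CNN: https://arxiv.org/abs/1504.08083\n", "\n", "Faster R-CNN: https://arxiv.org/abs/1506.01497\n", "\n", "Mask R-CNN: https://arxiv.org/abs/1703.06870" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of these, Mask R-CNN extends Faster R-CNN to pixel-level image segmentation." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![MaskRCNN](images/MaskRCNN.jpeg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Mask R-CNN performs segmentation by adding a branch to Faster R-CNN. The new branch outputs a binary mask for each detected region, indicating whether each pixel is part of the object. Because a separate mask is predicted for every detected object, the Mask R-CNN paper calls this instance segmentation. The branch is a fully convolutional network applied on top of the CNN feature map:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](images/Mask.jpeg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the original Faster R-CNN, the regions selected by RoIPool are slightly misaligned with the corresponding regions of the input image, whereas image segmentation requires pixel-level accuracy. The authors therefore adjusted RoIPool so that regions are aligned more precisely; the result is RoIAlign:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](images/RoiAlign.jpeg)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }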