hide
Free keywords:
Computer Science, Computer Vision and Pattern Recognition, cs.CV
Abstract:
Convnets have enabled significant progress in pedestrian detection recently,
but there are still open questions regarding suitable architectures and
training data. We revisit CNN design and point out key adaptations, enabling
plain FasterRCNN to obtain state-of-the-art results on the Caltech dataset.
To achieve further improvement from more and better data, we introduce
CityPersons, a new set of person annotations on top of the Cityscapes dataset.
The diversity of CityPersons allows us for the first time to train one single
CNN model that generalizes well over multiple benchmarks. Moreover, with
additional training with CityPersons, we obtain top results using FasterRCNN on
Caltech, improving especially for more difficult cases (heavy occlusion and
small scale) and providing higher localization quality.