Free keywords:
Computer Science, Computer Vision and Pattern Recognition, cs.CV
Abstract:
Convolutional networks reach top quality in pixel-level object tracking but
require a large amount of training data (1k–10k) to deliver such results. We
propose a new training strategy which achieves state-of-the-art results across
three evaluation datasets while using 20×–100× less annotated data than
competing methods. Instead of using large training sets in the hope of generalizing
across domains, we generate in-domain training data using the provided
annotation on the first frame of each video to synthesize ("lucid dream")
plausible future video frames. In-domain per-video training data allows us to
train high-quality appearance- and motion-based models, as well as to tune the
post-processing stage. This approach lets us reach competitive results even
when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the tracking task a smaller training set
that is closer to the target domain is more effective. This changes the mindset
regarding how many training samples and how much general "objectness" knowledge are
required for the object tracking task.
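
To make the "lucid dreaming" idea concrete, below is a minimal Python sketch of synthesizing one plausible future frame from a single annotated frame: cut the object out with the mask, move foreground and background independently to mimic object and camera motion, and composite a new (frame, mask) training pair. This is an illustrative simplification, not the authors' released pipeline; the function lucid_dream_pair and its parameters are hypothetical, and the paper's full synthesis is richer (e.g., illumination changes, non-rigid object deformation, and background in-painting).

    import numpy as np
    from scipy import ndimage

    def lucid_dream_pair(image, mask, rng, max_shift=10, max_angle=15):
        # image: (H, W, 3) uint8 first frame; mask: (H, W) {0, 1} annotation.
        # Returns one synthesized (frame, mask) training pair.
        fg_mask = mask.astype(bool)
        # Cut the object out using the first-frame annotation.
        fg = image * fg_mask[..., None]
        # Crude background estimate: fill the object hole with the mean
        # background color (the paper in-paints the background instead).
        bg = image.copy()
        bg[fg_mask] = image[~fg_mask].mean(axis=0)
        # Independent small rigid motions for object and background,
        # mimicking object and camera motion between consecutive frames.
        angle = rng.uniform(-max_angle, max_angle)
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        fg_t = ndimage.shift(ndimage.rotate(fg, angle, reshape=False, order=1),
                             (dy, dx, 0), order=1)
        m_t = ndimage.shift(ndimage.rotate(mask.astype(float), angle,
                                           reshape=False, order=1),
                            (dy, dx), order=1) > 0.5
        bdy, bdx = rng.integers(-max_shift // 2, max_shift // 2 + 1, size=2)
        bg_t = ndimage.shift(bg, (bdy, bdx, 0), order=1, mode='nearest')
        # Composite the moved object over the moved background.
        frame = np.where(m_t[..., None], fg_t, bg_t)
        return frame.astype(image.dtype), m_t.astype(mask.dtype)

    # Example: dream a small in-domain training set from one annotated frame.
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(96, 128, 3), dtype=np.uint8)
    mask = np.zeros((96, 128), dtype=np.uint8)
    mask[30:60, 40:80] = 1
    train_set = [lucid_dream_pair(image, mask, rng) for _ in range(5)]

Each call dreams an independent sample, so a per-video training set of arbitrary size can be grown from the single provided annotation.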