Textual Explanations for Self-Driving Vehicles

Kim, Jinkyu; Rohrbach, Anna; Darrell, Trevor; Canny, John; Akata, Zeynep

Please note that a newer version of this record is available:
https://pure.mpg.de/pubman/item/item_2628606_11

Released

Conference paper

MPG Authors

Rohrbach, Anna
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

Akata, Zeynep
Computer Vision and Multimodal Computing, MPI for Informatics, Max Planck Society;

Citation

Kim, J., Rohrbach, A., Darrell, T., Canny, J., & Akata, Z. (2018). Textual Explanations for Self-Driving Vehicles. In ECCV 2018. Berlin: Springer.

Citation link: https://hdl.handle.net/21.11116/0000-0001-DE86-E

Abstract

Deep neural perception and control networks have become key components of self-driving vehicles. User acceptance is likely to benefit from easy-to-interpret textual explanations which allow end-users to understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller's output, namely rationalizations. We propose a new approach to introspective explanations which consists of two parts. First, we use a visual (spatial) attention model to train a convolutional network end-to-end from images to the vehicle control commands, i.e., acceleration and change of course. The controller's attention identifies image regions that potentially influence the network's output. Second, we use an attention-based video-to-text model to produce textual explanations of model actions. The attention maps of controller and explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment, strong and weak alignment. Finally, we explore a version of our model that generates rationalizations, and compare it with introspective explanations on the same video segments. We evaluate these models on a novel driving dataset with ground-truth human explanations, the Berkeley DeepDrive eXplanation (BDD-X) dataset. Code is available at https://github.com/JinkyuKimUCB/explainable-deep-driving
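The abstract describes a controller with spatial attention and two ways of aligning the explanation model's attention with it. The following is a minimal plain-Python sketch of those ideas, not the authors' implementation (see the linked repository for that). It assumes a conv feature map flattened into L cells, dot-product attention scores (a simplification; other scoring functions are possible), that "strong alignment" means the explainer reuses the controller's attention weights directly, and that "weak alignment" adds a KL-divergence penalty between the two attention maps weighted by a hypothetical coefficient `lam`. All function and variable names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a flat list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features: Sequence[Vector], alpha: Sequence[float]) -> Vector:
    """Weighted sum of per-cell feature vectors (the attended context vector)."""
    dim = len(features[0])
    return [sum(a * f[d] for a, f in zip(alpha, features)) for d in range(dim)]


def spatial_attention(features: Sequence[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Soft attention over the L cells of a flattened conv feature map.

    Assumption: scores are dot products between each cell and a query vector
    standing in for a recurrent hidden state. Returns (alpha, context).
    """
    scores = [sum(x * q for x, q in zip(f, query)) for f in features]
    alpha = softmax(scores)
    return alpha, attend(features, alpha)


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) between two attention distributions over the same cells."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def strong_alignment_context(features: Sequence[Vector], controller_alpha: Sequence[float]) -> Vector:
    """Assumed strong alignment: the explainer attends with the controller's own weights."""
    return attend(features, controller_alpha)


def weak_alignment_loss(caption_loss: float, controller_alpha: Sequence[float],
                        explainer_alpha: Sequence[float], lam: float = 1.0) -> float:
    """Assumed weak alignment: the explainer keeps its own attention map, and a
    KL penalty (weighted by the hypothetical `lam`) pulls it toward the controller's."""
    return caption_loss + lam * kl_divergence(controller_alpha, explainer_alpha)


# Toy usage: 4 spatial cells with 2-dimensional features (made-up numbers).
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
ctrl_alpha, ctrl_ctx = spatial_attention(features, query=[2.0, 0.0])
expl_alpha, _ = spatial_attention(features, query=[0.0, 2.0])
loss = weak_alignment_loss(caption_loss=1.3, controller_alpha=ctrl_alpha,
                           explainer_alpha=expl_alpha, lam=0.5)
print(ctrl_alpha, expl_alpha, loss)
```

The trade-off this illustrates: strong alignment forces the explanation to look exactly where the controller looked, while weak alignment only penalizes divergence, leaving the explainer some freedom to attend elsewhere when that helps it describe the scene.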