Visual Question Answering and Dialog

Introduction

The goal of this workshop is two-fold. The first is to benchmark progress in Visual Question Answering and Visual Dialog.

Visual Question Answering

There will be three tracks in the Visual Question Answering Challenge this year.

    • TextVQA+TextCaps: There are two subtracks under this track.
Visual Dialog
    • The 3rd edition of the Visual Dialog Challenge will be hosted on the VisDial v1.0 dataset introduced in Das et al., CVPR 2017. The 1st and 2nd editions of the Visual Dialog Challenge were organised on the VisDial v1.0 dataset at ECCV 2018 and CVPR 2019, respectively. Visual Dialog requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image and a dialog history (consisting of the image caption and a sequence of previous questions and answers), the agent has to answer a follow-up question in the dialog.
      Challenge link: https://visualdialog.org/challenge
      Evaluation Server: https://evalai.cloudcv.org/web/challenges/challenge-page/518/overview
      Submission Deadline: May 14, 2020 23:59:59 GMT

The second goal of this workshop is to continue to bring together researchers interested in visually-grounded question answering, dialog systems, and language in general to share state-of-the-art approaches, best practices, and future directions in multi-modal AI. In addition to invited talks from established researchers, we invite submissions of extended abstracts of at most 2 pages describing work in relevant areas, including: Visual Question Answering, Visual Dialog, (textual) Question Answering, (textual) dialog systems, commonsense knowledge, vision + language, etc. The submissions are not specific to any challenge track. All accepted abstracts will be presented as posters at the workshop to disseminate ideas. The workshop will be held on June 14, 2020, at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020.


Schedule

Opening Remarks by Aishwarya Agrawal.
    Authors: Aishwarya Agrawal
    Keywords: W64-601, Aishwarya Agrawal, VQA, Visual Dialog, Visual Question Answering, Opening Remarks
Sunday, June 14
8:55 AM - 9:00 AM

A live panel in which a subset of the invited speakers answer questions, both from the organizers and from the audience.
    Authors: Danna Gurari, Felix Hill, Mateusz Malinowski, Nassim Parvin, Jiasen Lu, Dimosthenis Karatzas
    Keywords: Panel, Live, QA, Discussion, VQA, Visual Dialog, Visual Question Answering, W64-703
Sunday, June 14
9:00 AM - 9:45 AM

Invited Talk by Danna Gurari
    Authors: Danna Gurari
    Keywords: Danna Gurari, VQA, Visual Dialog, Visual Question Answering, W64-101
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Felix Hill
    Authors: Felix Hill
    Keywords: Felix Hill, VQA, Visual Dialog, Visual Question Answering, W64-102
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Douwe Kiela
    Authors: Douwe Kiela
    Keywords: Douwe Kiela, VQA, Visual Dialog, Visual Question Answering, W64-103
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Anna Rohrbach
    Authors: Anna Rohrbach
    Keywords: Anna Rohrbach, VQA, Visual Dialog, Visual Question Answering, video question answering, video fill-in-the-blank, visual question answering, relational reasoning, vision and language, W64-104
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Mateusz Malinowski
    Authors: Mateusz Malinowski
    Keywords: Mateusz Malinowski, VQA, Visual Dialog, Visual Question Answering, W64-105
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Amanpreet Singh
    Authors: Amanpreet Singh
    Keywords: Amanpreet Singh, VQA, Visual Dialog, Visual Question Answering, W64-106
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Ani Kembhavi
    Authors: Ani Kembhavi
    Keywords: Ani Kembhavi, VQA, Visual Dialog, Visual Question Answering, W64-108
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Jiasen Lu
    Authors: Jiasen Lu
    Keywords: Jiasen Lu, VQA, Visual Dialog, Visual Question Answering, W64-109
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Dimosthenis Karatzas
    Authors: Dimosthenis Karatzas
    Keywords: Dimosthenis Karatzas, VQA, Visual Dialog, Visual Question Answering, W64-110
Sunday, June 14
10:00 AM - 10:30 AM

Invited Talk by Zhou Yu
    Authors: Zhou Yu
    Keywords: Zhou Yu, VQA, Visual Dialog, Visual Question Answering, W64-111
Sunday, June 14
10:00 AM - 10:30 AM
This talk provides an overview of the VQA Challenge, announces the winners, and presents analyses of the challenge results.
    Authors: Ayush Shrivastava
    Keywords: VQA, Visual Question Answering, W64-201
Sunday, June 14
12:00 PM - 1:00 PM

We revisit grid features for VQA and use modulated convolutions for counting -- achieving state-of-the-art without vision-and-language pre-training.
    Authors: Duy-Kien Nguyen, Huaizu Jiang, Vedanuij Goswami, Licheng Yu, Xinlei Chen
    Keywords: Visual Question Answering, VQA Challenge, visual counting, image captioning, vision and language, bottom-up attention, grid features, convolutional feature maps, modulated convolutions, W64-202
Sunday, June 14
12:00 PM - 1:00 PM

TextVQA contains questions on images that require reading and reasoning over text. This talk presents results and analysis of the TextVQA Challenge 2020.
    Authors: Amanpreet Singh
    Keywords: VQA, Visual Question Answering, OCR, Vision and Language, text reading, TextVQA, reading comprehension, dataset, challenge, W64-301
Sunday, June 14
12:00 PM - 1:00 PM

Text-based Visual Question Answering (TextVQA) is a recently introduced challenge that requires a machine to read text in images and answer natural language questions.
    Authors: Chenyu Gao, Qi Zhu, Peng Wang, Hui Li, Yuliang Liu, Anton van den Hengel, Qi Wu
    Keywords: TextVQA, SMA, VQA, Visual Question Answering, TextVQA Challenge, W64-302
Sunday, June 14
12:00 PM - 1:00 PM

This talk introduces the VizWiz VQA Challenge 2020 and includes an analysis of this year's submissions along with the announcement of the winners.
    Authors: Danna Gurari, Samreen Anjum
    Keywords: VizWiz, VQA, Visual Question Answering, VizWiz Challenge, W64-401
Sunday, June 14
12:00 PM - 1:00 PM

Our challenge requires a model to recognize text, relate it to its visual context, and decide what part of the text to use to describe a scene correctly.
    Authors: Oleksii Sidorov
    Keywords: Image Captioning, OCR, Vision and Language, Text reading, TextCaps, Reading comprehension, dataset, W64-303
Sunday, June 14
12:00 PM - 1:00 PM

Using CRAFT, ABCNet and a four-stage STR framework, we improve the OCR of the M4C_captioner model and win first place in the TextCaps Challenge 2020.
    Authors: Zhaokai Wang, Renda Bao, Si Liu
    Keywords: image captioning, TextCaps, OCR, TextCaps Challenge, W64-304
Sunday, June 14
12:00 PM - 1:00 PM

Visual Dialog Challenge Talk (Overview, Analysis and Winner Announcement)
    Authors: Vishvak Murahari
    Keywords: Visual Dialog, Visual Dialog Challenge, W64-501
Sunday, June 14
12:00 PM - 1:00 PM
We develop an ensemble model for visual dialog which achieves good results on both MRR (70.2%) and NDCG (72.7%).
    Authors: Idan Schwartz, Prof. Alex Schwing, Prof. Tamir Hazan
    Keywords: Visual Dialog, Visual Dialog Challenge, W64-502
Sunday, June 14
12:00 PM - 1:00 PM

Talk by VQA Challenge Runner-ups: Renaissance@DamoNLP
    Authors: Ming Yan, Chenliang Li, Wei Wang, Bin Bi, Zhongzhou Zhao, Songfang Huang
    Keywords: VQA, Visual Question Answering, VQA Challenge, Visual+Language Pre-training, StructBERT base, Progressive Pre-training, W64-204
Sunday, June 14
12:00 PM - 1:00 PM

We propose a novel sparsely-connected self-attention layer that considers local spatial context and avoids learning redundant features.
    Authors: Yash Kant, Dhruv Batra, Peter Anderson, Alex Schwing, Devi Parikh, Jiasen Lu, Harsh Agrawal
    Keywords: VQA and Dialog Workshop
Sunday, June 14
12:00 PM - 1:00 PM

Our dataset requires a model to recognize text, relate it to its visual context, and decide what part of the text to use to describe a scene correctly.
    Authors: Oleksii Sidorov, Ronghang Hu, Marcus Rohrbach, Amanpreet Singh
    Keywords: VQA and Dialog Workshop, Image Captioning, OCR, Vision and Language, Text reading, TextCaps, Reading comprehension, dataset
Sunday, June 14
12:00 PM - 1:00 PM

We propose Zero-Shot Grounding, where we localize referred objects not seen in training using a single-stage architecture.
    Authors: Arka Sadhu, Kan Chen, Ram Nevatia
    Keywords: VQA and Dialog Workshop, zero-shot, grounding, phrase grounding, dataset, visual genome, single-stage, grid features, object detection
Sunday, June 14
12:00 PM - 1:00 PM

Tricks for training visual dialog models, yielding a 13% improvement in MRR over the base model.
    Authors: Liang Guanlin, Li Wenbin
    Keywords: VQA and Dialog Workshop, Learning2Rank, Two Stage Training, Revised Cross Entropy, New Multi-task Finetuning
Sunday, June 14
12:00 PM - 1:00 PM

VQA models can be used to assist the blind. We analyze the robustness of such models and reveal their shortcomings observed in real world settings.
    Authors: Shaunak Halbe
    Keywords: VQA and Dialog Workshop, visual question answering, interpretability, adversarial, accessibility, multi modal, robustness, attributions, adversarial attacks, nlp
Sunday, June 14
12:00 PM - 1:00 PM

We propose Differentiable First-Order Logic for visual reasoning (VR) and study perception and reasoning in a disentangled fashion in VR tasks.
    Authors: Saeed Amizadeh, Oleksandr Polozov, Hamid Palangi, Yichen Huang, Kazuhito Koishida
    Keywords: Visual Reasoning, Visual Question Answering, GQA, First-Order Logic, Differentiable First-Order Logic, Vision-Reasoning Disentanglement
Sunday, June 14
12:00 PM - 1:00 PM

Elisabot, a personal assistant for reminiscence therapy with your photos. Bring back your memories and fight Alzheimer's with dialogue.
    Authors: Mariona Carós
    Keywords: VQA and Dialog Workshop, chatbot, Reminiscence Therapy, dementia, Alzheimer, dialogue system, visual question generator
Sunday, June 14
12:00 PM - 1:00 PM

We propose a weakly-supervised VAE setting to generate diverse questions conditioned on the answer category using category consistent cyclic training.
    Authors: Sarthak Bhagat
    Keywords: VQA and Dialog Workshop, cycle consistency, multimodal, visual question generation, latent structure
Sunday, June 14
12:00 PM - 1:00 PM

We develop an ensemble model for visual dialog which achieves good results on both MRR (70.2%) and NDCG (72.7%).
    Authors: Idan Schwartz
    Keywords: VQA and Dialog Workshop, visual dialog
Sunday, June 14
12:00 PM - 1:00 PM

We develop and evaluate captioning models that can control length, which can be leveraged to generate captions of different style and descriptiveness.
    Authors: Ruotian Luo, Greg Shakhnarovich
    Keywords: VQA and Dialog Workshop, image captioning, control, length, LSTM, descriptiveness
Sunday, June 14
12:00 PM - 1:00 PM

In this paper, we present our approach to generating visual questions about radiology images, called VQGR.
    Authors: Mourad Sarrouti, Asma Ben Abacha, Dina Demner-Fushman
    Keywords: VQA and Dialog Workshop, Visual Question Generation, Radiology Images, Visual Question Answering, Computer Vision, Natural Language Processing, Data Augmentation
Sunday, June 14
12:00 PM - 1:00 PM

A live panel in which a subset of the invited speakers answer questions, both from the organizers and from the audience.
    Authors: Douwe Kiela, Anna Rohrbach, Amanpreet Singh, Ani Kembhavi, Zhou Yu
    Keywords: Panel, Live, QA, Discussion, VQA, Visual Dialog, Visual Question Answering, W64-704
Sunday, June 14
3:00 PM - 3:45 PM

Closing Remarks by Aishwarya Agrawal.
    Authors: Aishwarya Agrawal
    Keywords: W64-602, Aishwarya Agrawal, VQA, Visual Dialog, Visual Question Answering, Closing Remarks
Sunday, June 14
5:55 PM - 6:00 PM
We add object and attribute semantics to the LXMERT baseline model, then fuse five models with random seeds and different pretrained model weights.
    Authors: VizWiz-VQA Challenge Winner Team
    Keywords: VizWiz, VQA, Visual Question Answering, VizWiz Challenge, W64-402
Monday, June 15
12:00 AM - 1:00 AM

Our teaser shows 20 answers of four defined types produced by our trained model for the corresponding questions.
    Authors: VizWiz-VQA Challenge Runner-up Team
    Keywords: VizWiz, VQA, Visual Question Answering, VizWiz Challenge, W64-403
Monday, June 15
12:00 AM - 1:00 AM

We revisit the bilinear attention networks in VQA from a graph perspective, where the classical bilinear attention networks build a bilinear attention map.
    Authors: Dalu Guo, Chang Xu, Dacheng Tao
    Keywords: VQA, Visual Question Answering, VQA Challenge, Bilinear Graph, Graph Neural Networks, W64-203
Monday, June 15
12:00 AM - 1:00 AM

We propose a novel video understanding task by fusing knowledge-based and video question answering.
    Authors: Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima
    Keywords: VQA and Dialog Workshop, visual question answering, video question answering, external knowledge, language and vision, vqa, videoqa, knowledge-based visual question answering, bert, dataset
Monday, June 15
12:00 AM - 1:00 AM

A practical and simple way to assess the difficulty of visual questions by clustering the entropy of predicted answer distributions.
    Authors: Kento Terao, Toru Tamaki, Bisser Raytchev, Kazufumi Kaneda, Shin'ichi Satoh
    Keywords: VQA and Dialog Workshop
Monday, June 15
12:00 AM - 1:00 AM

VQA method that uses answer embeddings to inject prior knowledge and capture meaning/relations between candidate answers + extensive analysis on GQA.
    Authors: Violetta Shevchenko, Damien Teney, Anthony Dick, Anton van den Hengel
    Keywords: VQA and Dialog Workshop, visual question answering, answer embeddings, word embeddings
Monday, June 15
12:00 AM - 1:00 AM

Method to impose hard constraints based on known relations between training examples (e.g. annotations of equivalent/entailed questions in GQA).
    Authors: Damien Teney, Ehsan Abbasnejad, Anton van den Hengel
    Keywords: VQA and Dialog Workshop
Monday, June 15
12:00 AM - 1:00 AM

Discussion of bad practices with the VQA-CP dataset. New naive baseline that surpasses the SOTA.
    Authors: Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel
    Keywords: VQA and Dialog Workshop
Monday, June 15
12:00 AM - 1:00 AM

Is it possible to develop an "AI Pathologist" to pass the board-certified examination of the American Board of Pathology?
    Authors: Xuehai He, Yichen Zhang, Luntian Moux, Eric P. Xing, Pengtao Xie
    Keywords: Pathology, VQA
Monday, June 15
12:00 AM - 1:00 AM