Vision and language is an emerging research area that has received a lot of attention. Initial research and applications in this area have been mainly image-focused, such as image captioning, visual question answering, and referring expressions. However, moving beyond static images is essential for vision and language understanding, as videos contain much richer information such as spatio-temporal dynamics and audio signals. Most recently, researchers in both the computer vision and natural language processing communities have been striving to bridge videos and natural language. Popular topics such as video captioning, video question answering, and text-guided video generation fall into this area. We are proposing the first Language & Vision with Applications to Video Understanding workshop at CVPR, with a joint VATEX Video Captioning Challenge and a YouMakeup Video Question Answering Challenge. This workshop aims to gather researchers from multiple domains to form a new video-language community and attract more people to this topic. We will invite several top-tier researchers from this area to present their most recent work, covering video-language topics such as video captioning and video question answering. The invited speakers will present key architectural building blocks and novel algorithms used to solve these tasks.
Please visit our website for more details: https://languageandvision.github.io/
Live session link: https://zoom.us/j/94492956880?pwd=VzVxMzYvQml2M0lseWRaQUw4TFlCZz09