Text and Documents

We suggest that you check the schedule on the workshop's external web page for any last-minute changes.

All times below are PDT. Please click on the talk title or check at the end of this page (below the schedule) for links to more content and the rooms of the different talks.

Time (PDT) | Type | Title / Description | Presenter / Author
8:30 – 8:40 | (LIVE) | Opening Remarks | R. Manmatha, Yuting Zhang, Vijay Mahadevan, Dimosthenis Karatzas
8:40 – 9:35 | Invited Talk | Images with Text: Visually Grounded Reading Comprehension | Marcus Rohrbach
9:35 – 10:30 | Invited Talk | Semantic Reading of Population Records: A Digital Twin of the Past Societies | Josep Lladós
10:30 – 10:50 | Break
10:50 – 11:50 | ORAL SESSION (Q & A with authors during the whole session)
10:50 – 11:00 | READ: Recursive Autoencoders for Document Layout Generation | Akshay Gadi Patil, Omri Ben-Eliezer, Or Perel, Hadar Averbuch-Elor
11:00 – 11:10 | On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention | Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee
11:10 – 11:20 | CLEval: Character-Level Evaluation for Text Detection and Recognition Tasks | Youngmin Baek, Daehyun Nam, Sungrae Park, Junyeop Lee, Seung Shin, Jeonghun Baek, Chae Young Lee, Hwalsuk Lee
11:20 – 11:30 | Recognizing handwritten mathematical expressions via paired dual loss attention network and printed mathematical expressions | Anh Duc Le
11:30 – 11:40 | Visual Parsing with Query-Driven Global Graph Attention (QD-GGA): Preliminary Results for Handwritten Math Formula Recognition | Mahshad Mahdavi, Richard Zanibbi
11:40 – 11:50 | CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents | Devashish Krishna Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, Kavita A Sultanpure
11:50 – 13:20 | POSTER SESSION (Q & A with authors during the whole session)
Poster paper #1 | A method for detecting text of arbitrary shapes in natural scenes that improves text spotting | Qitong Wang, YI ZHENG, Margrit Betke
Poster paper #2 | Textual Visual Semantic Dataset for Text Spotting | Ahmed A Sabir, Francesc Moreno, Lluís Padró
Poster paper #3 | A Large Dataset of Historical Japanese Documents with Complex Layouts | Zejiang Shen, Kaixuan Zhang, Melissa Dell
Poster paper #4 | An Accurate Segmentation-Based Scene Text Detector with Context Attention and Repulsive Text Border | Xi Liu, Gaojing Zhou, Rui Zhang, Xiaolin Wei
Poster paper #5 | Illegible Text to Readable Text: An Image-to-Image Transformation using Conditional Sliced Wasserstein Adversarial Networks | Mostafa Karimi, Gopalkrishna Veni, Yen-Yun Yu
Poster paper #6 | Optical Braille Recognition Based on Semantic Segmentation Network with Auxiliary Learning Strategy | Renqiang Li, Hong Liu, Xiangdong Wang, Jianxing Xu, Yueliang Qian
Poster paper #7 | Font-ProtoNet: Prototypical Network based Font Identification of Document Images in Low Data Regime | Nikita Goel, Monika Sharma, Lovekesh Vig
Poster paper #8 | Information Extraction from Document Images via FCA based Template Detection and Knowledge Graph Rule Induction | Mouli Rastogi, Afshan Syed, Mrinal Rawat, Lovekesh Vig, Puneet Agarwal, Gautam Shroff, Ashwin Srinivasan
Poster paper #9 | An OCR for Classical Indic Documents Containing Arbitrarily Long Words | Agam Dwivedi, Rohit Saluja, Ravi Kiran Sarvadevabhatla
Poster paper #10 | Visual and Textual Deep Feature Fusion for Document Image Classification | Souhail Bakkali, Zuheng MING, Mickael Coustaty, Marçal Rusiñol
Poster paper #11 | Symbol Spotting on Digital Architectural Floor Plans Using a Deep Learning-based Framework | Alireza Rezvanifar, Melissa Cote, Alexandra Branzan Albu
13:20 – 13:50 | Long Break
13:50 – 14:45 | Invited Talk | Gaining a Deeper Visual Understanding of Documents | Brian Price
14:45 – 15:30 | (LIVE) | Panel session | Marcus Rohrbach, Josep Lladós, Brian Price
15:30 – 15:50 | Break
15:50 – 17:30 | DocVQA CHALLENGE SESSION (Q & A with authors during the live discussion)
15:50 – 16:05 | Intro and Overview of Task 1 (Dataset, Results, Analysis of the results) | Minesh Mathew
16:05 – 16:15 | PingAn-OneConnect-Gammalab-DQA | Han Qiu, Guoqiang Xu, Chenjie Cao, Chao Gao, Dexun Wang, Fengxin Yang, Xiao Xie, Yu Qiu, Ziqi Zheng
16:15 – 16:25 | Structural LM-v2 | Structural LM Team
16:25 – 16:35 | QA_Base_MRC_1 | Yu Di Chen, You Hui Guo, Gangyan Zeng, Jian Jian Cao, Qi Ming Peng, Sijin Wu
16:35 – 16:45 | Overview of Task 2 (Dataset, Results, Analysis of the results) | Ruben Tito Perez
16:45 – 16:50 | PingAn-OneConnect-Gammalab-DQA | Han Qiu, Guoqiang Xu, Chenjie Cao, Chao Gao, Dexun Wang, Fengxin Yang, Xiao Xie, Yu Qiu, Ziqi Zheng
16:50 – 17:00 | iFLYTEK-DOCR | Chenyu Liu, Fengren Wang, Jiajia Wu, Jinshui Hu, Bing Yin, Cong Liu
17:00 – 17:30 | (LIVE) | Discussion and Awards ceremony | Minesh Mathew, Ruben Tito Perez, R. Manmatha, C.V. Jawahar, D. Karatzas
17:30 – 17:40 | (LIVE) | Closing and best paper award | R. Manmatha, Yuting Zhang, Vijay Mahadevan, Dimosthenis Karatzas

 

 

The goal of this workshop is to raise awareness of text and document analysis in the broader computer vision community.
    Authors: R. Manmatha, Yuting Zhang, Vijay Mahadevan, Dimosthenis Karatzas
    Keywords: Document analysis, scene text, OCR, table extraction, character recognition, Scene text Visual Question Answering, Document Visual Question Answering, Handwriting recognition, Signature verification, Graphics recognition
Mon Jun 15, 8:30 AM - 8:40 AM

Images with text are a challenging multimodal problem. The talk offers insights from our datasets and models into how to better understand them.
    Authors: Marcus Rohrbach
Mon Jun 15, 8:40 AM - 9:35 AM

Mon Jun 15, 9:35 AM - 10:30 AM

We present READ, a Recursive Variational Autoencoder Network to generate document layouts. (Top): Skeleton of our training pipeline (Bottom): Novel an
    Authors: Akshay Gadi Patil, Omri Ben-Eliezer, Or Perel, Hadar Averbuch-Elor
    Keywords: Document Structure, Layout Generation, Generative Neural Networks, Recursive Neural Networks, Variational Autoencoder
Mon Jun 15, 10:50 AM - 11:50 AM

This paper introduces an architecture for recognizing texts of arbitrary shapes. Our model utilizes self-attention to describe 2D spatial dependenc
    Authors: Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee
    Keywords: Scene Text Recognition, Self Attention, 2D Attention, Transformer, OCR
Mon Jun 15, 10:50 AM - 11:50 AM

We propose a Character-Level Evaluation metric for text detection and recognition tasks. CLEval provides a comprehensive and fine assessment.
    Authors: Youngmin Baek, Daehyun Nam, Sungrae Park, Junyeop Lee, Seung Shin, Jeonghun Baek, Chae Young Lee, Hwalsuk Lee
    Keywords: end-to-end evaluation, text detection, text recognition, fine assessment, OCR, character level, CLEval
Mon Jun 15, 10:50 AM - 11:50 AM

We have proposed the paired dual loss attention to recognize handwritten mathematical expressions (MEs), which could learn semantic invariant features
    Authors: Anh Duc Le
    Keywords: paired dual loss attention network, recognition of handwritten mathematical expressions, context matching, attention-based encoder-decoder, paired samples, printed samples, domain adaption, semantic invariant features, CROHME dataset, latex corpus
Mon Jun 15, 10:50 AM - 11:50 AM

We present a new visual parsing method based on convolutional neural networks for handwritten mathematical formulas. The Query-Driven Global Graph Att
    Authors: Mahshad Mahdavi, Richard Zanibbi
    Keywords: math parsing, multitask learning, graph attention, handwritten recognition, visual parsing, CNN, graph parsing
Mon Jun 15, 10:50 AM - 11:50 AM

A state-of-the-art, end-to-end approach for table detection and table structure recognition in document images using a single CNN model.
    Authors: Devashish Krishna Prasad, Ayan Gadpal, Kshitij Kapadni, Manish Visave, Kavita A Sultanpure
    Keywords: Table Detection, Table Recognition, Table Structure Recognition, Table Segmentation, Cascade-RCNN, Cascade-Mask-RCNN, HRNet, Document Analysis, Tabular data extraction, Table extraction
Mon Jun 15, 10:50 AM - 11:50 AM

UHT effectively extracts text polygons from scene images, using only region information instead of geometric information.
    Authors: Qitong Wang, YI ZHENG, Margrit Betke
    Keywords: scene text detection, computer vision, text spotting, deep learning
Mon Jun 15, 11:50 AM - 1:20 PM

Text recognition in unrestricted images can be improved using image visual context.
    Authors: Ahmed A Sabir, Francesc Moreno, Lluís Padró
    Keywords: Dataset, Text spotting, OCR, Semantic similarity, Visual Semantic
Mon Jun 15, 11:50 AM - 1:20 PM

The dataset was created to support research on layout analysis methods for historical documents, especially for eastern languages.
    Authors: Zejiang Shen, Kaixuan Zhang, Melissa Dell
    Keywords: Layout Analysis, Dataset, Layout Object Detection
Mon Jun 15, 11:50 AM - 1:20 PM

An accurate encoder-decoder framework with context attention and repulsive link. Experiments on benchmarks validate the superiority of our method.
    Authors: Xi Liu, Gaojing Zhou, Rui Zhang, Xiaolin Wei
    Keywords: text detection, segmentation, context attention, repulsive link, text border
Mon Jun 15, 11:50 AM - 1:20 PM

(a) Overall pipeline for HW2MP-GAN (b) Joint attention handwriting recognition reinforced by HW2MP-GAN (c) Examples of generated machine-print images.
    Authors: Mostafa Karimi, Gopalkrishna Veni, Yen-Yun Yu
    Keywords: Handwriting recognition, Sliced Wasserstein GANs, Image-to-Image translation
Mon Jun 15, 11:50 AM - 1:20 PM

In this paper, we propose an optimal semantic segmentation framework BraUNet to directly detect and recognize Braille characters in the whole origina
    Authors: Renqiang Li, Hong Liu, Xiangdong Wang, Jianxing Xu, Yueliang Qian
    Keywords: Optical Braille recognition, Braille character recognition, Semantic segmentation, Double-sided Braille, U-Net
Mon Jun 15, 11:50 AM - 1:20 PM

Proposed approach and experimental results on the FewShot-FontID and AdobeVFR datasets, along with a comparison between the meta-learning technique and the baseline.
    Authors: Nikita Goel, Monika Sharma, Lovekesh Vig
    Keywords: Font Identification, Prototypical Network, Meta-training, Meta-testing, Few-shot Learning, Euclidean Distance, t-SNE visualization, Character images
Mon Jun 15, 11:50 AM - 1:20 PM

Presenting an end-to-end system for information extraction from streaming document images using one-shot rule induction and template allocation.
    Authors: Mouli Rastogi, Afshan Syed, Mrinal Rawat, Lovekesh Vig, Puneet Agarwal, Gautam Shroff, Ashwin Srinivasan
    Keywords: Information Extraction, Template Matching, Structural Similarity, One Shot Learning, Rule Induction, Formal Concept Analysis, Document Processing, Knowledge Graphs
Mon Jun 15, 11:50 AM - 1:20 PM

Datasets (real, synthetic) and a CNN-LSTM Attention OCR for printed classical Indic documents containing very long words.
    Authors: Agam Dwivedi, Rohit Saluja, Ravi Kiran Sarvadevabhatla
    Keywords: OCR, Indic, Synthetic data, Line
Mon Jun 15, 11:50 AM - 1:20 PM

In this paper, we propose a hybrid cross-modal deep network to perform end-to-end document image classification. Our network learns simultaneously f
    Authors: Souhail Bakkali, Zuheng MING, Mickael Coustaty, Marçal Rusiñol
    Keywords: Text document image classification, Cross-Modal feature learning, Deep neural networks, Static word embeddings, Contextualized dynamic word embeddings, Feature fusion mechanisms
Mon Jun 15, 11:50 AM - 1:20 PM

Our deep learning method for symbol spotting on architectural floor plans addresses occlusion, clutter and intra-class graphical notation variability.
    Authors: Alireza Rezvanifar, Melissa Cote, Alexandra Branzan Albu
    Keywords: Symbol spotting, symbol recognition, architectural floor plans, document image analysis, region proposal networks, YOLO, convolutional neural networks, SESYD dataset
Mon Jun 15, 11:50 AM - 1:20 PM

Mon Jun 15, 1:50 PM - 2:45 PM

    Authors: Marcus Rohrbach, Josep Lladós, Brian Price
Mon Jun 15, 2:45 PM - 3:30 PM

Introduces a new dataset for VQA on document images. The dataset has 50,000 questions defined over 12,000+ images.
    Authors: Minesh Mathew
    Keywords: vqa, document vqa, document understanding, multimodal, question answering, visual question answering
Mon Jun 15, 3:50 PM - 4:05 PM

We train a DB model to detect words and a TPS-ResNet-BiLSTM-Attention model to recognize words. We pretrain a 2d-position embedding model.
    Authors: Han Qiu, Guoqiang Xu, Chenjie Cao, Chao Gao, Dexun Wang, Fengxin Yang, Xiao Xie, Yu Qiu, Ziqi Zheng
    Keywords: OCR, position embedding, pretrain, language model
Mon Jun 15, 4:05 PM - 4:15 PM

A structural language model to solve the DocVQA task.
    Authors: Structural LM Team
    Keywords: structural lm
Mon Jun 15, 4:15 PM - 4:25 PM

    Authors: Yu Di Chen, You Hui Guo, Gangyan Zeng, Jian Jian Cao, Qi Ming Peng, Sijin Wu
Mon Jun 15, 4:25 PM - 4:35 PM

DocVQA Task 2: a Visual Question Answering task in which questions are posed over a collection of documents instead of a single image.
    Authors: Ruben Tito Perez
    Keywords: DocVQA, VQA, Documents, Retrieval, RetrievalVQA, DocVQATask2, CV, AI
Mon Jun 15, 4:35 PM - 4:45 PM

By treating this problem as a retrieval task, we presented DOCument OCR Retrieval (DOCR). Our method consists of three building blocks: 1) layout &
    Authors: Chenyu Liu, Fengren Wang, Jiajia Wu, Jinshui Hu, Bing Yin, Cong Liu
    Keywords: retrieval, document, sequence parsing, POS tagging, fuzzy search, QA
Mon Jun 15, 4:50 PM - 5:00 PM

    Authors: Minesh Mathew, Ruben Tito Perez, R. Manmatha, C.V. Jawahar, D. Karatzas
Mon Jun 15, 5:00 PM - 5:30 PM

    Authors: R. Manmatha, Yuting Zhang, Vijay Mahadevan, Dimosthenis Karatzas
Mon Jun 15, 5:30 PM - 5:40 PM