Business Document Information Extraction: Towards Practical Benchmarks

  • Conference paper
  • Published in: Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2022)

Abstract

Information extraction from semi-structured documents is crucial for frictionless business-to-business (B2B) communication. While machine learning problems related to Document Information Extraction (IE) have been studied for decades, many common problem definitions and benchmarks do not reflect domain-specific aspects and practical needs for automating B2B document communication. We review the landscape of Document IE problems, datasets, and benchmarks. We highlight the practical aspects missing in the common definitions and define the Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR) problems. There is a lack of relevant datasets and benchmarks for Document IE on semi-structured business documents, as their content is typically legally protected or sensitive. We discuss potential sources of available documents, including synthetic data.
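
To make the two task definitions concrete, the sketch below shows one plausible shape for KILE and LIR outputs: KILE returns key fields localized on the page, while LIR additionally groups fields into table line items. All class and field names here are our own illustrative assumptions, not an interface defined by the paper.

```python
# Illustrative (hypothetical) output structures for the KILE and LIR tasks.
# Field types and coordinates are examples only.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Field:
    fieldtype: str                            # e.g. "total_amount", "item_quantity"
    value: str                                # extracted text value
    bbox: Tuple[float, float, float, float]   # (x0, y0, x1, y1) location on the page
    page: int = 0


@dataclass
class LineItem:
    fields: List[Field] = field(default_factory=list)  # fields of one table row


# KILE output: a flat list of localized key fields.
kile_output = [Field("total_amount", "128.40", (410.0, 700.5, 470.0, 712.0))]

# LIR output: line items, each grouping the fields belonging to one row.
lir_output = [
    LineItem([
        Field("item_quantity", "2", (40.0, 300.0, 55.0, 312.0)),
        Field("item_amount", "64.20", (420.0, 300.0, 470.0, 312.0)),
    ])
]
```

The key difference the sketch captures is that LIR is not just extraction: fields must also be assigned to the correct line item (table row).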

Notes

  1.

    The term semi-structured documents is commonly used with different meanings: some use it for text files containing semi-structured data [94], such as XML files. We use the term to refer to visually rich documents without a fixed layout [66].

  2.

    For some datasets, only the annotations are available, without the original PDFs/images.

  3.

    Some of the datasets are subsets: FUNSD [27] ⊂ RVL-CDIP [30] ⊂ IIT-CDIP [44].

  4.

    A large proportion of the UCSF Industry Documents Library consists of old documents, often typewritten, which presents a domain shift w.r.t. today’s documents.

  5.

    Automated crawling of the site is not allowed: https://www.sec.gov/os/accessing-edgar-data.

  6.

    CC-MAIN-2022-05 contains almost 3 billion documents, out of which 0.84% are PDFs [89]; however, most of these are not semi-structured business documents.
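
The scale implied by note 6 can be checked with a quick back-of-the-envelope computation (a sketch; the 3-billion document count is an approximation of the archive size reported in [89]):

```python
# Rough estimate for note 6: number of PDFs in the CC-MAIN-2022-05
# Common Crawl archive, assuming ~3 billion documents of which 0.84%
# have a PDF MIME type [89]. Both inputs are approximate.
total_docs = 3_000_000_000  # approximate archive size
pdf_share = 0.84 / 100      # PDF share of MIME types
pdf_count = int(total_docs * pdf_share)
print(f"~{pdf_count:,} PDFs")  # ~25,200,000 PDFs
```

Even under these rough assumptions, tens of millions of PDFs are available, which underlines the paper's point: the bottleneck is not raw document volume but the scarcity of semi-structured *business* documents among them.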

References

  1. Antonacopoulos, A., Bridson, D., Papadopoulos, C., Pletschacher, S.: A realistic dataset for performance evaluation of document layout analysis. In: Proceedings of ICDAR, pp. 296–300. IEEE (2009)

  2. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: Proceedings of the IEEE/CVF CVPR, pp. 9365–9374 (2019)

  3. Baviskar, D., Ahirrao, S., Kotecha, K.: Multi-layout invoice document dataset (MIDD): a dataset for named entity recognition. Data (2021). https://doi.org/10.3390/data6070078

  4. Bensch, O., Popa, M., Spille, C.: Key information extraction from documents: evaluation and generator. In: Abbès, S.B., et al. (eds.) Proceedings of DeepOntoNLP and X-SENTIMENT. CEUR Workshop Proceedings, vol. 2918, pp. 47–53. CEUR-WS.org (2021)

  5. Berge, J.: The EDIFACT Standards. Blackwell Publishers, Inc. (1994)

  6. Borchmann, Ł., et al.: DUE: End-to-end document understanding benchmark. In: Proceedings of NeurIPS (2021)

  7. Bosak, J., McGrath, T., Holman, G.K.: Universal business language v2.0. Organization for the Advancement of Structured Information Standards (OASIS), Standard (2006)

  8. Cesarini, F., Francesconi, E., Gori, M., Soda, G.: Analysis and understanding of multi-class invoices. Doc. Anal. Recogn. 6(2), 102–114 (2003)

  9. Chaudhry, R., Shekhar, S., Gupta, U., Maneriker, P., Bansal, P., Joshi, A.: LEAF-QA: locate, encode & attend for figure question answering. In: Proceedings of WACV, pp. 3501–3510. IEEE (2020). https://doi.org/10.1109/WACV45572.2020.9093269

  10. Chen, L., et al.: WebSRC: a dataset for web-based structural reading comprehension. CoRR (2021)

  11. Chen, W., Chang, M., Schlinger, E., Wang, W.Y., Cohen, W.W.: Open question answering over tables and text. In: Proceedings of ICLR (2021)

  12. Chen, W., et al.: TabFact: a large-scale dataset for table-based fact verification. In: Proceedings of ICLR (2020)

  13. Chen, W., Zha, H., Chen, Z., Xiong, W., Wang, H., Wang, W.Y.: HybridQA: a dataset of multi-hop question answering over tabular and textual data. In: Cohn, T., He, Y., Liu, Y. (eds.) Findings of the Association for Computational Linguistics: EMNLP. Findings of ACL, vol. EMNLP 2020, pp. 1026–1036. Association for Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.findings-emnlp.91

  14. Cho, M., Amplayo, R.K., Hwang, S., Park, J.: Adversarial TableQA: attention supervision for question answering on tables. In: Zhu, J., Takeuchi, I. (eds.) Proceedings of ACML. Proceedings of Machine Learning Research, vol. 95, pp. 391–406 (2018)

  15. Clausner, C., Antonacopoulos, A., Pletschacher, S.: ICDAR 2019 competition on recognition of documents with complex layouts – RDCL2019. In: Proceedings of ICDAR, pp. 1521–1526. IEEE (2019)

  16. Cristani, M., Bertolaso, A., Scannapieco, S., Tomazzoli, C.: Future paradigms of automated processing of business documents. Int. J. Inf. Manag. 40, 67–75 (2018)

  17. d’Andecy, V.P., Hartmann, E., Rusinol, M.: Field extraction by hybrid incremental and a-priori structural templates. In: 2018 13th IAPR International Workshop on Document Analysis Systems (DAS), pp. 251–256. IEEE (2018)

  18. Deng, Y., Rosenberg, D.S., Mann, G.: Challenges in end-to-end neural scientific table recognition. In: Proceedings of ICDAR, pp. 894–901. IEEE (2019). https://doi.org/10.1109/ICDAR.2019.00148

  19. Denk, T.I., Reisswig, C.: BERTgrid: contextualized embedding for 2D document representation and understanding. arXiv preprint arXiv:1909.04948 (2019)

  20. Dhakal, P., Munikar, M., Dahal, B.: One-shot template matching for automatic document data capture. In: Proceedings of Artificial Intelligence for Transforming Business and Society (AITB), vol. 1, pp. 1–6. IEEE (2019)

  21. Directive 2014/55/EU of the European parliament and of the council on electronic invoicing in public procurement, April 2014. https://eur-lex.europa.eu/eli/dir/2014/55/oj

  22. Fang, J., Tao, X., Tang, Z., Qiu, R., Liu, Y.: Dataset, ground-truth and performance metrics for table detection evaluation. In: Blumenstein, M., Pal, U., Uchida, S. (eds.) Proceedings of IAPR International Workshop on Document Analysis Systems, DAS, pp. 445–449. IEEE (2012). https://doi.org/10.1109/DAS.2012.29

  23. Ford, G., Thoma, G.R.: Ground truth data for document image analysis. In: Symposium on Document Image Understanding and Technology, pp. 199–205. Citeseer (2003)

  24. Gao, L., Yi, X., Jiang, Z., Hao, L., Tang, Z.: ICDAR2017 competition on page object detection. In: Proceedings of ICDAR, pp. 1417–1422 (2017). https://doi.org/10.1109/ICDAR.2017.231

  25. Garncarek, Ł, et al.: LAMBERT: layout-aware language modeling for information extraction. In: Lladós, J., Lopresti, D., Uchida, S. (eds.) ICDAR 2021. LNCS, vol. 12821, pp. 532–547. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86549-8_34

  26. Göbel, M.C., Hassan, T., Oro, E., Orsi, G.: ICDAR 2013 table competition. In: Proceedings of ICDAR, pp. 1449–1453. IEEE Computer Society (2013). https://doi.org/10.1109/ICDAR.2013.292

  27. Jaume, G., Ekenel, H.K., Thiran, J.P.: FUNSD: a dataset for form understanding in noisy scanned documents. In: ICDAR-OST (2019, accepted)

  28. Hamad, K.A., Mehmet, K.: A detailed analysis of optical character recognition technology. Int. J. Appl. Math. Electron. Comput. 1(Special Issue-1), 244–249 (2016)

  29. Hamza, H., Belaïd, Y., Belaïd, A.: Case-based reasoning for invoice analysis and recognition. In: Weber, R.O., Richter, M.M. (eds.) ICCBR 2007. LNCS (LNAI), vol. 4626, pp. 404–418. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-74141-1_28

  30. Harley, A.W., Ufkes, A., Derpanis, K.G.: Evaluation of deep convolutional nets for document image classification and retrieval. In: International Conference on Document Analysis and Recognition (ICDAR) (2015)

  31. He, S., Schomaker, L.: Beyond OCR: multi-faceted understanding of handwritten document characteristics. Pattern Recogn. 63, 321–333 (2017)

  32. Holeček, M.: Learning from similarity and information extraction from structured documents. Int. J. Doc. Anal. Recogn. (IJDAR) 1–17 (2021)

  33. Holeček, M., Hoskovec, A., Baudiš, P., Klinger, P.: Table understanding in structured documents. In: 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), vol. 5, pp. 158–164. IEEE (2019)

  34. Holt, X., Chisholm, A.: Extracting structured data from invoices. In: Proceedings of the Australasian Language Technology Association Workshop 2018, pp. 53–59 (2018)

  35. Huang, Z., et al.: ICDAR2019 competition on scanned receipt OCR and information extraction. In: Proceedings of ICDAR, pp. 1516–1520. IEEE (2019). https://doi.org/10.1109/ICDAR.2019.00244

  36. Islam, N., Islam, Z., Noor, N.: A survey on optical character recognition system. arXiv preprint arXiv:1710.05703 (2017)

  37. Jiang, J.: Information extraction from text. In: Aggarwal, C., Zhai, C. (eds.) Mining Text Data, pp. 11–41. Springer, Cham (2012). https://doi.org/10.1007/978-1-4614-3223-4_2

  38. Jobin, K.V., Mondal, A., Jawahar, C.V.: DocFigure: a dataset for scientific document figure classification. In: 13th IAPR International Workshop on Graphics Recognition, GREC@ICDAR, pp. 74–79. IEEE (2019). https://doi.org/10.1109/ICDARW.2019.00018

  39. Kardas, M., et al.: AxCell: automatic extraction of results from machine learning papers. arXiv preprint arXiv:2004.14356 (2020)

  40. Katti, A.R., et al.: Chargrid: towards understanding 2D documents. arXiv preprint arXiv:1809.08799 (2018)

  41. Krieger, F., Drews, P., Funk, B., Wobbe, T.: Information extraction from invoices: a graph neural network approach for datasets with high layout variety. In: Ahlemann, F., Schütte, R., Stieglitz, S. (eds.) WI 2021. LNISO, vol. 47, pp. 5–20. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86797-3_1

  42. Kumar, A., et al.: Ask me anything: dynamic memory networks for natural language processing. In: Balcan, M., Weinberger, K.Q. (eds.) Proceedings of ICML, vol. 48, pp. 1378–1387. JMLR.org (2016)

  43. Lample, G., Ballesteros, M., Subramanian, S., Kawakami, K., Dyer, C.: Neural architectures for named entity recognition. In: Knight, K., Nenkova, A., Rambow, O. (eds.) Proceedings of NAACL HLT, pp. 260–270 (2016). https://doi.org/10.18653/v1/n16-1030

  44. Lewis, D., Agam, G., Argamon, S., Frieder, O., Grossman, D., Heard, J.: Building a test collection for complex document information processing. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 665–666 (2006)

  45. Li, J., Wang, S., Wang, Y., Tang, Z.: Synthesizing data for text recognition with style transfer. Multimed. Tools Appl. 78(20), 29183–29196 (2019)

  46. Li, M., Cui, L., Huang, S., Wei, F., Zhou, M., Li, Z.: TableBank: table benchmark for image-based table detection and recognition. In: Calzolari, N., et al. (eds.) Proceedings of The 12th Language Resources and Evaluation Conference, LREC. pp. 1918–1925 (2020)

  47. Liu, W., Zhang, Y., Wan, B.: Unstructured document recognition on business invoice. Machine Learning, Stanford iTunes University, Stanford, CA, USA, Technical report (2016)

  48. Majumder, B.P., Potti, N., Tata, S., Wendt, J.B., Zhao, Q., Najork, M.: Representation learning for information extraction from form-like documents. In: Jurafsky, D., Chai, J., Schluter, N., Tetreault, J.R. (eds.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL, pp. 6495–6504 (2020). https://doi.org/10.18653/v1/2020.acl-main.580

  49. Mathew, M., Bagal, V., Tito, R., Karatzas, D., Valveny, E., Jawahar, C.: InfographicVQA. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1697–1706 (2022)

  50. Mathew, M., Karatzas, D., Jawahar, C.V.: DocVQA: a dataset for VQA on document images. In: Proceedings of WACV, pp. 2199–2208. IEEE (2021). https://doi.org/10.1109/WACV48630.2021.00225

  51. McCann, B., Keskar, N.S., Xiong, C., Socher, R.: The natural language decathlon: multitask learning as question answering. CoRR (2018)

  52. Meadows, B., Seaburg, L.: Universal business language 1.0. Organization for the Advancement of Structured Information Standards (OASIS) (2004)

  53. Medvet, E., Bartoli, A., Davanzo, G.: A probabilistic approach to printed document understanding. Int. J. Doc. Anal. Recogn. 14(4), 335–347 (2011). https://doi.org/10.1007/s10032-010-0137-1

  54. Memon, J., Sami, M., Khan, R.A., Uddin, M.: Handwritten optical character recognition (OCR): a comprehensive systematic literature review (SLR). IEEE Access 8, 142642–142668 (2020)

  55. Methani, N., Ganguly, P., Khapra, M.M., Kumar, P.: PlotQA: reasoning over scientific plots. In: Proceedings of WACV, pp. 1516–1525 (2020). https://doi.org/10.1109/WACV45572.2020.9093523

  56. Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Lingvisticæ Investigationes, pp. 3–26 (2007). https://doi.org/10.1075/li.30.1.03nad

  57. Nassar, A., Livathinos, N., Lysak, M., Staar, P.W.J.: TableFormer: table structure understanding with transformers. CoRR abs/2203.01017 (2022). https://doi.org/10.48550/arXiv.2203.01017

  58. Nayef, N., et al.: ICDAR 2019 robust reading challenge on multi-lingual scene text detection and recognition – RRC-MLT-2019. In: Proceedings of ICDAR, pp. 1582–1587. IEEE (2019)

  59. Palm, R.B., Laws, F., Winther, O.: Attend, copy, parse: end-to-end information extraction from documents. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 329–336. IEEE (2019)

  60. Palm, R.B., Winther, O., Laws, F.: CloudScan - a configuration-free invoice analysis system using recurrent neural networks. In: Proceedings of ICDAR, pp. 406–413. IEEE (2017). https://doi.org/10.1109/ICDAR.2017.74

  61. Park, S., et al.: CORD: a consolidated receipt dataset for post-OCR parsing. In: Workshop on Document Intelligence at NeurIPS 2019 (2019)

  62. Prasad, D., Gadpal, A., Kapadni, K., Visave, M., Sultanpure, K.: CascadeTabNet: an approach for end to end table detection and structure recognition from image-based documents. In: Proceedings of CVPRw, pp. 2439–2447 (2020). https://doi.org/10.1109/CVPRW50498.2020.00294

  63. Qasim, S.R., Mahmood, H., Shafait, F.: Rethinking table recognition using graph neural networks. In: Proceedings of ICDAR, pp. 142–147. IEEE (2019). https://doi.org/10.1109/ICDAR.2019.00031

  64. Raffel, C., et al.: Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683 (2019)

  65. Rastogi, M., et al.: Information extraction from document images via FCA based template detection and knowledge graph rule induction. In: Proceedings of CVPRw, pp. 2377–2385 (2020). https://doi.org/10.1109/CVPRW50498.2020.00287

  66. Riba, P., Dutta, A., Goldmann, L., Fornés, A., Ramos, O., Lladós, J.: Table detection in invoice documents by graph neural networks. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 122–127. IEEE (2019)

  67. Rusinol, M., Benkhelfallah, T., Poulain d’Andecy, V.: Field extraction from administrative documents by incremental structural templates. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 1100–1104. IEEE (2013)

  68. Schreiber, S., Agne, S., Wolf, I., Dengel, A., Ahmed, S.: DeepDeSRT: deep learning for detection and structure recognition of tables in document images. In: Proceedings of ICDAR, pp. 1162–1167 (2017). https://doi.org/10.1109/ICDAR.2017.192

  69. Schuster, D., et al.: Intellix – end-user trained information extraction for document archiving. In: 2013 12th International Conference on Document Analysis and Recognition, pp. 101–105. IEEE (2013)

  70. Shahab, A., Shafait, F., Kieninger, T., Dengel, A.: An open approach towards the benchmarking of table structure recognition systems. In: Doermann, D.S., Govindaraju, V., Lopresti, D.P., Natarajan, P. (eds.) The Ninth IAPR International Workshop on Document Analysis Systems, DAS, pp. 113–120 (2010). https://doi.org/10.1145/1815330.1815345

  71. Siegel, N., Lourie, N., Power, R., Ammar, W.: Extracting scientific figures with distantly supervised neural networks. In: Chen, J., Gonçalves, M.A., Allen, J.M., Fox, E.A., Kan, M., Petras, V. (eds.) Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries, JCDL, pp. 223–232 (2018). https://doi.org/10.1145/3197026.3197040

  72. Smith, R.: An overview of the tesseract OCR engine. In: Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 629–633. IEEE (2007)

  73. Stanisławek, T., et al.: Kleister: key information extraction datasets involving long documents with complex layouts. In: Lladós, J., Lopresti, D., Uchida, S. (eds.) ICDAR 2021. LNCS, vol. 12821, pp. 564–579. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-86549-8_36

  74. Stockerl, M., Ringlstetter, C., Schubert, M., Ntoutsi, E., Kriegel, H.P.: Online template matching over a stream of digitized documents. In: Proceedings of the 27th International Conference on Scientific and Statistical Database Management, pp. 1–12 (2015)

  75. Stray, J., Svetlichnaya, S.: DeepForm: extract information from documents (2020). https://wandb.ai/deepform/political-ad-extraction, benchmark

  76. Sun, H., Kuang, Z., Yue, X., Lin, C., Zhang, W.: Spatial dual-modality graph reasoning for key information extraction. arXiv preprint arXiv:2103.14470 (2021)

  77. Sunder, V., Srinivasan, A., Vig, L., Shroff, G., Rahul, R.: One-shot information extraction from document images using neuro-deductive program synthesis. arXiv preprint arXiv:1906.02427 (2019)

  78. Tensmeyer, C., Morariu, V.I., Price, B., Cohen, S., Martinez, T.: Deep splitting and merging for table structure decomposition. In: 2019 International Conference on Document Analysis and Recognition (ICDAR), pp. 114–121. IEEE (2019)

  79. Wang, J., et al.: Towards robust visual information extraction in real world: new dataset and novel solution. In: Proceedings of the AAAI Conference on Artificial Intelligence (2021)

  80. Web: Annual reports. https://www.annualreports.com/. Accessed 28 Apr 2022

  81. Web: Charity Commission for England and Wales. https://apps.charitycommission.gov.uk/showcharity/registerofcharities/RegisterHomePage.aspx. Accessed 22 Apr 2022

  82. Web: EDGAR. https://www.sec.gov/edgar.shtml. Accessed 22 Apr 2022

  83. Web: Industry Documents Library. https://www.industrydocuments.ucsf.edu/. Accessed 22 Apr 2022

  84. Web: NIST Special Database 2. https://www.nist.gov/srd/nist-special-database-2. Accessed 25 Apr 2022

  85. Web: Open Government Data (OGD) Platform India. https://visualize.data.gov.in/. Accessed 22 Apr 2022

  86. Web: Public Inspection Files. https://publicfiles.fcc.gov/. Accessed 22 Apr 2022

  87. Web: SciTSR. https://github.com/Academic-Hammer/SciTSR. Accessed 26 Apr 2022

  88. Web: S&P 500 Companies with Financial Information. https://www.spglobal.com/spdji/en/indices/equity/sp-500/#data. Accessed 25 Apr 2022

  89. Web: Statistics of Common Crawl Monthly Archives – MIME Types. https://commoncrawl.github.io/cc-crawl-statistics/plots/mimetypes. Accessed 22 Apr 2022

  90. Web: Tablebank. https://github.com/doc-analysis/TableBank. Accessed 26 Apr 2022

  91. Web: World Bank Open Data. https://data.worldbank.org/. Accessed 22 Apr 2022

  92. Xu, Y., Li, M., Cui, L., Huang, S., Wei, F., Zhou, M.: LayoutLM: pre-training of text and layout for document image understanding. In: Gupta, R., Liu, Y., Tang, J., Prakash, B.A. (eds.) Proceedings on KDD, pp. 1192–1200 (2020). https://doi.org/10.1145/3394486.3403172

  93. Xu, Y., et al.: LayoutXLM: multimodal pre-training for multilingual visually-rich document understanding. CoRR (2021)

  94. Yi, J., Sundaresan, N.: A classifier for semi-structured documents. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 340–344 (2000)

  95. Yu, D., et al.: Towards accurate scene text recognition with semantic reasoning networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12113–12122 (2020)

  96. Yu, W., Lu, N., Qi, X., Gong, P., Xiao, R.: PICK: processing key information extraction from documents using improved graph learning-convolutional networks. In: Proceedings of ICPR, pp. 4363–4370. IEEE (2020). https://doi.org/10.1109/ICPR48806.2021.9412927

  97. Zhao, X., Wu, Z., Wang, X.: CUTIE: learning to understand documents with convolutional universal text information extractor. CoRR abs/1903.12363 (2019). http://arxiv.org/abs/1903.12363

  98. Zheng, X., Burdick, D., Popa, L., Zhong, X., Wang, N.X.R.: Global table extractor (GTE): a framework for joint table identification and cell structure recognition using visual context. In: Proceedings of WACV, pp. 697–706. IEEE (2021). https://doi.org/10.1109/WACV48630.2021.00074

  99. Zhong, X., ShafieiBavani, E., Jimeno Yepes, A.: Image-based table recognition: data, model, and evaluation. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12366, pp. 564–580. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58589-1_34

  100. Zhong, X., Tang, J., Jimeno-Yepes, A.: PubLayNet: largest dataset ever for document layout analysis. In: Proceedings of ICDAR, pp. 1015–1022. IEEE, September 2019. https://doi.org/10.1109/ICDAR.2019.00166

  101. Zhu, F., et al.: TAT-QA: a question answering benchmark on a hybrid of tabular and textual content in finance. In: Zong, C., Xia, F., Li, W., Navigli, R. (eds.) Proceedings International Joint Conference on Natural Language Processing, pp. 3277–3287 (2021). https://doi.org/10.18653/v1/2021.acl-long.254

Author information

Corresponding author

Correspondence to Milan Šulc.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Skalický, M., Šimsa, Š., Uřičář, M., Šulc, M. (2022). Business Document Information Extraction: Towards Practical Benchmarks. In: Barrón-Cedeño, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2022. Lecture Notes in Computer Science, vol 13390. Springer, Cham. https://doi.org/10.1007/978-3-031-13643-6_8

  • DOI: https://doi.org/10.1007/978-3-031-13643-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-13642-9

  • Online ISBN: 978-3-031-13643-6
