Webinar Takeaways: Tables are Tough: Perfecting an AI Model to Automate Table-to-XML Extraction

On February 1, Fusemachines and Data Conversion Laboratory (DCL) hosted a webinar called Perfecting an AI Model to Automate Table-to-XML Extraction. The webinar was presented by Isu Shrestha, Senior Machine Learning Engineer at Fusemachines, and Mark Gross, President of DCL.

Extracting and structuring content from text- or image-based tables has always been challenging, and transforming tables into structured formats such as XML or HTML is still nearly always a manual or semi-manual process.

As Isu explained, when we look at tables through the eyes of a machine, we find many challenges that prevent us from simply drawing lines and capturing the data. The objective is to understand the structure of each table as closely as possible.

Questions? Feel free to reach out to Isu Shrestha at isu@fusemachines.com 

Why are tables tough? 

  • Inconsistent content
  • Diverse layouts
  • Complicated elements such as straddle headings, varied content alignment, empty cells, and more

[Figure: XML extraction challenges]
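To make these challenges concrete, here is a minimal Python sketch of one way to model a table whose cells can span rows and columns. The `Cell` type and the `occupancy_grid` and `describe_challenges` functions are hypothetical illustrations, not part of the Fusemachines | DCL system. Filling an occupancy grid shows why a straddle heading or an empty cell cannot be captured by simply drawing lines between pieces of text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical logical cell; (row, col) is its top-left grid position."""
    row: int
    col: int
    text: str = ""
    rowspan: int = 1
    colspan: int = 1


def occupancy_grid(cells, n_rows, n_cols):
    """Map every grid slot to the cell that covers it (None if uncovered)."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for cell in cells:
        for r in range(cell.row, cell.row + cell.rowspan):
            for c in range(cell.col, cell.col + cell.colspan):
                if grid[r][c] is not None:
                    raise ValueError(f"overlapping cells at ({r}, {c})")
                grid[r][c] = cell
    return grid


def describe_challenges(cells):
    """Return straddle headings (multi-column spans) and empty-cell positions."""
    straddles = [c.text for c in cells if c.colspan > 1]
    empties = [(c.row, c.col) for c in cells if not c.text.strip()]
    return straddles, empties


# "Revenue" straddles two year columns, "Region" spans two header rows,
# and one data cell is empty.
table = [
    Cell(0, 0, "Region", rowspan=2),
    Cell(0, 1, "Revenue", colspan=2),
    Cell(1, 1, "2022"),
    Cell(1, 2, "2023"),
    Cell(2, 0, "North"),
    Cell(2, 1, "120"),
    Cell(2, 2, ""),
]
grid = occupancy_grid(table, n_rows=3, n_cols=3)
straddles, empties = describe_challenges(table)
print(straddles)        # ['Revenue']
print(empties)          # [(2, 2)]
print(grid[0][2].text)  # Revenue
```

Note that one logical cell ("Revenue") occupies two grid slots, so a model that only sees drawn lines or text positions has to infer that relationship.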

What Fusemachines and DCL are doing to overcome these challenges 

Data Conversion Laboratory and Fusemachines created an AI model that finds and extracts information from all tables in a document using a combination of Computer Vision (CV) and Natural Language Processing (NLP). 
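The geometric core of the problem can be illustrated without any machine learning. The sketch below is a simple rules-based heuristic, not the CV/NLP model described above: it assumes an upstream OCR step has already produced words with bounding boxes (the `WordBox` type, the `boxes_to_grid` function, and the tolerance values are all hypothetical) and groups those words into rows and columns by coordinate.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """Hypothetical OCR output: one word and its bounding box in page units."""
    text: str
    x0: float  # left edge
    y0: float  # top edge
    x1: float  # right edge
    y1: float  # bottom edge


def cluster_1d(values, tol):
    """Group sorted coordinates; a gap larger than tol starts a new cluster.

    Returns the mean of each cluster, in ascending order.
    """
    centers, group = [], []
    for v in sorted(values):
        if group and v - group[-1] > tol:
            centers.append(sum(group) / len(group))
            group = []
        group.append(v)
    if group:
        centers.append(sum(group) / len(group))
    return centers


def nearest(centers, v):
    """Index of the cluster center closest to v."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - v))


def boxes_to_grid(boxes, row_tol=5.0, col_tol=20.0):
    """Assign each word to a row (by vertical center) and column (by left edge)."""
    row_centers = cluster_1d([(b.y0 + b.y1) / 2 for b in boxes], row_tol)
    col_centers = cluster_1d([b.x0 for b in boxes], col_tol)
    grid = [[""] * len(col_centers) for _ in row_centers]
    for b in sorted(boxes, key=lambda b: b.x0):  # left-to-right word order
        r = nearest(row_centers, (b.y0 + b.y1) / 2)
        c = nearest(col_centers, b.x0)
        grid[r][c] = (grid[r][c] + " " + b.text).strip()
    return grid


boxes = [
    WordBox("Region", 10, 10, 60, 20),
    WordBox("Sales", 110, 11, 150, 21),
    WordBox("North", 10, 30, 50, 40),
    WordBox("120", 112, 30, 135, 40),
    WordBox("South", 10, 50, 52, 60),
]
grid = boxes_to_grid(boxes)
print(grid)  # [['Region', 'Sales'], ['North', '120'], ['South', '']]
```

A heuristic like this copes with clean, evenly spaced tables, but it has no notion of straddle headings, wrapped multi-line cells, or irregular alignment, which is where learned models become useful.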

The webinar covered how we developed a hybrid approach that combines rules-based processes with machine learning to identify and extract tabular data, and how we augmented the training data to develop an AI model that automates table-to-XML extraction.
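The recap does not say which augmentation techniques were used, so the following is only a generic sketch under that caveat. It assumes a labeled table stored as a list of rows whose first row is the header, and it creates variants by shuffling body rows and blanking random cells, so every variant comes with known ground truth. The `augment_table` function and its parameters are hypothetical; augmentation for a vision model would more typically perturb rendered images (fonts, borders, spacing) rather than the grid itself.

```python
import random


def augment_table(grid, n_variants=3, blank_prob=0.1, seed=0):
    """Create synthetic variants of a labeled table grid (hypothetical helper).

    grid[0] is treated as the header row and is kept intact. Body rows are
    shuffled and individual body cells are blanked with probability blank_prob,
    so the ground-truth structure (header row, table shape) stays known.
    """
    rng = random.Random(seed)  # seeded for reproducible variants
    header, body = grid[0], grid[1:]
    variants = []
    for _ in range(n_variants):
        rows = [list(row) for row in body]  # copy so the input is not mutated
        rng.shuffle(rows)
        for row in rows:
            for j in range(len(row)):
                if rng.random() < blank_prob:
                    row[j] = ""  # simulate an empty cell
        variants.append([list(header)] + rows)
    return variants


labeled = [
    ["Region", "2022", "2023"],
    ["North", "120", "95"],
    ["South", "80", "70"],
    ["East", "60", "55"],
]
for variant in augment_table(labeled, n_variants=3, blank_prob=0.1, seed=42):
    print(variant)
```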

The webinar also went into detail on why automating table-structure recognition is important, why we chose the approaches we did, and how to measure the efficacy of table identification and extraction.
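One way to measure structure recognition is a simplified version of the adjacency-relation scoring used in the ICDAR 2013 Table Competition. It is sketched here as an assumption, not as the metric presented in the webinar. Each non-empty cell is related to its nearest non-empty neighbor to the right and below, and the predicted relations are compared with the ground-truth relations to get precision, recall, and F1. The function names are hypothetical.

```python
from collections import Counter


def adjacency_relations(grid):
    """Collect (cell, neighbor, direction) relations between non-empty cells.

    For each non-empty cell, record the nearest non-empty cell to its right
    in the same row and the nearest non-empty cell below it in the same column.
    """
    rels = Counter()
    for r, row in enumerate(grid):
        for c, text in enumerate(row):
            if not text:
                continue
            for c2 in range(c + 1, len(row)):
                if row[c2]:
                    rels[(text, row[c2], "right")] += 1
                    break
            for r2 in range(r + 1, len(grid)):
                if c < len(grid[r2]) and grid[r2][c]:
                    rels[(text, grid[r2][c], "down")] += 1
                    break
    return rels


def adjacency_f1(predicted, truth):
    """Precision, recall, and F1 of predicted relations against ground truth."""
    pred, gold = adjacency_relations(predicted), adjacency_relations(truth)
    matched = sum((pred & gold).values())  # multiset intersection
    precision = matched / sum(pred.values()) if pred else 0.0
    recall = matched / sum(gold.values()) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# The prediction wrongly merged two header cells into one.
truth = [["Region", "2022", "2023"], ["North", "120", "95"]]
predicted = [["Region", "2022 2023", ""], ["North", "120", "95"]]
precision, recall, f1 = adjacency_f1(predicted, truth)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")  # P=0.60 R=0.43 F1=0.50
```

A relation-based score penalizes a merged header even though every word was read correctly, which is the kind of structural error a plain text-accuracy metric would miss.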

What kinds of tables are we talking about? 

[Figure: XML extraction examples]

What are the benefits of transforming tables?

[Figure: Tables to XML extraction approach]
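Much of the benefit of transformation comes from the output becoming machine-readable. As a minimal sketch, assuming an HTML-style `table`/`thead`/`tbody` vocabulary (a real publishing workflow may target a different schema, such as the CALS table model), the snippet below uses Python's standard `xml.etree.ElementTree` to serialize a table with a straddle heading and then query it. The `table_to_xml` function is hypothetical.

```python
import xml.etree.ElementTree as ET


def table_to_xml(header, rows):
    """Serialize a table as HTML-style XML (hypothetical helper).

    header is a list of (text, colspan) pairs; rows is a list of lists of text.
    """
    table = ET.Element("table")
    head_row = ET.SubElement(ET.SubElement(table, "thead"), "tr")
    for text, span in header:
        th = ET.SubElement(head_row, "th")
        th.text = text
        if span > 1:
            th.set("colspan", str(span))  # straddle heading over several columns
    body = ET.SubElement(table, "tbody")
    for row in rows:
        tr = ET.SubElement(body, "tr")
        for text in row:
            ET.SubElement(tr, "td").text = text
    return table


xml_table = table_to_xml(
    [("Region", 1), ("Revenue", 2)],
    [["North", "120", "95"], ["South", "80", ""]],
)
print(ET.tostring(xml_table, encoding="unicode"))

# Once the table is XML, its content can be queried instead of re-read by eye.
north = xml_table.find("./tbody/tr[td='North']")
print([td.text for td in north.findall("td")])  # ['North', '120', '95']
```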

The Fusemachines | DCL approach: Multi-layered system 

  • How is a table different from regular text?
  • Do we handle all types of tables in the same way?
  • How do we make the system better over time?

[Figure: XML extraction approach]
[Figure: Importance of XML extraction]
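The questions above suggest that not every table takes the same path through the system. The recap does not give the actual decision rules, so this is a hypothetical sketch: simple, fully ruled tables go to a rules-based extractor, complex ones to a learned model, and low-confidence model output goes to human review, whose corrections could be fed back as training data. The `TableFeatures` fields, the `route` function, and the thresholds are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class TableFeatures:
    # Hypothetical features an upstream detector might report for one table.
    has_full_ruling: bool     # every row and column is separated by drawn lines
    has_spanning_cells: bool  # straddle headings or merged cells were detected
    empty_cell_ratio: float   # fraction of grid slots that contain no text


def route(features: TableFeatures, model_confidence: float,
          max_empty_ratio: float = 0.2, min_confidence: float = 0.9) -> str:
    """Pick a processing path for one table (thresholds are illustrative)."""
    simple = (features.has_full_ruling
              and not features.has_spanning_cells
              and features.empty_cell_ratio <= max_empty_ratio)
    if simple:
        return "rules"         # deterministic extraction is reliable here
    if model_confidence >= min_confidence:
        return "ml"            # accept the learned model's structure
    return "human_review"      # corrections can become new training data


print(route(TableFeatures(True, False, 0.05), model_confidence=0.4))   # rules
print(route(TableFeatures(False, True, 0.10), model_confidence=0.95))  # ml
print(route(TableFeatures(False, True, 0.30), model_confidence=0.60))  # human_review
```

Routing by table complexity keeps the cheap, predictable path for easy cases, while the review queue gives the system a way to improve over time.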


Watch the webinar here