Reasoning Over Complex Documents with DocLLM with Armineh Nourbakhsh
EPISODE 672 | FEBRUARY 19, 2024
About this Episode
Today we're joined by Armineh Nourbakhsh of J.P. Morgan AI Research to discuss the development and capabilities of DocLLM, a layout-aware large language model for multimodal document understanding. Armineh provides a historical overview of the challenges of document AI and an introduction to the DocLLM model. She explains how this model, distinct from both traditional LLMs and document AI models, incorporates textual semantics as well as spatial layout when processing enterprise documents like reports and complex contracts. We dig into her team's approach to training DocLLM, their choice of a generative model over an encoder-based approach, the datasets they used to build the model, their approach to incorporating layout information, and the various ways they evaluated the model's performance.
About the Guest
Armineh Nourbakhsh
J.P. Morgan AI Research
Resources
- Paper: DocLLM: A layout-aware generative language model for multimodal document understanding
- Paper: DocGraphLM: Documental Graph Language Model for Information Extraction
- Paper: BizGraphQA: A Dataset for Image-based Inference over Graph-structured Diagrams from Business Domains
- Paper: Synthetic Document Generator for Annotation-free Layout Recognition
- BloombergGPT – an LLM for Finance with David Rosenberg - #639
- AI Research at JPMorgan Chase with Manuela Veloso - #371

