Talk 1: Unveiling the Power of AI: The Critical Role of Explainability and Frugality in Modern Companies
For years, AI has been at the heart of industrial processes, driving innovation and efficiency. However, the advent of deep learning, and with it foundation models and large language models, has greatly expanded the range of possible applications and improved their results. On the flip side, these models require significant resources and are often described as black boxes. Yet explainability is one of the reasons a product is used and sold. Knowing why a system works, and more importantly being able to explain its errors, helps build trust in it and often leads to improving it. Frugality, for its part, is an ecological need but above all an economic one. Even when the results would be optimal, a medium-sized company cannot afford to overuse costly resources (GPUs, data centers, large storage disks, etc.) or accept overly long processing times. We will frame our discussion through the lens of these issues, using examples from automatic document processing. Our goal is to demonstrate that explainability and frugality must be at the core of researchers' considerations, so that their models are both reliable and usable by everyone.
Speaker's Bio: Aurélie Joseph received her Ph.D. in Linguistics in 2013, with the support of the ITESOFT company and the LDI Lab (Paris Sorbonne Cité). She works as Innovation Lab Manager at Yooz (France) in the field of large-scale document analysis (document classification, information extraction, flow structuring, mobility, security). Leading a team of 4 engineers, she specifies the company's needs and manages projects, and she also develops, integrates, and tests technologies in partnership with various labs.
Talk 2: From Research to Production and Back Again
Over three years ago, we introduced TILT, the model that achieved state-of-the-art document VQA performance and won the InfographicsVQA ICDAR 2021 competition. It was merely the beginning of a process that led to providing a Document AI solution to thousands of customers and addressing real-world requirements along the way. This talk tells the story of bringing document understanding solutions to customers, improving them in the process, and staying competitive over the years in the rapidly evolving landscape of multi-modal LLMs.
Speaker's Bio: Machine Learning researcher specializing in Natural Language Processing and Document Understanding. With a strong industry background and wins in several international competitions, he has contributed to the advancement of language modeling, particularly multi-modal models that incorporate visual and layout features alongside textual information. He was recently involved in the development of the Snowflake Arctic and Arctic-TILT LLMs.
Talk 3: Revolutionizing Identity Verification: Deep Learning-based Facial Recognition for Identity Documents
Deep learning-based facial recognition systems for identity documents have revolutionized the way we authenticate and verify personal identities. These systems leverage deep learning algorithms, particularly convolutional neural networks (CNNs), to accurately identify facial features in photographs on identity documents, such as passports, driver's licenses, and ID cards, and match them against live or previously stored images. This presentation will explore how these algorithms work and introduce a cutting-edge software solution used in production.
Speaker's Bio: Olivier Lessard is the product manager at IMDS Software in Montreal. He holds a Bachelor's degree in Electrical Engineering and a Master's degree in Computer Engineering from Polytechnique Montreal, where his Master's research focused on 3D geometric deep learning. After gaining two years of industry experience, he joined IMDS Software. IMDS Software specializes in innovative document processing solutions, offering advanced technologies in artificial intelligence, facial recognition, and image processing. Its products include automated document capture, customer communications management, and various specialized book scanners. IMDS serves diverse sectors, including automotive safety, healthcare, administration, and cultural heritage preservation, providing tools for efficient information management, secure archiving, and precise identification.
Talk 4: LLMs, Knowledge, and Document Analysis
Speaker's Bio: Thomas Breuel is a distinguished research scientist at NVIDIA, where he works on petascale deep learning, tools for large-scale distributed learning, text recognition with applications to AV and business automation, and fundamental topics at the intersection of deep learning and statistics. He has over 30 years of experience developing and applying cutting-edge machine learning and computer vision techniques to solve real-world problems and advance scientific knowledge. Before joining NVIDIA, he was a member of the Google Brain team, where he contributed to large-scale distributed machine learning, pattern recognition, neural networks, data mining, and computer vision applications. He also served as a full professor of computer science and director of the IUPR Research Lab at the University of Kaiserslautern (Germany), where he led multiple research projects with industrial and public partners, including Google, Microsoft, Smiths Detection, Deutsche Telekom, and the BMBF. He holds a Ph.D. in Computational Neuroscience from MIT and is an alumnus of Harvard University.