Free admission, registration required: Registration form
Embedded AI for heterogeneous data: the TIFAIFAI project
Valérie Renault and Florent Carlier, CREN, associate professors at Le Mans Université
Round table moderated by Josep Crego (Director of Research, Systran) and Romain Sambarino (Allomédia)
With Djamé Seddah, Laurent Besacier, Hatim Bourfoune (research engineer, IDRIS), Maxence Jeunesse
Round table moderated by Céline Beguin
With Magali Bouteille-Brigant, Jean-Philippe Cointet, Vincent Courboulay, Karën Fort, Félicien Vallet
Download the full programme here
Watch live streaming here
08:00-09:00 Arrival and coffee (IC2 Building)
09:00-09:05 Welcome Remarks (Anthony Larcher)
09:05-09:20 The road to JSALT 2023 (Sanjeev Khudanpur)
09:20-09:40 Team Presentation (Lucas Ondel)
09:40-09:50 Open Discussion
09:50-10:10 Team Presentation (Kenneth Church)
10:10-10:20 Open Discussion
10:20-10:30 Break
10:30-10:50 Team Presentation (Marie Tahon)
10:50-11:00 Open Discussion
11:00-11:20 Team Presentation (Petr Schwarz)
11:20-11:30 Open Discussion
11:30-12:00 Workshop laboratory orientation/set-up
12:00-13:30 Lunch Break
13:30-15:30 Team meetings (Rooms TBD) → workshop laboratory
15:30-16:00 JSALT Steering Committee Meeting (Team leaders and JSALT organizers)
16:00-18:00 Focus time (workshop laboratory)
18:00-23:00 Welcome Reception (Outside IC2 Building)
IC2 building, auditorium
The Data Collection and Annotation Day will take place on the 29th of June at the University of Le Mans. This day is dedicated to exchanging the needs, challenges, and tools of researchers, archivists, broadcasters and data producers around data collection and annotation in the human language technology community. We aim to foster collaboration between academia and industry in leveraging machine learning research and human-in-the-loop approaches to efficiently create, manage, and evaluate data-related processes. The event will cover the following main topics:
9h00 - 9h15 Opening
9h15 - 9h45 Language data collection and distribution: achievements, challenges and looking ahead
Speaker: Denise DiPersio (Associate Director, LDC)
Bio: The Linguistic Data Consortium, hosted by the University of Pennsylvania, develops and distributes language resources to organizations around the globe. Denise is responsible for the overall operation of LDC's External Relations group, which includes intellectual property management, licensing, regulatory matters, publications, membership, and communications. Before joining LDC, she practiced law for over 20 years in the areas of international trade, intellectual property, and commercial litigation. She has an A.B. in Political Science from Bryn Mawr College and a Juris Doctor degree from the University of Miami School of Law.
Summary: This presentation explores the data collection approach employed by LDC for its projects, highlighting LDC's role as a data repository within the community and the advantages of sharing and reusing data. In light of LDC's recent 30th anniversary, the evolving landscape of data production and sharing is also examined. Topics include the transformation of LDC's model, the significance of web data, the use of crowdsourcing for collection and annotation, various data distribution options, and the influence of platforms like Kaggle, Hugging Face, and GitHub on data production and distribution. The talk also addresses the growing awareness of privacy concerns in human subjects collections and the expanding user community for language resources. Finally, the implications of generative technologies for traditional methods of data collection, annotation, and evaluation will be discussed.
9h45 - 10h15 Common Voice
Speaker: Rebecca Ryakitimbo (Mozilla Foundation)
Bio: Rebecca is a techie, writer and researcher. She is currently a community engagement fellow at Mozilla, working towards building an open voice dataset in Kiswahili to promote voice technology. She works on establishing and supporting diverse Kiswahili language and tech communities along axes of gender, age, regional origin, accent and vernacular usage. Before joining Mozilla, Rebecca was an Internet Society fellow, an Afrisig fellow, a Google Policy fellow, a National Geographic explorer and a digital rights program officer at Paradigm Initiative. Rebecca is an enthusiast of digital inclusion and the founder of the first women's SIG, the "Arusha women school of Internet governance".
Summary: Presentation of Common Voice: crowdsourcing for under-resourced languages and gendering voice technology.
10h15 - 10h30 Coffee Break
10h30 - 11h15 Services around Language Resource Production and Sharing
Part 1: LR production and data management
Part 2: Legal challenges of data production and distribution
Speakers: Victoria Arranz (Head of R&D) and Mickaël Rigault (Legal Counsel), ELDA
Bio: Victoria Arranz is responsible for national and international projects, collaborating with both industry and academia. She holds a PhD in Computational Linguistics and an MSc in Machine Translation, both from the University of Manchester Institute of Science and Technology. She worked as a researcher and lecturer in NLP before joining ELDA, where she has continued her work on the creation (collection, annotation, definition of specifications), description (metadata) and sharing of language resources, respecting the full lifecycle of an LR and defining procedures on a user-oriented basis. She is also the coordinator of ELDA's participation in the European Language Data Space initiative, which aims at establishing a European platform and marketplace for the collection, creation and sharing of language data.
Mickaël Rigault: Mickaël is a legal specialist working in the field of language data at ELDA. He provides legal expertise on intellectual property, data management and the protection of personal data on a day-to-day basis for business and research activities. He takes care of the legal aspects for European infrastructures such as ELRC, ELG and the current Language Data Space. Mickaël holds a Master’s degree in Comparative Law from Paris Ouest University, a Master’s degree in Multimedia Law from Lyon 3 (France) and a LLM degree in Media Law from the University of East Anglia (Norwich – UK).
Summary: The Evaluations and Language Resources Distribution Agency (ELDA) is the operational body of the ELRA Language Resources Association. It provides a wide variety of services around language resources (LR), such as identification, production (data collection, annotation and processing), distribution and dissemination. A major focus lies on the clearing of legal aspects behind all data-related matters. This presentation is divided into two parts: a) Victoria's description of ELDA's activities regarding LR production and data management, including ELDA's participation in the setting up of data-sharing infrastructures; this part will look in particular at the procedures behind different types of data collection and annotation. b) The second part will focus on ELDA's legal challenges resulting from such activities: Mickaël will describe the concepts of intellectual property and protection of personal data and their application to data collection and distribution. He will provide an overview of some licensing aspects and contractual obligations and finish with a review of current legal cases revolving around the use of generative language models such as ChatGPT.
11h15 - 12h00 Round table
12h00 - 13h30 Lunch Break
13h30 - 14h00 Annotations for the chapterization of audio contents
Speaker: Ivan Thomas (Radio France)
Bio: Lead for R&D and Open Data at the Digital Department of Radio France. I have a technical background and I love podcasts. The main focus of our R&D team is to increase knowledge about Radio France's audio contents by using automatic metadata extraction.
Summary: Radio France is experimenting with ways to segment its audio contents into chapters. One of our current approaches, in collaboration with the EBU and France 24, is to identify the key questions of the host. To this end, we are creating a corpus of annotations to train and evaluate a model of key questions.
14h00 - 14h30 Introducing Labelit: a multi-purpose and extensible annotation solution
Speakers: Karel Bourgois (Voxist, Le Voice Lab), Corentin Giraud (Batvoice), Olivier Baude (Huma-Num & CNRS)
Bio: Coming Soon
Summary: Labelit is an open-source solution maintained by BatvoiceAI and supported by LeVoiceLab. It has been built for flexibility and extensibility, enabling diverse annotation schemes on text, audio, video or a combination thereof.
In this presentation, we will give a quick introduction to the Voice Lab's main goals and achievements and an overview of Labelit's features, and then focus on a research use case in the context of the CNRS project "Ecouter-parler" (CamionLabo).
14h30 - 15h00 INA - Collecting and generating data in the archiving process
Speaker: Emmanuel Pije (INA)
Bio: Coming Soon
Summary: Presentation of the collections held by the Institute, the archiving process, service users and audiences, with a focus on human cataloguing and description activities, and work in progress on the automatic processing of collections for documentary purposes.
15h00 - 15h30 When AI met the archive: The case of RTVE
Speaker: Virginia Bazan Gil (FIAT/IFTA)
Bio: Coming Soon
Summary: After three years of testing AI solutions, in 2020 RTVE launched an AI tender to automatically catalogue the television archive. It was a long process in which the archive played a leading role. At this stage of the project, we can say that we have a better understanding of the technology and of what works and what doesn't, mainly regarding speech technologies and NLP. This has had a direct impact on the goals and the scope of the project.
A selected team within our archive is already using the service and evaluating the results, and it is only now that the different areas have started discussing how to integrate human cataloguing and artificial intelligence. On a wider scale, we have been organising training sessions for all the archive staff, but we need a deeper discussion to understand how they perceive the technology and whether they should take a more active role in its use. We have also learnt from the performance of image recognition that it is a very powerful tool but requires some customisation, which we plan to develop in an identity-recognition project based on TVE pioneers. This will be useful not only as an integrating tool in the archive, but also as a means to prove its utility and enhance its visibility and promotion.
Finally, we need to understand the impact of automatically generated metadata on our system, since so far we have had no feedback from our users.
15h30 - 16h00 Coffee Break
16h00 - 16h30 Challenges of annotating code-switched speech for non-standard orthography languages
Speaker: Fethi Bougares (Elyadata)
Bio: Fethi Bougares is Head of Research at Elyadata. His research interests include speech recognition and machine translation, especially for low-resource languages (in particular Arabic dialects). He holds a PhD degree in Computer Science from Le Mans University. Fethi has more than 60 published papers in international peer-reviewed conferences and journals and is an active member of the ACL Special Interest Group on Arabic Natural Language Processing.
Summary: Annotating spoken languages that lack spelling standards is challenging. This is the case for all Arabic dialects, where the same word can be written differently by different native speakers. We tackled this problem on two fronts simultaneously: first, we developed detailed, dialect-specific annotation guidelines; second, we put in place a multi-level annotation and validation process. During this talk I will introduce the challenges of annotating speech in Arabic dialects as an example of non-standard orthography languages with code-switching phenomena. I will also introduce our in-house annotation tool designed to assist the annotation process.
16h30 - 17h00 Plain X: Learnings from research and development of a 4-in-1 content adaptation platform
Speaker: Mirko Lorenz (Deutsche Welle)
Bio: Mirko Lorenz is an innovation manager and a member of the Deutsche Welle Research and Cooperation team since 2008. He has worked in multiple EU research projects, with a focus on developing new technologies to simplify work in the newsroom. He is a co-founder of Datawrapper, a data journalism tool in use in more than 500 large newsrooms worldwide. Since 2021 he has been involved with plain X, working towards its international roll-out to organisations in need of content adaptation.
Summary: The presentation will cover learnings and insights from development and provide a demo of how plain X works, in terms of speed, quality, customisation options, etc. The goal is an exchange with language experts on how integrated, AI-driven platforms can be used effectively. Plain X is an AI-driven, 4-in-1 platform for content adaptation covering four tasks: transcription, translation, subtitling and (synthetic) voice-over. Three aspects make plain X noteworthy and different: firstly, the tool can connect to multiple language engines; secondly, many workflow features were integrated during development to ensure quick work combined with quality output; thirdly, the tool emphasises the role of the "human in the loop": any file can be shared with a reviewer at each step to achieve the highest possible quality of the content.
The tool was created based on several years of research and development in HLT (Human Language Technologies). The goal: help any organisation with a growing content-adaptation workload to perform these tasks faster, with high output quality. The tool is a joint development of Deutsche Welle (Germany) and Priberam (Portugal). Since 2022 plain X has been in professional, daily use by Deutsche Welle editors, simplifying and speeding up content adaptation.
17h00 - 17h45 Round table led by Alfonso Ortega
17h45 - 18h00 Closing
Evaluation Beyond Computation, Assessing the Impacts of Human Language Technologies
Biography: Craig Greenberg is a Mathematician at the National Institute of Standards and Technology (NIST), where he oversees NIST's Speaker Recognition Evaluation series and Language Recognition Evaluation series, and researches the measurement and evaluation of Artificial Intelligence (AI) and other topics in AI and machine learning. Dr. Greenberg received his PhD in 2020 from the University of Massachusetts Amherst with a dissertation on uncertainty and exact and approximate inference in flat and hierarchical clustering, his M.S. degree in Computer Science from the University of Massachusetts Amherst in 2016, his M.S. degree in Applied Mathematics from Johns Hopkins University in 2012, his B.A. (Hons.) degree in Logic, Information, & Computation from the University of Pennsylvania in 2007, and his B.M. degree in Percussion Performance from Vanderbilt University in 2003. Among his accolades, Dr. Greenberg has received two official letters of commendation for his contribution to speaker recognition evaluation.
Reva Schwartz is a research scientist in the Information Technology Laboratory at the National Institute of Standards and Technology (NIST). She serves as Principal Investigator on Bias in Artificial Intelligence for NIST's Trustworthy and Responsible AI program. Her research focuses on organizational practices for subject matter experts, and the role of expertise and expert judgment in socio-technical systems. She has advised federal agencies about how experts interact with automation to make sense of information in high-stakes settings. Her background is in linguistics and experimental phonetics and includes a forensic science posting at the United States Secret Service, advising forensic science practice while at NIST, a temporary duty assignment at the National Security Agency, and an adjunct researcher position at the Johns Hopkins University Human Language Technology Center of Excellence.
Abstract: The evaluation of speech and language technologies has a long history of driving the technologies forward and measuring the state-of-the-art. As these technologies become ever more pervasive in everyday life, there is an increasing need for understanding how they operate in real-world settings and, critically, the impacts that they have on individuals, communities, organizations, and societies – whether or not they work as designed. In this talk, we will highlight some of the basic needs and fundamental challenges in assessing the impacts of human language technologies and describe nascent efforts underway to develop the metrology needed to measure AI technologies in context, including and beyond performance accuracy.
Deep learning for protein structure discovery
Biography: Thibaut Véry received a PhD in theoretical chemistry from Université de Lorraine in 2012 on the modelling of the photochemistry of complex biological systems. Since 2016 he has been a member of the user support team of the French supercomputing centre IDRIS, where he is in charge of atomistic simulation for both High Performance Computing and Artificial Intelligence topics. His job is to help users get the best performance from the supercomputer through training courses, software management, documentation, etc.
Abstract: Proteins are associated with almost all biological processes. For instance, they can carry oxygen, make muscles move, perform chemical reactions, or serve as keys for viruses to enter cells. Chemically, up to thousands of small molecules (amino acids) from a set of 22 react together to form long chains. An analogy with Natural Language Processing can be drawn if we view amino acids as a sequence of letters forming a sentence. As with a sentence, we need to find how the parts of the sequence interact to get the meaning. For proteins, physical interactions drive the folding of the amino acids into a 3D structure.
Only by finding the structure can we decipher the exact role of a protein. Researchers have been working on this since the 1920s. Several experimental setups are available to get this information. However, keep in mind that preparing the proteins is difficult due to requirements of the experimental methods. With the increased power of computers, it became possible to use them to find the structures. The toolbox of numerical methods includes machine learning.
Every other year, a competition assesses the quality of numerical methods on unknown structures. For the 2018 edition, DeepMind (Alphabet) proposed a deep learning model based on Transformers: AlphaFold. AlphaFold beat the other entries thanks to an increase in the quality of its predictions. The real breakthrough came in 2020, when AlphaFold2 entered the competition. The results were impressive because the quality of many predictions was comparable to experimental ones.
This presentation introduces the concepts needed to understand how protein structures are discovered. We will focus on the AlphaFold2 model to understand how it achieves such good results.
Unequal by design? How to think together about closing the gender gap in data
Biography: Prof. Isabelle Collet is a former computer scientist. For the past 20 years, her research has focused on narrowing the gender gap in STEM (particularly computer science) and developing inclusion strategies for women in higher education. She directs the educational research group Gender and Intersectional Relations (G-RIRE) at the University of Geneva. In 2019, she published "Les oubliées du numérique".
Abstract: Today, women account for less than 15% of IT students. This virtual absence of gender diversity in the digital sector has consequences not only for gender equality in employment, but also for the inclusiveness and performance of digital applications. The great uniformity of the developer and manager population (white males from the middle or upper classes) tends to obscure the needs and characteristics of other populations, particularly women.
The aim of this talk is to expose the gender bias of artificial intelligence, and then to consider the educational solutions that need to be put in place to enable the whole of society to understand the challenges of data.
Textless NLP: towards language processing from raw audio
Biography: Emmanuel Dupoux is professor at the Ecole des Hautes Etudes en Sciences Sociales (EHESS) and Research Scientist at Meta AI Labs. He directs the Cognitive Machine Learning team at the Ecole Normale Supérieure (ENS) in Paris and INRIA. His education includes a PhD in Cognitive Science (EHESS), an MA in Computer Science (Orsay University) and a BA in Applied Mathematics (Pierre & Marie Curie University). His research mixes developmental science, cognitive neuroscience, and machine learning, with a focus on the reverse engineering of infant language and cognitive development using unsupervised or weakly supervised learning. He is the recipient of an Advanced ERC grant, co-organizer of the Zero Resource Speech Challenge series (2015-2021) and the Intuitive Physics Benchmark (2019), and led a Jelinek Summer Workshop at CMU on multimodal speech learning in 2017. He is a CIFAR LMB and an ELLIS Fellow. He has authored 150 articles in peer-reviewed outlets in cognitive science and language technology.
Abstract: The oral (or gestural) modality is the most natural channel for human language interactions. Yet, language technology (Natural Language Processing, NLP) is primarily based on the written modality, and requires massive amounts of textual resources to train useful language models. As a result, even fundamentally speech-first applications like speech-to-speech translation or spoken assistants like Alexa or Siri are constructed in a Frankenstein way, with text as an intermediate representation between the signal and language models. Besides being inefficient, this has two unfortunate consequences: first, only the small fraction of the world's languages that have massive textual repositories can be addressed by current technology. Second, even for text-rich languages, the oral form mismatches the written form at a variety of levels, including vocabulary and expressions. The oral medium also contains typically unwritten linguistic features like rhythm and intonation (prosody) and rich paralinguistic information (non-verbal vocalizations like laughter, cries, clicks, etc., and nuances carried through changes in voice quality) which are therefore inaccessible to language models. But is this a necessity? Could we build language applications directly from the audio stream, without using any text? In this talk, we review recent breakthroughs in representation learning and self-supervised techniques which have made it possible to learn latent linguistic units directly from audio, unlocking the training of generative language models without the use of any text. We show that these models can capture heretofore unaddressed nuances of oral language, including in a dialogue context, opening up the possibility of speech-to-speech textless NLP applications. We outline the remaining technical challenges to achieve this goal, including the challenge of building expressive oral language datasets at scale.
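As a rough illustration of the unit-discovery step described above, here is a minimal sketch (not the speaker's actual system) that quantises self-supervised speech features into discrete "pseudo-text" units on which a text-free language model could then be trained. It assumes torchaudio's HUBERT_BASE pipeline and scikit-learn; the file names and number of clusters are placeholders.

```python
# Minimal sketch: turning raw audio into discrete "pseudo-text" units,
# the kind of representation textless NLP builds language models on.
# Illustrative only, not the actual textless-NLP codebase.
import torch
import torchaudio
from sklearn.cluster import KMeans

bundle = torchaudio.pipelines.HUBERT_BASE
model = bundle.get_model().eval()

def features(path: str) -> torch.Tensor:
    wav, sr = torchaudio.load(path)
    wav = wav.mean(0, keepdim=True)                      # force mono
    wav = torchaudio.functional.resample(wav, sr, bundle.sample_rate)
    with torch.no_grad():
        feats, _ = model.extract_features(wav)
    return feats[-1].squeeze(0)                          # (frames, dim)

# 1) Fit a codebook of K discrete units on features pooled over a corpus.
corpus = ["utt1.wav", "utt2.wav"]                        # placeholder files
pooled = torch.cat([features(p) for p in corpus]).numpy()
codebook = KMeans(n_clusters=100, n_init=10).fit(pooled)

# 2) Any new utterance becomes a sequence of unit IDs ("spoken tokens"),
#    on which a standard autoregressive LM can be trained without text.
units = codebook.predict(features("utt1.wav").numpy())
print(units[:20])
```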
Foundational Problems in ASR
Biography: Ehsan Variani is a research scientist in the speech and language algorithms team at Google Research. His general research interests are information theory, machine learning and speech recognition. He has been with Google since 2015; before that he was a PhD student at Johns Hopkins University.
Abstract: This talk focuses on some foundational problems in practical speech recognition and discusses some solutions for each of these problems.
Introduction to speaker identification and deep fake context
Biography: Petr Schwarz [PhD, Brno University of Technology, 2009] is a senior researcher in BUT Speech@FIT at the Faculty of Information Technology (FIT) of BUT. He has broad experience in speech technologies, ranging from voice biometry, speech transcription, and keyword spotting to language identification. At BUT, Petr has worked on many national, EU, and US research projects and many international technology evaluation campaigns, such as those organized by the U.S. National Institute of Standards and Technology (NIST). In 2006, Petr co-founded Phonexia, and served for several years as its CEO and CTO. Phonexia sells speech technologies to more than 60 countries. Currently, he is working on conversational AI technologies and security/defense applications of voice biometry.
Abstract: Petr will present how a speaker identification system based on the ResNet neural network architecture is designed. He will also tell you about basic principles used in speech synthesis, voice morphing, and speech codecs and explain how speaker identification, speech synthesis, and speech codecs can affect each other in the real world.
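To make the embedding-based design concrete, here is a toy sketch of the scoring side of such a system: an encoder maps each utterance to a fixed-size speaker embedding, and two recordings are accepted as the same speaker when the cosine similarity of their embeddings exceeds a threshold. The tiny encoder and the threshold below are placeholders, not the ResNet model discussed in the talk.

```python
# Sketch of the verification step in an embedding-based speaker-ID system.
# The encoder stands in for a ResNet over log-mel frames; illustrative only.
import torch
import torch.nn.functional as F

class TinyEncoder(torch.nn.Module):
    """Placeholder for a ResNet encoder over log-mel frames."""
    def __init__(self, n_mels: int = 80, emb_dim: int = 256):
        super().__init__()
        self.proj = torch.nn.Linear(n_mels, emb_dim)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (frames, n_mels); temporal mean pooling gives one embedding
        return F.normalize(self.proj(mel).mean(dim=0), dim=0)

encoder = TinyEncoder()

def same_speaker(mel_a, mel_b, threshold=0.6) -> bool:
    with torch.no_grad():
        score = torch.dot(encoder(mel_a), encoder(mel_b)).item()
    return score >= threshold

# Toy usage with random "log-mel" features in place of real audio.
print(same_speaker(torch.randn(200, 80), torch.randn(180, 80)))
```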
Extracting speaker and emotion information from self-supervised speech models
Biography: Themos Stafylakis received the B.Eng. degree from the National Technical University of Athens, Greece, in 2004, the M.Sc. degree in communication and signal processing from Imperial College London, London, U.K., in 2005, and the Ph.D. degree in speaker diarization for broadcast news from the National Technical University of Athens, Athens, Greece, in 2011. In 2011, he joined the Centre de Recherche Informatique de Montréal (Montréal, QC, Canada) as a Postdoc Researcher on speaker recognition. In 2016, he joined the Computer Vision Laboratory, University of Nottingham, Nottingham, U.K., as a Marie Curie Research Fellow. His main research interests are audiovisual speech and speaker recognition, machine learning and deep neural networks.
Abstract: Themos will present hot topics in speaker identification research, emphasizing self-supervised models.
What kind of errors are made by neural generation models and why?
In this talk, I will present our work on assessing, analysing and explaining the output of text generation models that are grounded in Knowledge Graphs (KG).
Focusing on KG-to-Text encoder-decoder models, i.e., generation models which aim to verbalise the content of a Knowledge Graph, I will discuss missing information, i.e., information that is present in the input but not in the output. I will introduce a novel evaluation metric for assessing to what extent generation models omit input information and show that, while this metric correlates with human scores, the correlation varies with the specifics of the human evaluation setup. This suggests that an automatic metric might be more reliable than human evaluation measures, as it is less subjective and more focused on correct verbalisation of the input. I will then go on to demonstrate, using both a parametric and a non-parametric probe, that omissions are already "visible" in the encoder representations, i.e., can be traced back to the encoder.
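For intuition only, a crude surface-level proxy of such an omission measure could count how many input triples never appear in the generated text. The metric presented in the talk is more refined; the sketch below merely illustrates the idea of checking input coverage, with a made-up example graph.

```python
# Toy proxy for an "omission" score in KG-to-text generation: the fraction
# of input triples whose subject or object never surfaces in the generated
# text. Purely illustrative, not the metric introduced in the talk.
def omission_rate(triples, generated_text):
    text = generated_text.lower()
    missed = 0
    for subj, _relation, obj in triples:
        if subj.lower() not in text or obj.lower() not in text:
            missed += 1
    return missed / len(triples) if triples else 0.0

kg = [("Alan Turing", "birthPlace", "London"),
      ("Alan Turing", "field", "computer science")]
output = "Alan Turing was born in London."
print(omission_rate(kg, output))   # 0.5: the 'field' triple was omitted
```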
In the second part of the talk, I will discuss conversational question generation and show that grounding dialog in knowledge allows for a detailed analysis of the model behaviour in terms of well-formedness, relevance, semantic adequacy and dialog coherence.
NLP at scale made easy on Jean Zay (PLM4ALL)
Biography: Hatim Bourfoune is a research engineer with a passion for Artificial Intelligence who has been working for several years in the field of Deep Learning. He has been working for more than two years at IDRIS in the user support team specialised in AI, in particular on optimisation work for very large models such as Transformers. His flagship project was his work on the development of the BLOOM language model, where he participated in the evaluation of this model as well as in its enhancement (fine-tuning, RLHF, etc.). In addition to the support he provides to Jean Zay users, he regularly gives lectures and courses on Deep Learning topics.
Nathan Cassereau is an engineer specialised in artificial intelligence and distributed computing. After graduating from Imperial College London, he joined IDRIS, the French institute operating Jean Zay, a powerful supercomputer dedicated to high performance computing and artificial intelligence research. At IDRIS, Nathan helps researchers optimise their code and their use of the supercomputer. He was also part of a team working on the evaluation and development of large language models, such as BLOOM.
Pierre Cornette is a dedicated research engineer with a strong background in supporting several AI research projects at IDRIS. With access to one of the most powerful supercomputers in Europe, Jean Zay, Pierre brings knowledge on the exploitation of computational resources for training deep learning models. From image and speech recognition to natural language understanding, Pierre's knowledge covers many subfields of AI.
Abstract: This talk presents the different tools for using and training language models in an optimised way. We will see practical examples, a presentation of tools such as Accelerate, DeepSpeed and Megatron, and the efforts made by IDRIS to make these tools easy to use.
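As a hedged illustration of what using one of these tools looks like in practice, the sketch below wraps a toy training loop with Hugging Face Accelerate: the same script can run on a single GPU or on a multi-node allocation once started with `accelerate launch`. The model and data are placeholders; this is not IDRIS code.

```python
# Minimal sketch of a training loop wrapped with Hugging Face Accelerate.
# Accelerate handles device placement and data sharding across processes.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(128, 2)                      # stand-in for a Transformer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(1024, 128), torch.randint(0, 2, (1024,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

# Accelerate moves everything to the right device(s) and shards the data.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, labels in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)                       # replaces loss.backward()
    optimizer.step()
```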
Peer-reviewed journal of AI-agents: a challenging LLM competition
Biography: Isabelle Guyon recently joined Google Research as a director of research. She is also professor of artificial intelligence at Université Paris-Saclay (Orsay). Her areas of expertise include computer vision, bioinformatics, and power systems. She is best known for being a co-inventor of Support Vector Machines. Her recent interests are in automated machine learning, meta-learning, data-centric AI, and large language models. She has been a strong promoter of challenges and benchmarks, and is president of ChaLearn, a non-profit dedicated to organizing machine learning challenges. She is community lead of Codalab competitions, a challenge platform used both in academia and industry. She co-organized the "Challenges in Machine Learning Workshop" @ NeurIPS between 2014 and 2019, launched the "NeurIPS challenge track" in 2017 while she was general chair, and pushed the creation of the "NeurIPS datasets and benchmark track" in 2021, as a NeurIPS board member.
Abstract: Significant recent advancements have occurred in the field of question answering, particularly within LLM-powered chatbots and augmented search engines like ChatGPT, Bard, the new Bing, and similar platforms. These advancements offer promising prospects for accelerating the acquisition of academic knowledge across various disciplines, including science, social sciences, education, and other fields of study. In academia, progress is typically facilitated through a peer-reviewed literature system. This has inspired us to emulate this process using AI agents.
In our envisioned scenario, AI agents would emulate a peer-reviewed journal. Human contributors would assume roles as editors or meta-reviewers, providing prompts (call-for-papers) to AI authors and selecting papers for publication based on the evaluations from AI reviewers, as well as their own judgment. Acceptance or rejection of papers would serve as a teaching signal for both AI authors and AI reviewers, motivating them to continuously enhance their skills.
To begin working towards this ambitious objective, we are organizing a challenge for the AutoML-conf'23 conference. Participants will be invited to submit AI agents capable of acting as authors and reviewers. For this challenge, we will focus on generating systematic surveys or overview papers.
To generate prompts for the challenge, we reverse-engineered numerous papers from various fields indexed in Semantic Scholar, including Computer Science, Medicine, Chemistry, Biology, Materials Science, Physics, Geology, Psychology, Art, History, Geography, Sociology, Business, Political Science, Economics, Philosophy, Mathematics, Engineering, Environmental Science, Education, Law, and Linguistics. These papers served as a basis for creating prompts such as: "Write a systematic survey or overview examining the impact of social media on mental health. This paper should explore current research on the correlation between social media usage and mental health outcomes, encompassing areas such as depression, anxiety, and self-esteem."
Furthermore, we have developed a baseline AI author and AI reviewer as a starting point. During the conference, we will present the competition design and our initial results based on the baseline models. We will also provide instructions on how to submit your first entry to the competition.
Acknowledgments: The support of INRIA, Google Research, the ANR Chair of Artificial Intelligence HUMANIA ANR-19-CHIA-0022 and the TAILOR EU Horizon 2020 grant 952215 is gratefully acknowledged.
Data-driven speech and language technology: from small to large (language) models
Biography: Hermann Ney is a full professor of computer science at RWTH Aachen University, Germany. Previously, he headed the Speech Recognition Group at Philips Research. His main research interests include statistical methods for pattern recognition and human language technology and their specific applications to speech recognition, machine translation, and image object recognition. In particular, he has worked on dynamic programming for continuous speech recognition, language modeling, and phrase-based approaches to machine translation. He has authored and coauthored more than 600 papers in journals, books, conferences, and workshops.
Abstract: Today, data-driven methods like neural networks and deep learning are widely used for speech and language processing. We will revisit the evolution of these methods over the last 40 years and try to present a unifying view of their principles. Specifically, the talk will focus on speech recognition and language modelling.
Representation and Metric Learning Advances for Face and Speaker Biometric Systems
Abstract: In recent years, as advanced as deep learning techniques are, they still face problems when a task has limited data or when a successful approach in one task is to be reused for another task. In this talk, I will therefore present alternative approaches to deal with these issues in biometric systems. The first part of the talk focuses on different ways to improve the generation of signal representations for the text-dependent speaker verification task, since this task has a strong dependency on the phonetic content. In the second part, I will explain several approaches using new training loss functions for deep neural networks that are based on the final verification metrics. These training loss functions can be applied to different verification tasks.
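To illustrate the general idea of a loss built from the final verification metric (not necessarily the exact formulation used in the talk), the sketch below smooths the hard accept/reject decision of a detection cost with a sigmoid, so that miss and false-alarm rates become differentiable and can drive the training of the embedding network. The threshold, sharpness and cost weights are arbitrary illustrative values.

```python
# Illustrative sketch of a loss derived from verification metrics: the hard
# threshold decision in the detection cost (misses + false alarms) is
# replaced by a sigmoid so the objective becomes differentiable.
import torch

def soft_detection_cost(scores, labels, threshold=0.5, sharpness=10.0,
                        c_miss=1.0, c_fa=1.0):
    """scores: cosine similarities; labels: 1 = same speaker, 0 = impostor."""
    decisions = torch.sigmoid(sharpness * (scores - threshold))  # soft accept
    p_miss = ((1.0 - decisions) * labels).sum() / labels.sum().clamp(min=1)
    p_fa = (decisions * (1.0 - labels)).sum() / (1.0 - labels).sum().clamp(min=1)
    return c_miss * p_miss + c_fa * p_fa

scores = torch.tensor([0.9, 0.7, 0.2, 0.4], requires_grad=True)
labels = torch.tensor([1.0, 1.0, 0.0, 0.0])
loss = soft_detection_cost(scores, labels)
loss.backward()            # gradients flow back into the embedding network
print(loss.item())
```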
Multiclass audio segmentation in broadcast environments
Abstract: Audio segmentation can be defined as the division of an audio signal into smaller fragments according to a predefined set of attributes. This wide definition can cover several systems depending on the set of rules considered. In this talk, the focus will be on multiclass audio segmentation tasks, which aim to obtain a set of labels describing several typologies in an audio signal, such as speech, music and noise. During the presentation, different approaches will be presented and evaluated on broadcast-domain data.
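As a minimal illustration of what such a multiclass segmentation output looks like, the sketch below collapses frame-level class posteriors for speech/music/noise into labelled time segments by merging consecutive frames with the same argmax class. The class set, frame shift and demo data are assumptions, not the systems evaluated in the talk.

```python
# Sketch of the final step of multiclass audio segmentation: frame-level
# posteriors (one row per 10 ms frame) become labelled segments.
import numpy as np

CLASSES = ["speech", "music", "noise"]
FRAME_SHIFT = 0.01  # seconds per frame (assumed)

def posteriors_to_segments(posteriors: np.ndarray):
    labels = posteriors.argmax(axis=1)
    segments, start = [], 0
    for i in range(1, len(labels) + 1):
        # close a segment at the end or when the class changes
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * FRAME_SHIFT, i * FRAME_SHIFT,
                             CLASSES[labels[start]]))
            start = i
    return segments

demo = np.vstack([np.tile([0.8, 0.1, 0.1], (300, 1)),    # 3 s of speech
                  np.tile([0.1, 0.7, 0.2], (200, 1))])   # 2 s of music
print(posteriors_to_segments(demo))
```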
This session will be broadcast live and questions can be submitted via chat.
Venue: IC2, Auditorium
08:00 Continental Breakfast
08:55 Call to order and housekeeping announcements
09:00 Sneak Preview: How we got here and what we discovered
Activities and Findings of JSALT 2023 Teams (Part I)
09:30 Team Presentation: X-Diar: Explainability for diarization (live streaming)
10:40 Stretch/Caffeine Break
10:50 Team Presentation (Continued)
12:00 Discussion, Questions and Comments
12:30 Lunch Break
14:00 Team Presentation: Better together: Text + context (live streaming)
15:10 Stretch/Caffeine Break
15:20 Team Presentation (Continued)
16:30 Discussion, Questions and Comments
17:00 Adjourn for the day
Venue: IC2, Auditorium
08:00 Continental Breakfast
09:25 Call to order and housekeeping announcements
Activities and Findings of JSALT 2023 Teams (Part II)
09:30 Team Presentation: Automatic design of conversational models from observation of human-to-human conversation (live streaming)
10:40 Stretch/Caffeine Break
10:50 Team Presentation (Continued)
12:00 Discussion, Questions and Comments
12:30 Lunch Break
14:00 Team Presentation: Finite state methods with modern neural architectures for speech applications and beyond (live streaming)
15:10 Stretch/Caffeine Break
15:20 Team Presentation (Continued)
16:30 Discussion, Questions and Comments
16:55 Concluding remarks and plans for 2024
17:00 Adjourn to labs for planning post-workshop activities
Farewell Dinner - Abbaye de l'Epau
18:30 Meet up at the Préfecture tram stop to go together to the Abbey (to catch the 6:35pm T2 tram)
19:00 Workshop Team Photos
19:05 Cocktail Reception, Concert by the group "Para Ir"
20:00 Seated Dinner
22:30 Back to the city centre
Monday, June 12 to Friday, June 23: Summer school
Monday, June 26: JSALT Opening day
8:00-12:00: JSALT workshop opening presentations (open to public)
Tuesday, June 27: Invited speaker Craig Greenberg & Reva Schwartz
11:00 Evaluation Beyond Computation, Assessing the Impacts of Human Language Technologies (live streaming)
Friday, June 30 to Sunday, July 2: Plein Champ art festival
Tuesday, July 4, Invited speaker: Thibaut Véry
11:00 Deep learning for protein structure discovery
Wednesday, July 5
13:00 Group progress report
Thursday, July 6, Invited speaker: Isabelle Collet
11:00 Unequal by design? How to think together about gender bias in data (live streaming)
14:00-17:00 Gender equality in computer science, from secondary school to university (open to public)
Tuesday, July 11, Invited speaker: Emmanuel Dupoux
11:00 Textless NLP: towards language processing from raw audio (live streaming)
Wednesday, July 12
13:00 Group progress report
Thursday, July 13, Invited speaker: Ehsan Variani
11:00 Foundational Problems in ASR (live streaming)
Tuesday, July 18, Invited speakers: Petr Schwarz and Themos Stafylakis
11:00 Introduction to speaker identification and deep fake context. + Extracting speaker and emotion information from self-supervised speech models (live streaming)
Wednesday, July 19
13:00 Group progress report
Thursday, July 20, Invited speaker: Claire Gardent
11:00 What kind of errors are made by neural generation models and why? (live streaming)
Tuesday, July 25, Invited speakers: Hatim Bourfoune, Nathan Cassereau & Pierre Cornette
11:00 NLP at scale made easy on Jean Zay (live streaming)
Wednesday, July 26
13:00 Group progress report
Thursday, July 27, Invited speaker: Isabelle Guyon
11:00 Peer-reviewed journal of AI-agents: a challenging LLM competition (live streaming)
Friday, July 28, Invited speaker: Hermann Ney (remote)
14:00 Data-driven speech and language technology: from small to large (language) models (live streaming)
Tuesday, August 1, Invited speakers: Victoria Mingote, Pablo Gimeno
11:00 Representation and Metric Learning Advances for Face and Speaker Biometric Systems + Multiclass audio segmentation in broadcast environments (live streaming)
Thursday, August 3, closing presentations
9:30-12:30 Explainability for diarization (live streaming and open to public)
14:00-17:00 Better together: text + context (live streaming and open to public)
Friday, August 4: Closing presentations
9:30-12:30 Automatic design of conversational models from observation of human-to-human conversation (live streaming and open to public)
14:00-17:00 Finite state methods with modern neural Architectures for speech applications and beyond (live streaming and open to public)
19:00: Closing ceremony at l'Abbaye de l'Epau
Saturday, August 5: Departure of participants