
The JSALT Workshop Programme

June 26, Public opening day, Cinéma Pathé Le Mans, 9:00-17:30

Free admission, registration required: Registration form

Session 1 - Morning: 9:00-12:30

  AI: New uses enabled by recent progress?
  • Should we fear the big bad GPT? Demystifying language models.
        Djamé Seddah, INRIA Paris
  • Reflections on private-sector NLP research in the GPT-4 era
        Laurent Besacier, Head of the NLP team at Naver Labs
  • Some challenges in quantitatively evaluating the topics raised by our customers in satisfaction surveys
        Maxence Jeunesse, Director of AI Research at Covea
  • LIUM and JSALT: the future of AI for text and speech
        Anthony Larcher, LIUM, Professor at Le Mans Université
  • AI in the service of research - examples from Le Mans Université
    • Exploring and annotating textual corpora with AI: the READ-IT project
      François Vignale, 3.LAM, Chief Curator of Libraries
    • Embedded AI for heterogeneous data: the TIFAIFAI project
      Valérie Renault and Florent Carlier, CREN, Associate Professors at Le Mans Université

Round table moderated by Josep Crego (Director of Research at Systran) and Romain Sambarino of Allomédia
    With Djamé Seddah, Laurent Besacier, Hatim Bourfoune (Research Engineer, IDRIS), Maxence Jeunesse

Session 2 - Afternoon: 14:00-17:30

 AI: What challenges for acceptability?
  • Media coverage of AI: an economy of promise and critique
        Jean-Philippe Cointet, Researcher and Lecturer at Sciences Po Paris, MediaLab
  • Responsible AI: dream or reality?
        Vincent Courboulay, Associate Professor (HDR) at La Rochelle Université and Scientific Director of the Institut du Numérique Responsable
  • The ethical issues of AI through the prism of natural language processing
        Karën Fort, Associate Professor (HDR) at Sorbonne Université, Loria, Co-chair of the ACL Ethics Committee
  • Legal and ethical issues of AI systems: personhood, privacy, draft legislation
        Céline Béguin, Associate Professor at Thémis, Le Mans Université; Magali Bouteille-Brigant, Associate Professor of Private Law (HDR), Co-director of Thémis, Le Mans Université
  • Regulating AI systems: the CNIL's action
        Félicien Vallet, Head of AI at CNIL

Round table moderated by Céline Béguin
    With Magali Bouteille-Brigant, Jean-Philippe Cointet, Vincent Courboulay, Karën Fort, Félicien Vallet

Download the full programme here

June 26, Opening day

Watch live streaming here

Programme

08:00-09:00 Arrival and coffee (IC2 Building)

09:00-09:05 Welcome Remarks (Anthony Larcher)
09:05-09:20 The road to JSALT 2023 (Sanjeev Khudanpur)
09:20-09:40 Team Presentation (Lucas Ondel)
09:40-09:50 Open Discussion
09:50-10:10 Team Presentation (Kenneth Church)
10:10-10:20 Open Discussion

10:20-10:30 Break

10:30-10:50 Team Presentation (Marie Tahon)
10:50-11:00 Open Discussion
11:00-11:20 Team Presentation (Petr Schwarz)
11:20-11:30 Open Discussion
11:30-12:00 Workshop laboratory orientation/set-up

12:00-13:30 Lunch Break

13:30-15:30 Team meetings (Rooms TBD) → workshop laboratory
15:30-16:00 JSALT Steering Committee Meeting (Team leaders and JSALT organizers)
16:00-18:00 Focus time (workshop laboratory)

18:00-23:00 Welcome Reception (Outside IC2 Building)

June 29, Special day on data collection and annotation, IC2 building

Special day on data collection and annotation presentations 

IC2 building, auditorium

The Data Collection and Annotation Day will take place on the 29th of June at the University of Le Mans. This day is dedicated to exchanging the needs, challenges, and tools of researchers, archivists, broadcasters and data producers around data collection and annotation in the human language technology community. We aim to foster collaboration between academia and industry in leveraging machine learning research and human-in-the-loop approaches to efficiently create, manage, and evaluate data-related processes. The event will cover the following main topics:

  • Data needs and providers: Exploring the requirements and challenges in various domains.
  • Indexing and archiving data: Discussing tools and techniques for efficiently organizing and storing collected data for future use.
  • Annotation: Delving into the process of annotating, validation issues, and the role of humans in data collection and annotation.
Programme

9h00 - 9h15 Opening

Morning Session: Data production

9h15 - 9h45 Language data collection and distribution: achievements, challenges and looking ahead 
Speaker: Denise DiPersio (Associate Director, LDC)

Bio: The Linguistic Data Consortium, hosted by the University of Pennsylvania, develops and distributes language resources to organizations around the globe. Denise is responsible for the overall operation of LDC's External Relations group, which includes intellectual property management, licensing, regulatory matters, publications, membership, and communications. Before joining LDC, she practiced law for over 20 years in the areas of international trade, intellectual property, and commercial litigation. She has an A.B. in Political Science from Bryn Mawr College and a Juris Doctor degree from the University of Miami School of Law.

Summary: This presentation explores the data collection approach employed by LDC for its projects, highlighting LDC's role as a data repository within the community and the advantages of sharing and reusing data. In light of LDC's recent 30th anniversary, the evolving landscape of data production and sharing is also examined. Topics include the transformation of LDC's model, the significance of web data, the utilization of crowdsourcing for collection and annotation, various data distribution options, and the influence of platforms like Kaggle, Hugging Face, and GitHub on data production and distribution. Additionally, the talk addresses the growing awareness of privacy concerns in human subjects collections and the expanding user community for language resources. Finally, the implications of generative technologies for traditional methods of data collection, annotation, and evaluation will be discussed.


9h45 - 10h15 Common Voice
Speaker: Rebecca Ryakitimbo (Mozilla Foundation)

Bio: Rebecca is a techie, writer and researcher. She is currently a community engagement fellow at Mozilla, working towards building an open voice dataset in Kiswahili to promote voice technology. She is establishing and supporting diverse Kiswahili language and tech communities along axes of gender, age, regional origin, accent and vernacular usage. Before joining Mozilla, Rebecca was an Internet Society fellow, an AfriSIG fellow, a Google Policy fellow, a National Geographic explorer and a digital rights program officer at Paradigm Initiative. Rebecca is an enthusiast of digital inclusion and the founder of the first women's SIG, the "Arusha women school of Internet governance".

Summary: Presentation of Common Voice: crowdsourcing for under-resourced languages and gendering voice technology.


10h15 - 10h30 Coffee Break

10h30 - 11h15 Services around Language Resource Production and Sharing
    Part 1: LR production and data management
    Part 2: Legal challenges of data production and distribution

Speakers: Victoria Arranz (Head of R&D) and Mickaël Rigault (Legal Counsel), ELDA
Bio: Victoria Arranz is responsible for national and international projects, collaborating with both industry and academia. She holds a PhD in Computational Linguistics and an MSc in Machine Translation, both from the University of Manchester Institute of Science and Technology. She worked as a researcher and lecturer in NLP before joining ELDA, where she has continued her work towards the creation (collection, annotation, definition of specifications), description (metadata) and sharing of language resources, respecting the full lifecycle of an LR and defining procedures on a user-oriented basis. She is also the coordinator of ELDA's participation in the European Language Data Space initiative, which aims at establishing a European platform and marketplace for the collection, creation and sharing of language data.
Mickaël Rigault: Mickaël is a legal specialist working in the field of language data at ELDA. He provides legal expertise on intellectual property, data management and the protection of personal data on a day-to-day basis for business and research activities. He takes care of the legal aspects for European infrastructures such as ELRC, ELG and the current Language Data Space. Mickaël holds a Master's degree in Comparative Law from Paris Ouest University, a Master's degree in Multimedia Law from Lyon 3 (France) and an LLM degree in Media Law from the University of East Anglia (Norwich, UK).

Summary: The Evaluations and Language resources Distribution Agency (ELDA) is the operational body of the ELRA Language Resources Association. It provides a wide variety of services around language resources (LRs), such as identification, production (data collection, annotation and processing), distribution and dissemination. A major focus lies on the clearing of legal aspects behind all data-related matters. This presentation is divided into two parts: a) Victoria will describe ELDA's activities regarding LR production and data management, including ELDA's participation in the setting up of data-sharing infrastructures, diving in particular into the procedures behind different types of data collection and annotation; b) Mickaël will focus on the legal challenges resulting from such activities, describing the concepts of intellectual property and protection of personal data and their application to data collection and distribution. He will provide an overview of some licensing aspects and contractual obligations and finish with a review of current legal cases revolving around the use of generative language models such as ChatGPT.

11h15 - 12h00 Round table

12h00 - 13h30 Lunch Break


Afternoon Session: Annotations and Human in the Loop

13h30 - 14h00 Annotations for the chapterization of audio content
Speaker: Ivan Thomas (Radio France)

Bio: Lead for R&D and Open Data at the Digital Department of Radio France. I have a technical background and I love podcasts. Our main focus in the R&D team is on increasing our knowledge of Radio France's audio content by using automatic metadata extraction.

Summary: Radio France is experimenting with ways to segment its audio content into chapters. One of our current approaches, in collaboration with the EBU and France 24, is to identify the key questions of the host. To this end, we are creating a corpus of annotations to train and evaluate a model of key questions.

14h00 - 14h30 Introducing Labelit: a multi-purpose and extensible annotation solution
Speakers: Karel Bourgois (Voxist, Le Voice Lab), Corentin Giraud (Batvoice), Olivier Baude (Huma-Num & CNRS)

Bio: Coming Soon

Summary: Labelit is an open-source solution maintained by BatvoiceAI and supported by Le Voice Lab. It has been built for flexibility and extensibility, enabling diverse annotation schemes on text, audio, video or a combination thereof.
In this presentation, we will give a quick introduction to Le Voice Lab's main goals and achievements and an overview of Labelit's features, then focus on a research use case in the context of the CNRS project "Ecouter-parler" (CamionLabo).

14h30 - 15h00 INA - Collecting and generating data in the archiving process
Speaker: Emmanuel Pije (INA)

Bio: Coming Soon

Summary: Presentation of the collections held by the Institute, the archiving process, service users and audiences, with a focus on human cataloguing and description activities, and work in progress on the automatic processing of collections for documentary purposes.


15h00 - 15h30 When AI met the archive: The case of RTVE
Speaker: Virginia Bazan Gil (FIAT/IFTA)

Bio: Coming Soon

Summary: After three years of testing AI solutions, in 2020 RTVE launched an AI tender to automatically catalogue the television archive. It was a long process in which the archive played a leading role. At this stage of the project, we can say that we have a better understanding of the technology, what works and what doesn't, mainly regarding speech technologies and NLP. This has had a direct impact on the goals and the scope of the project.
A selected team within our archive is already using the service and evaluating the results, and only now have the different areas started discussing how to integrate human cataloguing and artificial intelligence. On a wider scale, we have been organising training sessions for all the archive staff, but we need a deeper discussion to learn how they perceive the technology and whether they should take a more active role in its use. We have also learnt from the performance of image recognition that it is a very powerful tool but requires some customization, which we plan to develop in an identity recognition project based on TVE pioneers. This will be useful not only as an integrating tool in the archive, but also as a means to prove its utility and enhance its visibility and promotion.
Finally, we need to understand the impact of automatically generated metadata on our system, since so far we have had no feedback from our users.

15h30 - 16h00 Coffee Break

16h00 - 16h30 Challenges of annotating code-switched speech for languages without a standard orthography
Speaker: Fethi Bougares (Elyadata)

Bio: Fethi Bougares is Head of Research at Elyadata. His research interests include speech recognition and machine translation for low-resource languages, in particular Arabic dialects. He holds a PhD in Computer Science from Le Mans University. Fethi has more than 60 published papers in international peer-reviewed conferences and journals and is an active member of the ACL Special Interest Group on Arabic Natural Language Processing.

Summary: Annotating spoken languages that have no spelling standard is challenging. This is the case for all Arabic dialects, where the same word may be written differently by different native speakers. We tackled this problem on two fronts simultaneously: first, we developed detailed, dialect-specific annotation guidelines; second, we put in place a multi-level annotation and validation process. In this talk, I will introduce the challenges of speech annotation for Arabic dialects as an example of non-standard-orthography languages with code-switching phenomena, as well as our in-house annotation tool designed to assist the annotation process.

16h30 - 17h00 :  Plain X: Learnings from research and development of a 4-in-1 content adaptation platform
Speaker: Mirko Lorenz (Deutsche Welle)
Bio: Mirko Lorenz is an innovation manager and a member of the Deutsche Welle Research and Cooperation team since 2008. He has worked in multiple EU research projects, with a focus on developing new technologies to simplify work in the newsroom. He is a co-founder of Datawrapper, a data journalism tool in use in more than 500 large newsrooms worldwide. Since 2021 he has been involved with plain X, working towards its international roll-out to organizations in need of content adaptation.

Summary: The presentation will cover learnings and insights from development and provide a demo of how plain X works, in terms of speed, quality, customization options, etc. The goal is an exchange with language experts on how integrated, AI-driven platforms can be used effectively. Plain X is an AI-driven, 4-in-1 platform for content adaptation covering four tasks: transcription, translation, subtitling and (synthetic) voice-over. Three aspects make plain X noteworthy and different: firstly, the tool can connect to multiple language engines; secondly, many workflow features were integrated during development to combine quick work with quality output; thirdly, the tool emphasizes the role of the "human in the loop": any file can be shared with a reviewer, at each step, to achieve the highest possible quality of the content.
The tool was created based on several years of research and development in HLT (Human Language Technologies). The goal: help any organization with a growing content adaptation workload to perform these tasks faster, with high output quality. The tool is jointly developed by Deutsche Welle (Germany) and Priberam (Portugal). Since 2022, plain X has been in professional, daily use by Deutsche Welle editors, simplifying and speeding up content adaptation.

17h00 - 17h45 Round table led by Alfonso Ortega

17h45 - 18h00 Closing

Plenary lectures by invited speakers

Programme of plenary lectures

June, Tuesday 27th: Craig Greenberg and Reva Schwartz, NIST

Evaluation Beyond Computation, Assessing the Impacts of Human Language Technologies

Biography: Craig Greenberg is a Mathematician at the National Institute of Standards and Technology (NIST), where he oversees NIST's Speaker Recognition Evaluation and Language Recognition Evaluation series and researches the measurement and evaluation of Artificial Intelligence (AI), among other topics in AI and machine learning. Dr. Greenberg received his PhD in 2020 from the University of Massachusetts Amherst with a dissertation on uncertainty and exact and approximate inference in flat and hierarchical clustering, his M.S. degree in Computer Science from the University of Massachusetts Amherst in 2016, his M.S. degree in Applied Mathematics from Johns Hopkins University in 2012, his B.A. (Hons.) degree in Logic, Information, & Computation from the University of Pennsylvania in 2007, and his B.M. degree in Percussion Performance from Vanderbilt University in 2003. Among his accolades, Dr. Greenberg has received two official letters of commendation for his contribution to speaker recognition evaluation.

Reva Schwartz is a research scientist in the Information Technology Laboratory at the National Institute of Standards and Technology (NIST). She serves as Principal Investigator on Bias in Artificial Intelligence for NIST's Trustworthy and Responsible AI program. Her research focuses on organizational practices for subject matter experts, and the role of expertise and expert judgment in socio-technical systems. She has advised federal agencies about how experts interact with automation to make sense of information in high-stakes settings. Her background is in linguistics and experimental phonetics and includes a forensic science posting at the United States Secret Service, advising forensic science practice while at NIST, a temporary duty assignment at the National Security Agency, and an adjunct researcher position at the Johns Hopkins University Human Language Technology Center of Excellence.

Abstract: The evaluation of speech and language technologies has a long history of driving the technologies forward and measuring the state-of-the-art. As these technologies become ever more pervasive in everyday life, there is an increasing need for understanding how they operate in real-world settings and, critically, the impacts that they have on individuals, communities, organizations, and societies – whether or not they work as designed. In this talk, we will highlight some of the basic needs and fundamental challenges in assessing the impacts of human language technologies and describe nascent efforts underway to develop the metrology needed to measure AI technologies in context, including and beyond performance accuracy.

July, Tuesday 4th: Thibaut Véry, IDRIS

Deep learning for protein structure discovery

Biography: Thibaut Véry received a PhD in theoretical chemistry from the Université de Lorraine in 2012, on the modelling of the photochemistry of complex biological systems. Since 2016 he has been a member of the user support team of the French supercomputing centre IDRIS, where he is in charge of atomistic simulation for both High Performance Computing and Artificial Intelligence topics. His job is to help users get the best performance from the supercomputer through actions such as training courses, software management, documentation, etc.

Abstract: Proteins are associated with almost all biological processes. For instance, they can carry oxygen, make muscles move, perform chemical reactions, or serve as keys for viruses to enter cells. Chemically, up to thousands of small molecules (amino acids) from a set of 22 react together to form long chains. An analogy with Natural Language Processing can be drawn if we view amino acids as a sequence of letters forming a sentence. As with a sentence, we need to find how the parts of the sequence interact to get the meaning. For proteins, physical interactions drive the folding of the amino acids into a 3D structure.
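To make the sequence analogy concrete, here is a minimal sketch of how a protein "sentence" can be tokenized the way NLP tokenizes text; the 20-letter vocabulary and the example sequence are illustrative choices, not material from the talk.

    # Protein-as-sentence: residues are mapped to integer tokens, exactly
    # as NLP maps characters or words before feeding a sequence model.
    # The 20 standard one-letter residue codes are used for simplicity
    # (the talk mentions a set of 22 amino acids).
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    TOKEN_OF = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

    def tokenize(sequence: str) -> list[int]:
        """Map a protein 'sentence' to integer tokens."""
        return [TOKEN_OF[aa] for aa in sequence.upper() if aa in TOKEN_OF]

    # Illustrative input: the first residues of a haemoglobin chain.
    print(tokenize("MVLSPADKTNVKAAW"))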

Only by finding the structure can we decipher the exact role of a protein. Researchers have been working on this since the 1920s. Several experimental setups are available to obtain this information. However, preparing the proteins is difficult due to the requirements of the experimental methods. With the increased power of computers, it became possible to use them to find structures. The toolbox of numerical methods includes machine learning.

Every other year, a competition assesses the quality of numerical methods on unknown structures. For the 2018 edition, DeepMind (Alphabet) proposed a Deep Learning model based on Transformers: Alphafold. Alphafold beat the other models thanks to an increase in the quality of its predictions. The real breakthrough came in 2020, when the Alphafold2 version entered the competition. The results were impressive because the quality of many predictions was comparable to experimental ones.

This presentation introduces the concepts needed to understand how protein structures are discovered. We will focus on the Alphafold2 model to understand how it achieves such good results.

July, Thursday 6th: Isabelle Collet, Université de Genève

Inequal by design? How to think together about closing the gender gap in data

Biography: Prof. Isabelle Collet is a former computer scientist. For the past 20 years, her research has focused on narrowing the gender gap in STEM (particularly computer science) and developing inclusion strategies for women in higher education. She directs the educational research group Gender and Intersectional Relations (G-RIRE) at the University of Geneva. In 2019, she published "Les oubliées du numérique".


Abstract: Today, women account for less than 15% of IT students. This virtual absence of gender diversity in the digital sector has consequences not only for gender equality in employment, but also for the inclusiveness and performance of digital applications. The great uniformity of the developer and manager population (white males from the middle or upper classes) tends to obscure the needs and characteristics of other populations, particularly women. 

The aim of this conference is to expose the gender bias of artificial intelligence, and then to consider the educational solutions that need to be put in place to enable the whole of society to understand the challenges of data.

July, Tuesday 11th: Emmanuel Dupoux, EHESS, Laboratoire de Sciences Cognitives et Psycholinguistique (LSCP).

Textless NLP: towards language processing from raw audio

Biography: Emmanuel Dupoux is professor at the Ecole des Hautes Etudes en Sciences Sociales (EHESS) and Research Scientist at Meta AI Labs. He directs the Cognitive Machine Learning team at the Ecole Normale Supérieure (ENS) in Paris and INRIA. His education includes a PhD in Cognitive Science (EHESS), an MA in Computer Science (Orsay University) and a BA in Applied Mathematics (Pierre & Marie Curie University). His research mixes developmental science, cognitive neuroscience, and machine learning, with a focus on the reverse engineering of infant language and cognitive development using unsupervised or weakly supervised learning. He is the recipient of an Advanced ERC grant, co-organizer of the Zero Resource Speech Challenge series (2015-2021) and the Intuitive Physics Benchmark (2019), and led a Jelinek Summer Workshop at CMU on multimodal speech learning in 2017. He is a CIFAR LMB and an ELLIS Fellow. He has authored 150 articles in peer-reviewed outlets in cognitive science and language technology.

Abstract: The oral (or gestural) modality is the most natural channel for human language interactions. Yet language technology (Natural Language Processing, NLP) is primarily based on the written modality and requires massive amounts of textual resources for the training of useful language models. As a result, even fundamentally speech-first applications like speech-to-speech translation or spoken assistants like Alexa or Siri are constructed in a Frankenstein way, with text as an intermediate representation between the signal and language models. Besides being inefficient, this has two unfortunate consequences: first, only the small fraction of the world's languages that have massive textual repositories can be addressed by current technology. Second, even for text-rich languages, the oral form mismatches the written form at a variety of levels, including vocabulary and expressions. The oral medium also contains typically unwritten linguistic features like rhythm and intonation (prosody) and rich paralinguistic information (non-verbal vocalizations like laughter, cries, clicks, etc., and nuances carried through changes in voice quality) which are therefore inaccessible to language models. But is this a necessity? Could we build language applications directly from the audio stream without using any text? In this talk, we review recent breakthroughs in representation learning and self-supervised techniques which have made it possible to learn latent linguistic units directly from audio, unlocking the training of generative language models without the use of any text. We show that these models can capture heretofore unaddressed nuances of oral language, including in a dialogue context, opening up the possibility of speech-to-speech textless NLP applications. We outline the remaining technical challenges, including the challenge of building expressive oral language datasets at scale.
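As a concrete illustration of the unit-discovery step reviewed in the talk, the sketch below quantizes self-supervised speech features into discrete "pseudo-text" units. The HuBERT model, the choice of layer, the 100-unit codebook and the input file are assumptions made for illustration; a real system would fit the clustering on a large corpus rather than a single utterance.

    # Minimal textless-NLP sketch: raw audio -> discrete units (no transcript).
    # Assumes torchaudio and scikit-learn; mono audio; all choices illustrative.
    import torch
    import torchaudio
    from sklearn.cluster import KMeans

    bundle = torchaudio.pipelines.HUBERT_BASE
    model = bundle.get_model().eval()

    waveform, sr = torchaudio.load("utterance.wav")  # hypothetical input file
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

    with torch.inference_mode():
        features, _ = model.extract_features(waveform)  # one tensor per layer
    frames = features[6].squeeze(0).numpy()             # an intermediate layer

    # Quantize frames into discrete units a language model can be trained on.
    units = KMeans(n_clusters=100, n_init=10).fit_predict(frames)
    print(units[:20])  # a "sentence" of pseudo-text units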


July, Thursday 13th: Ehsan Variani, Google

Foundational Problems in ASR

Biography: Ehsan Variani is a research scientist on the speech and language algorithms team at Google Research. His general research interests are information theory, machine learning and speech recognition. He has been with Google since 2015; before that he was a PhD student at Johns Hopkins University.


Abstract: This talk focuses on some foundational problems in practical speech recognition and discusses some solutions for each of these problems.

 

July, Tuesday 18th: Petr Schwarz, BUT, and Themos Stafylakis, Omilia

Introduction to speaker identification and deep fake context

Biography: Petr Schwarz [PhD, Brno University of Technology, 2009] is a senior researcher in BUT Speech@FIT at the Faculty of Information Technology (FIT) of BUT. He has broad experience in speech technologies, ranging from voice biometry, speech transcription and keyword spotting to language identification. At BUT, Petr has worked on many national, EU, and US research projects and many international technology evaluation campaigns, like those organized by the U.S. National Institute of Standards and Technology (NIST). In 2006, Petr co-founded Phonexia and served for several years as its CEO and CTO. Phonexia sells speech technologies in more than 60 countries. Currently, he is working on conversational AI technologies and security/defense applications of voice biometry.

Abstract: Petr will present how a speaker identification system based on the ResNet neural network architecture is designed. He will also tell you about basic principles used in speech synthesis, voice morphing, and speech codecs and explain how speaker identification, speech synthesis, and speech codecs can affect each other in the real world.
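As a minimal illustration of the scoring step in such a system (a sketch under assumed inputs, not Petr's implementation): once a ResNet encoder has produced fixed-size speaker embeddings, verification reduces to comparing them and thresholding the score.

    # Speaker verification scoring sketch; the embeddings are random
    # placeholders for the output of a trained (e.g. ResNet) encoder,
    # and the threshold is an assumed, uncalibrated operating point.
    import numpy as np

    def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two speaker embeddings."""
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    rng = np.random.default_rng(0)
    enrolled = rng.standard_normal(256)  # enrolled-speaker embedding
    test = rng.standard_normal(256)      # test-utterance embedding

    THRESHOLD = 0.5
    print("same speaker" if cosine_score(enrolled, test) >= THRESHOLD
          else "different speaker")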


Extracting speaker and emotion information from self-supervised speech models

Biography: Themos Stafylakis received the B.Eng. degree from the National Technical University of Athens, Greece, in 2004, the M.Sc. degree in communication and signal processing from Imperial College London, London, U.K., in 2005, and the Ph.D. degree in speaker diarization for broadcast news from the National Technical University of Athens, Athens, Greece, in 2011. In 2011, he joined the Centre de Recherche Informatique de Montréal (Montréal, QC, Canada) as a Postdoc Researcher on speaker recognition. In 2016, he joined the Computer Vision Laboratory, University of Nottingham, Nottingham, U.K., as a Marie Curie Research Fellow. His main research interests are audiovisual speech and speaker recognition, machine learning and deep neural networks.

Abstract: Themos will present hot topics in speaker identification research, emphasizing self-supervised models.

July, Thursday 20th: Claire Gardent, CNRS/LORIA, Nancy
Joint work with Juliette Faille, CNRS/LORIA and Lorraine University, Nancy, Albert Gatt (U. Utrecht), Quentin Brabant, Gwénolé Lecorvé and Lina Rojas-Barahona (Orange Lannion)

What kind of errors are made by neural generation models and why?

In this talk, I will present our work on assessing, analysing and explaining the output of text generation models that are grounded in Knowledge Graphs (KG).

Focusing on KG-to-Text encoder-decoder models, i.e., generation models which aim to verbalise the content of a Knowledge Graph, I will discuss missing information, i.e., information that is present in the input but not in the output. I will introduce a novel evaluation metric for assessing to what extent generation models omit input information and show that, while this metric correlates with human scores, the correlation varies with the specifics of the human evaluation setup. This suggests that an automatic metric might be more reliable, being less subjective and more focused on correct verbalisation of the input, than human evaluation measures. I will then go on to demonstrate, using both a parametric and a non-parametric probe, that omissions are already "visible" in the encoder representations, i.e., they can be traced back to the encoder.
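To give a rough, simplified picture of what an omission metric can look like (exact string matching stands in for the actual method used in this work), consider:

    # Toy omission measure for KG-to-Text: the fraction of input triples
    # whose object never surfaces in the generated text. Exact matching
    # is an illustrative simplification of the metric discussed in the talk.
    def omission_rate(triples, generated_text):
        text = generated_text.lower()
        missing = [t for t in triples
                   if t[2].lower().replace("_", " ") not in text]
        return len(missing) / len(triples)

    triples = [("Alan_Turing", "birthPlace", "London"),
               ("Alan_Turing", "field", "computer_science")]
    print(omission_rate(triples, "Alan Turing was born in London."))  # 0.5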

In the second part of the talk, I will discuss conversational question generation and show that grounding dialogue in knowledge allows for a detailed analysis of model behaviour in terms of well-formedness, relevance, semantic adequacy and dialogue coherence.

July, Tuesday 25th: Hatim Bourfoune, Nathan Cassereau, Pierre Cornette, IDRIS

NLP at scale made easy on Jean Zay (PLM4ALL)

Biography: Hatim Bourfoune is a research engineer with a passion for Artificial Intelligence who has been working for several years in the field of Deep Learning. He has been working for more than two years at IDRIS in the user support team specialised in AI, in particular on optimisation work on very large models such as Transformers. His flagship project was the development of the BLOOM language model, where he participated in the evaluation of the model as well as in its enhancement (fine-tuning, RLHF...). In addition to the support he provides to Jean Zay users, he regularly gives lectures and courses on Deep Learning topics.

Nathan Cassereau is an engineer specialised in artificial intelligence and distributed computing. After graduating from Imperial College London, he joined IDRIS, the French institute operating Jean Zay, a powerful supercomputer dedicated to high performance computing and artificial intelligence research. At IDRIS, Nathan helps researchers optimise their code and their use of the supercomputer. He was also part of a team working on the evaluation and development of large language models, such as BLOOM.

Pierre Cornette is a dedicated research engineer with a strong background in supporting several AI research projects at IDRIS. With access to one of the most powerful supercomputers in Europe, Jean Zay, Pierre brings knowledge on the exploitation of computational resources for training deep learning models. From image and speech recognition to natural language understanding, Pierre's knowledge covers many subfields of AI.

Abstract: This talk presents the different tools for using and training language models in an optimized way. We will see practical examples, a presentation of exploitation tools (Accelerate, DeepSpeed, Megatron...), and the efforts made by IDRIS to make these tools easy to use.
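To give a flavour of the kind of tooling the talk covers, here is a minimal sketch of Hugging Face Accelerate wrapping an ordinary PyTorch loop so that the same script can scale from one GPU to many nodes; the model and data are toy placeholders, not IDRIS's actual setup.

    # Minimal Accelerate sketch: the same loop runs on CPU, one GPU, or
    # several nodes depending on how it is launched (e.g. `accelerate launch`).
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from accelerate import Accelerator

    accelerator = Accelerator()
    model = torch.nn.Linear(128, 2)                 # toy model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    data = DataLoader(TensorDataset(torch.randn(512, 128),
                                    torch.randint(0, 2, (512,))),
                      batch_size=32)

    # prepare() moves everything to the right device(s) and shards the data.
    model, optimizer, data = accelerator.prepare(model, optimizer, data)

    for x, y in data:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(x), y)
        accelerator.backward(loss)                  # replaces loss.backward()
        optimizer.step()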


July, Thursday 27th: Isabelle Guyon (Google Research, ChaLearn, and Université Paris-Saclay), Kent Rachmat, Khuong Thanh Gia Hieu (LISN/INRIA/CNRS, Université Paris-Saclay)

Peer-reviewed journal of AI-agents: a challenging LLM competition

Biography: Isabelle Guyon recently joined Google Research as a director of research. She is also professor of artificial intelligence at Université Paris-Saclay (Orsay). Her areas of expertise include computer vision, bioinformatics, and power systems. She is best known for being a co-inventor of Support Vector Machines. Her recent interests are in automated machine learning, meta-learning, data-centric AI, and large language models. She has been a strong promoter of challenges and benchmarks, and is president of ChaLearn, a non-profit dedicated to organizing machine learning challenges. She is community lead of Codalab competitions, a challenge platform used both in academia and industry. She co-organized the "Challenges in Machine Learning Workshop" @ NeurIPS between 2014 and 2019, launched the "NeurIPS challenge track" in 2017 while she was general chair, and pushed the creation of the "NeurIPS datasets and benchmark track" in 2021, as a NeurIPS board member.

Abstract: Significant recent advancements have occurred in the field of question answering, particularly within LLM-powered chatbots and augmented search engines like ChatGPT, Bard, the new Bing, and similar platforms. These advancements offer promising prospects for accelerating the acquisition of academic knowledge across various disciplines, including science, social sciences, education, and other fields of study. In academia, progress is typically facilitated through a peer-reviewed literature system. This has inspired us to emulate this process using AI agents.

In our envisioned scenario, AI agents would emulate a peer-reviewed journal. Human contributors would assume roles as editors or meta-reviewers, providing prompts (call-for-papers) to AI authors and selecting papers for publication based on the evaluations from AI reviewers, as well as their own judgment. Acceptance or rejection of papers would serve as a teaching signal for both AI authors and AI reviewers, motivating them to continuously enhance their skills.

To begin working towards this ambitious objective, we are organizing a challenge for the AutoML-conf'23 conference. Participants will be invited to submit AI agents capable of acting as authors and reviewers. For this challenge, we will focus on generating systematic surveys or overview papers.

To generate prompts for the challenge, we reverse-engineered numerous papers from various fields indexed in Semantic Scholar, including Computer Science, Medicine, Chemistry, Biology, Materials Science, Physics, Geology, Psychology, Art, History, Geography, Sociology, Business, Political Science, Economics, Philosophy, Mathematics, Engineering, Environmental Science, Education, Law, and Linguistics. These papers served as a basis for creating prompts such as: "Write a systematic survey or overview examining the impact of social media on mental health. This paper should explore current research on the correlation between social media usage and mental health outcomes, encompassing areas such as depression, anxiety, and self-esteem."

Furthermore, we have developed a baseline AI author and AI reviewer as a starting point. During the conference, we will present the competition design and our initial results based on the baseline models. We will also provide instructions on how to submit your first entry to the competition.

Acknowledgments: The support of INRIA, Google Research, the ANR Chair of Artificial Intelligence HUMANIA (ANR-19-CHIA-0022) and the TAILOR EU Horizon 2020 grant 952215 is gratefully acknowledged.

July, Friday 28th: Hermann Ney (RWTH Aachen University, Aachen, Germany)

Data-driven speech and language technology: from small to large (language) models

Biography: Hermann Ney is a full professor of computer science at RWTH Aachen University, Germany. Previously, he headed the Speech Recognition Group at Philips Research. His main research interests include statistical methods for pattern recognition and human language technology and their specific applications to speech recognition, machine translation, and image object recognition. In particular, he has worked on dynamic programming for continuous speech recognition, language modeling, and phrase-based approaches to machine translation. He has authored and coauthored more than 600 papers in journals, books, conferences, and workshops.

Abstract: Today, data-driven methods like neural networks and deep learning are widely used for speech and language processing. We will revisit the evolution of these methods over the last 40 years and try to present a unifying view of their principles. Specifically, the talk will focus on speech recognition and language modelling.

August, Tuesday 1st: Victoria Mingote and Pablo Gimeno (University of Zaragoza, Spain)

Representation and Metric Learning Advances for Face and Speaker Biometric Systems

Abstract: In recent years, as advanced as deep learning techniques have become, they still have problems when a task has limited data or when an approach that is successful in one task is applied to another. In this talk, I will present different alternative approaches to deal with these issues in biometric systems. The first part of the talk focuses on different ways to improve the generation of signal representations for the text-dependent speaker verification task, since this task has a strong dependency on the phonetic content. In the second part, I will explain several approaches using new training loss functions for deep neural networks that are based on the final verification metrics. These training loss functions can be applied to different verification tasks.
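One example of a "final verification metric" of the kind such loss functions build on is the Equal Error Rate (EER); the sketch below computes it from toy scores and labels, as an illustration rather than the speaker's evaluation code.

    # Equal Error Rate: the operating point where the false-accept rate
    # equals the false-reject rate. Scores and labels are toy placeholders.
    import numpy as np

    def equal_error_rate(scores, labels):
        fars, frrs = [], []
        for t in np.sort(scores):
            accept = scores >= t
            fars.append(np.mean(accept[labels == 0]))   # false accepts
            frrs.append(np.mean(~accept[labels == 1]))  # false rejects
        fars, frrs = np.array(fars), np.array(frrs)
        i = np.argmin(np.abs(fars - frrs))
        return float((fars[i] + frrs[i]) / 2)

    scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
    labels = np.array([1, 1, 1, 0, 0, 0])  # 1 = same-speaker (target) trial
    print(equal_error_rate(scores, labels))  # 0.0 for these separable scores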

Multiclass audio segmentation in broadcast environments

Abstract: Audio segmentation can be defined as the division of an audio signal into smaller fragments according to a predefined set of attributes. This wide definition can cover several systems depending on the set of rules considered. In this talk, the focus will be on multiclass audio segmentation tasks, aiming to obtain a set of labels describing several typologies in an audio signal, such as speech, music and noise. During the presentation, different approaches will be presented, evaluating these kinds of systems on broadcast-domain data.
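As a schematic illustration of such a pipeline (with random posteriors standing in for a real frame classifier), multiclass segmentation can be viewed as frame classification followed by smoothing and run-length collapsing:

    # Frame-level classification -> median smoothing -> contiguous segments.
    # The posteriors are random stand-ins for a trained model's output.
    import numpy as np
    from scipy.signal import medfilt

    CLASSES = ["speech", "music", "noise"]
    rng = np.random.default_rng(0)
    posteriors = rng.random((500, 3))        # 500 frames x 3 classes

    frame_labels = posteriors.argmax(axis=1)
    smoothed = medfilt(frame_labels, kernel_size=51)  # ~0.5 s at 100 frames/s

    # Collapse runs of identical labels into (start_frame, end_frame, class).
    bounds = np.flatnonzero(np.diff(smoothed)) + 1
    bounds = np.concatenate(([0], bounds, [len(smoothed)]))
    segments = [(int(s), int(e), CLASSES[int(smoothed[s])])
                for s, e in zip(bounds[:-1], bounds[1:])]
    print(segments[:5])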

August 3 and 4, Closing days

Closing day presentations

This session will be broadcast live and questions can be submitted via chat.

Thursday, August 3rd, 2023

Venue: IC2, Auditorium

08:00  Continental Breakfast
08:55  Call to order and housekeeping announcements
09:00  Sneak Preview: How we got here and what we discovered

Activities and Findings of JSALT 2023 Teams (Part I)

09:30  Team Presentation: X-Diar: Explainability for diarization (live streaming)
10:40  Stretch/Caffeine Break
10:50  Team Presentation (Continued)
12:00  Discussion, Questions and Comments

12:30  Lunch Break

14:00  Team Presentation: Better together: Text + context (live streaming)
15:10  Stretch/Caffeine Break
15:20  Team Presentation (Continued)
16:30  Discussion, Questions and Comments

17:00  Adjourn for the day

Friday, August 4th, 2023

Venue: IC2, Auditorium

08:00  Continental Breakfast
09:25  Call to order and housekeeping announcements

Activities and Findings of JSALT 2023 Teams (Part II)

09:30  Team Presentation: Automatic design of conversational models from observation of human-to-human conversation (live streaming)
10:40  Stretch/Caffeine Break
10:50  Team Presentation (Continued)
12:00  Discussion, Questions and Comments

12:30  Lunch Break

14:00  Team Presentation: Finite state methods with modern neural architectures for speech applications and beyond (live streaming)
15:10  Stretch/Caffeine Break
15:20  Team Presentation (Continued)
16:30  Discussion, Questions and Comments

16:55  Concluding remarks and plans for 2024
17:00  Adjourn to labs for planning post-workshop activities

Farewell Dinner - Abbaye de l'Epau

18:30  Meet up at the Préfecture tram stop to go together to the Abbey (catching the 6:35pm T2 tram)
19:00  Workshop Team Photos
19:05  Cocktail Reception, Concert by the group "Para Ir"
20:00  Seated Dinner
22:30  Back to the city centre

Full Programme

Monday, June 12 to Friday, June 23: Summer school

Monday, June 26: JSALT Opening day

8:00-12:00:        JSALT workshop opening presentations (open to public)

Tuesday, June 27: Invited speakers Craig Greenberg & Reva Schwartz

11:00                Evaluation Beyond Computation, Assessing the Impacts of Human Language Technologies (live streaming)

Friday, June 30 to Sunday, July 2: Plein Champ art festival

Tuesday, July 4, Invited speaker: Thibaut Véry

11:00               Deep learning for protein structure discovery

Wednesday, July 5 

13:00               Group progress report

Thursday, July 6,  Invited speaker: Isabelle Collet

11:00               Inequal by design? How to think together about gender bias in data (live streaming)

14:00-17:00     Gender equality in computer science, from secondary school to university (open to public)

Tuesday, July 11,  Invited speaker: Emmanuel Dupoux 

11:00               Textless NLP: towards language processing from raw audio (live streaming)

Wednesday, July 12

13:00               Group progress report

Thursday, July 13,  Invited speaker: Ehsan Variani

11:00               Foundational Problems in ASR (live streaming)

Tuesday, July 18,  Invited speakers: Petr Schwarz and Themos Stafylakis

11:00               Introduction to speaker identification and deep fake context + Extracting speaker and emotion information from self-supervised speech models (live streaming)

Wednesday, July 19

13:00               Group progress report

Thursday, July 20,  Invited speaker: Claire Gardent

11:00               What kind of errors are made by neural generation models and why? (live streaming)

Tuesday, July 25,  Invited speakers: Hatim Bourfoune, Nathan Cassereau & Pierre Cornette

11:00               NLP at scale made easy on Jean Zay (live streaming)

Wednesday, July 26

13:00               Group progress report

Thursday, July 27,  Invited speaker: Isabelle Guyon

11:00               Peer-reviewed journal of AI-agents: a challenging LLM competition (live streaming)

Friday, July 28,  Invited speaker: Hermann Ney (remote)

14:00               Data-driven speech and language technology: from small to large (language) models (live streaming)

Tuesday, August 1,  Invited speakers: Victoria Mingote, Pablo Gimeno

11:00               Representation and Metric Learning Advances for Face and Speaker Biometric Systems + Multiclass audio segmentation in broadcast environments (live streaming)

Thursday, August 3,  closing presentations

9:30-12:30       Explainability for diarization (live streaming and open to public)

14:00-17:00     Better together: text + context (live streaming and open to public)

Friday, August 4: Closing presentations

9:30-12:30       Automatic design of conversational models from observation of human-to-human conversation (live streaming and open to public)

14:00-17:00     Finite state methods with modern neural architectures for speech applications and beyond (live streaming and open to public)

19:00:              Closing ceremony at l'Abbaye de l'Epau

Saturday, August 5: Departure of participants
