Summer school - JSALT2023

Presentation summer-school

The internship includes a comprehensive 2-week summer school on HLT, followed by intensive research projects on select topics for 6 weeks.

The 8-week workshop provides an intense, dynamic intellectual environment. Undergraduates work closely alongside senior researchers as part of a multi-university research team, which has been assembled for the summer to attack HLT problems of current interest.

REGISTRATION

Lecturers

Hervé Bredin, IRIT-CNRS, Toulouse

Hervé Bredin received a PhD on talking-face biometric authentication from Telecom ParisTech in 2007. He has been a permanent CNRS researcher (cnrs.fr) since 2008 and is currently based in Toulouse (France) at IRIT (irit.fr). His current research interests include deep learning techniques applied to machine listening, with a strong focus on speaker diarization. He is the creator and lead contributor of the open source "pyannote.audio" toolkit for speaker diarization (2.5k GitHub stars). Since 2021, he has also been working as a scientific advisor on speech processing technologies for various companies.

www: herve.niderb.fr
twitter: @Hervé Bredin
github: @Hervé Bredin

Ondřej Bojar, Charles University, Prague

Ondřej Bojar is an associate professor at ÚFAL, Charles University, and a lead scientist in Machine Translation in the Czech Republic. He has been co-organizing WMT shared tasks in machine translation and machine translation evaluation since 2013. His system dominated English-Czech translation in the years 2013-2015, before deep learning and neural networks fundamentally changed the field. Having taken part in and later supervised ÚFAL’s participation in a series of EU projects (EuroMatrix, EuroMatrixPlus, MosesCore, QT21, HimL, CRACKER, Bergamot), he has recently concluded his coordination of the EU project ELITR (http://elitr.eu/) focussed on simultaneous speech translation into over 40 languages. ELITR has also coined the task of project meeting summarization with its AutoMin shared task (2021 and now 2023).

Jean-François Bonastre, Avignon Université

Jean-François Bonastre received his PhD on automatic speaker recognition in 1994 and his “Habilitation à Diriger les Recherches” (HDR) in 2000. He is a Senior Research Scientist (Directeur de Recherche) at Inria Defense & Security and an associate member of the Computer Science Laboratory of Avignon (LIA), Avignon University. Before joining Inria in 2022, he was Professor (of Exceptional Class) at Avignon University, where he led the LIA from 2016 to 2020 and was vice-president from 2008 to 2015.
He is a member of the Institut Universitaire de France (promotion Junior 2006). He was an auditor of the 26th session of IHEDN/FMES (the Mediterranean session of the Institut des Hautes Etudes de la Défense Nationale) in 2015. He spent a sabbatical year at Panasonic Speech Technology Laboratory (Santa Barbara, California, USA) in 2002-2003.
He was the President of the International Speech Communication Association (ISCA) from 2011 to 2013 and the President of the Association Francophone de la Communication Parlée from 2000 to 2004. He is an IEEE Senior Member and was elected member of the IEEE Speech and Language Technical Committee and IEEE Biometrics Council. He is one of the founders of ISCA's Special Interest Group "Speaker and Language Characterization" (SPLC). He was a member of the Scientific Committee of the Montreal Computer Research Center (CRIM) from 2016 to 2020.
Jean-François Bonastre has supervised 21 defended PhDs and is currently supervising 3 PhD students. He is the author or co-author of more than 300 papers, with ~8000 citations, an h-index of 43 and three patents.

Hatim Bourfoune, IDRIS

Hatim Bourfoune is a research engineer with a passion for Artificial Intelligence who has been working for several years in the field of Deep Learning. He has been working for more than two years at IDRIS in the user support team specialised in AI, in particular on optimisation work on very large models such as Transformers. His flagship project was the development of the BLOOM language model, where he participated in the evaluation of this model as well as in its enhancement (fine-tuning, RLHF...). In addition to the support he provides to Jean Zay users, he regularly gives lectures and courses on Deep Learning topics.

Nathan Cassereau, IDRIS

Nathan Cassereau is an engineer specialised in artificial intelligence and distributed computing. After graduating from Imperial College London, he joined IDRIS, the French institute operating Jean Zay, a powerful supercomputer dedicated to high performance computing and artificial intelligence research. At IDRIS, Nathan helps researchers optimise their code and their use of the supercomputer. He was also part of a team working on the evaluation and development of large language models, such as BLOOM.

Kenneth Church, Northeastern University

Kenneth Church is a professor of the practice at the Khoury College of Computer Sciences at Northeastern University. His research focuses on natural language processing and information retrieval, artificial intelligence and machine learning.

Before joining Northeastern in 2022, Church worked as a scientist at Baidu, a researcher at IBM and a scientist at Johns Hopkins University. He was recognized as a Baidu Fellow in 2018, an Association for Computational Linguistics (ACL) Fellow in 2015 and served as president of ACL in 2012. Notable venues in which Church’s research has been published include NeurIPS, ACL, EMNLP, NAACL, Journal of Natural Language Engineering, Frontiers and Interspeech. Outside of academic research, he enjoys chess and hiking. A fun fact most people don’t know about him is that his great-grandfather invented a method that is still used today to predict stream runoff from mountain ranges across the West, as well as floods and droughts.

Pierre Cornette, IDRIS

Pierre Cornette is a dedicated research engineer with a strong background in supporting several AI research projects at IDRIS. With access to one of the most powerful supercomputers in Europe, Jean Zay, Pierre brings knowledge on the exploitation of computational resources for training deep learning models. From image and speech recognition to natural language understanding, Pierre's knowledge covers many subfields of AI.

Benoît Crabbé, Université Paris Diderot

Benoît Crabbé is a professor at Université Paris Diderot. He teaches in the Linguistics department (UFRL) and conducts his research at the LLF lab (CNRS and Paris Diderot). His research interests are in computational linguistics and more specifically in natural language understanding, with an interest in parsing French and related languages. He is also involved in empirical and experimental issues in linguistics and in cognitive science related to modelling the syntax of natural languages.

Denise DiPersio, Linguistic Data Consortium, University of Pennsylvania

The Linguistic Data Consortium, hosted by the University of Pennsylvania, develops and distributes language resources to organizations around the globe. Denise is responsible for the overall operation of LDC’s External Relations group, which includes intellectual property management, licensing, regulatory matters, publications, membership, and communications. Before joining LDC, she practiced law for over 20 years in the areas of international trade, intellectual property, and commercial litigation. She has an A.B. in Political Science from Bryn Mawr College and a Juris Doctor degree from the University of Miami School of Law.

Craig Greenberg, National Institute of Standards and Technology

Craig Greenberg is a Mathematician at the National Institute of Standards and Technology (NIST), where he oversees NIST’s Speaker Recognition Evaluation series and Language Recognition Evaluation series, and researches the measurement and evaluation of Artificial Intelligence (AI) and other topics in AI and machine learning. Dr. Greenberg received his PhD in 2020 from the University of Massachusetts Amherst with a dissertation on uncertainty and exact and approximate inference in flat and hierarchical clustering, his M.S. degree in Computer Science from the University of Massachusetts Amherst in 2016, his M.S. degree in Applied Mathematics from Johns Hopkins University in 2012, his B.A. (Hons.) degree in Logic, Information, & Computation from the University of Pennsylvania in 2007, and his B.M. degree in Percussion Performance from Vanderbilt University in 2003. Among his accolades, Dr. Greenberg has received two official letters of commendation for his contribution to speaker recognition evaluation.

Nils Holzenberger, Telecom Paris

Nils Holzenberger graduated in 2017 with an M.S. from Mines Paris, part of Paris Sciences et Lettres University, and in 2022 with a PhD from Johns Hopkins University, advised by Prof. Benjamin Van Durme. His major scientific contributions are in representation learning for language, information extraction, and natural legal language processing. His research on statutory reasoning has been featured on the NLP Highlights podcast of the AI2 institute. With Benjamin Van Durme and Andrew Blair-Stanek, Nils obtained an NSF CISE award for research in legal NLP. Since February 2023, he has been a faculty member at Télécom Paris, part of Institut Polytechnique de Paris.

Lucas Ondel Yang, LISN, Université Paris Saclay

Lucas Ondel Yang is a CNRS researcher at the LISN laboratory of Université Paris Saclay. He received a PhD degree from the Faculty of Information Technology of Brno University of Technology in 2021 on Bayesian models for unsupervised learning of speech. His research focuses on developing new machine learning approaches to facilitate the democratization of speech technologies. In this endeavor, he is actively working on bringing speech technologies to so-called low-resource languages as well as creating new theoretical and practical tools to ease the use of speech technologies by non-expert users. He is currently leading the "Finite State Methods" team within the JSALT workshop.

Petr Schwarz, Brno University of Technology

Petr Schwarz [PhD, Brno University of Technology, 2009] is a senior researcher in BUT Speech@FIT at the Faculty of Information Technology (FIT) of BUT. He has broad experience in speech technologies ranging from voice biometry, speech transcription, and keyword spotting to language identification. At BUT, Petr worked on many national, EU, and US research projects and many international technology evaluation campaigns like those organized by the U.S. National Institute of Standards and Technology (NIST). In 2006, Petr co-founded Phonexia, and served for several years as its CEO and CTO. Phonexia sells speech technologies to more than 60 countries. Currently, he is working on conversational AI technologies and security/defense applications of voice biometry.

Marie Tahon, LIUM, Le Mans Université

Marie Tahon is currently Associate Professor at Le Mans University and conducts her research at LIUM. She received her Ph.D. degree in computer science from the University of Paris-Sud (Orsay) in 2012 on para-linguistic speech processing. Her research interests concern expressive speech processing, mainly in the fields of speech synthesis, emotion recognition and speaker identification. She has also developed strong skills in musical acoustics for automatic song analysis and organology. She is currently leading a project within the JSALT workshop.
https://cv.hal.science/marie-tahon

François Yvon, LISN-CNRS

François Yvon is a senior CNRS researcher at the LISN (formerly LIMSI) laboratory of Université Paris Saclay in Orsay, France. F. Yvon has been leading activities in Machine Translation at LISN for about 15 years, resulting in more than one hundred scientific publications on all aspects related to the development and evaluation of multilingual language processing technologies, from word and sentence alignment to translation modelling and evaluation, including recent work on multi-domain adaptation in MT and on cross-lingual transfer learning issues. He has acted as coordinator or PI in multiple past national and international projects in MT such as Quaero or H2020/QT21 and has supervised more than 15 PhDs on MT related topics. He was recently involved in the evaluation activities of the large "Big Science" collaboration. He is a board member of the European chapter of the Association for Computational Linguistics, of the MetaNet network, and has recently contributed as an expert on linguistic technologies for the French language to several European projects (European Language Resource Collection, ELE – European Language Equality, ELG – European Language Grid).

Teams and topics

The teams and topics for 2023 are:

Published on June 22, 2023

Programme

All lectures will be held in the IC2 auditorium. All labs will be held in the IC2 classroom.

Monday, June 12
Deep Learning: Introduction & Acceleration, Hosts: Kamel Guerda, Nathan Cassereau (IDRIS)

08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Deep Learning introduction (Kamel Guerda, Nathan Cassereau)
10:40 – 11:00 Stretch Break
11:00 – 12:00 Laboratory
12:00 – 13:00 Lunch Break (RU)
13:00 – 15:00 Deep Learning Optimization & Acceleration (Kamel Guerda, Nathan Cassereau)
15:00 – 15:20 Stretch Break
15:20 – 15:40 Computer Lab Setup
15:40 – 17:00 Laboratory

Tuesday, June 13
Machine Translation, Host: Ondřej Bojar (ÚFAL, Charles University)

Machine translation (MT) can be seen as the basis of the current amazing and amazingly popular large language models: MT always had a greed for data, the use of monolingual texts was critical for neural MT to take off and Transformers were created for MT. This lecture will provide you with the background from translation which will help you to have a realistic view on LLMs and be wary of common evaluation issues and misconceptions.

08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Machine translation 1 (Ondrej Bojar)
10:20 – 10:40 Stretch Break
10:40 – 12:00 Machine translation 2 (Ondrej Bojar)
12:00 – 13:00 Lunch Break (RU)
13:00 – 13:30 Computer Lab Setup
13:30 – 17:00 Laboratory

Wednesday, June 14
Gigamodels, Hosts: Benoît Crabbé (LLF), François Yvon (LISN)

The Large Language Models introduced in recent years have been found extremely helpful in advancing the state of the art in many Natural Language applications, notably due to their ability to compute numerical, high-dimensional representations of linguistic units such as words or sentences. Multilingual language models go one step further and add the ability to handle multiple languages, sometimes even multiple scripts, with just one single model. In this presentation, I will discuss multilingual language models at length, how they are typically learned and used, with a focus on the measurement of their multilingual abilities. The main question I will thus try to answer is "what does it mean for a multilingual model X to cover language Y?".

08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Giga Models 1 (Benoît Crabbé)
10:20 – 10:40 Stretch Break
10:40 – 12:00 Evaluating Multilinguality in Large Language Models (François Yvon)

12:00 – 13:00 Lunch Break (RU)

Wednesday, June 14, afternoon
Fairness and equity in the data, Host: Denise DiPersio (University of Pennsylvania)

This session will cover concepts around the notion of data fairness. We begin with a discussion of general ethical principles, then apply those principles to research tasks in speech and natural language processing. These are manifested throughout the development, testing, deployment and post-deployment life cycle and include data diversity, bias and transparency. We also examine relevant laws and regulations impacting this research. We discuss strategies and resources for managing these concerns and explore potential use cases.

13:30 – 17:00 Fairness & equity in Data (Denise DiPersio)

Thursday, June 15
NLP, Host: Nils Holzenberger
08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Lecture 1 (Nils Holzenberger)
10:20 – 10:40 Stretch Break
10:40 – 12:00 Lecture 2 (Ryan Cotterell)
12:00 – 13:00 Lunch Break (RU)
13:00 – 13:30 Computer Lab Setup
13:30 – 17:00 Laboratory

Friday, June 16
Combining Finite State Methods with Neural Networks, Host: Lucas Ondel Yang (LISN)

This lecture will cover how to use finite state methods to train and conduct inference with neural networks. We will explore how finite state methods can be used to define standard loss functions for sequence-to-sequence training, how to back-propagate the gradient through finite state automata and how one can leverage GPUs to accelerate inference in large automata.
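As a rough illustration of how an automaton can define a loss, the Python sketch below sums the weights of all accepting paths of a small weighted finite state automaton in the log semiring (the forward algorithm); the negative of that log-weight can serve as a training loss. The toy automaton, its weights and the function names are assumptions made for this example, not the lecture material, and the sketch omits back-propagation and GPU acceleration.

```python
import math
from typing import List, Tuple

Arc = Tuple[int, int, float]  # (source state, destination state, log-weight)


def logaddexp(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def forward_log_weight(num_states: int, arcs: List[Arc], start: int, final: int) -> float:
    """Log of the summed weight of all paths from `start` to `final`.

    Assumes an acyclic automaton whose states are numbered in
    topological order, i.e. every arc satisfies source < destination.
    """
    alpha = [-math.inf] * num_states
    alpha[start] = 0.0  # log(1): the empty path reaches the start state
    for src, dst, log_w in sorted(arcs):
        if src >= dst:
            raise ValueError("arcs must go from a lower to a higher state index")
        # Accumulate the weight of every path that reaches dst through this arc.
        alpha[dst] = logaddexp(alpha[dst], alpha[src] + log_w)
    return alpha[final]


# Hypothetical automaton with two paths: 0 -> 1 -> 3 and 0 -> 2 -> 3.
toy_arcs = [
    (0, 1, math.log(0.5)),
    (0, 2, math.log(0.5)),
    (1, 3, math.log(0.4)),
    (2, 3, math.log(0.2)),
]
total = forward_log_weight(4, toy_arcs, start=0, final=3)
loss = -total  # negative log of the total path weight
print(math.exp(total), loss)
```

Here the two paths carry weights 0.5 × 0.4 and 0.5 × 0.2, so the total weight is 0.3 and the loss is its negative logarithm.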

08:00 – 09:00 Continental Breakfast
09:00 – 10:20 ASR 1 (Lucas Ondel Yang)
10:20 – 10:40 Stretch Break
10:40 – 12:00 ASR 2 (Martin Kocour)
12:00 – 13:00 Lunch Break (RU)
13:00 – 13:30 Computer Lab Setup
13:30 – 17:00 Laboratory

__________________________________________________________________


Monday, June 19
Speech segmentation and speaker diarization, Hosts: Marie Tahon (LIUM) & Hervé Bredin (IRIT)

This course introduces basic knowledge on speech segmentation. Processing a full recording, obtained for instance from a TV or radio show, requires identifying specific segments of the audio signal. In order to obtain clean speech with a single speaker, the presence of noise, speech and overlapping speech needs to be precisely determined through a segmentation task. Speaker diarization is then the task of partitioning an audio stream into homogeneous temporal segments according to the identity of the speaker (i.e. answering the question "who speaks when?"). During the day, we will present the speech segmentation by classification approach and the speaker diarization process.
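To make the "who speaks when?" output concrete, the Python sketch below represents a diarization result as a list of (start, end, speaker) tuples and derives the overlapping-speech regions from it. The segment values, the speaker labels and the helper name `overlapping_speech` are illustrative assumptions; this is not the course material or the pyannote.audio API.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start in seconds, end in seconds, speaker label)


def overlapping_speech(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Return the time regions where at least two segments are active.

    Assumes that segments of the same speaker do not overlap each other.
    """
    # Sweep over boundaries: +1 when a speaker starts, -1 when a speaker stops.
    events = []
    for start, end, _speaker in segments:
        events.append((start, 1))
        events.append((end, -1))
    # At equal times, ends (-1) sort before starts (+1), so segments that
    # merely touch are not reported as overlapping.
    events.sort()

    regions = []
    active = 0
    overlap_start = None
    for time, delta in events:
        previous = active
        active += delta
        if previous < 2 <= active:
            overlap_start = time  # a second speaker just became active
        elif previous >= 2 > active:
            if time > overlap_start:
                regions.append((overlap_start, time))
            overlap_start = None
    return regions


# Hypothetical diarization output for a short two-speaker recording.
diarization = [
    (0.0, 4.0, "spk_A"),
    (3.0, 7.5, "spk_B"),
    (7.5, 9.0, "spk_A"),
]
overlaps = overlapping_speech(diarization)
print(overlaps)
```

In this toy recording, the two speakers talk simultaneously only between 3.0 s and 4.0 s; the hand-over at 7.5 s is not counted as overlap.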

08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Lecture 1 (Marie Tahon)
10:20 – 10:40 Stretch Break
10:40 – 12:00 Lecture 2 (Hervé Bredin)
12:00 – 13:00 Lunch Break (RU)
13:00 – 13:30 Computer Lab Setup
13:30 – 17:00 Laboratory

Tuesday, June 20
Neural Conversational AI, Host: Petr Schwarz (BUT)

This lecture will give you a basic overview of dialog systems. It will start with transformers and pre-trained language models and continue with neural models for dialogue system components such as language understanding, state tracking, and dialogue policy. End-to-end neural models and evaluation metrics will then be presented, and the lecture will finish with current state-of-the-art approaches. During the practical part, we will train and evaluate our end-to-end dialog model.

08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Lecture 1 (Petr Schwarz)
10:20 – 10:40 Stretch Break
10:40 – 12:00 Lecture 2 (Santosh Kesiraju, Ondrej Platek)
12:00 – 13:00 Lunch Break (RU)
13:00 – 13:30 Computer Lab Setup
13:30 – 17:00 Laboratory

Wednesday, June 21
Natural Language Processing, Deep Nets, Linear Algebra and Information Retrieval, Host: Kenneth Church (Northeastern University)

Firth's famous quote, "you shall know a word by the company it keeps", has had considerable influence on Natural Language methods for processing words and phrases (PMI, Word2vec, BERT). Firth's approach has been generalized from words and phrases to topics and documents, using methods such as node2vec and graph neural nets (GNNs).
However, these approaches often view documents, at least at inference time, as short (512) sequences of subwords. Recommender systems and systems for assigning papers to reviewers tend to focus on titles and abstracts, but in our collection of 200M documents from Semantic Scholar, there are more papers in the citation graph without abstracts (46M) than vice versa (34M). Links have been super-important in web search. There is an opportunity to take more advantage of citations (and citing sentences) in topic modeling and information retrieval.
Methods such as GNNs and Specter use links at training time to improve models of 512 subwords, but these methods do not use links at inference time. In addition to BERT-like methods for encoding documents as vectors, we will also use ProNE, a node2vec method for encoding nodes in a citation graph as vectors. Cosines of vectors based on text (e.g., bags of words, BERT) denote word similarity, whereas cosines based on node2vec (spectral clustering of the citation graph, G) can be interpreted in terms of distance in G.
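For readers who have not met cosine similarity before, the short Python sketch below computes it for toy document vectors. The three vectors are made-up values standing in for embeddings (they are not produced by BERT or ProNE), and the function name `cosine` is chosen for this example only.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings of three papers (illustrative values only).
paper_a = [1.0, 2.0, 0.0]
paper_b = [2.0, 4.0, 0.0]  # same direction as paper_a
paper_c = [0.0, 0.0, 3.0]  # orthogonal to paper_a

sim_ab = cosine(paper_a, paper_b)
sim_ac = cosine(paper_a, paper_c)
print(sim_ab, sim_ac)
```

Vectors pointing in the same direction score close to 1 regardless of their length, while orthogonal vectors score 0; the same function applies whether the vectors come from text or from a citation graph.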

08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Lecture 1 (Kenneth Church)
10:20 – 10:40 Stretch Break
10:40 – 12:00 Lecture 2 (Kenneth Church)
12:00 – 13:00 Lunch Break (RU)
13:00 – 13:30 Computer Lab Setup
13:30 – 17:00 Laboratory

Thursday, June 22
Explainability in Speech Processing: From Wishes to Practice, Host: Jean-François Bonastre (LIA)

In the field of artificial intelligence, explainability is rapidly moving from an optional aspect, considered as a candied fruit on a cake, to a mandatory feature requested in all situations. This is due to both regulatory changes and to citizens' opinions on the possibilities and dangers of AI.
This talk is composed of three parts:
- An introduction to explainability and interpretability in AI, with a focus on the specifics of speech processing. It includes a brief presentation of the main approaches and tools available.
- The presentation of two practical applications of explainability, in the fields of voice characterization and pathological voice assessment.
- A live session, where participants will be able to interact with the presenters and dive into the source code and results.

08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Lecture 1 (Jean-François Bonastre)
10:20 – 10:40 Stretch Break
10:40 – 12:00 Lecture 2 (Imen Ben Amor, Sondes Abderrazek)
12:00 – 13:00 Lunch Break (RU)
13:00 – 13:30 Computer Lab Setup
13:30 – 17:00 Laboratory

Friday, June 23
Evaluation in speech and NLP, Hosts: Craig Greenberg (NIST) & Olivier Galibert (LNE)
08:00 – 09:00 Continental Breakfast
09:00 – 10:20 Lecture 1 (Craig Greenberg)
10:20 – 10:40 Stretch Break
10:40 – 12:00 Lecture 2 (Olivier Galibert)
12:00 – 13:00 Lunch Break (RU)
13:00 – 13:30 Computer Lab Setup
13:30 – 17:00 Laboratory

Published on June 23, 2023