Summer school presentation

The internship includes a comprehensive 2-week summer school on HLT, followed by 6 weeks of intensive research projects on selected topics.

The 8-week workshop provides an intense, dynamic intellectual environment. Undergraduates work closely alongside senior researchers as part of a multi-university research team, assembled for the summer to attack an HLT problem of current interest.


Lecturers

Hervé Bredin, IRIT-CNRS, Toulouse

Hervé Bredin received a PhD on talking-face biometric authentication from Telecom ParisTech in 2007. He has been a permanent CNRS researcher (cnrs.fr) since 2008 and is currently based in Toulouse (France) at IRIT (irit.fr). His current research interests include deep learning techniques applied to machine listening, with a strong focus on speaker diarization. He is the creator and lead contributor of the open-source "pyannote.audio" toolkit for speaker diarization (2.5k GitHub stars). Since 2021, he has also been working as a scientific advisor on speech processing technologies for various companies.

www: herve.niderb.fr

Ondřej Bojar, Charles University, Prague

Ondřej Bojar is an associate professor at ÚFAL, Charles University, and a lead scientist in Machine Translation in the Czech Republic. He has been co-organizing the WMT shared tasks in machine translation and machine translation evaluation since 2013. His systems dominated English-Czech translation from 2013 to 2015, before deep learning and neural networks fundamentally changed the field. Having taken part in, and later supervised, ÚFAL's participation in a series of EU projects (EuroMatrix, EuroMatrixPlus, MosesCore, QT21, HimL, CRACKER, Bergamot), he has recently concluded his coordination of the EU project ELITR (http://elitr.eu/), which focused on simultaneous speech translation into over 40 languages. ELITR also introduced the task of project meeting summarization with its AutoMin shared task (2021 and now 2023).

Jean-François Bonastre, Avignon Université

Jean-François Bonastre received his PhD on automatic speaker recognition in 1994 and his "Habilitation à Diriger les Recherches" (HDR) in 2000. He is a Senior Research Scientist (Directeur de Recherche) at Inria Defense & Security and an associate member of the Computer Science Laboratory of Avignon (LIA), Avignon University. Before joining Inria in 2022, he was Professor (of Exceptional Class) at Avignon University, where he led the LIA from 2016 to 2020 and was vice-president from 2008 to 2015.
He is a member of the Institut Universitaire de France (promotion Junior 2006). He was an auditor of the 26th session of IHEDN/FMES (the Mediterranean session of the Institut des Hautes Etudes de la Défense Nationale) in 2015. He spent a sabbatical year at the Panasonic Speech Technology Laboratory (Santa Barbara, California, USA) in 2002-2003.
He was President of the International Speech Communication Association (ISCA) from 2011 to 2013 and President of the Association Francophone de la Communication Parlée from 2000 to 2004. He is an IEEE Senior Member and was an elected member of the IEEE Speech and Language Technical Committee and of the IEEE Biometrics Council. He is one of the founders of ISCA's Special Interest Group "Speaker and Language Characterization" (SPLC). He was a member of the Scientific Committee of the Montreal Computer Research Center (CRIM) from 2016 to 2020.
Jean-François Bonastre has supervised 21 defended PhDs and is currently supervising 3 PhD students. He is the author or co-author of more than 300 papers (about 8,000 citations, h-index of 43) and of three patents.

Hatim Bourfoune, IDRIS

Hatim Bourfoune is a research engineer with a passion for Artificial Intelligence who has been working in the field of Deep Learning for several years. He has spent more than two years at IDRIS in the user support team specialised in AI, working in particular on the optimisation of very large models such as Transformers. His flagship project was the development of the BLOOM language model, where he took part in both the evaluation of the model and its enhancement (fine-tuning, RLHF, etc.). In addition to the support he provides to users of the Jean Zay supercomputer, he regularly gives lectures and courses on Deep Learning topics.

Nathan Cassereau, IDRIS

Nathan Cassereau is an engineer specialised in artificial intelligence and distributed computing. After graduating from Imperial College London, he joined IDRIS, the French institute operating Jean Zay, a powerful supercomputer dedicated to high performance computing and artificial intelligence research. At IDRIS, Nathan helps researchers optimise their code and their use of the supercomputer. He was also part of a team working on the evaluation and development of large language models, such as BLOOM.

Kenneth Church, Northeastern University

Kenneth Church is professor of the practice at the Khoury College of Computer Sciences at Northeastern University. His research focuses on natural language processing and information retrieval, artificial intelligence and machine learning.

Before joining Northeastern in 2022, Church worked as a scientist at Baidu, a researcher at IBM and a scientist at Johns Hopkins University. He was recognized as a Baidu Fellow in 2018 and an Association for Computational Linguistics (ACL) Fellow in 2015, and he served as president of ACL in 2012. Notable venues in which Church's research has been published include NeurIPS, ACL, EMNLP, NAACL, the Journal of Natural Language Engineering, Frontiers and Interspeech. Outside of academic research, he enjoys chess and hiking. A fun fact few people know about him: his great-grandfather invented a method that is still used today to predict stream runoff from mountain ranges across the West, as well as floods and droughts.

Pierre Cornette, IDRIS

Pierre Cornette is a dedicated research engineer with a strong background in supporting AI research projects at IDRIS. With access to Jean Zay, one of the most powerful supercomputers in Europe, Pierre brings expertise in exploiting computational resources for training deep learning models. From image and speech recognition to natural language understanding, his knowledge covers many subfields of AI.

Benoît Crabbé, Université Paris Diderot

Benoît Crabbé is a professor at Université Paris Diderot. He teaches in the Linguistics department (UFRL) and conducts his research at the LLF lab (CNRS and Paris Diderot). His research interests are in computational linguistics, more specifically natural language understanding, with a focus on parsing French and related languages. He is also involved in empirical and experimental issues in linguistics and cognitive science related to modelling the syntax of natural languages.

Denise DiPersio, Linguistic Data Consortium, University of Pennsylvania

The Linguistic Data Consortium (LDC), hosted by the University of Pennsylvania, develops and distributes language resources to organizations around the globe. Denise DiPersio is responsible for the overall operation of LDC's External Relations group, which covers intellectual property management, licensing, regulatory matters, publications, membership, and communications. Before joining LDC, she practiced law for over 20 years in the areas of international trade, intellectual property, and commercial litigation. She holds an A.B. in Political Science from Bryn Mawr College and a Juris Doctor degree from the University of Miami School of Law.

Craig Greenberg, National Institute of Standards and Technology

Craig Greenberg is a Mathematician at the National Institute of Standards and Technology (NIST), where he oversees NIST's Speaker Recognition Evaluation series and Language Recognition Evaluation series, and researches the measurement and evaluation of Artificial Intelligence (AI) as well as other topics in AI and machine learning. Dr. Greenberg received his PhD in 2020 from the University of Massachusetts Amherst with a dissertation on uncertainty and exact and approximate inference in flat and hierarchical clustering, his M.S. degree in Computer Science from the University of Massachusetts Amherst in 2016, his M.S. degree in Applied Mathematics from Johns Hopkins University in 2012, his B.A. (Hons.) degree in Logic, Information, & Computation from the University of Pennsylvania in 2007, and his B.M. degree in Percussion Performance from Vanderbilt University in 2003. Among his accolades, Dr. Greenberg has received two official letters of commendation for his contributions to speaker recognition evaluation.

Nils Holzenberger, Telecom Paris

Nils Holzenberger graduated in 2017 with an M.S. from Mines Paris, part of Paris Sciences et Lettres University, and in 2022 with a PhD from Johns Hopkins University, advised by Prof. Benjamin Van Durme. His major scientific contributions are in representation learning for language, information extraction, and natural legal language processing. His research on statutory reasoning has been featured on the NLP Highlights podcast of the Allen Institute for AI (AI2). With Benjamin Van Durme and Andrew Blair-Stanek, Nils obtained an NSF CISE award for research in legal NLP. Since February 2023, he has been a faculty member at Télécom Paris, part of Institut Polytechnique de Paris.

Lucas Ondel Yang, LISN, Université Paris Saclay

Lucas Ondel Yang is a CNRS researcher at the LISN laboratory of Université Paris Saclay. He received his PhD from the Faculty of Information Technology of Brno University of Technology in 2021, with a thesis on Bayesian models for unsupervised learning of speech. His research focuses on developing new machine learning approaches that facilitate the democratization of speech technologies. In this endeavor, he is actively working on bringing speech technologies to so-called low-resource languages, as well as on creating new theoretical and practical tools that make speech technologies easier for non-expert users to use. He is currently leading the "Finite State Methods" team within the JSALT workshop.

Petr Schwarz, Brno University of Technology

Petr Schwarz [PhD, Brno University of Technology, 2009] is a senior researcher in BUT Speech@FIT at the Faculty of Information Technology (FIT) of BUT. He has broad experience in speech technologies, ranging from voice biometry, speech transcription and keyword spotting to language identification. At BUT, Petr has worked on many national, EU, and US research projects and in many international technology evaluation campaigns, such as those organized by the U.S. National Institute of Standards and Technology (NIST). In 2006, Petr co-founded Phonexia and served for several years as its CEO and CTO. Phonexia sells speech technologies in more than 60 countries. He is currently working on conversational AI technologies and on security/defense applications of voice biometry.

Marie Tahon, LIUM, Le Mans Université

Marie Tahon is currently an Associate Professor at Le Mans University and conducts her research at LIUM. She received her Ph.D. in computer science from the University of Paris-Sud (Orsay) in 2012, on paralinguistic speech processing. Her research interests concern expressive speech processing, mainly in the fields of speech synthesis, emotion recognition and speaker identification. She has also developed strong skills in musical acoustics for automatic song analysis and organology. She is currently leading a project within the JSALT workshop.
https://cv.hal.science/marie-tahon

François Yvon, LISN-CNRS

François Yvon is a senior CNRS researcher at the LISN (formerly LIMSI) laboratory of Université Paris Saclay in Orsay, France. He has been leading activities in Machine Translation at LISN for about 15 years, resulting in more than one hundred scientific publications on all aspects of the development and evaluation of multilingual language processing technologies, from word and sentence alignment to translation modelling and evaluation, including recent work on multi-domain adaptation in MT and on cross-lingual transfer learning. He has acted as coordinator or PI in multiple past national and international MT projects, such as Quaero and H2020/QT21, and has supervised more than 15 PhDs on MT-related topics. He was recently involved in the evaluation activities of the large "Big Science" collaboration. He is a board member of the European Chapter of the Association for Computational Linguistics and of the MetaNet network, and has recently contributed as an expert on language technologies for the French language to several European projects (European Language Resource Collection, ELE (European Language Equality), ELG (European Language Grid)).

Teams and topics

The teams and topics for 2023 are:

JSALT2023

Jelinek Summer Workshop on Speech and Language Technology