Researcher Spotlight

Sultan Alrowili

My name is Sultan Alrowili, and I am a Ph.D. student at the University of Delaware under the supervision of Prof. K. Vijay-Shanker, who has more than 30 years of research experience in the NLP area. Our research involves building large biomedical language models (BioM-Transformers) based on state-of-the-art NLP models, including BERT, ELECTRA, and ALBERT. We trained our models on large biomedical corpora, including PubMed abstracts and PMC full articles. Our models serve various applications in the biomedical field, including Named Entity Recognition (NER) and biomedical text classification tasks such as ChemProt, GAD, DDI, and MedMNLI. We describe our large biomedical question answering models in our published paper.

Our models are also effective on biomedical question answering tasks such as BioASQ, COVID QA, and biomedical chatbots. We participated in the BioASQ9B challenge (2021), where we took the lead in two batches of the List and Factoid questions. In 2022, we participated in the BioASQ10B challenge, where we took the lead in all four batches of List questions and three batches of both Factoid and Yes/No questions.
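For readers who want to try models like these, here is a minimal extractive question-answering sketch using the Hugging Face transformers pipeline; the checkpoint name below is an assumption and should be replaced with whichever BioM-Transformers model you actually intend to use.

```python
# A minimal sketch of extractive biomedical QA with a fine-tuned BioM checkpoint.
# The model ID below is an assumption; substitute the BioM-Transformers
# checkpoint from the Hugging Face Hub that you want to use.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="sultan/BioM-ELECTRA-Large-SQuAD2",  # assumed/hypothetical model ID
)

result = qa(
    question="Which protein does imatinib inhibit?",
    context=(
        "Imatinib is a tyrosine kinase inhibitor that blocks the BCR-ABL "
        "fusion protein and is used to treat chronic myeloid leukemia."
    ),
)
print(result["answer"], result["score"])
```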

In addition to our research work in the BioNLP area, we have been actively working in the Arabic NLP domain, where we built an efficient, state-of-the-art Arabic language model (ArabicTransformer), which we published at EMNLP 2021.

Per E Kummervold

I am a Norwegian researcher who is passionate about large language models and their implications. The resources from TRC have allowed me to run research experiments that would have been very hard to do without this support. Together with the Vaccine Confidence Project at the London School of Hygiene & Tropical Medicine, I am using natural language processing to gain a better understanding of vaccine sentiment.

While transformer-based models have yielded state-of-the-art performance in natural language processing, they have still struggled with the specialised language of social media. We were able to do domain-specific pretraining on a corpus of nearly 100M Twitter messages on the topic of COVID-19. Together with Martin Müller and Marcel Salathé from the Digital Epidemiology Lab at EPFL, Switzerland, we trained COVID-Twitter-BERT (CT-BERT). The model shows a 10-30% marginal improvement over the base model, with the largest gains on evaluation sets closest to the target domain.
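As a rough sketch of how such a domain-adapted model can be reused, the snippet below loads a CT-BERT checkpoint as the backbone of a text classifier. The Hub model ID is an assumption, and the classification head is freshly initialized, so it must be fine-tuned on labeled sentiment data before its scores mean anything.

```python
# A minimal sketch of using CT-BERT as a classifier backbone.
# The model ID is assumed; replace it with the CT-BERT checkpoint you use.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "digitalepidemiologylab/covid-twitter-bert-v2"  # assumed Hub ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=3)

texts = ["Got my second dose today, feeling great!",
         "Not sure I trust this vaccine rollout."]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits  # classification head is untrained: fine-tune first
print(logits.softmax(dim=-1))
```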

I have also been training Norwegian language models. Only five million people worldwide speak Norwegian, and the textual sources available on the Internet are simply not sufficient for training well-performing transformer models. The National Library of Norway (NLN) has positioned itself at the forefront of memory institutions globally by building an enormous collection of digitized materials. Together with Javier de La Rosa, Freddy Wetjen, and Svein Arne Brygfjeld at the AiLab of the National Library, I have used the resources provided by TRC to train robust transformer models for Norwegian with performance on par with English models.

Rowan Zellers

I am a PhD student at the University of Washington advised by Yejin Choi and Ali Farhadi. My main research interest is multimodal commonsense reasoning: teaching machines to reason about "how the world works," both intuitively and in visual scenes, and express that through language. I've helped create several datasets that measure this, such as SWAG, HellaSWAG, and VCR. More recently, I've worked on two models for learning multimodal commonsense: PIGLeT and MERLOT.

Arya Girish Manjaramkar

Hi! My name is Arya Girish Manjaramkar. I am a grade 10 student at North Park Secondary School, Brampton, Canada. A small team and I used Google's TPU Research Cloud (TRC) to fine-tune a large Transformer model (GPT-J) on open-source code. GPT-J is a six-billion-parameter model, so it was not possible to handle it on traditional hardware.

The result of this experiment was CodeGenX, an open-source Visual Studio Code extension focused on Python code generation. We are now making it more accessible by making CodeGenX compatible with additional editors and IDEs.
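For illustration, prompting a GPT-J-style model for Python completion with the transformers library might look like the sketch below. This uses the public EleutherAI base checkpoint rather than the CodeGenX fine-tuned weights, and loading the full 6B model requires far more memory than typical consumer hardware provides.

```python
# A minimal sketch of prompting a GPT-J-style model for Python code completion.
# The full 6B checkpoint needs roughly 24 GB of RAM in float32; use half
# precision or a smaller model on constrained hardware.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "EleutherAI/gpt-j-6B"  # base model; CodeGenX itself is a fine-tuned variant
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "def fibonacci(n):\n    \"\"\"Return the n-th Fibonacci number.\"\"\"\n"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```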

We are currently conducting introductory sessions for students in urban and rural regions. We walk them through important topics in Machine Learning and how they can use CodeGenX to make learning more efficient. These students often do not have valuable resources at their disposal. We are sure that the concepts they learn through these sessions will enable them to understand cutting-edge technologies.

We envision that students will find CodeGenX of great benefit in their educational journey: it will equip them with the right tools to excel in their careers and enable them to become valuable contributors to society when they enter the real world.


Andrew Yates

Andrew Yates is a Senior Researcher at the Max Planck Institute for Informatics, where he heads a research group at the intersection of Information Retrieval and Natural Language Processing, and teaches related courses at Saarland University. Yates is the lead author of the open-source Capreolus toolkit for building ad hoc retrieval pipelines [1], such as multi-stage pipelines that use an efficient method to rank candidate documents likely to be relevant to a given query before improving the ranking using a less-efficient neural method. Capreolus implements state-of-the-art neural ranking models and facilitates experiments on standard benchmarks. Thanks to the TPU Research Cloud's support, any Capreolus TensorFlow model can seamlessly be run on GPUs or on Cloud TPUs. One example is the PARADE model [2], which was developed using Cloud TPUs by Canjia Li while working in Yates' group. By considering the relationships among passages in a long document, PARADE achieves state-of-the-art effectiveness on several standard benchmarks while requiring substantially fewer parameters than similarly-effective approaches. This approach has also performed well on new tasks like the TREC-COVID Challenge. Moving forward, Yates' group is exploring how query and document representations can be enriched with information about key entities to further improve ranking.

[1] A. Yates, K. M. Jose, X. Zhang, and J. Lin. Flexible IR pipelines with Capreolus. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pages 3181-3188, 2020.

[2] C. Li, A. Yates, S. MacAvaney, B. He, and Y. Sun. PARADE: Passage representation aggregation for document reranking. arXiv:2008.09093, 2020.
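The multi-stage pattern described above (a cheap candidate ranker followed by a slower but more accurate neural reranker) can be sketched in a few lines. This is a generic illustration under stated assumptions, not the Capreolus API, and the cross-encoder checkpoint named here is an assumption.

```python
# A generic two-stage retrieval sketch (not the Capreolus API): a cheap lexical
# ranker produces candidates, then a neural cross-encoder reranks them.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

docs = [
    "PARADE aggregates passage representations for document reranking.",
    "BM25 is a classic lexical ranking function.",
    "TPUs accelerate large-scale transformer training.",
]
query = "neural document reranking"

# Stage 1: efficient lexical retrieval over the whole collection.
bm25 = BM25Okapi([d.lower().split() for d in docs])
scores = bm25.get_scores(query.lower().split())
candidates = sorted(range(len(docs)), key=lambda i: -scores[i])[:2]

# Stage 2: a slower, more accurate neural reranker over the candidates only.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # assumed checkpoint
rerank_scores = reranker.predict([(query, docs[i]) for i in candidates])
for i, s in sorted(zip(candidates, rerank_scores), key=lambda t: -t[1]):
    print(f"{s:.3f}  {docs[i]}")
```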

Ruiqi Zhong

My name is Ruiqi Zhong; I am a second-year EECS Ph.D. student at UC Berkeley working on Natural Language Processing.

The community has witnessed a trend that larger pre-trained language models lead to higher accuracy. However, are larger models better on every single datapoint, or are there datapoints on which larger models systematically fail?

I investigated this question in my ACL 2021 paper. We first observe that it is challenging even to pinpoint the datapoints where smaller models are better. The reason is that, even if we train the same-sized model twice with different random seeds, the two runs might disagree on 8% of the datapoints, whereas BERT-Large improves accuracy over BERT-Base by only 2.5%. Therefore, we might observe smaller models to be better on some datapoints merely due to statistical noise.

To de-noise, we pre-train 10 times for each model size and fine-tune each pre-trained model 5 times with different random seeds. This is extremely computationally heavy, but we managed to accomplish it with Google's TPUs. Along with the help of a new statistical method, we find that larger models are indeed worse on 1-4% of the datapoints across many datasets.

Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level
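The de-noising idea described above can be illustrated with a toy simulation: average each instance's correctness over many seeds and only flag instances where the gap exceeds the seed-level noise. The snippet below is a simplified stand-in for the paper's more careful statistical method, using simulated data.

```python
# A toy sketch of the seed-averaging idea: compare per-instance accuracy of two
# model sizes across many pretraining/fine-tuning seeds before concluding that
# the larger model is "worse" on a datapoint. Data here is simulated.
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_instances = 50, 1000          # e.g., 10 pretraining x 5 fine-tuning seeds

# correct[r, i] = 1 if run r classified instance i correctly (simulated here).
base_correct = rng.binomial(1, 0.80, size=(n_runs, n_instances))
large_correct = rng.binomial(1, 0.83, size=(n_runs, n_instances))

# Per-instance accuracy averaged over seeds.
base_acc = base_correct.mean(axis=0)
large_acc = large_correct.mean(axis=0)

# Instances where the large model is worse by more than seed-level noise
# (a simple two-standard-error threshold; the paper uses a more careful test).
se = np.sqrt(base_acc * (1 - base_acc) / n_runs + large_acc * (1 - large_acc) / n_runs)
worse = np.where(large_acc < base_acc - 2 * se)[0]
print(f"{len(worse)} / {n_instances} instances look genuinely worse for the larger model")
```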

Jonathan Frankle

Over the past two years, the Programming Systems Group at MIT (led by Professor Michael Carbin) has used the TPU Research Cloud (TRC) as our primary research infrastructure for a number of projects related to neural network pruning and sparsity, most notably our work on the Lottery Ticket Hypothesis. The TRC made it possible for us to, among several other projects, explore our lottery ticket findings at much larger scales (Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, Michael Carbin, ICML 2020), develop state-of-the-art methods for fine-tuning pruned neural networks (Alex Renda, Jonathan Frankle, Michael Carbin, ICLR 2020 Oral), evaluate the state of research in pruning at initialization (Frankle, Dziugaite, Roy, Carbin, ICLR 2021), develop scaling laws to predict the performance of pruned neural networks (Jonathan Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit, ICML 2021), and further work to be released soon. Our style of research involves rigorous empirical analysis of deep learning phenomena, an approach that requires significant amounts of compute to ensure that findings are robust. The TRC has made it possible for us to run each experiment multiple times across a range of settings, allowing us to convincingly present our findings about the nature of neural network sparsity.
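As background for readers unfamiliar with the lottery ticket procedure, the sketch below shows generic iterative magnitude pruning with weight rewinding on a toy model. It is not the group's actual code, and the training loop is left as a placeholder.

```python
# A compact sketch of lottery-ticket-style iterative magnitude pruning with
# weight rewinding, on a toy model. `train()` is a placeholder you would supply.
import copy
import torch
import torch.nn as nn

def magnitude_mask(model, sparsity):
    """Return a {name: 0/1 mask} keeping the largest-magnitude weights globally."""
    all_w = torch.cat([p.detach().abs().flatten()
                       for n, p in model.named_parameters() if "weight" in n])
    threshold = torch.quantile(all_w, sparsity)
    return {n: (p.detach().abs() > threshold).float()
            for n, p in model.named_parameters() if "weight" in n}

def apply_mask(model, mask):
    with torch.no_grad():
        for n, p in model.named_parameters():
            if n in mask:
                p.mul_(mask[n])

model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
init_state = copy.deepcopy(model.state_dict())   # weights to rewind to

for round_sparsity in (0.2, 0.36, 0.49):         # prune ~20% per round, cumulatively
    # train(model)  # placeholder: train before scoring weights by magnitude
    mask = magnitude_mask(model, round_sparsity)
    model.load_state_dict(init_state)            # rewind surviving weights
    apply_mask(model, mask)                      # zero out pruned weights
    # train(model)  # placeholder: retrain the sparse "ticket"
```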

Ernesto Mboana

My name is Ernesto, and I am from Mozambique. I have a Mathematics degree from Eduardo Mondlane University, a local university. I have been involved in web development and programming for most of my professional career, and I have always looked for ways to reconcile it with my passion for mathematics research. The new dawn of Deep Learning and its potential offered such an opportunity, so I found myself reading more and more about it and taking some online courses. I have been particularly interested in Natural Language Processing and how I could integrate it with my personal web-related projects.

Eventually I found out about the TensorFlow Research Cloud (TFRC) and applied. It was a huge opportunity to train and fine-tune some models and to put some of my ideas to the test, and eventually I managed to deploy them in some of my projects:

QuantoSei

A platform focused on offering AI tools for education and rapid information retrieval, including exam simulation, autocomplete, summarization, question generation, and question answering features, mainly in the Portuguese language.

NoticiasAI

A local news aggregator website that aims to distill and create insights from news, including news similarity, sentiment analysis, predominant actions, distillation, summarization, and translation.

It has not been an easy journey; most times I would not know how to proceed when facing a particular technical difficulty, but additional research and the web have always helped me move forward and master each new technique.

Evan Crothers

Evan Crothers is a Computer Science PhD student at the University of Ottawa, working under the supervision of Dr. Herna Viktor (University of Ottawa) and Dr. Nathalie Japkowicz (American University), where he focuses on applications of large neural network language models to improve trustworthiness of online social spaces.

Evan was previously employed full-time in the Canadian federal public service for 6 years, where he worked to safeguard Canada from violent extremism. He is the youngest-ever recipient of the “Director's Merit Award”, and has further been recognized for his contribution to the Security and Intelligence Threats to Elections (SITE) Task Force as part of the G7 Rapid Response Mechanism (RRM), protecting G7 democracies from threats to elections.

Evan's academic research was published in the IEEE MLSP 2019 conference paper Towards the Ethical Detection of Online Influence Campaigns, and focuses on methods of reducing algorithmic bias against non-native English speakers in language models trained to detect foreign influence operations on social media. This work was continued in his Master's thesis, Ethical Detection of Online Influence Campaigns Using Transformer Language Models. TRC made these experiments possible, allowing the development of new ethical and effective methods for detecting online influence campaigns.

Ahmed Elnaggar

Ahmed Elnaggar is a research associate at the Technical University of Munich. His main research focus is self-supervised learning on various modalities (text, protein sequences, source code, images, and speech) using high-performance computing. The TPU Research Cloud program gave him access to Google TPUs, which provided enormous computing power to train deep learning language models (LMs). During training, these models utilized billions of unlabeled data points, and during inference, they provide accurate feature representations at low inference cost. Two of his recent breakthroughs are ProtTrans and CodeTrans.

In the ProtTrans research, he trained six LMs of up to 11 billion parameters on unlabeled data comprising up to 393 billion amino acids. These models captured various biophysical features of protein sequences and, for the first time, outperformed the state of the art without using evolutionary information for tasks such as secondary structure prediction, thereby bypassing expensive database searches.
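A minimal sketch of using a ProtTrans-style model as a feature extractor is shown below. The checkpoint ID is an assumption (the released models are published under the Rostlab organization on the Hugging Face Hub), and ProtBert-style tokenizers expect space-separated residues.

```python
# A minimal sketch of extracting per-residue embeddings from a ProtTrans-style
# model; the checkpoint name is assumed.
import re
import torch
from transformers import BertModel, BertTokenizer

model_id = "Rostlab/prot_bert"                       # assumed checkpoint ID
tokenizer = BertTokenizer.from_pretrained(model_id, do_lower_case=False)
model = BertModel.from_pretrained(model_id).eval()

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
# ProtBert expects space-separated residues, with rare amino acids mapped to X.
spaced = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(spaced, return_tensors="pt")
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state   # (1, seq_len + 2, hidden)
print(embeddings.shape)
```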

In the CodeTrans research, he trained various encoder-decoder transformer LMs of up to 0.7 billion parameters on 38 million unlabeled source code projects covering nine programming languages (Python, Java, Go, PHP, Ruby, JavaScript, C#, SQL, and Lisp) and one human language (English). The fine-tuned models outperformed state-of-the-art models on thirteen software engineering tasks, including code generation and code documentation generation.

Papers

GitHub repositories

Prof. Amnon Shashua's Lab

In our lab, led by Prof. Amnon Shashua with graduate students Or Sharir, Yoav Levine, Noam Wies, Hofit Bata, and Daniel Jannai, we theoretically investigate the mechanisms behind prominent deep learning techniques, and leverage these insights to drive practical innovations.

Lately, we have focused on self-attention networks (a.k.a. Transformers), which facilitated recent breakthroughs in language understanding and are showing promising signals in various other domains.

Our unexpected theoretical findings below were empirically reinforced by targeted and comprehensive experiments (hundreds of trained models), facilitated by the TFRC program's computational resources.

The depth-to-width interplay (NeurIPS 2020): Depth has long been suggested as Deep Learning's source of success. However, we prove that in self-attention networks the power of depth can only be unlocked when the width of the model is above a certain threshold. Our results point to inefficiencies in commonly used architectures and prescribe a simple practical formula for the optimal depth-width ratio per parameter budget.

Paper

Blogpost
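As a rough, back-of-the-envelope illustration of the trade-off the depth-to-width result above analyzes (and not the paper's actual prescription), one can enumerate depth-width pairs that fit a fixed parameter budget using the standard estimate of roughly 12·L·d² parameters for a transformer with L layers and width d, embeddings excluded:

```python
# Illustration only: enumerate depth/width pairs under a fixed parameter budget,
# using the rough estimate params ≈ 12 * depth * width**2 for a transformer
# (attention + MLP blocks, embeddings excluded). This is NOT the paper's formula
# for the optimal ratio, just a way to visualize the trade-off it studies.
def width_for_budget(budget_params, depth):
    return int((budget_params / (12 * depth)) ** 0.5)

budget = 350_000_000  # an arbitrary ~350M-parameter budget, for illustration
for depth in (6, 12, 24, 48, 96):
    width = width_for_budget(budget, depth)
    print(f"depth={depth:3d}  ->  width~{width:5d}  (~{12 * depth * width**2 / 1e6:.0f}M params)")
```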

Which Transformer architecture fits my data? (ICML 2021): Just as depth is limited by width, we prove that width itself is limited by the rank of the input vocabulary matrix. This bears special implications for cutting-edge efforts to utilize self-attention in non-language domains (e.g., images).

Paper

Wisdom d'Almeida

Hi! I'm Wisdom, and I joined the TRC program in 2018 when it was still TFRC :) Access to TRC compute helped me run large-scale experiments on radiology report generation from chest X-ray images. The idea was to train powerful language models on image and text data jointly, in such a way that the models can generate clinically pertinent natural language reports (including findings and impression sections) at test time, in order to support chest X-ray disease predictions with some textual evidence or explanation. This work constituted the main portion of my Master's thesis (not online, unfortunately) and motivated further research on imbuing clinical awareness into language models with design and data inductive biases. Some of these follow-up works, still backed by TRC compute, were presented at venues such as Stanford's Frontier of Assisted Care Scientific Symposium and the Montreal AI Symposium.