Researcher Spotlight

Jonathan Frankle

Over the past two years, the Programming Systems Group at MIT (led by Professor Michael Carbin) has used the TPU Research Cloud (TRC) as our primary research infrastructure for a number of projects related to neural network pruning and sparsity, most notably our work on the Lottery Ticket Hypothesis. The TRC made it possible for us to, among several other projects, explore our lottery ticket findings at much larger scales (Jonathan Frankle, Gintare Karolina Dziugaite, Daniel Roy, Michael Carbin, ICML 2020), develop state-of-the-art methods for fine-tuning pruned neural networks (Alex Renda, Jonathan Frankle, Michael Carbin, ICLR 2020 Oral), evaluate the state of research in pruning at initialization (Frankle, Dziugaite, Roy, Carbin, ICLR 2021), develop scaling laws to predict the performance of pruned neural networks (Jonathan Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit, ICML 2021), and further work to be released soon. Our style of research involves rigorous empirical analysis of deep learning phenomena, an approach that requires significant amounts of compute to ensure that findings are robust. The TRC has made it possible for us to run each experiment multiple times across a range of settings, allowing us to convincingly present our findings about the nature of neural network sparsity.
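As a toy illustration of the magnitude-pruning step at the heart of lottery-ticket experiments (a sketch only, not the group's actual code; the `magnitude_prune` helper and the toy weight matrix are hypothetical):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights, returning
    the pruned weights and a binary mask (the 'winning ticket' mask)."""
    flat = np.abs(weights).flatten()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones_like(weights)
    # k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask

# Toy example: prune 80% of a random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))
pruned, mask = magnitude_prune(w, 0.8)
print(mask.mean())  # fraction of weights kept
```

In lottery-ticket experiments this masking is applied iteratively, with the surviving weights rewound to earlier values before retraining; the sketch above shows only the one-shot masking step.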

Ernesto Mboana

My name is Ernesto, from Mozambique. I have a Mathematics degree from Eduardo Mondlane University, a local university. I have been involved in web development and programming for most of my professional career, and I have always looked for ways to reconcile it with my passion for mathematics research. The new dawn of deep learning and its potential offered such an opportunity, so I found myself reading more and more about it and taking some online courses. I have been particularly interested in Natural Language Processing and how I could integrate it with my personal web-related projects.

Eventually I found out about the TensorFlow Research Cloud (TFRC) and applied. It was a huge opportunity to train and fine-tune some models and to put some ideas I had to the test, and I eventually managed to deploy them in some of my projects: QuantoSei

A project focused on offering AI tools for education and rapid information retrieval, including exam simulation, autocomplete, summarization, question generation, and question answering features, mainly in Portuguese.
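As a rough, self-contained illustration of what an extractive summarization feature can look like (a classic frequency-scoring baseline, not QuantoSei's actual implementation, which is built on fine-tuned models; `extractive_summary` is a hypothetical helper):

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=2):
    """Score each sentence by the summed corpus frequency of its words
    and keep the top n, preserving their original order."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'\w+', text.lower()))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(freq[w] for w in re.findall(r'\w+', sentences[i].lower())),
        reverse=True,
    )
    keep = sorted(ranked[:n_sentences])
    return ' '.join(sentences[i] for i in keep)

print(extractive_summary(
    "Cats are great. Cats and dogs play. The weather is nice.", 1))
```

Production systems typically replace the frequency score with a fine-tuned transformer, but the select-and-reorder scaffolding stays the same.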


A local news aggregator website that aims to distill and create insights from news, including news similarity, sentiment analysis, predominant actions, distillation, summarization, and translation.

It has not been an easy journey; often I would not know how to proceed when facing a particular technical difficulty, but additional research and the web have always helped me move forward and master each new technique.

Evan Crothers

Evan Crothers is a Computer Science PhD student at the University of Ottawa, working under the supervision of Dr. Herna Viktor (University of Ottawa) and Dr. Nathalie Japkowicz (American University). He focuses on applications of large neural network language models to improve the trustworthiness of online social spaces.

Evan was previously employed full-time in the Canadian federal public service for 6 years, where he worked to safeguard Canada from violent extremism. He is the youngest-ever recipient of the “Director’s Merit Award”, and has further been recognized for his contribution to the Security and Intelligence Threats to Election (SITE) Task Force as part of the G7 Rapid Response Mechanism (RRM), protecting G7 democracies from threats to elections.

Evan’s academic research was published in the IEEE MLSP 2019 conference paper Towards the Ethical Detection of Online Influence Campaigns, and focuses on methods of reducing algorithmic bias against non-native English speakers in language models trained to detect foreign influence operations on social media. This work was continued in his Master’s thesis, Ethical Detection of Online Influence Campaigns Using Transformer Language Models. TRC made these experiments possible, allowing the development of new ethical and effective methods for detecting online influence campaigns.

Ahmed Elnaggar

Ahmed Elnaggar is a research associate at the Technical University of Munich. His main research focus is self-supervised learning on various modalities (text, protein, source code, images, and speech) using high-performance computing. The TPU Research Cloud program gave him access to Google TPUs, which provided enormous computing power to train deep learning language models (LMs). During training, these models utilized billions of unlabeled data points, and during inference, they provided accurate feature representations at low cost. Two of his recent breakthroughs are ProtTrans and CodeTrans.

In the ProtTrans research, he trained six LMs with up to 11 billion parameters on unlabeled data comprising up to 393 billion amino acids. These models captured various biophysical features of protein sequences and, for the first time, outperformed the state of the art without using evolutionary information for tasks such as secondary structure prediction, thereby bypassing expensive database searches.
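As a minimal sketch of how protein LMs of this kind ingest sequences, each amino-acid residue becomes one token (illustrative only; the vocabulary and indices below are hypothetical, not ProtTrans's actual tokenizer):

```python
# The 20 standard amino acids, one character per residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
vocab = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}  # 0 = pad, 1 = unk

def tokenize(sequence):
    """Map a protein sequence to integer token ids, one per residue."""
    return [vocab.get(aa, 1) for aa in sequence.upper()]

print(tokenize("MKV"))  # → [12, 10, 19]
```

Because the "vocabulary" is tiny, the modeling burden falls entirely on learning long-range patterns over hundreds of billions of residues, which is where TPU-scale compute matters.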

In the CodeTrans research, he trained various encoder-decoder transformer LMs with up to 0.7 billion parameters on 38 million unlabeled source code projects spanning nine programming languages (Python, Java, Go, PHP, Ruby, JavaScript, C#, SQL, Lisp) and one human language (English). The fine-tuned models outperformed state-of-the-art models on thirteen software engineering tasks, including code generation and code documentation generation.



Prof. Amnon Shashua’s Lab

In our lab, led by Prof. Amnon Shashua with graduate students Or Sharir, Yoav Levine, Noam Wies, Hofit Bata, and Daniel Jannai, we theoretically investigate the mechanisms behind prominent deep learning techniques, and leverage these insights to drive practical innovations.

Lately, we have focused on self-attention networks (a.k.a. Transformers), which have facilitated recent breakthroughs in language understanding and are showing promising signs in various other domains.

Our unexpected theoretical findings below were empirically reinforced by targeted and comprehensive experiments (hundreds of trained models), facilitated by the TFRC program's computational resources.

The depth-to-width interplay (NeurIPS 2020): Depth has long been suggested as the source of deep learning's success. However, we prove that in self-attention networks the power of depth can only be unlocked when the width of the model is above a certain threshold. Our results point to inefficiencies in commonly used architectures and prescribe a simple practical formula for the optimal depth-to-width ratio per parameter budget.



Which Transformer architecture fits my data? (ICML 2021): Just as depth is limited by width, we prove that width itself is limited by the rank of the input vocabulary matrix. This has special implications for cutting-edge efforts to utilize self-attention in non-language domains (e.g., images).


Wisdom d'Almeida

Hi! I'm Wisdom, and I joined the TRC program in 2018, when it was still TFRC :) Access to TRC compute helped me run large-scale experiments on radiology report generation from chest X-ray images. The idea was to train powerful language models jointly on image and text data, so that the models could generate clinically pertinent natural language reports (including findings and impression sections) at test time, supporting chest X-ray disease predictions with textual evidence or explanation. The output of my work constituted the main portion of my Master's thesis (not online, unfortunately) and motivated further research on imbuing clinical awareness into language models via design and data inductive biases. Some of these follow-up works, still backed by TRC compute, were presented at venues such as Stanford's Frontier of Assisted Care Scientific Symposium and the Montreal AI Symposium.