Publications

Explore TRC-supported projects at GitHub

Explore TRC-supported projects at Hugging Face

2025

October

Scaling and Taming Adversarial Training with Synthetic Data

Wu et al.

PyTorchSim: A Comprehensive, Fast, and Accurate NPU Simulation Framework

Yang et al.

Diffusion Transformers with Representation Autoencoders

Zheng et al.

Beyond Conventional Transformers: A Medical X-ray Attention Block for Improved Multi-Label Diagnosis

Rand & Ibrahim

Beyond Next-Token Prediction: A Performance Characterization of Diffusion versus Autoregressive Language Models

Kim et al.

September

PUREVQ-GAN: Defending Data Poisoning Attacks through Vector-Quantized Bottlenecks

Branch et al.

Comparative Analysis of Chemical Structure String Representations for Neural Machine Translation

Rajan, Zielesny & Steinbeck

Cheminformatics Microservice V3: a web portal for chemical structure manipulation and analysis

Rajan et al.

CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs

Chervov et al.

Artificial Neural Networks and Machine Learning

Senn et al.

BC-predict: mining of signal biomarkers and production of models for early-stage breast cancer subtyping and prognosis

Muthamilselvan, Vaithilingam & Palaniappan

Pre-training under infinite compute

Kim et al.

World Modeling with Probabilistic Structure Integration

Kotar et al.

Experiments with data-augmented modeling of ADME and potency endpoints in the ASAP-Polaris-OpenADMET Antiviral Challenge

Srilakshmi, Tituss & Palaniappan

Building High-Quality Datasets for Portuguese LLMs: From Common Crawl Snapshots to Industrial-Grade Corpora

Almeida, Nogueira & Pedrini

Creating a Large Clean Web Corpus for Turkish

Uzun et al.

Fantastic Pretraining Optimizers and Where to Find Them

Wen et al.

OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning

Liu et al.

August

Democracy-in-Silico: Institutional Design as Alignment in AI-Governed Polities

Srinivasan & Patapati

Age-Normalized HRV Features for Non-Invasive Glucose Prediction: A Pilot Sleep-Aware Machine Learning Study

Azam & Singh

XQuant: Breaking the Memory Wall for LLM Inference with KV Cache Rematerialization

Tomar et al.

Loss Landscape Degeneracy and Stagewise Development in Transformers

Hoogland et al.

Representing Speech Through Autoregressive Prediction of Cochlear Tokens

Tuckute et al.

ReGate: Enabling Power Gating in Neural Processing Units

Xue & Huang

July

HTU at SemEval-2025 Task 11: Divide and Conquer-Multi-Label emotion classification using 6 DziriBERTs submodels with Label-fused Iterative Mask Filling technique for low-resource data augmentation.

Saleh & Biltawi

Clinical-Grade Blood Pressure Prediction in ICU Settings: An Ensemble Framework with Uncertainty Quantification and Cross-Institutional Validation

Azam & Singh

Discovering and using Spelke segments

Venkatesh et al.

Topographic Vision Transformers

Shah & Yamins

Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models

Wu et al.

Taming generative video models for zero-shot optical flow extraction

Kim et al.

Reward Under Attack: Evaluating the Sensitivity of Process Reward Models

Bamba et al.

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful

Marek et al.

MARCUS: Molecular Annotation and Recognition for Curating Unravelled Structures

Rajan et al.

June

Bimodal masked language modeling for bulk RNA-seq and DNA methylation representation learning

Gélard et al.

Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models

Park et al.

MeshSlice: Efficient 2D Tensor Parallelism for Distributed DNN Training

Nam, Gerogiannis & Torrellas

SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification

Wang et al.

Don't throw the baby out with the bathwater: How and why deep learning for ARC

Cole & Osman

Multipole Attention for Efficient Long Context Reasoning

Hooper et al.

Scalable Generalized Bayesian Online Neural Network Training for Sequential Decision Making

Duran-Martin et. al

Visual Pre-Training on Unlabeled Images using Reinforcement Learning

Ghosh & Levine

Diffuse and Disperse: Image Generation with Representation Regularization

Wang & He

A Stable Whitening Optimizer for Efficient Neural Network Training

Franz, Levine & Abbeel

Language Representation Models for Low-and Medium-Resource Languages

Daðason

Cheminformatics Microservice V-3: A Web Portal for Chemical Structure Manipulation and Analysis

Rajan et al.

Is an Exponentially Growing Action Space Really that Bad? Validating a Core Assumption for using Multi-Agent RL

de Kock, Pretorius & Shock

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

Deitke et al.

May

Classification of Epilepsy Seizure Types in Pediatrics Based on Turkish EEG Reports

Aslan et al.

MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs

Liu et al.

Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic

Dash & Bornelöv

Efficient and Unbiased Sampling from Boltzmann Distributions via Variance-Tuned Diffusion Models

Zhang, Midgely & Hernández-Lobato

Breaking the Performance Ceiling in Complex Reinforcement Learning requires Inference Strategies

Chalumeau et al.

LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision

Fuller et al.

Mean Flows for One-step Generative Modeling

Geng et al.

A Data-Driven Modeling Pipeline for Quantitative Systems Pharmacology

Denaro

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Tang et al.

Model-brain comparison using inter-animal transforms

Thobani et al.

Achieving Tokenizer Flexibility in Language Models through Heuristic Adaptation and Supertoken Learning

Sharthak et al.

Variational Rank Reduction Autoencoder

Mounayer et al.

Adaptive, Robust and Scalable Bayesian Filtering for Online Learning

Duran-Martin

Openvision: A fully-open, cost-effective family of advanced vision encoders for multimodal learning

Li et al.

AbBFN2: A flexible antibody foundation model based on Bayesian Flow Networks

Guloglu et al.

Ensemble Distribution Distillation via Flow Matching

Park et al.

April

Crowdsourcing Proposal Supporting Patient Engagement in Parkinson's Disease: A Digital Research Environment (DRE)-Enabled, Patient Swarm Approach to Develop QSP Models

Barrett et al.

Learning Adaptive Parallel Reasoning with Language Models

Pan et al.

Scaling sparse feature circuit finding for in-context learning

Kharlapenko et al.

Are Multimodal LLMs Robust Against Adversarial Perturbations? RoMMath: A Systematic Evaluation on Multimodal Math Reasoning

Zhao et al.

Vision Transformers Beat WideResNets on Small Scale Datasets Adversarial Robustness

Wu et al.

SPIRe: Boosting LLM Inference Throughput with Speculative Decoding

Neelam et al.

3D Scene Understanding Through Local Random Access Sequence Modeling

Lee et al.

Beyond Conventional Transformers: The Medical X-ray Attention (MXA) Block for Improved Multi-Label Diagnosis Using Knowledge Distillation

Rand & Ibrahim

Lightweight Pre-Trained Korean Language Model Based on Knowledge Distillation and Low-Rank Factorization

Kim & Choi

March

PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving

Feng et al.

Unified Multimodal Discrete Diffusion

Swerdlow et al.

MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search

Hu et al.

Cross-Tokenizer Distillation via Approximate Likelihood Matching

Minixhofer, Vulić & Ponti

Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals

Stojanov et al.

PISA Experiments: Exploring Physics Post-Training for Video Diffusion Models by Watching Stuff Drop

Li et al.

PLAN-AND-ACT: Improving Planning of Agents for Long-Horizon Tasks

Erdogan et al.

Beyond Adversarial Robustness: Breaking the Robustness-Alignment Trade-off in Object Recognition

Feng et al.

BirdieDNA: Reward-Based Pre-Training for Genomic Sequence Modeling

Blouir et al.

Performance of Adaptive Stochastic Gradient Descent Optimization Algorithms in Natural Language Processing Tasks

Erdogan et al.

February

Enhancing deep neural networks through complex-valued representations and Kuramoto synchronization dynamics

Muzellec et al.

Fractal Generative Models

Li et al.

Simpler Fast Vision Transformers with a Jumbo CLS Token

Fuller et al.

ETS: Efficient Tree Search for Inference-Time Scaling

Hooper et al.

Robot Data Curation with Mutual Information Estimators

Hejna et al.

EpiFoundation: A Foundation Model for Single-Cell ATAC-seq via Peak-to-Gene Alignment

Wu et al.

MethylProphet: A Generalized Gene-Contextual Model for Inferring Whole-Genome DNA Methylation Landscape

Huang et al.

Augmenting Interaction Effects in Convolutional Networks with Taylor Polynomial Gated Units

Zou, Liu & Dai

Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More

Wang et al.

QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache

Tiwari et al.

January

STP: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving

Dong & Ma

A multi-modal transformer for cell type-agnostic regulatory predictions

Javed et al.

Dynamics of Transient Structure in In-Context Linear Regression Transformers

Carroll et al.

AdaCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Chain-of-Thought

Huang et al.

LiT and Lean: Distilling Listwise Rerankers into Encoder-Decoder Models

Tamber, Pradeep & Lin

Investigating machine moral judgement through the Delphi experiment

Jiang et al.

2024

December

Insights into Low-Resource Language Modelling: Improving Model Performances for South African Languages

Visser, Grobler & Dunaiski

When to Think Step by Step: Computing the Cost-Performance Trade-offs of Chain-of-Thought Prompting

Manglik & Choudhri

General-Purpose Retrieval-Enhanced Medical Prediction Model Using Near-Infinite History

Kim et al.

MERaLiON-TextLLM: Cross-Lingual Understanding of Large Language Models in Chinese, Indonesian, Malay, and Singlish

Huang et al.

Amortising Variational Bayesian Inference over prior hyperparameters with a Normalising Flow

Battaglia & Nicholls

Thinking in space: How multimodal large language models see, remember, and recall spaces

Yang et al.

Evaluating LLMs for Diagnosis Summarization

Santos et al.

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhou et al.

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Zhou et al.

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone

Mark et al.

Monet: Mixture of Monosemantic Experts for Transformers

Park et al.

Optimizing Large Language Models for Turkish: New Methodologies in Corpus Selection and Training

Kesgin et al.

Scaling BERT Models for Turkish Automatic Punctuation and Capitalization Correction

Saoud et al.

Improving Vietnamese Legal Document Retrieval using Synthetic Data

Tien et al.

November

CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions

Liu et al.

Development of Pre-Trained Transformer-based Models for the Nepali Language

Thapa et al.

BEThiz: Precise and complete BERT model to fill out questions with correct answers, understanding the context for the Spanish language

Zoukagh

Tissue clocks derived from histological signatures of biological aging enable tissue-specific aging predictions from blood

Abila et al.

Squeezed Attention: Accelerating Long Context Length LLM Inference

Hooper et al.

Searching Latent Program Spaces

Bonnet & Macfarlane

“Knowing When You Don't Know”: A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation

Thakur et al.

ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks

Shi, Li & You

Beyond the Boundaries of Proximal Policy Optimization

Tan et al.

October

Birdie: Advancing State Space Language Modeling with Dynamic Mixtures of Training Objectives

Blouir et al.

COMAL: A Convergent Meta-Algorithm for Aligning LLMs with General Preferences

Liu et al.

COADREADx: A comprehensive algorithmic dissection of colorectal cancer unravels salient biomarkers and actionable insights into its discrete progression

Palaniappan, Muthamilselvan & Sarathi

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback

Miranda et al.

From English-Centric to Effective Bilingual: LLMs with Custom Tokenizers for Underrepresented Languages

Kiulian et al.

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

Nakamoto et al.

Cliqueformer: Model-Based Optimization with Structured Transformers

Kuba, Abbeel & Levine

Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging

Morrison et al.

Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR

Minixhofer, Klejch & Bell

One Step Diffusion via Shortcut Models

Frans et al.

AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning

Pramanik et al.

Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws

Jiang et al.

ACER: Automatic Language Model Context Extension via Retrieval

Gao, Zhang & Callan

A community effort to optimize sequence-based deep learning models of gene regulation

Rafi et al.

Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

Qiu et al.

Reversible Decoupling Network for Single Image Reflection Removal

Zhao et al.

Conic Activation Functions

Fu & Cohen

Best Unpacking DPO and PPO: Disentangling Practices for Learning from Preference Feedback

Ivison et al.

ElasticTok: Adaptive Tokenization for Image and Video

Yan et al.

HaloClass: Salt-Tolerant Protein Classification with Protein Language Models

Narang et al.

Temperature Optimization for Bayesian Deep Learning

Ng et al.

Understanding warmup-stable-decay learning rates: A river valley loss landscape perspective

Wen et al.

Fine-tuning large vision-language models as decision-making agents via reinforcement learning

Zhai et al.

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient

Wang et al.

Distilling an End-to-End Voice Assistant Without Instruction Training Data

Held et al.

Tracking objects that change in appearance with phase synchrony

Muzellec et al.

September

Retrieval-Pretrained Transformer: Long-range Language Modeling with Self-retrieval

Rubin & Berant

Building better models of biological vision by searching for more ecological data diets and learning objectives

Linsley et al.

Protein Sequence Modelling with Bayesian Flow Networks

Atkinson et al.

SPO: Sequential Monte Carlo Policy Optimisation

Macfarlane et al.

Ex Uno Pluria: Insights on Ensembling in Low Precision Number Systems

Nam & Lee

MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions

Köksal et al.

Efficient and Unbiased Sampling of Boltzmann Distributions via Consistency Models

Zhang et al.

Benchmarking Quantum Red TEA on CPUs, GPUs, and TPUs

Jaschke et al.

Efficient and Scalable Estimation of Tool Representations in Vector Space

Moon et al.

August

Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs

Yadav, Theodoridis & Tan

Re-Mix: Optimizing Data Mixtures for Large Scale Imitation Learning

Hejna et al.

PHIStruct: Improving phage-host interaction prediction at low sequence similarity settings using structure-aware protein embeddings

Gonzales, Ureta & Shrestha

Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation

Doshi et al.

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition

Cornell et al.

STOUT V2. 0: SMILES to IUPAC name conversion using transformer models

Rajan, Zielesny & Steinbeck

TaPERA: Enhancing Faithfulness and Interpretability in Long-Form Table QA by Content Planning and Execution-based Reasoning

Zhao et al.

The Foundation Model Path to Open-World Robots

Shah

A Novel One-To-One Framework for Relative Camera Pose Estimation

Aydogdu & Demirci

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Xie et al.

July

Autonomous Improvement of Instruction Following Skills via Foundation Models

Zhou et al.

PrivaT5: A Generative Language Model for Privacy Policies

Zoubi et al.

The Use of Clinical Language Models Pretrained on Institutional EHR Data for Downstream Tasks

Suvirat et al.

TTSDS--Text-to-Speech Distribution Score

Minixhofer, Klejch & Bell

TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers

Chen et al.

Keypoint aware masked image modelling

Krishna & Subramanyam

Understanding Reference Policies in Direct Preference Optimization

Liu, Liu & Cohan

Efficient and Scalable Tiny Machine Learning

Banbury

Subject-driven Text-to-Image Generation via Preference-based Reinforcement Learning

Miao et al.

Figuring out Figures: Using Textual References to Caption Scientific Figures

Cao & Liu

Toucan: Many-to-Many Translation for 150 African Language Pairs

Elmadany, Adebara & Abdul-Mageed

Characterizing Prompt Compression Methods for Long Context Inference

Jha et al.

Predicting Emergent Capabilities by Finetuning

Snell et al.

Is Transformer-Based Attention Agnostic of the Pretraining Language and Task?

Martin, Visser & Dunaiski

A Case Study for the Application of the GraphCast AI Model in Hungary

Varga-Balogh et al.

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Sun et al.

NeRF-US: Removing Ultrasound Imaging Artifacts from Neural Radiance Fields in the Wild

Dagli et al.

June

BulkRNABert: Cancer prognosis from bulk RNA-seq based language models

Gélard et al.

Cambrian-1: A fully open, vision-centric exploration of multimodal llms

Tong et al.

Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization

Chalumeau et al.

Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation

Frohmann et al.

SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization

Ahn et al.

Autoregressive Image Generation without Vector Quantization

Li et al.

Generative Visual Instruction Tuning

Hernandez, Villegas & Ordonez

Loss landscape geometry reveals stagewise development of transformers

Wang et al.

ContrastiveMix: Overcoming Code-Mixing Dilemma in Cross-Lingual Transfer for Information Retrieval

Do, Lee & Hwang

COMMIT: Code-Mixing English-Centric Large Language Model for Multilingual Instruction Tuning

Lee, Jung & Hwang

Re-evaluating the Need for Visual Signals in Unsupervised Grammar Induction

Li et al.

ScriptMix: Mixing Scripts for Low-resource Language Parsing

Lee, Lee & Hwang

Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback

Ivison et al.

Pecan: Cost-Efficient ML Data Preprocessing with Automatic Transformation Ordering and Hybrid Placement

Graur et al.

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

Gao et al.

NAFlora-1M: Continental-Scale High-Resolution Fine-Grained Plant Classification Dataset

Park et al.

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

Ren et al.

The 3D-PC: a benchmark for visual perspective taking in humans and machines

Linsley et al.

SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound

Dagli et al.

Robust Modeling through Causal Priors and Data Purification in Machine Learning

Bhat

EvIL: Evolution Strategies for Generalisable Imitation Learning

Sapora et al.

Learning to Explore for Stochastic Gradient MCMC

Kim et al.

May

Scaling White-Box Transformers for Vision

Yang et al.

PureEBM: Universal Poison Purification via Mid-Run Dynamics of Energy-Based Models

Pooladzandi et al.

PureGen: Universal Data Purification for Train-Time Poison Defense via Generative Model Dynamics

Bhat et al.

The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol

Subbotko, Jablonski & Bilinski

Learning the Language of Protein Structure

Gaujac et al.

MambaOut: Do We Really Need Mamba for Vision?

Yu & Wang

Zero-Shot Tokenizer Transfer

Minixhofer, Ponti & Vulić

Probabilistic simulation supports generalizable intuitive physics

Wang et al.

A unifying framework for functional organization in early and higher ventral visual cortex

Margalit et al.

InstructPatentGPT: training patent language models to follow instructions with human feedback

Lee

Towards A Machine Capable of Learning And Discovering Everything

Liu

Saliency strikes back: How filtering out high frequencies improves white-box explanations

Muzellec et al.

BERT2D: Two Dimensional Positional Embeddings for Efficient Turkish NLP

Kaya & Tantuğ

April

Comparing GPU and TPU in an Iterative Scenario: A Study on Neural Network-based Image Generation

Roman, Schaarschmidt & Karl

Accel-NASBench: Sustainable Benchmarking for Accelerator-Aware NAS

Ahmad et al.

Fast Ensembling with Diffusion Schrödinger Bridge

Kim, Yoon & Lee

Expressing and Exploiting Parallelism in Language Model Decoding

Jin et al.

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

Tajwar et al.

Ecological Data and Objectives for Human Alignment

Nagaraj et al.

HMAX Strikes Back: Self-supervised Learning of Human-Like Scale Invariant Representations

Pant et al.

Walk a mile in my shoes! 3D visual perspective taking in humans and machines.

Zhou et al.

Automatically Detecting Political Viewpoints in Norwegian Text

Doan et al.

Measuring Cross-lingual Transfer in Bytes

de Souza et al.

A Large-Scale Exploration of µ-Transfer

Lingle

Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining

Ljubešić et al.

gRNAde: Geometric Deep Learning for 3D RNA inverse design

Joshi et al.

Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance

Nam, Heo & Lee

March

Text Filtering Classifiers for Medium-Resource Languages

Daðason & Loftsson

Comparing human and machine visual perception

Veerabadran

DeepFake Video Detection using Vision Transformer

Hussien & Mohamed

IT5: Text-to-text Pretraining for Italian Language Understanding and Generation

Sarti & Nissim

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

Lee et al.

BC-Predict: Mining of signal biomarkers and multilevel validation of cascade classifier for early-stage breast cancer subtyping and prognosis

Muthamilselvan & Palaniappan

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

Khazatsky et al.

Deep Manifold Learning for Reading Comprehension and Logical Reasoning Tasks with Polytuplet Loss

Lu & Rodriguez

Humanoidbench: Simulated humanoid benchmark for whole-body locomotion and manipulation

Sferrazza et al.

MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections

Hui et al.

MYTE: Morphology-Driven Byte Encoding for Better and Fairer Multilingual Language Modeling

Limisiewicz et al.

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Huang et al.

Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages

van Noord et al.

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

Khan et al.

Advancements in Hand-Drawn Chemical Structure Recognition through an Enhanced DECIMER Architecture

Rajan et al.

Eyes wide shut? exploring the visual shortcomings of multimodal llms

Tong et al.

A generative model of symmetry transformations

Allingham et al.

Can a Confident Prior Replace a Cold Posterior?

Marek, Paige & Izmailov

Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models

Li et al.

February

Set the Clock: Temporal Alignment of Pretrained Language Models

Zhao et al.

Monkeys engage in visual simulation to solve complex problems

Ahuja et al.

What Evidence Do Language Models Find Convincing?

Wan, Wallace & Klein

EVEDIT: Event-based Knowledge Editing with Deductive Editing Boundaries

Liu et al.

World Model on Million-Length Video And Language With RingAttention

Liu et al.

SMX: Sequential Monte Carlo Planning for Expert Iteration

Macfarlane et al.

Cacophony: An Improved Contrastive Audio-Text Model

Zhu & Duan

Should I try multiple optimizers when fine-tuning pre-trained Transformers for NLP tasks? Should I tune their hyperparameters?

Gkouti et al.

Universal Neural Functionals

Zhou, Finn & Harrison

Curvature-Informed SGD via General Purpose Lie-Group Preconditioners

Pooladzandi & Li

The Developmental Landscape of In-Context Learning

Hoogland et al.

Whispering in Norwegian: Navigating Orthographic and Dialectic Challenges

Kummervold et al.

January

DeepPrism: Channel Convolution for Lightweight Generative Models

Fu & Cohen

Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks

Yildirim

TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation

Uludoğan et al.

Industry-sensitive language modeling for business

Borchert et al.

Synthesizing Moving People with 3D Control

Li et al.

Attentive Fusion: A Transformer-based Approach to Multimodal Hate Speech Detection

Mandal et al.

Conic Linear Units: Improved Model Fusion and Rotational-Symmetric Generative Model

Fu & Cohen

FMB: a Functional Manipulation Benchmark for Generalizable Robotic Learning

Luo et al.

SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers

Ma et al.

Revisiting Adversarial Training at Scale

Wang et al.

Image Sculpting: Precise Object Editing with 3D Geometry Control

Yenphraphai et al.

Cheetah: Natural Language Generation for 517 African Languages

Adebara, Elmadany & Abdul-Mageed

2023

December

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

Lu et al.

BantuLM: Enhancing Cross-Lingual Learning in the Bantu Language Family

Mohamed et al.

Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models

Tamber, Pradeep & Lin

Discovering modular solutions that generalize compositionally

Schug et al.

Diffusion Models With Learned Adaptive Noise

Sahoo et al.

NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation

Thakur et al.

OCaTS: an Online Cost-aware Teacher-Student Framework to Reduce the Calls to Large Language Models

Stogiannidis

Understanding Physical Dynamics with Counterfactual World Modeling

Venkatesh et al.

CogMemLM: Human-Like Memory Mechanisms Improve Performance and Cognitive Plausibility of LLMs

Thoma et al.

Compress & Align: Curating Image-Text Data with Human Knowledge

Zhang et al.

Better Quality Pre-training Data and T5 Models for African Languages

Oladipo et al.

Multilingual Lottery Tickets to Pretrain Language Models

Lee & Hwang

JASMINE: Arabic GPT Models for Few-Shot Learning

Nagoudi et al.

Dolphin: A Challenging and Diverse Benchmark for Arabic NLG

Nagoudi et al.

An LLM Compiler for Parallel Function Calling

Kim et al.

QTSumm: Query-Focused Summarization over Tabular Data

Zhao et al.

Rejuvenating image-GPT as Strong Visual Representation Learners

Ren et al.

Sequential Modeling Enables Scalable Learning for Large Vision Models

Bai et al.

November

Offline RL for generative design of protein binders

Tarasov et al.

Neural Rendering in the Cloud with Tensor Processing Unit

Soto-Chirinos, Condori-Alejo & Alzamora

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

Ivison et al.

Striped Attention: Faster Ring Attention for Causal Transformers

Brandon et al.

Improving Korean NLP Tasks with Linguistically Informed Subword Tokenization and Sub-character Decomposition

Jeon et al.

MixCon3D: Synergizing Multi-View and Cross-Modal Contrastive Learning for Enhancing 3D Representation

Gao

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

Gandhi, von Platen & Rush

October

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

Meulemans et al.

To repeat or not to repeat: Insights from scaling llm under token-crisis

Xue et al.

IACS-LRILT: Machine Translation for Low-Resource Indic Languages

Suman et al.

A Holistic Approach to Unifying Automatic Concept Extraction and Concept Importance Estimation

Fel et al.

Beyond Invariance: Test-Time Label-Shift Adaptation for Addressing "Spurious" Correlations

Sun et al.

Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization

Grinsztajn et al.

Speculative Decoding with Big Little Decoder

Kim et al.

Blockwise Parallel Transformers for Large Context Models

Liu & Abbeel

Proving test set contamination in black box language models

Oren et al.

Combinatorial Optimization with Policy Adaptation using Latent Space Search

Chalumeau et al.

Recurrent Linear Transformers

Pramanik

Octopus: A Multitask Model and Toolkit for Arabic Natural Language Generation

Elmadany, Nagoudi & Abdul-Mageed

Cache me if you Can: an Online Cost-aware Teacher-Student framework to Reduce the Calls to Large Language Models

Stogiannidis et al.

Kunstig intelligens og Nasjonalbiblioteket

Brygfjeld

GUANinE v1. 0: Benchmark Datasets for Genomic AI Sequence-to-Function Models

robson & Ioannidis

NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration

Sridhar et al.

TabLib: A Dataset of 627M Tables with Context

Eggert et al.

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

Chen et al.

Lemur: Harmonizing Natural Language and Code for Language Agents

Xu et al.

FedConv: Enhancing Convolutional Neural Networks for Handling Data Heterogeneity in Federated Learning

Xu et al.

Ring Attention with Blockwise Transformers for Near-Infinite Context

Liu, Zaharia & Abbeel

A Foundational Large Language Model for Edible Plant Genomes

Mendoza-Revilla et al.

September

A Manual Evaluation Method of Neural MT for Indigenous Languages

Wiechetek, Pirinen & Kummervold

FinAraT5: A text to text model for financial Arabic text understanding and generation

Zmandar, El-Haj & Rayson

An ML approach to resolution of singularities

Bérczi, Fan & Zeng

Transformer-VQ: Linear-Time Transformers via Vector Quantization

Lingle

Extracting the gamma-ray source-count distribution below the Fermi-LAT detection limit with deep learning

Amerio, Cuoco & Fornengo

SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning

Wang et al.

MIRACL: A Multilingual Retrieval Dataset Covering 18 Diverse Languages

Zhang et al.

August

Cabrita: closing the gap for foreign languages

Larcher et al.

A Recycling Training Strategy for Medical Image Segmentation with Diffusion Denoising Models

Fu et al.

BridgeData V2: A Dataset for Robot Learning at Scale

Walke et al.

SE (3) Equivariant Augmented Coupling Flows

Midgley et al.

Large-kernel Attention for Efficient and Robust Brain Lesion Segmentation

Chalcroft et al.

QDax: A Library for Quality-Diversity and Population-based Algorithms with Hardware Acceleration

Chalumeau et al.

Using machine learning for automated de-identification and clinical coding of free text data in electronic medical records

Liu

Harmonizing the visual strategies of image-computable models with humans yields more performant and interpretable models of primate visual system function.

Rodriguez et al.

July

Light and fast language models for spanish through compression technique

Cañete López

Deciphering “the language of nature”: A transformer-based language model for deleterious mutations in proteins

Jiang, Fang & Wang

Learning to Model the World with Language

Lin et al.

Developing and Evaluating Tiny to Medium-Sized Turkish BERT Models

Kesgin, Yüce & Amasyali

Synthetic Faces For Robots

Bosschart

PASTA: Pretrained Action-State Transformer Agents

Boige et al.

Gradient strikes back: How filtering out high frequencies improves explanations

Muzellec et al.

HINT: Hypernetwork Instruction Tuning for Efficient Zero-and Few-Shot Generalisation

Ivison et al.

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages

Doddapaneni et al.

Focused Transformer: Contrastive Training for Context Scaling

Tworkowski et al.

Reinforcement Learning from Passive Data via Latent Intentions

Ghosh, Bhateja & Levine

June

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

Yadav et al.

SparseOptimizer: Sparsify Language Models through Moreau-Yosida Regularization and Accelerate via Compiler Co-design

Guo

CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a\$10,000 Budget; An Extra\$4,000 Unlocks 81.8% Accuracy

Li, Wang & Xie

SparseOptimizer: Sparsify Language Models through Moreau-Yosida Regularization and Accelerate through Compiler Co-design

Guo

AmnioML: Amniotic Fluid Segmentation and Volume Prediction with Uncertainty Quantification

Csillag et al.

Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression

Raventós et al.

Script, Language, and Labels: Overcoming Three Discrepancies for Low-Resource Language Specialization

Lee, Lee & Hwang

ViNT: A Foundation Model for Visual Navigation

Shah et al.

Long-range Language Modeling with Self-retrieval

Rubin & Berant

Sampling from Gaussian Process Posteriors using Stochastic Gradient Descent

Lin et al.

Understanding and Mitigating Hardware Failures in Deep Learning Training Systems

He et al.

Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX

Bonnet et al.

Resolution based Incremental Scaling Methodology for CNNs

Lim, Lee & Ha

Anticipatory Music Transformer

Thickstun et al.

HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication

Üveges & Ring

SqueezeLLM: Dense-and-Sparse Quantization

Kim et al.

RoBERTweet: A BERT Language Model for Romanian Tweets

Tăiatu et al.

Gradient-Informed Quality Diversity for the Illumination of Discrete Spaces

Boige et al.

Performance-optimized deep neural networks are evolving into worse models of inferotemporal visual cortex

Linsley et al.

Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception

Linsley et al.

LexGPT 0.1: pre-trained GPT-J models with Pile of Law

Lee

Unifying (Machine) Vision via Counterfactual World Modeling

Bear et al.

Extracting Reward Functions from Diffusion Models

Nuti, Franzmeyer & Henriques

Masked Autoencoders with Multi-Window Attention Are Better Audio Learners

Yadav et al.

Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

Subramanian et al.

May

Full Stack Optimization of Transformer Inference

Kim et al.

Temporally Consistent Transformers for Video Generation

Yan et al.

Hardware Software Co-design and Architectural Optimization of Deep Learning Models for Natural Language Processing

Wattanawong & Keutzer

Blockwise Parallel Transformer for Long Context Large Models

Liu & Abbeel

Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation

Minixhofer, Pfeiffer & Vulic

Self Information Update for Large Language Models through Mitigating Exposure Bias

Yu & Ji

Emergent Agentic Transformer from Chain of Hindsight Experience

Liu & Abbeel

The false promise of imitating proprietary language models

Gudibande et al.

Beyond Model Efficiency: Data Optimizations for Machine Learning Systems

Kuchnik

GPTAraEval: A Comprehensive Evaluation of ChatGPT on Arabic NLP

Khondaker et al.

BERT を用いた日本語の意味変化の分析

小林千真

Difference-Masking: Choosing What to Mask in Continued Pretraining

Wilf et al.

Video Prediction Models as Rewards for Reinforcement Learning

Escontrela et al.

Exploring Large Language Models for Classical Philology

Riemenschneider & Frank

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models

Minixhofer, Pfeiffer & Vulic

Training Diffusion Models with Reinforcement Learning

Black et al.

An Inverse Scaling Law for CLIP Training

Li, Wang & Xie

Varta: A Large-Scale Headline-Generation Dataset for Indic Languages

Aralikatte et al.

Harnessing the Power of BERT in the Turkish Clinical Domain: Pretraining Approaches for Limited Data Scenarios

Türkmen et al.

Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation

Kirstain et al.

Enriching Biomedical Knowledge for Low-resource Language Through Large-Scale Translation

Phan et al.

Poisoning Language Models During Instruction Tuning

Wan et al.

April

Tevatron: An efficient and flexible toolkit for neural retrieval

Gao et al.

Evaluation and optimization of sequence-based gene regulatory deep learning models

Rafi et al.

Improving Conversational Passage Re-ranking with View Ensemble

Ju et al.

PP-NAS: Searching for Plug-and-Play Blocks on Convolutional Neural Networks

Xiao et al.

Factorized visual representations in the primate visual system and deep neural networks

Lindsey & Issa

Differentiable RandAugment: Learning Selecting Weights and Magnitude Distributions of Image Transformations

Xiao et al.

Effect of Tokenization Granularity for Turkish Large Language Models

Kaya & Tantu

Offensive Language Detection in Social Media Using Transformers and Importance of Pre-training

Saraclar et al.

Astroformer: More Data Might Not be All You Need for Classification

Dagli

A Recipe for Training Large Models

Dayma

Automated Diagnosis Code Assignment of Thai Free-text Clinical Notes

Suvirat et al.

Polytuplet Loss: A Reverse Approach to Training Reading Comprehension and Logical Reasoning Models

Lu & Rodriguez

March

Reduce, Reuse, Recycle: Selective Reincarnation in Multi-Agent Reinforcement Learning

Formanek et al.

InceptionNeXt: When Inception Meets ConvNeXt

Yu et al.

The quality-diversity transformer: Generating behavior-conditioned trajectories with decision transformers

Macé et al.

Understanding Masked Autoencoders via Hierarchical Latent Variable Models

Kong et al.

C2LIR: Continual Cross-Lingual Transfer for Low-Resource Information Retrieval

Lee et al.

Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering

Tamber, Pradeep & Lin

PyGaggle: A Gaggle of Resources for Open-Domain Question Answering

Pradeep et al.

Importance of Aligning Training Strategy with Evaluation for Diffusion Models in 3D Multiclass Segmentation

Fu et al.

Towards Learning Emotion Information From Short Segments Of Speech

Purohit et al.

Magnushammer: A Transformer-based Approach to Premise Selection

Mikula et al.

February

Big Little Transformer Decoder

Kim et al.

Traversing Between Modes in Function Space for Fast Ensembling

Yun et al.

Decoupled Training for Long-Tailed Classification With Stochastic Representations

Nam, Jang & Lee

Connecting representation and generation via masked vision-language transformer

Geng et al.

Martingale Posterior Neural Processes

Lee et al.

Language-Driven Representation Learning for Robotics

Karamcheti et al.

DECIMER.ai - An open platform for automated optical chemical structure identification, segmentation and recognition in scientific publications

Rajan et al.

Toward denoising of 3D CT scans with few data

Liang et al.

DarijaBERT: A Step Forward in NLP for the Written Moroccan Dialect

Gaanoun et al.

AIOSA: An approach to the automatic identification of obstructive sleep apnea events based on deep learning

Bernardini et al.

Languages are Rewards: Hindsight Finetuning using Human Feedback

Liu, Sferrazza & Abbeel

Bioformer: an efficient transformer language model for biomedical text mining

Fang et al.

Model-based Policy Optimization under Approximate Bayesian Inference

Wang, Chen & Murphy

January

プロットで拡張した小説生成システムによる短編製作支援

後藤諒也 & 伊藤克亘

Ankh: Optimized Protein Language Model Unlocks General-Purpose Modelling

Elnaggar et al.

Does progress on ImageNet transfer to real world datasets?

Fang, Kornblith & Schmidt

Benchmarking Robustness in Neural Radiance Fields

Wang et al.

2022

December

SkinFormer: Robust Vision Transformer for Automatic Skin Disease Identification

Osman et al.

Generative Approach for Gender Rewriting Task with ArabicT5

Alrowili & Vijay-Shanker

Generating Classical Arabic Poetry using Pre-trained Models

ElOraby et al.

Cervical Cancer Screening on Multi-class Imbalanced Cervigram Dataset using Transfer Learning

Saini & Susan

Deep Learning Methodology for Early Detection and Outbreak Prediction of Invasive Species Growth

Elias

HINT: Hypernetwork Instruction Tuning for Efficient Zero-Shot Generalisation

Ivison et al.

Localization vs. Semantics: How Can Language Benefit Visual Representation Learning?

Li et al.

RoSummary: Control Tokens for Romanian News Summarization

Niculescu, Ruseti & Dascalu

Exploring Learning Rate Scaling Rules for Distributed ML Training on Transient Resources

André, Strati & Klimovic

ManyFold: an efficient and flexible library for training and validating protein folding models

Villegas-Morcillo et al.

Non-generalizability of biomarkers for mortality in SARS-CoV-2: a meta-analyses series

Shuvo et al.

CRMnet: a deep learning model for predicting gene expression from large regulatory sequence datasets

Ding et al.

The Effect of Data Dimensionality on Neural Network Prunability

Ankner et al.

November

Sistema prototipo de renderización neural en la nube con TPU para determinar la capacidad de producción de tres arquitecturas

Soto Chirinos

Semi-supervised Automated Clinical Coding Using International Classification of Diseases

Hlynsson et al.

Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance

Richter & Pal

NepBERTa: Nepali Language Model Trained in a Large Corpus

Gautam, Timilsina & Bhattarai

Debiasing Meta-Gradient Reinforcement Learning by Learning the Outer Value Function

Bonnet, Midgley & Laterre

MFinBERT: Multilingual Pretrained Language Model For Financial Domain

Nguyen et al.

Learning to Generate Data by Estimating Gradients of the Data Distribution

Song

Investigating Multilingual Approaches for Parsing Universal Dependencies

Barry

Posterior Matching for Arbitrary Conditioning

Strauss & Oliva

LERT: A Linguistically-motivated Pre-trained Language Model

Cui et al.

Harmonizing the object recognition strategies of deep neural networks with humans

Fel et al.

Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of L2 Regularization

Sun

Generative adversarial networks and diffusion models in material discovery

Alverson et al.

October

Random Sharpness-Aware Minimization

Liu et al.

Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints

Singh et al.

Learning Probabilistic Models from Generator Latent Spaces with Hat EBM

Hill et al.

Pruning's Effect on Generalization Through the Lens of Training and Regularization

Jin et al.

ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition

Gandhi, von Platen & Rush

Legal-Tech Open Diaries: Lesson learned on how to develop and deploy light-weight models in the era of humongous Language Models

Maroudas et al.

Instruction-Following Agents with Jointly Pre-Trained Vision-Language Models

Liu et al.

MetaFormer Baselines for Vision

Yu et al.

Do Language Models Understand Measurements?

Park, Ryu & Choi

Bioberturk: Exploring Turkish Biomedical Language Model Development Strategies in Low Resource Setting

Türkmen et al.

A Comprehensive Analysis of Subword Tokenizers for Morphologically Rich Languages

Erkaya

Optimizing Hierarchical Image VAEs for Sample Quality

Luhman & Luhman

MTet: Multi-domain Translation for English and Vietnamese

Ngo et al.

Integrative dissection of gene regulatory elements at base resolution

Chen et al.

EleutherAI: Going Beyond “Open Science” to “Science in the Open”

Phang et al.

IndoLib: A Natural Language Processing Toolkit for Low-Resource South Asian Languages

Timalsina

Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials

Kumar et al.

ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning

Nguyen, Zheng & Grover

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification

Chalkidis et al.

Population-Based Reinforcement Learning for Combinatorial Optimization

Grinsztajn, Furelos-Blanco & Barrett

Temporally Consistent Video Transformer for Long-Term Video Prediction

Yan et al.

September

The Effectiveness of Bidirectional Generative Patent Language Models

Lee

Hide and Seek in a Gridworld

Chen

More with Less: ZeroQA and Relevant Subset Selection for AI2 Reasoning Challenge

Patrascu et al.

Understanding Pure CLIP Guidance for Voxel Grid NeRF Models

Lee & Chang

Masked Autoencoders are Secretly Efficient Learners

Wei et al.

Learning by Distilling Context

Snell, Klein & Zhong

GazeRadar: A Gaze and Radiomics-Guided Disease Localization Framework

Bhattacharya, Jain & Prasanna

A Light Recipe to Train Robust Vision Transformers

Debenedetti, Sehwag & Mittal

Developing Pretrained Language Models for Turkish Biomedical Domain

Türkmen et al.

Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT

Malhas & Elsayed

Diabetic Retinopathy Screening using Deep Learning for Multi-class Imbalanced Datasets

Saini & Susan

August

Text-to-Text Multi-view Learning for Passage Re-ranking

Ju, Yang & Wang

Project Nabla

Chen

Conviformers: Convolutionally guided Vision Transformer

Vaishnav et al.

ViP: Unified Certified Detection and Recovery for Patch Attack with Vision Transformers

Li, Zhang & Xie

Masked Autoencoders Enable Efficient Knowledge Distillers

Bai et al.

NUS-IDS at CheckThat!2022: Identifying Check-worthiness of Tweets using CheckthaT5

Du, Gollapalli & Ng

Exploring Biomedical Question Answering with BioM-Transformers At BioASQ10B challenge: Findings and Techniques

Alrowili & Vijay-Shanker

Presentation on AR & VR work for the CIEHF

Pelham

TPU Research Cloud体験記 with JAX/Flax

Onishi

Fine-tuning DALL·E Mini (Craiyon) to Generate Blogpost Images

Turc

July

Is Attention always needed? A Case Study on Language Identification from Speech

Mandal et al.

On Automatic Summarization of Dutch Legal Cases

Prijs

AfriTeVa: Extending “Small Data” Pretraining Approaches to Sequence-to-Sequence Models

Ogundepo et al.

Lightweight Transformers for Conversational AI

Pressel et al.

SSBNet: Improving Visual Recognition Efficiency by Adaptive Sampling

Kwan & Song

Detecting and mitigating issues in image-based COVID-19 diagnosis

Silva, Rezende & Ponti

Recurrent Connections in the Primate Ventral Visual Stream Mediate a Trade-Off Between Task Performance and Network Size During Core Object Recognition

Nayebi et al.

BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla

Bhattacharjee et al.

Yes, No or IDK: The Challenge of Unanswerable Yes/No Questions

Sulem, Hay & Roth

Language Modelling with Pixels

Rust et al.

Use of Deep Learning to Detect the Maternal Heart Rate and False Signals on Fetal Heart Rate Recordings

Boudet et al.

June

BasqueGLUE: A Natural Language Understanding Benchmark for Basque

Urbizu et al.

Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation

Nam et al.

Evaluating Generative Patent Language Models

Lee

Thermalization and localization in isolated many-body quantum systems

Morningstar

Unified-IO

Lu et al.

Pre-training and Evaluating Transformer-based Language Models for Icelandic

Daðason & Loftsson

A Dense Representation Framework for Lexical and Semantic Matching

Lin & Lin

Insights into Pre-training via Simpler Synthetic Tasks

Wu, Li & Liang

Cachew: Machine Learning Input Data Processing as a Service

Graur et al.

Visualizing attention zones in machine reading comprehension models

Cui, Zhang & Liu

Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks

Schaefer et al.

Xplique: A Deep Learning Explainability Toolbox

Fel et al.

Can CNNs Be More Robust Than Transformers?

Wang et al.

Channelized Axial Attention - Considering Channel Relation within Spatial Attention for Semantic Segmentation

Huang et al.

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

Kim et al.

May

Divide to adapt: Mitigating confirmation bias for domain adaptation of black-box predictors

Yang et al.

FaceMAE: Privacy-Preserving Face Recognition via Masked Autoencoders

Wang et al.

Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction

Kolluru et al.

Describing Differences between Text Distributions with Natural Language

Zhong et al.

Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

Notin et al.

EBM Life Cycle: MCMC Strategies for Synthesis, Defense, and Density Modeling

Hill et al.

hmBERT: Historical Multilingual Language Models for Named Entity Recognition

Schweter et al.

Multimodal Masked Autoencoders Learn Transferable Representations

Geng et al.

Zero-Shot and Few-Shot Learning for Lung Cancer Multi-Label Classification using Vision Transformer

Guo & Fan

Unsupervised Corpus Aware Language Model Pre-training for Dense Passage Retrieval

Gao & Callan

Inception Transformer

Si et al.

BanglaNLG and BanglaT5: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla

Bhattacharjee et al.

Semi-self-supervised Automated ICD Coding

Hlynsson et al.

xcit

Dagli

Generating Disentangled Arguments With Prompts: A Simple Event Extraction Framework That Works

Si et al.

Multilingual multi-aspect explainability analyses on machine reading comprehension models

Cui et al.

ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language

Phan et al.

Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning

Liu et al.

Long Document Re-ranking with Modular Re-ranker

Gao & Callan

Odor Descriptor Understanding through Prompting

Sisson

On the Design of 2D Human Pose Estimation Networks using Accelerated Neuroevolution and Novel Keypoint Representations

McNally

April

Reliable label correction is a good booster when learning with extremely noisy labels

Wang et al.

Adversarially robust vision transformers

Debenedetti

Hierarchical Label-wise Attention Transformer Model for Explainable ICD Coding

Liu et al.

Cross-stitched Multi-modal Encoders

Singla et al.

Multi-label topic classification for COVID-19 literature with Bioformer

Fang & Wang

Density Matrix Renormalization Group with Tensor Processing Units

Ganahl et al.

What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

Wang et al.

Scalable Semi-Modular Inference with Variational Meta-Posteriors

Carmona & Nicholls

Learning Generative Models with Energy-Based Models and Transformer GANs

Lee

March

Pytorch Hyperparameter Optimization on TPUs

Suess

Towards Efficient and Scalable Sharpness-Aware Minimization

Liu et al.

Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines

Kuchnik et al.

Skillful Precipitation Nowcasting — An Implementation of DeepMind's DGMR

Hassaan

Learning neural audio features without supervision

Yadav & Zeghidour

STaR: Bootstrapping Reasoning With Reasoning

Zelikman, Wu & Goodman

PACS: A Dataset for Physical Audiovisual CommonSense Reasoning

Yu et al.

Dynamics of Transmon Ionization

Shillito et al.

KinyaBERT: a Morphology-aware Kinyarwanda Language Model

Nzeyimana & Rubungo

PERT: Pre-training BERT with permuted language model

Cui, Yang & Liu

Tevatron: An Efficient and Flexible Toolkit for Dense Retrieval

Gao et al.

Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons

Mohankumar & Khapra

IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation

Sarti & Nissim

Adversarial Robustness of Neural-Statistical Features in Detection of Generative Transformers

Crothers et al.

The evolution, evolvability and engineering of gene regulatory DNA

Vaishnav et al.

February

Mastering the Game of Abalone using Deep Reinforcement Learning and Self-Play

Claußen

Development of a Construction Specialized Pretrained Language Model

Kim

audax

Yadav

CEDILLE: A large autoregressive language model in French

Müller & Laurent

Tensor Processing Units as Quantum Chemistry Supercomputers

Pederson et al.

Environment Classification for Robotic Leg Prostheses and Exoskeletons Using Deep Convolutional Neural Networks

Laschowski et al.

JaQuAD: Japanese Question Answering Dataset for Machine Reading Comprehension

So et al.

January

Our Summer of Code Project on TF-GAN

P A, Maynard-Reid & Shor

BioM-Transformers: Building Large Biomedical Language Models with BERT, ALBERT and ELECTRA

Alrowili & Vijay-Shanker

Interactive Gated Decoder for Machine Reading Comprehension

Cui et al.

Making and Using AI in the Library: Creating a BERT Model at the National Library of Sweden

Haffenden, Fano & Malmsten

A Large and Diverse Arabic Corpus for Language Modeling

Ali

MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound

Zellers et al.

2021

December

NLP Tasks with GreekLegalBERT v2

Apostolopoulou & Briakos

BeerAI

Beckmann

Information retrieval and question answering: A case study on COVID-19 scientific literature

Otegi et al.

How and What to Learn: Taxonomizing Self-Supervised Learning for 3D Action Recognition

Tanfous et al.

BiCSNet: A Bidirectional Cross-Scale Backbone for Recognition and Localization

Peng et al.

Learned Queries for Efficient Local Attention

Arar, Shamir & Bermano

CPPE-5: Medical Personal Protective Equipment Dataset

Dagli & Shaikh

Large Scale Distributed Linear Algebra With Tensor Processing Units

Lewis et al.

Mask-combine Decoding and Classification Approach for Punctuation Prediction with real-time Inference Constraints

Minixhofer, Klejch & Bell

WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models

Minixhofer, Paischer & Rekabsaz

S3NAS: Fast Hardware-aware Neural Architecture Search Methodology

Lee et al.

Densifying Sparse Representations for Passage Retrieval by Representational Slicing

Lin & Lin

A Novel Method for Automated Identification and Prediction of Invasive Species Using Deep Learning

Elias

November

Interpreting intermediate feature representations of raw-waveform deep CNNs by sonification

Yadav

RoGPT2: Romanian GPT2 for Text Generation

Niculescu, Ruseti & Dascalu

ArabicTransformer: Efficient Large Arabic Language Model with Funnel Transformer and ELECTRA Objective

Alrowili & Vijay-Shanker

Simulation of quantum physics with Tensor Processing Units: brute-force computation of ground states and time evolution

Hauru et al.

MetaFormer is Actually What You Need for Vision

Yu et al.

Building Keras from Source: A Follow-Along Guide

Kane

RoBERTuito: a pre-trained language model for social media text in Spanish

Pérez et al.

Simulation of quantum many-body dynamics with Tensor Processing Units: Floquet prethermalization

Morningstar et al.

Solving Inverse Problems in Medical Imaging with Score-Based Generative Models

Song et al.

October

StaResGRU-CNN with CMedLMs:A stacked residual GRU-CNN with pre-trained biomedical language models for predictive intelligence

Ni et al.

DARLING: Deep leARning for chemicaL InformationprocessinG

Rajan

Do We Know What We Don't Know? Studying Unanswerable Questions beyond SQuAD 2.0

Sulem, Hay & Roth

Post-correction of OCR Results Using Pre-trained Language Model

Piotrowski

Search-Augmented Question Answering System Using Multilingual Transformer Model

Piotrowski

MutFormer: A context-dependent transformer-based model to predict pathogenic missense mutations

Jiang, Fang & Wang

Redesigning the Transformer Architecture with Insights from Multi-particle Dynamical Systems

Datta et al.

CrabNet for explainable deep learning in materials science: bridging the gap between academia and industry

Wang et al.

Multitask Prompted Training Enables Zero-Shot Task Generalization

Sanh et al.

AIのべりすと

Kousin

Delphi: Towards Machine Ethics and Norms

Jiang et al.

The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design

Levine et al.

Cut the CARP: Fishing for zero-shot story evaluation

Matiana et al.

ResNet strikes back: An improved training procedure in timm

Wightman, Touvron & Jégou

September

Revisiting transposed convolutions for interpreting raw waveform sound event recognition CNNs by sonification

Yadav & Foster

Training on Test Data with Bayesian Adaptation for Covariate Shift

Zhou & Levine

O-JMeSH: creating a bilingual English-Japanese controlled vocabulary of MeSH UIDs through machine translation and mutual information

Soares et al.

JAX vs PyTorch: A simple transformer benchmark

Nolan

The Challenge of Appearance-Free Object Tracking with Feedforward Neural Networks

Malik et al.

AraT5: Text-to-Text Transformers for Arabic Language Understanding and Generation

Nagoudi, Elmadany & Abdul-Mageed

Clustering Monolingual Vocabularies to Improve Cross-Lingual Generalization

Bassani

Pretrained Neural Models for Turkish Text Classification

Okur & Sertbaş

Augmenting BERT-style Models with Predictive Coding to Improve Discourse-level Representations

Araujo et al.

BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation

Xu, Van Durme & Murray

ReasonBERT: Pre-trained to Reason with Distant Supervision

Deng et al.

ChessCoach

Butner

Performance of chemical structure string representations for chemical image recognition using transformers

Rajan, Zielesny & Steinbeck

An Approach to Extractive Bangla Question Answering Based On BERT-Bangla And BQuAD

Saha et al.

TRC로 월 몇만원에 GPU 수십개급의.. TPU 사용 가능

Lee

Characterizing Possible Failure Modes in Physics-Informed Neural Networks

Krishnapriyan et al.

An Empirical Exploration in Quality Filtering of Text Data

Gao

August

End-to-End Differentiability and Tensor Processing Unit Computing to Accelerate Materials' Inverse Design

Liu et al.

Understanding Attention in Machine Reading Comprehension

Cui et al.

huggingface를 이용한 한국어 BART 학습 후기

Park

Making of #IAN

Kirchner

End-to-end Biomedical Question Answering via Bio-AnswerFinder and Discriminative Language Representation Models

Ozyurt

Large Biomedical Question Answering Models with ALBERT and ELECTRA

Alrowili & Vijay-Shanker

Pretrained Transformers for Text Ranking: BERT and Beyond

Lin, Nogueira & Yates

Understanding The Computational Demands Underlying Visual Reasoning

Vaishnav et al.

Using Computational Analysis of Behavior To Discover Developmental Change In Memory-Guided Attention Mechanisms In Childhood

Amso et al.

July

Vera: Prediction Techniques for Reducing Harmful Misinformation in Consumer Health Search

Pradeep et al.

gaBERT — an Irish Language Model

Barry et al.

Simple Allocation Rules and Optimal Portfolio Choice Over the Lifecycle

Duarte et al.

DiffIR: Exploring Differences in Ranking Models' Behavior

Jose et al.

Exploiting Generative Self-Supervised Learning For The Assessment Of Biological Images With Lack Of Annotations: A Covid-19 Case-Study

Mascolini et al.

ViTGAN: Training GANs with Vision Transformers

Lee et al.

Exploring Listwise Evidence Reasoning with T5 for Fact Verification

Jiang, Pradeep & Lin

June

Job Descriptions Keyword Extraction using Attention based Deep Learning Models with BERT

Mahdi et al.

Dangers of Bayesian Model Averaging under Covariate Shift

Izmailov et al.

LV-BERT: Exploiting Layer Variety for BERT

Yu et al.

Accessing your TPUs in Docker Containers with TPU VM

Nguyen

Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study

Khemchandani et al.

Training Data Augmentation for Code-Mixed Translation

Gupta, Vavre & Sarawagi

GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model

Wang & Komatsuzaki

Concurrent Adversarial Learning for Large-Batch Training

Liu et al.

PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

Zellers et al.

Which transformer architecture fits my data? A vocabulary bottleneck in self-attention

Wies et al.

May

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

Yao et al.

Flexible Architectures for Image Synthesis

Jain

Scientific Claim Verification with VERT5ERINI

Pradeep et al.

Detecting Anatomical and Functional Connectivity Relations in Biomedical Literature via Language Representation Models

Ozyurt et al.

BioELECTRA:Pretrained Biomedical text Encoder using Discriminators

Kanakarajan, Kundumani & Sankarasubbu

Stress Test Evaluation of Biomedical Word Embeddings

Araujo et al.

Are Larger Pretrained Language Models Uniformly Better? Comparing Performance at the Instance Level

Zhong et al.

How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering

Jiang et al.

CoolMomentum: a method for stochastic optimization by Langevin dynamics with simulated annealing

Borysenko & Byshkin

Tensorflow2 기반 Seq2Seq 모델, 학습, 서빙 코드 구현

Park

DeepDarts: Modeling Keypoints as Objects for Automatic Scorekeeping in Darts using a Single Camera

McNally et al.

KLUE: Korean Language Understanding Evaluation

Park et al.

How Deep is your Learning: the DL-HARD Annotated Deep Learning Dataset

Mackie, Dalton & Yates

Unbiased Monte Carlo Cluster Updates with Autoregressive Neural Networks

Wu et al.

April

Contextualized Query Embeddings for Conversational Search

Lin, Yang & Lin

DECIMER1.0: Deep Learning for Chemical Image Recognition using Transformers

Rajan, Zielesny & Steinbeck

Clinical BERT Models Trained on Pseudo Re-identified MIMIC-III Notes

Lehman et al.

Operationalizing a National Digital Library: The Case for a Norwegian Transformer Model

Kummervold et al.

Categorising Vaccine Confidence with TransformerBased Machine Learning Model: The Nuances of Vaccine Sentiment on Twitter

Kummervold et al.

City-Scale Simulation Of Covid-19 Pandemic & Intervention Policies Using Agent-Based Modelling

Suryawanshi et al.

CodeTrans: Towards Cracking the Language of Silicone's Code Through Self-Supervised Deep Learning and High Performance Computing

Elnaggar et al.

Arabic Compact Language Modelling for Resource Limited Devices

Alyafeai & Ahmad

Igor Ivanov: Harnessing Machine Learning Skills to Reduce Damages from Tropical Storms

Radiant Earth Foundation

Computer Vision and Deep Learning for Environment-Adaptive Control of Robotic Lower-Limb Exoskeletons

Laschowski et al.

InAugment: Improving Classifiers via Internal Augmentation

Arar, Shamir & Bermano

IndT5: A Text-to-Text Transformer for 10 Indigenous Languages

Nagoudi et al.

Samanantar: The Largest Publicly Available Parallel Corpora Collection for 11 Indic Languages

Ramesh et al.

Self-Supervised Representation Learning with Relative Predictive Coding

Tsai et al.

Virtual Sensing and Sensors Selection for Efficient Temperature Monitoring in Indoor Environments

Brunello et al.

March

Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models

Wang et al.

Comparing score aggregation approaches for document retrieval with pretrained transformers

Zhang, Yates & Lin

MPII at the TREC 2020 Deep Learning Track

Li & Yates

The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models

Inoue et al.

Is Attention Better Than Matrix Decomposition?

Geng et al.

TPU Research Cloud — Free TPU Hardware for your Deep Learning Projects

Manai

TPUPoint: Automatic Characterization of Hardware-Accelerated Machine-Learning Behavior for Cloud Computing

Wudenhe & Tseng

UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark

Lourie et al.

February

Architext

Galanos

Score-Based Generative Modeling through Stochastic Differential Equations

Song et al.

I-BERT: Integer-only BERT Quantization

Kim et al.

A comprehensive fitness landscape model reveals the evolutionary history and future evolvability of eukaryotic cis-regulatory DNA sequences

Vaishnav et al.

ConvBERT: Improving BERT with Span-based Dynamic Convolution

Jiang et al.

Goal-Driven Recurrent Neural Network Models of the Ventral Visual Stream

Nayebi et al.

Introducing huBERT

Nemeskey

Symbolic regression for scientific discovery: an application to wind speed forecasting

Abdellaoui & Mehrkanoon

Text generation in GPT-2

a-emami

Time Series (re)sampling using Generative Adversarial Networks

Dahl & Sørensen

January

The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models

Pradeep, Nogueira & Lin

Hessian-Aware Pruning and Optimal Neural Implant

Yu et al.

A Multi-Class Hinge Loss for Conditional GANs

Kavalerov, Czaja & Chellappa

Bottleneck Transformers for Visual Recognition

Srinivas et al.

Distributed Evolution Strategies Using TPUs for Meta-Learning

Sheng & He

Few-Shot Question Answering by Pretraining Span Selection

Ram et al.

Neural Grammatical Error Correction for Romanian

Cotet, Ruseti & Dascalu

Stress Testing of Meta-learning Approaches for Few-shot Learning

Aimen et al.

The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT

Pande et al.

Transfer Learning from Play and Language - Nailing the Baseline

Douglas

2020

December

Addressing Machine Learning Concept Drift Reveals Declining Vaccine Sentiment During the COVID-19 Pandemic

Müller & Salathé

ARAELECTRA: Pre-Training Text Discriminators for Arabic Language Understanding

Antoun, Baly & Hajj

AraGPT2: Pre-Trained Transformer for Arabic Language Generation

Antoun, Baly & Hajj

ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic

Abdul-Mageed, Elmadany & Nagoudi

PARSINLU:A Suite of Language Understanding Challenges for Persian

Khashabi et al.

RecipeNLG: A Cooking Recipes Dataset for Semi-Structured Text Generation

Bień et al.

STOUT: SMILES to IUPAC names using Neural Machine translation

Rajan, Zielesny & Steinbeck

The Depth-to-Width Interplay in Self-Attention

Levine et al.

Virtual Sensing of Temperatures in Indoor Environments: A Case Study

Brunello et al.

GottBERT: a pure German Language Model

Scheible et al.

November

HAWQ-V3: Dyadic Neural Network Quantization

Yao et al.

A Little Bit Is Worse Than None: Ranking with Limited Training Data

Zhang, Yates & Lin

An Empirical Study of Pre-trained Transformers for Arabic Information Extraction

Lan et al.

CLUE: A Chinese Language Understanding Evaluation Benchmark

Xu et al.

Cooking recipes generator utilizing a deep learning-based language model

Bień et al.

Ensemble Predictions of Wildfire Spread Through TPU-Compatible TensorFlow Acceleration

Bonanni & Ihme

Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks

Hui & Belkin

EvoPose2D: Pushing the Boundaries of 2D Human Pose Estimation using Neuroevolution

McNally et al.

Learning from Task Descriptions

Weller et al.

Modern Control Technologies in Robotics

ΠΑΝΑΓΙΩΤΗΣ & ΧΑΤΖΟΠΟΥΛΟ

On the weak link between importance and prunability of attention heads

Budhraja et al.

UoS Participation in the WMT20 Translation of Biomedical Abstracts

Soares & Vaz

Διπλωματική Εργασία Σχεδίαση Αποδοτικών Μηχανισμών Προσοχής Για Βαθειά Νευρωνικά Δίκτυα

Δάρας

October

An Empirical Study of Tokenization Strategies for Various Korean NLP Tasks

Park et al.

Challenges in Zh-Ru MT Task and How to Overcome Them

Nazarov

Distilling Dense Representations for Ranking using Tightly-Coupled Teachers

Lin, Yang, Lin

Ensemble Predictions of Wildfire Spread Through TPU-Compatible TensorFlow Acceleration

Bonanni & Ihme

Flexible IR Pipelines with Capreolus

Yates et al.

German's Next Language Model

Chan, Schweter & Möller

Guiding Attention for Self-Supervised Learning with Transformers

Deshpande & Narasimhan

Identifying Learning Rules From Neural Network Observables

Nayebi et al.

LEGAL-BERT: The Muppets straight out of Law School

Chalkidis et al.

MammoGANesis: Controlled Generation of High-Resolution Mammograms for Radiology Education

Zakka et al.

ML Summit: Predict | Introduction to Swift for Tensorflow

Koonce

Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning

Chang et al.

SMYRF: Efficient Attention using Asymmetric Clustering

Daras et al.

TPU VS GPU

Emami

WDN: A Wide and Deep Network to Divide-and-Conquer Image Super-resolution

Singh & Mittal

September

Plant Species Identification Using Transfer Learning-PlantCLEF 2020

Krishna, M & R

A Text to Text Approach for COVID-19 Event Extraction on Social Media

Wang

Application of Machine Learning Techniques for Text Generation

Román

Deep multi-stations weather forecasting: explainable recurrent convolutional neural networks

Abdellaoui & Mehrkanoon

Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining

Sai et al.

IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages

Kakwani et al.

S3NAS: Fast NPU-aware Neural Architecture Search Methodology

Lee, Kang & Ha

August

Ethical Detection of Online Influence Campaigns Using Transformer Language Models

Crothers

GREEK-BERT: The Greeks visiting Sesame Street

Koutsikakis et al.

HASeparator: Hyperplane-Assisted Softmax

Kansizoglou et al.

PARADE: Passage Representation Aggregation for Document Reranking

Li et al.

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data

Carmo et al.

RECLOR: A Reading Comprehension Dataset Requiring Logical Reasoning

Yu et al.

July

A novel specific artificial intelligence-based method to identify COVID-19 cases using simple blood exams

Soares

Differentiable Augmentation for Data-Efficient GAN Training

Zhao et al.

Hong Kong Transformer Models And Other NLP Resources

出嚟食飯

Latent Retrieval for Large-Scale Fact-Checking and Question Answering with NLI training

Samarinas, Hsu & Lee

Lessons Learned from the Training of GANs on Artificial Datasets

Tang

Natural Language Processing Methods for Language Modeling

Nemeskey

NLP Reseach Project Part 2: How I (almost) replicated OpenAI's GPT-2 (124M version)

Khan

Playing with Words at the National Library of Sweden - Making a Swedish BERT

Malmsten, Borjeson & Haffenden

ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing

Elnaggar et al.

Tetra-Tagging: Word-Synchronous Parsing with Linear-Time Inference

Kitaev & Klein

This Pony Does Not Exist

Arfa

June

Denoising Diffusion Probabilistic Models

Ho, Jain & Abbeel

FinEst BERT and CroSloEngual BERT: less is more in multilingual models

Ulˇcar & Robnik-Sikonja

How the Google AI Community Used Cloud to Help Biomedical Researchers

Elliott, Kwon & Goncharov

Learning compact generalizable neural representations supporting perceptual grouping

Veerabadran & de Sa

On the Predictability of Pruning Across Scales

Rosenfeld et al.

Reproducible and Portable Workflows for Scientific Computing and HPC in the Cloud

Vaillancourt et al.

Swedish NLP Solutions for Email Classification

Castronuovo

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

Li et al.

ทุนวิจัย Tensorflow Research Cloud ที่ให้พวกเราสามารถใช้ TPU ได้ฟรี 60 วัน!

Chatpatanasiri

May

Adaptation of Deep Bidirectional Transformers for Afrikaans Language

Ralethe

COVID-Twitter-BERT: A Natural Language Processing Model To Analyse COVID-19 Content On Twitter

Müller, Salathé & Kummervold

CURL: Contrastive Unsupervised Representations for Reinforcement Learning

Srinivas, Laskin & Abbeel

Deep Learning Training at Scale: Experiments with MLPerf on Multi-GPU and Multi-TPU Hardware

Verma

Establishing Baselines for Text Classification in Low-Resource Languages

Cruz & Cheng

iMaterialist (Fashion) 2020 at FGVC7 - 1st place solution

Polosin

Lexicon-Enhancement of Embedding-based Approaches Towards the Detection of Abusive Language

Koufakou & Scott

ParaPat: The Multi-Million Sentences Parallel Corpus of Patents Abstracts

Soares et al.

ParsBERT: Transformer-based Model for Persian Language Understanding

Farahani et al.

Query Reformulation using Query History for Passage Retrieval in Conversational Search

Lin et al.

Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture

Brix, Bahar & Ney

This Fursona Does Not Exist

Arfa

TPU-Tutorial

Diao

April

A Deep Learning Algorithm For The Diagnosis And Gleason Grading Of Whole Slide Images Of Prostate Cancer Core Biopsies

Kott et al.

Imitation Attacks and Defenses for Black-box Machine Translation Systems

Wallace, Stern & Song

Playing with machines: Using machine learning to understand automated copyright enforcement at scale

Gray & Suzor

Revisiting Pre-Trained Models for Chinese Natural Language Processing

Cui et al.

TensorFlow TPU 학습 101

Hong

TPU를 이용하여 Electra Pretraining하기

Park

March

A Not-so-Dangerous AI in the Persian Language

Khashei

CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model

Xu, Zhang & Dong

Comparing Rewinding and Fine-tuning in Neural Network Pruning

Renda, Frankle & Carbin

The Limitations of Stylometry for Detecting Machine-Generated Fake News

Schuster et al.

Writing Persian Poetry with GPT-2.0

Khashei

February

AraBERT: Transformer-based Model for Arabic Language Understanding

Antoun, Baly & Hajj

Discriminating between sleep and exercise-induced fatigue using computer vision and behavioral genetics

Schuch et al.

Going Much Wider with Deep Networks for Image Super-Resolution

Singh et al.

Improving the performance of convolutional neural network for the segmentation of optic disc in fundus images using attention gates and conditional random fields

Bhatkalkar et al.

Multi-focus Image Fusion Using Encoder-Decoder Network

Zhang

On Identifiability in Transformers

Brunner et al.

Redes neurais densamente conectadas para detecção de câncer de mama em imagens histopatológicas

Wentz

TensorFlow 2.1 with TPU in Practice

Lee

Two Routes to Scalable Credit Assignment without Weight Symmetry

Kunin et al.

January

Attention! A Lightweight 2D Hand Pose Estimation Approach

Santavas et al.

2019

December

BERTje: A Dutch BERT Model

de Vries et al.

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

K et al.

Discriminative Sentence Modeling for Story Ending Prediction

Cui et al.

The Selection of Optimal Neural Network Hyperparameters

Akhmetzyanov

November

Development of a Deep Learning Algorithm for the Histopathologic Diagnosis and Gleason Grading of Prostate Cancer Biopsies: A Pilot Study

Kott et al.

High-Quality Cloud Masking of Landsat 8 Imagery Using Convolutional Neural Networks

Hughes & Kennedy

Personalized Patent Claim Generation and Measurement

Lee

Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models

Daras et al.

Διαχείριση Πόρων σε Ετερογενείς Αρχιτεκτονικές με Εφαρμογή στο TensorFlow

Sofianídis

October

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Raffel et al.

Localization of Fake News Detection via Multitask Transfer Learning

Cruz, Tan & Cheng

My experience with Cloud TPU

Akhmetzyanov

Parallel Iterative Edit Models for Local Sequence Transduction

Awasthi et al.

Report of using TPU Pod to pre-train a Japanese NLP model

Cong

Solving an object detection problem with Google's TPUs

Palvelev

Unsupervised Question Answering for Fact-Checking

Jobanputra

September

Answering questions by learning to rank -- Learning to rank by answering questions

Pîrtoacă, Rebedea & Ruseti

Benchmarking the Performance and Power of AI Accelerators for AI Training

Wang et al.

Cross-Lingual Machine Reading Comprehension

Cui et al.

GPT-2 Folk Music

Branwen

Reweighted Proximal Pruning for Large-Scale Language Representation

Guo et al.

August

OpenGPT-2: We Replicated GPT-2 Because You Can Too

Gokaslan & Cohen

Running PyTorch on TPU: a bag of tricks

Chikishev

Towards Ethical Content-Based Detection of Online Influence Campaigns

Crothers, Japkowicz & Victor

July

Benchmarking TPU, GPU, and CPU Platforms for Deep Learning

Wang, Wei & Brooks

Exploring the Use of Lexicons to aid Deep Learning towards the Detection of Abusive Language

Koufakou & Scott

Single-Path Mobile AutoML: Efficient ConvNet Design and NAS Hyperparameter Optimization

Stamoulis et al.

Use What You Have: Video Retrieval Using Representations From Collaborative Experts

Liu et al.

June

A Focus on Neural Machine Translation for African Languages

Martinus & Abbott

Benchmarking Neural Machine Translation for Southern African Languages

Martinus & Abbott

Comic2vec: Vector representation of comics

Piták

Disentangling neural mechanisms for perceptual grouping

Kim et al.

Evaluating Language Model Finetuning Techniques for Low-resource Languages

Cruz & Cheng

Pre-Training with Whole Word Masking for Chinese BERT

Cui et al.

Replicating GPT2-1.5B

Leahy

May

Neural heuristics for SAT solving

Jaszczur et al.

Single-Path NAS: Device-Aware Efficient ConvNet Design

Stamoulis et al.

April

Fast Deep Neural Network Training on Distributed Systems and Cloud TPUs

You et al.

Single-Path NAS: Designing Hardware-Efficient ConvNets in less than 4 Hours

Stamoulis et al.

The Lottery Ticket Hypothesis at Scale

Frankle et al.

March

SCIBERT: Pretrained Contextualized Embeddings for Scientific Text

Beltagy, Cohan & Lo

January

Large-Batch Training for LSTM and Beyond

You et al.

2018

December

ART + AI — Generating African Masks using Tensorflow and TPUs

Dibia

Ensemble, Distill, and Fuse for Easy Video Labeling

Zhou et al.

Google Cloud TPU を使用して気づいたことなど

Suzuki

Quaternion convolutional neural networks for detection and localization of 3D sound events

Comminiello et al.

November

End-to-end sound source separation conditioned on instrument labels

Slizovskaia et al.

Mapping scRNA-seq data onto cell type taxonomies

Svensson & Pachter

Sample-efficient image segmentation through recurrence

Linsley, Kim & Serre

October

Task-Driven Convolutional Recurrent Models of the Visual System

Nayebi et al.

Automatic full compilation of Julia programs and ML models to Cloud TPUs

Fischer & Saba

Dostałem 105 TPU! Grant od TensorFlow Research Cloud!

Majek

September

Conditional WaveGAN

Lee et al.

Inferring travel activity pattern from Smartphone sensing data using deep learning

Ghorpade

Learning what and where to attend

Linsley et al.

Maximum Entropy Fine-Grained Classification

Dubey et al.

Some tips about using Google's TPU

Dong

July

A tutorial on using Google Cloud TPUs

Lee

Don't see your TRC-supported work here?

Please let us know about it by filling out this short form .