I am currently a research scientist at ASAPP Inc., New York City. Prior to this, I completed my PhD (2016-2022) with the
Language Technologies Institute
at CMU,
where I was advised by Prof. Eduard Hovy. My research is broadly on language generation, with specific interests in style transfer, data-to-text generation, narrative generation, and low-resource & creative generation.
As a natural development of my interest in low-resource & creative generation, my curiosity was increasingly piqued by data augmentation (DA), first specifically for generation and then more generally, leading to several fruitful research directions:
A comprehensive survey of recent DA methods in NLP - we also aim to sensitize the NLP community to existing gaps, e.g., relative to CV research, and outline future challenges. We maintain a live git repo and arXiv version - send us a PR to add your method to both!
DA for improving commonsense plausibility and fluency of Concept-to-Text Generation by:
As a corollary of my interest in narrative generation, some of my work circa 2020 investigated probing the extra-sentential abilities of contextual representations, such as locating event arguments and infilling whole sentences, a.k.a. "sentence cloze".
In the past few years, I have also been involved in co-organizing many collaborative NLP research efforts, such as:
The GEM benchmark, its associated workshop at ACL'21, and paper, aimed at better, standardized evaluation and comparison of NLG models and systems - a parallel to GLUE for generation
The challenge sets submodule of GEM, where we built domain-shifted sets under a unified theme for the NLG tasks in our benchmark, using various perturbation (e.g., backtranslation), sub-selection (e.g., by length), and other domain-shift (e.g., diachronic) strategies. Our work was accepted at the
NeurIPS'21 Datasets & Benchmarks Track!
The NL-Augmenter participatory repository and benchmark, which provides a structure for NLP researchers to contribute and evaluate task-specific data augmentations, a.k.a. transformations, as well as subset-selection strategies, a.k.a. filters. We aim to build a large, usable suite (~140 and counting!) of transformations and filters by leveraging the wisdom of the crowd - opening the door to more systematic analysis and deployment of data augmentation and robustness evaluation. A minimal sketch of the transformation/filter idea appears below.
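To make the transformation/filter distinction concrete, here is a minimal, self-contained Python sketch. The class and method names (SentenceTransformation, WordSwapTransformation, LengthFilter, generate, keep) are hypothetical stand-ins for illustration only, not the actual NL-Augmenter or GEM interfaces: a transformation perturbs a sentence into augmented variants, while a filter is a subset-selection rule (here, a length band, echoing the length-based sub-selection mentioned above).

```python
# Illustrative sketch only: names below are hypothetical, not the NL-Augmenter API.
import random
from typing import List


class SentenceTransformation:
    """A task-specific data augmentation: maps one sentence to perturbed variants."""

    def generate(self, sentence: str) -> List[str]:
        raise NotImplementedError


class WordSwapTransformation(SentenceTransformation):
    """Toy perturbation: randomly swaps two adjacent words."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def generate(self, sentence: str) -> List[str]:
        tokens = sentence.split()
        if len(tokens) < 2:
            return [sentence]
        i = self.rng.randrange(len(tokens) - 1)
        tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
        return [" ".join(tokens)]


class LengthFilter:
    """Subset-selection strategy: keep only sentences within a token-length band."""

    def __init__(self, min_len: int = 5, max_len: int = 40):
        self.min_len, self.max_len = min_len, max_len

    def keep(self, sentence: str) -> bool:
        return self.min_len <= len(sentence.split()) <= self.max_len


if __name__ == "__main__":
    transform = WordSwapTransformation(seed=13)
    length_filter = LengthFilter(min_len=3)
    sentence = "data augmentation broadens the training distribution"
    # Apply the transformation, then keep only variants passing the filter.
    augmented = [s for s in transform.generate(sentence) if length_filter.keep(s)]
    print(augmented)
```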