Toybox: Rethinking Atari Benchmarks

I have been working with John Foley, Kaleigh Clary, and David Jensen on developing a new testing framework for reinforcement learning. I worked with collaborators at both UMass and Brown to build a prototype version that we used to investigate the behavior of generalized agents in our work, Measuring and Characterizing Generalization in Deep Reinforcement Learning. I presented Toybox at the IBM AI Systems Day and as a poster at the 2018 NeurIPS Systems for ML Workshop. We are currently preparing a full-length conference submission. If you are interested in contributing to our suite, please join our Slack team.

[ Systems4ML Paper ] [ AI Systems Day Slides ] [ code ] [ bibtex ]

SurveyMan: Programming and Debugging Surveys

I worked with Emery Berger on a programming language and runtime system to design, debug, and deploy scientific surveys on the web. We collaborated with Joe Pater from Linguistics; this work became my Synthesis Project, which earned an Outstanding Synthesis Project award. We first presented SurveyMan at the 2014 Off the Beaten Track workshop. This work won first place at the PLDI Student Research Competition. We later presented the full paper at OOPSLA 2014, where it won a best paper award (one of three awarded).

[ OBT slides ] [ Technical Report ] [ OOPSLA 2014 Full Paper ] [ OOPSLA 2014 slides ] [ code ] [ bibtex ]

COSMOS: A New Experimental Criterion for Genetic Programming

Typical Genetic Programming experiments focus on the convergence of the best candidate program in a population. Practitioners often use both parametric and non-parametric hypothesis tests to compare metrics such as computational effort or mean-best-fitness. Lee Spector and I found that the number of independent runs needed to ensure the reliability of such tests varies greatly across problems. We proposed a new criterion for determining the number of independent runs required to make inferences about a set of Genetic Programming techniques. We presented this work at the First Workshop for Understanding Problems, GECCO 2012.

[ GECCO UP Paper ] [ GECCO UP Slides ] [ code ] [ bibtex ]

Older Projects