Toybox: Rethinking Atari Benchmarks
I have been working with John Foley, Kaleigh Clary, and David Jensen to develop a new testing framework for reinforcement learning. I worked with collaborators at both UMass and Brown to build a prototype, which we used to investigate the behavior of generalized agents in our work, "Measuring and Characterizing Generalization in Deep Reinforcement Learning." I presented Toybox at the IBM AI Systems Day and as a poster at the 2018 NeurIPS Systems for ML Workshop. We are currently preparing a full-length conference submission. If you are interested in contributing to our suite, please join our Slack team.
SurveyMan: Programming and Debugging Surveys
I worked with Emery Berger on a programming language and runtime system to design, debug, and deploy scientific surveys on the web. We collaborated with Joe Pater from Linguistics; this work became my Synthesis Project, which earned an Outstanding Synthesis Project award. We first presented SurveyMan at the 2014 Off the Beaten Track workshop. This work won first place at the PLDI Student Research Competition. We later presented the full paper at OOPSLA 2014, where it won a best paper award (one of three awarded).
COSMOS: A New Experimental Criterion for Genetic Programming
Typical Genetic Programming experiments focus on the convergence of the best candidate program in a population. Practitioners often use both parametric and non-parametric hypothesis tests to compare metrics such as computational effort or mean-best-fitness. Lee Spector and I found that the number of independent runs needed to ensure the reliability of such tests varied greatly across problems. We proposed a new criterion for determining the number of independent runs required to make inferences about a set of Genetic Programming techniques. We presented this work at the First Workshop for Understanding Problems, GECCO 2012.
Creating Conversational Characters Using Question Generation Tools. Xuchen Yao, Emma Tosch, Grace Chen, Elnaz Nouri, Ron Artstein, Anton Leuski, Kenji Sagae, David Traum. Dialogue and Discourse, Volume 3, Issue 2.
Evaluating Conversational Characters Created through Question Generation. Grace Chen, Emma Tosch, Ron Artstein, Anton Leuski, David Traum. Proceedings of the Twenty-Fourth International Florida Artificial Intelligence Research Society Conference.