When: Friday, 4/18/25 11:00 am; Where: Bliss 260
Abstract: Can large language models (LLMs) help with planning? And how should we even measure that ability? In this talk, I will present our work on Planetarium, a benchmark that evaluates LLMs’ ability to generate PDDL (Planning Domain Definition Language) code from natural language descriptions of planning tasks. We argue that using LLMs to generate formal descriptions of planning problems that are then given to classical planners is both challenging and often advantageous compared with asking LLMs to create plans directly. Planetarium introduces a novel PDDL equivalence algorithm that assesses the correctness of generated PDDL against ground truth. Additionally, Planetarium includes a dataset of 145,918 text-to-PDDL pairs spanning 73 unique types of initial state and goal condition combinations, offering varying levels of difficulty for evaluation. Our experimental results show that even state-of-the-art LLMs struggle with this task, but fine-tuning them can greatly improve performance. Planetarium aims to highlight that LLMs don’t need to perform every task: hybrid approaches that apply LLMs alongside traditional AI methods offer an alternative with compelling advantages. Planetarium is available at: https://github.com/BatsResearch/planetarium.
Bio: Stephen Bach is an assistant professor of computer science at Brown University. His research focuses on weakly supervised, zero-shot, and few-shot machine learning. The goal of his work is to create methods and systems that drive down the labor cost of AI. He was a core contributor to the Snorkel framework, which was recognized with a Best of VLDB 2018 award. Snorkel is used in production at numerous Fortune 500 companies for programmatic training data curation. He also co-led the team that developed the T0 family of large language models. That team was one of the proposers of instruction tuning, the process of fine-tuning language models with supervised training to follow instructions, which is now a standard part of training large language models. Stephen is also an advisor to Snorkel AI, a company that provides software and services for data-centric AI.