Language Models Speed Up Local Search for Finding Programmatic Policies

Quazi Asif Sadmine, Hendrik Baier, Levi Lelis

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Encoding policies that solve sequential decision-making problems as programs offers advantages over neural representations, such as the interpretability and modifiability of the policies. On the downside, programmatic policies are hard to find, because generating them requires searching program spaces that are often discontinuous. In this paper, we leverage the ability of large language models (LLMs) to write computer programs to speed up the synthesis of programmatic policies. We use an LLM to provide initial candidates for the policy, which are then improved by local search. Empirical results on three problems that are challenging for programmatic representations show that LLMs can speed up local search and facilitate the synthesis of policies. We conjecture that LLMs are effective in this setting because we give them access to the outcomes of the policies' rollouts. That way, once they observe what a previous policy has accomplished, LLMs can try policies that encode different behaviors. This process forces the search to explore different parts of the space through "exploratory initial programs". Experiments also show that much of the knowledge LLMs leverage comes from the domain-specific language that defines the search space: the overall performance of the system drops sharply if we change the names of the functions used in the language to meaningless names. Since our system only queries the LLM in the first step of the search, it offers an economical method for using LLMs to guide the synthesis of policies.
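The pipeline described in the abstract (LLM-proposed initial programs, then refined by local search) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy DSL (`ACTIONS`), the fixed-length program representation, greedy hill climbing as the local search, the `toy_score` objective, and the `fake_llm` stand-in for a real LLM query are all hypothetical. The sketch does mirror two points from the abstract: each query receives the rollout outcomes of earlier proposals, and the proposer is consulted only before local search begins.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy DSL: a program is a fixed-length sequence of primitive actions.
Program = Tuple[str, ...]
Feedback = List[Tuple[Program, float]]
ACTIONS = ("move", "turn_left", "turn_right", "pick_marker", "put_marker")


def neighbors(program: Program) -> List[Program]:
    """All programs that differ from `program` in exactly one action."""
    result = []
    for i, current in enumerate(program):
        for action in ACTIONS:
            if action != current:
                result.append(program[:i] + (action,) + program[i + 1:])
    return result


def hill_climb(start: Program, score: Callable[[Program], float],
               max_steps: int = 100) -> Tuple[Program, float]:
    """Greedy hill climbing (an assumed choice of local search)."""
    best, best_score = start, score(start)
    for _ in range(max_steps):
        candidates = neighbors(best)
        if not candidates:
            break
        candidate = max(candidates, key=score)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break  # no neighbor improves: local optimum reached
        best, best_score = candidate, candidate_score
    return best, best_score


def llm_seeded_search(propose: Callable[[Feedback], Program],
                      score: Callable[[Program], float],
                      num_queries: int = 3,
                      max_steps: int = 100) -> Tuple[Program, float]:
    """Query the proposer a few times, then run local search from each proposal."""
    feedback: Feedback = []
    for _ in range(num_queries):
        program = propose(feedback)  # proposer sees earlier rollout outcomes
        feedback.append((program, score(program)))
    # The proposer is not queried again once local search starts.
    results = [hill_climb(p, score, max_steps) for p, _ in feedback]
    return max(results, key=lambda r: r[1])


# --- Usage with toy stand-ins (all hypothetical) ---
TARGET: Program = ("move", "move", "pick_marker", "turn_left", "move")


def toy_score(program: Program) -> float:
    """Toy objective: number of positions that match a hidden target program."""
    return float(sum(a == b for a, b in zip(program, TARGET)))


def fake_llm(feedback: Feedback) -> Program:
    """Hypothetical stand-in for an LLM query; ignores the feedback content and
    returns a pseudo-random program seeded by the number of earlier queries."""
    rng = random.Random(len(feedback))
    return tuple(rng.choice(ACTIONS) for _ in range(len(TARGET)))


best, best_score = llm_seeded_search(fake_llm, toy_score)
print(best, best_score)
```

In a real system, `propose` would presumably format the DSL and the accumulated feedback into a prompt and parse a program from the model's reply, and the score would come from policy rollouts in the environment rather than from comparison with a known target.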
Original language: English
Number of pages: 35
Journal: Transactions on Machine Learning Research
Volume: 2024
Issue number: 11
Publication status: Published, Nov 2024
