Google DeepMind Introduces Mind Evolution: Enhancing Natural Language Planning with Evolutionary Search in Large Language Models

Guiding LLMs to think more deeply about complex problems and to use inference-time compute effectively can greatly improve their problem-solving ability. Previous research has explored a variety of techniques, including chain-of-thought reasoning, self-consistency, sequential revision with feedback, and search guided by auxiliary verifiers or evaluators. Search-based methods, especially when paired with solution evaluators, spend additional computation to assess a wider set of candidate solutions. Techniques such as best-of-N and tree search exploit this capacity, increasing the probability of finding a successful solution by exploring more of the solution space.
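The best-of-N idea described above can be sketched in a few lines. This is a toy illustration, not DeepMind's code: `generate_candidate` and `evaluate` are hypothetical stand-ins for an LLM sampler and a programmatic solution evaluator.

```python
import random

def generate_candidate(rng):
    # Stand-in for an LLM sampling one candidate solution;
    # here a "solution" is just a random integer.
    return rng.randint(0, 100)

def evaluate(candidate):
    # Stand-in for a programmatic evaluator: higher scores are better.
    # The toy target is 42, so score decays with distance from it.
    return -abs(candidate - 42)

def best_of_n(n, seed=0):
    # Sample n independent candidates and keep the highest-scoring one.
    rng = random.Random(seed)
    candidates = [generate_candidate(rng) for _ in range(n)]
    return max(candidates, key=evaluate)

print(best_of_n(50))
```

Because the candidates are independent, the only lever is N: a larger N covers more of the solution space, which is exactly the property that evolutionary methods like Mind Evolution try to improve on by reusing evaluator feedback between samples.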
Recent efforts have combined LLMs with evolutionary search for optimization tasks, such as numerical and combinatorial problems, as well as natural language planning. Unlike earlier studies that required tasks to be formalized in structured state spaces, these methods operate on solutions expressed directly in natural language, bypassing the need for expert effort to formalize each task. Evolutionary search has also been applied to prompt optimization and multi-agent system design, as in EvoAgent, which evolves problem-solving agents. However, such approaches often achieve limited success rates; even capable models like Gemini 1.5 Flash show low baseline performance on tasks such as the TravelPlanner benchmark, leaving substantial room for improvement. Additionally, programmatic evaluators integrated into the evolutionary search loop provide reliable feedback for refining solutions, a widely adopted practice in code generation and solution optimization across diverse domains. Learned or simulated feedback models have also been studied, but they tend to be noisy and unreliable, which leaves opportunities for future work.
Researchers from Google DeepMind, UC San Diego, and the University of Alberta have introduced Mind Evolution, an evolutionary search strategy designed to improve inference-time computation in LLMs. Unlike previous methods such as best-of-N or sequential revision, Mind Evolution uses a genetic approach to iteratively generate, refine, and recombine candidate solutions expressed in natural language. It avoids formalizing tasks by relying only on a solution evaluator, enabling high success rates on natural language planning tasks such as TravelPlanner and Natural Plan. Mind Evolution achieved a 95.6% success rate on TravelPlanner, and the authors introduced a new benchmark, StegPoet, demonstrating the method's flexibility across challenging, unstructured domains.
Mind Evolution combines genetic search with an LLM guided by customized prompts to tackle natural language planning tasks effectively. It implements a language-based genetic algorithm in which solutions are represented in natural language, allowing the LLM to carry out key operations such as crossover, mutation, and island resets. The process begins by generating an initial population of solutions through LLM-driven prompts. Solutions are then iteratively refined using a "Refinement through Critical Conversation" (RCC) process, in which critic and author roles alternate to evaluate and improve candidates. The framework also employs Boltzmann tournament selection, cyclic migration between islands, and periodic island resets to maintain diversity and steadily improve solution quality.
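The loop described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: `fitness`, `mutate`, and `crossover` are hypothetical stand-ins for the programmatic evaluator and for LLM-driven refinement (the critic/author RCC step) and recombination of natural-language solutions.

```python
import math
import random

def fitness(sol):
    # Stand-in for the programmatic solution evaluator (toy target: 42).
    return -abs(sol - 42)

def mutate(sol, rng):
    # Stand-in for LLM-driven refinement of one candidate.
    return sol + rng.randint(-5, 5)

def crossover(a, b, rng):
    # Stand-in for LLM-driven recombination of two parent solutions.
    return (a + b) // 2

def boltzmann_select(population, rng, temp=5.0):
    # Boltzmann (softmax) selection: fitter candidates are sampled more
    # often, but weaker ones retain a nonzero chance, preserving diversity.
    weights = [math.exp(fitness(s) / temp) for s in population]
    return rng.choices(population, weights=weights, k=1)[0]

def evolve(n_islands=3, pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    # Independent island populations of random initial candidates.
    islands = [[rng.randint(0, 100) for _ in range(pop_size)]
               for _ in range(n_islands)]
    for _ in range(generations):
        for isl in islands:
            # Build the next generation via selection, crossover, mutation.
            new = []
            for _ in range(pop_size):
                a = boltzmann_select(isl, rng)
                b = boltzmann_select(isl, rng)
                new.append(mutate(crossover(a, b, rng), rng))
            isl[:] = new
        # Cyclic migration: each island receives the previous island's best.
        bests = [max(isl, key=fitness) for isl in islands]
        for i, isl in enumerate(islands):
            isl[0] = bests[(i - 1) % n_islands]
    # Return the best solution found across all islands.
    return max((s for isl in islands for s in isl), key=fitness)

print(evolve())
```

The island structure and cyclic migration keep separate subpopulations exploring different regions while still sharing their strongest candidates, which is the same diversity-preserving motivation the paper cites for its island model and resets.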
The evaluation tests Mind Evolution on three natural language planning benchmarks: TravelPlanner, Trip Planning, and Meeting Planning; Calendar Scheduling is excluded for its simplicity. Gemini 1.5 Flash serves as the primary model with specified hyperparameters, while a two-stage variant escalates unsolved cases to Gemini 1.5 Pro to improve cost efficiency. Mind Evolution outperforms the baselines, achieving over 95% success on TravelPlanner and Trip Planning and roughly 85% on Meeting Planning, with near-perfect results under the two-stage approach. Metrics such as success rate, number of LLM calls, token usage, and API cost highlight the efficiency of Mind Evolution's search strategy relative to the baselines.
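The two-stage cost-control idea is simple to sketch: run the cheaper model first, and escalate only the unsolved cases to the more capable model. The two solver functions below are hypothetical stand-ins for running Mind Evolution with Gemini 1.5 Flash and Gemini 1.5 Pro respectively.

```python
def solve_with_flash(problem):
    # Stand-in for stage one: the cheaper model attempts every problem.
    # Returns a solution, or None if the problem remains unsolved.
    # Toy rule: only even-numbered problems are "solved" in stage one.
    return problem if problem % 2 == 0 else None

def solve_with_pro(problem):
    # Stand-in for stage two: the more capable (and more expensive)
    # model, invoked only on cases stage one failed to solve.
    return problem

def two_stage_solve(problems):
    solved, unsolved = {}, []
    for p in problems:
        answer = solve_with_flash(p)
        if answer is not None:
            solved[p] = answer
        else:
            unsolved.append(p)
    # Escalate only the remaining failures to the expensive model.
    for p in unsolved:
        solved[p] = solve_with_pro(p)
    return solved

print(two_stage_solve([1, 2, 3, 4]))
```

Because the expensive model only sees the residual failures, average per-problem cost stays close to the cheap model's while the final success rate approaches the strong model's, which matches the cost-efficiency rationale reported for the two-stage setup.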
In conclusion, Mind Evolution presents an evolutionary search strategy for scaling inference-time computation on complex natural language planning tasks, combining broad stochastic exploration with iterative refinement. Unlike methods that rely on formal solvers, Mind Evolution uses language models to generate, recombine, and refine candidate solutions, requiring only a solution evaluator. It outperforms techniques like best-of-N and sequential revision on benchmarks such as TravelPlanner, Natural Plan, and the newly introduced StegPoet. Controlling for inference cost, it achieves remarkable results, solving more than 98% of problem instances on the TravelPlanner and Natural Plan benchmarks when unsolved cases are escalated to Gemini 1.5 Pro, demonstrating strong performance without dependence on formal solvers.
Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to follow us on Twitter and join our Telegram Channel and LinkedIn Group. Don't forget to join our 65k+ ML SubReddit.

Sana Hassan, a consulting intern at Marktechpost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in practical problem-solving, he brings a fresh perspective to the intersection of AI and real-life solutions.