

MIT researchers present SEAL, advancing self-improving language models


MIT researchers have introduced SEAL (Self-Adapting LLMs), a framework that lets large language models update their own parameters. Detailed in the paper "Self-Adapting Language Models", the approach centers on self-generated data: the model writes its own training samples, called self-edits, and is trained on them inside a reinforcement learning loop. Because the reward is tied to performance on downstream tasks, the model learns which self-edits actually improve it.

The SEAL method operates through a nested structure. The outer loop uses reinforcement learning to guide the generation of effective self-edits, while the inner loop updates the model using supervised fine-tuning on these edits. The researchers initially found standard policy optimization methods unstable and settled on ReST^EM, a more robust behavioral cloning strategy inspired by work at DeepMind. It filters self-edits by the performance gain they produce and trains only on those that pass. While the current design uses a single model for both generating and learning from edits, future iterations could split these roles between distinct "teacher" and "student" models.
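The nested loop can be illustrated with a toy sketch. This is not the paper's implementation: the "model" here is a plain dictionary of facts, the "policy" is a probability table over three hypothetical self-edit styles, and every name (EDIT_STYLES, inner_update, rest_em_round, the mix parameter) is an assumption made for illustration. It only shows the sample, filter, and retrain pattern that a ReST^EM-style outer loop follows.

```python
import random

# Hypothetical self-edit styles and the toy "facts" each one writes out.
EDIT_STYLES = {
    "implications": {"capital_of_france": "paris", "river_in_paris": "seine"},
    "restate_title": {"title": "a passage about paris"},
    "off_topic": {"favorite_color": "blue"},
}
# Held-out questions that define the downstream task (toy data).
QUESTIONS = {"capital_of_france": "paris", "river_in_paris": "seine"}


def inner_update(model, self_edit):
    """Inner loop stand-in: 'fine-tune' a copy of the model on one self-edit."""
    adapted = dict(model)
    adapted.update(self_edit)
    return adapted


def evaluate(model):
    """Downstream accuracy: fraction of questions the model answers correctly."""
    correct = sum(model.get(q) == a for q, a in QUESTIONS.items())
    return correct / len(QUESTIONS)


def rest_em_round(policy, model, rng, num_samples=16, mix=0.5):
    """One outer-loop round: sample self-edits, filter by gain, refit the policy."""
    styles = list(policy)
    weights = [policy[s] for s in styles]
    baseline = evaluate(model)
    kept = []
    # Sampling step: draw self-edits from the current policy and keep only
    # those whose adapted model beats the unadapted baseline.
    for _ in range(num_samples):
        style = rng.choices(styles, weights=weights)[0]
        adapted = inner_update(model, EDIT_STYLES[style])  # discarded after scoring
        if evaluate(adapted) > baseline:
            kept.append(style)
    if not kept:
        return dict(policy)  # nothing passed the filter; leave the policy as is
    # Refit step: move the policy toward the empirical distribution of kept
    # edits, a stand-in for supervised fine-tuning on the filtered samples.
    return {
        s: (1 - mix) * policy[s] + mix * kept.count(s) / len(kept)
        for s in styles
    }


def train(num_rounds=5, seed=0):
    """Run several outer-loop rounds starting from a uniform policy."""
    rng = random.Random(seed)
    policy = {s: 1 / len(EDIT_STYLES) for s in EDIT_STYLES}
    base_model = {}  # the unadapted model knows none of the facts
    for _ in range(num_rounds):
        policy = rest_em_round(policy, base_model, rng)
    return policy


if __name__ == "__main__":
    print(train())
```

In this toy setting only the "implications" style changes downstream accuracy, so the filter keeps only those samples and the policy shifts toward that style over successive rounds. A real system would replace the dictionary update with gradient-based fine-tuning and the probability table with the language model itself.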

SEAL was evaluated on knowledge integration and few-shot learning. In few-shot learning with a Llama-3.2-1B-Instruct model, SEAL raised the adaptation success rate to over 70 percent, well above more conventional approaches. For knowledge integration, the Qwen2.5-7B model assimilated new facts more effectively than the baseline and previous reinforcement learning methods, sometimes exceeding even setups that used GPT-4.1-generated data. The researchers noted that reinforcement learning not only improved the quantitative results but also led the model to generate more nuanced, task-relevant self-edits. Challenges remain, particularly catastrophic forgetting, computational cost, and context-aware evaluation, all of which the team discusses in the paper.

This work arrives amid a surge of global interest in self-evolving artificial intelligence, with Sakana AI's Darwin-Gödel Machine and speculation from OpenAI about recursive self-improvement drawing wide attention. SEAL stands out as a concrete, experimentally validated step toward autonomous, self-improving language models.

Originally reported by syncedreview.com