MIT's AI Begins Writing Its Own Code for Upgrades, Even Outperforming GPT-4.1
MIT's research team has developed an AI called SEAL that can read new knowledge, write its own training data, and run gradient updates on itself to keep learning. The result is a 40% improvement in factual recall, and its self-generated training data outperforms data generated by GPT-4.1.
MIT researchers have recently introduced a framework called SEAL (Self-Adapting Language Models). What sets it apart is that it doesn't require human-curated fine-tuning: it reads new information, rewrites it in its own words, and then runs gradient updates to optimize itself. Essentially, it teaches itself.
The results? A 40% improvement in factual recall, and after fine-tuning on data it generated for itself, it outperforms a pipeline trained on data from GPT-4.1. Moreover, it can take on new tasks entirely without human intervention.

Current AI models are frozen after training, but SEAL flips this process. It runs a reinforcement learning loop: first, it generates a 'self-edit' instruction telling itself how to update; then it tests the result; and finally, it reinforces only the edits that actually improve performance.
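
Here is a minimal, runnable sketch of that loop in Python. Every helper is a toy stand-in of my own (the function names, scores, and update rule are illustrative assumptions, not the paper's actual method): sample candidate self-edits, fine-tune a copy on each, and keep only the ones whose updated model beats the current score.

```python
import random

random.seed(0)

def generate_self_edit(model, task):
    """Stand-in for the LM writing its own training directive."""
    return {"notes": f"restated facts about {task}",
            "lr": random.choice([1e-5, 3e-5])}

def apply_gradient_update(model, edit):
    """Stand-in for a small fine-tune on the self-edit; returns an updated copy."""
    return {**model, "skill": model["skill"] + random.uniform(-0.05, 0.15)}

def evaluate(model, task):
    """Stand-in for downstream QA accuracy after the update."""
    return model["skill"]

def seal_round(model, task, num_candidates=4):
    best, best_score = model, evaluate(model, task)
    for _ in range(num_candidates):
        edit = generate_self_edit(model, task)
        candidate = apply_gradient_update(model, edit)
        score = evaluate(candidate, task)
        if score > best_score:  # reinforce only edits that improve performance
            best, best_score = candidate, score
    return best

model = {"skill": 0.50}
for r in range(2):  # the article notes two rounds of self-reinforcement
    model = seal_round(model, "Apollo program")
    print(f"round {r + 1}: score={evaluate(model, 'Apollo program'):.2f}")
```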

How does it work? When SEAL reads new content about, say, the Apollo program, it first reformats the information into concise, self-contained key points, like taking study notes for itself. Then it fine-tunes itself on those notes.
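
To make the "study notes" step concrete, here is a toy illustration. The passage and the note-generating stub are hypothetical stand-ins: in SEAL, the model itself generates these restated facts before fine-tuning on them.

```python
PASSAGE = ("The Apollo program was the third United States human spaceflight "
           "program and achieved the first crewed Moon landing in 1969.")

def restate_as_notes(passage: str) -> list[str]:
    # In SEAL the model generates these notes itself; this stub hard-codes
    # the kind of output expected: short, self-contained facts.
    return [
        "Apollo was the third US human spaceflight program.",
        "Apollo achieved the first crewed Moon landing.",
        "The first crewed Moon landing happened in 1969.",
    ]

# Each note becomes a supervised training example for a small gradient update,
# so the facts end up in the weights and can be recalled without the passage.
for note in restate_as_notes(PASSAGE):
    print(note)
```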

Few-shot learning gets an upgrade too. SEAL doesn't rely on fixed heuristics; it decides its own training strategy: which data augmentations to apply, how to optimize, and even what learning rate to use. The result is a 72.5% success rate, 3.6 times better than the baseline.
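
In this setting, the model's "self-edit" is essentially a training configuration it writes for itself. A sketch of what such a config might look like follows; the field names and values are my illustration, not SEAL's actual schema.

```python
import json

# A self-edit for a few-shot task, expressed as a training configuration the
# model emits for itself (illustrative fields, not the paper's exact format).
self_edit = json.loads("""
{
  "augmentations": ["rotate_90", "flip_horizontal", "repeat_examples"],
  "learning_rate": 1e-4,
  "epochs": 3,
  "loss_on": "output_tokens_only"
}
""")

def apply_config(cfg, examples):
    data = list(examples)
    if "repeat_examples" in cfg["augmentations"]:
        data = data * 2  # simple augmentation: duplicate the training examples
    # rotate_90 / flip_horizontal would transform grid inputs here
    print(f"fine-tune on {len(data)} examples, "
          f"lr={cfg['learning_rate']}, epochs={cfg['epochs']}")

apply_config(self_edit, [("grid_in", "grid_out")])
```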

What's even more impressive is that after just two rounds of self-reinforcement, SEAL's self-generated training data surpasses data produced by GPT-4.1. It has learned how to write data that is easier for itself to learn from: reformatting facts into simple, atomic statements.

There's also a key point: during self-updates, SEAL retains most of what it previously learned, a major advance for continual learning. Some forgetting still occurs, but the retention curve looks promising.
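
One way to picture this measurement, with made-up numbers purely to show the protocol (the 0.95 decay rate is an assumed illustration of "some forgetting," not a figure from the paper): after each new self-update, re-test every earlier topic.

```python
# Toy retention check: learn topics one at a time, re-evaluating all
# previously learned topics after each update. Numbers are simulated.
accuracy = {}
for step, topic in enumerate(["Apollo", "Voyager", "Hubble"], start=1):
    for old in accuracy:
        accuracy[old] *= 0.95  # mild assumed forgetting on earlier topics
    accuracy[topic] = 0.90     # freshly learned facts score high
    print(f"after update {step}: "
          + ", ".join(f"{k}={v:.2f}" for k, v in accuracy.items()))
```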

Of course, there are concerns about what happens if such an AI trains itself on biased or incorrect information. Without a way to detect flawed data, an LLM could degrade itself rather than improve.
Either way, the idea of large models that can fine-tune themselves is no longer science fiction. We have just entered the era of self-evolving models.
Published: 2025-10-13 15:13