Wink Pings

LaDiR: Teaching LLMs to 'Think Back' with Latent Diffusion Models

Researchers propose LaDiR, a framework that uses a variational autoencoder to encode text reasoning steps into 'thought blocks' and then refines those blocks iteratively with a latent diffusion model, letting language models plan and revise their reasoning paths more flexibly.

Language models typically generate reasoning in a sequential chain-of-thought (CoT) manner, but this approach has a limitation: once a step is written, it's difficult to go back and modify it. This is similar to writing an essay where you can only move forward without being able to edit previous sentences.

Recent research introduces the LaDiR (Latent Diffusion Reasoner) framework to address this limitation. Its core idea is to encode text reasoning steps into compact latent representations, which a diffusion model then refines iteratively.

Specifically, the researchers first use a Variational Autoencoder (VAE) to encode text reasoning steps into 'thought blocks', which are latent representations that preserve semantic information. These thought blocks are more compact than the original text but remain interpretable.
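To make the VAE step a little more concrete, here is a minimal sketch of the two standard ingredients of any Gaussian VAE: sampling a latent with the reparameterization trick, and the KL term that keeps latents close to a standard normal prior. It is not LaDiR's encoder. The function names and the toy `mu` / `log_var` vectors are illustrative assumptions, standing in for what a real text encoder would output for one reasoning step.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise.

    mu and log_var are hypothetical encoder outputs for one reasoning step.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [0.5, -1.0, 0.0]          # toy values, not from the paper
    log_var = [-2.0, -2.0, -2.0]   # toy values, not from the paper
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", z)
    print("KL penalty:", kl_to_standard_normal(mu, log_var))
```

In a full VAE, a decoder would map `z` back to text, and training would balance reconstruction quality against the KL penalty.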

Next, they train a latent diffusion model to denoise these latent thought blocks. A key innovation is the use of block-wise bidirectional attention masks, which let the model consider both preceding and following context during generation. As a result, the model can generate multiple reasoning paths in parallel and revise earlier reasoning steps holistically when needed.
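The article does not spell out the exact mask layout, so the sketch below shows one plausible reading of "block-wise bidirectional": tokens attend in both directions inside their own block and can also see every earlier block, but not later ones. The function name and this layout are assumptions for illustration, not the paper's implementation.

```python
def blockwise_bidirectional_mask(num_blocks, block_size):
    """Return an n x n boolean mask where mask[q][k] is True if query
    position q may attend to key position k.

    Assumed layout: full (bidirectional) attention inside a block,
    plus attention to all earlier blocks, none to later blocks.
    """
    n = num_blocks * block_size
    return [[(k // block_size) <= (q // block_size) for k in range(n)]
            for q in range(n)]


if __name__ == "__main__":
    # Two blocks of two tokens each; 1 = may attend, 0 = masked.
    for row in blockwise_bidirectional_mask(num_blocks=2, block_size=2):
        print("".join("1" if allowed else "0" for allowed in row))
    # prints:
    # 1100
    # 1100
    # 1111
    # 1111
```

Compared with a plain causal mask, the first token of each block can already see the tokens that follow it in the same block, which is what makes within-block revision possible.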

![LaDiR framework diagram](https://example.com/ladir-framework.png)

Experiments on mathematical reasoning and planning tasks show that LaDiR outperforms existing autoregressive, diffusion-based, and latent reasoning methods in accuracy, diversity, and interpretability. Notably, it allows the amount of computation to be adjusted adaptively at reasoning time: fewer iterations for simple problems, more rounds of thinking for complex ones.
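As a toy picture of adaptive compute, the sketch below runs an iterative refinement loop that stops once the latent barely changes between steps. Everything here is an assumption for illustration: the stopping rule, the `refine` and `make_toy_denoiser` names, and the stand-in denoiser that simply pulls the latent toward a fixed target. The article does not say how LaDiR decides how many iterations to run.

```python
def make_toy_denoiser(target, rate=0.5):
    """Hypothetical stand-in for a learned denoiser: each call moves the
    latent a fixed fraction of the way toward `target`."""
    def step(latent):
        return [x + rate * (t - x) for x, t in zip(latent, target)]
    return step


def refine(latent, denoise_step, max_steps=50, tol=1e-3):
    """Apply denoise_step until the largest per-dimension change drops
    below tol, or max_steps is reached. Returns (latent, steps_used)."""
    for step in range(1, max_steps + 1):
        updated = denoise_step(latent)
        change = max(abs(u - x) for u, x in zip(updated, latent))
        latent = updated
        if change < tol:
            return latent, step
    return latent, max_steps


if __name__ == "__main__":
    denoiser = make_toy_denoiser(target=[1.0, -1.0])
    _, easy_steps = refine([0.9, -0.9], denoiser)    # starts near the target
    _, hard_steps = refine([10.0, -10.0], denoiser)  # starts far away
    print("easy problem steps:", easy_steps)
    print("hard problem steps:", hard_steps)
```

The start that lies closer to the target converges in fewer steps, mirroring the idea that simple problems need fewer refinement rounds than hard ones.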

This approach breaks the linear constraint of traditional autoregressive decoding, enabling language models to plan their reasoning more flexibly. The researchers suggest it opens new possibilities for text reasoning and could be applied to more complex logical reasoning scenarios in the future.

Paper link: https://arxiv.org/abs/2510.04573

Published: 2025-10-10 16:30