Transducing Language Models

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: language models, tokenization, automata, transducers
Abstract:

Modern language models define distributions over strings, but their outputs are not always suited to the downstream task. For instance, a model generating byte-pair strings may not be suitable when word-level predictions are needed, and a DNA model may not fit applications requiring amino acids. In such cases, a deterministic string-to-string transformation can convert the model's output to the desired form. This is a familiar pattern in probability theory: applying a function f to a random variable X ~ p yields a transformed random variable f(X) with an induced distribution. While such transformations are occasionally used in language modeling, they are not treated as yielding new, fully functional language models. We formalize this perspective and introduce a general framework for language models derived from deterministic string-to-string transformations. Focusing on transformations representable as finite-state transducers (FSTs), a commonly used state-machine abstraction for efficient string-to-string mappings, we develop algorithms that compose a language model with an FST to marginalize over source strings mapping to a given target. This allows us to propagate probabilities through the transducer without altering model parameters and to condition on transformed outputs. We present an exact algorithm, an efficient approximation, and a theoretical analysis. We conduct experiments in three domains: converting token-level language models to character-level models, converting token-level models to word-level models, and deriving amino-acid models from DNA models. These experiments demonstrate inference-time adaptation of pretrained language models to match application-specific output requirements.
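For concreteness, the induced distribution described above can be written as a pushforward through the deterministic map; the notation below is ours, not taken from the paper:

```latex
% Pushforward of a language model p through a deterministic string-to-string map f:
% the probability of a target string y marginalizes over all source strings x with f(x) = y.
\[
  p_f(y) \;=\; \Pr\!\bigl[f(X) = y\bigr] \;=\; \sum_{x \,:\, f(x) = y} p(x),
  \qquad X \sim p .
\]
```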

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper formalizes a general framework for transforming language model distributions through deterministic string-to-string mappings representable as finite-state transducers. It occupies the 'Transducer-Based Language Model Transformation Theory' leaf, which contains only two papers, including this one. This is a notably sparse research direction within the broader taxonomy of 50 papers across 22 leaf nodes, suggesting the work addresses a relatively underexplored theoretical niche. The sibling paper examines neural architectures through a transducer lens, whereas this work focuses on composing arbitrary language models with FSTs to marginalize over source strings.

The taxonomy reveals that most FST research concentrates on practical applications: speech recognition (5 papers across 3 leaves), morphological analysis (9 papers across 3 leaves), and machine translation (9 papers across 3 leaves). The theoretical frameworks branch, where this paper resides, is comparatively small with only 7 papers total across 3 leaves. Neighboring work includes core FST composition algorithms and probabilistic model conversion to FST representations, but these focus on optimization techniques and HMM/RNN conversion rather than the general transformation theory this paper develops. The scope note explicitly excludes empirical applications, reinforcing the paper's foundational theoretical positioning.

Among the 18 candidates examined through a limited semantic search, no contributions were clearly refuted. For the 'general framework for transduced language models', 10 candidates were examined, none of which provided overlapping prior work; for 'algorithms for composing language models with FSTs', 1 candidate was examined; and for 'prefix decomposition of the precover', 7 candidates were examined. This suggests that, within the search scope, the specific formalization of FST-based language model transformation and the associated marginalization algorithms appear novel. However, the limited search scale (18 candidates, not exhaustive) means substantial related work may exist outside the top-K semantic matches examined.

Based on the limited literature search, the work appears to occupy a genuinely sparse theoretical area, with minimal direct competition in its specific leaf and few closely related papers in neighboring theoretical branches. The absence of refuting candidates across all three contributions, combined with the small size of the theoretical frameworks branch, suggests the formalization may represent a meaningful conceptual advance. However, the analysis covers only top-K semantic matches and does not guarantee comprehensive coverage of all relevant prior work in FST theory or language model transformation.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 18
Refutable Papers: 0

Research Landscape Overview

Core task: transforming language models via finite-state transducers.

The field of FST-based language processing spans a diverse set of branches, each addressing distinct aspects of how finite-state machinery can be applied to linguistic computation. At the highest level, the taxonomy divides into theoretical frameworks and algorithms that underpin FST operations (including composition, generalization, and transducer-based model transformation), practical applications in speech recognition and acoustic modeling (e.g., Speech Recognition FST[1], Delay CTC FST[29]), morphological analysis and generation for richly inflected languages (e.g., Inuit Morphological Networks[12], Hungarian Morphosyntax FST[38]), machine translation and transliteration tasks (e.g., Stochastic FST Translation[9], Romanized Arabic Transliteration[10]), and hybrid neural-FST architectures that integrate deep learning with classical transducer methods (e.g., Neural FST Error[14], RNN to FST[46]). Additional branches cover language model combination and adaptation, specialized applications such as text normalization and keyboard decoding, and emerging work on large language models and dialogue systems.

Within this landscape, a particularly active line of inquiry concerns the theoretical underpinnings of transducer-based language model transformation, where researchers explore how FSTs can be used to systematically modify or constrain the output distributions of probabilistic models. Transducing Language Models[0] sits squarely in this theoretical branch, focusing on the formal mechanisms by which finite-state transducers can transform language model probabilities. This work contrasts with more application-driven efforts in speech or morphology, and it shares conceptual ground with Transformers as Transducers[32], which examines the expressive power of neural architectures through a transducer lens. While many branches emphasize engineering solutions for specific tasks, the transducer-based transformation theory branch addresses foundational questions about compositionality, expressiveness, and the algebraic properties of FST operations on language models, positioning Transducing Language Models[0] as a contribution to the formal toolkit that supports the broader ecosystem of FST applications.

Claimed Contributions

General framework for transduced language models

The authors introduce a foundational framework that formalizes language models obtained by applying deterministic string-to-string transformations (encoded as finite-state transducers) to existing language models, enabling inference-time adaptation without retraining.

10 retrieved papers

Algorithms for composing language models with FSTs

The authors develop exact and approximate algorithms that compose a language model with a finite-state transducer to compute probabilities over transformed strings by marginalizing over source strings, enabling sampling, scoring, and conditioning on transformed outputs.

1 retrieved paper

Prefix decomposition of the precover

The authors introduce a prefix decomposition method that decomposes the precover into quotient and remainder sets, enabling finite-time computation of prefix probabilities for transduced language models under certain conditions on the transformation.

7 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

General framework for transduced language models

The authors introduce a foundational framework that formalizes language models obtained by applying deterministic string-to-string transformations (encoded as finite-state transducers) to existing language models, enabling inference-time adaptation without retraining.
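As a rough illustration of what such a transduced model looks like operationally (a minimal sketch of ours, not the paper's implementation; `base_lm_sample` and the toy distribution are hypothetical), sampling from the transduced model reduces to sampling from the base model and applying the transformation:

```python
import random
from collections import Counter

def base_lm_sample(rng: random.Random) -> str:
    """Hypothetical base language model over a tiny finite support."""
    strings = ["ab", "abc", "b"]
    weights = [0.5, 0.3, 0.2]
    return rng.choices(strings, weights=weights, k=1)[0]

def f(x: str) -> str:
    """Deterministic string-to-string map (a stand-in for an FST): delete 'b'."""
    return x.replace("b", "")

def transduced_lm_sample(rng: random.Random) -> str:
    """Sample from the transduced model: sample x ~ p, then return f(x)."""
    return f(base_lm_sample(rng))

if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(transduced_lm_sample(rng) for _ in range(10_000))
    print(counts)  # roughly: 'a' ~ 0.5, 'ac' ~ 0.3, '' ~ 0.2
```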

Contribution 2

Algorithms for composing language models with FSTs

The authors develop exact and approximate algorithms that compose a language model with a finite-state transducer to compute probabilities over transformed strings by marginalizing over source strings, enabling sampling, scoring, and conditioning on transformed outputs.
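To make the marginalization concrete (again a toy sketch under our own assumptions, not the paper's exact or approximate composition algorithms), scoring a target string amounts to summing base-model probability over its preimage; over a finite support this can simply be brute-forced:

```python
# Toy base language model over a finite support (probabilities sum to 1).
BASE_LM = {"ab": 0.5, "abc": 0.3, "b": 0.2}

def f(x: str) -> str:
    """Deterministic string-to-string map (a stand-in for an FST): delete 'b'."""
    return x.replace("b", "")

def transduced_prob(y: str) -> float:
    """p_f(y) = sum of p(x) over all source strings x with f(x) == y.

    The paper's algorithms avoid this enumeration by composing the language
    model with the FST; brute force works here only because the support is finite.
    """
    return sum(p for x, p in BASE_LM.items() if f(x) == y)

assert abs(transduced_prob("a") - 0.5) < 1e-12
assert abs(transduced_prob("ac") - 0.3) < 1e-12
assert abs(transduced_prob("") - 0.2) < 1e-12
```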

Contribution 3

Prefix decomposition of the precover

The authors introduce a prefix decomposition method that decomposes the precover into quotient and remainder sets, enabling finite-time computation of prefix probabilities for transduced language models under certain conditions on the transformation.
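As a loose illustration of the target quantity (our own toy rendering; the paper's precover definition and its quotient/remainder decomposition are not reproduced here), a prefix probability sums base-model mass over source strings whose image extends the given target prefix:

```python
# Toy base language model over a finite support (probabilities sum to 1).
BASE_LM = {"ab": 0.5, "abc": 0.3, "b": 0.2}

def f(x: str) -> str:
    """Deterministic string-to-string map (a stand-in for an FST): delete 'b'."""
    return x.replace("b", "")

def prefix_prob(y_prefix: str) -> float:
    """P(y_prefix is a prefix of f(X)) for X ~ BASE_LM.

    We brute-force the sum over source strings whose image starts with the
    target prefix; the paper's quotient/remainder decomposition is what is
    claimed to make this computable in finite time without such enumeration,
    under conditions on the transformation.
    """
    return sum(p for x, p in BASE_LM.items() if f(x).startswith(y_prefix))

assert abs(prefix_prob("a") - 0.8) < 1e-12   # f("ab") = "a", f("abc") = "ac"
assert abs(prefix_prob("ac") - 0.3) < 1e-12  # only f("abc") = "ac" qualifies
assert abs(prefix_prob("") - 1.0) < 1e-12    # empty prefix matches everything
```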