Transducing Language Models

ICLR 2026 Conference Submission, Anonymous Authors
Keywords: language models, tokenization, automata, transducers
Abstract:

Modern language models define distributions over strings, but their outputs are not always suited to the downstream task. For instance, a model generating byte-pair strings may not be suitable when word-level predictions are needed, and a DNA model may not fit applications requiring amino acids. In such cases, a deterministic string-to-string transformation can convert the model's output to the desired form. This is a familiar pattern in probability theory: applying a function f to a random variable X ~ p yields a transformed random variable f(X) with an induced distribution. While such transformations are occasionally used in language modeling, they are not treated as yielding new, fully functional language models. We formalize this perspective and introduce a general framework for language models derived from deterministic string-to-string transformations. Focusing on transformations representable as finite-state transducers (FSTs), a commonly used state-machine abstraction for efficient string-to-string mappings, we develop algorithms that compose a language model with an FST to marginalize over source strings mapping to a given target. This allows us to propagate probabilities through the transducer without altering model parameters and to condition on transformed outputs. We present an exact algorithm, an efficient approximation, and a theoretical analysis. We conduct experiments in three domains: converting token-level language models to character-level models, converting token-level models to word-level models, and deriving amino-acid models from DNA models. These experiments demonstrate inference-time adaptation of pretrained language models to match application-specific output requirements.
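For concreteness, the induced distribution described above can be written as a pushforward through the deterministic map; the notation below is ours, not taken from the paper:

```latex
% Pushforward of a language model p through a deterministic string-to-string map f:
% the probability of a target string y marginalizes over all source strings x with f(x) = y.
\[
  p_f(y) \;=\; \Pr\!\bigl[f(X) = y\bigr] \;=\; \sum_{x \,:\, f(x) = y} p(x),
  \qquad X \sim p .
\]
```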

Disclaimer
This report is AI-GENERATED using Large Language Models and WisPaper (a scholar search engine). It analyzes academic papers' tasks and contributions against retrieved prior work. While this system identifies POTENTIAL overlaps and novel directions, ITS COVERAGE IS NOT EXHAUSTIVE AND JUDGMENTS ARE APPROXIMATE. These results are intended to assist human reviewers and SHOULD NOT be relied upon as a definitive verdict on novelty.
NOTE that some papers exist in multiple, slightly different versions (e.g., with different titles or URLs). The system may retrieve several versions of the same underlying work. The current automated pipeline does not reliably align or distinguish these cases, so human reviewers will need to disambiguate them manually.
If you have any questions, please contact: mingzhang23@m.fudan.edu.cn

Overview

Overall Novelty Assessment

The paper formalizes a general framework for transforming language model distributions through deterministic string-to-string mappings representable as finite-state transducers. It occupies the 'Transducer-Based Language Model Transformation Theory' leaf, which contains only two papers, including this one. This is a notably sparse research direction within the broader taxonomy of 50 papers across 22 leaf nodes, suggesting the work addresses a relatively underexplored theoretical niche. The sibling paper examines neural architectures through a transducer lens, whereas this work focuses on composing arbitrary language models with FSTs to marginalize over source strings.

The taxonomy reveals that most FST research concentrates on practical applications: speech recognition (5 papers across 3 leaves), morphological analysis (9 papers across 3 leaves), and machine translation (9 papers across 3 leaves). The theoretical frameworks branch, where this paper resides, is comparatively small with only 7 papers total across 3 leaves. Neighboring work includes core FST composition algorithms and probabilistic model conversion to FST representations, but these focus on optimization techniques and HMM/RNN conversion rather than the general transformation theory this paper develops. The scope note explicitly excludes empirical applications, reinforcing the paper's foundational theoretical positioning.

Among the 18 candidates examined through a limited semantic search, no contributions were clearly refuted. For the 'general framework for transduced language models', 10 candidates were examined, none of which provided overlapping prior work; for 'algorithms for composing language models with FSTs', 1 candidate was examined; and for 'prefix decomposition of the precover', 7 candidates were examined. This suggests that, within the search scope, the specific formalization of FST-based language model transformation and the associated marginalization algorithms appear novel. However, the limited search scale (18 candidates, not exhaustive) means substantial related work may exist outside the top-K semantic matches examined.

Based on the limited literature search, the work appears to occupy a genuinely sparse theoretical area, with minimal direct competition in its specific leaf and few closely related papers in neighboring theoretical branches. The absence of refuting candidates across all three contributions, combined with the small size of the theoretical frameworks branch, suggests the formalization may represent a meaningful conceptual advance. However, the analysis covers only top-K semantic matches and does not guarantee comprehensive coverage of all relevant prior work in FST theory or language model transformation.

Taxonomy

Core-task Taxonomy Papers: 50
Claimed Contributions: 3
Contribution Candidate Papers Compared: 18
Refutable Papers: 0

Research Landscape Overview

Core task: transforming language models via finite-state transducers.

The field of FST-based language processing spans a diverse set of branches, each addressing distinct aspects of how finite-state machinery can be applied to linguistic computation. At the highest level, the taxonomy divides into theoretical frameworks and algorithms that underpin FST operations (including composition, generalization, and transducer-based model transformation), practical applications in speech recognition and acoustic modeling (e.g., Speech Recognition FST[1], Delay CTC FST[29]), morphological analysis and generation for richly inflected languages (e.g., Inuit Morphological Networks[12], Hungarian Morphosyntax FST[38]), machine translation and transliteration tasks (e.g., Stochastic FST Translation[9], Romanized Arabic Transliteration[10]), and hybrid neural-FST architectures that integrate deep learning with classical transducer methods (e.g., Neural FST Error[14], RNN to FST[46]). Additional branches cover language model combination and adaptation, specialized applications such as text normalization and keyboard decoding, and emerging work on large language models and dialogue systems.

Within this landscape, a particularly active line of inquiry concerns the theoretical underpinnings of transducer-based language model transformation, where researchers explore how FSTs can be used to systematically modify or constrain the output distributions of probabilistic models. Transducing Language Models[0] sits squarely in this theoretical branch, focusing on the formal mechanisms by which finite-state transducers can transform language model probabilities. This work contrasts with more application-driven efforts in speech or morphology, and it shares conceptual ground with Transformers as Transducers[32], which examines the expressive power of neural architectures through a transducer lens. While many branches emphasize engineering solutions for specific tasks, the transducer-based transformation theory branch addresses foundational questions about compositionality, expressiveness, and the algebraic properties of FST operations on language models, positioning Transducing Language Models[0] as a contribution to the formal toolkit that supports the broader ecosystem of FST applications.

Claimed Contributions

General framework for transduced language models

The authors introduce a foundational framework that formalizes language models obtained by applying deterministic string-to-string transformations (encoded as finite-state transducers) to existing language models, enabling inference-time adaptation without retraining.

10 retrieved papers

Algorithms for composing language models with FSTs

The authors develop exact and approximate algorithms that compose a language model with a finite-state transducer to compute probabilities over transformed strings by marginalizing over source strings, enabling sampling, scoring, and conditioning on transformed outputs.

1 retrieved paper

Prefix decomposition of the precover

The authors introduce a prefix decomposition method that decomposes the precover into quotient and remainder sets, enabling finite-time computation of prefix probabilities for transduced language models under certain conditions on the transformation.

7 retrieved papers

Core Task Comparisons

Comparisons with papers in the same taxonomy category

Contribution Analysis

Detailed comparisons for each claimed contribution

Contribution 1

General framework for transduced language models

The authors introduce a foundational framework that formalizes language models obtained by applying deterministic string-to-string transformations (encoded as finite-state transducers) to existing language models, enabling inference-time adaptation without retraining.
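As a rough illustration of what such a transduced model looks like operationally (a minimal sketch of ours, not the paper's implementation; `base_lm_sample` and the toy distribution are hypothetical), sampling from the transduced model reduces to sampling from the base model and applying the transformation:

```python
import random
from collections import Counter

def base_lm_sample(rng: random.Random) -> str:
    """Hypothetical base language model over a tiny finite support."""
    strings = ["ab", "abc", "b"]
    weights = [0.5, 0.3, 0.2]
    return rng.choices(strings, weights=weights, k=1)[0]

def f(x: str) -> str:
    """Deterministic string-to-string map (a stand-in for an FST): delete 'b'."""
    return x.replace("b", "")

def transduced_lm_sample(rng: random.Random) -> str:
    """Sample from the transduced model: sample x ~ p, then return f(x)."""
    return f(base_lm_sample(rng))

if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(transduced_lm_sample(rng) for _ in range(10_000))
    print(counts)  # roughly: 'a' ~ 0.5, 'ac' ~ 0.3, '' ~ 0.2
```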

Contribution 2

Algorithms for composing language models with FSTs

The authors develop exact and approximate algorithms that compose a language model with a finite-state transducer to compute probabilities over transformed strings by marginalizing over source strings, enabling sampling, scoring, and conditioning on transformed outputs.
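To make the marginalization concrete (again a toy sketch under our own assumptions, not the paper's exact or approximate composition algorithms), scoring a target string amounts to summing base-model probability over its preimage; over a finite support this can simply be brute-forced:

```python
# Toy base language model over a finite support (probabilities sum to 1).
BASE_LM = {"ab": 0.5, "abc": 0.3, "b": 0.2}

def f(x: str) -> str:
    """Deterministic string-to-string map (a stand-in for an FST): delete 'b'."""
    return x.replace("b", "")

def transduced_prob(y: str) -> float:
    """p_f(y) = sum of p(x) over all source strings x with f(x) == y.

    The paper's algorithms avoid this enumeration by composing the language
    model with the FST; brute force works here only because the support is finite.
    """
    return sum(p for x, p in BASE_LM.items() if f(x) == y)

assert abs(transduced_prob("a") - 0.5) < 1e-12
assert abs(transduced_prob("ac") - 0.3) < 1e-12
assert abs(transduced_prob("") - 0.2) < 1e-12
```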

Contribution 3

Prefix decomposition of the precover

The authors introduce a prefix decomposition method that decomposes the precover into quotient and remainder sets, enabling finite-time computation of prefix probabilities for transduced language models under certain conditions on the transformation.
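As a loose illustration of the target quantity (our own toy rendering; the paper's precover definition and its quotient/remainder decomposition are not reproduced here), a prefix probability sums base-model mass over source strings whose image extends the given target prefix:

```python
# Toy base language model over a finite support (probabilities sum to 1).
BASE_LM = {"ab": 0.5, "abc": 0.3, "b": 0.2}

def f(x: str) -> str:
    """Deterministic string-to-string map (a stand-in for an FST): delete 'b'."""
    return x.replace("b", "")

def prefix_prob(y_prefix: str) -> float:
    """P(y_prefix is a prefix of f(X)) for X ~ BASE_LM.

    We brute-force the sum over source strings whose image starts with the
    target prefix; the paper's quotient/remainder decomposition is what is
    claimed to make this computable in finite time without such enumeration,
    under conditions on the transformation.
    """
    return sum(p for x, p in BASE_LM.items() if f(x).startswith(y_prefix))

assert abs(prefix_prob("a") - 0.8) < 1e-12   # f("ab") = "a", f("abc") = "ac"
assert abs(prefix_prob("ac") - 0.3) < 1e-12  # only f("abc") = "ac" qualifies
assert abs(prefix_prob("") - 1.0) < 1e-12    # empty prefix matches everything
```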