Mila presents 70 papers at ICLR 2026, covering frontiers such as model merging and graph learning.

ME News, April 23 (UTC+8): Mila announced that its researchers will present 70 papers at ICLR 2026 (Brazil). Highlights from the first day include the following.

In model merging and fine-tuning, DisTaC achieves robust model merging by distilling conditional task vectors; one study uses epsilon scheduling to mitigate suboptimal transfer when fine-tuning non-robust pretrained models; and an oral presentation shows the effectiveness of a single global merging strategy in decentralized learning.

In graph learning, GraphOmni proposes a benchmark framework for evaluating large language models on graph theory tasks, and another work addresses misconceptions about oversmoothing in Transformers.

In reinforcement learning, SHAPO introduces sharpness-aware optimization for safe exploration; ARM-FM uses foundation models to generate reward machines automatically; a hierarchical value-decomposition method for offline reinforcement learning is applied to whole-body control; and Asymmetric Proximal Policy Optimization improves the reasoning ability of large language models by using a small critic.

In generative models, Efficient Regression-based Training of Normalizing Flows for Boltzmann Generators proposes a regression-based training method; FALCON achieves few-step exact likelihood computation for continuous flows; and Contractive Diffusion Policies improve the robustness of action diffusion through contractive score sampling.

On large language models, Landscape of Thoughts visualizes the reasoning process; one paper reframes model collapse as a feature rather than a defect in machine unlearning; Beyond Multi-Token Prediction pretrains with future summaries; and Visual symbolic mechanisms explores symbolic processing in vision-language models.
Other highlights include SelvaBox, a high-resolution dataset for tropical tree canopy detection; µLO, which targets compute-efficient meta-generalization of learned optimizers; TGM, an efficient modular library for temporal graphs; and Robust Reward Modeling, which uses causal rules to make reward modeling more robust. (Source: InfoQ)