Event Program
Top Conference Session 4-2: Information-Theoretic Learning Theory and Machine Learning
September 4, 9:30-12:00
Event Venue 5
Chair: Atsuyoshi Nakamura (Hokkaido University)
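9:30-9:50 Talk (1) Multi-Player Approaches for Dueling Bandits | |
Raveh Or (The University of Tokyo / RIKEN AIP; Ph.D. student, Sugiyama-Yokoya-Ishida Laboratory, Graduate School of Frontier Sciences) | |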
【Original publication】Or Raveh, Junya Honda, Masashi Sugiyama; "Multi-Player Approaches for Dueling Bandits". Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025, PMLR 258:1540-1548 | |
【Abstract】Fine-tuning deep networks with human preferences shows promise, but scaling to many users and recent complex data like videos requires distributed feedback collection. To address this, we introduce a multiplayer dueling bandit problem, highlighting that exploring non-informative candidate pairs becomes especially challenging in a collaborative environment. We demonstrate that the use of a Follow Your Leader black-box approach matches the asymptotic regret lower-bound when utilizing known dueling bandit algorithms as a foundation. Additionally, we propose and analyze a message-passing fully distributed approach with a novel Condorcet-Winner recommendation protocol, resulting in expedited exploration in the non-asymptotic regime, which reduces regret. Our experimental comparisons underscore the multiplayer algorithms’ efficacy in addressing the nuanced challenges of this setting. | |
![]() | 【Biography】I received my B.Sc. in Physics and Electrical Engineering (2017) and M.Sc. in Electrical Engineering (2020) from the Technion – Israel Institute of Technology. From 2021 to 2022, I was a research student in the Sugiyama–Yokoya–Ishida Laboratory at the University of Tokyo under the MEXT scholarship, where I have been pursuing my Ph.D. since 2022. My current research focuses on bandit algorithms and reinforcement learning, particularly in multi-agent settings. In addition, since 2022, I have been working as a part-time researcher at the RIKEN Center for Advanced Intelligence Project. |
9:50-10:10 Talk (2) The Adaptive Complexity of Finding a Stationary Point | |
Huanjian Zhou (The University of Tokyo) | |
【Original publication】Huanjian Zhou, Andi Han, Akiko Takeda, Masashi Sugiyama. "The Adaptive Complexity of Finding a Stationary Point." 38th Annual Conference on Learning Theory (COLT 2025). | |
【Abstract】In large-scale machine learning, highly parallel non-convex optimization is essential. We characterize the adaptive complexity, the minimal number of sequential rounds required to find a stationary point when many queries can be executed in parallel. In high dimensions, we prove tight lower bounds matching the performance of standard gradient- and higher-order methods. In constant dimensions, we present an algorithm achieving stationarity in a fixed number of rounds and establish a corresponding lower bound. All bounds are tight up to logarithmic factors. | |
![]() | 【Biography】Huanjian Zhou is a doctoral student at the Department of Complexity Science and Engineering, the Graduate School of Frontier Sciences, the University of Tokyo. He has been working on sampling, continuous optimization, and submodular optimization. |
10:10-10:30 Talk (3) Realistic Evaluation of Deep Partial-Label Learning Algorithms | |
Wang Wei (The University of Tokyo / RIKEN AIP) | |
【Original publication】W. Wang, D.-D. Wu, J. Wang, G. Niu, M.-L. Zhang, and M. Sugiyama. Realistic evaluation of deep partial-label learning algorithms. In: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025), 2025. | |
【Abstract】Partial-label learning (PLL) is a weakly supervised learning problem in which each example is associated with multiple candidate labels and only one is the true label. In recent years, many deep PLL algorithms have been developed to improve model performance. However, we find that some early developed algorithms are often underestimated and can outperform many later algorithms with complicated designs. In this paper, we delve into the empirical perspective of PLL and identify several critical but previously overlooked issues. First, model selection for PLL is non-trivial, but has never been systematically studied. Second, the experimental settings are highly inconsistent, making it difficult to evaluate the effectiveness of the algorithms. Third, there is a lack of real-world image datasets that can be compatible with modern network architectures. Based on these findings, we propose PLENCH, the first Partial-Label learning bENCHmark to systematically compare state-of-the-art deep PLL algorithms. We investigate the model selection problem for PLL for the first time, and propose novel model selection criteria with theoretical guarantees. We also create Partial-Label CIFAR-10 (PLCIFAR10), an image dataset of human-annotated partial labels collected from Amazon Mechanical Turk, to provide a testbed for evaluating the performance of PLL algorithms in more realistic scenarios. Researchers can quickly and conveniently perform a comprehensive and fair evaluation and verify the effectiveness of newly developed algorithms based on PLENCH. We hope that PLENCH will facilitate standardized, fair, and practical evaluation of PLL algorithms in the future. | |
![]() | 【Biography】Wei Wang is a Ph.D. student at the University of Tokyo, supervised by Prof. Masashi Sugiyama. He is also a Junior Research Associate at the Imperfect Information Learning Team, RIKEN Center for Advanced Intelligence Project. His research interests are data-centric machine learning and reliable machine learning. |
10:30-10:50 Talk (4) Test-Time Adaptation in Non-Stationary Environments: An Adaptive Representation Alignment Approach | |
ZHANG Zhen-Yu (RIKEN AIP) | |
【Original publication】Zhen-Yu Zhang, Zhiyu Xie, Huaxiu Yao, and Masashi Sugiyama: Test-time Adaptation in Non-stationary Environments via Adaptive Representation Alignment. In: Advances in Neural Information Processing Systems 37 (NeurIPS), pp.94607-94632 (2024). | |
【Abstract】Adapting to distribution shifts is a critical challenge in modern machine learning, especially as data in many real-world applications accumulate continuously in the form of streams. We investigate the problem of sequentially adapting a model to non-stationary environments, where the data distribution is continuously shifting and only a small amount of unlabeled data are available each time. Continual test-time adaptation methods have shown promising results by using reliable pseudo-labels, but they still fall short in exploring representation alignment with the source domain in non-stationary environments. In this paper, we propose to leverage non-stationary representation learning to adaptively align the unlabeled data stream, with its changing distributions, to the source data representation using a sketch of the source data. To alleviate the data scarcity in non-stationary representation learning, we propose a novel adaptive representation alignment algorithm called Ada-ReAlign. This approach employs a group of base learners to explore different lengths of the unlabeled data stream, which are adaptively combined by a meta learner to handle unknown and continuously evolving data distributions. The proposed method comes with nice theoretical guarantees under convexity assumptions. Experiments on both benchmark datasets and a real-world application validate the effectiveness and adaptability of our proposed algorithm. | |
![]() | 【Biography】Zhen-Yu Zhang is a postdoctoral researcher in the Imperfect Information Learning Team at RIKEN Center for Advanced Intelligence Project, led by Professor Masashi Sugiyama. He obtained his Ph.D. degree from the Department of Computer Science and Technology at Nanjing University in 2022, where he was very fortunate to receive guidance from both Professor Yuan Jiang and Professor Zhi-Hua Zhou. |
10:50-11:10 Talk (5) Stepwise Alignment of Language Models under Safety Constraints | |
Akifumi Wachi (Senior Researcher, LY Research, LY Corporation) | |
【Original publication】Wachi, A., Tran, T., Sato, R., Tanabe, T., & Akimoto, Y. (2024). Stepwise alignment for constrained language model policy optimization. Advances in Neural Information Processing Systems, 37, 104471-104520. | |
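【Abstract】For AI systems to be used in the real world, safety and trustworthiness are essential. This paper formulates alignment with human values as a language-model policy optimization problem of maximizing reward under a safety constraint, and proposes SACPO, an algorithm for solving it. The key idea behind SACPO, which is also supported by theory, is that the optimal policy incorporating both reward and safety can be derived directly from a reward-aligned policy. Building on this idea, SACPO aligns a language model with each metric in a stepwise manner while leveraging simple yet powerful alignment methods such as Direct Preference Optimization (DPO). SACPO offers several advantages, including simplicity, stability, computational efficiency, and flexibility with respect to algorithms and datasets. In experiments, SACPO fine-tuned Alpaca-7B to outperform the previous state-of-the-art methods in both helpfulness and harmlessness. | |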
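![]() | 【Biography】Worked at IBM Research - Tokyo before joining LY Corporation. Conducts research on the theory and applications of reinforcement learning, with a particular interest in reinforcement learning under safety constraints. Co-author of the book 『強化学習から信頼できる意思決定へ』 ("From Reinforcement Learning to Trustworthy Decision-Making"). Ph.D. in Information Engineering. For details, see https://akifumi-wachi-4.github.io/website/ (external site). |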
11:10-11:30 Talk (6) Learning a Single Index Model from Anisotropic Data: An Analysis of Vanilla SGD | |
Braun Guillaume (Project Researcher, RIKEN AIP) | |
【Original publication】Guillaume Braun, Minh Ha Quang, Masaaki Imaizumi. "Learning a Single Index Model from Anisotropic Data with Vanilla Stochastic Gradient Descent." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:1216-1224, 2025. | |
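【Abstract】We theoretically analyze the dynamics of learning a single index model (SIM) with vanilla stochastic gradient descent (SGD) when the input data follow an anisotropic Gaussian distribution. We show that SGD naturally adapts to the covariance structure without explicitly estimating the covariance matrix, and we derive upper and lower bounds on the sample complexity in terms of the effective dimension. The theoretical findings are validated by numerical experiments. | |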
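![]() | 【Biography】Completed a Ph.D. at the University of Lille in 2022. Has worked as a project researcher at RIKEN AIP since 2023. Interested in high-dimensional statistics and the theory of machine learning. |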
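11:30-11:50 Talk (7) Modeling Coupled Systems across Physical Domains with Poisson-Dirac Neural Networks | |
Khosrovian Razmik Arman (Student, Sakurama Laboratory, Department of Systems Innovation, Graduate School of Engineering Science, Osaka University) | |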
【Original publication】Razmik Arman Khosrovian, Takaharu Yaguchi, Hiroaki Yoshimura, and Takashi Matsubara, “Poisson-Dirac Neural Networks for Modeling Coupled Dynamical Systems across Domains,” International Conference on Learning Representations (ICLR), Singapore, Apr, 2025. | |
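【Abstract】Deep learning has achieved great success in modeling dynamical systems whose governing equations are unknown. However, existing methods apply only to mechanical systems and, because they treat a system as a single monolithic whole, cannot model coupled systems directly. We therefore propose Poisson-Dirac Neural Networks (PoDiNNs), which are based on the Dirac structure. The method provides a unified representation of diverse systems spanning multiple physical domains, of the interactions among their constituent components, and of the degeneracy that these interactions induce. | |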
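![]() | 【Biography】Graduated from the Department of Systems Science, School of Engineering Science, Osaka University, in 2024. Currently enrolled in the master's program at the Graduate School of Engineering Science, Osaka University. |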