Model compression by distillation

NISHIO Hirokazu

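Large models (or ensembles) cannot be deployed under current hardware constraints
That said, hardware keeps getting faster, so in the near future a plain ensemble may simply run as-is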

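Train the student model on the teacher model's outputs
One variant: use the teacher's softmax output directly as the target instead of one-hot labels
Soft target loss
Similar in spirit to label smoothing: label smoothing mixes the one-hot label with a uniform distribution, whereas distillation uses the teacher's softmax distribution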
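A minimal sketch of the soft target loss in plain Python, assuming a single example given as lists of logits. The function names and the temperature value 2.0 are illustrative choices, not taken from any library. The label smoothing helper shows the comparison above: its target is what a "teacher" that spreads probability uniformly would produce.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature T; a larger T gives a softer (flatter) distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, pred_probs, eps=1e-12):
    """H(target, pred) = -sum_i target_i * log(pred_i)."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))

def soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's softened output against the teacher's softened output."""
    soft_targets = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return cross_entropy(soft_targets, student_probs)

def label_smoothing_target(true_class, num_classes, epsilon=0.1):
    """One-hot label mixed with a uniform distribution (a 'teacher' with no class preference)."""
    return [(1 - epsilon) * (1.0 if i == true_class else 0.0) + epsilon / num_classes
            for i in range(num_classes)]

# Hypothetical logits for a 3-class problem.
teacher_logits = [5.0, 2.0, -1.0]
student_logits = [3.0, 1.0, 0.0]
print(soft_target_loss(teacher_logits, student_logits))
print(label_smoothing_target(0, 3))
```

The difference from label smoothing is visible in the targets: the smoothed label gives every wrong class the same small mass, while the teacher's softmax ranks the wrong classes, and that ranking is the extra information the student learns from.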

1503.02531 Distilling the Knowledge in a Neural Network
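For reference, the paper softens outputs with a temperature T and trains on a weighted average of two cross-entropies: one against the teacher's soft targets at high T, one against the true labels at T = 1. The soft term is multiplied by T² because its gradients scale as 1/T². The weight symbol α below is my own notation for that weighted average; z are the student's logits, p^(T) and q^(T) the teacher's and student's softened distributions, y the one-hot label, and H the cross-entropy.

```latex
q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}

\mathcal{L} = \alpha \, T^2 \, H\big(p^{(T)}, q^{(T)}\big) + (1 - \alpha)\, H\big(y, q^{(1)}\big)
```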


Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
The student model learns to mimic the teacher model's attention maps
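A rough sketch of an attention transfer loss for one pair of layers, roughly following the paper's activation-based formulation with p = 2. Assumptions: activations are given as a list of channels, each a flat list of H*W values; teacher and student share the spatial size but may differ in channel count; `beta` is a hypothetical weight, not the paper's setting. Summing over channels is what makes the maps comparable across architectures.

```python
import math

def attention_map(activations, p=2):
    """Spatial attention map: sum over channels of |A_c|^p at each spatial position.

    activations: list of channels, each a flat list of H*W values (assumed layout).
    """
    num_positions = len(activations[0])
    return [sum(abs(channel[i]) ** p for channel in activations)
            for i in range(num_positions)]

def l2_normalize(vec, eps=1e-12):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / (norm + eps) for v in vec]

def attention_transfer_loss(student_acts, teacher_acts, beta=1.0):
    """beta/2 * L2 distance between the L2-normalized student and teacher attention maps."""
    q_s = l2_normalize(attention_map(student_acts))
    q_t = l2_normalize(attention_map(teacher_acts))
    diff = math.sqrt(sum((s - t) ** 2 for s, t in zip(q_s, q_t)))
    return beta / 2 * diff

# Hypothetical 2x2 feature maps: teacher has 2 channels, student has 3.
teacher = [[1.0, 0.0, 0.0, 2.0], [0.5, 0.0, 0.0, 1.0]]
student = [[0.0, 3.0, 1.0, 0.0], [0.0, 1.0, 2.0, 0.0], [0.0, 0.5, 0.0, 0.0]]
print(attention_transfer_loss(student, teacher))
```

In training, this term would be added to the student's usual classification loss, summed over the chosen layer pairs.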

Born Again Neural Networks
When teacher and student share the same architecture, the student ends up outperforming the teacher
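The generational idea can be written roughly as follows. This is a sketch under assumptions: f is the same architecture in every generation, x the inputs, y the labels, and the loss takes a target and the network's output. The first generation trains on the labels; each later generation trains on the previous one's outputs. The paper's variants may also mix in the hard labels.

```latex
\theta_1 = \arg\min_{\theta} \mathcal{L}\big(y, f(x; \theta)\big), \qquad
\theta_k = \arg\min_{\theta} \mathcal{L}\big(f(x; \theta_{k-1}), f(x; \theta)\big), \quad k \ge 2
```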

Deep Mutual Learning
Students teach each other, with no pre-trained teacher
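A minimal sketch of the per-student mutual learning loss for one example: each student's cross-entropy with the true label plus the average KL divergence from each peer's prediction to its own. Plain Python, with names chosen for illustration. In training, each student would be updated on its own loss while its peers' predictions are treated as fixed targets.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mutual_learning_losses(all_logits, true_class):
    """Loss for student k: CE(y, p_k) + mean over peers j != k of KL(p_j || p_k).

    all_logits: one list of logits per student (at least two students).
    """
    probs = [softmax(logits) for logits in all_logits]
    num_students = len(probs)
    losses = []
    for k, p_k in enumerate(probs):
        ce = -math.log(p_k[true_class] + 1e-12)
        peer_kls = [kl_divergence(p_j, p_k) for j, p_j in enumerate(probs) if j != k]
        losses.append(ce + sum(peer_kls) / (num_students - 1))
    return losses

# Hypothetical logits from two students on a 3-class example whose label is class 0.
print(mutual_learning_losses([[2.0, 1.0, 0.0], [0.5, 1.5, 0.0]], true_class=0))
```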


(C)NISHIO Hirokazu / Converted from [Scrapbox] at 11/23/2025, 5:49:04 PM
