2025S-Seminar #6 / Deep Learning

2025.05.14

今回のゼミを担当したのは、B4大野さん、M1中澤さんと、M1中山さんでした。
This seminar was led by B4 student Ono and M1 students Nakazawa and Nakayama.

三人が分担して「人工知能学会監修. 深層学習」の第１章から第４章まで発表しました。大野さんが、階層型ニューラルネットワークによる深層学習の紹介と、深層ボルツマンマシンという学習モデルを発表しました。そして中澤さんが、事前学習とその周辺について発表しました。最後、中山さんが、大規模深層学習の実現技術について発表しました。
The three of them shared responsibilities and presented Chapters 1 through 4 of “Deep Learning”, a book supervised by the Japanese Society for Artificial Intelligence. Ono introduced deep learning using hierarchical neural networks and presented a learning model called the Deep Boltzmann Machine. Then, Nakazawa gave a presentation on pre-training and related topics. Finally, Nakayama presented on the technologies that enable large-scale deep learning.

下記は、担当者からもらった発表内容のまとめです。
Below are the presenters' own summaries of their talks.

大野（Ono）：
深層学習の基礎理論として、階層型ニューラルネットワークとボルツマンマシンについて幅広く説明を行いました。確定的ダイナミクスのニューラルネットワークとは異なり、確率的に結果を出力するボルツマンマシンは、マルコフ確率場の特殊な形であり、ノードのバイアスパラメータとリンクの結合パラメータを学習によって得ることで、生成モデルに近く、より確からしいモデルの推定を行います。また、組み合わせ爆発の問題を回避するために、完全二部グラフ上に定義した制限ボルツマンマシンや、それに階層構造を持たせた深層ボルツマンマシンの紹介も行いました。
As foundational theory for deep learning, Ono gave a broad explanation of hierarchical neural networks and Boltzmann machines. Unlike neural networks with deterministic dynamics, Boltzmann machines produce their outputs probabilistically. They are a special form of Markov random field: by learning the bias parameters of the nodes and the coupling parameters of the links, they estimate a more plausible model that comes close to the generative model of the data. To avoid the problem of combinatorial explosion, the presentation also introduced restricted Boltzmann machines, which are defined on complete bipartite graphs, and deep Boltzmann machines, which give them a hierarchical structure.
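The role of the bipartite structure can be illustrated with a small sketch. It assumes the standard binary restricted Boltzmann machine energy E(v, h) = -Σ b_i v_i - Σ c_j h_j - Σ v_i W_ij h_j; the names `W`, `b`, and `c` are chosen here for illustration and are not taken from the presentation. Because no links connect hidden units to each other, p(h_j = 1 | v) reduces to a sigmoid, and the sketch compares that closed form with brute-force enumeration over all hidden states, which is the part that explodes combinatorially.

```python
import itertools
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, b, c):
    """Energy of a binary RBM state (assumed standard form)."""
    e = -sum(bi * vi for bi, vi in zip(b, v))
    e -= sum(cj * hj for cj, hj in zip(c, h))
    e -= sum(v[i] * W[i][j] * h[j]
             for i in range(len(v)) for j in range(len(h)))
    return e


def p_hidden_given_visible(v, W, c):
    """Closed form: hidden units are independent given the visible layer."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def p_hidden_brute_force(v, W, b, c):
    """Same marginals by enumerating all 2^n hidden states (small n only)."""
    n_hidden = len(c)
    states = list(itertools.product([0, 1], repeat=n_hidden))
    weights = [math.exp(-energy(v, h, W, b, c)) for h in states]
    z = sum(weights)  # normalizer over hidden states for this fixed v
    return [sum(w for h, w in zip(states, weights) if h[j] == 1) / z
            for j in range(n_hidden)]
```

Calling both functions on the same small `W`, `b`, `c`, and `v` should give matching probabilities up to rounding, while only the closed form stays cheap as the number of hidden units grows.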

中澤（Nakazawa）：
深層学習における学習には、過学習や勾配消失などの問題が生じ、それを解消する手法として主に事前学習を説明しました。事前学習とは勾配法の際に良い初期値を与えることで、良い初期値を調べる方法として確率的なモデルと確定的なモデルのアプローチがあります。確率的なモデルではギブスサンプリングやCD法などが用いられ、確定的なモデルでは積層自己符号化器による貪欲学習などによる手法が用いられることを紹介しました。
In deep learning, training often runs into problems such as overfitting and vanishing gradients. Nakazawa mainly explained pre-training as a way to address them. Pre-training supplies good initial values for gradient-based optimization, and there are two main approaches to finding such values: probabilistic models and deterministic models. Probabilistic models use methods such as Gibbs sampling and Contrastive Divergence (CD); deterministic models use techniques such as greedy layer-wise training of stacked autoencoders.
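As a rough illustration of the probabilistic approach, the sketch below performs one CD-1 update on a binary restricted Boltzmann machine: one Gibbs step from the data, then a parameter update from the difference between data and reconstruction statistics. It is a minimal sketch under assumptions, not the procedure shown in the presentation; the function names, the learning rate `lr`, and the use of hidden probabilities rather than samples in the update are choices made here.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    """Draw one binary value per unit from its Bernoulli probability."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step; updates W, b, c in place and returns the reconstruction."""
    ph0 = hidden_probs(v0, W, c)       # positive phase, driven by the data
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # single Gibbs step back
    ph1 = hidden_probs(v1, W, c)       # negative phase, from reconstruction
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])
    return v1
```

In a pre-training setting, the learned `W` and `c` would then serve as initial values for the corresponding layer before gradient-based fine-tuning.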

中山（Nakayama）：
大規模深層学習の実現技術を担当した。大規模なニューラルネットワークの実現には桁違いに大きな学習が必要になり、人工知能の実現には計算能力がカギとなる。誤差逆伝播法や確率的勾配降下法による深層学習の最適化や複数マシンの分散並列計算、GPU、InfiniBandによる物理的なデバイスの利用、バッチ正規化や知識の蒸留などの高速化手法、過学習制御であるDropOut、活性化関数のReLUやMaxOut、実装の正しさの確認方法を学習した。すごく難しいものだと思っていた深層学習が、ステップごとに分解して考えると少し身近に思えてきた。
I was in charge of the technologies that enable large-scale deep learning. Implementing large-scale neural networks requires orders of magnitude more training, and computational power becomes the key to realizing artificial intelligence. I studied the optimization of deep learning with backpropagation and stochastic gradient descent, distributed and parallel computing across multiple machines, the use of physical devices such as GPUs and InfiniBand, acceleration techniques such as batch normalization and knowledge distillation, DropOut for controlling overfitting, activation functions such as ReLU and MaxOut, and methods for verifying that an implementation is correct. I had thought deep learning was extremely difficult, but breaking it down step by step made it feel a bit more approachable.
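One common way to check that a backpropagation implementation is correct is to compare its gradients against numerical differentiation. The summary does not say which verification method was presented, so the sketch below is an assumption: a tiny one-hidden-layer ReLU network with a squared-error loss, with all names (`W`, `u`, `x`, `t`) invented for illustration.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def forward(W, u, x):
    """Hidden pre-activations z and scalar output y = sum_j u_j * relu(z_j)."""
    z = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    y = sum(uj * relu(zj) for uj, zj in zip(u, z))
    return z, y


def loss(W, u, x, t):
    _, y = forward(W, u, x)
    return 0.5 * (y - t) ** 2


def analytic_grad(W, u, x, t):
    """Backpropagated gradient of the loss with respect to W."""
    z, y = forward(W, u, x)
    return [[(y - t) * u[j] * (1.0 if z[j] > 0.0 else 0.0) * x[i]
             for i in range(len(x))] for j in range(len(W))]


def numeric_grad(W, u, x, t, eps=1e-6):
    """Central-difference gradient; perturbs one weight at a time."""
    grad = []
    for j in range(len(W)):
        row = []
        for i in range(len(x)):
            original = W[j][i]
            W[j][i] = original + eps
            loss_plus = loss(W, u, x, t)
            W[j][i] = original - eps
            loss_minus = loss(W, u, x, t)
            W[j][i] = original  # restore the weight
            row.append((loss_plus - loss_minus) / (2.0 * eps))
        grad.append(row)
    return grad


def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```

If `max_abs_diff(analytic_grad(...), numeric_grad(...))` is not close to zero, the analytic gradient is suspect. The check is only reliable when no pre-activation sits near zero, where ReLU is not differentiable.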

Thank you for your presentations!

written by M2 Wei Miaoheng

2025S-Seminar #5 / BMSS Exercise & User Equilibrium with Variable Demand & Stochastic Network Loading Models

東大BinNと合同ゼミをしました