Oja’s learning rule
📝 Idea / Thought
Oja’s rule was invented to overcome the problem of unbounded weight growth when training a neural network with the Hebbian learning rule
\[y_i = \sum_j w_{ij} x_j\]

Oja’s rule
Oja’s rule uses a multiplicative constraint to re-normalize the weights at every step, so that the weight vector converges to the first principal component of the input. Keep in mind that this holds only for the single-neuron Oja model; if the model has multiple neurons following Oja’s rule, the weights converge to the subspace spanned by the leading eigenvectors (e.g., the top two for two neurons), not to the individual eigenvectors themselves.
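To make this concrete, here is a minimal NumPy sketch (my own illustration, not from the original papers): a single neuron trained with Oja’s rule on synthetic 2-D data, whose weight vector ends up aligned, up to sign, with the first principal component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean, correlated 2-D data
C = np.array([[3.0, 1.0],
              [1.0, 1.0]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=5000)

w = rng.normal(size=2)      # random initial weights
eta = 1e-3                  # small learning rate (assumed)

for _ in range(5):          # a few passes over the data
    for x in X:
        y = w @ x                        # y = w . x
        w += eta * (y * x - y**2 * w)    # Oja's rule: dw = eta (y x - y^2 w)

# Compare with the leading eigenvector of the sample covariance (sign may differ)
_, eigvecs = np.linalg.eigh(np.cov(X.T))
print("learned w:", w / np.linalg.norm(w))
print("first PC :", eigvecs[:, -1])
```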
How Oja’s rule is derived
We already know that in classic Hebbian learning the output is \(\mathbf{y = w \cdot x}\), and the weight update rule is \(\mathbf{\Delta w = \eta y x}\).
- This means that when the presynaptic activity $\mathbf{x}$ and the postsynaptic activity $\mathbf{y}$ fire together, the weights are strengthened further.

We can also write the update as \(\mathbf{w(t+1) = w(t) + \eta yx}\). To avoid unbounded weight growth, at each step the updated weight vector is re-normalized by its own norm:
\(\mathbf{w(t+1) = \frac{w'}{||w'||}}\), where \(\mathbf{w' = w(t) + \eta yx}\).

Using \(y = \mathbf{w(t) \cdot x}\) and assuming \(||\mathbf{w}(t)|| = 1\), we can expand $\mathbf{||w'||^2}$ as

\(\begin{align} \mathbf{||w'||^2} &= \mathbf{||w(t)||^2 + 2\eta y \, x \cdot w(t) + \eta^2 y^2 ||x||^2} \\ &= 1 + 2\eta y^2 + \eta^2 y^2 ||x||^2 \end{align}\)

Then, let $\epsilon = 2\eta y^2 + \eta^2 y^2 ||x||^2$:

\(\begin{align} ||w'||^2 &= 1+\epsilon \\ ||w'|| &= \sqrt{1+\epsilon} = (1+\epsilon)^{\frac{1}{2}} \end{align}\)

Next, approximate $||w'||$ with a Taylor expansion:

\(\begin{align} ||w'|| = \sqrt{1+\epsilon} \approx 1 + \frac{1}{2}\epsilon \approx 1 + \eta y^2 + O(\eta^2) \end{align}\)

Let $\delta = \eta y^2 + O(\eta^2)$ and apply a Taylor expansion again:

\(\frac{1}{||w'||} = (1+\delta)^{-1} \approx 1-\delta \approx 1- \eta y^2 + O(\eta^2)\)

With all the pieces in place, we can now compute $\mathbf{w(t+1)}$:

\(\begin{align} \mathbf{w(t+1)} &= \mathbf{\frac{w'}{||w'||}} \approx \mathbf{(w(t)+\eta yx)(1-\eta y^2 + O(\eta^2))} \\ &\approx \mathbf{w(t)(1-\eta y^2) + \eta yx(1-\eta y^2)} \\ &\approx \mathbf{w(t) + \eta yx - \eta y^2 w(t) - \eta^2 y^3 x} \end{align}\)

Finally, we move $\mathbf{w(t)}$ to the left-hand side:

\(\begin{align} \mathbf{w(t+1) - w(t) = \eta(yx - y^2 w) - \eta^2 y^3 x} \end{align}\)

The last term is $O(\eta^2)$ and can be dropped, which leaves Oja’s rule:

\(\mathbf{\Delta w = \eta(yx - y^2 w)}\)
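As a quick sanity check on this derivation (my own sketch, not part of the original argument), one can compare a single explicitly re-normalized Hebbian step with the first-order Oja update; for a small learning rate the two should differ only by $O(\eta^2)$ terms:

```python
import numpy as np

rng = np.random.default_rng(1)

w = rng.normal(size=3)
w /= np.linalg.norm(w)       # start on the unit sphere, ||w(t)|| = 1
x = rng.normal(size=3)
eta = 1e-3
y = w @ x

# Exact update: Hebbian step followed by explicit re-normalization
w_exact = w + eta * y * x
w_exact /= np.linalg.norm(w_exact)

# Oja's rule: first-order expansion of the normalized update
w_oja = w + eta * (y * x - y**2 * w)

print("difference:", np.linalg.norm(w_exact - w_oja))  # roughly O(eta^2)
```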
Here is the multi-neuron version of Oja’s learning rule: \(\Delta w_{ij} = \alpha(x_j y_i - y_i \sum_{k=1}^m w_{kj} y_k)\)
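In matrix form this reads $\Delta W = \alpha\,(\mathbf{y}\mathbf{x}^T - \mathbf{y}\mathbf{y}^T W)$ with $\mathbf{y} = W\mathbf{x}$. Below is a rough NumPy sketch of this subspace rule (the data, shapes, and learning rate are my own assumptions); after training, the rows of $W$ should lie in, and span, the leading principal subspace rather than equal the individual principal components:

```python
import numpy as np

rng = np.random.default_rng(2)

n, m = 5, 2                                # input dimension, number of neurons
stds = np.sqrt(np.array([5.0, 3.0, 1.0, 0.5, 0.2]))
X = rng.normal(size=(20000, n)) * stds     # zero-mean data, variances 5, 3, 1, 0.5, 0.2

W = 0.1 * rng.normal(size=(m, n))
alpha = 1e-3

for x in X:
    y = W @ x                                            # y_i = sum_j w_ij x_j
    W += alpha * (np.outer(y, x) - np.outer(y, y) @ W)   # dW = alpha (y x^T - y y^T W)

# The top-2 principal directions here are the first two coordinate axes,
# so the rows of W should (approximately) have zero weight on the other axes.
print(np.round(W, 2))
```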
Regularization term in Oja’s rule
In the multi-neuron version, the regularization term $\mathbf{y^2 w}$ becomes $\mathbf{y y^T W}$, and we can keep either only its diagonal part, $\mathrm{diag}(\mathbf{y})^2 W$, or the full matrix including the off-diagonal (cross-neuron) terms (a sketch contrasting the two variants follows the list below):
- diagonal term only: the neurons decouple and all of them learn the first PC
- with the off-diagonal terms: the neurons learn the principal subspace
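Using the same setup as the sketch above, but keeping only the diagonal part of the regularizer (so each row $i$ is updated with $\alpha(y_i \mathbf{x} - y_i^2 \mathbf{w}_i)$), the neurons decouple and every row should align, up to sign, with the first principal component (again my own illustration, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(3)

n, m = 5, 2
stds = np.sqrt(np.array([5.0, 3.0, 1.0, 0.5, 0.2]))
X = rng.normal(size=(20000, n)) * stds     # first axis = first principal component

W = 0.1 * rng.normal(size=(m, n))
alpha = 1e-3

for x in X:
    y = W @ x
    W += alpha * (np.outer(y, x) - np.diag(y**2) @ W)   # keep only the diagonal term

# Each row now runs single-neuron Oja's rule on its own, so both rows should
# point (up to sign) along the first coordinate axis.
print(np.round(W, 2))
```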
