I watched hours of videos on gradient descent and still feel pretty confused. Let's say I have a "model":
y = x * w
I use 2 as my target w, so my training set is:
{ x, y } = {{ 1, 2 }, { 100, 200 }}
I start with w = 1.
This means the squared errors are 1 and 10000. With a per-example loss of ½(y - ŷ)², the loss gradients are -(y - ŷ) = { -1, -100 }. Since ŷ = x * w, the w gradient is the mean of -(y - ŷ) * x: (-1 * 1 + (-100) * 100)/2 = -5000.5.
This means I need a tiny learning rate.
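To double-check that number, here is a minimal sketch (assuming the per-example loss is ½(y - ŷ)², so the per-example w gradient is -(y - x*w) * x, and that the batch gradient is the mean over examples; mean_grad_w is just a helper name I made up):

```python
def mean_grad_w(data, w):
    # dL/dw per example: -(y - x*w) * x, averaged over the batch
    grads = [-(y - x * w) * x for x, y in data]
    return sum(grads) / len(grads)

data_big = [(1, 2), (100, 200)]
print(mean_grad_w(data_big, w=1))  # -5000.5
```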
Meanwhile, for a data set of
{ x, y } = {{ 1, 2 }, { 10, 20 }}
The gradient is (-1 * 1 + (-10) * 10)/2 = -50.5, which lets me use a larger learning rate.
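For comparison, here is the same sketch on the smaller dataset, plus one hypothetical gradient-descent step on each dataset with the same learning rate (lr = 0.001 is just an illustrative value, not a recommendation):

```python
def mean_grad_w(data, w):
    # Same helper as above: mean of -(y - x*w) * x over the batch
    grads = [-(y - x * w) * x for x, y in data]
    return sum(grads) / len(grads)

data_big = [(1, 2), (100, 200)]
data_small = [(1, 2), (10, 20)]
print(mean_grad_w(data_small, w=1))  # -50.5

lr = 0.001  # illustrative value
print(1 - lr * mean_grad_w(data_big, w=1))    # 6.0005 -> overshoots the target w = 2
print(1 - lr * mean_grad_w(data_small, w=1))  # 1.0505 -> small, stable step
```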
Am I missing something? Should I divide by x or by the loss somewhere so that I can use the same learning rate?