https://www.youtube.com/watch?v=OIenNRt2bjg

  • In-place modification (w -= 0.01 * w.grad) vs. assigning a new tensor:

w = w - 0.01 * w.grad

The two are different. In the second case, a brand-new tensor is bound to the name w. It is produced by an autograd operation, so it is no longer a leaf tensor: the originally declared properties (like requires_grad = True on a leaf) no longer behave as expected, and .grad will not be populated on it in later backward passes.
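A small sketch of the difference (the tensor values here are just illustrative):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
loss = (w ** 2).sum()
loss.backward()

# Out-of-place update: creates a brand-new tensor bound to a new name.
# The result is part of the autograd graph and is no longer a leaf,
# so .grad will not accumulate on it in later backward passes.
w_new = w - 0.01 * w.grad

print(w.is_leaf)      # True  -- original is still a leaf
print(w_new.is_leaf)  # False -- result of an autograd op
```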

Important points

    # Zero stale gradients before backward: .backward() accumulates
    # into .grad rather than overwriting it
    if w.grad is not None:
        w.grad.zero_()

    loss_val.backward()

    # The update itself should not be tracked by autograd,
    # and the in-place op keeps w a leaf tensor
    with torch.no_grad():
        w -= 0.01 * w.grad
  • Broadcasting - torch.Size([50]) is a 1-D tensor of 50 elements, not literally 1 x 50. When shapes are compared for broadcasting, dimensions are aligned from the trailing end, so against a 2-D tensor it behaves like a 1 x 50 row vector (a single row stretched over the other rows).
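A quick sketch of that alignment rule (the shapes chosen here are just for illustration):

```python
import torch

a = torch.ones(4, 50)    # shape (4, 50)
v = torch.arange(50.0)   # shape torch.Size([50]) -- 1-D

# Shapes align from the trailing dimension: (50,) is treated as (1, 50),
# i.e. a single row that is stretched across all 4 rows of a.
out = a + v
print(out.shape)         # torch.Size([4, 50])
```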