SGD

Stochastic gradient descent (SGD) is a basic gradient descent optimizer that minimizes a loss by updating the model parameters in the direction opposite to the gradient. Each update is computed on a randomly sampled mini-batch of data from the dataset.

bitsandbytes also supports momentum and Nesterov momentum to accelerate SGD by adding a weighted average of past gradients to the current gradient.
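As a rough illustration of the momentum update described above, the sketch below applies SGD to a single scalar parameter. It assumes the conventions of PyTorch's torch.optim.SGD (a momentum buffer, dampening, and an optional Nesterov correction). The names sgd_step and minimize are hypothetical, and the bitsandbytes kernels may differ in detail.

```python
def sgd_step(param, grad, buf, lr, momentum=0.0, dampening=0.0,
             weight_decay=0.0, nesterov=False):
    """Apply one SGD update to a scalar and return (new_param, new_buf)."""
    # L2 weight decay is folded into the gradient.
    g = grad + weight_decay * param
    if momentum != 0.0:
        if buf is None:
            # First step: the buffer starts as the current gradient.
            buf = g
        else:
            # Weighted average of past gradients plus the (dampened) current one.
            buf = momentum * buf + (1.0 - dampening) * g
        # Nesterov looks ahead along the buffer; plain momentum uses the buffer itself.
        g = g + momentum * buf if nesterov else buf
    return param - lr * g, buf


def minimize(lr=0.1, momentum=0.9, nesterov=False, steps=200):
    """Minimize f(x) = x**2 (gradient 2*x) starting from x = 5."""
    x, buf = 5.0, None
    for _ in range(steps):
        x, buf = sgd_step(x, 2.0 * x, buf, lr,
                          momentum=momentum, nesterov=nesterov)
    return x
```

Calling minimize() and minimize(nesterov=True) should both drive x close to 0, the minimizer of this toy loss.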

SGD

class bitsandbytes.optim.SGD

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, optim_bits = 32, args = None, min_8bit_size = 4096 )

__init__

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, optim_bits = 32, args = None, min_8bit_size = 4096 )

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float) — The learning rate.
momentum (float, defaults to 0) — The momentum factor. Momentum speeds up the optimizer by carrying a weighted average of past gradients into each step.
dampening (float, defaults to 0) — The dampening value reduces the momentum of the optimizer.
weight_decay (float, defaults to 0.0) — The weight decay value for the optimizer.
nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
optim_bits (int, defaults to 32) — The number of bits of the optimizer state.
args (object, defaults to None) — An object with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.

Base SGD optimizer.
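A minimal usage sketch, assuming bitsandbytes is installed alongside PyTorch and that a supported accelerator is available. The helper name build_sgd is hypothetical; only the constructor arguments listed above are used. Some bitsandbytes releases raise an error when momentum is left at 0, so the sketch passes a nonzero value; check this against your installed version.

```python
def build_sgd(params, lr=0.01, momentum=0.9, optim_bits=32):
    """Return a bitsandbytes SGD optimizer for an iterable of parameters."""
    # Imported lazily so this helper can be defined on machines
    # where bitsandbytes is not installed.
    import bitsandbytes as bnb

    return bnb.optim.SGD(params, lr=lr, momentum=momentum,
                         optim_bits=optim_bits)


# Typical training-loop usage (assumes `model` and `loss` already exist):
#   optimizer = build_sgd(model.parameters(), optim_bits=8)
#   loss.backward()
#   optimizer.step()
#   optimizer.zero_grad()
```

Passing optim_bits=8 requests 8-bit optimizer state from the same class; SGD8bit and SGD32bit fix that choice in the class name instead.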

SGD8bit

class bitsandbytes.optim.SGD8bit

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, args = None, min_8bit_size = 4096 )

__init__

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, args = None, min_8bit_size = 4096 )

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float) — The learning rate.
momentum (float, defaults to 0) — The momentum factor. Momentum speeds up the optimizer by carrying a weighted average of past gradients into each step.
dampening (float, defaults to 0) — The dampening value reduces the momentum of the optimizer.
weight_decay (float, defaults to 0.0) — The weight decay value for the optimizer.
nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
args (object, defaults to None) — An object with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.

8-bit SGD optimizer.
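The effect of min_8bit_size can be sketched as a simple size check per parameter tensor. This is an illustration under stated assumptions, not the library's code: the function name state_bits is hypothetical, and treating a tensor with exactly min_8bit_size elements as eligible for 8-bit state is an assumption based on the parameter being described as a minimum.

```python
from math import prod


def state_bits(shape, optim_bits=8, min_8bit_size=4096):
    """Return the assumed bit width of the optimizer state for one tensor."""
    numel = prod(shape)
    # Assumption: tensors smaller than the threshold keep 32-bit state,
    # even when 8-bit state was requested.
    if optim_bits == 8 and numel < min_8bit_size:
        return 32
    return optim_bits


# A 64 x 64 weight has 4096 elements and qualifies for 8-bit state,
# while a 1024-element bias stays in 32-bit.
print(state_bits((64, 64)))   # 8
print(state_bits((1024,)))    # 32
```

Keeping small tensors such as biases and normalization weights in 32-bit costs little memory, since they hold few elements.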

SGD32bit

class bitsandbytes.optim.SGD32bit

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, args = None, min_8bit_size = 4096 )

__init__

( params, lr, momentum = 0, dampening = 0, weight_decay = 0, nesterov = False, args = None, min_8bit_size = 4096 )

Parameters

params (torch.tensor) — The input parameters to optimize.
lr (float) — The learning rate.
momentum (float, defaults to 0) — The momentum factor. Momentum speeds up the optimizer by carrying a weighted average of past gradients into each step.
dampening (float, defaults to 0) — The dampening value reduces the momentum of the optimizer.
weight_decay (float, defaults to 0.0) — The weight decay value for the optimizer.
nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
args (object, defaults to None) — An object with additional arguments.
min_8bit_size (int, defaults to 4096) — The minimum number of elements of the parameter tensors for 8-bit optimization.

32-bit SGD optimizer.
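To compare the 32-bit and 8-bit variants, a back-of-the-envelope estimate of the optimizer state size can help. The sketch assumes SGD keeps one momentum value per parameter and ignores any extra metadata that 8-bit quantization stores; the function name momentum_state_bytes is hypothetical.

```python
def momentum_state_bytes(num_params, optim_bits=32):
    """Bytes for one momentum value per parameter (quantization metadata ignored)."""
    return num_params * optim_bits // 8


n = 1_000_000_000  # a model with one billion parameters
print(momentum_state_bytes(n, 32) / 1e9)  # 4.0 (GB)
print(momentum_state_bytes(n, 8) / 1e9)   # 1.0 (GB)
```

Under these assumptions, 8-bit state needs roughly a quarter of the memory of 32-bit state for the momentum buffer.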
