Bitsandbytes documentation

You are viewing the main version, which requires installation from source. If you'd like a regular pip install, check out the latest stable version (v0.49.2).

SGD

Stochastic gradient descent (SGD) is a basic gradient descent optimizer that minimizes a loss over a set of model parameters by updating the parameters in the opposite direction of the gradient. Each update is computed on a randomly sampled mini-batch of data from the dataset.

bitsandbytes also supports momentum and Nesterov momentum to accelerate SGD by adding a weighted average of past gradients to the current gradient.
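The momentum idea described above can be illustrated with a minimal, pure-Python sketch. This is a textbook formulation for a single scalar parameter, not the bitsandbytes kernel: the function name `sgd_momentum_step`, the choice to initialize the buffer with the first gradient, and the omission of dampening and weight decay are all assumptions made for illustration.

```python
def sgd_momentum_step(param, grad, buf, lr, momentum=0.9, nesterov=False):
    """One textbook momentum-SGD update for a single scalar parameter.

    Returns the updated (param, buf). `buf` is the running, momentum-weighted
    sum of past gradients; pass None on the first step.
    """
    # First step: the buffer starts as the current gradient (assumed convention).
    buf = grad if buf is None else momentum * buf + grad
    # Nesterov momentum adds the momentum-scaled buffer to the current gradient.
    update = grad + momentum * buf if nesterov else buf
    # Move against the (momentum-adjusted) gradient direction.
    return param - lr * update, buf


# Minimize f(x) = x**2, whose gradient is 2*x, starting from x = 1.0.
x, buf = 1.0, None
for _ in range(3):
    x, buf = sgd_momentum_step(x, 2 * x, buf, lr=0.1, momentum=0.9)
```

With `momentum=0` the buffer reduces to the current gradient, so the step becomes plain SGD; larger values let earlier gradients keep contributing to later steps.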

SGD

class bitsandbytes.optim.SGD(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False, optim_bits=32, args=None, min_8bit_size=4096)

Parameters

  • params (torch.Tensor) — The model parameters to optimize.
  • lr (float) — The learning rate.
  • momentum (float, defaults to 0) — The momentum coefficient. Larger values give past gradients more weight, which speeds up optimization by taking bigger steps.
  • dampening (float, defaults to 0) — The dampening value, which reduces the momentum of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
  • optim_bits (int, defaults to 32) — The number of bits used to store the optimizer state.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements a parameter tensor must have for 8-bit optimization to be applied.

Base SGD optimizer.
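To make the interaction between optim_bits and min_8bit_size concrete, here is a small pure-Python sketch. It assumes that a parameter tensor with fewer than min_8bit_size elements keeps 32-bit optimizer state even when 8-bit state is requested; the helper `state_bits` and the layer shapes are hypothetical and are not part of the bitsandbytes API.

```python
from math import prod


def state_bits(shape, optim_bits=32, min_8bit_size=4096):
    """Bits per optimizer-state element for a parameter of the given shape."""
    numel = prod(shape)
    # Assumed rule: 8-bit state is used only when it is requested AND the
    # tensor has at least `min_8bit_size` elements; otherwise stay at 32-bit.
    if optim_bits == 8 and numel >= min_8bit_size:
        return 8
    return 32


# Hypothetical layer: a 64x64 weight (4096 elements) and a 64-element bias.
shapes = {"weight": (64, 64), "bias": (64,)}
chosen = {name: state_bits(s, optim_bits=8) for name, s in shapes.items()}
```

Under this assumption the weight qualifies for 8-bit state while the small bias does not, which is why small tensors such as biases see no change when 8-bit state is enabled.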

SGD8bit

class bitsandbytes.optim.SGD8bit(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False, args=None, min_8bit_size=4096)

Parameters

  • params (torch.Tensor) — The model parameters to optimize.
  • lr (float) — The learning rate.
  • momentum (float, defaults to 0) — The momentum coefficient. Larger values give past gradients more weight, which speeds up optimization by taking bigger steps.
  • dampening (float, defaults to 0) — The dampening value, which reduces the momentum of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements a parameter tensor must have for 8-bit optimization to be applied.

8-bit SGD optimizer.
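As a rough, back-of-the-envelope illustration of why 8-bit state matters, the sketch below compares the storage for one momentum buffer at 32 bits and at 8 bits per element. It deliberately ignores any extra metadata that quantization may need, and both the helper `momentum_state_bytes` and the parameter count are hypothetical.

```python
def momentum_state_bytes(numel, bits):
    """Approximate bytes for one momentum buffer of `numel` elements.

    Assumption: only the buffer itself is counted; quantization metadata
    and any other bookkeeping are ignored.
    """
    return numel * bits // 8


numel = 1_000_000  # hypothetical number of parameter elements
bytes_32 = momentum_state_bytes(numel, 32)
bytes_8 = momentum_state_bytes(numel, 8)
saving = 1 - bytes_8 / bytes_32  # fraction of state memory saved
```

Under these simplifying assumptions the 8-bit buffer takes a quarter of the space of the 32-bit one.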

SGD32bit

class bitsandbytes.optim.SGD32bit(params, lr, momentum=0, dampening=0, weight_decay=0, nesterov=False, args=None, min_8bit_size=4096)

Parameters

  • params (torch.Tensor) — The model parameters to optimize.
  • lr (float) — The learning rate.
  • momentum (float, defaults to 0) — The momentum coefficient. Larger values give past gradients more weight, which speeds up optimization by taking bigger steps.
  • dampening (float, defaults to 0) — The dampening value, which reduces the momentum of the optimizer.
  • weight_decay (float, defaults to 0) — The weight decay value for the optimizer.
  • nesterov (bool, defaults to False) — Whether to use Nesterov momentum.
  • args (object, defaults to None) — An object with additional arguments.
  • min_8bit_size (int, defaults to 4096) — The minimum number of elements a parameter tensor must have for 8-bit optimization to be applied.

32-bit SGD optimizer.
