pytorch-optimizer
pytorch-optimizer is a collection of optimizers, learning rate schedulers, and loss functions for PyTorch.
Each algorithm is re-implemented from the original paper, with speed & memory tweaks and optional plug-ins, along with other useful and practical optimization ideas.
Currently, 75 optimizers (plus bitsandbytes and q-galore-torch integrations), 16 lr schedulers, and 13 loss functions are supported!
Highly inspired by the original pytorch-optimizer project (jettify/pytorch-optimizer).
Getting Started
For more, see the documentation.
Most optimizers are under the MIT or Apache 2.0 license, but a few, such as Fromage and Nero, are under the CC BY-NC-SA 4.0 license, which is non-commercial.
So, please double-check the license before using them in your work.
Installation
$ pip3 install pytorch-optimizer
From v2.12.0 and v3.1.0, you can use the bitsandbytes and q-galore-torch optimizers, respectively!
Please check the bnb requirements and the q-galore-torch installation guide before installing them.
From v3.0.0, Python 3.7 support is dropped. However, you can still use this package with Python 3.7 by installing it with the --ignore-requires-python option.
$ pip install "pytorch-optimizer[bitsandbytes]"
Simple Usage
from pytorch_optimizer import AdamP
model = YourModel()
optimizer = AdamP(model.parameters())
# or you can use the optimizer loader, simply passing the name of the optimizer.
from pytorch_optimizer import load_optimizer
optimizer = load_optimizer(optimizer='adamp')(model.parameters())
# if you install the `bitsandbytes` package, you can use its `8-bit` optimizers from `pytorch-optimizer`.
from pytorch_optimizer import load_optimizer
opt = load_optimizer(optimizer='bnb_adamw8bit')
optimizer = opt(model.parameters())
Also, you can load the optimizer via torch.hub.
import torch
model = YourModel()
opt = torch.hub.load('kozistr/pytorch_optimizer', 'adamp')
optimizer = opt(model.parameters())
If you want to build an optimizer with parameters & configs, there is the create_optimizer() API.
from pytorch_optimizer import create_optimizer

optimizer = create_optimizer(
    model,
    'adamp',
    lr=1e-3,
    weight_decay=1e-3,
    use_gc=True,
    use_lookahead=True,
)
Supported Optimizers
You can check the supported optimizers with the code below.
from pytorch_optimizer import get_supported_optimizers
supported_optimizers = get_supported_optimizers()

| Optimizer | Description | Official Code | Paper | Citation |
|---|---|---|---|---|
| AdaBelief | Adapting Step-sizes by the Belief in Observed Gradients | github | https://arxiv.org/abs/2010.07468 | cite |
| AdaBound | Adaptive Gradient Methods with Dynamic Bound of Learning Rate | github | https://openreview.net/forum?id=Bkg3g2R9FX | cite |
| AdaHessian | An Adaptive Second Order Optimizer for Machine Learning | github | https://arxiv.org/abs/2006.00719 | cite |
| AdamD | Improved bias-correction in Adam | | https://arxiv.org/abs/2110.10828 | cite |
| AdamP | Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights | github | https://arxiv.org/abs/2006.08217 | cite |
| diffGrad | An Optimization Method for Convolutional Neural Networks | github | https://arxiv.org/abs/1909.11015v3 | cite |
| MADGRAD | A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization | github | https://arxiv.org/abs/2101.11075 | cite |
| RAdam | On the Variance of the Adaptive Learning Rate and Beyond | github | https://arxiv.org/abs/1908.03265 | cite |
| Ranger | a synergistic optimizer combining RAdam and LookAhead, and now GC in one optimizer | github | https://bit.ly/3zyspC3 | cite |
| Ranger21 | a synergistic deep learning optimizer | github | https://arxiv.org/abs/2106.13731 | cite |
| Lamb | Large Batch Optimization for Deep Learning | github | https://arxiv.org/abs/1904.00962 | cite |
| Shampoo | Preconditioned Stochastic Tensor Optimization | github | https://arxiv.org/abs/1802.09568 | cite |
| Nero | Learning by Turning: Neural Architecture Aware Optimisation | github | https://arxiv.org/abs/2102.07227 | cite |
| Adan | Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models | github | https://arxiv.org/abs/2208.06677 | cite |
| Adai | Disentangling the Effects of Adaptive Learning Rate and Momentum | github | https://arxiv.org/abs/2006.15815 | cite |
| SAM | Sharpness-Aware Minimization | github | https://arxiv.org/abs/2010.01412 | cite |
| ASAM | Adaptive Sharpness-Aware Minimization | github | https://arxiv.org/abs/2102.11600 | cite |
| GSAM | Surrogate Gap Guided Sharpness-Aware Minimization | github | https://openreview.net/pdf?id=edONMAnhLu- | cite |
| D-Adaptation | Learning-Rate-Free Learning by D-Adaptation | github | https://arxiv.org/abs/2301.07733 | cite |
| AdaFactor | Adaptive Learning Rates with Sublinear Memory Cost | github | https://arxiv.org/abs/1804.04235 | cite |
| Apollo | An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization | github | https://arxiv.org/abs/2009.13586 | cite |
| NovoGrad | Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks | github | https://arxiv.org/abs/1905.11286 | cite |
| Lion | Symbolic Discovery of Optimization Algorithms | github | https://arxiv.org/abs/2302.06675 | cite |
| Ali-G | Adaptive Learning Rates for Interpolation with Gradients | github | https://arxiv.org/abs/1906.05661 | cite |
| SM3 | Memory-Efficient Adaptive Optimization | github | https://arxiv.org/abs/1901.11150 | cite |
| AdaNorm | Adaptive Gradient Norm Correction based Optimizer for CNNs | github | https://arxiv.org/abs/2210.06364 | cite |
| RotoGrad | Gradient Homogenization in Multitask Learning | github | https://openreview.net/pdf?id=T8wHz4rnuGL | cite |
| A2Grad | Optimal Adaptive and Accelerated Stochastic Gradient Descent | github | https://arxiv.org/abs/1810.00553 | cite |
| AccSGD | Accelerating Stochastic Gradient Descent For Least Squares Regression | github | https://arxiv.org/abs/1704.08227 | cite |
| SGDW | Decoupled Weight Decay Regularization | github | https://arxiv.org/abs/1711.05101 | cite |
| ASGD | Adaptive Gradient Descent without Descent | github | https://arxiv.org/abs/1910.09529 | cite |
| Yogi | Adaptive Methods for Nonconvex Optimization | | NIPS 2018 | cite |
| SWATS | Improving Generalization Performance by Switching from Adam to SGD | | https://arxiv.org/abs/1712.07628 | cite |
| Fromage | On the distance between two neural networks and the stability of learning | github | https://arxiv.org/abs/2002.03432 | cite |
| MSVAG | Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients | github | https://arxiv.org/abs/1705.07774 | cite |
| AdaMod | An Adaptive and Momental Bound Method for Stochastic Learning | github | https://arxiv.org/abs/1910.12249 | cite |
| AggMo | Aggregated Momentum: Stability Through Passive Damping | github | https://arxiv.org/abs/1804.00325 | cite |
| QHAdam | Quasi-hyperbolic momentum and Adam for deep learning | github | https://arxiv.org/abs/1810.06801 | cite |
| PID | A PID Controller Approach for Stochastic Optimization of Deep Networks | github | CVPR 18 | cite |
| Gravity | a Kinematic Approach on Optimization in Deep Learning | github | https://arxiv.org/abs/2101.09192 | cite |
| AdaSmooth | An Adaptive Learning Rate Method based on Effective Ratio | | https://arxiv.org/abs/2204.00825v1 | cite |
| SRMM | Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates | github | https://arxiv.org/abs/2201.01652 | cite |
| AvaGrad | Domain-independent Dominance of Adaptive Methods | github | https://arxiv.org/abs/1912.01823 | cite |
| PCGrad | Gradient Surgery for Multi-Task Learning | github | https://arxiv.org/abs/2001.06782 | cite |
| AMSGrad | On the Convergence of Adam and Beyond | | https://openreview.net/pdf?id=ryQu7f-RZ | cite |
| Lookahead | k steps forward, 1 step back | github | https://arxiv.org/abs/1907.08610 | cite |
| PNM | Manipulating Stochastic Gradient Noise to Improve Generalization | github | https://arxiv.org/abs/2103.17182 | cite |
| GC | Gradient Centralization | github | https://arxiv.org/abs/2004.01461 | cite |
| AGC | Adaptive Gradient Clipping | github | https://arxiv.org/abs/2102.06171 | cite |
| Stable WD | Understanding and Scheduling Weight Decay | github | https://arxiv.org/abs/2011.11152 | cite |
| Softplus T | Calibrating the Adaptive Learning Rate to Improve Convergence of ADAM | | https://arxiv.org/abs/1908.00700 | cite |
| Un-tuned w/u | On the adequacy of untuned warmup for adaptive optimization | | https://arxiv.org/abs/1910.04209 | cite |
| Norm Loss | An efficient yet effective regularization method for deep neural networks | | https://arxiv.org/abs/2103.06583 | cite |
| AdaShift | Decorrelation and Convergence of Adaptive Learning Rate Methods | github | https://arxiv.org/abs/1810.00143v4 | cite |
| AdaDelta | An Adaptive Learning Rate Method | | https://arxiv.org/abs/1212.5701v1 | cite |
| Amos | An Adam-style Optimizer with Adaptive Weight Decay towards Model-Oriented Scale | github | https://arxiv.org/abs/2210.11693 | cite |
| SignSGD | Compressed Optimisation for Non-Convex Problems | github | https://arxiv.org/abs/1802.04434 | cite |
| Sophia | A Scalable Stochastic Second-order Optimizer for Language Model Pre-training | github | https://arxiv.org/abs/2305.14342 | cite |
| Prodigy | An Expeditiously Adaptive Parameter-Free Learner | github | https://arxiv.org/abs/2306.06101 | cite |
| PAdam | Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks | github | https://arxiv.org/abs/1806.06763 | cite |
| LOMO | Full Parameter Fine-tuning for Large Language Models with Limited Resources | github | https://arxiv.org/abs/2306.09782 | cite |
| AdaLOMO | Low-memory Optimization with Adaptive Learning Rate | github | https://arxiv.org/abs/2310.10195 | cite |
| Tiger | A Tight-fisted Optimizer, an optimizer that is extremely budget-conscious | github | | cite |
| CAME | Confidence-guided Adaptive Memory Efficient Optimization | github | https://aclanthology.org/2023.acl-long.243/ | cite |
| WSAM | Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term | github | https://arxiv.org/abs/2305.15817 | cite |
| Aida | A DNN Optimizer that Improves over AdaBelief by Suppression of the Adaptive Stepsize Range | github | https://arxiv.org/abs/2203.13273 | cite |
| GaLore | Memory-Efficient LLM Training by Gradient Low-Rank Projection | github | https://arxiv.org/abs/2403.03507 | cite |
| Adalite | Adalite optimizer | github | https://github.com/VatsaDev/adalite | cite |
| bSAM | SAM as an Optimal Relaxation of Bayes | github | https://arxiv.org/abs/2210.01620 | cite |
| Schedule-Free | Schedule-Free Optimizers | github | https://github.com/facebookresearch/schedule_free | cite |
| FAdam | Adam is a natural gradient optimizer using diagonal empirical Fisher information | github | https://arxiv.org/abs/2405.12807 | cite |
| Grokfast | Accelerated Grokking by Amplifying Slow Gradients | github | https://arxiv.org/abs/2405.20233 | cite |
| Kate | Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad | github | https://arxiv.org/abs/2403.02648 | cite |
| StableAdamW | Stable and low-precision training for large-scale vision-language models | | https://arxiv.org/abs/2304.13013 | cite |
| AdamMini | Use Fewer Learning Rates To Gain More | github | https://arxiv.org/abs/2406.16793 | cite |
| TRAC | Adaptive Parameter-free Optimization | github | https://arxiv.org/abs/2405.16642 | cite |
| AdamG | Towards Stability of Parameter-free Optimization | | https://arxiv.org/abs/2405.04376 | cite |
Supported LR Schedulers
You can check the supported learning rate schedulers with the code below.
from pytorch_optimizer import get_supported_lr_schedulers
supported_lr_schedulers = get_supported_lr_schedulers()

| LR Scheduler | Description | Official Code | Paper | Citation |
|---|---|---|---|---|
| Explore-Exploit | Wide-minima Density Hypothesis and the Explore-Exploit Learning Rate Schedule | | https://arxiv.org/abs/2003.03977 | cite |
| Chebyshev | Acceleration via Fractal Learning Rate Schedules | | https://arxiv.org/abs/2103.01338 | cite |
| REX | Revisiting Budgeted Training with an Improved Schedule | github | https://arxiv.org/abs/2107.04197 | cite |
| WSD | Warmup-Stable-Decay learning rate scheduler | github | https://arxiv.org/abs/2404.06395 | cite |
Supported Loss Functions
You can check the supported loss functions with the code below.
from pytorch_optimizer import get_supported_loss_functions
supported_loss_functions = get_supported_loss_functions()

| Loss Functions | Description | Official Code | Paper | Citation |
|---|---|---|---|---|
| Label Smoothing | Rethinking the Inception Architecture for Computer Vision | | https://arxiv.org/abs/1512.00567 | cite |
| Focal | Focal Loss for Dense Object Detection | | https://arxiv.org/abs/1708.02002 | cite |
| Focal Cosine | Data-Efficient Deep Learning Method for Image Classification Using Data Augmentation, Focal Cosine Loss, and Ensemble | | https://arxiv.org/abs/2007.07805 | cite |
| LDAM | Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss | github | https://arxiv.org/abs/1906.07413 | cite |
| Jaccard (IOU) | IoU Loss for 2D/3D Object Detection | | https://arxiv.org/abs/1908.03851 | cite |
| Bi-Tempered | Robust Bi-Tempered Logistic Loss Based on Bregman Divergences | | https://arxiv.org/abs/1906.03361 | cite |
| Tversky | Tversky loss function for image segmentation using 3D fully convolutional deep networks | | https://arxiv.org/abs/1706.05721 | cite |
| Lovasz Hinge | A tractable surrogate for the optimization of the intersection-over-union measure in neural networks | github | https://arxiv.org/abs/1705.08790 | cite |
Useful Resources
Several optimization ideas to regularize & stabilize training. Most of these ideas are applied in the Ranger21 optimizer.
Also, most of the figures are taken from the Ranger21 paper.
Adaptive Gradient Clipping
Gradient Centralization
Softplus Transformation
Gradient Normalization
Norm Loss
Positive-Negative Momentum
Linear learning rate warmup
Stable weight decay
Explore-exploit learning rate schedule
Lookahead
Chebyshev learning rate schedule
(Adaptive) Sharpness-Aware Minimization
On the Convergence of Adam and Beyond
Improved bias-correction in Adam
Adaptive Gradient Norm Correction
Adaptive Gradient Clipping
This idea was originally proposed in the NFNet (Normalizer-Free Network) paper. AGC (Adaptive Gradient Clipping) clips gradients based on the unit-wise ratio of gradient norms to parameter norms.
code : github
paper : arXiv
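
As a rough illustration (not the library's exact implementation), unit-wise AGC can be sketched in plain PyTorch as follows; clip_factor and eps follow the conventions of the NFNet paper, and the function name is purely illustrative.

import torch

def agc(param: torch.Tensor, grad: torch.Tensor, clip_factor: float = 1e-2, eps: float = 1e-3) -> torch.Tensor:
    """Clip `grad` unit-wise so its norm never exceeds clip_factor * parameter norm."""
    if param.ndim > 1:
        # one norm per output unit (row / filter)
        p_norm = param.flatten(1).norm(dim=1, keepdim=True).clamp_min(eps)
        g_norm = grad.flatten(1).norm(dim=1, keepdim=True).clamp_min(1e-6)
    else:  # biases / 1-d parameters: treat each element as a unit
        p_norm = param.abs().clamp_min(eps)
        g_norm = grad.abs().clamp_min(1e-6)

    scale = (clip_factor * p_norm / g_norm).clamp(max=1.0)
    if param.ndim > 1:
        scale = scale.view(-1, *([1] * (grad.ndim - 1)))  # broadcast back over the unit's weights
    return grad * scale

Applied to p.grad for each parameter just before optimizer.step(), this leaves small gradients untouched and rescales only those whose norm exceeds clip_factor times the parameter norm.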
Gradient Centralization
Gradient Centralization (GC) operates directly on gradients by centralizing the gradient to have zero mean.
code : github
paper : arXiv
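
A minimal sketch of the idea (illustrative, not the library's code): for every parameter with more than one dimension, subtract the mean over all non-output dimensions from its gradient.

import torch

def centralize_gradient(grad: torch.Tensor) -> torch.Tensor:
    # subtract the per-unit mean so each output unit's gradient has zero mean
    if grad.ndim > 1:
        return grad - grad.mean(dim=tuple(range(1, grad.ndim)), keepdim=True)
    return grad  # 1-d parameters such as biases are left as-is

In this package, GC is typically enabled as a plug-in, as in the create_optimizer example above (use_gc=True).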
Softplus Transformation
By running the final variance denominator through the softplus function, extremely tiny values are lifted so they remain viable.
paper : arXiv
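
A small illustration of the transformation (the values are toy numbers, and beta=50 is a typical choice rather than a prescribed one):

import torch
import torch.nn.functional as F

exp_avg_sq = torch.tensor([1e-12, 1e-4, 1.0])  # toy second-moment estimates
eps = 1e-8

plain_denom = exp_avg_sq.sqrt() + eps                       # standard Adam-style denominator
softplus_denom = F.softplus(exp_avg_sq.sqrt(), beta=50.0)   # softplus-transformed denominator

# tiny values are smoothly lifted toward log(2)/beta (~0.014) so the update stays bounded,
# while larger values pass through the softplus almost unchanged
print(plain_denom, softplus_denom)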
Gradient Normalization
Norm Loss
paper : arXiv
Positive-Negative Momentum
code : github
paper : arXiv
Linear learning rate warmup
paper : arXiv
Stable weight decay
code : github
paper : arXiv
Explore-exploit learning rate schedule
code : github
paper : arXiv
Lookahead
k steps forward, 1 step back. Lookahead keeps an exponential moving average of the weights that is updated and substituted for the current weights every k lookahead steps (5 by default).
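
A minimal, self-contained sketch of the update rule (illustrative only; in this package Lookahead is available as a plug-in, e.g. use_lookahead=True in create_optimizer):

import torch

k, alpha = 5, 0.5
fast = torch.randn(10, requires_grad=True)   # weights updated by the inner ("fast") optimizer
slow = fast.detach().clone()                 # slow weights: an EMA-like copy
inner = torch.optim.SGD([fast], lr=0.1)

for step in range(1, 101):
    loss = (fast ** 2).sum()                 # toy objective
    inner.zero_grad()
    loss.backward()
    inner.step()                             # k fast steps forward ...
    if step % k == 0:                        # ... then 1 slow step back
        slow += alpha * (fast.detach() - slow)
        with torch.no_grad():
            fast.copy_(slow)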
Chebyshev learning rate schedule
Acceleration via Fractal Learning Rate Schedules.
(Adaptive) Sharpness-Aware Minimization
Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
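
Because SAM needs two forward/backward passes per update, it cannot be used as a drop-in optimizer.step(). Below is a sketch of the usual training loop; the constructor arguments and the first_step/second_step names follow the common SAM interface and should be checked against this package's documentation.

import torch
from pytorch_optimizer import SAM  # interface assumed from common SAM implementations; see the docs

model = YourModel()
criterion = torch.nn.CrossEntropyLoss()
# rho controls the size of the neighborhood whose worst-case loss is minimized
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.SGD, lr=0.1, momentum=0.9, rho=0.05)

for x, y in loader:
    # 1st pass: gradients at the current weights, then climb to the worst-case point nearby
    criterion(model(x), y).backward()
    optimizer.first_step(zero_grad=True)

    # 2nd pass: gradients at the perturbed weights, then take the actual update
    criterion(model(x), y).backward()
    optimizer.second_step(zero_grad=True)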
On the Convergence of Adam and Beyond
Convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients.
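
The 'long-term memory' amounts to one extra buffer: the element-wise running maximum of the second-moment estimate is used in the denominator, so the effective step size can only shrink. A toy illustration (PyTorch's built-in Adam exposes the same fix via amsgrad=True):

import torch

exp_avg_sq = torch.tensor([0.04, 0.01])        # current second-moment estimate v_t
max_exp_avg_sq = torch.tensor([0.09, 0.005])   # running max of all previous v_t
max_exp_avg_sq = torch.maximum(max_exp_avg_sq, exp_avg_sq)
denom = max_exp_avg_sq.sqrt() + 1e-8           # AMSGrad divides by the max, not by v_t itself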
Improved bias-correction in Adam
With the default bias-correction, Adam may actually make larger than requested gradient updates early in training.
Adaptive Gradient Norm Correction
Correcting the norm of a gradient in each iteration based on the adaptive training history of gradient norm.
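
Very roughly, and leaving the exact update rule to the paper and to the AdaNorm implementation in this package: keep an exponential moving average of gradient norms and lift gradients whose norm falls below that history.

import torch

# illustrative only: an EMA of gradient norms drives the norm correction
gamma, eps = 0.95, 1e-8
ema_norm = 0.0
for grad in [torch.randn(10) * s for s in (1.0, 0.1, 0.01)]:  # toy gradient stream
    g_norm = grad.norm().item()
    ema_norm = gamma * ema_norm + (1.0 - gamma) * g_norm
    if g_norm < ema_norm:
        grad = grad * (ema_norm / (g_norm + eps))  # norm-corrected gradient used by the update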
Frequently asked questions
here
Visualization
here
Citation
Please cite the original authors of the optimization algorithms; you can easily find them in the tables above!
If you use this software, please cite it as below, or use the "Cite this repository" button.
@software{Kim_pytorch_optimizer_optimizer_2021,
    author = {Kim, Hyeongchan},
    month = jan,
    title = {{pytorch_optimizer: optimizer & lr scheduler & loss function collections in PyTorch}},
    url = {https://github.com/kozistr/pytorch_optimizer},
    version = {3.1.0},
    year = {2021}
}
Maintainer
Hyeongchan Kim / @kozistr