Technical Engineering of Distributed Data intelligence (TEDDi)

TEDDi uses a cosine decay learning rate schedule: the learning rate is gradually decreased over the course of training, following a cosine curve. The learning rate peaks at 6e-5 early in training and is then gradually decreased to a final value of 6e-6.
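
As a rough illustration (not TEDDi's actual implementation), a cosine decay from a peak of 6e-5 down to a floor of 6e-6 can be computed as a simple function of the current training step; the total step count below is a placeholder value.

```python
import math

def cosine_decay_lr(step, total_steps, max_lr=6e-5, min_lr=6e-6):
    """Cosine-decayed learning rate: max_lr at step 0, min_lr at total_steps."""
    progress = min(step, total_steps) / total_steps        # 0.0 -> 1.0
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))    # 1.0 -> 0.0
    return min_lr + (max_lr - min_lr) * cosine

# Example: learning rate at the start, middle, and end of a 100k-step run.
for s in (0, 50_000, 100_000):
    print(s, cosine_decay_lr(s, total_steps=100_000))
```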

Cosine decay, the method TEDDi uses for adjusting the learning rate during training, follows the shape of a cosine curve. Training starts with a warm-up period during which the learning rate gradually increases, allowing the model to explore the loss landscape more effectively. After warm-up, the learning rate follows a cosine curve, gradually decreasing until it reaches a predetermined minimum (or epsilon) value. This decay schedule dampens oscillations and helps the model converge to deeper, narrower regions of the loss surface, potentially avoiding poor local minima. Despite its advantages, cosine decay requires careful selection of the peak learning rate and can lead to plateaus in learning. Adaptive optimizers such as Adam and RMSProp can be used alongside cosine decay for better results.
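
The warm-up phase described above can be layered on top of the same decay schedule. The sketch below linearly ramps the learning rate up to the peak before handing off to the cosine decay; the warm-up length and step counts are illustrative assumptions, not TEDDi's documented values.

```python
import math

def warmup_cosine_lr(step, warmup_steps, total_steps, max_lr=6e-5, min_lr=6e-6):
    """Linear warm-up to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Linear ramp from near 0 up to max_lr over the warm-up period.
        return max_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return min_lr + (max_lr - min_lr) * cosine

# Example: 2k warm-up steps within a 100k-step run (placeholder numbers).
for s in (0, 1_000, 2_000, 50_000, 100_000):
    print(s, warmup_cosine_lr(s, warmup_steps=2_000, total_steps=100_000))
```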

Learn More: About Fine-Tuning