Training Metrics
The Lumo model was fine-tuned on the Lumo 8B Instruct dataset. The following metrics were closely monitored during training:
Training Loss:
The primary metric used to evaluate the model's performance during training.
Calculated using the cross-entropy loss function, which measures the difference between the model's predicted probability distribution and the actual next token in the sequence.
Lower training loss generally indicates better model performance.
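In practice this is standard next-token cross-entropy. A minimal PyTorch sketch (the function name and tensor shapes are illustrative, not taken from the Lumo codebase):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, input_ids: torch.Tensor) -> torch.Tensor:
    # logits: (batch, seq_len, vocab_size); input_ids: (batch, seq_len)
    # Shift so each position predicts the *next* token in the sequence.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()
    # Cross-entropy between the predicted distribution and the true next token.
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )
```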
Validation Loss:
Calculated on the validation set during each training epoch.
Used to monitor the model's performance on unseen data and detect overfitting.
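A minimal sketch of such a validation pass, assuming a Hugging Face-style causal LM that returns the shifted cross-entropy loss when labels are supplied (the loader and field names are assumptions, not documented details):

```python
import torch

@torch.no_grad()
def validation_loss(model, val_loader, device) -> float:
    # Average cross-entropy over the held-out validation set.
    model.eval()
    total, batches = 0.0, 0
    for batch in val_loader:
        input_ids = batch["input_ids"].to(device)
        # Hugging Face-style causal LMs compute the shifted
        # cross-entropy loss internally when labels are passed.
        loss = model(input_ids=input_ids, labels=input_ids).loss
        total += loss.item()
        batches += 1
    model.train()
    return total / max(batches, 1)
```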
Perplexity:
The exponential of the average cross-entropy loss; it reflects how many tokens the model is effectively choosing between at each step.
Lower perplexity indicates that the model is better at predicting the next token, suggesting a better understanding of the data.
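Because perplexity is simply the exponential of the mean cross-entropy (in nats per token), it can be derived directly from the logged loss values:

```python
import math

def perplexity(mean_cross_entropy: float) -> float:
    # Perplexity is the exponential of the mean cross-entropy loss.
    return math.exp(mean_cross_entropy)

# e.g. a validation loss of 2.0 nats/token gives perplexity(2.0) ≈ 7.39:
# on average the model is as uncertain as a uniform choice among ~7.4 tokens.
```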
Training Process
Optimizer: The AdamW optimizer was used to update the model's parameters during training.
Learning Rate: The learning rate was set to 3e-4.
Gradient Accumulation: Gradients were accumulated over several smaller micro-batches before each optimizer step, emulating a larger effective batch size while reducing memory consumption (see the combined sketch after this list).
Learning Rate Scheduler: A StepLR scheduler was used to adjust the learning rate during training, allowing the model to converge more effectively.
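A minimal sketch tying these four choices together. Only the AdamW optimizer, the 3e-4 learning rate, and the StepLR scheduler type are documented above; the epoch count, accumulation steps, and StepLR parameters (step_size, gamma) are illustrative assumptions:

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

def train(model, train_loader, device, epochs=3, accum_steps=8):
    optimizer = AdamW(model.parameters(), lr=3e-4)  # documented learning rate
    scheduler = StepLR(optimizer, step_size=1, gamma=0.9)  # assumed decay settings
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        for step, batch in enumerate(train_loader, start=1):
            input_ids = batch["input_ids"].to(device)
            loss = model(input_ids=input_ids, labels=input_ids).loss
            # Scale the loss so gradients average over the accumulated
            # micro-batches, emulating a batch accum_steps times larger.
            (loss / accum_steps).backward()
            if step % accum_steps == 0:
                optimizer.step()
                optimizer.zero_grad()
        # Flush any leftover accumulated gradients at the epoch boundary.
        if step % accum_steps != 0:
            optimizer.step()
            optimizer.zero_grad()
        # StepLR decays the learning rate once per epoch here.
        scheduler.step()
```

Scaling each micro-batch loss by accum_steps keeps the accumulated gradient equivalent to a single large-batch average, which is what makes the technique a memory saving rather than a change in optimization behavior.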
By carefully monitoring these metrics and adjusting hyperparameters as needed, the team successfully fine-tuned the Lumo model on the Lumo 8B Instruct dataset, achieving state-of-the-art performance on Solana-related tasks.