Complete the migration of the training module

2025-04-17 11:03:05 +08:00
parent 4439687870
commit 74e8f0d415
188 changed files with 32931 additions and 70 deletions
--- a/yolov5/utils/loggers/comet/README.md
+++ b/yolov5/utils/loggers/comet/README.md
@@ -0,0 +1,250 @@
+<img src="https://cdn.comet.ml/img/notebook_logo.png">
+
+# YOLOv5 with Comet
+
+This guide covers how to use YOLOv5 with [Comet](https://bit.ly/yolov5-readme-comet2).
+
+# About Comet
+
+Comet builds tools that help data scientists, engineers, and team leaders accelerate and optimize machine learning and deep learning models.
+
+Track and visualize model metrics in real time, save your hyperparameters, datasets, and model checkpoints, and visualize your model predictions with [Comet Custom Panels](https://www.comet.com/docs/v2/guides/comet-dashboard/code-panels/about-panels/?utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github)! Comet makes sure you never lose track of your work and makes it easy to share results and collaborate across teams of all sizes!
+
+# Getting Started
+
+## Install Comet
+
+```shell
+pip install comet_ml
+```
+
+## Configure Comet Credentials
+
+There are two ways to configure Comet with YOLOv5.
+
+You can either set your credentials through environment variables:
+
+**Environment Variables**
+
+```shell
+export COMET_API_KEY=<Your Comet API Key>
+export COMET_PROJECT_NAME=<Your Comet Project Name> # This will default to 'yolov5'
+```
+
+Or create a `.comet.config` file in your working directory and set your credentials there.
+
+**Comet Configuration File**
+
+```
+[comet]
+api_key=<Your Comet API Key>
+project_name=<Your Comet Project Name> # This will default to 'yolov5'
+```
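The two configuration routes above can be pictured with a small lookup sketch. This is a minimal illustration, not Comet's own loader: `resolve_comet_setting` is a hypothetical helper, and it assumes that an environment variable, when set, takes priority over the `.comet.config` file.

```python
import configparser
import os
from pathlib import Path


def resolve_comet_setting(name, config_path=".comet.config", default=None):
    """Hypothetical helper: read a setting from COMET_<NAME>, then from the [comet] config section."""
    # Assumption: environment variables win over the config file.
    env_value = os.getenv(f"COMET_{name.upper()}")
    if env_value:
        return env_value

    # Treat " # ..." after a value as an inline comment, as in the example file above.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    if Path(config_path).is_file():
        parser.read(config_path)
        if parser.has_option("comet", name):
            return parser.get("comet", name)

    # Neither source provided a value.
    return default
```

For example, `resolve_comet_setting("project_name", default="yolov5")` falls back to `"yolov5"` when neither source defines a project name, which matches the default described above.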
+
+## Run the Training Script
+
+```shell
+# Train YOLOv5s on COCO128 for 5 epochs
+python train.py --img 640 --batch 16 --epochs 5 --data coco128.yaml --weights yolov5s.pt
+```
+
+That's it! Comet will automatically log your hyperparameters, command line arguments, and training and validation metrics. You can visualize and analyze your runs in the Comet UI.
+
+<img width="1920" alt="yolo-ui" src="https://user-images.githubusercontent.com/26833433/202851203-164e94e1-2238-46dd-91f8-de020e9d6b41.png">
+
+# Try out an Example!
+
+Check out an example of a [completed run here](https://www.comet.com/examples/comet-example-yolov5/a0e29e0e9b984e4a822db2a62d0cb357?experiment-tab=chart&showOutliers=true&smoothing=0&transformY=smoothing&xAxis=step&utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github)
+
+Or better yet, try it out yourself in this Colab Notebook
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/comet-ml/comet-examples/blob/master/integrations/model-training/yolov5/notebooks/Comet_and_YOLOv5.ipynb)
+
+# Log automatically
+
+By default, Comet will log the following items
+
+## Metrics
+
+- Box Loss, Object Loss, Classification Loss for the training and validation data
+- mAP_0.5, mAP_0.5:0.95 metrics for the validation data.
+- Precision and Recall for the validation data
+
+## Parameters
+
+- Model Hyperparameters
+- All parameters passed through the command line options
+
+## Visualizations
+
+- Confusion Matrix of the model predictions on the validation data
+- Plots for the PR and F1 curves across all classes
+- Correlogram of the Class Labels
+
+# Configure Comet Logging
+
+Comet can be configured to log additional data either through command line flags passed to the training script or through environment variables.
+
+```shell
+export COMET_MODE=online # Set whether to run Comet in 'online' or 'offline' mode. Defaults to online
+export COMET_MODEL_NAME=<your model name> # Set the name for the saved model. Defaults to yolov5
+export COMET_LOG_CONFUSION_MATRIX=false # Set to disable logging a Comet Confusion Matrix. Defaults to true
+export COMET_MAX_IMAGE_UPLOADS=<number of allowed images to upload to Comet> # Controls how many total image predictions to log to Comet. Defaults to 100.
+export COMET_LOG_PER_CLASS_METRICS=true # Set to log evaluation metrics for each detected class at the end of training. Defaults to false
+export COMET_DEFAULT_CHECKPOINT_FILENAME=<your checkpoint filename> # Set this if you would like to resume training from a different checkpoint. Defaults to 'last.pt'
+export COMET_LOG_BATCH_METRICS=true # Set this if you would like to log training metrics at the batch level. Defaults to false.
+export COMET_LOG_PREDICTIONS=true # Set this to false to disable logging model predictions. Defaults to true
+```
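The logger module in this commit reads its boolean flags by comparing the lowercased value with the string `"true"`. The sketch below mirrors that pattern; `env_flag` and `env_int` are illustrative helper names that do not exist in the commit.

```python
import os


def env_flag(name, default="false"):
    """Return True only when the variable equals 'true', ignoring letter case."""
    return os.getenv(name, default).lower() == "true"


def env_int(name, default):
    """Read an integer setting; environment values always arrive as strings."""
    return int(os.getenv(name, default))


# Illustrative values only.
os.environ["COMET_LOG_PER_CLASS_METRICS"] = "True"
os.environ["COMET_LOG_CONFUSION_MATRIX"] = "1"
os.environ["COMET_MAX_IMAGE_UPLOADS"] = "200"

per_class = env_flag("COMET_LOG_PER_CLASS_METRICS")  # enabled: "True" lowercases to "true"
confusion = env_flag("COMET_LOG_CONFUSION_MATRIX", "true")  # disabled: "1" is not "true"
max_images = env_int("COMET_MAX_IMAGE_UPLOADS", 100)
```

One consequence of this pattern is that values such as `1` or `yes` do not enable a flag, and any value other than `true` disables a flag whose default is `true`.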
+
+## Logging Checkpoints with Comet
+
+Logging Models to Comet is disabled by default. To enable it, pass the `save-period` argument to the training script. Comet will then save checkpoints at the interval given by `save-period`.
+
+```shell
+python train.py \
+--img 640 \
+--batch 16 \
+--epochs 5 \
+--data coco128.yaml \
+--weights yolov5s.pt \
+--save-period 1
+```
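To see which epochs produce a checkpoint upload, here is a sketch of the interval check used by the logger's `on_model_save` callback in this commit. `should_log_checkpoint` is a hypothetical name, and the guard against values below 1 is an added assumption so the sketch never divides by zero.

```python
def should_log_checkpoint(epoch, save_period, final_epoch):
    """Return True when a periodic checkpoint would be logged for this zero-based epoch."""
    # Assumption: anything below 1 disables periodic uploads.
    if save_period < 1:
        return False
    # The final epoch is skipped here; the end-of-training hook logs the final model instead.
    return (epoch + 1) % save_period == 0 and not final_epoch


epochs = 5
logged = [e for e in range(epochs) if should_log_checkpoint(e, 2, final_epoch=(e == epochs - 1))]
```

With `--epochs 5 --save-period 2`, the list holds the zero-based epochs 1 and 3.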
+
+## Logging Model Predictions
+
+By default, model predictions (images, ground truth labels and bounding boxes) will be logged to Comet.
+
+You can control the frequency of logged predictions and the associated images by passing the `bbox_interval` command line argument. This frequency corresponds to every Nth batch of data per epoch. In the example below, we are logging every 2nd batch of data for each epoch. Predictions can be visualized using Comet's Object Detection Custom Panel.
+
+**Note:** The YOLOv5 validation dataloader will default to a batch size of 32, so you will have to set the logging frequency accordingly.
+
+Here is an [example project using the Panel](https://www.comet.com/examples/comet-example-yolov5?shareable=YcwMiJaZSXfcEXpGOHDD12vA1&utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github)
+
+```shell
+python train.py \
+--img 640 \
+--batch 16 \
+--epochs 5 \
+--data coco128.yaml \
+--weights yolov5s.pt \
+--bbox_interval 2
+```
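The batch selection can be sketched as follows, based on the interval logic in the logger module of this commit. The function names are illustrative, and the example assumes 4 validation batches.

```python
def prediction_interval(bbox_interval, epochs):
    """Resolve the logging interval; -1 means 'choose one from the epoch count'."""
    if bbox_interval == -1:
        return 1 if epochs < 10 else epochs // 10
    return bbox_interval


def logged_batches(num_batches, interval):
    """Zero-based indices of the validation batches whose predictions are logged."""
    return [i for i in range(num_batches) if (i + 1) % interval == 0]


interval = prediction_interval(bbox_interval=2, epochs=5)
selected = logged_batches(num_batches=4, interval=interval)
```

Under these assumptions `selected` is `[1, 3]`, that is, the 2nd and 4th validation batches.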
+
+### Controlling the number of Prediction Images logged to Comet
+
+When logging predictions from YOLOv5, Comet will log the images associated with each set of predictions. By default a maximum of 100 validation images are logged. You can increase or decrease this number using the `COMET_MAX_IMAGE_UPLOADS` environment variable.
+
+```shell
+env COMET_MAX_IMAGE_UPLOADS=200 python train.py \
+--img 640 \
+--batch 16 \
+--epochs 5 \
+--data coco128.yaml \
+--weights yolov5s.pt \
+--bbox_interval 1
+```
+
+### Logging Class Level Metrics
+
+Use the `COMET_LOG_PER_CLASS_METRICS` environment variable to log mAP, precision, recall, f1 for each class.
+
+```shell
+env COMET_LOG_PER_CLASS_METRICS=true python train.py \
+--img 640 \
+--batch 16 \
+--epochs 5 \
+--data coco128.yaml \
+--weights yolov5s.pt
+```
+
+## Uploading a Dataset to Comet Artifacts
+
+If you would like to store your data using [Comet Artifacts](https://www.comet.com/docs/v2/guides/data-management/using-artifacts/#learn-more?utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github), you can do so using the `upload_dataset` flag.
+
+The dataset must be organized in the way described in the [YOLOv5 documentation](https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/). The dataset config `yaml` file must follow the same format as that of the `coco128.yaml` file.
+
+```shell
+python train.py \
+--img 640 \
+--batch 16 \
+--epochs 5 \
+--data coco128.yaml \
+--weights yolov5s.pt \
+--upload_dataset
+```
+
+You can find the uploaded dataset in the Artifacts tab in your Comet Workspace <img width="1073" alt="artifact-1" src="https://user-images.githubusercontent.com/7529846/186929193-162718bf-ec7b-4eb9-8c3b-86b3763ef8ea.png">
+
+You can preview the data directly in the Comet UI. <img width="1082" alt="artifact-2" src="https://user-images.githubusercontent.com/7529846/186929215-432c36a9-c109-4eb0-944b-84c2786590d6.png">
+
+Artifacts are versioned and also support adding metadata about the dataset. Comet will automatically log the metadata from your dataset `yaml` file <img width="963" alt="artifact-3" src="https://user-images.githubusercontent.com/7529846/186929256-9d44d6eb-1a19-42de-889a-bcbca3018f2e.png">
+
+### Using a saved Artifact
+
+If you would like to use a dataset from Comet Artifacts, set the `path` variable in your dataset `yaml` file to point to the following Artifact resource URL.
+
+```
+# contents of artifact.yaml file
+path: "comet://<workspace name>/<artifact name>:<artifact version or alias>"
+```
+
+Then pass this file to your training script in the following way
+
+```shell
+python train.py \
+--img 640 \
+--batch 16 \
+--epochs 5 \
+--data artifact.yaml \
+--weights yolov5s.pt
+```
+
+Artifacts also allow you to track the lineage of data as it flows through your Experimentation workflow. Here you can see a graph that shows you all the experiments that have used your uploaded dataset. <img width="1391" alt="artifact-4" src="https://user-images.githubusercontent.com/7529846/186929264-4c4014fa-fe51-4f3c-a5c5-f6d24649b1b4.png">
+
+## Resuming a Training Run
+
+If your training run is interrupted for any reason, e.g. disrupted internet connection, you can resume the run using the `resume` flag and the Comet Run Path.
+
+The Run Path has the following format `comet://<your workspace name>/<your project name>/<experiment id>`.
+
+This will restore the run to its state before the interruption, which includes restoring the model from a checkpoint, restoring all hyperparameters and training arguments, and downloading Comet dataset Artifacts if they were used in the original run. The resumed run will continue logging to the existing Experiment in the Comet UI.
+
+```shell
+python train.py \
+--resume "comet://<your run path>"
+```
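A Run Path in the format given above can be split into its three parts with the standard library. This is a minimal sketch; `parse_run_path` is a hypothetical helper, not the function the integration itself uses.

```python
from urllib.parse import urlparse


def parse_run_path(run_path):
    """Split 'comet://<workspace>/<project>/<experiment id>' into its three parts."""
    parsed = urlparse(run_path)
    if parsed.scheme != "comet":
        raise ValueError(f"Expected a comet:// run path, got: {run_path}")

    # urlparse places the first component in netloc and the rest in path.
    workspace = parsed.netloc
    parts = parsed.path.strip("/").split("/")
    if not workspace or len(parts) != 2 or not all(parts):
        raise ValueError(f"Expected comet://<workspace>/<project>/<experiment id>, got: {run_path}")

    project_name, experiment_id = parts
    return workspace, project_name, experiment_id


# Placeholder values for illustration.
workspace, project_name, experiment_id = parse_run_path("comet://my-team/yolov5/abc123")
```

Validating the path up front gives a clear error for a malformed value before any training state is restored.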
+
+## Hyperparameter Search with the Comet Optimizer
+
+YOLOv5 is also integrated with Comet's Optimizer, making it simple to visualize hyperparameter sweeps in the Comet UI.
+
+### Configuring an Optimizer Sweep
+
+To configure the Comet Optimizer, you will have to create a JSON file with the information about the sweep. An example file has been provided in `utils/loggers/comet/optimizer_config.json`
+
+```shell
+python utils/loggers/comet/hpo.py \
+  --comet_optimizer_config "utils/loggers/comet/optimizer_config.json"
+```
+
+The `hpo.py` script accepts the same arguments as `train.py`. If you wish to pass additional arguments to your sweep simply add them after the script.
+
+```shell
+python utils/loggers/comet/hpo.py \
+  --comet_optimizer_config "utils/loggers/comet/optimizer_config.json" \
+  --save-period 1 \
+  --bbox_interval 1
+```
+
+### Running a Sweep in Parallel
+
+```shell
+comet optimizer -j <set number of workers> utils/loggers/comet/hpo.py \
+  utils/loggers/comet/optimizer_config.json
+```
+
+### Visualizing Results
+
+Comet provides a number of ways to visualize the results of your sweep. Take a look at a [project with a completed sweep here](https://www.comet.com/examples/comet-example-yolov5/view/PrlArHGuuhDTKC1UuBmTtOSXD/panels?utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github)
+
+<img width="1626" alt="hyperparameter-yolo" src="https://user-images.githubusercontent.com/7529846/186914869-7dc1de14-583f-4323-967b-c9a66a29e495.png">
--- a/yolov5/utils/loggers/comet/__init__.py
+++ b/yolov5/utils/loggers/comet/__init__.py
@@ -0,0 +1,549 @@
+# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
+
+import glob
+import json
+import logging
+import os
+import sys
+from pathlib import Path
+
+logger = logging.getLogger(__name__)
+
+FILE = Path(__file__).resolve()
+ROOT = FILE.parents[3]  # YOLOv5 root directory
+if str(ROOT) not in sys.path:
+    sys.path.append(str(ROOT))  # add ROOT to PATH
+
+try:
+    import comet_ml
+
+    # Project Configuration
+    config = comet_ml.config.get_config()
+    COMET_PROJECT_NAME = config.get_string(os.getenv("COMET_PROJECT_NAME"), "comet.project_name", default="yolov5")
+except ImportError:
+    comet_ml = None
+    COMET_PROJECT_NAME = None
+
+import PIL
+import torch
+import torchvision.transforms as T
+import yaml
+
+from utils.dataloaders import img2label_paths
+from utils.general import check_dataset, scale_boxes, xywh2xyxy
+from utils.metrics import box_iou
+
+COMET_PREFIX = "comet://"
+
+COMET_MODE = os.getenv("COMET_MODE", "online")
+
+# Model Saving Settings
+COMET_MODEL_NAME = os.getenv("COMET_MODEL_NAME", "yolov5")
+
+# Dataset Artifact Settings
+COMET_UPLOAD_DATASET = os.getenv("COMET_UPLOAD_DATASET", "false").lower() == "true"
+
+# Evaluation Settings
+COMET_LOG_CONFUSION_MATRIX = os.getenv("COMET_LOG_CONFUSION_MATRIX", "true").lower() == "true"
+COMET_LOG_PREDICTIONS = os.getenv("COMET_LOG_PREDICTIONS", "true").lower() == "true"
+COMET_MAX_IMAGE_UPLOADS = int(os.getenv("COMET_MAX_IMAGE_UPLOADS", 100))
+
+# Confusion Matrix Settings
+CONF_THRES = float(os.getenv("CONF_THRES", 0.001))
+IOU_THRES = float(os.getenv("IOU_THRES", 0.6))
+
+# Batch Logging Settings
+COMET_LOG_BATCH_METRICS = os.getenv("COMET_LOG_BATCH_METRICS", "false").lower() == "true"
+COMET_BATCH_LOGGING_INTERVAL = int(os.getenv("COMET_BATCH_LOGGING_INTERVAL", 1))  # cast: env values are strings
+COMET_PREDICTION_LOGGING_INTERVAL = int(os.getenv("COMET_PREDICTION_LOGGING_INTERVAL", 1))
+COMET_LOG_PER_CLASS_METRICS = os.getenv("COMET_LOG_PER_CLASS_METRICS", "false").lower() == "true"
+
+RANK = int(os.getenv("RANK", -1))
+
+to_pil = T.ToPILImage()
+
+
+class CometLogger:
+    """Log metrics, parameters, source code, models and much more with Comet."""
+
+    def __init__(self, opt, hyp, run_id=None, job_type="Training", **experiment_kwargs) -> None:
+        """Initializes CometLogger with given options, hyperparameters, run ID, job type, and additional experiment
+        arguments.
+        """
+        self.job_type = job_type
+        self.opt = opt
+        self.hyp = hyp
+
+        # Comet Flags
+        self.comet_mode = COMET_MODE
+
+        self.save_model = opt.save_period > -1
+        self.model_name = COMET_MODEL_NAME
+
+        # Batch Logging Settings
+        self.log_batch_metrics = COMET_LOG_BATCH_METRICS
+        self.comet_log_batch_interval = COMET_BATCH_LOGGING_INTERVAL
+
+        # Dataset Artifact Settings
+        self.upload_dataset = self.opt.upload_dataset or COMET_UPLOAD_DATASET
+        self.resume = self.opt.resume
+
+        self.default_experiment_kwargs = {
+            "log_code": False,
+            "log_env_gpu": True,
+            "log_env_cpu": True,
+            "project_name": COMET_PROJECT_NAME,
+        } | experiment_kwargs
+        self.experiment = self._get_experiment(self.comet_mode, run_id)
+        self.experiment.set_name(self.opt.name)
+
+        self.data_dict = self.check_dataset(self.opt.data)
+        self.class_names = self.data_dict["names"]
+        self.num_classes = self.data_dict["nc"]
+
+        self.logged_images_count = 0
+        self.max_images = COMET_MAX_IMAGE_UPLOADS
+
+        if run_id is None:
+            self.experiment.log_other("Created from", "YOLOv5")
+            if not isinstance(self.experiment, comet_ml.OfflineExperiment):
+                workspace, project_name, experiment_id = self.experiment.url.split("/")[-3:]
+                self.experiment.log_other(
+                    "Run Path",
+                    f"{workspace}/{project_name}/{experiment_id}",
+                )
+            self.log_parameters(vars(opt))
+            self.log_parameters(self.opt.hyp)
+            self.log_asset_data(
+                self.opt.hyp,
+                name="hyperparameters.json",
+                metadata={"type": "hyp-config-file"},
+            )
+            self.log_asset(
+                f"{self.opt.save_dir}/opt.yaml",
+                metadata={"type": "opt-config-file"},
+            )
+
+        self.comet_log_confusion_matrix = COMET_LOG_CONFUSION_MATRIX
+
+        if hasattr(self.opt, "conf_thres"):
+            self.conf_thres = self.opt.conf_thres
+        else:
+            self.conf_thres = CONF_THRES
+        if hasattr(self.opt, "iou_thres"):
+            self.iou_thres = self.opt.iou_thres
+        else:
+            self.iou_thres = IOU_THRES
+
+        self.log_parameters({"val_iou_threshold": self.iou_thres, "val_conf_threshold": self.conf_thres})
+
+        self.comet_log_predictions = COMET_LOG_PREDICTIONS
+        if self.opt.bbox_interval == -1:
+            self.comet_log_prediction_interval = 1 if self.opt.epochs < 10 else self.opt.epochs // 10
+        else:
+            self.comet_log_prediction_interval = self.opt.bbox_interval
+
+        if self.comet_log_predictions:
+            self.metadata_dict = {}
+            self.logged_image_names = []
+
+        self.comet_log_per_class_metrics = COMET_LOG_PER_CLASS_METRICS
+
+        self.experiment.log_others(
+            {
+                "comet_mode": COMET_MODE,
+                "comet_max_image_uploads": COMET_MAX_IMAGE_UPLOADS,
+                "comet_log_per_class_metrics": COMET_LOG_PER_CLASS_METRICS,
+                "comet_log_batch_metrics": COMET_LOG_BATCH_METRICS,
+                "comet_log_confusion_matrix": COMET_LOG_CONFUSION_MATRIX,
+                "comet_model_name": COMET_MODEL_NAME,
+            }
+        )
+
+        # Check if running the Experiment with the Comet Optimizer
+        if hasattr(self.opt, "comet_optimizer_id"):
+            self.experiment.log_other("optimizer_id", self.opt.comet_optimizer_id)
+            self.experiment.log_other("optimizer_objective", self.opt.comet_optimizer_objective)
+            self.experiment.log_other("optimizer_metric", self.opt.comet_optimizer_metric)
+            self.experiment.log_other("optimizer_parameters", json.dumps(self.hyp))
+
+    def _get_experiment(self, mode, experiment_id=None):
+        """Returns a new or existing Comet.ml experiment based on mode and optional experiment_id."""
+        if mode == "offline":
+            return (
+                comet_ml.ExistingOfflineExperiment(
+                    previous_experiment=experiment_id,
+                    **self.default_experiment_kwargs,
+                )
+                if experiment_id is not None
+                else comet_ml.OfflineExperiment(
+                    **self.default_experiment_kwargs,
+                )
+            )
+        try:
+            if experiment_id is not None:
+                return comet_ml.ExistingExperiment(
+                    previous_experiment=experiment_id,
+                    **self.default_experiment_kwargs,
+                )
+
+            return comet_ml.Experiment(**self.default_experiment_kwargs)
+
+        except ValueError:
+            logger.warning(
+                "COMET WARNING: "
+                "Comet credentials have not been set. "
+                "Comet will default to offline logging. "
+                "Please set your credentials to enable online logging."
+            )
+            return self._get_experiment("offline", experiment_id)
+
+    def log_metrics(self, log_dict, **kwargs):
+        """Logs metrics to the current experiment, accepting a dictionary of metric names and values."""
+        self.experiment.log_metrics(log_dict, **kwargs)
+
+    def log_parameters(self, log_dict, **kwargs):
+        """Logs parameters to the current experiment, accepting a dictionary of parameter names and values."""
+        self.experiment.log_parameters(log_dict, **kwargs)
+
+    def log_asset(self, asset_path, **kwargs):
+        """Logs a file or directory as an asset to the current experiment."""
+        self.experiment.log_asset(asset_path, **kwargs)
+
+    def log_asset_data(self, asset, **kwargs):
+        """Logs in-memory data as an asset to the current experiment, with optional kwargs."""
+        self.experiment.log_asset_data(asset, **kwargs)
+
+    def log_image(self, img, **kwargs):
+        """Logs an image to the current experiment with optional kwargs."""
+        self.experiment.log_image(img, **kwargs)
+
+    def log_model(self, path, opt, epoch, fitness_score, best_model=False):
+        """Logs model checkpoint to experiment with path, options, epoch, fitness, and best model flag."""
+        if not self.save_model:
+            return
+
+        model_metadata = {
+            "fitness_score": fitness_score[-1],
+            "epochs_trained": epoch + 1,
+            "save_period": opt.save_period,
+            "total_epochs": opt.epochs,
+        }
+
+        model_files = glob.glob(f"{path}/*.pt")
+        for model_path in model_files:
+            name = Path(model_path).name
+
+            self.experiment.log_model(
+                self.model_name,
+                file_or_folder=model_path,
+                file_name=name,
+                metadata=model_metadata,
+                overwrite=True,
+            )
+
+    def check_dataset(self, data_file):
+        """Validates the dataset configuration by loading the YAML file specified in `data_file`."""
+        with open(data_file) as f:
+            data_config = yaml.safe_load(f)
+
+        path = data_config.get("path")
+        if path and path.startswith(COMET_PREFIX):
+            path = data_config["path"].replace(COMET_PREFIX, "")
+            return self.download_dataset_artifact(path)
+        self.log_asset(self.opt.data, metadata={"type": "data-config-file"})
+
+        return check_dataset(data_file)
+
+    def log_predictions(self, image, labelsn, path, shape, predn):
+        """Logs predictions with IOU filtering, given image, labels, path, shape, and predictions."""
+        if self.logged_images_count >= self.max_images:
+            return
+        detections = predn[predn[:, 4] > self.conf_thres]
+        iou = box_iou(labelsn[:, 1:], detections[:, :4])
+        mask, _ = torch.where(iou > self.iou_thres)
+        if len(mask) == 0:
+            return
+
+        filtered_detections = detections[mask]
+        filtered_labels = labelsn[mask]
+
+        image_id = path.split("/")[-1].split(".")[0]
+        image_name = f"{image_id}_curr_epoch_{self.experiment.curr_epoch}"
+        if image_name not in self.logged_image_names:
+            native_scale_image = PIL.Image.open(path)
+            self.log_image(native_scale_image, name=image_name)
+            self.logged_image_names.append(image_name)
+
+        metadata = [
+            {
+                "label": f"{self.class_names[int(cls)]}-gt",
+                "score": 100,
+                "box": {"x": xyxy[0], "y": xyxy[1], "x2": xyxy[2], "y2": xyxy[3]},
+            }
+            for cls, *xyxy in filtered_labels.tolist()
+        ]
+        metadata.extend(
+            {
+                "label": f"{self.class_names[int(cls)]}",
+                "score": conf * 100,
+                "box": {"x": xyxy[0], "y": xyxy[1], "x2": xyxy[2], "y2": xyxy[3]},
+            }
+            for *xyxy, conf, cls in filtered_detections.tolist()
+        )
+        self.metadata_dict[image_name] = metadata
+        self.logged_images_count += 1
+
+        return
+
+    def preprocess_prediction(self, image, labels, shape, pred):
+        """Processes prediction data, resizing labels and adding dataset metadata."""
+        nl, _ = labels.shape[0], pred.shape[0]
+
+        # Predictions
+        if self.opt.single_cls:
+            pred[:, 5] = 0
+
+        predn = pred.clone()
+        scale_boxes(image.shape[1:], predn[:, :4], shape[0], shape[1])
+
+        labelsn = None
+        if nl:
+            tbox = xywh2xyxy(labels[:, 1:5])  # target boxes
+            scale_boxes(image.shape[1:], tbox, shape[0], shape[1])  # native-space labels
+            labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels
+            # predn was already rescaled to native space above; scale_boxes works in place, so it is not repeated
+
+        return predn, labelsn
+
+    def add_assets_to_artifact(self, artifact, path, asset_path, split):
+        """Adds image and label assets to a Comet artifact given dataset split and paths."""
+        img_paths = sorted(glob.glob(f"{asset_path}/*"))
+        label_paths = img2label_paths(img_paths)
+
+        for image_file, label_file in zip(img_paths, label_paths):
+            image_logical_path, label_logical_path = map(lambda x: os.path.relpath(x, path), [image_file, label_file])
+
+            try:
+                artifact.add(
+                    image_file,
+                    logical_path=image_logical_path,
+                    metadata={"split": split},
+                )
+                artifact.add(
+                    label_file,
+                    logical_path=label_logical_path,
+                    metadata={"split": split},
+                )
+            except ValueError as e:
+                logger.error("COMET ERROR: Error adding file to Artifact. Skipping file.")
+                logger.error(f"COMET ERROR: {e}")
+                continue
+
+        return artifact
+
+    def upload_dataset_artifact(self):
+        """Uploads a YOLOv5 dataset as an artifact to the Comet.ml platform."""
+        dataset_name = self.data_dict.get("dataset_name", "yolov5-dataset")
+        path = str((ROOT / Path(self.data_dict["path"])).resolve())
+
+        metadata = self.data_dict.copy()
+        for key in ["train", "val", "test"]:
+            split_path = metadata.get(key)
+            if split_path is not None:
+                metadata[key] = split_path.replace(path, "")
+
+        artifact = comet_ml.Artifact(name=dataset_name, artifact_type="dataset", metadata=metadata)
+        for key in metadata.keys():
+            if key in ["train", "val", "test"]:
+                if isinstance(self.upload_dataset, str) and (key != self.upload_dataset):
+                    continue
+
+                asset_path = self.data_dict.get(key)
+                if asset_path is not None:
+                    artifact = self.add_assets_to_artifact(artifact, path, asset_path, key)
+
+        self.experiment.log_artifact(artifact)
+
+        return
+
+    def download_dataset_artifact(self, artifact_path):
+        """Downloads a dataset artifact to a specified directory using the experiment's logged artifact."""
+        logged_artifact = self.experiment.get_artifact(artifact_path)
+        artifact_save_dir = str(Path(self.opt.save_dir) / logged_artifact.name)
+        logged_artifact.download(artifact_save_dir)
+
+        metadata = logged_artifact.metadata
+        data_dict = metadata.copy()
+        data_dict["path"] = artifact_save_dir
+
+        metadata_names = metadata.get("names")
+        if isinstance(metadata_names, dict):
+            data_dict["names"] = {int(k): v for k, v in metadata_names.items()}
+        elif isinstance(metadata_names, list):
+            data_dict["names"] = dict(enumerate(metadata_names))
+        else:
+            raise ValueError("Invalid 'names' field in dataset yaml file. Please use a list or dictionary")
+
+        return self.update_data_paths(data_dict)
+
+    def update_data_paths(self, data_dict):
+        """Updates data paths in the dataset dictionary, defaulting 'path' to an empty string if not present."""
+        path = data_dict.get("path", "")
+
+        for split in ["train", "val", "test"]:
+            if data_dict.get(split):
+                split_path = data_dict.get(split)
+                # Check the value (a str or a list of str), not the split key, which is always a str
+                data_dict[split] = (
+                    f"{path}/{split_path}" if isinstance(split_path, str) else [f"{path}/{x}" for x in split_path]
+                )
+
+        return data_dict
+
+    def on_pretrain_routine_end(self, paths):
+        """Called at the end of pretraining routine to handle paths if training is not being resumed."""
+        if self.opt.resume:
+            return
+
+        for path in paths:
+            self.log_asset(str(path))
+
+        if self.upload_dataset and not self.resume:
+            self.upload_dataset_artifact()
+
+        return
+
+    def on_train_start(self):
+        """Logs hyperparameters at the start of training."""
+        self.log_parameters(self.hyp)
+
+    def on_train_epoch_start(self):
+        """Called at the start of each training epoch."""
+        return
+
+    def on_train_epoch_end(self, epoch):
+        """Updates the current epoch in the experiment tracking at the end of each epoch."""
+        self.experiment.curr_epoch = epoch
+
+        return
+
+    def on_train_batch_start(self):
+        """Called at the start of each training batch."""
+        return
+
+    def on_train_batch_end(self, log_dict, step):
+        """Callback function that updates and logs metrics at the end of each training batch if conditions are met."""
+        self.experiment.curr_step = step
+        if self.log_batch_metrics and (step % self.comet_log_batch_interval == 0):
+            self.log_metrics(log_dict, step=step)
+
+        return
+
+    def on_train_end(self, files, save_dir, last, best, epoch, results):
+        """Logs metadata and optionally saves model files at the end of training."""
+        if self.comet_log_predictions:
+            curr_epoch = self.experiment.curr_epoch
+            self.experiment.log_asset_data(self.metadata_dict, "image-metadata.json", epoch=curr_epoch)
+
+        for f in files:
+            self.log_asset(f, metadata={"epoch": epoch})
+        self.log_asset(f"{save_dir}/results.csv", metadata={"epoch": epoch})
+
+        if not self.opt.evolve:
+            model_path = str(best if best.exists() else last)
+            name = Path(model_path).name
+            if self.save_model:
+                self.experiment.log_model(
+                    self.model_name,
+                    file_or_folder=model_path,
+                    file_name=name,
+                    overwrite=True,
+                )
+
+        # Check if running Experiment with Comet Optimizer
+        if hasattr(self.opt, "comet_optimizer_id"):
+            metric = results.get(self.opt.comet_optimizer_metric)
+            self.experiment.log_other("optimizer_metric_value", metric)
+
+        self.finish_run()
+
+    def on_val_start(self):
+        """Called at the start of validation, currently a placeholder with no functionality."""
+        return
+
+    def on_val_batch_start(self):
+        """Placeholder called at the start of a validation batch with no current functionality."""
+        return
+
+    def on_val_batch_end(self, batch_i, images, targets, paths, shapes, outputs):
+        """Callback executed at the end of a validation batch, conditionally logs predictions to Comet ML."""
+        if not (self.comet_log_predictions and ((batch_i + 1) % self.comet_log_prediction_interval == 0)):
+            return
+
+        for si, pred in enumerate(outputs):
+            if len(pred) == 0:
+                continue
+
+            image = images[si]
+            labels = targets[targets[:, 0] == si, 1:]
+            shape = shapes[si]
+            path = paths[si]
+            predn, labelsn = self.preprocess_prediction(image, labels, shape, pred)
+            if labelsn is not None:
+                self.log_predictions(image, labelsn, path, shape, predn)
+
+        return
+
+    def on_val_end(self, nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix):
+        """Logs per-class metrics to Comet.ml after validation if enabled and more than one class exists."""
+        if self.comet_log_per_class_metrics and self.num_classes > 1:
+            for i, c in enumerate(ap_class):
+                class_name = self.class_names[c]
+                self.experiment.log_metrics(
+                    {
+                        "mAP@.5": ap50[i],
+                        "mAP@.5:.95": ap[i],
+                        "precision": p[i],
+                        "recall": r[i],
+                        "f1": f1[i],
+                        "true_positives": tp[i],
+                        "false_positives": fp[i],
+                        "support": nt[c],
+                    },
+                    prefix=class_name,
+                )
+
+        if self.comet_log_confusion_matrix:
+            epoch = self.experiment.curr_epoch
+            class_names = list(self.class_names.values())
+            class_names.append("background")
+            num_classes = len(class_names)
+
+            self.experiment.log_confusion_matrix(
+                matrix=confusion_matrix.matrix,
+                max_categories=num_classes,
+                labels=class_names,
+                epoch=epoch,
+                column_label="Actual Category",
+                row_label="Predicted Category",
+                file_name=f"confusion-matrix-epoch-{epoch}.json",
+            )
+
+    def on_fit_epoch_end(self, result, epoch):
+        """Logs metrics at the end of each training epoch."""
+        self.log_metrics(result, epoch=epoch)
+
+    def on_model_save(self, last, epoch, final_epoch, best_fitness, fi):
+        """Callback to save model checkpoints periodically if conditions are met."""
+        if self.opt.save_period > 0 and (epoch + 1) % self.opt.save_period == 0 and not final_epoch:
+            self.log_model(last.parent, self.opt, epoch, fi, best_model=best_fitness == fi)
+
+    def on_params_update(self, params):
+        """Logs updated parameters during training."""
+        self.log_parameters(params)
+
+    def finish_run(self):
+        """Ends the current experiment and logs its completion."""
+        self.experiment.end()
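Review note on `on_model_save`: the periodic-checkpoint gate can be sketched in isolation as below. This is only an illustration, assuming `save_period` is either -1 (disabled) or a positive integer, and assuming `final_epoch` is true only on the last epoch. The helper name `should_log_checkpoint` is hypothetical and does not exist in the commit.

```python
def should_log_checkpoint(epoch, save_period, final_epoch):
    """Mirror the on_model_save gate for save_period == -1 or save_period > 0."""
    if save_period == -1:  # periodic checkpoint logging disabled
        return False
    # Log every `save_period` epochs (1-based), but never on the final epoch.
    return (epoch + 1) % save_period == 0 and not final_epoch


def logged_epochs(total_epochs, save_period):
    """List the 0-based epochs at which a checkpoint would be logged."""
    return [
        epoch
        for epoch in range(total_epochs)
        if should_log_checkpoint(epoch, save_period, final_epoch=(epoch + 1 == total_epochs))
    ]


if __name__ == "__main__":
    print(logged_epochs(total_epochs=5, save_period=2))
```

With 5 epochs and `--save-period 2`, only the 0-based epochs 1 and 3 qualify, because the final epoch is skipped by the `not final_epoch` check.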
--- a/yolov5/utils/loggers/comet/comet_utils.py
+++ b/yolov5/utils/loggers/comet/comet_utils.py
@@ -0,0 +1,151 @@
+# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
+
+import logging
+import os
+from urllib.parse import urlparse
+
+try:
+    import comet_ml
+except ImportError:
+    comet_ml = None
+
+import yaml
+
+logger = logging.getLogger(__name__)
+
+COMET_PREFIX = "comet://"
+COMET_MODEL_NAME = os.getenv("COMET_MODEL_NAME", "yolov5")
+COMET_DEFAULT_CHECKPOINT_FILENAME = os.getenv("COMET_DEFAULT_CHECKPOINT_FILENAME", "last.pt")
+
+
+def download_model_checkpoint(opt, experiment):
+    """Downloads YOLOv5 model checkpoint from Comet ML experiment, updating `opt.weights` with download path."""
+    model_dir = f"{opt.project}/{experiment.name}"
+    os.makedirs(model_dir, exist_ok=True)
+
+    model_name = COMET_MODEL_NAME
+    model_asset_list = experiment.get_model_asset_list(model_name)
+
+    if len(model_asset_list) == 0:
+        logger.error(f"COMET ERROR: No checkpoints found for model name: {model_name}")
+        return
+
+    model_asset_list = sorted(
+        model_asset_list,
+        key=lambda x: x["step"],
+        reverse=True,
+    )
+    logged_checkpoint_map = {asset["fileName"]: asset["assetId"] for asset in model_asset_list}
+
+    resource_url = urlparse(opt.weights)
+    checkpoint_filename = resource_url.query
+
+    if checkpoint_filename:
+        asset_id = logged_checkpoint_map.get(checkpoint_filename)
+    else:
+        asset_id = logged_checkpoint_map.get(COMET_DEFAULT_CHECKPOINT_FILENAME)
+        checkpoint_filename = COMET_DEFAULT_CHECKPOINT_FILENAME
+
+    if asset_id is None:
+        logger.error(f"COMET ERROR: Checkpoint {checkpoint_filename} not found in the given Experiment")
+        return
+
+    try:
+        logger.info(f"COMET INFO: Downloading checkpoint {checkpoint_filename}")
+        asset_filename = checkpoint_filename
+
+        model_binary = experiment.get_asset(asset_id, return_type="binary", stream=False)
+        model_download_path = f"{model_dir}/{asset_filename}"
+        with open(model_download_path, "wb") as f:
+            f.write(model_binary)
+
+        opt.weights = model_download_path
+
+    except Exception as e:
+        logger.warning("COMET WARNING: Unable to download checkpoint from Comet")
+        logger.exception(e)
+
+
+def set_opt_parameters(opt, experiment):
+    """
+    Update the opts Namespace with parameters from Comet's ExistingExperiment when resuming a run.
+
+    Args:
+        opt (argparse.Namespace): Namespace of command line options
+        experiment (comet_ml.APIExperiment): Comet API Experiment object
+    """
+    asset_list = experiment.get_asset_list()
+    resume_string = opt.resume
+
+    for asset in asset_list:
+        if asset["fileName"] == "opt.yaml":
+            asset_id = asset["assetId"]
+            asset_binary = experiment.get_asset(asset_id, return_type="binary", stream=False)
+            opt_dict = yaml.safe_load(asset_binary)
+            for key, value in opt_dict.items():
+                setattr(opt, key, value)
+            opt.resume = resume_string
+
+    # Save hyperparameters to YAML file
+    # Necessary to pass checks in training script
+    save_dir = f"{opt.project}/{experiment.name}"
+    os.makedirs(save_dir, exist_ok=True)
+
+    hyp_yaml_path = f"{save_dir}/hyp.yaml"
+    with open(hyp_yaml_path, "w") as f:
+        yaml.dump(opt.hyp, f)
+    opt.hyp = hyp_yaml_path
+
+
+def check_comet_weights(opt):
+    """
+    Downloads model weights from Comet and updates the weights path to point to saved weights location.
+
+    Args:
+        opt (argparse.Namespace): Command Line arguments passed
+            to YOLOv5 training script
+
+    Returns:
+        None/bool: Return True if `opt.weights` is a Comet path and a download
+            was attempted (failures are only logged), else return None
+    """
+    if comet_ml is None:
+        return
+
+    if isinstance(opt.weights, str) and opt.weights.startswith(COMET_PREFIX):
+        api = comet_ml.API()
+        resource = urlparse(opt.weights)
+        experiment_path = f"{resource.netloc}{resource.path}"
+        experiment = api.get(experiment_path)
+        download_model_checkpoint(opt, experiment)
+        return True
+
+    return None
+
+
+def check_comet_resume(opt):
+    """
+    Restores run parameters to their original state based on the model checkpoint and logged Experiment parameters.
+
+    Args:
+        opt (argparse.Namespace): Command Line arguments passed
+            to YOLOv5 training script
+
+    Returns:
+        None/bool: Return True if `opt.resume` is a Comet path and a restore
+            was attempted (download failures are only logged), else return None
+    """
+    if comet_ml is None:
+        return
+
+    if isinstance(opt.resume, str) and opt.resume.startswith(COMET_PREFIX):
+        api = comet_ml.API()
+        resource = urlparse(opt.resume)
+        experiment_path = f"{resource.netloc}{resource.path}"
+        experiment = api.get(experiment_path)
+        set_opt_parameters(opt, experiment)
+        download_model_checkpoint(opt, experiment)
+
+        return True
+
+    return None
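Review note on `comet_utils.py`: both `check_comet_weights` and `check_comet_resume` depend on how `urlparse` splits a `comet://` path, with the query string doubling as the checkpoint filename. A minimal sketch of that split follows. The helper name `split_comet_path` and the workspace, project, and experiment values are made up for illustration and are not part of the commit.

```python
from urllib.parse import urlparse

COMET_PREFIX = "comet://"
DEFAULT_CHECKPOINT = "last.pt"  # same default as COMET_DEFAULT_CHECKPOINT_FILENAME


def split_comet_path(path):
    """Return (experiment_path, checkpoint_filename) for a comet:// path."""
    if not path.startswith(COMET_PREFIX):
        raise ValueError(f"not a Comet path: {path}")
    resource = urlparse(path)
    # netloc holds the first path segment, path holds the rest (with a leading slash).
    experiment_path = f"{resource.netloc}{resource.path}"
    # Everything after '?' is treated as the checkpoint filename.
    checkpoint = resource.query or DEFAULT_CHECKPOINT
    return experiment_path, checkpoint


if __name__ == "__main__":
    # Hypothetical workspace/project/experiment identifiers.
    print(split_comet_path("comet://my-workspace/my-project/abc123?best.pt"))
    print(split_comet_path("comet://my-workspace/my-project/abc123"))
```

When no `?<filename>` suffix is given, the sketch falls back to `last.pt`, which matches the fallback branch in `download_model_checkpoint`.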
--- a/yolov5/utils/loggers/comet/hpo.py
+++ b/yolov5/utils/loggers/comet/hpo.py
@@ -0,0 +1,126 @@
+# Ultralytics 🚀 AGPL-3.0 License - https://ultralytics.com/license
+
+import argparse
+import json
+import logging
+import os
+import sys
+from pathlib import Path
+
+import comet_ml
+
+logger = logging.getLogger(__name__)
+
+FILE = Path(__file__).resolve()
+ROOT = FILE.parents[3]  # YOLOv5 root directory
+if str(ROOT) not in sys.path:
+    sys.path.append(str(ROOT))  # add ROOT to PATH
+
+from train import train
+from utils.callbacks import Callbacks
+from utils.general import increment_path
+from utils.torch_utils import select_device
+
+# Project Configuration
+config = comet_ml.config.get_config()
+COMET_PROJECT_NAME = config.get_string(os.getenv("COMET_PROJECT_NAME"), "comet.project_name", default="yolov5")
+
+
+def get_args(known=False):
+    """Parses command-line arguments for YOLOv5 training, supporting configuration of weights, data paths,
+    hyperparameters, and more.
+    """
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--weights", type=str, default=ROOT / "yolov5s.pt", help="initial weights path")
+    parser.add_argument("--cfg", type=str, default="", help="model.yaml path")
+    parser.add_argument("--data", type=str, default=ROOT / "data/coco128.yaml", help="dataset.yaml path")
+    parser.add_argument("--hyp", type=str, default=ROOT / "data/hyps/hyp.scratch-low.yaml", help="hyperparameters path")
+    parser.add_argument("--epochs", type=int, default=300, help="total training epochs")
+    parser.add_argument("--batch-size", type=int, default=16, help="total batch size for all GPUs, -1 for autobatch")
+    parser.add_argument("--imgsz", "--img", "--img-size", type=int, default=640, help="train, val image size (pixels)")
+    parser.add_argument("--rect", action="store_true", help="rectangular training")
+    parser.add_argument("--resume", nargs="?", const=True, default=False, help="resume most recent training")
+    parser.add_argument("--nosave", action="store_true", help="only save final checkpoint")
+    parser.add_argument("--noval", action="store_true", help="only validate final epoch")
+    parser.add_argument("--noautoanchor", action="store_true", help="disable AutoAnchor")
+    parser.add_argument("--noplots", action="store_true", help="save no plot files")
+    parser.add_argument("--evolve", type=int, nargs="?", const=300, help="evolve hyperparameters for x generations")
+    parser.add_argument("--bucket", type=str, default="", help="gsutil bucket")
+    parser.add_argument("--cache", type=str, nargs="?", const="ram", help='--cache images in "ram" (default) or "disk"')
+    parser.add_argument("--image-weights", action="store_true", help="use weighted image selection for training")
+    parser.add_argument("--device", default="", help="cuda device, i.e. 0 or 0,1,2,3 or cpu")
+    parser.add_argument("--multi-scale", action="store_true", help="vary img-size +/- 50%%")
+    parser.add_argument("--single-cls", action="store_true", help="train multi-class data as single-class")
+    parser.add_argument("--optimizer", type=str, choices=["SGD", "Adam", "AdamW"], default="SGD", help="optimizer")
+    parser.add_argument("--sync-bn", action="store_true", help="use SyncBatchNorm, only available in DDP mode")
+    parser.add_argument("--workers", type=int, default=8, help="max dataloader workers (per RANK in DDP mode)")
+    parser.add_argument("--project", default=ROOT / "runs/train", help="save to project/name")
+    parser.add_argument("--name", default="exp", help="save to project/name")
+    parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
+    parser.add_argument("--quad", action="store_true", help="quad dataloader")
+    parser.add_argument("--cos-lr", action="store_true", help="cosine LR scheduler")
+    parser.add_argument("--label-smoothing", type=float, default=0.0, help="Label smoothing epsilon")
+    parser.add_argument("--patience", type=int, default=100, help="EarlyStopping patience (epochs without improvement)")
+    parser.add_argument("--freeze", nargs="+", type=int, default=[0], help="Freeze layers: backbone=10, first3=0 1 2")
+    parser.add_argument("--save-period", type=int, default=-1, help="Save checkpoint every x epochs (disabled if < 1)")
+    parser.add_argument("--seed", type=int, default=0, help="Global training seed")
+    parser.add_argument("--local_rank", type=int, default=-1, help="Automatic DDP Multi-GPU argument, do not modify")
+
+    # Weights & Biases arguments
+    parser.add_argument("--entity", default=None, help="W&B: Entity")
+    parser.add_argument("--upload_dataset", nargs="?", const=True, default=False, help='W&B: Upload data, "val" option')
+    parser.add_argument("--bbox_interval", type=int, default=-1, help="W&B: Set bounding-box image logging interval")
+    parser.add_argument("--artifact_alias", type=str, default="latest", help="W&B: Version of dataset artifact to use")
+
+    # Comet Arguments
+    parser.add_argument("--comet_optimizer_config", type=str, help="Comet: Path to a Comet Optimizer Config File.")
+    parser.add_argument("--comet_optimizer_id", type=str, help="Comet: ID of the Comet Optimizer sweep.")
+    parser.add_argument("--comet_optimizer_objective", type=str, help="Comet: Set to 'minimize' or 'maximize'.")
+    parser.add_argument("--comet_optimizer_metric", type=str, help="Comet: Metric to Optimize.")
+    parser.add_argument(
+        "--comet_optimizer_workers",
+        type=int,
+        default=1,
+        help="Comet: Number of Parallel Workers to use with the Comet Optimizer.",
+    )
+
+    return parser.parse_known_args()[0] if known else parser.parse_args()
+
+
+def run(parameters, opt):
+    """Executes YOLOv5 training with given hyperparameters and options, setting up device and training directories."""
+    hyp_dict = {k: v for k, v in parameters.items() if k not in ["epochs", "batch_size"]}
+
+    opt.save_dir = str(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok or opt.evolve))
+    opt.batch_size = parameters.get("batch_size", opt.batch_size)  # keep CLI value if not swept
+    opt.epochs = parameters.get("epochs", opt.epochs)  # keep CLI value if not swept
+
+    device = select_device(opt.device, batch_size=opt.batch_size)
+    train(hyp_dict, opt, device, callbacks=Callbacks())
+
+
+if __name__ == "__main__":
+    opt = get_args(known=True)
+
+    opt.weights = str(opt.weights)
+    opt.cfg = str(opt.cfg)
+    opt.data = str(opt.data)
+    opt.project = str(opt.project)
+
+    optimizer_id = os.getenv("COMET_OPTIMIZER_ID")
+    if optimizer_id is None:
+        with open(opt.comet_optimizer_config) as f:
+            optimizer_config = json.load(f)
+        optimizer = comet_ml.Optimizer(optimizer_config)
+    else:
+        optimizer = comet_ml.Optimizer(optimizer_id)
+
+    opt.comet_optimizer_id = optimizer.id
+    status = optimizer.status()
+
+    opt.comet_optimizer_objective = status["spec"]["objective"]
+    opt.comet_optimizer_metric = status["spec"]["metric"]
+
+    logger.info("COMET INFO: Starting Hyperparameter Sweep")
+    for parameter in optimizer.get_parameters():
+        run(parameter["parameters"], opt)
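Review note on `hpo.py`: `run` treats `epochs` and `batch_size` as run options and passes every other suggested parameter on as a training hyperparameter. A minimal sketch of that split, using a hand-written parameter dictionary in place of a real Comet Optimizer suggestion; the helper name `split_parameters` and the sample values are illustrative assumptions.

```python
RESERVED_KEYS = ("epochs", "batch_size")  # consumed by opt, not by the hyp dict


def split_parameters(parameters):
    """Separate run options from training hyperparameters."""
    hyp = {k: v for k, v in parameters.items() if k not in RESERVED_KEYS}
    run_options = {k: parameters[k] for k in RESERVED_KEYS if k in parameters}
    return hyp, run_options


if __name__ == "__main__":
    # Stand-in for one suggestion's parameter set; the values are made up.
    sample = {"lr0": 0.01, "momentum": 0.937, "epochs": 5, "batch_size": 16}
    hyp, run_options = split_parameters(sample)
    print(hyp)
    print(run_options)
```

If a sweep does not include `epochs` or `batch_size`, `run_options` simply omits them, so the caller can decide which fallback value to use.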