Sample Train Config

from pathlib import Path

current_dir = Path(__file__).parent  # model_config.yaml sits next to this config file


class ESDConfig(BaseConfig):  # BaseConfig is the library's shared configuration base class

    def __init__(self, **kwargs):
        # Training parameters
        self.train_method = "xattn"  # Choices: ["noxattn", "selfattn", "xattn", "full", "notime", "xlayer", "selflayer"]
        self.start_guidance = 0.1  # Optional: guidance of start image (previously alpha)
        self.negative_guidance = 0.0  # Optional: guidance of negative training
        self.iterations = 1  # Optional: iterations used to train (previously epochs)
        self.lr = 1e-5  # Optional: learning rate
        self.image_size = 512  # Optional: image size used to train
        self.ddim_steps = 50  # Optional: DDIM steps of inference

        # Model configuration
        self.model_config_path = current_dir / "model_config.yaml"
        self.ckpt_path = "models/compvis/style50/compvis.ckpt"  # Checkpoint path for Stable Diffusion

        # Dataset directories
        self.raw_dataset_dir = "data/quick-canvas-dataset/sample"
        self.processed_dataset_dir = "mu/algorithms/esd/data"
        self.dataset_type = "unlearncanvas"  # Choices: ['unlearncanvas', 'i2p']
        self.template = "style"  # Choices: ['object', 'style', 'i2p']
        self.template_name = "Abstractionism"  # Choices: ['self-harm', 'Abstractionism']

        # Output configurations
        self.output_dir = "outputs/esd/finetuned_models"
        self.separator = None

        # Device configuration
        self.devices = "0,0"
        self.use_sample = True

        # For backward compatibility
        self.interpolation = "bicubic"  # Interpolation method
        self.ddim_eta = 0.0  # Eta for DDIM
        self.num_workers = 4  # Number of workers for data loading
        self.pin_memory = True  # Pin memory for faster transfer to GPU
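A minimal usage sketch, assuming the config object is created directly and individual fields are overridden by plain attribute assignment before training; the commented-out trainer call is hypothetical and not taken from this document:

config = ESDConfig()

# Override individual fields after instantiation.
config.ckpt_path = "models/compvis/style50/compvis.ckpt"
config.template = "style"
config.template_name = "Abstractionism"
config.devices = "0,0"

# trainer = ESDAlgorithm(config)  # hypothetical entry point; consult the library docs
# trainer.run()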

Sample Model Config

model:
  base_learning_rate: 1.0e-04
  target: stable_diffusion.ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    image_size: 32
    channels: 4
    cond_stage_trainable: false   # Note: different from the one we trained before
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False

    scheduler_config: # 10000 warmup steps
      target: stable_diffusion.ldm.lr_scheduler.LambdaLinearScheduler
      params:
        warm_up_steps: [ 10000 ]
        cycle_lengths: [ 10000000000000 ] # incredibly large number to prevent corner cases
        f_start: [ 1.e-6 ]
        f_max: [ 1. ]
        f_min: [ 1. ]

    unet_config:
      target: stable_diffusion.ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

    first_stage_config:
      target: stable_diffusion.ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: stable_diffusion.ldm.modules.encoders.modules.FrozenCLIPEmbedder

Description of the arguments used in the train_config class

This is the configuration used to train a Stable Diffusion model with the ESD (Erase Stable Diffusion) method. It defines parameters related to training, model setup, dataset handling, and output. Below is a detailed description of each section and parameter:

Training Parameters

These parameters control the fine-tuning process, including the training method, guidance scales, learning rate, and number of iterations.

  • train_method: Specifies the training method, i.e. which parts of the model are updated (a sketch of how this selection might work follows this list).

    • Type: str
    • Choices: noxattn, selfattn, xattn, full, notime, xlayer, selflayer
    • Example: xattn
  • start_guidance: Guidance scale for generating initial images during training. Affects the diversity of the training set.

    • Type: float
    • Example: 0.1
  • negative_guidance: Guidance scale for erasing the target concept during training.

    • Type: float
    • Example: 0.0
  • iterations: Number of training iterations (similar to epochs).

    • Type: int
    • Example: 1
  • lr: Learning rate used by the optimizer for fine-tuning.

    • Type: float
    • Example: 1e-5
  • image_size: Size of images used during training and sampling (in pixels).

    • Type: int
    • Example: 512
  • ddim_steps: Number of diffusion steps used in the DDIM sampling process.

    • Type: int
    • Example: 50
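As an illustration of how train_method might map to the parameters that actually receive gradient updates, here is a sketch whose name filters are modeled on the original ESD training script; the library's exact filters may differ, and the xlayer / selflayer variants (which restrict the attention filters to specific blocks) are omitted:

def select_trainable_params(model, train_method):
    """Pick the U-Net parameters to fine-tune based on train_method (illustrative filters)."""
    params = []
    for name, param in model.model.diffusion_model.named_parameters():
        if train_method == "full":
            params.append(param)                       # whole U-Net
        elif train_method == "xattn" and "attn2" in name:
            params.append(param)                       # cross-attention layers only
        elif train_method == "selfattn" and "attn1" in name:
            params.append(param)                       # self-attention layers only
        elif train_method == "noxattn" and not (
            name.startswith("out.") or "attn2" in name or "time_embed" in name
        ):
            params.append(param)                       # everything except cross-attn / time embedding
        elif train_method == "notime" and not (
            name.startswith("out.") or "time_embed" in name
        ):
            params.append(param)                       # everything except the time embedding
    return params


# optimizer = torch.optim.Adam(select_trainable_params(model, "xattn"), lr=1e-5)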

Model Configuration

These parameters specify the Stable Diffusion model checkpoint and configuration file.

  • model_config_path: Path to the YAML file defining the model architecture and parameters.

    • Type: str
    • Example: mu/algorithms/esd/configs/model_config.yaml
  • ckpt_path: Path to the fine-tuned Stable Diffusion model checkpoint.

    • Type: str
    • Example: '../models/compvis/style50/compvis.ckpt'

Dataset Configuration

These parameters define the dataset type and template for training, specifying whether to focus on objects, styles, or inappropriate content.

  • dataset_type: Type of dataset used for training. Use generic if you want to use your own dataset.

    • Type: str
    • Choices: unlearncanvas, i2p, generic
    • Example: unlearncanvas
  • template: Type of concept or style to erase during training.

    • Type: str
    • Choices: object, style, i2p
    • Example: style
  • template_name: Specific name of the object or style to erase (e.g., "Abstractionism"); a prompt-construction sketch follows this list.

    • Type: str
    • Example Choices: Abstractionism, self-harm
    • Example: Abstractionism
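For intuition, template and template_name might be combined into the text prompt whose concept is erased; the phrasing below is hypothetical and the library's actual prompt templates may differ:

def build_erase_prompt(template, template_name):
    """Hypothetical mapping from (template, template_name) to the concept prompt to erase."""
    if template == "style":
        return f"An image in {template_name} style"
    if template == "object":
        return f"An image of {template_name}"
    return template_name  # e.g. the raw concept string for the i2p template


prompt = build_erase_prompt("style", "Abstractionism")  # "An image in Abstractionism style"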

Output Configuration

These parameters control where the outputs of the training process, such as fine-tuned models, are stored.

  • output_dir: Directory where the fine-tuned model and training results will be saved.

    • Type: str
    • Example: outputs/esd/finetuned_models
  • separator: Separator character used to handle multiple prompts during training. If set to None, no splitting occurs (see the sketch after this list).

    • Type: str or None
    • Example: None
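A small sketch of how a separator might be used, assuming it splits one prompt string into several concepts; both example values below are hypothetical, and the sample config leaves separator as None:

separator = ","                          # hypothetical value; the sample config uses None
prompt = "Abstractionism, Byzantine"     # hypothetical multi-concept string

if separator is not None:
    words = [w.strip() for w in prompt.split(separator)]   # ["Abstractionism", "Byzantine"]
else:
    words = [prompt]                                        # single concept, no special handling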

Device Configuration

These parameters define the compute resources for training.

  • devices: Specifies the CUDA devices used for training. Provide a comma-separated list of device IDs (parsed as in the sketch after this list).

    • Type: str
    • Example: 0,1
  • use_sample: Boolean flag indicating whether to use a sample dataset for testing or debugging.

    • Type: bool
    • Example: True
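A minimal sketch of how the devices string could be parsed, assuming one CUDA device per entry; the sample value "0,0" would place both entries on GPU 0, presumably one per model copy, since ESD keeps a frozen reference model alongside the trainable one:

import torch

device_str = "0,0"  # value from ESDConfig.devices
devices = [torch.device(f"cuda:{int(d)}") for d in device_str.split(",")]
# -> [device(type='cuda', index=0), device(type='cuda', index=0)]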