segmind/SSD-1B · Hugging Face
Demo
Try out the model at Segmind SSD-1B for ⚡ fastest inference. You can also try it on 🤗 Spaces.
Model Description
The Segmind Stable Diffusion Model (SSD-1B) is a distilled, 50% smaller version of Stable Diffusion XL (SDXL), offering a 60% speedup while maintaining high-quality text-to-image generation capabilities. It has been trained on diverse datasets, including Grit and Midjourney scrape data, to enhance its ability to create a wide range of visual content based on textual prompts.
This model employs a knowledge distillation strategy, where it leverages the teachings of several expert models in succession, including SDXL, ZavyChromaXL, and JuggernautXL, to combine their strengths and produce impressive visual outputs.
Special thanks to the HF team 🤗, especially Sayak, Patrick and Poli, for their collaboration and guidance on this work.
Image Comparison (SDXL-1.0 vs SSD-1B)
Usage:
This model can be used via the 🧨 Diffusers library.
Make sure to install diffusers from source by running
pip install git+https://github.com/huggingface/diffusers
In addition, please install transformers, safetensors and accelerate:
pip install transformers accelerate safetensors
To use the model, you can run the following:
from diffusers import StableDiffusionXLPipeline
import torch

# Load the fp16 weights and move the pipeline to the GPU
pipe = StableDiffusionXLPipeline.from_pretrained("segmind/SSD-1B", torch_dtype=torch.float16, use_safetensors=True, variant="fp16")
pipe.to("cuda")

prompt = "An astronaut riding a green horse"
neg_prompt = "ugly, blurry, poor quality"
# The pipeline returns a list of PIL images; take the first one
image = pipe(prompt=prompt, negative_prompt=neg_prompt).images[0]
Please do use negative prompting, and a CFG around 9.0 for the best quality!
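In Diffusers, the CFG value recommended above corresponds to the `guidance_scale` argument of the pipeline call. The following is a minimal sketch under stated assumptions: the helper name `generation_kwargs`, the fixed seed, and the output filename are illustrative and do not come from the model card.

```python
def generation_kwargs(prompt, negative_prompt, guidance_scale=9.0):
    """Collect the call arguments recommended above (hypothetical helper)."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,  # negative prompting, as advised
        "guidance_scale": guidance_scale,    # CFG scale, around 9.0
    }


def main():
    # Heavy imports stay inside main() so the helper is usable without a GPU.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "segmind/SSD-1B",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    pipe.to("cuda")

    kwargs = generation_kwargs(
        "An astronaut riding a green horse",
        "ugly, blurry, poor quality",
    )
    # Fixed seed (an assumption, for repeatability) so runs can be compared.
    generator = torch.Generator(device="cuda").manual_seed(0)
    image = pipe(generator=generator, **kwargs).images[0]
    image.save("astronaut.png")
```

Calling `main()` on a machine with a CUDA GPU downloads the weights and writes one image. Varying `guidance_scale` around 9.0 with the same seed is one way to see how strongly the prompt is enforced.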
Model Description
Key Features
- Text-to-Image Generation: The model excels at generating images from text prompts, enabling a wide range of creative applications.
- Distilled for Speed: Designed for efficiency, this model offers a 60% speedup, making it a practical choice for real-time applications and scenarios where rapid image generation is essential.
- Diverse Training Data: Trained on diverse datasets, the model can handle a variety of textual prompts and generate corresponding images effectively.
- Knowledge Distillation: By distilling knowledge from several expert models, the Segmind Stable Diffusion Model combines their strengths and minimizes their limitations, resulting in improved performance.
Model Architecture
The SSD-1B Model is a 1.3B-parameter model that has several layers removed from the base SDXL model.
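One way to check the parameter count locally is to sum the parameter sizes of the loaded network. This sketch assumes the 1.3B figure refers to the denoising UNet (`pipe.unet`); the card does not say which components are counted, and the helper names are illustrative.

```python
def count_parameters(module):
    """Total number of elements across all parameters of a torch.nn.Module."""
    return sum(p.numel() for p in module.parameters())


def report_unet_size(model_id="segmind/SSD-1B"):
    # Imports live here so count_parameters can be used without diffusers.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    # Assumption: the UNet is the component the headline figure describes.
    n = count_parameters(pipe.unet)
    print(f"{model_id} UNet: {n / 1e9:.2f}B parameters")
    return n
```

Running `report_unet_size()` and then `report_unet_size("stabilityai/stable-diffusion-xl-base-1.0")` would give a side-by-side comparison with the base model.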
Training info
These are the key hyperparameters used during training:
- Steps: 251000
- Learning rate: 1e-5
- Batch size: 32
- Gradient accumulation steps: 4
- Image resolution: 1024
- Mixed-precision: fp16
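The hyperparameters above can be combined into a rough estimate of the effective batch size and the number of images processed. This is only a sketch: it assumes the batch size is per device, that "Steps" counts optimizer updates, and that a single device was used. None of this is stated in the card, so `num_devices=1` is a placeholder.

```python
def effective_batch_size(batch_size, grad_accum_steps, num_devices=1):
    """Samples contributing to one optimizer update."""
    return batch_size * grad_accum_steps * num_devices


def images_seen(steps, batch_size, grad_accum_steps, num_devices=1):
    """Total samples processed, assuming `steps` counts optimizer updates."""
    return steps * effective_batch_size(batch_size, grad_accum_steps, num_devices)


# Values from the list above; num_devices is an unknown left at 1.
print(effective_batch_size(32, 4))   # 128
print(images_seen(251000, 32, 4))    # 32128000
```

If training used several GPUs, both numbers scale linearly with `num_devices`.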
Speed Comparison
We have observed that SSD-1B is up to 60% faster than the base SDXL model. Below is a comparison on an A100 80GB.
Below are the speed-up metrics on an RTX 4090 GPU.
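The card does not describe how the speedup was measured. A minimal sketch of one way to time two pipelines and turn the latencies into a "percent faster" figure follows. The function names and the definition of the percentage (ratio of mean latencies minus one) are assumptions.

```python
import time


def mean_latency(fn, warmup=1, runs=3):
    """Average wall-clock seconds per call of fn, after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up: excludes one-off costs such as kernel compilation
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def percent_faster(baseline_s, candidate_s):
    """How much faster the candidate is, as a percentage of its own latency."""
    return (baseline_s / candidate_s - 1.0) * 100.0
```

With two loaded pipelines, `percent_faster(mean_latency(lambda: sdxl_pipe(prompt=p)), mean_latency(lambda: ssd_pipe(prompt=p)))` would give a comparable number. Here `sdxl_pipe`, `ssd_pipe`, and `p` are placeholders for objects you create yourself.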
Model Sources
For research and development purposes, the SSD-1B Model can be accessed via the Segmind AI platform. For more information and access details, please visit Segmind.
Uses
Direct Use
The Segmind Stable Diffusion Model is suitable for research and practical applications in various domains, including:
- Art and Design: It can be used to generate artworks, designs, and other creative content, providing inspiration and enhancing the creative process.
- Education: The model can be applied in educational tools to create visual content for teaching and learning purposes.
- Research: Researchers can use the model to explore generative models, evaluate its performance, and push the boundaries of text-to-image generation.
- Safe Content Generation: It offers a safe and controlled way to generate content, reducing the risk of harmful or inappropriate outputs.
- Bias and Limitation Assessment: Researchers and developers can use the model to probe its limitations and biases, contributing to a better understanding of generative models' behavior.
Downstream Use
The Segmind Stable Diffusion Model can also be used directly with the 🧨 Diffusers library training scripts for further training, including:
# LoRA fine-tuning
export MODEL_NAME="segmind/SSD-1B"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --dataset_name=$DATASET_NAME --caption_column="text" \
  --resolution=1024 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=2 --checkpointing_steps=500 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --seed=42 \
  --output_dir="sd-pokemon-model-lora-sdxl" \
  --validation_prompt="cute dragon creature" --report_to="wandb" \
  --push_to_hub

# Full fine-tuning
export MODEL_NAME="segmind/SSD-1B"
export VAE_NAME="madebyollin/sdxl-vae-fp16-fix"
export DATASET_NAME="lambdalabs/pokemon-blip-captions"

accelerate launch train_text_to_image_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_model_name_or_path=$VAE_NAME \
  --dataset_name=$DATASET_NAME \
  --enable_xformers_memory_efficient_attention \
  --resolution=512 --center_crop --random_flip \
  --proportion_empty_prompts=0.2 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 --gradient_checkpointing \
  --max_train_steps=10000 \
  --use_8bit_adam \
  --learning_rate=1e-06 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --mixed_precision="fp16" \
  --report_to="wandb" \
  --validation_prompt="a cute Sundar Pichai creature" --validation_epochs 5 \
  --checkpointing_steps=5000 \
  --output_dir="sdxl-pokemon-model" \
  --push_to_hub

# DreamBooth LoRA
export MODEL_NAME="segmind/SSD-1B"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="lora-trained-xl"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"

accelerate launch train_dreambooth_lora_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
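Once one of the LoRA runs above has finished, the resulting weights can be attached to the base pipeline for inference with the Diffusers `load_lora_weights` method. The following is a minimal sketch: the helper name `generate_with_lora` and the output filename are illustrative, and the local path assumes the `OUTPUT_DIR` value from the DreamBooth LoRA script.

```python
def generate_with_lora(pipe, lora_path, prompt, **call_kwargs):
    """Attach LoRA weights to an already-loaded pipeline and sample one image."""
    pipe.load_lora_weights(lora_path)
    return pipe(prompt=prompt, **call_kwargs).images[0]


def main():
    # Heavy imports stay inside main() so the helper is usable on its own.
    import torch
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        "segmind/SSD-1B",
        torch_dtype=torch.float16,
        use_safetensors=True,
        variant="fp16",
    )
    pipe.to("cuda")

    # "lora-trained-xl" is assumed to be the training output directory.
    image = generate_with_lora(
        pipe,
        "lora-trained-xl",
        "A photo of sks dog in a bucket",
        negative_prompt="ugly, blurry, poor quality",
        guidance_scale=9.0,
    )
    image.save("sks_dog.png")
```

Passing the pipeline in as an argument keeps the helper independent of how the base model was loaded, so the same function works with a Hub repository ID in place of the local directory.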
Out-of-Scope Use
The SSD-1B Model is not suitable for creating factual or accurate representations of people, events, or real-world information. It is not intended for tasks requiring high precision and accuracy.
Limitations and Bias
The SSD-1B Model has some challenges in embodying absolute photorealism, especially in human depictions. While it grapples with incorporating clear text and maintaining the fidelity of complex compositions due to its autoencoding approach, these hurdles pave the way for future enhancements. Importantly, the model's exposure to a diverse dataset, though not a panacea for ingrained societal and digital biases, represents a foundational step towards more equitable technology. Users are encouraged to interact with this pioneering tool with an understanding of its current limitations, fostering an environment of conscious engagement and anticipation for its continued evolution.