imaginAIry
AI imagined images. Pythonic generation of stable diffusion images.
“just works” on Linux and macOS(M1) (and maybe windows?).
Examples
# on macOS, make sure rust is installed first
>> pip install imaginairy
>> imagine "a scenic landscape" "a photo of a dog" "photo of a fruit bowl" "portrait photo of a freckled woman"
# Stable Diffusion 2.1
>> imagine --model SD-2.1 "a forest"
Console Output
received 4 prompt(s) and will repeat them 1 times to create 4 images.
Loading model onto mps backend...
Generating: "a scenic landscape" 512x512px seed:557988237 prompt-strength:7.5 steps:40 sampler-type:PLMS
PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:29<00:00, 1.36it/s]
saved to: ./outputs/000001_557988237_PLMS40_PS7.5_a_scenic_landscape.jpg
Generating: "a photo of a dog" 512x512px seed:277230171 prompt-strength:7.5 steps:40 sampler-type:PLMS
PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:28<00:00, 1.41it/s]
saved to: ./outputs/000002_277230171_PLMS40_PS7.5_a_photo_of_a_dog.jpg
Generating: "photo of a fruit bowl" 512x512px seed:639753980 prompt-strength:7.5 steps:40 sampler-type:PLMS
PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:28<00:00, 1.40it/s]
saved to: ./outputs/000003_639753980_PLMS40_PS7.5_photo_of_a_fruit_bowl.jpg
Generating: "portrait photo of a freckled woman" 512x512px seed:500686645 prompt-strength:7.5 steps:40 sampler-type:PLMS
PLMS Sampler: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:29<00:00, 1.37it/s]
saved to: ./outputs/000004_500686645_PLMS40_PS7.5_portrait_photo_of_a_freckled_woman.jpg
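The saved filenames in the console output appear to encode the generation settings (index, seed, sampler, steps, prompt strength, prompt). The sketch below recovers those fields from a filename. The pattern is inferred from the log lines shown here and is an assumption, not a documented format; `OutputName` and `parse_output_name` are hypothetical helpers, not part of imaginairy.

```python
import re
from typing import NamedTuple


class OutputName(NamedTuple):
    index: int
    seed: int
    sampler: str
    steps: int
    prompt_strength: float
    prompt_slug: str


# Pattern inferred from the log lines above (an assumption, not a documented format):
#   <index>_<seed>_<sampler><steps>_PS<prompt-strength>_<prompt slug>.jpg
# Sampler names that contain digits or underscores would need a looser pattern.
_NAME_RE = re.compile(
    r"^(?P<index>\d+)_(?P<seed>\d+)_(?P<sampler>[A-Za-z]+)(?P<steps>\d+)"
    r"_PS(?P<strength>\d+(?:\.\d+)?)_(?P<slug>.+)\.jpg$"
)


def parse_output_name(filename: str) -> OutputName:
    """Split an output filename into the settings it appears to encode."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognized output filename: {filename!r}")
    return OutputName(
        index=int(match["index"]),
        seed=int(match["seed"]),
        sampler=match["sampler"],
        steps=int(match["steps"]),
        prompt_strength=float(match["strength"]),
        prompt_slug=match["slug"],
    )


info = parse_output_name("000001_557988237_PLMS40_PS7.5_a_scenic_landscape.jpg")
print(info.seed, info.sampler, info.steps, info.prompt_strength)
```

This can be handy for re-running a favorite image with the same seed and settings.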
Edit Images with Instructions alone! by InstructPix2Pix
Just tell imaginairy how to edit the image and it will do it for you!
Use prompt strength to control how strong the edit is. For extra control you can combine
with prompt-based masking.
>> aimg edit scenic_landscape.jpg "make it winter" --prompt-strength 20
>> aimg edit dog.jpg "make the dog purple" --prompt-strength 5
>> aimg edit bowl_of_fruit.jpg "replace the fruit with strawberries"
>> aimg edit freckled_woman.jpg "make her a cyborg" --prompt-strength 13
>> aimg edit pearl_girl.jpg "make her wear clown makeup"
>> aimg edit mona-lisa.jpg "make it a color professional photo headshot" --negative-prompt "old, ugly"
Want to just quickly have some fun? Try --suprise-me
to apply some pre-defined edits.
>> aimg edit --gif --suprise-me pearl_girl.jpg
>> aimg edit --gif --suprise-me mona-lisa.jpg
>> aimg edit --gif --suprise-me luke.jpg
>> aimg edit --gif --suprise-me spock.jpg
Prompt Based Masking by clipseg
Specify advanced text based masks using boolean logic and strength modifiers.
Mask syntax:
- mask descriptions must be lowercase
- keywords (AND, OR, NOT) must be uppercase
- parentheses are supported
- mask modifiers may be appended to any mask or group of masks. Example: (dog OR cat){+5} means that we'll select any dog or cat and then expand the size of the mask area by 5 pixels. Valid mask modifiers:
  - {+n} - expand mask by n pixels
  - {-n} - shrink mask by n pixels
  - {*n} - multiply mask strength. will expand mask to areas that weakly matched the mask description
  - {/n} - divide mask strength. will reduce mask to areas that most strongly matched the mask description. probably not useful
When writing strength modifiers keep in mind that pixel values are between 0 and 1.
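To build intuition for the boolean keywords and the strength modifiers, here is a minimal sketch that treats a mask as a list of per-pixel strengths between 0 and 1. It assumes AND, OR and NOT behave like min, max and 1 - x, and that results are clamped to the 0..1 range. Those are illustrative assumptions, not a description of how imaginairy implements masking, and all function names here are hypothetical.

```python
from typing import List

Mask = List[float]  # one strength value in [0, 1] per pixel


def clamp(value: float) -> float:
    """Keep a strength value inside the 0..1 range."""
    return max(0.0, min(1.0, value))


def mask_and(a: Mask, b: Mask) -> Mask:
    # assumption: AND keeps the weaker of the two matches
    return [min(x, y) for x, y in zip(a, b)]


def mask_or(a: Mask, b: Mask) -> Mask:
    # assumption: OR keeps the stronger of the two matches
    return [max(x, y) for x, y in zip(a, b)]


def mask_not(a: Mask) -> Mask:
    # assumption: NOT inverts the strength
    return [1.0 - x for x in a]


def multiply(a: Mask, n: float) -> Mask:
    """Sketch of {*n}: weak matches are boosted, strong ones saturate at 1."""
    return [clamp(x * n) for x in a]


def divide(a: Mask, n: float) -> Mask:
    """Sketch of {/n}: only the strongest matches keep a notable strength."""
    return [clamp(x / n) for x in a]


# Toy per-pixel match strengths for two mask descriptions
face = [1.0, 0.5, 0.125, 0.0]
hair = [0.0, 0.75, 0.5, 0.0]

selected = mask_and(face, mask_not(hair))  # "face AND NOT hair"
print(selected)
print(multiply(selected, 6))  # "(face AND NOT hair){*6}"
print(divide(selected, 2))    # "(face AND NOT hair){/2}"
```

Because values live between 0 and 1, a multiplier such as 6 pushes even a weak 0.125 match up to 0.75, which is why {*n} grows the mask into weakly matched areas.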
>> imagine \
    --init-image pearl_earring.jpg \
    --mask-prompt "face AND NOT (bandana OR hair OR blue cloth){*6}" \
    --mask-mode keep \
    --init-image-strength .2 \
    --fix-faces \
    "a modern female president" "a female robot" "a female doctor" "a female firefighter"
>> imagine \
    --init-image fruit-bowl.jpg \
    --mask-prompt "fruit OR fruit stem{*6}" \
    --mask-mode replace \
    --mask-modify-original \
    --init-image-strength .1 \
    "a bowl of kittens" "a bowl of gold coins" "a bowl of popcorn" "a bowl of spaghetti"
Face Enhancement by CodeFormer
>> imagine "a couple smiling" --steps 40 --seed 1 --fix-faces
Upscaling by RealESRGAN
>> imagine "colorful smoke" --steps 40 --upscale
Tiled Images
>> imagine "gold coins" "a lush forest" "piles of old books" leaves --tile
360 degree images
>> imagine --tile-x -w 1024 -h 512 "360 degree equirectangular panorama photograph of the desert" --upscale
Image-to-Image
Use depth maps for amazing “translations” of existing images.
>> imagine --model SD-2.0-depth --init-image girl_with_a_pearl_earring_large.jpg --init-image-strength 0.05 "professional headshot photo of a woman with a pearl earring" -r 4 -w 1024 -h 1024 --steps 50
Outpainting
Given a starting image, one can generate its “surroundings”.
Example:
>> imagine --init-image pearl-earring.jpg --init-image-strength 0 --outpaint all250,up0,down600 "woman standing"
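The --outpaint value is a comma-separated list of directions and pixel amounts. As a rough illustration of how a spec such as all250,up0,down600 might be read, the sketch below turns it into per-side amounts. It assumes that later entries override earlier ones (so up0 overrides the 250 set by all250); that ordering rule, and the `parse_outpaint_spec` helper itself, are assumptions for illustration rather than the tool's actual parser.

```python
import re
from typing import Dict

_DIRECTIONS = ("up", "down", "left", "right")
# Single-letter forms as seen in specs like u100,d200,l300,r400
_SHORT = {"u": "up", "d": "down", "l": "left", "r": "right"}
_PART_RE = re.compile(r"\s*([a-z]+)\s*(\d+)\s*")


def parse_outpaint_spec(spec: str) -> Dict[str, int]:
    """Return the number of pixels to add on each side of the image."""
    amounts = {direction: 0 for direction in _DIRECTIONS}
    for part in spec.lower().split(","):
        match = _PART_RE.fullmatch(part)
        if match is None:
            raise ValueError(f"cannot parse outpaint entry: {part!r}")
        name = _SHORT.get(match.group(1), match.group(1))
        pixels = int(match.group(2))
        if name == "all":
            # assumption: "all" sets every side, later entries may override it
            for direction in _DIRECTIONS:
                amounts[direction] = pixels
        elif name in amounts:
            amounts[name] = pixels
        else:
            raise ValueError(f"unknown outpaint direction: {name!r}")
    return amounts


print(parse_outpaint_spec("all250,up0,down600"))
```

Under these assumptions the example above extends the canvas by 250 pixels to the left and right, 600 pixels downward, and not at all upward.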
Prompt Expansion
You can use {} to randomly pull values from lists. A list of values separated by | and enclosed in { } will be randomly drawn from in a non-repeating fashion. Values that are surrounded by _ _ will pull from a phrase list of the same name. Folders containing .txt phraselist files may be specified via --prompt_library_path. The option may be specified multiple times. Built-in categories:
3d-term, adj-architecture, adj-beauty, adj-detailed, adj-emotion, adj-general, adj-horror, animal, art-movement,
art-site, artist, artist-botanical, artist-surreal, aspect-ratio, bird, body-of-water, body-pose, camera-brand,
camera-model, color, cosmic-galaxy, cosmic-nebula, cosmic-star, cosmic-term, dinosaur, eyecolor, f-stop,
fantasy-creature, fantasy-setting, fish, flower, focal-length, food, fruit, games, gen-modifier, hair, hd,
iso-stop, landscape-type, national-park, nationality, neg-weight, noun-beauty, noun-fantasy, noun-general,
noun-horror, occupation, photo-term, pop-culture, pop-location, punk-style, quantity, rpg-item, scenario-desc,
skin-color, spaceship, style, tree-species, trippy, world-heritage-site
Examples:
imagine "a {lime|blue|silver|aqua} colored dog" -r 4 --seed 0
(note that it generates a dog of each color without repetition)
imagine "a {_color_} dog" -r 4 --seed 0
will generate four different colored dogs. The colors will be pulled from an included
phraselist of colors.
imagine "a {_spaceship_|_fruit_|hot air balloon}. low-poly" -r 4 --seed 0
will generate images of spaceships or fruits or a hot air balloon
Credit to noodle-soup-prompts where most, but not all, of the wordlists originate.
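The non-repeating draw described in this section can be sketched in a few lines of plain Python. This is only an illustration of the idea, not imaginairy's implementation: `expand_prompt` and `PHRASE_LISTS` are hypothetical names, the phrase list contents are made up, and flattening a mixed group into one pool of candidates is an assumption.

```python
import random
import re
from typing import Dict, List, Optional

# Hypothetical stand-in for the phrase lists that ship with the tool.
PHRASE_LISTS: Dict[str, List[str]] = {
    "color": ["red", "green", "blue", "yellow"],
}

_GROUP_RE = re.compile(r"\{([^{}]*)\}")           # a {...} group
_LIST_REF_RE = re.compile(r"_([A-Za-z0-9-]+)_")   # a _name_ phrase list reference


def _candidates(group: str, phrase_lists: Dict[str, List[str]]) -> List[str]:
    """Flatten the inside of one {...} group into candidate strings."""
    values: List[str] = []
    for item in group.split("|"):
        item = item.strip()
        ref = _LIST_REF_RE.fullmatch(item)
        if ref:
            values.extend(phrase_lists[ref.group(1)])
        else:
            values.append(item)
    return values


def expand_prompt(
    template: str,
    count: int,
    seed: int = 0,
    phrase_lists: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Produce `count` prompts, cycling through each shuffled group."""
    phrase_lists = PHRASE_LISTS if phrase_lists is None else phrase_lists
    rng = random.Random(seed)
    pools = []
    for group in _GROUP_RE.findall(template):
        values = _candidates(group, phrase_lists)
        rng.shuffle(values)  # shuffle once; values repeat only after the pool is used up
        pools.append(values)

    prompts = []
    for i in range(count):
        picks = iter([pool[i % len(pool)] for pool in pools])
        prompts.append(_GROUP_RE.sub(lambda _match: next(picks), template))
    return prompts


for prompt in expand_prompt("a {red|green|blue} ball", 3):
    print(prompt)
```

Shuffling each pool once and then walking through it is what gives the "no repeats until the list is exhausted" behavior; a fixed seed keeps the order reproducible.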
Generate image captions (via BLIP)
>> aimg describe assets/mask_examples/bowl001.jpg
a bowl full of gold bars sitting on a table
Features
- It makes images from text descriptions!
- Generate images either in code or from command line.
- It just works. Proper requirements are installed. Model weights are automatically downloaded. No huggingface account needed. (if you have the right hardware… and aren't on windows)
- No more distorted faces!
- Noisy logs are gone (which was surprisingly hard to accomplish)
- WeightedPrompts let you smash together separate prompts (cat-dog)
- Tile Mode creates tileable images
- Prompt metadata saved into image file metadata
- Edit images by describing the part you want edited (see example above)
- Have AI generate captions for images: aimg describe <filename-or-url>
- Interactive prompt: just run aimg
- Finetune your own image model. Kind of like dreambooth. Read instructions on “Concept Training” page
How To
For full command line instructions run aimg --help
from imaginairy import imagine, imagine_image_files, ImaginePrompt, WeightedPrompt, LazyLoadingImage

url = "https://upload.wikimedia.org/wikipedia/commons/thumb/6/6c/Thomas_Cole_-_Architect%E2%80%99s_Dream_-_Google_Art_Project.jpg/540px-Thomas_Cole_-_Architect%E2%80%99s_Dream_-_Google_Art_Project.jpg"
prompts = [
    ImaginePrompt("a scenic landscape", seed=1, upscale=True),
    ImaginePrompt("a bowl of fruit"),
    ImaginePrompt([
        WeightedPrompt("cat", weight=1),
        WeightedPrompt("dog", weight=1),
    ]),
    ImaginePrompt(
        "a spacious building",
        init_image=LazyLoadingImage(url=url)
    ),
    ImaginePrompt(
        "a bowl of strawberries",
        init_image=LazyLoadingImage(filepath="mypath/to/bowl_of_fruit.jpg"),
        mask_prompt="fruit OR stem{*2}",  # amplify the stem mask x2
        mask_mode="replace",
        mask_modify_original=True,
    ),
    ImaginePrompt("strawberries", tile_mode=True),
]
for result in imagine(prompts):
    # do something with each result (this fixed filename is overwritten on every iteration)
    result.save("my_image.jpg")

# or

imagine_image_files(prompts, outdir="./my-art")
Requirements
- ~10 gb space for models to download
- A CUDA supported graphics card with >= 11gb VRAM (and CUDA installed) or an M1 processor.
- Python installed. Preferably Python 3.10. (not conda)
- For macOS, rust and setuptools-rust must be installed to compile the tokenizer library.
  They can be installed via: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  and pip install setuptools-rust
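A few of the requirements above can be checked from Python before installing. The sketch below is a hypothetical preflight helper, not something shipped with imaginairy. It only looks at the Python version, free disk space in the home directory, and (on macOS) whether cargo is on the PATH. It does not check for setuptools-rust, GPU or VRAM, and treating "cargo on PATH" as "rust is installed" is an assumption.

```python
import os
import platform
import shutil
import sys
from typing import List, Optional, Tuple

GIB = 1024 ** 3


def check_requirements(
    python_version: Optional[Tuple[int, int]] = None,
    system: Optional[str] = None,
    rust_available: Optional[bool] = None,
    free_bytes: Optional[int] = None,
) -> List[str]:
    """Return human-readable notes for requirements that look unmet."""
    # Any argument left as None is detected from the current machine.
    version = tuple(python_version) if python_version else tuple(sys.version_info[:2])
    system = system or platform.system()
    if rust_available is None:
        rust_available = shutil.which("cargo") is not None
    if free_bytes is None:
        free_bytes = shutil.disk_usage(os.path.expanduser("~")).free

    notes = []
    if version != (3, 10):
        notes.append(f"Python 3.10 is preferred, found {version[0]}.{version[1]}")
    if system == "Darwin" and not rust_available:
        notes.append("rust not found on PATH (needed on macOS to build the tokenizer library)")
    if free_bytes < 10 * GIB:
        notes.append("less than ~10 GB free for model downloads")
    return notes


for note in check_requirements():
    print("warning:", note)
```

The arguments exist mainly so the checks can be exercised without depending on the machine they run on.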
Running in Docker
See example Dockerfile (works on machine where you can pass the gpu into the container)
docker build . -t imaginairy
# you really want to map the cache or you end up wasting a lot of time and space redownloading the model weights
docker run -it --gpus all -v $HOME/.cache/huggingface:/root/.cache/huggingface -v $HOME/.cache/torch:/root/.cache/torch -v `pwd`/outputs:/outputs imaginairy /bin/bash
Running on Google Colab
ChangeLog
8.0.0
- feature: edit images with instructions alone!
- feature: when editing an image add --gif to create a comparison gif
- feature: aimg edit --suprise-me --gif my-image.jpg for some fun pre-programmed edits
- feature: prune-ckpt command also removes the non-ema weights
7.6.0
- fix: default model config was broken
- feature: print version with --version
- feature: ability to load safetensors
- feature: outpainting. Examples: --outpaint up10,down300,left50,right50 or --outpaint all100 or --outpaint u100,d200,l300,r400
7.4.3
- fix: handle old pytorch lightning imports with a graceful failure (fixes #161)
- fix: handle failed image generations better (fixes #83)
7.4.2
- fix: run face enhancement on GPU for 10x speedup
7.4.1
- fix: incorrect config files being used for non-1.0 models
7.4.0
- feature: finetune your own image model. kind of like dreambooth. Read instructions on “Concept Training” page
- feature: image prep command. crops to face or other interesting parts of photo
- fix: back-compat for hf_hub_download
- feature: add prune-ckpt command
- feature: allow specification of model config file
7.3.0
- feature: depth-based image-to-image generations (and inpainting)
- fix: k_euler_a produces more consistent images per seed (randomization respects the seed again)
7.2.0
- feature: tile in a single dimension (“x” or “y”). This enables, hopefully, generation of 360 VR images.
  Try this for example: imagine --tile-x -w 1024 -h 512 "360 degree equirectangular panorama photograph of the mountains" --upscale
7.1.1
- fix: memory/speed regression introduced in 6.1.0
- fix: model switching now clears memory better, thus avoiding out of memory errors
7.1.0
- feature: Stable Diffusion 2.1. Generated people are no longer (completely) distorted.
  Use with --model SD-2.1 or --model SD-2.0-v
7.0.0
- feature: negative prompting. --negative-prompt or ImaginePrompt(..., negative_prompt="ugly, deformed, extra arms, etc")
- feature: a default negative prompt is added to all generations. Images in SD-2.0 don't look bad anymore. Images in 1.5 look improved as well.
6.1.2
- fix: add back in memory-efficient algorithms
6.1.1
- feature: xformers will be used if available (for faster generation)
- fix: version metadata was broken
6.1.0
- feature: use different default steps and image sizes depending on sampler and model selected
- fix: #110 use proper version in image metadata
- refactor: samplers all have their own class that inherits from ImageSampler
- feature: Stable Diffusion 2.0
  - --model SD-2.0 to use (it makes worse images than 1.5 though…)
  - Tested on macOS and Linux
  - All samplers working for new 512×512 model
  - New inpainting model working
  - 768×768 model working for all samplers except PLMS (--model SD-2.0-v)
5.1.0
- feature: add progress image callback
5.0.1
- fix: support larger images on M1. Fixes #8
- fix: support CPU generation by disabling autocast on CPU. Fixes #81
5.0.0
- feature: inpainting support using new inpainting model from RunwayML. It works really well! By default, the inpainting model will automatically be used for any image-masking task
- feature: new default sampler makes image generation more than twice as fast
- feature: added DPM++ 2S a and DPM++ 2M samplers.
- feature: improve progress image logging
- fix: fix bug with --show-work. fixes #84
- fix: add workaround for pytorch bug affecting macOS users using the new DPM++ 2S a and DPM++ 2M samplers.
- fix: add workaround for pytorch mps bug affecting k_dpm_fast sampler. fixes #75
- fix: larger image sizes now work on macOS. fixes #8
4.1.0
- feature: allow dynamic switching between models/weights (--model SD-1.5 or --model SD-1.4 or --model path/my-custom-weights.ckpt)
- feature: log total progress when generating images (image X out of Y)
4.0.0
- feature: stable diffusion 1.5 (slightly improved image quality)
- feature: dilation and erosion of masks
  Previously the + and - characters in a mask (example: face{+0.1}) added to the grayscale value of any masked areas. This wasn't very useful. The new behavior is that the mask will expand or contract by the number of pixels specified. The technical terms for this are dilation and erosion. This allows much greater control over the masked area.
- feature: update k-diffusion samplers. add k_dpm_adaptive and k_dpm_fast
- feature: img2img/inpainting supported on all samplers
- refactor: consolidates img2img/txt2img code. consolidates schedules. consolidates masking
- ci: minor logging improvements
3.0.1
- fix: k-samplers were broken
3.0.0
- feature: improved safety filter
2.4.0
- feature: prompt expansion
- feature: make (blip) photo captions more descriptive
2.3.1
- fix: face fidelity default was broken
2.3.0
- feature: model weights file can be specified via --model-weights-path argument at the command line
- fix: set face fidelity default back to old value
- fix: handle small images without throwing exception. credit to @NiclasEriksen
- docs: add setuptools-rust as dependency for macos
2.2.1
- fix: init image is fully ignored if init-image-strength = 0
2.2.0
- feature: face enhancement fidelity is now configurable
2.1.0
2.0.3
- fix memory leak in face enhancer
- fix blurry inpainting
- fix for pillow compatibility
2.0.0
- fix: inpainted areas correlate with surrounding image, even at 100% generation strength. Previously if the generation strength was high enough the generated image would be uncorrelated to the rest of the surrounding image. It created terrible looking images.
- feature: interactive prompt added. access by running aimg
- feature: Specify advanced text based masks using boolean logic and strength modifiers. Mask descriptions must be lowercase. Keywords uppercase.
  Valid symbols: AND, OR, NOT, (), and mask strength modifier {+0.1} where + can be any of + - * /. Single character boolean operators also work (|, &, !)
- feature: apply mask edits to original files with mask_modify_original (on by default)
- feature: auto-rotate images if exif data specifies to do so
- fix: mask boundaries are more accurate
- fix: accept mask images in command line
- fix: img2img algorithm was wrong and wouldn't work at values close to 0 or 1
1.6.2
- fix: another bfloat16 fix
1.6.1
- fix: make sure image tensors come to the CPU as float32 so there aren't compatibility issues with non-bfloat16 cpus
1.6.0
- fix: maybe address #13 with expected scalar type BFloat16 but found Float
  - at minimum one can specify --precision full now and that will probably fix the issue
- feature: tile mode can now be specified per-prompt
1.5.3
- fix: missing config file for describe feature
1.5.1
- img2img now supported with PLMS (instead of just DDIM)
- added image captioning feature aimg describe dog.jpg => a brown dog sitting on grass
- added new commandline tool aimg for additional image manipulation functionality
1.4.0
- support multiple additive targets for masking with | symbol. Example: “fruit|stem|fruit stem”
1.3.0
- added prompt based image editing. Example: “fruit => gold coins”
- test coverage improved
1.2.0
- allow urls as init-images
previously
- img2img actually does # of steps you specify
- performance optimizations
- numerous other changes
Not Supported
- a GUI. this is a python library
- exploratory features that don't work well
Todo
- Performance Optimizations
- Development Environment
- Interface improvements
  - ✅ init-image at command line
  - ✅ prompt expansion
  - ✅ interactive cli
- Image Generation Features
  - ✅ add k-diffusion sampling methods
  - ✅ tiling
  - generation videos/gifs
  - Compositional Visual Generation
  - ✅ negative prompting
    - some syntax to allow it in a text string
  - images as actual prompts instead of just init images.
- Image Editing
  - outpainting
  - ✅ inpainting
  - ✅ text based image masking
  - Attention Control Methods
- Image Enhancement
  - Image Restoration - https://github.com/microsoft/Bringing-Old-Photos-Back-to-Life
  - Upscaling
  - ✅ face enhancers
- ✅ image describe feature
- CPU support. While the code does actually work on some CPUs, the generation takes so long that I don't think it's worth the effort to support this feature
- ✅ img2img for plms
- ✅ img2img for kdiff functions
- Other
- Training
  - Finetuning “dreambooth” style
  - Textual Inversion
  - Performance Improvements
    - ColossalAI - almost got it working but it's not easy enough to install to merit inclusion in imaginairy. We should check back in on this.
    - Xformers
    - Deepspeed