How do you compose images without "strangling" the image generation

Shiimiish@lem.ainyataovi.net · 1 year ago

Send_me_nude_girls@feddit.de · 1 year ago

Well, I’m a noob too; I’ve been using SD for a month now. There’s a lot to tackle, but let’s go over some points of my workflow, based on what I believe I understand.

So first off, you’ve got to ask yourself how complex the scene is going to be. One person alone is usually no problem; multiple people are tricky.

Then there’s the LoRA vs. no-LoRA approach.

So first off, I try to use as few LoRAs as possible, as they add an additional layer of weight balancing on top of the already necessary balancing of prompts. LoRAs deform your composition a lot, and you want to avoid that. Then there are a ton of LoRAs that only work well with certain checkpoints, or that contain the opposite of the style you want to generate (anime vs. realistic vs. CGI, for example). If you have to use a LoRA from the opposite style, set the LoRA <tag> weight low, around 0.3 or even lower, so its style doesn’t oversaturate and bleed into your composition. At the same time, increase the weight of the prompt keyword that the LoRA uses. I try to avoid going above 1.9, as that seems to cause artifacts; I do better by removing keywords, adding keywords, or shifting them around. Sometimes the most important keyword isn’t far enough up in the list. Using things like BREAK to separate certain elements might help too.
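
A minimal Python sketch of that balancing, assuming the AUTOMATIC1111 web UI prompt syntax, where `<lora:name:multiplier>` sets a LoRA’s strength, `(keyword:weight)` emphasizes a keyword, and BREAK starts a new prompt chunk. The LoRA name `animeStyle`, the keywords, and the helper functions are hypothetical; the 1.9 cap is only the rule of thumb from this post, not a limit of the software.

```python
# Sketch only: assumes AUTOMATIC1111 prompt syntax.
# <lora:name:multiplier> loads a LoRA at a given strength,
# (keyword:weight) emphasizes a keyword, BREAK starts a new prompt chunk.

MAX_KEYWORD_WEIGHT = 1.9  # personal rule of thumb from the post, not a software limit


def lora_tag(name: str, multiplier: float) -> str:
    """Format a LoRA tag, e.g. <lora:animeStyle:0.3>."""
    return f"<lora:{name}:{multiplier:g}>"


def weighted(keyword: str, weight: float) -> str:
    """Emphasize a keyword, capping the weight to avoid artifacts."""
    weight = min(weight, MAX_KEYWORD_WEIGHT)
    if weight == 1.0:
        return keyword  # 1.0 is the default emphasis, so no parentheses needed
    return f"({keyword}:{weight:g})"


def build_prompt(chunks: list[list[str]], loras: dict[str, float]) -> str:
    """Join keyword chunks with BREAK and append the LoRA tags.

    Put the most important keywords first in each chunk.
    """
    prompt = " BREAK ".join(", ".join(chunk) for chunk in chunks)
    tags = " ".join(lora_tag(name, mult) for name, mult in loras.items())
    return f"{prompt} {tags}" if tags else prompt


if __name__ == "__main__":
    # Style LoRA kept low at 0.3; its (hypothetical) keyword boosted instead.
    prompt = build_prompt(
        [[weighted("photo of a woman", 1.0), weighted("anime style", 1.4)],
         ["city street at night", weighted("neon lights", 2.5)]],
        {"animeStyle": 0.3},
    )
    print(prompt)
    # photo of a woman, (anime style:1.4) BREAK city street at night, (neon lights:1.9) <lora:animeStyle:0.3>
```

Note that the 2.5 requested for “neon lights” gets clamped to 1.9; past that point, reordering or removing keywords is the suggested lever rather than more weight.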

So far I’ve found that the “Latent Couple” and “Composable LoRA” extensions give me good results with multiple people and multiple LoRAs. You can enable and add ControlNet as well. There’s even a Latent Couple helper tool that makes it easier to select the parts of an image you want to be person A and person B. I haven’t tried more than 4 people yet, but I guess there’s almost no limit.

You’re generally on the right track (meaning you picked the right balance of weights and prompts) when faces get fixed automatically in hires fix (2x resolution), that is, without enabling the “Restore faces” option. Some checkpoints are bad at faces, or at faces with your particular combination of LoRAs, so it’s a bit of a pain searching for a different one and testing it. I have like 40 now and I seem to download more instead of fewer. Haha. But maybe learning to use one and sticking to it is smarter, as some checkpoints like or dislike certain prompts (usually described on the checkpoint’s Civitai page).

Increasing CFG can help, or adding more prompts. If you use LoRAs, you can look into their details. I use the Civitai Helper extension and click the small exclamation mark (!) to see a LoRA’s trigger words (often more than are used in the example images or the LoRA’s description on Civitai). There you often find words that trigger the LoRA, giving that LoRA more weight in the generation. For example, if the training data contained more images of, say, a woman with black hair, then it’s easier to generate a woman with black hair than to force blond hair.

I usually generate 4 images at once first, with low steps and at low resolution, like 512x512. Once at least 1 in 4 images is similar to what I want, I move on to hires fix and later img2img with Ultimate SD Upscale + ControlNet. (I’m not a fan of extra upscaling.)
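
For anyone scripting this, here is a rough sketch of the two passes as request bodies for the AUTOMATIC1111 web API (`POST /sdapi/v1/txt2img`, available when the UI is started with `--api`). It assumes that API’s field names (`batch_size`, `enable_hr`, `hr_scale`, `restore_faces`, etc.); the step count, CFG, denoising strength, and seed are hypothetical example values, and nothing is actually sent.

```python
# Sketch only: builds request bodies for the AUTOMATIC1111 txt2img API.
# Step count, CFG, denoising strength and seed values are hypothetical.


def draft_payload(prompt: str, seed: int = -1) -> dict:
    """Cheap first pass: 4 images at 512x512 with few steps."""
    return {
        "prompt": prompt,
        "seed": seed,           # -1 means a random seed
        "steps": 15,            # "low steps"; tune to taste
        "width": 512,
        "height": 512,
        "batch_size": 4,        # 4 images per run
        "cfg_scale": 7,         # raise this if the prompt is being ignored
        "restore_faces": False, # faces should come out fine without it
    }


def hires_payload(draft: dict, seed: int) -> dict:
    """Re-run the seed of a promising draft with hires fix at 2x."""
    payload = dict(draft, seed=seed, batch_size=1)  # copy; draft is untouched
    payload.update({
        "enable_hr": True,
        "hr_scale": 2,                # 2x resolution, as in the post
        "denoising_strength": 0.5,    # hypothetical; tune per checkpoint
    })
    return payload


if __name__ == "__main__":
    draft = draft_payload("photo of a woman, city street at night")
    final = hires_payload(draft, seed=1234)  # seed read from the draft you liked
    # To send it (requires the third-party requests package and a running UI):
    # requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=final)
    print(final["width"], final["hr_scale"], final["batch_size"])
```

Keeping the seed fixed between passes is what lets the hires pass stay close to the draft composition that was picked.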

In the end I generate something like 20 to 100 images until my composition is nearly where I want it to be. And I prefer not to use ControlNet, so I can easily reuse the prompts from an image. Nothing is worse than having to search again for the right ControlNet (depth, canny, reference and more) and its weights just to get close again.

If you need a certain pose, you can combine multiple ControlNets with lower weights. That’s usually smart if you want an exact pose.
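
A sketch of what combining several lower-weight ControlNets could look like in a txt2img API request, assuming the sd-webui-controlnet extension’s API, where each unit is a dict in `alwayson_scripts["controlnet"]["args"]` with `input_image`, `module`, `model`, and `weight` fields. Field and preprocessor names can differ between extension versions, so check yours; the model names and weights below are hypothetical placeholders.

```python
# Sketch only: assumes the sd-webui-controlnet API format.
# Model names are hypothetical; use the ControlNet models you have installed.


def controlnet_unit(image_b64: str, module: str, model: str, weight: float) -> dict:
    """One ControlNet unit; weight 1.0 is the default strength."""
    return {
        "input_image": image_b64,  # base64-encoded reference image
        "module": module,          # preprocessor, e.g. "openpose" or "canny"
        "model": model,
        "weight": weight,          # lower than 1.0 to soften this unit's influence
    }


def with_controlnets(payload: dict, units: list[dict]) -> dict:
    """Return a copy of a txt2img payload with the given ControlNet units.

    Replaces any alwayson_scripts already present in the copy.
    """
    result = dict(payload)
    result["alwayson_scripts"] = {"controlnet": {"args": units}}
    return result


if __name__ == "__main__":
    pose_ref = "<base64 image>"  # placeholder for a real encoded image
    units = [
        controlnet_unit(pose_ref, "openpose", "my_openpose_model", 0.5),
        controlnet_unit(pose_ref, "canny", "my_canny_model", 0.4),
    ]
    payload = with_controlnets({"prompt": "photo of a woman"}, units)
    print(len(payload["alwayson_scripts"]["controlnet"]["args"]))
```

Writing the units out like this also records which ControlNet and weights were used, which helps with the “finding the right settings again” problem mentioned above.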

Well, I hope this helped a little. Cheers!