I reran the test without recording: the 4090 completed the run in 10.46 seconds and the 3090 Ti completed it in 16.62 seconds. That works out to about 37 percent less time for the 4090, or roughly 59 percent more throughput than the 3090 Ti.
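A quick way to sanity-check a comparison like this is to compute both the time saved and the throughput gain from the two timings, since the two figures are easy to mix up. This is only a minimal sketch; the function names are made up for illustration and are not part of any benchmarking tool.

```python
def percent_less_time(fast_s, slow_s):
    """How much less wall-clock time the faster run took, in percent."""
    return (slow_s - fast_s) / slow_s * 100.0


def percent_more_throughput(fast_s, slow_s):
    """How much more work per second the faster run did, in percent."""
    return (slow_s / fast_s - 1.0) * 100.0


# Timings reported above, in seconds.
t_4090 = 10.46
t_3090ti = 16.62

print(f"{percent_less_time(t_4090, t_3090ti):.1f}% less time")
print(f"{percent_more_throughput(t_4090, t_3090ti):.1f}% more throughput")
```

The two numbers differ because one is measured against the slower card's time and the other against the faster card's time.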
Full GPU settings. The 4090 is a Gigabyte Gaming OC, overclocked with +160 on the core and +1400 on the memory. The 3090 Ti is an EVGA FTW3 Ultra, overclocked with +85 on the core and +1200 on the memory. Power and voltage sliders are maxed out on both cards.
Full Stable Diffusion settings.
Both cards were run on the same settings with IMG2IMG. Note that the batch size was 4. If you have the VRAM, a higher batch size gives better throughput when you want several images at once, and a batch size of 8 gets the highest generation speed. I used a batch size of 4 so I could make the video shorter. With a batch size of 8 the total time taken would increase, but the actual time taken per image would decrease. So basically you get more images quicker than if you did them one by one.
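The per-image cost is simply the total run time divided by the batch size. Here is a small sketch using the two batch-size-4 timings from this test. The helper name is made up for illustration, and no batch-size-8 timing was measured here, so none is shown.

```python
def seconds_per_image(total_seconds, batch_size):
    """Average wall-clock time per image for one batched run."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return total_seconds / batch_size


# Total run times from this test, in seconds, both at batch size 4.
runs = {"4090": 10.46, "3090 Ti": 16.62}

for card, total in runs.items():
    print(f"{card}: {seconds_per_image(total, 4):.3f} s per image")
```

To compare batch sizes on your own card, time one run at each batch size and pass each total through the same function.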
I also used the following optimizations for the 4090, found in these guides:
https://www.reddit.com/r/StableDiffusion/c...
https://www.reddit.com/r/StableDiffusion/c...
https://github.com/AUTOMATIC1111/stable-di...
If you don't set these optimizations up correctly, the 4090 will be much slower than the 3090 Ti.
Prompt settings:
1girl, apron, architecture, black_dress, black_hair, blurry, blurry_background, blurry_foreground, blush, bookshelf, building, cafe, city, cityscape, convenience_store, depth_of_field, dress, east_asian_architecture, house, library, long_hair, looking_at_viewer, maid, maid_apron, maid_headdress, motion_blur, outdoors, photo_background, puffy_short_sleeves, puffy_sleeves, real_world_location, shop, short_sleeves, shrine, skyscraper, smile, solo, stadium, storefront, street, town, white_apron
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,
Steps: 28, Sampler: Euler, CFG scale: 7, Seed: 1181893402, Size: 768x960, Model hash: 7145e188, Batch size: 4, Batch pos: 2, Denoising strength: 0.75, Clip skip: 2, Mask blur: 4
These are the settings that were used for the IMG2IMG run on both cards. I got the IMG2IMG source image by prompting TXT2IMG with the following settings:
masterpiece, best quality, 1girl, solo, smiling, happy, black hair, maid, standing, japan, akihabara
Negative prompt: lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name,
Steps: 28, Sampler: Euler, CFG scale: 7, Seed: 1181893397, Size: 768x960, Model hash: 7145e188, Batch size: 4, Batch pos: 0, Denoising strength: 0.75, Clip skip: 2, First pass size: 512x576
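If you want to check what actually differs between the two settings lines, the comma-separated "Key: value" format is easy to split into a dictionary. This is a rough sketch that assumes no value contains a comma followed by a space. The function names are hypothetical and this is not the web UI's own parser.

```python
def parse_settings(line):
    """Split a 'Key: value, Key: value' settings line into a dict of strings."""
    settings = {}
    for part in line.split(", "):
        key, sep, value = part.partition(": ")
        if sep:  # skip fragments that are not 'Key: value' pairs
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


txt2img = parse_settings(
    "Steps: 28, Sampler: Euler, CFG scale: 7, Seed: 1181893397, Size: 768x960, "
    "Model hash: 7145e188, Batch size: 4, Batch pos: 0, Denoising strength: 0.75, "
    "Clip skip: 2, First pass size: 512x576"
)
img2img = parse_settings(
    "Steps: 28, Sampler: Euler, CFG scale: 7, Seed: 1181893402, Size: 768x960, "
    "Model hash: 7145e188, Batch size: 4, Batch pos: 2, Denoising strength: 0.75, "
    "Clip skip: 2, Mask blur: 4"
)

for key, (before, after) in diff_settings(txt2img, img2img).items():
    print(f"{key}: {before} -> {after}")
```

A key that appears in only one of the two lines shows up with None on the other side.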