Midjourney is one of the most popular AI text-to-image applications out there, with millions of users worldwide. I have been using Midjourney since its closed beta days. The pace at which the platform has evolved in the first year of its existence is absolutely amazing.
In this post I want to take a look at the evolution of the images produced by Midjourney since its beginning in March 2022.
Since I am creating this post today, I am going to run the same prompts through each version of the generation algorithm with default settings. This is easy to achieve using the /settings command, which lets you fine-tune the settings.
The prompts I will use to test the different versions are:
- symmetrical portrait of retro robots, glowing rays, cartooncore, by Simon Stålenhag, trending on artstation
- Street View Of A Home In The Style Of Mountain Lakeside wizard Cottage Architecture, By Marc Simonetti, Rosa Bonheur, and Craig Mullins, mythical creatures, Wide Angle, Natural Lighting, Realistic, 8K, Octane Render, Beautifully Detailed, Light Diffusion, Cinematic Shading, and Cinematic Elements
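If you would rather not switch versions through /settings each time, the same comparison can be set up by appending a version parameter to the prompt. The snippet below is a minimal sketch of that idea, not how the images in this post were made: it assumes Midjourney's `--v` parameter and the `/imagine prompt:` text as it appears in Discord, and the helper name `imagine_commands` is made up for illustration.

```python
# Sketch: build one /imagine command per model version for the same prompt.
# Assumption: the version is selected with the `--v` parameter; the helper
# name and the exact command text are illustrative only.

PROMPT_1 = (
    "symmetrical portrait of retro robots, glowing rays, cartooncore, "
    "by Simon Stålenhag, trending on artstation"
)


def imagine_commands(prompt, versions=(1, 2, 3, 4)):
    """Return one command string per version, leaving every other setting at its default."""
    return [f"/imagine prompt: {prompt} --v {version}" for version in versions]


if __name__ == "__main__":
    for command in imagine_commands(PROMPT_1):
        print(command)
```

Each printed line can be pasted into Discord as-is, which keeps the prompt text identical across versions.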
Midjourney Version 1
This is the very first version, which was mind-blowing at the time. Looking back after seeing how the model has evolved, you might think it is no good, but it’s amazing what v1 was able to do in its early days.
Prompt 1 was inspired by the TV series “Tales from the Loop” on Amazon Prime. If you haven’t checked it out yet, you should. It will have you hooked.
The composition of the image follows the prompt: the figure appears robotic and is symmetrical. The colour scheme is similar to the one the artist himself uses. When you compare the grid of 4 images with the final result, you can see that a lot more detail has been added; this is due to the upscaling algorithm used by Midjourney. In the v1 model there was no control over how much detail was added during upscaling, but this changed in later versions.
Prompt 1 in Midjourney v1
Prompt 1 in Midjourney v1 Grid
The grid of images produced using Prompt 2 is somewhat coherent: there are mountains, clouds and trees, and a cottage is visible with plenty of light emitting from it. You can also see in the upscaled version of the image that some of the light from the window spills out nicely onto the area outside the cottage. One thing is for sure: the crunchy artefacts are quite prominent in the upscaled image, while the grid image is usually much softer but very small in size.
Midjourney Version 2
Version 2 was an iteration and improvement on the original version. The compositions start to be somewhat coherent with the text and also more recognisable. Prompt 1 yields a portrait of a robot with some symmetry; 3 of the images from the grid are usable, and when you upscale you see a lot more detail added to the image. The colour palette is similar to the artist’s style.
Prompt 2 gives us a more recognisable cottage with multiple storeys and distant scenery. Upscaling adds more detail to the image and produces a high-resolution version. Yet the result remains quite gritty, and this is where most users asked for parameters to control the generation so that the image is softer and cleaner (less noise). The Midjourney team eventually provided this in version 3, when they introduced Light Upscale.
Midjourney Version 3
Images are more pronounced in version 3, and with Light Upscale you can get images with fewer artefacts. Using --uplight stops the image from appearing overcooked by the AI and retains a nice balance of softness amongst the details.
Announcement of V3
1) We are entering ALPHA TESTING OF OUR V3 ALGORITHMS.
- New algorithm for /imagine is now live
- New upscaler is now live. Significantly less distortion / artefacts.
2) The new V3 algorithms have two important new arguments --stylize and --quality
- The stylize argument sets how strong of a ‘stylization’ your images have, the higher you set it, the more opinionated it will be. If you set it high enough it will get so opinionated it will start ignoring your words. It’s great fun!
- The quality argument lets you give the algorithm ‘more’ or ‘less’ time to think. It also changes the cost of your images. Quality=0.5 will make your images half cost. Quality=2 will make your images cost TWICE TIMES AS MUCH.
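The cost arithmetic in the announcement can be sketched in a few lines. This is only an illustration: it assumes a default quality of 1 and that cost scales linearly with the quality value, which is extrapolated from the two examples quoted above (0.5 is half cost, 2 is double). The function names are made up, and the parameters are written with two hyphens, as they are typed in Discord.

```python
# Sketch of the V3 arguments described in the announcement.
# Assumptions: default quality is 1 and cost scales linearly with quality
# (inferred from "0.5 is half cost, 2 is twice the cost"). Names are hypothetical.

def relative_cost(quality):
    """Cost of one image relative to the default quality of 1."""
    if quality <= 0:
        raise ValueError("quality must be positive")
    return float(quality)


def with_v3_args(prompt, stylize=None, quality=None):
    """Append --stylize and/or --quality to a prompt, skipping unset values."""
    parts = [prompt]
    if stylize is not None:
        parts.append(f"--stylize {stylize}")
    if quality is not None:
        parts.append(f"--quality {quality}")
    return " ".join(parts)


if __name__ == "__main__":
    print(with_v3_args("retro robots", quality=0.5))
    print(relative_cost(0.5), relative_cost(2))
```

Under these assumptions, ten images at quality 2 would cost the same as twenty at the default setting.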
The samples below were again created using default settings, and you can see improvements over v2. It was great fun to revisit some of the old prompts and see better synthesised images produced by the AI model.
The cottage is also more defined and structured, and distinct mountains appear in the background. However, the grungy look of the upscaled images still ruins the output in my view.
With Light Upscale (--uplight)
Using the Light Upscale option (--uplight), the resulting images are less overcooked and look more appealing. Most of the images I made with MJ v3 used the Light Upscale option because, although the grid images were good, they would look very crusty and noisy by the time they were upscaled. For me, --uplight was a standard argument in my prompts. You can see the difference when using Prompt 1 with this option.
Here is another example, where Prompt 2 was run with the --uplight argument: the resulting upscaled image is much more pleasing. The image has detail but is also softer than with the normal upscale.
Midjourney Version 4
Announcement of V4
Version 4 is a brand new AI model that has been created from scratch, and it is in alpha at the time of writing this post. The announcement on their Discord describes Version 4 (v4) as follows:
What’s new with the V4 base model?
- Vastly more knowledge (of creatures, places, and more)
- Much better at getting small details right (in all situations)
- Handles more complex prompting (with multiple levels of detail)
- Better with multi-object / multi-character scenes
- Supports advanced functionality like image prompting and multi-prompts
- Supports --chaos arg (set it from 0 to 100) to control the variety of image grids
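As a small illustration of the last item, here is a sketch of a helper that appends the chaos argument and rejects values outside the 0 to 100 range given in the announcement. The helper name is made up, and the parameter is written with two hyphens, as it is typed in Discord.

```python
# Sketch: append --chaos to a prompt, enforcing the 0-100 range from the
# announcement. The helper name `with_chaos` is hypothetical.

def with_chaos(prompt, chaos):
    """Return the prompt with a --chaos argument; higher values mean more varied grids."""
    if not 0 <= chaos <= 100:
        raise ValueError("chaos must be between 0 and 100")
    return f"{prompt} --chaos {chaos}"


if __name__ == "__main__":
    print(with_chaos("symmetrical portrait of retro robots", 50))
```

Validating the range up front avoids sending a prompt that the bot would otherwise reject.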
It is advised that prompts are fine-tuned for this model, as it works differently from the others. However, for consistency I’m keeping the same prompts.
Prompt 1 now comes very close to the artist’s style; if you haven’t seen his work, just Google “Simon Stalenhag” and you will find plenty of it. Because the prompt is short and light on detail, the AI tries to add what it thinks you may want: even though I say “portrait”, some images are more like environmental shots. The prompt contains the plural “robots”, and this model handles it well. The word “symmetrical” is also respected, as you can see symmetry applied in the images.
Prompt 2 simply rocks: the detail in these images is so much better that you just look at them and go “Wow”!
Check out the reflection and how perfectly it lines up (in the image below). The details created look pretty cool in this version. Even though the highest resolution in the alpha release is 1024px by 1024px, you can still see plenty of detail in the images.
Grid of 4 images from Prompt 2
Because the v4 alpha can only scale up to 1024px, I ran the samples above through Gigapixel AI, which is my favourite tool for upscaling. Here are the results. The roof tile textures are well defined, as are the rocks in the walls, and even the windows have well-defined frames. Click on the image to see full details.
High Res version of 1024px original
High Res version of 1024px original
Each iteration of the model used by Midjourney has pushed things further and further. For some time after Stable Diffusion came out, it seemed to be ahead of Midjourney, but Midjourney appears to have overtaken it with v4 and lifted the bar higher. So I guess we wait to see what Stable Diffusion will do next.