
Understanding Art Diffusion AI Models (AI Part 3)

Published on June 20, 2024
Christian Shepherd, Content Strategist / Founder

This article is one of a five-part series exploring the fundamentals and practical application of artificial intelligence for professionals in the medical aesthetics industry. We are not responsible for any hostile AI takeovers, Armageddon, human extinctions, or cheesy dystopian plots. We will, however, take full credit for any incredible improvements to your marketing program. That, we're used to.

I am a geek. It's a condition I came to terms with long ago. It causes me to go way too hard in my hobbies and interests and forces me to explore anything mildly interesting way too thoroughly.

Plants, games, books, cooking, tech — when we had our daughter, my wife demonstrated genuine support (and real patience) when I decided to buy and learn how to use a 3D printer. 
And while our newborn slept between the hours of 11 P.M. and 2 A.M. in those first few weeks she was home, I printed about 100 D&D minis and pop culture figures, watching the monitor closely to jump in for the next bottle feed while I read endless forums on why my prints weren't coming out at full resolution.

In this perpetual geekdom, I have developed a real fondness for all things AI. 

You would think that, as someone who has spent more than a decade crafting and studying written content, language models like GPT-4 would excite me the most. 

That isn’t the case. The tools that wow and intrigue me the most are, by a landslide, the art generation AI models. 

See, I consider myself to be a creative person — but I do not consider myself to be an artsy person. You won’t catch me doodling anything coherent or leading any symposiums on graphic design anytime soon. 

But suddenly, with a few choice words, I had a virtually endless amount of interesting, provocative, relevant, and tailored art in a variety of styles at my fingertips.

Sure, it struggled to generate hands and created some terrifying concepts from time to time (see examples below), but for someone who could never in a million years produce .0001% of what art AI could, it was honestly kind of intoxicating. 

(These were images from a year ago — renders like this are pretty rare nowadays.)

I found myself relying on it both professionally and personally, and it didn’t take long to understand the value that these models could bring to any business, even a medical office focusing on something as specific as aesthetic treatments.

Art grabs attention, and as long as grabbing attention is part of your marketing, AI art generation will always be relevant to your practice. This part of the AI series focuses on helping you understand these art tools, identifying the areas where they can add value, and teaching you how to use them in your own practice.

So, come geek out with me a little bit as I walk you through it all.

What Is an Art Diffusion Model?

Put simply, an art diffusion model is an artificial intelligence program that will generate art or images based on the prompt and parameters that are provided by the user.

These art generation models use a process known as diffusion. During training, the AI program takes an image and adds noise to it, step by step, until the image becomes unrecognizable. Then it learns to reverse this process, removing noise until the image is once again clear.

When the AI model does this enough times, it learns the patterns behind the images and, ultimately, how to generate new images based on the trends of those it has already processed.
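
If you like seeing ideas as code, here is a heavily simplified sketch of that add-noise / remove-noise loop in Python. It is illustrative only: real diffusion models use trained neural networks, noise schedules, and latent spaces, and the `denoiser` here is just a stand-in for what the training process actually learns.

```python
import numpy as np

# A heavily simplified, illustrative sketch of the diffusion idea in plain
# NumPy. Real models (Stable Diffusion, DALL-E, Midjourney) use trained
# neural networks, noise schedules, and latent spaces; this only shows the
# add-noise / remove-noise loop described above.

def add_noise(image, step, total_steps):
    """Forward process: blend the image toward pure random noise."""
    mix = step / total_steps              # 0 = clean image, 1 = all noise
    noise = np.random.randn(*image.shape)
    return (1 - mix) * image + mix * noise

def generate(denoiser, shape, total_steps=50):
    """Reverse process: start from noise and repeatedly remove a little of it."""
    image = np.random.randn(*shape)       # the "unrecognizable" starting point
    for step in reversed(range(total_steps)):
        predicted_noise = denoiser(image, step)   # what training teaches the model
        image = image - predicted_noise / total_steps
    return image

# With a trained denoiser, `generate` would return a brand-new image;
# here we pass a dummy denoiser just so the sketch runs end to end.
sample = generate(lambda img, step: np.zeros_like(img), shape=(64, 64))
```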

So when you go into the AI model's interface and ask it for "a painting of a duck in the style of Van Gogh," it is going to comb through its training data and reference images like these…

…and produce something like this:

How Does Art “Diffusion” Work?

As I previously mentioned, artwork produced via AI diffusion starts from a canvas of pure noise and slowly gains detail based on the model's understanding of how other, similar images are built.

When you see the process in action, it looks like this:

The AI keeps adding more and more detail, until you eventually get these final images:

Midjourney, Version 5.1, Style: Medium. Prompt: a painting of a frog in the style of vincent van gogh

When there are millions of images loaded into the AI’s reference “brain,” it becomes pretty easy to be specific about what it is you are looking for, since part of the AI’s training is on object and style recognition. What it ultimately comes down to is the prompt you provide.

How to Prompt AI Diffusion Models

Depending on the art model you are using, there will be some small changes to the structure and technique that you use to get the final image you are looking for. Generally, though, there are a few categories you want to cover in your prompts whenever they are relevant:

[art medium], [subject], [subject attribute], [expression], [details], [artist]

Here is what each of those mean:

  1. Art Medium - Style of the image you want, i.e., photo, drawing, painting, etc.
  2. Subject - What you want the focus of the image to be.
  3. Subject Attribute - Specific details about the main object.
  4. Expression - A specific style or period you want to emulate.
  5. Details - Additional context to the image you want included but not focused.
  6. Artist - A specific artist you want to emulate.
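
If it helps to see the template as something concrete, here is a small Python sketch that assembles a prompt from those categories. The helper and its field names are just an illustration of the template above, not part of any model's actual API.

```python
# Hypothetical helper that strings the template categories into one prompt.
# The categories mirror the list above; nothing here is model-specific.
def build_prompt(art_medium, subject, subject_attribute="", expression="",
                 details="", artist=""):
    parts = [art_medium, subject, subject_attribute, expression, details]
    if artist:
        parts.append(f"in the style of {artist}")
    return " ".join(part for part in parts if part)

print(build_prompt(
    art_medium="a watercolor painting of",
    subject="a duck on a pond",
    subject_attribute="at sunset",
    artist="Vincent van Gogh",
))
# -> a watercolor painting of a duck on a pond at sunset in the style of Vincent van Gogh
```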

Following this structure, a typical prompt might look something like this:

  • A watercolor painting of a woman sitting on a veranda in the 40s wearing a big hat in the style of Paul Cézanne

And here are some examples of a pretty typical output based on that exact prompt:

DALL-E 3, default settings.
Midjourney, Version 5.1, Style: Medium.

As you can see from these generations, different AI models will often give you dramatically different results. DALL-E 3 stayed pretty close to what someone could reasonably expect from a styled watercolor painting, while Midjourney took the generalities of the medium and kicked up the detail a few notches.

Part of learning how to use these tools effectively is understanding how to subtly change the prompts to give you the desired output. I will delve deeper into specific strategies for prompt crafting later in this article, but for now, just know that each AI engine has its own specific parameters and systems by which it operates.

What Are the Most Popular Art Diffusion Models on the Market?

There are a lot of third-party applications that will tell you they can generate art for you, but most of them are operating on some version of the same three engines: Midjourney, Stable Diffusion, and DALL-E 3.

While each AI program has its own strengths and weaknesses, all three are capable of producing impressive assets with relative ease. 

Stable Diffusion

The vast majority of third-party diffusion platforms use some version of Stable Diffusion from Stability AI. Why? Because Stable Diffusion is open source, which means anyone can download the program and run it on a PC for free. 

While it takes some additional work to get it up and running (potentially very confusing work if you aren't super techy), Stable Diffusion gives you the most flexibility, since you can download public models or datasets and run them through the Stable Diffusion program.

Once you’ve installed it, the UI looks like this:

Let’s take a generic prompt and give it a test run. Maybe something like: digital art of a frog sitting on a lilypad wearing sunglasses in lofi art style.

To be honest, these results are underwhelming. Graphic art doesn't tend to be a strength of vanilla Stable Diffusion. But unlike other models that specifically prohibit and block images of celebrities and other people of interest, one of its strengths is that there are essentially no limitations on what it can create.

What this means is that you can take prompts like this: a high resolution photograph of bill gates sitting in a modern doctor office during a consultation, aperture 1.8, cinematic pose.

And get images like this:

Still not perfect, but it’s getting there — and you might find when you dive into the community resources that there is a public model that is trained to create images specific to what you are looking for. With Stable Diffusion, you are trading off ease of use for unlimited freedom, community creation, and zero cost.

If you want to mess around with Stable Diffusion a bit, you can test run it on Playground.
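
If you (or someone on your team) would rather drive Stable Diffusion from code than from a web UI, the open-source diffusers library from Hugging Face is one common way to do it. Here is a minimal sketch, assuming you have Python, a CUDA-capable GPU, and the torch and diffusers packages installed; the checkpoint name is just one publicly available option.

```python
# Minimal Stable Diffusion run via Hugging Face's diffusers library.
# Assumes a CUDA GPU; the model weights are downloaded on first run.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # one publicly available checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = ("digital art of a frog sitting on a lilypad wearing sunglasses "
          "in lofi art style")
image = pipe(prompt).images[0]
image.save("lofi_frog.png")
```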

Advanced Stable Diffusion Prompting

If you are looking to prompt for the highest level of results, use this building block formula:

  1. Base Prompt: pool in the backyard of an estate
  2. Add Detail: pool in the backyard of a vintage estate
  3. Add More Detail: pool in the terrace of a vintage estate, old furniture with classic cocktails, gold, vibrant plants; luxury feel, old money dynasty sepia
  4. Add Style: 35mm lomo photography of a pool in the terrace of a vintage estate, old furniture with classic cocktails, gold, vibrant plants; luxury feel, old money dynasty sepia

For the “Step” option, which determines the quality of the image, follow this guide:

  • 20 steps: Low number of steps, low detail, more blurred.
  • 50-100 steps: Medium number of steps, standard detail.
  • 100+ steps: High number of steps, sharper image.

For the “CFG” scale, which dictates how creative the AI becomes, use the following guidelines:

  • CFG 2 - 6: Let the AI kind of do its own thing.
  • CFG 7 - 11: Nice middle ground and recommended range for most prompts.
  • CFG 12 - 15: The "listen to me and do what I say" range.
  • CFG 16 - 20: The "DO WHAT I AM TELLING YOU" range.

There are other settings you can adjust, like word weighting and X/Y/Z plotting for mass generations, but those are typically only needed by large-scale users.
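
If you are using the diffusers route from the earlier sketch, the Step and CFG options correspond to the num_inference_steps and guidance_scale arguments, and a small loop gives you a quick comparison grid in the same spirit as the plotting scripts mentioned above. Again, this is just an illustrative sketch.

```python
# Sweep a few step counts and CFG values to compare outputs side by side.
# Continues from the `pipe` object created in the earlier diffusers sketch.
prompt = ("35mm lomo photography of a pool in the terrace of a vintage estate, "
          "old furniture with classic cocktails, gold, vibrant plants; "
          "luxury feel, old money dynasty sepia")

for steps in (20, 50, 100):          # the "Step" option
    for cfg in (5, 9, 14):           # the "CFG" scale
        image = pipe(
            prompt,
            num_inference_steps=steps,
            guidance_scale=cfg,
        ).images[0]
        image.save(f"estate_steps{steps}_cfg{cfg}.png")
```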

DALL-E 3

DALL-E 3 was developed by OpenAI, the same company behind ChatGPT and the GPT-4 language model. It operates on a credit system, which means you will need to put up some cash if you want to play around with it or use it extensively.

True to the ethos of OpenAI, it does aim to make the technology easy to use; the prompting box you are brought to by default is a simple one-line field:

Let’s come back to our frog prompt: digital art of a frog sitting on a lilypad wearing sunglasses in lofi art style.

As you can see, DALL-E 3 has put its own spin and style on the prompt. When you compare two identical prompts side by side like this, you start to get an idea about what each art model can provide. 

Since DALL-E 3 has strict protections against generating images of real people and other potentially sensitive content, we can't do a side-by-side comparison of the Bill Gates example, but that doesn't mean we can't check out some other prompts:

Prompt: a hyper realistic photo of a beautiful house in mid-century modern style at sunset inside of a forest and full of trees and plants
Prompt: a macro 35mm photograph of two mice in Hawaii, they're each wearing tiny swimsuits and are carrying tiny surf boards, digital art
Prompt: an expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula
Prompt: a photo of a teddy bear on a skateboard in times square

If the DALL-E 3 style is something you can get behind, it is a fairly cost-effective option to use for art generation. 

(An advantage of the DALL-E 3 model is that you can now use it natively inside ChatGPT.)
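
Beyond ChatGPT, DALL-E 3 can also be called programmatically through OpenAI's Images API, which bills per generated image. Here is a minimal sketch, assuming the openai Python package is installed and an OPENAI_API_KEY is set in your environment.

```python
# Minimal DALL-E 3 request through OpenAI's Images API.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()
result = client.images.generate(
    model="dall-e-3",
    prompt=("digital art of a frog sitting on a lilypad wearing sunglasses "
            "in lofi art style"),
    size="1024x1024",
    n=1,                      # DALL-E 3 generates one image per request
)
print(result.data[0].url)     # temporary URL to the generated image
```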

Advanced DALL-E 3 Prompting

For DALL-E 3, the strategy is pretty simple: words at the beginning of the prompt carry more weight than those at the end. You can modify the standard template we provided above into something more detailed, like this:

[art medium], [main objects], [attribute], [expression], [key light], [detailing], [camera shot], [media], [engine], [artist]

  1. Art Medium - Style of the image you want, i.e., photo, drawing, painting, etc.
  2. Main Object - What you want the focus of the image to be.
  3. Attribute - Specific details about the main object.
  4. Expression - A specific style or period you want to emulate.
  5. Key Light - Specific lighting or time of day you want to capture.
  6. Detailing - Additional context to the image you want included but not focused.
  7. Camera Shot - The framing or angle of the shot, e.g., close-up, wide angle.
  8. Media - Reference to a specific piece of media to emulate.
  9. Engine - The platform by which something is created, e.g., Unreal Engine 5.
  10. Artist - An artist whose style should be emulated closely.

Midjourney

After extensive use of all three art generation models, Midjourney is, by a large margin, my favorite. There is, however, a potential argument to be made for each diffusion model in terms of image quality and diversity. 

With enough practice, you can get a high quality image for your needs through quality prompt crafting in any art diffusion model. 

But for the average user — someone who wants to type their idea for an image into a box and get a high quality image out — Midjourney is the best option. 

It operates through Discord, a popular messaging, conferencing, and community forum hub. You type a command (/imagine) into the prompt box, add your idea, and then Midjourney's model goes to work on creating the image for you.

There is a web option available, but you need to generate a certain number of images before you gain access to it. Here are some generations from that same frog prompt we’ve been using in each model: digital art of a frog sitting on a lilypad wearing sunglasses in lofi art style

This is why I prefer Midjourney over the other options on the market right now: straight out of the box, the results Midjourney provides are drastically higher quality and more creative than what the others produce.

Here’s a side by side comparison:

Now, there is an argument to be made about style. You may not be a fan of the more realistic, detailed art Midjourney provides by default. In fact, anyone who spends time using these AI tools can spot Midjourney's art from a mile away since it always has a unique… vibe to it.

This pushes some people away since it can be difficult to get it to drop its own aesthetic preferences.

But recent updates to Midjourney are putting that notion to rest to a great degree. With simple modifications to the prompt, you can make Midjourney mimic the style that DALL-E 3 put out:

microsoft paint 2d drawing of a frog sitting on a lilypad wearing sunglasses, simple color palette --no depth of field, 3d, realism, shadows, foreground, background:

To really get an idea of the image quality that Midjourney can produce, check out these recently featured images:


If you are looking for an easy-to-use art generation solution, Midjourney is your best and safest investment. Its downside is that it runs on a subscription-based model — meaning you’ll need to shell out between $10 and $60 per month to use it. 

It's either your money or your time, so ultimately the decision is up to you. If you want to become a Stable Diffusion savant, I support you entirely. If you want to hop into Discord and start pumping out amazing images for a fee, I will rock with you either way.

Advanced Midjourney Prompting

There are three main tools you should know if you are looking to take your Midjourney prompts to the next level: negative prompting, style, and permutation prompting.

Negative prompting is used to tell the AI to exclude certain elements from your image. In the frog image above, you’ll see the negative prompt represented as “--no depth of field, 3d, realism, shadows, foreground, background.”

The idea behind this command was to take away anything that the image could use to denote a three-dimensional canvas since we were looking for a flat illustration. 

You can use this tool when artifacts you don’t want in your images keep reappearing. Simply use the “--no” prompt followed by whatever elements you want excluded.

Style level is how strictly the AI will follow your prompt. Here is a general cheat sheet you can reference:

  • Style Low - less artistic, focuses more strictly on the prompt
  • Style Medium - more artistic, relaxes focus on prompt (default)
  • Style High - highly artistic, use if you want Midjourney to 'take over'

Here are three images from the same prompt with different style levels: 

Prompt: a dramatic, beautiful image of an old woman with gray hair sitting on an old car, smiling wide, happy

Low

Medium

High

As you can see, as the AI was given more room to be creative, it drew inspiration from more “artsy” images. We lost some of the prompt information in the Style High version (the car doesn’t immediately scream “old”) but instead were given a much more artistic final image.

On the other side of the spectrum, zero room for creativity means that the AI tried to follow the instructions as closely as possible — to its own detriment. It took sitting on the car too literally and couldn’t create an entirely coherent image without much space to improvise. 

Permutation prompting sounds like a complicated concept, but it isn’t. In many cases, you will find yourself wanting to change one small factor of a prompt. Something like the color of a car. In that instance, rather than putting down three separate prompts, you can ask Midjourney to generate three versions of a single prompt with subtle changes. 

Here is an example of a base prompt:

  • A car driving up a lonely hill in the middle of the desert

Here is the prompt with permutations set to change the color of the car being generated:

  • A {red, blue, green} car driving up a lonely hill in the middle of the desert

Based on this permutation, the following three prompts will run:

  • A red car driving up a lonely hill in the middle of the desert
  • A blue car driving up a lonely hill in the middle of the desert
  • A green car driving up a lonely hill in the middle of the desert
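
Midjourney expands the curly-brace permutations for you, but the logic is simple enough to sketch in a few lines of Python if you ever want to preview or reuse the expanded prompts elsewhere. This is a hypothetical helper, not part of any official Midjourney tooling.

```python
import itertools
import re

# Hypothetical helper that expands {a, b, c} permutation groups into the
# individual prompts; not part of any official Midjourney tooling.
def expand_permutations(prompt: str) -> list[str]:
    groups = re.findall(r"\{([^}]*)\}", prompt)            # contents of each {...}
    options = [[opt.strip() for opt in g.split(",")] for g in groups]
    template = re.sub(r"\{[^}]*\}", "{}", prompt)          # plain {} placeholders
    return [template.format(*combo) for combo in itertools.product(*options)]

base = "A {red, blue, green} car driving up a lonely hill in the middle of the desert"
for expanded in expand_permutations(base):
    print(expanded)
# A red car driving up a lonely hill in the middle of the desert
# A blue car driving up a lonely hill in the middle of the desert
# A green car driving up a lonely hill in the middle of the desert
```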

How to Use AI Art In Your Practice

Even if you are excited about the tech and tools we are covering, there needs to be an applicable way for you to utilize this tech in your office. Below, you’ll find a few ways we think AI generated art can benefit your practice. 

Social Media & Website

If you are someone who is active on social media, AI art presents a huge opportunity for your content generation. You can use it to create comic strip style images to go with a funny post you wrote, stock-like imagery to show an example of a condition, or even use it to find a look and feel for something more extensive like a documentary or blog. 

Not to mention, when you add visuals to your website content, your readers are more likely to stay interested and on the page longer. I might not have to reiterate this to you, but I will anyway: the longer your potential patient stays on your site, the more likely they are to sign up for your services.

Physical Art and Distributed Material

If you need some new art to fit the branding of your practice and persona, it’s available at your fingertips. Running a giveaway and want to generate something that is actually interesting as a canvas? AI has you covered. Want to spice up those fliers you make all the time and send to patients? Midjourney it, baby.

Need a new banger piece of AI art to hang up in your office and fill up those dreary walls? /imagine to the rescue.

Product Design

If the ecommerce bug has gotten you, white labeling might be something you are looking into. (And we recommend that you do.) Before you decide on an aesthetic direction for those products, use one of the art generation tools to mood board how you want those products to look and feel in relation to your practice. A few choice permutations on a single prompt could give you dozens of potential directions… or maybe one of them just hits everything you wanted and can be immediately used for product creative.

NFTs

They aren’t for everyone, but if you are someone who can get behind the idea of virtual ownership, then NFTs could be a fun way to get your patients with similar interests excited about your practice. You could potentially use polished artwork from something like Midjourney as the NFT itself, but you can use any option to figure out some general art ideas you want to have a professional pursue.

Really, anytime you would look for some kind of visual asset, AI art would be a reliable option. This doesn’t mean the need for graphic artists or artists in general is gone — but you might be able to do at least some of it on your own now. 

AI Art Is the Next Status Quo

These art models aren’t going anywhere. They will only continue to become smarter and more complex, which means at some point you will need to figure them out. And if you have to do it anyway, you may as well be ahead of the curve and use the tool to outperform your competitors. 

If you don't believe me, just take a look at what the earliest form of video generators are able to produce:

Video generated by OpenAI’s Sora.

Who knows what will be coming next?

Anyway, speaking of outdoing your competitors, part four of this series gets back to some digital marketing basics. As it turns out, not even search engine optimization is safe from the AI revolution. 

TL;DR An art diffusion model is an AI program designed to generate art based on user prompts. It uses a diffusion process where it adds noise to an image, making it unclear, then reverses the process, refining the image. Over time, the AI learns image patterns and can generate new images based on its database. Different AI models, like Midjourney, Stable Diffusion, and DALL-E 3, produce varied results. While Stable Diffusion is open-source and offers flexibility, DALL-E 3 operates on a credit system. Midjourney, a favorite among many, operates via Discord and is known for high-quality outputs. These AI art models can be used for social media content, physical art, product design, and even NFTs. As AI art models evolve, they are set to become a standard tool in the digital realm.