How to Use Your Own Imagination with AI Models

Artificial Intelligence (AI) stands at the forefront of the technological revolution, reshaping our world at an unprecedented pace and scope. At the heart of this transformative wave, AI emerges not just as a set of tools and technologies but as a fundamental shift in how we interact with data, solve problems, and envision the future. In this article, we will walk through a straightforward example of how to harness the potential of this emerging trend.

Problem Definition:

Face swap is a widespread application nowadays: it simply replaces the face in a picture with another one. There are several motivations behind it, for example, putting your face on a beautiful picture, fixing a blurred face, or creating a meme image.

The traditional way of doing this takes several steps. Take the example of putting a face on a beautiful picture: first, use an image editor like Photoshop to crop the face out of the original picture and delete it. Second, crop the face you want to replace it with. Then, paste the new face into the picture and do all the post-processing (which may include fixing boundaries, adjusting lighting, adding effects, and so on).

As you can see, getting a good result takes a lot of manual work. So we want a quicker, automatic solution.

How to Solve It:

There could be several ways to achieve our goal, but here I would like to introduce the “AI” way of doing it.

First, we need a tool to crop the face. In Computer Vision (CV), there is a task called segmentation: a traditional research topic that solves the problem of separating objects in a picture and drawing their boundaries. This is exactly what we want (except we only want the face to be separated). There are models that can segment human body parts in a picture, and we can make use of one of them.

Second, we need another face for the replacement. Instead of simply cropping one from another photo, I would suggest using Stable Diffusion (SD) to generate a new face directly on the original picture. The advantage is that if you generate the face on the picture itself, the boundary and lighting problems are solved automatically (we will show how in the next section), and you can control the details of the face with different SD prompts.

Now that we have all the models we need, let’s dive in and see how everything fits together.

The Implementation:

In this section, we will walk through the implementation. The code can be found in our GitHub repository; check it if any step is unclear.

To make this work locally, we need some prerequisites.

  • Python

Any recent version should be fine; just make sure it works with PyTorch.

  • PyTorch

To install it, check the official website. We will need both torch and torchvision.

  • diffusers, transformers

Hugging Face libraries; we need them to run model inference.

  • Pillow (PIL)

The Python imaging library; we need it for some image manipulation.

After these are ready, we can start to create our own face swap application.

We chose mattmdjaga/segformer_b2_clothes as our open-source model for face segmentation. (You can choose your own model too, as long as it can detect and mark the face in the picture; we will need that mark to generate a mask.) The picture above is an example of its output: it simply marks which pixel belongs to which body part, and we only need to select the pixels marked “face” (or “face” and “hair”, if you want a new hairstyle too).
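Running the model with the transformers library could look roughly like the sketch below. The solid-gray stand-in image is only there to keep the snippet self-contained; in practice you would open your own photo. The class names follow the standard Hugging Face semantic-segmentation workflow, but check the model card for the exact label names.

```python
import torch
from PIL import Image
from transformers import SegformerImageProcessor, AutoModelForSemanticSegmentation

# Load the segmentation model (weights download on first use)
checkpoint = "mattmdjaga/segformer_b2_clothes"
processor = SegformerImageProcessor.from_pretrained(checkpoint)
model = AutoModelForSemanticSegmentation.from_pretrained(checkpoint)

# Solid-gray stand-in image; replace with Image.open("your_photo.jpg")
image = Image.new("RGB", (512, 512), "gray")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_labels, H/4, W/4)

# Upsample back to the image size, then take the per-pixel argmax
upsampled = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)
labels = upsampled.argmax(dim=1)[0]  # (H, W) map of body-part label ids

# model.config.id2label tells you which id means "Face"
print(model.config.id2label)
```

The `labels` tensor is the per-pixel body-part map we will turn into a mask next.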

Now, for face generation, we will use the Stable Diffusion inpainting pipeline. It takes a mask and the original picture as input and generates new content in the masked area. So we need some preparation.

Our mask is a black-and-white picture in which the white area marks the region to inpaint. To generate it, we take the segmentation output and set the face pixels to white and everything else to black. The picture below (left) shows a sample mask derived directly from segformer_b2_clothes. You may notice that the shape of the mask is rather ragged, and there is a small dot outside the face area. These can sometimes hurt the SD model’s results, so we want some post-processing.
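Turning the per-pixel label map into a black-and-white mask could look like the following sketch. The helper name is hypothetical, and the label ids (11 for “Face”, 2 for “Hair”) follow the segformer_b2_clothes model card; verify them against model.config.id2label for the model you actually use.

```python
import numpy as np
from PIL import Image

def mask_from_labels(labels: np.ndarray, keep_ids=(11,)) -> Image.Image:
    """Turn a per-pixel label map into a black/white inpainting mask.

    keep_ids: label ids to paint white. (11,) is "Face" in
    segformer_b2_clothes; add 2 ("Hair") for a new hairstyle too.
    """
    binary = np.isin(labels, keep_ids).astype(np.uint8) * 255
    return Image.fromarray(binary, mode="L")

# Toy 4x4 label map: "Face" pixels in the center, background elsewhere
labels = np.zeros((4, 4), dtype=np.int64)
labels[1:3, 1:3] = 11
mask = mask_from_labels(labels)  # white 2x2 square on black
```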

To remove the extra dots and round off the boundary, we apply a Gaussian blur to the mask and then snap each blurred pixel back to black or white. Conveniently, PIL’s ImageFilter module has a GaussianBlur() filter we can use directly. After the mask is blurred, simply choose a threshold between 0 and 255, make every pixel below it black and every pixel above it white. The picture above (right) shows the resulting mask. (You may want to tune the parameters for a better result.)

The code we used is:

from PIL import ImageFilter

# Blur the mask to smooth the boundary and shrink stray dots
mask = mask.filter(ImageFilter.GaussianBlur(radius=15))

# Re-binarize: pixels below the threshold become black, the rest white
pixel_map = mask.load()
w, h = mask.size
for i in range(w):
    for j in range(h):
        pixel_map[i, j] = 0 if pixel_map[i, j] < 110 else 255
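The same blur-then-threshold step can also be written with PIL’s Image.point(), which applies the threshold without an explicit Python loop. A self-contained sketch (the synthetic square mask and radius=5 are illustrative only; a larger radius like the 15 used above suits larger images):

```python
from PIL import Image, ImageFilter

# Synthetic stand-in for the segmentation-derived mask:
# a white square ("face") on a black background
mask = Image.new("L", (64, 64), 0)
mask.paste(255, (16, 16, 48, 48))

# Blur to round the boundary and wash out stray dots,
# then re-binarize with a threshold of 110
blurred = mask.filter(ImageFilter.GaussianBlur(radius=5))
cleaned = blurred.point(lambda p: 0 if p < 110 else 255)
```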

Now that we have the mask and the picture, we can finally move on to SD for face generation. We used the original runwayml/stable-diffusion-inpainting model, but you can use any SD model capable of inpainting. Following Hugging Face’s diffusers pipeline, simply pass in the prompt, the original image, and the mask image, and you should already get a decent result. Below is a comparison between the original image and the new one.

Run Our API Without Coding

To implement the entire application yourself, you will need some coding knowledge, but don’t worry: we have a fully working API at ClustroAI!

First, upload your image to an image-hosting website (for example, imgur.com), log in to https://console.clustro.ai/ (register first if you haven’t yet), and find the model under “Explore.”

Then go to the “Test” tab, enter your image URL and prompt, and simply click Run; you will see the result appear in the right panel!

Conclusion

This article demonstrated a simple face-swap application built from open-source AI models. The approach can be improved further by tuning parameters or swapping in different models, and it opens up many other use cases, such as replacing your own face in any photo. As we said at the beginning, this could be the way we add our own creativity to AI models; don’t let your imagination become your own barrier!
