Stable Diffusion GPT by ClustroAI

OpenAI recently introduced GPTs, a notable development in AI: users can build their own custom versions of ChatGPT and share them, much like apps in an App Store. ClustroAI has published one of them, “Stable Diffusion GPT”: https://chat.openai.com/g/g-H8NmIVTln-stable-diffusion-gpt

This GPT uses several Stable Diffusion models to produce images of higher quality than OpenAI’s native image generation model, DALL·E. Combined with GPT-4’s conversational interface, it offers an experience that rivals, and sometimes exceeds, that of Midjourney.

As the example shows, Stable Diffusion GPT generated a noticeably better image, and it replied in Chinese, the language the user wrote in.

The core idea is that users can talk to GPT-4 in any human language; GPT-4 works out what they need and selects the most appropriate specialized model for image generation. For instance, the default Stable Diffusion XL model may not excel at detailed human portraits, so ClustroAI provides a fine-tuned alternative for that purpose. These models are exposed as APIs and plugged into Stable Diffusion GPT as “Actions”. The GPT chooses the model best suited to each request and writes the prompt itself, so the process stays simple and works in any language.
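To make the Action mechanism more concrete, here is a minimal Python sketch of the HTTP request a GPT Action would issue when it picks the portrait model. It assumes, based on the Action schema shown later in this post, that each model is reached with a POST to its `invoke_sync` path and that the prompt travels in the `input` query parameter. The helper name `build_invoke_request` and the sample prompt are hypothetical, and authentication and the response format are left out because the post does not document them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Server URL and model path copied from the Action schema in this post.
BASE_URL = "https://api.clustro.ai"
PORTRAIT_MODEL_PATH = (
    "/v1/public_models/clustroai/SD-XL-XXMix_9realisticSDXL/invoke_sync"
)


def build_invoke_request(model_path: str, prompt: str) -> Request:
    """Build, but do not send, the POST request for one model.

    Hypothetical helper. The prompt goes into the `input` query parameter,
    matching `"in": "query"` in the schema. No auth headers are added,
    because the post does not say which ones ClustroAI expects.
    """
    query = urlencode({"input": prompt})  # spaces become "+", "," becomes "%2C"
    return Request(f"{BASE_URL}{model_path}?{query}", method="POST")


req = build_invoke_request(
    PORTRAIT_MODEL_PATH, "portrait of an elderly fisherman, golden hour"
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req); the response body format
# is not described in the post, so it is not parsed here.
```

In the real GPT, ChatGPT writes the prompt and sends the request itself; the sketch only shows what goes over the wire.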

The implementation involves several steps:

  1. Deploy multiple Stable Diffusion XL execution-function APIs on ClustroAI, then test their image generation through the ClustroAI API.
  2. Create the GPT and configure it with a description and Actions. This step shows how adaptable GPTs are: on the GPT side, programming is essentially done in natural language, and the system selects the best model for each task without traditional coding constructs such as if-else statements. The instruction was as simple as this:

With two specialized models, the Action schema looks like this:

{
    "openapi": "3.1.0",
    "info": {
        "title": "Generate image",
        "description": "Generate image with Stable Diffusion XL",
        "version": "v1.0.0"
    },
    "servers": [
        {
            "url": "https://api.clustro.ai"
        }
    ],
    "paths": {
        "/v1/public_models/clustroai/SD-XL-XXMix_9realisticSDXL/invoke_sync": {
            "post": {
                "description": "Image generation for people as the object.",
                "operationId": "GenerateImageWithHuman",
                "parameters": [
                    {
                        "name": "input",
                        "in": "query",
                        "description": "the prompt to generate image",
                        "required": true,
                        "schema": {
                            "type": "string"
                        }
                    }
                ],
                "deprecated": false
            }
        },
        "/v1/public_models/clustroai/stable-diffusion-xl-1–0/invoke_sync": {
            "post": {
                "description": "General purpose",
                "operationId": "GenerateImageGeneric",
                "parameters": [
                    {
                        "name": "input",
                        "in": "query",
                        "description": "the prompt to generate image",
                        "required": true,
                        "schema": {
                            "type": "string"
                        }
                    }
                ],
                "deprecated": false
            }
        }
    },
    "components": {
        "schemas": {}
    }
}
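Because the GPT chooses a model on its own, the `operationId` and `description` of each operation are presumably the main signals it has for telling the two models apart. The Python sketch below, which uses only the standard `json` module and a trimmed copy of the schema above, lists the actions the GPT can choose from and the full URL each one resolves to, and checks that each takes the prompt as a required `input` query parameter. The helper name `list_actions` is hypothetical.

```python
import json

# Trimmed copy of the Action schema above; only the fields used below are kept.
SCHEMA_JSON = """
{
  "openapi": "3.1.0",
  "servers": [{"url": "https://api.clustro.ai"}],
  "paths": {
    "/v1/public_models/clustroai/SD-XL-XXMix_9realisticSDXL/invoke_sync": {
      "post": {
        "description": "Image generation for people as the object.",
        "operationId": "GenerateImageWithHuman",
        "parameters": [{"name": "input", "in": "query", "required": true}]
      }
    },
    "/v1/public_models/clustroai/stable-diffusion-xl-1–0/invoke_sync": {
      "post": {
        "description": "General purpose",
        "operationId": "GenerateImageGeneric",
        "parameters": [{"name": "input", "in": "query", "required": true}]
      }
    }
  }
}
"""


def list_actions(schema: dict) -> list[tuple[str, str, str]]:
    """Return (operationId, description, full URL) for each POST operation.

    Hypothetical helper. Only the first server entry is used, since the
    schema above declares exactly one.
    """
    base_url = schema["servers"][0]["url"]
    actions = []
    for path, operations in schema["paths"].items():
        post = operations.get("post")
        if post is None:
            continue  # this schema only defines POST operations
        # Each action should accept the prompt as a required `input` query parameter.
        takes_prompt = any(
            p["name"] == "input" and p["in"] == "query" and p.get("required")
            for p in post.get("parameters", [])
        )
        if not takes_prompt:
            raise ValueError(f"{post['operationId']} lacks a required 'input' query parameter")
        actions.append((post["operationId"], post["description"], base_url + path))
    return actions


schema = json.loads(SCHEMA_JSON)
for operation_id, description, url in list_actions(schema):
    print(f"{operation_id}: {description}\n    {url}")
```

If two descriptions read too similarly, the GPT has little to go on when choosing between models, so a listing like this is a cheap sanity check before saving the GPT.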

Finally, test it.

This approach marks a significant shift in human-computer interaction, comparable to the changes brought by web pages and, later, mobile apps. Combining GPTs with ClustroAI illustrates an architecture in which a large language model handles logic and interaction while delegating specialized tasks to expert models through APIs. Pairing language models with robust backend APIs in this way opens up a wide range of new possibilities.

ClustroAI runs on a distributed computing infrastructure that uses idle capacity on home PCs with NVIDIA RTX 4090 GPUs, which makes AI inference cost-effective. Together, GPTs and ClustroAI show how much is possible with AI-driven solutions.
