basilicom/ai-image-generator-bundle

v0.9.0 2023-11-13 11:33 UTC



README

This bundle utilizes common APIs of generative image AIs to generate images in the Pimcore backend.

Installation

composer update basilicom/ai-image-generator-bundle

Make sure to also install the bundle, either via a BundleSetupSubscriber or via the console.
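
If you choose the console route, the usual Pimcore bundle commands look roughly like this (a sketch only; the bundle name AiImageGeneratorBundle is assumed, and the available commands depend on your Pimcore version):

  bin/console pimcore:bundle:enable AiImageGeneratorBundle
  bin/console pimcore:bundle:install AiImageGeneratorBundle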


Configuration

ai_image_generator:
  brand:
    colors:
      - "#0062FF"
      - "#B34197"
      - "#FF444A"

  prompt_enhancement:
    service:        ~|ollama|basilicom|open_ai

    services:
      ollama:
        baseUrl:    "http://localhost:11434/"
        model:      "llama2"

      basilicom:
        baseUrl:    "http://localhost:8080/"

      open_ai:
        baseUrl:    "https://api.openai.com/v1"
        apiKey:     "%env(OPEN_AI_API_KEY)%"

  feature_services:
    txt2img:            open_ai | stable_diffusion_api | dream_studio | clip_drop
    image_variations:   open_ai | stable_diffusion_api | dream_studio | clip_drop
    upscale:            -       | stable_diffusion_api | dream_studio | clip_drop
    inpaint:            open_ai | stable_diffusion_api | dream_studio | -
    inpaint_background: open_ai | stable_diffusion_api | -            | clip_drop

  services:
    stable_diffusion_api:
      baseUrl:        "http://host.docker.internal:7860"
      model:          "JuggernautXL"
      inpaint_model:  "JuggernautXL"
      steps:          30
      upscaler:       "ESRGAN_4x"
    
    dream_studio:
      baseUrl:        "https://api.stability.ai"
      model:          "stable-diffusion-xl-beta-v2-2-2"
      inpaint_model:  "stable-diffusion-xl-1024-v1-0"
      steps:          10 
      apiKey:         "%env(DREAM_STUDIO_API_KEY)%"
      upscaler:       "esrgan-v1-x2plus"
      
    open_ai:
      baseUrl:        "https://api.openai.com/v1"
      apiKey:         "%env(OPEN_AI_API_KEY)%"
      
    clip_drop:
      baseUrl:        "https://clipdrop-api.co"
      apiKey:         "%env(CLIP_DROP_API_KEY)%"
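
The %env(...)% placeholders expect the corresponding variables in your environment, e.g. in .env.local. A minimal sketch with placeholder values:

  # .env.local – placeholder values, set your real API keys here
  OPEN_AI_API_KEY=...
  DREAM_STUDIO_API_KEY=...
  CLIP_DROP_API_KEY=...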

Usage

Generating images in documents

If no prompt is given, the prompt will be generated (and not translated!) from

  • document SEO title
  • document SEO description
  • h1 elements
  • h2 elements
  • h3 and h4 elements, if the previously mentioned sources are empty

Image editables will get a button to generate an image

Generating images in DataObjects

If no prompt is given, the prompt will be generated (and not translated!) by trying to access the following properties:

  • key
  • title
  • name
  • productName
  • description

Image and ImageGallery fields will get a context menu item to generate an image

API

(POST) /admin/ai-images/generate/{context}-{id}

Generate an image based on a document or object context. If the prompt is empty, the bundle's prompting logic will take effect.
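
A minimal example call (host and ID are placeholders; an authenticated Pimcore admin session is required):

  # generate an image for DataObject 123 and return the result as JSON
  curl -X POST "https://pimcore.example.com/admin/ai-images/generate/object-123" \
    -H "Accept: application/json"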

(POST) /admin/ai-images/upscale/{id}

Upscale an image; the target upscaling size is specific to the selected AI service.
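
Example (asset ID is a placeholder; here the upscaled image itself is requested and written to a file):

  # upscale asset 456 and save the returned JPEG
  curl -X POST "https://pimcore.example.com/admin/ai-images/upscale/456" \
    -H "Accept: image/jpeg" -o upscaled.jpg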

(POST) /admin/ai-images/vary/{id}

Inpaint backgrounds; the background logic differs between the provided AI services.
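
Example (asset ID is a placeholder):

  # inpaint the background of asset 456 and return the result as JSON
  curl -X POST "https://pimcore.example.com/admin/ai-images/vary/456" \
    -H "Accept: application/json"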

Responses

Based on the Accept header, you can specify whether you want a JSON response or the image itself.

Accept: application/json

{
  "success": true,
  "id": <Pimcore asset ID>,
  "image": "<base64-encoded image>"
}
{
  "success": false,
  "message": "..."
}

Accept: image/jpeg

// the raw image binary

Using Stable Diffusion API

When running Automatic1111 locally, you can define http://host.docker.internal:7860 as your local API URL.

Additionally, make sure you started Automatic1111 with --api:

  ./webui.sh --api # linux/mac
  ./webui.bat --api # windows

If you want to know which models are available, call the models endpoint and copy the name of a model of your choice.
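
With a default Automatic1111 setup, the models endpoint can be queried like this (URL as configured above):

  # list the installed checkpoints and copy the name of the model you want to use
  curl http://host.docker.internal:7860/sdapi/v1/sd-models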

Plugins used

  • ControlNet with canny and ip2p
  • SD Upscaler Post Processor Script

Using LLM-driven prompt enhancing

In order to enhance prompts, locally running LLM images or an external API can be used. There are three supported prompt enhancement services:

  • open_ai (ChatGPT)
  • basilicom (a simple LLM implementation, see Docker Hub)
  • ollama (see GitHub)
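
As a sketch for the ollama option, one way to run it locally with Docker and pull the model referenced in the configuration above (image name and port are the ollama defaults):

  # start ollama on the port expected by the bundle configuration (11434)
  docker run -d --name ollama -p 11434:11434 ollama/ollama
  # pull the model configured under prompt_enhancement.services.ollama.model
  docker exec -it ollama ollama pull llama2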

Limitations

Additional ideas

  • Prompting
    • enhance prompts, especially for background inpainting, like
      background = "a creepy forest at night"
      image_type = "a haunted castle background"
      characters = "medieval warriors"
      action = "fighting for the honor"
      prompt = f"{image_type} in {background} with {characters} {action}"
      
  • generate prompt in lightbox before sending?
  • background-inpainting for other services by using masks
  • CLIP interrogate in order to optimize variation prompting
    • allow variants by img2img and CLIP
  • run IMG2IMG with low denoise on background-inpainting
  • LCM for super fast preview generation => midjourney-like/inpainting-like image selection before upscaling, etc.
  • outpainting via Thumbnail
  • better error handling (warnings and fallbacks if credits exceeded)
  • ComfyUI + Nodes to Python as fixed presets
    • allow docker images with presets
  • InvokeAI

Authors

Alexander Heidrich