Doramagic Project Pack · Human Manual

Generative-Media-Skills

Generative-Media-Skills provides AI agents with:

Getting Started

Related topics: Architecture Overview, CLI Commands Reference, Agent Integration Guide

Welcome to Generative-Media-Skills — a comprehensive multimodal toolset enabling AI agents (Claude Code, Cursor, Gemini CLI) to generate, edit, and display professional-grade images, videos, and audio content through the muapi-cli interface.

This guide walks you through installation, configuration, and your first generation to get up and running in minutes.

Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual

Architecture Overview

Related topics: Getting Started, Expert Skills Library, Schema Reference

This repository implements a Core/Library split architecture designed for AI agents to generate, edit, and display professional-grade images, videos, and audio through the muapi.ai platform. The architecture prioritizes agent-native workflows with CLI-powered scripts, structured JSON outputs, and Model Context Protocol (MCP) integration.

High-Level Architecture

The Generative-Media-Skills repository acts as a skill layer that translates creative intent into technical directives, delegating actual API calls to the underlying muapi-cli tool. This separation allows the repository to focus on expert knowledge while leveraging a robust, maintained API client.

graph TD
    subgraph "AI Agents"
        A["Claude Code"]
        B["Cursor"]
        C["Gemini CLI"]
        D["MCP Clients"]
    end
    
    subgraph "Generative-Media-Skills"
        E["Expert Library /library"]
        F["Core Primitives /core"]
        G["Recipe Pack"]
    end
    
    subgraph "muapi-cli"
        H["CLI Interface"]
        I["API Client"]
    end
    
    subgraph "muapi.ai Platform"
        J["100+ AI Models"]
        K["Media Generation APIs"]
    end
    
    A --> E
    B --> E
    C --> E
    D --> H
    E --> F
    F --> H
    G --> H
    H --> I
    I --> J
    I --> K

Source: README.md

Core Primitives (`/core`)

The Core layer provides thin wrappers around muapi-cli for direct API access. These are low-level building blocks that handle raw platform operations.

Directory Structure

| Directory | Purpose |
| --- | --- |
| core/media/ | File upload operations |
| core/edit/ | Image editing (prompt-based) |
| core/platform/ | Setup, authentication, and result polling |

Platform Utilities

Located in core/platform/, these scripts handle API configuration and async operation management:

| Script | Description |
| --- | --- |
| setup.sh | Configure API key, show config, test key validity |
| check-result.sh | Poll for async generation results by request ID |

Source: core/platform/SKILL.md

Media Editing Core

Located in core/edit/, these scripts provide enhancement operations:

| Script | Description |
| --- | --- |
| edit-image.sh | Prompt-based image editing |
| enhance-image.sh | One-click operations: upscale, background removal, face swap |
| lipsync.sh | Sync video lip movement to audio |
| video-effects.sh | Video/image effects |

Source: core/edit/SKILL.md

Expert Library (`/library`)

The Library layer contains high-value skills that implement domain-specific knowledge for professional results.

Skill Categories

graph LR
    A["Library"] --> B["Motion / Video"]
    A --> C["Social"]
    A --> D["Visual / Images"]
    A --> E["Edit"]
    
    B --> B1["Cinema Director"]
    B --> B2["Seedance 2"]
    B --> B3["AI Clipping"]
    
    C --> C1["YouTube Shorts"]
    C --> C2["UGC Ads"]
    
    D --> D1["Nano-Banana"]
    D --> D2["UI Designer"]
    D --> D3["Logo Creator"]
    
    E --> E1["AI Clipping"]

Key Expert Skills

| Skill | Category | Description |
| --- | --- | --- |
| Cinema Director | Motion | Technical film direction & cinematography |
| Nano-Banana | Visual | Reasoning-driven image generation (Gemini 3 Style) |
| UI Designer | Visual | High-fidelity mobile/web mockups (Atomic Design) |
| Logo Creator | Visual | Minimalist vector branding |
| Seedance 2 | Motion | Director-level cinematic video generation |
| AI Clipping | Edit | Long video → ranked vertical short clips |

Source: README.md

Recipe Pack

The Recipe Pack contains forty-one LLM-orchestrated workflow recipes that combine multiple muapi-cli calls into named end-to-end pipelines. Each recipe is a SKILL.md file that the agent reads and follows.

Recipe Categories

| Category | Count | Description |
| --- | --- | --- |
| Motion / Video | 16 | Film generation, animation, product showcases |
| Social | 5 | Instagram posts, UGC ads, social media packs |
| Visual / Design | 21 | Action figures, brand kits, logos, interior design |

Example Recipes

| Skill | Path | Description |
| --- | --- | --- |
| 3D Logo Animation | library/motion/3d-logo-animation/ | Premium 3D logo animation |
| AI Fight Scene Generator | library/motion/ai-fight-scene/ | 16-cell storyboard → video choreography |
| Animal Vlogger Video | library/motion/animal-video-generator/ | Anthropomorphic animal content |
| Action Figure Generator | library/visual/action-figure-generator/ | Photo → 3D collectible |
| Amazon Product Listing | library/visual/amazon-product-listing/ | Full Amazon listing image set |

Source: README.md

MCP Server Architecture

The Model Context Protocol server exposes all tools directly to MCP-compatible agents.

Exposed Tools (19 Total)

| Tool | Category | Models Supported |
| --- | --- | --- |
| muapi_image_generate | Image | 14 models |
| muapi_image_edit | Image | 11 models |
| muapi_video_generate | Video | 13 models |
| muapi_video_from_image | Video | 16 models |
| muapi_audio_create | Audio | Suno (music) |
| muapi_audio_from_text | Audio | MMAudio (sound effects) |
| muapi_enhance_upscale | Enhancement | AI upscaling |
| muapi_enhance_bg_remove | Enhancement | Background removal |
| muapi_enhance_face_swap | Enhancement | Face swap (image/video) |
| muapi_enhance_ghibli | Enhancement | Ghibli style transfer |
| muapi_edit_lipsync | Editing | Lip sync to audio |
| muapi_edit_clipping | Editing | AI highlight extraction |
| muapi_predict_result | Utility | Poll prediction status |
| muapi_upload_file | Utility | Upload local file → URL |
| muapi_keys_list | Account | List API keys |
| muapi_keys_create | Account | Create API key |
| muapi_keys_delete | Account | Delete API key |
| muapi_account_balance | Account | Get credit balance |
| muapi_account_topup | Account | Add credits (Stripe) |

Source: README.md

MCP Configuration

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}

Schema Reference

The repository includes schema_data.json for runtime validation:

  • Model ID Validation: Ensures requested models exist
  • Endpoint Resolution: Maps model names to API endpoints
  • Parameter Checking: Validates aspect_ratio, resolution, and duration

CLI Model Discovery

muapi models list
muapi models list --category video --output-json

Agentic Pipeline Flow

The architecture supports asynchronous operations through a polling pattern:

sequenceDiagram
    participant Agent
    participant CLI as muapi-cli
    participant API as muapi.ai API
    participant Agent2 as Agent (other work)
    
    Agent->>CLI: Submit async request
    CLI->>API: POST /generate (async=true)
    API-->>CLI: request_id
    CLI-->>Agent: request_id
    Agent->>Agent2: Do other work
    Agent2-->>Agent: Continue...
    Agent->>CLI: Poll for result
    CLI->>API: GET /predict/{request_id}
    API-->>CLI: status
    alt Still processing
        CLI->>API: GET /predict/{request_id}
    else Complete
        API-->>CLI: result URL
        CLI-->>Agent: Download media
    end
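
The polling loop in the diagram can be sketched in plain shell. Everything here is an assumption made for illustration: `check_status` is a stub standing in for a real status lookup, and the status strings `processing`, `completed`, and `failed` are placeholders rather than documented API values.

```shell
#!/usr/bin/env bash
# Sketch of the polling pattern: ask for the status until it reaches a
# terminal state or the attempt budget runs out.

check_status() {
  # HYPOTHETICAL stub: pretends the job completes on the third poll.
  # Replace the body with a real status lookup for your setup.
  local attempt="$1"
  if [ "$attempt" -ge 3 ]; then echo "completed"; else echo "processing"; fi
}

poll_until_done() {
  # Usage: poll_until_done MAX_ATTEMPTS INTERVAL_SECONDS
  local max_attempts="$1" interval="$2" attempt=1 status
  while [ "$attempt" -le "$max_attempts" ]; do
    status="$(check_status "$attempt")"
    case "$status" in
      completed) echo "completed after $attempt polls"; return 0 ;;
      failed)    echo "failed after $attempt polls";    return 1 ;;
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "timed out after $max_attempts polls"
  return 2
}

poll_until_done 5 0   # prints: completed after 3 polls
```

Distinct return codes for failure (1) and timeout (2) let a calling script decide whether to retry or give up.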

Example Pipeline Commands

# Submit async, capture request_id
# (tr -d '"' strips the surrounding quotes if --jq returns a JSON string)
REQUEST_ID=$(muapi video generate "a dog running" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# Wait for the result, then download it
muapi predict wait "$REQUEST_ID" --download ./outputs

# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs

Source: README.md

Supported AI Agents

The repository targets the following AI agents and development environments:

| Agent | Integration Method |
| --- | --- |
| Claude Code | Direct terminal execution + MCP server mode |
| Cursor | Seamless local script execution |
| Gemini CLI | CLI tool integration |
| Windsurf | CLI tool integration |
| Any MCP Client | Full MCP server mode |

Common Flags

All core scripts support standardized CLI flags:

| Flag | Purpose |
| --- | --- |
| --async | Submit request without waiting |
| --json | Output raw JSON |
| --download | Auto-download generated media |
| --view | Auto-download and open in system viewer |
| --output-json | JSON output mode |
| --jq '<filter>' | Extract specific JSON fields |
| --timeout N | Set operation timeout |

Requirements

| Component | Requirement |
| --- | --- |
| muapi-cli | Installed via npm install -g muapi-cli or pip install muapi-cli |
| API Key | Configured via muapi auth configure |
| System Tools | curl, jq, python3 |
| Node.js | For npm installation |

Source: core/edit/SKILL.md

Key Design Principles

  1. Agent-Native Design: CLI-powered scripts with structured JSON outputs and semantic exit codes
  2. No Boilerplate: All primitives delegate to muapi-cli — no curl or manual JSON parsing
  3. Direct Media Display: --view flag for automatic download and viewing
  4. Local File Support: Auto-upload from local machine to CDN
  5. Schema Validation: Runtime validation of models and parameters
  6. CI/CD Ready: --output-json, --jq, semantic exit codes for scripting


MCP Server Setup

Related topics: CLI Commands Reference, Agent Integration Guide, Schema Reference

The MCP (Model Context Protocol) Server in Generative-Media-Skills exposes 19 tools as structured MCP endpoints. MCP-compatible clients such as Claude Desktop and Cursor can then invoke image, video, and audio generation, along with utility and account operations, without running shell scripts or making manual API calls.

Architecture Overview

graph TD
    A[Claude Desktop / Cursor / MCP Client] -->|MCP Protocol| B[muapi mcp serve]
    B --> C[muapi-cli Core]
    C --> D[muapi.ai API]
    D --> E[100+ AI Models]
    
    F[Local Files] -->|auto-upload| C
    G[Skills Library] -->|workflows| C

The MCP Server acts as a thin bridge between MCP-compatible AI agents and the muapi.ai platform. It provides typed JSON Schema definitions for all tools, so agents can call them without hand-built requests or tool-specific prompt engineering. Source: README.md

Prerequisites

Before configuring the MCP Server, ensure you have:

| Requirement | Version/Details |
| --- | --- |
| Node.js | v18+ recommended |
| muapi-cli | Latest stable |
| muapi.ai API key | Available at muapi.ai/dashboard |

Install muapi-cli via npm or pip:

# via npm (recommended)
npm install -g muapi-cli

# via pip
pip install muapi-cli

Configure your API key:

muapi auth configure --api-key "YOUR_MUAPI_KEY"

Source: README.md

Starting the MCP Server

Launch the MCP server in foreground or background mode:

muapi mcp serve

The server exposes all 19 tools with full JSON Schema input/output definitions. It runs as a long-lived process that handles MCP protocol communication on the local machine.

Claude Desktop Configuration

To integrate with Claude Desktop, add muapi to your Claude configuration file.

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": {
        "MUAPI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Alternatively, if you configured your API key globally via muapi auth configure, you can omit the env block:

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"]
    }
  }
}

After editing the config, restart Claude Desktop to load the new MCP server.

Source: README.md

Available MCP Tools

The MCP Server exposes 19 structured tools organized by category:

Image Generation & Editing

| Tool | Description | Input Models |
| --- | --- | --- |
| muapi_image_generate | Text-to-image generation | 14 models (Flux, Midjourney, DALL-E, etc.) |
| muapi_image_edit | Image-to-image editing | 11 models (Flux Kontext, GPT-4o, Midjourney, Qwen) |

Video Generation

| Tool | Description | Input Models |
| --- | --- | --- |
| muapi_video_generate | Text-to-video generation | 13 models (Kling, Veo, Seedance, etc.) |
| muapi_video_from_image | Image-to-video animation | 16 models |

Audio Generation

| Tool | Description | Platform |
| --- | --- | --- |
| muapi_audio_create | Music generation | Suno |
| muapi_audio_from_text | Sound effects | MMAudio |

Enhancement & Effects

| Tool | Description | Models/Options |
| --- | --- | --- |
| muapi_enhance_upscale | AI upscaling | Multiple engines |
| muapi_enhance_bg_remove | Background removal | One-click |
| muapi_enhance_face_swap | Face swap for image/video | Multiple modes |
| muapi_enhance_ghibli | Ghibli style transfer | One-click |
| muapi_edit_lipsync | Lip sync to audio | Sync Labs, LatentSync, Creatify, Veed |
| muapi_edit_clipping | AI highlight extraction from video | Server-side transcription |

Utility & Account

| Tool | Description |
| --- | --- |
| muapi_predict_result | Poll async prediction status |
| muapi_upload_file | Upload local file to CDN, returns URL |
| muapi_keys_list | List existing API keys |
| muapi_keys_create | Create new API key |
| muapi_keys_delete | Delete an API key |
| muapi_account_balance | Get current credit balance |
| muapi_account_topup | Add credits via Stripe checkout |

Source: README.md

Other MCP-Compatible Clients

The MCP Server is not limited to Claude Desktop. Any MCP-compatible agent can use these tools:

| Client | Integration Method |
| --- | --- |
| Cursor | Add to Cursor settings using same JSON config structure |
| Windsurf | MCP server configuration in IDE settings |
| Gemini CLI | Direct CLI execution of MCP tools |
| Custom Agents | Any MCP-compatible agent with tool execution |

For Cursor and Windsurf, use the same server configuration as Claude Desktop.

Workflow Examples

Image Generation Workflow

graph LR
    A[Agent Request] -->|muapi_image_generate| B[MCP Server]
    B --> C[muapi.ai API]
    C --> D[Image Model]
    D --> E[Generated Image URL]
    E --> B
    B --> F[Agent Receives Result]

Example Claude Desktop prompt:

Generate a cyberpunk city image with neon lights using the muapi_image_generate tool.

Async Video Pipeline

graph TD
    A[Submit Request] -->|muapi_video_generate --no-wait| B[Get request_id]
    B --> C[Do other work]
    C --> D[Poll muapi_predict_result]
    D -->|Still processing| D
    D -->|Complete| E[Download via muapi_predict_result --download]

Example terminal workflow:

# Submit async job
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# Poll for result
muapi predict wait "$REQUEST_ID" --download ./outputs

Chained Workflow

graph LR
    A[Local Image] -->|muapi_upload_file| B[Get CDN URL]
    B -->|muapi_image_edit| C[Apply Edit]
    C -->|muapi_enhance_upscale| D[Upscale]
    D -->|muapi_enhance_bg_remove| E[Final Output]

Example:

# Upload local file
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')

# Edit the image
muapi image edit "make it look like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs

Source: README.md

Platform Utilities via MCP

The MCP Server also exposes account management tools for programmatic control:

| Tool | Use Case |
| --- | --- |
| muapi_keys_list | Audit active API keys in CI/CD |
| muapi_keys_create | Provision keys for different projects |
| muapi_account_balance | Check credits before large batch jobs |
| muapi_account_topup | Automated credit replenishment |

These utilities enable fully automated pipelines without manual dashboard interaction.
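
As one illustration of checking credits before a large batch job, the sketch below compares a balance against the estimated cost of a run. The helper `can_afford_batch` is hypothetical, and it assumes whole-number credits and a known per-job cost. The caller passes in the balance, because the output format of the balance tool is not documented here.

```shell
#!/usr/bin/env bash
# Budget guard for a batch run.
# ASSUMPTIONS: credits are whole numbers and every job costs the same.

can_afford_batch() {
  # Usage: can_afford_batch BALANCE COST_PER_JOB JOB_COUNT
  local balance="$1" cost_per_job="$2" jobs="$3"
  local needed=$(( cost_per_job * jobs ))
  if [ "$balance" -ge "$needed" ]; then
    echo "ok: need $needed, have $balance"
  else
    echo "short by $(( needed - balance )) credits"
    return 1
  fi
}
```

A pipeline could run this check first and only trigger a top-up or abort when it returns a non-zero status.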

Source: core/platform/SKILL.md

Supported AI Models

Discover all available models at runtime:

# List all models
muapi models list

# Filter by category
muapi models list --category video --output-json

# Check supported parameters
muapi models list --category image --output-json | jq '.[] | {id, aspect_ratio, resolution}'

Model availability is validated against schema_data.json at runtime, ensuring requests specify only supported parameters.

Source: README.md

Schema Reference

All MCP tools use fully typed JSON Schema definitions. This provides:

  • Input Validation — Requests are validated against supported parameters
  • Autocomplete — IDEs can suggest valid parameter values
  • Documentation — Tool descriptions are embedded in the schema

The schema_data.json file validates:

| Validation | Description |
| --- | --- |
| Model IDs | Ensures requested model exists |
| Endpoint Resolution | Maps model names to API endpoints |
| Parameter Checking | Validates aspect_ratio, resolution, duration |

Source: README.md

Troubleshooting

Server Won't Start

# Verify muapi-cli installation
muapi --version

# Check API key configuration
muapi auth configure --show

# Test connectivity
muapi auth configure --test

Tools Not Appearing in Agent

  1. Verify Claude Desktop config JSON is valid
  2. Restart Claude Desktop after config changes
  3. Check the MCP server process is running
  4. Confirm MUAPI_API_KEY is set or global config exists

Async Requests Timeout

Use --no-wait for long-running tasks and poll separately:

muapi predict wait "REQUEST_ID" --timeout 300

Upload Failures

The muapi_upload_file tool automatically handles local file uploads. Ensure files are accessible and within size limits.
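
A pre-flight check along these lines can catch unreadable or oversized files before an upload is attempted. It is a sketch only: `check_uploadable` is a hypothetical helper, and the byte limit is a caller-supplied placeholder because the source does not state the actual size limit.

```shell
#!/usr/bin/env bash
# Pre-flight check before uploading a local file.
# ASSUMPTION: the size limit is supplied by the caller; the real platform
# limit is not stated in this manual.

check_uploadable() {
  # Usage: check_uploadable FILE MAX_BYTES
  local file="$1" max_bytes="$2" size
  if [ ! -f "$file" ] || [ ! -r "$file" ]; then
    echo "not readable: $file"
    return 1
  fi
  # wc -c pads its output on some systems, so strip the spaces.
  size="$(wc -c < "$file" | tr -d ' ')"
  if [ "$size" -gt "$max_bytes" ]; then
    echo "too large: $size bytes (limit $max_bytes)"
    return 2
  fi
  echo "ok: $size bytes"
}
```

Calling `check_uploadable ./photo.jpg 10485760` before an upload would reject anything over 10 MiB, where 10 MiB is again only an example value.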

The MCP Server provides direct access to core primitives. For higher-level workflows, consider these expert skills:

| Skill | Description |
| --- | --- |
| Cinema Director | Technical film direction & cinematography |
| AI Clipping | Long video → ranked vertical clips |
| Seedance 2 | Cinematic video with native audio-video sync |
| YouTube Shorts | Platform-aware clip presets |

Source: README.md


CLI Commands Reference

Related topics: MCP Server Setup, Schema Reference, Troubleshooting

Overview

The Generative-Media-Skills repository provides a comprehensive CLI-based interface for AI-powered media generation and manipulation. Built around the muapi-cli tool, these commands enable AI agents (Claude Code, Cursor, Gemini CLI) to generate images, videos, and audio through a standardized command-line interface with structured JSON outputs.

The CLI architecture follows a Core/Library split:

  • Core Primitives (/core): Thin wrappers for raw API access
  • Expert Library (/library): High-value skills with domain-specific logic

Source: README.md

Installation

Prerequisites

Before using the CLI commands, install the muapi-cli package:

# via npm (recommended)
npm install -g muapi-cli

# via pip
pip install muapi-cli

# or run without installing
npx muapi-cli --help

API Key Configuration

Configure your muapi.ai API key before making any requests:

# Interactive setup
muapi auth configure

# Pass key directly
muapi auth configure --api-key "YOUR_MUAPI_KEY"

# Get your key at https://muapi.ai/dashboard

Platform setup scripts are located in core/platform/:

| Script | Purpose |
| --- | --- |
| setup.sh | Configure API key, show config, test key validity |
| check-result.sh | Poll for async generation results by request ID |

# Save API key
bash core/platform/setup.sh --add-key "YOUR_MUAPI_KEY"

# Show current configuration
bash core/platform/setup.sh --show-config

# Test API key validity
bash core/platform/setup.sh --test

Source: README.md, core/platform/SKILL.md

Core Platform Commands

Authentication

muapi auth configure
muapi auth configure --api-key "YOUR_MUAPI_KEY"

Model Discovery

List available models by category:

# List all models
muapi models list

# List video models only
muapi models list --category video

# JSON output for scripting
muapi models list --category image --output-json

Async Result Polling

For async operations, capture the request_id and poll for results:

# Capture request ID
REQUEST_ID=$(muapi video generate "a dog running" \
  --model kling-v3.0-pro --no-wait \
  --output-json --jq '.request_id' | tr -d '"')

# Poll until complete with auto-download
muapi predict wait "$REQUEST_ID" --download ./outputs

# Check once without polling
bash core/platform/check-result.sh --id "your-request-id" --once
# Check result script usage
bash core/platform/check-result.sh --id "your-request-id"

Source: README.md, core/platform/check-result.sh

Media Generation Commands

Image Generation

Generate images from text prompts:

# Basic generation
muapi image generate "a cyberpunk city at night"

# Specify model
muapi image generate "a sunset over mountains" --model flux-schnell

# Auto-download to directory
muapi image generate "product on white bg" --model flux-schnell --download ./outputs

# Extract URL for agent pipelines
muapi image generate "landscape" --model flux-dev --output-json --jq '.outputs[0]'

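Available Models (the README counts 14 text-to-image models; the IDs below are examples, so confirm the current set with muapi models list --category image):

  • flux-dev, flux-schnell, flux-kontext-pro (Flux family)
  • midjourney-v7, midjourney-v6.1 (Midjourney)
  • hidream-fast, hidream-pixel (HiDream)
  • gpt-image-1, gpt-4o (OpenAI)
  • imagen4, imagen3 (Google)
  • veo3, veo2 (Google; Veo is a video model family, so these IDs may not apply to image generation)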

Video Generation

Generate videos from text or images:

# Text-to-video
muapi video generate "a dog running on a beach" --model kling-v3.0-pro

# Image-to-video
muapi video from-image "path/to/image.jpg" --model seedance-2 --subject "camera pans left"

# With duration
muapi video generate "ocean waves" --model kling-master --duration 10

Available Models:

  • Text-to-video: 13 models including kling-v3.0-pro, kling-master, seedance-2, veo3
  • Image-to-video: 16 models

Audio Generation

# Music generation (Suno)
muapi audio create "upbeat electronic dance track" --duration 30

# Sound effects (MMAudio)
muapi audio from-text "thunder rumbling in distance"

Source: README.md

Media Editing Commands

Image Editing

The edit-image.sh script provides prompt-based image editing:

bash core/edit/edit-image.sh \
  --image-url "https://example.com/image.jpg" \
  --prompt "add sunglasses" \
  --model flux-kontext-pro

Supported Models:

| Model | Use Case |
| --- | --- |
| flux-kontext-pro | Flux Kontext editing |
| gpt-4o | OpenAI vision editing |
| midjourney-v7 | Midjourney style editing |
| qwen-vl-max | Qwen vision editing |

Source: core/edit/edit-image.sh

Image Enhancement

One-click enhancement operations via enhance-image.sh:

# AI upscaling
bash core/edit/enhance-image.sh --op upscale --image-url "https://..."

# Background removal
bash core/edit/enhance-image.sh --op background-remove --image-url "https://..."

# Face swap
bash core/edit/enhance-image.sh --op face-swap --image-url "..." --face-url "..."

# Colorize
bash core/edit/enhance-image.sh --op colorize --image-url "..."

# Ghibli style transfer
bash core/edit/enhance-image.sh --op ghibli --image-url "..."

# Product shot
bash core/edit/enhance-image.sh --op product-shot --image-url "..."

Source: core/edit/enhance-image.sh

Lip Sync

Synchronize video lip movements to audio:

bash core/edit/lipsync.sh \
  --video-url "https://..." \
  --audio-url "https://..." \
  --model sync

Supported Models: sync (Sync Labs), latent-sync, creatify, veed

Source: core/edit/lipsync.sh

Video Effects

Apply effects to videos and images:

# Dance effect (image + audio → animated video)
bash core/edit/video-effects.sh \
  --op dance \
  --image-url "https://..." \
  --audio-url "https://..."

# Face swap
bash core/edit/video-effects.sh --op face-swap --video-url "..." --face-url "..."

# Dress change
bash core/edit/video-effects.sh --op dress-change --video-url "..." --dress-url "..."

# Luma reframing
bash core/edit/video-effects.sh --op reframe --video-url "..."

Source: core/edit/video-effects.sh

Common Flags

All core scripts support these standard flags:

| Flag | Description |
| --- | --- |
| --async | Submit request without waiting for completion |
| --json | Output raw JSON response |
| --timeout N | Set request timeout in seconds |
| --download <path> | Auto-download results to specified directory |
| --view | Download and open result in system viewer |
| --output-json --jq '<expr>' | Extract specific field using jq |
| --help | Show usage information |

Source: README.md, core/edit/SKILL.md

Agentic Pipeline Examples

Async Workflow

graph TD
    A[Submit Async Request] --> B[Capture request_id]
    B --> C[Do Other Work]
    C --> D[Poll for Result]
    D --> E{Complete?}
    E -->|No| D
    E -->|Yes| F[Download Output]

# Submit async, capture request_id, poll when ready
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait \
  --output-json --jq '.request_id' | tr -d '"')

# ... do other work ...

muapi predict wait "$REQUEST_ID" --download ./outputs

File Upload Pipeline

# Upload local file → edit → download
URL=$(muapi upload file ./photo.jpg \
  --output-json --jq '.url' | tr -d '"')

muapi image edit "make it look like a painting" \
  --image "$URL" --model flux-kontext-pro --download ./outputs

Command Chaining

# Pipe prompt from another command
generate_prompt | muapi image generate - --model flux-dev

# Chain multiple operations
muapi upload file ./source.jpg | \
  muapi enhance image --op upscale | \
  muapi predict wait - --download ./final

Source: README.md

Expert Library Scripts

The /library directory contains specialized scripts for domain-specific workflows:

Cinema Director

Generate cinematic video with professional direction:

# Run these commands from the repository root so the library/ paths resolve.

# Create 10-second epic reveal
bash library/motion/cinema-director/scripts/generate-film.sh \
  --subject "a cybernetic dragon over Tokyo" \
  --intent "epic" \
  --model "kling-v3.0-pro" \
  --duration 10 \
  --view

# Animate reference image into video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
  --mode i2v \
  --file ./concept.jpg \
  --subject "camera slowly pulls back" \
  --intent "reveal" \
  --view

# Extend existing video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
  --mode extend \
  --request-id "YOUR_REQUEST_ID" \
  --subject "camera continues pulling back" \
  --duration 10

Nano-Banana

Reasoning-driven image generation:

bash library/visual/nano-banana/scripts/generate-nano-art.sh \
  --file ./my-source-image.jpg \
  --subject "a glass hummingbird" \
  --style "macro photography" \
  --resolution "2k" \
  --view

Skill Installation for Agents

# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all

# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor

Source: README.md

MCP Server Mode

Run muapi as a Model Context Protocol server for direct tool access:

muapi mcp serve

Claude Desktop Configuration (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}

Exposed MCP Tools:

| Tool | Description |
| --- | --- |
| muapi_image_generate | Text-to-image (14 models) |
| muapi_image_edit | Image-to-image editing (11 models) |
| muapi_video_generate | Text-to-video (13 models) |
| muapi_video_from_image | Image-to-video (16 models) |
| muapi_audio_create | Music generation (Suno) |
| muapi_audio_from_text | Sound effects (MMAudio) |
| muapi_enhance_upscale | AI upscaling |
| muapi_enhance_bg_remove | Background removal |
| muapi_enhance_face_swap | Face swap image/video |
| muapi_enhance_ghibli | Ghibli style transfer |
| muapi_edit_lipsync | Lip sync to audio |
| muapi_edit_clipping | AI highlight extraction |
| muapi_predict_result | Poll prediction status |
| muapi_upload_file | Upload local file → URL |
| muapi_keys_list | List API keys |
| muapi_keys_create | Create API key |
| muapi_keys_delete | Delete API key |
| muapi_account_balance | Get credit balance |
| muapi_account_topup | Add credits (Stripe checkout) |

Source: README.md

Requirements

All core scripts require:

| Dependency | Purpose |
| --- | --- |
| MUAPI_KEY env var | Set via core/platform/setup.sh |
| curl | HTTP requests |
| jq | JSON parsing |
| python3 | Helper scripts |

Check requirements:

# Verify environment
muapi auth configure --test

# Show current config
muapi auth configure --show-config

Source: core/platform/SKILL.md, core/edit/SKILL.md

Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| ReferenceError: response is not defined | Ensure API key is configured via muapi auth configure |
| Timeout errors | Use --timeout N flag to increase timeout |
| Model download stalls at 100% | Verify model file integrity; re-download if corrupted |
| 500 Internal Server Error | Server overloaded; retry with exponential backoff |
| npm run dev hangs | Use PowerShell or WSL; ensure Node.js 18+ installed |
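
The "retry with exponential backoff" advice for 500 errors can be sketched as a small wrapper. The helpers `backoff_delay` and `retry_with_backoff` are illustrative and are not part of muapi-cli. The wrapper assumes the wrapped command signals failure through a non-zero exit status.

```shell
#!/usr/bin/env bash
# Retry a command with exponential backoff: wait base, 2*base, 4*base, ...
# seconds between attempts.

backoff_delay() {
  # Delay before retry number $1 (1-based), given a base delay of $2 seconds.
  local retry="$1" base="$2"
  echo $(( base * (1 << (retry - 1)) ))
}

retry_with_backoff() {
  # Usage: retry_with_backoff MAX_ATTEMPTS BASE_SECONDS COMMAND [ARGS...]
  local max_attempts="$1" base="$2"
  shift 2
  local attempt=1
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$(backoff_delay "$attempt" "$base")"
    attempt=$((attempt + 1))
  done
}

# Example: up to 4 attempts, waiting 2, 4, then 8 seconds between them.
# retry_with_backoff 4 2 muapi image generate "a sunset over mountains" --model flux-schnell
```

Doubling the delay gives an overloaded server progressively more time to recover, while the attempt cap keeps a script from looping forever.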

Verification Commands

# Test API connectivity
bash core/platform/setup.sh --test

# List configured models
muapi models list

# Check account balance
muapi account balance

Server Dependencies (Ubuntu)

If running server components:

apt install python3-dev make g++
pip install wheel
pip install -r requirements.txt
curl -sL https://deb.nodesource.com/setup_18.x | bash -

Source: core/platform/SKILL.md, README.md


Expert Skills Library

Related topics: Recipe Pack, Workflow Scripts, Architecture Overview

The Expert Skills Library is the high-value knowledge layer of the Generative-Media-Skills repository. It provides domain-specific skills that translate creative intent into technical directives for AI agents, enabling professional-grade image, video, and audio generation without requiring users to understand the underlying API complexity.

Overview

The repository uses a Core/Library split architecture:

| Layer | Purpose | Location |
| --- | --- | --- |
| Core Primitives | Thin wrappers around muapi-cli for raw API access | /core/ |
| Expert Library | Domain-specific skills with professional knowledge baked in | /library/ |
| Recipe Pack | LLM-orchestrated workflow recipes combining multiple skills | /library/*/ |

Source: README.md

Architecture Diagram

graph TD
    subgraph "Expert Skills Library"
        A["🎬 Motion / Video<br/>(16 skills)"] --> D["Cinema Director"]
        A --> E["Seedance 2"]
        A --> F["AI Clipping"]
        
        B["🎨 Visual / Design<br/>(21 skills)"] --> G["Nano-Banana"]
        B --> H["UI Designer"]
        B --> I["Logo Creator"]
        
        C["📱 Social<br/>(5 skills)"] --> J["YouTube Shorts"]
        C --> K["UGC Ads Workflow"]
    end
    
    L["muapi-cli"] --> M["19 Structured Tools"]
    D --> L
    G --> L
    J --> L
    
    M --> N["Claude Code / Cursor / MCP"]

Skill Categories

Motion / Video Skills

The motion library contains 16 skills for video generation and animation.

| Skill | Description | Key Capability |
| --- | --- | --- |
| Cinema Director | Technical film direction & cinematography | Directs Seedance 2.0 with camera movements, lighting, and timing |
| Seedance 2 (Doubao Video) | Director-level cinematic video generation | Text-to-video, image-to-video, video extension with audio-video sync |
| AI Fight Scene Generator | High-cut-density action sequences | 16-cell storyboard image drives Seedance 2.0 i2v |
| 3D Logo Animation | Premium 3D logo animation | Transforms 2D logos with cinematic effects |
| Animal Vlogger Video | Anthropomorphic animal content | Ultra-realistic characters in real-world settings |
| Cartoon Dance Animation | Photo to Pixar-style 3D animation | Reference dance/motion video driving |
| Drone-Style Video | Aerial drone-perspective footage | Bird's-eye sweeps, orbit shots, flyovers |
| Giant Product Showcase | Dramatic giant-scale visuals | Building-sized objects next to people |
| Jewelry Product Video | Luxury jewelry cinematography | Macro animation and commercial quality |
| Music Video | Short music video generation | Keyframes per beat, music track matching |
| One-Shot Video | Single continuous cinematic shot | No cuts, seamless flowing scene |
| Product Ad Cinematic | 5-10s product advertisement | From product photo + brand brief |
| Product Showcase Video | Dynamic product animation | Explosive ingredient arrangement |
| Talking Baby Video | Viral-style talking baby | Custom costumes and scripts |
| UGC Lifestyle Try-On | Lifestyle content generation | Authentic social-native photos & video |
| UGC Video Factory | 10s vertical UGC video ad | Nano-Banana Pro Edit → Seedance 2.0 VIP i2v |

Visual / Images & Design Skills

The visual library contains 21 skills for image generation and design.

| Skill | Description | Output |
| --- | --- | --- |
| Nano-Banana | Reasoning-driven image generation | Gemini 3 Style reasoning for high-quality outputs |
| UI Designer | High-fidelity mobile/web mockups | Atomic Design principles, component-based |
| Logo Creator | Minimalist vector branding | Geometric Primitives, accurate brand-name text |
| Action Figure Generator | Photo → custom 3D action figure | Collectible toy packaging |
| Ad Creative Set | High-converting ad assets | Hero image, copy variations, platform crops |
| Amazon Product Listing Pack | Full Amazon listing images | Hero, lifestyle, infographic, comparisons |
| Blog Header | Professional blog header | 1200×628 with title composition guidance |
| Brand Kit | Cohesive brand visual kit | Logo concept, color palette, typography |
| Brochure Designer | Multi-page brochure | Cover, inner spread, back |
| Brand Design Guide | Comprehensive design system | Palette, typography, UI components |
| Couple Grid Creator | Stylized couple grid | 6-box romantic poses in packaging |
| Fashion Try-On | Virtual outfit try-on | Person photo + clothing combination |
| Floor Plan Rendering | 2D → 3D architectural | Realistic 3D room visualization |
| Interior Design | Pro interior visualizations | Redesign rooms, furniture styles |
| Interior Design Visualizer | Room furniture generation | Fill empty rooms or redesign existing |
| Keyboard Art Maker | Keycap art | Top-down artistic keyboard arrangements |
| Logo + Branding Package | Complete branding | Variations, palette, mockups |
| Multi-Angle Reshoot | Multiple camera angles | Fish-eye, bird's-eye, low, macro shots |
| Multi-Angle Shots | Full product shot set | Front, side, back, top-down, 45° |
| Storyboard Generator | N keyframes for scenes | Story sequence visualization |
| URL to Design | Website → redesigned UI | Analyze URL and generate improved design |
| YouTube Thumbnail | High-CTR thumbnails | Bold text, emotional faces, striking imagery |

Social Skills

| Skill | Description | Platforms |
| --- | --- | --- |
| Instagram Post | On-brand Instagram content | Instagram |
| Product Campaign Pack | Multi-channel campaign | Meta, Google, LinkedIn, TikTok |
| RedNote Cover | Xiaohongshu covers | 小红书 |
| Social Media Pack | Platform crops | Instagram, TikTok, Shorts, X |
| UGC Ads Workflow | Video ad pipeline | Social-native UGC style |
| YouTube Shorts | Platform-aware short clips | Shorts, TikTok, Reels, Feed |

Edit Skills

The edit library provides post-processing capabilities.

Source: core/edit/SKILL.md

| Script | Operation | Description |
| --- | --- | --- |
| edit-image.sh | Prompt-based editing | Flux Kontext, GPT-4o, Midjourney, Qwen |
| enhance-image.sh | One-click operations | Upscale, background removal, face swap, colorize, Ghibli style, product shots |
| lipsync.sh | Lip sync | Sync Labs, LatentSync, Creatify, Veed |
| video-effects.sh | Video effects | Wan AI, face swap, dance, dress change, Luma |

Core Expert Skills

Cinema Director

Technical film direction that translates creative intent into Seedance 2.0 directives.

Location: /library/motion/cinema-director/

Capabilities:

  • Camera movement planning
  • Lighting direction
  • Timing and pacing
  • Scene composition

Nano-Banana

Reasoning-driven image generation using chain-of-thought prompting.

Location: /library/visual/nano-banana/

Purpose: Apply "Gemini 3 Style" reasoning to generate high-quality images through explicit problem-solving steps.

UI Designer

High-fidelity mobile and web mockup generation using Atomic Design principles.

Location: /library/visual/ui-design/

Features:

  • Component-based design
  • Responsive layouts
  • Design system adherence

Logo Creator

Minimalist vector branding generation using geometric primitives.

Location: /library/visual/logo-creator/

Output: Accurate brand-name text rendering with clean vector aesthetic.

Seedance 2 (Doubao Video)

Director-level cinematic video generation supporting multiple modes.

Location: /library/motion/seedance-2/

| Mode | Description |
| --- | --- |
| t2v | Text-to-video generation |
| i2v | Image-to-video animation |
| extend | Video extension |

Usage Example:

# Text-to-video
bash scripts/generate-seedance.sh --mode t2v --subject "a cybernetic dragon" --intent "epic" --duration 10 --view

# Image-to-video
bash scripts/generate-seedance.sh --mode i2v --file ./concept.jpg --subject "camera pulls back" --intent "reveal" --view

# Extend existing video
bash scripts/generate-seedance.sh --mode extend --request-id "YOUR_ID" --subject "camera continues" --duration 10

AI Clipping

Server-side long video processing for short clip extraction.

Location: /library/edit/ai-clipping/

Features:

  • Server-side transcription (no local Whisper)
  • Virality ranking
  • Deduplication
  • Face-tracked auto-crop

YouTube Shorts

Platform-aware preset over AI Clipping with optimized defaults.

Location: /library/social/youtube-shorts/

Platform Defaults:

| Platform | Aspect Ratio | Duration |
| --- | --- | --- |
| Shorts | 9:16 | 60s max |
| TikTok | 9:16 | 60s max |
| Reels | 9:16 | 90s max |
| Feed | 16:9 or 1:1 | Variable |
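
As an illustration, the defaults in the table above could be encoded in a small lookup helper. This is only a sketch: `platform_defaults` is a hypothetical function name that does not come from the repository, and the values are copied from the table rather than read from the skill itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): prints the aspect ratio
# and maximum duration for a platform, using the values from the table above.
platform_defaults() {
  case "$1" in
    shorts) echo "9:16 60s" ;;
    tiktok) echo "9:16 60s" ;;
    reels)  echo "9:16 90s" ;;
    feed)   echo "16:9-or-1:1 variable" ;;
    *)      echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Split the two fields so they can be passed to later commands.
defaults=$(platform_defaults reels)
aspect=${defaults% *}
max_duration=${defaults#* }
echo "aspect=$aspect max_duration=$max_duration"
```

Keeping the lookup in one function means a change to a platform limit is made in a single place instead of in every script that crops or trims clips.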

Platform Utilities

The /core/platform/ directory provides essential utilities for skill execution.

Source: core/platform/SKILL.md

| Script | Description |
| --- | --- |
| setup.sh | Configure API key, show config, test key validity |
| check-result.sh | Poll for async generation results |

Quick Start:

# Save API key
bash setup.sh --add-key "YOUR_MUAPI_KEY"

# Test connectivity
bash setup.sh --test

# Poll for result
bash check-result.sh --id "your-request-id"

Recipe Pack

The Recipe Pack contains 41 LLM-orchestrated workflow recipes that combine multiple muapi-cli calls into named end-to-end pipelines.

Characteristics:

  • Each skill is a SKILL.md file the agent reads and follows
  • Designed for consuming agents (Claude Code, Cursor, MCP)
  • Recipes, not bash wrappers
  • Bring your own executing agent

Integration with AI Agents

The Expert Skills Library is designed for seamless integration with AI development environments.

Supported Platforms

| Platform | Integration Method |
| --- | --- |
| Claude Code | Direct terminal execution via tools + MCP server mode |
| Cursor | MCP server mode |
| Gemini CLI | Local scripts |
| Windsurf | Local scripts |

MCP Server Mode

muapi mcp serve

This exposes 19 structured tools with full JSON Schema input/output definitions to Claude Desktop, Cursor, or any MCP-compatible agent.

Source: README.md

Requirements

All expert skills require:

| Requirement | Description |
| --- | --- |
| muapi-cli | Core CLI tool for API access |
| MUAPI_KEY | API key configured via core/platform/setup.sh |
| Standard tools | curl, jq, python3 (varies by skill) |

Common Workflow Patterns

Async Generation with Polling

sequenceDiagram
    participant Agent
    participant muapi as muapi-cli
    participant API as muapi.ai API
    
    Agent->>muapi: Submit async request (--no-wait)
    muapi->>API: POST request
    API-->>muapi: request_id
    muapi-->>Agent: Return request_id
    
    loop Poll until complete
        Agent->>muapi: check-result --id request_id
        muapi->>API: GET status
        API-->>muapi: status update
        muapi-->>Agent: Progress/Ready
    end
    
    Agent->>muapi: Download result (--download)
    muapi->>API: GET download
    API-->>muapi: Media file
    muapi-->>Agent: Saved output
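
The polling loop in the diagram above can be sketched as a generic shell function. This is a minimal sketch under stated assumptions: `poll_until` and `fake_status_check` are hypothetical names, and the stand-in check exists only so the example runs offline. In practice the check would wrap whatever command reports readiness (for example a call to check-result.sh), but the exit-status convention of that script is not documented here, so any such wiring is an assumption.

```shell
#!/usr/bin/env bash
# poll_until MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it exits 0 (ready) or MAX_ATTEMPTS is used up.
# Returns 0 when COMMAND succeeds, 1 when every attempt failed.
poll_until() {
  local max_attempts interval attempt
  max_attempts=$1
  interval=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Wait between attempts, but not after the final one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Stand-in for a real status check so the sketch runs offline:
# it reports "ready" on the third call.
calls=0
fake_status_check() {
  calls=$((calls + 1))
  [ "$calls" -ge 3 ]
}

if poll_until 5 0 fake_status_check; then
  echo "ready after $calls checks"
else
  echo "gave up after $calls checks"
fi
```

Bounding the number of attempts keeps an agent from waiting forever on a request that never completes.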

Agentic Pipeline Example

# 1. Submit async, capture request_id
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# 2. Do other work...

# 3. Poll for completion
muapi predict wait "$REQUEST_ID" --download ./outputs

# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs

Community Considerations

Community feedback has repeatedly raised the following topics:

| Topic | Status | Notes |
| --- | --- | --- |
| Publishing destinations | Feature request | Users request post-generation publishing (e.g., Vynly integration) |
| GPU acceleration | Planned | Local model acceleration discussed for future support |
| Multilingual content | Supported | Upload documents and query in any language |
| Server errors | Resolved | Initial 500 errors addressed in later versions |

Source: GitHub Issues #7, #89, #24

Quick Reference

| Skill Category | Count | Primary Use Case |
| --- | --- | --- |
| Motion / Video | 16 | Cinematic video, animation, product showcases |
| Visual / Design | 21 | Branding, UI, product imagery, marketing |
| Social | 5 | Platform-specific content, UGC ads |
| Edit | 4 | Post-processing, enhancement, effects |

Total: 46+ expert skills organized for agentic execution.

Recipe Pack

Related topics: Expert Skills Library, Workflow Scripts

The Recipe Pack is a curated collection of 41 LLM-orchestrated workflow recipes that translate creative intent into executable muapi-cli pipelines. Each recipe is a self-contained SKILL.md file containing structured instructions that AI agents can read and execute without additional configuration. Source: README.md

Overview

Recipe Pack workflows combine multiple muapi-cli calls into named end-to-end pipelines. Rather than requiring developers to manually chain image generation, video creation, and enhancement operations, recipes provide:

  • Pre-defined creative logic — domain expertise baked into executable steps
  • Multi-step pipelines — complex outputs from simple inputs (e.g., "photo of person → 3D action figure")
  • Agent-native format — SKILL.md files that LLMs can parse and follow directly
  • Professional quality — cinematographic, branding, and design best practices embedded Source: README.md

Architecture

Recipes follow a layered architecture that separates creative intent from technical execution:

graph TD
    A[User Input / Agent Prompt] --> B[SKILL.md Recipe]
    B --> C[muapi-cli Calls]
    C --> D[muapi.ai API]
    D --> E[Generated Media]
    
    F[Core Primitives] --> C
    G[Expert Library] --> B

Recipe Structure

Each recipe declares its inputs and a Steps body. The executing agent reads the SKILL.md and translates instructions into muapi CLI calls. Source: README.md

| Layer | Location | Purpose |
| --- | --- | --- |
| Core Primitives | /core/ | Thin wrappers around muapi-cli for raw API access (media, edit, platform) |
| Expert Library | /library/ | High-value skills translating creative intent to technical directives |
| Recipe Pack | /library/*/ | 41 named pipelines combining multiple primitives |

Recipe Categories

The Recipe Pack is organized into three primary categories:

Motion / Video (16 Recipes)

High-production video workflows including cinematography, animation, and UGC content.

| Skill | Description |
| --- | --- |
| 3D Logo Animation | Transform a 2D logo into a premium 3D version with cinematic effects |
| AI Fight Scene Generator | High-cut-density action sequence: 16-cell storyboard drives Seedance 2.0 i2v |
| Animal Vlogger Video | Anthropomorphic animal vlogger in real-world settings |
| Cartoon Dance Animation | Photo → Pixar-style 3D cartoon with dance animation |
| Character Story Video | Multi-part animated story with consistent character |
| Drone-Style Video | Aerial footage: bird's-eye sweeps, orbit shots, flyovers |
| Giant Product Showcase | Building-sized product visual with optional animation |
| Jewelry Product Video | Luxury jewelry ad with macro animation |
| Music Video | Short music video from song theme, with keyframes per beat |
| One-Shot Video | Single continuous cinematic shot |
| Cinematic Product Ad | 5–10s product ad from photo + brand brief |
| Product Showcase Video | Dynamic product showcase with motion animation |
| Product Video Ad Maker | Cinematic video ad from product photo |
| Talking Baby Video | Viral-style talking baby with costumes and scripts |
| UGC Lifestyle Try-On | Lifestyle photos & video of person using product |
| UGC Video Factory | Person + product + script → 10s vertical UGC video ad |

Source: README.md

Social (5 Recipes)

Platform-optimized social media content and multi-channel campaigns.

| Skill | Description |
| --- | --- |
| Instagram Post | Hero image + caption + hashtags |
| Product Campaign Pack | Multi-channel campaign: hero visuals, social assets, video, crops |
| RedNote Cover | Xiaohongshu (小红书) cover: lifestyle aesthetic with typography |
| Social Media Pack | Hero image → Instagram / TikTok / Shorts / X aspect ratios |
| UGC Ads Workflow | Selfie + product image + script → animated ad |

Source: README.md

Visual / Images & Design (21 Recipes)

Professional image generation, branding, and design assets.

| Skill | Description |
| --- | --- |
| Action Figure Generator | Photo → custom 3D action figure with collectible packaging |
| Ad Creative Set | Hero image + copy variations + platform crops |
| Amazon Product Listing Pack | Hero, lifestyle, infographic, comparison images |
| Blog Header | 1200×628 blog header with title composition |
| Brand Kit | Logo concept + color palette + typography pairings |
| Brochure Designer | Multi-page brochure: cover, inner spread, back |
| Couple Grid Creator | 6-box stylized grid in cardboard packaging frames |
| Brand Design Guide | Palette, typography, UI components, visual identity |
| Fashion Try-On | Person + clothing → fashion model video |
| Floor Plan Rendering | 2D floor plan → realistic 3D architectural rendering |
| Interior Design | Pro interior design visualizations |
| Interior Design Visualizer | Empty room → filled with furniture / redesign existing room |
| Keyboard Art Maker | Keycaps spelling custom messages |
| Logo + Branding Package | Logo variations (dark/light/icon) + palette + mockups |
| Logo Generator | Quick single-shot polished logo |
| Multi-Angle Reshoot | Subject from fish-eye, bird's-eye, low, macro angles |
| Multi-Angle Shots | Full product shot set: front, side, back, top-down, 45° |
| Selfie with Celebrities | Realistic selfie with celebrity; optional cinematic |
| Storyboard Generator | N keyframes for story or scene sequence |
| URL to Design | Website → redesigned UI with modern aesthetics |
| YouTube Thumbnail | High-CTR thumbnail: bold text, emotional imagery |

Source: README.md

Execution Model

Recipes are designed for agentic execution. The consuming agent (Claude Code, Cursor, MCP, etc.) reads the SKILL.md file and executes the steps via muapi CLI calls. Source: README.md

Typical Recipe Flow

graph LR
    A[Input Media<br/>or Description] --> B[Parse SKILL.md]
    B --> C[Step 1: Generate<br/>Base Asset]
    C --> D[Step 2: Enhance<br/>or Transform]
    D --> E[Step 3: Apply<br/>Effects/Animation]
    E --> F[Output:<br/>Final Media]

Key Recipe Patterns

Pattern 1: Image-to-Video Pipeline

# 1. Generate or upload source image
muapi image generate "product photo on white" --model flux-schnell

# 2. Animate into video
muapi video from-image \
  --image "SOURCE_IMAGE_URL" \
  --subject "camera slowly orbits the product" \
  --model seedance-2.0-vip

Pattern 2: Multi-Asset Composite

# 1. Generate selfie
muapi image generate "professional selfie" --model flux-dev

# 2. Generate product
muapi image generate "product photo" --model flux-schnell

# 3. Combine in video
muapi video from-image \
  --image "COMPOSITE_URL" \
  --model kling-v3.0-pro

Pattern 3: Async Pipeline with Polling

# Submit async, capture request_id
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# Poll for completion
muapi predict wait "$REQUEST_ID" --download ./outputs

Example Recipes

3D Logo Animation

Transforms a 2D logo into an animated 3D version with cinematic effects.

Location: library/motion/3d-logo-animation/SKILL.md

Workflow:

  1. Accept 2D logo input (URL or local file)
  2. Generate 3D version using image-to-3D model
  3. Apply animation choreography (rotation, light sweep, particle effects)
  4. Output final video asset

Models Used:

  • Image generation: Flux variants, Midjourney
  • 3D conversion: Dedicated 3D models
  • Video: Seedance 2.0, Kling 3.0

Cinematic Product Ad

Creates a 5–10 second product advertisement from a product photo and brand brief.

Location: library/motion/product-ad-cinematic/SKILL.md

Workflow:

  1. Accept product photo and brand brief (tone, colors, messaging)
  2. Generate lifestyle background scene
  3. Composite product into scene
  4. Animate with cinematic camera movement
  5. Apply color grading matching brand identity

Output: Professional-grade product commercial

Action Figure Generator

Converts a photo of a person into a custom 3D action figure with collectible toy packaging.

Location: library/visual/action-figure-generator/SKILL.md

Workflow:

  1. Accept subject photo
  2. Generate 3D-rendered action figure likeness
  3. Create collectible packaging (blister card, header card)
  4. Apply toy-grade styling (plastic sheen, stylized proportions)

Use Cases:

  • Personalized gifts
  • Marketing materials
  • Fan merchandise concepts

YouTube Shorts Generator

Converts long-form video content into platform-optimized short clips.

Location: library/social/youtube-shorts/SKILL.md

Workflow:

  1. Upload or reference source video
  2. AI identifies best highlights (using transcription + virality ranking)
  3. Extract vertical clips (9:16 aspect ratio)
  4. Auto-crop to face-tracked subjects
  5. Apply platform-specific formatting (TikTok, Reels, Shorts)

Features:

  • Server-side transcription (no local Whisper required)
  • Deduplication of similar clips
  • Face-tracked auto-crop

Integration with AI Agents

Installing to Claude Code

npx skills add SamurAIGPT/Generative-Media-Skills --all

Installing to Specific Agents

npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor

MCP Server Mode

Recipes can also be executed via the Model Context Protocol server:

muapi mcp serve

This exposes 19 structured tools directly to MCP-compatible agents. Source: README.md

Claude Desktop Configuration

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}

Running Recipes Manually

Expert skills such as Cinema Director and Nano-Banana ship shell scripts that can be invoked directly:

# Generate a cinematic film
cd library/motion/cinema-director
bash scripts/generate-film.sh \
  --subject "a cybernetic dragon over Tokyo" \
  --intent "epic" \
  --model "kling-v3.0-pro" \
  --duration 10 \
  --view

# Use Nano-Banana reasoning for image generation
bash library/visual/nano-banana/scripts/generate-nano-art.sh \
  --file ./my-source-image.jpg \
  --subject "a glass hummingbird" \
  --style "macro photography" \
  --resolution "2k" \
  --view

Extending the Recipe Pack

Recipes follow a consistent structure that makes them easy to extend.

Workflow Scripts

Related topics: Expert Skills Library, Recipe Pack, CLI Commands Reference

Workflow Scripts run multi-step generative media pipelines within the Generative-Media-Skills framework. They chain muapi-cli commands so that AI agents can execute end-to-end media generation workflows with minimal configuration.

Overview

The workflow system serves as a bridge between high-level creative intent and low-level API operations. Rather than requiring agents to manually construct and sequence individual API calls, workflow scripts encapsulate entire pipelines as executable units that handle:

  • Input validation and parameter passing
  • State management across pipeline stages
  • Error handling and recovery mechanisms
  • Result aggregation from multiple generation steps

Source: library/workflow/SKILL.md

Architecture

The workflow subsystem follows a Core/Library architectural pattern consistent with the broader Generative-Media-Skills project:

graph TD
    A[User/Agent Request] --> B[Workflow Selection]
    B --> C{Interactive or Direct?}
    C -->|Interactive| D[interactive-run.sh]
    C -->|Direct| E[run-workflow.sh]
    D --> F[Parameter Collection]
    E --> G[Execute Pipeline]
    F --> G
    G --> H[muapi-cli Operations]
    H --> I[Media Generation]
    I --> J[Result Aggregation]
    J --> K[Output Delivery]
    
    L[discover-workflow.sh] -.->|Discovery| B
    M[list-workflows.sh] -.->|Catalog| N[Available Workflows]

Component Responsibilities

| Component | Role | Location |
| --- | --- | --- |
| SKILL.md | Metadata, usage documentation, and skill definition | /library/workflow/ |
| discover-workflow.sh | Scans and identifies available workflow definitions | /library/workflow/scripts/ |
| run-workflow.sh | Executes workflows with provided parameters | /library/workflow/scripts/ |
| generate-workflow.sh | Creates new workflow definitions or generates workflow output | /library/workflow/scripts/ |
| interactive-run.sh | Guides users through workflow execution via prompts | /library/workflow/scripts/ |
| list-workflows.sh | Displays catalog of available workflows | /library/workflow/scripts/ |

Source: library/workflow/scripts/run-workflow.sh, library/workflow/scripts/interactive-run.sh

Core Scripts Reference

list-workflows.sh

Lists all available workflow definitions in the system. This script scans the workflow directory and presents workflows in a structured format suitable for both human review and agent consumption.

bash list-workflows.sh [--format json|text]

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| --format | string | Output format: json for machine parsing, text for human-readable (default: text) |

Source: library/workflow/scripts/list-workflows.sh

discover-workflow.sh

Performs discovery and validation of workflow definitions. This script identifies all workflow files, parses their metadata, and verifies structural integrity before execution.

bash discover-workflow.sh [--path <directory>] [--validate]

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| --path | string | Directory path to scan for workflows (default: current workflow library) |
| --validate | flag | Perform structural validation of discovered workflows |

Discovery Output Structure:

{
  "workflows": [
    {
      "id": "workflow-identifier",
      "name": "Human Readable Name",
      "description": "Workflow purpose and capabilities",
      "inputs": ["required", "parameters"],
      "outputs": ["expected", "results"],
      "steps": ["sequential", "operations"]
    }
  ]
}

Source: library/workflow/scripts/discover-workflow.sh

run-workflow.sh

Executes a specified workflow with provided or default parameters. This is the primary execution engine that coordinates the actual muapi-cli calls.

bash run-workflow.sh \
  --workflow <workflow-id> \
  --input <input-path-or-url> \
  --output <output-directory> \
  [--param-key value...]

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| --workflow | string | Yes | Unique identifier of workflow to execute |
| --input | string | Yes | Primary input (file path, URL, or prompt) |
| --output | string | No | Output directory (default: ./outputs) |
| --param-* | mixed | No | Additional workflow-specific parameters |

Exit Codes:

| Code | Meaning |
| --- | --- |
| 0 | Workflow completed successfully |
| 1 | Invalid workflow ID |
| 2 | Input validation failed |
| 3 | API call failed |
| 4 | Output generation failed |

Source: library/workflow/scripts/run-workflow.sh
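
A calling script or agent can turn these exit codes into readable messages. The sketch below assumes the code meanings listed in the table above; `describe_exit_code` is a hypothetical helper, and `sh -c 'exit 3'` merely stands in for a failing run-workflow.sh call so the example runs without the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: maps a run-workflow.sh exit code to the meaning
# listed in the exit code table.
describe_exit_code() {
  case "$1" in
    0) echo "Workflow completed successfully" ;;
    1) echo "Invalid workflow ID" ;;
    2) echo "Input validation failed" ;;
    3) echo "API call failed" ;;
    4) echo "Output generation failed" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Usage pattern: capture the status of a command, then report it.
# `sh -c 'exit 3'` is a stand-in for a failing workflow run.
status=0
sh -c 'exit 3' || status=$?
echo "exit $status: $(describe_exit_code "$status")"
```

Capturing the status with `|| status=$?` keeps the pattern usable in scripts that run under `set -e`.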

generate-workflow.sh

Generates workflow definitions or produces workflow-based outputs. This script supports both workflow creation (for defining new pipelines) and output generation (for producing artifacts).

bash generate-workflow.sh \
  --template <template-id> \
  --spec <specification-file> \
  --output <output-path>

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| --template | string | Template identifier to base new workflow on |
| --spec | string | YAML/JSON specification file defining workflow structure |
| --output | string | Destination for generated workflow definition or output |

Source: library/workflow/scripts/generate-workflow.sh

interactive-run.sh

Provides an interactive, question-driven interface for workflow execution. Users are prompted for required inputs, and the script validates each parameter before proceeding.

bash interactive-run.sh [--workflow <workflow-id>]

Interactive Flow:

graph LR
    A[Start] --> B{Workflow ID Provided?}
    B -->|No| C[List Available Workflows]
    C --> D[Select Workflow]
    B -->|Yes| E[Load Workflow Definition]
    D --> E
    E --> F[Prompt: Input 1]
    F --> G[Validate Input 1]
    G -->|Valid| H[Prompt: Input 2]
    G -->|Invalid| F
    H --> I[... Continue N times]
    I --> J[Execute Workflow]
    J --> K[Display Results]
    K --> L[End]

Supported Prompts:

| Prompt Type | Validation | Description |
| --- | --- | --- |
| text | regex pattern | Free-form text input |
| url | URL format | Web resource URLs |
| file | path exists | Local file paths |
| select | enum values | Enumerated choice |
| confirm | boolean | Yes/No confirmation |

Source: library/workflow/scripts/interactive-run.sh
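
To make the validation column concrete, the five prompt types could be checked roughly as follows. This is a hypothetical sketch, not the logic of interactive-run.sh: the function name `validate_prompt`, the accepted yes/no spellings, and the simplified URL check (http or https scheme, no whitespace) are all assumptions.

```shell
#!/usr/bin/env bash
# validate_prompt TYPE VALUE [EXTRA]
# Hypothetical validator for the prompt types in the table above.
# EXTRA is a regex for "text" and a space-separated option list for "select".
# Returns 0 if VALUE is valid, non-zero otherwise (2 for an unknown TYPE).
validate_prompt() {
  local type value extra option
  type=$1
  value=$2
  extra=${3:-}
  case "$type" in
    text)
      # An empty pattern accepts any text.
      printf '%s\n' "$value" | grep -Eq -- "${extra:-.*}"
      ;;
    url)
      case "$value" in
        *[[:space:]]*) return 1 ;;
        http://?*|https://?*) return 0 ;;
        *) return 1 ;;
      esac
      ;;
    file)
      [ -e "$value" ]
      ;;
    select)
      for option in $extra; do
        if [ "$option" = "$value" ]; then
          return 0
        fi
      done
      return 1
      ;;
    confirm)
      case "$value" in
        y|Y|yes|Yes|YES|n|N|no|No|NO) return 0 ;;
        *) return 1 ;;
      esac
      ;;
    *)
      return 2
      ;;
  esac
}

validate_prompt select cartoon "realistic cartoon anime" && echo "style accepted"
```

An interactive loop would call the validator after each answer and repeat the prompt while it returns non-zero.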

Workflow Execution Pipeline

When executing a workflow, the system follows a consistent pipeline pattern:

graph TD
    subgraph "Stage 1: Initialization"
        A1[Parse Workflow Definition] --> A2[Resolve Input Parameters]
        A2 --> A3[Initialize Output Directory]
    end
    
    subgraph "Stage 2: Execution"
        A3 --> B1[Execute Step 1]
        B1 --> B2{Step 1 Success?}
        B2 -->|Yes| B3[Execute Step 2]
        B2 -->|No| B4[Log Error]
        B4 --> B5[Rollback if Needed]
        B3 --> B6{Step 2 Success?}
        B6 -->|Yes| B7[Execute Step N]
        B6 -->|No| B4
    end
    
    subgraph "Stage 3: Aggregation"
        B7 --> C1[Collect Step Outputs]
        C1 --> C2[Merge Results]
        C2 --> C3[Generate Metadata]
        C3 --> C4[Write Final Output]
    end

Step Execution Model

Each workflow step follows this execution model:

# Pseudo-code for step execution
for step in workflow.steps:
    result = muapi-cli <operation> <step.params>
    if result.success:
        cache(step.id, result)
    else:
        handle_error(step, result)

Source: library/workflow/SKILL.md, library/workflow/scripts/run-workflow.sh
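
The pseudo-code above can be turned into a small runnable sketch. Everything here is illustrative: `run_steps` is a hypothetical function, steps are passed as `id=command` strings, the cache is a temporary directory with one file per step, and `echo` commands stand in for real muapi calls so the example runs offline.

```shell
#!/usr/bin/env bash
# run_steps CACHE_DIR "id=command" ...
# Runs each command in order. On success the step's stdout is cached in
# CACHE_DIR/<id>; on the first failure the loop stops and returns 1.
run_steps() {
  local cache_dir spec id cmd output
  cache_dir=$1
  shift
  for spec in "$@"; do
    id=${spec%%=*}
    cmd=${spec#*=}
    if output=$(sh -c "$cmd"); then
      printf '%s\n' "$output" > "$cache_dir/$id"
    else
      echo "step '$id' failed" >&2
      return 1
    fi
  done
}

# Offline demo: echo commands stand in for real generation steps.
cache_dir=$(mktemp -d)
run_steps "$cache_dir" "generate=echo image.png" "upscale=echo image@2x.png"
cat "$cache_dir/upscale"
```

Caching each step's output by id lets a later step, or a rerun after a failure, pick up earlier results without repeating the work.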

Integration with Recipe Pack

The workflow scripts serve as the execution backbone for the Recipe Pack — a collection of 41 pre-built LLM-orchestrated workflow recipes. Each recipe in the library maps to one or more workflow script invocations.

Recipe Categories

| Category | Count | Example Workflows |
| --- | --- | --- |
| Motion/Video | 16 | 3D Logo Animation, AI Fight Scene Generator, Drone-Style Video |
| Visual/Images | 21 | Action Figure Generator, Brand Kit, Interior Design |
| Social | 5 | Instagram Post, Product Campaign Pack, UGC Ads Workflow |
| Edit | 1 | AI Clipping |
| Motion Specialized | 4 | Cinema Director, Seedance 2.0, YouTube Shorts |

Source: README.md - Recipe Pack Section

Recipe-to-Workflow Mapping

Complex recipes often combine multiple workflow scripts:

# Example: Action Figure Generator Recipe
# Step 1: Generate base image
muapi image generate "3D render of action figure" --model flux-dev

# Step 2: Enhance with workflow
bash run-workflow.sh --workflow enhance-3d \
  --input "./outputs/step1.png" \
  --output "./outputs"

# Step 3: Add packaging
bash run-workflow.sh --workflow product-packaging \
  --input "./outputs/step2.png" \
  --output "./outputs/final"

Source: library/motion/ai-fight-scene/SKILL.md, library/visual/action-figure-generator/SKILL.md

Common Workflow Patterns

Async Polling Pattern

For long-running operations, workflows implement async polling:

# Submit generation request
REQUEST_ID=$(muapi video generate "prompt" --model kling-v3.0-pro \
  --no-wait --output-json --jq '.request_id' | tr -d '"')

# Poll for completion via workflow
bash run-workflow.sh --workflow poll-result \
  --request-id "$REQUEST_ID" \
  --max-attempts 60 \
  --interval 10

Source: README.md - Agentic Pipeline Examples

File Upload Pattern

Workflows that process local files auto-upload to CDN:

# Workflow handles upload transparently
bash run-workflow.sh --workflow image-edit \
  --input "./local-photo.jpg" \
  --prompt "apply cinematic color grading"

# Equivalent raw operations:
URL=$(muapi upload file ./local-photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "apply cinematic color grading" --image "$URL" --model flux-kontext-pro

Source: library/workflow/SKILL.md

Chaining Pattern

Multiple workflows can be chained for complex pipelines:

# Chain: Generate → Edit → Upscale → Export
bash run-workflow.sh --workflow generate-portrait --input "professional headshot" --output ./step1
bash run-workflow.sh --workflow retouch-portrait --input ./step1/output.jpg --output ./step2
bash run-workflow.sh --workflow upscale-4k --input ./step2/output.jpg --output ./step3
bash run-workflow.sh --workflow export-social --input ./step3/output.jpg --output ./final

Source: library/workflow/scripts/generate-workflow.sh

Configuration

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| MUAPI_API_KEY | Yes | (none) | muapi.ai API key for authentication |
| MUAPI_OUTPUT_DIR | No | ./outputs | Default output directory |
| MUAPI_TIMEOUT | No | 300 | Default timeout in seconds |
| MUAPI_RETRY_COUNT | No | 3 | Number of retries on failure |
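
A wrapper script might apply these defaults before invoking any workflow. The following is a minimal sketch assuming the variable names and default values from the table above; `load_muapi_env` is a hypothetical helper and the key shown is a placeholder.

```shell
#!/usr/bin/env bash
# load_muapi_env: apply the documented defaults and require the API key.
# Hypothetical helper; names and defaults are taken from the table above.
load_muapi_env() {
  : "${MUAPI_OUTPUT_DIR:=./outputs}"
  : "${MUAPI_TIMEOUT:=300}"
  : "${MUAPI_RETRY_COUNT:=3}"
  if [ -z "${MUAPI_API_KEY:-}" ]; then
    echo "MUAPI_API_KEY is required but not set" >&2
    return 1
  fi
  export MUAPI_API_KEY MUAPI_OUTPUT_DIR MUAPI_TIMEOUT MUAPI_RETRY_COUNT
}

# Demo with a placeholder key; a real key would come from your own setup.
MUAPI_API_KEY="placeholder-key"
unset MUAPI_TIMEOUT
load_muapi_env && echo "timeout=${MUAPI_TIMEOUT}s retries=$MUAPI_RETRY_COUNT"
```

The `: "${VAR:=default}"` idiom only assigns when the variable is unset or empty, so values already exported by the user take precedence.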

Workflow Definition Schema

New workflows can be defined using YAML or JSON:

# workflow-definition.yaml
name: custom-workflow
version: "1.0"
description: Custom media generation pipeline

inputs:
  - id: source_image
    type: file
    required: true
  - id: style
    type: select
    options: [realistic, cartoon, anime]
    default: realistic

steps:
  - name: enhance
    operation: muapi_image_edit
    params:
      image: "{{ inputs.source_image }}"
      prompt: "Apply {{ inputs.style }} style"
      model: flux-kontext-pro
  
  - name: upscale
    operation: muapi_enhance_upscale
    params:
      image: "{{ steps.enhance.output }}"
      scale: 2

outputs:
  - id: final_image
    source: steps.upscale.output

Source: library/workflow/scripts/generate-workflow.sh
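
The `{{ ... }}` placeholders in the definition above imply a substitution pass before each step runs. How the workflow scripts actually resolve them is not shown in this manual, so the sketch below is an assumption-laden illustration: `render_template` is a hypothetical function that performs plain string replacement of `{{ key }}` tokens.

```shell
#!/usr/bin/env bash
# render_template TEXT key=value ...
# Replaces every "{{ key }}" placeholder in TEXT with the given value.
# Hypothetical sketch of placeholder substitution; the real workflow engine
# may resolve placeholders differently.
render_template() {
  local text pair key value placeholder result rest
  text=$1
  shift
  for pair in "$@"; do
    key=${pair%%=*}
    value=${pair#*=}
    placeholder="{{ $key }}"
    result=""
    rest=$text
    # Consume TEXT left to right, copying everything before each placeholder.
    while :; do
      case "$rest" in
        *"$placeholder"*)
          result=$result${rest%%"$placeholder"*}$value
          rest=${rest#*"$placeholder"}
          ;;
        *)
          break
          ;;
      esac
    done
    text=$result$rest
  done
  printf '%s\n' "$text"
}

# Prints: Apply cartoon style
render_template 'Apply {{ inputs.style }} style' 'inputs.style=cartoon'
```

Because the replaced text is never rescanned for the same placeholder, a value that happens to contain `{{ key }}` cannot cause an endless loop.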

Error Handling

The workflow system implements tiered error handling:

| Error Level | Trigger | Response |
| --- | --- | --- |
| Warning | Non-critical parameter mismatch | Log and continue with defaults |
| Retry | Transient API failure | Retry up to MUAPI_RETRY_COUNT times |
| Abort | Validation failure or unrecoverable error | Stop workflow, log context, exit with code |
| Rollback | Step failure with side effects | Attempt to undo previous operations |

Source: library/workflow/scripts/run-workflow.sh, library/workflow/SKILL.md
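
The Retry tier can be illustrated with a small wrapper. This is a sketch, not the scripts' actual implementation: `with_retries` and `flaky_call` are hypothetical names, and it assumes that "retry up to MUAPI_RETRY_COUNT times" means one initial attempt followed by at most that many retries.

```shell
#!/usr/bin/env bash
# with_retries COMMAND [ARGS...]
# Runs COMMAND once, then retries it up to MUAPI_RETRY_COUNT more times
# (default 3) while it keeps failing. Returns 0 on the first success.
with_retries() {
  local retries attempt
  retries=${MUAPI_RETRY_COUNT:-3}
  attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -gt "$retries" ]; then
      echo "giving up after $retries retries" >&2
      return 1
    fi
  done
  return 0
}

# Stand-in for a transient failure: fails twice, then succeeds.
tries=0
flaky_call() {
  tries=$((tries + 1))
  [ "$tries" -ge 3 ]
}

MUAPI_RETRY_COUNT=3
with_retries flaky_call && echo "succeeded on try $tries"
```

A production version would usually add a delay between attempts and retry only failures known to be transient.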

Community Considerations

Based on community feedback, several workflow-related patterns have emerged:

Performance Optimization

Users have inquired about GPU acceleration and faster execution times. Workflows that call local model operations (where supported) can leverage GPU resources by specifying appropriate model variants:

# Specify a GPU-capable model in the workflow parameters.
# "<model-id>" is a placeholder; confirm valid IDs with: muapi models list
bash run-workflow.sh --workflow generate-image \
  --model "<model-id>" \
  --input "complex prompt"

Note: Not all models support GPU acceleration. Refer to individual model documentation.

Source: issues/7, issues/16

Batch Processing

Community members have requested the ability to process multiple files from network shares. Workflows support batch mode via input directories:

# Process all images in directory
bash run-workflow.sh --workflow batch-enhance \
  --input ./batch-input/ \
  --output ./batch-output/

Source: issues/73
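
Where per-file control is useful, for example to skip non-image files on a shared drive, the directory can also be expanded into one invocation per file. The sketch below only builds the command lines, reusing the `--workflow`, `--input`, and `--output` flags shown above. The `batch_commands` helper, the extension list, and the per-file output layout are assumptions for illustration.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def batch_commands(workflow, input_dir, output_dir):
    """Build one run-workflow.sh invocation per image found in input_dir."""
    commands = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            target = Path(output_dir) / path.stem  # one output folder per file
            commands.append([
                "bash", "run-workflow.sh",
                "--workflow", workflow,
                "--input", str(path),
                "--output", str(target),
            ])
    return commands
```

Each returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with file names that contain spaces.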

Schema Reference

Related topics: CLI Commands Reference, MCP Server Setup, Architecture Overview

The Schema Reference system is the validation and discovery layer for Generative-Media-Skills. It provides a centralized configuration that core scripts consult at runtime to enforce type safety, resolve the correct API endpoint, and validate parameters across all media generation operations.

Overview

The system centers on schema_data.json, a structured configuration file that powers the entire muapi-cli ecosystem. This schema serves three primary functions at runtime:

| Function | Description |
|---|---|
| Model ID Validation | Ensures requested models exist in the platform |
| Endpoint Resolution | Automatically maps model names to API endpoints |
| Parameter Checking | Validates supported aspect_ratio, resolution, and duration values |

Source: README.md:schema_reference

Architecture

graph TD
    A[CLI Command] --> B[Schema Data Validation]
    B --> C{Valid?}
    C -->|Yes| D[Resolve Endpoint]
    C -->|No| E[Error: Invalid Model/Parameter]
    D --> F[Execute API Call]
    F --> G[Return Result]
    
    H[Schema Data JSON] --> B
    H --> D

The schema acts as a contract between the CLI interface and the underlying muapi.ai platform, ensuring all requests are properly formatted before execution.
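
The validate-then-resolve flow in the diagram can be sketched as below, assuming the schema exposes `models` and `endpoints` mappings keyed by model ID. The `resolve_endpoint` helper and the toy endpoint path are hypothetical and only illustrate the order of checks.

```python
def resolve_endpoint(schema, model_id):
    """Validate a model ID, then map it to its endpoint, as in the diagram."""
    if model_id not in schema["models"]:
        raise ValueError(f"Invalid model: {model_id}")
    try:
        return schema["endpoints"][model_id]
    except KeyError:
        raise LookupError(f"No endpoint mapping for model: {model_id}") from None


# Toy schema; the endpoint path is a placeholder, not a real muapi.ai route.
schema = {
    "models": {"flux-dev": {}},
    "endpoints": {"flux-dev": "/example/flux-dev"},
}
print(resolve_endpoint(schema, "flux-dev"))
```

Raising distinct exceptions for an unknown model and a missing endpoint mapping lets a caller report which half of the contract failed.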

Core Schema Functions

Model Discovery

The CLI provides commands to discover all available models via the schema:

# List all models
muapi models list

# List models by category
muapi models list --category video --output-json

# List image models as JSON
muapi models list --category image --output-json

Source: README.md:schema_reference and README.md:schema_commands

Validation Rules

The schema enforces validation across multiple dimensions:

| Validation Type | Purpose | Example Values |
|---|---|---|
| model_id | Ensures model exists | flux-dev, kling-v3.0-pro, seedance-2.0 |
| aspect_ratio | Image/video dimensions | 1:1, 16:9, 9:16, 4:3 |
| resolution | Output quality | 1k, 2k, 4k, 1024x1024 |
| duration | Video length in seconds | 5, 10, 15, 30 |

Source: schema_data.json:validation_rules
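
A parameter check along these lines might look like the sketch below. The allowed values are copied from the example column above and are not the authoritative lists, which live in schema_data.json; the `check_params` helper is an assumption for illustration.

```python
# Example values from the table above; real lists come from schema_data.json.
ALLOWED = {
    "aspect_ratio": {"1:1", "16:9", "9:16", "4:3"},
    "resolution": {"1k", "2k", "4k", "1024x1024"},
    "duration": {5, 10, 15, 30},
}


def check_params(params, allowed=ALLOWED):
    """Return a list of problems; an empty list means every value is allowed."""
    problems = []
    for name, value in params.items():
        if name not in allowed:
            problems.append(f"unknown parameter: {name}")
        elif value not in allowed[name]:
            options = ", ".join(map(str, sorted(allowed[name])))
            problems.append(f"{name}={value!r} not in [{options}]")
    return problems
```

Collecting every problem before reporting lets an agent correct all invalid parameters in a single round trip instead of one per failure.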

MCP Server Schema Integration

When running muapi as a Model Context Protocol server, all tools are exposed with full JSON Schema input/output definitions. The schema definitions enable:

  • Type Checking: Automatic validation of tool inputs
  • Auto-completion: IDEs can suggest valid parameters
  • Documentation: Rich descriptions for each tool parameter

Tool Schemas

The MCP server exposes 19 structured tools with typed schemas, including:

| Tool | Category | Schema Purpose |
|---|---|---|
| muapi_image_generate | Media | Text-to-image generation (14 models) |
| muapi_image_edit | Media | Image-to-image editing (11 models) |
| muapi_video_generate | Media | Text-to-video generation (13 models) |
| muapi_video_from_image | Media | Image-to-video conversion (16 models) |
| muapi_audio_create | Media | Music generation via Suno |
| muapi_enhance_upscale | Enhancement | AI-powered image upscaling |
| muapi_enhance_bg_remove | Enhancement | Background removal |
| muapi_edit_lipsync | Editing | Lip sync to audio |

Source: README.md:mcp_server_tools

Runtime Integration

Core scripts integrate with the schema at multiple points:

graph LR
    A[setup.sh] --> B[Configure API Key]
    A --> C[Test Connectivity]
    
    D[check-result.sh] --> E[Poll for Results]
    D --> F[Async Status Check]
    
    G[edit-image.sh] --> H[Validate Image URL]
    G --> I[Apply Model Schema]
    
    J[enhance-image.sh] --> K[Validate Operation Type]
    J --> L[Apply Enhancement Schema]

Script Integration Points

Each core script validates inputs against the schema before execution:

| Script | Schema Usage |
|---|---|
| core/platform/setup.sh | API key configuration and validation |
| core/platform/check-result.sh | Request ID format validation |
| core/edit/edit-image.sh | Model selection and parameter validation |
| core/edit/enhance-image.sh | Operation type and parameter validation |

Source: core/platform/SKILL.md:scripts, core/edit/SKILL.md:scripts

Configuration

Environment Variables

The schema system relies on the following environment configuration:

# Set via setup.sh
MUAPI_API_KEY=your-api-key-here

# Config location: ~/.muapi/config.json

Source: core/platform/SKILL.md:requirements

Schema File Location

The schema_data.json file is located at the repository root and is loaded by core scripts at runtime. The file structure follows this pattern:

{
  "models": { ... },
  "endpoints": { ... },
  "parameters": {
    "aspect_ratio": [...],
    "resolution": [...],
    "duration": [...]
  }
}
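
A loader for a file of this shape could start as small as the sketch below. It checks only the three top-level sections shown above; the `load_schema` helper is hypothetical, and the contents of each section are elided in the source, so nothing deeper is validated.

```python
import json
from pathlib import Path

# Top-level sections shown in the pattern above.
REQUIRED_KEYS = ("models", "endpoints", "parameters")


def load_schema(path="schema_data.json"):
    """Load the schema file and confirm the top-level sections are present."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"schema is missing sections: {', '.join(missing)}")
    return data
```

Failing at load time with the names of the missing sections is easier to diagnose than a `KeyError` raised later, in the middle of a generation request.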

Common Validation Errors

Based on community issues, common schema-related errors include:

| Error | Cause | Resolution |
|---|---|---|
| Invalid model ID | Model not in schema | Run muapi models list to see valid options |
| Unsupported parameter | Parameter value not in allowed list | Check schema for valid values |
| Endpoint resolution failure | Model missing endpoint mapping | Verify schema_data.json is up to date |

Source: GitHub Issue #38

Best Practices

  1. Always validate before requesting: Use muapi models list to confirm model availability before generating
  2. Check parameter constraints: Verify aspect_ratio, resolution, and duration are supported
  3. Use JSON output for automation: --output-json flag provides schema-compliant output for piping
  4. Keep schema updated: Pull latest changes when new models are added to the platform

Agent Integration Guide

Related topics: MCP Server Setup, Getting Started, CLI Commands Reference

This guide covers all methods for integrating Generative-Media-Skills with AI agents including Claude Code, Cursor, Gemini CLI, and other MCP-compatible agents.

Overview

Generative-Media-Skills provides a CLI-first architecture designed specifically for agentic workflows. Rather than relying on GUIs or manual operations, agents interact with the system through structured CLI commands, the MCP server, or skill packages that agents can read and execute.

The integration layer consists of three primary components:

| Component | Purpose | Best For |
|---|---|---|
| muapi-cli | Core CLI tool with structured JSON outputs | Direct terminal execution, shell pipelines |
| MCP Server | Model Context Protocol server | Claude Desktop, Cursor, MCP-compatible agents |
| Skill Packages | Pre-packaged workflows (SKILL.md + scripts) | Claude Code, Cursor, automated ingestion |

Source: README.md

Architecture Overview

graph TD
    A[AI Agent] --> B[muapi-cli]
    A --> C[MCP Server]
    A --> D[Skill Packages]
    
    B --> E[Structured JSON Output]
    B --> F[Semantic Exit Codes]
    B --> G[--jq Filtering]
    
    C --> H[19 MCP Tools]
    C --> I[JSON Schema Validation]
    
    D --> J[41 Workflow Recipes]
    D --> K[Expert Library Skills]
    D --> L[Core Primitives]
    
    E --> L
    G --> M[Agentic Pipelines]
    H --> M
    J --> M

Supported Agents

The repository officially supports integration with:

  • Claude Code — Direct terminal execution via tools + MCP server mode
  • Cursor — MCP server mode for native tool calling
  • Gemini CLI — Seamless integration as local scripts
  • Windsurf — MCP-compatible integration
  • Any MCP-compatible agent — Via the MCP server protocol

Source: README.md

Installation Methods

Method 1: Install muapi-cli

The core CLI tool is available via multiple package managers:

# via npm (recommended — no Python required)
npm install -g muapi-cli

# via pip
pip install muapi-cli

# or run without installing
npx muapi-cli --help

After installation, configure your API key:

# Interactive setup
muapi auth configure

# Or pass directly
muapi auth configure --api-key "YOUR_MUAPI_KEY"

# Get your key at https://muapi.ai/dashboard

Source: README.md

Method 2: Install Skill Packages

Install pre-packaged skills directly to your AI agent:

# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all

# Or install a specific skill
npx skills add SamurAIGPT/Generative-Media-Skills --skill muapi-media-generation

# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor

Source: README.md

MCP Server Integration

The MCP server exposes all 19 generation tools directly to Claude Desktop, Cursor, or any MCP-compatible agent without requiring shell scripts.

Starting the MCP Server

muapi mcp serve

Claude Desktop Configuration

On macOS, add the following to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}
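
If the configuration file already lists other MCP servers, the entry should be merged rather than pasted over the whole file. The sketch below does this with Python's standard library; the `add_muapi_server` helper is an assumption for illustration, while the entry itself mirrors the JSON above.

```python
import json
from pathlib import Path


def add_muapi_server(config_path, api_key):
    """Add or update the muapi entry while preserving other MCP servers."""
    path = Path(config_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["muapi"] = {
        "command": "muapi",
        "args": ["mcp", "serve"],
        "env": {"MUAPI_API_KEY": api_key},
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

The function rewrites the file in place, so keeping a backup copy before the first run is a sensible precaution.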

Available MCP Tools

The server exposes 19 structured tools with full JSON Schema input/output definitions:

| Tool | Description | Category |
|---|---|---|
| muapi_image_generate | Text-to-image generation | Generation |
| muapi_image_edit | Image-to-image editing | Editing |
| muapi_video_generate | Text-to-video generation | Generation |
| muapi_video_from_image | Image-to-video animation | Generation |
| muapi_audio_create | Music generation via Suno | Audio |
| muapi_audio_from_text | Sound effects via MMAudio | Audio |
| muapi_enhance_upscale | AI upscaling | Enhancement |
| muapi_enhance_bg_remove | Background removal | Enhancement |
| muapi_enhance_face_swap | Face swap for image/video | Enhancement |
| muapi_enhance_ghibli | Ghibli style transfer | Enhancement |
| muapi_edit_lipsync | Lip sync to audio | Editing |
| muapi_edit_clipping | AI highlight extraction | Editing |
| muapi_predict_result | Poll prediction status | Utility |
| muapi_upload_file | Upload local file → URL | Utility |
| muapi_keys_list | List API keys | Account |
| muapi_keys_create | Create API key | Account |
| muapi_keys_delete | Delete API key | Account |
| muapi_account_balance | Get credit balance | Account |
| muapi_account_topup | Add credits via Stripe | Account |

Source: README.md

CLI Usage for Agents

Basic Generation Commands

# Generate an image
muapi image generate "a cyberpunk city at night" --model flux-dev

# Download the result automatically
muapi image generate "a sunset over mountains" --model hidream-fast --download ./outputs

# Extract just the URL (agent-friendly)
muapi image generate "product on white bg" --model flux-schnell --output-json --jq '.outputs[0]'

Async Pipeline Workflow

For long-running operations, submit async requests and poll for results:

# Submit async, capture request_id, poll when ready
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# ... do other work ...

muapi predict wait "$REQUEST_ID" --download ./outputs
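
An agent that scripts in Python rather than shell could wrap the same two commands as sketched below. The flags are the ones shown above. The helper names and the injectable `runner` parameter are assumptions for illustration, as is the premise that `--jq '.request_id'` prints a JSON-quoted string (which the `tr -d '"'` step in the shell version suggests).

```python
import json
import subprocess


def submit_video(prompt, model, runner=subprocess.run):
    """Submit without waiting and return the request ID as a plain string."""
    argv = [
        "muapi", "video", "generate", prompt,
        "--model", model, "--no-wait", "--output-json", "--jq", ".request_id",
    ]
    done = runner(argv, capture_output=True, text=True, check=True)
    # The shell version strips quotes with tr; json.loads decodes them instead.
    return json.loads(done.stdout)


def wait_and_download(request_id, out_dir, runner=subprocess.run):
    """Block until the prediction finishes, downloading results to out_dir."""
    argv = ["muapi", "predict", "wait", request_id, "--download", out_dir]
    return runner(argv, check=True)
```

Passing the arguments as a list avoids shell quoting issues in prompts, and `check=True` turns a non-zero exit code into a Python exception.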

Chaining Operations

# Pipe a prompt from another command
generate_prompt | muapi image generate - --model flux-dev

# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs

Source: README.md

Skill Package Structure

Each skill in the repository follows a consistent structure:

library/[category]/[skill-name]/
├── SKILL.md           # Description for agents to read
├── scripts/
│   ├── generate-[name].sh
│   └── [additional-scripts].sh
└── assets/            # Optional reference files
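
Given this layout, skills can be enumerated by searching for SKILL.md files two levels below library/. The sketch below assumes every skill follows the tree above exactly; the `find_skills` helper is hypothetical.

```python
from pathlib import Path


def find_skills(repo_root):
    """Yield (category, skill_name, script_paths) for each library skill."""
    for skill_md in sorted(Path(repo_root).glob("library/*/*/SKILL.md")):
        skill_dir = skill_md.parent
        scripts = sorted((skill_dir / "scripts").glob("*.sh"))
        yield skill_dir.parent.name, skill_dir.name, scripts
```

Anchoring the search on SKILL.md rather than on the scripts directory means a skill is still listed when it ships without scripts.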

SKILL.md Format

Each skill includes metadata that agents parse:

Troubleshooting

Related topics: CLI Commands Reference, Getting Started, Schema Reference

This page covers common issues, error conditions, and resolution steps for the Generative-Media-Skills repository. The troubleshooting content is organized by system component: API configuration, async generation workflows, media editing operations, and environment setup.

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 17 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Security or permission risk - Security or permission risk requires verification.

1. Security or permission risk: Security or permission risk requires verification

  • Severity: high
  • Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_e801ed325bcf4fbbb8e0d9cac02b5f7f | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/89

2. Identity risk: Identity risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an identity risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: identity.distribution | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills

3. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_ec1cdc92d8c84b2bbce43cf37a668443 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/44

4. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_c7a0e07d61b547aba3280ac82ac305e2 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/43

5. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_c6ace8e72f95491f945200849e083d96 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/46

6. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_de5cf59443c74838a8d10d1ecbec9457 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/45

7. Installation risk: Installation risk requires verification

  • Severity: medium
  • Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_8082f7c5e7be496c89fac789d932e74c | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/34

8. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.host_targets | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills

9. Configuration risk: Configuration risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_14498a89412f40ceb03d61889ea96de2 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/38

10. Capability evidence risk: Capability evidence risk requires verification

  • Severity: medium
  • Finding: README/documentation is current enough for a first validation pass.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: capability.assumptions | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills

11. Runtime risk: Runtime risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_36c10c9b7ece4b10af4873b655817add | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/27

12. Runtime risk: Runtime risk requires verification

  • Severity: medium
  • Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
  • User impact: May increase setup, validation, or first-run risk for the user.
  • Recommended check: Reproduce the official install and quickstart path in an isolated environment.
  • Evidence: community_evidence:github | cevd_a8e460757f114a9d8e4357758c834524 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/54

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Sources: 12 (count of project-level external discussion links exposed on this manual page).

Use: Review before install. Open the linked issues or discussions before treating the pack as ready for your environment.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using Generative-Media-Skills with real data or production workflows.

Source: Project Pack community evidence and pitfall evidence