Generative-Media-Skills Manual

Doramagic Project Pack · Human Manual

Getting Started

Related topics: Architecture Overview, CLI Commands Reference, Agent Integration Guide

Welcome to Generative-Media-Skills, a multimodal toolset that enables AI agents (Claude Code, Cursor, Gemini CLI) to generate, edit, and display professional-grade images, videos, and audio through the muapi-cli interface.

This guide walks you through installation, configuration, and your first generation to get up and running in minutes.

Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual

Architecture Overview

Related topics: Getting Started, Expert Skills Library, Schema Reference

This repository implements a Core/Library split architecture designed for AI agents to generate, edit, and display professional-grade images, videos, and audio through the muapi.ai platform. The architecture prioritizes agent-native workflows with CLI-powered scripts, structured JSON outputs, and Model Context Protocol (MCP) integration.

High-Level Architecture

The Generative-Media-Skills repository acts as a skill layer that translates creative intent into technical directives, delegating actual API calls to the underlying muapi-cli tool. This separation allows the repository to focus on expert knowledge while leveraging a robust, maintained API client.

graph TD
    subgraph "AI Agents"
        A["Claude Code"]
        B["Cursor"]
        C["Gemini CLI"]
        D["MCP Clients"]
    end
    
    subgraph "Generative-Media-Skills"
        E["Expert Library /library"]
        F["Core Primitives /core"]
        G["Recipe Pack"]
    end
    
    subgraph "muapi-cli"
        H["CLI Interface"]
        I["API Client"]
    end
    
    subgraph "muapi.ai Platform"
        J["100+ AI Models"]
        K["Media Generation APIs"]
    end
    
    A --> E
    B --> E
    C --> E
    D --> H
    E --> F
    F --> H
    G --> H
    H --> I
    I --> J
    I --> K

Source: README.md

Core Primitives (`/core`)

The Core layer provides thin wrappers around muapi-cli for direct API access. These are low-level building blocks that handle raw platform operations.

Directory Structure

Directory	Purpose
`core/media/`	File upload operations
`core/edit/`	Image and video editing (prompt-based edits, enhancement, lip sync, effects)
`core/platform/`	Setup, authentication, and result polling

Platform Utilities

Located in core/platform/, these scripts handle API configuration and async operation management:

Script	Description
`setup.sh`	Configure API key, show config, test key validity
`check-result.sh`	Poll for async generation results by request ID

Source: core/platform/SKILL.md

Media Editing Core

Located in core/edit/, these scripts provide editing and enhancement operations:

Script	Description
`edit-image.sh`	Prompt-based image editing
`enhance-image.sh`	One-click operations: upscale, background removal, face swap
`lipsync.sh`	Sync video lip movement to audio
`video-effects.sh`	Video/image effects

Source: core/edit/SKILL.md

Expert Library (`/library`)

The Library layer contains high-value skills that implement domain-specific knowledge for professional results.

Skill Categories

graph LR
    A["Library"] --> B["Motion / Video"]
    A --> C["Social"]
    A --> D["Visual / Images"]
    A --> E["Edit"]
    
    B --> B1["Cinema Director"]
    B --> B2["Seedance 2"]
    B --> B3["AI Clipping"]
    
    C --> C1["YouTube Shorts"]
    C --> C2["UGC Ads"]
    
    D --> D1["Nano-Banana"]
    D --> D2["UI Designer"]
    D --> D3["Logo Creator"]
    
    E --> E1["AI Clipping"]

Key Expert Skills

Skill	Category	Description
Cinema Director	Motion	Technical film direction & cinematography
Nano-Banana	Visual	Reasoning-driven image generation (Gemini 3 Style)
UI Designer	Visual	High-fidelity mobile/web mockups (Atomic Design)
Logo Creator	Visual	Minimalist vector branding
Seedance 2	Motion	Director-level cinematic video generation
AI Clipping	Edit	Long video → ranked vertical short clips

Source: README.md

Recipe Pack

The Recipe Pack contains forty-one LLM-orchestrated workflow recipes, each combining multiple muapi-cli calls into a named end-to-end pipeline. Each recipe is a SKILL.md file that the agent reads and follows.

Recipe Categories

Category	Count	Description
Motion / Video	16	Film generation, animation, product showcases
Social	5	Instagram posts, UGC ads, social media packs
Visual / Design	21	Action figures, brand kits, logos, interior design

Example Recipes

Skill	Path	Description
3D Logo Animation	`library/motion/3d-logo-animation/`	Premium 3D logo animation
AI Fight Scene Generator	`library/motion/ai-fight-scene/`	16-cell storyboard → video choreography
Animal Vlogger Video	`library/motion/animal-video-generator/`	Anthropomorphic animal content
Action Figure Generator	`library/visual/action-figure-generator/`	Photo → 3D collectible
Amazon Product Listing	`library/visual/amazon-product-listing/`	Full Amazon listing image set

Source: README.md

MCP Server Architecture

The Model Context Protocol server exposes all tools directly to MCP-compatible agents.

Exposed Tools (19 Total)

Tool	Category	Models / Function
`muapi_image_generate`	Image	14 models
`muapi_image_edit`	Image	11 models
`muapi_video_generate`	Video	13 models
`muapi_video_from_image`	Video	16 models
`muapi_audio_create`	Audio	Suno (music)
`muapi_audio_from_text`	Audio	MMAudio (sound effects)
`muapi_enhance_upscale`	Enhancement	AI upscaling
`muapi_enhance_bg_remove`	Enhancement	Background removal
`muapi_enhance_face_swap`	Enhancement	Face swap (image/video)
`muapi_enhance_ghibli`	Enhancement	Ghibli style transfer
`muapi_edit_lipsync`	Editing	Lip sync to audio
`muapi_edit_clipping`	Editing	AI highlight extraction
`muapi_predict_result`	Utility	Poll prediction status
`muapi_upload_file`	Utility	Upload local file → URL
`muapi_keys_list`	Account	List API keys
`muapi_keys_create`	Account	Create API key
`muapi_keys_delete`	Account	Delete API key
`muapi_account_balance`	Account	Get credit balance
`muapi_account_topup`	Account	Add credits (Stripe)

Source: README.md

MCP Configuration

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}

Schema Reference

The repository includes schema_data.json for runtime validation:

Model ID Validation: Ensures requested models exist
Endpoint Resolution: Maps model names to API endpoints
Parameter Checking: Validates aspect_ratio, resolution, and duration
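
The parameter check described above can be pictured with a small shell sketch. This is an illustration only: the layout of schema_data.json is not shown in this manual, so the allowed-values list, `is_allowed`, and `validate_aspect_ratio` are hypothetical names chosen for the example.

```shell
# Hypothetical allowed values; in the real project these would come from
# schema_data.json, whose structure is not documented here.
ALLOWED_ASPECT_RATIOS="16:9 9:16 1:1"

# Succeeds when $1 appears as a whole word in the space-separated list $2.
is_allowed() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Prints "ok" for a supported value; otherwise explains why and returns 1.
validate_aspect_ratio() {
  if is_allowed "$1" "$ALLOWED_ASPECT_RATIOS"; then
    echo "ok"
  else
    echo "unsupported aspect_ratio: $1 (allowed: $ALLOWED_ASPECT_RATIOS)" >&2
    return 1
  fi
}

validate_aspect_ratio "16:9"
```

Rejecting an unsupported value before a request is submitted gives the agent an immediate, readable error instead of a failed generation.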

CLI Model Discovery

muapi models list
muapi models list --category video --output-json

Agentic Pipeline Flow

The architecture supports asynchronous operations through a polling pattern:

sequenceDiagram
    participant Agent
    participant CLI as muapi-cli
    participant API as muapi.ai API
    participant Agent2 as Agent (other work)
    
    Agent->>CLI: Submit async request
    CLI->>API: POST /generate (async=true)
    API-->>CLI: request_id
    CLI-->>Agent: request_id
    Agent->>Agent2: Do other work
    Agent2-->>Agent: Continue...
    Agent->>CLI: Poll for result
    CLI->>API: GET /predict/{request_id}
    API-->>CLI: status
    alt Still processing
        CLI->>API: GET /predict/{request_id}
    else Complete
        API-->>CLI: result URL
        CLI-->>Agent: Download media
    end

Example Pipeline Commands

# Submit async, capture request_id
REQUEST_ID=$(muapi video generate "a dog running" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# Poll until the result is ready, then download it
muapi predict wait "$REQUEST_ID" --download ./outputs

# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs

Source: README.md
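
The polling pattern in the sequence diagram can be sketched as a small shell loop. This is a minimal sketch under stated assumptions, not how `muapi predict wait` is implemented: `poll_until` and `get_status` are hypothetical names, and the status strings `completed` and `failed` are assumed for illustration.

```shell
# poll_until MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it prints "completed" (return 0) or "failed" (return 1).
# Returns 2 if COMMAND itself errors and 3 if the attempts run out.
# The status strings are assumptions for illustration.
poll_until() {
  local max_attempts="$1" delay="$2" attempt=1 status
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$("$@") || return 2
    case "$status" in
      completed) return 0 ;;
      failed)    return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 3
}

# Stand-in for a real status lookup: reports "processing" twice, then "completed".
counter_file=$(mktemp)
echo 0 > "$counter_file"
get_status() {
  local n
  n=$(($(cat "$counter_file") + 1))
  echo "$n" > "$counter_file"
  if [ "$n" -ge 3 ]; then echo completed; else echo processing; fi
}

if poll_until 5 0 get_status; then
  echo "done after $(cat "$counter_file") checks"
fi
```

Separate return codes for "failed", "command error", and "ran out of attempts" let a calling agent decide whether to retry, report, or keep waiting.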

Supported AI Agents

The architecture is optimized for the next generation of AI development environments:

Agent	Integration Method
Claude Code	Direct terminal execution + MCP server mode
Cursor	Seamless local script execution
Gemini CLI	CLI tool integration
Windsurf	CLI tool integration
Any MCP Client	Full MCP server mode

Common Flags

All core scripts support standardized CLI flags:

Flag	Purpose
`--async`	Submit request without waiting
`--json`	Output raw JSON
`--download`	Auto-download generated media
`--view`	Auto-download and open in system viewer
`--output-json`	JSON output mode
`--jq '<filter>'`	Extract specific JSON fields
`--timeout N`	Set operation timeout

Requirements

Component	Requirement
muapi-cli	Installed via `npm install -g muapi-cli` or `pip install muapi-cli`
API Key	Configured via `muapi auth configure`
System Tools	`curl`, `jq`, `python3`
Node.js	For npm installation

Source: core/edit/SKILL.md

Key Design Principles

Agent-Native Design: CLI-powered scripts with structured JSON outputs and semantic exit codes
No Boilerplate: All primitives delegate to muapi-cli — no curl or manual JSON parsing
Direct Media Display: --view flag for automatic download and viewing
Local File Support: Auto-upload from local machine to CDN
Schema Validation: Runtime validation of models and parameters
CI/CD Ready: --output-json, --jq, semantic exit codes for scripting

MCP Server Setup

Related topics: CLI Commands Reference, Agent Integration Guide, Schema Reference

The MCP (Model Context Protocol) Server in Generative-Media-Skills exposes all 19 media generation tools as structured MCP tools. MCP-compatible clients such as Claude Desktop and Cursor can then invoke image, video, and audio generation directly, without shell script execution or manual API calls.

Architecture Overview

graph TD
    A[Claude Desktop / Cursor / MCP Client] -->|MCP Protocol| B[muapi mcp serve]
    B --> C[muapi-cli Core]
    C --> D[muapi.ai API]
    D --> E[100+ AI Models]
    
    F[Local Files] -->|auto-upload| C
    G[Skills Library] -->|workflows| C

The MCP Server acts as a thin bridge between MCP-compatible AI agents and the muapi.ai platform. It provides fully typed JSON Schema definitions for all tools, eliminating the need for prompt engineering or manual request construction. Source: README.md

Prerequisites

Before configuring the MCP Server, ensure you have:

Requirement	Version/Details
Node.js	v18+ recommended
muapi-cli	Latest stable
muapi.ai API key	Available at muapi.ai/dashboard

Install muapi-cli via npm or pip:

# via npm (recommended)
npm install -g muapi-cli

# via pip
pip install muapi-cli

Configure your API key:

muapi auth configure --api-key "YOUR_MUAPI_KEY"

Source: README.md

Starting the MCP Server

Launch the MCP server in foreground or background mode:

muapi mcp serve

The server exposes all 19 tools with full JSON Schema input/output definitions. It runs as a long-lived process that handles MCP protocol communication on the local machine.

Claude Desktop Configuration

To integrate with Claude Desktop, add muapi to your Claude configuration file.

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": {
        "MUAPI_API_KEY": "your-api-key-here"
      }
    }
  }
}

Alternatively, if you configured your API key globally via muapi auth configure, you can omit the env block:

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"]
    }
  }
}

After editing the config, restart Claude Desktop to load the new MCP server.

Source: README.md

Available MCP Tools

The MCP Server exposes 19 structured tools organized by category:

Image Generation & Editing

Tool	Description	Supported Models
`muapi_image_generate`	Text-to-image generation	14 models (Flux, Midjourney, DALL-E, etc.)
`muapi_image_edit`	Image-to-image editing	11 models (Flux Kontext, GPT-4o, Midjourney, Qwen)

Video Generation

Tool	Description	Supported Models
`muapi_video_generate`	Text-to-video generation	13 models (Kling, Veo, Seedance, etc.)
`muapi_video_from_image`	Image-to-video animation	16 models

Audio Generation

Tool	Description	Platform
`muapi_audio_create`	Music generation	Suno
`muapi_audio_from_text`	Sound effects	MMAudio

Enhancement & Effects

Tool	Description	Models/Options
`muapi_enhance_upscale`	AI upscaling	Multiple engines
`muapi_enhance_bg_remove`	Background removal	One-click
`muapi_enhance_face_swap`	Face swap for image/video	Multiple modes
`muapi_enhance_ghibli`	Ghibli style transfer	One-click
`muapi_edit_lipsync`	Lip sync to audio	Sync Labs, LatentSync, Creatify, Veed
`muapi_edit_clipping`	AI highlight extraction from video	Server-side transcription

Utility & Account

Tool	Description
`muapi_predict_result`	Poll async prediction status
`muapi_upload_file`	Upload local file to CDN, returns URL
`muapi_keys_list`	List existing API keys
`muapi_keys_create`	Create new API key
`muapi_keys_delete`	Delete an API key
`muapi_account_balance`	Get current credit balance
`muapi_account_topup`	Add credits via Stripe checkout

Source: README.md

Other MCP-Compatible Clients

The MCP Server is not limited to Claude Desktop. Any MCP-compatible agent can use these tools:

Client	Integration Method
Cursor	Add to Cursor settings using same JSON config structure
Windsurf	MCP server configuration in IDE settings
Gemini CLI	Direct CLI execution of MCP tools
Custom Agents	Any MCP-compatible agent with tool execution

For Cursor and Windsurf, use the same server configuration as Claude Desktop.

Workflow Examples

Image Generation Workflow

graph LR
    A[Agent Request] -->|muapi_image_generate| B[MCP Server]
    B --> C[muapi.ai API]
    C --> D[Image Model]
    D --> E[Generated Image URL]
    E --> B
    B --> F[Agent Receives Result]

Example Claude Desktop prompt:

Generate a cyberpunk city image with neon lights using the muapi_image_generate tool.

Async Video Pipeline

graph TD
    A[Submit Request] -->|muapi_video_generate --no-wait| B[Get request_id]
    B --> C[Do other work]
    C --> D[Poll muapi_predict_result]
    D -->|Still processing| D
    D -->|Complete| E[Download via muapi_predict_result --download]

Example terminal workflow:

# Submit async job
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# Poll for result
muapi predict wait "$REQUEST_ID" --download ./outputs

Chained Workflow

graph LR
    A[Local Image] -->|muapi_upload_file| B[Get CDN URL]
    B -->|muapi_image_edit| C[Apply Edit]
    C -->|muapi_enhance_upscale| D[Upscale]
    D -->|muapi_enhance_bg_remove| E[Final Output]

Example:

# Upload local file
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')

# Edit the image
muapi image edit "make it look like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs

Source: README.md

Platform Utilities via MCP

The MCP Server also exposes account management tools for programmatic control:

Tool	Use Case
`muapi_keys_list`	Audit active API keys in CI/CD
`muapi_keys_create`	Provision keys for different projects
`muapi_account_balance`	Check credits before large batch jobs
`muapi_account_topup`	Automated credit replenishment

These utilities enable fully automated pipelines without manual dashboard interaction.

Source: core/platform/SKILL.md

Supported AI Models

Discover all available models at runtime:

# List all models
muapi models list

# Filter by category
muapi models list --category video --output-json

# Check supported parameters
muapi models list --category image --output-json | jq '.[] | {id, aspect_ratio, resolution}'

Model availability is validated against schema_data.json at runtime, ensuring requests specify only supported parameters.

Source: README.md

Schema Reference

All MCP tools use fully typed JSON Schema definitions. This provides:

Input Validation — Requests are validated against supported parameters
Autocomplete — IDEs can suggest valid parameter values
Documentation — Tool descriptions are embedded in the schema

The schema_data.json file validates:

Validation	Description
Model IDs	Ensures requested model exists
Endpoint Resolution	Maps model names to API endpoints
Parameter Checking	Validates `aspect_ratio`, `resolution`, `duration`

Source: README.md

Troubleshooting

Server Won't Start

# Verify muapi-cli installation
muapi --version

# Check API key configuration
muapi auth configure --show

# Test connectivity
muapi auth configure --test

Tools Not Appearing in Agent

Verify Claude Desktop config JSON is valid
Restart Claude Desktop after config changes
Check the MCP server process is running
Confirm MUAPI_API_KEY is set or global config exists
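
The first check in the list above, confirming that the config file is valid JSON, can be scripted. The sketch below is one possible approach, assuming `python3` is available; `check_mcp_config` is a hypothetical helper, not part of muapi-cli, and it only verifies that the file parses and contains an `mcpServers.muapi` entry.

```shell
# check_mcp_config PATH
# Prints "ok" when PATH is valid JSON with an mcpServers.muapi entry;
# otherwise prints a reason to stderr and returns non-zero.
check_mcp_config() {
  python3 - "$1" <<'PY'
import json
import sys

try:
    with open(sys.argv[1], encoding="utf-8") as fh:
        config = json.load(fh)
except (OSError, ValueError) as exc:
    sys.exit(f"cannot read config: {exc}")

servers = config.get("mcpServers") if isinstance(config, dict) else None
if not isinstance(servers, dict) or "muapi" not in servers:
    sys.exit("no mcpServers.muapi entry found")
print("ok")
PY
}

# Example against a throwaway file; point it at your real config path instead.
sample=$(mktemp)
printf '%s\n' '{"mcpServers": {"muapi": {"command": "muapi", "args": ["mcp", "serve"]}}}' > "$sample"
check_mcp_config "$sample"
```

A trailing comma or a missing brace is a common reason for a client to ignore a config file, and this check reports the parser's error message for exactly those cases.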

Async Requests Timeout

Use --no-wait for long-running tasks and poll separately:

muapi predict wait "REQUEST_ID" --timeout 300

Upload Failures

The muapi_upload_file tool automatically handles local file uploads. Ensure files are accessible and within size limits.

The MCP Server provides direct access to core primitives. For higher-level workflows, consider these expert skills:

Skill	Description
Cinema Director	Technical film direction & cinematography
AI Clipping	Long video → ranked vertical clips
Seedance 2	Cinematic video with native audio-video sync
YouTube Shorts	Platform-aware clip presets

Source: README.md

CLI Commands Reference

Related topics: MCP Server Setup, Schema Reference, Troubleshooting

Overview

The Generative-Media-Skills repository provides a comprehensive CLI-based interface for AI-powered media generation and manipulation. Built around the muapi-cli tool, these commands enable AI agents (Claude Code, Cursor, Gemini CLI) to generate images, videos, and audio through a standardized command-line interface with structured JSON outputs.

The CLI architecture follows a Core/Library split:

Core Primitives (/core): Thin wrappers for raw API access
Expert Library (/library): High-value skills with domain-specific logic

Source: README.md

Installation

Prerequisites

Before using the CLI commands, install the muapi-cli package:

# via npm (recommended)
npm install -g muapi-cli

# via pip
pip install muapi-cli

# or run without installing
npx muapi-cli --help

API Key Configuration

Configure your muapi.ai API key before making any requests:

# Interactive setup
muapi auth configure

# Pass key directly
muapi auth configure --api-key "YOUR_MUAPI_KEY"

# Get your key at https://muapi.ai/dashboard

Platform setup scripts are located in core/platform/:

Script	Purpose
`setup.sh`	Configure API key, show config, test key validity
`check-result.sh`	Poll for async generation results by request ID

# Save API key
bash core/platform/setup.sh --add-key "YOUR_MUAPI_KEY"

# Show current configuration
bash core/platform/setup.sh --show-config

# Test API key validity
bash core/platform/setup.sh --test

Source: README.md, core/platform/SKILL.md

Core Platform Commands

Authentication

muapi auth configure
muapi auth configure --api-key "YOUR_MUAPI_KEY"

Model Discovery

List available models by category:

# List all models
muapi models list

# List video models only
muapi models list --category video

# JSON output for scripting
muapi models list --category image --output-json

Async Result Polling

For async operations, capture the request_id and poll for results:

# Capture request ID
REQUEST_ID=$(muapi video generate "a dog running" \
  --model kling-v3.0-pro --no-wait \
  --output-json --jq '.request_id' | tr -d '"')

# Poll until complete with auto-download
muapi predict wait "$REQUEST_ID" --download ./outputs

# Check once without polling
bash core/platform/check-result.sh --id "your-request-id" --once

# Check result script usage
bash core/platform/check-result.sh --id "your-request-id"

Source: README.md, core/platform/check-result.sh
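
The `| tr -d '"'` step in the capture command strips double quotes from the extracted value. The sketch below shows that step in isolation. It assumes, without confirmation from the source, that `--jq` prints string values JSON-encoded (wrapped in quotes); the sample ID is made up.

```shell
# A made-up request ID as it would look if printed JSON-encoded.
raw='"req_abc123"'

# tr -d deletes every double-quote character from the stream.
REQUEST_ID=$(printf '%s\n' "$raw" | tr -d '"')

echo "$REQUEST_ID"   # prints: req_abc123
```

Because `tr -d` removes every double quote, not only the surrounding pair, it is only safe for values such as IDs and URLs that never contain a quote themselves.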

Media Generation Commands

Image Generation

Generate images from text prompts:

# Basic generation
muapi image generate "a cyberpunk city at night"

# Specify model
muapi image generate "a sunset over mountains" --model flux-schnell

# Auto-download to directory
muapi image generate "product on white bg" --model flux-schnell --download ./outputs

# Extract URL for agent pipelines
muapi image generate "landscape" --model flux-dev --output-json --jq '.outputs[0]'

Available Models (14 text-to-image models, including the following; run muapi models list --category image for the current list):

flux-dev, flux-schnell, flux-kontext-pro (Flux family)
midjourney-v7, midjourney-v6.1 (Midjourney)
hidream-fast, hidream-pixel (HiDream)
gpt-image-1, gpt-4o (OpenAI)
imagen4, imagen3 (Google)

Video Generation

Generate videos from text or images:

# Text-to-video
muapi video generate "a dog running on a beach" --model kling-v3.0-pro

# Image-to-video
muapi video from-image "path/to/image.jpg" --model seedance-2 --subject "camera pans left"

# With duration
muapi video generate "ocean waves" --model kling-master --duration 10

Available Models:

Text-to-video: 13 models including kling-v3.0-pro, kling-master, seedance-2, veo3
Image-to-video: 16 models

Audio Generation

# Music generation (Suno)
muapi audio create "upbeat electronic dance track" --duration 30

# Sound effects (MMAudio)
muapi audio from-text "thunder rumbling in distance"

Source: README.md

Media Editing Commands

Image Editing

The edit-image.sh script provides prompt-based image editing:

bash core/edit/edit-image.sh \
  --image-url "https://example.com/image.jpg" \
  --prompt "add sunglasses" \
  --model flux-kontext-pro

Supported Models:

Model	Use Case
`flux-kontext-pro`	Flux Kontext editing
`gpt-4o`	OpenAI vision editing
`midjourney-v7`	Midjourney style editing
`qwen-vl-max`	Qwen vision editing

Source: core/edit/edit-image.sh

Image Enhancement

One-click enhancement operations via enhance-image.sh:

# AI upscaling
bash core/edit/enhance-image.sh --op upscale --image-url "https://..."

# Background removal
bash core/edit/enhance-image.sh --op background-remove --image-url "https://..."

# Face swap
bash core/edit/enhance-image.sh --op face-swap --image-url "..." --face-url "..."

# Colorize
bash core/edit/enhance-image.sh --op colorize --image-url "..."

# Ghibli style transfer
bash core/edit/enhance-image.sh --op ghibli --image-url "..."

# Product shot
bash core/edit/enhance-image.sh --op product-shot --image-url "..."

Source: core/edit/enhance-image.sh

Lip Sync

Synchronize video lip movements to audio:

bash core/edit/lipsync.sh \
  --video-url "https://..." \
  --audio-url "https://..." \
  --model sync

Supported Models: sync (Sync Labs), latent-sync, creatify, veed

Source: core/edit/lipsync.sh

Video Effects

Apply effects to videos and images:

# Dance effect (image + audio → animated video)
bash core/edit/video-effects.sh \
  --op dance \
  --image-url "https://..." \
  --audio-url "https://..."

# Face swap
bash core/edit/video-effects.sh --op face-swap --video-url "..." --face-url "..."

# Dress change
bash core/edit/video-effects.sh --op dress-change --video-url "..." --dress-url "..."

# Luma reframing
bash core/edit/video-effects.sh --op reframe --video-url "..."

Source: core/edit/video-effects.sh

Common Flags

All core scripts support these standard flags:

Flag	Description
`--async`	Submit request without waiting for completion
`--json`	Output raw JSON response
`--timeout N`	Set request timeout in seconds
`--download <path>`	Auto-download results to specified directory
`--view`	Download and open result in system viewer
`--output-json --jq '<expr>'`	Extract specific field using jq
`--help`	Show usage information

Source: README.md, core/edit/SKILL.md

Agentic Pipeline Examples

Async Workflow

graph TD
    A[Submit Async Request] --> B[Capture request_id]
    B --> C[Do Other Work]
    C --> D[Poll for Result]
    D --> E{Complete?}
    E -->|No| D
    E -->|Yes| F[Download Output]

# Submit async, capture request_id, poll when ready
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait \
  --output-json --jq '.request_id' | tr -d '"')

# ... do other work ...

muapi predict wait "$REQUEST_ID" --download ./outputs

File Upload Pipeline

# Upload local file → edit → download
URL=$(muapi upload file ./photo.jpg \
  --output-json --jq '.url' | tr -d '"')

muapi image edit "make it look like a painting" \
  --image "$URL" --model flux-kontext-pro --download ./outputs

Command Chaining

# Pipe prompt from another command
generate_prompt | muapi image generate - --model flux-dev

# Chain multiple operations
muapi upload file ./source.jpg | \
  muapi enhance image --op upscale | \
  muapi predict wait - --download ./final

Source: README.md

Expert Library Scripts

The /library directory contains specialized scripts for domain-specific workflows:

Cinema Director

Generate cinematic video with professional direction:

# Create 10-second epic reveal (run from the repository root)
bash library/motion/cinema-director/scripts/generate-film.sh \
  --subject "a cybernetic dragon over Tokyo" \
  --intent "epic" \
  --model "kling-v3.0-pro" \
  --duration 10 \
  --view

Seedance 2

Director-level cinematic video generation, including image-to-video and video extension:

# Animate reference image into video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
  --mode i2v \
  --file ./concept.jpg \
  --subject "camera slowly pulls back" \
  --intent "reveal" \
  --view

# Extend existing video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
  --mode extend \
  --request-id "YOUR_REQUEST_ID" \
  --subject "camera continues pulling back" \
  --duration 10

Nano-Banana

Reasoning-driven image generation:

bash library/visual/nano-banana/scripts/generate-nano-art.sh \
  --file ./my-source-image.jpg \
  --subject "a glass hummingbird" \
  --style "macro photography" \
  --resolution "2k" \
  --view

Skill Installation for Agents

# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all

# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor

Source: README.md

MCP Server Mode

Run muapi as a Model Context Protocol server for direct tool access:

muapi mcp serve

Claude Desktop Configuration (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}

Exposed MCP Tools:

Tool	Description
`muapi_image_generate`	Text-to-image (14 models)
`muapi_image_edit`	Image-to-image editing (11 models)
`muapi_video_generate`	Text-to-video (13 models)
`muapi_video_from_image`	Image-to-video (16 models)
`muapi_audio_create`	Music generation (Suno)
`muapi_audio_from_text`	Sound effects (MMAudio)
`muapi_enhance_upscale`	AI upscaling
`muapi_enhance_bg_remove`	Background removal
`muapi_enhance_face_swap`	Face swap image/video
`muapi_enhance_ghibli`	Ghibli style transfer
`muapi_edit_lipsync`	Lip sync to audio
`muapi_edit_clipping`	AI highlight extraction
`muapi_predict_result`	Poll prediction status
`muapi_upload_file`	Upload local file → URL
`muapi_keys_list`	List API keys
`muapi_keys_create`	Create API key
`muapi_keys_delete`	Delete API key
`muapi_account_balance`	Get credit balance
`muapi_account_topup`	Add credits (Stripe checkout)

Source: README.md

Requirements

All core scripts require:

Dependency	Purpose
`MUAPI_KEY` env var	Set via `core/platform/setup.sh`
`curl`	HTTP requests
`jq`	JSON parsing
`python3`	Helper scripts

Check requirements:

# Verify environment
muapi auth configure --test

# Show current config
muapi auth configure --show-config

Source: core/platform/SKILL.md, core/edit/SKILL.md
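
The system tools in the table can also be checked directly from the shell. This is a small sketch that relies only on the standard `command -v` builtin; `missing_tools` is a hypothetical helper name, not a script shipped with the repository.

```shell
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report on the three system tools listed in the table above.
missing=$(missing_tools curl jq python3)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools found"
fi
```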

Troubleshooting

Common Issues

Issue	Solution
`ReferenceError: response is not defined`	Ensure API key is configured via `muapi auth configure`
Timeout errors	Use `--timeout N` flag to increase timeout
Model download stalls at 100%	Verify model file integrity; re-download if corrupted
500 Internal Server Error	Server overloaded; retry with exponential backoff
`npm run dev` hangs	Use PowerShell or WSL; ensure Node.js 18+ installed
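
The "retry with exponential backoff" advice for 500 errors can be sketched as a generic shell wrapper. This is an illustration under stated assumptions: `retry_with_backoff` and `flaky_request` are hypothetical names, and the wrapper retries on any non-zero exit status because the CLI's exit codes are not documented here.

```shell
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds, doubling the wait after each failure.
# Returns 0 on success, 1 once MAX_ATTEMPTS is exhausted.
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Stand-in for a flaky request: fails twice, then succeeds.
tries_file=$(mktemp)
echo 0 > "$tries_file"
flaky_request() {
  local n
  n=$(($(cat "$tries_file") + 1))
  echo "$n" > "$tries_file"
  [ "$n" -ge 3 ]
}

retry_with_backoff 5 0 flaky_request && echo "succeeded on try $(cat "$tries_file")"
```

In real use the base delay would be non-zero, for example `retry_with_backoff 4 2 muapi image generate "a cyberpunk city at night"`, which would wait 2, 4, and then 8 seconds between attempts.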

Verification Commands

# Test API connectivity
bash core/platform/setup.sh --test

# List configured models
muapi models list

# Check account balance
muapi account balance

Server Dependencies (Ubuntu)

If running server components:

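# Build tooling and Python dependencies
apt install python3-dev make g++
pip install wheel
pip install -r requirements.txt
# Node.js 18: the NodeSource script only adds the apt repository, so install nodejs afterwards
curl -sL https://deb.nodesource.com/setup_18.x | bash -
apt install -y nodejs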

Source: core/platform/SKILL.md, README.md

Expert Skills Library

Related topics: Recipe Pack, Workflow Scripts, Architecture Overview

The Expert Skills Library is the high-value knowledge layer of the Generative-Media-Skills repository. It provides domain-specific skills that translate creative intent into technical directives for AI agents, enabling professional-grade image, video, and audio generation without requiring users to understand the underlying API complexity.

Overview

The repository uses a Core/Library split architecture:

Layer	Purpose	Location
Core Primitives	Thin wrappers around `muapi-cli` for raw API access	`/core/`
Expert Library	Domain-specific skills with professional knowledge baked in	`/library/`
Recipe Pack	LLM-orchestrated workflow recipes combining multiple skills	`/library/*/`

Source: README.md

Architecture Diagram

graph TD
    subgraph "Expert Skills Library"
        A["🎬 Motion / Video<br/>(16 skills)"] --> D["Cinema Director"]
        A --> E["Seedance 2"]
        A --> F["AI Clipping"]
        
        B["🎨 Visual / Design<br/>(21 skills)"] --> G["Nano-Banana"]
        B --> H["UI Designer"]
        B --> I["Logo Creator"]
        
        C["📱 Social<br/>(5 skills)"] --> J["YouTube Shorts"]
        C --> K["UGC Ads Workflow"]
    end
    
    L["muapi-cli"] --> M["19 Structured Tools"]
    D --> L
    G --> L
    J --> L
    
    M --> N["Claude Code / Cursor / MCP"]

Skill Categories

Motion / Video Skills

The motion library contains 16 skills for video generation and animation.

Skill	Description	Key Capability
Cinema Director	Technical film direction & cinematography	Directs Seedance 2.0 with camera movements, lighting, and timing
Seedance 2 (Doubao Video)	Director-level cinematic video generation	Text-to-video, image-to-video, video extension with audio-video sync
AI Fight Scene Generator	High-cut-density action sequences	16-cell storyboard image drives Seedance 2.0 i2v
3D Logo Animation	Premium 3D logo animation	Transforms 2D logos with cinematic effects
Animal Vlogger Video	Anthropomorphic animal content	Ultra-realistic characters in real-world settings
Cartoon Dance Animation	Photo to Pixar-style 3D animation	Reference dance/motion video driving
Drone-Style Video	Aerial drone-perspective footage	Bird's-eye sweeps, orbit shots, flyovers
Giant Product Showcase	Dramatic giant-scale visuals	Building-sized objects next to people
Jewelry Product Video	Luxury jewelry cinematography	Macro animation and commercial quality
Music Video	Short music video generation	Keyframes per beat, music track matching
One-Shot Video	Single continuous cinematic shot	No cuts, seamless flowing scene
Product Ad Cinematic	5-10s product advertisement	From product photo + brand brief
Product Showcase Video	Dynamic product animation	Explosive ingredient arrangement
Talking Baby Video	Viral-style talking baby	Custom costumes and scripts
UGC Lifestyle Try-On	Lifestyle content generation	Authentic social-native photos & video
UGC Video Factory	10s vertical UGC video ad	Nano-Banana Pro Edit → Seedance 2.0 VIP i2v

Visual / Images & Design Skills

The visual library contains 21 skills for image generation and design.

Skill	Description	Output
Nano-Banana	Reasoning-driven image generation	Gemini 3 Style reasoning for high-quality outputs
UI Designer	High-fidelity mobile/web mockups	Atomic Design principles, component-based
Logo Creator	Minimalist vector branding	Geometric Primitives, accurate brand-name text
Action Figure Generator	Photo → custom 3D action figure	Collectible toy packaging
Ad Creative Set	High-converting ad assets	Hero image, copy variations, platform crops
Amazon Product Listing Pack	Full Amazon listing images	Hero, lifestyle, infographic, comparisons
Blog Header	Professional blog header	1200×628 with title composition guidance
Brand Kit	Cohesive brand visual kit	Logo concept, color palette, typography
Brochure Designer	Multi-page brochure	Cover, inner spread, back
Brand Design Guide	Comprehensive design system	Palette, typography, UI components
Couple Grid Creator	Stylized couple grid	6-box romantic poses in packaging
Fashion Try-On	Virtual outfit try-on	Person photo + clothing combination
Floor Plan Rendering	2D → 3D architectural	Realistic 3D room visualization
Interior Design	Pro interior visualizations	Redesign rooms, furniture styles
Interior Design Visualizer	Room furniture generation	Fill empty rooms or redesign existing
Keyboard Art Maker	Keycap art	Top-down artistic keyboard arrangements
Logo + Branding Package	Complete branding	Variations, palette, mockups
Multi-Angle Reshoot	Multiple camera angles	Fish-eye, bird's-eye, low, macro shots
Multi-Angle Shots	Full product shot set	Front, side, back, top-down, 45°
Storyboard Generator	N keyframes for scenes	Story sequence visualization
URL to Design	Website → redesigned UI	Analyze URL and generate improved design
YouTube Thumbnail	High-CTR thumbnails	Bold text, emotional faces, striking imagery

Social Skills

The social library contains skills for platform-specific content and UGC ads.

Skill	Description	Platforms
Instagram Post	On-brand Instagram content	Instagram
Product Campaign Pack	Multi-channel campaign	Meta, Google, LinkedIn, TikTok
RedNote Cover	Xiaohongshu covers	Xiaohongshu (小红书)
Social Media Pack	Platform crops	Instagram, TikTok, Shorts, X
UGC Ads Workflow	Video ad pipeline	Social-native UGC style
YouTube Shorts	Platform-aware short clips	Shorts, TikTok, Reels, Feed

Edit Skills

The edit library provides post-processing capabilities.

Source: core/edit/SKILL.md

Script	Operation	Description
`edit-image.sh`	Prompt-based editing	Flux Kontext, GPT-4o, Midjourney, Qwen
`enhance-image.sh`	One-click operations	Upscale, background removal, face swap, colorize, Ghibli style, product shots
`lipsync.sh`	Lip sync	Sync Labs, LatentSync, Creatify, Veed
`video-effects.sh`	Video effects	Wan AI, face swap, dance, dress change, Luma

Core Expert Skills

Cinema Director

Technical film direction that translates creative intent into Seedance 2.0 directives.

Location: /library/motion/cinema-director/

Capabilities:

Camera movement planning
Lighting direction
Timing and pacing
Scene composition

Nano-Banana

Reasoning-driven image generation using chain-of-thought prompting.

Location: /library/visual/nano-banana/

Purpose: Apply "Gemini 3 Style" reasoning to generate high-quality images through explicit problem-solving steps.

UI Designer

High-fidelity mobile and web mockup generation using Atomic Design principles.

Location: /library/visual/ui-design/

Features:

Component-based design
Responsive layouts
Design system adherence

Logo Creator

Minimalist vector branding generation using geometric primitives.

Location: /library/visual/logo-creator/

Output: Accurate brand-name text rendering with clean vector aesthetic.

Seedance 2 (Doubao Video)

Director-level cinematic video generation supporting multiple modes.

Location: /library/motion/seedance-2/

Mode	Description
`t2v`	Text-to-video generation
`i2v`	Image-to-video animation
`extend`	Video extension

Usage Example:

# Text-to-video
bash scripts/generate-seedance.sh --mode t2v --subject "a cybernetic dragon" --intent "epic" --duration 10 --view

# Image-to-video
bash scripts/generate-seedance.sh --mode i2v --file ./concept.jpg --subject "camera pulls back" --intent "reveal" --view

# Extend existing video
bash scripts/generate-seedance.sh --mode extend --request-id "YOUR_ID" --subject "camera continues" --duration 10

AI Clipping

Server-side long video processing for short clip extraction.

Location: /library/edit/ai-clipping/

Features:

Server-side transcription (no local Whisper)
Virality ranking
Deduplication
Face-tracked auto-crop

YouTube Shorts

Platform-aware preset over AI Clipping with optimized defaults.

Location: /library/social/youtube-shorts/

Platform Defaults:

Platform	Aspect Ratio	Duration
Shorts	9:16	60s max
TikTok	9:16	60s max
Reels	9:16	90s max
Feed	16:9 or 1:1	Variable
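
The defaults above lend themselves to a small lookup. The following is a minimal sketch, assuming a hypothetical helper named `platform_defaults` (it is not part of the repository) that prints the aspect ratio and maximum duration in seconds for a platform name taken from the table:

```shell
# Hypothetical helper: prints "<aspect-ratio> <max-seconds>" for a platform.
# The values mirror the Platform Defaults table; the function name is illustrative.
platform_defaults() {
  case "$1" in
    shorts|tiktok) echo "9:16 60" ;;
    reels)         echo "9:16 90" ;;
    feed)          echo "16:9|1:1 variable" ;;
    *)             echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

platform_defaults reels   # prints: 9:16 90
```

Keeping the defaults in one function means a calling script can reject an unknown platform before any clip is requested.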

Platform Utilities

The /core/platform/ directory provides essential utilities for skill execution.

Source: core/platform/SKILL.md

Script	Description
`setup.sh`	Configure API key, show config, test key validity
`check-result.sh`	Poll for async generation results

Quick Start:

# Save API key
bash setup.sh --add-key "YOUR_MUAPI_KEY"

# Test connectivity
bash setup.sh --test

# Poll for result
bash check-result.sh --id "your-request-id"

Recipe Pack

The Recipe Pack contains 41 LLM-orchestrated workflow recipes that combine multiple muapi-cli calls into named end-to-end pipelines.

Characteristics:

Each skill is a SKILL.md file the agent reads and follows
Designed for consuming agents (Claude Code, Cursor, MCP)
Recipes, not bash wrappers
Bring your own executing agent

Integration with AI Agents

The Expert Skills Library is designed for seamless integration with AI development environments.

Supported Platforms

Platform	Integration Method
Claude Code	Direct terminal execution via tools + MCP server mode
Cursor	MCP server mode
Gemini CLI	Local scripts
Windsurf	Local scripts

MCP Server Mode

muapi mcp serve

This exposes 19 structured tools with full JSON Schema input/output definitions to Claude Desktop, Cursor, or any MCP-compatible agent.

Source: README.md

Requirements

All expert skills require:

Requirement	Description
`muapi-cli`	Core CLI tool for API access
`MUAPI_KEY`	API key configured via `core/platform/setup.sh`
Standard tools	`curl`, `jq`, `python3` (varies by skill)
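
A preflight check can catch a missing dependency before a skill runs. The sketch below assumes the CLI binary is named `muapi` (as in the commands elsewhere in this manual); the helper `missing_tools` is hypothetical and uses only the standard `command -v` builtin:

```shell
# Prints the name of every listed tool that is not on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tool names come from the Requirements table; adjust the list per skill.
missing=$(missing_tools muapi curl jq python3)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "missing tools:" $missing >&2
fi
```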

Common Workflow Patterns

Async Generation with Polling

sequenceDiagram
    participant Agent
    participant muapi as muapi-cli
    participant API as muapi.ai API
    
    Agent->>muapi: Submit async request (--no-wait)
    muapi->>API: POST request
    API-->>muapi: request_id
    muapi-->>Agent: Return request_id
    
    loop Poll until complete
        Agent->>muapi: check-result --id request_id
        muapi->>API: GET status
        API-->>muapi: status update
        muapi-->>Agent: Progress/Ready
    end
    
    Agent->>muapi: Download result (--download)
    muapi->>API: GET download
    API-->>muapi: Media file
    muapi-->>Agent: Saved output
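
The polling loop in the diagram can be sketched as a generic shell function. This is an illustration under stated assumptions, not the repository's implementation: `poll_until_ready` and `demo_check` are hypothetical names, and the status check is passed in as a command whose exit status 0 is taken to mean "ready".

```shell
# Generic polling loop. The status check is passed in as a command so the
# sketch stays independent of any particular CLI; exit status 0 means "ready".
# Usage: poll_until_ready <max_attempts> <interval_seconds> <command> [args...]
poll_until_ready() {
  poll_max=$1
  poll_interval=$2
  shift 2
  poll_attempt=1
  while [ "$poll_attempt" -le "$poll_max" ]; do
    if "$@"; then
      echo "ready after $poll_attempt attempt(s)"
      return 0
    fi
    poll_attempt=$((poll_attempt + 1))
    sleep "$poll_interval"
  done
  echo "gave up after $poll_max attempt(s)" >&2
  return 1
}

# Stand-in for a real status check: succeeds on the third call.
demo_calls=0
demo_check() {
  demo_calls=$((demo_calls + 1))
  [ "$demo_calls" -ge 3 ]
}

poll_until_ready 5 0 demo_check   # prints: ready after 3 attempt(s)
```

In practice the command argument would wrap the real status check. Whether `check-result.sh` signals a pending request through its exit status is not confirmed by the source, so a thin wrapper that inspects its output may be needed.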

Agentic Pipeline Example

# 1. Submit async, capture request_id
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# 2. Do other work...

# 3. Poll for completion
muapi predict wait "$REQUEST_ID" --download ./outputs

# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs
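
Both pipelines above depend on a captured value (`REQUEST_ID` or `URL`) being non-empty. A minimal guard, assuming a hypothetical helper named `capture_or_fail`, strips the surrounding quotes the same way `tr -d '"'` does above and fails early when nothing was returned:

```shell
# capture_or_fail <label> <value>
# Prints the value with double quotes removed, or fails if nothing is left.
capture_or_fail() {
  cleaned=$(printf '%s' "$2" | tr -d '"')
  if [ -z "$cleaned" ]; then
    echo "no $1 returned" >&2
    return 1
  fi
  printf '%s\n' "$cleaned"
}

capture_or_fail request_id '"abc-123"'   # prints: abc-123
```

Checking the value before the next step avoids passing an empty ID or URL to a later command.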

Community Considerations

Based on community feedback, several areas are frequently discussed:

Topic	Status	Notes
Publishing destinations	Feature request	Users request post-generation publishing (e.g., Vynly integration)
GPU acceleration	Planned	Local model acceleration discussed for future support
Multilingual content	Supported	Upload documents and query in any language
Server errors	Resolved	Initial 500 errors addressed in later versions

Source: GitHub Issues #7, #89, #24

Quick Reference

Skill Category	Count	Primary Use Case
Motion / Video	16	Cinematic video, animation, product showcases
Visual / Design	21	Branding, UI, product imagery, marketing
Social	5	Platform-specific content, UGC ads
Edit	4	Post-processing, enhancement, effects

Total: 46+ expert skills organized for agentic execution.

Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual

Recipe Pack

Related topics: Expert Skills Library, Workflow Scripts

The Recipe Pack is a curated collection of 41 LLM-orchestrated workflow recipes that translate creative intent into executable muapi-cli pipelines. Each recipe is a self-contained SKILL.md file containing structured instructions that AI agents can read and execute without additional configuration. Source: README.md

Overview

Recipe Pack workflows combine multiple muapi-cli calls into named end-to-end pipelines. Rather than requiring developers to manually chain image generation, video creation, and enhancement operations, recipes provide:

Pre-defined creative logic: domain expertise baked into executable steps
Multi-step pipelines: complex outputs from simple inputs (e.g., "photo of person → 3D action figure")
Agent-native format: SKILL.md files that LLMs can parse and follow directly
Professional quality: cinematographic, branding, and design best practices embedded

Source: README.md

Architecture

Recipes follow a layered architecture that separates creative intent from technical execution:

graph TD
    A[User Input / Agent Prompt] --> B[SKILL.md Recipe]
    B --> C[muapi-cli Calls]
    C --> D[muapi.ai API]
    D --> E[Generated Media]
    
    F[Core Primitives] --> C
    G[Expert Library] --> B

Recipe Structure

Each recipe declares its inputs and a Steps body. The executing agent reads the SKILL.md and translates instructions into muapi CLI calls. Source: README.md

Layer	Location	Purpose
Core Primitives	`/core/`	Thin wrappers around `muapi-cli` for raw API access (media, edit, platform)
Expert Library	`/library/`	High-value skills translating creative intent to technical directives
Recipe Pack	`/library/*/`	41 named pipelines combining multiple primitives

Recipe Categories

The Recipe Pack is organized into three primary categories:

Motion / Video (16 Recipes)

High-production video workflows including cinematography, animation, and UGC content.

Skill	Description
3D Logo Animation	Transform a 2D logo into a premium 3D version with cinematic effects
AI Fight Scene Generator	High-cut-density action sequence — 16-cell storyboard drives Seedance 2.0 i2v
Animal Vlogger Video	Anthropomorphic animal vlogger in real-world settings
Cartoon Dance Animation	Photo → Pixar-style 3D cartoon with dance animation
Character Story Video	Multi-part animated story with consistent character
Drone-Style Video	Aerial footage — bird's-eye sweeps, orbit shots, flyovers
Giant Product Showcase	Building-sized product visual with optional animation
Jewelry Product Video	Luxury jewelry ad with macro animation
Music Video	Short music video from song theme — keyframes per beat
One-Shot Video	Single continuous cinematic shot
Cinematic Product Ad	5–10s product ad from photo + brand brief
Product Showcase Video	Dynamic product showcase with motion animation
Product Video Ad Maker	Cinematic video ad from product photo
Talking Baby Video	Viral-style talking baby with costumes and scripts
UGC Lifestyle Try-On	Lifestyle photos & video of person using product
UGC Video Factory	Person + product + script → 10s vertical UGC video ad

Source: README.md

Social (5 Recipes)

Platform-optimized social media content and multi-channel campaigns.

Skill	Description
Instagram Post	Hero image + caption + hashtags
Product Campaign Pack	Multi-channel campaign — hero visuals, social assets, video, crops
RedNote Cover	Xiaohongshu (小红书) cover — lifestyle aesthetic with typography
Social Media Pack	Hero image → Instagram / TikTok / Shorts / X aspect ratios
UGC Ads Workflow	Selfie + product image + script → animated ad

Source: README.md

Visual / Images & Design (21 Recipes)

Professional image generation, branding, and design assets.

Skill	Description
Action Figure Generator	Photo → custom 3D action figure with collectible packaging
Ad Creative Set	Hero image + copy variations + platform crops
Amazon Product Listing Pack	Hero, lifestyle, infographic, comparison images
Blog Header	1200×628 blog header with title composition
Brand Kit	Logo concept + color palette + typography pairings
Brochure Designer	Multi-page brochure — cover, inner spread, back
Couple Grid Creator	6-box stylized grid in cardboard packaging frames
Brand Design Guide	Palette, typography, UI components, visual identity
Fashion Try-On	Person + clothing → fashion model video
Floor Plan Rendering	2D floor plan → realistic 3D architectural rendering
Interior Design	Pro interior design visualizations
Interior Design Visualizer	Empty room → filled with furniture / redesign existing room
Keyboard Art Maker	Keycaps spelling custom messages
Logo + Branding Package	Logo variations (dark/light/icon) + palette + mockups
Logo Generator	Quick single-shot polished logo
Multi-Angle Reshoot	Subject from fish-eye, bird's-eye, low, macro angles
Multi-Angle Shots	Full product shot set — front, side, back, top-down, 45°
Selfie with Celebrities	Realistic selfie with celebrity; optional cinematic
Storyboard Generator	N keyframes for story or scene sequence
URL to Design	Website → redesigned UI with modern aesthetics
YouTube Thumbnail	High-CTR thumbnail — bold text, emotional imagery

Source: README.md

Execution Model

Recipes are designed for agentic execution. The consuming agent (Claude Code, Cursor, or any MCP-compatible agent) reads the SKILL.md file and executes the steps via muapi CLI calls. Source: README.md

Typical Recipe Flow

graph LR
    A[Input Media<br/>or Description] --> B[Parse SKILL.md]
    B --> C[Step 1: Generate<br/>Base Asset]
    C --> D[Step 2: Enhance<br/>or Transform]
    D --> E[Step 3: Apply<br/>Effects/Animation]
    E --> F[Output:<br/>Final Media]

Key Recipe Patterns

Pattern 1: Image-to-Video Pipeline

# 1. Generate or upload source image
muapi image generate "product photo on white" --model flux-schnell

# 2. Animate into video
muapi video from-image \
  --image "SOURCE_IMAGE_URL" \
  --subject "camera slowly orbits the product" \
  --model seedance-2.0-vip

Pattern 2: Multi-Asset Composite

# 1. Generate selfie
muapi image generate "professional selfie" --model flux-dev

# 2. Generate product
muapi image generate "product photo" --model flux-schnell

# 3. Combine in video
muapi video from-image \
  --image "COMPOSITE_URL" \
  --model kling-v3.0-pro

Pattern 3: Async Pipeline with Polling

# Submit async, capture request_id
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# Poll for completion
muapi predict wait "$REQUEST_ID" --download ./outputs

Example Recipes

3D Logo Animation

Transforms a 2D logo into an animated 3D version with cinematic effects.

Location: library/motion/3d-logo-animation/SKILL.md

Workflow:

Accept 2D logo input (URL or local file)
Generate 3D version using image-to-3D model
Apply animation choreography (rotation, light sweep, particle effects)
Output final video asset

Models Used:

Image generation: Flux variants, Midjourney
3D conversion: Dedicated 3D models
Video: Seedance 2.0, Kling 3.0

Cinematic Product Ad

Creates a 5–10 second product advertisement from a product photo and brand brief.

Location: library/motion/product-ad-cinematic/SKILL.md

Workflow:

Accept product photo and brand brief (tone, colors, messaging)
Generate lifestyle background scene
Composite product into scene
Animate with cinematic camera movement
Apply color grading matching brand identity

Output: Professional-grade product commercial

Action Figure Generator

Converts a photo of a person into a custom 3D action figure with collectible toy packaging.

Location: library/visual/action-figure-generator/SKILL.md

Workflow:

Accept subject photo
Generate 3D-rendered action figure likeness
Create collectible packaging (blister card, header card)
Apply toy-grade styling (plastic sheen, stylized proportions)

Use Cases:

Personalized gifts
Marketing materials
Fan merchandise concepts

YouTube Shorts Generator

Converts long-form video content into platform-optimized short clips.

Location: library/social/youtube-shorts/SKILL.md

Workflow:

Upload or reference source video
AI identifies best highlights (using transcription + virality ranking)
Extract vertical clips (9:16 aspect ratio)
Auto-crop to face-tracked subjects
Apply platform-specific formatting (TikTok, Reels, Shorts)

Features:

Server-side transcription (no local Whisper required)
Deduplication of similar clips
Face-tracked auto-crop

Integration with AI Agents

Installing to Claude Code

npx skills add SamurAIGPT/Generative-Media-Skills --all

Installing to Specific Agents

npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor

MCP Server Mode

Recipes can also be executed via the Model Context Protocol server:

muapi mcp serve

This exposes 19 structured tools directly to MCP-compatible agents. Source: README.md

Claude Desktop Configuration

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}

Running Recipes Manually

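Some skills, such as Cinema Director and Nano-Banana, include executable shell scripts for direct invocation. Run the commands below from the repository root:

# Generate a cinematic film (the subshell leaves the working directory unchanged)
(cd library/motion/cinema-director && bash scripts/generate-film.sh \
  --subject "a cybernetic dragon over Tokyo" \
  --intent "epic" \
  --model "kling-v3.0-pro" \
  --duration 10 \
  --view)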

# Use Nano-Banana reasoning for image generation
bash library/visual/nano-banana/scripts/generate-nano-art.sh \
  --file ./my-source-image.jpg \
  --subject "a glass hummingbird" \
  --style "macro photography" \
  --resolution "2k" \
  --view

Extending the Recipe Pack

Recipes follow a consistent structure that makes them easy to extend: each recipe is a SKILL.md file that declares its inputs and a Steps body.

Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual

Workflow Scripts

Related topics: Expert Skills Library, Recipe Pack, CLI Commands Reference

Workflow Scripts provide the foundational infrastructure for executing multi-step generative media pipelines within the Generative-Media-Skills framework. These scripts orchestrate complex operations by chaining together muapi-cli commands, enabling AI agents to execute sophisticated end-to-end media generation workflows with minimal configuration.

Overview

The workflow system serves as a bridge between high-level creative intent and low-level API operations. Rather than requiring agents to manually construct and sequence individual API calls, workflow scripts encapsulate entire pipelines as executable units that handle:

Input validation and parameter passing
State management across pipeline stages
Error handling and recovery mechanisms
Result aggregation from multiple generation steps

Source: library/workflow/SKILL.md

Architecture

The workflow subsystem follows a Core/Library architectural pattern consistent with the broader Generative-Media-Skills project:

graph TD
    A[User/Agent Request] --> B[Workflow Selection]
    B --> C{Interactive or Direct?}
    C -->|Interactive| D[interactive-run.sh]
    C -->|Direct| E[run-workflow.sh]
    D --> F[Parameter Collection]
    E --> G[Execute Pipeline]
    F --> G
    G --> H[muapi-cli Operations]
    H --> I[Media Generation]
    I --> J[Result Aggregation]
    J --> K[Output Delivery]
    
    L[discover-workflow.sh] -.->|Discovery| B
    M[list-workflows.sh] -.->|Catalog| N[Available Workflows]

Component Responsibilities

Component	Role	Location
`SKILL.md`	Metadata, usage documentation, and skill definition	`/library/workflow/`
`discover-workflow.sh`	Scans and identifies available workflow definitions	`/library/workflow/scripts/`
`run-workflow.sh`	Executes workflows with provided parameters	`/library/workflow/scripts/`
`generate-workflow.sh`	Creates new workflow definitions or generates workflow output	`/library/workflow/scripts/`
`interactive-run.sh`	Guides users through workflow execution via prompts	`/library/workflow/scripts/`
`list-workflows.sh`	Displays catalog of available workflows	`/library/workflow/scripts/`

Source: library/workflow/scripts/run-workflow.sh, library/workflow/scripts/interactive-run.sh

Core Scripts Reference

list-workflows.sh

Lists all available workflow definitions in the system. This script scans the workflow directory and presents workflows in a structured format suitable for both human review and agent consumption.

bash list-workflows.sh [--format json|text]

Parameters:

Parameter	Type	Description
`--format`	string	Output format: `json` for machine parsing, `text` for human-readable (default: `text`)

Source: library/workflow/scripts/list-workflows.sh

discover-workflow.sh

Performs discovery and validation of workflow definitions. This script identifies all workflow files, parses their metadata, and verifies structural integrity before execution.

bash discover-workflow.sh [--path <directory>] [--validate]

Parameters:

Parameter	Type	Description
`--path`	string	Directory path to scan for workflows (default: current workflow library)
`--validate`	flag	Perform structural validation of discovered workflows

Discovery Output Structure:

{
  "workflows": [
    {
      "id": "workflow-identifier",
      "name": "Human Readable Name",
      "description": "Workflow purpose and capabilities",
      "inputs": ["required", "parameters"],
      "outputs": ["expected", "results"],
      "steps": ["sequential", "operations"]
    }
  ]
}

Source: library/workflow/scripts/discover-workflow.sh

run-workflow.sh

Executes a specified workflow with provided or default parameters. This is the primary execution engine that coordinates the actual muapi-cli calls.

bash run-workflow.sh \
  --workflow <workflow-id> \
  --input <input-path-or-url> \
  --output <output-directory> \
  [--param-key value...]

Parameters:

Parameter	Type	Required	Description
`--workflow`	string	Yes	Unique identifier of workflow to execute
`--input`	string	Yes	Primary input (file path, URL, or prompt)
`--output`	string	No	Output directory (default: `./outputs`)
`--param-*`	mixed	No	Additional workflow-specific parameters

Exit Codes:

Code	Meaning
`0`	Workflow completed successfully
`1`	Invalid workflow ID
`2`	Input validation failed
`3`	API call failed
`4`	Output generation failed

Source: library/workflow/scripts/run-workflow.sh
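
A caller can turn the exit codes above into readable messages. The sketch below assumes a hypothetical helper named `describe_exit_code`; the code-to-message mapping is copied from the table:

```shell
# Maps run-workflow.sh exit codes (per the Exit Codes table) to messages.
describe_exit_code() {
  case "$1" in
    0) echo "Workflow completed successfully" ;;
    1) echo "Invalid workflow ID" ;;
    2) echo "Input validation failed" ;;
    3) echo "API call failed" ;;
    4) echo "Output generation failed" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Typical use, right after a workflow run:
#   bash run-workflow.sh --workflow <workflow-id> --input <input-path-or-url>
#   describe_exit_code "$?"
describe_exit_code 3   # prints: API call failed
```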

generate-workflow.sh

Generates workflow definitions or produces workflow-based outputs. This script supports both workflow creation (for defining new pipelines) and output generation (for producing artifacts).

bash generate-workflow.sh \
  --template <template-id> \
  --spec <specification-file> \
  --output <output-path>

Parameters:

Parameter	Type	Description
`--template`	string	Template identifier to base new workflow on
`--spec`	string	YAML/JSON specification file defining workflow structure
`--output`	string	Destination for generated workflow definition or output

Source: library/workflow/scripts/generate-workflow.sh

interactive-run.sh

Provides an interactive, question-driven interface for workflow execution. Users are prompted for required inputs, and the script validates each parameter before proceeding.

bash interactive-run.sh [--workflow <workflow-id>]

Interactive Flow:

graph LR
    A[Start] --> B{Workflow ID Provided?}
    B -->|No| C[List Available Workflows]
    C --> D[Select Workflow]
    B -->|Yes| E[Load Workflow Definition]
    D --> E
    E --> F[Prompt: Input 1]
    F --> G[Validate Input 1]
    G -->|Valid| H[Prompt: Input 2]
    G -->|Invalid| F
    H --> I[... Continue N times]
    I --> J[Execute Workflow]
    J --> K[Display Results]
    K --> L[End]

Supported Prompts:

Prompt Type	Validation	Description
`text`	regex pattern	Free-form text input
`url`	URL format	Web resource URLs
`file`	path exists	Local file paths
`select`	enum values	Enumerated choice
`confirm`	boolean	Yes/No confirmation

Source: library/workflow/scripts/interactive-run.sh
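
The validation rules in the table can be illustrated with a single function. This is a sketch under assumptions, not the logic of `interactive-run.sh`: `validate_input` is a hypothetical name, `select` choices are passed as a space-separated list, and the accepted `confirm` answers are an illustrative choice.

```shell
# validate_input <type> <value> [pattern-or-choices]
# Returns 0 when the value passes the check for its prompt type,
# 1 when it fails, and 2 for an unknown prompt type.
validate_input() {
  case "$1" in
    text)    printf '%s\n' "$2" | grep -Eq -- "${3:-.+}" ;;
    url)     printf '%s\n' "$2" | grep -Eq '^https?://[^[:space:]]+$' ;;
    file)    [ -e "$2" ] ;;
    select)  # choices are a space-separated list, e.g. "realistic cartoon anime"
             case " $3 " in *" $2 "*) return 0 ;; *) return 1 ;; esac ;;
    confirm) case "$2" in y|Y|yes|n|N|no) return 0 ;; *) return 1 ;; esac ;;
    *)       return 2 ;;
  esac
}

validate_input select cartoon "realistic cartoon anime" && echo "valid choice"
```

Returning a status rather than printing lets the prompt loop re-ask on failure, as in the flow diagram above.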

Workflow Execution Pipeline

When executing a workflow, the system follows a consistent pipeline pattern:

graph TD
    subgraph "Stage 1: Initialization"
        A1[Parse Workflow Definition] --> A2[Resolve Input Parameters]
        A2 --> A3[Initialize Output Directory]
    end
    
    subgraph "Stage 2: Execution"
        A3 --> B1[Execute Step 1]
        B1 --> B2{Step 1 Success?}
        B2 -->|Yes| B3[Execute Step 2]
        B2 -->|No| B4[Log Error]
        B4 --> B5[Rollback if Needed]
        B3 --> B6{Step 2 Success?}
        B6 -->|Yes| B7[Execute Step N]
        B6 -->|No| B4
    end
    
    subgraph "Stage 3: Aggregation"
        B7 --> C1[Collect Step Outputs]
        C1 --> C2[Merge Results]
        C2 --> C3[Generate Metadata]
        C3 --> C4[Write Final Output]
    end

Step Execution Model

Each workflow step follows this execution model:

# Pseudo-code for step execution
for step in workflow.steps:
    result = muapi-cli <operation> <step.params>
    if result.success:
        cache(step.id, result)
    else:
        handle_error(step, result)

Source: library/workflow/SKILL.md, library/workflow/scripts/run-workflow.sh
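
The pseudo-code above can be made concrete with a stubbed operation. In this sketch `run_step` stands in for the real muapi-cli call, and `run_steps` and the file-based cache are assumptions made for illustration:

```shell
# run_step is a stand-in for the real operation (a muapi-cli call in practice).
# It fails for a step named "broken" so both branches can be exercised.
run_step() {
  [ "$1" != "broken" ] && echo "output-of-$1"
}

# run_steps <cache_dir> <step>...
# Runs steps in order and caches each result in a file named after the step.
# Stops at the first failure and returns non-zero.
run_steps() {
  cache_dir=$1
  shift
  for step in "$@"; do
    if result=$(run_step "$step"); then
      printf '%s\n' "$result" > "$cache_dir/$step"
    else
      echo "step failed: $step" >&2
      return 1
    fi
  done
}

cache=$(mktemp -d)
run_steps "$cache" generate enhance
cat "$cache/enhance"   # prints: output-of-enhance
```

Caching each result under the step's name is one simple way for a later step to read an earlier step's output.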

Integration with Recipe Pack

The workflow scripts serve as the execution backbone for the Recipe Pack — a collection of 41 pre-built LLM-orchestrated workflow recipes. Each recipe in the library maps to one or more workflow script invocations.

Recipe Categories

Category	Count	Example Workflows
Motion/Video	16	3D Logo Animation, AI Fight Scene Generator, Drone-Style Video
Visual/Images	21	Action Figure Generator, Brand Kit, Interior Design
Social	5	Instagram Post, Product Campaign Pack, UGC Ads Workflow
Edit	1	AI Clipping
Motion Specialized	4	Cinema Director, Seedance 2.0, YouTube Shorts

Source: README.md - Recipe Pack Section

Recipe-to-Workflow Mapping

Complex recipes often combine multiple workflow scripts:

# Example: Action Figure Generator Recipe
# Step 1: Generate base image
muapi image generate "3D render of action figure" --model flux-dev

# Step 2: Enhance with workflow
bash run-workflow.sh --workflow enhance-3d \
  --input "./outputs/step1.png" \
  --output "./outputs"

# Step 3: Add packaging
bash run-workflow.sh --workflow product-packaging \
  --input "./outputs/step2.png" \
  --output "./outputs/final"

Source: library/motion/ai-fight-scene/SKILL.md, library/visual/action-figure-generator/SKILL.md

Common Workflow Patterns

Async Polling Pattern

For long-running operations, workflows implement async polling:

# Submit generation request
REQUEST_ID=$(muapi video generate "prompt" --model kling-v3.0-pro \
  --no-wait --output-json --jq '.request_id' | tr -d '"')

# Poll for completion via workflow
bash run-workflow.sh --workflow poll-result \
  --request-id "$REQUEST_ID" \
  --max-attempts 60 \
  --interval 10

Source: README.md - Agentic Pipeline Examples

File Upload Pattern

Workflows that process local files auto-upload to CDN:

# Workflow handles upload transparently
bash run-workflow.sh --workflow image-edit \
  --input "./local-photo.jpg" \
  --prompt "apply cinematic color grading"

# Equivalent raw operations:
URL=$(muapi upload file ./local-photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "apply cinematic color grading" --image "$URL" --model flux-kontext-pro

Source: library/workflow/SKILL.md

Chaining Pattern

Multiple workflows can be chained for complex pipelines:

# Chain: Generate → Edit → Upscale → Export
bash run-workflow.sh --workflow generate-portrait --input "professional headshot" --output ./step1
bash run-workflow.sh --workflow retouch-portrait --input ./step1/output.jpg --output ./step2
bash run-workflow.sh --workflow upscale-4k --input ./step2/output.jpg --output ./step3
bash run-workflow.sh --workflow export-social --input ./step3/output.jpg --output ./final

Source: library/workflow/scripts/generate-workflow.sh

Configuration

Environment Variables

Variable	Required	Default	Description
`MUAPI_API_KEY`	Yes	—	muapi.ai API key for authentication
`MUAPI_OUTPUT_DIR`	No	`./outputs`	Default output directory
`MUAPI_TIMEOUT`	No	`300`	Default timeout in seconds
`MUAPI_RETRY_COUNT`	No	`3`	Number of retries on failure
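
The defaults in the table can be applied with standard shell parameter expansion. This is a minimal sketch, assuming a hypothetical function named `load_muapi_env`; the variable names and default values come from the table:

```shell
# Resolves the variables from the table above, applying the documented defaults.
# Fails with a message when MUAPI_API_KEY is unset or empty.
load_muapi_env() {
  if [ -z "${MUAPI_API_KEY:-}" ]; then
    echo "MUAPI_API_KEY is required" >&2
    return 1
  fi
  # ":=" assigns the default only when the variable is unset or empty.
  : "${MUAPI_OUTPUT_DIR:=./outputs}"
  : "${MUAPI_TIMEOUT:=300}"
  : "${MUAPI_RETRY_COUNT:=3}"
  export MUAPI_API_KEY MUAPI_OUTPUT_DIR MUAPI_TIMEOUT MUAPI_RETRY_COUNT
}
```

A script would call `load_muapi_env || exit 1` once at startup, after which the four variables are set and exported for child processes.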

Workflow Definition Schema

New workflows can be defined using YAML or JSON:

# workflow-definition.yaml
name: custom-workflow
version: "1.0"
description: Custom media generation pipeline

inputs:
  - id: source_image
    type: file
    required: true
  - id: style
    type: select
    options: [realistic, cartoon, anime]
    default: realistic

steps:
  - name: enhance
    operation: muapi_image_edit
    params:
      image: "{{ inputs.source_image }}"
      prompt: "Apply {{ inputs.style }} style"
      model: flux-kontext-pro
  
  - name: upscale
    operation: muapi_enhance_upscale
    params:
      image: "{{ steps.enhance.output }}"
      scale: 2

outputs:
  - id: final_image
    source: steps.upscale.output

Source: library/workflow/scripts/generate-workflow.sh
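
The `{{ ... }}` placeholders in the definition imply a substitution pass before each step runs. How `generate-workflow.sh` or `run-workflow.sh` resolves them is not shown in the source, so the following is only a sketch of the idea: `render_template` is a hypothetical helper, and it assumes single-line values.

```shell
# render_template <template> <key> <value>
# Replaces every "{{ key }}" (with or without inner spaces) in the template.
render_template() {
  # Escape dots in the key so they match literally in the sed pattern.
  key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
  # Escape characters that are special in a sed replacement (delimiter is |).
  val_esc=$(printf '%s' "$3" | sed 's/[&|\\]/\\&/g')
  printf '%s\n' "$1" | sed "s|{{ *${key_re} *}}|${val_esc}|g"
}

render_template 'Apply {{ inputs.style }} style' inputs.style cartoon
# prints: Apply cartoon style
```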

Error Handling

The workflow system implements tiered error handling:

Error Level	Trigger	Response
Warning	Non-critical parameter mismatch	Log and continue with defaults
Retry	Transient API failure	Retry up to `MUAPI_RETRY_COUNT` times
Abort	Validation failure or unrecoverable error	Stop workflow, log context, exit with code
Rollback	Step failure with side effects	Attempt to undo previous operations

Source: library/workflow/scripts/run-workflow.sh, library/workflow/SKILL.md
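
The Retry tier can be sketched as a wrapper around any command. This is an illustration, not the code in `run-workflow.sh`: `with_retry` and the `RETRY_DELAY` variable are hypothetical, and treating every non-zero exit as a transient failure is an assumption.

```shell
# with_retry <command> [args...]
# Runs the command, retrying up to MUAPI_RETRY_COUNT times (default 3) after
# the first failure. RETRY_DELAY (seconds, default 1) is a hypothetical knob.
with_retry() {
  retry_left=${MUAPI_RETRY_COUNT:-3}
  until "$@"; do
    if [ "$retry_left" -le 0 ]; then
      echo "giving up: $*" >&2
      return 1
    fi
    retry_left=$((retry_left - 1))
    sleep "${RETRY_DELAY:-1}"
  done
}

# Stand-in for a flaky operation: fails twice, then succeeds.
flaky_calls=0
flaky() {
  flaky_calls=$((flaky_calls + 1))
  [ "$flaky_calls" -ge 3 ]
}

RETRY_DELAY=0
with_retry flaky && echo "succeeded on call $flaky_calls"   # prints: succeeded on call 3
```

A real implementation would likely retry only on errors known to be transient and hand validation failures straight to the Abort tier.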

Community Considerations

Based on community feedback, several workflow-related patterns have emerged:

Performance Optimization

Users have asked about GPU acceleration and faster execution times. Where a workflow calls a model that runs locally and supports GPU execution, select that variant through the workflow's model parameter:

# Specify a GPU-capable model in workflow parameters
# (set MODEL_ID to an ID reported by `muapi models list`)
bash run-workflow.sh --workflow generate-image \
  --model "$MODEL_ID" \
  --input "complex prompt"

Note: Not all models support GPU acceleration. Refer to the individual model's documentation.

Source: issues/7, issues/16

Batch Processing

Community members have requested the ability to process multiple files from network shares. Workflows support batch mode via input directories:

# Process all images in directory
bash run-workflow.sh --workflow batch-enhance \
  --input ./batch-input/ \
  --output ./batch-output/

Source: issues/73
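
For a workflow that only accepts a single file, the same effect can be approximated with a per-file loop. The sketch below is an assumption-laden illustration: `for_each_image` is a hypothetical helper, and the extension list is a guess at what counts as an image input.

```shell
#!/bin/sh
# Sketch: run a command once per image file in a directory.
# Usage: for_each_image DIR COMMAND [ARGS...]
for_each_image() {
  dir="$1"; shift
  for f in "$dir"/*.jpg "$dir"/*.jpeg "$dir"/*.png; do
    [ -f "$f" ] || continue          # skip glob patterns that matched nothing
    # Report a failure and keep going so one bad file does not end the batch.
    "$@" "$f" || echo "failed: $f" >&2
  done
}
```

Calling `for_each_image ./batch-input echo` prints the files that would be processed, which is a cheap dry run before substituting the real command.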

Schema Reference

Related topics: CLI Commands Reference, MCP Server Setup, Architecture Overview

The Schema Reference system is the validation and discovery layer for Generative-Media-Skills. It provides a centralized configuration that core scripts use at runtime to ensure type safety, endpoint accuracy, and parameter validation across all media generation operations.

Overview

The system centers on schema_data.json, a structured configuration file that powers the entire muapi-cli ecosystem. This schema serves three primary functions at runtime:

Function	Description
Model ID Validation	Ensures requested models exist in the platform
Endpoint Resolution	Automatically maps model names to API endpoints
Parameter Checking	Validates supported `aspect_ratio`, `resolution`, and `duration` values

Source: README.md:schema_reference

Architecture

graph TD
    A[CLI Command] --> B[Schema Data Validation]
    B --> C{Valid?}
    C -->|Yes| D[Resolve Endpoint]
    C -->|No| E[Error: Invalid Model/Parameter]
    D --> F[Execute API Call]
    F --> G[Return Result]
    
    H[Schema Data JSON] --> B
    H --> D

The schema acts as a contract between the CLI interface and the underlying muapi.ai platform, ensuring all requests are properly formatted before execution.

Core Schema Functions

Model Discovery

The CLI provides commands to discover all available models via the schema:

# List all models
muapi models list

# List video models as JSON
muapi models list --category video --output-json

# List image models as JSON
muapi models list --category image --output-json

Source: README.md:schema_reference and README.md:schema_commands

Validation Rules

The schema enforces validation across multiple dimensions:

Validation Type	Purpose	Example Values
`model_id`	Ensures model exists	`flux-dev`, `kling-v3.0-pro`, `seedance-2.0`
`aspect_ratio`	Image/video dimensions	`1:1`, `16:9`, `9:16`, `4:3`
`resolution`	Output quality	`1k`, `2k`, `4k`, `1024x1024`
`duration`	Video length in seconds	`5`, `10`, `15`, `30`

Source: schema_data.json:validation_rules
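
The parameter checks in the table amount to membership tests against allowed lists. A minimal sketch follows. The lists are copied from the example values above, not read from schema_data.json, and `is_allowed` and `validate_request` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: reject parameter values that are not in an allowed list.
# Assumption: these lists mirror the example values in the table; the
# authoritative lists live in schema_data.json.
ALLOWED_ASPECT_RATIO="1:1 16:9 9:16 4:3"
ALLOWED_DURATION="5 10 15 30"

is_allowed() {
  # $1 = value to check, $2 = space-separated allowed values
  for candidate in $2; do
    if [ "$candidate" = "$1" ]; then
      return 0
    fi
  done
  return 1
}

validate_request() {
  # $1 = aspect_ratio, $2 = duration in seconds
  is_allowed "$1" "$ALLOWED_ASPECT_RATIO" || {
    echo "unsupported aspect_ratio: $1" >&2
    return 2
  }
  is_allowed "$2" "$ALLOWED_DURATION" || {
    echo "unsupported duration: $2" >&2
    return 2
  }
}
```

Failing before any API call is made keeps invalid requests from consuming credits or time.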

MCP Server Schema Integration

When running muapi as a Model Context Protocol server, all tools are exposed with full JSON Schema input/output definitions. The schema definitions enable:

Type Checking: Automatic validation of tool inputs
Auto-completion: IDEs can suggest valid parameters
Documentation: Rich descriptions for each tool parameter

Tool Schemas

The MCP server exposes 19 structured tools with typed schemas. The table below shows a representative subset; the Agent Integration Guide lists all 19:

Tool	Category	Schema Purpose
`muapi_image_generate`	Media	Text-to-image generation (14 models)
`muapi_image_edit`	Media	Image-to-image editing (11 models)
`muapi_video_generate`	Media	Text-to-video generation (13 models)
`muapi_video_from_image`	Media	Image-to-video conversion (16 models)
`muapi_audio_create`	Media	Music generation via Suno
`muapi_enhance_upscale`	Enhancement	AI-powered image upscaling
`muapi_enhance_bg_remove`	Enhancement	Background removal
`muapi_edit_lipsync`	Editing	Lip sync to audio

Source: README.md:mcp_server_tools

Runtime Integration

Core scripts integrate with the schema at multiple points:

graph LR
    A[setup.sh] --> B[Configure API Key]
    A --> C[Test Connectivity]
    
    D[check-result.sh] --> E[Poll for Results]
    D --> F[Async Status Check]
    
    G[edit-image.sh] --> H[Validate Image URL]
    G --> I[Apply Model Schema]
    
    J[enhance-image.sh] --> K[Validate Operation Type]
    J --> L[Apply Enhancement Schema]

Script Integration Points

Each core script validates inputs against the schema before execution:

Script	Schema Usage
`core/platform/setup.sh`	API key configuration and validation
`core/platform/check-result.sh`	Request ID format validation
`core/edit/edit-image.sh`	Model selection and parameter validation
`core/edit/enhance-image.sh`	Operation type and parameter validation

Source: core/platform/SKILL.md:scripts, core/edit/SKILL.md:scripts

Configuration

Environment Variables

The schema system relies on the following environment configuration:

# Set via setup.sh
MUAPI_API_KEY=your-api-key-here

# Config file location: ~/.muapi/config.json

Source: core/platform/SKILL.md:requirements

Schema File Location

The schema_data.json file is located at the repository root and is loaded by core scripts at runtime. The file structure follows this pattern:

{
  "models": { ... },
  "endpoints": { ... },
  "parameters": {
    "aspect_ratio": [...],
    "resolution": [...],
    "duration": [...]
  }
}

Common Validation Errors

Based on community issues, common schema-related errors include:

Error	Cause	Resolution
Invalid model ID	Model not in schema	Run `muapi models list` to see valid options
Unsupported parameter	Parameter value not in allowed list	Check schema for valid values
Endpoint resolution failure	Model missing endpoint mapping	Verify schema_data.json is up to date

Source: GitHub Issue #38

Best Practices

Always validate before requesting: Use muapi models list to confirm model availability before generating
Check parameter constraints: Verify aspect_ratio, resolution, and duration are supported
Use JSON output for automation: The --output-json flag provides schema-compliant output for piping
Keep schema updated: Pull latest changes when new models are added to the platform

Agent Integration Guide

Related topics: MCP Server Setup, Getting Started, CLI Commands Reference

This guide covers all methods for integrating Generative-Media-Skills with AI agents including Claude Code, Cursor, Gemini CLI, and other MCP-compatible agents.

Overview

Generative-Media-Skills provides a CLI-first architecture designed for agentic workflows. Rather than relying on GUIs or manual operations, agents interact with the system through structured CLI commands, the MCP server, or skill packages that agents can read and execute.

The integration layer consists of three primary components:

Component	Purpose	Best For
`muapi-cli`	Core CLI tool with structured JSON outputs	Direct terminal execution, shell pipelines
MCP Server	Model Context Protocol server	Claude Desktop, Cursor, MCP-compatible agents
Skill Packages	Pre-packaged workflows (`SKILL.md` + scripts)	Claude Code, Cursor, automated ingestion

Source: README.md

Architecture Overview

graph TD
    A[AI Agent] --> B[muapi-cli]
    A --> C[MCP Server]
    A --> D[Skill Packages]
    
    B --> E[Structured JSON Output]
    B --> F[Semantic Exit Codes]
    B --> G[--jq Filtering]
    
    C --> H[19 MCP Tools]
    C --> I[JSON Schema Validation]
    
    D --> J[41 Workflow Recipes]
    D --> K[Expert Library Skills]
    D --> L[Core Primitives]
    
    E --> M
    G --> M[Agentic Pipelines]
    H --> M
    J --> M

Supported Agents

The repository officially supports integration with:

Claude Code — Direct terminal execution via tools + MCP server mode
Cursor — MCP server mode for native tool calling
Gemini CLI — Seamless integration as local scripts
Windsurf — MCP-compatible integration
Any MCP-compatible agent — Via the MCP server protocol

Source: README.md

Installation Methods

Method 1: Install muapi-cli

The core CLI tool is available via multiple package managers:

# via npm (recommended — no Python required)
npm install -g muapi-cli

# via pip
pip install muapi-cli

# or run without installing
npx muapi-cli --help

After installation, configure your API key:

# Interactive setup
muapi auth configure

# Or pass directly
muapi auth configure --api-key "YOUR_MUAPI_KEY"

# Get your key at https://muapi.ai/dashboard

Source: README.md

Method 2: Install Skill Packages

Install pre-packaged skills directly to your AI agent:

# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all

# Or install a specific skill
npx skills add SamurAIGPT/Generative-Media-Skills --skill muapi-media-generation

# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor

Source: README.md

MCP Server Integration

The MCP server exposes all 19 tools directly to Claude Desktop, Cursor, or any MCP-compatible agent without requiring shell scripts.

Starting the MCP Server

muapi mcp serve

Claude Desktop Configuration

On macOS, add the following to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}

Available MCP Tools

The server exposes 19 structured tools with full JSON Schema input/output definitions:

Tool	Description	Category
`muapi_image_generate`	Text-to-image generation	Generation
`muapi_image_edit`	Image-to-image editing	Editing
`muapi_video_generate`	Text-to-video generation	Generation
`muapi_video_from_image`	Image-to-video animation	Generation
`muapi_audio_create`	Music generation via Suno	Audio
`muapi_audio_from_text`	Sound effects via MMAudio	Audio
`muapi_enhance_upscale`	AI upscaling	Enhancement
`muapi_enhance_bg_remove`	Background removal	Enhancement
`muapi_enhance_face_swap`	Face swap for image/video	Enhancement
`muapi_enhance_ghibli`	Ghibli style transfer	Enhancement
`muapi_edit_lipsync`	Lip sync to audio	Editing
`muapi_edit_clipping`	AI highlight extraction	Editing
`muapi_predict_result`	Poll prediction status	Utility
`muapi_upload_file`	Upload local file → URL	Utility
`muapi_keys_list`	List API keys	Account
`muapi_keys_create`	Create API key	Account
`muapi_keys_delete`	Delete API key	Account
`muapi_account_balance`	Get credit balance	Account
`muapi_account_topup`	Add credits via Stripe	Account

Source: README.md

CLI Usage for Agents

Basic Generation Commands

# Generate an image
muapi image generate "a cyberpunk city at night" --model flux-dev

# Download the result automatically
muapi image generate "a sunset over mountains" --model hidream-fast --download ./outputs

# Extract just the URL (agent-friendly)
muapi image generate "product on white bg" --model flux-schnell --output-json --jq '.outputs[0]'

Async Pipeline Workflow

For long-running operations, submit async requests and poll for results:

# Submit async, capture request_id, poll when ready
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
  --model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')

# ... do other work ...

muapi predict wait "$REQUEST_ID" --download ./outputs
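
When an agent needs its own polling loop instead of blocking on `muapi predict wait`, the pattern can be sketched as below. Everything named here is an assumption: `poll_until_done` and `POLL_INTERVAL` are hypothetical, the status command is supplied by the caller, and the `completed` and `failed` strings stand in for whatever the real status output contains.

```shell
#!/bin/sh
# Sketch: run a status command repeatedly until it reports a final state.
# Usage: poll_until_done STATUS_COMMAND [ARGS...]
poll_until_done() {
  timeout="${MUAPI_TIMEOUT:-300}"      # overall budget in seconds
  interval="${POLL_INTERVAL:-5}"       # hypothetical knob: seconds between polls
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # A status command that exits non-zero is treated as an error state.
    state=$("$@") || state="error"
    case "$state" in
      completed) return 0 ;;
      failed|error) echo "generation ${state}" >&2; return 1 ;;
    esac
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "timed out after ${timeout}s" >&2
  return 124
}
```

The distinct exit code for a timeout lets a caller tell "still running" apart from "failed" and decide whether to poll again later.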

Chaining Operations

# Pipe a prompt from another command
generate_prompt | muapi image generate - --model flux-dev

# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" --image "$URL" \
  --model flux-kontext-pro --download ./outputs

Source: README.md

Skill Package Structure

Each skill in the repository follows a consistent structure:

library/[category]/[skill-name]/
├── SKILL.md           # Description for agents to read
├── scripts/
│   ├── generate-[name].sh
│   └── [additional-scripts].sh
└── assets/            # Optional reference files
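
The layout above can be checked mechanically before an agent ingests a skill. The following is a sketch only: `check_skill_dir` is a hypothetical helper, and it treats `assets/` as optional, as the tree indicates.

```shell
#!/bin/sh
# Sketch: verify that a skill directory has SKILL.md and at least one script.
# Usage: check_skill_dir PATH_TO_SKILL
check_skill_dir() {
  skill="$1"
  [ -f "$skill/SKILL.md" ] || { echo "missing SKILL.md in $skill" >&2; return 1; }
  [ -d "$skill/scripts" ] || { echo "missing scripts/ in $skill" >&2; return 1; }
  found=0
  for s in "$skill"/scripts/*.sh; do
    if [ -f "$s" ]; then
      found=1
    fi
  done
  [ "$found" -eq 1 ] || { echo "no .sh scripts in $skill/scripts" >&2; return 1; }
}
```

A non-zero return identifies the first missing piece on stderr, which is usually enough to fix a partially copied skill.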

SKILL.md Format

Each skill includes metadata that agents parse:

Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual

Troubleshooting

Related topics: CLI Commands Reference, Getting Started, Schema Reference

This page covers common issues, error conditions, and resolution steps for the Generative-Media-Skills repository. The troubleshooting content is organized by system component: API configuration, async generation workflows, media editing operations, and environment setup.

Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual

Doramagic Pitfall Log

Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.

Found 17 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Security or permission risk - Security or permission risk requires verification.

1. Security or permission risk: Security or permission risk requires verification

Severity: high
Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_e801ed325bcf4fbbb8e0d9cac02b5f7f | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/89

2. Identity risk: Identity risk requires verification

Severity: medium
Finding: Project evidence flags an identity risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: identity.distribution | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills

3. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_ec1cdc92d8c84b2bbce43cf37a668443 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/44

4. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_c7a0e07d61b547aba3280ac82ac305e2 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/43

5. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_c6ace8e72f95491f945200849e083d96 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/46

6. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_de5cf59443c74838a8d10d1ecbec9457 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/45

7. Installation risk: Installation risk requires verification

Severity: medium
Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_8082f7c5e7be496c89fac789d932e74c | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/34

8. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.host_targets | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills

9. Configuration risk: Configuration risk requires verification

Severity: medium
Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_14498a89412f40ceb03d61889ea96de2 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/38

10. Capability evidence risk: Capability evidence risk requires verification

Severity: medium
Finding: README/documentation is current enough for a first validation pass.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: capability.assumptions | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills

11. Runtime risk: Runtime risk requires verification

Severity: medium
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_36c10c9b7ece4b10af4873b655817add | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/27

12. Runtime risk: Runtime risk requires verification

Severity: medium
Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
User impact: May increase setup, validation, or first-run risk for the user.
Recommended check: Reproduce the official install and quickstart path in an isolated environment.
Evidence: community_evidence:github | cevd_a8e460757f114a9d8e4357758c834524 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/54

Source: Doramagic discovery, validation, and Project Pack records

Community Discussion Evidence

These external discussion links are review inputs, not standalone proof that the project is production-ready.

Doramagic exposes project-level community discussion separately from official documentation. Review these links before using Generative-Media-Skills with real data or production workflows.

Optional: add a 'publish to Vynly' skill after media generation? - github / github_issue
how to use gpu instead cpu - github / github_issue
What's the best way to improve answer time ? - github / github_issue
Switch default language - github / github_issue
500 Internal Server Error - github / github_issue
npm run dev hang in certain point - github / github_issue
is possible add spanish documents and question / answer in spanish to? - github / github_issue
Model download error on 100% progress - github / github_issue
weird response - github / github_issue
Server install pre-requisites on Ubuntu - github / github_issue
Error downloading model - github / github_issue
npm run dev error - github / github_issue

Source: Project Pack community evidence and pitfall evidence