Doramagic Project Pack · Human Manual
Generative-Media-Skills
Generative-Media-Skills provides AI agents with:
Getting Started
Related topics: Architecture Overview, CLI Commands Reference, Agent Integration Guide
Welcome to Generative-Media-Skills — a comprehensive multimodal toolset enabling AI agents (Claude Code, Cursor, Gemini CLI) to generate, edit, and display professional-grade images, videos, and audio content through the muapi-cli interface.
This guide walks you through installation, configuration, and your first generation to get up and running in minutes.
Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual
Architecture Overview
Related topics: Getting Started, Expert Skills Library, Schema Reference
This repository implements a Core/Library split architecture designed for AI agents to generate, edit, and display professional-grade images, videos, and audio through the muapi.ai platform. The architecture prioritizes agent-native workflows with CLI-powered scripts, structured JSON outputs, and Model Context Protocol (MCP) integration.
High-Level Architecture
The Generative-Media-Skills repository acts as a skill layer that translates creative intent into technical directives, delegating actual API calls to the underlying muapi-cli tool. This separation allows the repository to focus on expert knowledge while leveraging a robust, maintained API client.
graph TD
subgraph "AI Agents"
A["Claude Code"]
B["Cursor"]
C["Gemini CLI"]
D["MCP Clients"]
end
subgraph "Generative-Media-Skills"
E["Expert Library /library"]
F["Core Primitives /core"]
G["Recipe Pack"]
end
subgraph "muapi-cli"
H["CLI Interface"]
I["API Client"]
end
subgraph "muapi.ai Platform"
J["100+ AI Models"]
K["Media Generation APIs"]
end
A --> E
B --> E
C --> E
D --> H
E --> F
F --> H
G --> H
H --> I
I --> J
I --> K
Source: README.md
Core Primitives (`/core`)
The Core layer provides thin wrappers around muapi-cli for direct API access. These are low-level building blocks that handle raw platform operations.
Directory Structure
| Directory | Purpose |
|---|---|
| `core/media/` | File upload operations |
| `core/edit/` | Image editing (prompt-based) |
| `core/platform/` | Setup, authentication, and result polling |
Platform Utilities
Located in core/platform/, these scripts handle API configuration and async operation management:
| Script | Description |
|---|---|
| `setup.sh` | Configure API key, show config, test key validity |
| `check-result.sh` | Poll for async generation results by request ID |
Source: core/platform/SKILL.md
Media Editing Core
Located in core/edit/, these scripts provide enhancement operations:
| Script | Description |
|---|---|
| `edit-image.sh` | Prompt-based image editing |
| `enhance-image.sh` | One-click operations: upscale, background removal, face swap |
| `lipsync.sh` | Sync video lip movement to audio |
| `video-effects.sh` | Video/image effects |
Source: core/edit/SKILL.md
Expert Library (`/library`)
The Library layer contains high-value skills that implement domain-specific knowledge for professional results.
Skill Categories
graph LR
A["Library"] --> B["Motion / Video"]
A --> C["Social"]
A --> D["Visual / Images"]
A --> E["Edit"]
B --> B1["Cinema Director"]
B --> B2["Seedance 2"]
B --> B3["AI Clipping"]
C --> C1["YouTube Shorts"]
C --> C2["UGC Ads"]
D --> D1["Nano-Banana"]
D --> D2["UI Designer"]
D --> D3["Logo Creator"]
E --> E1["AI Clipping"]
Key Expert Skills
| Skill | Category | Description |
|---|---|---|
| Cinema Director | Motion | Technical film direction & cinematography |
| Nano-Banana | Visual | Reasoning-driven image generation (Gemini 3 Style) |
| UI Designer | Visual | High-fidelity mobile/web mockups (Atomic Design) |
| Logo Creator | Visual | Minimalist vector branding |
| Seedance 2 | Motion | Director-level cinematic video generation |
| AI Clipping | Edit | Long video → ranked vertical short clips |
Source: README.md
Recipe Pack
The Recipe Pack comprises forty-one LLM-orchestrated workflow recipes that combine multiple muapi-cli calls into named end-to-end pipelines. Each recipe is a SKILL.md file that the agent reads and follows.
Recipe Categories
| Category | Count | Description |
|---|---|---|
| Motion / Video | 16 | Film generation, animation, product showcases |
| Social | 5 | Instagram posts, UGC ads, social media packs |
| Visual / Design | 21 | Action figures, brand kits, logos, interior design |
Example Recipes
| Skill | Path | Description |
|---|---|---|
| 3D Logo Animation | library/motion/3d-logo-animation/ | Premium 3D logo animation |
| AI Fight Scene Generator | library/motion/ai-fight-scene/ | 16-cell storyboard → video choreography |
| Animal Vlogger Video | library/motion/animal-video-generator/ | Anthropomorphic animal content |
| Action Figure Generator | library/visual/action-figure-generator/ | Photo → 3D collectible |
| Amazon Product Listing | library/visual/amazon-product-listing/ | Full Amazon listing image set |
Source: README.md
MCP Server Architecture
The Model Context Protocol server exposes all tools directly to MCP-compatible agents.
Exposed Tools (19 Total)
| Tool | Category | Models Supported |
|---|---|---|
| `muapi_image_generate` | Image | 14 models |
| `muapi_image_edit` | Image | 11 models |
| `muapi_video_generate` | Video | 13 models |
| `muapi_video_from_image` | Video | 16 models |
| `muapi_audio_create` | Audio | Suno (music) |
| `muapi_audio_from_text` | Audio | MMAudio (sound effects) |
| `muapi_enhance_upscale` | Enhancement | AI upscaling |
| `muapi_enhance_bg_remove` | Enhancement | Background removal |
| `muapi_enhance_face_swap` | Enhancement | Face swap (image/video) |
| `muapi_enhance_ghibli` | Enhancement | Ghibli style transfer |
| `muapi_edit_lipsync` | Editing | Lip sync to audio |
| `muapi_edit_clipping` | Editing | AI highlight extraction |
| `muapi_predict_result` | Utility | Poll prediction status |
| `muapi_upload_file` | Utility | Upload local file → URL |
| `muapi_keys_list` | Account | List API keys |
| `muapi_keys_create` | Account | Create API key |
| `muapi_keys_delete` | Account | Delete API key |
| `muapi_account_balance` | Account | Get credit balance |
| `muapi_account_topup` | Account | Add credits (Stripe) |
Source: README.md
MCP Configuration
{
"mcpServers": {
"muapi": {
"command": "muapi",
"args": ["mcp", "serve"],
"env": { "MUAPI_API_KEY": "your-key-here" }
}
}
}
Schema Reference
The repository includes schema_data.json for runtime validation:
- Model ID Validation: Ensures requested models exist
- Endpoint Resolution: Maps model names to API endpoints
- Parameter Checking: Validates `aspect_ratio`, `resolution`, and `duration`
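As a rough illustration of the three checks listed above, the sketch below validates a request against a small sample file with `jq`. The file layout (`models.<id>.endpoint`, `models.<id>.aspect_ratio`), the sample endpoint path, and the function name `validate_request` are assumptions made for this example; the real `schema_data.json` may be organised quite differently.

```bash
#!/usr/bin/env bash
# Hypothetical schema layout for illustration only; NOT the real schema_data.json.
SAMPLE_SCHEMA="$(mktemp)"
cat > "$SAMPLE_SCHEMA" <<'EOF'
{
  "models": {
    "flux-dev": {
      "endpoint": "/example/flux-dev",
      "aspect_ratio": ["1:1", "16:9"]
    }
  }
}
EOF

# validate_request SCHEMA_FILE MODEL ASPECT_RATIO
# Prints the resolved endpoint on success; returns non-zero on failure.
validate_request() {
  local schema_file="$1"
  local model="$2"
  local aspect_ratio="$3"

  # 1. Model ID validation: the model must be a key in the schema.
  if ! jq -e --arg m "$model" '.models | has($m)' "$schema_file" >/dev/null; then
    echo "unknown model: $model" >&2
    return 1
  fi

  # 2. Parameter checking: the aspect ratio must be listed for that model.
  if ! jq -e --arg m "$model" --arg ar "$aspect_ratio" \
      '(.models[$m].aspect_ratio // []) | any(. == $ar)' "$schema_file" >/dev/null; then
    echo "unsupported aspect_ratio for $model: $aspect_ratio" >&2
    return 2
  fi

  # 3. Endpoint resolution: map the model name to its endpoint.
  jq -r --arg m "$model" '.models[$m].endpoint' "$schema_file"
}
```

Calling `validate_request "$SAMPLE_SCHEMA" flux-dev 16:9` prints the sample endpoint, while an unknown model or an unlisted aspect ratio returns a distinct non-zero status, so an agent can tell the two failure modes apart.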
CLI Model Discovery
muapi models list
muapi models list --category video --output-json
Agentic Pipeline Flow
The architecture supports asynchronous operations through a polling pattern:
sequenceDiagram
participant Agent
participant CLI as muapi-cli
participant API as muapi.ai API
participant Agent2 as Agent (other work)
Agent->>CLI: Submit async request
CLI->>API: POST /generate (async=true)
API-->>CLI: request_id
CLI-->>Agent: request_id
Agent->>Agent2: Do other work
Agent2-->>Agent: Continue...
Agent->>CLI: Poll for result
CLI->>API: GET /predict/{request_id}
API-->>CLI: status
alt Still processing
CLI->>API: GET /predict/{request_id}
else Complete
API-->>CLI: result URL
CLI-->>Agent: Download media
end
Example Pipeline Commands
# Submit async, capture request_id
REQUEST_ID=$(muapi video generate "a dog running" \
--model kling-master --no-wait --output-json --jq '.request_id')
# Poll when ready
muapi predict wait "$REQUEST_ID" --download ./outputs
# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url')
muapi image edit "make it like a painting" --image "$URL" \
--model flux-kontext-pro --download ./outputs
Source: README.md
Supported AI Agents
The repository targets the following agent environments:
| Agent | Integration Method |
|---|---|
| Claude Code | Direct terminal execution + MCP server mode |
| Cursor | Seamless local script execution |
| Gemini CLI | CLI tool integration |
| Windsurf | CLI tool integration |
| Any MCP Client | Full MCP server mode |
Common Flags
All core scripts support standardized CLI flags:
| Flag | Purpose |
|---|---|
| `--async` | Submit request without waiting |
| `--json` | Output raw JSON |
| `--download` | Auto-download generated media |
| `--view` | Auto-download and open in system viewer |
| `--output-json` | JSON output mode |
| `--jq '<filter>'` | Extract specific JSON fields |
| `--timeout N` | Set operation timeout |
Requirements
| Component | Requirement |
|---|---|
| muapi-cli | Installed via npm install -g muapi-cli or pip install muapi-cli |
| API Key | Configured via muapi auth configure |
| System Tools | curl, jq, python3 |
| Node.js | For npm installation |
Source: core/edit/SKILL.md
Key Design Principles
- Agent-Native Design: CLI-powered scripts with structured JSON outputs and semantic exit codes
- No Boilerplate: All primitives delegate to `muapi-cli`; no curl or manual JSON parsing
- Direct Media Display: `--view` flag for automatic download and viewing
- Local File Support: Auto-upload from local machine to CDN
- Schema Validation: Runtime validation of models and parameters
- CI/CD Ready: `--output-json`, `--jq`, semantic exit codes for scripting
MCP Server Setup
Related topics: CLI Commands Reference, Agent Integration Guide, Schema Reference
The MCP (Model Context Protocol) Server in Generative-Media-Skills exposes all 19 media generation tools as structured MCP endpoints. MCP-compatible clients such as Claude Desktop and Cursor can then invoke image, video, and audio generation without running shell scripts or making manual API calls.
Architecture Overview
graph TD
A[Claude Desktop / Cursor / MCP Client] -->|MCP Protocol| B[muapi mcp serve]
B --> C[muapi-cli Core]
C --> D[muapi.ai API]
D --> E[100+ AI Models]
F[Local Files] -->|auto-upload| C
G[Skills Library] -->|workflows| C
The MCP Server acts as a thin bridge between MCP-compatible AI agents and the muapi.ai platform. It provides fully typed JSON Schema definitions for all tools, eliminating the need for prompt engineering or manual request construction.
Source: README.md
Prerequisites
Before configuring the MCP Server, ensure you have:
| Requirement | Version/Details |
|---|---|
| Node.js | v18+ recommended |
| muapi-cli | Latest stable |
| muapi.ai API key | Available at muapi.ai/dashboard |
Install muapi-cli via npm or pip:
# via npm (recommended)
npm install -g muapi-cli
# via pip
pip install muapi-cli
Configure your API key:
muapi auth configure --api-key "YOUR_MUAPI_KEY"
Source: README.md
Starting the MCP Server
Launch the MCP server in foreground or background mode:
muapi mcp serve
The server exposes all 19 tools with full JSON Schema input/output definitions. It runs as a long-lived process that handles MCP protocol communication on the local machine.
Claude Desktop Configuration
To integrate with Claude Desktop, add muapi to your Claude configuration file.
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"muapi": {
"command": "muapi",
"args": ["mcp", "serve"],
"env": {
"MUAPI_API_KEY": "your-api-key-here"
}
}
}
}
Alternatively, if you configured your API key globally via muapi auth configure, you can omit the env block:
{
"mcpServers": {
"muapi": {
"command": "muapi",
"args": ["mcp", "serve"]
}
}
}
After editing the config, restart Claude Desktop to load the new MCP server.
Source: README.md
Available MCP Tools
The MCP Server exposes 19 structured tools organized by category:
Image Generation & Editing
| Tool | Description | Input Models |
|---|---|---|
| `muapi_image_generate` | Text-to-image generation | 14 models (Flux, Midjourney, DALL-E, etc.) |
| `muapi_image_edit` | Image-to-image editing | 11 models (Flux Kontext, GPT-4o, Midjourney, Qwen) |
Video Generation
| Tool | Description | Input Models |
|---|---|---|
| `muapi_video_generate` | Text-to-video generation | 13 models (Kling, Veo, Seedance, etc.) |
| `muapi_video_from_image` | Image-to-video animation | 16 models |
Audio Generation
| Tool | Description | Platform |
|---|---|---|
| `muapi_audio_create` | Music generation | Suno |
| `muapi_audio_from_text` | Sound effects | MMAudio |
Enhancement & Effects
| Tool | Description | Models/Options |
|---|---|---|
| `muapi_enhance_upscale` | AI upscaling | Multiple engines |
| `muapi_enhance_bg_remove` | Background removal | One-click |
| `muapi_enhance_face_swap` | Face swap for image/video | Multiple modes |
| `muapi_enhance_ghibli` | Ghibli style transfer | One-click |
| `muapi_edit_lipsync` | Lip sync to audio | Sync Labs, LatentSync, Creatify, Veed |
| `muapi_edit_clipping` | AI highlight extraction from video | Server-side transcription |
Utility & Account
| Tool | Description |
|---|---|
| `muapi_predict_result` | Poll async prediction status |
| `muapi_upload_file` | Upload local file to CDN, returns URL |
| `muapi_keys_list` | List existing API keys |
| `muapi_keys_create` | Create new API key |
| `muapi_keys_delete` | Delete an API key |
| `muapi_account_balance` | Get current credit balance |
| `muapi_account_topup` | Add credits via Stripe checkout |
Source: README.md
Other MCP-Compatible Clients
The MCP Server is not limited to Claude Desktop. Any MCP-compatible agent can use these tools:
| Client | Integration Method |
|---|---|
| Cursor | Add to Cursor settings using same JSON config structure |
| Windsurf | MCP server configuration in IDE settings |
| Gemini CLI | Direct CLI execution of MCP tools |
| Custom Agents | Any MCP-compatible agent with tool execution |
For Cursor and Windsurf, use the same server configuration as Claude Desktop.
Workflow Examples
Image Generation Workflow
graph LR
A[Agent Request] -->|muapi_image_generate| B[MCP Server]
B --> C[muapi.ai API]
C --> D[Image Model]
D --> E[Generated Image URL]
E --> B
B --> F[Agent Receives Result]
Example Claude Desktop prompt:
Generate a cyberpunk city image with neon lights using the muapi_image_generate tool.
Async Video Pipeline
graph TD
A[Submit Request] -->|muapi_video_generate --no-wait| B[Get request_id]
B --> C[Do other work]
C --> D[Poll muapi_predict_result]
D -->|Still processing| D
D -->|Complete| E[Download via muapi_predict_result --download]
Example terminal workflow:
# Submit async job
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
--model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')
# Poll for result
muapi predict wait "$REQUEST_ID" --download ./outputs
Chained Workflow
graph LR
A[Local Image] -->|muapi_upload_file| B[Get CDN URL]
B -->|muapi_image_edit| C[Apply Edit]
C -->|muapi_enhance_upscale| D[Upscale]
D -->|muapi_enhance_bg_remove| E[Final Output]
Example:
# Upload local file
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
# Edit the image
muapi image edit "make it look like a painting" --image "$URL" \
--model flux-kontext-pro --download ./outputs
Source: README.md
Platform Utilities via MCP
The MCP Server also exposes account management tools for programmatic control:
| Tool | Use Case |
|---|---|
| `muapi_keys_list` | Audit active API keys in CI/CD |
| `muapi_keys_create` | Provision keys for different projects |
| `muapi_account_balance` | Check credits before large batch jobs |
| `muapi_account_topup` | Automated credit replenishment |
These utilities enable fully automated pipelines without manual dashboard interaction.
Source: core/platform/SKILL.md
Supported AI Models
Discover all available models at runtime:
# List all models
muapi models list
# Filter by category
muapi models list --category video --output-json
# Check supported parameters
muapi models list --category image --output-json | jq '.[] | {id, aspect_ratio, resolution}'
Model availability is validated against schema_data.json at runtime, ensuring requests specify only supported parameters.
Source: README.md
Schema Reference
All MCP tools use fully typed JSON Schema definitions. This provides:
- Input Validation — Requests are validated against supported parameters
- Autocomplete — IDEs can suggest valid parameter values
- Documentation — Tool descriptions are embedded in the schema
The schema_data.json file validates:
| Validation | Description |
|---|---|
| Model IDs | Ensures requested model exists |
| Endpoint Resolution | Maps model names to API endpoints |
| Parameter Checking | Validates aspect_ratio, resolution, duration |
Source: README.md
Troubleshooting
Server Won't Start
# Verify muapi-cli installation
muapi --version
# Check API key configuration
muapi auth configure --show
# Test connectivity
muapi auth configure --test
Tools Not Appearing in Agent
- Verify Claude Desktop config JSON is valid
- Restart Claude Desktop after config changes
- Check the MCP server process is running
- Confirm `MUAPI_API_KEY` is set or global config exists
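The first check in the list above can be scripted. The following is a minimal sketch that confirms a config file parses as JSON and contains an `mcpServers.muapi` entry; the function name `check_mcp_config` and its return codes are illustrative choices, and it relies only on `python3`, which the repository already lists as a system requirement.

```bash
#!/usr/bin/env bash
# check_mcp_config CONFIG_FILE
# Returns 0 if the file is valid JSON and defines an "mcpServers.muapi" entry.
# (Illustrative helper; not shipped with muapi-cli or this repository.)
check_mcp_config() {
  local config_file="$1"

  if [ ! -f "$config_file" ]; then
    echo "config file not found: $config_file" >&2
    return 1
  fi

  # python3 -m json.tool exits non-zero on malformed JSON.
  if ! python3 -m json.tool "$config_file" >/dev/null 2>&1; then
    echo "config file is not valid JSON: $config_file" >&2
    return 2
  fi

  # Confirm the muapi server entry exists under mcpServers.
  if ! python3 -c '
import json, sys
with open(sys.argv[1]) as f:
    config = json.load(f)
sys.exit(0 if "muapi" in config.get("mcpServers", {}) else 1)
' "$config_file"; then
    echo "no mcpServers.muapi entry in: $config_file" >&2
    return 3
  fi

  echo "ok: $config_file"
}
```

On macOS this could be run as `check_mcp_config "$HOME/Library/Application Support/Claude/claude_desktop_config.json"`. A passing check says nothing about whether the server process is running, so the remaining items in the list still apply.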
Async Requests Timeout
Use --no-wait for long-running tasks and poll separately:
muapi predict wait "REQUEST_ID" --timeout 300
Upload Failures
The muapi_upload_file tool automatically handles local file uploads. Ensure files are accessible and within size limits.
Related Skills
The MCP Server provides direct access to core primitives. For higher-level workflows, consider these expert skills:
| Skill | Description |
|---|---|
| Cinema Director | Technical film direction & cinematography |
| AI Clipping | Long video → ranked vertical clips |
| Seedance 2 | Cinematic video with native audio-video sync |
| YouTube Shorts | Platform-aware clip presets |
Source: README.md
See Also
- Quick Start Guide — Initial setup steps
- Core Primitives — Low-level tool wrappers
- Expert Library — High-value workflow skills
- muapi-cli Documentation — CLI reference
CLI Commands Reference
Related topics: MCP Server Setup, Schema Reference, Troubleshooting
Overview
The Generative-Media-Skills repository provides a comprehensive CLI-based interface for AI-powered media generation and manipulation. Built around the muapi-cli tool, these commands enable AI agents (Claude Code, Cursor, Gemini CLI) to generate images, videos, and audio through a standardized command-line interface with structured JSON outputs.
The CLI architecture follows a Core/Library split:
- Core Primitives (`/core`): Thin wrappers for raw API access
- Expert Library (`/library`): High-value skills with domain-specific logic
Source: README.md
Installation
Prerequisites
Before using the CLI commands, install the muapi-cli package:
# via npm (recommended)
npm install -g muapi-cli
# via pip
pip install muapi-cli
# or run without installing
npx muapi-cli --help
API Key Configuration
Configure your muapi.ai API key before making any requests:
# Interactive setup
muapi auth configure
# Pass key directly
muapi auth configure --api-key "YOUR_MUAPI_KEY"
# Get your key at https://muapi.ai/dashboard
Platform setup scripts are located in core/platform/:
| Script | Purpose |
|---|---|
| `setup.sh` | Configure API key, show config, test key validity |
| `check-result.sh` | Poll for async generation results by request ID |
# Save API key
bash core/platform/setup.sh --add-key "YOUR_MUAPI_KEY"
# Show current configuration
bash core/platform/setup.sh --show-config
# Test API key validity
bash core/platform/setup.sh --test
Source: README.md, core/platform/SKILL.md
Core Platform Commands
Authentication
muapi auth configure
muapi auth configure --api-key "YOUR_MUAPI_KEY"
Model Discovery
List available models by category:
# List all models
muapi models list
# List video models only
muapi models list --category video
# JSON output for scripting
muapi models list --category image --output-json
Async Result Polling
For async operations, capture the request_id and poll for results:
# Capture request ID
REQUEST_ID=$(muapi video generate "a dog running" \
--model kling-v3.0-pro --no-wait \
--output-json --jq '.request_id' | tr -d '"')
# Poll until complete with auto-download
muapi predict wait "$REQUEST_ID" --download ./outputs
# Check once without polling
bash core/platform/check-result.sh --id "your-request-id" --once
# Check result script usage
bash core/platform/check-result.sh --id "your-request-id"
Source: README.md, core/platform/check-result.sh
Media Generation Commands
Image Generation
Generate images from text prompts:
# Basic generation
muapi image generate "a cyberpunk city at night"
# Specify model
muapi image generate "a sunset over mountains" --model flux-schnell
# Auto-download to directory
muapi image generate "product on white bg" --model flux-schnell --download ./outputs
# Extract URL for agent pipelines
muapi image generate "landscape" --model flux-dev --output-json --jq '.outputs[0]'
Available Models (14 text-to-image models):
- `flux-dev`, `flux-schnell`, `flux-kontext-pro` (Flux family)
- `midjourney-v7`, `midjourney-v6.1` (Midjourney)
- `hidream-fast`, `hidream-pixel` (HiDream)
- `gpt-image-1`, `gpt-4o` (OpenAI)
- `veo3`, `veo2` (Google)
- `imagen4`, `imagen3`
Video Generation
Generate videos from text or images:
# Text-to-video
muapi video generate "a dog running on a beach" --model kling-v3.0-pro
# Image-to-video
muapi video from-image "path/to/image.jpg" --model seedance-2 --subject "camera pans left"
# With duration
muapi video generate "ocean waves" --model kling-master --duration 10
Available Models:
- Text-to-video: 13 models including `kling-v3.0-pro`, `kling-master`, `seedance-2`, `veo3`
- Image-to-video: 16 models
Audio Generation
# Music generation (Suno)
muapi audio create "upbeat electronic dance track" --duration 30
# Sound effects (MMAudio)
muapi audio from-text "thunder rumbling in distance"
Source: README.md
Media Editing Commands
Image Editing
The edit-image.sh script provides prompt-based image editing:
bash core/edit/edit-image.sh \
--image-url "https://example.com/image.jpg" \
--prompt "add sunglasses" \
--model flux-kontext-pro
Supported Models:
| Model | Use Case |
|---|---|
| `flux-kontext-pro` | Flux Kontext editing |
| `gpt-4o` | OpenAI vision editing |
| `midjourney-v7` | Midjourney style editing |
| `qwen-vl-max` | Qwen vision editing |
Source: core/edit/edit-image.sh
Image Enhancement
One-click enhancement operations via enhance-image.sh:
# AI upscaling
bash core/edit/enhance-image.sh --op upscale --image-url "https://..."
# Background removal
bash core/edit/enhance-image.sh --op background-remove --image-url "https://..."
# Face swap
bash core/edit/enhance-image.sh --op face-swap --image-url "..." --face-url "..."
# Colorize
bash core/edit/enhance-image.sh --op colorize --image-url "..."
# Ghibli style transfer
bash core/edit/enhance-image.sh --op ghibli --image-url "..."
# Product shot
bash core/edit/enhance-image.sh --op product-shot --image-url "..."
Source: core/edit/enhance-image.sh
Lip Sync
Synchronize video lip movements to audio:
bash core/edit/lipsync.sh \
--video-url "https://..." \
--audio-url "https://..." \
--model sync
Supported Models: sync (Sync Labs), latent-sync, creatify, veed
Source: core/edit/lipsync.sh
Video Effects
Apply effects to videos and images:
# Dance effect (image + audio → animated video)
bash core/edit/video-effects.sh \
--op dance \
--image-url "https://..." \
--audio-url "https://..."
# Face swap
bash core/edit/video-effects.sh --op face-swap --video-url "..." --face-url "..."
# Dress change
bash core/edit/video-effects.sh --op dress-change --video-url "..." --dress-url "..."
# Luma reframing
bash core/edit/video-effects.sh --op reframe --video-url "..."
Source: core/edit/video-effects.sh
Common Flags
All core scripts support these standard flags:
| Flag | Description |
|---|---|
| `--async` | Submit request without waiting for completion |
| `--json` | Output raw JSON response |
| `--timeout N` | Set request timeout in seconds |
| `--download <path>` | Auto-download results to specified directory |
| `--view` | Download and open result in system viewer |
| `--output-json --jq '<expr>'` | Extract specific field using jq |
| `--help` | Show usage information |
Source: README.md, core/edit/SKILL.md
Agentic Pipeline Examples
Async Workflow
graph TD
A[Submit Async Request] --> B[Capture request_id]
B --> C[Do Other Work]
C --> D[Poll for Result]
D --> E{Complete?}
E -->|No| D
E -->|Yes| F[Download Output]
# Submit async, capture request_id, poll when ready
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
--model kling-master --no-wait \
--output-json --jq '.request_id' | tr -d '"')
# ... do other work ...
muapi predict wait "$REQUEST_ID" --download ./outputs
File Upload Pipeline
# Upload local file → edit → download
URL=$(muapi upload file ./photo.jpg \
--output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" \
--image "$URL" --model flux-kontext-pro --download ./outputs
Command Chaining
# Pipe prompt from another command
generate_prompt | muapi image generate - --model flux-dev
# Chain multiple operations
muapi upload file ./source.jpg | \
muapi enhance image --op upscale | \
muapi predict wait - --download ./final
Source: README.md
Expert Library Scripts
The /library directory contains specialized scripts for domain-specific workflows:
Cinema Director
Generate cinematic video with professional direction:
# Create 10-second epic reveal (run from the repository root)
bash library/motion/cinema-director/scripts/generate-film.sh \
--subject "a cybernetic dragon over Tokyo" \
--intent "epic" \
--model "kling-v3.0-pro" \
--duration 10 \
--view
# Animate reference image into video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
--mode i2v \
--file ./concept.jpg \
--subject "camera slowly pulls back" \
--intent "reveal" \
--view
# Extend existing video
bash library/motion/seedance-2/scripts/generate-seedance.sh \
--mode extend \
--request-id "YOUR_REQUEST_ID" \
--subject "camera continues pulling back" \
--duration 10
Nano-Banana
Reasoning-driven image generation:
bash library/visual/nano-banana/scripts/generate-nano-art.sh \
--file ./my-source-image.jpg \
--subject "a glass hummingbird" \
--style "macro photography" \
--resolution "2k" \
--view
Skill Installation for Agents
# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all
# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor
Source: README.md
MCP Server Mode
Run muapi as a Model Context Protocol server for direct tool access:
muapi mcp serve
Claude Desktop Configuration (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"muapi": {
"command": "muapi",
"args": ["mcp", "serve"],
"env": { "MUAPI_API_KEY": "your-key-here" }
}
}
}
Exposed MCP Tools:
| Tool | Description |
|---|---|
| `muapi_image_generate` | Text-to-image (14 models) |
| `muapi_image_edit` | Image-to-image editing (11 models) |
| `muapi_video_generate` | Text-to-video (13 models) |
| `muapi_video_from_image` | Image-to-video (16 models) |
| `muapi_audio_create` | Music generation (Suno) |
| `muapi_audio_from_text` | Sound effects (MMAudio) |
| `muapi_enhance_upscale` | AI upscaling |
| `muapi_enhance_bg_remove` | Background removal |
| `muapi_enhance_face_swap` | Face swap image/video |
| `muapi_enhance_ghibli` | Ghibli style transfer |
| `muapi_edit_lipsync` | Lip sync to audio |
| `muapi_edit_clipping` | AI highlight extraction |
| `muapi_predict_result` | Poll prediction status |
| `muapi_upload_file` | Upload local file → URL |
| `muapi_keys_list` | List API keys |
| `muapi_keys_create` | Create API key |
| `muapi_keys_delete` | Delete API key |
| `muapi_account_balance` | Get credit balance |
| `muapi_account_topup` | Add credits (Stripe checkout) |
Source: README.md
Requirements
All core scripts require:
| Dependency | Purpose |
|---|---|
| `MUAPI_KEY` env var | Set via `core/platform/setup.sh` |
| `curl` | HTTP requests |
| `jq` | JSON parsing |
| `python3` | Helper scripts |
Check requirements:
# Verify environment
muapi auth configure --test
# Show current config
muapi auth configure --show-config
Source: core/platform/SKILL.md, core/edit/SKILL.md
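The dependency table above can also be checked locally before any core script is run. The sketch below is one way to do that; the helper names `require_tools` and `require_env` are hypothetical and not part of the repository, and the tool and variable names are passed in by the caller rather than hard-coded.

```bash
#!/usr/bin/env bash
# require_tools TOOL...
# Returns non-zero if any named executable is missing from PATH.
# (Illustrative helper; not part of the repository.)
require_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# require_env NAME
# Returns non-zero if the named environment variable is unset or empty.
require_env() {
  local name="$1"
  if [ -z "$(printenv "$name")" ]; then
    echo "environment variable not set: $name" >&2
    return 1
  fi
}

# Example, using the names from the table above:
#   require_tools curl jq python3 && require_env MUAPI_KEY
```

Because `printenv` only sees exported variables, a key that is assigned in the shell but never exported is reported as missing, which matches what a child script would observe.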
Troubleshooting
Common Issues
| Issue | Solution |
|---|---|
| `ReferenceError: response is not defined` | Ensure API key is configured via `muapi auth configure` |
| Timeout errors | Use --timeout N flag to increase timeout |
| Model download stalls at 100% | Verify model file integrity; re-download if corrupted |
| 500 Internal Server Error | Server overloaded; retry with exponential backoff |
| `npm run dev` hangs | Use PowerShell or WSL; ensure Node.js 18+ installed |
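The "retry with exponential backoff" advice in the table can be sketched as a small shell helper. The function name `retry_with_backoff` and its argument order are illustrative and not part of muapi-cli; the helper wraps any command and doubles the wait after each failed attempt.

```bash
#!/usr/bin/env bash
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the delay between attempts. (Illustrative helper.)
retry_with_backoff() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1

  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example: up to 5 attempts, waiting 2s, 4s, 8s, 16s between them.
#   retry_with_backoff 5 2 muapi models list
```

Retrying is only appropriate for transient failures such as a 500 response; an invalid API key or an unknown model will fail the same way on every attempt, so a low attempt cap keeps the script from stalling.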
Verification Commands
# Test API connectivity
bash core/platform/setup.sh --test
# List configured models
muapi models list
# Check account balance
muapi account balance
Server Dependencies (Ubuntu)
If running server components:
apt install python3-dev make g++
pip install wheel
pip install -r requirements.txt
curl -sL https://deb.nodesource.com/setup_18.x | bash -
Source: core/platform/SKILL.md, README.md
See Also
- README.md — Full project documentation
- muapi-cli — CLI tool documentation
- Schema Reference — Model validation and endpoint definitions
Expert Skills Library
Related topics: Recipe Pack, Workflow Scripts, Architecture Overview
The Expert Skills Library is the high-value knowledge layer of the Generative-Media-Skills repository. It provides domain-specific skills that translate creative intent into technical directives for AI agents, enabling professional-grade image, video, and audio generation without requiring users to understand the underlying API complexity.
Overview
The repository uses a Core/Library split architecture:
| Layer | Purpose | Location |
|---|---|---|
| Core Primitives | Thin wrappers around muapi-cli for raw API access | /core/ |
| Expert Library | Domain-specific skills with professional knowledge baked in | /library/ |
| Recipe Pack | LLM-orchestrated workflow recipes combining multiple skills | /library/*/ |
Source: README.md
Architecture Diagram
graph TD
subgraph "Expert Skills Library"
A["🎬 Motion / Video<br/>(16 skills)"] --> D["Cinema Director"]
A --> E["Seedance 2"]
A --> F["AI Clipping"]
B["🎨 Visual / Design<br/>(21 skills)"] --> G["Nano-Banana"]
B --> H["UI Designer"]
B --> I["Logo Creator"]
C["📱 Social<br/>(5 skills)"] --> J["YouTube Shorts"]
C --> K["UGC Ads Workflow"]
end
L["muapi-cli"] --> M["19 Structured Tools"]
D --> L
G --> L
J --> L
M --> N["Claude Code / Cursor / MCP"]
Skill Categories
Motion / Video Skills
The motion library contains 16 skills for video generation and animation.
| Skill | Description | Key Capability |
|---|---|---|
| Cinema Director | Technical film direction & cinematography | Directs Seedance 2.0 with camera movements, lighting, and timing |
| Seedance 2 (Doubao Video) | Director-level cinematic video generation | Text-to-video, image-to-video, video extension with audio-video sync |
| AI Fight Scene Generator | High-cut-density action sequences | 16-cell storyboard image drives Seedance 2.0 i2v |
| 3D Logo Animation | Premium 3D logo animation | Transforms 2D logos with cinematic effects |
| Animal Vlogger Video | Anthropomorphic animal content | Ultra-realistic characters in real-world settings |
| Cartoon Dance Animation | Photo to Pixar-style 3D animation | Reference dance/motion video driving |
| Drone-Style Video | Aerial drone-perspective footage | Bird's-eye sweeps, orbit shots, flyovers |
| Giant Product Showcase | Dramatic giant-scale visuals | Building-sized objects next to people |
| Jewelry Product Video | Luxury jewelry cinematography | Macro animation and commercial quality |
| Music Video | Short music video generation | Keyframes per beat, music track matching |
| One-Shot Video | Single continuous cinematic shot | No cuts, seamless flowing scene |
| Product Ad Cinematic | 5-10s product advertisement | From product photo + brand brief |
| Product Showcase Video | Dynamic product animation | Explosive ingredient arrangement |
| Talking Baby Video | Viral-style talking baby | Custom costumes and scripts |
| UGC Lifestyle Try-On | Lifestyle content generation | Authentic social-native photos & video |
| UGC Video Factory | 10s vertical UGC video ad | Nano-Banana Pro Edit → Seedance 2.0 VIP i2v |
Visual / Images & Design Skills
The visual library contains 21 skills for image generation and design.
| Skill | Description | Output |
|---|---|---|
| Nano-Banana | Reasoning-driven image generation | Gemini 3 Style reasoning for high-quality outputs |
| UI Designer | High-fidelity mobile/web mockups | Atomic Design principles, component-based |
| Logo Creator | Minimalist vector branding | Geometric Primitives, accurate brand-name text |
| Action Figure Generator | Photo → custom 3D action figure | Collectible toy packaging |
| Ad Creative Set | High-converting ad assets | Hero image, copy variations, platform crops |
| Amazon Product Listing Pack | Full Amazon listing images | Hero, lifestyle, infographic, comparisons |
| Blog Header | Professional blog header | 1200×628 with title composition guidance |
| Brand Kit | Cohesive brand visual kit | Logo concept, color palette, typography |
| Brochure Designer | Multi-page brochure | Cover, inner spread, back |
| Brand Design Guide | Comprehensive design system | Palette, typography, UI components |
| Couple Grid Creator | Stylized couple grid | 6-box romantic poses in packaging |
| Fashion Try-On | Virtual outfit try-on | Person photo + clothing combination |
| Floor Plan Rendering | 2D → 3D architectural | Realistic 3D room visualization |
| Interior Design | Pro interior visualizations | Redesign rooms, furniture styles |
| Interior Design Visualizer | Room furniture generation | Fill empty rooms or redesign existing |
| Keyboard Art Maker | Keycap art | Top-down artistic keyboard arrangements |
| Logo + Branding Package | Complete branding | Variations, palette, mockups |
| Multi-Angle Reshoot | Multiple camera angles | Fish-eye, bird's-eye, low, macro shots |
| Multi-Angle Shots | Full product shot set | Front, side, back, top-down, 45° |
| Storyboard Generator | N keyframes for scenes | Story sequence visualization |
| URL to Design | Website → redesigned UI | Analyze URL and generate improved design |
| YouTube Thumbnail | High-CTR thumbnails | Bold text, emotional faces, striking imagery |
Social Skills
| Skill | Description | Platforms |
|---|---|---|
| Instagram Post | On-brand Instagram content | Instagram |
| Product Campaign Pack | Multi-channel campaign | Meta, Google, LinkedIn, TikTok |
| RedNote Cover | Xiaohongshu covers | 小红书 |
| Social Media Pack | Platform crops | Instagram, TikTok, Shorts, X |
| UGC Ads Workflow | Video ad pipeline | Social-native UGC style |
| YouTube Shorts | Platform-aware short clips | Shorts, TikTok, Reels, Feed |
Edit Skills
The edit library provides post-processing capabilities.
Source: core/edit/SKILL.md
| Script | Operation | Description |
|---|---|---|
| edit-image.sh | Prompt-based editing | Flux Kontext, GPT-4o, Midjourney, Qwen |
| enhance-image.sh | One-click operations | Upscale, background removal, face swap, colorize, Ghibli style, product shots |
| lipsync.sh | Lip sync | Sync Labs, LatentSync, Creatify, Veed |
| video-effects.sh | Video effects | Wan AI, face swap, dance, dress change, Luma |
Core Expert Skills
Cinema Director
Technical film direction that translates creative intent into Seedance 2.0 directives.
Location: /library/motion/cinema-director/
Capabilities:
- Camera movement planning
- Lighting direction
- Timing and pacing
- Scene composition
Nano-Banana
Reasoning-driven image generation using chain-of-thought prompting.
Location: /library/visual/nano-banana/
Purpose: Apply "Gemini 3 Style" reasoning to generate high-quality images through explicit problem-solving steps.
UI Designer
High-fidelity mobile and web mockup generation using Atomic Design principles.
Location: /library/visual/ui-design/
Features:
- Component-based design
- Responsive layouts
- Design system adherence
Logo Creator
Minimalist vector branding generation using geometric primitives.
Location: /library/visual/logo-creator/
Output: Accurate brand-name text rendering with clean vector aesthetic.
Seedance 2 (Doubao Video)
Director-level cinematic video generation supporting multiple modes.
Location: /library/motion/seedance-2/
| Mode | Description |
|---|---|
| t2v | Text-to-video generation |
| i2v | Image-to-video animation |
| extend | Video extension |
Usage Example:
# Text-to-video
bash scripts/generate-seedance.sh --mode t2v --subject "a cybernetic dragon" --intent "epic" --duration 10 --view
# Image-to-video
bash scripts/generate-seedance.sh --mode i2v --file ./concept.jpg --subject "camera pulls back" --intent "reveal" --view
# Extend existing video
bash scripts/generate-seedance.sh --mode extend --request-id "YOUR_ID" --subject "camera continues" --duration 10
AI Clipping
Server-side long video processing for short clip extraction.
Location: /library/edit/ai-clipping/
Features:
- Server-side transcription (no local Whisper)
- Virality ranking
- Deduplication
- Face-tracked auto-crop
YouTube Shorts
Platform-aware preset over AI Clipping with optimized defaults.
Location: /library/social/youtube-shorts/
Platform Defaults:
| Platform | Aspect Ratio | Duration |
|---|---|---|
| Shorts | 9:16 | 60s max |
| TikTok | 9:16 | 60s max |
| Reels | 9:16 | 90s max |
| Feed | 16:9 or 1:1 | Variable |
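The platform defaults above can be captured as a small lookup. This is a minimal sketch, not code from the youtube-shorts skill: the function name `platform_defaults` and its output format are assumptions, and only the values come from the table.

```bash
# Hypothetical helper: print "<aspect ratio> <max duration>" for a platform.
# Values mirror the Platform Defaults table; the function is illustrative only.
platform_defaults() {
  case "$1" in
    shorts|tiktok) echo "9:16 60s" ;;
    reels)         echo "9:16 90s" ;;
    feed)          echo "16:9|1:1 variable" ;;
    *)             echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

platform_defaults reels   # prints: 9:16 90s
```

An unknown platform returns a non-zero status, so a calling script can stop before submitting a clipping job with the wrong preset.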
Platform Utilities
The /core/platform/ directory provides essential utilities for skill execution.
Source: core/platform/SKILL.md
| Script | Description |
|---|---|
| setup.sh | Configure API key, show config, test key validity |
| check-result.sh | Poll for async generation results |
Quick Start:
# Save API key
bash setup.sh --add-key "YOUR_MUAPI_KEY"
# Test connectivity
bash setup.sh --test
# Poll for result
bash check-result.sh --id "your-request-id"
Recipe Pack
The Recipe Pack contains 41 LLM-orchestrated workflow recipes that combine multiple muapi-cli calls into named end-to-end pipelines.
Characteristics:
- Each skill is a SKILL.md file the agent reads and follows
- Designed for consuming agents (Claude Code, Cursor, MCP)
- Recipes, not bash wrappers
- Bring your own executing agent
Integration with AI Agents
The Expert Skills Library integrates with AI development environments through direct terminal execution, MCP server mode, or local scripts, depending on the platform.
Supported Platforms
| Platform | Integration Method |
|---|---|
| Claude Code | Direct terminal execution via tools + MCP server mode |
| Cursor | MCP server mode |
| Gemini CLI | Local scripts |
| Windsurf | Local scripts |
MCP Server Mode
muapi mcp serve
This exposes 19 structured tools with full JSON Schema input/output definitions to Claude Desktop, Cursor, or any MCP-compatible agent.
Source: README.md
Requirements
All expert skills require:
| Requirement | Description |
|---|---|
| muapi-cli | Core CLI tool for API access |
| MUAPI_KEY | API key configured via core/platform/setup.sh |
| Standard tools | curl, jq, python3 (varies by skill) |
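A preflight check for the tools in the table can be sketched as follows. The helper name `missing_tools` is hypothetical, and the tool list is taken from the table on the assumption that the CLI binary is called `muapi`, as in the commands elsewhere in this manual.

```bash
# Report which of the given commands are missing from PATH.
# Prints one missing tool per line and returns 1 if any are missing.
missing_tools() {
  local tool status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Tool names follow the requirements table; adjust the list per skill.
if missing=$(missing_tools muapi curl jq python3); then
  echo "all required tools found"
else
  echo "missing tools:" $missing >&2
fi
```

Running a check like this before a skill script gives a clearer failure than a "command not found" error in the middle of a pipeline.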
Common Workflow Patterns
Async Generation with Polling
sequenceDiagram
participant Agent
participant muapi as muapi-cli
participant API as muapi.ai API
Agent->>muapi: Submit async request (--no-wait)
muapi->>API: POST request
API-->>muapi: request_id
muapi-->>Agent: Return request_id
loop Poll until complete
Agent->>muapi: check-result --id request_id
muapi->>API: GET status
API-->>muapi: status update
muapi-->>Agent: Progress/Ready
end
Agent->>muapi: Download result (--download)
muapi->>API: GET download
API-->>muapi: Media file
muapi-->>Agent: Saved output
Agentic Pipeline Example
# 1. Submit async, capture request_id
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
--model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')
# 2. Do other work...
# 3. Poll for completion
muapi predict wait "$REQUEST_ID" --download ./outputs
# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" --image "$URL" \
--model flux-kontext-pro --download ./outputs
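Where a script needs its own bounded polling loop rather than a blocking wait, the pattern in the sequence diagram can be sketched generically. The helper `poll_until` is hypothetical: it takes the status command as an argument and makes no assumption about muapi flags.

```bash
# Hypothetical helper: run a status command until it succeeds or attempts run out.
# Usage: poll_until <max_attempts> <interval_seconds> <command> [args...]
poll_until() {
  local max="$1" interval="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0                      # the status command reported completion
    fi
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$max" ]; then
      sleep "$interval"             # wait before the next poll
    fi
  done
  echo "gave up after $max attempts" >&2
  return 1
}
```

The status command must exit non-zero while the job is still pending. Whether check-result.sh or a muapi subcommand behaves that way is an assumption to verify before wiring it in.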
Community Considerations
Based on community feedback, several areas are frequently discussed:
| Topic | Status | Notes |
|---|---|---|
| Publishing destinations | Feature request | Users request post-generation publishing (e.g., Vynly integration) |
| GPU acceleration | Planned | Local model acceleration discussed for future support |
| Multilingual content | Supported | Upload documents and query in any language |
| Server errors | Resolved | Initial 500 errors addressed in later versions |
Source: GitHub Issues #7, #89, #24
Quick Reference
| Skill Category | Count | Primary Use Case |
|---|---|---|
| Motion / Video | 16 | Cinematic video, animation, product showcases |
| Visual / Design | 21 | Branding, UI, product imagery, marketing |
| Social | 5 | Platform-specific content, UGC ads |
| Edit | 4 | Post-processing, enhancement, effects |
Total: 46+ expert skills organized for agentic execution.
Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual
Recipe Pack
Related topics: Expert Skills Library, Workflow Scripts
The Recipe Pack is a curated collection of 41 LLM-orchestrated workflow recipes that translate creative intent into executable muapi-cli pipelines. Each recipe is a self-contained SKILL.md file containing structured instructions that AI agents can read and execute without additional configuration. Source: README.md
Overview
Recipe Pack workflows combine multiple muapi-cli calls into named end-to-end pipelines. Rather than requiring developers to manually chain image generation, video creation, and enhancement operations, recipes provide:
- Pre-defined creative logic: domain expertise baked into executable steps
- Multi-step pipelines: complex outputs from simple inputs (e.g., "photo of person → 3D action figure")
- Agent-native format: SKILL.md files that LLMs can parse and follow directly
- Professional quality: cinematographic, branding, and design best practices embedded
Source: README.md
Architecture
Recipes follow a layered architecture that separates creative intent from technical execution:
graph TD
A[User Input / Agent Prompt] --> B[SKILL.md Recipe]
B --> C[muapi-cli Calls]
C --> D[muapi.ai API]
D --> E[Generated Media]
F[Core Primitives] --> C
G[Expert Library] --> B
Recipe Structure
Each recipe declares its inputs and a Steps body. The executing agent reads the SKILL.md and translates instructions into muapi CLI calls. Source: README.md
| Layer | Location | Purpose |
|---|---|---|
| Core Primitives | /core/ | Thin wrappers around muapi-cli for raw API access (media, edit, platform) |
| Expert Library | /library/ | High-value skills translating creative intent to technical directives |
| Recipe Pack | /library/*/ | 41 named pipelines combining multiple primitives |
Recipe Categories
The Recipe Pack is organized into three primary categories:
Motion / Video (16 Recipes)
High-production video workflows including cinematography, animation, and UGC content.
| Skill | Description |
|---|---|
| 3D Logo Animation | Transform a 2D logo into a premium 3D version with cinematic effects |
| AI Fight Scene Generator | High-cut-density action sequence — 16-cell storyboard drives Seedance 2.0 i2v |
| Animal Vlogger Video | Anthropomorphic animal vlogger in real-world settings |
| Cartoon Dance Animation | Photo → Pixar-style 3D cartoon with dance animation |
| Character Story Video | Multi-part animated story with consistent character |
| Drone-Style Video | Aerial footage — bird's-eye sweeps, orbit shots, flyovers |
| Giant Product Showcase | Building-sized product visual with optional animation |
| Jewelry Product Video | Luxury jewelry ad with macro animation |
| Music Video | Short music video from song theme — keyframes per beat |
| One-Shot Video | Single continuous cinematic shot |
| Cinematic Product Ad | 5–10s product ad from photo + brand brief |
| Product Showcase Video | Dynamic product showcase with motion animation |
| Product Video Ad Maker | Cinematic video ad from product photo |
| Talking Baby Video | Viral-style talking baby with costumes and scripts |
| UGC Lifestyle Try-On | Lifestyle photos & video of person using product |
| UGC Video Factory | Person + product + script → 10s vertical UGC video ad |
Source: README.md
Social (5 Recipes)
Platform-optimized social media content and multi-channel campaigns.
| Skill | Description |
|---|---|
| Instagram Post | Hero image + caption + hashtags |
| Product Campaign Pack | Multi-channel campaign — hero visuals, social assets, video, crops |
| RedNote Cover | Xiaohongshu (小红书) cover — lifestyle aesthetic with typography |
| Social Media Pack | Hero image → Instagram / TikTok / Shorts / X aspect ratios |
| UGC Ads Workflow | Selfie + product image + script → animated ad |
Source: README.md
Visual / Images & Design (21 Recipes)
Professional image generation, branding, and design assets.
| Skill | Description |
|---|---|
| Action Figure Generator | Photo → custom 3D action figure with collectible packaging |
| Ad Creative Set | Hero image + copy variations + platform crops |
| Amazon Product Listing Pack | Hero, lifestyle, infographic, comparison images |
| Blog Header | 1200×628 blog header with title composition |
| Brand Kit | Logo concept + color palette + typography pairings |
| Brochure Designer | Multi-page brochure — cover, inner spread, back |
| Couple Grid Creator | 6-box stylized grid in cardboard packaging frames |
| Brand Design Guide | Palette, typography, UI components, visual identity |
| Fashion Try-On | Person + clothing → fashion model video |
| Floor Plan Rendering | 2D floor plan → realistic 3D architectural rendering |
| Interior Design | Pro interior design visualizations |
| Interior Design Visualizer | Empty room → filled with furniture / redesign existing room |
| Keyboard Art Maker | Keycaps spelling custom messages |
| Logo + Branding Package | Logo variations (dark/light/icon) + palette + mockups |
| Logo Generator | Quick single-shot polished logo |
| Multi-Angle Reshoot | Subject from fish-eye, bird's-eye, low, macro angles |
| Multi-Angle Shots | Full product shot set — front, side, back, top-down, 45° |
| Selfie with Celebrities | Realistic selfie with celebrity; optional cinematic |
| Storyboard Generator | N keyframes for story or scene sequence |
| URL to Design | Website → redesigned UI with modern aesthetics |
| YouTube Thumbnail | High-CTR thumbnail — bold text, emotional imagery |
Source: README.md
Execution Model
Recipes are designed for agentic execution. The consuming agent (Claude Code, Cursor, MCP, etc.) reads the SKILL.md file and executes the steps via muapi CLI calls. Source: README.md
Typical Recipe Flow
graph LR
A[Input Media<br/>or Description] --> B[Parse SKILL.md]
B --> C[Step 1: Generate<br/>Base Asset]
C --> D[Step 2: Enhance<br/>or Transform]
D --> E[Step 3: Apply<br/>Effects/Animation]
E --> F[Output:<br/>Final Media]
Key Recipe Patterns
Pattern 1: Image-to-Video Pipeline
# 1. Generate or upload source image
muapi image generate "product photo on white" --model flux-schnell
# 2. Animate into video
muapi video from-image \
--image "SOURCE_IMAGE_URL" \
--subject "camera slowly orbits the product" \
--model seedance-2.0-vip
Pattern 2: Multi-Asset Composite
# 1. Generate selfie
muapi image generate "professional selfie" --model flux-dev
# 2. Generate product
muapi image generate "product photo" --model flux-schnell
# 3. Combine in video
muapi video from-image \
--image "COMPOSITE_URL" \
--model kling-v3.0-pro
Pattern 3: Async Pipeline with Polling
# Submit async, capture request_id
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
--model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')
# Poll for completion
muapi predict wait "$REQUEST_ID" --download ./outputs
Example Recipes
3D Logo Animation
Transforms a 2D logo into an animated 3D version with cinematic effects.
Location: library/motion/3d-logo-animation/SKILL.md
Workflow:
- Accept 2D logo input (URL or local file)
- Generate 3D version using image-to-3D model
- Apply animation choreography (rotation, light sweep, particle effects)
- Output final video asset
Models Used:
- Image generation: Flux variants, Midjourney
- 3D conversion: Dedicated 3D models
- Video: Seedance 2.0, Kling 3.0
Cinematic Product Ad
Creates a 5–10 second product advertisement from a product photo and brand brief.
Location: library/motion/product-ad-cinematic/SKILL.md
Workflow:
- Accept product photo and brand brief (tone, colors, messaging)
- Generate lifestyle background scene
- Composite product into scene
- Animate with cinematic camera movement
- Apply color grading matching brand identity
Output: Professional-grade product commercial
Action Figure Generator
Converts a photo of a person into a custom 3D action figure with collectible toy packaging.
Location: library/visual/action-figure-generator/SKILL.md
Workflow:
- Accept subject photo
- Generate 3D-rendered action figure likeness
- Create collectible packaging (blister card, header card)
- Apply toy-grade styling (plastic sheen, stylized proportions)
Use Cases:
- Personalized gifts
- Marketing materials
- Fan merchandise concepts
YouTube Shorts Generator
Converts long-form video content into platform-optimized short clips.
Location: library/social/youtube-shorts/SKILL.md
Workflow:
- Upload or reference source video
- AI identifies best highlights (using transcription + virality ranking)
- Extract vertical clips (9:16 aspect ratio)
- Auto-crop to face-tracked subjects
- Apply platform-specific formatting (TikTok, Reels, Shorts)
Features:
- Server-side transcription (no local Whisper required)
- Deduplication of similar clips
- Face-tracked auto-crop
Integration with AI Agents
Installing to Claude Code
npx skills add SamurAIGPT/Generative-Media-Skills --all
Installing to Specific Agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor
MCP Server Mode
Recipes can also be executed via the Model Context Protocol server:
muapi mcp serve
This exposes 19 structured tools directly to MCP-compatible agents. Source: README.md
Claude Desktop Configuration
{
  "mcpServers": {
    "muapi": {
      "command": "muapi",
      "args": ["mcp", "serve"],
      "env": { "MUAPI_API_KEY": "your-key-here" }
    }
  }
}
Running Recipes Manually
Each recipe includes executable shell scripts for direct invocation:
# Generate a cinematic film
cd library/motion/cinema-director
bash scripts/generate-film.sh \
--subject "a cybernetic dragon over Tokyo" \
--intent "epic" \
--model "kling-v3.0-pro" \
--duration 10 \
--view
# Use Nano-Banana reasoning for image generation
bash library/visual/nano-banana/scripts/generate-nano-art.sh \
--file ./my-source-image.jpg \
--subject "a glass hummingbird" \
--style "macro photography" \
--resolution "2k" \
--view
Extending the Recipe Pack
Recipes follow a consistent structure (declared inputs and a Steps body in a SKILL.md file), which makes them easy to extend.
Source: https://github.com/SamurAIGPT/Generative-Media-Skills / Human Manual
Workflow Scripts
Related topics: Expert Skills Library, Recipe Pack, CLI Commands Reference
Workflow Scripts provide the foundational infrastructure for executing multi-step generative media pipelines within the Generative-Media-Skills framework. These scripts orchestrate complex operations by chaining together muapi-cli commands, enabling AI agents to execute sophisticated end-to-end media generation workflows with minimal configuration.
Overview
The workflow system serves as a bridge between high-level creative intent and low-level API operations. Rather than requiring agents to manually construct and sequence individual API calls, workflow scripts encapsulate entire pipelines as executable units that handle:
- Input validation and parameter passing
- State management across pipeline stages
- Error handling and recovery mechanisms
- Result aggregation from multiple generation steps
Source: library/workflow/SKILL.md
Architecture
The workflow subsystem follows a Core/Library architectural pattern consistent with the broader Generative-Media-Skills project:
graph TD
A[User/Agent Request] --> B[Workflow Selection]
B --> C{Interactive or Direct?}
C -->|Interactive| D[interactive-run.sh]
C -->|Direct| E[run-workflow.sh]
D --> F[Parameter Collection]
E --> G[Execute Pipeline]
F --> G
G --> H[muapi-cli Operations]
H --> I[Media Generation]
I --> J[Result Aggregation]
J --> K[Output Delivery]
L[discover-workflow.sh] -.->|Discovery| B
M[list-workflows.sh] -.->|Catalog| N[Available Workflows]
Component Responsibilities
| Component | Role | Location |
|---|---|---|
| SKILL.md | Metadata, usage documentation, and skill definition | /library/workflow/ |
| discover-workflow.sh | Scans and identifies available workflow definitions | /library/workflow/scripts/ |
| run-workflow.sh | Executes workflows with provided parameters | /library/workflow/scripts/ |
| generate-workflow.sh | Creates new workflow definitions or generates workflow output | /library/workflow/scripts/ |
| interactive-run.sh | Guides users through workflow execution via prompts | /library/workflow/scripts/ |
| list-workflows.sh | Displays catalog of available workflows | /library/workflow/scripts/ |
Source: library/workflow/scripts/run-workflow.sh, library/workflow/scripts/interactive-run.sh
Core Scripts Reference
list-workflows.sh
Lists all available workflow definitions in the system. This script scans the workflow directory and presents workflows in a structured format suitable for both human review and agent consumption.
bash list-workflows.sh [--format json|text]
Parameters:
| Parameter | Type | Description |
|---|---|---|
| --format | string | Output format: json for machine parsing, text for human-readable output (default: text) |
Source: library/workflow/scripts/list-workflows.sh
discover-workflow.sh
Performs discovery and validation of workflow definitions. This script identifies all workflow files, parses their metadata, and verifies structural integrity before execution.
bash discover-workflow.sh [--path <directory>] [--validate]
Parameters:
| Parameter | Type | Description |
|---|---|---|
| --path | string | Directory path to scan for workflows (default: current workflow library) |
| --validate | flag | Perform structural validation of discovered workflows |
Discovery Output Structure:
{
  "workflows": [
    {
      "id": "workflow-identifier",
      "name": "Human Readable Name",
      "description": "Workflow purpose and capabilities",
      "inputs": ["required", "parameters"],
      "outputs": ["expected", "results"],
      "steps": ["sequential", "operations"]
    }
  ]
}
Source: library/workflow/scripts/discover-workflow.sh
run-workflow.sh
Executes a specified workflow with provided or default parameters. This is the primary execution engine that coordinates the actual muapi-cli calls.
bash run-workflow.sh \
--workflow <workflow-id> \
--input <input-path-or-url> \
--output <output-directory> \
[--param-key value...]
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| --workflow | string | Yes | Unique identifier of workflow to execute |
| --input | string | Yes | Primary input (file path, URL, or prompt) |
| --output | string | No | Output directory (default: ./outputs) |
| --param-* | mixed | No | Additional workflow-specific parameters |
Exit Codes:
| Code | Meaning |
|---|---|
| 0 | Workflow completed successfully |
| 1 | Invalid workflow ID |
| 2 | Input validation failed |
| 3 | API call failed |
| 4 | Output generation failed |
Source: library/workflow/scripts/run-workflow.sh
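A caller can turn the documented exit codes into readable messages. The sketch below assumes the codes in the table are accurate; `describe_exit_code` is a hypothetical helper, and `sh -c 'exit 3'` stands in for a real run-workflow.sh invocation.

```bash
# Translate a run-workflow.sh exit code into the meaning listed in the table.
describe_exit_code() {
  case "$1" in
    0) echo "Workflow completed successfully" ;;
    1) echo "Invalid workflow ID" ;;
    2) echo "Input validation failed" ;;
    3) echo "API call failed" ;;
    4) echo "Output generation failed" ;;
    *) echo "Unknown exit code $1" ;;   # fallback added for this sketch
  esac
}

# Capture the status of a command without aborting the calling script.
status=0
sh -c 'exit 3' || status=$?             # stand-in for: bash run-workflow.sh ...
echo "exit $status: $(describe_exit_code "$status")"
```

Capturing the status with `|| status=$?` keeps the pattern usable in scripts that run under `set -e`.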
generate-workflow.sh
Generates workflow definitions or produces workflow-based outputs. This script supports both workflow creation (for defining new pipelines) and output generation (for producing artifacts).
bash generate-workflow.sh \
--template <template-id> \
--spec <specification-file> \
--output <output-path>
Parameters:
| Parameter | Type | Description |
|---|---|---|
--template | string | Template identifier to base new workflow on |
--spec | string | YAML/JSON specification file defining workflow structure |
--output | string | Destination for generated workflow definition or output |
Source: library/workflow/scripts/generate-workflow.sh
interactive-run.sh
Provides an interactive, question-driven interface for workflow execution. Users are prompted for required inputs, and the script validates each parameter before proceeding.
bash interactive-run.sh [--workflow <workflow-id>]
Interactive Flow:
graph LR
A[Start] --> B{Workflow ID Provided?}
B -->|No| C[List Available Workflows]
C --> D[Select Workflow]
B -->|Yes| E[Load Workflow Definition]
D --> E
E --> F[Prompt: Input 1]
F --> G[Validate Input 1]
G -->|Valid| H[Prompt: Input 2]
G -->|Invalid| F
H --> I[... Continue N times]
I --> J[Execute Workflow]
J --> K[Display Results]
K --> L[End]
Supported Prompts:
| Prompt Type | Validation | Description |
|---|---|---|
| text | regex pattern | Free-form text input |
| url | URL format | Web resource URLs |
| file | path exists | Local file paths |
| select | enum values | Enumerated choice |
| confirm | boolean | Yes/No confirmation |
Source: library/workflow/scripts/interactive-run.sh
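The validation rules in the table can be sketched as a single dispatcher. This is an illustration of the idea, not the logic in interactive-run.sh: the function name, the comma-separated option format, and the accepted confirm values are all assumptions.

```bash
# Hypothetical validator keyed on the prompt types in the table.
# Usage: validate_input <type> <value> [regex-or-comma-separated-options]
validate_input() {
  local type="$1" value="$2" extra="${3:-}"
  case "$type" in
    text)    printf '%s\n' "$value" | grep -Eq -- "${extra:-.+}" ;;
    url)     case "$value" in http://?*|https://?*) return 0 ;; *) return 1 ;; esac ;;
    file)    [ -e "$value" ] ;;
    select)  case ",$extra," in *",$value,"*) return 0 ;; *) return 1 ;; esac ;;
    confirm) case "$value" in y|Y|yes|n|N|no) return 0 ;; *) return 1 ;; esac ;;
    *)       return 2 ;;              # unknown prompt type
  esac
}
```

An interactive loop would call this after each prompt and ask again on a non-zero status, which matches the "Invalid" edge back to the prompt in the flow diagram.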
Workflow Execution Pipeline
When executing a workflow, the system follows a consistent pipeline pattern:
graph TD
subgraph "Stage 1: Initialization"
A1[Parse Workflow Definition] --> A2[Resolve Input Parameters]
A2 --> A3[Initialize Output Directory]
end
subgraph "Stage 2: Execution"
A3 --> B1[Execute Step 1]
B1 --> B2{Step 1 Success?}
B2 -->|Yes| B3[Execute Step 2]
B2 -->|No| B4[Log Error]
B4 --> B5[Rollback if Needed]
B3 --> B6{Step 2 Success?}
B6 -->|Yes| B7[Execute Step N]
B6 -->|No| B4
end
subgraph "Stage 3: Aggregation"
B7 --> C1[Collect Step Outputs]
C1 --> C2[Merge Results]
C2 --> C3[Generate Metadata]
C3 --> C4[Write Final Output]
end
Step Execution Model
Each workflow step follows this execution model:
# Pseudo-code for step execution
for step in workflow.steps:
    result = muapi-cli <operation> <step.params>
    if result.success:
        cache(step.id, result)
    else:
        handle_error(step, result)
Source: library/workflow/SKILL.md, library/workflow/scripts/run-workflow.sh
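The step loop can be written as runnable shell, under the assumption that each step is a command that prints its result and signals failure through its exit status. The step functions below are stubs standing in for muapi calls, and none of the names come from run-workflow.sh.

```bash
# Runnable sketch of the step loop. Each step is a shell function standing in
# for a muapi call; all names here are illustrative.
cache_dir=$(mktemp -d)

step_generate() { echo "generated.png"; }                   # stub: succeeds
step_enhance()  { echo "enhance failed" >&2; return 1; }    # stub: simulates a failure

run_steps() {
  local step result
  for step in "$@"; do
    if result=$("$step"); then
      printf '%s\n' "$result" > "$cache_dir/$step"   # cache(step.id, result)
    else
      echo "step '$step' failed; stopping" >&2       # handle_error(step, result)
      return 1
    fi
  done
}
```

Caching each result in a file named after the step lets a later step, or a retry of the whole workflow, read earlier outputs without regenerating them.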
Integration with Recipe Pack
The workflow scripts serve as the execution backbone for the Recipe Pack — a collection of 41 pre-built LLM-orchestrated workflow recipes. Each recipe in the library maps to one or more workflow script invocations.
Recipe Categories
| Category | Count | Example Workflows |
|---|---|---|
| Motion/Video | 16 | 3D Logo Animation, AI Fight Scene Generator, Drone-Style Video |
| Visual/Images | 21 | Action Figure Generator, Brand Kit, Interior Design |
| Social | 5 | Instagram Post, Product Campaign Pack, UGC Ads Workflow |
| Edit | 1 | AI Clipping |
| Motion Specialized | 4 | Cinema Director, Seedance 2.0, YouTube Shorts |
Source: README.md - Recipe Pack Section
Recipe-to-Workflow Mapping
Complex recipes often combine multiple workflow scripts:
# Example: Action Figure Generator Recipe
# Step 1: Generate base image
muapi image generate "3D render of action figure" --model flux-dev
# Step 2: Enhance with workflow
bash run-workflow.sh --workflow enhance-3d \
--input "./outputs/step1.png" \
--output "./outputs"
# Step 3: Add packaging
bash run-workflow.sh --workflow product-packaging \
--input "./outputs/step2.png" \
--output "./outputs/final"
Source: library/motion/ai-fight-scene/SKILL.md, library/visual/action-figure-generator/SKILL.md
Common Workflow Patterns
Async Polling Pattern
For long-running operations, workflows implement async polling:
# Submit generation request
REQUEST_ID=$(muapi video generate "prompt" --model kling-v3.0-pro \
--no-wait --output-json --jq '.request_id' | tr -d '"')
# Poll for completion via workflow
bash run-workflow.sh --workflow poll-result \
--request-id "$REQUEST_ID" \
--max-attempts 60 \
--interval 10
Source: README.md - Agentic Pipeline Examples
File Upload Pattern
Workflows that process local files auto-upload to CDN:
# Workflow handles upload transparently
bash run-workflow.sh --workflow image-edit \
--input "./local-photo.jpg" \
--prompt "apply cinematic color grading"
# Equivalent raw operations:
URL=$(muapi upload file ./local-photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "apply cinematic color grading" --image "$URL" --model flux-kontext-pro
Source: library/workflow/SKILL.md
Chaining Pattern
Multiple workflows can be chained for complex pipelines:
# Chain: Generate → Edit → Upscale → Export
bash run-workflow.sh --workflow generate-portrait --input "professional headshot" --output ./step1
bash run-workflow.sh --workflow retouch-portrait --input ./step1/output.jpg --output ./step2
bash run-workflow.sh --workflow upscale-4k --input ./step2/output.jpg --output ./step3
bash run-workflow.sh --workflow export-social --input ./step3/output.jpg --output ./final
Source: library/workflow/scripts/generate-workflow.sh
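The chain above repeats one idea: each stage consumes the previous stage's output. A minimal sketch of that idea as a helper follows. `run_chain` and the stub stages are hypothetical, and it assumes each stage prints the path or value the next stage should read.

```bash
# Hypothetical chaining helper: feed each stage's output to the next stage.
# Usage: run_chain <initial_input> <stage_command>...
run_chain() {
  local current="$1" stage
  shift
  for stage in "$@"; do
    if ! current=$("$stage" "$current"); then
      echo "stage '$stage' failed; chain aborted" >&2
      return 1
    fi
  done
  printf '%s\n' "$current"
}

# Stub stages standing in for run-workflow.sh invocations.
stage_generate() { echo "$1 -> generated"; }
stage_upscale()  { echo "$1 -> upscaled"; }

run_chain "professional headshot" stage_generate stage_upscale
# prints: professional headshot -> generated -> upscaled
```

Stopping at the first failed stage avoids passing a missing or partial file into later, more expensive steps.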
Configuration
Environment Variables
| Variable | Required | Default | Description |
|---|---|---|---|
| MUAPI_API_KEY | Yes | (none) | muapi.ai API key for authentication |
| MUAPI_OUTPUT_DIR | No | ./outputs | Default output directory |
| MUAPI_TIMEOUT | No | 300 | Default timeout in seconds |
| MUAPI_RETRY_COUNT | No | 3 | Number of retries on failure |
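A script can apply these defaults itself before calling any workflow. The sketch below uses the variable names and default values from the table; the function name `load_muapi_config` is hypothetical, and whether the workflow scripts already do this internally is not confirmed here.

```bash
# Apply the documented defaults and fail early when the API key is absent.
load_muapi_config() {
  if [ -z "${MUAPI_API_KEY:-}" ]; then
    echo "MUAPI_API_KEY is not set" >&2
    return 1
  fi
  # ":=" assigns the default only when the variable is unset or empty.
  : "${MUAPI_OUTPUT_DIR:=./outputs}"
  : "${MUAPI_TIMEOUT:=300}"
  : "${MUAPI_RETRY_COUNT:=3}"
  export MUAPI_OUTPUT_DIR MUAPI_TIMEOUT MUAPI_RETRY_COUNT
}
```

Values already present in the environment are left untouched, so a caller can override a single setting, for example `MUAPI_TIMEOUT=60`, without redefining the rest.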
Workflow Definition Schema
New workflows can be defined using YAML or JSON:
# workflow-definition.yaml
name: custom-workflow
version: "1.0"
description: Custom media generation pipeline
inputs:
  - id: source_image
    type: file
    required: true
  - id: style
    type: select
    options: [realistic, cartoon, anime]
    default: realistic
steps:
  - name: enhance
    operation: muapi_image_edit
    params:
      image: "{{ inputs.source_image }}"
      prompt: "Apply {{ inputs.style }} style"
      model: flux-kontext-pro
  - name: upscale
    operation: muapi_enhance_upscale
    params:
      image: "{{ steps.enhance.output }}"
      scale: 2
outputs:
  - id: final_image
    source: steps.upscale.output
Source: library/workflow/scripts/generate-workflow.sh
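The definition above declares a select input with a default and uses it inside a prompt template. How the workflow engine resolves these is not described in the source, so the following is only a sketch of the intended semantics for this one input. resolve_style and render_prompt are hypothetical names, and the option list is hard-coded from the example.

```bash
# resolve_style [VALUE]
# Mirrors the `style` input: falls back to the default when no value is
# given and rejects values outside the declared options.
resolve_style() {
  local value="${1:-realistic}"
  case "$value" in
    realistic|cartoon|anime) printf '%s\n' "$value" ;;
    *) echo "invalid style: $value" >&2; return 1 ;;
  esac
}

# render_prompt [STYLE]
# Fills the "Apply {{ inputs.style }} style" template of the enhance step.
render_prompt() {
  local style
  style=$(resolve_style "${1:-}") || return 1
  printf 'Apply %s style\n' "$style"
}
```

Calling render_prompt anime prints "Apply anime style", and calling it with no argument falls back to the realistic default.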
Error Handling
The workflow system implements tiered error handling:
| Error Level | Trigger | Response |
|---|---|---|
| Warning | Non-critical parameter mismatch | Log and continue with defaults |
| Retry | Transient API failure | Retry up to MUAPI_RETRY_COUNT times |
| Abort | Validation failure or unrecoverable error | Stop workflow, log context, exit with a non-zero code |
| Rollback | Step failure with side effects | Attempt to undo previous operations |
Source: library/workflow/scripts/run-workflow.sh, library/workflow/SKILL.md
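The Retry tier can be approximated with a small wrapper that re-runs a command up to MUAPI_RETRY_COUNT times. This is a sketch, not the logic in run-workflow.sh: with_retry and RETRY_BASE_DELAY are hypothetical names, the doubling delay is a design choice made here, and the wrapper retries every failure because it cannot tell a transient API error from a validation error.

```bash
# with_retry COMMAND [ARGS...]
# Re-runs COMMAND on failure, up to MUAPI_RETRY_COUNT retries (default 3),
# doubling the delay each time. RETRY_BASE_DELAY is a hypothetical knob.
with_retry() {
  local retries="${MUAPI_RETRY_COUNT:-3}"
  local delay="${RETRY_BASE_DELAY:-1}"
  local attempt=0
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$retries" ]; then
      echo "giving up after $retries retries: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    echo "failure $attempt; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
}
```

A command is therefore run at most MUAPI_RETRY_COUNT + 1 times: the initial attempt plus the retries.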
Community Considerations
Based on community feedback, several workflow-related patterns have emerged:
Performance Optimization
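Users have asked about GPU acceleration and faster execution times. Generation requests are delegated to the muapi.ai platform through muapi-cli, so local GPU settings generally do not affect them. Where a workflow does call local model operations, a GPU-capable model variant can be selected through the --model parameter:
# Specify GPU-capable model in workflow parameters
bash run-workflow.sh --workflow generate-image \
--model ggml-vic13b-q5_1 \
--input "complex prompt"
Note: Not all models support GPU acceleration. The model ID in this example does not appear in the model lists elsewhere in this manual, so confirm it with muapi models list and refer to the individual model documentation before relying on it.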
Batch Processing
Community members have requested the ability to process multiple files from network shares. Workflows support batch mode via input directories:
# Process all images in directory
bash run-workflow.sh --workflow batch-enhance \
--input ./batch-input/ \
--output ./batch-output/
Source: issues/73
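The internals of the batch-enhance workflow are not shown in the source. For a workflow that only accepts single files, a directory loop gives similar behavior. The sketch below assumes image inputs with .jpg, .jpeg, or .png extensions; batch_apply is a hypothetical helper that calls the command you supply once per file.

```bash
# batch_apply INPUT_DIR OUTPUT_DIR COMMAND [ARGS...]
# Calls COMMAND ARGS... FILE OUTPUT_DIR for each image in INPUT_DIR.
# Keeps going after a failure and returns non-zero if any file failed.
batch_apply() {
  local in_dir="$1"
  local out_dir="$2"
  local file
  local failed=0
  shift 2
  mkdir -p "$out_dir"
  for file in "$in_dir"/*.jpg "$in_dir"/*.jpeg "$in_dir"/*.png; do
    [ -e "$file" ] || continue      # skip patterns that matched nothing
    if ! "$@" "$file" "$out_dir"; then
      echo "failed: $file" >&2
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

Continuing past individual failures is a deliberate choice here, so that one unreadable file on a network share does not block the rest of the batch.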
See Also
- Core Primitives — Low-level muapi-cli wrappers
- Recipe Pack — Pre-built workflow recipes
- Expert Library — Domain-specific skills
- MCP Server — Protocol integration for Claude/Cursor
Schema Reference
Related topics: CLI Commands Reference, MCP Server Setup, Architecture Overview
The Schema Reference system is the validation and discovery layer for Generative-Media-Skills. It provides a centralized configuration that core scripts use at runtime to ensure type safety, endpoint accuracy, and parameter validation across all media generation operations.
Overview
The system centers on schema_data.json, a structured configuration file that powers the entire muapi-cli ecosystem. This schema serves three primary functions at runtime:
| Function | Description |
|---|---|
| Model ID Validation | Ensures requested models exist in the platform |
| Endpoint Resolution | Automatically maps model names to API endpoints |
| Parameter Checking | Validates supported aspect_ratio, resolution, and duration values |
Source: README.md:schema_reference
Architecture
graph TD
A[CLI Command] --> B[Schema Data Validation]
B --> C{Valid?}
C -->|Yes| D[Resolve Endpoint]
C -->|No| E[Error: Invalid Model/Parameter]
D --> F[Execute API Call]
F --> G[Return Result]
H[Schema Data JSON] --> B
H --> D
The schema acts as a contract between the CLI interface and the underlying muapi.ai platform, ensuring all requests are properly formatted before execution.
Core Schema Functions
Model Discovery
The CLI provides commands to discover all available models via the schema:
# List all models
muapi models list
# List models by category
muapi models list --category video --output-json
# Same category filter applied to image models
muapi models list --category image --output-json
Source: README.md:schema_reference and README.md:schema_commands
Validation Rules
The schema enforces validation across multiple dimensions:
| Validation Type | Purpose | Example Values |
|---|---|---|
| model_id | Ensures model exists | flux-dev, kling-v3.0-pro, seedance-2.0 |
| aspect_ratio | Image/video dimensions | 1:1, 16:9, 9:16, 4:3 |
| resolution | Output quality | 1k, 2k, 4k, 1024x1024 |
| duration | Video length in seconds | 5, 10, 15, 30 |
Source: schema_data.json:validation_rules
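A pre-flight check against these rules can catch bad parameters before a request is sent. The sketch below hard-codes the example values from the table above. The authoritative lists live in schema_data.json and may differ per model, and all three function names are hypothetical.

```bash
# Allowed values are copied from the example column above (an assumption;
# the real lists come from schema_data.json).
is_valid_aspect_ratio() {
  case "$1" in
    1:1|16:9|9:16|4:3) return 0 ;;
    *) return 1 ;;
  esac
}

is_valid_duration() {
  case "$1" in
    5|10|15|30) return 0 ;;
    *) return 1 ;;
  esac
}

# validate_video_params ASPECT_RATIO DURATION
# Reports every problem on stderr and returns non-zero if any check fails.
validate_video_params() {
  local status=0
  if ! is_valid_aspect_ratio "$1"; then
    echo "unsupported aspect_ratio: $1" >&2
    status=1
  fi
  if ! is_valid_duration "$2"; then
    echo "unsupported duration: $2" >&2
    status=1
  fi
  return "$status"
}
```

Checking both parameters before returning lets the caller see every problem in one pass instead of fixing them one at a time.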
MCP Server Schema Integration
When running muapi as a Model Context Protocol server, all tools are exposed with full JSON Schema input/output definitions. The schema definitions enable:
- Type Checking: Automatic validation of tool inputs
- Auto-completion: IDEs can suggest valid parameters
- Documentation: Rich descriptions for each tool parameter
Tool Schemas
The MCP server exposes 19 structured tools with typed schemas. The table below shows eight of them; the Agent Integration Guide lists all 19:
| Tool | Category | Schema Purpose |
|---|---|---|
| muapi_image_generate | Media | Text-to-image generation (14 models) |
| muapi_image_edit | Media | Image-to-image editing (11 models) |
| muapi_video_generate | Media | Text-to-video generation (13 models) |
| muapi_video_from_image | Media | Image-to-video conversion (16 models) |
| muapi_audio_create | Media | Music generation via Suno |
| muapi_enhance_upscale | Enhancement | AI-powered image upscaling |
| muapi_enhance_bg_remove | Enhancement | Background removal |
| muapi_edit_lipsync | Editing | Lip sync to audio |
Source: README.md:mcp_server_tools
Runtime Integration
Core scripts integrate with the schema at multiple points:
graph LR
A[setup.sh] --> B[Configure API Key]
A --> C[Test Connectivity]
D[check-result.sh] --> E[Poll for Results]
D --> F[Async Status Check]
G[edit-image.sh] --> H[Validate Image URL]
G --> I[Apply Model Schema]
J[enhance-image.sh] --> K[Validate Operation Type]
J --> L[Apply Enhancement Schema]
Script Integration Points
Each core script validates inputs against the schema before execution:
| Script | Schema Usage |
|---|---|
| core/platform/setup.sh | API key configuration and validation |
| core/platform/check-result.sh | Request ID format validation |
| core/edit/edit-image.sh | Model selection and parameter validation |
| core/edit/enhance-image.sh | Operation type and parameter validation |
Source: core/platform/SKILL.md:scripts, core/edit/SKILL.md:scripts
Configuration
Environment Variables
The schema system relies on the following environment configuration:
# Set via setup.sh
MUAPI_API_KEY=your-api-key-here
# Config file location: ~/.muapi/config.json
Source: core/platform/SKILL.md:requirements
Schema File Location
The schema_data.json file is located at the repository root and is loaded by core scripts at runtime. The file structure follows this pattern:
{
"models": { ... },
"endpoints": { ... },
"parameters": {
"aspect_ratio": [...],
"resolution": [...],
"duration": [...]
}
}
Common Validation Errors
Based on community issues, common schema-related errors include:
| Error | Cause | Resolution |
|---|---|---|
| Invalid model ID | Model not in schema | Run muapi models list to see valid options |
| Unsupported parameter | Parameter value not in allowed list | Check schema for valid values |
| Endpoint resolution failure | Model missing endpoint mapping | Verify schema_data.json is up to date |
Source: GitHub Issue #38
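The first error in this table can be caught locally by checking a model ID against a list of known IDs before submitting a request. The sketch below reads one ID per line on standard input. How to reduce the output of muapi models list to that form depends on its JSON shape, which this manual does not show, so that step is left to the caller. model_in_list and require_model are hypothetical names.

```bash
# model_in_list MODEL_ID
# Reads one model ID per line on stdin and succeeds on an exact match.
model_in_list() {
  grep -Fxq -- "$1"
}

# require_model MODEL_ID
# Same check, but prints the resolution hint from the table on failure.
require_model() {
  if ! model_in_list "$1"; then
    echo "Invalid model ID: $1 (run 'muapi models list' to see valid options)" >&2
    return 1
  fi
}
```

The -F and -x flags make grep compare whole lines as fixed strings, so an ID such as kling-v3.0-pro is matched literally and a prefix such as flux does not match flux-dev.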
Best Practices
- Always validate before requesting: Use muapi models list to confirm model availability before generating
- Check parameter constraints: Verify aspect_ratio, resolution, and duration are supported
- Use JSON output for automation: --output-json flag provides schema-compliant output for piping
- Keep schema updated: Pull latest changes when new models are added to the platform
Agent Integration Guide
Related topics: MCP Server Setup, Getting Started, CLI Commands Reference
This guide covers all methods for integrating Generative-Media-Skills with AI agents including Claude Code, Cursor, Gemini CLI, and other MCP-compatible agents.
Overview
Generative-Media-Skills provides a CLI-first architecture designed for agentic workflows. Instead of relying on GUIs or manual operations, agents interact with the system through structured CLI commands, the Model Context Protocol (MCP), or skill packages that they can read and execute.
The integration layer consists of three primary components:
| Component | Purpose | Best For |
|---|---|---|
| muapi-cli | Core CLI tool with structured JSON outputs | Direct terminal execution, shell pipelines |
| MCP Server | Model Context Protocol server | Claude Desktop, Cursor, MCP-compatible agents |
| Skill Packages | Pre-packaged workflows (SKILL.md + scripts) | Claude Code, Cursor, automated ingestion |
Source: README.md
Architecture Overview
graph TD
A[AI Agent] --> B[muapi-cli]
A --> C[MCP Server]
A --> D[Skill Packages]
B --> E[Structured JSON Output]
B --> F[Semantic Exit Codes]
B --> G[--jq Filtering]
C --> H[19 MCP Tools]
C --> I[JSON Schema Validation]
D --> J[41 Workflow Recipes]
D --> K[Expert Library Skills]
D --> L[Core Primitives]
E --> L
G --> M[Agentic Pipelines]
H --> M
J --> M
Supported Agents
The repository officially supports integration with:
- Claude Code — Direct terminal execution via tools + MCP server mode
- Cursor — MCP server mode for native tool calling
- Gemini CLI — Seamless integration as local scripts
- Windsurf — MCP-compatible integration
- Any MCP-compatible agent — Via the MCP server protocol
Source: README.md
Installation Methods
Method 1: Install muapi-cli
The core CLI tool is available via multiple package managers:
# via npm (recommended — no Python required)
npm install -g muapi-cli
# via pip
pip install muapi-cli
# or run without installing
npx muapi-cli --help
After installation, configure your API key:
# Interactive setup
muapi auth configure
# Or pass directly
muapi auth configure --api-key "YOUR_MUAPI_KEY"
# Get your key at https://muapi.ai/dashboard
Source: README.md
Method 2: Install Skill Packages
Install pre-packaged skills directly to your AI agent:
# Install all skills to your AI agent
npx skills add SamurAIGPT/Generative-Media-Skills --all
# Or install a specific skill
npx skills add SamurAIGPT/Generative-Media-Skills --skill muapi-media-generation
# Install to specific agents
npx skills add SamurAIGPT/Generative-Media-Skills --all -a claude-code -a cursor
Source: README.md
MCP Server Integration
The MCP server exposes all 19 generation tools directly to Claude Desktop, Cursor, or any MCP-compatible agent without requiring shell scripts.
Starting the MCP Server
muapi mcp serve
Claude Desktop Configuration
Add the following to ~/Library/Application Support/Claude/claude_desktop_config.json (the Claude Desktop configuration path on macOS):
{
"mcpServers": {
"muapi": {
"command": "muapi",
"args": ["mcp", "serve"],
"env": { "MUAPI_API_KEY": "your-key-here" }
}
}
}
Available MCP Tools
The server exposes 19 structured tools with full JSON Schema input/output definitions:
| Tool | Description | Category |
|---|---|---|
| muapi_image_generate | Text-to-image generation | Generation |
| muapi_image_edit | Image-to-image editing | Editing |
| muapi_video_generate | Text-to-video generation | Generation |
| muapi_video_from_image | Image-to-video animation | Generation |
| muapi_audio_create | Music generation via Suno | Audio |
| muapi_audio_from_text | Sound effects via MMAudio | Audio |
| muapi_enhance_upscale | AI upscaling | Enhancement |
| muapi_enhance_bg_remove | Background removal | Enhancement |
| muapi_enhance_face_swap | Face swap for image/video | Enhancement |
| muapi_enhance_ghibli | Ghibli style transfer | Enhancement |
| muapi_edit_lipsync | Lip sync to audio | Editing |
| muapi_edit_clipping | AI highlight extraction | Editing |
| muapi_predict_result | Poll prediction status | Utility |
| muapi_upload_file | Upload local file → URL | Utility |
| muapi_keys_list | List API keys | Account |
| muapi_keys_create | Create API key | Account |
| muapi_keys_delete | Delete API key | Account |
| muapi_account_balance | Get credit balance | Account |
| muapi_account_topup | Add credits via Stripe | Account |
Source: README.md
CLI Usage for Agents
Basic Generation Commands
# Generate an image
muapi image generate "a cyberpunk city at night" --model flux-dev
# Download the result automatically
muapi image generate "a sunset over mountains" --model hidream-fast --download ./outputs
# Extract just the URL (agent-friendly)
muapi image generate "product on white bg" --model flux-schnell --output-json --jq '.outputs[0]'
Async Pipeline Workflow
For long-running operations, submit async requests and poll for results:
# Submit async, capture request_id, poll when ready
REQUEST_ID=$(muapi video generate "a dog running on a beach" \
--model kling-master --no-wait --output-json --jq '.request_id' | tr -d '"')
# ... do other work ...
muapi predict wait "$REQUEST_ID" --download ./outputs
Chaining Operations
# Pipe a prompt from another command
generate_prompt | muapi image generate - --model flux-dev
# Chain: upload → edit → download
URL=$(muapi upload file ./photo.jpg --output-json --jq '.url' | tr -d '"')
muapi image edit "make it look like a painting" --image "$URL" \
--model flux-kontext-pro --download ./outputs
Source: README.md
Skill Package Structure
Each skill in the repository follows a consistent structure:
library/[category]/[skill-name]/
├── SKILL.md # Description for agents to read
├── scripts/
│ ├── generate-[name].sh
│ └── [additional-scripts].sh
└── assets/ # Optional reference files
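Because every skill follows this layout, a short check can confirm that a directory is a well-formed skill before an agent ingests it. The sketch below verifies only what the tree above shows: a SKILL.md file and a scripts/ directory containing at least one .sh file. check_skill_dir is a hypothetical helper, not a script shipped with the repository.

```bash
# check_skill_dir DIR
# Verifies the layout shown above: SKILL.md plus a scripts/ directory with
# at least one .sh file. assets/ is optional and is not checked.
check_skill_dir() {
  local dir="$1"
  local script
  local found=1
  if [ ! -f "$dir/SKILL.md" ]; then
    echo "missing SKILL.md in $dir" >&2
    return 1
  fi
  if [ ! -d "$dir/scripts" ]; then
    echo "missing scripts/ in $dir" >&2
    return 1
  fi
  for script in "$dir"/scripts/*.sh; do
    if [ -e "$script" ]; then
      found=0
      break
    fi
  done
  if [ "$found" -ne 0 ]; then
    echo "no .sh scripts in $dir/scripts" >&2
    return 1
  fi
}
```

The function returns zero only when all three conditions hold, so it can gate an install step directly.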
SKILL.md Format
Each skill includes metadata that agents parse:
Troubleshooting
Related topics: CLI Commands Reference, Getting Started, Schema Reference
This page covers common issues, error conditions, and resolution steps for the Generative-Media-Skills repository. The troubleshooting content is organized by system component: API configuration, async generation workflows, media editing operations, and environment setup.
Doramagic Pitfall Log
Source-linked risks stay visible on the manual page so the preview does not read like a recommendation.
Found 17 structured pitfall item(s), including 1 high/blocking item(s). Top priority: Security or permission risk - Security or permission risk requires verification.
1. Security or permission risk: Security or permission risk requires verification
- Severity: high
- Finding: Project evidence flags a security or permission risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_e801ed325bcf4fbbb8e0d9cac02b5f7f | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/89
2. Identity risk: Identity risk requires verification
- Severity: medium
- Finding: Project evidence flags an identity risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: identity.distribution | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills
3. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_ec1cdc92d8c84b2bbce43cf37a668443 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/44
4. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_c7a0e07d61b547aba3280ac82ac305e2 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/43
5. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_c6ace8e72f95491f945200849e083d96 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/46
6. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_de5cf59443c74838a8d10d1ecbec9457 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/45
7. Installation risk: Installation risk requires verification
- Severity: medium
- Finding: Project evidence flags an installation risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_8082f7c5e7be496c89fac789d932e74c | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/34
8. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.host_targets | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills
9. Configuration risk: Configuration risk requires verification
- Severity: medium
- Finding: Project evidence flags a configuration risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_14498a89412f40ceb03d61889ea96de2 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/38
10. Capability evidence risk: Capability evidence risk requires verification
- Severity: medium
- Finding: README/documentation is current enough for a first validation pass.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: capability.assumptions | github_repo:645381450 | https://github.com/SamurAIGPT/Generative-Media-Skills
11. Runtime risk: Runtime risk requires verification
- Severity: medium
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_36c10c9b7ece4b10af4873b655817add | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/27
12. Runtime risk: Runtime risk requires verification
- Severity: medium
- Finding: Project evidence flags a runtime risk. Review the linked source before relying on this workflow.
- User impact: May increase setup, validation, or first-run risk for the user.
- Recommended check: Reproduce the official install and quickstart path in an isolated environment.
- Evidence: community_evidence:github | cevd_a8e460757f114a9d8e4357758c834524 | https://github.com/SamurAIGPT/Generative-Media-Skills/issues/54
Source: Doramagic discovery, validation, and Project Pack records
Community Discussion Evidence
These external discussion links are review inputs, not standalone proof that the project is production-ready.
Open the linked issues or discussions before treating the pack as ready for your environment.
Doramagic exposes project-level community discussion separately from official documentation. Review these links before using Generative-Media-Skills with real data or production workflows.
- Optional: add a 'publish to Vynly' skill after media generation? - github / github_issue
- how to use gpu instead cpu - github / github_issue
- What's the best way to improve answer time ? - github / github_issue
- Switch default language - github / github_issue
- 500 Internal Server Error - github / github_issue
- npm run dev hang in certain point - github / github_issue
- is possible add spanish documents and question / answer in spanish to? - github / github_issue
- Model download error on 100% progress - github / github_issue
- weird response - github / github_issue
- Server install pre-requisites on Ubuntu - github / github_issue
- Error downloading model - github / github_issue
- npm run dev error - github / github_issue
Source: Project Pack community evidence and pitfall evidence