OpenAI Compatible API MCP Server
This project is a Model Context Protocol (MCP) server that integrates various tools based on OpenAI compatible APIs and SiliconFlow APIs. It offers a range of functionalities including image generation, image editing, speech synthesis (TTS), speech-to-text (STT), and video generation.
Core Features
- Image Generation: Generate images using OpenAI compatible APIs (e.g., DALL-E 3, gpt-image-1, or Stable Diffusion-style models).
- Image Editing: Edit images using OpenAI compatible APIs (e.g., gpt-image-1).
- Speech Synthesis (TTS): Convert text into speech using OpenAI compatible APIs.
- Speech-to-Text (STT): Transcribe audio files into text using OpenAI compatible APIs.
- Video Generation: Submit text-to-video or image-to-video generation tasks using the SiliconFlow API.
- Background Task Processing: For time-consuming tasks (like image generation/editing with specific models, video generation), the server accepts the task and processes it asynchronously. Notifications are sent via configured channels (OneBot or Telegram) upon completion or failure.
- File Upload: Supports uploading generated images and videos to a configured ImgBed service based on MarSeventh/CloudFlare-ImgBed.
- Local Storage: All generated media files are saved locally.
Prerequisites
- Node.js (recommended version >= 18.x)
- Relevant API keys (OpenAI/compatible API key, SiliconFlow API key, etc.)
Installation and Setup
Clone the project (if you haven't already):
git clone <project_repository_url>
cd openapi-integrator-mcp
Install dependencies:
npm install
Build the project:
npm run build
The build output will be in the `build` directory.
Configuration
Configure environment variables via a `.env` file in the project root. If the `.env` file does not exist, create it based on `.env.example` (if provided) or the lists below. A sample `.env` is sketched after the variable lists.
Key Environment Variables:
- `OPENAI_API_KEY`: (Required) Your OpenAI API key or a compatible API key.
- `OPENAI_API_BASE_URL`: (Optional) Base URL for the OpenAI compatible API. Defaults to `https://api.openai.com`.
- `REQUEST_TIMEOUT`: (Optional) API request timeout in milliseconds. Defaults to `180000` (3 minutes).
- `OUTPUT_DIR`: (Optional) Base output directory for generated media files. Defaults to `./output`. Subdirectories such as `images`, `audio`, and `video` will be created here.
Default Model Configuration for Image Generation/Editing:
- `DEFAULT_IMAGE_MODEL`: (Optional) Default image generation model. Defaults to `dall-e-3`.
- `DEFAULT_EDIT_IMAGE_MODEL`: (Optional) Default image editing model. Defaults to `gpt-image-1`.
- `DEFAULT_IMAGE_WIDTH`: (Optional) Default image width (for non-DALL-E 3/gpt-image-1 models). Defaults to `1024`.
- `DEFAULT_IMAGE_HEIGHT`: (Optional) Default image height (for non-DALL-E 3/gpt-image-1 models). Defaults to `768`.
- `DEFAULT_IMAGE_STEPS`: (Optional) Default number of image generation steps (for non-DALL-E 3/gpt-image-1 models). Defaults to `20`.
Default Speech Configuration:
- `DEFAULT_SPEECH_MODEL`: (Optional) Default speech synthesis model. Defaults to `tts-1`.
- `DEFAULT_SPEECH_VOICE`: (Optional) Default speech synthesis voice. Defaults to `alloy`.
- `DEFAULT_SPEECH_SPEED`: (Optional) Default speech synthesis speed. Defaults to `1.0`.
- `DEFAULT_TRANSCRIPTION_MODEL`: (Optional) Default speech transcription model. Defaults to `whisper-1`.
SiliconFlow Video Generation Configuration:
- `SILICONFLOW_API_KEY`: (Optional; required if using video generation) SiliconFlow API key.
- `SILICONFLOW_BASE_URL`: (Optional) SiliconFlow API base URL. Defaults to `https://api.siliconflow.cn`.
- `SILICONFLOW_VIDEO_MODEL`: (Optional) Default video generation model. Defaults to `Wan-AI/Wan2.1-T2V-14B-Turbo`.
Notification Configuration (configure at least one channel to receive background task results):
- `ONEBOT_HTTP_URL`: (Optional) OneBot HTTP POST URL (e.g., `http://localhost:5700`).
- `ONEBOT_ACCESS_TOKEN`: (Optional) OneBot access token (if required).
- `ONEBOT_MESSAGE_TYPE`: (Optional) OneBot message type (`private` or `group`).
- `ONEBOT_TARGET_ID`: (Optional) OneBot target user ID or group ID.
- `TELEGRAM_BOT_TOKEN`: (Optional) Telegram bot token.
- `TELEGRAM_CHAT_ID`: (Optional) Telegram chat ID.
ImgBed Configuration (Optional, for `MarSeventh/CloudFlare-ImgBed`):
- `CF_IMGBED_UPLOAD_URL`: (Optional) The upload URL of your deployed `CloudFlare-ImgBed` instance (e.g., `https://your-worker.your-domain.workers.dev/upload`).
- `CF_IMGBED_API_KEY`: (Optional) The `AUTH_KEY` (also referred to as `authCode`) configured for your `CloudFlare-ImgBed` instance.
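For reference, a minimal `.env` might look like the sketch below. All values are placeholders; everything except `OPENAI_API_KEY` is optional, so set only what you need:

```env
# OpenAI-compatible API (required)
OPENAI_API_KEY=your-api-key
# OPENAI_API_BASE_URL=https://api.openai.com

# Where generated media is saved (default: ./output)
# OUTPUT_DIR=./output

# Only needed for video generation
# SILICONFLOW_API_KEY=your-siliconflow-key

# One notification channel for background task results (Telegram shown here)
# TELEGRAM_BOT_TOKEN=your-bot-token
# TELEGRAM_CHAT_ID=your-chat-id
```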
Running the Server
Production Mode:
npm start
This runs the compiled JavaScript from the `build` directory.
Development Mode (uses ts-node-dev for hot reloading):
npm run dev
Once started, the MCP server will listen for requests on standard input/output (stdio).
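Because the server communicates over stdio, any MCP-capable client can spawn and query it. The following is a minimal TypeScript sketch using the official `@modelcontextprotocol/sdk` client; the entry point path and client name are assumptions, so adjust them to your setup:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// Launch the built server as a child process and talk to it over stdio.
// The entry point path (build/index.js) is an assumption; check package.json.
const transport = new StdioClientTransport({
  command: "node",
  args: ["build/index.js"],
});

const client = new Client({ name: "example-client", version: "1.0.0" });
await client.connect(transport);

// List the tools the server exposes (generate_image, edit_image, ...).
const { tools } = await client.listTools();
console.log(tools.map((t) => t.name));
```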
Available MCP Tools
1. generate_image
Generates an image.
- Function: Creates an image based on a text prompt using OpenAI compatible APIs. Supports various models like DALL-E 3, gpt-image-1, and others. For DALL-E 3/gpt-image-1, tasks are processed in the background with results sent via notification. Other models return results synchronously.
- Key Parameters:
  - `prompt` (string, required): Description of the image.
  - `model` (string, optional): Model to use; defaults to `DEFAULT_IMAGE_MODEL` from config.
  - `n` (number, optional): Number of images to generate.
  - (DALL-E 3/gpt-image-1 specific): `quality`, `size`, `background`, `moderation`.
  - (Other models specific): `width`, `height`, `steps`.
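Reusing the `client` from the sketch in Running the Server, a call might look like the following; the prompt and parameter values are purely illustrative:

```typescript
const result = await client.callTool({
  name: "generate_image",
  arguments: {
    prompt: "A watercolor painting of a lighthouse at dawn",
    model: "dall-e-3", // omit to use DEFAULT_IMAGE_MODEL
    n: 1,
    size: "1024x1024",
  },
});
console.log(result.content);
```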
2. edit_image
Edits an image.
- Function: Modifies an existing image based on a text prompt using models such as `gpt-image-1`. Tasks are processed in the background, with results sent via notification.
- Key Parameters:
  - `image` (string, required): Path or URL of the image to edit.
  - `prompt` (string, required): Editing instructions.
  - `model` (string, optional): Model to use; defaults to `DEFAULT_EDIT_IMAGE_MODEL` from config.
  - `n` (number, optional): Number of images to generate.
  - `size` (string, optional): Output image size.
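An illustrative call, again assuming the `client` from earlier (the image path is a placeholder):

```typescript
await client.callTool({
  name: "edit_image",
  arguments: {
    image: "./output/images/lighthouse.png", // local path or URL
    prompt: "Add a flock of seagulls to the sky",
  },
});
```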
3. generate_speech
Text-to-Speech.
- Function: Converts text into an audio file (MP3 format) and saves it locally.
- Key Parameters:
  - `input` (string, required): Text to convert to speech.
  - `voice` (string, required): Voice to use.
  - `model` (string, optional): Model to use; defaults to `DEFAULT_SPEECH_MODEL` from config.
  - `speed` (number, optional): Speech speed; defaults to `DEFAULT_SPEECH_SPEED` from config.
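For example (illustrative values; the resulting MP3 is saved under `OUTPUT_DIR`):

```typescript
await client.callTool({
  name: "generate_speech",
  arguments: {
    input: "Hello from the MCP server!",
    voice: "alloy",
    speed: 1.0,
  },
});
```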
4. transcribe_audio
Speech-to-Text.
- Function: Transcribes an audio file into text.
- Key Parameters:
  - `file` (string, required): Local path or URL of the audio file.
  - `model` (string, optional): Model to use; defaults to `DEFAULT_TRANSCRIPTION_MODEL` from config.
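An illustrative call (the file path is a placeholder):

```typescript
const transcript = await client.callTool({
  name: "transcribe_audio",
  arguments: { file: "./output/audio/speech.mp3" },
});
console.log(transcript.content);
```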
5. generate_video
Generates a video.
- Function: Submits a video generation task to the SiliconFlow API. Supports text-to-video and image-to-video. Tasks are processed in the background with results sent via notification.
- Key Parameters:
  - `prompt` (string, required): Description of the video.
  - `image_size` (string, required): Video dimensions/aspect ratio (e.g., `1280x720`).
  - `model` (string, optional): Video model to use; defaults to `SILICONFLOW_VIDEO_MODEL` from config. Supported models include `Wan-AI/Wan2.1-T2V-14B`, `Wan-AI/Wan2.1-T2V-14B-Turbo`, `Wan-AI/Wan2.1-I2V-14B-720P`, and `Wan-AI/Wan2.1-I2V-14B-720P-Turbo`.
  - `image` (string, optional): Image URL or Base64-encoded data; required for image-to-video models.
  - `negative_prompt` (string, optional): Negative prompt.
  - `seed` (integer, optional): Random seed.
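Since video tasks run in the background, the call returns once the task is accepted, and the finished video arrives via a configured notification channel. An illustrative text-to-video call:

```typescript
await client.callTool({
  name: "generate_video",
  arguments: {
    prompt: "A drone shot over a misty mountain lake at sunrise",
    image_size: "1280x720",
    model: "Wan-AI/Wan2.1-T2V-14B-Turbo",
  },
});
```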
Notification Feature
For tasks processed in the background (e.g., specific image generation/editing, video generation), results will be sent via notifications configured through:
- OneBot: used if `ONEBOT_HTTP_URL`, `ONEBOT_MESSAGE_TYPE`, and `ONEBOT_TARGET_ID` are configured.
- Telegram: used if `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` are configured.
Please ensure at least one notification method is configured to receive results for background tasks.
License
This project is licensed under the MIT License. See the LICENSE file for details.