MCP-Server-MSPaint--AgenticAI

MS Paint does not expose an API, which an MCP server would normally need in order to connect an application to an MCP client. Instead, guided by a system prompt and the user query, the agent automates the Paint GUI: it opens the app and types text into it, so the LLM reaches the application through the MCP client and MCP server.

MCP_Server_Client_MSPaint_AgenticAI

Prompting is the key to this project. The prompt is designed so that the LLM can call the Paint tools through the MCP client automatically, without any manual commands.

Demo

Demonstration_MCP-Paint-Perception-Memory-Decision-Action :

Demonstration_MCP-Paint-Planning-Reasoning :

https://github.com/user-attachments/assets/90147385-7243-4343-ab11-de200c30e77c

Motivation

Building an MCP server for an application normally requires that the application expose an API. MS Paint has none. Here, I work around that by automating the Paint GUI, so that an application without an API can still be connected to an MCP client. The scenario: I want tasks performed in Paint automatically, just by asking an LLM a question. This is where MCP comes into play. The MCP server exposes the necessary tools, and the LLM requests them through the MCP client.

What It Does

I wrote an MCP server that runs the necessary tools when the LLM requests them through the MCP client. I define a system prompt and a user query. With these prompts and the MCP client, the agent opens Paint automatically, without any manual Paint commands, draws a rectangle in the Paint window, and types the answer to the user query inside the rectangle.
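The flow described above can be sketched without any of the real dependencies. This is a minimal illustration, not the repository's code: the tool bodies are stubs that only record what they would do, and the `FUNCTION_CALL: name|arg|...` reply format and the `dispatch` helper are hypothetical. Only the tool names come from this README; the real tools drive Paint through GUI automation.

```python
import inspect
from typing import Callable, Dict, List

# Record of actions; the real tools would drive the Paint GUI instead.
actions: List[str] = []

def open_paint() -> str:
    actions.append("open_paint")
    return "Paint opened"

def draw_rectangle(x1: int, y1: int, x2: int, y2: int) -> str:
    actions.append(f"draw_rectangle({x1}, {y1}, {x2}, {y2})")
    return "Rectangle drawn"

def add_text_in_paint(text: str) -> str:
    actions.append(f"add_text_in_paint({text!r})")
    return "Text added"

# Registry the agent consults when the LLM asks for a tool.
TOOLS: Dict[str, Callable[..., str]] = {
    "open_paint": open_paint,
    "draw_rectangle": draw_rectangle,
    "add_text_in_paint": add_text_in_paint,
}

def dispatch(llm_line: str) -> str:
    """Parse one hypothetical 'FUNCTION_CALL: name|arg|...' reply and run the tool."""
    prefix = "FUNCTION_CALL:"
    if not llm_line.startswith(prefix):
        raise ValueError(f"not a tool call: {llm_line!r}")
    name, *raw_args = [part.strip() for part in llm_line[len(prefix):].split("|")]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    tool = TOOLS[name]
    # Convert each raw string using the parameter's type annotation (int or str).
    params = inspect.signature(tool).parameters.values()
    args = [param.annotation(raw) for param, raw in zip(params, raw_args)]
    return tool(*args)
```

Feeding the three replies `FUNCTION_CALL: open_paint`, `FUNCTION_CALL: draw_rectangle|100|200|500|400`, and `FUNCTION_CALL: add_text_in_paint|Answer: 42` through `dispatch` in order reproduces the open, draw, type sequence. The coordinates here are arbitrary placeholders.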

I extended this task in two further stages, so there are three sections in total:
Section 1: In the system prompt, I list the tools to call and the instructions to follow while executing the user query. The agent successfully opens Paint, draws a rectangle, and types the text inside it.

Section 2: I redesign the system prompt so that it satisfies all the rules mentioned in the file. This ensures that:

  • The model reasons step by step.
  • The prompt enforces a predictable output format and separates the reasoning steps from the tool-use and computation steps.
  • The prompt works in multi-turn settings, instructs the model to self-verify, and encourages the model to identify the type of reasoning used.
  • The prompt specifies what to do in uncertain situations and when a tool fails.

I then repeat the task from Section 1 with this prompt.

  • MCP Server: MCP-Paint-Planning-Reasoning/example2-3.py
  • Client and the Agent: MCP-Paint-Planning-Reasoning/talk2mcp-3.py
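
One way to picture the "predictable output format" rule from Section 2 is a reply schema that keeps reasoning, tool use, and self-verification in separate fields, with a defined fallback when the reply is malformed. The sketch below is an assumption for illustration only: the JSON field names (`reasoning`, `reasoning_type`, `tool`, `args`, `self_check`) and the `parse_step` helper are hypothetical and are not taken from the repository's prompt.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class Step:
    reasoning: str                      # the model's explanation for this step
    reasoning_type: str                 # e.g. "arithmetic", "lookup", "planning"
    tool: Optional[str] = None          # tool to call, or None for a pure reasoning step
    args: Dict[str, Any] = field(default_factory=dict)
    self_check: Optional[str] = None    # the model's own verification note

def parse_step(raw: str) -> Step:
    """Parse one model turn; return an 'error' step if the reply is malformed."""
    try:
        data = json.loads(raw)
        return Step(
            reasoning=str(data["reasoning"]),
            reasoning_type=str(data.get("reasoning_type", "unspecified")),
            tool=data.get("tool"),
            args=dict(data.get("args", {})),
            self_check=data.get("self_check"),
        )
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Fallback for uncertain or broken replies: no tool is called.
        return Step(reasoning="unparseable reply", reasoning_type="error")
```

Because a malformed reply becomes an explicit `error` step rather than an exception, the agent loop can decide whether to re-prompt or stop, which matches the rule about specifying actions when something fails.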

Section 3: Here, I create four modules for the four cognitive layers: Perception, Memory, Decision-Making, and Action. The talk2mcp2.py file configures the whole agent by integrating these modules. I then repeat Section 2, using pydantic for all inputs and outputs and updating the system prompt with the pydantic-related changes.
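
The four-layer split can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's modules: it uses standard-library dataclasses as a dependency-free stand-in for the pydantic models the project uses, and every class, field, and function name here (`Perception`, `Memory`, `Decision`, `perceive`, `decide`, `act`) is hypothetical. The action layer is a stub that only records state.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Perception:
    user_query: str
    wants_paint: bool   # whether the query asks for output in Paint

@dataclass
class Memory:
    facts: List[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        self.facts.append(fact)

@dataclass
class Decision:
    tool: str

def perceive(query: str) -> Perception:
    """Perception layer: extract what matters from the raw query."""
    return Perception(user_query=query, wants_paint="paint" in query.lower())

def decide(perception: Perception, memory: Memory) -> Decision:
    """Decision layer: pick the next tool from the perception and what is remembered."""
    if not perception.wants_paint:
        return Decision(tool="final_answer")
    if "open_paint" not in memory.facts:
        return Decision(tool="open_paint")
    if "draw_rectangle" not in memory.facts:
        return Decision(tool="draw_rectangle")
    return Decision(tool="add_text_in_paint")

def act(decision: Decision, memory: Memory) -> str:
    """Action layer (stub): record the tool as done instead of touching the GUI."""
    memory.remember(decision.tool)
    return f"ran {decision.tool}"
```

Calling `decide` and `act` in a loop walks through `open_paint`, `draw_rectangle`, and `add_text_in_paint` in that order, because each completed action is written to memory before the next decision is made.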

Getting Started

We start with installation and setup. Clone the repository and open the folder in VS Code or Cursor.

Clone this repository into a local folder:

    git clone https://github.com/dhanushpittala11/MCP-Server-MSPaint--AgenticAI.git

Create a .env file to store the Gemini API key.

Create a requirements.txt file.

Set up and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate

On Windows, where MS Paint runs, the activation command is `.venv\Scripts\activate` instead.

Install the required libraries and packages:

    pip install -r requirements.txt

Usage
Run the script for Section 1 using:
  ```bash
  python talk2mcp-2.py
  ```
Run the script for Section 2 using:
  ```bash
  python talk2mcp-3.py
  ```
Run the script for Section 3 using:
  ```bash
  python talk2mcp2.py
  ```
Directory Tree
Bug / Feature Request

If you find a bug (the agent could not handle the query or gave undesired results), please open an issue on the GitHub repository.

If you'd like to request a new feature, feel free to open an issue as well.

Techstack Used

| Layer | Technology/Library | Purpose |
|---|---|---|
| LLM | Google Generative AI (Gemini 2.0 Flash) | Natural language reasoning and function planning |
| Server Framework | FastMCP (mcp.server.fastmcp) | Model Context Protocol (MCP) tool interface for the LLM-agent system |
| MCP Transport | StdioServerParameters, ClientSession, stdio_client | Communicate with MCP tools over a subprocess (example2-4.py) |
| UI Automation | pywinauto | Control the Paint GUI (open, click, type, drag mouse, etc.) |
| System APIs | win32gui, win32con, win32api, GetSystemMetrics | Handle Paint window placement, interaction, screen resolution |
| Math Computation | math | Exponentials, factorials, trigonometric operations |
| Async Handling | asyncio, await, run_in_executor | Concurrency and tool execution orchestration |
| Environment Management | dotenv | Load API keys securely from .env |
| Utilities | os, sys, time, traceback, functools, concurrent.futures | Miscellaneous operations, error tracing, retries |

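
The "Async Handling" row pairs `asyncio` with `run_in_executor`, a common way to keep a blocking call from stalling the event loop. The sketch below shows the general pattern with a timeout. It is an assumption about how such a call could be wrapped, not the repository's code: `slow_gui_call` and `call_blocking` are hypothetical names, and the sleep stands in for real GUI or API work.

```python
import asyncio
import time
from functools import partial
from typing import Any, Callable

def slow_gui_call(text: str, delay: float) -> str:
    """Stand-in for a blocking call such as GUI automation (hypothetical)."""
    time.sleep(delay)
    return f"typed: {text}"

async def call_blocking(func: Callable[..., Any], *args: Any, timeout: float = 2.0) -> Any:
    """Run a blocking function in the default thread pool, bounded by a timeout."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, partial(func, *args))
    # Raises asyncio.TimeoutError if the call does not finish in time.
    return await asyncio.wait_for(future, timeout=timeout)

async def main() -> str:
    return await call_blocking(slow_gui_call, "hello", 0.01)

if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that a timeout only abandons the result. The worker thread itself keeps running until the blocking function returns, so long-running GUI actions still need their own safeguards.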
Tools Implemented

| Tool Name | Functionality |
|---|---|
| open_paint() | Launches Paint and maximizes it on the secondary screen |
| draw_rectangle() | Draws a rectangle using mouse events |
| add_text_in_paint() | Types text inside Paint via GUI automation |
| add, subtract, etc. | Math tools: add, subtract, factorial, etc. |
| strings_to_chars_to_int() | Converts a string to ASCII values |
| int_list_to_exponential_sum() | Sums the exponentials of a list |
| fibonacci_numbers() | Generates the Fibonacci sequence |
| create_thumbnail() | Creates an image thumbnail using PIL |

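
The three numeric helpers in the table are small enough to sketch in plain Python. The signatures and edge-case behavior below are assumptions based only on the one-line descriptions above (for example, whether the Fibonacci sequence starts at 0, and that the argument is a count of terms). In the actual server these would be registered as MCP tools rather than defined as bare functions.

```python
import math
from typing import List

def strings_to_chars_to_int(text: str) -> List[int]:
    """Return the code of each character (equal to ASCII for ASCII input)."""
    return [ord(ch) for ch in text]

def int_list_to_exponential_sum(values: List[int]) -> float:
    """Return the sum of e**v over all values in the list."""
    return sum(math.exp(v) for v in values)

def fibonacci_numbers(n: int) -> List[int]:
    """Return the first n Fibonacci numbers, starting from 0 (assumed convention)."""
    sequence: List[int] = []
    a, b = 0, 1
    for _ in range(n):
        sequence.append(a)
        a, b = b, a + b
    return sequence
```

Chaining the first two gives the kind of multi-step query an agent can answer tool by tool: convert a word to its character codes, then sum the exponentials of those codes.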
Architecture
              ┌─────────────────────────────┐
              │      Google Gemini API      │
              └──────────────▲──────────────┘
                             │
            ┌────────────────┴────────────────┐
            │                                 │
┌────────────────────────┐        ┌────────────────────────┐
│     talk2mcp-3.py      │        │      talk2mcp2.py      │
│    Agent and Client    │        │    Agent and Client    │
└───────────▲────────────┘        └───────────▲────────────┘
            │                                 │
┌───────────┴────────────┐        ┌───────────┴────────────┐
│     example2-3.py      │        │     example2-4.py      │
│ Tool Server (FastMCP)  │        │ Tool Server (FastMCP)  │
└───────────┬────────────┘        └───────────┬────────────┘
            │                                 │
       Paint GUI ←―――― Draw + Type in Paint ―――→ Math/ASCII/Image Tools
License
                GNU GENERAL PUBLIC LICENSE
                   Version 3, 29 June 2007

Copyright (C) 2007 Free Software Foundation, Inc. https://fsf.org/
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

                        Preamble

The GNU General Public License is a free, copyleft license for
software and other kinds of works.

Team

Dhanush Pittala - @Linkedin - dhanushpittala05@gmail.com

Credits

The School of AI (TSAI) - EAGv1 by Rohan Shravan