selenium-mcp-server-ai

This project implements a Selenium Master Control Program (MCP) server that utilizes AI to convert natural language test steps into Selenium WebDriver commands. Built with Java and Maven, it includes a TestNG client for logging test results. The server supports local or remote LLM endpoints and is configurable via property files.

GitHub Stars

1

User Rating

Not Rated

Favorites

0

Views

2

Forks

0

Issues

0

README

Selenium MCP Server with AI

This JAVA MAVEN TESTNG Project implements a Selenium Master Control Program (MCP) server in Java that executes browser automation tasks by converting natural language test steps into Selenium WebDriver commands using a Language Model (LLM). The server supports local Ollama or remote LLM endpoints and is configurable via property files. A TestNG-based client is included to send test prompts and log results.

Project Structure

  • src/main/java/McpServer.java: Main server class that handles HTTP requests, LLM interaction, and Selenium command execution.
  • src/main/java/LlmClient.java: Handles communication with the LLM (Ollama or remote API).
  • src/test/java/SeleniumTestClient.java: TestNG client that sends test steps to the server and logs results.
  • config.properties: Configuration file for server settings (port, browser, LLM endpoint).
  • selenium_operations.properties: Defines supported Selenium operations and their required parameters.
  • pom.xml: Maven configuration with dependencies.

Application Flow:

  • Step 1: Client input natural language test steps to MCP Server
  • Step 2: MCP Server processes the input and LLM is called with an augmented PROMPT
  • Step 3: LLM Responds with command for MCP
  • Step 4: MCP Server uses the response and perform Selenium commands on browser

Prerequisites

  • Java: 17 or later
  • Maven: 3.6.0 or later
  • WebDriver: ChromeDriver or FirefoxDriver (matching your browser version) installed and added to PATH
  • LLM: Either:
    • Local Ollama running on http://localhost:11434 (default) with a model like llama3
    • A remote LLM endpoint (e.g., OpenAI-compatible API) with an API key
  • Browser: Chrome or Firefox installed
  • TestNG: Included via Maven, no separate installation needed

Setup

STEP 1:

  • Clone the Project:
  • git clone
  • cd selenium-mcp

Install Dependencies:Run the following command to download dependencies:
mvn install

Configure the Server:Edit config.properties in the project root to set:

  • server.port: HTTP server port (default: 8080)
  • browser: Browser type (chrome or firefox, default: chrome)
  • llm.endpoint: LLM API URL (default: http://localhost:11434 for Ollama)
  • llm.model: LLM model name (default: llama3)
  • llm.apiKey: API key for remote LLM (leave empty for Ollama)

Sample config.properties:

STEP 2:

  • start server using McpServer.java
  • Run selenium client file to see test execution SeleniumTestClient.java

Configure Selenium Operations:The selenium_operations.properties file defines supported Selenium commands and their parameters. It includes operations like start_browser, navigate, click_element, and more. Modify this file to add or change operations if needed.
Example

  • selenium_operations.properties:
  • start_browser=browser,headless
  • navigate=url
  • click_element=by,value,timeout
... (other operations)

Set Up Ollama (if using local LLM):

Install Ollama (see Ollama documentation).
Start Ollama server:

ollama serve

Ensure the specified model (e.g., llama3) is pulled:ollama pull llama3

Running the Server

Start the Server:Run the McpServer class:

mvn exec:java -Dexec.mainClass="McpServer"

Or, use your IDE to run McpServer.java.
The server will start on the configured port (default: http://localhost:8080) and listen for POST requests to /execute.

Verify Server:Check the console for:
MCP Server started on port 8080

Running the Client

Run Tests:Execute the TestNG client to send test steps and generate results:
mvn test

Or, run SeleniumTestClient.java in your IDE with the TestNG plugin.
The client sends a sample test prompt (e.g., navigating to https://example.com, clicking a link, and verifying text) and logs results using TestNG.

View Test Results:TestNG generates reports in the test-output/ directory (HTML and XML formats) with pass/fail status and details.
Example test steps in SeleniumTestClient.java:

Input natural language test steps = "Navigate to en.wikipedia.org. Search for India. Take a screenshot";

Example LLM Output
The LLM must convert test steps into a JSON array of commands matching the operations in selenium_operations.properties. Example:

[

{"type": "start_browser", "browser": "chrome", "headless": "false"},

{"type": "navigate", "url": "https://example.com"},

{"type": "click_element", "by": "xpath", "value": "//a[text()='More information']", "timeout": "10000"},

{"type": "get_element_text", "by": "xpath", "value": "//body", "timeout": "10000"},

{"type": "close_session"}

]

Supported Selenium Operations:
The server supports the following operations (defined in selenium_operations.properties):

  • start_browser: Launches a browser session (browser, headless).
  • navigate: Navigates to a URL (url).
  • find_element: Locates an element (by, value, timeout).
  • click_element: Clicks an element (by, value, timeout).
  • send_keys: Sends text to an element (by, value, text, timeout).
  • get_element_text: Retrieves element text (by, value, timeout).
  • hover: Hovers over an element (by, value, timeout).
  • drag_and_drop: Drags one element to another (by, value, targetBy, targetValue, timeout).
  • double_click: Double-clicks an element (by, value, timeout).
  • right_click: Right-clicks an element (by, value, timeout).
  • press_key: Simulates a key press (key).
  • upload_file: Uploads a file (by, value, filePath, timeout).
  • take_screenshot: Captures a screenshot as base64 (outputPath optional).
  • close_session: Closes the current browser session.

Notes

LLM Configuration: Ensure the LLM returns structured JSON commands matching the operations in selenium_operations.properties. Adjust the prompt in McpServer.java if needed.
WebDriver Compatibility: Ensure ChromeDriver or GeckoDriver matches your browser version.
File Uploads: The upload_file operation requires a valid file path accessible to the server.
Screenshots: The take_screenshot operation returns base64 data to avoid server-side file system dependencies.
Error Handling: The server catches and reports errors for each command in the response.
Extending Operations: Add new operations to selenium_operations.properties and update executeCommand in McpServer.java as needed.

Troubleshooting

Server Fails to Start: Check config.properties for valid settings and ensure the LLM endpoint is accessible.
WebDriver Errors: Verify ChromeDriver/GeckoDriver is in PATH and compatible with your browser.
LLM Issues: Ensure Ollama is running or the remote LLM endpoint and API key are correct.
Test Failures: Check TestNG reports in test-output/ for detailed error messages.

License
This project is licensed under the MIT License.

Author Information
Jyoti Prakash

In quest of learning. Feel free to ping for chat, when we share we learn.

SFO

0

Followers

7

Repositories

0

Gists

0

Total Contributions