ai-webscraper-agent
The AI Webscraper Agent is an AI-powered web scraping tool that utilizes the Brightdata MCP server to extract and summarize content from the web. It features a modular architecture that allows for data extraction through a natural language interface and provides an interactive frontend.
AI Webscraper Agent
An AI-powered webscraping agent that uses the Brightdata MCP server to extract and summarize content from the web. Built with a modular architecture combining LLM reasoning, robust scraping, and a simple web interface.
Tech Stack
- Frontend: Streamlit
- Backend: FastAPI
- Language: Python
- Scraping: Brightdata MCP Server
- AI Model: Anthropic LLM (Claude)
Features
- Natural language interface to extract data from websites
- Uses Brightdata MCP for reliable web scraping
- LLM-powered summarization and reasoning
- Streamlit-based interactive frontend
- Async FastAPI backend integration
Environment Variables
Create a .env file and configure the following:
# .env
# Environment Variables for AI Webscraper Agent
# Replace 'your_key_here' with your actual API keys
# Bright Data
API_TOKEN=your_key_here
WEB_UNLOCKER_ZONE=your_key_here
BROWSER_AUTH="your_browser_auth_token"
# Anthropic API key
ANTHROPIC_API_KEY=your_key_here
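A minimal sketch of how these variables could be validated at startup, so that a missing or unedited key fails early rather than in the middle of a scrape. It assumes the .env values have already been loaded into the process environment (for example by a loader such as python-dotenv). The helper name `load_settings` and the fail-fast behaviour are illustrative assumptions, not taken from the project.

```python
import os
from typing import Mapping

# Variable names as listed in the .env template above.
REQUIRED_VARS = ("API_TOKEN", "WEB_UNLOCKER_ZONE", "BROWSER_AUTH", "ANTHROPIC_API_KEY")

# Placeholder values from the template that must be replaced by real credentials.
PLACEHOLDERS = {"your_key_here", "your_browser_auth_token"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise if any is missing or a placeholder.

    Hypothetical helper: the project may load its configuration differently.
    """
    settings: dict[str, str] = {}
    problems: list[str] = []
    for name in REQUIRED_VARS:
        # Strip whitespace and surrounding double quotes, as in BROWSER_AUTH="...".
        value = env.get(name, "").strip().strip('"')
        if not value or value in PLACEHOLDERS:
            problems.append(name)
        else:
            settings[name] = value
    if problems:
        raise RuntimeError(
            "Missing or placeholder environment variables: " + ", ".join(problems)
        )
    return settings
```

Calling `load_settings()` once when the backend starts would surface configuration mistakes before the first request is handled.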
Installation
git clone https://github.com/yourusername/ai-webscraper-agent.git
cd ai-webscraper-agent
uv pip install -r requirements.txt
Run the App
Start the FastAPI backend server and the Streamlit app.
Start the FastAPI backend server:
uv run backend.py
Start the Streamlit frontend:
streamlit run frontend.py
Example Usage
Ask:
Scrape the top 5 news headlines from https://bbc.com and summarize them.
Get Response:
1. Headline A - Summary
2. Headline B - Summary
3. Headline C - Summary
4. Headline D - Summary
...
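If a caller wants the response as structured data rather than text, a small parser can split each numbered line into a headline and a summary. This is a sketch that assumes the response follows the "N. Headline - Summary" shape shown above; real LLM output may vary, and the function name `parse_numbered_summaries` is hypothetical.

```python
import re

# Matches lines such as "1. Headline A - Summary".
# The headline is matched lazily, so the split happens at the first " - ".
_LINE_RE = re.compile(r"^\s*(\d+)\.\s+(.*?)\s+-\s+(.*)$")


def parse_numbered_summaries(text: str) -> list[tuple[int, str, str]]:
    """Return (number, headline, summary) for each line in the assumed format.

    Lines that do not match, such as a trailing "...", are skipped.
    """
    items = []
    for line in text.splitlines():
        match = _LINE_RE.match(line)
        if match:
            number, headline, summary = match.groups()
            items.append((int(number), headline.strip(), summary.strip()))
    return items
```

A headline that itself contains " - " would be split at the first occurrence, so a stricter output format (for example JSON) may be preferable if the result feeds other code.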
Agent Flow
[User Prompt] → [Streamlit UI] → [FastAPI Router] → [LLM Agent]
→ [Brightdata Tool via MCP] → [LLM Summarization] → [UI Response]
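The flow above can be sketched as a small async pipeline. This is a simplified illustration, not the project's implementation: the scraping and summarization stages are stand-in functions so the example runs without network access or API keys, and all names (`run_agent`, `fake_scrape`, `fake_summarize`, `handle_request`) are hypothetical. In the real agent, the LLM decides when to call the Brightdata tool via MCP; here the two stages are simply chained.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical interfaces for the two pluggable stages.
ScrapeTool = Callable[[str], Awaitable[str]]        # prompt -> raw page content
Summarizer = Callable[[str, str], Awaitable[str]]   # (prompt, content) -> summary


async def run_agent(prompt: str, scrape: ScrapeTool, summarize: Summarizer) -> str:
    """Route a user prompt through scraping, then summarization."""
    content = await scrape(prompt)
    return await summarize(prompt, content)


async def fake_scrape(prompt: str) -> str:
    """Stand-in for the scraping tool; returns fixed content."""
    return "Headline A\nHeadline B"


async def fake_summarize(prompt: str, content: str) -> str:
    """Stand-in for the LLM; numbers each scraped line."""
    lines = content.splitlines()
    return "\n".join(f"{i}. {line} - Summary" for i, line in enumerate(lines, 1))


def handle_request(prompt: str) -> str:
    """Synchronous entry point, such as a UI callback might use."""
    return asyncio.run(run_agent(prompt, fake_scrape, fake_summarize))


if __name__ == "__main__":
    print(handle_request("Scrape the top headlines and summarize them."))
```

Passing the stages in as parameters keeps the routing logic testable on its own; real implementations of the scraper and summarizer could be swapped in without changing `run_agent`.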