llm-prompts-for-dse

A repository of LLM prompts for data science and engineering, served over MCP. Includes complete data workflow prompts for analytics engineering tasks with schema reconciliation features.

GitHubスター

1

ユーザー評価

未評価

フォーク

0

イシュー

1

閲覧数

1

お気に入り

0

README
LLM Prompts for Data Science & Engineering

A repository of LLM prompts that can be served over MCP to your LLM. Write once, prompt anywhere.

Features
  • A collection of useful LLM prompts for data science and engineering workflows
  • Complete data workflow prompt pathway for analytics engineering tasks
  • Compact Markdown output format - De-noised JSON with collapsible details sections
  • Auto-publish path - Automated PR creation and merge workflow after successful reconciliation
  • User approval checkpoints - Built-in confirmation steps for quality control
  • Schema reconciliation features for safe data operations with 5-row risk summary tables
  • Macro style guardrails - Enforced <300 LOC, 2-space indent, Jinja lint requirements
  • LLM prompts served via MCP (Model Context Protocol)
  • Easily extendable for additional LLM prompts
  • Built with TypeScript and the official MCP SDK

🚀 Quick Start
Installation
  1. Clone the repository

    git clone git@github.com:markoniga/llm-prompts-for-dse.git
    cd llm-prompts-for-dse
    
  2. Install dependencies

    yarn install
    
  3. Build the project

    yarn build
    
Setup MCP Server

Add the MCP server to Cursor (or your other MCP client) to access these prompts:

// Add to your MCP config file (e.g., Cursor's mcp.json)
{
   "llm-prompts": {
      "command": "node",
      "args": [
        "/path/to/llm-prompts-for-dse/build/server.js"
      ]
    }
}
Example Usage

Once configured, you can use these prompts in your conversations:

Data Workflow Examples
  • "Use the interpret_intent prompt to help me understand what kind of dbt model I need for customer lifetime value analysis"
  • "Apply the generate_code prompt to create a dbt model that calculates monthly recurring revenue"
  • "Use the validate_risk prompt to assess the impact of adding a new column to our users table"
  • "Apply the reconcile prompt to automatically run dbt build and reconcile with Preset"
  • "Use the test_results prompt to help me understand why my dbt tests are failing"
  • "Apply the create_pr prompt to generate a pull request description for this new analytics model"
Advanced Workflow Examples
  • Auto-publish path: After successful reconciliation, automatically chains to PR creation with customizable titles
  • Legacy system refactoring: Example workflows for modernizing systems like money-movement, inventory, or customer data pipelines
  • Macro guardrails: Enforced <300 LOC per macro, 2-space indentation, required doc-blocks (/** Purpose, Args, Returns, Example */)
  • Test chain: getRunDbtPromptgetRunTestsPromptgetReconcilePrompt → auto-PR creation
  • Style compliance: All code validated with dbt deps && dbt compile before commit

📊 Data Workflow Pathway

The data workflow prompts form a complete pathway that guides data scientists and analytics engineers from initial question to deployed code:

flowchart TD
    Q[User asks a question] --> I(interpret_intent.prompt)
    I --> C(context_gap.prompt)
    C -->|needs code| G(generate_code.prompt)
    C -->|needs docs only| D(gen_docs.prompt)
    G --> V(validate_risk.prompt)
    V -->|safe| R(run_dbt.prompt)
    V -->|needs confirm| CF{{"Proceed?"}}
    CF -- yes --> R
    CF -- no --> ABORT[[Abort]]
    R -->|build success| RT(run_tests.prompt)
    RT -->|tests pass| RC(reconcile.prompt)
    RC -->|🟢 all green| PR(create_pr.prompt)
    RC -->|🟡 warnings| CF2{{"Review & Proceed?"}}
    RC -->|🔴 issues| FIX[Manual fix required]
    CF2 -- yes --> PR
    CF2 -- no --> ABORT
    R -->|build failure| T(test_results.prompt)
    RT -->|tests fail| T
    T -->|fail| FX(fixup_suggestions.prompt)
    PR --> M(merge_guard.prompt)
    M --> DONE[Code merged ✅]
    subgraph rollbacks
        V -->|high risk detected| RB(rollback_plan.prompt)
    end
    subgraph "Auto-Publish Path"
        RC -.->|auto-chain| PR
        PR -.->|auto-chain| M
    end
How to Use the Data Workflow
  1. Start with Intent Classification: "Use the interpret_intent prompt to understand what I'm trying to achieve with customer segmentation"

  2. Fill Context Gaps: "Apply the context_gap prompt to identify what schema information we need"

  3. Generate Code or Documentation: "Use the generate_code prompt to create a dbt model for customer segments"

  4. Validate and Test: "Apply the validate_risk prompt to check if this model change is safe"

  5. Create PR and Merge: "Use the create_pr prompt to generate a pull request for this new model"


📚 Available Prompts
Data Workflow Prompts
getInterpretIntentPrompt
  • Description: Returns a prompt to help clarify user goals and classify data workflow intents. Use this tool to determine if a user is asking a question, requesting a code change, or needing documentation updates.
  • Returns: The full contents of interpret_intent.prompt
getContextGapPrompt
  • Description: Returns a prompt to identify missing information needed to address a data workflow request. Use this tool to analyze context gaps and gather necessary schema information or business logic.
  • Returns: The full contents of context_gap.prompt
getGenerateCodePrompt
  • Description: Returns a prompt to generate high-quality dbt code for models, macros, tests, and other artifacts. Use this tool to create SQL code that follows best practices and project standards.
  • Returns: The full contents of generate_code.prompt
getGenDocsPrompt
  • Description: Returns a prompt to generate comprehensive documentation for dbt models, columns, and tests. Use this tool to create clear explanations of business logic and data lineage.
  • Returns: The full contents of gen_docs.prompt
getValidateRiskPrompt
  • Description: Returns a prompt to assess risks in proposed data operations, particularly SQL queries and dbt model changes. Use this tool to evaluate potential impacts on data integrity, performance, and cost, and perform schema reconciliation between environments.
  • Returns: The full contents of validate_risk.prompt
getRollbackPlanPrompt
  • Description: Returns a prompt to generate comprehensive rollback plans for data operations. Use this tool to create scripts and procedures to restore data to its previous state if an operation fails.
  • Returns: The full contents of rollback_plan.prompt
getRunDbtPrompt
  • Description: Returns a prompt to help execute dbt commands safely and efficiently. Use this tool to get guidance on running models, tests, and other dbt operations in a controlled manner, including schema reconciliation execution steps.
  • Returns: The full contents of run_dbt.prompt
getRunTestsPrompt
  • Description: Returns a prompt to execute dbt build and pytest validation tests. Runs dbt build --select {{ models }} then pytest tests/prompt_formatting to ensure comprehensive validation of both data models and prompt formatting.
  • Returns: The full contents of run_tests.prompt
getReconcilePrompt
  • Description: Returns a prompt to automate local dbt build and Preset reconciliation workflow. Runs dbt build --select {{ models }}, validates build success, executes Preset MCP reconciliation, and summarizes differences in a clear Markdown table format. Chains to auto-publish path when all systems are green.
  • Returns: The full contents of reconcile.prompt
getTestResultsPrompt
  • Description: Returns a prompt to analyze and interpret dbt test results. Use this tool to understand test outcomes, prioritize fixes, make data quality decisions, and validate schema reconciliation between environments.
  • Returns: The full contents of test_results.prompt
getFixupSuggestionsPrompt
  • Description: Returns a prompt to generate targeted code fixes for failed dbt tests or performance problems. Use this tool to get specific, actionable code changes to resolve identified issues.
  • Returns: The full contents of fixup_suggestions.prompt
getCreatePRPrompt
  • Description: Returns a prompt to generate comprehensive, well-structured pull request descriptions. Use this tool to create PR content that clearly communicates the purpose and implementation details of code changes, including schema reconciliation documentation.
  • Returns: The full contents of create_pr.prompt
getMergeGuardPrompt
  • Description: Returns a prompt to ensure pull requests meet all necessary requirements before being merged. Use this tool to validate that code reviews, tests, schema reconciliation, and other quality checks have been completed.
  • Returns: The full contents of merge_guard.prompt

🔄 Schema Reconciliation Features

The data workflow prompts include comprehensive schema reconciliation capabilities to ensure safe and consistent schema changes across environments:

Schema Reconciliation Process
  1. Risk Assessment (validate_risk.prompt):

    • Identifies schema changes (column additions, removals, type changes)
    • Quantifies impact on data volume and downstream dependencies
    • Classifies risk level of schema changes (high, medium, low)
    • Generates detailed schema comparison tables
  2. Execution (run_dbt.prompt):

    • Provides pre-execution reconciliation steps to capture baseline schemas
    • Includes specialized commands for reconciliation mode execution
    • Offers post-execution verification queries to validate changes
    • Generates schema change summary reports
  3. Validation (test_results.prompt):

    • Determines if schema reconciliation is needed based on test results
    • Performs reconciliation checks for schema comparison and data integrity
    • Analyzes downstream impact of schema changes
    • Provides reconciliation recommendations and verification steps
  4. Documentation (create_pr.prompt):

    • Generates comprehensive schema reconciliation documentation for PRs
    • Creates detailed schema comparison tables showing changes
    • Includes data volume impact analysis
    • Documents downstream dependency impacts
  5. Verification (merge_guard.prompt):

    • Ensures schema reconciliation is completed before approving merges
    • Verifies reconciliation documentation is complete and accurate
    • Identifies potential reconciliation blockers
    • Provides final reconciliation approval conditions
Example Schema Comparison Table
## 📊 SCHEMA COMPARISON: `model_name`

| Column Name | Production Type | Dev Type | Status | Impact | Rows Affected |
|-------------|-----------------|----------|---------|---------|---------------|
| user_id | INTEGER | INTEGER | ✅ MATCH | - | - |
| email | VARCHAR(255) | VARCHAR(500) | ⚠️ MODIFIED | Size increased | 0 |
| created_at | TIMESTAMP | TIMESTAMP | ✅ MATCH | - | - |
| new_column | - | VARCHAR(100) | 🆕 ADDED | New data point | All rows |
| old_column | VARCHAR(50) | - | 🗑️ REMOVED | Data loss risk | 1,234,567 |

**Summary**: 5 columns analyzed • 1 modified • 1 added • 1 removed • ⚠️ Data loss risk

These schema reconciliation features ensure that schema changes are properly assessed, documented, and verified throughout the development and deployment process, reducing the risk of data integrity issues and unexpected impacts on downstream dependencies.


🛠️ Development
Local Development Workflow
Folder Structure
src/
├── llm-prompts/               # Root directory for all prompts
│   └── data-workflow/         # Data workflow prompts directory
│       ├── interpret_intent.prompt
│       ├── context_gap.prompt
│       ├── generate_code.prompt
│       ├── gen_docs.prompt
│       ├── validate_risk.prompt
│       ├── rollback_plan.prompt
│       ├── run_dbt.prompt
│       ├── test_results.prompt
│       ├── fixup_suggestions.prompt
│       ├── create_pr.prompt
│       └── merge_guard.prompt
└── mcp-server/               # MCP server implementation
    └── server.ts
Adding New Prompts
Updating Existing Prompts
  1. Edit the prompt file in the appropriate directory
  2. Run the refresh script to update the manifest file:
    bash scripts/refresh-mcp-prompts.sh
    
  3. Rebuild the server to pick up the changes
Run the MCP server locally (with hot reload)
yarn dev
Debug with MCP Inspector
yarn build
yarn inspector

📖 References
作者情報

0

フォロワー

2

リポジトリ

0

Gist

43

貢献数

トップ貢献者

スレッド