llm-prompts-for-dse

Name: llm-prompts-for-dse
Availability: InStock
Author: markoniga

このリポジトリは、データサイエンスとエンジニアリングのワークフローに役立つLLMプロンプトのコレクションです。自動化されたPR作成やマクロスタイルのガードレールを備え、品質管理のためのユーザー承認チェックポイントも含まれています。

GitHub

GitHubスター

ユーザー評価

未評価

お気に入り

閲覧数

フォーク

イシュー

README

LLM Prompts for Data Science & Engineering

A repository of LLM prompts that can be served over MCP to your LLM. Write once, prompt anywhere.

Features

A collection of useful LLM prompts for data science and engineering workflows
Complete data workflow prompt pathway for analytics engineering tasks
Compact Markdown output format - De-noised JSON with collapsible details sections
Auto-publish path - Automated PR creation and merge workflow after successful reconciliation
User approval checkpoints - Built-in confirmation steps for quality control
Schema reconciliation features for safe data operations with 5-row risk summary tables
Macro style guardrails - Enforced <300 LOC, 2-space indent, Jinja lint requirements
LLM prompts served via MCP (Model Context Protocol)
Easily extendable for additional LLM prompts
Built with TypeScript and the official MCP SDK

🚀 Quick Start

Installation

Clone the repository

git clone git@github.com:markoniga/llm-prompts-for-dse.git
cd llm-prompts-for-dse

Install dependencies
```
yarn install
```
Build the project
```
yarn build
```

Setup MCP Server

Add the MCP server to Cursor (or your other MCP client) to access these prompts:

// Add to your MCP config file (e.g., Cursor's mcp.json)
{
   "llm-prompts": {
      "command": "node",
      "args": [
        "/path/to/llm-prompts-for-dse/build/server.js"
      ]
    }
}

Example Usage

Once configured, you can use these prompts in your conversations:

Data Workflow Examples

"Use the interpret_intent prompt to help me understand what kind of dbt model I need for customer lifetime value analysis"
"Apply the generate_code prompt to create a dbt model that calculates monthly recurring revenue"
"Use the validate_risk prompt to assess the impact of adding a new column to our users table"
"Apply the reconcile prompt to automatically run dbt build and reconcile with Preset"
"Use the test_results prompt to help me understand why my dbt tests are failing"
"Apply the create_pr prompt to generate a pull request description for this new analytics model"

Advanced Workflow Examples

Auto-publish path: After successful reconciliation, automatically chains to PR creation with customizable titles
Legacy system refactoring: Example workflows for modernizing systems like money-movement, inventory, or customer data pipelines
Macro guardrails: Enforced <300 LOC per macro, 2-space indentation, required doc-blocks (/** Purpose, Args, Returns, Example */)
Test chain: getRunDbtPrompt → getRunTestsPrompt → getReconcilePrompt → auto-PR creation
Style compliance: All code validated with dbt deps && dbt compile before commit

📊 Data Workflow Pathway

The data workflow prompts form a complete pathway that guides data scientists and analytics engineers from initial question to deployed code:

flowchart TD
    Q[User asks a question] --> I(interpret_intent.prompt)
    I --> C(context_gap.prompt)
    C -->|needs code| G(generate_code.prompt)
    C -->|needs docs only| D(gen_docs.prompt)
    G --> V(validate_risk.prompt)
    V -->|safe| R(run_dbt.prompt)
    V -->|needs confirm| CF{{"Proceed?"}}
    CF -- yes --> R
    CF -- no --> ABORT[[Abort]]
    R -->|build success| RT(run_tests.prompt)
    RT -->|tests pass| RC(reconcile.prompt)
    RC -->|🟢 all green| PR(create_pr.prompt)
    RC -->|🟡 warnings| CF2{{"Review & Proceed?"}}
    RC -->|🔴 issues| FIX[Manual fix required]
    CF2 -- yes --> PR
    CF2 -- no --> ABORT
    R -->|build failure| T(test_results.prompt)
    RT -->|tests fail| T
    T -->|fail| FX(fixup_suggestions.prompt)
    PR --> M(merge_guard.prompt)
    M --> DONE[Code merged ✅]
    subgraph rollbacks
        V -->|high risk detected| RB(rollback_plan.prompt)
    end
    subgraph "Auto-Publish Path"
        RC -.->|auto-chain| PR
        PR -.->|auto-chain| M
    end

How to Use the Data Workflow

Start with Intent Classification: "Use the interpret_intent prompt to understand what I'm trying to achieve with customer segmentation"
Fill Context Gaps: "Apply the context_gap prompt to identify what schema information we need"
Generate Code or Documentation: "Use the generate_code prompt to create a dbt model for customer segments"
Validate and Test: "Apply the validate_risk prompt to check if this model change is safe"
Create PR and Merge: "Use the create_pr prompt to generate a pull request for this new model"

📚 Available Prompts

Data Workflow Prompts

`getInterpretIntentPrompt`

Description: Returns a prompt to help clarify user goals and classify data workflow intents. Use this tool to determine if a user is asking a question, requesting a code change, or needing documentation updates.
Returns: The full contents of interpret_intent.prompt

`getContextGapPrompt`

Description: Returns a prompt to identify missing information needed to address a data workflow request. Use this tool to analyze context gaps and gather necessary schema information or business logic.
Returns: The full contents of context_gap.prompt

`getGenerateCodePrompt`

Description: Returns a prompt to generate high-quality dbt code for models, macros, tests, and other artifacts. Use this tool to create SQL code that follows best practices and project standards.
Returns: The full contents of generate_code.prompt

`getGenDocsPrompt`

Description: Returns a prompt to generate comprehensive documentation for dbt models, columns, and tests. Use this tool to create clear explanations of business logic and data lineage.
Returns: The full contents of gen_docs.prompt

`getValidateRiskPrompt`

Description: Returns a prompt to assess risks in proposed data operations, particularly SQL queries and dbt model changes. Use this tool to evaluate potential impacts on data integrity, performance, and cost, and perform schema reconciliation between environments.
Returns: The full contents of validate_risk.prompt

`getRollbackPlanPrompt`

Description: Returns a prompt to generate comprehensive rollback plans for data operations. Use this tool to create scripts and procedures to restore data to its previous state if an operation fails.
Returns: The full contents of rollback_plan.prompt

`getRunDbtPrompt`

Description: Returns a prompt to help execute dbt commands safely and efficiently. Use this tool to get guidance on running models, tests, and other dbt operations in a controlled manner, including schema reconciliation execution steps.
Returns: The full contents of run_dbt.prompt

`getRunTestsPrompt`

Description: Returns a prompt to execute dbt build and pytest validation tests. Runs dbt build --select {{ models }} then pytest tests/prompt_formatting to ensure comprehensive validation of both data models and prompt formatting.
Returns: The full contents of run_tests.prompt

`getReconcilePrompt`

Description: Returns a prompt to automate local dbt build and Preset reconciliation workflow. Runs dbt build --select {{ models }}, validates build success, executes Preset MCP reconciliation, and summarizes differences in a clear Markdown table format. Chains to auto-publish path when all systems are green.
Returns: The full contents of reconcile.prompt

`getTestResultsPrompt`

Description: Returns a prompt to analyze and interpret dbt test results. Use this tool to understand test outcomes, prioritize fixes, make data quality decisions, and validate schema reconciliation between environments.
Returns: The full contents of test_results.prompt

`getFixupSuggestionsPrompt`

Description: Returns a prompt to generate targeted code fixes for failed dbt tests or performance problems. Use this tool to get specific, actionable code changes to resolve identified issues.
Returns: The full contents of fixup_suggestions.prompt

`getCreatePRPrompt`

Description: Returns a prompt to generate comprehensive, well-structured pull request descriptions. Use this tool to create PR content that clearly communicates the purpose and implementation details of code changes, including schema reconciliation documentation.
Returns: The full contents of create_pr.prompt

`getMergeGuardPrompt`

Description: Returns a prompt to ensure pull requests meet all necessary requirements before being merged. Use this tool to validate that code reviews, tests, schema reconciliation, and other quality checks have been completed.
Returns: The full contents of merge_guard.prompt

🔄 Schema Reconciliation Features

The data workflow prompts include comprehensive schema reconciliation capabilities to ensure safe and consistent schema changes across environments:

Schema Reconciliation Process

Risk Assessment (validate_risk.prompt):
- Identifies schema changes (column additions, removals, type changes)
- Quantifies impact on data volume and downstream dependencies
- Classifies risk level of schema changes (high, medium, low)
- Generates detailed schema comparison tables
Execution (run_dbt.prompt):
- Provides pre-execution reconciliation steps to capture baseline schemas
- Includes specialized commands for reconciliation mode execution
- Offers post-execution verification queries to validate changes
- Generates schema change summary reports
Validation (test_results.prompt):
- Determines if schema reconciliation is needed based on test results
- Performs reconciliation checks for schema comparison and data integrity
- Analyzes downstream impact of schema changes
- Provides reconciliation recommendations and verification steps
Documentation (create_pr.prompt):
- Generates comprehensive schema reconciliation documentation for PRs
- Creates detailed schema comparison tables showing changes
- Includes data volume impact analysis
- Documents downstream dependency impacts
Verification (merge_guard.prompt):
- Ensures schema reconciliation is completed before approving merges
- Verifies reconciliation documentation is complete and accurate
- Identifies potential reconciliation blockers
- Provides final reconciliation approval conditions

Example Schema Comparison Table

## 📊 SCHEMA COMPARISON: `model_name`

| Column Name | Production Type | Dev Type | Status | Impact | Rows Affected |
|-------------|-----------------|----------|---------|---------|---------------|
| user_id | INTEGER | INTEGER | ✅ MATCH | - | - |
| email | VARCHAR(255) | VARCHAR(500) | ⚠️ MODIFIED | Size increased | 0 |
| created_at | TIMESTAMP | TIMESTAMP | ✅ MATCH | - | - |
| new_column | - | VARCHAR(100) | 🆕 ADDED | New data point | All rows |
| old_column | VARCHAR(50) | - | 🗑️ REMOVED | Data loss risk | 1,234,567 |

**Summary**: 5 columns analyzed • 1 modified • 1 added • 1 removed • ⚠️ Data loss risk

These schema reconciliation features ensure that schema changes are properly assessed, documented, and verified throughout the development and deployment process, reducing the risk of data integrity issues and unexpected impacts on downstream dependencies.

🛠️ Development

Local Development Workflow

Folder Structure

src/
├── llm-prompts/               # Root directory for all prompts
│   └── data-workflow/         # Data workflow prompts directory
│       ├── interpret_intent.prompt
│       ├── context_gap.prompt
│       ├── generate_code.prompt
│       ├── gen_docs.prompt
│       ├── validate_risk.prompt
│       ├── rollback_plan.prompt
│       ├── run_dbt.prompt
│       ├── test_results.prompt
│       ├── fixup_suggestions.prompt
│       ├── create_pr.prompt
│       └── merge_guard.prompt
└── mcp-server/               # MCP server implementation
    └── server.ts

Adding New Prompts

Add a new prompt to src/llm-prompts or create a new subdirectory for related prompts
Register the prompt in src/mcp-server/server.ts
Update the mcp-prompts-manifest.json by running the refresh script

Updating Existing Prompts

Edit the prompt file in the appropriate directory
Run the refresh script to update the manifest file:
```
bash scripts/refresh-mcp-prompts.sh
```
Rebuild the server to pick up the changes

Run the MCP server locally (with hot reload)

yarn dev

Debug with MCP Inspector

yarn build
yarn inspector

📖 References

MCP SDK Documentation

作者情報

markoniga

GitHub

フォロワー

リポジトリ

Gist

貢献数