PandasAI Analytics

Overview

PandasAI lets you chat with your DataFrames using natural language. It generates pandas code + matplotlib charts behind the scenes via an LLM. Use it as a supplementary analytics layer — NOT as core logic.

Core principle: PandasAI is a report/insight tool, not a computation engine. Always keep it separate from production pipelines.

When to Use

Generating visual reports from backtest/evaluation results
Exploring data patterns interactively ("which SKUs have highest error?")
Creating AI-narrated analytics summaries
Building ad-hoc dashboards from forecast output
Comparing model performance across segments/folds
Quick data profiling before modeling

When NOT to use:

Production inference pipelines (non-deterministic)
Core model training logic
Real-time API responses (LLM latency)
Security-sensitive data without Docker sandbox

Quick Setup

python

import pandasai as pai
from pandasai_litellm.litellm import LiteLLM

# Option 1: Gemini (recommended for cost)
llm = LiteLLM(model="gemini/gemini-2.0-flash", api_key=GEMINI_API_KEY)

# Option 2: OpenRouter 
llm = LiteLLM(
    model="openrouter/google/gemini-2.0-flash-001",
    api_key=OPENROUTER_KEY,
    api_base="https://openrouter.ai/api/v1",
)

# Option 3: OpenAI
llm = LiteLLM(model="gpt-4.1-mini", api_key=OPENAI_KEY)

pai.config.set({"llm": llm})

Use Cases for Forecast & Planner Projects

1. Backtest Result Analysis

python

import json, pandasai as pai

# Load backtest JSON → DataFrame
with open("output/backtest_results.json") as f:
    data = json.load(f)

rows = []
for fold in data["fold_results"]:
    om = fold.get("order_metrics", {})
    rows.append({
        "fold": fold["fold"],
        "train_window": fold["train_window"],
        "wmape": om.get("wmape"),
        "bias": om.get("bias"),
        "fnr": om.get("fnr"),
        "mae": om.get("mae"),
    })

df = pai.DataFrame(rows)

# Natural language queries
df.chat("Which fold has the lowest wMAPE? Show a bar chart comparing all folds")
df.chat("Is there a correlation between training window size and accuracy?")
df.chat("Summarize the bias trend — are we consistently over or under-predicting?")

2. Model Comparison (LightGBM vs AutoTS)

python

# Build comparison DataFrame from backtest
comparison = pai.DataFrame([
    {"model": "LightGBM", "wmape": 45.2, "mae": 1.23, "fnr": 12.5, "bias": -0.15},
    {"model": "AutoTS",   "wmape": 168.4, "mae": 2.89, "fnr": 16.1, "bias": 2.5},
])

comparison.chat("Create a grouped bar chart comparing all metrics between the two models")
comparison.chat("Which model is better for this use case and why? Consider all metrics.")

3. Segment Performance Deep-Dive

python

segments = pai.DataFrame([
    {"segment": "bau",     "wmape": 48.0, "bias": -0.12, "fnr": 10.0, "volume": 280},
    {"segment": "sale",    "wmape": 35.0, "bias":  0.25, "fnr":  5.0, "volume": 150},
    {"segment": "minor",   "wmape": 55.0, "bias": -0.30, "fnr": 18.0, "volume":  25},
    {"segment": "holiday", "wmape": 62.0, "bias": -0.45, "fnr": 22.0, "volume":  15},
])

segments.chat("Which segment needs the most improvement? Consider volume impact.")
segments.chat("Plot bias by segment. Highlight negative bias segments in red.")
segments.chat("What's the volume-weighted average wMAPE across all segments?")

4. SKU Type Profiling

python

# After data_profiler.run_profile() or from backtest sku_type_metrics
sku_df = pai.DataFrame(sku_type_data)

sku_df.chat("What percentage of total volume comes from each SKU type?")
sku_df.chat("Plot a pie chart of SKU type distribution by volume")
sku_df.chat("Which SKU type has the worst MAPE but highest volume? That's our priority.")

5. Event Impact Analysis

python

events = pai.DataFrame(event_tier_data)

events.chat("Plot MAPE by event tier. Which events are hardest to forecast?")
events.chat("Calculate the uplift ratio: event-day demand vs BAU average")
events.chat("Create a heatmap of error rate by event tier and day-of-week")

6. Time-Series Pattern Discovery

python

# Daily aggregated demand
daily = pai.DataFrame(daily_demand_data)

daily.chat("Plot the demand trend over the last 90 days with a 7-day moving average")
daily.chat("Are there any anomalous spikes? List dates with demand > 3x average")
daily.chat("What's the day-of-week seasonality pattern? Show as boxplot.")
daily.chat("Detect any trend change points in the last 6 months")

7. Workforce Planning Report

python

# staffing.py output → DataFrame
staffing = pai.DataFrame(staffing_data)

staffing.chat("Plot predicted headcount by warehouse for next 14 days")
staffing.chat("Which warehouses are understaffed vs overstaffed?")
staffing.chat("Calculate the cost impact of over-prediction vs under-prediction")

8. Feature Importance Analysis

python

# From model.get_feature_importance()
importance = pai.DataFrame([
    {"feature": name, "gain": value, "segment": seg}
    for seg, feats in model.get_feature_importance().items()
    for name, value in feats.items()
])

importance.chat("What are the top 10 features by gain for the 'sale' segment?")
importance.chat("Plot feature importance comparison between bau and sale segments")
importance.chat("Which features are important for sale but NOT for bau?")

General Analytics Use Cases

Data Exploration

python

df.chat("Show basic statistics for all numeric columns")
df.chat("What's the distribution of quantity? Show histogram with outlier markers")
df.chat("How many missing values per column? Show as percentage bar chart")

Cross-DataFrame Analysis

python

pai.chat("Join these datasets and find which products have orders but no forecast",
         orders_df, forecast_df)

Chart Customization

python

df.chat(
    "Plot monthly revenue trend as a line chart. "
    "Use dark theme, add grid lines, format y-axis as currency. "
    "Title: 'Monthly Revenue Trend 2025-2026'"
)

Integration Pattern (Standalone Add-on)

┌─────────────────────────────────────────┐
│         Core Pipeline (untouched)        │
│  pipeline.py → model.py → backtester.py │
│              ↓ JSON output               │
├─────────────────────────────────────────┤
│       Analytics Add-on (separate)        │
│  pandasai_config.py → report_generator   │
│              ↓ HTML report               │
└─────────────────────────────────────────┘

Key rules:

Never import PandasAI in core modules — keep it in src/analytics/
Read-only — analytics module only reads existing output files
Graceful degradation — works without API key (static charts only)
CLI entry point — scripts/run_analytics_report.py

Existing Implementation

The Boxme Auto Forecast project already has this integration:

File	Purpose
`src/analytics/pandasai_config.py`	Auto-detects Gemini/OpenRouter/OpenAI
`src/analytics/report_generator.py`	6 chart types + AI narrative + HTML
`scripts/run_analytics_report.py`	CLI: `--input`, `--all`, `--no-ai`
`tests/test_analytics_report.py`	17 unit tests

bash

# Generate report (static charts, no API key needed)
python3 scripts/run_analytics_report.py --input output/backtest.json --no-ai

# Generate with AI narrative
GEMINI_API_KEY=xxx python3 scripts/run_analytics_report.py --input output/backtest.json

Common Mistakes

Mistake	Fix
Using `pd.DataFrame` instead of `pai.DataFrame`	PandasAI needs its own DataFrame wrapper
Embedding PandasAI in production pipeline	Keep it as a separate analytics step
Asking vague questions	Be specific: "Plot wMAPE by segment as bar chart"
Not handling missing API keys	Always check `configure_pandasai()` return value
Charts not rendering in scripts	Set `matplotlib.use("Agg")` before importing pyplot
Large DataFrames causing token overflow	Aggregate data before passing to PandasAI

Dependencies

pandasai>=3.0.0
pandasai-litellm>=0.0.1  # For Gemini/OpenRouter/OpenAI
matplotlib>=3.7.0         # Chart rendering

Python 3.8-3.11 required (PandasAI constraint).

PandasAI Analytics ​

Overview ​

When to Use ​

Quick Setup ​

Use Cases for Forecast & Planner Projects ​

1. Backtest Result Analysis ​

2. Model Comparison (LightGBM vs AutoTS) ​

3. Segment Performance Deep-Dive ​

4. SKU Type Profiling ​

5. Event Impact Analysis ​

6. Time-Series Pattern Discovery ​

7. Workforce Planning Report ​

8. Feature Importance Analysis ​

General Analytics Use Cases ​

Data Exploration ​

Cross-DataFrame Analysis ​

Chart Customization ​

Integration Pattern (Standalone Add-on) ​

Existing Implementation ​

Common Mistakes ​

Dependencies ​

PandasAI Analytics

Overview

When to Use

Quick Setup

Use Cases for Forecast & Planner Projects

1. Backtest Result Analysis

2. Model Comparison (LightGBM vs AutoTS)

3. Segment Performance Deep-Dive

4. SKU Type Profiling

5. Event Impact Analysis

6. Time-Series Pattern Discovery

7. Workforce Planning Report

8. Feature Importance Analysis

General Analytics Use Cases

Data Exploration

Cross-DataFrame Analysis

Chart Customization

Integration Pattern (Standalone Add-on)

Existing Implementation

Common Mistakes

Dependencies