Marketplace Report Crawler v2.0
Key Principle: Agent-Native + Session Persistence
Login happens ONCE. OTP is entered ONCE. Sessions are saved to disk and reused until they expire.
Marketplaces require OTP/CAPTCHA to log in. OTP codes are held by customers/accountants and can't be requested repeatedly. Therefore:
- First login: AI agent uses its OWN browser to log in (CAPTCHA can be solved there as a real user would)
- Export cookies: After login, cookies are saved to disk
- Future runs: Inject saved cookies → skip login entirely
- Only re-login: When the session actually expires (rare)
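The decision behind these four points can be sketched as a small function. This is only an illustration: `chooseLoginAction` is a hypothetical name, and the `expiresAt` field is an assumption about what session metadata might contain, not the documented format of session-meta.json.

```javascript
// Hypothetical sketch: decide whether a run can reuse a saved session.
// `expiresAt` (ISO timestamp) is an assumed field, not a documented one.
function chooseLoginAction(sessionMeta, now = new Date()) {
  if (!sessionMeta) return 'first-login'; // nothing saved yet
  const expiresAt = new Date(sessionMeta.expiresAt);
  // Missing or unreadable expiry: fall back to logging in again.
  if (Number.isNaN(expiresAt.getTime())) return 're-login';
  return expiresAt > now ? 'inject-cookies' : 're-login';
}
```

Treating unreadable metadata as expired keeps the failure mode safe: the worst case is one extra login rather than a run against a dead session.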
🔒 Security Rules
CRITICAL: These rules MUST be followed at all times.
- NEVER read/log/print the username/password columns from accounts.csv
- ONLY use helper scripts to access credentials:
  - `node scripts/get-credential.js <id> username` → raw username
  - `node scripts/get-credential.js <id> password` → raw password
- Credentials go directly into browser form fields — never stored in context
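When something about a credential has to appear in a log or progress report, a masked preview is one way to stay within these rules. The sketch below is an assumption about how such masking could work; `maskCredential` is a hypothetical helper, and the source does not show how csv-reader.js actually masks values.

```javascript
// Hypothetical helper: produce a log-safe preview of a credential value.
function maskCredential(value) {
  const text = String(value ?? '');
  const at = text.indexOf('@');
  if (at > 0) {
    // Email-like usernames: keep up to 4 leading characters and the domain.
    return `${text.slice(0, Math.min(4, at))}...${text.slice(at)}`;
  }
  // Anything else (including passwords) is fully hidden.
  return '****';
}

console.log(maskCredential('alice.seller@example.com')); // alic...@example.com
console.log(maskCredential('s3cret-password'));          // ****
```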
Architecture
```
scripts/
├── get-login-info.js      # Platform config (URL, selectors), NO credentials
├── get-credential.js      # Raw credential value for browser fill
├── session-store.js       # Cookie persistence (save/check/inject/list)
├── auto-login.js          # Automated login (agent-browser) + agent-delegate
├── pipeline.js            # Sequential pipeline orchestrator
├── csv-reader.js          # CSV parser with credential masking
├── logger.js              # Activity + Bug logging
└── run-report.js          # Progress report generator
config/
├── accounts.csv           # 🔐 Credentials (gitignored)
├── pipeline-config.json   # Platform selectors, timeouts
└── report-types.json      # 25 report type definitions
~/.marketplace-crawler/sessions/   # Persistent cookie storage
└── <account_id>/
    ├── cookies.json
    └── session-meta.json
```
Workflow: First-Time Login (One-Time)
Step 1: Get Login Info
```bash
node scripts/get-login-info.js zott-lazada
# Returns: { loginUrl, platform, dashboardUrl, loginSelectors }
```
Step 2: Login via Agent Browser
Use YOUR browser tool (browser_subagent / Claude browser / Manus Chrome):
- Navigate to the loginUrl from step 1
- Get credentials:
  ```bash
  node scripts/get-credential.js zott-lazada username  # → nguy...@zott.vn
  node scripts/get-credential.js zott-lazada password  # → raw password
  ```
- Fill username into the login form
- Fill password
- Click login button
- If CAPTCHA: Solve it visually (your browser looks like a real user)
- If OTP: Ask user for the 6-digit code, enter it
- Verify you reached the dashboard
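The final verification step can be made mechanical by comparing the browser's current URL with the dashboardUrl returned in step 1. The following is a minimal sketch under that assumption; `reachedDashboard` is a hypothetical helper and the URLs in the usage lines are placeholders, not real marketplace addresses.

```javascript
// Hypothetical check: did the browser land on the dashboard after login?
// Matches on origin and path prefix so query strings and sub-pages still pass.
function reachedDashboard(currentUrl, dashboardUrl) {
  let current;
  let expected;
  try {
    current = new URL(currentUrl);
    expected = new URL(dashboardUrl);
  } catch {
    return false; // unparsable URL: treat as not verified
  }
  return current.origin === expected.origin &&
    current.pathname.startsWith(expected.pathname);
}

// Placeholder URLs for illustration only.
console.log(reachedDashboard('https://seller.example.com/dashboard/home?tab=1',
                             'https://seller.example.com/dashboard')); // true
console.log(reachedDashboard('https://seller.example.com/login',
                             'https://seller.example.com/dashboard')); // false
```

A URL match alone does not prove the session is authenticated (some sites render a login overlay on the dashboard path), so checking for a known dashboard element as well would be a reasonable extra step.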
Step 3: Export & Save Cookies
After successful login, run this in the browser console:
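```javascript
// Collect the cookies visible to page scripts for the current site.
// Note: document.cookie does not expose HttpOnly cookies.
const cookies = document.cookie
  .split(';')
  .map(c => c.trim())
  .filter(Boolean) // skip the empty entry produced when no cookies are set
  .map(c => {
    const [name, ...rest] = c.split('=');
    return { name, value: rest.join('='), domain: location.hostname, path: '/' };
  });
JSON.stringify(cookies);
```
Save the output: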
```bash
echo '<cookies_json>' | node scripts/session-store.js save zott-lazada agent-browser
```
Step 4: Verify Session Saved
```bash
node scripts/session-store.js list
node scripts/session-store.js check zott-lazada
```
Workflow: Subsequent Runs (Automatic)
```bash
# 1. Check if session is still valid
node scripts/session-store.js check zott-lazada

# 2. Inject cookies and navigate to dashboard
node scripts/session-store.js inject zott-lazada
# → Outputs JS to set cookies in browser

# 3. Run pipeline for reports
node scripts/pipeline.js --period 202603 --platform lazada --dry-run
node scripts/pipeline.js --period 202603 --platform lazada
```
Auto-Login Fallback
For accounts without CAPTCHA/OTP challenges, auto-login.js can automate login:
```bash
# Automated login via agent-browser CLI
node scripts/auto-login.js --account zott-lazada --headed

# If CAPTCHA detected → exits with code 2 + JSON with screenshot
# Agent then takes over with its own browser

# After agent solves challenge:
node scripts/auto-login.js --account zott-lazada --verify
```
Pipeline Commands
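The --period value in these commands looks like a year and month packed as YYYYMM (202603 for March 2026), though the source does not state the format. Under that assumption, validating the value before a run might look like the sketch below; `parsePeriod` is a hypothetical helper, not part of pipeline.js.

```javascript
// Hypothetical parser for a --period value, assuming the format is YYYYMM.
function parsePeriod(period) {
  const match = /^(\d{4})(0[1-9]|1[0-2])$/.exec(String(period));
  if (!match) {
    throw new Error(`Invalid period "${period}", expected YYYYMM`);
  }
  const year = Number(match[1]);
  const month = Number(match[2]);
  // Day 0 of the following month is the last day of this month (UTC).
  const lastDay = new Date(Date.UTC(year, month, 0)).getUTCDate();
  return { year, month, lastDay };
}

console.log(parsePeriod('202603')); // { year: 2026, month: 3, lastDay: 31 }
```

Rejecting a malformed period up front avoids starting a long sequential run that would only fail once the first report is requested.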
```bash
# Dry run: see what will happen
node scripts/pipeline.js --period 202603 --dry-run

# Run specific platform
node scripts/pipeline.js --period 202603 --platform lazada

# Run single account
node scripts/pipeline.js --period 202603 --account zott-lazada --headed

# Resume after interruption
node scripts/pipeline.js --period 202603 --resume
```
Report Types (25 total)
| Platform | Reports | Format |
|---|---|---|
| Shopee (7) | Income, PDF Report, Wallet, Affiliate Invoice/Product, Ads, Ads Wallet | Excel/PDF |
| Lazada (7) | Income, PDF Report, Wallet, Ads Onsite/Product, Ads Wallet, Shipping Fee | Excel/PDF |
| TikTok (7) | Income, Wallet, Affiliate Orders/Invoice, Ads Dashboard/Cost/Revenue | Excel |
Session Store Commands
```bash
node scripts/session-store.js save <id>            # Save cookies from stdin
node scripts/session-store.js check <id>           # Check validity
node scripts/session-store.js inject <id>          # Output injection JS
node scripts/session-store.js list                 # List all sessions
node scripts/session-store.js export-script <id>   # Browser export helper
```
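To make the inject command more concrete, here is a rough sketch of how JavaScript that restores cookies could be generated from saved cookie objects. It assumes the `{ name, value, path }` shape produced by the export snippet in Step 3; `buildInjectionScript` is a hypothetical function, and the real output of session-store.js may differ.

```javascript
// Hypothetical sketch of an "inject" step: emit JavaScript that re-creates
// cookies through document.cookie, one statement per cookie.
function buildInjectionScript(cookies) {
  return cookies
    .filter((c) => c && c.name) // ignore malformed entries
    .map((c) => {
      const cookie = `${c.name}=${c.value ?? ''}; path=${c.path || '/'}`;
      // JSON.stringify yields a safely quoted JavaScript string literal.
      return `document.cookie = ${JSON.stringify(cookie)};`;
    })
    .join('\n');
}

console.log(buildInjectionScript([{ name: 'sid', value: 'abc=1', path: '/' }]));
// document.cookie = "sid=abc=1; path=/";
```

Cookies set this way cannot carry the HttpOnly flag, so sessions that depend on HttpOnly cookies would need to be restored through a browser automation API instead of page script.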