
Quick Start

Get up and running with Scout AI in minutes. This path keeps setup simple while still preparing all core services and credentials.

01

Clone Repository

Fetch the project and move into the workspace.

02

Run Setup Assistant

Use the guided installer to configure dependencies and services.

03

Start Stack

Boot API, worker, Redis, Postgres, and Qdrant in one command.

Quick setup commands
# 1) Clone and enter the project
git clone https://github.com/Ha4sh-447/Scout-ai.git
cd Scout-ai

# 2) Launch the interactive setup system
python setup.py

# 3) Start containers
docker compose up -d

The setup.py assistant detects your platform and walks through environment creation, dependency installation, and health checks.
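As a rough illustration of what the assistant's first step involves, a platform-detection check might look like the sketch below. This is a hypothetical stand-in, not the actual setup.py; the real assistant may probe different tools.

```python
import platform
import shutil

def detect_environment():
    """Report the host OS and whether core tools are on PATH.

    Hypothetical sketch of a setup assistant's first step; the
    real setup.py may check a different set of tools.
    """
    return {
        "os": platform.system(),              # e.g. "Linux", "Darwin", "Windows"
        "python": platform.python_version(),  # e.g. "3.11.6"
        "docker": shutil.which("docker") is not None,
        "node": shutil.which("node") is not None,
    }

env = detect_environment()
print(env)
```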

Docker Setup

Scout AI is designed to run in a production-ready containerized environment. Containers give every machine the same service versions and configuration, and Compose handles service orchestration for you. This is the recommended approach for most users.

# Start the entire stack in detached mode
docker compose up -d

# View service status
docker compose ps

Active Ports: Frontend (3000), API (8001), Qdrant (6333), Redis (6379), Postgres (5432).
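If a service seems unreachable, a quick stdlib script can confirm which of the ports above are actually accepting connections. This is a generic TCP probe, not part of the project itself:

```python
import socket

# Service -> host port mapping, taken from the list above.
SERVICE_PORTS = {
    "frontend": 3000,
    "api": 8001,
    "qdrant": 6333,
    "redis": 6379,
    "postgres": 5432,
}

def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, port in SERVICE_PORTS.items():
    status = "up" if port_open("127.0.0.1", port) else "down"
    print(f"{name:<10} {port:<6} {status}")
```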

Local Setup (No Docker)

For development, or on systems where Docker is unavailable, you can run all services locally. This approach requires manual service management but gives you full control. You will need Python 3.9+, Node.js 18+, Redis, and PostgreSQL installed on your system.

Prerequisites

# On macOS (using Homebrew)
brew install python@3.11 node redis postgresql

# On Ubuntu/Debian (python3.11 may need the deadsnakes PPA; Node 18+ may need NodeSource)
sudo apt install python3.11 nodejs npm redis-server postgresql

1. Set Up Environment Files

Create two .env files for backend and frontend configurations:

# Root .env for backend/API services
cat > .env << EOF
DATABASE_URL=postgresql://user:password@localhost:5432/scout_ai
REDIS_URL=redis://localhost:6379/0
CELERY_BROKER_URL=redis://localhost:6379/0
QDRANT_URL=http://localhost:6334
JWT_SECRET_KEY=your-long-random-secret-key-here
GROQ_API_KEY=your-groq-api-key
MISTRAL_API_KEY=your-mistral-api-key
EMAIL_SENDER=your-email@gmail.com
EMAIL_PASSWORD=your-app-password
EMAIL_SMTP_HOST=smtp.gmail.com
EMAIL_SMTP_PORT=587
LANGCHAIN_API_KEY=optional-langchain-key
EOF
# Frontend .env.local for authentication
cat > frontend/.env.local << EOF
NEXT_PUBLIC_API_URL=http://localhost:8001
AUTH_SECRET=your-nextauth-secret-key-generate-with-openssl
EOF
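Missing or empty variables in these files are a common source of startup failures. A small check like the one below (an illustrative helper, not part of the codebase) can verify the backend's required keys before launching anything; optional keys such as LANGCHAIN_API_KEY and QDRANT_API_KEY are deliberately excluded:

```python
import os

# Keys the backend .env example above defines as required.
REQUIRED_VARS = [
    "DATABASE_URL", "REDIS_URL", "CELERY_BROKER_URL", "QDRANT_URL",
    "JWT_SECRET_KEY", "GROQ_API_KEY", "MISTRAL_API_KEY",
    "EMAIL_SENDER", "EMAIL_PASSWORD", "EMAIL_SMTP_HOST", "EMAIL_SMTP_PORT",
]

def missing_vars(environ=os.environ) -> list[str]:
    """Return required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

# Demo with a deliberately incomplete environment:
missing = missing_vars({"DATABASE_URL": "postgresql://user:pw@localhost/scout_ai"})
print("missing:", missing)
```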

2. Start Database and Cache Services

In separate terminals, start PostgreSQL and Redis:

# Terminal 1: Start PostgreSQL
brew services start postgresql # macOS (Homebrew)
# or on Linux:
sudo systemctl start postgresql

# Terminal 2: Start Redis
redis-server

3. Set Up Backend

python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium

# Run database migrations
alembic upgrade head

# Terminal 3: Start FastAPI server
uvicorn api.main:app --reload

4. Start Celery Worker

# Terminal 4: Start Celery worker for async tasks
celery -A workers.tasks worker --loglevel=info

5. Start Qdrant (Vector Database)

Download and run Qdrant locally. You can either use Docker just for Qdrant or download the binary:

# Terminal 5: Using Docker for just Qdrant
docker run -d -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Or download pre-built binary from: https://qdrant.tech/documentation/quick-start/

6. Set Up Frontend

# Terminal 6: Start Next.js dev server
cd frontend
npm install
npm run dev

Technical Brief

This project is a production-style AI automation platform. It is designed to be explainable to hiring teams while still implementing advanced engineering patterns.

Agentic Workflow

LangGraph orchestrates discovery, matching, ranking, messaging, and notification as explicit stages.

RAG-Style Matching

Resume sections are embedded and retrieved from Qdrant to score each job with evidence-based semantic similarity.

Operational Reliability

Celery workers, retries, scheduler recovery, and run tracking provide resilient asynchronous processing.
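The retry behavior mentioned above can be illustrated with a minimal backoff helper. Celery provides its own task-level retries; this stdlib sketch only mirrors the pattern:

```python
import time

def retry_with_backoff(fn, attempts=3, base_delay=0.01):
    """Call fn, retrying on exception with exponential backoff.

    Simplified stand-in for Celery's task retries, shown only
    to illustrate the reliability pattern.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

# A function that fails twice, then succeeds:
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(retry_with_backoff(flaky))
```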

Technical concept | What it means in this product | Why it matters for outcomes
Multi-source discovery | Collects jobs from major boards plus custom URLs, then routes extraction per platform. | Increases coverage while reducing manual search effort.
Dedup + freshness controls | Applies URL dedupe, seen-job filtering, and recency-aware retries. | Keeps results fresh and avoids repeated listings.
LLM structured parsing | Converts unstructured pages into normalized job fields used downstream. | Enables consistent matching and ranking quality.
Vector semantic retrieval | Matches full job context against resume chunk embeddings with user-level filters. | Finds deeper fit than keyword-only approaches.
Weighted ranking | Combines semantic score, recency, source quality, and configurable penalties. | Prioritizes opportunities most likely worth action.
LLM routing controls | Uses provider fallback, circuit breakers, caching, and quotas. | Balances reliability, speed, and cost.
Outreach automation | Generates concise email and LinkedIn drafts with sanitization and concurrency limits. | Reduces application friction while preserving message relevance.
Observable run history | Stores status, timing, counts, and errors for each pipeline execution. | Makes quality review and iteration measurable.
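The weighted-ranking row above can be sketched as a simple linear combination. The weights and penalty scheme here are illustrative assumptions, not the product's actual tuning:

```python
def rank_score(semantic: float, recency: float, source_quality: float,
               penalties: float = 0.0,
               weights=(0.6, 0.25, 0.15)) -> float:
    """Combine signals into one ranking score, clamped at zero.

    Weights are hypothetical: 60% semantic fit, 25% recency,
    15% source quality, minus any configured penalties.
    """
    w_sem, w_rec, w_src = weights
    score = w_sem * semantic + w_rec * recency + w_src * source_quality
    return max(0.0, score - penalties)

jobs = [
    {"title": "ML Engineer", "semantic": 0.9, "recency": 0.8, "source": 0.7},
    {"title": "Data Analyst", "semantic": 0.6, "recency": 1.0, "source": 0.9},
]
ranked = sorted(jobs,
                key=lambda j: rank_score(j["semantic"], j["recency"], j["source"]),
                reverse=True)
print([j["title"] for j in ranked])
```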

This architecture maps directly to common AI/LLM job keywords: agentic workflows, RAG pipelines, semantic matching, LLM reliability controls, API-first integration, and asynchronous production execution.

Project Structure

The codebase is partitioned into distinct domains to ensure high maintainability and testability across the automation pipeline.

High-level file map
├── agents/ # Agent orchestration: discovery, matching, ranking, messaging
├── api/ # FastAPI auth, users, jobs, pipeline endpoints
├── core/ # LLM router, embeddings, qdrant integrations
├── db/ # SQLAlchemy models, base, migrations
├── extractors/ # Job parsing, dedupe, content cleaning
├── frontend/ # Next.js dashboard and user workflows
├── scrapers/ # Playwright and platform-specific scraping
└── workers/ # Celery tasks for async processing and emails

Local Development

For deep debugging or contributing, a manual local setup is recommended. This requires Python 3.9+ and Node.js 18+.

Backend Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install chromium

Frontend Setup

cd frontend
npm install
npm run dev

How The Platform Works

The platform is built as a complete pipeline from resume ingestion to ranked job delivery and personalized notification. Here is the end-to-end flow used in local and production runs.

1

Upload Resume(s)

Users can upload one or multiple resumes. Each resume is chunked, embedded, and stored with user-specific metadata for isolated retrieval.
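The chunking step above can be sketched with a toy character-window splitter. The real pipeline may chunk by resume section or token count instead; this only shows the overlapping-window idea:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows.

    Overlap keeps context that straddles a boundary present in
    both neighboring chunks.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # advance by the non-overlapping part
    return chunks

resume = "Experienced backend engineer. " * 20  # 600 characters of demo text
chunks = chunk_text(resume, size=100, overlap=20)
print(len(chunks), "chunks")
```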

2

Set User Preferences

Configure role preferences like target job titles, experience level, location, work mode, and other filtering parameters from the User Preferences section.

3

Add Search URLs

Provide LinkedIn, Indeed, Reddit, or custom links to scrape. Search URLs can be persisted and reused across future runs.

4

Store Browser Session

For better result quality, authenticated browser state can be captured so scrapers access richer pages and avoid guest-mode limits.

5

Discovery Agent Scrapes Jobs

The discovery stage crawls the configured URLs, normalizes job postings, and deduplicates records before ranking.
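The deduplication step can be illustrated with URL normalization: lowercase the host, strip tracking parameters and trailing slashes, then key on the result. This is a simplified version; the real extractor may apply platform-specific rules on top:

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Query parameters that identify a campaign, not a job posting.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "ref"}

def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivial variants compare equal."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), urlencode(query), ""))

def dedupe(urls):
    """Keep the first occurrence of each normalized URL."""
    seen, unique = set(), []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

urls = [
    "https://example.com/jobs/123?utm_source=feed",
    "https://EXAMPLE.com/jobs/123/",
    "https://example.com/jobs/456",
]
print(dedupe(urls))
```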

6

Resume Matching Agent Scores Relevance

Jobs are semantically matched against resume embeddings. If multiple resumes exist, the best fit resume is selected per job.
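Selecting the best-fit resume per job reduces to comparing embedding similarity. The sketch below uses tiny hand-written 3-dimensional vectors in place of real model embeddings, and assumes cosine similarity as the metric:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy 3-dim "embeddings"; real vectors come from the embedding model.
resumes = {
    "backend.pdf": [0.9, 0.1, 0.0],
    "data.pdf":    [0.1, 0.9, 0.2],
}
job_vec = [0.8, 0.2, 0.1]  # pretend embedding of a backend job posting

best = max(resumes, key=lambda name: cosine(resumes[name], job_vec))
print("best resume:", best)
```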

7

Ranking Agent Prioritizes Opportunities

Final ranking combines semantic match quality, posting recency, and source signals to surface the highest-value opportunities first.

8

Messaging Agent Generates Outreach

Personalized email or LinkedIn-style outreach drafts are generated for top-ranked opportunities.

9

Email Notification Is Sent

The final digest is delivered to the configured recipient with top jobs, confidence, and suggested messaging.
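The digest step can be sketched as plain-text rendering of the ranked results. The field names here (title, company, score, draft) are illustrative assumptions about the pipeline's output schema, not its actual contract:

```python
def format_digest(jobs, limit=3):
    """Render a plain-text digest of the top-ranked jobs."""
    lines = ["Top matches for today:", ""]
    for i, job in enumerate(jobs[:limit], 1):
        lines.append(f"{i}. {job['title']} at {job['company']} "
                     f"(match {job['score']:.0%})")
        lines.append(f"   Suggested opener: {job['draft']}")
    return "\n".join(lines)

digest = format_digest([
    {"title": "LLM Engineer", "company": "Acme", "score": 0.91,
     "draft": "Hi! I noticed your team is scaling RAG pipelines..."},
])
print(digest)
```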

API Keys & Secrets Reference

Configure these in your .env file to run locally. The table includes required, optional, and conditional keys used by the platform.

Variable | Purpose | Where to get it
MISTRAL_API_KEY | Embeddings and LLM fallback | mistral.ai
GROQ_API_KEY | Primary fast LLM routing and failover | groq.com
JWT_SECRET_KEY | API auth token signing | Generate a long random secret
AUTH_SECRET | Frontend session encryption (NextAuth) | Generate with openssl rand -base64 32
QDRANT_API_KEY | Auth for secured/cloud Qdrant (optional) | Qdrant Cloud dashboard
LANGCHAIN_API_KEY | Tracing/observability only (optional) | smith.langchain.com
EMAIL_PASSWORD | SMTP app password for digest emails | Gmail App Passwords

Also required local config values: DATABASE_URL, REDIS_URL, CELERY_BROKER_URL, QDRANT_URL, EMAIL_SENDER, EMAIL_SMTP_HOST, EMAIL_SMTP_PORT.
REDIS_URL and CELERY_BROKER_URL should point to the same Redis endpoint.

Troubleshooting

  • Docker Port Conflict

    If 5432 or 6333 are taken, stop local Postgres/Qdrant services before starting Docker.

  • LLM Rate Limits

    If LLM calls fail or stall, ensure your Groq/Mistral keys are valid and have remaining quota.