Quick Start
Get up and running with Scout AI in minutes. This path keeps setup simple while still preparing all core services and credentials.
Clone Repository
Fetch the project and move into the workspace.
Run Setup Assistant
Use the guided installer to configure dependencies and services.
Start Stack
Boot API, worker, Redis, Postgres, and Qdrant in one command.
Docker Setup
Scout AI is designed to run in a production-ready containerized environment. Docker gives you a consistent environment across machines and handles service orchestration for you. This is the recommended approach for most users.
Local Setup (No Docker)
For development, or on systems where Docker is unavailable, you can run all services locally. This approach requires manual service management but gives you full control. It requires Python 3.9+, Node.js 18+, Redis, and PostgreSQL installed on your system, plus a way to run Qdrant (binary or container, see step 5).
Prerequisites
1. Set Up Environment Files
Create two .env files for backend and frontend configurations:
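A minimal starting point is shown below; the variable names come from the API Keys & Secrets Reference section, while the file paths and values are placeholders you should replace:

```env
# backend/.env  (path is illustrative; values are placeholders)
MISTRAL_API_KEY=your-mistral-key
GROQ_API_KEY=your-groq-key
JWT_SECRET_KEY=change-me-long-random-secret
EMAIL_PASSWORD=your-gmail-app-password

# frontend/.env  (NextAuth session encryption)
AUTH_SECRET=output-of-openssl-rand-base64-32
```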
2. Start Database and Cache Services
In separate terminals, start PostgreSQL and Redis:
3. Set Up Backend
4. Start Celery Worker
5. Start Qdrant (Vector Database)
Download and run Qdrant locally. You can either use Docker just for Qdrant or download the binary.
6. Set Up Frontend
Technical Brief
This project is a production-style AI automation platform. It is designed to be explainable to hiring teams while still implementing advanced engineering patterns.
Agentic Workflow
LangGraph orchestrates discovery, matching, ranking, messaging, and notification as explicit stages.
RAG-Style Matching
Resume sections are embedded and retrieved from Qdrant to score each job with evidence-based semantic similarity.
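At its core, the evidence-based score is a nearest-chunk similarity: each job embedding is compared against the user's resume chunk embeddings, and the best-matching chunk is kept as evidence. A minimal sketch with toy vectors (the real system stores embeddings in Qdrant and filters retrieval per user):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def score_job(job_vec, resume_chunks):
    """Score a job as the best similarity across resume chunks,
    keeping the matching chunk text as evidence."""
    best = max(resume_chunks, key=lambda c: cosine(job_vec, c["vector"]))
    return cosine(job_vec, best["vector"]), best["text"]

chunks = [
    {"text": "Built ETL pipelines in Python", "vector": [0.9, 0.1, 0.0]},
    {"text": "Managed retail inventory",      "vector": [0.0, 0.2, 0.9]},
]
score, evidence = score_job([0.8, 0.2, 0.1], chunks)
```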
Operational Reliability
Celery workers, retries, scheduler recovery, and run tracking provide resilient asynchronous processing.
| Technical concept | What it means in this product | Why it matters for outcomes |
|---|---|---|
| Multi-source discovery | Collects jobs from major boards plus custom URLs, then routes extraction per platform. | Increases coverage while reducing manual search effort. |
| Dedup + freshness controls | Applies URL dedupe, seen-job filtering, and recency-aware retries. | Keeps results fresh and avoids repeated listings. |
| LLM structured parsing | Converts unstructured pages into normalized job fields used downstream. | Enables consistent matching and ranking quality. |
| Vector semantic retrieval | Matches full job context against resume chunk embeddings with user-level filters. | Finds deeper fit than keyword-only approaches. |
| Weighted ranking | Combines semantic score, recency, source quality, and configurable penalties. | Prioritizes opportunities most likely worth action. |
| LLM routing controls | Uses provider fallback, circuit breakers, caching, and quotas. | Balances reliability, speed, and cost. |
| Outreach automation | Generates concise email and LinkedIn drafts with sanitization and concurrency limits. | Reduces application friction while preserving message relevance. |
| Observable run history | Stores status, timing, counts, and errors for each pipeline execution. | Makes quality review and iteration measurable. |
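The weighted-ranking row above can be sketched as a single scoring function. The weights, the 30-day linear recency decay, and the penalty handling below are illustrative assumptions, not the product's actual coefficients:

```python
def rank_score(semantic, days_old, source_quality, penalties=(),
               w_sem=0.6, w_rec=0.25, w_src=0.15):
    """Combine semantic match, recency, and source quality into one score.
    Weights and the linear decay are illustrative, not the shipped values."""
    recency = max(0.0, 1.0 - days_old / 30.0)  # linear decay over 30 days
    score = w_sem * semantic + w_rec * recency + w_src * source_quality
    return max(0.0, score - sum(penalties))    # configurable penalties subtract

jobs = [
    {"id": "a", "semantic": 0.92, "days_old": 20, "source_quality": 0.5},
    {"id": "b", "semantic": 0.80, "days_old": 1,  "source_quality": 0.9},
]
ranked = sorted(
    jobs,
    key=lambda j: rank_score(j["semantic"], j["days_old"], j["source_quality"]),
    reverse=True,
)
```

Here the fresher, better-sourced job can outrank one with a higher raw semantic score, which is the point of blending signals.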
Project Structure
The codebase is organized into distinct domains so each stage of the automation pipeline stays maintainable and testable in isolation.
Local Development
For deep debugging or contributing, a manual local setup is recommended. This requires Python 3.9+ and Node.js 18+.
Backend Setup
Frontend Setup
How The Platform Works
The platform is built as a complete pipeline from resume ingestion to ranked job delivery and personalized notification. Here is the end-to-end flow used in local and production runs.
Upload Resume(s)
Users can upload one or multiple resumes. Each resume is chunked, embedded, and stored with user-specific metadata for isolated retrieval.
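A minimal sketch of the chunk-and-tag step; the chunk size, overlap, and field names are illustrative, not the actual schema:

```python
def chunk_resume(text, user_id, size=200, overlap=50):
    """Split resume text into overlapping chunks tagged with user metadata,
    so vector retrieval can be filtered to a single user's documents."""
    chunks, start = [], 0
    while start < len(text):
        piece = text[start:start + size]
        chunks.append({"user_id": user_id, "offset": start, "text": piece})
        start += size - overlap  # slide window, keeping `overlap` chars of context
    return chunks

chunks = chunk_resume("x" * 500, user_id="u1")
```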
Set User Preferences
Configure role preferences like target job titles, experience level, location, work mode, and other filtering parameters from the User Preferences section.
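Conceptually, preference filtering is a predicate applied to each normalized job. The field names below are an illustrative subset (titles and work mode), not the product's actual schema:

```python
def matches_preferences(job, prefs):
    """Return True if a normalized job passes the user's filters.
    Field names are illustrative; the real preference set is broader."""
    if prefs.get("work_mode") and job["work_mode"] != prefs["work_mode"]:
        return False
    if prefs.get("titles") and not any(
        t.lower() in job["title"].lower() for t in prefs["titles"]
    ):
        return False
    return True

prefs = {"titles": ["data engineer"], "work_mode": "remote"}
job = {"title": "Senior Data Engineer", "work_mode": "remote"}
ok = matches_preferences(job, prefs)
```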
Add Search URLs
Provide LinkedIn, Indeed, Reddit, or custom links to scrape. Search URLs can be persisted and reused across future runs.
Store Browser Session
For better result quality, authenticated browser state can be captured so scrapers access richer pages and avoid guest-mode limits.
Discovery Agent Scrapes Jobs
The discovery stage crawls the configured URLs, normalizes job postings, and deduplicates records before ranking.
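The dedup step can be sketched as URL canonicalization plus a seen-set covering both the current run and earlier runs; the normalization rules below (drop query string, fragment, and trailing slash) are illustrative assumptions:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize(url):
    """Canonicalize a job URL: drop query string, fragment, trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, "", ""))

def dedupe(urls, seen=None):
    """Skip URLs already collected in this run or seen in earlier runs."""
    seen = set(seen or [])
    fresh = []
    for url in urls:
        key = normalize(url)
        if key not in seen:
            seen.add(key)
            fresh.append(url)
    return fresh

urls = [
    "https://example.com/jobs/123?ref=feed",
    "https://example.com/jobs/123/",
    "https://example.com/jobs/456",
]
fresh = dedupe(urls, seen=["https://example.com/jobs/456"])
```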
Resume Matching Agent Scores Relevance
Jobs are semantically matched against resume embeddings. If multiple resumes exist, the best fit resume is selected per job.
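Best-fit resume selection reduces to an argmax over per-resume similarity scores; a minimal sketch, assuming those scores have already been computed for one job:

```python
def best_resume(job_scores):
    """Given {resume_id: similarity} for one job, pick the best-fit resume."""
    resume_id = max(job_scores, key=job_scores.get)
    return resume_id, job_scores[resume_id]

rid, s = best_resume({"resume_ml": 0.91, "resume_pm": 0.62})
```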
Ranking Agent Prioritizes Opportunities
Final ranking combines semantic match quality, posting recency, and source signals to surface the highest-value opportunities first.
Messaging Agent Generates Outreach
Personalized email or LinkedIn-style outreach drafts are generated for top-ranked opportunities.
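The sanitization mentioned earlier can be as simple as stripping control characters, collapsing whitespace, and capping draft length; this sketch is an assumption about the approach, not the shipped implementation:

```python
import re

def sanitize_draft(text, limit=600):
    """Strip control characters (keeping tab/newline for the first pass),
    collapse whitespace, and cap length before a draft leaves the system."""
    clean = re.sub(r"[\x00-\x08\x0b-\x1f]", "", text)
    clean = re.sub(r"\s+", " ", clean).strip()
    return clean[:limit]
```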
Email Notification Is Sent
The final digest is delivered to the configured recipient with the top jobs, their confidence scores, and suggested outreach messaging.
API Keys & Secrets Reference
Configure these in your .env file to run locally. The table includes required, optional, and conditional keys used by the platform.
| Variable | Purpose | Where to get it |
|---|---|---|
| `MISTRAL_API_KEY` | Embeddings and LLM fallback | mistral.ai |
| `GROQ_API_KEY` | Primary fast LLM routing and failover | groq.com |
| `JWT_SECRET_KEY` | API auth token signing | Generate a long random secret |
| `AUTH_SECRET` | Frontend session encryption (NextAuth) | Generate with `openssl rand -base64 32` |
| `QDRANT_API_KEY` | Auth for secured/cloud Qdrant (optional) | Qdrant Cloud dashboard |
| `LANGCHAIN_API_KEY` | Tracing/observability only (optional) | smith.langchain.com |
| `EMAIL_PASSWORD` | SMTP app password for digest emails | Gmail App Passwords |
Troubleshooting
- Docker Port Conflict: If ports 5432 (Postgres) or 6333 (Qdrant) are already in use, stop the local Postgres/Qdrant services before starting Docker.
- LLM Rate Limits: Ensure your Groq/Mistral keys have active credits and that you are within provider rate limits.