Local-First Multimodal Assistant · Hybrid RAG · Vision Tools

PathFinder-Ship

A production-style local AI assistant that routes intent, answers from private documents, gates web evidence, controls browser camera tools, and runs YOLO-NAS object detection through one FastAPI-backed web experience.

FastAPI · ONNX INT8 · Flan-T5 783M · MiniLM Router · Chroma + BM25 · YOLO-NAS


PathFinder-Ship assistant interface preview

Overview

PathFinder-Ship is not a simple chatbot wrapper. It is a local-first assistant architecture where every user message is classified, routed, grounded, executed, or safely rejected depending on intent confidence and context quality.

The project combines Passenger-Bot, an ONNX-deployed Flan-T5 Large model, with a MiniLM intent classifier, hybrid local retrieval, optional web evidence, browser camera control, object detection, local chat history, upload flows, voice mode, and Markdown export in one interface.

Runtime: Local CPU

ONNX INT8 models run without remote inference

Core Model: Flan-T5 783M

Passenger-Bot for chat, RAG, fallback, and tool narrations

Router: MiniLM

Intent classifier for camera, detection, photo, and chat routes

Retrieval: Hybrid RAG

ChromaDB vector search plus SQLite FTS5/BM25 scoring

Role: Full-stack AI system design, backend services, local inference, frontend workflow

Problem: Unify chat, RAG, web search, camera control, and object detection without cloud inference

Core Value: No canned replies. Responses are generated from model prompts, tools, and selected context

Demo Video

A working assistant loop, from intent to tool execution.

The demo shows the assistant switching between natural-language routing, camera actions, object detection, document QA, and export-oriented workflows.

Working Demo

Intent routing, browser camera access, YOLO-NAS detection, grounded answers, and chat export in one flow.

System Idea

One assistant, several specialized paths, no manual mode switching.

The user writes a natural-language request. The system decides whether to answer, retrieve, search, open the camera, capture, detect, or fall back.

Private Knowledge

Local PDFs, DOCX, and TXT files become a searchable personal corpus with semantic and keyword retrieval.

Tool-Aware Routing

MiniLM recognizes commands like open camera, take photo, object detect, close camera, or normal chat.

Visual Actions

YOLO-NAS can analyze live camera frames or uploaded images, save annotated results, and generate a careful summary.

Request Flow

The backend acts like a traffic controller for AI tools.

This request path is the main engineering story: route first, then choose the safest and most useful execution path.

Intent First

The frontend sends each message to /api/intent so the system can decide whether it is a tool command or a chat/RAG request.

Command Route

High-confidence camera or vision intents bypass normal chat and execute browser/backend actions immediately.

Knowledge Route

Non-command messages move through local RAG, web-augmented RAG, or model-only fallback depending on context strength.

Grounding Gate

Local retrieval score and web_strength decide whether to use local chunks, web chunks, both, or no external context.

Generated Answer

Passenger-Bot generates the final response from the selected prompt: chat, grounded RAG, fallback, or tool narration.
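The route-first decision described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `IntentResult` type, the `choose_route` helper, and the 0.80 confidence threshold are assumptions made for the example. Only the intent names come from the write-up.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the real confidence threshold is not stated in the write-up.
COMMAND_CONFIDENCE = 0.80

# Tool intents named in the write-up; anything else is treated as chat/RAG.
COMMAND_INTENTS = {"open_camera", "close_camera", "take_photo", "object_detect"}


@dataclass
class IntentResult:
    """Assumed shape of what the intent classifier returns."""
    intent: str
    confidence: float


def choose_route(result: IntentResult) -> str:
    """Return 'command' for confident tool intents, otherwise 'knowledge'."""
    is_tool = result.intent in COMMAND_INTENTS
    if is_tool and result.confidence >= COMMAND_CONFIDENCE:
        return "command"
    # Low-confidence tool guesses and plain chat both fall through to chat/RAG.
    return "knowledge"
```

Sending low-confidence tool predictions to the knowledge route keeps an ambiguous sentence from triggering the camera by accident.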

Architecture

A single FastAPI backend coordinates small, focused service layers instead of hiding everything behind one large chat endpoint. That makes the system easier to reason about: router, generator, retrieval, web search, vision, storage, email, and frontend each have clear responsibilities.

Frontend SPA

Chat sessions, camera access, file upload, localStorage history, Markdown rendering, web-search toggle, dark mode, and optional voice mode.

FastAPI Orchestrator

One backend coordinates the /api/intent, /api/chat, /api/rag, /api/photo, /api/detect, /api/upload, and /api/health endpoints.

Intent Router

MiniLM ONNX returns intent plus confidence for open_camera, close_camera, take_photo, object_detect, and chat decisions.

Generation Layer

Flan-T5 Large ONNX uses different prompt templates for chat, strict RAG answers, weak-context fallback, and vision action narration.
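One way to keep those four prompt modes separate is a small template table. The sketch below is illustrative only: the template wording, the mode keys, and the `build_prompt` helper are assumptions, not the prompts used in the project.

```python
# Hypothetical templates; the project's real prompt wording is not published here.
PROMPT_TEMPLATES = {
    "chat": "Answer the user helpfully.\nUser: {question}\nAnswer:",
    "rag": (
        "Answer using only the context below. If the context does not contain "
        "the answer, say so.\nContext:\n{context}\nQuestion: {question}\nAnswer:"
    ),
    "fallback": (
        "No reliable context was found. Answer from general knowledge and "
        "be clear about uncertainty.\nQuestion: {question}\nAnswer:"
    ),
    "tool": (
        "Describe the result of this action in one or two sentences.\n"
        "Action: {action}\nResult: {result}\nNarration:"
    ),
}


def build_prompt(mode: str, **fields: str) -> str:
    """Fill the template for the given mode; raises KeyError for unknown modes or fields."""
    return PROMPT_TEMPLATES[mode].format(**fields)
```

For example, `build_prompt("tool", action="object_detect", result="2 persons, 1 laptop")` yields the narration prompt, and the generator never sees a retrieval context it was not meant to use.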

Retrieval Layer

Documents are cleaned, chunked, embedded with all-MiniLM-L6-v2, stored in ChromaDB, and indexed in SQLite FTS5 for BM25 search.
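The keyword half of that index can be illustrated with Python's built-in `sqlite3` module, assuming the local SQLite build includes FTS5. The table name `chunks` and both helper functions are hypothetical. Note that FTS5's `bm25()` returns lower values for better matches, so results are sorted ascending.

```python
import sqlite3


def build_keyword_index(chunks: list[str]) -> sqlite3.Connection:
    """Create an in-memory FTS5 table (hypothetical name 'chunks') and load the text."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(text)")
    conn.executemany("INSERT INTO chunks(text) VALUES (?)", [(c,) for c in chunks])
    return conn


def keyword_search(conn: sqlite3.Connection, query: str, limit: int = 3):
    """Return (text, bm25) rows, best match first.

    The query is passed to MATCH as-is, so it must be valid FTS5 query syntax;
    real input would need escaping or sanitizing first.
    """
    return conn.execute(
        "SELECT text, bm25(chunks) FROM chunks "
        "WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (query, limit),
    ).fetchall()
```

A persistent index would use a file path instead of `:memory:`, which fits the local-first design because no separate search server is required.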

Vision Layer

YOLO-NAS ONNX detects objects from browser camera frames or uploaded images, draws annotations, saves outputs, and can email results.

RAG & Web Gating

Grounding is scored before it is trusted.

The assistant does not blindly stuff every search result into the prompt. Local chunks and web chunks are scored first, then the backend chooses local-only, web-only, local plus web, or a model-only fallback when evidence is too weak.
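The four outcomes can be expressed as a small decision function. This is a sketch under stated assumptions: both thresholds are hypothetical values, and `choose_context` is an illustrative name rather than a function from the project.

```python
# Hypothetical thresholds; the write-up does not state the real values.
LOCAL_THRESHOLD = 0.5
WEB_THRESHOLD = 0.5


def choose_context(local_score: float, web_strength: float, web_enabled: bool = True) -> str:
    """Pick which evidence may enter the prompt: 'local', 'web', 'local+web', or 'fallback'."""
    use_local = local_score >= LOCAL_THRESHOLD
    # Web evidence is only considered when the user has enabled web search.
    use_web = web_enabled and web_strength >= WEB_THRESHOLD
    if use_local and use_web:
        return "local+web"
    if use_local:
        return "local"
    if use_web:
        return "web"
    # Neither source is strong enough: answer from the model alone.
    return "fallback"
```

Keeping the gate as a pure function of two scores makes the behavior easy to test and to tune without touching retrieval or generation code.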

Document Index

PDF, DOCX, and TXT files are loaded, normalized, chunked with overlap, embedded, and indexed for semantic plus keyword retrieval.
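Chunking with overlap can be sketched as a sliding window over words. The word-based splitting and the default sizes below are assumptions for illustration. The project may chunk by characters or tokens instead.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into word windows of chunk_size that share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, which helps both the embedding and the keyword index.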

Hybrid Score

Chroma similarities and BM25 scores are each normalized to a common scale, then combined into one relevance score using separate vector and keyword weights.
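A common way to build such a score is min-max normalization followed by a weighted sum. The sketch below assumes that approach. The 0.6/0.4 weights and the function names are illustrative, and both inputs are assumed to be "higher is better" (raw SQLite `bm25()` values would need to be negated first).

```python
def min_max(values: list[float]) -> list[float]:
    """Scale values to [0, 1]; if all values are equal, treat them as equally relevant."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(
    vector_sims: list[float],
    bm25_scores: list[float],
    vector_weight: float = 0.6,   # hypothetical weight
    keyword_weight: float = 0.4,  # hypothetical weight
) -> list[float]:
    """Combine per-chunk vector and keyword scores into one relevance score."""
    if len(vector_sims) != len(bm25_scores):
        raise ValueError("score lists must align chunk by chunk")
    if not vector_sims:
        return []
    v_norm = min_max(vector_sims)
    k_norm = min_max(bm25_scores)
    return [vector_weight * v + keyword_weight * k for v, k in zip(v_norm, k_norm)]
```

Normalizing first matters because cosine similarities and BM25 values live on different scales. Without it, one signal would dominate regardless of the weights.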

Web Gating

DuckDuckGo results are cleaned and chunked, then allowed into the answer only when web_strength passes the trust threshold.

Vision Pipeline

Camera commands become structured detection outputs.

The vision path is practical rather than decorative: it opens the browser camera, captures frames, runs detection, stores outputs, and returns both image and text.

Camera Tools

Natural-language commands can open the browser camera, close it, capture a frame, or trigger detection without manual tool selection.

YOLO-NAS ONNX

The backend preprocesses frames, runs detection with non-maximum suppression (NMS), maps boxes from the letterboxed model input back to original frame coordinates, and returns labels, confidence scores, a summary, and an image URL.
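Mapping boxes back from a letterboxed input can be sketched as below. The example assumes a square model input with the resized image centered and padded equally on both sides. The 640-pixel size and the function names are illustrative, not taken from the project.

```python
def letterbox_params(src_w: int, src_h: int, dst_size: int) -> tuple[float, float, float]:
    """Return (scale, pad_x, pad_y) for an aspect-preserving resize with centered padding."""
    scale = min(dst_size / src_w, dst_size / src_h)
    pad_x = (dst_size - src_w * scale) / 2
    pad_y = (dst_size - src_h * scale) / 2
    return scale, pad_x, pad_y


def de_letterbox(box, src_w: int, src_h: int, dst_size: int = 640):
    """Convert an (x1, y1, x2, y2) box from model-input pixels to original-frame pixels."""
    scale, pad_x, pad_y = letterbox_params(src_w, src_h, dst_size)
    x1, y1, x2, y2 = box

    def clamp(value: float, upper: int) -> float:
        # Boxes that spill into the padding are clipped to the frame edge.
        return max(0.0, min(float(upper), value))

    return (
        clamp((x1 - pad_x) / scale, src_w),
        clamp((y1 - pad_y) / scale, src_h),
        clamp((x2 - pad_x) / scale, src_w),
        clamp((y2 - pad_y) / scale, src_h),
    )
```

For a 1280x720 frame and a 640x640 input, the scale is 0.5 with 140 pixels of vertical padding, so a detection at (100, 240, 200, 340) maps back to (200, 200, 400, 400) in the original frame.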

Storage & Email

Photo and detection outputs are saved through a ring buffer so that only the last N images are kept on disk. Optional SMTP delivery can then send them as email attachments.
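A file-based ring buffer can be as simple as reusing a fixed set of slot filenames. The class below is a hypothetical sketch: the `ImageRingBuffer` name, the filename pattern, and the in-memory counter are assumptions. A real service would likely persist or recover the counter across restarts.

```python
from pathlib import Path


class ImageRingBuffer:
    """Keep at most `capacity` images on disk by overwriting the oldest slot."""

    def __init__(self, directory, capacity: int = 10, prefix: str = "detect"):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.directory = Path(directory)
        self.capacity = capacity
        self.prefix = prefix
        self._next = 0  # total number of saves so far (in memory only)
        self.directory.mkdir(parents=True, exist_ok=True)

    def save(self, data: bytes) -> Path:
        """Write image bytes into the next slot and return the file path."""
        slot = self._next % self.capacity
        path = self.directory / f"{self.prefix}_{slot:03d}.jpg"
        path.write_bytes(data)  # overwrites whatever was in this slot before
        self._next += 1
        return path
```

Because slots are reused, disk usage stays bounded without a separate cleanup job, and the returned path can be turned directly into the image URL sent to the frontend.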

Technology Roles

FastAPI

Backend API and orchestration layer

ONNX Runtime

Local CPU inference for MiniLM, Flan-T5, and YOLO-NAS

Flan-T5 Large

Passenger-Bot response generation and tool narration

MiniLM

Intent classification and document embeddings

ChromaDB

Persistent vector store for document chunks

SQLite FTS5

BM25 keyword retrieval for hybrid search

YOLO-NAS

Object detection on camera frames and uploads

Vanilla JS

SPA chat, camera, upload, voice, and local history

My Contribution

I designed and implemented the full assistant architecture: FastAPI routes, intent routing, ONNX model services, hybrid RAG indexing and search, web-search gating, YOLO-NAS detection, camera/photo/upload endpoints, storage and email utilities, frontend interaction flow, chat history, voice support, and Markdown export. The project demonstrates how local AI models can be connected to real user-facing tools without reducing the system to a simple chatbot.

Technology Stack

Python · FastAPI · ONNX Runtime · Flan-T5 Large · MiniLM · SentenceTransformers · ChromaDB · SQLite FTS5 · BM25 · DuckDuckGo / ddgs · YOLO-NAS · OpenCV · Vanilla JavaScript · Web Speech API · LocalStorage