Interpaws: Agentic AI for Veterinary Practice Management
A Student Innovation Project exploring the application of Vector Embeddings and ReAct Agents to solve healthcare scheduling bottlenecks in veterinary practice management.
pgvector + Embeddings
384-dimensional semantic search for staff-patient matching
ReAct Agent Loop
Autonomous reasoning and action execution cycles
Local LLM Inference
Ollama-powered Llama 3 / Qwen for on-premise AI
Full-Stack Architecture
Next.js 15 + FastAPI + PostgreSQL stack
Project Technical Specifications
384
Embedding Dimensions
<200ms
Vector Query Latency
ReAct
Agent Architecture
100%
Local Inference
Innovation Claims & Technical Contributions
Three core innovations that differentiate Interpaws from conventional practice management systems. Each leverages modern AI/ML techniques to solve real veterinary workflow bottlenecks.
Beyond Keyword Search
Semantic Staff Matching
Instead of keyword search, Interpaws uses pgvector and 384-dimensional embeddings to match unstructured patient complaints against veterinarian skill sets mathematically.
-- Vector similarity search
SELECT staff.name,
1 - (staff.skill_embedding <=> $1) AS similarity
FROM staff
ORDER BY staff.skill_embedding <=> $1
LIMIT 3;
Embedding Model
all-MiniLM-L6-v2
Dimensions
384
Similarity
Cosine
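A minimal Python sketch of this matching flow, mirroring the query above. It assumes a staff table with a skill_embedding vector(384) column and a plain psycopg2 connection; the helper name match_staff and the connection string are illustrative, not the project's actual modules.

# Minimal sketch of the staff-matching flow (assumed helper names,
# psycopg2-style placeholders instead of $1): embed the complaint, then rank
# staff by cosine distance with pgvector's <=> operator.
import psycopg2
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

def match_staff(complaint: str, conn) -> list[tuple[str, float]]:
    """Return the top-3 staff members ranked by cosine similarity."""
    query_vec = model.encode(complaint).tolist()
    # pgvector accepts a bracketed text literal cast to vector
    vec_literal = "[" + ",".join(f"{x:.6f}" for x in query_vec) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT staff.name,
                   1 - (staff.skill_embedding <=> %s::vector) AS similarity
            FROM staff
            ORDER BY staff.skill_embedding <=> %s::vector
            LIMIT 3;
            """,
            (vec_literal, vec_literal),
        )
        return cur.fetchall()

# Example (connection string is a placeholder):
# conn = psycopg2.connect("dbname=interpaws")
# match_staff("limping after jumping off the couch", conn)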
Reason + Act Agents
The ReAct Paradigm
Interpaws implements 'Reason + Act' agents: unlike standard chatbots, it runs a continuous execution loop that queries inventory and calendars autonomously before responding.
# ReAct Agent Loop
observation = user_query   # start from the user's request
done = False
while not done:
    thought = llm.think(observation)   # Reason about the latest observation
    action = llm.decide(thought)       # Choose the next tool call or final reply
    observation = execute(action)      # Act, then observe the result
    if action == "respond":
        done = True
Architecture
ReAct Loop
LLM Backend
Ollama
Model
Llama 3 / Qwen
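To make the loop concrete, here is a hedged, self-contained sketch of a ReAct cycle against Ollama's local REST API. The tool names (check_inventory, check_calendar), the JSON action format, and the prompt wording are assumptions for illustration, not the project's exact agent implementation.

# Sketch of one possible ReAct loop against Ollama's local REST API.
# Tool names, the JSON action format, and the prompt are illustrative only.
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

TOOLS = {
    "check_inventory": lambda item: f"Stock for {item}: 12 units",   # stub
    "check_calendar":  lambda name: f"Dr. {name} is free at 14:00",  # stub
}

def llm(prompt: str) -> str:
    """One non-streaming completion from the local Llama 3 model."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]

def react_loop(task: str, max_steps: int = 5) -> str:
    observation = task
    for _ in range(max_steps):
        # Reason: ask the model for its next action as JSON
        step = llm(
            "You are a veterinary clinic assistant.\n"
            f"Observation: {observation}\n"
            "Available actions: check_inventory, check_calendar, respond.\n"
            'Reply ONLY with JSON: {"action": "<name>", "input": "<text>"}'
        )
        action = json.loads(step)          # production code would validate this
        if action["action"] == "respond":
            return action["input"]         # final answer for the user
        # Act: run the chosen tool and feed the result back as the observation
        observation = TOOLS[action["action"]](action["input"])
    return "Step budget exhausted before the task completed."

# react_loop("Can Dr. Lee see Biscuit tomorrow, and is amoxicillin in stock?")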
Automated Patient Outreach
Proactive Wellness Loops
Automated identification of at-risk patients via temporal SQL analysis, triggering LLM-generated, personalized outreach emails.
-- Find overdue patients
SELECT pets.name, owners.email,
       AGE(NOW(), MAX(visits.date)) AS since_last
FROM pets
JOIN owners ON owners.id = pets.owner_id
JOIN visits ON visits.pet_id = pets.id
GROUP BY pets.id, owners.email
HAVING AGE(NOW(), MAX(visits.date)) > INTERVAL '12 months';
Trigger
> 12 months
Analysis
Temporal SQL
Output
LLM Email
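A short sketch of the final step: turning one row from the overdue-patient query into a draft email via the local model. The prompt wording and the draft_reminder helper are assumptions; in practice the draft would be reviewed before sending.

# Illustrative sketch of the outreach step: one overdue-patient row in,
# one LLM-drafted reminder out. Helper name and prompt are assumptions.
import requests

def draft_reminder(pet_name: str, months_since_visit: int) -> str:
    prompt = (
        f"Write a short, friendly email to the owner of {pet_name}, noting that "
        f"the last wellness visit was {months_since_visit} months ago and "
        "inviting them to book a check-up."
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    return resp.json()["response"]

# draft_reminder("Biscuit", 14)  -> personalized email body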
Semantic Search
Natural language queries matched against embedded staff profiles
Autonomous Loops
Self-correcting agent cycles until task completion
LLM Synthesis
Context-aware responses generated from retrieved data
Research Foundation
This project builds on established research in semantic search (Reimers & Gurevych, 2019), ReAct prompting (Yao et al., 2022), and healthcare scheduling optimization. All AI inference runs locally via Ollama, ensuring data privacy and HIPAA-aligned architecture.
Full-Stack Technical Implementation
A modern, production-grade architecture combining Next.js, FastAPI, and local LLM inference. Every component is designed for scalability, maintainability, and data privacy.
Frontend
Next.js 15 App Router & Shadcn/UI
Modern React framework with server-side rendering, type-safe routing, and optimized bundle splitting.
Backend
FastAPI, SQLAlchemy, Alembic
High-performance async Python backend with automatic OpenAPI docs and comprehensive data validation.
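To give a flavor of this layer, here is a minimal FastAPI route with Pydantic validation. The route path, schema fields, and in-memory store are illustrative stand-ins (Pydantic v2 assumed), not the project's actual bookings.py.

# Minimal FastAPI sketch showing the async + validation style of the backend.
# Route path, schema fields, and the in-memory store are illustrative only.
from datetime import datetime
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI(title="Interpaws API")  # OpenAPI docs served automatically at /docs

class BookingIn(BaseModel):
    pet_id: int
    staff_id: int
    scheduled_for: datetime
    reason: str = Field(min_length=3, max_length=500)

class BookingOut(BookingIn):
    id: int

_bookings: list[BookingOut] = []  # stand-in for the SQLAlchemy layer

@app.post("/bookings", response_model=BookingOut)
async def create_booking(payload: BookingIn) -> BookingOut:
    # Invalid payloads are rejected with a 422 before this body runs
    booking = BookingOut(id=len(_bookings) + 1, **payload.model_dump())
    _bookings.append(booking)
    return booking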
AI Engine
Ollama (Llama 3/Qwen) + SentenceTransformers
Fully local AI stack ensuring data privacy. No API calls leave your infrastructure.
Data Flow Pipeline
From user query to intelligent response in five stages, composed end to end in the sketch below
User Input
Natural language query from client portal
Vector Embedding
Text → 384-dim vector via SentenceTransformers
Cosine Similarity
pgvector search against staff skill embeddings
LLM Synthesis
Ollama generates contextual response
Response
Structured output to user interface
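The five stages composed into a single, deliberately simplified function. The retrieval and generation helpers here are stubs standing in for the pgvector query and Ollama call sketched earlier; all names are illustrative.

# Simplified composition of the five pipeline stages; helper bodies are stubs.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def search_staff(query_vec: list[float]) -> list[str]:
    return ["Dr. Lee (orthopedics)"]           # stage 3 stub: pgvector search

def generate_answer(prompt: str) -> str:
    return f"[LLM draft] {prompt}"             # stage 4 stub: Ollama call

def answer(user_query: str) -> str:
    # 1. User input arrives from the client portal
    # 2. Embed the text into a 384-dim vector
    query_vec = model.encode(user_query).tolist()
    # 3. Cosine-similarity search over staff skill embeddings
    matches = search_staff(query_vec)
    # 4. LLM synthesis grounded in the retrieved matches
    draft = generate_answer(f"Query: {user_query}\nMatched staff: {matches}")
    # 5. Structured response back to the UI
    return draft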
Architecture
Microservices
Database
PostgreSQL + pgvector
API Style
REST + WebSocket
Deployment
Docker Compose
backend/
├── app/
│   ├── api/
│   │   ├── routes/
│   │   │   ├── bookings.py
│   │   │   ├── staff.py
│   │   │   └── chat.py            # ReAct Agent
│   │   └── deps.py
│   ├── core/
│   │   ├── llm.py                 # Ollama client
│   │   └── embeddings.py          # SentenceTransformers
│   ├── models/                    # SQLAlchemy models
│   └── services/
│       ├── vector_search.py       # pgvector queries
│       └── agent.py               # ReAct loop
└── alembic/                       # Migrations

frontend/
├── app/
│   ├── (auth)/
│   │   ├── login/
│   │   └── register/
│   ├── (dashboard)/
│   │   ├── admin/
│   │   │   ├── staff/
│   │   │   ├── bookings/
│   │   │   └── vector-logs/       # Embedding inspector
│   │   └── client/
│   │       ├── pets/
│   │       └── chat/              # Agent interface
│   └── components/
│       └── ui/                    # Shadcn components
└── lib/
    └── api.ts                     # Type-safe API client

Technical Deep Dive
This project demonstrates a complete full-stack AI application with vector embeddings, ReAct agent architecture, and local LLM inference for veterinary practice management.
About this showcase: This site demonstrates the project's technical concepts and architecture. The full system includes local Ollama inference, PostgreSQL with pgvector, and a FastAPI backend.
Want to Learn More?
Have questions about the project or want to discuss the technical implementation? Feel free to reach out.