Student Innovation Project 2025

Interpaws: Agentic AI for Veterinary Practice Management

A Student Innovation Project exploring the application of Vector Embeddings and ReAct Agents to solve healthcare scheduling bottlenecks in veterinary practice management.

pgvector + Embeddings

384-dimensional semantic search for staff-patient matching

ReAct Agent Loop

Autonomous reasoning and action execution cycles

Local LLM Inference

Ollama-powered Llama 3 / Qwen for on-premise AI

Full-Stack Architecture

Next.js 15 + FastAPI + PostgreSQL stack

Project Technical Specifications

384

Embedding Dimensions

<200ms

Vector Query Latency

ReAct

Agent Architecture

100%

Local Inference

Technical Innovation

Innovation Claims & Technical Contributions

Three core innovations that differentiate Interpaws from conventional practice management systems. Each leverages modern AI/ML techniques to solve real veterinary workflow bottlenecks.

all-MiniLM-L6-v2 · 384

Beyond Keyword Search

Semantic Staff Matching

Instead of keyword search, Interpaws uses pgvector and 384-dimensional embeddings to match unstructured patient complaints against veterinarian skill sets mathematically.

-- Vector similarity search (<=> is pgvector's cosine distance operator)
SELECT staff.name,
  1 - (staff.skill_embedding <=> $1) AS similarity
FROM staff
ORDER BY staff.skill_embedding <=> $1
LIMIT 3;

Embedding Model

all-MiniLM-L6-v2

Dimensions

384

Similarity

Cosine
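The `<=>` operator returns cosine distance, so similarity is `1 - distance`. A minimal pure-Python sketch of the same ranking the SQL performs (real embeddings would come from all-MiniLM-L6-v2; names here are illustrative):

```python
import math

def cosine_similarity(a, b):
    # Same quantity as 1 - (a <=> b) in pgvector
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_staff(query_vec, staff, k=3):
    # staff: list of (name, embedding) pairs; mirrors the SQL's
    # ORDER BY skill_embedding <=> $1 LIMIT 3
    ranked = sorted(staff, key=lambda s: cosine_similarity(query_vec, s[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```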

ReAct Loop · Ollama

Reason + Act Agents

The ReAct Paradigm

Interpaws implements 'Reason + Act' agents: unlike standard chatbots, it runs a continuous execution loop that queries inventory and calendars autonomously before responding.

# ReAct Agent Loop
observation = user_query
done = False
while not done:
    thought = llm.think(observation)   # Reason about the current state
    action = llm.decide(thought)       # Choose a tool call or respond
    observation = execute(action)      # Act, then observe the result
    if action == "respond":
        done = True
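A self-contained toy version of this loop, with a scripted stand-in for the LLM and two illustrative tools (in the real system, inference runs through Ollama and tools hit the database):

```python
def run_agent(query, tools, decide):
    # decide(observation) -> (action, arg); "respond" ends the loop
    observation = query
    trace = []
    while True:
        action, arg = decide(observation)    # Reason + choose an action
        trace.append(action)
        if action == "respond":
            return arg, trace
        observation = tools[action](arg)     # Act, then observe

# Illustrative tools and a scripted "LLM" policy
tools = {
    "check_inventory": lambda item: f"{item}: 12 units in stock",
    "check_calendar": lambda day: f"{day}: Dr. Lee free at 10:00",
}

def scripted_decide(observation):
    if "vaccine" in observation and "stock" not in observation:
        return ("check_inventory", "rabies vaccine")
    if "stock" in observation:
        return ("check_calendar", "Tuesday")
    return ("respond", f"Booked: {observation}")
```

Each observation feeds the next decision, so the agent self-corrects until it chooses to respond.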

Architecture

ReAct Loop

LLM Backend

Ollama

Model

Llama 3 / Qwen

> 12 months · Temporal SQL

Automated Patient Outreach

Proactive Wellness Loops

Automated identification of at-risk patients via temporal SQL analysis, triggering LLM-generated, personalized outreach emails.

-- Find overdue patients
SELECT pets.name, owners.email,
  AGE(NOW(), MAX(visits.date)) AS since_last
FROM pets
JOIN owners ON owners.id = pets.owner_id
JOIN visits ON visits.pet_id = pets.id
GROUP BY pets.id, pets.name, owners.email
HAVING AGE(NOW(), MAX(visits.date)) > INTERVAL '12 months';

Trigger

> 12 months

Analysis

Temporal SQL

Output

LLM Email
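Rows from the temporal query feed an LLM prompt for each overdue patient. A minimal sketch of that hand-off (field names and prompt wording are illustrative; generation itself runs locally via Ollama):

```python
def outreach_prompt(pet_name, owner_email, months_overdue):
    # Prompt template the local LLM expands into a personalized email
    return (
        f"Write a friendly wellness reminder for {pet_name}'s owner "
        f"({owner_email}). The last visit was {months_overdue} months ago. "
        "Suggest booking a check-up."
    )

def build_outreach(rows):
    # rows: (pet_name, owner_email, months_overdue) tuples from the SQL above
    return [outreach_prompt(*row) for row in rows]
```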

Semantic Search

Natural language queries matched against embedded staff profiles

Autonomous Loops

Self-correcting agent cycles until task completion

LLM Synthesis

Context-aware responses generated from retrieved data

Research Foundation

This project builds on established research in semantic search (Reimers & Gurevych, 2019), ReAct prompting (Yao et al., 2022), and healthcare scheduling optimization. All AI inference runs locally via Ollama, ensuring data privacy and HIPAA-aligned architecture.

100% Local LLM Inference
No External API Dependencies
Privacy-First Architecture

System Architecture

Full-Stack Technical Implementation

A modern, production-grade architecture combining Next.js, FastAPI, and local LLM inference. Every component is designed for scalability, maintainability, and data privacy.

Frontend

Next.js 15 App Router & Shadcn/UI

Modern React framework with server-side rendering, type-safe routing, and optimized bundle splitting.

Next.js 15App Router, Server Components
React 19Concurrent rendering
Tailwind CSS 4Utility-first styling
Shadcn/UIAccessible components

Backend

FastAPI, SQLAlchemy, Alembic

High-performance async Python backend with automatic OpenAPI docs and comprehensive data validation.

FastAPIAsync Python API
SQLAlchemy 2.0ORM with type hints
AlembicDatabase migrations
Pydantic v2Data validation

AI Engine

Ollama (Llama 3/Qwen) + SentenceTransformers

Fully local AI stack ensuring data privacy. No API calls leave your infrastructure.

OllamaLocal LLM inference
Llama 3 / Qwen8B parameter models
SentenceTransformersall-MiniLM-L6-v2
pgvectorVector similarity search

Data Flow Pipeline

From user query to intelligent response in five stages

STEP 01

User Input

Natural language query from client portal

STEP 02

Vector Embedding

Text → 384-dim vector via SentenceTransformers

STEP 03

Cosine Similarity

pgvector search against staff skill embeddings

STEP 04

LLM Synthesis

Ollama generates contextual response

STEP 05

Response

Structured output to user interface
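The five stages can be sketched as a single function, with stubs standing in for the embedding model, pgvector, and Ollama (all names here are illustrative):

```python
def pipeline(query, embed, search, synthesize):
    vec = embed(query)                    # Step 2: text -> 384-dim vector
    matches = search(vec)                 # Step 3: cosine similarity search
    answer = synthesize(query, matches)   # Step 4: LLM synthesis
    return {"query": query, "matches": matches, "answer": answer}  # Step 5

# Stub implementations for illustration only
embed = lambda text: [float(len(text))]   # real: SentenceTransformers
search = lambda vec: ["Dr. Lee"]          # real: pgvector query
synthesize = lambda q, m: f"{m[0]} can help with: {q}"
```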

Architecture

Microservices

Database

PostgreSQL + pgvector

API Style

REST + WebSocket

Deployment

Docker Compose

backend/
├── app/
│   ├── api/
│   │   ├── routes/
│   │   │   ├── bookings.py
│   │   │   ├── staff.py
│   │   │   └── chat.py      # ReAct Agent
│   │   └── deps.py
│   ├── core/
│   │   ├── llm.py           # Ollama client
│   │   └── embeddings.py    # SentenceTransformers
│   ├── models/              # SQLAlchemy models
│   └── services/
│       ├── vector_search.py # pgvector queries
│       └── agent.py         # ReAct loop
└── alembic/                 # Migrations
frontend/
├── app/
│   ├── (auth)/
│   │   ├── login/
│   │   └── register/
│   ├── (dashboard)/
│   │   ├── admin/
│   │   │   ├── staff/
│   │   │   ├── bookings/
│   │   │   └── vector-logs/  # Embedding inspector
│   │   └── client/
│   │       ├── pets/
│   │       └── chat/         # Agent interface
│   └── components/
│       └── ui/               # Shadcn components
└── lib/
    └── api.ts                # Type-safe API client

Technical Deep Dive

This project demonstrates a complete full-stack AI application with vector embeddings, ReAct agent architecture, and local LLM inference for veterinary practice management.

ReAct AgentReason + Act loop
Vector Searchpgvector + embeddings
Full-StackNext.js + FastAPI
Local LLMOllama inference

About this showcase: This site demonstrates the project's technical concepts and architecture. The full system includes local Ollama inference, PostgreSQL with pgvector, and a FastAPI backend.

Want to Learn More?

Have questions about the project or want to discuss the technical implementation? Feel free to reach out.