
Run Meta Llama On-Premise with OpenEduCat

Meta's Llama models are open-weight, meaning you can download them, run them on your own servers, and serve them through an OpenAI-compatible API using tools like Ollama or vLLM. Through OpenEduCat's Bring Your Own Model (BYOM) feature, a self-hosted Llama instance powers all 9 AI tools with zero data leaving your network.

This is the configuration for institutions where data sovereignty is non-negotiable: government-funded schools with data localization requirements, institutions with board policies prohibiting cloud AI processing of student data, and research universities with existing GPU infrastructure that can be repurposed for educational AI.

Zero data egress · On-premise deployment · No per-token cost · Complete sovereignty

How to connect self-hosted Llama to OpenEduCat

Three steps. All traffic stays inside your network.

1. Set up Llama on your server

Install Ollama on a Linux server with a compatible GPU. Run "ollama pull llama3.1" to download the model. Ollama serves a local OpenAI-compatible API endpoint automatically. For production scale with many concurrent users, vLLM provides better throughput and load balancing.
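On an Ubuntu-style host, the setup described above might look like the following. This is a sketch, not a definitive runbook: the hostname is a placeholder, and you should review the install script before piping it to a shell.

```shell
# Install Ollama via its official install script (review before running)
curl -fsSL https://ollama.com/install.sh | sh

# Download the Llama 3.1 weights (defaults to the 8B variant;
# use "ollama pull llama3.1:70b" for the larger model)
ollama pull llama3.1

# Ollama binds to 127.0.0.1:11434 by default; to serve other machines
# on your network, bind to all interfaces before starting:
OLLAMA_HOST=0.0.0.0 ollama serve

# Smoke-test the OpenAI-compatible endpoint from another host
# (replace the hostname with your server's internal DNS name)
curl http://ai-server.yourdomain.local:11434/v1/models
```

Binding to 0.0.0.0 exposes the API to your whole internal network, so pair it with firewall rules that restrict access to the OpenEduCat host.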

2. Point OpenEduCat BYOM to your server

In your OpenEduCat admin panel, go to AI Settings > Provider Configuration. Select Custom / Self-Hosted endpoint. Enter your server's internal URL (e.g., http://ai-server.yourdomain.local:11434). No API key is required for local Ollama; configure your vLLM API key if using that instead.
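Once the endpoint is configured, any OpenAI-compatible client on your network can reach it. A minimal standard-library sketch of what such a request looks like (the server URL is the placeholder from the step above; substitute the one you entered in Provider Configuration):

```python
import json
import urllib.request

# Placeholder internal endpoint; Ollama exposes an OpenAI-compatible
# API under /v1 on port 11434 by default.
OLLAMA_URL = "http://ai-server.yourdomain.local:11434/v1/chat/completions"

def build_request(prompt, model="llama3.1"):
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt):
    """Send the request to the local Ollama server (no API key needed)."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same payload shape works against a vLLM server; only the base URL and, if configured, an Authorization header change.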

3. All 9 AI tools process on your hardware

Every AI tool in OpenEduCat (grading, quiz builder, lesson planner, IEP writer, student support) routes requests to your Llama server. Data never leaves your network. No cloud billing. Your IT team controls the entire stack.

Architecture: self-hosted configuration

Your Institution's Network

→ OpenEduCat Instance (your server)

→ BYOM Provider Router

→ Ollama / vLLM on your GPU server

→ Llama 3.1 model (local disk)

All traffic stays within your network perimeter. No external API calls for AI processing.

Who chooses self-hosted Llama

On-premise AI is not for everyone. These are the institutions for which it is the right answer.

Government-funded institutions with data localization requirements

Public universities, state schools, and government-funded institutions in many jurisdictions face procurement rules that restrict which cloud providers can process student data, or prohibit cloud processing entirely for certain data categories. Running Llama on your own servers via Ollama or vLLM means AI processing happens on hardware you own, in a building you control, under your existing IT governance framework. No vendor approval process, no new data processing agreements, no jurisdictional concerns.

Zero data egress for the most sensitive student information

IEP documents, behavioral notes, mental health referrals, and disciplinary records carry the highest data sensitivity in an institution. Even with strong cloud DPAs in place, some institutions have board policies or legal counsel guidance that prohibits processing this category of data outside the institution's own infrastructure. Self-hosted Llama removes the question entirely: the data never leaves your servers, because the model runs on your servers.

Institutions with existing GPU infrastructure

Research universities and well-resourced technical institutions often maintain GPU clusters for research computing. A server already running NVIDIA A100s or H100s for research workloads can run Llama 3.1 70B comfortably alongside those workloads. The incremental cost of adding an Ollama or vLLM service layer is minimal compared to the alternative of paying per-token cloud API costs for a large student population.

Budget-constrained institutions planning for long-term AI usage

Cloud API costs scale with usage. An institution running AI tools for 5,000 students generates millions of tokens per month, and those costs compound at renewal. Self-hosted Llama converts that recurring cost into a one-time hardware investment plus energy and maintenance. For institutions with 3-5 year planning horizons, the breakeven calculation often favors on-premise for sustained, high-volume usage after the first year.
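The breakeven logic above can be sketched as a small calculation. All figures here are illustrative assumptions, not quotes; substitute your own hardware pricing and cloud usage data.

```python
def breakeven_months(hardware_cost, monthly_onprem_cost, monthly_cloud_cost):
    """Months until a one-time hardware spend is recovered by avoided cloud fees."""
    monthly_saving = monthly_cloud_cost - monthly_onprem_cost
    if monthly_saving <= 0:
        return None  # cloud stays cheaper at this usage volume
    return hardware_cost / monthly_saving

# Illustrative assumptions only:
months = breakeven_months(
    hardware_cost=30_000,      # assumed one-time GPU server purchase
    monthly_onprem_cost=500,   # assumed energy + maintenance
    monthly_cloud_cost=3_000,  # assumed per-token API spend at this scale
)
# Under these assumptions, hardware pays for itself in 12 months.
```

The `None` branch is the honest caveat: at low usage volumes, per-token cloud billing can remain cheaper indefinitely, which is why the document frames this as a fit for sustained, high-volume usage.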

Meta Llama: key specs for IT teams

Supported models: Llama 3.1 (8B, 70B, 405B), Llama 3 (8B, 70B), Llama 3.2 (1B, 3B), and compatible fine-tuned variants
Context window: Up to 128,000 tokens (Llama 3.1 models)
Data residency: Complete; the model runs on your hardware and no data leaves your network
Pricing model: No per-token cost after the initial hardware investment; hardware and energy costs only
FERPA considerations: Maximum compliance posture; student data never leaves your servers and no third-party DPA is required
GDPR considerations: Complete data control; processing stays within your jurisdiction by definition
Self-host option: Yes; this configuration is the self-hosting option, via Ollama (simpler) or vLLM (production scale)
API compatibility: OpenAI-compatible endpoint via Ollama or vLLM; drops directly into OpenEduCat BYOM with a localhost or internal URL
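One way to sanity-check that API compatibility before pointing BYOM at the server is to list the models the endpoint exposes. A small sketch (the base URL is a placeholder for your internal endpoint):

```python
import json
import urllib.request

def models_url(base_url):
    """OpenAI-compatible model-listing endpoint served by Ollama and vLLM."""
    return base_url.rstrip("/") + "/v1/models"

def list_models(base_url):
    """Return the model IDs the server advertises."""
    with urllib.request.urlopen(models_url(base_url)) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# Example (placeholder hostname):
# list_models("http://ai-server.yourdomain.local:11434")
```

If your pulled model (e.g. llama3.1) appears in the returned list, the same base URL will work in the BYOM Provider Configuration screen.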

Frequently Asked Questions

Does any student data leave our network with self-hosted Llama?

No. When you run Llama on your own servers via Ollama or vLLM, AI requests from OpenEduCat go to your internal server on the same network your OpenEduCat instance runs on. No data leaves your network; the model processes requests entirely on your hardware. This is the maximum data sovereignty configuration available in OpenEduCat, and it requires no data processing agreement with any cloud provider because no cloud provider is involved.

Ready to run AI on your own servers?

Book a demo and we will walk through the self-hosted Llama setup, GPU sizing for your student population, and how to configure OpenEduCat BYOM to point to your on-premise endpoint.