Run Meta Llama On-Premise with OpenEduCat
Meta's Llama models are open-weight, meaning you can download them, run them on your own servers, and serve them through an OpenAI-compatible API using tools like Ollama or vLLM. Through OpenEduCat's Bring Your Own Model (BYOM) feature, a self-hosted Llama instance can power all 9 AI tools with zero data leaving your network.
This is the configuration for institutions where data sovereignty is non-negotiable: government-funded schools with data localization requirements, institutions with board policies prohibiting cloud AI processing of student data, and research universities with existing GPU infrastructure that can be repurposed for educational AI.
How to connect self-hosted Llama to OpenEduCat
Three steps. All traffic stays inside your network.
Set up Llama on your server
Install Ollama on a Linux server with a compatible GPU, then run "ollama pull llama3.1" to download the model weights. Ollama automatically serves a local OpenAI-compatible API on port 11434. For production scale with many concurrent users, vLLM delivers higher throughput and better request batching.
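Before wiring the endpoint into OpenEduCat, it is worth verifying from another machine on the network that the server answers. A minimal sketch using only Python's standard library; the hostname is a placeholder for your own server, and the check targets Ollama's native /api/tags route, which lists pulled models:

```python
import json
import urllib.error
import urllib.request


def ollama_reachable(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if an Ollama server answers at base_url.

    Ollama lists pulled models at GET /api/tags; any connection
    failure (refused, DNS error, timeout) yields False.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            data = json.load(resp)
            # A healthy server returns {"models": [...]}.
            return isinstance(data.get("models"), list)
    except (urllib.error.URLError, OSError, ValueError):
        return False


# Example (replace with your server's internal URL):
#   ollama_reachable("http://ai-server.yourdomain.local:11434")
```

A False result before go-live usually points at a firewall rule or at Ollama binding only to localhost; setting OLLAMA_HOST to 0.0.0.0 makes it listen on all interfaces.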
Point OpenEduCat BYOM to your server
In your OpenEduCat admin panel, go to AI Settings > Provider Configuration, select Custom / Self-Hosted endpoint, and enter your server's internal URL (e.g., http://ai-server.yourdomain.local:11434). No API key is required for local Ollama; if you run vLLM instead, enter the API key you configured when launching the server.
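Behind that configuration, traffic to the endpoint is standard OpenAI-style chat completion requests, which both Ollama and vLLM accept at /v1/chat/completions. A sketch of what such a request looks like, handy for smoke-testing the endpoint by hand; the URL and model name are placeholders for your own setup:

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, api_key=None):
    """Build an OpenAI-compatible /v1/chat/completions request.

    Ollama ignores the Authorization header; vLLM validates it
    when launched with an API key.
    """
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions", data=body, headers=headers
    )


# Example (replace URL/model with your own, then send it):
#   req = build_chat_request(
#       "http://ai-server.yourdomain.local:11434", "llama3.1", "Say hello.")
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format is identical to the cloud OpenAI API, OpenEduCat's BYOM router needs only the base URL swapped; nothing else in the integration changes.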
All 9 AI tools process on your hardware
Every AI tool in OpenEduCat (grading, quiz builder, lesson planner, IEP writer, student support) routes requests to your Llama server. Data never leaves your network. No cloud billing. Your IT team controls the entire stack.
Architecture: self-hosted configuration
Your Institution's Network
→ OpenEduCat Instance (your server)
→ BYOM Provider Router
→ Ollama / vLLM on your GPU server
→ Llama 3.1 model (local disk)
All traffic stays within your network perimeter. No external API calls for AI processing.
Who chooses self-hosted Llama
On-premise AI is not for everyone. These are the institutions for which it is the right answer.
Government-funded institutions with data localization requirements
Public universities, state schools, and government-funded institutions in many jurisdictions face procurement rules that restrict which cloud providers can process student data, or prohibit cloud processing entirely for certain data categories. Running Llama on your own servers via Ollama or vLLM means AI processing happens on hardware you own, in a building you control, under your existing IT governance framework. No vendor approval process, no new data processing agreements, no jurisdictional concerns.
Zero data egress for the most sensitive student information
IEP documents, behavioral notes, mental health referrals, and disciplinary records carry the highest data sensitivity in an institution. Even with strong cloud DPAs in place, some institutions have board policies or legal counsel guidance that prohibits processing this category of data outside the institution's own infrastructure. Self-hosted Llama removes the question entirely: the data never leaves your servers, because the model runs on your servers.
Institutions with existing GPU infrastructure
Research universities and well-resourced technical institutions often maintain GPU clusters for research computing. A server already running NVIDIA A100s or H100s for research workloads can serve Llama 3.1 70B alongside those workloads, particularly in quantized form. The incremental cost of adding an Ollama or vLLM service layer is minimal compared to the alternative of paying per-token cloud API costs for a large student population.
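For GPU sizing, a useful first-order estimate is weight memory: parameter count times bytes per weight, plus headroom for the KV cache and runtime buffers. A rough sketch; the 20% headroom figure is an assumption, not a vendor specification, and real requirements vary with context length and concurrency:

```python
def vram_estimate_gb(params_billions: float, bits_per_weight: int,
                     overhead: float = 0.20) -> float:
    """First-order GPU memory estimate for serving an LLM.

    Weights take params * (bits / 8) bytes; `overhead` adds assumed
    headroom for the KV cache and runtime buffers.
    """
    weights_gb = params_billions * bits_per_weight / 8  # 1B params ~ 1 GB per byte/param
    return weights_gb * (1 + overhead)


# Llama 3.1 70B at 16-bit precision vs 4-bit quantization:
fp16_gb = vram_estimate_gb(70, 16)  # ~168 GB: needs multiple 80 GB cards
q4_gb = vram_estimate_gb(70, 4)     # ~42 GB: fits a single large card
```

By the same arithmetic, the 8B model quantized to 4 bits fits in under 8 GB, which is why it is a common starting point for pilots on modest hardware.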
Budget-constrained institutions planning for long-term AI usage
Cloud API costs scale with usage. An institution running AI tools for 5,000 students generates millions of tokens per month, and those costs compound at renewal. Self-hosted Llama converts that recurring cost into a one-time hardware investment plus energy and maintenance. For institutions with 3-5 year planning horizons, the breakeven calculation often favors on-premise for sustained, high-volume usage after the first year.
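The breakeven argument can be made concrete with a back-of-the-envelope model. Every number below is an illustrative assumption (token volume, per-million-token price, hardware and running costs), not a quote:

```python
def breakeven_months(hardware_cost, monthly_energy_and_ops,
                     monthly_tokens_millions, cloud_price_per_million):
    """Months until one-time hardware spend beats recurring cloud fees.

    Compares the monthly cloud bill against on-prem running costs;
    returns None if cloud stays cheaper at this volume.
    """
    cloud_monthly = monthly_tokens_millions * cloud_price_per_million
    monthly_saving = cloud_monthly - monthly_energy_and_ops
    if monthly_saving <= 0:
        return None  # on-prem never catches up at this volume
    return hardware_cost / monthly_saving


# Illustrative assumptions: $30,000 server, $500/month energy and
# maintenance, 600M tokens/month, $4 per 1M tokens in the cloud.
months = breakeven_months(30_000, 500, 600, 4.0)  # just under 16 months
```

The same function also shows the flip side: at low volume (say 50M tokens/month under the same prices) the cloud bill never exceeds running costs and self-hosting does not pay for itself, which is why the case is strongest for sustained, high-volume usage.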
Meta Llama: key specs for IT teams
| Feature | Detail |
|---|---|
| Supported models | Llama 3.1 (8B, 70B, 405B), Llama 3 (8B, 70B), Llama 3.2 (1B, 3B), and compatible fine-tuned variants |
| Context window | Up to 128,000 tokens (Llama 3.1 models) |
| Data residency | Complete: model runs on your hardware, no data leaves your network |
| Pricing model | No per-token cost after initial hardware investment; hardware and energy costs only |
| FERPA considerations | Maximum compliance posture: student data never leaves your servers, no third-party DPA required |
| GDPR considerations | Complete data control: processing stays within your jurisdiction by definition |
| Self-host option | Yes, this IS the self-hosting option; runs via Ollama (simpler) or vLLM (production scale) |
| API compatibility | OpenAI-compatible endpoint via Ollama or vLLM; drops directly into OpenEduCat BYOM with localhost or internal URL |
Ready to run AI on your own servers?
Book a demo and we will walk through the self-hosted Llama setup, GPU sizing for your student population, and how to configure OpenEduCat BYOM to point to your on-premise endpoint.