Geodesia G-1 is a reverse proxy that speaks the OpenAI Chat Completions and Ollama protocols. Your application keeps speaking OpenAI; G-1 forwards to your engine of choice — vLLM (official, unmodified), SGLang, TensorRT-LLM, llama.cpp, Ollama, or any OpenAI-compatible cloud endpoint — and screens every turn on five axes with a calibrated risk reading in joules. Change one base URL.
The only thing G-1 needs from your engine is support for the standard OpenAI `logprobs: true` request flag, which returns per-token log-probabilities with each completion. All six engines below expose it natively, including Ollama, whose recent releases return full per-token logprobs over its OpenAI-compatible endpoint. All five detection axes run end-to-end on every engine.
| Engine | Form | Logprobs | Context hallucination | Closed-book hallucination | Prompt / Answer safety | Jailbreak | Streaming brakes |
|---|---|---|---|---|---|---|---|
| vLLM (official 0.21+, unmodified) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ |
| SGLang (OpenAI-compatible, structured outputs) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ |
| TensorRT-LLM (NVIDIA, production throughput) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ |
| llama.cpp (edge / GGUF, OpenAI server) | Self-hosted / edge | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ |
| OpenAI API (cloud, GPT-4o / GPT-5 family) | Cloud | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ |
| Ollama (local desktop, GGUF models, recent versions) | Local desktop | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓ |
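The flag G-1 relies on is an ordinary Chat Completions parameter. The sketch below builds a request body with `logprobs` and `top_logprobs` set (both standard OpenAI parameters) and extracts the per-token values from a non-streaming response in the OpenAI schema (`choices[0].logprobs.content[*].token` / `.logprob`). The helper names and the sample response are illustrative assumptions, not G-1's internal code.

```python
import json
from typing import Any

def build_request(model: str, prompt: str, top_logprobs: int = 5) -> dict[str, Any]:
    """Chat Completions request body that asks the engine for per-token logprobs."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "logprobs": True,              # standard OpenAI flag
        "top_logprobs": top_logprobs,  # alternatives per position (0-20 on OpenAI)
    }

def token_logprobs(response: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (token, logprob) pairs for the first choice; empty if none were sent."""
    choice = response["choices"][0]
    lp = choice.get("logprobs") or {}
    return [(t["token"], t["logprob"]) for t in (lp.get("content") or [])]

# A trimmed response in the OpenAI schema (values made up for illustration).
sample = json.loads("""
{"choices": [{"index": 0,
  "message": {"role": "assistant", "content": "Paris."},
  "logprobs": {"content": [
    {"token": "Paris", "logprob": -0.02, "top_logprobs": []},
    {"token": ".", "logprob": -0.31, "top_logprobs": []}
  ]}}]}
""")

print(json.dumps(build_request("my-model", "Capital of France?")))
print(token_logprobs(sample))  # [('Paris', -0.02), ('.', -0.31)]
```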
Any OpenAI-compatible endpoint that emits logprobs works. We have validated the six engines above end-to-end; the proxy is engine-agnostic, so additional engines (LMDeploy, MLC-LLM, MLX-LM, Azure OpenAI, OpenRouter, Together, Fireworks, Groq, etc.) integrate the same way.
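Whether a new endpoint qualifies comes down to one question: does it return per-token logprobs when asked? A minimal probe, assuming a plain JSON POST to `{base_url}/chat/completions` with no auth header (add one for cloud providers). The function names are hypothetical, and the transport is injectable so the check can be exercised without a live server.

```python
import json
import urllib.request
from typing import Any, Callable

# Transport signature: (url, JSON body) -> decoded JSON reply. Swappable for tests.
Post = Callable[[str, dict[str, Any]], dict[str, Any]]

def http_post(url: str, body: dict[str, Any]) -> dict[str, Any]:
    """POST a JSON body and decode the JSON reply (no auth header is sent)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def emits_logprobs(base_url: str, model: str, post: Post = http_post) -> bool:
    """Send one tiny request with logprobs=true; report whether logprobs came back."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "Say OK."}],
        "max_tokens": 4,  # some cloud models expect max_completion_tokens instead
        "logprobs": True,
    }
    reply = post(base_url.rstrip("/") + "/chat/completions", body)
    choices = reply.get("choices") or [{}]
    content = (choices[0].get("logprobs") or {}).get("content") or []
    return bool(content) and all(
        isinstance(t.get("logprob"), (int, float)) for t in content
    )
```

For example, `emits_logprobs("http://vllm.yourco.internal:8000/v1", "my-model")`. An endpoint that rejects the parameter outright raises `urllib.error.HTTPError`, which you may prefer to catch and treat as "not supported".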
No new SDK to install. No model to retrain. No engine to fork. The Geodesia gateway speaks the OpenAI Chat Completions protocol, so your existing codebase only needs to change one base URL. Every prompt is screened, every answer is scored on five axes, and a compliance-grade audit chain is written on the way through.
```python
# Before: your existing code, talking to OpenAI / vLLM / SGLang / etc.
from openai import OpenAI

client = OpenAI(base_url="https://api.openai.com/v1")
# or self-hosted: base_url="http://vllm.yourco.internal:8000/v1"

# After: same SDK, same code path, now protected + scored + compliance-logged.
# The SDK reads OPENAI_API_KEY from the environment; pass api_key=... if needed.
client = OpenAI(base_url="https://geodesia.yourco.internal/v1")
resp = client.chat.completions.create(
    model="gpt-4o",  # or any model your engine serves
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)

# Every chunk that streams back is screened on 5 axes;
# if a risk barrier is crossed mid-sentence, the gateway halts and returns a BLOCK.
# A signed audit record is written for every call. Auto-PDFs available on demand.
for chunk in resp:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
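The mid-sentence halt can be pictured as a wrapper around the token stream: after each chunk, score the text accumulated so far and stop forwarding once the score crosses a barrier. In this sketch `score` is a hypothetical stand-in for G-1's five-axis detector, the 0.8 barrier is arbitrary, and the `[BLOCK]` sentinel is illustrative rather than the gateway's actual wire format.

```python
from typing import Callable, Iterable, Iterator

BLOCK = "[BLOCK]"  # hypothetical sentinel; the real gateway's BLOCK payload may differ

def braked_stream(
    chunks: Iterable[str],
    score: Callable[[str], float],  # hypothetical risk scorer over the text so far
    barrier: float = 0.8,           # arbitrary threshold for illustration
) -> Iterator[str]:
    """Forward chunks until the running risk score crosses the barrier."""
    so_far = ""
    for chunk in chunks:
        so_far += chunk
        if score(so_far) >= barrier:
            yield BLOCK  # halt mid-answer; the offending chunk is never forwarded
            return
        yield chunk

# Toy scorer: anything mentioning "secret" is maximally risky.
def toy_score(text: str) -> float:
    return 1.0 if "secret" in text else 0.0

print(list(braked_stream(["The ", "code ", "is ", "secret", "42"], toy_score)))
# ['The ', 'code ', 'is ', '[BLOCK]']
```

Rescoring the whole prefix on every chunk keeps the sketch simple but grows quadratically with answer length; a production scorer would more likely work incrementally or over a sliding window.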
- Context hallucination: RAG faithfulness with token-level spans. Beats DeBERTa-NLI on production RAG (0.912 vs 0.856).
- Closed-book hallucination: confident fabrication detected via the model's own logprobs, with cross-model transfer.
- Prompt safety: adversarial intent, prompt injection and harmful requests. Distilled from frontier guard models.
- Answer safety: harmful content in the response, scored as it streams.
- Jailbreak: a dedicated detector for jailbreak attempts that route around safety measures.
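Closed-book detection reads the model's own logprobs. G-1's detector is a trained encoder, so the sketch below is not its method; it only shows the kind of per-answer signals standard logprobs expose: mean and minimum token logprob, and the entropy of each position's `top_logprobs` distribution. The input follows the OpenAI `logprobs.content` schema; the function name is hypothetical.

```python
import math
from typing import Any

def logprob_features(content: list[dict[str, Any]]) -> dict[str, float]:
    """Simple uncertainty signals from OpenAI-style `logprobs.content` entries."""
    if not content:
        raise ValueError("no per-token logprobs in response")
    lps = [t["logprob"] for t in content]
    entropies = []
    for t in content:
        alts = [a["logprob"] for a in (t.get("top_logprobs") or [])]
        if not alts:
            continue
        # Softmax over the returned alternatives only: the tail beyond
        # top_logprobs is unseen, so this entropy is an approximation.
        m = max(alts)
        weights = [math.exp(a - m) for a in alts]
        z = sum(weights)
        probs = [w / z for w in weights if w > 0]
        entropies.append(-sum(p * math.log(p) for p in probs))
    return {
        "mean_logprob": sum(lps) / len(lps),  # low = the model was unsure overall
        "min_logprob": min(lps),              # the single least likely token
        "mean_entropy": sum(entropies) / len(entropies) if entropies else 0.0,
    }
```

On their own these are weak signals, since a confidently fabricated answer can still have high token probabilities.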
Older runtime-safety stacks lived inside the inference path: a patched vLLM, an architecture-specific hook, a hidden-state extractor wired to one model family. They worked, but they locked the customer into one model and one engine build. Geodesia G-1 inverts that. Its detection engine is a single ~300M-parameter companion encoder that runs next to the model and reads only text and standard token logprobs, which is what makes it truly model- and engine-agnostic.
One companion encoder serves Llama, Qwen, Mistral, Gemma, DeepSeek, Phi, GPT-4o, Claude: anything your engine can serve, fine-tuned variants included.
The proxy speaks the OpenAI and Ollama wire protocols, and your engine of choice runs as the official, unmodified build. Upgrade your engine without breaking your safety layer.
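Speaking both wire protocols means mapping one request shape onto the other. A minimal sketch, assuming only a subset of fields: an Ollama `/api/chat` body becomes an OpenAI Chat Completions body, `options.num_predict` maps to `max_tokens`, and Ollama's stream-by-default behaviour is made explicit because OpenAI defaults to non-streaming. The function is hypothetical; how G-1 actually translates requests is not described here, and images, tools and `format` are not handled.

```python
from typing import Any

# Ollama option name -> OpenAI Chat Completions parameter (subset only).
OPTION_MAP = {
    "temperature": "temperature",
    "top_p": "top_p",
    "num_predict": "max_tokens",
    "seed": "seed",
    "stop": "stop",
}

def ollama_chat_to_openai(body: dict[str, Any]) -> dict[str, Any]:
    """Translate an Ollama /api/chat request body into an OpenAI-style body."""
    out: dict[str, Any] = {
        "model": body["model"],
        "messages": body.get("messages", []),
        # Ollama streams unless told otherwise; OpenAI does not, so be explicit.
        "stream": body.get("stream", True),
        # Assumption: a logprob-reading screen always asks the engine for logprobs.
        "logprobs": True,
    }
    options = body.get("options") or {}
    for ollama_key, openai_key in OPTION_MAP.items():
        if ollama_key in options:
            out[openai_key] = options[ollama_key]
    return out  # unmapped options (e.g. top_k) are dropped
```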
Cloud OpenAI today, self-hosted vLLM tomorrow, air-gapped llama.cpp on a defence-ministry HSM the day after. Same gateway. Same audit chain. Same PDFs.
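An audit chain of this kind is commonly built as a hash chain: each record embeds the digest of the previous one and is signed, so editing a record, or removing one from the middle, makes verification fail (catching truncation of the tail additionally requires anchoring the latest digest somewhere else). A minimal sketch using HMAC-SHA256 from the standard library; the record fields and key handling are assumptions, and G-1's actual record format and signature scheme are not specified here.

```python
import hashlib
import hmac
import json
from typing import Any

GENESIS = "0" * 64  # stand-in "previous digest" for the first record

def _payload(prev: str, event: dict[str, Any]) -> bytes:
    """Canonical bytes that are hashed and signed for one record."""
    return json.dumps({"prev": prev, "event": event}, sort_keys=True).encode()

def append_record(chain: list[dict[str, Any]], event: dict[str, Any], key: bytes) -> dict[str, Any]:
    """Append a signed record that commits to the previous record's digest."""
    prev = chain[-1]["digest"] if chain else GENESIS
    payload = _payload(prev, event)
    record = {
        "prev": prev,
        "event": event,
        "digest": hashlib.sha256(payload).hexdigest(),
        "sig": hmac.new(key, payload, hashlib.sha256).hexdigest(),
    }
    chain.append(record)
    return record

def verify_chain(chain: list[dict[str, Any]], key: bytes) -> bool:
    """Recompute every link, digest and signature in order."""
    prev = GENESIS
    for rec in chain:
        if rec["prev"] != prev:
            return False  # a record was removed or reordered
        payload = _payload(prev, rec["event"])
        if hashlib.sha256(payload).hexdigest() != rec["digest"]:
            return False  # the event was edited
        expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["sig"]):
            return False  # wrong key or forged signature
        prev = rec["digest"]
    return True

key = b"demo-key"  # hypothetical; keep real keys in an HSM or KMS
log: list[dict[str, Any]] = []
append_record(log, {"call_id": "c1", "verdict": "PASS"}, key)
append_record(log, {"call_id": "c2", "verdict": "BLOCK"}, key)
print(verify_chain(log, key))  # True
```

A symmetric HMAC keeps the sketch short; if third parties must verify records without holding the signing key, an asymmetric signature (for example Ed25519) would be the usual choice.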