Drop-in proxy · No engine forks · No model patches

Works with any inference engine
that exposes token logprobs.

Geodesia G-1 is a reverse proxy that speaks the OpenAI Chat Completions and Ollama protocols. Your application keeps speaking OpenAI; G-1 forwards to your engine of choice — vLLM (official, unmodified), SGLang, TensorRT-LLM, llama.cpp, Ollama, or any OpenAI-compatible cloud endpoint — and screens every turn on five axes with a calibrated risk reading in joules. Change one base URL.

Protocol: OpenAI · Ollama · streaming
Requirement: Engine exposes logprobs
Engine fork: None. Ever.
Model patches: Zero. Weights untouched.

Six engines.
All five axes. End to end.

The only signal G-1 needs from your engine is the standard logprobs: true flag on the OpenAI API. All six engines below expose it natively — including Ollama, whose recent releases ship full per-token logprob support over its OpenAI-compatible endpoint. All five detection axes run end-to-end on every engine.
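
As a rough illustration of the signal involved, the sketch below shows the `logprobs` request field and the per-token response shape defined by the OpenAI Chat Completions API. The helper name `token_logprobs` and the sample payload are illustrative assumptions, not part of G-1, and whether the gateway sets the flag itself or expects the client to set it is not stated here.

```python
import math

# Request body in Chat Completions form; "logprobs": True is the standard field.
# The model name and prompt are placeholders.
request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Capital of France?"}],
    "logprobs": True,
}

def token_logprobs(response: dict) -> list[tuple[str, float]]:
    """Return (token, logprob) pairs from a non-streaming chat completion."""
    logprobs = response["choices"][0].get("logprobs") or {}
    content = logprobs.get("content") or []
    return [(item["token"], item["logprob"]) for item in content]

# Hand-written sample in the documented response shape (values are made up).
sample_response = {
    "choices": [{
        "message": {"role": "assistant", "content": "Paris"},
        "logprobs": {"content": [
            {"token": "Par", "logprob": -0.01},
            {"token": "is", "logprob": -0.02},
        ]},
    }]
}

pairs = token_logprobs(sample_response)
for token, lp in pairs:
    # exp(logprob) converts a log-probability back to a probability.
    print(f"{token!r}: logprob={lp:.2f}, p={math.exp(lp):.3f}")
```

An engine that omits logprobs returns no `logprobs.content` list, in which case the helper yields an empty list rather than raising.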

Engine | Form | Logprobs | Context halluc. | Closed-book halluc. | Prompt / Answer safety | Jailbreak | Streaming brakes
vLLM (official 0.21+ · unmodified) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓
SGLang (OpenAI-compatible · structured outputs) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓
TensorRT-LLM (NVIDIA · production throughput) | Self-hosted | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓
llama.cpp (edge / GGUF · OpenAI server) | Self-hosted / edge | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓
OpenAI API (cloud · GPT-4o / GPT-5 family) | Cloud | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓
Ollama (local desktop · GGUF models · recent versions) | Local desktop | ✓ native | ✓ | ✓ | ✓ | ✓ | ✓

Any OpenAI-compatible endpoint that emits logprobs works. We have validated the six engines above end-to-end; the proxy is engine-agnostic, so additional engines (LMDeploy, MLC-LLM, MLX-LM, Azure OpenAI, OpenRouter, Together, Fireworks, Groq, etc.) integrate the same way.

Point your client
at our gateway. That's it.

No SDK to install. No model to retrain. No engine to fork. The Geodesia gateway speaks the OpenAI Chat Completions protocol; your existing code base only needs to flip one base URL. Every prompt is screened, every answer is scored on five axes, and a compliance-grade audit chain is written on the way through.

Python · OpenAI SDK
# Before — your existing code, talking to OpenAI / vLLM / SGLang / etc.
from openai import OpenAI
client = OpenAI(base_url="https://api.openai.com/v1")
# or self-hosted: base_url="http://vllm.yourco.internal:8000/v1"

# After — same SDK, same code path, now protected + scored + compliance-logged
client = OpenAI(base_url="https://geodesia.yourco.internal/v1")

resp = client.chat.completions.create(
    model="gpt-4o",                  # or any model your engine serves
    messages=[{"role": "user", "content": "..."}],
    stream=True,
)
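# Every chunk that streams back is screened on 5 axes;
# if a risk barrier is crossed mid-sentence, the gateway halts and returns a BLOCK.
# A signed audit record is written for every call. Auto-PDFs available on demand.
for chunk in resp:
    # Some chunks carry no choices or no text delta; skip those.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
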
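
To make the "one base URL" point concrete at the wire level, here is a minimal sketch that builds (without sending) the same Chat Completions request for two base URLs using only the Python standard library. The helper `build_chat_request` and the placeholder key are assumptions for illustration; how the gateway handles credentials is not described here.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Build, but do not send, a Chat Completions request for base_url."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder credential
        },
    )

messages = [{"role": "user", "content": "..."}]
direct = build_chat_request(
    "https://api.openai.com/v1", "sk-placeholder", "gpt-4o", messages)
proxied = build_chat_request(
    "https://geodesia.yourco.internal/v1", "sk-placeholder", "gpt-4o", messages)

# Method, headers, and body are identical; only the URL differs.
print(direct.full_url)
print(proxied.full_url)
print(direct.data == proxied.data)
```
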
01 · Context

RAG faithfulness. Token-level spans. Beats DeBERTa-NLI on production RAG (0.912 vs 0.856).

02 · Closed-book

Confident fabrication detected via the model's own logprobs. Cross-model transfer.

03 · Prompt safety

Adversarial intent, prompt injection, harmful requests. Distilled from frontier guards.

04 · Answer safety

Harmful content in the response, scored as it streams.

05 · Jailbreak

Dedicated detector for jailbreak attempts that route around safety.
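
The idea of scoring a response as it streams and halting when a risk barrier is crossed can be sketched generically as follows. This is not G-1's implementation: `brake_stream`, `toy_risk`, the `[BLOCK]` marker, and the threshold are all hypothetical stand-ins chosen for the example.

```python
from typing import Callable, Iterable, Iterator

def brake_stream(chunks: Iterable[str],
                 risk_of: Callable[[str], float],
                 barrier: float) -> Iterator[str]:
    """Yield chunks until the accumulated text's risk reaches the barrier."""
    text = ""
    for chunk in chunks:
        text += chunk
        if risk_of(text) >= barrier:
            # Placeholder marker; the real block format is not specified here.
            yield "[BLOCK]"
            return
        yield chunk

def toy_risk(text: str) -> float:
    """Toy scorer for demonstration only: flags a fixed keyword."""
    return 1.0 if "secret" in text.lower() else 0.0

out = list(brake_stream(["The ", "secret ", "code ", "is 1234"],
                        toy_risk, barrier=0.5))
print(out)  # ['The ', '[BLOCK]']
```

The risky chunk is scored before it is forwarded, so the stream stops without emitting it; a real scorer would replace the keyword check.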

The detector lives outside the model.
By design.

Older runtime-safety stacks lived inside the inference path: a patched vLLM, an architecture-specific hook, a hidden-state extractor wired to one model family. They worked, but they locked the customer into one model and one engine build. Geodesia G-1 inverts that. The detection engine is a single ~300M-parameter companion encoder that runs next to the model and reads only text and standard token logprobs, which is what makes it model- and engine-agnostic.
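
One way to picture a detector that lives outside the model is as an interface whose only inputs are the generated text and its token logprobs, with no access to weights or hidden states. The sketch below is an assumption-laden illustration: the `Detector` protocol and the `MeanSurprisal` scorer are hypothetical, and the scorer is a deliberately simple stand-in, not the G-1 companion encoder.

```python
from typing import Protocol, Sequence

class Detector(Protocol):
    """Anything that scores a turn from text and token logprobs alone."""
    def score(self, text: str, logprobs: Sequence[float]) -> float: ...

class MeanSurprisal:
    """Toy stand-in: average negative log-probability per token."""
    def score(self, text: str, logprobs: Sequence[float]) -> float:
        if not logprobs:
            return 0.0
        return -sum(logprobs) / len(logprobs)

def screen(detector: Detector, text: str, logprobs: Sequence[float]) -> float:
    """Score one turn; nothing here depends on the model or engine used."""
    return detector.score(text, logprobs)

confident = screen(MeanSurprisal(), "Paris", [-0.01, -0.02])
uncertain = screen(MeanSurprisal(), "Lyon", [-1.2, -2.3])
print(confident < uncertain)  # True
```

Because the interface takes only text and logprobs, the same detector object can score output from any engine that returns them.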

Model-agnostic

One companion encoder serves Llama, Qwen, Mistral, Gemma, DeepSeek, Phi, GPT-4o, Claude, and anything else your engine can serve. Fine-tuned variants included.

Engine-agnostic

The proxy speaks the OpenAI and Ollama wire protocols. Your engine of choice runs official, unmodified. Upgrade your engine without breaking your safety layer.

Stack-agnostic

Cloud OpenAI today, self-hosted vLLM tomorrow, air-gapped llama.cpp on a defence-ministry HSM the day after. Same gateway. Same audit chain. Same PDFs.

Made in Europe.

Try it on your own engine.
Your own model.

Bring your existing OpenAI / Ollama base URL, your existing model, and your existing client code. We point our gateway at your engine, you point your client at our gateway, and you measure the difference. No SDK lift, no engine fork, no commitment.