A non-invasive runtime layered on top of vLLM. Zero-copy interception. Geometric hallucination detection. Mechanistic causal explainability. Async compliance evidence pipeline. On-premise. Air-gap capable.
Your application keeps speaking OpenAI. Your model keeps running unchanged. G-1 sits between them as a transparent symbiont.
From inbound API call to delivered response — every stage adds protection without breaking the OpenAI contract.
Client sends a chat completion request. Tenant identification, RBAC, and rate-limit checks happen at the edge. The payload is normalized to G-1's internal Inference Envelope and assigned an immutable Call ID.
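In code terms, the normalization step might look like the sketch below. The field names, types, and `normalize` helper are illustrative assumptions, not G-1's actual schema.

```python
# Illustrative sketch only: field names and types are assumptions,
# not G-1's actual Inference Envelope schema.
import uuid
from dataclasses import dataclass, field
from typing import Any

@dataclass(frozen=True)  # frozen: the Call ID must never change after assignment
class InferenceEnvelope:
    tenant_id: str                # resolved at the edge during tenant identification
    payload: dict[str, Any]       # the normalized OpenAI chat-completion body
    call_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def normalize(openai_request: dict[str, Any], tenant_id: str) -> InferenceEnvelope:
    """Wrap an inbound OpenAI-style request once RBAC and rate-limit checks pass."""
    return InferenceEnvelope(tenant_id=tenant_id, payload=openai_request)
```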
The prompt is encoded by a small adapter and projected into a 16-centroid latent space trained on adversarial corpora (jailbreaks, prompt injection, policy violations). The classifier runs in parallel with the constitutional router. AUROC 0.82 — a 79% improvement over base-model refusals.
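A minimal nearest-centroid sketch of the idea, with assumed latent dimension, centroid values, and label split; the real adapter and its calibration are G-1's own.

```python
# Hypothetical sketch of a 16-centroid risk classifier. Centroid values,
# latent dimension (256), and the label split are assumptions.
import numpy as np

rng = np.random.default_rng(0)
CENTROIDS = rng.normal(size=(16, 256))   # 16 centroids in an assumed 256-d latent space
LABELS = ["jailbreak"] * 6 + ["prompt_injection"] * 5 + ["policy_violation"] * 5

def classify(embedding: np.ndarray) -> tuple[str, float]:
    """Assign the prompt embedding to its nearest centroid (cosine similarity)."""
    sims = CENTROIDS @ embedding / (
        np.linalg.norm(CENTROIDS, axis=1) * np.linalg.norm(embedding) + 1e-9
    )
    best = int(np.argmax(sims))
    return LABELS[best], float(sims[best])  # label plus similarity as a risk signal
```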
Every request is checked against the active constitution: EU Charter of Fundamental Rights, EU AI Act Article 5 prohibitions, GDPR principles, and any customer-defined ethics policy. The constitution is versioned, auditable, and customizable. Outputs of this stage feed both the audit chain and the oversight-queue triggers.
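One plausible data model for a versioned constitution is sketched below; the `Rule` and `Constitution` shapes are assumptions, not G-1's implementation.

```python
# Sketch of a versioned constitution check; rule sources mirror the text
# above, but the data model itself is an illustrative assumption.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Rule:
    source: str        # e.g. "EU AI Act Art. 5", "GDPR", "customer-policy"
    rule_id: str
    predicate: Callable[[dict], bool]   # True means the request violates the rule

@dataclass(frozen=True)
class Constitution:
    version: str       # versioned and auditable: every decision records this
    rules: tuple[Rule, ...]

    def evaluate(self, payload: dict) -> list[str]:
        """Return violated rule IDs; results feed the audit chain and oversight queue."""
        return [r.rule_id for r in self.rules if r.predicate(payload)]
```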
Inference runs on the customer's own LLM, served by vLLM. G-1 attaches a read-only hook to extract attention QKᵀ projections and last-layer hidden states without copying them off-device. The model's weights are never modified. Streaming and non-streaming modes are both supported.
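Under the hood vLLM executes ordinary PyTorch modules, so a generic forward hook conveys the read-only, zero-copy idea. The stand-in layer below is not vLLM's actual module path.

```python
# Generic PyTorch illustration of a read-only activation hook. How G-1
# locates the attention and final-layer modules inside vLLM is not shown.
import torch

captured: dict[str, torch.Tensor] = {}

def capture_hidden(module, args, output):
    # No .clone(), no .cpu(): keep an on-device reference only
    # (zero-copy, read-only; the model's weights are never touched).
    captured["last_hidden"] = output.detach()

# Stand-in for the last layer of the served model; in practice this would be
# the final block inside vLLM's model runner (an assumption here).
last_layer = torch.nn.Linear(64, 64)
handle = last_layer.register_forward_hook(capture_hidden)

_ = last_layer(torch.randn(1, 64))      # inference proceeds unchanged
assert "last_hidden" in captured        # ...while the activations are observed
handle.remove()
```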
The Neural Symbolic Potentials (NSP) Coherence Engine treats the latent trajectory of the response as a path on a Riemannian manifold. Four signals are computed: max coherence, smoothness, jerk (second derivative of curvature), and context gap. Combined into a single hallucination score. AUROC 0.96 — state-of-the-art for on-premise real-time detection.
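A hedged discrete approximation of the four signals, assuming the trajectory arrives as a (T, d) array of last-layer states; the true Riemannian formulation and the combination weights are Geodesia's own.

```python
# Assumed discrete stand-ins for the four NSP signals; everything below
# is a sketch, not the production formulation.
import numpy as np

def nsp_signals(traj: np.ndarray, prompt_vec: np.ndarray) -> dict[str, float]:
    """traj: (T, d) hidden states of the response; prompt_vec: (d,) prompt summary."""
    v = np.diff(traj, axis=0)        # velocity along the latent path
    a = np.diff(v, axis=0)           # acceleration
    j = np.diff(a, axis=0)           # third difference, read here as "jerk" (assumed)
    step = np.linalg.norm(v, axis=1) + 1e-9
    cos = (v[:-1] * v[1:]).sum(1) / (step[:-1] * step[1:])
    return {
        "max_coherence": float(cos.max()),                       # best local alignment
        "smoothness": float(-np.linalg.norm(a, axis=1).mean()),  # flatter is smoother
        "jerk": float(np.linalg.norm(j, axis=1).mean()),
        "context_gap": float(np.linalg.norm(traj.mean(0) - prompt_vec)),
    }

def hallucination_score(sig: dict[str, float], w: dict[str, float]) -> float:
    """Weighted combination into one score; weights calibrated offline (assumed)."""
    return sum(w[k] * sig[k] for k in sig)
```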
While the response streams to the client, G-1 fires the async compliance pipeline: chain entry written, watermark applied, retention rule attached, oversight queue consulted (and, if needed, the call is escalated). Reports — FRIA, Annex IV, MiFID II audit bundles — are composed on demand from this evidence. Never blocks the user response.
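Roughly, in asyncio terms; the stage implementations and their ordering are assumptions:

```python
# Illustrative fire-and-forget evidence pipeline. Stage names follow the
# text above; bodies and ordering are assumptions.
import asyncio

async def write_chain_entry(envelope): ...          # audit chain entry
async def apply_watermark(envelope): ...            # response watermark
async def attach_retention_rule(envelope): ...      # data retention policy
async def consult_oversight_queue(envelope): ...    # truthy result => escalate
async def escalate(envelope): ...                   # human-in-the-loop hand-off

async def compliance_pipeline(envelope) -> None:
    # Evidence stages run concurrently; none of them touches the client stream.
    await asyncio.gather(
        write_chain_entry(envelope),
        apply_watermark(envelope),
        attach_retention_rule(envelope),
    )
    if await consult_oversight_queue(envelope):
        await escalate(envelope)

def on_response_start(envelope) -> None:
    # Fire-and-forget: scheduled as a task, never awaited by the streaming path.
    asyncio.get_running_loop().create_task(compliance_pipeline(envelope))
```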
Neural Symbolic Potentials. Reads the geometry of the response trajectory in latent space. Four geometric features → hallucination score. The state-of-the-art for on-premise real-time detection. Built on the GLAD-Manifold representation.
Research → GLAD-Manifold

Three methods for three regulatory needs. Integrated Gradients for axiomatic completeness. MuPAX for court-quality multi-dimensional attribution (peer-reviewed, Oxford). EVIDENCE for evolutionary deterministic explanation (peer-reviewed, EAAI 2025).

Research → XAI methods
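For the first of the three, a minimal Integrated Gradients sketch; the toy model and zero baseline are stand-ins, not G-1 internals.

```python
# Integrated Gradients via a Riemann sum along the straight path
# baseline -> x. The model here is a toy stand-in.
import torch

def integrated_gradients(model, x, baseline, steps: int = 50) -> torch.Tensor:
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    path = baseline + alphas * (x - baseline)   # (steps, d) interpolated inputs
    path.requires_grad_(True)
    model(path).sum().backward()
    avg_grad = path.grad.mean(dim=0)            # average gradient along the path
    return (x - baseline) * avg_grad            # completeness: sums ≈ F(x) - F(baseline)

model = torch.nn.Sequential(torch.nn.Linear(4, 1))
x, baseline = torch.randn(4), torch.zeros(4)
attributions = integrated_gradients(model, x, baseline)
```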
Every Inference Envelope — prompt, response, scores, decisions — is hashed (SHA-256) and chained (HMAC). You cannot delete entry N without breaking the chain. Verifiable in seconds. Court-admissible. Works inside the customer's database.

Auditing Hub →
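A minimal sketch of that chain, assuming a per-tenant HMAC key and a zeroed genesis tag:

```python
# Hash-chained audit log: each entry's HMAC covers the previous tag, so
# deleting or editing entry N breaks verification from N onward.
# Key handling and storage layout are assumptions.
import hashlib, hmac, json

def chain_append(key: bytes, prev_tag: bytes, envelope: dict) -> bytes:
    digest = hashlib.sha256(json.dumps(envelope, sort_keys=True).encode()).digest()
    return hmac.new(key, prev_tag + digest, hashlib.sha256).digest()

def chain_verify(key: bytes, entries: list[dict], tags: list[bytes]) -> bool:
    tag = b"\x00" * 32                               # genesis tag (assumed)
    for env, expected in zip(entries, tags):
        tag = chain_append(key, tag, env)
        if not hmac.compare_digest(tag, expected):   # first break pinpoints tampering
            return False
    return True
```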
G-1 ships as a single container with a typed Helm chart. It runs on bare metal, on a single GPU node, in your Kubernetes cluster, or in a fully air-gapped enclave. No license server. No outbound calls. No telemetry.
Adapter training — the one-time step that calibrates G-1 to your specific base model and policy — runs on your own GPU. Geodesia.ai never has access to that hardware.
Geodesia.ai does not access, copy, store, transmit, or process client model weights, training data, prompts, or inference responses — by design, not by policy.