Technical Architecture

The architecture
of trust.

A non-invasive runtime layered on top of vLLM. Zero-copy interception. Geometric hallucination detection. Mechanistic causal explainability. Async compliance evidence pipeline. On-premise. Air-gap capable.

Inference path: 6 stages · <35 ms total
Inference engine: vLLM 0.6+ · HF · SGLang
Deploy: single Docker · K8s · air-gap
API: OpenAI-compatible · REST

Three layers.
One trust contract.

Your application keeps speaking OpenAI. Your model keeps running unchanged. G-1 sits between them as a transparent symbiont.

YOUR APPLICATION
Customer copilot · Clinical assistant · Loan officer agent · Multi-agent pipeline
OpenAI-compatible API · /v1/chat/completions
GEODESIA G-1 · TRUST LAYER
🛡️
Safety Gate
16-centroid latent classifier · <5 ms
🧬
Constitutional Router
EU Charter · GDPR · custom policy
🧠
NSP Coherence
Riemannian geometry of attention
🔍
Causal XAI
IG · MuPAX · EVIDENCE
⛓️
Audit Pipeline
HMAC-SHA256 chain · async
📑
Report Compositor
FRIA · Annex IV · MiFID II PDFs
vLLM · zero-copy hidden-state interception
YOUR LLM · UNCHANGED
Llama 3.3 70B · Qwen 3 · Mistral · Gemma 4 · DeepSeek · Phi-4 Mini · + your fine-tune

A single request, six checkpoints.

From inbound API call to delivered response — every stage adds protection without breaking the OpenAI contract.

01

Request ingress

/v1/chat/completions · OpenAI-compatible

Client sends a chat completion request. Tenant identification, RBAC, and rate-limit checks happen at the edge. The payload is normalized to G-1's internal Inference Envelope and assigned an immutable Call ID.

Added latency: <1 ms
Tenant isolation: per-namespace
Identifier: UUID v7 + timestamp
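The Call ID step can be sketched in a few lines. The stdlib `uuid` module has no v7 constructor before Python 3.14, so this hypothetical `uuid7` helper composes one by hand following the RFC 9562 layout (48-bit millisecond timestamp, then version, random, variant, random bits); the envelope fields in `to_envelope` are illustrative, not G-1's actual schema.

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Compose a UUIDv7: time-ordered, so Call IDs sort by arrival."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")          # 80 random bits
    value = (ts_ms & ((1 << 48) - 1)) << 80               # 48-bit timestamp
    value |= 0x7 << 76                                    # version 7
    value |= ((rand >> 68) & 0xFFF) << 64                 # 12 random bits
    value |= 0b10 << 62                                   # RFC 4122 variant
    value |= rand & ((1 << 62) - 1)                       # 62 random bits
    return uuid.UUID(int=value)

def to_envelope(payload: dict, tenant: str) -> dict:
    """Normalize an OpenAI-style chat payload into an internal envelope."""
    return {
        "call_id": str(uuid7()),
        "tenant": tenant,
        "received_at_ms": time.time_ns() // 1_000_000,
        "model": payload.get("model"),
        "messages": payload["messages"],
    }
```

Because the top 48 bits are a timestamp, two envelopes created in sequence sort chronologically by Call ID, which keeps the audit chain naturally ordered.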
02

Pre-generation safety gate

16-centroid latent classifier · adversarial intent

The prompt is encoded by a small adapter and projected into a 16-centroid latent space trained on adversarial corpora (jailbreaks, prompt injection, policy violations). The classifier runs in parallel with the constitutional router. AUROC 0.82 — +79% over base-model refusals.

Latency: <5 ms
AUROC: 0.82
On unsafe: block + audit
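Nearest-centroid classification itself is cheap; the trained centroids are where the work lives. A toy version of the gate, with random stand-in centroids and hypothetical safe/unsafe labels rather than the shipped model:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 64
# Stand-ins for the 16 centroids learned from adversarial corpora.
centroids = rng.normal(size=(16, EMB_DIM))
# Hypothetical labels: which centroids represent adversarial intent.
centroid_is_unsafe = np.array([i % 2 == 0 for i in range(16)])

def classify(embedding: np.ndarray) -> str:
    """Route the prompt embedding to its nearest centroid's verdict."""
    distances = np.linalg.norm(centroids - embedding, axis=1)
    nearest = int(np.argmin(distances))
    return "block" if centroid_is_unsafe[nearest] else "allow"
```

A single distance computation over 16 rows is why this stage fits inside a <5 ms budget and can run in parallel with the constitutional router.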
03

Constitutional router

European Charter · GDPR · customer policy

Every request is checked against the active constitution: EU Charter of Fundamental Rights, EU AI Act Article 5 prohibitions, GDPR principles, and any customer-defined ethics policy. The constitution is versioned, auditable, and customizable. Outputs of this stage flow into both the audit chain and the oversight queue triggers.

Decision: allow · escalate · deny
Versioning: git-style
Customizable: per-tenant
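The allow/escalate/deny contract can be sketched as a rule table keyed to a versioned constitution. The rule names, version tag, and naive keyword trigger below are illustrative; the real router uses classifiers, not substring matches:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    article: str    # e.g. an EU AI Act or GDPR provision
    pattern: str    # toy keyword trigger standing in for a classifier
    decision: str   # "deny" or "escalate"

CONSTITUTION_VERSION = "v1.4.2"  # hypothetical git-style tag
RULES = [
    Rule("EU AI Act Art. 5", "subliminal manipulation", "deny"),
    Rule("GDPR Art. 9", "health record", "escalate"),
]

def route(prompt: str) -> dict:
    """Check a prompt against the active constitution; default to allow."""
    lowered = prompt.lower()
    for rule in RULES:
        if rule.pattern in lowered:
            return {"decision": rule.decision, "rule": rule.article,
                    "constitution": CONSTITUTION_VERSION}
    return {"decision": "allow", "rule": None,
            "constitution": CONSTITUTION_VERSION}
```

Stamping every verdict with the constitution version is what makes the decision auditable later: the chain entry records not just the outcome but which policy snapshot produced it.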
04

Model inference

vLLM · zero-copy hidden-state extraction

Inference runs on the customer's own LLM, served by vLLM. G-1 attaches a read-only hook to extract attention QKᵀ projections and last-layer hidden states without copying them off-device. The model's weights are never modified. Streaming and non-streaming modes are both supported.

Engine: vLLM 0.6+
Read-only hook: QKᵀ + hidden
Weights: untouched
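"Zero-copy" here just means the hook keeps references to live buffers instead of duplicating them. vLLM's actual hook surface is internal and version-specific, so this stub only illustrates the pattern with a fake forward pass:

```python
import numpy as np

captured = {}

def hidden_state_hook(layer_name: str, tensor: np.ndarray) -> None:
    # Store a reference only, never .copy(): the hidden states are
    # observed in place, so no extra device memory is consumed.
    captured[layer_name] = tensor

def fake_forward(x: np.ndarray) -> np.ndarray:
    """Stand-in for the model's final transformer block."""
    h = np.tanh(x)
    hidden_state_hook("final_layer", h)
    return h

x = np.ones((2, 4))
out = fake_forward(x)
```

Since `captured["final_layer"]` and `out` are the same buffer, the NSP engine downstream reads exactly what the model produced, and the model's weights and outputs are never altered.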
05

NSP Coherence Engine

Riemannian geometry of attention · post-generation

The Neural Symbolic Potentials (NSP) Coherence Engine treats the latent trajectory of the response as a path on a Riemannian manifold. Four signals are computed: max coherence, smoothness, jerk (second derivative of curvature), and context gap. These combine into a single hallucination score. AUROC 0.96 — state-of-the-art for on-premise real-time detection.

Latency: <20 ms
AUROC: 0.96
Vs base model: +316%
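The flavor of these signals can be shown on any latent trajectory. This is a Euclidean toy version (the real engine works with a Riemannian metric, and the function and field names here are illustrative): successive hidden states give velocities, the turning angle between velocities gives a curvature proxy, and the change of curvature gives jerk.

```python
import numpy as np

def coherence_signals(traj: np.ndarray) -> dict:
    """traj: (T, d) hidden states along the response, T >= 4."""
    steps = np.diff(traj, axis=0)                         # velocities
    norms = np.linalg.norm(steps, axis=1) + 1e-9
    # Cosine between consecutive steps: 1.0 = perfectly straight.
    cos = np.sum(steps[:-1] * steps[1:], axis=1) / (norms[:-1] * norms[1:])
    curvature = 1.0 - cos                                  # turning per step
    jerk = np.diff(curvature)                              # change of curvature
    return {
        "max_coherence": float(cos.max()),
        "smoothness": float(curvature.mean()),
        "jerk": float(np.abs(jerk).max()) if jerk.size else 0.0,
    }
```

On a smooth, on-topic generation the trajectory stays nearly straight (coherence near 1, curvature near 0); a hallucinating run shows sharp turns and spiking jerk, which is what the combined score picks up.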
06

Compliance runtime · async

HMAC chain · watermark · oversight · auto-reports

While the response streams to the client, G-1 fires the async compliance pipeline: chain entry written, watermark applied, retention rule attached, oversight queue consulted (and, if needed, the call is escalated). Reports — FRIA, Annex IV, MiFID II audit bundles — are composed on demand from this evidence. Never blocks the user response.

Chain: HMAC-SHA256 append-only
Watermark: HMAC · 6 languages
Reports: composed on demand
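An append-only HMAC chain is a few lines of stdlib code. A sketch with a hypothetical per-tenant key: each entry's MAC covers both its own body and the previous MAC, so deleting or editing entry N invalidates every MAC from N onward.

```python
import hashlib
import hmac
import json

KEY = b"per-tenant-secret"  # hypothetical tenant signing key

GENESIS = "0" * 64  # MAC placeholder before the first entry

def append_entry(chain: list, envelope: dict) -> list:
    """Append an Inference Envelope, chained to the previous MAC."""
    prev_mac = chain[-1]["mac"] if chain else GENESIS
    body = json.dumps(envelope, sort_keys=True)
    mac = hmac.new(KEY, (prev_mac + body).encode(), hashlib.sha256).hexdigest()
    chain.append({"body": body, "mac": mac})
    return chain

def verify(chain: list) -> bool:
    """Recompute every MAC; any edit or deletion breaks the chain."""
    prev = GENESIS
    for entry in chain:
        expect = hmac.new(KEY, (prev + entry["body"]).encode(),
                          hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expect, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Verification is a single linear pass, which is why a chain of production size can be checked in seconds inside the customer's own database.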

The science under the hood.

🧠

NSP Coherence Engine

Neural Symbolic Potentials. Reads the geometry of the response trajectory in latent space. Four geometric features → hallucination score. The state-of-the-art for on-premise real-time detection. Built on the GLAD-Manifold representation.

Research → GLAD-Manifold
🔍

Causal XAI

Three methods for three regulatory needs. Integrated Gradients for axiomatic completeness. MuPAX for court-quality multi-dimensional attribution (peer-reviewed, Oxford). EVIDENCE for evolutionary deterministic explanation (peer-reviewed, EAAI 2025).

Research → XAI methods
⛓️

Append-only Audit Chain

Every Inference Envelope — prompt, response, scores, decisions — is hashed (SHA-256) and chained (HMAC). You cannot delete entry N without breaking the chain. Verifiable in seconds. Court-admissible. Works inside the customer's database.

Auditing Hub →

From a single Docker
to multi-region sovereign clouds.

G-1 ships as a single container with a typed Helm chart. It runs on bare metal, on a single GPU node, in your Kubernetes cluster, or in a fully air-gapped enclave. No license server. No outbound calls. No telemetry.

Adapter training — the one-time step that calibrates G-1 to your specific base model and policy — runs on your own GPU. Geodesia.ai never has access to that hardware.

  • Single Docker — for single-tenant evaluation and POV deployments
  • Helm chart for K8s — for multi-tenant, HA enterprise deployments
  • Air-gapped bundle — offline registry + signed images for classified environments
  • BYO infra — runs on AWS, GCP, Azure, OVH, sovereign EU clouds, or bare metal
Quick start · Docker
# 1. Pull the G-1 container
docker pull registry.geodesia.ai/g1:1.4

# 2. Mount your model + adapter
docker run --gpus all \
  -v /models/llama3-70b:/model \
  -v /adapters/g1-llama3:/adapter \
  -p 8080:8080 g1:1.4

# 3. Point your app at the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model":"g1","messages":[...]}'
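From Python, the same endpoint can be reached with nothing but the standard library; no SDK is required because the API speaks plain OpenAI-compatible JSON. The base URL matches the quick start above, and the token value is a placeholder:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base: str = "http://localhost:8080/v1",
                       token: str = "sk-local") -> urllib.request.Request:
    """Build the same OpenAI-compatible payload as the curl example."""
    body = json.dumps({"model": "g1",
                       "messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=body.encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

def chat(prompt: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)
```

Because the contract is OpenAI-compatible, existing clients (including the official OpenAI SDKs) should also work by pointing their base URL at the G-1 endpoint.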
The Geodesia Zero-Knowledge Guarantee

Geodesia.ai does not access, copy, store, transmit, or process client model weights, training data, prompts, or inference responses — by design, not by policy.

Read the architecture →

Architecture review session.

Two-hour deep dive with our principal engineers. Reference architecture for your stack. Sandbox token to test the OpenAI-compatible endpoint.