Technical Architecture

The architecture
of trust.

A non-invasive runtime layered on top of vLLM. Zero-copy interception. Geometric hallucination detection. Mechanistic causal explainability. Async compliance evidence pipeline. On-premise. Air-gap capable.

Inference path: 6 stages · <35 ms total
Inference engine: vLLM 0.6+ · HF · SGLang
Deploy: single Docker · K8s · air-gap
API: OpenAI-compatible · REST

Three layers.
One trust contract.

Your application keeps speaking OpenAI. Your model keeps running unchanged. G-1 sits between them as a transparent symbiont.

YOUR APPLICATION
Customer copilot · Clinical assistant · Loan officer agent · Multi-agent pipeline
OpenAI-compatible API · /v1/chat/completions
GEODESIA G-1 · TRUST LAYER
🛡️
Safety Gate
16-centroid latent classifier · <5 ms
🧬
Constitutional Router
EU Charter · GDPR · custom policy
🧠
NSP Coherence
Riemannian geometry of attention
🔍
Causal XAI
IG · MuPAX · EVIDENCE
⛓️
Audit Pipeline
HMAC-SHA256 chain · async
📑
Report Compositor
FRIA · Annex IV · MiFID II PDFs
vLLM · zero-copy hidden-state interception
YOUR LLM · UNCHANGED
Llama 3.3 70B · Qwen 3 · Mistral · Gemma 4 · DeepSeek · Phi-4 Mini · + your fine-tune

A single request, six checkpoints.

From inbound API call to delivered response — every stage adds protection without breaking the OpenAI contract.

01

Request ingress

/v1/chat/completions · OpenAI-compatible

Client sends a chat completion request. Tenant identification, RBAC, and rate-limit checks happen at the edge. The payload is normalized to G-1's internal Inference Envelope and assigned an immutable Call ID.

Added latency: <1 ms
Tenant isolation: per-namespace
Identifier: UUID v7 + timestamp
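The Call ID step can be sketched in a few lines. The stdlib `uuid` module has no v7 constructor before Python 3.14, so this hypothetical `uuid7` helper composes one by hand following the RFC 9562 layout (48-bit millisecond timestamp, then version, random, variant, random bits); the envelope fields in `to_envelope` are illustrative, not G-1's actual schema.

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Compose a UUIDv7: time-ordered, so Call IDs sort by arrival."""
    ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")          # 80 random bits
    value = (ts_ms & ((1 << 48) - 1)) << 80               # 48-bit timestamp
    value |= 0x7 << 76                                    # version 7
    value |= ((rand >> 68) & 0xFFF) << 64                 # 12 random bits
    value |= 0b10 << 62                                   # RFC 4122 variant
    value |= rand & ((1 << 62) - 1)                       # 62 random bits
    return uuid.UUID(int=value)

def to_envelope(payload: dict, tenant: str) -> dict:
    """Normalize an OpenAI-style chat payload into an internal envelope."""
    return {
        "call_id": str(uuid7()),
        "tenant": tenant,
        "received_at_ms": time.time_ns() // 1_000_000,
        "model": payload.get("model"),
        "messages": payload["messages"],
    }
```

Because the top 48 bits are a timestamp, two envelopes created in sequence sort chronologically by Call ID, which keeps the audit chain naturally ordered.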
02

Pre-generation safety gate

16-centroid latent classifier · adversarial intent

The prompt is encoded by a small adapter and projected into a 16-centroid latent space trained on adversarial corpora (jailbreaks, prompt injection, policy violations). The classifier runs in parallel with the constitutional router. AUROC 0.82 — +79% over base-model refusals.

Latency: <5 ms
AUROC: 0.82
On unsafe: block + audit
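Nearest-centroid classification itself is cheap; the trained centroids are where the work lives. A toy version of the gate, with random stand-in centroids and hypothetical safe/unsafe labels rather than the shipped model:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 64
# Stand-ins for the 16 centroids learned from adversarial corpora.
centroids = rng.normal(size=(16, EMB_DIM))
# Hypothetical labels: which centroids represent adversarial intent.
centroid_is_unsafe = np.array([i % 2 == 0 for i in range(16)])

def classify(embedding: np.ndarray) -> str:
    """Route the prompt embedding to its nearest centroid's verdict."""
    distances = np.linalg.norm(centroids - embedding, axis=1)
    nearest = int(np.argmin(distances))
    return "block" if centroid_is_unsafe[nearest] else "allow"
```

A single distance computation over 16 rows is why this stage fits inside a <5 ms budget and can run in parallel with the constitutional router.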
03

Constitutional router

European Charter · GDPR · customer policy

Every request is checked against the active constitution: EU Charter of Fundamental Rights, EU AI Act Article 5 prohibitions, GDPR principles, and any customer-defined ethics policy. The constitution is versioned, auditable, and customizable. Outputs of this stage flow into both the audit chain and the oversight queue triggers.

Decision: allow · escalate · deny
Versioning: git-style
Customizable: per-tenant
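The allow/escalate/deny contract can be sketched as a rule table keyed to a versioned constitution. The rule names, version tag, and naive keyword trigger below are illustrative; the real router uses classifiers, not substring matches:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    article: str    # e.g. an EU AI Act or GDPR provision
    pattern: str    # toy keyword trigger standing in for a classifier
    decision: str   # "deny" or "escalate"

CONSTITUTION_VERSION = "v1.4.2"  # hypothetical git-style tag
RULES = [
    Rule("EU AI Act Art. 5", "subliminal manipulation", "deny"),
    Rule("GDPR Art. 9", "health record", "escalate"),
]

def route(prompt: str) -> dict:
    """Check a prompt against the active constitution; default to allow."""
    lowered = prompt.lower()
    for rule in RULES:
        if rule.pattern in lowered:
            return {"decision": rule.decision, "rule": rule.article,
                    "constitution": CONSTITUTION_VERSION}
    return {"decision": "allow", "rule": None,
            "constitution": CONSTITUTION_VERSION}
```

Stamping every verdict with the constitution version is what makes the decision auditable later: the chain entry records not just the outcome but which policy snapshot produced it.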
04

Model inference

vLLM · zero-copy hidden-state extraction

Inference runs on the customer's own LLM, served by vLLM. G-1 attaches a read-only hook to extract attention QKᵀ projections and last-layer hidden states without copying them off-device. The model's weights are never modified. Streaming and non-streaming modes are both supported.

Engine: vLLM 0.6+
Read-only hook: QKᵀ + hidden
Weights: untouched
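"Zero-copy" here just means the hook keeps references to live buffers instead of duplicating them. vLLM's actual hook surface is internal and version-specific, so this stub only illustrates the pattern with a fake forward pass:

```python
import numpy as np

captured = {}

def hidden_state_hook(layer_name: str, tensor: np.ndarray) -> None:
    # Store a reference only, never .copy(): the hidden states are
    # observed in place, so no extra device memory is consumed.
    captured[layer_name] = tensor

def fake_forward(x: np.ndarray) -> np.ndarray:
    """Stand-in for the model's final transformer block."""
    h = np.tanh(x)
    hidden_state_hook("final_layer", h)
    return h

x = np.ones((2, 4))
out = fake_forward(x)
```

Since `captured["final_layer"]` and `out` are the same buffer, the NSP engine downstream reads exactly what the model produced, and the model's weights and outputs are never altered.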
05

NSP Coherence Engine

Riemannian geometry of attention · post-generation

The Neural Symbolic Potentials (NSP) Coherence Engine treats the latent trajectory of the response as a path on a Riemannian manifold. Four signals are computed: max coherence, smoothness, jerk (second derivative of curvature), and context gap. These combine into a single hallucination score. AUROC 0.96 — state-of-the-art for on-premise real-time detection.

Latency: <20 ms
AUROC: 0.96
Vs base model: +316%
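The flavor of these signals can be shown on any latent trajectory. This is a Euclidean toy version (the real engine works with a Riemannian metric, and the function and field names here are illustrative): successive hidden states give velocities, the turning angle between velocities gives a curvature proxy, and the change of curvature gives jerk.

```python
import numpy as np

def coherence_signals(traj: np.ndarray) -> dict:
    """traj: (T, d) hidden states along the response, T >= 4."""
    steps = np.diff(traj, axis=0)                         # velocities
    norms = np.linalg.norm(steps, axis=1) + 1e-9
    # Cosine between consecutive steps: 1.0 = perfectly straight.
    cos = np.sum(steps[:-1] * steps[1:], axis=1) / (norms[:-1] * norms[1:])
    curvature = 1.0 - cos                                  # turning per step
    jerk = np.diff(curvature)                              # change of curvature
    return {
        "max_coherence": float(cos.max()),
        "smoothness": float(curvature.mean()),
        "jerk": float(np.abs(jerk).max()) if jerk.size else 0.0,
    }
```

On a smooth, on-topic generation the trajectory stays nearly straight (coherence near 1, curvature near 0); a hallucinating run shows sharp turns and spiking jerk, which is what the combined score picks up.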
06

Compliance runtime · async

HMAC chain · watermark · oversight · auto-reports

While the response streams to the client, G-1 fires the async compliance pipeline: chain entry written, watermark applied, retention rule attached, oversight queue consulted (and, if needed, the call is escalated). Reports — FRIA, Annex IV, MiFID II audit bundles — are composed on demand from this evidence. Never blocks the user response.

Chain: HMAC-SHA256 append-only
Watermark: HMAC · 6 languages
Reports: composed on demand
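An append-only HMAC chain is a few lines of stdlib code. A sketch with a hypothetical per-tenant key: each entry's MAC covers both its own body and the previous MAC, so deleting or editing entry N invalidates every MAC from N onward.

```python
import hashlib
import hmac
import json

KEY = b"per-tenant-secret"  # hypothetical tenant signing key

GENESIS = "0" * 64  # MAC placeholder before the first entry

def append_entry(chain: list, envelope: dict) -> list:
    """Append an Inference Envelope, chained to the previous MAC."""
    prev_mac = chain[-1]["mac"] if chain else GENESIS
    body = json.dumps(envelope, sort_keys=True)
    mac = hmac.new(KEY, (prev_mac + body).encode(), hashlib.sha256).hexdigest()
    chain.append({"body": body, "mac": mac})
    return chain

def verify(chain: list) -> bool:
    """Recompute every MAC; any edit or deletion breaks the chain."""
    prev = GENESIS
    for entry in chain:
        expect = hmac.new(KEY, (prev + entry["body"]).encode(),
                          hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expect, entry["mac"]):
            return False
        prev = entry["mac"]
    return True
```

Verification is a single linear pass, which is why a chain of production size can be checked in seconds inside the customer's own database.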

The science under the hood.

🧠

NSP Coherence Engine

Neural Symbolic Potentials. Reads the geometry of the response trajectory in latent space. Four geometric features → hallucination score. The state-of-the-art for on-premise real-time detection. Built on the GLAD-Manifold representation.

Research → GLAD-Manifold
🔍

Causal XAI

Three methods for three regulatory needs. Integrated Gradients for axiomatic completeness. MuPAX for court-quality multi-dimensional attribution (peer-reviewed, Oxford). EVIDENCE for evolutionary deterministic explanation (peer-reviewed, EAAI 2025).

Research → XAI methods
⛓️

Append-only Audit Chain

Every Inference Envelope — prompt, response, scores, decisions — is hashed (SHA-256) and chained (HMAC). You cannot delete entry N without breaking the chain. Verifiable in seconds. Court-admissible. Works inside the customer's database.

Auditing Hub →

From a single Docker
to multi-region sovereign clouds.

G-1 ships as a single container with a typed Helm chart. It runs on bare metal, on a single GPU node, in your Kubernetes cluster, or in a fully air-gapped enclave. No license server. No outbound calls. No telemetry.

Adapter training — the one-time step that calibrates G-1 to your specific base model and policy — runs on your own GPU. Geodesia.ai never has access to that hardware.

  • Single Docker — for single-tenant evaluation and POV deployments
  • Helm chart for K8s — for multi-tenant, HA enterprise deployments
  • Air-gapped bundle — offline registry + signed images for classified environments
  • BYO infra — runs on AWS, GCP, Azure, OVH, sovereign EU clouds, or bare metal
Quick start · Docker
# 1. Pull the G-1 container
docker pull registry.geodesia.ai/g1:1.4

# 2. Mount your model + adapter
docker run --gpus all \
  -v /models/llama3-70b:/model \
  -v /adapters/g1-llama3:/adapter \
  -p 8080:8080 g1:1.4

# 3. Point your app at the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model":"g1","messages":[...]}'
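From Python, the same endpoint can be reached with nothing but the standard library; no SDK is required because the API speaks plain OpenAI-compatible JSON. The base URL matches the quick start above, and the token value is a placeholder:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       base: str = "http://localhost:8080/v1",
                       token: str = "sk-local") -> urllib.request.Request:
    """Build the same OpenAI-compatible payload as the curl example."""
    body = json.dumps({"model": "g1",
                       "messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        f"{base}/chat/completions",
        data=body.encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

def chat(prompt: str) -> dict:
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        return json.load(resp)
```

Because the contract is OpenAI-compatible, existing clients (including the official OpenAI SDKs) should also work by pointing their base URL at the G-1 endpoint.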
The Geodesia Zero-Knowledge Guarantee

Geodesia.ai does not access, copy, store, transmit, or process client model weights, training data, prompts, or inference responses — by design, not by policy.

Read the architecture →

Architecture review session.

Two-hour deep dive with our principal engineers. Reference architecture for your stack. Sandbox token to test the OpenAI-compatible endpoint.