Platform Architecture — Data Island

Data Island Core — command center with SQL Editor, Tables, Ingestion, and Data Quality

Lighthouse — Chat workspace turning a natural-language question into a chart, SQL, and explanation

AI integration in BI and pipelines is hard. We cracked it.

For ten years every vendor has promised "AI on top of your warehouse" and shipped a chatbot bolted to a database. Lighthouse is the first agentic workbench that lives inside the platform — same RBAC, same audit, same data — not a third-party SaaS slurping your tables.

AI that reasons over your data

Lighthouse runs a 7-step agentic pipeline — understand → catalog_match → pattern → sql_gen → query_sql → data_analyze → anomaly_check — and routes each step to the best LLM with automatic fallback. Every claim traces back to a real value: no hallucinated numbers.

Lighthouse
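Here is a minimal sketch of per-step routing with fallback. It assumes each step's models are tried in a fixed priority order. The seven step names come from the text above. The handler callables, the `run_step`/`run_pipeline` names, and the "first handler that does not raise wins" policy are illustrative assumptions, not Lighthouse's actual router. For simplicity, every step (including `query_sql`) is modelled as a string-to-string callable.

```python
from typing import Callable, Dict, List

# Hypothetical handler type: takes the previous step's output, returns text,
# and raises when its model/provider is unavailable.
Handler = Callable[[str], str]

PIPELINE_STEPS = [
    "understand", "catalog_match", "pattern", "sql_gen",
    "query_sql", "data_analyze", "anomaly_check",
]


def run_step(step: str, payload: str, routes: Dict[str, List[Handler]]) -> str:
    """Try the handlers routed to `step` in priority order; fall back on failure."""
    errors: List[str] = []
    for handler in routes[step]:
        try:
            return handler(payload)
        except Exception as exc:  # timeout, rate limit, provider outage, ...
            errors.append(f"{getattr(handler, '__name__', 'handler')}: {exc}")
    raise RuntimeError(f"all handlers failed for step {step!r}: {errors}")


def run_pipeline(question: str, routes: Dict[str, List[Handler]]) -> Dict[str, str]:
    """Run the seven steps in order, feeding each output into the next step."""
    outputs: Dict[str, str] = {}
    payload = question
    for step in PIPELINE_STEPS:
        payload = run_step(step, payload, routes)
        outputs[step] = payload
    return outputs
```

Keeping the fallback inside `run_step` means an outage at one provider only costs latency on the steps routed to it. The rest of the pipeline is unaffected.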

Conversational BI — beyond dashboards

Ask in English, get a chart, the SQL, and an explanation. Save as widgets, compose into dashboards, schedule recurring reports. Power BI and Excel still plug in via OData 4.0 for the dashboards that already exist.

Lighthouse

Data quality that proposes its own rules

The Quality workbench profiles every table, narrates drift, hunts anomalies, and compiles plain-English rules ("never let gross_margin go negative") into structured constraints. A copilot for every steward.

Lighthouse
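The sketch below illustrates what "compiling" a rule might mean. It maps two hypothetical English phrasings onto a structured `Constraint` and checks rows against it. The Quality workbench presumably uses an LLM rather than regular expressions. The patterns, the `Constraint` shape, and the check names are all assumptions chosen to keep the example self-contained.

```python
import re
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Constraint:
    column: str
    check: str  # hypothetical check names: "non_negative" or "not_null"


# Hypothetical phrasings this toy compiler understands.
PATTERNS = [
    (re.compile(r"never let (\w+) go negative", re.IGNORECASE), "non_negative"),
    (re.compile(r"(\w+) must never be (?:null|empty)", re.IGNORECASE), "not_null"),
]


def compile_rule(text: str) -> Optional[Constraint]:
    """Turn a plain-English rule into a Constraint, or None if unrecognized."""
    for pattern, check in PATTERNS:
        match = pattern.search(text)
        if match:
            return Constraint(column=match.group(1), check=check)
    return None


def violations(rows: List[Dict[str, Any]], rule: Constraint) -> List[int]:
    """Return the indices of rows that break the constraint."""
    bad = []
    for i, row in enumerate(rows):
        value = row.get(rule.column)
        if rule.check == "not_null" and value is None:
            bad.append(i)
        elif rule.check == "non_negative" and value is not None and value < 0:
            bad.append(i)
    return bad
```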

Data Silos

Marketing has spreadsheets, engineering has a lake, finance has a warehouse. Data Island unifies every team under one platform with organizational isolation.

Compliance Burden

DORA, SOC 2, and GDPR demand immutable audit trails and fine-grained access control. Data Island ships these built-in — not bolted on. Lighthouse inherits the same RBAC and audit trail at no extra cost.

Vendor Lock-In

Proprietary formats, egress fees, rewritten pipelines. Data Island stores standard Parquet on any supported backend (S3, Azure, GCS, MinIO, or local disk) and mirrors tables to Delta Lake and Iceberg. Lighthouse runs on your infrastructure with your choice of LLM provider.

Data Loss Risk

Bad UPDATEs and accidental DELETEs silently destroy data. Append-only versioned storage preserves every historical state — nothing is ever lost.

Prompts that leak data

Generic AI tools ship your tables to a third-party SaaS. Lighthouse sends only column names and types, never row data, and the LLM router lets you swap to a self-hosted model with one config change. Your data stays where it belongs.
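One way to make the "names and types, never rows" guarantee structural is to give the prompt builder no access to row data at all. This sketch assumes a JSON payload and uses a hypothetical `build_schema_prompt` function. Only the table name, column names, and type strings can reach the LLM.

```python
import json
from typing import Dict


def build_schema_prompt(question: str, table: str, columns: Dict[str, str]) -> str:
    """Serialize only metadata for the LLM: table name, column names and types.

    The function never receives row data, so it has nothing to leak.
    """
    payload = {
        "question": question,
        "table": table,
        "columns": [{"name": name, "type": typ} for name, typ in columns.items()],
    }
    return json.dumps(payload, sort_keys=True)
```

The design choice is in the signature itself. Callers pass a schema mapping rather than a dataframe, so a reviewer can confirm the guarantee without reading the function body.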

Four Layers. Complete Data Platform.

Each layer is independently deployable, fully API-driven, and purpose-built for its domain.

NL→SQL · Quality Copilot · Stewardship · Auto-Discovery · Multi-LLM

Connectors · Notebooks · Pipes · Scheduler · Monitoring

Versioned Storage · SQL Analytics · OData 4.0 · MCP Server · Data Quality · Data Sharing

OAuth2 / OIDC · SCIM 2.0 · RS256 JWT · MFA / TOTP · Fernet Encryption · SOC 2 / DORA

Infrastructure: Redis for the catalog and caching, object storage (S3 / Azure / GCS / MinIO / local) for Parquet data, and your choice of container deployment (Docker Compose or Kubernetes).

Next-Generation BI, Built In

Layer 4 — Lighthouse — turns the platform into an agentic system. The modern replacement for clickable dashboards: ask in English, get a grounded answer, and the system learns from every interaction.

Lighthouse Chat — natural-language question turning into a chart with SQL, explanation, and a one-click action bar.

Agentic Platform · LLM-Powered BI

From clickable dashboards to natural-language reporting.

Agentic, not just chat. A 7-step pipeline reasons across catalog, joins, and quality rules before answering — each step routed to the best LLM for the job, with automatic fallback.
Grounded LLM analysis. Every claim traces back to a real value in your data. System prompts forbid invention; quality narrations cite actual null rates, drift, and anomalies.
Data Chat reporting. Ask in English; get a chart, the SQL, and an explanation. Save as widgets, compose into dashboards, schedule as recurring reports.
It learns over time. Feedback teaches the pipeline; typed annotations on the master MCP persist across operators, sessions, and restarts.

How the Layers Connect

Data flows through the platform in a clear, auditable path from ingestion to insight.

Write Path

1. Client Request

Application sends data via REST API, Python SDK, or bulk upload. Every request carries a JWT token.

2. Gatekeeper Validates

The token's signature is verified against the JWKS and the caller's permissions are resolved. An RBAC check confirms the caller holds Write permission on the target table.

3. Core Processes

Data is validated against the table schema, deduplicated, encoded as compressed Parquet, and written to object storage as a new immutable version.

4. Catalog Updated

Redis catalog records the new version with metadata: row count, byte size, schema hash, and SHA-256 audit hash chained to the previous entry.
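The chained audit hash in step 4 can be sketched with the standard library alone. The sketch makes three assumptions:

- Each catalog entry is a JSON-serializable dict.
- The hash covers the previous hash concatenated with a canonical JSON encoding of the entry.
- The first entry chains to an all-zero genesis value.

Field names such as `rows` and `schema_hash` are illustrative.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # hypothetical starting value for the first entry


def entry_hash(entry: Dict[str, Any], prev_hash: str) -> str:
    """SHA-256 over the previous hash plus a canonical JSON encoding of the entry."""
    body = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], entry: Dict[str, Any]) -> Dict[str, Any]:
    """Record a new catalog version, linked to the hash of the previous one."""
    prev = chain[-1]["hash"] if chain else GENESIS
    record = {"entry": entry, "prev": prev, "hash": entry_hash(entry, prev)}
    chain.append(record)
    return record


def verify(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited entry breaks the chain from that point on."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or record["hash"] != entry_hash(record["entry"], prev):
            return False
        prev = record["hash"]
    return True
```

Each hash folds in its predecessor. Editing any historical entry therefore breaks verification for that entry and every one after it. This is what makes the log tamper-evident rather than tamper-proof.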

Read Path

1. Query Arrives

SQL query comes in via REST, OData, MCP, or the WebUI. Token carries the caller's identity and roles.

2. View Chain Applied

Base data passes through the view chain: Dedup View → Tombstone View → RBAC View (row/column filtering) → User Query.

3. Engine Selected

Query engine auto-selects: DuckDB Lite for small datasets, DuckDB Pro with caching for medium, Spark SQL via Thrift for large workloads.

4. Results Returned

Filtered, authorized results are returned as JSON, OData, or streamed Parquet. The query is audit-logged with latency and row count.
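The view chain from step 2 of the read path can be modelled as plain functions applied in order. This sketch makes two assumptions, both using hypothetical column names:

- Every stored row carries a `_version` number.
- A delete is written as a newer version flagged `_deleted`.

That second assumption explains the ordering. Deduplication must run first so the tombstone is the version that survives, and only then can the tombstone view hide it.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def dedup_view(rows: Iterable[Row], key: str) -> List[Row]:
    """Keep only the highest `_version` of each key."""
    latest: Dict[Any, Row] = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row["_version"] >= current["_version"]:
            latest[row[key]] = row
    return list(latest.values())


def tombstone_view(rows: Iterable[Row]) -> List[Row]:
    """Drop keys whose latest version is a delete marker."""
    return [r for r in rows if not r.get("_deleted", False)]


def rbac_view(rows: Iterable[Row], row_filter: Callable[[Row], bool], columns: List[str]) -> List[Row]:
    """Apply the caller's row predicate, then project to the permitted columns."""
    return [{c: r[c] for c in columns if c in r} for r in rows if row_filter(r)]


def run_query(
    base: Iterable[Row],
    key: str,
    row_filter: Callable[[Row], bool],
    columns: List[str],
    user_query: Callable[[List[Row]], Any],
) -> Any:
    rows = dedup_view(base, key)                 # 1. latest version per key
    rows = tombstone_view(rows)                  # 2. hide deleted keys
    rows = rbac_view(rows, row_filter, columns)  # 3. row/column security
    return user_query(rows)                      # 4. the caller's own query
```

The row filter sees the full row before columns are projected away. A policy can therefore filter on a column the caller is not allowed to read.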

Built-in Capabilities

Everything you need to store, query, govern, and share data — shipped as one platform, not assembled from parts.

Versioned Storage

Every write creates an immutable snapshot. Point-in-time queries across full history.
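Here is a minimal point-in-time model. It assumes each write stores a complete snapshot keyed by a strictly increasing timestamp. A real lakehouse would record new Parquet files per version rather than full copies. `VersionedTable`, `as_of`, and `version` are hypothetical names.

```python
import bisect
from typing import Any, List, Sequence, Tuple


class VersionedTable:
    """Append-only history: every write adds a snapshot, none is ever modified."""

    def __init__(self) -> None:
        self._times: List[float] = []
        self._snapshots: List[Tuple[Any, ...]] = []

    def write(self, ts: float, rows: Sequence[Any]) -> int:
        if self._times and ts <= self._times[-1]:
            raise ValueError("write timestamps must strictly increase")
        self._times.append(ts)
        self._snapshots.append(tuple(rows))  # store an immutable tuple copy
        return len(self._snapshots)          # 1-based version number

    def as_of(self, ts: float) -> Tuple[Any, ...]:
        """Return the newest snapshot written at or before `ts`."""
        i = bisect.bisect_right(self._times, ts)
        if i == 0:
            raise LookupError(f"no version exists at or before {ts}")
        return self._snapshots[i - 1]

    def version(self, n: int) -> Tuple[Any, ...]:
        """Return the snapshot for 1-based version number `n`."""
        if not 1 <= n <= len(self._snapshots):
            raise IndexError(f"version {n} does not exist")
        return self._snapshots[n - 1]
```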

SQL Editor

Built-in web editor with auto engine selection — DuckDB for speed, Spark for scale.
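Auto engine selection can be reduced to thresholds on the amount of data a query will scan. The catalog already records this as byte size per version. The tier names mirror the read path above. The 1 GiB and 50 GiB cut-offs and the `select_engine` function are placeholder assumptions, not the platform's real thresholds.

```python
from enum import Enum


class Engine(Enum):
    DUCKDB_LITE = "duckdb-lite"  # in-process, small datasets
    DUCKDB_PRO = "duckdb-pro"    # with caching, medium datasets
    SPARK_SQL = "spark-sql"      # distributed via Thrift, large workloads


# Hypothetical cut-offs, for illustration only.
SMALL_BYTES = 1 * 1024**3    # 1 GiB
MEDIUM_BYTES = 50 * 1024**3  # 50 GiB


def select_engine(scan_bytes: int) -> Engine:
    """Pick an engine from the estimated bytes the query must scan."""
    if scan_bytes < SMALL_BYTES:
        return Engine.DUCKDB_LITE
    if scan_bytes < MEDIUM_BYTES:
        return Engine.DUCKDB_PRO
    return Engine.SPARK_SQL
```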

RBAC

Five permission tiers with table-, row-, and column-level security filters.
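Here is a sketch of how ordered permission tiers could back the Gatekeeper's Write check. The five actual tiers are not named here, so the four-level `Permission` enum below is a hypothetical stand-in. The token claims are assumed to have already been verified against the JWKS, with the caller's roles in a `roles` claim.

```python
from enum import IntEnum
from typing import Any, Dict, Iterable, Tuple


class Permission(IntEnum):
    # Hypothetical ordered tiers: a higher tier implies every lower one.
    NONE = 0
    READ = 1
    WRITE = 2
    ADMIN = 3


Grants = Dict[Tuple[str, str], Permission]  # (role, table) -> tier


def effective_permission(roles: Iterable[str], table: str, grants: Grants) -> Permission:
    """The strongest tier any of the caller's roles holds on `table`."""
    return max((grants.get((r, table), Permission.NONE) for r in roles), default=Permission.NONE)


def authorize_write(claims: Dict[str, Any], table: str, grants: Grants) -> None:
    """Raise PermissionError unless the token's roles grant at least Write on `table`."""
    perm = effective_permission(claims.get("roles", []), table, grants)
    if perm < Permission.WRITE:
        raise PermissionError(f"{claims.get('sub')} lacks Write on {table}")
```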

Audit Logging

Tamper-evident SHA-256 hash chains. 21 structured fields. 7-year retention.

Staging & Ingestion

Two-phase ingestion — stage, review, then commit. Parquet, CSV, and SDK inserts.
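The stage, review, commit flow is essentially a small state machine. This in-memory sketch assumes batches are lists of dict rows and that each batch can be committed or discarded exactly once. `StagingArea` and its methods are hypothetical, not the SDK's API.

```python
from enum import Enum
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


class StageState(Enum):
    STAGED = "staged"
    COMMITTED = "committed"
    DISCARDED = "discarded"


class StagingArea:
    """Rows are staged, reviewed, then committed or discarded.
    Nothing reaches `table` before commit."""

    def __init__(self) -> None:
        self.table: List[Row] = []
        self._batches: Dict[int, List[Row]] = {}
        self._state: Dict[int, StageState] = {}
        self._next_id = 1

    def stage(self, rows: Iterable[Row]) -> int:
        batch_id = self._next_id
        self._next_id += 1
        self._batches[batch_id] = list(rows)
        self._state[batch_id] = StageState.STAGED
        return batch_id

    def preview(self, batch_id: int, n: int = 5) -> List[Row]:
        """Review step: inspect the first `n` staged rows."""
        return self._batches[batch_id][:n]

    def commit(self, batch_id: int) -> None:
        self._require_staged(batch_id)
        self.table.extend(self._batches.pop(batch_id))
        self._state[batch_id] = StageState.COMMITTED

    def discard(self, batch_id: int) -> None:
        self._require_staged(batch_id)
        del self._batches[batch_id]
        self._state[batch_id] = StageState.DISCARDED

    def _require_staged(self, batch_id: int) -> None:
        if self._state.get(batch_id) is not StageState.STAGED:
            raise ValueError(f"batch {batch_id} is not staged")
```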

OData 4.0

Power BI, Excel, and Tableau connect with a URL and bearer token. No drivers.
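Because OData is plain HTTP, a request is just a URL plus an `Authorization: Bearer` header. The sketch builds one with the standard library but does not send it. `$select`, `$filter`, and `$top` are standard OData 4.0 system query options. The host `data-island.example.com`, the `/odata/` path, and the `sales` entity set are hypothetical.

```python
from typing import Dict, Optional, Sequence
from urllib.parse import quote, urlencode
from urllib.request import Request


def odata_request(
    base_url: str,
    entity_set: str,
    token: str,
    select: Optional[Sequence[str]] = None,
    filter_expr: Optional[str] = None,
    top: Optional[int] = None,
) -> Request:
    """Build (but do not send) an OData 4.0 GET request with a bearer token."""
    params: Dict[str, str] = {}
    if select:
        params["$select"] = ",".join(select)
    if filter_expr:
        params["$filter"] = filter_expr
    if top is not None:
        params["$top"] = str(top)
    # Keep '$', ',' and single quotes literal; encode spaces as %20, not '+'.
    query = urlencode(params, safe="$,'", quote_via=quote)
    url = f"{base_url.rstrip('/')}/{entity_set}" + (f"?{query}" if query else "")
    return Request(url, headers={"Authorization": f"Bearer {token}", "Accept": "application/json"})
```

Passing the result to `urllib.request.urlopen` would send it. Power BI and Excel issue equivalent requests on your behalf once given the same URL and token.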

MCP Server

24 tools for Claude Desktop, Cursor, and MCP-compatible AI assistants.
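MCP clients and servers exchange JSON-RPC 2.0 messages. `tools/list` enumerates a server's tools, and `tools/call` invokes one by name. The sketch only builds the request payloads and omits the transport (stdio or HTTP). The `run_sql` tool name is a hypothetical example, not one of the 24 shipped tools.

```python
import itertools
import json
from typing import Any, Dict

_request_ids = itertools.count(1)  # JSON-RPC ids must be unique per session


def jsonrpc_request(method: str, params: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 request with a fresh integer id."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_request_ids), "method": method, "params": params})


def list_tools() -> str:
    """Ask the server which tools it exposes."""
    return jsonrpc_request("tools/list", {})


def call_tool(name: str, arguments: Dict[str, Any]) -> str:
    """Invoke one tool by name with a dict of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})
```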

Data Sharing

Zero-copy cross-org sharing with column and row filters. Instant revocation.

Table Mirroring

Auto-export to Delta Lake, Iceberg, and Parquet after every write.

Data Quality

16 built-in checks, 5 anomaly detectors, quality scores with trend analysis.
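The sketch below shows two of the simplest building blocks behind such checks, using the standard `statistics` module: a null-rate metric and a z-score outlier detector. Whether the platform's five detectors include a z-score test is not stated, so treat both functions and the 3.0 default threshold as illustrative.

```python
import statistics
from typing import List, Optional, Sequence


def null_rate(values: Sequence[Optional[float]]) -> float:
    """Fraction of values that are None (0.0 for an empty column)."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def zscore_anomalies(values: Sequence[float], threshold: float = 3.0) -> List[int]:
    """Indices of values more than `threshold` sample standard deviations from the mean."""
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # a constant column has no outliers
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > threshold]
```

A single outlier among n points can never score more than (n - 1)/sqrt(n) sample standard deviations. With the default threshold of 3.0, this detector therefore needs at least 11 observations before it can flag anything.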

Monitoring

Read/write metrics, latency breakdowns, active instances, and service health.

Multi-Cloud

S3, Azure, GCS, MinIO, local disk. Switch backends via config — no migration.
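If every backend holds the same Parquet layout, switching backends can come down to changing one storage URI. The scheme-to-backend table below is a hypothetical config convention, and `parse_storage_uri` is an illustrative name. MinIO is assumed to use `s3://` with a custom endpoint configured elsewhere.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to backend name.
BACKENDS = {"s3": "s3", "az": "azure", "gs": "gcs", "file": "local"}


@dataclass(frozen=True)
class StorageConfig:
    backend: str
    bucket: str  # bucket or container; empty for local disk
    prefix: str  # key prefix, or filesystem path for local disk


def parse_storage_uri(uri: str) -> StorageConfig:
    """Resolve a storage URI such as 's3://lake/parquet' into a backend config."""
    parsed = urlparse(uri)
    scheme = parsed.scheme or "file"  # bare paths are treated as local disk
    if scheme not in BACKENDS:
        raise ValueError(f"unsupported storage scheme: {scheme!r}")
    backend = BACKENDS[scheme]
    if backend == "local":
        return StorageConfig(backend, "", parsed.path)
    return StorageConfig(backend, parsed.netloc, parsed.path.lstrip("/"))
```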

Deploy Your Way

From an embedded Python install to a full Docker Compose stack with orchestrated profiles. Production-ready from day one.

pip install

Install the Python SDK for embedded use, scripting, and CI/CD pipelines. Full API access without Docker.

pip install supertable

Docker Compose

Bring up infrastructure and services together with Compose profiles. Production-ready: orchestrated start order, health checks, and isolated networks.

docker compose --profile infra --profile services up -d

Technology Stack

Built on proven, open-source foundations. No vendor lock-in.

Component	Technology	Purpose
Runtime	`Python 3.10+`	Core language for all services
API Framework	`FastAPI`	Async REST API with auto-generated OpenAPI docs
Catalog & Cache	`Redis`	Metadata catalog, session store, pub/sub cache invalidation
SQL Engine (small)	`DuckDB`	In-process OLAP for sub-second analytics
SQL Engine (large)	`Apache Spark`	Distributed SQL via Thrift for large-scale workloads
Data Format	`Apache Parquet`	Columnar storage with compression and predicate pushdown
DataFrame Engine	`Polars`	High-performance Rust-backed dataframe processing
Interoperability	`Delta Lake / Iceberg`	Table mirroring for the Spark, Databricks, and Snowflake ecosystems
Object Storage	`S3 / Azure / GCS / MinIO`	Pluggable multi-cloud storage backends
Encryption	`Fernet (AES-128-CBC + HMAC-SHA256)`	Authenticated symmetric encryption at rest for sensitive catalog data

Built for Every Data Role

One platform, six workflows, from engineering to compliance to AI-assisted analytics.

Data Engineers

Build & Ingest

Write with the Python SDK, query with SQL. The platform handles schema evolution, dedup, and storage optimization — no Spark cluster to manage.

Python SDK · Schema Evolution · Streaming Writes

Compliance Officers

Audit & Governance

Immutable audit trails with SHA-256 hash chains, row- and column-level access control, and 7-year log retention. Built for DORA, SOC 2, and GDPR.

DORA · SOC 2 · GDPR · 7-yr Retention

BI Analysts

Analyze & Visualize

Connect Power BI or Excel directly via OData. Live dashboards against production data — no waiting for engineers to build ETL pipelines.

OData 4.0 · Power BI · Excel · Tableau

AI / ML Teams

Reason & Automate

Connect Claude Desktop or Cursor via MCP. Query tables, explore schemas, and profile columns through natural-language conversation.

MCP Server · Claude Desktop · Cursor

Platform Builders

Scale & Integrate

Multi-org isolation, cross-org data sharing, and open-format mirroring for Spark, Databricks, and dbt ecosystem interoperability.

Multi-Org · Open Parquet · Spark Mirror · dbt

Decision Makers

Control & Sovereignty

Own the data layer end-to-end. No phone-home, no per-query egress, no vendor lock-in — sovereign infrastructure with a transparent license.

EU-Hosted · No Telemetry · Source-Available

The Data Island Platform

Versioned lakehouse + agentic AI workbench.

Ready to Build on Data Island?