Python SDK

The supertable Python package is the primary client for Data Island. It provides a high-level API for managing tables, writing data from Polars DataFrames, running SQL queries, and querying historical versions of a table (time travel).

Installation

Install from PyPI. Choose the variant that matches your storage backend.

Install
# Core package (local storage only)
pip install supertable

# With Amazon S3 support
# (the quotes stop shells such as zsh from treating the brackets as a glob pattern)
pip install "supertable[s3]"

# With Azure Blob Storage support
pip install "supertable[azure]"

# With Google Cloud Storage support
pip install "supertable[gcs]"

# All storage backends
pip install "supertable[all]"

Requirements: Python 3.10+. The SDK depends on Polars, PyArrow, and httpx.
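
To confirm that an environment meets these requirements before installing, a small standard-library check can help. This is only a sketch: the helper names python_is_supported and missing_modules are made up for illustration, and the import names polars, pyarrow, and httpx are assumed to match the dependencies listed above.

```python
import sys
from importlib.util import find_spec

# Minimum interpreter version and dependencies, taken from the requirements above.
MIN_PYTHON = (3, 10)
DEPENDENCIES = ("polars", "pyarrow", "httpx")


def python_is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_modules(names=DEPENDENCIES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


print("Python version OK:", python_is_supported())
print("Modules not installed yet:", missing_modules())
```

pip normally installs the dependencies together with supertable, so a non-empty list here is expected before installation and only worth investigating afterwards.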

Configuration

The SDK reads configuration from environment variables. Set these before using the client.

Variable                     Required   Description
SUPERTABLE_ORGANIZATION      Yes        Your organization identifier
SUPERTABLE_SUPERUSER_TOKEN   Yes*       Authentication token (*or use an API key)
SUPERTABLE_API_URL           No         API endpoint (default: http://localhost:8050)
STORAGE_TYPE                 No         Storage backend: local, s3, azure, gcs
STORAGE_PATH                 No         Root path for local storage

You can also pass configuration directly to the constructor:

Explicit Configuration
from supertable import SuperTable

st = SuperTable(
    organization="my-org",
    token="st_tok_...",
    api_url="http://localhost:8050"
)
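
The two approaches can be combined by reading the environment yourself and passing the values to the constructor, which makes a missing variable fail early with a clear message. The sketch below uses only the standard library. The helper client_kwargs_from_env is hypothetical and not part of the SDK; it assumes the constructor keywords shown above (organization, token, api_url) and the documented default API URL.

```python
import os

DEFAULT_API_URL = "http://localhost:8050"  # default listed in the table above


def client_kwargs_from_env(env=None):
    """Build keyword arguments for SuperTable(...) from environment variables.

    Hypothetical helper: raises KeyError if the organization is missing and
    leaves out the token when it is not set (for example when an API key is used).
    """
    env = os.environ if env is None else env

    organization = env.get("SUPERTABLE_ORGANIZATION")
    if not organization:
        raise KeyError("SUPERTABLE_ORGANIZATION is not set")

    kwargs = {
        "organization": organization,
        "api_url": env.get("SUPERTABLE_API_URL", DEFAULT_API_URL),
    }

    token = env.get("SUPERTABLE_SUPERUSER_TOKEN")
    if token:
        kwargs["token"] = token
    return kwargs


# Usage with the SDK installed (not executed here):
#   from supertable import SuperTable
#   st = SuperTable(**client_kwargs_from_env())
example = client_kwargs_from_env({"SUPERTABLE_ORGANIZATION": "my-org"})
print(example)
```

Passing a plain dictionary instead of os.environ, as in the last lines, keeps the helper easy to test.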

Core Classes

The SDK is built around four main classes.

SuperTable

Primary client. Manages connections, tables, and queries. Entry point for all operations.

DataWriter

Handles data ingestion with transactional writes. Accepts Polars DataFrames and commits atomically.

DataReader

Streams data from tables with optional filtering, column projection, and pagination.

MetaReader

Reads table metadata, schemas, statistics, and version history without fetching actual data.

Common Operations

Create a Table

Table Management
st = SuperTable(organization="my-org")

# Create a table (schema is inferred from first write)
st.create_table(
    table_name="events",
    comment="Application event log"
)

# List existing tables
tables = st.list_tables()
for t in tables:
    print(f"{t.name} — {t.row_count} rows")

Write Data

The DataWriter accepts Polars DataFrames and commits them atomically.

Write Data
import polars as pl

df = pl.DataFrame({
    "event_type": ["page_view", "click", "purchase"],
    "user_id": ["u-100", "u-200", "u-100"],
    "amount": [None, None, 49.99],
    "timestamp": [
        "2026-03-01T09:00:00",
        "2026-03-01T09:05:00",
        "2026-03-01T09:12:00",
    ]
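}).with_columns(
    # Parse the ISO 8601 strings so the column is written as a datetime, not text
    pl.col("timestamp").str.to_datetime()
)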

# Open a writer, send data, and commit
writer = st.data_writer(table_name="events")
writer.write(df)
writer.commit()

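# For large datasets, write in batches and commit once at the end
# (large_dataframe is a placeholder for a Polars DataFrame you have already loaded)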
writer = st.data_writer(table_name="events")
for chunk in large_dataframe.iter_slices(n_rows=100_000):
    writer.write(chunk)
writer.commit()
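
The write-then-commit pattern above can be wrapped in a small helper so that commit is called exactly once, and only if every batch was written. This is a sketch rather than SDK code: write_in_batches and RecordingWriter are hypothetical names, and the helper relies only on the write() and commit() methods shown above. Whether the SDK discards uncommitted writes after a failure is an assumption, not something this page states.

```python
def write_in_batches(writer, chunks):
    """Write every chunk, then commit once. Returns the number of chunks written.

    If writer.write() raises, the exception propagates and commit() is never
    called, so a partially written batch sequence is not committed by this helper.
    """
    written = 0
    for chunk in chunks:
        writer.write(chunk)
        written += 1
    if written:  # nothing to commit for an empty input
        writer.commit()
    return written


class RecordingWriter:
    """Stand-in for a DataWriter so the sketch runs without the SDK."""

    def __init__(self):
        self.chunks = []
        self.commits = 0

    def write(self, chunk):
        self.chunks.append(chunk)

    def commit(self):
        self.commits += 1


# Demo with plain lists; with Polars you would pass
# large_dataframe.iter_slices(n_rows=100_000) as the chunks argument.
rows = list(range(5))
batches = [rows[i:i + 2] for i in range(0, len(rows), 2)]
demo = RecordingWriter()
print(write_in_batches(demo, batches), "chunks written,", demo.commits, "commit")
```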

Query with SQL

The execute_query method returns a Polars DataFrame by default.

SQL Queries
# Basic query
result = st.execute_query("""
    SELECT event_type, COUNT(*) AS cnt
    FROM events
    GROUP BY event_type
    ORDER BY cnt DESC
""")
print(result)

# With parameters
result = st.execute_query(
    "SELECT * FROM events WHERE user_id = :uid",
    params={"uid": "u-100"}
)
print(result)
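
The :uid placeholder in the query above is a named parameter. To illustrate why binding a value is preferable to formatting it into the SQL string, the sketch below uses Python's built-in sqlite3 module, which accepts the same :name style. It is a stand-in only: it does not call supertable, and this page does not describe how execute_query processes params internally.

```python
import sqlite3

# In-memory stand-in table with the same columns as the events example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("page_view", "u-100"), ("click", "u-200"), ("purchase", "u-100")],
)


def events_for_user(connection, uid):
    """Return event types for one user, binding uid as a named parameter."""
    cursor = connection.execute(
        "SELECT event_type FROM events WHERE user_id = :uid ORDER BY rowid",
        {"uid": uid},
    )
    return [row[0] for row in cursor]


# A normal value matches the user's rows.
print(events_for_user(conn, "u-100"))

# A hostile value is compared as a literal string, so it matches nothing
# instead of changing the meaning of the query.
print(events_for_user(conn, "u-100' OR '1'='1"))
```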

Time Travel

Query a table as it existed at an earlier point in time. Every write creates a new version, and each version is kept for as long as your retention policy allows.

Time Travel
# Query as of a specific timestamp
snapshot = st.execute_query(
    "SELECT * FROM events",
    as_of="2026-03-01T09:05:00"
)
print(snapshot)

# Query as of a specific version number
snapshot_v2 = st.execute_query(
    "SELECT * FROM events",
    as_of_version=2
)
print(snapshot_v2)

# List available versions
meta = st.meta_reader(table_name="events")
versions = meta.list_versions()
for v in versions:
    print(f"v{v.version} — {v.timestamp} — {v.row_count} rows")
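
One way to think about the relationship between as_of and as_of_version is that a timestamp resolves to the latest version committed at or before it. The sketch below models that rule with the standard library only. It is a conceptual illustration, not the SDK's implementation: VersionInfo and resolve_as_of are hypothetical names, and the "at or before" rule is an assumption about how as_of behaves.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class VersionInfo:
    """Hypothetical stand-in for one entry of a version listing."""
    version: int
    timestamp: datetime


def resolve_as_of(versions, as_of):
    """Return the latest version committed at or before the ISO 8601 timestamp.

    Returns None when the timestamp is older than every version.
    """
    ordered = sorted(versions, key=lambda v: v.timestamp)
    stamps = [v.timestamp for v in ordered]
    # bisect_right counts the versions whose timestamp is <= the target.
    idx = bisect_right(stamps, datetime.fromisoformat(as_of))
    return ordered[idx - 1] if idx else None


history = [
    VersionInfo(1, datetime(2026, 3, 1, 9, 0)),
    VersionInfo(2, datetime(2026, 3, 1, 9, 5)),
    VersionInfo(3, datetime(2026, 3, 1, 9, 12)),
]

print(resolve_as_of(history, "2026-03-01T09:05:00"))
print(resolve_as_of(history, "2026-03-01T08:00:00"))
```

Under this model, the timestamp query and the version query in the example above would return the same snapshot if version 2 was committed at 09:05:00.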

Read Data (Streaming)

Use the DataReader for large table scans with column projection and filtering.

Data Reader
reader = st.data_reader(table_name="events")

# Read all data
df = reader.read()

# Read with column selection
df = reader.read(columns=["event_type", "timestamp"])

# Read with row limit
df = reader.read(limit=1000)

Error Handling

The SDK raises typed exceptions for common failure modes.

Error Handling
from supertable.exceptions import (
    TableNotFoundError,
    AuthenticationError,
    PermissionDeniedError,
    QuerySyntaxError,
)

try:
    result = st.execute_query("SELECT * FROM nonexistent")
except TableNotFoundError as e:
    print(f"Table not found: {e}")
except QuerySyntaxError as e:
    print(f"Invalid SQL: {e}")
except AuthenticationError:
    print("Check your token or API key")
except PermissionDeniedError as e:
    print(f"Permission denied: {e}")

Next Steps