Python SDK

The supertable Python package is the primary client for Data Island. It provides a high-level API for table management, writing data from Polars DataFrames, SQL queries, and time travel.

Installation

Install from PyPI. Choose the variant that matches your storage backend.

# Core package (local storage only)
pip install supertable

# With Amazon S3 support (quotes keep shells such as zsh from expanding the brackets)
pip install "supertable[s3]"

# With Azure Blob Storage support
pip install "supertable[azure]"

# With Google Cloud Storage support
pip install "supertable[gcs]"

# All storage backends
pip install "supertable[all]"

Requirements: Python 3.10+. The SDK depends on Polars, PyArrow, and httpx.
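
A quick preflight check can confirm the requirements above before the SDK is installed or imported. The sketch below uses only the standard library. The helper names `missing_modules` and `python_is_supported` are hypothetical, and it assumes the dependencies are importable as `polars`, `pyarrow`, and `httpx`.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 10)):
    """Return True when the running interpreter meets the minimum version."""
    return sys.version_info[:2] >= minimum


# Report the state of the current environment without importing the packages.
print("Python version OK:", python_is_supported())
print("Missing dependencies:", missing_modules(["polars", "pyarrow", "httpx"]))
```

An empty list for the missing dependencies means all three packages are discoverable on the import path; it does not verify their versions.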

Configuration

The SDK reads configuration from environment variables. Set these before using the client.

Variable	Required	Description
`SUPERTABLE_ORGANIZATION`	Yes	Your organization identifier
`SUPERTABLE_SUPERUSER_TOKEN`	Yes*	Authentication token (*not required if you authenticate with an API key instead)
`SUPERTABLE_API_URL`	No	API endpoint (default: `http://localhost:8050`)
`STORAGE_TYPE`	No	Storage backend: `local`, `s3`, `azure`, `gcs`
`STORAGE_PATH`	No	Root path for local storage

You can also pass configuration directly to the constructor:

from supertable import SuperTable

st = SuperTable(
    organization="my-org",
    token="st_tok_...",
    api_url="http://localhost:8050"
)
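
If you prefer to fail early when a variable is missing, the environment can be validated before the client is constructed. The following is a minimal sketch, not part of the SDK: `client_kwargs_from_env` is a hypothetical helper that maps the documented variables onto the `organization`, `token`, and `api_url` arguments shown above.

```python
import os

REQUIRED = ("SUPERTABLE_ORGANIZATION",)
DEFAULT_API_URL = "http://localhost:8050"  # default listed in the table above


def client_kwargs_from_env(env=None):
    """Build constructor keyword arguments from environment variables.

    Hypothetical helper: raises ValueError if a required variable is unset.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")

    kwargs = {
        "organization": env["SUPERTABLE_ORGANIZATION"],
        "api_url": env.get("SUPERTABLE_API_URL", DEFAULT_API_URL),
    }
    token = env.get("SUPERTABLE_SUPERUSER_TOKEN")
    if token:  # omitted when an API key is used instead of a token
        kwargs["token"] = token
    return kwargs
```

The result can then be unpacked into the constructor, for example `SuperTable(**client_kwargs_from_env())`.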

Core Classes

The SDK is built around four main classes.

`SuperTable`

Primary client. Manages connections, tables, and queries. Entry point for all operations.

`DataWriter`

Handles data ingestion with transactional writes. Accepts Polars DataFrames and commits atomically.

`DataReader`

Streams data from tables with optional filtering, column projection, and pagination.

`MetaReader`

Reads table metadata, schemas, statistics, and version history without fetching actual data.

Common Operations

Create a Table

st = SuperTable(organization="my-org")

# Create a table (the schema is inferred from the first write)
st.create_table(
    table_name="events",
    comment="Application event log"
)

# List existing tables
tables = st.list_tables()
for t in tables:
    print(f"{t.name} — {t.row_count} rows")

Write Data

The DataWriter accepts Polars DataFrames and commits them atomically.

import polars as pl

df = pl.DataFrame({
    "event_type": ["page_view", "click", "purchase"],
    "user_id": ["u-100", "u-200", "u-100"],
    "amount": [None, None, 49.99],
    "timestamp": [
        "2026-03-01T09:00:00",
        "2026-03-01T09:05:00",
        "2026-03-01T09:12:00",
    ]
})

# Open a writer, send data, and commit
writer = st.data_writer(table_name="events")
writer.write(df)
writer.commit()

# For large datasets, write in batches and commit once at the end.
# `large_dataframe` is assumed to be an existing Polars DataFrame.
writer = st.data_writer(table_name="events")
for chunk in large_dataframe.iter_slices(n_rows=100_000):
    writer.write(chunk)
writer.commit()
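
When a batch loop can fail partway through, it is useful to guarantee that `commit()` runs only if every `write()` succeeded. The sketch below shows one way to do that with a context manager. `commit_on_success` is a hypothetical helper, and `RecordingWriter` is a stand-in used only to make the example runnable; it relies on nothing beyond the `write` and `commit` methods shown above. What happens to uncommitted writes on the server is not covered here and is left as an assumption.

```python
from contextlib import contextmanager


@contextmanager
def commit_on_success(writer):
    """Yield `writer`, then commit only if the block raised no exception."""
    yield writer
    # Reached only when the with-block finished without raising.
    writer.commit()


class RecordingWriter:
    """Stand-in for a DataWriter so the sketch runs without a server."""

    def __init__(self):
        self.staged = []
        self.committed = False

    def write(self, batch):
        self.staged.append(batch)

    def commit(self):
        self.committed = True


writer = RecordingWriter()
with commit_on_success(writer) as w:
    for batch in (["row-1", "row-2"], ["row-3"]):
        w.write(batch)
```

With a real client, the same pattern would wrap the object returned by `st.data_writer(table_name="events")`.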

Query with SQL

The execute_query method returns a Polars DataFrame by default.

# Basic query
result = st.execute_query("""
    SELECT event_type, COUNT(*) AS cnt
    FROM events
    GROUP BY event_type
    ORDER BY cnt DESC
""")
print(result)

# With parameters
result = st.execute_query(
    "SELECT * FROM events WHERE user_id = :uid",
    params={"uid": "u-100"}
)
print(result)
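
Named parameters matter most when a value comes from user input. The sketch below uses Python's built-in `sqlite3` module as a stand-in, because it accepts the same `:name` placeholder style; it illustrates the general difference between binding a value and interpolating it into the SQL text. How `execute_query` binds parameters internally is not described here, so treat the parallel as an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("page_view", "u-100"), ("click", "u-200"), ("purchase", "u-100")],
)

# A hostile value is treated as plain data when it is bound by name.
hostile = "u-100' OR '1'='1"
safe_rows = conn.execute(
    "SELECT event_type FROM events WHERE user_id = :uid", {"uid": hostile}
).fetchall()

# Interpolating the same value into the SQL text changes the query's meaning.
unsafe_rows = conn.execute(
    f"SELECT event_type FROM events WHERE user_id = '{hostile}'"
).fetchall()

# A normal lookup with a bound parameter.
matching = conn.execute(
    "SELECT event_type FROM events WHERE user_id = :uid ORDER BY event_type",
    {"uid": "u-100"},
).fetchall()

print(len(safe_rows), len(unsafe_rows), matching)
```

The bound query matches no rows for the hostile value, while the interpolated query returns every row in the table.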

Time Travel

Query data as of any point in time that is still covered by your retention policy. Every write creates a new version, and versions are retained according to that policy.

# Query as of a specific timestamp
snapshot = st.execute_query(
    "SELECT * FROM events",
    as_of="2026-03-01T09:05:00"
)
print(snapshot)

# Query as of a specific version number
snapshot_v2 = st.execute_query(
    "SELECT * FROM events",
    as_of_version=2
)
print(snapshot_v2)

# List available versions
meta = st.meta_reader(table_name="events")
versions = meta.list_versions()
for v in versions:
    print(f"v{v.version} — {v.timestamp} — {v.row_count} rows")
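
To reason about which version a timestamp query will see, it helps to model the lookup explicitly. The sketch below assumes that `as_of` resolves to the newest version committed at or before the given timestamp; the source does not state whether the boundary is inclusive, so that rule is an assumption. `version_as_of` is a hypothetical helper, and `history` is made-up sample data.

```python
from bisect import bisect_right
from datetime import datetime


def version_as_of(versions, as_of):
    """Return the newest version number committed at or before `as_of`.

    `versions` is a list of (version_number, iso_timestamp) pairs sorted by
    timestamp. Returns None when `as_of` predates the first version.
    """
    stamps = [datetime.fromisoformat(ts) for _, ts in versions]
    index = bisect_right(stamps, datetime.fromisoformat(as_of))
    return versions[index - 1][0] if index else None


# Illustrative version history, one entry per committed write.
history = [
    (1, "2026-03-01T09:00:00"),
    (2, "2026-03-01T09:05:00"),
    (3, "2026-03-01T09:12:00"),
]

print(version_as_of(history, "2026-03-01T09:10:00"))
```

A binary search keeps the lookup logarithmic in the number of versions, which matters for tables with a long write history.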

Read Data (Streaming)

Use the DataReader for large table scans with column projection and filtering.

reader = st.data_reader(table_name="events")

# Read all data
df = reader.read()

# Read with column selection
df = reader.read(columns=["event_type", "timestamp"])

# Read with row limit
df = reader.read(limit=1000)

Error Handling

The SDK raises typed exceptions for common failure modes.

from supertable.exceptions import (
    TableNotFoundError,
    AuthenticationError,
    PermissionDeniedError,
    QuerySyntaxError,
)

try:
    result = st.execute_query("SELECT * FROM nonexistent")
except TableNotFoundError as e:
    print(f"Table not found: {e}")
except QuerySyntaxError as e:
    print(f"Invalid SQL: {e}")
except AuthenticationError:
    print("Check your token or API key")
except PermissionDeniedError as e:
    print(f"Permission denied: {e}")

Next Steps

REST API Reference
MCP Server
OData Integration