Python SDK
The supertable Python package is the primary client for Data Island. It provides a high-level API for table management, data writing with Polars DataFrames, SQL queries, time travel, and more.
Installation
Install from PyPI. Choose the variant that matches your storage backend.
# Core package (local storage only)
pip install supertable
# With Amazon S3 support (quotes keep shells such as zsh from expanding the brackets)
pip install "supertable[s3]"
# With Azure Blob Storage support
pip install "supertable[azure]"
# With Google Cloud Storage support
pip install "supertable[gcs]"
# All storage backends
pip install "supertable[all]"
Requirements: Python 3.10+. The SDK depends on Polars, PyArrow, and httpx.
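The requirements above can be checked before the SDK is installed or imported. The following is a minimal sketch that uses only the standard library. The helper name check_requirements is hypothetical and not part of supertable, and the sketch assumes the three dependencies are importable as polars, pyarrow, and httpx.

```python
import sys
from importlib.util import find_spec

# Import names of the dependencies listed above (assumed to match the package names).
DEPENDENCIES = ("polars", "pyarrow", "httpx")
MIN_PYTHON = (3, 10)


def check_requirements(version_info=None, dependencies=DEPENDENCIES):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version_info = sys.version_info if version_info is None else version_info
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    for name in dependencies:
        # find_spec returns None when a top-level module cannot be located.
        if find_spec(name) is None:
            problems.append(f"missing dependency: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Running the script prints nothing when the interpreter and all three dependencies are in place.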
Configuration
The SDK reads configuration from environment variables. Set these before using the client.
| Variable | Required | Description |
|---|---|---|
| SUPERTABLE_ORGANIZATION | Yes | Your organization identifier |
| SUPERTABLE_SUPERUSER_TOKEN | Yes* | Authentication token (or use API key) |
| SUPERTABLE_API_URL | No | API endpoint (default: http://localhost:8050) |
| STORAGE_TYPE | No | Storage backend: local, s3, azure, gcs |
| STORAGE_PATH | No | Root path for local storage |
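To catch a missing or mistyped variable early, the environment can be validated before a client is created. This is a minimal sketch under stated assumptions: load_config is a hypothetical helper, not an SDK function, and it treats the token as optional because the table allows an API key instead.

```python
import os

# Values taken from the table above.
VALID_STORAGE_TYPES = {"local", "s3", "azure", "gcs"}
DEFAULT_API_URL = "http://localhost:8050"


def load_config(env=None):
    """Collect SDK settings from an environment mapping and validate them."""
    env = os.environ if env is None else env

    organization = env.get("SUPERTABLE_ORGANIZATION")
    if not organization:
        raise ValueError("SUPERTABLE_ORGANIZATION is required")

    storage_type = env.get("STORAGE_TYPE")
    if storage_type is not None and storage_type not in VALID_STORAGE_TYPES:
        raise ValueError(f"unsupported STORAGE_TYPE: {storage_type!r}")

    return {
        "organization": organization,
        # May be None when an API key is used instead of a token.
        "token": env.get("SUPERTABLE_SUPERUSER_TOKEN"),
        "api_url": env.get("SUPERTABLE_API_URL", DEFAULT_API_URL),
        "storage_type": storage_type,
        "storage_path": env.get("STORAGE_PATH"),
    }
```

The organization, token, and api_url keys are named after the constructor keyword arguments documented on this page, so the resulting dictionary can be inspected or logged (minus the token) before the client is built.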
You can also pass configuration directly to the constructor:
from supertable import SuperTable
st = SuperTable(
    organization="my-org",
    token="st_tok_...",
    api_url="http://localhost:8050",
)
Core Classes
The SDK is built around four main classes.
SuperTable
Primary client. Manages connections, tables, and queries. Entry point for all operations.
DataWriter
Handles data ingestion with transactional writes. Accepts Polars DataFrames and commits atomically.
DataReader
Streams data from tables with optional filtering, column projection, and pagination.
MetaReader
Reads table metadata, schemas, statistics, and version history without fetching actual data.
Common Operations
Create a Table
st = SuperTable(organization="my-org")
# Create a table (the schema is inferred from the first write)
st.create_table(
    table_name="events",
    comment="Application event log",
)
# List existing tables
tables = st.list_tables()
for t in tables:
    print(f"{t.name} - {t.row_count} rows")
Write Data
The DataWriter accepts Polars DataFrames and commits them atomically.
import polars as pl
df = pl.DataFrame({
    "event_type": ["page_view", "click", "purchase"],
    "user_id": ["u-100", "u-200", "u-100"],
    "amount": [None, None, 49.99],
    "timestamp": [
        "2026-03-01T09:00:00",
        "2026-03-01T09:05:00",
        "2026-03-01T09:12:00",
    ],
})
# Open a writer, send data, and commit
writer = st.data_writer(table_name="events")
writer.write(df)
writer.commit()
# For large datasets, write in batches
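# large_dataframe is any existing Polars DataFrame
writer = st.data_writer(table_name="events")
for chunk in large_dataframe.iter_slices(n_rows=100_000):
    writer.write(chunk)
# Commit once, after all chunks have been written
writer.commit()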
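When writing in batches, it can help to guarantee that commit() is skipped if any chunk fails. The following is a minimal sketch that relies only on the write and commit methods shown above. The commit_on_success helper is hypothetical, not part of the SDK, and FakeWriter is a stand-in so the example runs without a server. It assumes, as "commits atomically" suggests, that nothing becomes visible until commit() is called.

```python
from contextlib import contextmanager


@contextmanager
def commit_on_success(writer):
    """Yield the writer and call commit() only if the block raised no exception."""
    yield writer
    # Reached only when the with-body finished normally; an exception
    # propagates out of the yield above and skips the commit.
    writer.commit()


class FakeWriter:
    """Stand-in with the same write/commit surface, for illustration only."""

    def __init__(self):
        self.pending = []
        self.committed = []

    def write(self, chunk):
        self.pending.append(chunk)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()


def chunks(rows, size):
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


writer = FakeWriter()
with commit_on_success(writer) as w:
    for chunk in chunks(list(range(10)), size=4):
        w.write(chunk)
```

With the real SDK, the same wrapper would be applied to the object returned by st.data_writer(table_name="events"). Whether uncommitted writes are discarded automatically is not stated here, so treat any cleanup behavior as unverified.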
Query with SQL
The execute_query method returns a Polars DataFrame by default.
# Basic query
result = st.execute_query("""
    SELECT event_type, COUNT(*) AS cnt
    FROM events
    GROUP BY event_type
    ORDER BY cnt DESC
""")
print(result)
# With parameters
result = st.execute_query(
    "SELECT * FROM events WHERE user_id = :uid",
    params={"uid": "u-100"},
)
print(result)
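Binding values through params keeps user input out of the SQL text. The sketch below illustrates the same :name placeholder style with Python's built-in sqlite3 module rather than supertable, so it runs without a server. It is an analogy for why bound parameters are safer than string interpolation, not a description of how the SDK processes parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_type TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("page_view", "u-100"), ("click", "u-200"), ("purchase", "u-100")],
)

# A hostile value that would change the query if pasted into the SQL string.
uid = "u-100' OR '1'='1"

# Unsafe: string interpolation lets the value rewrite the WHERE clause.
unsafe_rows = conn.execute(
    f"SELECT * FROM events WHERE user_id = '{uid}'"
).fetchall()

# Safe: the named placeholder binds the value as data, never as SQL.
safe_rows = conn.execute(
    "SELECT * FROM events WHERE user_id = :uid", {"uid": uid}
).fetchall()

print(len(unsafe_rows))  # every row matches because of the injected OR clause
print(len(safe_rows))    # no user has this literal id
```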
Time Travel
Query data as of any point in time. Every write creates a new version that is retained according to your retention policy.
# Query as of a specific timestamp
snapshot = st.execute_query(
    "SELECT * FROM events",
    as_of="2026-03-01T09:05:00",
)
print(snapshot)
# Query as of a specific version number
snapshot_v2 = st.execute_query(
    "SELECT * FROM events",
    as_of_version=2,
)
print(snapshot_v2)
# List available versions
meta = st.meta_reader(table_name="events")
versions = meta.list_versions()
for v in versions:
    print(f"v{v.version} - {v.timestamp} - {v.row_count} rows")
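Conceptually, an as_of timestamp resolves to the newest version committed at or before that time. The following sketch models that lookup over a plain list of (version, timestamp) pairs. It is a mental model under that assumption, including the inclusive boundary, and not the SDK's implementation. The name resolve_as_of and the sample history are invented for illustration.

```python
from bisect import bisect_right
from datetime import datetime


def resolve_as_of(versions, as_of):
    """Return the newest version number whose timestamp is <= as_of.

    `versions` is a list of (version_number, iso_timestamp) pairs sorted by
    timestamp. Returns None when as_of predates the first version.
    """
    target = datetime.fromisoformat(as_of)
    stamps = [datetime.fromisoformat(ts) for _, ts in versions]
    # bisect_right counts the timestamps that are <= target.
    index = bisect_right(stamps, target)
    return versions[index - 1][0] if index else None


# Hypothetical version history: one version per write.
history = [
    (1, "2026-03-01T09:00:00"),
    (2, "2026-03-01T09:05:00"),
    (3, "2026-03-01T09:12:00"),
]

print(resolve_as_of(history, "2026-03-01T09:10:00"))
```

Under this model, a timestamp that falls between two writes resolves to the earlier one, and a timestamp before the first write has no snapshot at all.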
Read Data (Streaming)
Use the DataReader for large table scans with column projection and filtering.
reader = st.data_reader(table_name="events")
# Read all data
df = reader.read()
# Read with column selection
df = reader.read(columns=["event_type", "timestamp"])
# Read with row limit
df = reader.read(limit=1000)
Error Handling
The SDK raises typed exceptions for common failure modes.
from supertable.exceptions import (
    TableNotFoundError,
    AuthenticationError,
    PermissionDeniedError,
    QuerySyntaxError,
)
try:
    result = st.execute_query("SELECT * FROM nonexistent")
except TableNotFoundError as e:
    print(f"Table not found: {e}")
except QuerySyntaxError as e:
    print(f"Invalid SQL: {e}")
except AuthenticationError:
    print("Check your token or API key")