One of the most common requests from the Preswald community: "Can I read a CSV directly from S3, without downloading it first?"
As of this week, the answer is yes.
Preswald now supports CSV data sources hosted in S3-compatible object storage — including AWS S3, MinIO, and even local emulators — with no extra code required.

How It Works
You define your S3 CSV source in `preswald.toml`, like so:
```toml
[data.my_s3_csv]
type = "s3csv"
s3_endpoint = "https://s3.amazonaws.com"
s3_access_key_id = "YOUR_KEY"
s3_secret_access_key = "YOUR_SECRET"
path = "my-bucket/myfile.csv"
```
Once defined, Preswald will:
- Connect to your S3 bucket using the provided credentials
- Stream the file (no full download)
- Load it directly into DuckDB
- Make it available via `get_df()` and `query()`
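The "stream, don't download" step can be sketched in plain Python. This is a simplified illustration, not Preswald's actual implementation: the `StringIO` object stands in for an S3 response body, and `rows_from_stream` is a hypothetical helper, not a Preswald API.

```python
import csv
import io

def rows_from_stream(line_iter):
    """Parse CSV rows incrementally from an iterator of text lines,
    so the whole file never sits in memory at once."""
    reader = csv.reader(line_iter)
    header = next(reader)
    for row in reader:
        yield dict(zip(header, row))

# Simulated S3 response body; in practice this would be the object's
# streaming HTTP body, consumed line by line.
fake_body = io.StringIO("id,value\n1,apple\n2,banana\n")
rows = list(rows_from_stream(fake_body))
print(rows[0]["value"])  # apple
```

Because rows are yielded one at a time, the same pattern scales to files far larger than available memory.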
From there, it's just Python:
```python
from preswald import get_df, table

df = get_df("my_s3_csv")
table(df)
```
Use Cases
- 🧪 Private Data Lakes: Query data from cloud storage without pipelines
- 📊 Live Dashboards: Build tools on top of CSVs that update regularly
- 🧩 Multi-Source Apps: Combine S3, API, and Postgres data in one place
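For the multi-source case, "combining" usually comes down to joining the rows returned by different `get_df()` calls. The join itself can be sketched in plain Python; the row data and column names below are hypothetical, purely for illustration.

```python
# Hypothetical rows as two sources might return them (column names
# are illustrative, not actual Preswald output):
s3_rows = [{"id": 1, "metric": 0.92}, {"id": 2, "metric": 0.81}]
pg_rows = [{"id": 1, "name": "run-a"}, {"id": 2, "name": "run-b"}]

# Index one side by the join key, then enrich the other side.
by_id = {row["id"]: row for row in pg_rows}
combined = [{**row, **by_id.get(row["id"], {})} for row in s3_rows]
print(combined[0])  # {'id': 1, 'metric': 0.92, 'name': 'run-a'}
```

In a real app you would more likely express this as a SQL join via DuckDB, since every source already lands in the same database.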
Authentication + Compatibility
Preswald supports:
- AWS S3
- MinIO
- Self-hosted S3 emulators
- Temporary session tokens and signed URLs (in progress)
All credentials are handled securely, and you can split secrets into `secrets.toml` as needed.
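One plausible split keeps the connection details in `preswald.toml` and moves only the keys to `secrets.toml`. The field names mirror the example above; check the Preswald docs for the exact lookup rules.

```toml
# preswald.toml — safe to commit
[data.my_s3_csv]
type = "s3csv"
s3_endpoint = "https://s3.amazonaws.com"
path = "my-bucket/myfile.csv"

# secrets.toml — keep out of version control
[data.my_s3_csv]
s3_access_key_id = "YOUR_KEY"
s3_secret_access_key = "YOUR_SECRET"
```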
Example App Snippet
```toml
# preswald.toml
[data.research_results]
type = "s3csv"
s3_endpoint = "https://s3.us-west-2.amazonaws.com"
s3_access_key_id = "..."
s3_secret_access_key = "..."
path = "lab-data/experiment_results.csv"
```
```python
# app.py
from preswald import get_df, text, table

df = get_df("research_results")
text("## Latest Lab Results")
table(df)
```
Now your app streams fresh CSV data on every launch: no manual syncing, no local copies.
Contribution Credit
This feature was contributed by Varnit Singh, and it's a big step forward for integrating Preswald into cloud-native environments.