Reading CSV Files from S3 in Preswald

Amrutha Gujjar

Category: Product


One of the most common requests from the Preswald community: "Can I read a CSV directly from S3, without downloading it first?"

As of this week, the answer is yes.

Preswald now supports CSV data sources hosted in S3-compatible object storage — including AWS S3, MinIO, and even local emulators — with no extra code required.



How It Works

You define your S3 CSV source in preswald.toml, like so:

[data.my_s3_csv]
type = "s3csv"
s3_endpoint = "https://s3.amazonaws.com"
s3_access_key_id = "YOUR_KEY"
s3_secret_access_key = "YOUR_SECRET"
path = "my-bucket/myfile.csv"

Once defined, Preswald will:

  • Connect to your S3 bucket using the provided credentials

  • Stream the file (no full download)

  • Load it directly into DuckDB

  • Make it available via get_df() and query()

From there, it's just Python:

from preswald import get_df, table

df = get_df("my_s3_csv")
table(df)

Use Cases

  • 🧪 Private Data Lakes: Query data from cloud storage without pipelines

  • 📊 Live Dashboards: Build tools on top of CSVs that update regularly

  • 🧩 Multi-Source Apps: Combine S3, API, and Postgres data in one place


Authentication + Compatibility

Preswald supports:

  • AWS S3

  • MinIO

  • Self-hosted S3 emulators

  • Temporary session tokens and signed URLs (in progress)

All credentials are handled securely. As needed, you can move secrets such as s3_access_key_id and s3_secret_access_key out of preswald.toml and into secrets.toml.


Example App Snippet

# preswald.toml
[data.research_results]
type = "s3csv"
s3_endpoint = "https://s3.us-west-2.amazonaws.com"
s3_access_key_id = "..."
s3_secret_access_key = "..."
path = "lab-data/experiment_results.csv"

# app.py
from preswald import get_df, text, table

df = get_df("research_results")
text("## Latest Lab Results")
table(df)

Now your app streams fresh CSV data on launch, with no manual syncing or local copies needed.


Contribution Credit

This feature was contributed by Varnit Singh, and it's a big step forward for integrating Preswald into cloud-native environments.

Get started today on GitHub