Faster health data analysis with MotherDuck & Preswald

Amrutha GujjarAmrutha Gujjar4 min read

Category: Use Case


In this post, we'll explore how to leverage MotherDuck and Preswald's interactive data apps to more easily and quickly analyze large public health datasets, specifically cholesterol measurements at a population scale.

In this post you’ll learn

<li>How MotherDuck extends DuckDB to handle multi-table queries in the cloud. </li><li>The importance of the read-scaling token for 4x faster data loading, especially when wrangling multiple tables. </li><li>How Preswald helps you build live, Python-based data apps that go beyond static dashboards. </li>

Challenges researchers face

Public health datasets come in all shapes and sizes, from CSV dumps to relational systems. Linking cholesterol levels to age groups, race/ethnicity, and comorbidities isn’t a single-step process. But existing solutions often require big clusters or fancy ETL pipelines just to run a few multi-join queries. And don’t even get us started on non-interactive dashboards or spreadsheets—they leave scientists clicking “refresh” and crossing their fingers.

Common Pain points

<li>Multiple, fragmented tables: e.g., demographics, lab results, comorbidities. </li><li>Slow ingest and overhead: “Scaling up” typically means big clusters or advanced ETL. </li><li>One-dimensional dashboards: Spreadsheets and static BI can’t handle evolving questions in real time. </li>

MotherDuck to the Rescue (with Read Scaling!)

MotherDuck is powered by the DuckDB engine you know and love, but supercharged in the cloud:

<li>Write standard SQL queries (no new query language to learn) </li><li>Lightning-fast aggregations. DuckDB’s columnar engine plus in-memory operations. </li><li>Automatically offload. If your dataset doesn’t fit on your laptop, MotherDuck picks up the slack. </li>

4x Throughput with Read-Scaling

For read-heavy tasks—like pulling data from multiple big tables—MotherDuck can parallelize your queries when you attach a special Read-Scaling Token. That means up to 4x faster ingestion and transformation. It’s as easy as adding a parameter to your connection string:

-- Example: Using read-scaling token in your connection string  
CONNECT 'md:health_data?read_scaling_token=YOUR_TOKEN_HERE';  

Read Scaling is particularly useful for handling highly concurrent read-heavy workloads, especially for data applications. By distributing read queries across multiple DuckDB instances, MotherDuck helps keep dashboards and apps responsive, even under heavy load. This really matters:

When to use read scaling:

<li>Joining multiple datasets (e.g., demographics + lab results). </li><li>Unioning or stacking big CSVs. </li><li>Grouping or filtering on billions of rows. </li><li>Complex queries that group or filter across columns with billions of entries. </li><li>Handling multiple people hammering the dashboard at once. </li>

Preswald: interactive data apps in Python

Preswald gives you a near-instant route to interactive data apps, without forcing you to wade through a sea of JavaScript frameworks or pricey BI licenses.

<li>Lightweight. Build dynamic dashboards with nothing but Python. </li><li>Charts refresh as soon as data changes. </li><li>No complicated front-end code or vendor lock-ins. </li><li>Anyone with the app link can start exploring data. </li>

Preswald is especially handy for public health folks who want to query large data one minute and spin up a live interactive chart the next. You don’t need to become a web developer to let your colleagues filter cholesterol ranges by age group or compare comorbidity severity across different ethnicities.

Bringing It All Together: A Quick Demo

<li>Install Dependencies </li><li>Connect to MotherDuck </li><li>Query the Cholesterol Table </li><li>Build a Preswald Dashboard (line chart, bar chart, scatter plot) </li><li>Run & View Your Interactive App </li>

Step 1: Install Dependencies

Make sure you have <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">duckdb</code>, <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">pandas</code>, <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">plotly</code>, and <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">preswald</code> installed in your Python environment.

<code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">pip install duckdb pandas plotly preswald</code>

Step 2: Connect to MotherDuck

You can connect to MotherDuck using your MotherDuck token. By default, <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">duckdb.connect("md:my_db")</code> will look for an environment variable called <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">MOTHERDUCK_TOKEN</code>. If you’d like read-scaling for faster queries, append <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">?read_scaling_token=YOUR_TOKEN_HERE</code> to the connection string.

<code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">import duckdb</code>

<code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm"># Example with environment variable:</code> <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm"># export MOTHERDUCK_TOKEN=<your_token_here></code> <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">con = duckdb.connect("md:my_db")</code>

<code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm"># OR with read scaling explicitly:</code> <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm"># con = duckdb.connect("md:my_db?read_scaling_token=<your_token_here>")</code>

Step 3: Query the Cholesterol Table

In this example, we’ll pull data from a table named <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">DQS_Cholesterol_in_adults_age_20</code>. Once connected, run a standard SQL query to bring your data into a Pandas DataFrame.

<code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm"># 1. Query your table</code> <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">df = con.execute("SELECT * FROM DQS_Cholesterol_in_adults_age_20").df()</code>

<code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm"># 2. Take a quick peek</code> <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">print(df.head())</code>

This shows you the first few rows, confirming you have the data you expect.

Step 4: Build a Preswald Dashboard

We’ll build three Plotly charts and present them with Preswald:

<li>A **line chart** showing cholesterol estimates over time </li><li>A **bar chart** comparing age-adjusted vs. crude estimates </li><li>A **scatter plot** to visualize estimates across different subgroups </li>

Here’s the <a target="_blank" rel="noopener noreferrer nofollow" class="text-blue-600 hover:text-blue-800 underline" href="https://github.com/StructuredLabs/preswald/tree/main/examples/health">full code</a> with comments explaining each part:

import pandas as pd  
import duckdb  
import plotly.express as px  
from preswald import text, plotly, view


# ----------------------------------------------------------------------------  
# STEP A: Connect to MotherDuck`  
# ----------------------------------------------------------------------------  
con = duckdb.connect("md:my_db")  
df = con.execute("SELECT * FROM DQS_Cholesterol_in_adults_age_20").df()

# ----------------------------------------------------------------------------  
# STEP B: Add descriptive text for Preswald  
# ----------------------------------------------------------------------------  
text("# Cholesterol Data Exploration")  
text("Below are several charts that help us visualize cholesterol estimates.")

# ---------------------------------------------------------------------------- 
# STEP C: Create a line chart of ESTIMATE over TIME_PERIOD  
# ----------------------------------------------------------------------------  
text("## Chart A: Trend of Cholesterol Estimates Over Time")

# Filter out rows that don’t have an actual ESTIMATE  
df_line = df.dropna(subset=["ESTIMATE"]).copy()

fig_a = px.line(  
df_line,  
x="TIME_PERIOD",  
y="ESTIMATE",  
color="ESTIMATE_TYPE", # e.g., "Percent of population, age adjusted" vs "crude"  
markers=True,  
title="Cholesterol Estimate by Time Period"  
)  
plotly(fig_a)

# ---------------------------------------------------------------------------- 
# STEP D: Create a grouped bar chart comparing ESTIMATE_TYPE  
# ----------------------------------------------------------------------------  
text("## Chart B: Comparison of Age Adjusted vs. Crude Estimates")

fig_b = px.bar(  
df_line,  
x="TIME_PERIOD",  
y="ESTIMATE",
color="ESTIMATE_TYPE",  
barmode="group",  
title="Age Adjusted vs. Crude Estimates"  
)  
plotly(fig_b)

# ---------------------------------------------------------------------------- 
# STEP E: Create a scatter plot of ESTIMATE vs. SUBGROUP  
# ----------------------------------------------------------------------------  
text("## Chart C: Scatter Plot of Estimate vs. Subgroup")

fig_c = px.scatter(  
df_line,  
x="SUBGROUP_ID",  
y="ESTIMATE",  
color="GROUP", # e.g. "Total" vs. "Race and Hispanic origin"  
size="ESTIMATE",  
hover_data=["TIME_PERIOD", "ESTIMATE_TYPE"],  
title="Cholesterol Estimate by Subgroup"  
)  
plotly(fig_c)

# ----------------------------------------------------------------------------  
# STEP F: Render the final output in Preswald  
# ----------------------------------------------------------------------------  
# We'll also show a table preview at the bottom.  
view(df)

# Close the DuckDB connection if you like  
con.close()

What’s Happening in Each Section

<li>**Connect to MotherDuck**: We use <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">duckdb.connect("md:my_db")</code> to establish a connection. </li><li>**Fetch Data**: A simple SQL query to pull all rows from the <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">DQS_Cholesterol_in_adults_age_20</code> table into a DataFrame. </li><li>**Preswald Text**: We insert headings and descriptions (<code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">text()</code>) so people viewing the dashboard know what they’re looking at. </li><li>**Line Chart**: Shows cholesterol estimates vs. time, separated by <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">ESTIMATE_TYPE</code>. </li><li>**Bar Chart**: Compares different <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">ESTIMATE_TYPE</code> categories within each time period (grouped bars). </li><li>**Scatter Plot**: Visualizes how <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">ESTIMATE</code> varies by <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">SUBGROUP_ID</code> (e.g., an age or demographic marker), coloring by <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">GROUP</code>. </li><li>**View**: Finally, we call <code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">view(df)</code> to render everything as an interactive web app. </li>

Step 5: Run & View Your Interactive App

With everything in place, run the script using Preswald:

<code class="rounded-md bg-gray-100 px-1.5 py-0.5 font-mono text-sm">preswald run my_script.py</code>

This launches a local server. Open the provided URL in your web browser, and you’ll see your line chart, bar chart, scatter plot, plus a data table preview. From here, you can:

<li>Filter or pivot your data (if you add user inputs) </li><li>Refresh the script for near-instant updates </li><li>Share the app link with colleagues for real-time collaboration </li>

Bottom Line

Preswald is the quick, straightforward way to turn your data queries into interactive dashboards for broader consumption. Coupled with MotherDuck, you get speed and scalability for large datasets plus an easy path to real-time exploration (without spinning up a separate BI tool or writing tons of custom front-end code).

Here’s the <a target="_blank" rel="noopener noreferrer nofollow" class="text-blue-600 hover:text-blue-800 underline" href="https://github.com/StructuredLabs/preswald/tree/main/examples/health">code</a> from the example.