Embed-build CDN loading for PGlite/PostGIS¶
Date: 2026-06-12
Problem¶
The geolibre Python wheel grew from ~24 MB (v1.1.0) to ~47 MB (v1.1.1). The
entire increase comes from the in-browser PostGIS SQL engine added in PR #234
(feat(sql): add in-browser PostGIS SQL engine via PGlite). The wheel bundles
the built web app (geolibre/static/app), which now includes the lazily-loaded
PGlite assets:
| File | Compressed |
|---|---|
postgis.tar.gz (PostGIS extension bundle) |
19.58 MB |
pglite.wasm (PostgreSQL compiled to WASM) |
3.40 MB |
pglite.data |
1.85 MB |
initdb.wasm |
0.14 MB |
pglite.js |
0.14 MB |
| PGlite subtotal | 25.1 MB |
PGlite is already lazy-loaded at runtime (the SqlWorkspaceDialog is lazy(),
and pglite-workspace.ts uses dynamic import()), so on the web/desktop it only
downloads when a user opens the SQL workspace. The problem is purely that the
wheel embeds the whole dist-embed/ tree, so those lazy chunks plus the 19.6 MB
postgis.tar.gz ship inside the wheel regardless of whether the feature is used.
Goals¶
- Shrink the Jupyter wheel back to ~24 MB.
- Keep the in-browser PostGIS SQL workspace working in Jupyter, loading PGlite and the PostGIS extension from a CDN (jsDelivr) on first use.
- Leave the regular web build and the Tauri desktop build unchanged (they keep bundling PGlite so they work fully offline / vendored).
Non-goals¶
- No change to web or desktop bundling behavior.
- No offline support for the PostGIS SQL feature inside the Jupyter wheel. It is an optional feature; needing internet on first use is acceptable. The app already loads basemap tiles from the network.
- No change to the SQL-building logic or the dialog UI.
Constraints verified¶
- The embedded app is served by the Jupyter Server extension
(
python/src/geolibre/_extension.py) and rendered in a non-sandboxed iframe viaiframe.src(python/src/geolibre/_frontend.js). - The server sets only
X-Content-Type-Options: nosniff(for WASM MIME). There is no Content-Security-Policy anywhere in the embed/web path, so a cross-origin script/WASM/data fetch from jsDelivr is not blocked. @electric-sql/pglite(0.5.2) and@electric-sql/pglite-postgis(0.2.2) are pure ESM and support CDN loading: they resolve their own.wasm,.data, andpostgis.tarrelative to their module URL, which becomes the jsDelivr path.
Approach¶
A build-time flag, set only by the embed build, swaps the entire PGlite loader
module for a CDN variant at Rollup's resolveId stage. Because the bundled
loader is no longer in the module graph, Vite never emits the pglite chunk,
.wasm, .data, or postgis.tar.
Note: an earlier draft of this design used a single loader module with an
if (__PGLITE_CDN__) { ... } else { ... }branch, relying on Rollup to tree-shake the dead bundled branch. That approach was not shipped: a bundler emits a chunk for everyimport()it parses regardless of dead-code reachability, so the dead branch would still vendor the pglite chunk into the wheel. Sections 2 and 3 below describe the module-swap design that was actually built.
1. Flag plumbing¶
scripts/build-embed.mjs adds GEOLIBRE_PGLITE_CDN: "1" to the build env,
alongside the existing GEOLIBRE_APP_BASE=./.
2. Vite config (apps/geolibre-desktop/vite.config.ts)¶
Read the env var. Using createRequire, read the installed versions of
@electric-sql/pglite and @electric-sql/pglite-postgis from their
package.json so the CDN URLs cannot drift from the lockfile. Inject two
define constants:
__PGLITE_CDN_URL__:"https://cdn.jsdelivr.net/npm/@electric-sql/pglite@<ver>/dist/index.js"(ornullwhen not in embed mode)__PGLITE_POSTGIS_CDN_URL__: the same forpglite-postgis(ornull)
Also register pgliteCdnLoaderPlugin() (only when GEOLIBRE_PGLITE_CDN=1): a
Vite/Rollup plugin whose resolveId redirects any import of ./pglite-loader
(but never pglite-loader.cdn) to src/lib/pglite-loader.cdn.ts.
3. Loader modules (apps/geolibre-desktop/src/lib/pglite-loader*.ts)¶
pglite-workspace.ts is left unchanged except for importing
loadPgliteModules() from ./pglite-loader. There is no if branch in the
workspace code.
pglite-loader.ts(default): dynamically imports the bundled@electric-sql/pgliteand@electric-sql/pglite-postgis.pglite-loader.cdn.ts(embed only): dynamically imports the injected__PGLITE_CDN_URL__/__PGLITE_POSTGIS_CDN_URL__jsDelivr URLs with/* @vite-ignore */so Vite does not try to resolve/bundle them.
The plugin swaps pglite-loader → pglite-loader.cdn only in embed mode, so the
bundled imports (and their assets) vanish from the embed build while staying
intact everywhere else. A TypeScript ambient declaration (in vite-env.d.ts)
declares the two __PGLITE_*_URL__ globals so the source typechecks.
The CDN error path reuses the existing getState() behavior: on failure it
resets statePromise so the load is retryable, and the dialog surfaces the
error. The thrown message mentions that the PostGIS engine is fetched from a CDN
and needs network access, so a failure is diagnosable.
4. Build guard¶
scripts/build-embed.mjs already guards against absolute asset paths in
index.html. Add a second guard: after the build, fail if any postgis.tar*
file exists under dist-embed/assets/, so the wheel cannot silently regrow if
the dead-code-elimination ever stops working.
Verification¶
npm run build:embed, then assertdist-embed/assets/contains nopostgis.tar*,pglite*.wasm,pglite*.data,initdb*.wasm, or pglite JS chunk.- Confirm the staged
python/src/geolibre/static/apptree shrinks by ~25 MB and a freshly built wheel is back to ~24 MB. - Run the regular
npm run buildand confirm it still bundles the PGlite assets (web/desktop unchanged). npm run test:frontend(thepglite-sql.tsunit tests are unaffected, but confirm nothing regressed) andnpm run typecheck.- Manual: build the wheel, open the embedded app in Jupyter, open the SQL workspace, pick the PostGIS engine, and confirm it loads from jsDelivr and runs a query.
Risks¶
- Tree-shaking the bundled import in embed mode is the load-bearing assumption. Verified empirically in step 1 of Verification before declaring done; the build guard (step 4 of Approach) is the backstop.
- CDN availability at runtime. The Jupyter PostGIS SQL feature now needs internet on first use. jsDelivr CORS is permissive and no CSP blocks it. Accepted per the chosen direction.
- Version drift. CDN URLs are pinned from the installed package versions at build time, so they cannot diverge from the lockfile.
- No Subresource Integrity / tamper protection. The URLs are version-pinned
but unhashed. Dynamic
import()has nointegrityoption, so a compromised jsDelivr or a tampered package version that reached the registry would be served from the pinned URL with no integrity signal. A fetch-then-SHA-256-then-URL.createObjectURLstep could add verification, but at the cost of complexity for an optional, opt-in feature in a non-sandboxed iframe with no CSP. Accepted risk for CDN loading in the embed build; revisit if the embed path ever handles untrusted data or gains a CSP.