Live at licitapr.com
LicitaPR Inteligencia
Six government systems that don't talk to each other, unified into one forward-looking procurement-intelligence product.
Vendor-facing intelligence built on Puerto Rico public data. It answers the questions a bidder actually has: What can I bid on? What does this work pay? Which contracts expire next — the re-bid — and who holds them?
Problem
Public procurement data in Puerto Rico is real but fragmented across six separate government systems that don’t interoperate. The data that exists is backward-looking — watchdog and transparency tools tell you what already happened. A vendor deciding where to spend bid-prep effort needs the opposite: what’s coming up for re-bid, what the work pays, and who they’d be bidding against.
The expiring-contract radar is the wedge. It is forward-looking re-bid intelligence — the kind of signal backward-looking tools structurally can’t produce — and it is what makes the product worth paying for.
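As a rough illustration of what "forward-looking" means here, the radar can be thought of as a date-window query over contract end dates. This is a minimal sketch, not the product's query: the `contracts` table, its columns, the 180-day horizon, and the sample rows are all assumptions made up for the example.

```python
import sqlite3
from datetime import date, timedelta


def expiring_contracts(conn, today, horizon_days=180):
    """Return contracts whose end date falls within the next `horizon_days`."""
    cutoff = today + timedelta(days=horizon_days)
    # ISO-8601 date strings sort lexicographically, so text comparison is safe.
    return conn.execute(
        """
        SELECT contract_id, agency, vendor, amount, end_date
        FROM contracts
        WHERE end_date >= ? AND end_date <= ?
        ORDER BY end_date ASC
        """,
        (today.isoformat(), cutoff.isoformat()),
    ).fetchall()


# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts ("
    "contract_id TEXT PRIMARY KEY, agency TEXT, vendor TEXT, "
    "amount REAL, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?, ?)",
    [
        ("C-1", "Agency A", "Vendor X", 120000.0, "2024-03-15"),  # inside window
        ("C-2", "Agency B", "Vendor Y", 45000.0, "2025-01-10"),   # too far out
        ("C-3", "Agency A", "Vendor Z", 80000.0, "2023-12-01"),   # already expired
    ],
)

radar = expiring_contracts(conn, today=date(2024, 1, 1), horizon_days=180)
```

Each row already carries the current holder and the contract amount, which is why the same query shape can answer both "what is coming up" and "who holds it".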
Build
The product
A multi-category, searchable site spanning six procurement categories. Per category, vendors get a clear free-vs-paid line:
- Open RFPs — free
- Award pricing — paid
- ★ Expiring-contract radar — paid (the wedge)
- Competitor leaderboard — paid
On top of that sit firm and agency drill-down dossiers and server-rendered charts. A server-side paywall enforces the free/paid boundary with no view-source leak. The visual identity is deliberately institutional — Public Sans plus a Newsreader serif, tabular figures, semantic color — an intelligence product, not a dark-terminal cliché.
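The "no view-source leak" property comes from deciding what to send before anything is rendered, rather than hiding paid values in the browser. A minimal, framework-agnostic sketch of that idea follows; the field names, `PAID_FIELDS`, and both helper functions are hypothetical, and in a Flask app a view function would call something like `build_page_payload` before passing data to the template.

```python
# Hypothetical set of fields that only paying users may receive.
PAID_FIELDS = {"award_amount", "expires_on", "incumbent"}


def visible_row(row, is_subscriber):
    """Return a copy of `row` containing only what this user may see."""
    if is_subscriber:
        return dict(row)
    # Paid values are dropped on the server, so they never reach the HTML.
    return {key: value for key, value in row.items() if key not in PAID_FIELDS}


def build_page_payload(rows, is_subscriber):
    """Build the data a template would render for one category page."""
    return {
        "rows": [visible_row(row, is_subscriber) for row in rows],
        "locked": not is_subscriber,  # lets the template show an upgrade prompt
    }


# Made-up record, for illustration only.
rows = [
    {
        "title": "Road repair RFP",
        "agency": "Agency A",
        "award_amount": 250000,
        "expires_on": "2025-06-30",
        "incumbent": "Vendor X",
    }
]

free_view = build_page_payload(rows, is_subscriber=False)
paid_view = build_page_payload(rows, is_subscriber=True)
```

Because the free payload never contains the paid keys, there is nothing for a client-side script or a curious reader of the page source to uncover.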
Data engineering (acquisition at scale)
Six Puerto Rico government systems scraped, structured, cross-linked, and quality-monitored:
- ~1.24M contracts ingested — roughly 90% of the known universe.
- 50K+ legislative measures collected so far, working toward the full record since 1985, with GPU-OCR’d bill text.
- 440K+ campaign-finance donations across 125K+ donors. The finding that 98.8% of donation dollars come from individuals drove the schema design.
- ~1,700 RFPs and a corporate-registry crawler feeding entity resolution.
Acquisition is resilient against difficult public sources, with resumable, idempotent crawlers and a supervisor that auto-restarts jobs.
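One common way to get "resumable and idempotent" is to keep the work queue and the results in the same database and mark a URL done in the same transaction that stores its page. The sketch below assumes that design; the table layout, function names, and retry count are illustrative and not taken from the actual crawlers, and `fetch` stands in for whatever HTTP client is used.

```python
import sqlite3


def init_store(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        "url TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)"
    )


def enqueue(conn, urls):
    # INSERT OR IGNORE makes re-enqueueing the same URL a no-op (idempotent).
    conn.executemany("INSERT OR IGNORE INTO queue (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()


def crawl(conn, fetch, max_retries=3):
    """Fetch every URL not yet marked done; safe to call again after a crash."""
    pending = [r[0] for r in conn.execute("SELECT url FROM queue WHERE done = 0 ORDER BY url")]
    for url in pending:
        for _attempt in range(max_retries):
            try:
                body = fetch(url)
            except Exception:
                continue  # retry; after max_retries the URL simply stays pending
            with conn:  # store the page and mark it done in one transaction
                conn.execute("INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)", (url, body))
                conn.execute("UPDATE queue SET done = 1 WHERE url = ?", (url,))
            break


# Simulated run: the source fails for "b" on the first pass.
conn = sqlite3.connect(":memory:")
init_store(conn)
enqueue(conn, ["a", "b", "c"])

calls = []

def flaky_fetch(url):
    calls.append(url)
    if url == "b":
        raise IOError("source unavailable")
    return f"<html>{url}</html>"

def healthy_fetch(url):
    calls.append(url)
    return f"<html>{url}</html>"

crawl(conn, flaky_fetch)    # "a" and "c" are stored; "b" stays pending
calls.clear()
crawl(conn, healthy_fetch)  # a restart only re-fetches what is still pending
page_count = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
```

Under this design a supervisor does not need to know where a job stopped: restarting it amounts to calling `crawl` again.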
Entity resolution
A corporate-registry crawler feeds Splink + deterministic entity resolution, collapsing 422K vendor records to ~345K resolved entities with confidence scores attached. The framing is strict and honest: the system maps public records and flags patterns for review — it never accuses. A tempting fraud thesis was tested adversarially and falsified 0/15, so it was killed rather than shipped.
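The deterministic half of that pipeline can be pictured as grouping records that share a normalized key. The sketch below shows only that idea in plain Python; it is not the project's rule set and does not use Splink, which would handle the probabilistic matching and confidence scoring. The suffix list, function names, and company names are all made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of legal-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "llc", "ltd", "co"}


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and drop legal suffixes."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    tokens = [tok for tok in text.split() if tok not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def resolve(records):
    """Group record ids by normalized name (exact-key deterministic match)."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize_name(rec["name"])].append(rec["id"])
    return dict(groups)


# Invented vendor records, for illustration only.
records = [
    {"id": 1, "name": "Constructora Borinquen, Inc."},
    {"id": 2, "name": "CONSTRUCTORA BORINQUÉN INC"},
    {"id": 3, "name": "Servicios Médicos del Oeste LLC"},
]

clusters = resolve(records)
```

Exact-key rules like this are cheap and easy to explain, but they miss misspellings and abbreviations, which is the gap a probabilistic linker is meant to cover.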
Engineering quality and honest data
- A standalone 51-check data-quality monitor and integrity-checked nightly snapshots.
- A 62/62 QA harness on the product surface.
- Honest-data discipline: a completeness audit caught ~80K silently-missing measures, which were re-crawled rather than quietly served.
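A completeness audit of this kind can be as simple as comparing what is stored against what should exist. The sketch below assumes records carry sequential numeric identifiers within a known range, which is an assumption for illustration; the source does not say how the real audit determines the expected set.

```python
def missing_ids(stored_ids, first, last):
    """Return every id in [first, last] that is absent from `stored_ids`."""
    present = set(stored_ids)
    return [i for i in range(first, last + 1) if i not in present]


def completeness_report(stored_ids, first, last):
    """Summarize coverage and list the ids that need to be re-crawled."""
    expected = last - first + 1
    missing = missing_ids(stored_ids, first, last)
    return {
        "expected": expected,
        "missing": len(missing),
        "coverage": 1 - len(missing) / expected,
        "recrawl": missing,
    }


# Toy example: ids 3, 6 and 7 were never ingested.
report = completeness_report([1, 2, 4, 5, 8], first=1, last=8)
```

The `recrawl` list is the useful output: it turns a silent gap into an explicit work queue instead of a quietly incomplete dataset.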
Architecture and ops
A clean security boundary keeps the internal warehouse separate from anything public:
- Internal 1.29 GB warehouse → nightly ETL → a small, indexed product.sqlite.
- A read-only public Flask app serves only the slimmed product database, behind a cloudflared tunnel.
- Result: instant indexed lookups for users, with the working data never exposed.
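The boundary described above can be sketched with the standard-library `sqlite3` module: an ETL step copies only public-safe columns into a fresh file and indexes it, and the serving side opens that file in read-only mode. Table names, column names, and both helper functions are assumptions for illustration, not the project's actual ETL.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_product_db(warehouse_path, product_path):
    """Copy a public-safe subset of the warehouse into a small indexed file."""
    tmp_path = product_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    out = sqlite3.connect(tmp_path)
    out.execute("ATTACH DATABASE ? AS wh", (warehouse_path,))
    # Only explicitly listed columns cross the boundary.
    out.execute(
        "CREATE TABLE contracts AS "
        "SELECT contract_id, agency, vendor, amount, end_date FROM wh.contracts"
    )
    out.execute("CREATE INDEX idx_contracts_end_date ON contracts (end_date)")
    out.commit()
    out.execute("DETACH DATABASE wh")
    out.close()
    os.replace(tmp_path, product_path)  # swap the new snapshot in atomically


def open_read_only(product_path):
    """Open the product database so that any write attempt fails."""
    uri = Path(product_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Toy warehouse with one internal-only column.
workdir = tempfile.mkdtemp()
warehouse = os.path.join(workdir, "warehouse.sqlite")
product = os.path.join(workdir, "product.sqlite")

wh = sqlite3.connect(warehouse)
wh.execute(
    "CREATE TABLE contracts (contract_id TEXT, agency TEXT, vendor TEXT, "
    "amount REAL, end_date TEXT, internal_notes TEXT)"
)
wh.execute(
    "INSERT INTO contracts VALUES "
    "('C-1', 'Agency A', 'Vendor X', 1000.0, '2025-06-30', 'raw scrape notes')"
)
wh.commit()
wh.close()

build_product_db(warehouse, product)
public = open_read_only(product)
columns = [row[1] for row in public.execute("PRAGMA table_info(contracts)")]

try:
    public.execute("DELETE FROM contracts")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
```

Listing columns explicitly, rather than copying whole tables, means a new internal column in the warehouse stays internal unless someone deliberately adds it to the export.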
Numbers
- ~1.24M government contracts ingested (~90% of the universe)
- 422K → ~345K vendors resolved to entities (Splink + deterministic)
- 440K+ campaign-finance donations across 125K+ donors
- 50K+ legislative measures collected so far, working toward the full record since 1985
- 6-category live procurement-intelligence product
- 62/62 QA harness · 51-check data-quality monitor
- ~80K silently-missing measures caught by a completeness audit and re-crawled
Stack
Python · Flask · SQLite (WAL) · Splink · Patchright · httpx/selectolax · EasyOCR (GPU) · cloudflared
Have data? Let’s make it think.
Open to senior / lead data & AI roles, and to Vizlogic consulting engagements.