Data Engineering · ML/AI · Product · Ops

Live at licitapr.com

LicitaPR Inteligencia

Six government systems that don't talk to each other, unified into one forward-looking procurement-intelligence product.

~1.24M government contracts ingested (~90% of universe)
422K → ~345K vendors resolved to entities (Splink + deterministic)
6 procurement categories in the live product
62/62 QA harness passing

Vendor-facing intelligence built on Puerto Rico public data. It answers the questions a bidder actually has: What can I bid on? What does this work pay? Which contracts expire next — the re-bid — and who holds them?

Problem

Public procurement data in Puerto Rico is real but fragmented across six separate government systems that don’t interoperate. The data that exists is backward-looking — watchdog and transparency tools tell you what already happened. A vendor deciding where to spend bid-prep effort needs the opposite: what’s coming up for re-bid, what the work pays, and who they’d be bidding against.

The expiring-contract radar is the wedge. It is forward-looking re-bid intelligence — the kind of signal backward-looking tools structurally can’t produce — and it is what makes the product worth paying for.

Build

The product

A multi-category, searchable site spanning six procurement categories. Per category, vendors get a clear free-vs-paid line:

  • Open RFPs — free
  • Award pricing — paid
  • ★ Expiring-contract radar — paid (the wedge)
  • Competitor leaderboard — paid
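As a minimal sketch of what the expiring-contract radar computes, the core can be read as a date-window query. The `contracts` table, its column names, and the 90-day window below are assumptions for illustration; the real schema and window are not described here.

```python
import sqlite3
from datetime import date, timedelta


def expiring_contracts(conn, today, window_days=90):
    """Return contracts whose end date falls within the next `window_days`."""
    horizon = today + timedelta(days=window_days)
    # ISO-8601 date strings sort lexicographically, so BETWEEN works on TEXT.
    return conn.execute(
        "SELECT vendor, agency, amount, end_date FROM contracts "
        "WHERE end_date BETWEEN ? AND ? ORDER BY end_date",
        (today.isoformat(), horizon.isoformat()),
    ).fetchall()


# Hypothetical in-memory rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contracts (vendor TEXT, agency TEXT, amount REAL, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?, ?, ?)",
    [
        ("Vendor A", "Agency X", 120000.0, "2025-02-15"),  # inside the window
        ("Vendor B", "Agency Y", 45000.0, "2025-09-01"),   # too far out
        ("Vendor C", "Agency X", 80000.0, "2024-12-31"),   # already expired
    ],
)
radar = expiring_contracts(conn, date(2025, 1, 1), window_days=90)
```

Each returned row pairs an upcoming re-bid with its current holder, which is the question the radar is meant to answer.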

On top of that sit firm and agency drill-down dossiers and server-rendered charts. A server-side paywall enforces the free/paid boundary with no view-source leak. The visual identity is deliberately institutional — Public Sans plus a Newsreader serif, tabular figures, semantic color — an intelligence product, not a dark-terminal cliché.
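One way to get a paywall with no view-source leak is to drop paid data before the page is rendered, so it never reaches the browser. The sketch below is framework-free and hypothetical: the section names and the `build_category_view` helper are assumptions, and in a Flask app a function like this would run inside the route handler before the template is rendered.

```python
# Section names are illustrative, mirroring the free-vs-paid list above.
PAID_SECTIONS = {"award_pricing", "expiring_radar", "competitor_leaderboard"}


def build_category_view(sections, is_subscriber):
    """Return only the data this user may see; paid rows are dropped server-side."""
    view = {}
    for name, rows in sections.items():
        if name in PAID_SECTIONS and not is_subscriber:
            # Send a locked placeholder rather than hidden-but-present data.
            view[name] = {"locked": True, "rows": []}
        else:
            view[name] = {"locked": False, "rows": rows}
    return view


sections = {
    "open_rfps": [{"title": "Example RFP"}],
    "award_pricing": [{"amount": 100000}],
}
free_view = build_category_view(sections, is_subscriber=False)
paid_view = build_category_view(sections, is_subscriber=True)
```

Because the free view contains no paid rows at all, nothing can be recovered by inspecting the HTML or disabling CSS.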

Data engineering (acquisition at scale)

Six Puerto Rico government systems scraped, structured, cross-linked, and quality-monitored:

  • ~1.24M contracts ingested — roughly 90% of the known universe.
  • 50K+ legislative measures collected so far, working toward the full record since 1985, with GPU-OCR’d bill text.
  • 440K+ campaign-finance donations across 125K+ donors. The finding that 98.8% of donation dollars come from individuals drove the schema design.
  • ~1,700 RFPs and a corporate-registry crawler feeding entity resolution.

Acquisition is resilient against difficult public sources, with resumable, idempotent crawlers and a supervisor that auto-restarts jobs.
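A minimal sketch of the resumable, idempotent pattern, assuming a hypothetical `pages` table keyed by URL and a caller-supplied `fetch` function. The project's actual crawlers and supervisor are not shown here; `run_with_restarts` is an illustrative stand-in for the supervisor.

```python
import sqlite3


def crawl(conn, urls, fetch):
    """Fetch each URL at most once; safe to re-run after a crash."""
    conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)")
    done = {row[0] for row in conn.execute("SELECT url FROM pages")}
    fetched = 0
    for url in urls:
        if url in done:
            continue  # resume: skip work finished in an earlier run
        body = fetch(url)
        # Idempotent write: re-inserting the same URL never duplicates a row.
        conn.execute(
            "INSERT OR REPLACE INTO pages (url, body) VALUES (?, ?)", (url, body)
        )
        conn.commit()  # commit per page so a crash loses at most one fetch
        fetched += 1
    return fetched


def run_with_restarts(job, max_restarts=3):
    """Tiny supervisor: re-run `job` if it raises, up to `max_restarts` times."""
    for attempt in range(max_restarts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_restarts:
                raise


# Simulated flaky source: the second request fails once.
calls = {"n": 0}


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] == 2:
        raise ConnectionError("simulated outage")
    return f"<html>{url}</html>"


conn = sqlite3.connect(":memory:")
urls = ["https://example.org/a", "https://example.org/b", "https://example.org/c"]
fetched_after_restart = run_with_restarts(lambda: crawl(conn, urls, flaky_fetch))
stored = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
```

After the simulated outage, the restarted run skips the page already stored and fetches only the remaining two.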

Entity resolution

A corporate-registry crawler feeds Splink + deterministic entity resolution, collapsing 422K vendor records to ~345K resolved entities with confidence scores attached. The framing is strict and honest: the system maps public records and flags patterns for review; it never accuses. A tempting fraud thesis was tested adversarially and falsified, with 0 of 15 cases supporting it, so it was killed rather than shipped.
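The deterministic half of that pipeline can be sketched as name normalization followed by exact-key grouping. The suffix list, the `match_key` helper, and the sample names below are assumptions for illustration; the probabilistic Splink stage and the confidence scoring are not shown.

```python
import re
import unicodedata
from collections import defaultdict

LEGAL_SUFFIXES = {"inc", "corp", "llc", "ltd", "co"}  # illustrative, not exhaustive


def match_key(name):
    """Normalize a vendor name into a deterministic match key."""
    # Strip accents, lowercase, and replace punctuation with spaces.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.sub(r"[^a-z0-9 ]", " ", ascii_name.lower()).split()
    # Drop trailing legal-form tokens such as "Inc" or "LLC".
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def resolve(records):
    """Group record ids that share the same normalized key."""
    groups = defaultdict(list)
    for rec_id, name in records:
        groups[match_key(name)].append(rec_id)
    return dict(groups)


# Hypothetical vendor records.
records = [
    (1, "Constructora Añasco, Inc."),
    (2, "CONSTRUCTORA ANASCO INC"),
    (3, "Servicios Técnicos del Caribe LLC"),
]
entities = resolve(records)
```

Exact-key matching like this only catches the easy duplicates; records that differ by typos or abbreviations are the cases a probabilistic linker is meant to score.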

Engineering quality and honest data

  • A standalone 51-check data-quality monitor and integrity-checked nightly snapshots.
  • A 62/62 QA harness on the product surface.
  • Honest-data discipline: a completeness audit caught ~80K silently missing measures, which were re-crawled rather than quietly served.
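A completeness audit of this kind can be sketched as a comparison between what the source says should exist and what is actually stored. The per-year buckets and counts below are made up for illustration; how the real audit derives its expected totals is not described here.

```python
def completeness_gaps(expected_counts, stored_counts):
    """Return {bucket: shortfall} for every bucket with fewer rows than expected."""
    gaps = {}
    for bucket, expected in expected_counts.items():
        have = stored_counts.get(bucket, 0)
        if have < expected:
            gaps[bucket] = expected - have
    return gaps


# Hypothetical counts: what the source index reports vs. what was ingested.
expected = {"1985": 1200, "1986": 1350, "1987": 1100}
stored = {"1985": 1200, "1986": 900}
gaps = completeness_gaps(expected, stored)
```

A non-empty result is the signal to re-crawl the affected buckets before serving them.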

Architecture and ops

A clean security boundary keeps the internal warehouse separate from anything public:

  • Internal 1.29 GB warehouse → nightly ETL → a small, indexed product.sqlite.
  • A read-only public Flask app serves only the slimmed product database, behind a cloudflared tunnel.
  • Result: instant indexed lookups for users, with the working data never exposed.
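The boundary above can be sketched with the standard-library `sqlite3` module: the ETL copies only public-safe columns into a fresh file, and the public app opens that file read-only. Table and column names, the `internal_notes` field, and both helper functions are assumptions for illustration, not the project's actual ETL.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def build_product_db(warehouse_path, product_path):
    """Copy only public-safe columns into a fresh, indexed product database."""
    tmp_path = product_path + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    out = sqlite3.connect(tmp_path)
    out.execute("ATTACH DATABASE ? AS wh", (warehouse_path,))
    # Explicit column list: internal-only fields never reach the product file.
    out.execute(
        "CREATE TABLE contracts AS "
        "SELECT vendor, agency, amount, end_date FROM wh.contracts"
    )
    out.execute("CREATE INDEX idx_contracts_end ON contracts (end_date)")
    out.commit()
    out.close()
    os.replace(tmp_path, product_path)  # swap the new file in atomically


def open_read_only(product_path):
    """Open the product database so that any write attempt fails."""
    uri = Path(product_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Hypothetical warehouse with one internal-only column.
workdir = tempfile.mkdtemp()
warehouse = os.path.join(workdir, "warehouse.sqlite")
product = os.path.join(workdir, "product.sqlite")
wh = sqlite3.connect(warehouse)
wh.execute(
    "CREATE TABLE contracts "
    "(vendor TEXT, agency TEXT, amount REAL, end_date TEXT, internal_notes TEXT)"
)
wh.execute(
    "INSERT INTO contracts VALUES "
    "('Vendor A', 'Agency X', 1000.0, '2025-06-30', 'do not publish')"
)
wh.commit()
wh.close()

build_product_db(warehouse, product)
public = open_read_only(product)
columns = [row[1] for row in public.execute("PRAGMA table_info(contracts)")]
try:
    public.execute("DELETE FROM contracts")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
```

Building into a temporary file and swapping it in means readers see either the previous snapshot or the new one, never a half-written database.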

Numbers

  • ~1.24M government contracts ingested (~90% of the universe)
  • 422K → ~345K vendors resolved to entities (Splink + deterministic)
  • 440K+ campaign-finance donations across 125K+ donors
  • 50K+ legislative measures collected toward the full record since 1985
  • 6-category live procurement-intelligence product
  • 62/62 QA harness · 51-check data-quality monitor
  • ~80K silently missing measures caught by a completeness audit and re-crawled

Stack

Python · Flask · SQLite (WAL) · Splink · Patchright · httpx/selectolax · EasyOCR (GPU) · cloudflared
