Data Engineering · ML/AI · Product · Ops

Live at nidopr.app

Nidopr

The Zillow-style data layer Puerto Rico never had: taken end-to-end from data to a live product that runs itself.

~64,000 listings aggregated & served (10+ sources, nightly)
+44% → +5% valuation-model bias after back-testing & trust-gating
~79.6% coverage on calibrated 80% price-prediction intervals
34 scheduled production jobs running unattended nightly

Puerto Rico’s real-estate listings are scattered across a dozen disconnected sources, with no single searchable, priced, trustworthy view of the market. Nidopr unifies that fragmented market into one bilingual (ES/EN), AI-priced, self-monitoring product — live in production on self-managed infrastructure, designed and delivered end-to-end while directing an AI-assisted implementation layer.

Problem

A buyer or analyst in PR has no equivalent of a national listings portal: the data is spread across 10+ sources, formats disagree, the same property appears in several places, and nothing tells you whether an asking price is reasonable. Nidopr was built to be that missing data layer — acquisition, pricing, search, and the ops that keep it all alive, as a single product.

Build — Data engineering

  • 10+ acquisition sources unified into one schema, with resilient fetching from hard-to-reach public sources (circuit breakers, resumable queues, graceful degradation).
  • Cross-source entity resolution so the same property from different sources resolves to one listing, with four precision guards against false merges.
  • Strict no-delete governance: soft-expire plus a nightly verify-or-expire pass, so the record stays honest without silently dropping data.
  • Enrichment pipeline: regex plus local-LLM extraction, CRIM property-registry matching (~71.6% match rate), geocoding, and photo junk-filtering, feeding a five-stage nightly quality pipeline.
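The circuit-breaker pattern named above fits in a few lines. This is a minimal sketch, not Nidopr's code: the `CircuitBreaker` class, the consecutive-failure threshold, the cooldown, and the injectable clock are all assumptions. A source that keeps failing is skipped for a while, and the nightly run carries on with a fallback value instead of crashing.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Hypothetical per-source breaker: opens after N consecutive failures,
    refuses calls during a cooldown, then allows one trial call (half-open)."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cooldown has elapsed the breaker lets a trial call through.
        return self.clock() - self.opened_at < self.cooldown_s

    def call(self, fetch: Callable[[], T], fallback: T) -> T:
        # Graceful degradation: an open breaker returns the fallback
        # instead of hammering a source that is already failing.
        if self.is_open():
            return fallback
        try:
            result = fetch()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open; a failed trial re-arms it
            return fallback
        self.failures = 0
        self.opened_at = None
        return result
```

Injecting the clock keeps the cooldown logic testable without sleeping; a resumable queue would sit one level above this, re-enqueuing whatever the fallback path skipped.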

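The write-up does not list the four precision guards, so the checks below are hypothetical stand-ins that only show the shape of the idea: a cross-source pair is merged only when every conservative guard passes, trading some recall for fewer false merges. `Listing`, `is_same_property`, the 75 m radius, and the 15% price tolerance are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    source: str
    municipality: str
    lat: float
    lon: float
    price: float
    bedrooms: Optional[int] = None


def haversine_m(a: Listing, b: Listing) -> float:
    """Great-circle distance in metres between two listings."""
    r = 6_371_000.0  # mean Earth radius in metres
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dp, dl = p2 - p1, math.radians(b.lon - a.lon)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


def is_same_property(a: Listing, b: Listing,
                     max_dist_m: float = 75.0, max_price_gap: float = 0.15) -> bool:
    """Merge only if all four (hypothetical) precision guards pass."""
    # Guard 1: only resolve across different sources.
    if a.source == b.source:
        return False
    # Guard 2: same municipality.
    if a.municipality != b.municipality:
        return False
    # Guard 3: geocoded points must be physically close.
    if haversine_m(a, b) > max_dist_m:
        return False
    # Guard 4: attributes agree: asking prices within a relative tolerance,
    # and bedroom counts equal whenever both sources report them.
    if abs(a.price - b.price) / max(a.price, b.price) > max_price_gap:
        return False
    if a.bedrooms is not None and b.bedrooms is not None and a.bedrooms != b.bedrooms:
        return False
    return True
```

Every guard can only veto a merge, never force one, which is what keeps false merges rare.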
Build — ML / AI

  • LightGBM valuation model with 80% prediction intervals calibrated by split conformal prediction, holding ~79.6% empirical coverage against the 80% target.
  • A back-testing trust gate that cut model bias from +44% to +5% and withholds low-confidence numbers rather than showing a figure it can’t stand behind.
  • Hybrid semantic search — Postgres full-text plus pgvector/HNSW over 768-dimension embeddings, with graceful fallback.
  • Local LLM pipelines (Ollama / qwen2.5) and a buyer-facing comparable-sales PDF report with an LLM narrative checked by a regex fact-checker.
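Split conformal prediction, as named above, needs only the model's residuals on a held-out calibration split. The sketch below assumes plain absolute residuals in price units and symmetric intervals; the production model may score residuals differently (for example on log price), and the function names are hypothetical. An 80% target corresponds to `alpha = 0.2`.

```python
import math
from typing import Sequence, Tuple


def conformal_quantile(residuals: Sequence[float], alpha: float = 0.2) -> float:
    """Split-conformal radius: the ceil((n + 1) * (1 - alpha))-th smallest
    absolute residual on a calibration set the model never trained on."""
    scores = sorted(abs(r) for r in residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:  # too few calibration points for this coverage level
        return math.inf
    return scores[k - 1]


def interval(pred: float, q: float) -> Tuple[float, float]:
    """Symmetric prediction interval around a point estimate."""
    return pred - q, pred + q


def coverage(preds: Sequence[float], actuals: Sequence[float], q: float) -> float:
    """Empirical coverage: share of actual prices that land inside the interval."""
    hits = sum(p - q <= a <= p + q for p, a in zip(preds, actuals))
    return hits / len(preds)
```

Measuring `coverage` on a fresh back-test split is how a figure like ~79.6% against the 80% target would be checked.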
Screenshot: the buyer-facing market-analytics view, a plain-language read on whether prices are fair, built on the same calibrated valuation model.
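The trust gate can be pictured as two checks before any number is shown. This is a hedged sketch: the write-up does not define how bias is measured, so `backtest_bias` assumes the median signed percentage error on back-tested sales, and both thresholds in `gated_estimate` are hypothetical.

```python
from statistics import median
from typing import Optional, Sequence


def backtest_bias(preds: Sequence[float], actuals: Sequence[float]) -> float:
    """Median signed percentage error on held-out sales (positive = overvaluing)."""
    return median((p - a) / a for p, a in zip(preds, actuals))


def gated_estimate(pred: float, lo: float, hi: float, segment_bias: float,
                   max_abs_bias: float = 0.10,
                   max_rel_width: float = 0.60) -> Optional[float]:
    """Return the estimate only if it clears both (hypothetical) gates.
    None means 'withhold the number' rather than show a shaky figure.
    Assumes pred is a positive price."""
    # Gate 1: the model must be roughly unbiased for this market segment.
    if abs(segment_bias) > max_abs_bias:
        return None
    # Gate 2: the calibrated interval must be tight relative to the estimate.
    if (hi - lo) / pred > max_rel_width:
        return None
    return pred
```

Returning `None` instead of a number pushes the "no figure" decision into the API contract, so the front end cannot accidentally display a withheld estimate.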

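The write-up does not say how the full-text and vector rankings are combined; reciprocal rank fusion (RRF) is swapped in here as one common, tuning-light choice. In this sketch the two SQL legs (a Postgres full-text query and a pgvector HNSW nearest-neighbour query) are abstracted as hypothetical callables that return listing IDs in rank order, and `k = 60` is the conventional RRF constant.

```python
from typing import Callable, Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[int]], k: int = 60) -> List[int]:
    """Reciprocal rank fusion: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: Dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def hybrid_search(query: str,
                  fulltext: Callable[[str], List[int]],
                  vector: Callable[[str], List[int]]) -> List[int]:
    """Fuse both rankings; degrade to full-text alone if the vector leg fails."""
    lexical = fulltext(query)
    try:
        semantic = vector(query)
    except Exception:
        return lexical  # graceful fallback: keyword results still work
    return rrf_fuse([lexical, semantic])
```

RRF uses only ranks, so it needs no calibration between `ts_rank`-style scores and cosine distances, which live on different scales.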
Build — Product

  • Nuxt 3 + TypeScript + Tailwind + Pinia + ECharts front end.
  • An ES/EN natural-language search parser backed by a ~29k-check test suite.
  • Auth with Google OAuth and anonymous→login action replay (an action attempted while signed out is replayed after login); saved-search email alerts.
  • Self-hosted behavioral analytics (funnels, rage-clicks, zero-result searches) and SEO with a ~22.4k-URL dynamic sitemap, verified in Search Console.
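A bilingual query parser turns free text into structured filters. The sketch below is a deliberately tiny, hypothetical version: the regex vocabulary, the five-town list, and the filter keys (`min_beds`, `max_price`, `listing_type`, `municipality`) are assumptions, and a parser backed by a ~29k-check suite would handle far more phrasings. Accent folding lets "Mayaguez" match "Mayagüez".

```python
import re
import unicodedata
from typing import Dict, Union

# Hypothetical vocabulary covering a few ES/EN phrasings.
BEDS = re.compile(r"(\d+)\s*(?:bedrooms?|beds?|br|cuartos?|habitaciones?)\b", re.I)
MAX_PRICE = re.compile(
    r"(?:under|below|bajo|menos de|hasta)\s*\$?\s*(\d[\d,]*(?:\.\d+)?)\s*(k|m)?\b", re.I)
RENT = re.compile(r"\b(?:rent|rental|alquiler|renta)\b", re.I)
TOWNS = ("San Juan", "Ponce", "Mayagüez", "Bayamón", "Carolina")


def _fold(s: str) -> str:
    """Lower-case and strip accents so 'Mayaguez' matches 'Mayagüez'."""
    decomposed = unicodedata.normalize("NFKD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def parse_query(q: str) -> Dict[str, Union[int, float, str]]:
    """Turn a free-text ES/EN query into structured search filters."""
    filters: Dict[str, Union[int, float, str]] = {}
    if m := BEDS.search(q):
        filters["min_beds"] = int(m.group(1))
    if m := MAX_PRICE.search(q):
        scale = {"k": 1_000, "m": 1_000_000}.get((m.group(2) or "").lower(), 1)
        filters["max_price"] = float(m.group(1).replace(",", "")) * scale
    filters["listing_type"] = "rent" if RENT.search(q) else "sale"
    folded = _fold(q)
    for town in TOWNS:
        if _fold(town) in folded:
            filters["municipality"] = town  # keep the canonical accented name
            break
    return filters
```

For example, `parse_query("3 cuartos en Ponce bajo $200k")` yields three beds minimum, a 200,000 price cap, sale listings, and the municipality Ponce.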

Ops

  • Self-managed Ubuntu server, git push-to-deploy, Dockerized Postgres.
  • 34 systemd timers running scheduled production jobs unattended nightly.
  • Private networking via Cloudflare Tunnel (no exposed ports) and Tailscale.
  • Restore-verified restic backups and an authorized security audit (21 findings; overall posture rated strong).

Sub-system — Nidopr Monitor

One pane of glass for the whole business, with alerts to your phone and no vendor lock-in. An installable PWA (desktop and Galaxy Fold) runs 40 production health checks across three alert sinks: Discord, email, and web push to phone. It adds visitor analytics with bot/human/internal classification, a search→view→contact behavior funnel, a self-healing client, and a performance fix that took one endpoint from 4.8 s to 0.1 s (~48×).
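The core of a monitor like this is a loop that runs every check and fans each failure out to every alert sink. A minimal sketch, assuming checks are zero-argument callables returning a bool and sinks (standing in for Discord, email, and web push) are callables taking a message; `Monitor` and `CheckResult` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""


@dataclass
class Monitor:
    """Runs named health checks and sends every failure to every sink."""
    checks: Dict[str, Callable[[], bool]]
    sinks: List[Callable[[str], None]] = field(default_factory=list)

    def run(self) -> List[CheckResult]:
        results: List[CheckResult] = []
        for name, check in self.checks.items():
            try:
                ok = bool(check())
                detail = "" if ok else "check returned False"
            except Exception as exc:  # a crashing check counts as a failure
                ok, detail = False, f"{type(exc).__name__}: {exc}"
            results.append(CheckResult(name, ok, detail))
            if not ok:
                self._alert(f"[FAIL] {name}: {detail}")
        return results

    def _alert(self, message: str) -> None:
        # One broken sink (say, a webhook outage) must not silence the others.
        for send in self.sinks:
            try:
                send(message)
            except Exception:
                pass
```

Treating an exception inside a check as a failure, rather than letting it abort the run, keeps one bad probe from hiding the other results.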

Sub-system — Facebook auto-poster

A marketing channel that runs itself — the data finds the deal, writes the post, safety-checks the photo, publishes, and monitors its own health.

  • Deals are chosen by the data, reusing the site’s own below-market “heat” engine; posts rotate through three categories (below-market sale / rent / Section 8), oldest-first, with 30-day cooldowns.
  • 7 posts/day at peak hours via the Meta Graph API, as proper timeline feed posts.
  • An OCR photo-safety guard (tesseract): a phone-number watermark on any photo condemns the whole listing, so it never funnels a caller to a competitor and never falls back to an unsafe image.
  • Spanish captions generated from structured fields, plus self-monitoring — a Social dashboard tab logging every post and skip with a reason, a shared invariant battery, schedule-aware liveness alerting, a kill switch, and token hygiene.
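The category rotation with cooldowns can be sketched as below. Two assumptions are labeled here: "oldest-first" is read as "earliest-listed eligible deal first", and the 30-day cooldown is applied per listing. The category keys, `Deal`, `make_rotation`, and `next_post` are hypothetical. When a category has nothing eligible, the slot falls through to the next category.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import cycle
from typing import Iterator, List, Optional

CATEGORIES = ("sale_below_market", "rent_below_market", "section_8")  # hypothetical keys
COOLDOWN = timedelta(days=30)


@dataclass
class Deal:
    listing_id: int
    category: str
    listed_on: date
    last_posted: Optional[date] = None


def make_rotation() -> Iterator[str]:
    """Endless round-robin over the three categories."""
    return cycle(CATEGORIES)


def next_post(deals: List[Deal], rotation: Iterator[str], today: date) -> Optional[Deal]:
    """Take the next category in rotation and pick its oldest eligible deal;
    fall through to later categories if one has nothing eligible."""
    for _ in range(len(CATEGORIES)):
        cat = next(rotation)
        eligible = [d for d in deals if d.category == cat
                    and (d.last_posted is None or today - d.last_posted >= COOLDOWN)]
        if eligible:
            pick = min(eligible, key=lambda d: d.listed_on)  # oldest-first
            pick.last_posted = today  # starts this listing's 30-day cooldown
            return pick
    return None  # nothing eligible anywhere: skip this slot and log the reason
```

Returning `None` for an empty slot fits the logging described above, where every skip is recorded with a reason.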

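The photo-safety rule ("one watermarked photo condemns the whole listing") reduces to a short check once OCR text is available. In this sketch the per-photo text is taken as input; in practice it would come from tesseract, for example via pytesseract's `image_to_string`. The phone regex for North American number shapes (Puerto Rico uses area codes 787 and 939) and `listing_is_postable` are illustrative assumptions, and real OCR output would need extra tolerance for misread digits.

```python
import re
from typing import Iterable

# North American phone shapes, e.g. "787-555-0134", "(939) 555 0134", "7875550134".
PHONE = re.compile(r"(?:\+?1[\s.-]?)?\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")


def listing_is_postable(ocr_texts: Iterable[str]) -> bool:
    """A phone number on ANY photo condemns the whole listing: there is no
    fallback to the listing's other photos, so no unsafe image slips through."""
    return not any(PHONE.search(text) for text in ocr_texts)
```

Failing the whole listing rather than filtering individual photos matches the rule above: the post either goes out fully clean or not at all.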
Numbers

  • ~64,000 property listings aggregated and served (10+ sources, nightly).
  • 640,000+ photos managed (~29 GB) with restore-verified backups.
  • Valuation-model bias cut from +44% to +5%; ~79.6% coverage on calibrated 80% intervals.
  • 40 automated health checks and 34 unattended nightly production jobs.
  • 7/day autonomous social posts, deal-picked and OCR photo-safety checked.

Stack

Python · FastAPI · SQLAlchemy 2 · Postgres 16 · pgvector · Nuxt 3 · Vue 3 · TypeScript · Tailwind · LightGBM · conformal prediction · Ollama · tesseract · Docker · Cloudflare · Tailscale · restic.



Have data? Let’s make it think.

Open to senior / lead data & AI roles, and to Vizlogic consulting engagements.