Live at nidopr.app
Nidopr
The Zillow-style data layer Puerto Rico never had: taken end to end from raw data to live product, and built to run itself.
Puerto Rico’s real-estate listings are scattered across a dozen disconnected sources, with no single searchable, priced, trustworthy view of the market. Nidopr unifies that fragmented market into one bilingual (ES/EN), AI-priced, self-monitoring product — live in production on self-managed infrastructure, designed and delivered end-to-end while directing an AI-assisted implementation layer.
Problem
A buyer or analyst in Puerto Rico has no equivalent of a national listings portal: the data is spread across 10+ sources, formats disagree, the same property appears in several places, and nothing indicates whether an asking price is reasonable. Nidopr was built to be that missing data layer: acquisition, pricing, search, and the ops that keep it all alive, as a single product.
Build — Data engineering
- 10+ sources unified into one schema, with resilient acquisition from difficult public sources (circuit breakers, resumable queues, graceful degradation).
- Cross-source entity resolution so the same property from different sources resolves to one listing, with four precision guards against false merges.
- Strict no-delete governance: soft-expire plus a nightly verify-or-expire pass, so the record stays honest without silently dropping data.
- Enrichment pipeline: regex plus local-LLM extraction, CRIM property-registry matching (~71.6%), geocoding, and photo junk-filtering, feeding a five-stage nightly quality pipeline.
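The no-delete rule above can be illustrated with a minimal sketch of a verify-or-expire pass. This is not Nidopr's actual code: the `Listing` fields, the `still_live` callback, and the two-day staleness window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, Optional


@dataclass
class Listing:
    # Hypothetical record shape, for illustration only.
    listing_id: str
    last_seen: datetime
    expired_at: Optional[datetime] = None  # soft-expire marker; rows are never deleted


def verify_or_expire(
    listings: Iterable[Listing],
    still_live: Callable[[Listing], bool],
    now: datetime,
    stale_after: timedelta = timedelta(days=2),  # assumed window
) -> int:
    """Re-check stale active listings and soft-expire the ones that fail.

    Returns the number of listings expired in this pass.
    """
    expired = 0
    for listing in listings:
        if listing.expired_at is not None:
            continue  # already expired; the record stays in place
        if now - listing.last_seen < stale_after:
            continue  # seen recently, nothing to verify
        if still_live(listing):
            listing.last_seen = now  # verified: refresh the timestamp
        else:
            listing.expired_at = now  # not verified: mark, do not delete
            expired += 1
    return expired
```

Because expiry is only a timestamp on the row, a listing that reappears at its source can be reactivated by clearing `expired_at`, and its history is preserved either way.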
Build — ML / AI
- LightGBM valuation model with split-conformal calibrated 80% prediction intervals, holding ~79.6% empirical coverage against the 80% target.
- A back-testing trust gate that cut model bias from +44% to +5% and withholds low-confidence numbers rather than showing a figure it can’t stand behind.
- Hybrid semantic search — Postgres full-text plus pgvector/HNSW over 768-dimension embeddings, with graceful fallback.
- Local LLM pipelines (Ollama / qwen2.5) and a buyer-facing comparable-sales PDF report with an LLM narrative checked by a regex fact-checker.
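For readers unfamiliar with split-conformal calibration, here is a minimal sketch of the general technique. It assumes absolute residuals as the conformity score and a symmetric interval; the source does not say which score Nidopr uses, and the function names are hypothetical. The point predictions would come from the valuation model (LightGBM here), evaluated on a held-out calibration set.

```python
import numpy as np


def conformal_halfwidth(y_cal, pred_cal, alpha=0.2):
    """Half-width q so that [pred - q, pred + q] targets 1 - alpha coverage.

    y_cal / pred_cal are true values and model predictions on a held-out
    calibration set that the model was not trained on.
    """
    scores = np.abs(np.asarray(y_cal, dtype=float) - np.asarray(pred_cal, dtype=float))
    n = scores.size
    # Finite-sample correction: take the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return float(np.sort(scores)[k - 1])


def predict_interval(pred, q):
    """Symmetric interval around each point prediction."""
    pred = np.asarray(pred, dtype=float)
    return pred - q, pred + q
```

With `alpha=0.2` this targets the 80% intervals described above. Empirical coverage is then the fraction of held-out true values that fall inside their intervals, which is the kind of figure the ~79.6% number reports.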
Build — Product
- Nuxt 3 + TypeScript + Tailwind + Pinia + ECharts front end.
- An ES/EN natural-language search parser backed by a ~29k-check test suite.
- Auth with anonymous→login action-replay and Google OAuth; saved-search email alerts.
- Self-hosted behavioral analytics (funnels, rage-clicks, zero-result searches) and SEO with a ~22.4k-URL dynamic sitemap, verified in Search Console.
Ops
- Self-managed Ubuntu server, git push-to-deploy, Dockerized Postgres.
- 34 systemd timers run the scheduled production jobs unattended every night.
- Private networking via Cloudflare Tunnel (no exposed ports) and Tailscale.
- Restore-verified restic backups and an authorized security audit (21 findings, posture strong).
Sub-system — Nidopr Monitor
One pane of glass for the whole business, alerting to your phone, with no vendor lock-in. An installable PWA (desktop and Galaxy Fold) runs 40 production health checks and routes alerts to three sinks: Discord, email, and web-push to phone. It adds visitor analytics with bot/human/internal classification, a search→view→contact behavior funnel, a self-healing client, and a performance fix that took one endpoint from 4.8s to 0.1s (~48×).
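The checks-to-sinks pattern can be sketched roughly as follows. This is an illustrative outline, not the Monitor's implementation: the `Check` and `Sink` types, `run_checks`, and the message format are hypothetical, and real sinks would wrap a Discord webhook, an SMTP client, and a web-push sender.

```python
from typing import Callable, Dict, List

Check = Callable[[], bool]    # returns True when the component is healthy
Sink = Callable[[str], None]  # delivers one alert message


def run_checks(checks: Dict[str, Check], sinks: List[Sink]) -> List[str]:
    """Run every check, then fan each failure out to every sink.

    Returns the names of the failed checks, in the order they were run.
    """
    failed: List[str] = []
    for name, check in checks.items():
        try:
            healthy = check()
        except Exception:
            healthy = False  # a crashing check counts as a failure
        if not healthy:
            failed.append(name)

    for name in failed:
        for sink in sinks:
            try:
                sink(f"health check failed: {name}")
            except Exception:
                pass  # one broken sink must not block the others
    return failed
```

Isolating each sink in its own `try` block is the design choice that matters here: if one delivery channel is down, the alert should still reach the remaining channels.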
Sub-system — Facebook auto-poster
A marketing channel that runs itself — the data finds the deal, writes the post, safety-checks the photo, publishes, and monitors its own health.
- Deals chosen by the data, reusing the site’s own below-market “heat” engine; rotates three categories (below-market sale / rent / Section-8) oldest-first with 30-day cooldowns.
- 7 posts/day at peak hours via the Meta Graph API, as proper timeline feed posts.
- An OCR photo-safety guard (tesseract): a phone-number watermark on any photo condemns the whole listing, so it never funnels a caller to a competitor and never falls back to an unsafe image.
- Spanish captions generated from structured fields, plus self-monitoring — a Social dashboard tab logging every post and skip with a reason, a shared invariant battery, schedule-aware liveness alerting, a kill switch, and token hygiene.
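The photo-safety rule (one watermarked photo condemns the whole listing) can be sketched as below. The sketch assumes the OCR step has already turned each photo into text, for example with tesseract. The regex and function names are hypothetical, and the pattern only covers 10-digit North American numbers such as Puerto Rico's 787 and 939 area codes.

```python
import re
from typing import List

# Assumed pattern: 10-digit numbers with an optional +1 prefix and common
# separators. The lookarounds stop it matching inside longer digit runs.
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.\-]?)?\(?\d{3}\)?[\s.\-]?\d{3}[\s.\-]?\d{4}(?!\d)"
)


def has_phone_number(ocr_text: str) -> bool:
    """True if the OCR output of one photo contains a phone-like number."""
    return PHONE_RE.search(ocr_text) is not None


def listing_is_safe(ocr_texts_per_photo: List[str]) -> bool:
    """A listing is safe only if no photo carries a phone-number watermark.

    There is deliberately no fallback to the remaining photos: one hit
    rejects the listing as a whole.
    """
    return not any(has_phone_number(text) for text in ocr_texts_per_photo)
```

For example, `listing_is_safe(["Sala amplia", "Llame 787-555-0123"])` returns `False`, so that listing would be skipped.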
Numbers
- ~64,000 property listings aggregated and served (10+ sources, nightly).
- 640,000+ photos managed (~29 GB) with restore-verified backups.
- Valuation-model bias cut from +44% to +5%; ~79.6% coverage on calibrated 80% intervals.
- 40 automated health checks and 34 unattended nightly production jobs.
- 7 autonomous social posts per day, deal-picked and OCR photo-safety checked.
Stack
Python · FastAPI · SQLAlchemy 2 · Postgres 16 · pgvector · Nuxt 3 · Vue 3 · TypeScript · Tailwind · LightGBM · conformal prediction · Ollama · tesseract · Docker · Cloudflare · Tailscale · restic.
Have data? Let’s make it think.
Open to senior / lead data & AI roles, and to Vizlogic consulting engagements.