Diego Alvarez
Portfolio
Data Engineering · 2025 to Present · Product Lead & Data Engineer

App-store intelligence platform

Market-intelligence platform for app brokers — Google Play crawler, live-deals scraper, and a TimescaleDB warehouse holding ~45M App Store listings, surfaced through an Elixir/Phoenix API.

Elixir / Phoenix · TimescaleDB · Oban · Fly.io · Mixpanel
  • ~45M Apple listings (~32 GB) warehoused in Timescale
  • Google Play crawler and live-deals scraper in production
  • Powers industry reports for app brokers

Context

App brokers buy and sell mobile apps for a living. They need to know, at any given moment, which apps are climbing in ranking, which categories are losing eyeballs, and where the comparable sales are landing. Most of that data is technically public — App Store and Play Store listings, third-party deal pages — but it isn't structured, and it isn't searchable in the shapes a broker actually needs.

App Peak is the platform that pulls all of it together: a backend service and an internal admin that turn fragmented public data into a queryable market view.

Role

Product Lead and Data Engineer. I own the problem framing, the data architecture, and the implementation of the ingestion side — the Google Play crawler and the live-deals scraper feeding broker industry reports — alongside the Elixir/Phoenix API that exposes it all.

What we shipped

A small set of services that turn unstructured public data into a market intelligence product:

  1. Google Play crawler. Polls Play Store listings on a schedule, normalizes app metadata, and writes incremental snapshots into the warehouse so we can see week-over-week changes per app, developer, and category.
  2. Live-deals scraper. Pulls comparable-sales data off public deal listings — the source of truth brokers actually use when pricing — and structures it into rows that join cleanly against the listings warehouse.
  3. Phoenix API. Exposes the warehouse through an Elixir/Phoenix service deployed to Fly.io (staging and production), with Mixpanel-instrumented analytics on top.
  4. Industry reports. Generated for app brokers off the joined dataset — listings, snapshots, and live deals — so the broker workflow stops being a tab-juggling exercise.
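
To make the "week-over-week changes" idea in item 1 concrete, here is a minimal sketch in Python rather than the platform's Elixir. It assumes snapshot rows of the form (app_id, captured_on, rating_count); the row shape, the metric, and the function name `week_over_week` are illustrative assumptions, not the actual warehouse schema.

```python
from collections import defaultdict
from datetime import date

# Hypothetical snapshot rows: (app_id, captured_on, rating_count).
# Field names and the chosen metric are illustrative only.
snapshots = [
    ("com.example.notes", date(2025, 3, 3), 1200),
    ("com.example.notes", date(2025, 3, 10), 1350),
    ("com.example.todo", date(2025, 3, 4), 800),
    ("com.example.todo", date(2025, 3, 11), 760),
]


def week_over_week(rows):
    """Return {app_id: [(iso_year, iso_week, delta), ...]} between consecutive observed weeks."""
    # Keep only the latest snapshot per (app, ISO year, ISO week).
    latest = {}
    for app_id, captured_on, value in rows:
        iso = captured_on.isocalendar()
        key = (app_id, iso[0], iso[1])
        if key not in latest or captured_on > latest[key][0]:
            latest[key] = (captured_on, value)

    # Group the weekly values per app.
    by_app = defaultdict(list)
    for (app_id, year, week), (_, value) in latest.items():
        by_app[app_id].append(((year, week), value))

    # Subtract each week's value from the next observed week's value.
    deltas = {}
    for app_id, weekly in by_app.items():
        weekly.sort()
        deltas[app_id] = [
            (cur_week[0], cur_week[1], cur_val - prev_val)
            for (_, prev_val), (cur_week, cur_val) in zip(weekly, weekly[1:])
        ]
    return deltas
```

In a warehouse this would more likely be a windowed SQL query over the snapshot table; the Python version only shows the shape of the computation. The same grouping would extend to developer or category by swapping the key.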

The data layer is Nocturnal: a TimescaleDB warehouse on Tiger Cloud holding ~45M Apple listing snapshots (~32 GB) and a growing set of Google Play snapshots, with ingestion running as Oban background jobs.

Product decisions worth writing down

Scraping is a product, not a side project. Every broker workflow eventually breaks on a bad scrape. We invested early in observability — per-source success rates, schema-drift alerts, and snapshot lineage — so the system fails loudly rather than silently misreporting.
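
As a rough illustration of "fails loudly", the sketch below computes per-source success rates and flags schema drift on a batch of scraped records. It is written in Python for brevity and is not the platform's code: the expected field set, the 95% threshold, and the names `schema_drift`, `success_rates`, and `check_batch` are all assumptions. Snapshot lineage is not covered here.

```python
from collections import Counter

# Hypothetical set of fields a parsed listing is expected to carry.
EXPECTED_FIELDS = {"app_id", "title", "category", "rating_count"}


def schema_drift(record, expected=EXPECTED_FIELDS):
    """Return (missing, unexpected) field names for one scraped record."""
    seen = set(record)
    return sorted(expected - seen), sorted(seen - expected)


def success_rates(results):
    """Turn (source, ok) pairs into {source: fraction of records that succeeded}."""
    total, ok_count = Counter(), Counter()
    for source, ok in results:
        total[source] += 1
        if ok:
            ok_count[source] += 1
    return {source: ok_count[source] / total[source] for source in total}


def check_batch(source, records, min_rate=0.95):
    """Raise when too many records are missing expected fields, instead of passing them on."""
    results = []
    for record in records:
        missing, _unexpected = schema_drift(record)
        results.append((source, not missing))
    rate = success_rates(results).get(source, 0.0)
    if rate < min_rate:
        raise RuntimeError(f"{source}: success rate {rate:.0%} below {min_rate:.0%}")
    return rate
```

The design choice this mirrors is that a batch with drifted fields raises an error and stops, rather than writing partial rows that a report would later treat as real data.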

Time-series first, relational second. App-store data is fundamentally temporal: a listing today vs. a listing yesterday is the whole point. We modeled around TimescaleDB hypertables from the start instead of bolting time-series on later.
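
A small Python sketch of what "a listing today vs. a listing yesterday" means in practice, assuming an append-only, time-sorted history of snapshots for a single app. The field names and the helpers `as_of` and `changed_fields` are hypothetical; in the warehouse itself this would be a query against a hypertable rather than in-memory code.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical append-only history for one app: (captured_at, fields), sorted by time.
history = [
    (datetime(2025, 3, 1, 6), {"price": 4.99, "category": "Productivity"}),
    (datetime(2025, 3, 2, 6), {"price": 2.99, "category": "Productivity"}),
]


def as_of(history, moment):
    """Return the latest snapshot captured at or before `moment`, or None if there is none."""
    times = [captured_at for captured_at, _ in history]
    index = bisect_right(times, moment)
    return history[index - 1][1] if index else None


def changed_fields(before, after):
    """Map each field whose value differs to its (before, after) pair."""
    keys = set(before) | set(after)
    return {
        key: (before.get(key), after.get(key))
        for key in keys
        if before.get(key) != after.get(key)
    }
```

Because snapshots are only ever appended, any past state can be reconstructed with an as-of lookup, and a change is simply the difference between two such lookups.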

Don't ship every column. The raw listing has 36 fields per row. Brokers care about ~8 of them. Industry reports lean on the curated subset; the long tail stays in the warehouse for ad-hoc questions.
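
The curated subset can be pictured as a simple projection over the raw row. The sketch below is in Python with made-up field names; the source only states that roughly 8 of the 36 fields matter to brokers, so the eight listed here are placeholders.

```python
# Hypothetical curated field names; the real subset is not specified in this write-up.
CURATED_FIELDS = (
    "app_id",
    "title",
    "developer",
    "category",
    "price",
    "rating",
    "rating_count",
    "last_updated",
)


def curate(raw_listing):
    """Project a raw listing onto the curated fields; fields absent from the row become None."""
    return {field: raw_listing.get(field) for field in CURATED_FIELDS}
```

The raw row is left untouched, so the remaining fields stay available for ad-hoc questions while reports only ever see the projection.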

Outcome

In production. The platform powers the industry reports app brokers use to price acquisitions, and the Phoenix API serves the internal admin team plus downstream report-generation jobs. Quantified business outcomes are being measured and will land here.