Case · 2026 · Real-estate market analysts (RID Analytics)
Real-estate parser pipeline
13 parsers for city developers plus a monthly aggregation pipeline for new-build apartment market analysis. Went from a naive 5-minute Playwright scrape to 12-30 s with curl_cffi + API.
Result: 53 objects, 2093 apartments / month
Python · Playwright · curl_cffi · openpyxl · COM Excel
Monthly pipeline: monitor наш.дом.рф → new-object detection → run parsers (parsers/*.py per developer or add_rashet_from_pd.py from declarations) → fill_rashet → consolidated table → manual check → fill_baza.
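The new-object detection step can be pictured as a set difference against a saved state. This is a minimal sketch, not the project's code: the function name `detect_new_objects`, the JSON state file, and the use of string IDs are all assumptions made for illustration.

```python
import json
from pathlib import Path


def detect_new_objects(current_ids, state_path):
    """Return IDs seen now but absent from the saved state, then update the state.

    Hypothetical helper: the real monitor may store its state differently.
    """
    path = Path(state_path)
    if path.exists():
        known = set(json.loads(path.read_text(encoding="utf-8")))
    else:
        known = set()  # first run: everything counts as new

    current = set(current_ids)
    new_ids = sorted(current - known)

    # Persist the union so a rerun within the same month reports nothing new.
    path.write_text(json.dumps(sorted(known | current)), encoding="utf-8")
    return new_ids
```

Only the IDs returned here would need a parser run, which keeps a repeated monthly run cheap.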
Highlights
- API-first architecture: moved monitor_nash_dom + kpdgazstroi.ru from Playwright (5 min) to curl_cffi + API (12-30 sec) using __NEXT_DATA__ SSR-state extraction
- Idempotent fill_rashet: the API's codes differ from apartment numbers in Kristall buildings, so those are matched separately by area + floor + rooms
- COM Excel write: Times New Roman 11, taskkill before opening COM, no conflict with openpyxl
- Backup-before-write: main.py overwrites in place, so a daily backup is taken automatically and can be restored on demand
- Two data sources unified: site parsers and government declarations both feed the same fill_rashet
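To illustrate the __NEXT_DATA__ approach from the first highlight: Next.js pages embed their server-side state as JSON in a `<script id="__NEXT_DATA__">` tag, so one plain HTTP request can replace a full browser session. The sketch below is an assumption about how such extraction might look, not the project's actual parser. The helper names `fetch_html` and `extract_next_data` are hypothetical, and curl_cffi is imported lazily so the parsing part runs without it.

```python
import json
import re

# Matches the JSON payload inside <script id="__NEXT_DATA__" ...>...</script>.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def fetch_html(url):
    """Fetch a page with curl_cffi, impersonating a browser TLS fingerprint."""
    from curl_cffi import requests  # third-party; imported only when fetching

    response = requests.get(url, impersonate="chrome", timeout=30)
    response.raise_for_status()
    return response.text


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ JSON, or raise if the tag is missing."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads(match.group(1))
```

In use, `extract_next_data(fetch_html(url))["props"]["pageProps"]` would typically hold the page data. The keys below `pageProps` are site-specific and have to be inspected per developer site.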
What you get
An industry-specific market data pipeline. The same shape works for any monthly aggregation (car listings, vacancy reports, FX rates): anything with 10+ heterogeneous sources to homogenize.
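As a rough picture of what "homogenize" means here, the sketch below maps two differently shaped sources onto one record type and writes them into a table keyed by building, area, floor and rooms, so a rerun overwrites rows instead of duplicating them. Everything in it is hypothetical: the `Apartment` fields, the raw field names (`house`, `square`, `area_m2`) and the `upsert` helper are illustrative assumptions, not the pipeline's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Apartment:
    building: str
    area: float
    floor: int
    rooms: int
    price: int

    @property
    def key(self):
        # Round the area so "45.20" and "45,2" from different sources match.
        return (self.building, round(self.area, 1), self.floor, self.rooms)


def from_site(raw):
    """Adapter for a (hypothetical) developer-site record."""
    return Apartment(raw["house"], float(raw["square"]), int(raw["floor"]),
                     int(raw["rooms"]), int(raw["price"]))


def from_declaration(raw):
    """Adapter for a (hypothetical) declaration record with comma decimals."""
    area = float(str(raw["area_m2"]).replace(",", "."))
    return Apartment(raw["building"], area, int(raw["floor"]),
                     int(raw["rooms"]), int(raw["price"]))


def upsert(table, apartments):
    """Insert or overwrite by key; the same input twice leaves the table unchanged."""
    for apartment in apartments:
        table[apartment.key] = apartment
    return table
```

This key is a simplification: two apartments with the same area, floor and room count in one building would collide, so a real pipeline needs a tie-breaker or a manual check for such cases.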
Demo by request
Want something similar? Let's talk.