Project Notes (full article to be written)
- Implementation of a group-level customer data platform (CDP) consolidating customer data and algorithms for 20M+ contacts across multiple brands
- More than €4B in sales covered
- Enables data enrichment between brands
- Cloud Platform: GCP
- Data Warehouse: BigQuery (multi-petabyte scale)
- Orchestration: Airflow (100+ DAGs)
- Infrastructure: Terraform (infrastructure as code)
- Compute: Kubernetes
- Languages: Python, SQL
- Customer deduplication algorithm based on a customer graph, implemented with multithreading
- Infrastructure as code and versioning of entire platform with Terraform
- Implementation of Airflow CI/CD for pipeline orchestration
- Compliant data sharing and enrichment without group-level opt-in
- Challenge: definitions are not shared across brands
- Challenge: no customer deduplication within individual brands, even though deduplication is performed at group level
- 20M+ customers deduplicated across brands
- Customer 360 view covering €4B+ in sales
- Multi-brand customers are 2.5x more valuable
- €50M+ e-commerce investment secured based on data insights
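
Illustrative sketch of the customer-graph deduplication mentioned in the notes. This is a minimal example under stated assumptions, not the platform's actual implementation: the record fields (id, email, phone), the matching rule (exact match on a normalized email or phone number), and the function names normalize, find, and deduplicate are all hypothetical. Records are nodes, a shared identifier is an edge, and each connected component is treated as one customer. A thread pool extracts matching keys in parallel, and a union-find structure merges the records.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def normalize(record):
    """Return (record id, matching keys). Field names are hypothetical."""
    keys = []
    if record.get("email"):
        keys.append(("email", record["email"].strip().lower()))
    if record.get("phone"):
        digits = "".join(ch for ch in record["phone"] if ch.isdigit())
        if digits:
            keys.append(("phone", digits))
    return record["id"], keys


def find(parent, x):
    """Find the root of x in the union-find forest, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def deduplicate(records, max_workers=4):
    """Group record ids that are linked by at least one shared key."""
    # Key extraction is independent per record, so it runs in a thread pool.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        extracted = list(pool.map(normalize, records))

    # Merging happens on a single thread to keep the union-find consistent.
    parent = {rid: rid for rid, _ in extracted}
    first_seen = {}  # matching key -> first record id that carried it
    for rid, keys in extracted:
        for key in keys:
            if key in first_seen:
                a, b = find(parent, rid), find(parent, first_seen[key])
                if a != b:
                    parent[a] = b  # edge in the customer graph
            else:
                first_seen[key] = rid

    # Each connected component is one deduplicated customer.
    clusters = defaultdict(list)
    for rid, _ in extracted:
        clusters[find(parent, rid)].append(rid)
    return sorted(sorted(ids) for ids in clusters.values())


records = [
    {"id": "brandA:1", "email": "Ana@Example.com", "phone": None},
    {"id": "brandB:7", "email": "ana@example.com ", "phone": "+33 6 12 34 56 78"},
    {"id": "brandC:3", "email": None, "phone": "33612345678"},
    {"id": "brandA:2", "email": "bob@example.com", "phone": None},
]
clusters = deduplicate(records)
# clusters == [["brandA:1", "brandB:7", "brandC:3"], ["brandA:2"]]
```

In this sketch, brandA:1 and brandC:3 share no identifier directly but end up in the same cluster because both link to brandB:7, which is the reason for using a graph rather than pairwise matching. In CPython, threads mainly help when key extraction is I/O-bound (for example, reading records from a warehouse); for CPU-bound normalization a process pool may be a better fit.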