Zalando transitioned its landing page recommender system from classic deep learning to graph neural networks (GNNs), tackling challenges in data preparation, training, and production deployment while delivering contextual embeddings for improved personalization.
Mariia Bulycheva, Senior Machine Learning Engineer at Intapp, shares her experience implementing graph neural networks at Zalando to enhance their landing page recommender system. The presentation details the transition from traditional deep learning approaches to GNNs, highlighting both the technical complexities and the practical solutions that emerged during production deployment.
The Recommendation Challenge
Zalando, Europe's leading online fashion platform, needed to optimize content selection on their landing page. With approximately 2,000 pieces of content (articles, carousels, videos) competing for placement among 40 slots, the scoring model had to predict click probabilities accurately to maintain user engagement and drive revenue from sponsored content.
The existing deep learning system had reached a performance plateau, unable to effectively optimize for longer-term metrics like user retention and final purchases. This limitation prompted exploration of graph-based approaches that could capture more complex user-content relationships.
Why Graphs for Recommendations?
Several factors made graph neural networks particularly suitable for Zalando's use case:
Natural representation of user engagement: User interactions with content naturally form graph structures, with users and content items as nodes and interactions (views, clicks, purchases) as edges.
Higher-order relationships: Graphs explicitly model multi-hop connections, enabling discovery of patterns like "items often bought together" or "users with similar preferences."
Rich feature integration: Node features can incorporate diverse data types, including user demographics and visual embeddings from product images.
Weighted relationships: Link weights can represent interaction strength, recency, or other contextual factors like video watch duration.
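These properties can be made concrete with a small sketch that turns tabular interaction logs into a weighted user-content graph. Everything here (field names, event weights, the recency half-life) is hypothetical, chosen only to illustrate how interaction type and recency might be folded into link weights:

```python
from collections import defaultdict
from datetime import date

# Hypothetical log rows: (user_id, content_id, event, event_date).
LOG = [
    ("u1", "c1", "view",  date(2024, 1, 1)),
    ("u1", "c1", "click", date(2024, 1, 2)),
    ("u2", "c1", "view",  date(2024, 1, 2)),
    ("u2", "c2", "click", date(2024, 1, 3)),
]

# Illustrative interaction strengths; real weights would be tuned.
EVENT_WEIGHT = {"view": 1.0, "click": 3.0}

def build_graph(rows, today=date(2024, 1, 4), half_life_days=7):
    """Aggregate tabular logs into a weighted bipartite user-content graph.

    Each edge weight combines interaction type and recency (exponential
    decay), so links encode both strength and freshness of engagement.
    """
    edges = defaultdict(float)
    for user, content, event, when in rows:
        decay = 0.5 ** ((today - when).days / half_life_days)
        edges[(user, content)] += EVENT_WEIGHT[event] * decay
    return dict(edges)

graph = build_graph(LOG)
```

The exponential half-life is just one way to encode recency; a watch-duration signal for videos would slot into the same weight formula.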
Data Preparation Challenges
The transition to graph-based learning required significant data engineering efforts. User logs stored as tabular data needed conversion to graph structures, with careful attention to train-test separation.
Critical requirement: Train and test graphs must be completely disconnected to prevent data leakage. The team implemented a temporal split strategy, using seven days of data for training and one consecutive day for testing.
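A minimal version of that temporal split, assuming a hypothetical (user, content, date) row format, simply partitions rows by date so the train and test graphs share no interaction edges:

```python
from datetime import date, timedelta

def temporal_split(rows, train_start, train_days=7):
    """Split interaction rows into disjoint train/test sets by date.

    Train covers [train_start, train_start + train_days); test is the one
    day immediately after, so no edge appears in both graphs.
    """
    train_end = train_start + timedelta(days=train_days)
    test_end = train_end + timedelta(days=1)
    train = [r for r in rows if train_start <= r[-1] < train_end]
    test = [r for r in rows if train_end <= r[-1] < test_end]
    return train, test

# Hypothetical rows: (user_id, content_id, event_date), nine days of logs.
rows = [("u1", "c1", date(2024, 1, d)) for d in range(1, 10)]
train, test = temporal_split(rows, train_start=date(2024, 1, 1))
```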
Node features: Each user node contains embeddings of their 25 most recent purchases, while content nodes feature embeddings of associated items. These 25×128 feature matrices capture visual similarity in latent space but lack contextual relationship information.
Heterogeneous graph structure: The system supports multiple node types (users, content, brands) and link types (views, clicks, follows), though the team simplified to focus on view-to-click prediction.
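Such a heterogeneous graph can be pictured as node features keyed by node type plus edge lists keyed by (source type, relation, destination type) triples; PyTorch Geometric's HeteroData provides a real container along these lines. The node IDs and toy feature vectors below are invented for illustration:

```python
# Minimal stand-in for a heterogeneous graph container: features per node
# type, edge lists per (src_type, relation, dst_type) triple.
hetero = {
    "nodes": {
        "user":    {"u1": [0.1] * 4, "u2": [0.3] * 4},  # toy feature vectors
        "content": {"c1": [0.2] * 4},
        "brand":   {"b1": [0.5] * 4},
    },
    "edges": {
        ("user", "view",   "content"): [("u1", "c1"), ("u2", "c1")],
        ("user", "click",  "content"): [("u1", "c1")],
        ("user", "follow", "brand"):   [("u2", "b1")],
    },
}
```

Restricting to view-to-click prediction, as the team did, amounts to keeping only the first two edge types.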
Training Process Deep Dive
The core innovation lies in the message-passing mechanism that generates contextual embeddings:
- Feature preprocessing: Initial node features undergo transformation (LSTM for users, pooling for content)
- Neighbor sampling: Random sampling from 1-hop and 2-hop neighborhoods to manage computational complexity
- Message passing: Features propagate through sampled neighbors via trainable transformation matrices
- Embedding generation: Nodes receive updated representations incorporating neighborhood context
- Link prediction: Dot product classifiers predict click probabilities from user-content pairs
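The steps above, stripped of trainable weights and real neighbor sampling, can be illustrated with mean-aggregation message passing and a dot-product scorer. This is a pure-Python toy, not the production model; a real GNN would apply learned transformation matrices at each hop:

```python
from collections import defaultdict

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def message_pass(features, edges, hops=2):
    """Mean-aggregation message passing over `hops` rounds.

    Each round, a node's embedding becomes the average of its original
    features and its neighbours' current embeddings (GraphSAGE-style,
    minus the trainable transformations a real GNN would learn).
    """
    neighbours = defaultdict(list)
    for src, dst in edges:
        neighbours[src].append(dst)
        neighbours[dst].append(src)
    emb = dict(features)
    for _ in range(hops):
        emb = {
            node: mean([feat] + [emb[n] for n in neighbours[node]])
            if neighbours[node] else feat
            for node, feat in features.items()
        }
    return emb

def click_score(emb, user, content):
    """Dot-product link predictor over contextual embeddings."""
    return sum(a * b for a, b in zip(emb[user], emb[content]))

features = {"u1": [1.0, 0.0], "c1": [0.0, 1.0], "c2": [1.0, 1.0]}
edges = [("u1", "c1")]
emb = message_pass(features, edges)
score = click_score(emb, "u1", "c1")
```

Note how u1 and c1 drift toward each other after message passing, while the isolated node c2 keeps its raw features; that pull toward neighbours is exactly the "contextual embedding" effect, and also the mechanism behind over-smoothing when taken too far.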
The team experimented with both Deep Graph Library (used with its TensorFlow backend) and PyTorch Geometric, ultimately choosing the latter for its flexibility despite the TensorFlow dependency of the downstream system.
Production Deployment Challenges
Several obstacles emerged when moving from offline evaluation to production:
Inference latency: Real-time graph inference proved too slow for the landing page's strict performance requirements. The solution involved decoupling embedding generation from click prediction.
Frequent retraining needs: The previous system retrained every 30 minutes to capture shifting preferences. Graph-based retraining required more complex data preparation, creating operational overhead.
New entity handling: Cold-start problems for new users and content required fallback strategies, including separate sampling pipelines for content with fewer than 500 views.
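One simple shape such a fallback can take is a gated embedding lookup; the function and store names are hypothetical, with the 500-view threshold taken from the pipeline described above:

```python
def lookup_embedding(store, views, entity_id, fallback, min_views=500):
    """Return the contextual GNN embedding for entities with enough
    interaction history; otherwise fall back to a default (e.g. a
    segment-average embedding) so brand-new users and content can
    still be scored."""
    if views.get(entity_id, 0) >= min_views and entity_id in store:
        return store[entity_id]
    return fallback

# Toy store: one well-covered content item and one cold-start item.
store = {"c_popular": [0.9, 0.1]}
views = {"c_popular": 1200, "c_new": 17}
fallback = [0.0, 0.0]
```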
Hybrid Architecture Solution
The team developed a pragmatic hybrid approach that preserved GNN benefits while meeting production constraints:
- Daily offline embedding generation: GNNs train on daily data to produce contextual user and content embeddings
- Feature store integration: Generated embeddings stored for downstream consumption
- Existing model utilization: Downstream deep and cross network continues handling real-time inference
- Incremental value: Contextual embeddings provide performance improvements without full GNN deployment
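The decoupling in the first two steps can be sketched as a daily batch job that writes embeddings to a key/value store; a JSON-lines file stands in here for whatever feature store is actually used:

```python
import json

def export_embeddings(embeddings, path):
    """Write the day's contextual embeddings as JSON lines, one entity per
    row. The downstream deep & cross network reads these back as ordinary
    input features, so real-time scoring never touches the graph."""
    with open(path, "w") as f:
        for entity_id, vector in embeddings.items():
            f.write(json.dumps({"id": entity_id, "embedding": vector}) + "\n")
```

Because the online model only does a feature lookup plus its usual forward pass, graph inference cost is paid once a day offline rather than on every page load.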
This architecture achieved ROC-AUC improvements while maintaining sub-millisecond inference latency.
Key Technical Insights
Data leakage prevention: The team discovered that using the same links for both message passing and supervision leaks information, since the model can effectively read back edges it has already propagated over. They implemented a "disjoint train ratio" parameter, holding out 30% of links from message passing while still using them as training labels.
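A plain-Python sketch of that hold-out follows; PyTorch Geometric exposes the same idea as the `disjoint_train_ratio` argument of its `RandomLinkSplit` transform:

```python
import random

def disjoint_split(edges, disjoint_train_ratio=0.3, seed=0):
    """Partition training links into message-passing edges and supervision
    edges. Supervision edges provide labels for the link predictor but are
    excluded from the graph the GNN propagates over, so the model cannot
    trivially "see" the links it is asked to predict."""
    rng = random.Random(seed)
    shuffled = edges[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * disjoint_train_ratio)
    supervision, message_passing = shuffled[:cut], shuffled[cut:]
    return message_passing, supervision
```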
Sampling strategy importance: While the team used random sampling, they noted that smarter strategies considering link recency or importance could further improve performance.
Over-smoothing awareness: Deep networks with large neighborhoods risk producing similar embeddings across nodes, reducing their discriminative power.
Future Directions
Several enhancement opportunities remain unexplored:
- Smarter sampling strategies: Incorporating link importance metrics beyond random selection
- Feature enrichment: Leveraging Zalando's broader data ecosystem beyond purchase history
- Diversity control: Using GNN parameters to explicitly manage content novelty and variety
- Multi-objective optimization: Balancing engagement with discovery and exploration
Business Impact
The graph neural network implementation delivered measurable improvements in recommendation quality while highlighting the practical challenges of production ML systems. The hybrid approach demonstrates how organizations can extract value from advanced techniques without wholesale architectural changes.
The work underscores a critical insight for industrial ML: sometimes the most valuable contribution isn't the headline technology itself, but the intermediate representations and features it generates for existing systems.
Practical Takeaways
For organizations considering similar transitions:
- Start with data preparation: Graph learning requires fundamentally different data handling than tabular approaches
- Expect operational complexity: Graph retraining and inference introduce new infrastructure requirements
- Consider hybrid approaches: Full GNN deployment may be unnecessary if embeddings can benefit existing systems
- Plan for cold-start: New entities require dedicated handling strategies
- Balance depth and efficiency: Model complexity must align with production constraints
The Zalando case study provides a roadmap for organizations seeking to leverage graph neural networks while navigating the practical realities of production machine learning systems.
