Xiaohongshu (RED) has released FireRed-Image-Edit, an open-source image editing foundation model that achieves state-of-the-art results on major benchmarks through a three-stage training approach and novel text-editing rewards.
Xiaohongshu, the Chinese social commerce platform known as RED, has open-sourced its new image editing foundation model, FireRed-Image-Edit. The release includes code, a technical report, and demo pages on GitHub and Hugging Face, with model weights expected to follow shortly.
Benchmark Performance
The model reports state-of-the-art (SOTA) results on several leading image editing benchmarks, including ImgEdit and GEdit. Beyond these established benchmarks, Xiaohongshu introduced RedEdit Bench, an in-house evaluation framework covering 15 sub-tasks such as object insertion/removal, portrait enhancement, and low-quality image restoration. The company plans to open-source this benchmark as well.
Technical Architecture
FireRed-Image-Edit employs a three-stage training strategy designed to optimize different aspects of image editing performance:
Pre-training Stage: Uses multi-condition perceptual bucket sampling and dynamic instruction augmentation to improve generalization across diverse editing scenarios.
Fine-tuning Stage: Leverages high-quality curated data to refine editing performance on specific tasks.
Reinforcement Learning Stage: Implements a novel Layout-Aware OCR-based Reward mechanism that penalizes typos, misaligned characters, abnormal font scaling, and layout distortions. According to the company, this approach significantly improves text-editing accuracy and stylistic consistency.
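To make the reward idea from the reinforcement learning stage concrete, here is a minimal Python sketch of how an OCR-based, layout-aware score could be assembled from the four penalties named above. It is an illustration under stated assumptions, not FireRed-Image-Edit's actual reward: the `TextBox` type, the `layout_aware_reward` function, the individual penalty formulas, and the weights are all hypothetical, and the OCR step that would produce the predicted box is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    """Hypothetical OCR result: recognized string plus a pixel box (x, y, w, h)."""
    text: str
    x: float
    y: float
    w: float
    h: float


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def iou(a: TextBox, b: TextBox) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def layout_aware_reward(pred: TextBox, target: TextBox,
                        weights=(0.5, 0.2, 0.15, 0.15)) -> float:
    """Return a score in [0, 1]; 1.0 means text and layout match the target.

    The weights and penalty definitions are assumptions chosen for illustration.
    """
    if target.h <= 0:
        raise ValueError("target box must have positive height")
    w_typo, w_align, w_scale, w_layout = weights

    # Typos: edit distance normalized by the longer string.
    max_len = max(len(pred.text), len(target.text), 1)
    typo = edit_distance(pred.text, target.text) / max_len

    # Misalignment: vertical center offset relative to the target line height.
    offset = abs((pred.y + pred.h / 2) - (target.y + target.h / 2))
    misalign = min(1.0, offset / target.h)

    # Abnormal font scaling: relative deviation of the rendered text height.
    scale = min(1.0, abs(pred.h - target.h) / target.h)

    # Layout distortion: how far the rendered box drifts from the target box.
    distortion = 1.0 - iou(pred, target)

    penalty = (w_typo * typo + w_align * misalign
               + w_scale * scale + w_layout * distortion)
    return 1.0 - penalty
```

In this sketch a single misspelled character costs more than a small shift in position, which reflects one possible weighting; a real system would need to tune such weights and handle multiple text regions per image.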
Core Capabilities
The model demonstrates strong performance across multiple image editing tasks:
- Instruction Following: Accurately interprets and executes complex editing requests
- Text Editing: Precise manipulation of text elements within images
- Style Transfer: Seamless application of artistic styles to images
- Multi-reference Image Fusion: Combines elements from multiple source images
- Old Photo Restoration: Recovers and enhances degraded historical images
- High-fidelity Image Enhancement: Improves image quality while preserving details
Future Development
Xiaohongshu has outlined plans for future enhancements, including strengthening portrait retouching, improving text editing precision, and better preserving consistency between the edited image and the original. The company also announced plans to release additional open-source models, including text-to-image foundation models, in the coming months.
Industry Context
The release of FireRed-Image-Edit adds a new option to the open-source image editing ecosystem. By publishing the code and technical report now, and committing to release the model weights and the RedEdit Bench evaluation framework, Xiaohongshu is enabling broader research and development in this space. The focus on text editing accuracy through the Layout-Aware OCR-based Reward mechanism addresses a common pain point in image editing systems, where edited text often comes out misspelled, misaligned, or distorted.
Source: QbitAI
Tags: #Xiaohongshu #ImageEditingSOTA #OpenSource
