arxiv:2603.02210

HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images

Published on Mar 2

Submitted by Donghao Zhou on Mar 6

ByteDance

Authors:

Donghao Zhou ,

AI-generated summary

HiFi-Inpaint generates high-fidelity human-product images using Shared Enhancement Attention (SEA) and a Detail-Aware Loss (DAL), and introduces a new 40K-image dataset, HP-Image-40K.

Abstract

Human-product images, which showcase humans together with products, play a vital role in advertising, e-commerce, and digital marketing. The essential challenge in generating such images is preserving product details with high fidelity. Among existing paradigms, reference-based inpainting offers a targeted solution by using product reference images to guide the inpainting process. However, it remains limited in three key aspects: the lack of diverse, large-scale training data; the difficulty current models have in focusing on product detail preservation; and the inability of coarse supervision to provide precise guidance. To address these issues, we propose HiFi-Inpaint, a novel high-fidelity reference-based inpainting framework tailored to generating human-product images. HiFi-Inpaint introduces Shared Enhancement Attention (SEA) to refine fine-grained product features and a Detail-Aware Loss (DAL) that enforces precise pixel-level supervision using high-frequency maps. Additionally, we construct a new dataset, HP-Image-40K, whose samples are curated from self-synthesized data and processed with automatic filtering. Experimental results show that HiFi-Inpaint achieves state-of-the-art performance, delivering detail-preserving human-product images.
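The abstract says only that DAL uses high-frequency maps for pixel-level supervision. It does not say how those maps are computed or how they weight the loss. Purely as an illustration of the general idea, the sketch below assumes a Laplacian-magnitude map taken from the target image and a masked mean-squared error whose per-pixel weight grows with that map. The function names `high_frequency_map` and `detail_aware_loss` and the `weight` parameter are hypothetical and are not taken from HiFi-Inpaint's code.

```python
# Hypothetical sketch of a high-frequency-weighted pixel loss.
# NOT the HiFi-Inpaint implementation; the Laplacian filter and the
# (1 + weight * hf) weighting are illustrative assumptions.
# Images are 2D lists of grayscale floats; the mask is a 2D list of 0/1.


def laplacian_magnitude(img):
    """Return |4-neighbour Laplacian| per pixel, replicating edge pixels at borders."""
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so out-of-range neighbours reuse the nearest border pixel.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            lap = (px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1)
                   - 4.0 * px(y, x))
            row.append(abs(lap))
        out.append(row)
    return out


def high_frequency_map(img):
    """Laplacian magnitude normalized to [0, 1]; all zeros for a flat image."""
    mag = laplacian_magnitude(img)
    peak = max(max(row) for row in mag)
    if peak == 0.0:
        return [[0.0] * len(row) for row in mag]
    return [[v / peak for v in row] for row in mag]


def detail_aware_loss(pred, target, mask, weight=4.0):
    """Masked MSE where each pixel's error is scaled by (1 + weight * hf).

    hf is the high-frequency map of the target, so errors on edges and
    fine texture cost more than errors in flat regions.
    """
    hf = high_frequency_map(target)
    total, count = 0.0, 0
    for y in range(len(target)):
        for x in range(len(target[0])):
            if mask[y][x]:
                err = (pred[y][x] - target[y][x]) ** 2
                total += (1.0 + weight * hf[y][x]) * err
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # A 5x5 target with a vertical step edge between columns 1 and 2.
    target = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
    mask = [[1] * 5 for _ in range(5)]

    # Same-sized error placed in a flat region vs. next to the edge.
    pred_flat = [row[:] for row in target]
    pred_flat[2][0] += 0.1
    pred_edge = [row[:] for row in target]
    pred_edge[2][1] += 0.1

    print("flat-region error:", detail_aware_loss(pred_flat, target, mask))
    print("edge error:       ", detail_aware_loss(pred_edge, target, mask))
```

In the demo, the edge error is penalized more heavily than an identical error in a flat region. This is the kind of emphasis on fine detail that a high-frequency-guided loss is meant to provide. A real training loss would run on batched tensors and might combine this term with the model's usual objective.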

Community

donghao-zhou (paper author and submitter)

[🔥CVPR 2026] HiFi-Inpaint enables high-fidelity reference-based inpainting. HiFi-Inpaint can seamlessly integrate product reference images into masked human images, generating high-quality human-product images with high-fidelity detail preservation.