Abstract
Image fusion is widely used across many fields. The fusion process requires models that both extract semantic information and preserve fine details. Traditional image processing techniques have limited ability to extract semantic features from images, while modern deep learning techniques often lose fine details. In this work, we propose the Controlled Fusion Network (CFN), which adopts a multi-step progressive generation method and injects control elements at every step. We evaluate the model on an emoji fusion task, in which it accepts multiple emojis and combines them. We find that the generated emojis sufficiently retain and reasonably combine the semantic information of the input images, while the resulting images also conform to human intuitive perception.
Type
Publication
The Second Tiny Papers Track at ICLR 2024