Continual Few-shot Patch-based Learning for Anime-style Colorization

Akinobu Maejima, Seitaro Shinagawa, Hiroyuki Kubo, Takuya Funatomi,
Tatsuo Yotsukura, Satoshi Nakamura, Yasuhiro Mukaigawa

OLM Digital, Inc., IMAGICA GROUP Inc.
Nara Institute of Science and Technology (NAIST)
Chiba University
Computational Visual Media 2024


Our method colorizes line-drawing images (first and third rows) using only a few line drawings (typically 1 to 5 out of 10 to 20) that an artist has manually colorized as training data (surrounded by blue dashed lines). Our continual patch-based learning enables us to train a model tailored to colorizing the target sequence within 90 s.

Abstract

The automatic colorization of anime line drawings is a challenging problem in production pipelines. Recent advances in deep neural networks have addressed this problem; however, collecting many images of colorization targets in novel anime work before the colorization process starts leads to chicken-and-egg problems and has become an obstacle to using them in production pipelines. To overcome this obstacle, we propose a new patch-based learning method for few-shot anime-style colorization. The learning method adopts an efficient patch sampling technique with position embedding according to the characteristics of anime line drawings. We also present a continual learning strategy that continuously updates our colorization model using new samples colorized by human artists. The advantage of our method is that it can learn our colorization model from scratch or pre-trained weights only using a few pre- and post-colorized line drawings that are created by artists in their usual colorization work. Therefore, our method can be easily implemented into existing production pipelines. We demonstrated that our colorization method outperformed state-of-the-art methods using a quantitative evaluation.

Method

Colorization procedures. Given a few pre- and post-colorized line drawings from reference frames in the target sequence, the colorization model is trained using the proposed anime-specific patch-based learning. The line drawings of the remaining frames are then colorized frame-by-frame using the colorization model. Note that all processes run on the sequence to be colorized. Images originate from Deadline (c) OLM Asia SDN BHD.
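This page does not spell out the patch sampling rule or the form of the position embedding, so the following is only a minimal sketch under stated assumptions: patches are centered on line (ink) pixels so that blank paper is not oversampled, and the position embedding is the patch center normalized to [0, 1]. The function name `sample_patches`, the pixel encoding (0 for ink, 255 for paper), and all parameters are hypothetical and are not taken from the paper.

```python
import random

def sample_patches(line_img, color_img, patch_size, n_patches, seed=0):
    """Sample aligned (line patch, color patch, position) triples from one reference frame.

    line_img  : 2D list of ints; 0 = ink pixel, 255 = blank paper (assumed encoding)
    color_img : 2D list of color labels with the same size as line_img
    """
    rng = random.Random(seed)
    h, w = len(line_img), len(line_img[0])
    half = patch_size // 2
    # Assumed heuristic: only ink pixels whose patch fits inside the image are candidate centers.
    centers = [(y, x)
               for y in range(half, h - half)
               for x in range(half, w - half)
               if line_img[y][x] == 0]
    if not centers:
        raise ValueError("no line pixel far enough from the image border")
    samples = []
    for _ in range(n_patches):
        cy, cx = rng.choice(centers)
        top, left = cy - half, cx - half
        line_patch = [row[left:left + patch_size] for row in line_img[top:top + patch_size]]
        color_patch = [row[left:left + patch_size] for row in color_img[top:top + patch_size]]
        # Assumed position embedding: patch center normalized to [0, 1] along each axis.
        pos = (cy / max(h - 1, 1), cx / max(w - 1, 1))
        samples.append((line_patch, color_patch, pos))
    return samples
```

In a setup like this, each reference frame yields many training triples, which is one plausible reason a handful of colorized frames can be enough to fit a sequence-specific model.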

Colorization Results

To demonstrate the effectiveness of our method, we colorized the line drawings of all shots in our hand-drawn dataset using the following methods and compared their colorization accuracy. We used mean Intersection-over-Union (mIoU) and region-wise accuracy (Region-wise Acc.) as the accuracy criteria. Our method is simple to implement, yet it achieves state-of-the-art colorization accuracy.

SGA /w f. t. [Li+ 2022]: Li, Z.; Geng, Z.; Kang, Z.; Chen, W.; Yang, Y. Eliminating gradient conflict in reference-based line-art colorization. In: European Conference on Computer Vision, 579–596, 2022. (We fine-tuned their pre-trained model using 1 to 5 reference images for each sequence with 400 epochs.)
LAVC /w f. t. [Shi+ 2023]: Shi, M.; Zhang, J. -Q.; Chen, S. -Y.; Gao, L.; Lai, Y. -K.; Zhang, F. -L. Reference-based deep line art video colorization. In: IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 6, 2965–2979, 2023. (We fine-tuned their pre-trained model using sequential triplets extracted from reference images for 250 iterations.)
Cadmium App [Casey+ 2021]: Casey, E.; Pérez, V.; Li, Z. The Animation Transformer: Visual correspondence via segment matching. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 11323–11332, 2021.
Inclusion Matching [Dai+ 2024]: Dai, Y.; Zhou, S.; Li, Q.; Li, C.; Loy, C. -C. Learning Inclusion Matching for Animation Paint Bucket Colorization. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. (We used their pre-trained model (BasicPBC) and code with nearest mode, as released on 25.04.2024.)
Ours from scratch: used our colorization model trained using 1 to 5 reference images for each sequence from scratch.
Ours: used our colorization model trained using 1 to 5 reference images for each sequence with the continual learning strategy.
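The page names the two accuracy criteria but does not define how they are computed. As a rough illustration only, the sketch below assumes that mIoU is averaged over every color label occurring in the prediction or the ground truth, and that a region counts as correct when its majority predicted label equals its majority ground-truth label. The function names, the flat label-list inputs, and the region-ID map are hypothetical.

```python
from collections import Counter

def mean_iou(pred, gt):
    """Mean IoU over the color labels that occur in pred or gt (non-empty flat label lists)."""
    labels = set(pred) | set(gt)
    ious = []
    for c in labels:
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        ious.append(inter / union)  # union > 0 because c occurs in pred or gt
    return sum(ious) / len(ious)

def region_accuracy(pred, gt, regions):
    """Fraction of regions whose majority predicted label matches the majority true label."""
    votes = {}
    for p, g, r in zip(pred, gt, regions):
        pred_votes, gt_votes = votes.setdefault(r, (Counter(), Counter()))
        pred_votes[p] += 1
        gt_votes[g] += 1
    correct = sum(1 for pv, gv in votes.values()
                  if pv.most_common(1)[0][0] == gv.most_common(1)[0][0])
    return correct / len(votes)

# Toy example: six pixels, three regions, three color labels.
pred    = [1, 1, 2, 2, 3, 3]
gt      = [1, 1, 3, 3, 3, 3]
regions = [0, 0, 1, 1, 1, 2]
print(mean_iou(pred, gt))                  # 0.5 (IoUs: 1.0, 0.0, 0.5)
print(region_accuracy(pred, gt, regions))  # 2 of 3 regions are correct
```

The toy example also shows why the two criteria can rank methods differently: mIoU weights every pixel of a color, while region-wise accuracy gives a small region the same weight as a large one.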

In each video below, images surrounded by dashed blue boxes are the reference frames used for training and were not counted in the evaluation. Magenta pixels indicate predictions that differ from the manual colorization.
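An error visualization of this kind can be sketched as follows. This is an illustrative assumption about how such an overlay might be produced, not the authors' tool: images are nested lists of RGB tuples, and `error_overlay` is a hypothetical helper name.

```python
MAGENTA = (255, 0, 255)

def error_overlay(pred, gt):
    """Return a copy of pred in which every pixel that differs from gt is painted magenta."""
    return [[p if p == g else MAGENTA for p, g in zip(pred_row, gt_row)]
            for pred_row, gt_row in zip(pred, gt)]
```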

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	24.66	8.73	46.44	61.03	53.08	71.59
Region-wise Acc. [%]	29.14	9.75	48.65	69.37	62.01	69.64

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	38.28	12.61	57.76	64.91	59.37	67.10
Region-wise Acc. [%]	33.41	7.36	44.79	59.17	58.76	60.34

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	22.61	16.85	57.73	64.64	65.99	77.20
Region-wise Acc. [%]	24.13	10.94	60.82	77.15	68.31	69.46

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	29.63	23.51	68.23	72.61	61.21	83.14
Region-wise Acc. [%]	30.03	27.13	58.15	69.64	73.30	77.80

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	41.39	25.07	78.52	72.05	84.12	80.84
Region-wise Acc. [%]	39.66	34.51	69.47	80.91	76.07	75.50

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	18.48	10.38	63.67	70.45	71.54	74.72
Region-wise Acc. [%]	24.32	16.01	54.35	66.60	68.34	72.31

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	33.38	22.01	73.70	79.40	83.62	87.74
Region-wise Acc. [%]	36.22	32.84	70.71	66.98	73.82	77.80

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	40.46	19.02	71.99	76.91	76.82	78.30
Region-wise Acc. [%]	37.03	19.37	67.41	73.58	70.44	72.27

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	31.87	26.93	61.86	63.98	56.99	68.39
Region-wise Acc. [%]	35.26	27.97	51.90	55.01	47.32	55.48

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	21.65	17.91	78.87	72.86	81.37	80.52
Region-wise Acc. [%]	10.64	8.86	46.45	64.31	46.83	46.37

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	55.63	30.38	87.41	86.58	88.37	92.13
Region-wise Acc. [%]	43.24	30.71	78.67	76.43	78.24	82.26

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	62.86	46.98	94.24	91.57	97.74	98.11
Region-wise Acc. [%]	41.34	33.06	71.76	88.71	86.99	86.06

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	42.64	33.90	88.92	86.25	74.11	75.38
Region-wise Acc. [%]	34.54	21.66	74.76	70.97	74.32	71.97

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	33.06	26.91	71.02	73.04	74.55	83.16
Region-wise Acc. [%]	16.72	13.80	60.40	73.07	65.12	73.49

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	29.65	36.39	83.11	89.94	84.22	89.98
Region-wise Acc. [%]	25.97	17.89	59.91	78.84	69.65	73.25

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	62.44	63.92	92.46	92.46	92.11	92.01
Region-wise Acc. [%]	70.09	66.89	91.39	85.07	88.82	88.87

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	65.22	53.39	87.38	92.51	93.27	94.18
Region-wise Acc. [%]	54.25	44.11	78.29	85.83	85.61	85.21

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	51.26	46.42	89.63	94.25	93.23	93.60
Region-wise Acc. [%]	31.31	23.93	73.74	82.53	74.82	76.25

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	20.21	18.05	52.85	47.17	79.15	80.66
Region-wise Acc. [%]	17.68	14.72	42.39	42.88	64.21	65.41

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	21.76	8.29	69.01	73.58	45.24	47.56
Region-wise Acc. [%]	18.90	10.54	61.13	69.38	46.02	45.54

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	22.99	7.43	49.34	53.73	57.74	65.01
Region-wise Acc. [%]	24.45	10.03	48.49	55.47	60.17	66.79

Metric	SGA /w f. t. [Li+ 2022]	LAVC /w f. t. [Shi+ 2023]	Cadmium App [Casey+ 2021]	Inclusion Matching [Dai+ 2024]	Ours from scratch	Ours
mIoU [%]	49.94	40.51	58.90	54.93	78.75	82.55
Region-wise Acc. [%]	37.80	34.81	56.99	64.33	62.03	68.44

Citation


@article{cfplac_2024, 
title={Continual few-shot patch-based learning for anime-style colorization}, 
volume={}, 
ISSN={2096-0662}, 
url={https://doi.org/10.1007/s41095-024-0414-4}, 
DOI={10.1007/s41095-024-0414-4}, 
number={}, 
journal={Computational Visual Media}, 
publisher={Springer Science and Business Media LLC}, 
author={Maejima, Akinobu and Shinagawa, Seitaro and Kubo, Hiroyuki and Funatomi, Takuya and Yotsukura, Tatsuo and Nakamura, Satoshi and Mukaigawa, Yasuhiro}, 
year={2024}, 
month=jul, 
pages={} }

Acknowledgements

We would like to thank the anonymous reviewers for their constructive comments. We are grateful to Zekun Li and Prof. Fang Lue Zhang for providing their code and data for comparison. We thank Mohammad Shafiq Bin Md Shawal, Muhammad Mohamad Din Yati, Yap Fei, Raihanah Ayuna Faiz, Kiyoumi Agemura, and Shogo Sakurazawa for testing our colorization system and for providing valuable feedback from the production side. Thanks to Ken Anjyo, Marc Salvati, and Alexandre Derouet-Jourdan for reviewing the paper and providing feedback. Finally, we would like to express our thanks and appreciation to Junpei Inuzuka and IMAGICA INFOS for permission to use images from Restaurant to Another World 2 for research purposes.

Contact

If you have any questions or requests, please feel free to reach out to me at akinobu.maejima (at) olm.co.jp.