2025年度 / Apr. 2025 〜 Mar. 2026 の研究に関わる業績(研究発表・論文・外部資金・受賞等)です。
This list shows the research achievements (papers, presentations, external grants, and awards) for the 2025 academic year (Apr. 2025 to Mar. 2026).

  • 修士論文 / Master’s Thesis
  • 卒業論文 / Bachelor's Thesis
  • 雑誌論文 (Journal Papers): 6 papers
      • Yikang Wang, Xingming Wang, Chee Siang Leow, Qishan Zhang, Ming Li, Hiromitsu Nishizaki, “Enhancing the Robustness of Speech Anti-spoofing Countermeasures through Joint Optimization and Transfer Learning,” IEICE Transactions on Information and Systems, Vol. E108-D, No. 12, pp. -, Dec. 2025, IF: 0.8, DOI: 10.1587/transinf.2025EDP7044
        Abstract: Currently, research in deepfake speech detection focuses on the generalization of detection systems towards different spoofing methods, mainly for noise-free clean speech. However, speech anti-spoofing countermeasure (CM) systems often do not perform well in more complicated scenarios, such as those involving noise and reverberation. To address the problem of enhancing the robustness of CM systems, we propose a transfer learning-based hybrid approach with Speech Enhancement front-end and Counter Measure back-end Joint optimization (SECM-Joint), investigating its effectiveness in improving robustness against noise and reverberation. Experimental results show that our SECM-Joint method reduces EER by 19.11% to 64.05% relatively in most noisy conditions and 23.23% to 30.67% relatively in reverberant environments compared to a Conformer-based CM baseline system without pre-training. Additionally, our dual-path U-Net (DUMENet) further enhances the robustness for real-world applications. These results demonstrate that the proposed method effectively enhances the robustness of CM systems in noisy and reverberant conditions. Codes and experimental data supporting this work are publicly available at: https://github.com/ikou-austin/SECM-Joint (A simplified sketch of the joint SE and CM objective appears in the appendix at the end of this page.)
      • Hwai Ing Soon, Azian Azamimi Abdullah, Hiromitsu Nishizaki, Latifah Munirah Kamarudin, “Optimizing mRNA Vaccine Degradation Prediction via Penalized Dropout Approaches,” IEEE Access, vol. 13, pp. 137561-137578, 2025, ISSN: 2169-3536, DOI: 10.1109/ACCESS.2025.3595155, IF: 3.6
        Abstract: Predicting mRNA vaccine degradation rates with precision is essential for ensuring stability, efficacy, and optimal deployment strategies, particularly given the unique challenges posed by their rapid degradation. This study introduces a comprehensive approach that integrates bioinformatic insights with advanced computational methodologies to address these challenges. A novel tetramer-label encoding approach (4-mer-lbA) is proposed, integrating biological relevance with data-driven analysis to enhance predictive accuracy. To further optimize model performance, two advanced hyperparameter optimization (HPO) techniques—Dropout-Enhanced Technique (DEet) and Hyperparameter Optimization Algorithm Penalizer (HOPeR)—are proposed to mitigate overfitting, address inefficiencies in conventional HPO algorithms (HPOAs), and accelerate model convergence. The methodologies were validated on mRNA degradation datasets using deep neural network (DNN) architectures, with particular attention to the comparative performance of sequential and wrapped architectural designs. Results demonstrate that sequential architectures outperform wrapped models by reducing overfitting and computational demands. The integration of DEet and HOPeR further optimized hyperparameter exploration, with DEet enhancing model robustness through dropout regularization and HOPeR introducing adaptive penalties to systematically eliminate suboptimal configurations. The experimental outcomes highlight significant advancements in convergence rates and error reduction, particularly in complex models like 3-layer-wrapped Bidirectional Long Short-Term Memory (3wBiLSTM). By the 100th epoch, training and validation losses reached 0.0023 and 0.0029, respectively, indicating a substantial improvement over baseline models. These methodologies extend beyond mRNA vaccine research, demonstrating versatility across diverse machine learning domains. By addressing critical challenges in HPO and predictive modeling, the study offers scalable and robust solutions for advancing biotechnology and interdisciplinary research. (A toy illustration of penalty-based hyperparameter search appears in the appendix at the end of this page.)
      • Chee Siang Leow, Tomoki Kitagawa, Hideaki Yajima, Hiromitsu Nishizaki, “Handwritten Character Image Generation for Effective Data Augmentation,” IEICE Transactions on Information and Systems, Vol. E108-D, No. 8, pp.1-10, Aug. 2025, DOI: 10.1587/transinf.2024EDP7201, IF: 0.8
        Abstract: This study introduces data augmentation techniques to enhance training datasets for a Japanese handwritten character classification model, addressing the high cost of collecting extensive handwritten character data. A novel method is proposed to automatically generate a large-scale dataset of handwritten characters from a smaller dataset, utilizing a style transformation approach, particularly Adaptive Instance Normalization (AdaIN). Additionally, the study presents an innovative technique to improve character structural information by integrating features from the Contrastive Language-Image Pre-training (CLIP) text encoder. This approach enables the creation of diverse handwritten character images, including Kanji, by merging content and style elements. The effectiveness of our approach is demonstrated by evaluating a handwritten character classification model using an expanded dataset, which includes Japanese hiragana, katakana, and Kanji from the ETL Character Database. The character classification model's macro F1 score improved from 0.9733 with the original dataset to 0.9861 using the augmented dataset by the proposed approach. This result indicated that our proposed character generation model was able to generate new character images that were not included in the original dataset and that they effectively contributed to training the handwritten character classification model. (A standard AdaIN sketch appears in the appendix at the end of this page.)
      • Ryosuke Shimazu, Chee Siang Leow, Prawit Buayai, Xiaoyang Mao, Wan-Young Chung, Hiromitsu Nishizaki, “Non-invasive estimation of Shine Muscat grape color and sensory evaluation from standard camera images,” The Visual Computer, pp.1-16, May 2025, DOI: 10.1007/s00371-025-03925-6, IF: 3.0, (international co-authored paper)
        Abstract: This study proposes a non-invasive method to estimate both color and sensory attributes of Shine Muscat grapes from standard camera images. First, we focus on color estimation by integrating a Vision Transformer (ViT) feature extractor with interquartile range (IQR)-based outlier removal. Experimental results show that our approach achieves 97.2% accuracy, significantly outperforming Convolutional Neural Network (CNN) models. This improvement underscores the importance of capturing global contextual information to differentiate subtle color variations in grape ripeness. Second, we address human sensory evaluation by collecting questionnaire responses on 13 attributes (e.g., “Sweetness,” “Overall taste rating”), each rated on a five-point scale. Because these ratings tend to cluster around midrange values (labels “2,” “3,” and “4”), we initially limit the dataset to the extreme labels “1” (“lowest grade”) and “5” (“highest grade”) for binary classification. Three attributes—“Overall color,” “Sweetness,” and “Overall taste rating”—exhibit relatively high classification accuracies of 79.9%, 75.1%, and 75.7%, respectively. By contrast, the other 10 attributes reach only 50%–66%, suggesting that subjective variations and limited visual cues pose significant challenges. Overall, the proposed approach demonstrates the feasibility of an image-based system that integrates color estimation and sensory evaluation to support more objective, data-driven harvest timing decisions for Shine Muscat grapes. (A minimal IQR filtering sketch appears in the appendix at the end of this page.)
      • Taoqi Bao, Jiangnan Ye, Zhankong Bao, Chee Siang Leow, Haoji Hu, Jianfeng Lu, Issei Fujishiro, Jiayi Xu, “L2H-NeRF: low- to high-frequency-guided NeRF for 3D reconstruction with a few input scenes,” The Visual Computer, pp.1-12, May 2025, DOI: 10.1007/s00371-025-03974-x, IF: 3.0, (international co-authored paper)
        Abstract: Nowadays, three-dimensional (3D) reconstruction techniques are becoming increasingly important in the fields of architecture, game development, movie production, and more. Due to common issues in the reconstruction process, such as perspective distortion and occlusion, traditional 3D reconstruction methods face significant challenges in achieving high-precision results, even when dense data are used as inputs. With the advent of neural radiance field (NeRF) technology, high-fidelity 3D reconstruction results are now possible. However, high computational resources are usually required for NeRF computations, so recent work reconstructs from only a few input views while still aiming for high quality. In this paper, we propose an innovative low- to high-frequency-guided NeRF (L2H-NeRF) framework that decomposes scene reconstruction into coarse and fine stages. For the first stage, a low-frequency enhancement network based on a vision transformer is proposed, where the low-frequency-based globally coherent geometric structure is recovered, with the dense depth restored in a depth completion way. In the second stage, a high-frequency enhancement network is incorporated, where the high-frequency-related detail is compensated by robust feature alignment across adjacent views using a plug-and-play feature extraction and matching module. Experiments demonstrate that both the accuracy of the geometric structure and the feature detail of the proposed L2H-NeRF outperform state-of-the-art methods. (A generic low/high-frequency decomposition sketch appears in the appendix at the end of this page.)
      • Ziying Li, Haifeng Zhao, Hiromitsu Nishizaki, Chee Siang Leow, Xingfa Shen, “Chinese Character Recognition based on Swin Transformer-Encoder,” Digital Signal Processing, Vol. 161, No. C, 105080, pp. 1-10, May 2025, DOI: 10.1016/j.dsp.2025.105080, IF: 2.9, (international co-authored paper)
        Abstract: Optical Character Recognition (OCR) technology, which converts printed or handwritten text into machine-readable text, holds significant application and research value in document digitization, information automation, and multilingual support. However, existing methods predominantly focus on English text recognition and often struggle with addressing the complexities of Chinese characters. This study proposes a Chinese text recognition model based on the Swin Transformer encoder, demonstrating its remarkable adaptability to Chinese character recognition. In the image preprocessing stage, we introduced an overlapping segmentation technique that enables the encoder to effectively capture the complex structural relationships between individual strokes in lengthy Chinese texts. Additionally, by incorporating a mapping layer between the encoder and decoder, we enhanced the Swin Transformer's adaptability to small image scenarios, thereby improving its feasibility for Chinese text recognition tasks. Experimental results indicate that this model outperforms classical models such as CRNN and ASTER on handwritten and web-based datasets, validating its robustness and reliability. (A minimal overlapping-window sketch appears in the appendix at the end of this page.)
  • 国際会議論文 (Peer-reviewed International Conference Papers): 4 papers
      • One paper has been accepted at IECON 2025.
      • Two papers have been accepted at GCCE 2025.
      • Yuxi Wang, Yikang Wang, Qishan Zhang, Hiromitsu Nishizaki, Ming Li, “VCapAV: A Video-Caption Based Audio-Visual Deepfake Detection Dataset,” Proceedings of INTERSPEECH 2025, pp. 3908-3912, Aug. 2025, presented in Rotterdam, The Netherlands, DOI: 10.21437/Interspeech.2025-1713
  • 口頭発表 (Oral Presentations at Domestic Conferences, not peer-reviewed): 6 papers
  • 外部資金新規採択分 (External Grants, newly accepted)
      • 「人工知能技術を活用した言語バリアフリー授業の実現」 (Realization of Language Barrier-Free Classes Using Artificial Intelligence Technology), JSPS KAKENHI Grant-in-Aid for Scientific Research (A), Grant Number 25H00566, Apr. 2025 〜 Mar. 2029, Hiromitsu Nishizaki (Principal Investigator)
  • 書籍・雑誌記事 (Books and Magazine Articles)
  • 学外授業・セミナー (External Lectures and Seminars)
  • 表彰・報道等 (Awards, Media Coverage, etc.)
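
  • 補足: コードスケッチ (Appendix: Illustrative Code Sketches)
      The short Python sketches below illustrate, in deliberately simplified form, some of the techniques named in the abstracts above. They are editorial illustrations written under stated assumptions, not the authors' released implementations.

      SECM-Joint: the paper jointly optimizes a speech-enhancement (SE) front-end and a countermeasure (CM) back-end. A minimal sketch of one plausible joint objective follows; the tiny convolutional stand-ins and the 0.1 loss weight are assumptions, not the paper's DUMENet/Conformer models (the real code is in the authors' repository linked above).

        import torch
        import torch.nn as nn

        # Hypothetical stand-ins: the paper uses a DUMENet-style enhancer and a
        # Conformer-based classifier; tiny Conv1d stacks keep this runnable.
        se_model = nn.Sequential(nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
                                 nn.Conv1d(16, 1, 5, padding=2))
        cm_model = nn.Sequential(nn.Conv1d(1, 8, 5, padding=2), nn.ReLU(),
                                 nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                                 nn.Linear(8, 2))  # bona fide vs. spoofed

        opt = torch.optim.Adam(list(se_model.parameters()) +
                               list(cm_model.parameters()), lr=1e-4)
        mse, ce = nn.MSELoss(), nn.CrossEntropyLoss()

        noisy = torch.randn(4, 1, 16000)    # noisy/reverberant waveforms (toy)
        clean = torch.randn(4, 1, 16000)    # paired clean references (toy)
        labels = torch.randint(0, 2, (4,))  # spoof labels (toy)

        # Joint optimization: the CM classification loss backpropagates through
        # the SE front-end, while the MSE term keeps its output close to clean.
        enhanced = se_model(noisy)
        loss = ce(cm_model(enhanced), labels) + 0.1 * mse(enhanced, clean)
        opt.zero_grad(); loss.backward(); opt.step()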
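      DEet/HOPeR: the mRNA-degradation paper introduces dropout-enhanced HPO and an adaptive penalizer that prunes suboptimal configurations. The toy random search below only illustrates the general idea of penalizing the neighborhood of poor configurations; it is not the HOPeR algorithm itself, and the search space is a made-up example.

        import random

        def penalized_search(objective, space, n_trials=100, radius=0.15):
            """Random search that discards candidates near previously poor
            configurations: a simplified stand-in for adaptive penalties."""
            rejected, best_cfg, best_score = [], None, float("inf")
            for _ in range(n_trials):
                cfg = {k: random.uniform(lo, hi) for k, (lo, hi) in space.items()}
                # Penalize (here: skip) points close to an already-rejected one.
                if any(all(abs(cfg[k] - r[k]) <= radius * (hi - lo)
                           for k, (lo, hi) in space.items()) for r in rejected):
                    continue
                score = objective(cfg)
                if score < best_score:
                    best_cfg, best_score = cfg, score
                else:
                    rejected.append(cfg)
            return best_cfg, best_score

        # Toy usage: pretend validation loss depends on dropout rate and lr.
        space = {"dropout": (0.0, 0.8), "lr": (1e-4, 1e-1)}
        best, _ = penalized_search(lambda c: (c["dropout"] - 0.3) ** 2 + c["lr"], space)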
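      AdaIN: the handwritten-character paper generates new character images by transferring style statistics onto content features. Adaptive Instance Normalization itself has a standard closed form, shown below on generic feature maps; the paper's encoder/decoder and CLIP-based structural features are omitted here.

        import torch

        def adain(content, style, eps=1e-5):
            """AdaIN(x, y): re-normalize content features x so their
            per-channel mean/std match those of the style features y."""
            c_mean = content.mean(dim=(2, 3), keepdim=True)
            c_std = content.std(dim=(2, 3), keepdim=True) + eps
            s_mean = style.mean(dim=(2, 3), keepdim=True)
            s_std = style.std(dim=(2, 3), keepdim=True)
            return s_std * (content - c_mean) / c_std + s_mean

        # Toy usage on random (N, C, H, W) feature maps.
        stylized = adain(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))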
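      IQR-based outlier removal: the Shine Muscat paper removes outliers with the interquartile range before color estimation. The standard IQR rule is shown below; the 1.5 multiplier is the conventional default, and the paper's exact settings may differ.

        import numpy as np

        def iqr_filter(values, k=1.5):
            """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
            q1, q3 = np.percentile(values, [25, 75])
            iqr = q3 - q1
            lo, hi = q1 - k * iqr, q3 + k * iqr
            return values[(values >= lo) & (values <= hi)]

        # Toy usage: drop an extreme reading from a batch of color measurements.
        kept = iqr_filter(np.array([0.31, 0.29, 0.33, 0.30, 0.95, 0.28]))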
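      L2H-NeRF: the paper guides reconstruction from low-frequency structure to high-frequency detail. The generic pixel-space decomposition below only illustrates what a low/high-frequency split means; the paper's two stages use learned networks on scene representations, not a Gaussian blur on raw pixels.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def split_frequencies(img, sigma=3.0):
            """Split an image into a smooth low-frequency base plus a
            high-frequency residual carrying edges and fine detail."""
            low = gaussian_filter(img.astype(np.float32), sigma=sigma)
            high = img.astype(np.float32) - low
            return low, high

        # Toy usage on a random grayscale image.
        low, high = split_frequencies(np.random.rand(128, 128))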
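      Overlapping segmentation: the Swin Transformer OCR paper slices long Chinese text images so that strokes crossing a cut are still seen whole in a neighboring window. A minimal sliding-window version is sketched below; the window and stride sizes are illustrative assumptions, and tail pixels that do not fill a complete window are dropped for brevity.

        import numpy as np

        def overlapping_windows(line_img, win=64, stride=32):
            """Cut a (H, W) text-line image into width-`win` windows that
            overlap by `win - stride` pixels along the horizontal axis."""
            h, w = line_img.shape
            starts = range(0, max(w - win, 0) + 1, stride)
            return np.stack([line_img[:, x:x + win] for x in starts])

        # Toy usage: a 64x400 line image yields eleven overlapping 64x64 windows.
        windows = overlapping_windows(np.zeros((64, 400)))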