2023年度 / Achievements in 2023-2024

2023年度の研究に関わる業績（研究発表・論文・外部資金・受賞等）です。
This list covers presentations, published papers, external grants, and awards from the 2023-2024 academic year.

2023年度 / Apr. 2023 〜 Mar. 2024

博士論文 / Doctoral Theses
- Tasneem Binti Sofri： Intelligent Hybrid Multi-Stage Feature Selection and Assessment for 5G Base Station Antenna Health Effect Detection ※Dual Degree Student with Universiti Malaysia Perlis
- Leow Chee Siang：Studies on Text Detection and Character Image Generation for Advanced Text Recognition
  Abstract: This doctoral thesis focuses on improving the accuracy of Optical Character Recognition (OCR) using deep learning techniques. The research is divided into three main topics: 1) enhancing OCR accuracy through data augmentation using deep learning-based generative models, 2) improving the recognition of narrow multi-line text, and 3) developing methods for multi-line text recognition. The thesis proposes a novel Y-Autoencoder (Y-AE) model for generating diverse character images and introduces a post-processing method to improve character recognition rates in existing deep learning models, particularly for characters with narrow line spacing. Additionally, the research addresses the limitations of conventional TrOCR systems by proposing a pre-processing technique for multi-line character recognition within TrOCR’s fixed-size input constraints. The thesis contributes to advancing text recognition, text detection, and multi-line text recognition using deep learning techniques.
- 王宇：A Study on Real-Time Automatic Speech Recognition System on Edge Devices
  Abstract: This doctoral thesis aims to develop speech recognition systems for deployment on edge devices and cloud platforms. For cloud-based systems, a toolkit named ExKaldi-RT was created to facilitate the integration of deep learning models with the Kaldi decoder, enabling high-accuracy real-time speech recognition. For edge devices, a lightweight end-to-end speech recognition model based on convolutional neural networks and an optimized beam search decoder were proposed, significantly reducing memory usage while maintaining high recognition accuracy. Additionally, a noise-robust voice activity detection (VAD) model was developed for edge devices, enhancing the overall performance of the speech recognition system in noisy environments. The research contributes to the advancement of speech recognition technology by providing tailored solutions for both cloud and edge devices, enabling efficient deployment and high-quality performance.

修士論文 / Master's Theses
- 何英浩：魚眼顔認識の向上ための顔画像データ拡張方法
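  Abstract: This study proposes a CNN-based method to improve recognition accuracy for face images captured by fisheye cameras. First, data augmentation methods were examined to generate images containing distortion. Next, an STN was used to correct image distortion, followed by feature extraction with HRNetV2 and identification with ArcFace. The proposed method raised the F1 score to 0.669, showing that it is effective for face recognition in fisheye images.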
- 北川智樹：文字認識器訓練のための生成モデルを用いた手書き文字生成　　※優秀発表賞受賞
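  Abstract: This study proposes a deep learning method for generating images of handwritten Japanese characters. A model combining AdaIN and CLIP generated character images that reflect a specified font style. Training an OCR character recognition model with the generated images improved its recognition accuracy. The results indicate that the proposed method is a promising approach to alleviating the shortage of OCR training data.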
- 土橋晃弘：対照学習を用いたEnd-to-End複数言語音声認識モデル
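  Abstract: This study proposes a wav2vec 2.0-based method using contrastive learning to improve multilingual end-to-end speech recognition. Pre-training with contrastive learning on speech data from multiple languages yielded general-purpose, language-independent features. The proposed method also generalized well to languages unseen in training, making it a promising approach toward practical multilingual speech recognition.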
- Zhao Haifeng：Chinese Character Recognition Based on Swin Transformer-Encoder ※ Dual Degree Student with HDU
卒業論文 / Bachelor's Theses
- 白鳥雅也：講演音声の自動評価に向けた印象データ収集と解析
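  Abstract: Using lecture speech from the Corpus of Spontaneous Japanese (CSJ), this study examined the relationship between acoustic features and auditory impression ratings. Factor analysis extracted three factors: "ease of listening and acceptability of the speech," "negative impression of the speaker," and "nuance and delicacy of expression." The correlations between each factor and the acoustic features were also clarified.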
- 市川琢也：人間ロボット連携のためのブドウ栽培管理アプリケーション
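  Abstract: This study developed a grape cultivation management system. It manages information for each bunch using QR codes and implements functions such as checking field location information and work history. The web application was built with a Django back end and a React front end, and user tests with farmers confirmed its usability.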
- 島津亮輔：自己注意機構を備えた深層学習モデルを用いたシャインマスカットの色推定　※優秀発表賞受賞
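  Abstract: This study proposes using a Vision Transformer (ViT) for color-based estimation of grape ripeness. To verify its effectiveness, ViT was compared experimentally with a CNN, and ViT achieved higher accuracy. The effect of color space conversion on accuracy was also examined. The results suggest that ViT is promising for grape ripeness estimation.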
- 藤本蓮：ブドウ栽培支援ロボットのための自律移動システムの開発　※優秀発表賞受賞
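  Abstract: This study developed an autonomous navigation system for a grape cultivation support robot. QR codes and high-precision GNSS were used for the robot's self-localization and path following. YOLO was used for QR code detection, and autonomous navigation was realized with ROS 2 and Nav2. In experiments, the destination arrival success rate was 83.3% and the mean error was 0.5 m.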
- 渡辺蒼：接客訓練のための複数の大規模言語モデルを用いた接客応答生成　※優秀発表賞受賞
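  Abstract: This study proposes ReConcile, a reasoning system using large language models (LLMs). ReConcile performs reasoning through discussion and consensus building among multiple LLMs. A comparison of ReConcile with the individual LLMs GPT-4, Claude2, and Bard showed that ReConcile was more accurate than any single LLM. The results indicate that ReConcile is a promising method for making effective use of multiple LLMs.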
雑誌論文 (Journal papers)
- 西崎博光，”生成AIのこれまでの変遷と展望”，電子情報通信学会通信ソサイエティマガジン，No.68，2024春号，pp.281-284，2024年3月1日 DOI:10.1587/bplus.17.281
- Hui Fern Soon, Amiza Amir, Hiromitsu Nishizaki, Nik Adilah Hanin Zahri, Latifah Munirah Kamarudin and Saidatul Norlyana Azemi, “Evaluating Tree-based Ensemble Strategies for Imbalanced Network Attack Classification,” International Journal of Advanced Computer Science and Applications (IJACSA), Vol.15, No.1, Jan/2024. DOI:10.14569/IJACSA.2024.01501111, IF:1.162
  Abstract: With the continual evolution of cybersecurity threats, the development of effective intrusion detection systems is increasingly crucial and challenging. This study tackles these challenges by exploring imbalanced multiclass classification, a common situation in network intrusion datasets mirroring real-world scenarios. The paper aims to empirically assess the performance of diverse classification algorithms in managing imbalanced class distributions. Experiments were conducted using the UNSW-NB15 network intrusion detection benchmark dataset, comprising ten highly imbalanced classes. The evaluation includes basic, traditional algorithms like the Decision Tree, K-Nearest Neighbor, and Gaussian Naive Bayes, as well as advanced ensemble methods such as Gradient Boosted Decision Trees (GraBoost) and AdaBoost. Our findings reveal that the Decision Tree surpassed the Multi-Layer Perceptron, K-Nearest Neighbor, and Naive Bayes in terms of overall F1-score. Furthermore, thorough evaluations of nine tree-based ensemble algorithms were performed, showcasing their varying efficacy. Bagging, Random Forest, ExtraTrees, and XGBoost achieved the highest F1-scores. However, in individual class analysis, XGBoost demonstrated exceptional performance relative to the other algorithms. This is confirmed by achieving the highest F1-scores in eight out of the ten classes within the dataset. These results establish XGBoost as a predominant method for handling multiclass imbalance classification with Bagging being the closest feasible alternative, as Bagging gains an almost similar accuracy and F1-score as XGBoost.
- Chee Siang Leow, Hideaki Yajima, Tomoki Kitagawa and Hiromitsu Nishizaki, “Single-line Text Detection in Multi-line Text with Narrow Spacing for Line-based Character Recognition,” IEICE Transactions on Information and Systems, Vol.E106-D, No.12, pp.2097-2106, 2023. DOI:10.1587/transinf.2023EDP7070, IF:0.834
  Abstract: Text detection is a crucial pre-processing step in optical character recognition (OCR) for the accurate recognition of text, including both fonts and handwritten characters, in documents. While current deep learning-based text detection tools can detect text regions with high accuracy, they often treat multiple lines of text as a single region. To perform line-based character recognition, it is necessary to divide the text into individual lines, which requires a line detection technique. This paper focuses on the development of a new approach to single-line detection in OCR that is based on the existing Character Region Awareness For Text detection (CRAFT) model and incorporates a deep neural network specialized in line segmentation. However, this new method may still detect multiple lines as a single text region when multi-line text with narrow spacing is present. To address this, we also introduce a post-processing algorithm to detect single text regions using the output of the single-line segmentation. Our proposed method successfully detects single lines, even in multi-line text with narrow line spacing, and hence improves the accuracy of OCR.
- Yan San Woo, Prawit Buayai, Hiromitsu Nishizaki, Koji Makino, Latifah Munirah Kamarudin, Xiaoyang Mao, “End-to-end lightweight berry number prediction for supporting table grape cultivation”, Computers and Electronics in Agriculture, Volume 213, 108203, pp.1-15, 2023. DOI:10.1016/j.compag.2023.108203, IF:8.045
- K. A. Azuddin, A. K. Junoh, A. Zakaria, M. T. A. Rahman, N. M. I. M. Nor, H. Nishizaki, Z. Latiffah, N. F. Azuddin, M. Z. Abdullah, and T. P. Terna, “Supervised segmentation on fusarium macroconidia spore in microscopic images via analytical approaches,” Multimedia Tools and Applications, pp.1–16, Oct./2023, Springer, ISSN:1573-7721, DOI:10.1007/s11042-023-17008-y, IF:3.6
- 西崎香苗, 田中博之, 深澤貴裕, 西崎博光, 池上仁志, 出江紳一, “姿勢推定技術を活用した非接触型動作評価ツールの開発”，整形・災害外科，66巻，10号，pp.1219-1226 （2023年9月発行） DOI:10.18888/se.0000002708
- Prawit Buayai, Kabin Yok-In, Daisuke Inoue, Hiromitsu Nishizaki, Koji Makino, Xiaoyang Mao, “Supporting table grape berry thinning with deep neural network and augmented reality technologies”, Computers and Electronics in Agriculture, Volume 213, 2023. DOI:10.1016/j.compag.2023.108194, IF:8.045
- Takashi Minato, Ryuichiro Higashinaka, Kurima Sakai, Tomo Funayama, Hiromitsu Nishizaki, and Takayuki Nagai, “Design of a competition specifically for spoken dialogue with a humanoid robot”, Advanced Robotics, vol.37, no.21, pp.1349-1363, 2023. Taylor & Francis, DOI:10.1080/01691864.2023.2249530, IF:2.202
- Yu Wang, Hiromitsu Nishizaki, “A Lightweight End-to-End Speech Recognition System on Embedded Devices,” IEICE Transactions on Information and Systems, Vol.E106-D, No.7, pp.1230-1239, 1st/July/2023. DOI:10.1587/transinf.2022EDP7221, IF:0.834
- Tasneem Sofri, Hasliza A Rahim, Allan Melvin Andrew, Ping Jack Soh, Latifah Munirah Kamarudin, Hiromitsu Nishizaki, “Data Normalization Methods of Hybridized Multi-Stage Feature Selection Classification for 5G Base Station Antenna Health Effect Detection,” Journal of Advanced Research in Applied Sciences and Engineering Technology, vol.30, no.2, pp.133-140, 19/Apr/2023. DOI:10.37934/araset.30.2.133140
国際会議論文 (Reviewed conference papers)
- Chee Siang Leow, Ryosuke Shimazu, Tomoki Kitagawa, Hideaki Yajima, Prawit Buayai, Koji Makino, Xiaoyang Mao, Hiromitsu Nishizaki, “Estimation of Non-Invasive Grape Ripeness and Sweetness From Images Captured by a General-Purpose Camera,” Proceedings of the 2023 IEEE International Workshop on Metrology for Agriculture and Forestry, pp.295-300, 2023, 7th/Nov/2023, DOI:10.1109/MetroAgriFor58484.2023.10424087, Presented in Pisa, Italy
- Shunsuke Fujisawa, Muhammad Faris Kamarudzaman, Prawit Buayai, Koji Makino, Hiromitsu Nishizaki, Xiaoyang Mao “Image-Based Measurement of Grape Inflorescence Length for Automatic Inflorescence Trimming,” Proceedings of the 2023 IEEE International Workshop on Metrology for Agriculture and Forestry, pp.289-294, 2023, 7th/Nov/2023, DOI:10.1109/MetroAgriFor58484.2023.10424126, Presented in Pisa, Italy
- Prawit Buayai, Yin Suan Tan, Muhammad Faris Bin Kamarudzaman, Koji Makino, Hiromitsu Nishizaki, Xiaoyang Mao “Automating Grape Thinning: Predicting Robotic Arm End-Effector Positions Using Depth Sensing Technology and Neural Networks,” Proceedings of the 2023 IEEE International Workshop on Metrology for Agriculture and Forestry, pp.76-80, 2023, 6th/Nov/2023, DOI:10.1109/MetroAgriFor58484.2023.10424399, Presented in Pisa, Italy
- Shuto Nakagomi, Yutaka Suzuki, Masayuki Morisawa, Hiromitsu Nishizaki, Takao Kubo, “Proposal of a Method for Evaluating Biological Responses During Swallowing Using the LF/HF Change Rate”, Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE 2023), pp. 341-342, DOI:10.1109/GCCE59613.2023.10315533, Presented on 11/Oct/2023, Nara, Japan.
- Akihiro Dobashi, Chee Siang Leow, Hiromitsu Nishizaki, “Metric Learning Approach for End-To-End Multilingual Automatic Speech Recognition Model”, Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE 2023), pp. 446-450, DOI:10.1109/GCCE59613.2023.10315608, Presented on 11/Oct/2023, Nara, Japan.
- Yinghao He, Chee Siang Leow, Hiromitsu Nishizaki, “Image Remapping Data Augmentation Approach for Improving Fisheye Face Recognition”, Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE 2023), pp. 742-746, DOI:10.1109/GCCE59613.2023.10315437, Presented on 11/Oct/2023, Nara, Japan.
- Chee Siang Leow, Tomoki Kitagawa, Hideaki Yajima, Hiromitsu Nishizaki, “Data Augmentation With Automatically Generated Images for Character Classifier Model Training”, Proceedings of the 2023 IEEE 12th Global Conference on Consumer Electronics (GCCE 2023), pp. 845-849, DOI:10.1109/GCCE59613.2023.10315447, Presented on 12/Oct/2023, Nara, Japan.
- Toki Sugiura and Hiromitsu Nishizaki, “Automatic Exploration of Optimal Data Processing Operations for Sound Data Augmentation Using Improved Differentiable Automatic Data Augmentation”, Proceedings of the 24th INTERSPEECH 2023, pp.5411-5415, 2023, DOI:10.21437/Interspeech.2023-202, Dublin, Ireland, Presented on 24/Aug/2023
- Soya Tsushima, Soichiro Iida, Hiromitsu Nishizaki, Takehito Utsuro and Junichi Hoshino, “Scenario-based Customer Service Training System with Honorific Exercise”, Proceedings of NICOGRAPH International 2023, p.82, 2023, DOI:10.1109/NICOINT59725.2023.00022, Presented on 10th/June/2023, Hokkaido
- Muhammad Husaini, Latifah Munirah Kamarudin, Hiromitsu Nishizaki, Intan Kartika Kamarudin, Muhammad Amin Ibrahim, Ammar Zakaria, Masahiro Toyoura, Xiaoyang Mao, “Non-contact breathing signal classification using Attention based CNN and XGBoost hybrid model”, Proceedings of the ERS/ESRS International Sleep and Breathing Conference 2023, DOI:10.1183/23120541.sleepandbreathing-2023.56, Presented on 21/Apr/2023 (Prague, Czech Republic)
口頭発表 (Domestic conference, not reviewed)
- レオチーシャン，北川智樹，矢島英明，西崎博光，”生成文字画像を用いた単・複数行テキストに対する文字認識精度向上の検討”，情報処理学会第86回全国大会講演論文集，no.2，4U-07，pp.703-704，2024.03.16発表（神奈川大学，横浜市）※学生奨励賞受賞
- 矢島英明，レオチーシャン，北川智樹，西崎博光，”Transformerデコーダを用いた画像内のテキスト領域検出の検討”，情報処理学会第86回全国大会講演論文集，no.2，4U-08，pp.705-706，2024.03.16発表（神奈川大学，横浜市）
- 北川智樹，レオチーシャン，矢島英明，西崎博光，”文字認識モデル訓練のためのスタイル変換を用いた手書き文字生成”，情報処理学会第86回全国大会講演論文集，no.2，5T-09，pp.623-624，2024.03.16発表（神奈川大学，横浜市）※学生奨励賞受賞
- Bong Tze Yaw，Leow Chee Siang，丹沢勉，牧野浩二，西崎博光，”果実盗難通報装置のための小型マイコンで動作する不審音検出システム”，第24回計測自動制御学会システムインテグレーション部門講演会（SI2023）講演論文集，1C4-09，pp.，2023.12.14発表（朱鷺メッセ，新潟市）※優秀講演表彰
- 牧野浩二，西崎博光，茅暁陽，”全方位カメラとAIを用いた共同作業領域を有するシャインマスカット栽培支援用移動ロボットの開発”，第24回計測自動制御学会システムインテグレーション部門講演会（SI2023）講演論文集，3C2-05，pp.，2023.12.16発表（朱鷺メッセ，新潟市）※優秀講演表彰
- 牧野浩二，高橋洋翔，Song Ziwei，Prawit Buayai，西崎博光，石田和義，茅暁陽，”実環境への適用を考慮したAIを用いたタマネギ選別機の開発” 2024年電子情報通信学会総合大会講演予稿集，D-22-04，2024.3.8発表（広島大学、東広島市）
- 土橋晃弘，レオ　チーシャン，西崎博光，”End-to-End 複数言語音声認識モデル訓練における距離学習の効果”，日本音響学会2023年秋季研究発表会講演論文集，3-Q-3，pp.xx-xx，2023.9.29発表（名古屋工業大学）
- 西崎博光，雨宮達佳，レオ　チーシャン，ブアヤイプラウィット，牧野浩二，茅　暁陽，”シャインマスカット栽培支援ロボットのための色推定モデルを用いた収穫適期判定システム”, ロボティクス・メカトロニクス講演会講演論文集, 講演番号 2A1-B03，pp. 2A1-B03(1)-(4)，2023.6.30，（名古屋国際会議場）
外部資金 (Grant)
書籍・雑誌記事 (Books, Magazines)
- 牧野浩二，西崎博光，Leow Chee Siang，Prawit Buayai，茅暁陽：「シャインマスカットの収穫時期をAIで判断」，インターフェース（Interface），pp. 218-219，2024年1月号（2023年11月25日発売），CQ出版社
- 西崎博光：「これからの自然言語処理」，これからのコンピュータ技術555（第1部第2章人工知能），インターフェース（Interface），pp.52-56，2023年9月号，CQ出版社
学外授業・セミナー (External lectures and seminars)
- 【セミナー講師】西崎博光，牧野浩二：「AI・データ活用スペシャリスト育成講座：Pythonプログラミング応用」，山梨県情報通信業協会，2023年11月24日 18:30〜20:30，山梨大学実施
- 【セミナー講師】西崎博光，牧野浩二：「AI・データ活用スペシャリスト育成講座：Pythonプログラミング基礎」，山梨県情報通信業協会，2023年11月16日 18:30〜20:30，山梨大学実施
- 【講演】西崎博光：チュートリアル講演「深層学習チュートリアル～基礎から応用まで～」，電子情報通信学会超知性ネットワーキングに関する分野横断型研究会（RISING研究会）　RISING2023，2023年10月30日
- 【セミナー講師】西崎博光：「AI実践講座: Pythonで学ぶはじめてのディープラーニング」，2023年7月20・21日 10:30〜16:30（2日間開催），Robot Innovation Week 2023，名古屋国際会議場
- 【セミナー講師】西崎博光：電子情報通信学会ネットワークシステム研究会シミュレーションスクール「深層学習ハンズオン」，2023年5月13日（土）9:00〜16:00，オンライン実施
表彰・報道等 (Awards and media coverage)
- 西崎博光，令和4年度山梨大学優秀教員表彰，2023年9月26日
- 「秋田県タマネギ産地形成コンソーシアム」事業の紹介（2023年4月21日），テレビ秋田，NHK，日本経済新聞，日本農業新聞等。