The thesis bibliography, as exported from the LaTeX source and the PDF, needs its formatting cleaned up before uploading to the library (a cleanup sketch follows the list):
[1] Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, and Shinji Watanabe, “A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation,” in 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 47–54.
[2] Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, “End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results,” in NIPS 2014 Deep Learning and Representation Learning Workshop.
[3] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin, “Attention is All you Need,” in Advances in Neural Information Processing Systems. vol. 30, Curran Associates, Inc.
[4] Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, and Ruoming Pang, “Conformer: Convolution-augmented Transformer for Speech Recognition,” in Interspeech 2020. pp. 5036–5040, ISCA.
[5] Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber, “Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks,” in Proceedings of the 23rd International Conference on Machine Learning (ICML). pp. 369–376, ACM.
[6] Jaesong Lee and Shinji Watanabe, “Intermediate Loss Regularization for CTC-Based Speech Recognition,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6224–6228.
[7] Jumon Nozaki and Tatsuya Komatsu, “Relaxing the Conditional Independence Assumption of CTC-Based ASR by Conditioning on Intermediate Predictions,” in Interspeech 2021. pp. 3735–3739, ISCA.
[8] Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, and Tetsunori Kobayashi, “Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7797–7801.
[9] Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, and Tetsunori Kobayashi, “Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict,” in Interspeech 2020. pp. 3655–3659, ISCA.
[10] Yosuke Higuchi, Hirofumi Inaguma, Shinji Watanabe, Tetsuji Ogawa, and Tetsunori Kobayashi, “Improved Mask-CTC for Non-Autoregressive End-to-End ASR,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8363–8367.
[11] Xingchen Song, Zhiyong Wu, Yiheng Huang, Chao Weng, Dan Su, and Helen Meng, “Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5894–5898.
[12] Yosuke Higuchi, Brian Yan, Siddhant Arora, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe, “BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model,” in Findings of the Association for Computational Linguistics: EMNLP 2022, Association for Computational Linguistics.
[13] Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, and Pengyuan Zhang, “Improving CTC-Based Speech Recognition Via Knowledge Transferring from Pre-Trained Language Models,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8517–8521.
[14] Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, and Tatsuya Kawahara, “Distilling the Knowledge of BERT for CTC-based ASR,”.
[15] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova, “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4171–4186, Association for Computational Linguistics.
[16] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever, “Language Models are Unsupervised Multitask Learners,” OpenAI technical report.
[17] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le, “XLNet: Generalized Autoregressive Pretraining for Language Understanding,” in Advances in Neural Information Processing Systems. vol. 32, Curran Associates, Inc.
[18] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton, “A Simple Framework for Contrastive Learning of Visual Representations,” in Proceedings of the 37th International Conference on Machine Learning. pp. 1597–1607, PMLR.
[19] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby, “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” in International Conference on Learning Representations (ICLR 2021).
[20] Steffen Schneider, Alexei Baevski, Ronan Collobert, and Michael Auli, “Wav2vec: Unsupervised Pre-Training for Speech Recognition,” in Interspeech 2019. pp. 3465–3469, ISCA.
[21] Alexei Baevski, Yuhao Zhou, Abdelrahman Mohamed, and Michael Auli, “Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations,” in Advances in Neural Information Processing Systems. vol. 33, pp. 12449–12460, Curran Associates, Inc.
[22] Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, and Abdelrahman Mohamed, “HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 3451–3460.
[23] Shigeki Karita, Nelson Enrique Yalta Soplin, Shinji Watanabe, Marc Delcroix, Atsunori Ogawa, and Tomohiro Nakatani, “Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration,” in Interspeech 2019. pp. 1408–1412, ISCA.
[24] Keqi Deng, Songjun Cao, Yike Zhang, and Long Ma, “Improving Hybrid CTC/Attention End-to-End Speech Recognition with Pretrained Acoustic and Language Models,” in 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 76–82.
[25] Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, and Pengyuan Zhang, “Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8522–8526.
[26] Fu-Hao Yu, Kuan-Yu Chen, and Ke-Han Lu, “Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1474–1482.
[27] Wen-Chin Huang, Chia-Hua Wu, Shang-Bao Luo, Kuan-Yu Chen, Hsin-Min Wang, and Tomoki Toda, “Speech Recognition by Simply Fine-Tuning Bert,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7343–7347.
[28] Cheng Yi, Shiyu Zhou, and Bo Xu, “Efficiently Fusing Pretrained Acoustic and Linguistic Encoders for Low-Resource Speech Recognition,” IEEE Signal Processing Letters, vol. 28, pp. 788–792.
[29] Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, and Liang Lin, “Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition,” in Findings of the Association for Computational Linguistics: EMNLP 2021. pp. 2765–2777, Association for Computational Linguistics.
[30] Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi, and Shinji Watanabe, “BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder,” in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean, “Distilling the Knowledge in a Neural Network,” in NIPS Deep Learning and Representation Learning Workshop.
[32] Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, and Tatsuya Kawahara, “Distilling the Knowledge of BERT for Sequence-to-Sequence ASR,” in Interspeech 2020. pp. 3635–3639, ISCA.
[33] Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, and Shuai Zhang, “Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1897–1911.
[34] Keqi Deng, Gaofeng Cheng, Runyan Yang, and Yonghong Yan, “Alleviating ASR Long-Tailed Problem by Decoupling the Learning of Representation and Classification,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 340–354.
[35] Yotaro Kubo, Shigeki Karita, and Michiel Bacchiani, “Knowledge Transfer from Large-Scale Pretrained Language Models to End-To-End Speech Recognizers,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8512–8516.
[36] Joonbo Shin, Yoonhyung Lee, and Kyomin Jung, “Effective Sentence Scoring Method Using BERT for Speech Recognition,” in Proceedings of The Eleventh Asian Conference on Machine Learning. pp. 1081–1093, PMLR.
[37] Julian Salazar, Davis Liang, Toan Q. Nguyen, and Katrin Kirchhoff, “Masked Language Model Scoring,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. pp. 2699–2712, Association for Computational Linguistics.
[38] Shih-Hsuan Chiu and Berlin Chen, “Innovative Bert-Based Reranking Language Models for Speech Recognition,” in 2021 IEEE Spoken Language Technology Workshop (SLT), pp. 266–271.
[39] Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, and Tatsuya Kawahara, “ASR Rescoring and Confidence Estimation with Electra,” in 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 380–387.
[40] Takuma Udagawa, Masayuki Suzuki, Gakuto Kurata, Nobuyasu Itoh, and George Saon, “Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems,” in Interspeech 2022. pp. 3919–3923, ISCA.
[41] Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiangyang Li, Edward Lin, and Tie-Yan Liu, “FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition,” in Advances in Neural Information Processing Systems. vol. 34, pp. 21708–21719, Curran Associates, Inc.
[42] Yun Zhao, Xuerui Yang, Jinchao Wang, Yongyu Gao, Chao Yan, and Yuanfu Zhou, “BART Based Semantic Correction for Mandarin Automatic Speech Recognition System,” in Interspeech 2021. pp. 2017–2021, ISCA.
[43] Alex Graves and Navdeep Jaitly, “Towards End-To-End Speech Recognition with Recurrent Neural Networks,” in Proceedings of the 31st International Conference on Machine Learning. pp. 1764–1772, PMLR.
[44] Alex Graves, “Sequence Transduction with Recurrent Neural Networks,” arXiv preprint arXiv:1211.3711.
[45] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649.
[46] William Chan, Navdeep Jaitly, Quoc Le, and Oriol Vinyals, “Listen, attend and spell: A neural network for large vocabulary conversational speech recognition,” in 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960–4964.
[47] Linhao Dong, Shuang Xu, and Bo Xu, “Speech-Transformer: A No-Recurrence Sequence-to-Sequence Model for Speech Recognition,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5884–5888.
[48] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko, “End-to-End Object Detection with Transformers,” in Computer Vision – ECCV 2020, Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, Eds., vol. 12346 of Lecture Notes in Computer Science, pp. 213–229. Springer International Publishing.
[49] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei, “Language Models are Few-Shot Learners,” in Advances in Neural Information Processing Systems. vol. 33, pp. 1877–1901, Curran Associates, Inc.
[50] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, “Neural Machine Translation by Jointly Learning to Align and Translate,” in 3rd International Conference on Learning Representations (ICLR 2015).
[51] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton, “Layer Normalization,” arXiv preprint arXiv:1607.06450.
[52] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778.
[53] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation Applied to Handwritten Zip Code Recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551.
[54] Sepp Hochreiter and Jürgen Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780.
[55] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio, “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling,” arXiv preprint arXiv:1412.3555.
[56] Shinji Watanabe, Takaaki Hori, Suyoun Kim, John R. Hershey, and Tomoki Hayashi, “Hybrid CTC/Attention Architecture for End-to-End Speech Recognition,” IEEE Journal of Selected Topics in Signal Processing, vol. 11, no. 8, pp. 1240–1253.
[57] Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed, “Hybrid speech recognition with Deep Bidirectional LSTM,” in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 273–278.
[58] Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, Yuya Unno, Nelson Enrique Yalta Soplin, Jahn Heymann, Matthew Wiesner, Nanxin Chen, Adithya Renduchintala, and Tsubasa Ochiai, “ESPnet: End-to-End Speech Processing Toolkit,” in Interspeech 2018. pp. 2207–2211, ISCA.
[59] Santiago Fernández, Alex Graves, and Jürgen Schmidhuber, “Sequence Labelling in Structured Domains with Hierarchical Recurrent Neural Networks,” in Proceedings of the 20th International Joint Conference on Artificial Intelligence. IJCAI’07, pp. 774–779, Morgan Kaufmann Publishers Inc.
[60] Shubham Toshniwal, Hao Tang, Liang Lu, and Karen Livescu, “Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition,” in Interspeech 2017. pp. 3532–3536, ISCA.
[61] Ramon Sanabria and Florian Metze, “Hierarchical Multitask Learning With CTC,” in 2018 IEEE Spoken Language Technology Workshop (SLT), pp. 485–490.
[62] Santiago Fernández, Alex Graves, and Jürgen Schmidhuber, “Sequence labelling in structured domains with hierarchical recurrent neural networks,” in Proceedings of the 20th International Joint Conference on Artificial Intelligence. IJCAI’07, pp. 774–779, Morgan Kaufmann Publishers Inc.
[63] Kanishka Rao and Haşim Sak, “Multi-accent speech recognition with hierarchical grapheme based models,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4815–4819.
[64] Kalpesh Krishna, Shubham Toshniwal, and Karen Livescu, “Hierarchical Multitask Learning for CTC-based Speech Recognition,”.
[65] Xinying Song, Alex Salcianu, Yang Song, Dave Dopson, and Denny Zhou, “Fast WordPiece Tokenization,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. pp. 2089–2103, Association for Computational Linguistics.
[66] Taku Kudo and John Richardson, “SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. pp. 66–71, Association for Computational Linguistics.
[67] Hagen Soltau, Hank Liao, and Haşim Sak, “Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition,” in Interspeech 2017. pp. 3707–3711, ISCA.
[68] Marjan Ghazvininejad, Omer Levy, Yinhan Liu, and Luke Zettlemoyer, “Mask-Predict: Parallel Decoding of Conditional Masked Language Models,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 6112–6121, Association for Computational Linguistics.
[69] Ruchao Fan, Wei Chu, Peng Chang, and Jing Xiao, “CASS-NAT: CTC Alignment-Based Single Step Non-Autoregressive Transformer for Speech Recognition,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[70] Ruchao Fan, Wei Chu, Peng Chang, Jing Xiao, and Abeer Alwan, “An Improved Single Step Non-Autoregressive Transformer for Automatic Speech Recognition,” in Interspeech 2021. pp. 3715–3719, ISCA.
[71] Wilson L. Taylor, ““Cloze Procedure”: A New Tool for Measuring Readability,” Journalism Quarterly, vol. 30, no. 4, pp. 415–433.
[72] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf, “DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter,” in 5th Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS 2019.
[73] Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, and Qun Liu, “TinyBERT: Distilling BERT for Natural Language Understanding,” in Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 4163–4174, Association for Computational Linguistics.
[74] Alexei Baevski, Steffen Schneider, and Michael Auli, “vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations,” in International Conference on Learning Representations (ICLR 2020).
[75] Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, and Hung-yi Lee, “SUPERB: Speech Processing Universal PERformance Benchmark,” in Interspeech 2021. pp. 1194–1198, ISCA.
[76] Joseph F. DeRose, Jiayao Wang, and Matthew Berger, “Attention Flows: Analyzing and Comparing Attention Mechanisms in Language Models,” IEEE Transactions on Visualization and Computer Graphics, vol. 27, no. 2, pp. 1160–1170.
[77] Wietse de Vries, Andreas van Cranenburgh, and Malvina Nissim, “What’s so special about BERT’s layers? A closer look at the NLP pipeline in monolingual and multilingual models,” in Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 4339–4350, Association for Computational Linguistics.
[78] Betty van Aken, Benjamin Winter, Alexander Löser, and Felix A. Gers, “How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management. CIKM ’19, pp. 1823–1832, Association for Computing Machinery.
[79] Anna Rogers, Olga Kovaleva, and Anna Rumshisky, “A Primer in BERTology: What We Know About How BERT Works,” Transactions of the Association for Computational Linguistics, vol. 8, pp. 842–866.
[80] John Hewitt and Christopher D. Manning, “A Structural Probe for Finding Syntax in Word Representations,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 4129–4138, Association for Computational Linguistics.
[81] Olga Kovaleva, Alexey Romanov, Anna Rogers, and Anna Rumshisky, “Revealing the Dark Secrets of BERT,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 4365–4374, Association for Computational Linguistics.
[82] Linhao Dong and Bo Xu, “CIF: Continuous Integrate-And-Fire for End-To-End Speech Recognition,” in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6079–6083.
[83] Shilong Liu, Feng Li, Hao Zhang, Xiao Yang, Xianbiao Qi, Hang Su, Jun Zhu, and Lei Zhang, “DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR,” in International Conference on Learning Representations (ICLR 2022).
[84] Yingming Wang, Xiangyu Zhang, Tong Yang, and Jian Sun, “Anchor DETR: Query Design for Transformer-Based Detector,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, pp. 2567–2575.
[85] Depu Meng, Xiaokang Chen, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, and Jingdong Wang, “Conditional DETR for Fast Training Convergence,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3651–3660.
[86] Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, and Hao Zheng, “AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline,” in 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), pp. 1–5.
[87] Jiayu Du, Xingyu Na, Xuechen Liu, and Hui Bu, “AISHELL-2: Transforming Mandarin ASR Research Into Industrial Scale,” arXiv preprint arXiv:1808.10583.
[88] Anthony Rousseau, Paul Deléglise, and Yannick Estève, “Enhancing the TEDLIUM Corpus with Selected Data for Language Modeling and More TED Talks,” in Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14). pp. 3935–3939, European Language Resources Association (ELRA).
[89] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean, “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation,” arXiv preprint arXiv:1609.08144.
[90] Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli, “Fairseq: A Fast, Extensible Toolkit for Sequence Modeling,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations). pp. 48–53, Association for Computational Linguistics.
[91] Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur, “Librispeech: An ASR corpus based on public domain audio books,” in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5206–5210.
[92] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush, “Transformers: State-of-the-Art Natural Language Processing,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. pp. 38–45, Association for Computational Linguistics.
[93] Tom Ko, Vijayaditya Peddinti, Daniel Povey, and Sanjeev Khudanpur, “Audio augmentation for speech recognition,” in Interspeech 2015. pp. 3586–3589, ISCA.
[94] Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, and Quoc V. Le, “SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition,” in Interspeech 2019. pp. 2613–2617, ISCA.
[95] Yoav Goldberg, “Assessing BERT’s Syntactic Abilities,” arXiv preprint arXiv:1901.05287.
[96] Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith, “Linguistic Knowledge and Transferability of Contextual Representations,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 1073–1094, Association for Computational Linguistics.
[97] Zeyu Zhao and Peter Bell, “Investigating Sequence-Level Normalisation For CTC-Like End-to-End ASR,” in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 7792–7796.
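Most of the mechanical damage above (entries wrapped across several lines, soft line-break hyphens like "Xi- aohua", missing spaces such as "pp.213–229", stray spaces in ',” .') can be normalized in one pass before fixing venues by hand. Below is a minimal Python sketch; the input file name `references.txt` and the one-entry-per-`[n]` layout are assumptions about the export, not part of the thesis build.

```python
import re
from pathlib import Path

def clean_references(text: str) -> list[str]:
    """Re-join and normalize '[n]'-prefixed entries extracted from a PDF."""
    # Each entry starts with a bracketed index like "[12] ", so splitting on a
    # lookahead there re-joins entries that the extraction wrapped across lines.
    entries = re.split(r"\n(?=\[\d+\]\s)", text.strip())
    cleaned = []
    for entry in entries:
        e = " ".join(entry.split())                       # collapse line wraps and runs of spaces
        e = re.sub(r"(\w)- (\w)", r"\1\2", e)             # undo soft line-break hyphens ("Xi- aohua")
        e = re.sub(r"\b(pp|vol|no)\.(?=\d)", r"\1. ", e)  # "pp.213–229" -> "pp. 213–229"
        e = re.sub(r",(?=[^\s\d”])", ", ", e)             # "Brox,and" -> "Brox, and" (keeps ',”' pairs)
        e = re.sub(r"”(?=\w)", "” ", e)                   # closing quote jammed against the next word
        e = re.sub(r"\s+(?=[,.”])", "", e)                # stray space before punctuation (',” .')
        cleaned.append(e)
    return cleaned

if __name__ == "__main__":
    raw = Path("references.txt").read_text(encoding="utf-8")
    print("\n".join(clean_references(raw)))
```

The hyphen merge is deliberately greedy: a line broken at a real hyphen (e.g. "Mask-\nCTC") would come out as "MaskCTC", so compound names and model names need a quick spot check afterwards. Entries that the extraction reduced to a bare page count or a webpage title still need their venues restored by hand against the original .bib file.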