- Decoder: RNN (LSTM/GRU), Transformer, or S4.
- Encoder: VGG-like CNN + BiRNN (LSTM/GRU), sub-sampling BiRNN (LSTM/GRU), Transformer, Conformer, Branchformer, or E-Branchformer.
- CTC/attention joint decoding to boost monotonic alignment decoding.
- Fast/accurate training with CTC/attention multitask training.
- Hybrid CTC/attention based end-to-end ASR.
- State-of-the-art performance in several ASR benchmarks (comparable/superior to hybrid DNN/HMM and CTC).
- Support singing voice synthesis recipe (ofuton_p_utagoe_db).
- Support speaker diarization recipe (mini_librispeech, librimix).
- Support voice conversion recipe (VCC2020 baseline).
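
The CTC/attention multitask training and joint decoding listed above both come down to a weighted interpolation between a CTC term and an attention-decoder term. The following is a minimal sketch of that idea only, not the toolkit's actual implementation: the function names, the dictionary keys, and the default weight of 0.3 are illustrative assumptions. Real joint decoding typically applies the CTC score to partial hypotheses inside beam search, whereas this sketch only rescores complete hypotheses for simplicity.

```python
from typing import Dict, List


def _check_weight(ctc_weight: float) -> None:
    # The interpolation weight must lie in [0, 1].
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be between 0 and 1")


def multitask_loss(ctc_loss: float, att_loss: float, ctc_weight: float = 0.3) -> float:
    """Training objective: weighted sum of the CTC loss and the attention loss."""
    _check_weight(ctc_weight)
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


def joint_score(ctc_logp: float, att_logp: float, ctc_weight: float = 0.3) -> float:
    """Decoding score: weighted sum of CTC and attention log-probabilities."""
    _check_weight(ctc_weight)
    return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp


def pick_best(hyps: List[Dict], ctc_weight: float = 0.3) -> Dict:
    """Return the complete hypothesis with the highest joint score.

    Each hypothesis is a dict with hypothetical keys
    'text', 'ctc_logp', and 'att_logp'.
    """
    return max(
        hyps,
        key=lambda h: joint_score(h["ctc_logp"], h["att_logp"], ctc_weight),
    )


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    hyps = [
        {"text": "A", "ctc_logp": -1.0, "att_logp": -6.0},
        {"text": "B", "ctc_logp": -4.0, "att_logp": -2.0},
    ]
    print(pick_best(hyps, ctc_weight=0.5)["text"])
```

Setting `ctc_weight` to 0 reduces both functions to the attention-only case, and setting it to 1 reduces them to CTC only, which makes the weight a convenient knob for comparing the two components.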