《Cross-lingual adaptation with multi-task adaptive networks》(1)
- First, why read this paper?
- If I'm not mistaken, it does cross-lingual adaptation based on DNNs. DNNs are still very hot right now, so using them for cross-lingual adaptation clearly has promise.
- The paper mentions that training used the Theano library, which I have some prior exposure to, running on GTX 690 GPUs; in other words, the code would not need to be written from scratch.
- The paper applies cross-lingual adaptation to ASR; it is worth seeing whether anything can be borrowed from ASR over to synthesis.
- Introduction
Paragraph 1:
In cross-lingual automatic speech recognition (ASR), models applied to a target language are enhanced using data from a different source language.
- I clearly haven't read enough papers; I need to read more widely. As far as I know, no one is using DNNs for cross-lingual adaptation in synthesis yet, but plenty of people are already using DNNs for cross-lingual recognition. If something can be borrowed from ASR, it should surely lead to a decent paper.
- Suppose we have a corpus of 1000 Cantonese sentences and train an ASR model to recognise Cantonese.
- If we also have 1000 English sentences, English is called the source language and Cantonese the target language.
- Then those 1000 English sentences are used to further train the previously trained ASR model, yielding enhanced models.
- In this scenario, the target language is typically low-resourced: transcribed acoustic training data for the target language may be difficult or expensive to acquire.
- Target-language data is scarce,
- and acquiring transcribed training data for the target language is difficult and expensive.
- In other words, target-language data is hard to obtain and only available in small amounts, whereas source-language data is easy to obtain.
- The cross-lingual approach is motivated by the fact that the source language data, despite being mismatched to the target, may capture common properties of the acoustics of speech which are shared across languages, improving the generalisation of the final models to unseen speakers and conditions.
- What motivates the cross-lingual approach?
- The source-language data, even though mismatched to the target, may capture acoustic properties of speech that are shared across languages.
- Put plainly: although the source language (English) and the target language (Cantonese) are different, some acoustic properties are still shared between them, while others are specific to each language.
- Exploiting these shared acoustic properties improves the generalisation of the final models.
Paragraph 2:
- Cross-lingual ASR may be viewed as a form of adaptation.
- Cross-lingual ASR can be regarded as one form of adaptation.
- What does that mean?
- Adaptation is a broad concept that includes:
- cross-lingual ASR
- cross-lingual synthesis
- .....
- In contrast to domain or speaker adaptation, the major problem with cross-lingual adaptation arises from the differences in phone sets between the source and target languages.
- Compared with domain adaptation or speaker adaptation,
- what causes the main problem in cross-lingual adaptation?
- It arises from the difference between the phone sets of the source and target languages.
- Even when a universal phone set is used, it has been found that realisation of what is ostensibly the same phone still differs across languages [1].
- Even when a universal phone set is used,
- it has been found that the realisation of what is ostensibly the same phone still differs across languages.
- In this paper, we focus on approaches where source and target languages are assumed not to share a phone set, which is probably a valid assumption when a small number of source languages are used, which are unlikely to provide complete phone coverage for an arbitrary target language.
- The authors' assumption: the source and target languages do not share a phone set.
- This is probably a valid assumption when only a small number of source languages are used, since in that case they are unlikely to provide complete phone coverage for an arbitrary target language.
Paragraph 3:
- Arguably the simplest approach to the problem of cross-lingual phoneset mismatch is to define a deterministic mapping between source and target phone sets [2] which may be estimated in a data-driven fashion [3].
- There are several approaches to the cross-lingual phone-set mismatch problem.
- The simplest is to define a deterministic mapping between the source and target phone sets, which may be estimated in a data-driven fashion.
- This seems to be a common approach in synthesis as well; isn't what I'm doing now exactly state mapping?
However, this hard mapping leads to a loss of information from the target language acoustics that cannot be represented by a single source language phone.
- However, this hard mapping causes a loss of information: whatever in the target-language acoustics cannot be represented by a single source-language phone is discarded.
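The hard-mapping idea and the information loss it causes can be illustrated with a tiny sketch. The phone labels and mapping below are hypothetical, made up for illustration, not taken from the paper:

```python
# Hypothetical deterministic (hard) mapping from target (Cantonese-like)
# phones to source (English-like) phones; labels are invented for illustration.
hard_map = {
    "aa": "aa",  # a similar vowel exists in the source phone set
    "oe": "er",  # no exact source match: forced onto the nearest phone
    "eo": "er",  # a second target phone forced onto the SAME source phone
    "yu": "uw",
}

def map_phones(target_phones):
    """Replace each target phone with its single mapped source phone."""
    return [hard_map[p] for p in target_phones]
```

Once mapped, "oe" and "eo" become indistinguishable: that distinction in the target acoustics is exactly the information a single-phone hard mapping cannot represent.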
- Another approach is a probabilistic mapping,
- in which the distribution of each target phone is expressed over a feature space derived from the source language.
- Two concrete examples are:
- product-of-experts models
- the KL-HMM model
- The source languages can be seen as defining a low-dimensional subspace in which the target-language models are estimated.
- This is the motivation behind the work of [6], where a subspace GMM (SGMM) is used, in which the source languages define a subspace of full covariance Gaussians.
- This is the motivation behind the work of [6], which uses a subspace GMM (SGMM),
- where the source languages define a subspace of full-covariance Gaussians.
- This part involves quite a bit of mathematics.
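A minimal sketch of the probabilistic-mapping idea above, in the spirit of KL-HMM: each target phone is modelled as a reference distribution over source-language phone posteriors rather than as a single hard-mapped phone. All phone labels and probabilities here are made-up illustrative values, not from the paper:

```python
import math

# Hypothetical reference distribution (illustrative numbers only):
# the target phone "oe" is a categorical distribution over source phones.
soft_map = {
    "oe": {"er": 0.6, "uh": 0.3, "ah": 0.1},
}

def kl_score(target_phone, source_posteriors):
    """KL(reference || posterior): how poorly a frame's source-phone
    posterior vector matches the target phone's reference distribution.
    Lower is better; 0 means a perfect match."""
    ref = soft_map[target_phone]
    return sum(p * math.log(p / source_posteriors[s])
               for s, p in ref.items() if p > 0)
```

Unlike the hard mapping, a frame's evidence is spread across several source phones, so target acoustics that fall between source phones are still representable.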