《Cross-lingual adaptation with multi-task adaptive networks》(1)

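  1. First, why read this paper?
    1. If I have guessed correctly, this paper uses DNNs for cross-lingual adaptation. DNNs are still very popular, so cross-lingual adaptation with DNNs looks like a promising direction.
    2. The paper says training used the Theano library, which I have some experience with, on GTX690 GPUs, so I should not have to write the training code myself.
    3. The paper applies cross-lingual adaptation to ASR; I want to see whether anything from the ASR side can be borrowed for speech synthesis.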
  2. Introduction

  Paragraph 1:

    1. In cross-lingual automatic speech recognition (ASR), models applied to a target language are enhanced using data from a different source language. 

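      1. I clearly have not read enough papers and need to read more widely. As far as I know, nobody uses DNNs for cross-lingual adaptation in synthesis yet, whereas many people already use DNNs for cross-lingual recognition. If something can be borrowed from ASR, it could make a good paper.
      2. Suppose we have 1000 utterances of Cantonese and train an ASR model on them to recognise Cantonese.
      3. Suppose we also have 1000 utterances of English. English is then called the source language and Cantonese the target language.
      4. Using the 1000 English utterances to further train the existing ASR model then yields enhanced models.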
    2. In this scenario, the target language is typically low-resourced: transcribed acoustic training data for the target language may be difficult or expensive to acquire.
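      1. Data for the target language is scarce.
      2. Collecting transcribed training data for the target language is difficult or expensive.
      3. In other words, target-language data is hard to obtain and only available in small amounts, whereas source-language data is easy to obtain.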
    3. The cross-lingual approach is motivated by the fact that the source language data, despite being mismatched to the target, may capture common properties of the acoustics of speech which are shared across languages, improving the generalisation of the final models to unseen speakers and conditions.
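      1. What motivates the cross-lingual approach?
      2. The source-language data may capture acoustic properties that are shared across languages.
      3. The sentence is hard to parse. The point is that the source language (English) and the target language (Cantonese) are different languages, yet some of their acoustic properties are shared, while other properties are specific to each language.
      4. These shared acoustic properties are what improve the generalisation of the final models.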
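
One common way to let source-language data help a target language is to share the hidden layers of a network across languages and give each language its own output layer. The sketch below only illustrates that idea with a forward pass in NumPy; it is not taken from the paper, and the class name `SharedHiddenNet`, the layer sizes, and the phone counts are all assumptions made up for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class SharedHiddenNet:
    """Toy network: one hidden layer shared by all languages,
    plus one softmax output layer per language (hypothetical design)."""

    def __init__(self, n_features, n_hidden, phones_per_language, seed=0):
        rng = np.random.default_rng(seed)
        # Shared parameters: every language's data would update these.
        self.W_shared = rng.normal(0.0, 0.1, size=(n_features, n_hidden))
        self.b_shared = np.zeros(n_hidden)
        # Language-specific parameters: one output layer per phone set.
        self.heads = {
            lang: (rng.normal(0.0, 0.1, size=(n_hidden, n_phones)),
                   np.zeros(n_phones))
            for lang, n_phones in phones_per_language.items()
        }

    def hidden(self, frames):
        # Language-independent representation of the acoustic frames.
        return np.tanh(frames @ self.W_shared + self.b_shared)

    def posteriors(self, frames, lang):
        # Phone posteriors for one language, computed from the shared layer.
        W, b = self.heads[lang]
        return softmax(self.hidden(frames) @ W + b)

# Assumed sizes: 39-dimensional features, 40 English and 30 Cantonese phones.
net = SharedHiddenNet(n_features=39, n_hidden=64,
                      phones_per_language={"english": 40, "cantonese": 30})
frames = np.random.default_rng(1).normal(size=(5, 39))  # 5 random stand-in frames
english_post = net.posteriors(frames, "english")
cantonese_post = net.posteriors(frames, "cantonese")
print(english_post.shape, cantonese_post.shape)
```

Both languages pass through the same `hidden` function, which is where any shared acoustic properties would be captured, while the differing phone sets only affect the output layers.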

  Paragraph 2:

    1. Cross-lingual ASR may be viewed as a form of adaptation.
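      1. Cross-lingual ASR can be regarded as one form of adaptation.
      2. What does that mean?
      3. Adaptation is a broad concept that covers:
        1. cross-lingual ASR
        2. cross-lingual synthesis
        3. ...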
    2. In contrast to domain or speaker adaptation, the major problem with cross-lingual adaptation arises from the differences in phone sets between the source and target languages.
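      1. Compared with domain adaptation or speaker adaptation,
      2. what causes the main problem in cross-lingual adaptation?
        1. The difference between the phone sets of the source and target languages.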
    3. Even when a universal phone set is used, it has been found that realisation of what is ostensibly the same phone still differs across languages [1].
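      1. Even when a universal phone set is used,
      2. the second half was hard for me to translate; my reading is that what is nominally the same phone is still realised differently across languages.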
    4. In this paper, we focus on approaches where source and target languages are assumed not to share a phone set, which is probably a valid assumption when a small number of source languages are used, which are unlikely to provide complete phone coverage for an arbitrary target language.
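      1. The authors' approach assumes that the source and target languages do not share a phone set.
      2. This is probably a valid assumption when only a small number of source languages are used, because they are unlikely to provide complete phone coverage for an arbitrary target language.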
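
The coverage argument can be made concrete with plain set operations. This is a minimal sketch: the function name `phone_coverage` is my own, and the phone inventories are invented toy symbols, not the real phone sets of any language.

```python
def phone_coverage(target_phones, source_phone_sets):
    """Return (fraction of target phones covered, sorted list of missing phones)."""
    covered = set().union(*source_phone_sets)  # phones seen in any source language
    target = set(target_phones)
    missing = target - covered
    coverage = 1.0 - len(missing) / len(target)
    return coverage, sorted(missing)

# Toy inventories, invented for illustration only.
target = {"a", "i", "u", "p", "t", "k", "ng", "kw"}
source_a = {"a", "i", "u", "p", "t", "k", "s"}
source_b = {"a", "e", "o", "t", "ng"}

coverage, missing = phone_coverage(target, [source_a, source_b])
print(coverage, missing)  # 0.875 ['kw']
```

Even with two source inventories, one target phone stays uncovered, which is the situation the authors assume when they treat the phone sets as not shared.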

  Paragraph 3:

    1. Arguably the simplest approach to the problem of cross-lingual phoneset mismatch is to define a deterministic mapping between source and target phone sets [2] which may be estimated in a data-driven fashion [3].
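      1. There are several ways to handle the cross-lingual phone set mismatch.
      2. Arguably the simplest is to define a deterministic mapping between the source and target phone sets.
      3. This seems to be common in synthesis as well; the state mapping I am working on now is an example.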
    2. However, this hard mapping leads to a loss of information from the target language acoustics that cannot be represented by a single source language phone.
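      1. However, this hard mapping loses information: target-language acoustics that cannot be represented by a single source-language phone are lost.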
      An alternative is to learn a probabilistic mapping, in which the distribution of target phonemes is expressed over a feature space comprising source language phone posterior probability estimates, which may be formulated as a product-of-experts model [4] or as a KL-HMM [5].
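      1. An alternative is a probabilistic mapping.
      2. The distribution of target phonemes is expressed over a feature space; I could not fully translate the rest, but my reading is that the feature space is made up of source-language phone posterior probability estimates.
      3. Two concrete formulations are:
        1. the product-of-experts model
        2. the KL-HMM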
      Here, the source languages may be viewed as defining a low-dimensional subspace in which to estimate target language models.
      1. The source languages can be viewed as defining a low-dimensional subspace in which the target-language models are estimated.
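
The contrast between a hard mapping and a probabilistic mapping can be sketched with a few frames of toy data. Here each target phone is described by a distribution over source phones, estimated by simply averaging source-phone posteriors over the frames labelled with that target phone. This is my own simplification for illustration, not the product-of-experts or KL-HMM training procedure, and the function names and numbers are assumptions.

```python
import numpy as np

def estimate_soft_mapping(source_posteriors, target_labels, n_target):
    """Describe each target phone as a distribution over source phones.

    source_posteriors: (n_frames, n_source) posteriors from a source-language model.
    target_labels:     (n_frames,) integer target phone label for each frame.
    """
    n_source = source_posteriors.shape[1]
    counts = np.zeros((n_target, n_source))
    # Sum the source posteriors of all frames belonging to each target phone.
    np.add.at(counts, target_labels, source_posteriors)
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1e-12)  # guard against unseen target phones

def hard_mapping(soft_mapping):
    # Deterministic mapping: keep only the single most likely source phone.
    return soft_mapping.argmax(axis=1)

# Toy data: 4 frames, 3 source phones, 2 target phones.
source_posteriors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.1, 0.5, 0.4],
    [0.0, 0.4, 0.6],
])
target_labels = np.array([0, 0, 1, 1])

soft = estimate_soft_mapping(source_posteriors, target_labels, n_target=2)
hard = hard_mapping(soft)
print(soft)  # rows: [0.85, 0.15, 0.0] and [0.05, 0.45, 0.5]
print(hard)  # [0 2]
```

Target phone 1 is spread almost evenly over source phones 1 and 2, but the hard mapping keeps only source phone 2. That discarded mass is the kind of information loss the paper attributes to a deterministic mapping.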
    3. This is the motivation behind the work of [6], where a subspace GMM (SGMM) is used, in which the source languages define a subspace of full covariance Gaussians. 
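      1. This idea is the motivation behind the work of [6], which uses a subspace GMM (SGMM).
      2. The source languages define a subspace of full-covariance Gaussians.
      3. This involves quite a lot of mathematics.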
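
To make the subspace idea a little less abstract: in the usual SGMM formulation, as I understand it, each state j has a low-dimensional vector v_j, and its Gaussian means and mixture weights are derived from globally shared parameters as mean_ji = M_i v_j and weight_ji = softmax_i(w_i . v_j), with full covariances shared across states. The sketch below only computes those two quantities. The sizes and random values are assumptions, and reading [6] as "shared parameters from the source languages, state vectors from the target language" is my interpretation of the sentence above, not something I have checked against that paper.

```python
import numpy as np

def sgmm_state_params(M, w, v):
    """Derive the GMM means and weights of one state from its subspace vector.

    M: (n_gauss, feat_dim, sub_dim) shared projection matrices, one per Gaussian.
    w: (n_gauss, sub_dim) shared weight-projection vectors.
    v: (sub_dim,) state-specific vector, the only per-state parameter here.
    """
    means = M @ v                 # (n_gauss, feat_dim): mean_i = M_i v
    logits = w @ v                # (n_gauss,)
    logits = logits - logits.max()  # stabilise the softmax
    weights = np.exp(logits) / np.exp(logits).sum()
    return means, weights

rng = np.random.default_rng(0)
n_gauss, feat_dim, sub_dim = 4, 39, 10            # assumed toy sizes
M = rng.normal(size=(n_gauss, feat_dim, sub_dim))  # stand-in for shared parameters
w = rng.normal(size=(n_gauss, sub_dim))            # stand-in for shared parameters
v = rng.normal(size=sub_dim)                       # stand-in for one state's vector

means, weights = sgmm_state_params(M, w, v)
print(means.shape, weights.shape)
```

The point of the design is the parameter count: a state needs only `sub_dim` numbers instead of a full set of means and weights, which is why such a model could be estimated from little target-language data.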

 
