《Cross-lingual adaptation with multi-task adaptive networks》(1)
- First, why read this paper?
- If I'm not mistaken, it does cross-lingual adaptation based on DNNs. DNNs are still very hot right now, so using them for cross-lingual adaptation clearly has promise.
- The paper mentions that training used the Theano library, which I have some prior exposure to, running on GTX 690 GPUs; in other words, the code would not need to be written from scratch.
- The paper applies cross-lingual adaptation to ASR; it is worth seeing whether anything can be borrowed from ASR over to synthesis.
- Introduction
Paragraph 1:
In cross-lingual automatic speech recognition (ASR), models applied to a target language are enhanced using data from a different source language.
- I clearly haven't read enough papers; I need to read more widely. As far as I know, no one is using DNNs for cross-lingual adaptation in synthesis yet, but plenty of people are already using DNNs for cross-lingual recognition. If something can be borrowed from ASR, it should surely lead to a decent paper.
- Suppose we have a corpus of 1000 Cantonese sentences and train an ASR model to recognise Cantonese.
- If we also have 1000 English sentences, English is called the source language and Cantonese the target language.
- Then those 1000 English sentences are used to further train the previously trained ASR model, yielding enhanced models.
- In this scenario, the target language is typically low-resourced: transcribed acoustic training data for the target language may be difficult or expensive to acquire.
- Target-language data is scarce,
- and acquiring transcribed training data for the target language is difficult and expensive.
- In other words, target-language data is hard to obtain and only available in small amounts, whereas source-language data is easy to obtain.
- The cross-lingual approach is motivated by the fact that the source language data, despite being mismatched to the target, may capture common properties of the acoustics of speech which are shared across languages, improving the generalisation of the final models to unseen speakers and conditions.
- What motivates the cross-lingual approach?
- The source-language data, even though mismatched to the target, may capture acoustic properties of speech that are shared across languages.
- Put plainly: although the source language (English) and the target language (Cantonese) are different, some acoustic properties are still shared between them, while others are specific to each language.
- Exploiting these shared acoustic properties improves the generalisation of the final models.
Paragraph 2:
- Cross-lingual ASR may be viewed as a form of adaptation.
- Cross-lingual ASR can be regarded as one form of adaptation.
- What does that mean?
- Adaptation is a broad concept that includes:
- cross-lingual ASR
- cross-lingual synthesis
- .....
- In contrast to domain or speaker adaptation, the major problem with cross-lingual adaptation arises from the differences in phone sets between the source and target languages.
- Compared with domain adaptation or speaker adaptation,
- what causes the main problem in cross-lingual adaptation?
- It arises from the difference between the phone sets of the source and target languages.
- Even when a universal phone set is used, it has been found that realisation of what is ostensibly the same phone still differs across languages [1].
- Even when a universal phone set is used,
- it has been found that the realisation of what is ostensibly the same phone still differs across languages.
- In this paper, we focus on approaches where source and target languages are assumed not to share a phone set, which is probably a valid assumption when a small number of source languages are used, which are unlikely to provide complete phone coverage for an arbitrary target language.
- The authors' assumption: the source and target languages do not share a phone set.
- This is probably a valid assumption when only a small number of source languages are used, since in that case they are unlikely to provide complete phone coverage for an arbitrary target language.
Paragraph 3:
- Arguably the simplest approach to the problem of cross-lingual phoneset mismatch is to define a deterministic mapping between source and target phone sets [2] which may be estimated in a data-driven fashion [3].
- There are several approaches to the cross-lingual phone-set mismatch problem.
- The simplest is to define a deterministic mapping between the source and target phone sets, which may be estimated in a data-driven fashion.
- This seems to be a common approach in synthesis as well; isn't what I'm doing now exactly state mapping?
However, this hard mapping leads to a loss of information from the target language acoustics that cannot be represented by a single source language phone.
- However, this hard mapping causes a loss of information: whatever in the target-language acoustics cannot be represented by a single source-language phone is discarded.
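The hard-mapping idea and the information loss it causes can be illustrated with a tiny sketch. The phone labels and mapping below are hypothetical, made up for illustration, not taken from the paper:

```python
# Hypothetical deterministic (hard) mapping from target (Cantonese-like)
# phones to source (English-like) phones; labels are invented for illustration.
hard_map = {
    "aa": "aa",  # a similar vowel exists in the source phone set
    "oe": "er",  # no exact source match: forced onto the nearest phone
    "eo": "er",  # a second target phone forced onto the SAME source phone
    "yu": "uw",
}

def map_phones(target_phones):
    """Replace each target phone with its single mapped source phone."""
    return [hard_map[p] for p in target_phones]
```

Once mapped, "oe" and "eo" become indistinguishable: that distinction in the target acoustics is exactly the information a single-phone hard mapping cannot represent.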
- Another approach is a probabilistic mapping,
- in which the distribution of each target phone is expressed over a feature space derived from the source language.
- Two concrete examples are:
- product-of-experts models
- the KL-HMM model
- The source languages can be seen as defining a low-dimensional subspace in which the target-language models are estimated.
- This is the motivation behind the work of [6], where a subspace GMM (SGMM) is used, in which the source languages define a subspace of full covariance Gaussians.
- This is the motivation behind the work of [6], which uses a subspace GMM (SGMM),
- where the source languages define a subspace of full-covariance Gaussians.
- This part involves quite a bit of mathematics.
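A minimal sketch of the probabilistic-mapping idea above, in the spirit of KL-HMM: each target phone is modelled as a reference distribution over source-language phone posteriors rather than as a single hard-mapped phone. All phone labels and probabilities here are made-up illustrative values, not from the paper:

```python
import math

# Hypothetical reference distribution (illustrative numbers only):
# the target phone "oe" is a categorical distribution over source phones.
soft_map = {
    "oe": {"er": 0.6, "uh": 0.3, "ah": 0.1},
}

def kl_score(target_phone, source_posteriors):
    """KL(reference || posterior): how poorly a frame's source-phone
    posterior vector matches the target phone's reference distribution.
    Lower is better; 0 means a perfect match."""
    ref = soft_map[target_phone]
    return sum(p * math.log(p / source_posteriors[s])
               for s, p in ref.items() if p > 0)
```

Unlike the hard mapping, a frame's evidence is spread across several source phones, so target acoustics that fall between source phones are still representable.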