Reading the paper "Transform Mapping Using Shared Decision Tree Context Clustering for HMM-based Cross-Lingual Speech Synthesis" (3)

Date: June 11, 2015

3.1. Shared decision tree context clustering (STC)

STC [11] was originally proposed to avoid generating speaker-biased leaf nodes in the tree construction of an average voice model.
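1. As expected, the authors state here where the STC technique comes from.
2. They then briefly describe the problem STC was designed to solve:
  1. When building the tree for an average voice model, avoid generating speaker-biased leaf nodes.
  2. To understand what "speaker-biased leaf nodes" really means, I need to read reference [11] carefully, along with the PhD thesis on adaptation that I read earlier (the one I failed to explain clearly at the group meeting).
In the conventional decision-tree-based context clustering for the average voice model, each leaf node does not always have the training data of all speakers, and some leaf nodes have only a few speakers’ training data.
1. In conventional decision-tree-based context clustering for the average voice model, a leaf node does not necessarily contain training data from every speaker; some leaf nodes contain data from only a few speakers.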

Such leaf nodes are what the paper calls speaker-biased leaf nodes.

On the other hand, in STC, we only use the questions which can be applied to all speakers.
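1. In STC, only questions that can be applied to all speakers are used.
2. One question: aren't the IBM, helen, and concatenative-synthesis corpora all Cantonese? Could there be a question that applies to IBM but not to helen?
3. Or have I misunderstood, and the authors are treating English and Cantonese as the two "speakers" here, applying STC across different languages?
As a result, every node of the decision tree has the training data of all speakers, which leads to a speaker-unbiased average voice model.
1. This is the so-called speaker-unbiased average voice model.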
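
To make the rule concrete for myself, here is a minimal sketch of how "a question that can be applied to all speakers" might be checked at one node. It rests on my own reading of the sentence above: a question is usable only if both the "yes" child and the "no" child still contain data from every speaker. The function names, the toy speakers, and the toy questions are all hypothetical and are not taken from the paper or from [11].

```python
def split_is_shared(samples, question):
    """Return True if both children of the split keep data from every speaker.

    samples:  list of (speaker, context) pairs pooled at the node
    question: function mapping a context dict to True/False
    """
    speakers = {spk for spk, _ in samples}
    yes_speakers = {spk for spk, ctx in samples if question(ctx)}
    no_speakers = {spk for spk, ctx in samples if not question(ctx)}
    return yes_speakers == speakers and no_speakers == speakers


def shared_questions(samples, questions):
    """Keep only the names of questions that pass the all-speakers check."""
    return [name for name, q in questions.items() if split_is_shared(samples, q)]


# Toy data (hypothetical): spk2 has no diphthong in its training data.
samples = [
    ("spk1", {"phone": "a"}), ("spk1", {"phone": "k"}), ("spk1", {"phone": "ai"}),
    ("spk2", {"phone": "a"}), ("spk2", {"phone": "k"}), ("spk2", {"phone": "i"}),
]
questions = {
    "C-Vowel": lambda ctx: ctx["phone"] in {"a", "i", "u"},
    "C-Diphthong": lambda ctx: ctx["phone"] in {"ai", "au"},
}

usable = shared_questions(samples, questions)
print(usable)
```

In this toy case the diphthong question is rejected because its "yes" child would contain only spk1, which is exactly the kind of speaker-biased node that STC is meant to avoid.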

================

3.2. Transform mapping based on language-independent decision tree using STC

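To use contextual information in the transform mapping between different languages, we must consider the language dependency of decision trees.
1. This is exactly the problem I am working on: how to take contextual information into account when building the state mapping.
2. What counts as contextual information? Yu Quanjie, can you come up with an example yourself?
3. The authors give a hint here about how to take contextual information into account when building the state mapping:
  1. The language dependency of the decision trees must be considered.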
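In general, near the root node of the decision trees, there are language-independent properties between the two languages in terms of basic articulation manners such as vowel, consonant, and voiced/unvoiced sound.
1. Near the root node of the decision trees, the properties are language-independent across the two languages.
2. For example, basic manners of articulation:
  1. vowel
  2. consonant
  3. voiced/unvoiced
3. Is this really the case?
4. I recall looking at model files trained by HTS, e.g. the files under /trees/.../, and I did not notice this pattern.
5. Or maybe I did not read them correctly; I can check this again later.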
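On the other hand, near the leaf nodes, there frequently appear language-dependent properties because some nodes are split using language-specific questions, e.g., "Is the current phoneme diphthong?"
1. Near the leaf nodes, language-dependent properties tend to appear, because some nodes are split using language-specific questions.
2. For example, "Is the current phoneme a diphthong?" This kind of question is specific to English; Cantonese would certainly not have it.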
To alleviate the language mismatch in the transform mapping between the average voice models, we generate a transform mapping based on a language-independent decision tree constructed by STC.
1. A language-independent decision tree is built with STC, and that tree is then used to build the state mapping.
Specifically, we use both average voice models of input and output languages in the context clustering, and the transformation matrices for the two average voice models are explicitly mapped to each other in the leaf nodes of the language-independent decision tree.
1. The English and Cantonese average voice models are pooled together for clustering.
2. In the language-independent decision tree, if states from the two languages fall into the same leaf node, those two states are treated as a mapped pair.
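
A minimal sketch of the leaf-level pairing described above, assuming the shared tree has already been built and each state has already been assigned to a leaf. The state names, the `"in"`/`"out"` language labels, and the function name are hypothetical; the paper maps transformation matrices, while this toy version only pairs state identifiers to show the grouping idea.

```python
from collections import defaultdict


def build_leaf_mapping(leaf_of_state, language_of_state):
    """Map each input-language state to the output-language states in the same leaf.

    leaf_of_state:     dict state name -> leaf index of the shared tree
    language_of_state: dict state name -> "in" or "out"
    """
    per_leaf = defaultdict(lambda: {"in": [], "out": []})
    for state, leaf in leaf_of_state.items():
        per_leaf[leaf][language_of_state[state]].append(state)

    mapping = {}
    for groups in per_leaf.values():
        for src in groups["in"]:
            # Every input-language state is paired with all output-language
            # states that share its leaf.
            mapping[src] = list(groups["out"])
    return mapping


# Toy assignment (hypothetical): two leaves, states from languages A and B.
leaf_of_state = {"A_s1": 0, "A_s2": 1, "B_s1": 0, "B_s2": 1, "B_s3": 1}
language_of_state = {"A_s1": "in", "A_s2": "in",
                     "B_s1": "out", "B_s2": "out", "B_s3": "out"}

mapping = build_leaf_mapping(leaf_of_state, language_of_state)
print(mapping)
```

Because STC only allows splits that keep data from all speakers of both languages in each node, every leaf should contain states from both sides, so no input-language state ends up without a partner.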
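Constructing the tree, we split nodes from the root using only the questions that can be applied to all speakers of both languages.
1. Which tree is being constructed? The language-independent decision tree.
2. Constructing a tree requires a question set, so which question set is used?
  1. Every question in the set must be applicable to both languages.
  2. In other words, only the questions shared by the two languages.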
In this study, we control the tree size by introducing a weight into stopping criterion based on the minimum description length (MDL) [13].
1. The tree size is controlled by introducing a weight into the MDL-based stopping criterion.
To avoid the effect of the language dependency, a smaller tree is constructed compared with that based on MDL.
1. To avoid the effect of language dependency, the tree is made smaller than the one that the plain MDL criterion would produce.
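
This section does not give the exact form of the weighted criterion, so the following is only a sketch under an assumption: each node is modeled by a single diagonal-covariance Gaussian, and a split is accepted when the log-likelihood gain exceeds a penalty of `weight * dim * log(total_occupancy)`, which is the usual shape of the MDL penalty in decision-tree clustering. The names `weight` and `total_occupancy` are mine. With `weight = 1.0` this is the plain MDL test; a larger weight stops splitting earlier and therefore gives a smaller tree.

```python
import math


def node_log_likelihood(occupancy, variances):
    """Log-likelihood of a node modeled by one diagonal-covariance Gaussian."""
    dim = len(variances)
    log_det = sum(math.log(v) for v in variances)
    return -0.5 * occupancy * (dim * (1.0 + math.log(2.0 * math.pi)) + log_det)


def should_split(parent, yes_child, no_child, total_occupancy, weight=1.0):
    """Accept the split only if the likelihood gain beats the weighted penalty.

    Each node is given as (occupancy, list of per-dimension variances).
    """
    gain = (node_log_likelihood(*yes_child)
            + node_log_likelihood(*no_child)
            - node_log_likelihood(*parent))
    dim = len(parent[1])
    penalty = weight * dim * math.log(total_occupancy)
    return gain > penalty


# Toy numbers (hypothetical): splitting reduces the variance from 4.0 to 1.0.
parent = (100.0, [4.0])
yes_child = (50.0, [1.0])
no_child = (50.0, [1.0])

plain_mdl = should_split(parent, yes_child, no_child, 1000.0, weight=1.0)
heavy_weight = should_split(parent, yes_child, no_child, 1000.0, weight=20.0)
print(plain_mdl, heavy_weight)
```

In this toy case the gain is 50 * log(4), about 69.3, so the split passes the plain MDL test (penalty about 6.9) but is rejected once the weight is raised to 20 (penalty about 138.2).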
Since the node splitting is based on the acoustic parameters of each node, the transform mapping is conducted using both the acoustic and contextual information, which is more desirable than the conventional state mapping based on KLD.
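1. Node splitting is based on the acoustic parameters of each node,
2. so the state mapping is built using both acoustic features and contextual factors,
3. which is more sensible than the conventional KLD-based state mapping.
4. Well, the authors slipped here and are inconsistent: it is "state mapping" here but "transform mapping" earlier.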
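
For contrast, here is a minimal sketch of what I understand by the conventional KLD-based state mapping: each state of one language is mapped to the acoustically nearest state of the other language, with no contextual information involved. I assume single diagonal-covariance Gaussians per state and a symmetric KL divergence as the distance; the state names and the mapping direction are hypothetical.

```python
import math


def kld_diag(mean_p, var_p, mean_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def symmetric_kld(p, q):
    """Symmetric divergence KL(p || q) + KL(q || p); p and q are (mean, var) pairs."""
    return kld_diag(*p, *q) + kld_diag(*q, *p)


def kld_state_mapping(states_in, states_out):
    """Map each input-language state to the output-language state with minimum divergence."""
    return {
        name: min(states_out, key=lambda o: symmetric_kld(gauss, states_out[o]))
        for name, gauss in states_in.items()
    }


# Toy one-dimensional states (hypothetical): (mean vector, variance vector).
states_in = {"A_s1": ([0.0], [1.0]), "A_s2": ([5.0], [1.0])}
states_out = {"B_s1": ([0.2], [1.0]), "B_s2": ([4.5], [2.0])}

kld_mapping = kld_state_mapping(states_in, states_out)
print(kld_mapping)
```

The pairing here depends only on the Gaussian parameters, whereas the tree-based mapping also constrains which states may be paired through the shared context questions.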
An appropriate size of the tree is experimentally examined in Sect. 4.3.
1. An appropriate tree size is determined experimentally in Sect. 4.3.