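Anisotropy in BERT sentence embeddings and its connection to contrastive learning
These notes cover why sentence embeddings derived from BERT perform poorly on semantic similarity tasks and the explanations offered for it (anisotropy, representation degeneration, a cone-shaped embedding space), and then the connection between contrastive learning and anisotropy discussed in SimCSE.
They mainly collect the relevant papers and their main arguments, kept for reference.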
Contents
Problem statement:
Explanations from related papers:
1. REPRESENTATION DEGENERATION PROBLEM IN TRAINING NATURAL LANGUAGE GENERATION MODELS
2. bert-flow : chap2 : Understanding the Sentence Embedding Space of BERT
2.1 The Connection between Semantic Similarity and BERT Pre-training :
2.2 Anisotropic Embedding Space Induces Poor Semantic Similarity:
3. simcse : chap5 : Connection to Anisotropy
4. Alignment and Uniformity
Related papers:
Problem statement:
why do the BERT-induced sentence embeddings perform poorly to retrieve semantically similar sentences?
In other words, why do sentence embeddings produced from BERT perform so poorly on semantic similarity tasks?
Reimers and Gurevych (2019) demonstrate that such BERT sentence embeddings lag behind the state-of-the-art sentence embeddings in terms of semantic similarity. On the STS-B dataset, BERT sentence embeddings are even less competitive to averaged GloVe (Pennington et al., 2014) embeddings, which is a simple and non-contextualized baseline proposed several years ago.
Explanations from related papers:
1. REPRESENTATION DEGENERATION PROBLEM IN TRAINING NATURAL LANGUAGE GENERATION MODELS
This paper introduces the representation degeneration problem (anisotropy).
We observe that when training a model for natural language generation tasks through likelihood maximization with the weight tying trick, especially with big training datasets, most of the learnt word embeddings tend to degenerate and be distributed into a narrow cone, which largely limits the representation power of word embeddings.
......
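The "narrow cone" can be made concrete with a small numerical sketch. It is not taken from the paper: the data is synthetic, and the average pairwise cosine similarity is used here only as one simple, commonly used proxy for anisotropy. The function name and the offset value are assumptions made for illustration.

```python
import numpy as np

def average_cosine_similarity(embeddings: np.ndarray) -> float:
    """Mean cosine similarity over all distinct pairs of rows."""
    # Normalize every row to unit length so dot products become cosines.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    n = len(unit)
    # Drop the diagonal (self-similarity, always 1) before averaging.
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))

rng = np.random.default_rng(0)
# Hypothetical isotropic embeddings: directions spread evenly around the origin.
isotropic = rng.normal(size=(500, 64))
# Hypothetical degenerated embeddings: the same vectors plus a large shared
# offset, which pushes every vector into a narrow cone around that offset.
cone = isotropic + 10.0 * np.ones(64)

iso_score = average_cosine_similarity(isotropic)
cone_score = average_cosine_similarity(cone)
print(f"isotropic: {iso_score:.3f}, narrow cone: {cone_score:.3f}")
```

A score near 0 means the directions are spread out. A score near 1 means almost any two vectors point the same way, so cosine similarity can no longer separate related from unrelated items.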
2. bert-flow : chap2 : Understanding the Sentence Embedding Space of BERT
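This chapter covers how BERT-style pre-training tasks relate to semantic similarity, and analyzes why the resulting embeddings do poorly on it.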
2.1 The Connection between Semantic Similarity and BERT Pre-training :
- The similarity between BERT sentence embeddings can be reduced to the similarity between BERT context embeddings $h_c^\top h_{c'}$. However, as shown in Equation 1, the pretraining of BERT does not explicitly involve the computation of $h_c^\top h_{c'}$. Therefore, we can hardly derive a mathematical formulation of what $h_c^\top h_{c'}$ exactly represents.
- Co-Occurrence Statistics as the Proxy for Semantic Similarity: roughly speaking, it is semantically meaningful to compute the dot product between a context embedding and a word embedding
- Higher-Order Co-Occurrence Statistics as Context-Context Semantic Similarity: During pretraining, the semantic relationship between two contexts c and c′ could be inferred and reinforced with their connections to words.
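The quotes above refer to Equation 1 without reproducing it. As a reading aid, and as far as I recall the bert-flow paper (worth checking against the original), the language-modeling objective and its co-occurrence approximation have the following form, where $h_c$ is the context embedding, $w_x$ the word embedding, and $\lambda_c$ a context-dependent term:

```latex
p(x \mid c) = \frac{\exp\left(h_c^\top w_x\right)}{\sum_{x'} \exp\left(h_c^\top w_{x'}\right)}
\qquad
h_c^\top w_x \approx \log p^*(x \mid c) + \lambda_c
             = \mathrm{PMI}(x, c) + \log p(x) + \lambda_c
```

The first expression involves only context-word products $h_c^\top w_x$, which is why the context-context product $h_c^\top h_{c'}$ has no direct interpretation. The second is what makes the context-word dot product "semantically meaningful": it tracks pointwise mutual information, a co-occurrence statistic.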
2.2 Anisotropic Embedding Space Induces Poor Semantic Similarity:
- To investigate the underlying problem of the failure, we use word embeddings as a surrogate because words and contexts share the same embedding space. If the word embeddings exhibits some misleading properties, the context embeddings will also be problematic, and vice versa.
- Gao et al. (2019) and Wang et al. (2020) have pointed out that, for language modeling, the maximum likelihood training with Equation 1 usually produces an anisotropic word embedding space. “Anisotropic” means word embeddings occupy a narrow cone in the vector space.
- Observation 1: Word Frequency Biases the Embedding Space
- Observation 2: Low-Frequency Words Disperse Sparsely. We observe that, in the learned anisotropic embedding space, high-frequency words concentrates densely and low-frequency words disperse sparsely.
- Due to the sparsity, many “holes” could be formed around the low-frequency word embeddings in the embedding space, where the semantic meaning can be poorly defined. Note that BERT sentence embeddings are produced by averaging the context embeddings, which is a convexity-preserving operation. However, the holes violate the convexity of the embedding space
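The "averaging the context embeddings" step in the last point can be sketched as masked mean pooling. This is a minimal illustration on toy arrays, not the paper's code. The function name `mean_pool` and the 0/1 `attention_mask` convention are assumptions borrowed from common practice.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average context embeddings over real (non-padding) tokens.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # guard against empty rows
    return summed / counts

# Toy batch: 2 sentences, 3 token slots, 2-dimensional embeddings.
tokens = np.array([
    [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]],  # third slot is padding
    [[0.0, 4.0], [2.0, 0.0], [4.0, 2.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])
sentence_embeddings = mean_pool(tokens, mask)
print(sentence_embeddings)
```

Each pooled vector is a convex combination of that sentence's token vectors, so it always lies inside their convex hull. If that hull passes through a "hole" around low-frequency words, the sentence embedding can land in a region where the meaning is poorly defined.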
3. simcse : chap5 : Connection to Anisotropy
This chapter covers how SimCSE relates to anisotropy and why SimCSE works.
we take a singular spectrum perspective—which is a common practice in analyzing word embeddings (Mu and Viswanath, 2018; Gao et al., 2019; Wang et al., 2020), and show that the contrastive objective can “flatten” the singular value distribution of sentence embeddings and make the representations more isotropic.
......
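The singular spectrum perspective can be illustrated with a small sketch. It does not train anything with a contrastive objective. It only shows, on synthetic data, what a "flat" versus a "peaked" singular value distribution looks like. The function name and the offset used to simulate anisotropy are assumptions.

```python
import numpy as np

def normalized_singular_values(embeddings: np.ndarray) -> np.ndarray:
    """Singular values of the embedding matrix, scaled so the largest is 1."""
    s = np.linalg.svd(embeddings, compute_uv=False)  # sorted in descending order
    return s / s[0]

rng = np.random.default_rng(0)

# Hypothetical isotropic sentence embeddings on the unit sphere.
isotropic = rng.normal(size=(500, 32))
isotropic /= np.linalg.norm(isotropic, axis=1, keepdims=True)

# Hypothetical anisotropic embeddings: a shared offset dominates every vector.
anisotropic = rng.normal(size=(500, 32)) + 3.0
anisotropic /= np.linalg.norm(anisotropic, axis=1, keepdims=True)

iso_spec = normalized_singular_values(isotropic)
aniso_spec = normalized_singular_values(anisotropic)
print("isotropic   2nd/1st singular value:", round(float(iso_spec[1]), 3))
print("anisotropic 2nd/1st singular value:", round(float(aniso_spec[1]), 3))
```

In the anisotropic case one singular value dominates and the rest drop off sharply. "Flattening" the spectrum means bringing the remaining values closer to the largest one, which is the effect the quoted passage attributes to the contrastive objective.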
4. Alignment and Uniformity
This paper introduces alignment and uniformity as tools for analyzing and evaluating (and training) sentence embeddings.
......
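As a rough sketch of the two quantities, the following NumPy code follows the definitions in Wang and Isola (2020) as I understand them: alignment is the expected distance between positive pairs raised to a power alpha, and uniformity is the log of the expected Gaussian potential between pairs. The defaults alpha=2 and t=2, the helper names, and the synthetic data are assumptions. Embeddings are assumed to be L2-normalized.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def alignment(x: np.ndarray, y: np.ndarray, alpha: float = 2.0) -> float:
    """Mean distance**alpha between positive pairs (x[i], y[i]); lower is better."""
    return float((np.linalg.norm(x - y, axis=1) ** alpha).mean())

def uniformity(x: np.ndarray, t: float = 2.0) -> float:
    """Log of the mean Gaussian potential over distinct pairs; lower is better."""
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    upper = np.triu_indices(len(x), k=1)  # each unordered pair once
    return float(np.log(np.exp(-t * sq_dists[upper]).mean()))

rng = np.random.default_rng(0)
# Hypothetical anchors spread over the sphere, with slightly perturbed positives.
anchors = l2_normalize(rng.normal(size=(200, 16)))
positives = l2_normalize(anchors + 0.05 * rng.normal(size=(200, 16)))
# Hypothetical collapsed embeddings crowded into a narrow cone.
collapsed = l2_normalize(rng.normal(size=(200, 16)) + 5.0)

print("alignment (anchors, positives):", round(alignment(anchors, positives), 4))
print("uniformity, spread-out:", round(uniformity(anchors), 3))
print("uniformity, collapsed: ", round(uniformity(collapsed), 3))
```

The collapsed set scores a much worse (higher) uniformity than the spread-out set, which is how anisotropy shows up in this metric. Good sentence embeddings need both low alignment loss and low uniformity loss.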
Related papers:
- Jun Gao, Di He, Xu Tan, Tao Qin, Liwei Wang, and Tieyan Liu. 2019. Representation degeneration problem in training natural language generation models. In International Conference on Learning Representations (ICLR).
- https://openreview.net/pdf?id=ByxY8CNtvr : IMPROVING NEURAL LANGUAGE GENERATION WITH SPECTRUM CONTROL
- bert-flow: On the Sentence Embeddings from Pre-trained Language Models
- SimCSE: Simple Contrastive Learning of Sentence Embeddings
- http://proceedings.mlr.press/v119/wang20k/wang20k.pdf Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere