
Figure 1
Similarity Matrix. The leftmost column and the row headers jointly provide a notational scheme for labelling and scoring the intertextual pairs in the dataset; common intertextual entities or referential relations are sorted into the corresponding conceptual space of similarity, including specific phenomena such as “heteroglossia” (Bakhtin, 1981) in the box for “Phrase-Parallel”.
Table 1
Model details.
| MODEL | MODEL DETAILS |
|---|---|
| Word2Vec | (out of the box, no adjustment) |
| Base SBERT* | all-MiniLM-L6-v2 |
| MPNet (masked model)* | all-mpnet-base-v2 |
| Multilingual MPNet* | paraphrase-multilingual-mpnet-base-v2 |
| Question-Answer & Retrieval* | multi-qa-mpnet-base-dot-v1 |
| Distilled Question-Answer & Retrieval* | multi-qa-distilbert-cos-v1 |
| E5* | e5-base-v2 |
| Note | *SBERT family |
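The models in Table 1 are typically scored on STS by taking the cosine similarity of their sentence embeddings. As a minimal, model-free sketch of that baseline score (the toy 4-dimensional vectors below are illustrative only; the listed models emit 384- or 768-dimensional embeddings):

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two embedding vectors: the
    # standard STS baseline score used with the models in Table 1.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical "embeddings" for two passages (not output of any real model).
passage_a = [0.2, 0.7, 0.1, 0.4]
passage_b = [0.3, 0.6, 0.0, 0.5]
print(round(cosine_similarity(passage_a, passage_b), 3))  # → 0.971
```

With an SBERT-family model, the same score is obtained by encoding each passage and applying this function (or the library's built-in cosine utility) to the resulting vectors.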

Figure 2
A portion of the dataset illustrating how word-level mirroring is weighted down by a paragraph-level sense of opposition.

Figure 3
A demonstration of one way to label intertextual elements and calculate a similarity score, compared with basic NLP approaches to semantic textual similarity (STS).

Figure 4
A conceptual representation of the proposed intertextual similarity (aSimMatrix) score.

Figure 5
A sample of pairwise n-gram and label distributions.

Figure 6
An example of semantic/conceptual difference.
