5 COMPARING NARRATIVE STRUCTURES
5.1 Bag-of-actors Model
Srivatsa and Srinivasa present an abstract “bag-of-actors” document model for comparing, indexing and retrieving documents based on their narrative structure. The model resolves the main entities, or actors, in the plot and the actions associated with each of them.
In this model, the plot of a narrative is reduced to two elements: the primary actors in the plot, and the set of expressions, or actions, performed by those actors. In other words, the essence of a plot is distilled to who is involved and what they do.
The similarity computation is modeled as a maximal matching problem between the tokens of the first expression and those of the second, and is implemented as a greedy knapsack algorithm. The model first computes the similarity between two expressions, simEx. Using simEx, it then computes the similarity between two actors, simact. Finally, it computes the similarity between two narratives, simN, using simact.
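The layered computation can be illustrated with a minimal sketch. It assumes that one greedy routine is reused at every level (tokens, then expressions, then actors) and that scores are normalised by the longer of the two lists; neither detail is stated above. The names `greedy_match`, `sim_ex`, `sim_act`, `sim_n` and the exact-match `token_sim` are hypothetical stand-ins, with `token_sim` taking the place of the LIN measure.

```python
from typing import Callable, Sequence


def greedy_match(items_a: Sequence, items_b: Sequence,
                 item_sim: Callable[[object, object], float]) -> float:
    """Greedy approximation of a maximal matching between two lists."""
    if not items_a or not items_b:
        return 0.0
    # Score every cross pair, then consume the best-scoring pairs first.
    pairs = sorted(
        ((item_sim(a, b), i, j)
         for i, a in enumerate(items_a)
         for j, b in enumerate(items_b)),
        key=lambda t: t[0], reverse=True)
    used_a, used_b, total = set(), set(), 0.0
    for score, i, j in pairs:
        if i in used_a or j in used_b:
            continue  # each item may be matched at most once
        used_a.add(i)
        used_b.add(j)
        total += score
    # Assumption: normalise by the longer list so unmatched items cost score.
    return total / max(len(items_a), len(items_b))


def token_sim(a: str, b: str) -> float:
    """Stand-in for a WordNet-based measure: exact match only."""
    return 1.0 if a == b else 0.0


def sim_ex(expr_a, expr_b) -> float:
    """Expression similarity: match tokens against tokens."""
    return greedy_match(expr_a, expr_b, token_sim)


def sim_act(actor_a, actor_b) -> float:
    """Actor similarity: match expressions using sim_ex."""
    return greedy_match(actor_a, actor_b, sim_ex)


def sim_n(narrative_a, narrative_b) -> float:
    """Narrative similarity: match actors using sim_act."""
    return greedy_match(narrative_a, narrative_b, sim_act)
```

Here an expression is a tuple of tokens, an actor is a list of expressions, and a narrative is a list of actors. A greedy pass does not guarantee an optimal matching, but it avoids the cost of an exact assignment solver.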
Actors and expressions are compared using the WordNet LIN similarity measure. LIN gives a similarity score of 1 when the two tokens match in both term and sense. When the terms or senses differ, LIN returns a similarity based on the information content (IC) of the respective words and of their least common subsumer (LCS) in the hypernym tree. The least common subsumer of two concepts A and B is “the most specific concept which is an ancestor of both A and B”.
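In its standard form, the LIN measure is 2 · IC(LCS(A, B)) / (IC(A) + IC(B)), with IC(c) = -log p(c). The sketch below works through that formula on a toy taxonomy. The concepts, the counts and the helper names (`PARENT`, `OWN_COUNT`, `lcs`, `ic`, `lin`) are all hypothetical and are not WordNet data.

```python
import math

# Hypothetical toy taxonomy: concept -> hypernym (parent); None marks the root.
PARENT = {"dog": "canine", "wolf": "canine", "canine": "animal",
          "cat": "animal", "animal": None}
# Hypothetical corpus counts of each concept's own occurrences.
OWN_COUNT = {"dog": 4, "wolf": 2, "canine": 0, "cat": 2, "animal": 0}
TOTAL = sum(OWN_COUNT.values())


def ancestors(concept):
    """Path from a concept up to the root, including the concept itself."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def lcs(a, b):
    """Least common subsumer: the most specific shared ancestor."""
    anc_b = set(ancestors(b))
    for c in ancestors(a):  # walks from most specific to most general
        if c in anc_b:
            return c
    raise ValueError("concepts share no ancestor")


def ic(concept):
    """Information content: -log p(c), counting the concept and its descendants."""
    cumulative = sum(n for c, n in OWN_COUNT.items() if concept in ancestors(c))
    return math.log(TOTAL / cumulative)


def lin(a, b):
    """LIN similarity: 2 * IC(LCS) / (IC(a) + IC(b))."""
    denom = ic(a) + ic(b)
    if denom == 0:  # both concepts are the root
        return 1.0 if a == b else 0.0
    return 2 * ic(lcs(a, b)) / denom
```

With these counts, `lin("dog", "dog")` is 1, `lin("dog", "cat")` is 0 because their only shared ancestor is the root, and `lin("dog", "wolf")` falls strictly between the two.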
Actors and expressions are identified by parsing the document at the sentence level. Actors across sentences are linked using the Stanford NLP neural co-reference resolution system. Expressions are extracted from the constituency parse trees of sentences, which break each sentence into sub-phrases with words as leaf nodes and phrase types as inner nodes.
The larger problem is to retrieve candidate documents for comparison from a corpus. Given an input document containing a story, it is not feasible to compare it pairwise against every document in a large corpus. To address this, the authors propose a variant of the conventional inverted index model for indexing documents, called the hypernym index.
An inverted index is in the form of a “postings list”, where each element in the list represents a term and points to a set of documents that contain the term.
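A conventional inverted index can be sketched in a few lines. The function name `build_inverted_index` and the sample documents are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


# Hypothetical tokenised documents.
docs = {
    "d1": ["hero", "fight", "dragon"],
    "d2": ["king", "fight", "war"],
}
index = build_inverted_index(docs)
```

Looking up `index["fight"]` returns both document ids, while `index["dragon"]` returns only `d1`.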
Given a query document, the method uses the hypernym index to retrieve candidate documents that are likely to have a similar narrative. To do this, the query document dq is parsed to obtain all the sense-disambiguated tokens representing expressions, and the hypernym tree of each token is constructed. For every term tq in the set of query tokens and their hypernyms, all documents that have a non-zero score for the term are retrieved. The weight of each retrieval is added to the document's existing score, if any.
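The score accumulation can be sketched as follows. The hypernym chains, the index contents and the weights are hypothetical; the text above does not say how the weights are assigned, so the sketch simply assumes that they are already stored in the index.

```python
from collections import defaultdict

# Hypothetical hypernym chains for sense-disambiguated query tokens.
HYPERNYMS = {"sprint": ["run", "move"], "stab": ["attack", "act"]}

# Hypothetical hypernym index: term -> {document id: stored weight}.
HYPERNYM_INDEX = {
    "sprint": {"d1": 1.0},
    "run": {"d1": 0.5, "d2": 1.0},
    "move": {"d1": 0.25, "d2": 0.5, "d3": 1.0},
    "attack": {"d3": 1.0},
}


def retrieve(query_tokens):
    """Accumulate per-document scores over query tokens and their hypernyms."""
    terms = set()
    for token in query_tokens:
        terms.add(token)
        terms.update(HYPERNYMS.get(token, []))
    scores = defaultdict(float)
    for term in terms:
        for doc_id, weight in HYPERNYM_INDEX.get(term, {}).items():
            if weight > 0:  # only documents with a non-zero score for the term
                scores[doc_id] += weight  # add to any existing score
    # Highest total score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For the query `["sprint"]`, the expanded term set is `sprint`, `run` and `move`, so `d1` collects weight from all three postings and ranks first.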
The retrieved documents are sorted in descending order of their total scores, and the top 90 percent of retrievals are chosen as candidates for narrative comparison. The final ranking of query results is obtained by applying the bag-of-actors similarity score described above to this set of candidates.
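This two-stage ranking can be sketched as below. The function name `select_and_rerank`, the rounding up of the cut-off, and the `narrative_sim` callable (a stand-in for the bag-of-actors score against the query) are assumptions made for illustration.

```python
def select_and_rerank(retrieval_scores, narrative_sim, keep_percent=90):
    """Keep the top share of retrievals, then re-rank them by narrative similarity."""
    ranked = sorted(retrieval_scores, key=retrieval_scores.get, reverse=True)
    # Integer ceiling of keep_percent % of the retrievals (rounding is an assumption).
    keep = (len(ranked) * keep_percent + 99) // 100
    candidates = ranked[:keep]
    return sorted(candidates, key=narrative_sim, reverse=True)


# Hypothetical retrieval scores for ten documents: d0 scores highest, d9 lowest.
retrieval_scores = {f"d{i}": 10.0 - i for i in range(10)}
# Hypothetical narrative similarities to the query document.
narrative_scores = {f"d{i}": i / 10 for i in range(10)}

final_ranking = select_and_rerank(retrieval_scores, narrative_scores.get)
```

In this example `d9` is dropped at the candidate stage because it has the lowest retrieval score, even though its narrative similarity would have been the highest.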