Overall, we can treat extractive summarization as a recommendation problem: given a query, recommend a set of relevant sentences. The query here is the document, and relevance is a measure of whether a given sentence belongs in the document summary.
How we go about obtaining this measure of relevance can vary (the common dilemma for any recommendation systems problem), and several problem formulations are possible. For example:
- Classification/Regression: Given input(s), output a class or a relevance score for each sentence. Here, the input is a document together with one of its sentences, and the output is either a class (belongs in the summary or not) or a likelihood score (how likely the sentence belongs in the summary). This formulation is pairwise, i.e. at test time we need n passes through the model for n sentences to get n classes/scores, or we compute this as a batch.
- Metric Learning: Learn a shared embedding space for both documents and sentences, such that the embeddings of a document and of the sentences that belong in its summary are close in distance. At test time, we embed the document and each sentence, then retrieve the most similar sentences. This approach is particularly useful because we can leverage fast similarity search algorithms.
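To make the metric learning formulation concrete, here is a minimal sketch of the retrieval step. The `top_k_sentences` function and the toy 2-d embeddings are hypothetical stand-ins for learned representations; in practice the ranking would run over real encoder outputs, possibly via an approximate similarity search library.

```python
import numpy as np

def top_k_sentences(doc_emb, sent_embs, k=3):
    """Return indices of the k sentences whose embeddings are most
    similar (by cosine similarity) to the document embedding."""
    doc = doc_emb / np.linalg.norm(doc_emb)
    sents = sent_embs / np.linalg.norm(sent_embs, axis=1, keepdims=True)
    scores = sents @ doc  # cosine similarity of each sentence vs. the doc
    return np.argsort(-scores)[:k]

# Toy 2-d embeddings (stand-ins for learned representations).
doc_emb = np.array([1.0, 0.0])
sent_embs = np.array([
    [0.9, 0.1],    # close to the document -> likely in the summary
    [0.0, 1.0],    # orthogonal -> unlikely
    [0.8, -0.2],
])
print(top_k_sentences(doc_emb, sent_embs, k=2))  # → [0 2]
```

With larger corpora, the exhaustive dot product above would be swapped for an approximate nearest-neighbor index, which is what makes this formulation fast at test time.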
In this work, we will explore a classification setup that follows existing studies (e.g. Nallapati et al. 2017, who use RNNs for text encoding and classify each sentence). While this approach is pairwise (and compute intensive with respect to the number of sentences), we can accept this limitation since most documents contain a relatively small number of sentences.
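The classification setup above can be sketched as follows. This is an illustrative skeleton, not the actual model: `encode` is a hashing stand-in for the RNN encoder, and the weights are untrained, so the scores are meaningless; the point is the shape of the pairwise loop, one forward pass per (document, sentence) pair.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(text, dim=8):
    """Stand-in encoder: hash words into a fixed-size count vector.
    In the real setup this would be an RNN document/sentence encoder."""
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    return vec

def score_sentence(doc_vec, sent_vec, w, b):
    """Pairwise relevance: logistic score over the concatenated
    (document, sentence) representation."""
    x = np.concatenate([doc_vec, sent_vec])
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

sentences = ["the cat sat", "stocks fell sharply", "the cat slept"]
doc_vec = encode(" ".join(sentences))
w, b = rng.normal(size=16), 0.0  # untrained weights, illustration only

# One pass per sentence: the pairwise cost the text refers to.
scores = [score_sentence(doc_vec, encode(s), w, b) for s in sentences]
summary = [s for s, p in zip(sentences, scores) if p > 0.5]
```

In a trained model, thresholding (or taking the top-k) over these per-sentence scores yields the extractive summary.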