Paraphrase algorithm

From Christoph's Personal Wiki

This article describes my work on developing a paraphrase algorithm, using Project Gutenberg as my corpus. Paraphrasing is a natural language processing (NLP) task within computational linguistics.

The idea is to first extract comparable sub-corpora from my main corpus and use these as my training set. To start, I will build a small parallel sub-corpus and use it for sentence clustering. The goal is to collect data for inferring templates from sentences that appear to be similar on a word-by-word level.
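One possible way to find sentences that are "similar on a word-by-word level" is to score every pair by the overlap of their word sets. The sketch below is only an illustration under that assumption: the function names, the Jaccard measure, and the 0.5 threshold are my own choices for this example and not a settled part of the algorithm, and the three input strings are toy fragments.

```python
import re
from itertools import combinations


def word_set(sentence):
    """Lower-case a sentence and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two sentences' word sets (0.0 when both are empty)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def comparable_pairs(sentences, threshold=0.5):
    """Return (i, j, score) for every sentence pair scoring at or above threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(sentences), 2):
        score = jaccard(a, b)
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Toy input: hypothetical fragments, not a real sub-corpus.
toy = [
    "they shall beat their swords into plowshares",
    "they shall beat their spears into pruninghooks",
    "let the weak say I am strong",
]
candidates = comparable_pairs(toy)
```

Comparing every pair is quadratic in the number of sentences, so a real corpus would need some form of pre-filtering before this step.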

Examples

Sem Experiment

See: Statistical Paraphrasing Project from the Cornell Natural Language Processing Group

Swords to ploughshares

Let A1 = Isaiah 2:4, B1 = Micah 4:3, and C1 = Joel 3:10, where:

A1
And he shall judge among the nations, and shall rebuke many people: and they shall beat their swords into plowshares, and their spears into pruninghooks: nation shall not lift up sword against nation, neither shall they learn war any more.
B1
And he shall judge among many people, and rebuke strong nations afar off; and they shall beat their swords into plowshares, and their spears into pruninghooks: nation shall not lift up a sword against nation, neither shall they learn war any more.
C1
Beat your plowshares into swords, and your pruninghooks into spears: let the weak say, I am strong.
  • sentence clustering:
A1a: And he shall judge among the nations,
B1a: And he shall judge among many people,

A1b: and shall rebuke many people:
B1b: and rebuke strong nations afar off;

A1c: and they shall beat their swords into plowshares,
B1c: and they shall beat their swords into plowshares,

A1d: and their spears into pruninghooks:
B1d: and their spears into pruninghooks:

A1e: nation shall not lift up sword against nation,
B1e: nation shall not lift up a sword against nation,

A1f: neither shall they learn war any more.
B1f: neither shall they learn war any more.
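The clause pairing above can be approximated in code. The following is a minimal sketch, assuming that clauses end at a comma, colon, semicolon, or full stop and that the two verses split into the same number of clauses, so they can be paired by position. The function names are hypothetical, and the similarity score uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for whatever measure is eventually chosen.

```python
import re
from difflib import SequenceMatcher

A1 = ("And he shall judge among the nations, and shall rebuke many people: "
      "and they shall beat their swords into plowshares, and their spears "
      "into pruninghooks: nation shall not lift up sword against nation, "
      "neither shall they learn war any more.")
B1 = ("And he shall judge among many people, and rebuke strong nations afar "
      "off; and they shall beat their swords into plowshares, and their "
      "spears into pruninghooks: nation shall not lift up a sword against "
      "nation, neither shall they learn war any more.")


def split_clauses(verse):
    """Split a verse after each comma, colon, semicolon, or full stop."""
    parts = re.split(r"(?<=[,:;.])\s+", verse.strip())
    return [p for p in parts if p]


def word_similarity(a, b):
    """Similarity in [0, 1] computed over lower-cased word sequences."""
    wa = re.findall(r"\w+", a.lower())
    wb = re.findall(r"\w+", b.lower())
    return SequenceMatcher(None, wa, wb).ratio()


def align_clauses(verse_a, verse_b):
    """Pair clauses by position (assumes both verses split the same way)."""
    ca, cb = split_clauses(verse_a), split_clauses(verse_b)
    return [(a, b, word_similarity(a, b)) for a, b in zip(ca, cb)]


pairs = align_clauses(A1, B1)
for a, b, score in pairs:
    print(f"{score:.2f}  {a}  |  {b}")
```

Pairing by position only works here because A1 and B1 have the same clause structure. C1 splits differently and reverses the objects, so it would need a real alignment step rather than `zip`.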
  • inducing patterns (arguments in square brackets):
{A1c,A1d,A1f} = {B1c,B1d,B1f}
A1a: And he shall judge among [the nations],
B1a: And he shall judge among [many people],
A1b: and [shall rebuke] [many people]:
B1b: and [rebuke] [strong nations] [afar off];
A1e: nation shall not lift up [sword] against nation,
B1e: nation shall not lift up [a sword] against nation,
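As a rough illustration of the pattern induction above, a word-level diff can separate the shared words of two clauses from the parts that vary. This is a sketch under my own assumptions, not the final method: `induce_pattern`, the tokenizer, and the numbered slot notation (`[0]`, `[1]`, ...) are hypothetical, and the diff comes from `difflib.SequenceMatcher` in the Python standard library.

```python
import re
from difflib import SequenceMatcher


def tokenize(clause):
    """Split a clause into words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", clause)


def induce_pattern(clause_a, clause_b):
    """Return (template_tokens, slots) for two clauses.

    Shared tokens are copied into the template. Each differing span becomes
    a numbered slot, and slots[k] holds the (text_a, text_b) pair for slot k.
    """
    wa, wb = tokenize(clause_a), tokenize(clause_b)
    matcher = SequenceMatcher(None, wa, wb, autojunk=False)
    template, slots = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            template.extend(wa[i1:i2])
        else:
            # replace, insert, or delete: the clauses disagree on this span
            template.append(f"[{len(slots)}]")
            slots.append((" ".join(wa[i1:i2]), " ".join(wb[j1:j2])))
    return template, slots


template_a, slots_a = induce_pattern(
    "And he shall judge among the nations,",
    "And he shall judge among many people,",
)
template_e, slots_e = induce_pattern(
    "nation shall not lift up sword against nation,",
    "nation shall not lift up a sword against nation,",
)
```

For A1a/B1a this yields one slot holding "the nations" and "many people", which agrees with the hand annotation. For A1e/B1e the diff isolates only the inserted "a", a narrower slot than the [sword] / [a sword] bracketing shown above; widening slots to phrase boundaries would need something like chunking, which this sketch does not attempt.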
