Difference between revisions of "Paraphrase algorithm"

Latest revision as of 03:53, 18 June 2007

This article describes my work on developing a paraphrase algorithm, using Project Gutenberg as my corpus. Paraphrasing is a natural language processing (NLP) task in computational linguistics.

The idea is to first extract comparable sub-corpora from my main corpus and use these as my training set. To start with, I will build a small parallel sub-corpus and use it for sentence clustering. The goal is to collect data for inferring templates from sentences that appear similar at the word level.
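A minimal sketch of the word-level similarity step, assuming Jaccard overlap of lower-cased word sets as the similarity measure and greedy single-pass grouping with a hypothetical `threshold` of 0.5. Neither choice comes from this project; they only illustrate how candidate sentences could be grouped before templates are inferred. The inputs are the verses A1, B1 and C1 from the Examples section.

```python
import re

A1 = ("And he shall judge among the nations, and shall rebuke many people: "
      "and they shall beat their swords into plowshares, and their spears into "
      "pruninghooks: nation shall not lift up sword against nation, neither shall "
      "they learn war any more.")
B1 = ("And he shall judge among many people, and rebuke strong nations afar off; "
      "and they shall beat their swords into plowshares, and their spears into "
      "pruninghooks: nation shall not lift up a sword against nation, neither "
      "shall they learn war any more.")
C1 = ("Beat your plowshares into swords, and your pruninghooks into spears: "
      "let the weak say, I am strong.")

def words(sentence: str) -> set[str]:
    """Lower-cased set of the words in a sentence (punctuation dropped)."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Shared words divided by all distinct words of the two sentences."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def cluster(sentences: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy single pass: each sentence joins the first existing cluster that
    has a member at least `threshold`-similar to it, else starts a new cluster."""
    bags = [words(s) for s in sentences]
    clusters: list[list[int]] = []
    for i, bag in enumerate(bags):
        for members in clusters:
            if any(jaccard(bag, bags[j]) >= threshold for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters

print(cluster([A1, B1, C1]))  # [[0, 1], [2]]
```

A1 and B1 share 28 of their 33 distinct words (a score of about 0.85), while C1 scores roughly 0.2 against either, so only A1 and B1 land in the same cluster. Word sets ignore order, so a score like this only proposes candidate pairs; it does not align them.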

Examples

Sem Experiment

See: Statistical Paraphrasing Project from the Cornell Natural Language Processing Group

Swords to ploughshares

Let A1 = Isaiah 2:4, B1 = Micah 4:3, and C1 = Joel 3:10, where:

A1: And he shall judge among the nations, and shall rebuke many people: and they shall beat their swords into plowshares, and their spears into pruninghooks: nation shall not lift up sword against nation, neither shall they learn war any more.
B1: And he shall judge among many people, and rebuke strong nations afar off; and they shall beat their swords into plowshares, and their spears into pruninghooks: nation shall not lift up a sword against nation, neither shall they learn war any more.
C1: Beat your plowshares into swords, and your pruninghooks into spears: let the weak say, I am strong.

Sentence clustering (A1 and B1 split into clauses at punctuation and aligned by position):

A1a: And he shall judge among the nations,
B1a: And he shall judge among many people,

A1b: and shall rebuke many people:
B1b: and rebuke strong nations afar off;

A1c: and they shall beat their swords into plowshares,
B1c: and they shall beat their swords into plowshares,

A1d: and their spears into pruninghooks:
B1d: and their spears into pruninghooks:

A1e: nation shall not lift up sword against nation,
B1e: nation shall not lift up a sword against nation,

A1f: neither shall they learn war any more.
B1f: neither shall they learn war any more.
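The clause lists above can be produced mechanically: split each verse after every comma, semicolon, colon or full stop, then pair the pieces by position. This is a sketch under the assumption that both verses break into the same number of clauses, which holds for A1 and B1; the function names are illustrative.

```python
import re

A1 = ("And he shall judge among the nations, and shall rebuke many people: "
      "and they shall beat their swords into plowshares, and their spears into "
      "pruninghooks: nation shall not lift up sword against nation, neither shall "
      "they learn war any more.")
B1 = ("And he shall judge among many people, and rebuke strong nations afar off; "
      "and they shall beat their swords into plowshares, and their spears into "
      "pruninghooks: nation shall not lift up a sword against nation, neither "
      "shall they learn war any more.")

def clauses(verse: str) -> list[str]:
    """Split a verse after each , ; : or . and keep the delimiter on its clause."""
    return [c.strip() for c in re.findall(r"[^,;:.]+[,;:.]?", verse) if c.strip()]

def align(a: str, b: str) -> list[tuple[str, str, bool]]:
    """Pair clauses by position; the flag is True when a pair is identical."""
    ca, cb = clauses(a), clauses(b)
    if len(ca) != len(cb):
        raise ValueError("positional alignment needs equal clause counts")
    return [(x, y, x == y) for x, y in zip(ca, cb)]

for label, (x, y, same) in zip("abcdef", align(A1, B1)):
    status = "identical" if same else "differs"
    print(f"A1{label}: {x}")
    print(f"B1{label}: {y}  ({status})")
```

Running it prints the six pairs A1a/B1a to A1f/B1f listed above, with c, d and f marked identical, which is the set {A1c, A1d, A1f} = {B1c, B1d, B1f} used for pattern induction.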

Inducing patterns (arguments in square brackets):

{A1c,A1d,A1f} = {B1c,B1d,B1f}
A1a: And he shall judge among [the nations],
B1a: And he shall judge among [many people],
A1b: and [shall rebuke] [many people]:
B1b: and [rebuke] [strong nations] [afar off];
A1e: nation shall not lift up [sword] against nation,
B1e: nation shall not lift up [a sword] against nation,

References

Barzilay R, Lee L (2003). "Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment". Proceedings of HLT-NAACL, pp. 16-23.

External links

*[http://www.cs.cornell.edu/home/llee/papers/statpar-informal.draft.html An informal explanation of "Learning to Paraphrase: An Unsupervised Approach Using Multiple-Sequence Alignment"]
*[http://ejohn.org/projects/javascript-diff-algorithm/ Javascript Diff Algorithm]
*[http://aclweb.org/aclwiki/index.php?title=DIRT_Paraphrase_Collection DIRT Paraphrase Collection]
*[http://aclweb.org/aclwiki/index.php?title=Distributional_Hypothesis Distributional Hypothesis]
*[http://aclweb.org/aclwiki/index.php?title=Statistical_Semantics Statistical Semantics]
*[http://semantics.isi.edu/ocean/ VerbOcean]
[[Category:Linguistics]]
