
Leopar User's documentation: The Corpus mode

The corpus mode runs the parser on a set of sentences and returns a minimal set of information about the parsing results. Most of the usual Leopar options are compatible with this mode.

A corpus example

Consider the following corpus file, named sentences, which contains one sentence per line:

sentences
Léo part.
Il lui donne une lettre.
Pas de solution sans point
not in the lexicon
Si la phrase est trop longue, il faut du temps pour que le système l'analyse. 
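If you generate the corpus file from a script, a few lines of Python are enough. This minimal sketch assumes, as the example suggests, that a corpus file is simply one sentence per line, and that Leopar reads it as UTF-8. The page does not state the expected encoding, so adjust it if accented words such as "Léo" are not recognized. The helper name write_corpus is hypothetical.

```python
from pathlib import Path

# The example corpus above, one sentence per line.
SENTENCES = [
    "Léo part.",
    "Il lui donne une lettre.",
    "Pas de solution sans point",
    "not in the lexicon",
    "Si la phrase est trop longue, il faut du temps pour que le système l'analyse.",
]


def write_corpus(path: Path, sentences: list[str]) -> None:
    """Write one sentence per line (UTF-8 encoding is an assumption)."""
    path.write_text("\n".join(sentences) + "\n", encoding="utf-8")


write_corpus(Path("sentences"), SENTENCES)
```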

Parsing into phrase structure trees

leopar -corpus sentences -dep tree -timeout 10
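The same invocation can be scripted, for instance to rerun a corpus with different options. The Python sketch below only uses the three options shown on this page (-corpus, -dep, -timeout). It assumes that leopar is on the PATH and that the results are printed on standard output; the helper names corpus_command and run_corpus are hypothetical.

```python
import shlex
import subprocess


def corpus_command(corpus: str, dep: str, timeout: int) -> list[str]:
    """Build the argument list for a corpus-mode run.

    `dep` is "tree" or "complete", as in the examples on this page.
    """
    return ["leopar", "-corpus", corpus, "-dep", dep, "-timeout", str(timeout)]


def run_corpus(corpus: str, dep: str, timeout: int) -> str:
    """Run Leopar on a corpus file and return what it prints.

    Assumes `leopar` is on PATH and writes its results to standard output.
    check=False because this page does not say whether Leopar exits with a
    non-zero status when some sentences fail.
    """
    completed = subprocess.run(
        corpus_command(corpus, dep, timeout),
        capture_output=True, text=True, check=False,
    )
    return completed.stdout


print(shlex.join(corpus_command("sentences", "tree", 10)))
# leopar -corpus sentences -dep tree -timeout 10
```

Calling run_corpus("sentences", "complete", 5) would correspond to the second example on this page.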

can produce:

COMMAND : leopar_dev -corpus sentences -dep tree -timeout 10
HOSTNAME: Brunos-MacBook-Pro.local
Léo part.	+++	[Inc[11]] "Léo part.", 14 ---[F: 0.00s]---> 1 ---[P: 0.00s]---> 1   tree:1
Il lui donne une lettre.	+++	[Inc[11]] "Il lui donne une lettre.", 3480 ---[F: 0.01s]---> 1 ---[P: 0.00s]---> 2   tree:2
Pas de solution sans point	---	[Inc[11]] "Pas de solution sans point", 9135 ---[F: 0.01s]---> 0 ---[P: 0.00s]---> 0   tree:0
not in the lexicon  Failed: tokenize [not; the; lexicon]
Si la phrase est trop longue, il faut du temps pour que le système l'analyse.	+++	[Inc[11]] "Si la phrase est trop longue, il faut du temps pour que le système l'analyse.", 7.326625e+13 ---[F: 1.25s]---> 593 ---[P: 7.50s]---> 6   tree:6
===== Report (parsing of corpus file 'sentences' [Inc[11]]) =====
 5 sentences parsed
 3 success (60%)
 2 failure (40%)
 Total time: 9.50s
 Filter time: 1.27s
 Parsing time: 7.50s
=============================================
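Each result line starts with the input sentence, followed either by a +++ or --- marker and the parsing trace, or by Failed: and a reason. The sentences marked +++ are exactly the ones counted as successes in the report, and in this run the report's filter and parsing times are the sums of the per-sentence F and P times (0.00 + 0.01 + 0.01 + 1.25 = 1.27s, and 7.50s). The Python sketch below parses the output on that basis. Its regular expressions are inferred from this single example (tab-separated fields, the F/P trace layout), not from a format specification, and Result, parse_line and summarize are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# "<sentence>\t+++\t[...] ---[F: 0.01s]---> 1 ---[P: 0.00s]---> 2   tree:2"
RESULT_RE = re.compile(
    r"^(?P<sentence>.*?)\t(?P<flag>\+\+\+|---)\t.*?"
    r"---\[F: (?P<f_time>[\d.]+)s\]---> \d+ "
    r"---\[P: (?P<p_time>[\d.]+)s\]---> \d+\s+"
    r"(?P<kind>\w+):(?P<count>\d+)\s*$"
)
# "<sentence>  Failed: <reason>"
FAILED_RE = re.compile(r"^(?P<sentence>.*?)\s+Failed: (?P<reason>.*)$")


@dataclass
class Result:
    sentence: str
    success: bool             # True for "+++" lines
    filter_time: float = 0.0  # seconds shown after "F:"
    parse_time: float = 0.0   # seconds shown after "P:"
    count: int = 0            # number after "tree:" or "dep:"
    reason: str = ""          # text after "Failed:", if any


def parse_line(line: str) -> Optional[Result]:
    """Parse one result line; return None for header and report lines."""
    m = RESULT_RE.match(line)
    if m:
        return Result(
            sentence=m["sentence"],
            success=m["flag"] == "+++",
            filter_time=float(m["f_time"]),
            parse_time=float(m["p_time"]),
            count=int(m["count"]),
        )
    m = FAILED_RE.match(line)
    if m:
        return Result(sentence=m["sentence"], success=False, reason=m["reason"])
    return None


def summarize(results: Iterable[Result]) -> dict:
    """Recompute the counters and time sums of the '===== Report' block."""
    results = list(results)
    success = sum(r.success for r in results)
    return {
        "parsed": len(results),
        "success": success,
        "failure": len(results) - success,
        "success_pct": round(100 * success / len(results)) if results else 0,
        "filter_time": round(sum(r.filter_time for r in results), 2),
        "parse_time": round(sum(r.parse_time for r in results), 2),
    }
```

Applied to the listing above, summarize gives 5 sentences, 3 successes (60%), 1.27s of filtering and 7.50s of parsing, matching the report. The total time (9.50s) is not recomputed: it is larger than the sum of the two, and this page does not say what else it includes. On the second run below the sums do not match exactly (0.01s summed against a reported 0.02s of filter time), so treat the recomputed times as approximate.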

Parsing into dependency structures

leopar -corpus sentences -dep complete -timeout 5

produces:

COMMAND : leopar -corpus sentences -dep complete -timeout 5
HOSTNAME: Brunos-MacBook-Pro.local
Léo part.	+++	[Inc[11]] "Léo part.", 14 ---[F: 0.00s]---> 1 ---[P: 0.00s]---> 1   dep:1
Il lui donne une lettre.	+++	[Inc[11]] "Il lui donne une lettre.", 3480 ---[F: 0.01s]---> 1 ---[P: 0.00s]---> 3   dep:1
Pas de solution sans point	---	[Inc[11]] "Pas de solution sans point", 9135 ---[F: 0.00s]---> 0 ---[P: 0.00s]---> 0   dep:0
not in the lexicon  Failed: tokenize [not; the; lexicon]
Si la phrase est trop longue, il faut du temps pour que le système l'analyse.  Failed: timeout
===== Report (parsing of corpus file 'sentences' [Inc[11]]) =====
 5 sentences parsed
 2 success (40%)
 3 failure (60%)
 Total time: 5.81s
 Filter time: 0.02s
 Parsing time: 0.00s
=============================================
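Comparing the two runs, only the last sentence changes outcome: it succeeded with -timeout 10 after about 8.75s of filtering and parsing (1.25s + 7.50s), and it fails with timeout under -timeout 5. This suggests that the timeout value is expressed in seconds, although the page does not say so explicitly. Since -dep also differs between the two commands, such a comparison only shows what changed, not why; here the "Failed: timeout" reason points at the timeout. The minimal Python sketch below spots such changes between two corpus runs, assuming each run has already been reduced to a mapping from sentence to outcome string. The function name diff_outcomes and the outcome strings are illustrative choices, not Leopar output fields.

```python
from typing import Dict, List, Tuple

LONG = "Si la phrase est trop longue, il faut du temps pour que le système l'analyse."

# Outcome of each sentence in the two runs shown above ("+++", "---",
# or the text of a "Failed:" line).
TREE_RUN = {
    "Léo part.": "+++",
    "Il lui donne une lettre.": "+++",
    "Pas de solution sans point": "---",
    "not in the lexicon": "Failed: tokenize [not; the; lexicon]",
    LONG: "+++",
}
DEP_RUN = {**TREE_RUN, LONG: "Failed: timeout"}


def diff_outcomes(before: Dict[str, str],
                  after: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (sentence, outcome before, outcome after) for every sentence whose
    outcome differs; a sentence present in only one run shows as 'absent'."""
    changes = []
    for sentence in dict.fromkeys([*before, *after]):  # keeps first-seen order
        old = before.get(sentence, "absent")
        new = after.get(sentence, "absent")
        if old != new:
            changes.append((sentence, old, new))
    return changes


for sentence, old, new in diff_outcomes(TREE_RUN, DEP_RUN):
    print(f"{old} -> {new}: {sentence}")
```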