CoNLL syntax for input graphs

Graphs used in Grew can be described in the CoNLL syntax. This is the syntax used in the dependency version of the Sequoia Treebank and of the French Treebank.

The input file must use the .conll file extension.

Nodes

Each node is built from the following fields of a CoNLL line and receives:

  • a feature position = n where n is the value of field 1:ID
  • a feature phon = v where v is the value of the field 2:FORM
  • a feature lemma = v where v is the value of the field 3:LEMMA
  • a feature cat = v where v is the value of the field 4:CPOSTAG
  • a feature pos = v where v is the value of the field 5:POSTAG
  • a feature f = v for each item of the form f=v in the field 6:FEATS
  • a feature f = true for each item f given without a value (no = sign) in the field 6:FEATS

For example, in the French Treebank, the following line

6       intervenants    intervenant     N       NC      g=m|n=p|s=c     7       suj     _       _

gives the node below:

Graph

In the example page of the CoNLL-X Shared Task description, the following line

1   Dat               dat               Pron  Pron  aanw|neut|attr                   2   det     _  _

gives the node below:

Graph
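The field-to-feature mapping listed above can be sketched in Python. This is only an illustration of the rules on this page, not Grew's own code: the function name conll_line_to_node is hypothetical, fields are assumed to be separated by whitespace, and a FEATS value of _ is assumed to mean "no features".

```python
def conll_line_to_node(line):
    """Turn one CoNLL line into a dict of node features (fields 1 to 6 only)."""
    fields = line.split()  # assumption: no whitespace inside a field
    node = {
        "position": fields[0],  # 1:ID
        "phon": fields[1],      # 2:FORM
        "lemma": fields[2],     # 3:LEMMA
        "cat": fields[3],       # 4:CPOSTAG
        "pos": fields[4],       # 5:POSTAG
    }
    feats = fields[5]           # 6:FEATS
    if feats != "_":            # assumption: "_" stands for an empty FEATS field
        for item in feats.split("|"):
            if "=" in item:
                name, value = item.split("=", 1)
                node[name] = value      # item of the form f=v
            else:
                node[item] = "true"     # bare item f becomes f = true
    return node


# The two example lines from this page.
print(conll_line_to_node("6 intervenants intervenant N NC g=m|n=p|s=c 7 suj _ _"))
print(conll_line_to_node("1 Dat dat Pron Pron aanw|neut|attr 2 det _ _"))
```

The first call shows the f=v case (g, n and s), the second the bare-item case (aanw, neut and attr all set to true).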

Edges

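Edges are read directly from the dependency fields: each line defines one edge from the node whose ID is given in the field 7:HEAD to the node described by that line, labelled with the value of the field 8:DEPREL.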

Example from the Sequoia Treebank

1	Depuis	depuis	P	P	_	8	mod	8	mod
2	quarante-huit	quarante-huit	D	DET	s=card	3	det	3	det
3	heures	heure	N	NC	g=f|n=p|s=c	1	obj	1	obj
4	,	,	PONCT	PONCT	s=w	8	ponct	8	ponct
5	le	le	D	DET	g=m|n=s|s=def	6	det	6	det
6	redoux	redoux	N	NC	g=m|s=c	8	suj	8	suj
7	a	avoir	V	V	m=ind|n=s|p=3|t=pst	8	aux_tps	8	aux_tps
8	fait	faire	V	VPP	g=m|m=part|n=s|t=past	0	root	0	root
9	son	son	D	DET	n=s|s=poss	10	det	10	det
10	apparition	apparition	N	NC	g=f|n=s|s=c	8	obj	8	obj
11	.	.	PONCT	PONCT	s=s	8	ponct	8	ponct

Dep2pict

Conll
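As a rough illustration of how edges can be read from such a file, the Python sketch below collects one (head, label, dependent) triple per line. It rests on assumptions that this page does not spell out: the edge goes from the node named in field 7:HEAD to the node of the current line, the label is field 8:DEPREL, and a HEAD of 0 marks the root and produces no edge. The function name conll_to_edges is hypothetical.

```python
def conll_to_edges(lines):
    """Collect (head_id, label, dependent_id) triples from CoNLL lines."""
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or incomplete lines
        dependent, head, label = fields[0], fields[6], fields[7]
        if head != "0":  # assumption: HEAD 0 marks the root, no edge is built
            edges.append((head, label, dependent))
    return edges


# Three lines taken from the Sequoia example above.
sample = [
    "5\tle\tle\tD\tDET\tg=m|n=s|s=def\t6\tdet\t6\tdet",
    "6\tredoux\tredoux\tN\tNC\tg=m|s=c\t8\tsuj\t8\tsuj",
    "8\tfait\tfaire\tV\tVPP\tg=m|m=part|n=s|t=past\t0\troot\t0\troot",
]
for head, label, dependent in conll_to_edges(sample):
    print(head, "-[" + label + "]->", dependent)
```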

The domain definition for version 6.0 of the Sequoia corpus is given below.

features {
  lemma: *;
  phon: *;
  sentid: *;
 
  n: s,p;
  g: f,m;
  p: "1","2","3";
 
  m: imp, ind, inf, subj, part;
  s: refl, p, c, w, ind, def, s, qual, card, suj, neg, dem, int, poss, rel, ord, obj, pers, part;
  t: pst, impft, past, fut, cond;
 
  cat: V, N, C, CL, P, I, PONCT, A, ADV, PRO, D, ET, "P+D", "P+PRO", PREF;
 
  pos:
        ADJ, ADJWH,
        ADV, ADVWH,
        CC, CS,
        CLO, CLR, CLS,
        DET, DETWH,
        ET,
        I,
        NC, NPP,
        P,
        "P+D",
        "P+PRO",
        PONCT,
        PREF,
        PRO, PROREL, PROWH,
        V, VIMP, VINF, VPP, VPR, VS;
 
  fctpath: *;
}
 
labels {
  root,
  suj, obj, de_obj, a_obj, p_obj.o, p_obj.agt, ats, ato,
  aux.tps, aux.pass, aux.caus, aff, aff.demsuj,
  mod, mod.rel, mod.app, mod.inc, mod.cleft, mod.voc, dis,
  coord, arg, dep.coord, det, ponct, dep, obj.p, obj.cpl
}
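One way to read the features block is as a table of allowed values per feature name. The Python sketch below checks a node's features against a partial copy of that table. It is a hypothetical helper, not part of Grew, and it assumes that * means "any value is accepted"; only some of the declared features are copied here.

```python
# Partial copy of the feature declarations above.
# Assumption: "*" means that any value is accepted for the feature.
DOMAIN = {
    "lemma": "*",
    "phon": "*",
    "n": {"s", "p"},
    "g": {"f", "m"},
    "p": {"1", "2", "3"},
    "m": {"imp", "ind", "inf", "subj", "part"},
    "t": {"pst", "impft", "past", "fut", "cond"},
}


def undeclared_features(node, domain=DOMAIN):
    """Return the (name, value) pairs of node that the domain does not allow."""
    problems = []
    for name, value in node.items():
        allowed = domain.get(name)
        if allowed is None:
            problems.append((name, value))  # feature name not declared
        elif allowed != "*" and value not in allowed:
            problems.append((name, value))  # value outside the declared list
    return problems


# Features of token 7 ("a") in the Sequoia example, restricted to DOMAIN's keys.
print(undeclared_features(
    {"phon": "a", "lemma": "avoir", "m": "ind", "n": "s", "p": "3", "t": "pst"}
))
```

An empty list means that every feature name and value is covered by the table; extending DOMAIN with the cat, pos and s lists follows the same pattern.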