Sunday, 30 August 2015

Dependency Parsing in Stanford CoreNLP


If you are working on Natural language Processing, this post will be useful for triplet Extraction from the documents.
Here we assume, you have basic knowledge about Part-of-Speech tagging, tokens etc. concepts.  Let’s discuss about Dependency Parsing first.

Stanford Dependency Parsing:
Stanford dependencies provide a representation of grammatical relations between words in a sentence. These dependencies are triplets : Name of the relation, governor and dependent.
Here is an example sentence :
Bell,based in Los Angeles, makes and distributes electronic, computer and building products.

We can see that  “the subject for verb ‘distributes’ is Bell.”  For the above sentence, Stanford dependencies(SD) representation is :

     nsubj(makes-8, Bell-1)
     nsubj(distributes-10, Bell-1)
     vmod(Bell-1, based-3)
     nn(Angeles-6, Los-5)
     prep_in(based-3, Angeles-6)
     root(ROOT-0, makes-8)
     conj_and(makes-8, distributes-10)
     amod(products-16, electronic-11)
     conj_and(electronic-11, computer-13)
     amod(products-16, computer-13)
     conj_and(electronic-11, building-15)
     amod(products-16, building-15)
     dobj(makes-8, products-16)





In above representation, first term is dependency tag, which represents the relation between governor(2nd term) and dependent(3rd term) .
There are various dependency tags, which are listed in the Stanford Dependency manual.

Following are two type of dependencies :
  •  Basic/Non Collapased: This representation gives the basic dependencies as well as the extra ones (which break the tree structure), without any collapsing or propagation of conjuncts. Eg.
                prep(based-7, in-8)
                pobj(in-8, LA-9) 
  •  Collapased : In the collapsed representation, dependencies involving prepositions, conjuncts, as well as information about the referent of relative clauses are collapsed to get direct dependencies between content words. For instance, the dependencies involving the preposition “in” in the above example will be collapsed into one single relation:
               prep(based-7, in-8)
               pobj(in-8, LA-9) 
         will become :  prep_in(based-7, LA-9)

Now we’ll see, how can we get these using JAVA Code.

import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

class ParserDemo {
                public static void main(String[] args) {
                                LexicalizedParser lp = LexicalizedParser
                                                                .loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
                                lp.setOptionFlags(new String[] { "-maxLength", "80",
                                                                "-retainTmpSubcategories" });
                                String[] sent = { "This", "is", "an", "easy", "sentence", "." };
                                List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
                                Tree parse = lp.apply(rawWords);
                                parse.pennPrint();
                                System.out.println();
                               
                                TreebankLanguagePack tlp = new PennTreebankLanguagePack();
                                GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
                                GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
                                List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
                                System.out.println(tdl);
                                TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
                                tp.printTree(parse);
                }
}


 Now you can easily extract the triplets from document. You can find the example code in github repo.


19 comments:

  1. Hello Every one,
    I am new to dependency parsing. Anyone help me regarding the following in MALT or MST parser
    1. what are the features for training.
    2. How to train sentences using features
    Please anyone explain the above two cases with examples.

    ReplyDelete
  2. In MALT or MST Parser how to get the score of each tree

    ReplyDelete
  3. • Nice and good article. It is very useful for me to learn and understand easily. Thanks for sharing your valuable information and time. Please keep updating IOT Online Training

    ReplyDelete
  4. Artificial Intelligence Companies in bangalore is the field of computer science dedicated to solving cognitive problems commonly associated with human intelligence, such as learning, problem solving, and pattern recognition.

    artificial intelligence companies in indiaMany business certainly expect AI to be disruptive Technology in the coming days or Testing the Waters currently, but why wait when the technology is already transforming every aspect of the way an organization operates?

    ReplyDelete