A set of Python tools to analyse and classify text by topic and entity, using natural language processing features.
Open source projects generate several kinds of valuable documents which, by their nature, are not easily accessible to project members as structured sources of information. Email messages, IRC or instant messaging conversations, forum posts, wiki pages, and comments on software bugs are examples of these documents. All of them contain unstructured natural language and are stored as free-running text.
The final goal is to use as much linguistic information as possible to annotate instances of concepts and entities within free-running text. This takes advantage of the fact that most information generated during the development of a FLOSS project belongs to a very specific sub-language, that is, a conceptual, lexical, and collocational subset of natural language, which can be formally described by means of ontologies, dictionaries, gazetteers, custom word lists, and other finite lexical and semantic resources.
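The idea of annotating free-running text from a finite lexical resource can be illustrated with a minimal sketch. It is not this project's API: the `GAZETTEER` contents, the labels, and the `build_pattern` and `annotate` functions are hypothetical names chosen for the example, and it uses only the Python standard library. It performs a plain case-insensitive gazetteer lookup, which is the simplest of the resources listed above.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lower-cased surface form -> entity label.
# A real resource would be loaded from a dictionary, ontology or word list.
GAZETTEER: Dict[str, str] = {
    "segmentation fault": "ERROR",
    "pull request": "WORKFLOW",
    "gcc": "TOOL",
    "bugzilla": "TOOL",
}

Annotation = Tuple[int, int, str, str]  # (start, end, surface form, label)


def build_pattern(terms) -> "re.Pattern[str]":
    """Compile one case-insensitive pattern matching any gazetteer term."""
    # Longest terms first, so multi-word entries win over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def annotate(text: str, gazetteer: Dict[str, str] = GAZETTEER) -> List[Annotation]:
    """Return character offsets and labels for every gazetteer hit in text."""
    pattern = build_pattern(gazetteer)
    return [
        (m.start(), m.end(), m.group(0), gazetteer[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    message = "GCC crashed with a segmentation fault; see the Bugzilla entry."
    for start, end, surface, label in annotate(message):
        print(f"{start:>3}-{end:<3} {label:<8} {surface}")
```

Returning character offsets rather than rewriting the text keeps the original message intact, so annotations from several resources can be layered over the same document.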