lingcorpora: documentation
This package provides an API for more than 20 online text corpora. A comprehensive list of supported corpora is available below in the Contents section.
An R version of this package, by George Moroz, is located here.
About
The project is maintained by Ekaterina Gerasimenko and Artyom Kopetskiy. At different stages, the following people have contributed to the project: Alexey Koshevoy, Anna Klezovich, Anna Zueva, Diana Malyshok, George Moroz, Maria Terekhina, Mark Sobolev, Michael Voronov and Ustinya Kosheleva.
Contents
- API
- Corpus List
- Adyghe Corpus
- Albanian Corpus
- Almaty Corpus of the Kazakh Language
- Bamana Corpus
- Buryat Corpus
- Chinese Corpus
- Danish Corpus
- Eastern Armenian Corpus
- Estonian Corpus
- German Corpus
- Georgian Monolingual Corpus
- Hindi Corpus
- JuKuu: Chinese-English Subcorpus
- Kalmyk Corpus
- Maninka Corpus
- Mongolian Corpus
- Modern Greek Corpus
- Modern Yiddish Corpus
- National Corpus of Russian
- National Corpus of Russian: Parallel Subcorpus
- Polish-Russian Parallel Corpus
- Tatar Corpus
- Udmurt Corpus
Contributing
Reporting a bug & requesting functionality
You can report a bug, ask a question, or suggest new features via Issues in the repository.
Making new corpora and proposing changes
You are welcome to suggest improvements to the source code and to add APIs for more corpora. Please propose changes via pull requests. Your code should fit into the overall structure, which is described in the following guide: