- `Corpus` object (`search` method).
- `PageParser` class (see below), which has the method `.extract()`.
- `PageParser.extract()` is a generator (see `yield` in Python) of `Target` objects (individual hits).
- `PageParser` inherits from `Container`, a class in `params_container.py` that contains all possible parameters for corpora.
- `Target` objects are collected in `search` (in the `Corpus` object) into the `Result` object.
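The flow described above (a `search` method draining the `extract()` generator into a result object) can be sketched roughly as follows. Every class here is a simplified stand-in written for illustration, not the library's actual `Corpus`, `PageParser`, `Target`, or `Result`; the in-memory sentence list and the `add` method are assumptions as well.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """Stand-in for the library's Target: one individual hit."""
    text: str
    idxs: tuple
    meta: str = ''
    tags: dict = field(default_factory=dict)


class Result:
    """Stand-in result object that accumulates hits (hypothetical API)."""
    def __init__(self):
        self.results = []

    def add(self, target):
        self.results.append(target)


class PageParser:
    """Stand-in parser; a real one would inherit from Container."""
    def __init__(self, query):
        self.query = query

    def extract(self):
        # Hypothetical in-memory "corpus" instead of a real request.
        for sentence in ('a cat sat', 'no match here', 'cat food'):
            start = sentence.find(self.query)
            if start != -1:
                # idxs is (l, r) such that sentence[l:r] is the hit
                yield Target(sentence, (start, start + len(self.query)))


class Corpus:
    def search(self, query):
        # Collect every Target produced by the generator into one Result.
        result = Result()
        for target in PageParser(query).extract():
            result.add(target)
        return result


hits = Corpus().search('cat').results
for hit in hits:
    l, r = hit.idxs
    print(hit.text, hit.idxs, hit.text[l:r])
```

Because `extract()` is a generator, `search` can consume hits one at a time without the parser building the full list first.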
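A corpus module must meet the following requirements:

- `PageParser` object:
  - inherits from `Container`, and the `Container` constructor is called in `__init__` (see example below);
  - has the method `extract()`, which `yield`s `Target` objects;
  - inner auxiliary attributes of `PageParser` should be encapsulated (add two underscores `__` to their names).
- Pass to the `Target` object the following information:
  - `text`: the sentence containing the target (string);
  - `idxs`: indexes of the target in the sentence, `l` and `r` such that `target == text[l:r]` (tuple);
  - `meta`: metadata (document name, author, year, etc.) (string). If there is no meta, pass an empty string;
  - `tags`: dict. If there are no tags, pass an empty dict;
  - `transl`: for parallel corpora, the translation from `queryLanguage` to another language;
  - `lang`: for parallel corpora, the other language (not `queryLanguage`) in the example pair.
- Define `__doc__` and the author `__author__` before `PageParser`.
- Put the module in the `corpora` directory; `langcode` in the module name stands for the ISO 639-3 code.
- Define a `<dict>` named `TEST_DATA` (see template below for details).
- If the corpus needs a new parameter, add it to `params_container.py`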
  and add this parameter to the arguments (do not forget the default value) and attributes.

Template:

```python
from params_container import Container
from target import Target

__author__ = ''
__doc__ = \
"""
"""

# <dict> of querying data passed to `Corpus.search` as kwargs while testing;
# keys and types are to be preserved
TEST_DATA = {'test_single_query': {'query': <str>, ...},  # {arg: value, ...}
             'test_multi_query': {'query': [<str 1>, <str 2>, ... <str N>], ...}  # {arg: value, ...}
            }


class PageParser(Container):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # inner auxiliary attributes:
        # self.__page = None
        # self.__pagenum = 0
        # ...

    def any_method_for_getting_the_results(self):
        pass

    # ...

    def any_method_for_getting_the_results_10(self):
        pass

    def extract(self):
        """
        --- Generator of found occurrences as `Target` types
        Corpus.search() uses this method ---
        """
        # ...
        # for each occurrence found we pass a `Target` object,
        # describing the occurrence, to Corpus.search();
        # for parallel corpora also pass transl and lang
        # (`found` stands for the hits gathered by the methods above)
        for text, idxs, meta, tags in found:
            yield Target(text, idxs, meta, tags)
```
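As an illustration of how the template might be filled in, here is a toy module that runs without the library. `Container` and `Target` below are stand-ins defined inline (their real signatures in `params_container.py` and `target.py` may differ), and the sentence list, author name, and the `__find` helper are hypothetical. Only the single-query case from `TEST_DATA` is exercised.

```python
import re


class Container:
    """Stand-in for params_container.Container (assumed to store parameters)."""
    def __init__(self, query, **kwargs):
        self.query = query


class Target:
    """Stand-in for target.Target with the four fields used in the template."""
    def __init__(self, text, idxs, meta, tags):
        self.text, self.idxs, self.meta, self.tags = text, idxs, meta, tags


__author__ = 'example-author'  # hypothetical value
__doc__ = """Toy corpus used only to illustrate the template."""

# Keys follow the template; the values are made-up test queries.
TEST_DATA = {'test_single_query': {'query': 'cat'},
             'test_multi_query': {'query': ['cat', 'dog']}}


class PageParser(Container):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Two leading underscores keep the attribute internal (name mangling).
        self.__sentences = [('The cat sat.', 'doc1'), ('A dog barked.', 'doc2')]

    def __find(self):
        # Hypothetical helper: locate the query in each stored sentence.
        pattern = re.compile(re.escape(self.query))
        for text, meta in self.__sentences:
            for match in pattern.finditer(text):
                # match.span() is (l, r) with text[l:r] equal to the hit
                yield text, match.span(), meta, {}

    def extract(self):
        """Generator of found occurrences as `Target` objects."""
        for text, idxs, meta, tags in self.__find():
            yield Target(text, idxs, meta, tags)


targets = list(PageParser(**TEST_DATA['test_single_query']).extract())
for t in targets:
    l, r = t.idxs
    print(repr(t.text[l:r]), t.idxs, t.meta, t.tags)
```

Passing an empty dict for `tags` and a plain string for `meta` follows the requirements listed earlier; a real parser would fill them from the corpus response.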