basic_search
Utilities to support HED searches based on strings.
Functions
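| check_parentheses(text) | Check for balanced parentheses in the given text and return the unbalanced ones. |
| construct_delimiter_map(text, words) | Based on an input search query and list of words, return the parenthetical delimiters between them. |
| find_matching(series, search_string, regex=False) | Find lines in the series that match the search string and return a mask. |
| find_words(search_string) | Extract words in the search string based on their prefixes. |
| reverse_and_flip_parentheses(s) | Reverse a string and flip the parentheses. |
| verify_search_delimiters(text, specific_words, delimiter_map) | Verify that the text contains specific words with expected delimiters between them. |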
- check_parentheses(text)
Check for balanced parentheses in the given text and return the unbalanced ones.
- Parameters:
text (str) – The text to be checked for balanced parentheses.
- Returns:
A string containing the unbalanced parentheses in their original order.
- Return type:
str
Notes
The function only considers the characters ‘(’ and ‘)’ for balancing.
Balanced pairs of parentheses are removed, leaving behind only the unbalanced ones.
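The removal of balanced pairs can be pictured with a stack. The following is a minimal sketch of the documented behavior, not the hedtools source; the name check_parentheses_sketch is hypothetical. Every ‘(’ is pushed, every ‘)’ pops a waiting ‘(’ if there is one, and whatever is left over is emitted in its original order.

```python
def check_parentheses_sketch(text: str) -> str:
    """Hypothetical re-implementation: return the unbalanced '(' and ')' of text in order."""
    open_stack = []  # positions of '(' not yet closed
    unmatched = []   # positions of ')' that had no '(' to close
    for i, ch in enumerate(text):
        if ch == "(":
            open_stack.append(i)
        elif ch == ")":
            if open_stack:
                open_stack.pop()  # a balanced pair: drop both characters
            else:
                unmatched.append(i)
    # Merge the leftovers back into their original order.
    return "".join(text[i] for i in sorted(open_stack + unmatched))


print(check_parentheses_sketch("(a))("))  # one stray ')' followed by one stray '('
```

For "(a))(" the first pair balances, leaving the second ‘)’ and the final ‘(’, so the result is ")(".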
- construct_delimiter_map(text, words)
Based on an input search query and list of words, return the parenthetical delimiters between them.
- Parameters:
text (str) – The search query.
words (list) – The words from the query whose pairwise delimiters should be mapped.
- Returns:
The two-way delimiter map, giving the parenthetical delimiters between each pair of words in both directions.
- Return type:
dict
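To make “parenthetical delimiters between words” concrete, here is a sketch of one way such a two-way map could be built. It is not the hedtools implementation: the function names are hypothetical, each word is assumed to appear once (only its first occurrence is used), and a word is assumed to be bounded by a space, comma, parenthesis, or the string edge. For each ordered pair, the text between the two words is reduced to its unbalanced parentheses, and the reverse pair gets the reversed, flipped sequence.

```python
import re


def unbalanced(text):
    """Return the unmatched '(' and ')' characters of text, in original order."""
    opens, closes = [], []
    for i, ch in enumerate(text):
        if ch == "(":
            opens.append(i)
        elif ch == ")":
            if opens:
                opens.pop()
            else:
                closes.append(i)
    return "".join(text[i] for i in sorted(opens + closes))


def reverse_and_flip(s):
    """Reverse s and swap '(' with ')'."""
    return s[::-1].translate(str.maketrans("()", ")("))


def delimiter_map_sketch(text, words):
    """Hypothetical: map each ordered pair of words to the unbalanced parentheses between them."""
    spans = {}
    for word in words:
        # Assumption: only the first separator-bounded occurrence of a word is used.
        m = re.search(r"(?:^|[ ,()])(" + re.escape(word) + r")(?=[ ,()]|$)", text)
        if m:
            spans[word] = m.span(1)
    ordered = sorted(spans, key=lambda w: spans[w][0])
    result = {}
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            delims = unbalanced(text[spans[first][1]:spans[second][0]])
            result[(first, second)] = delims
            # Reading the same stretch right to left flips the nesting direction.
            result[(second, first)] = reverse_and_flip(delims)
    return result


print(delimiter_map_sketch("(A, (B)), C", ["A", "B", "C"]))
```

Here ("A", "B") maps to "(" because B sits one group deeper than A, and ("B", "C") maps to "))" because two groups close between B and C. This sketch keeps empty entries (words at the same relative level); whether the library does the same is not stated above.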
- find_matching(series, search_string, regex=False)
Find lines in the series that match the search string and return a mask.
- Syntax Rules:
‘@’: Prefixing a term in the search string means the term must appear anywhere within a line.
‘~’: Prefixing a term in the search string means the term must NOT appear within a line.
- Parentheses: Elements within parentheses must appear in the line with the same level of nesting.
- e.g.: The search string “(A), (B)” will match “(A), (B, C)” but not “(A, B)”, because A and B are in separate groups in the search string but share a group in “(A, B)”.
“LongFormTag*”: A * matches the rest of a word (any characters except a comma or parenthesis).
An individual term can be an arbitrary regular expression, but it is limited to a single continuous word.
Notes
- Specific (unprefixed) words are matched only by their nesting level relative to other specific words, not by their absolute level.
e.g. “(A, B)” will find “A, B”, “(A, B)”, “(A, (C), B)”, or “((A, B))”.
If the search string contains no parentheses and no ‘@’-prefixed words, all terms are treated as anywhere words.
The format of the series should match the format of the search string, whether it’s in short or long form.
To enable support for matching parent tags, ensure that both the series and search string are in long form.
- Parameters:
series (pd.Series) – A Pandas Series object containing the lines to be searched.
search_string (str) – The string to search for in each line of the series.
regex (bool) – If False (the default), translate any * wildcard characters to the regex .*?. If True, do no translation and pass the words through as-is. Because of how the search string is parsed, the words must not contain the characters (, ), or ,.
- Returns:
- A Boolean mask Series of the same length as the input series.
The mask has True for lines that match the search string and False otherwise.
- Return type:
pd.Series
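The ‘@’ and ‘~’ rules can be illustrated without grouping. The following is a minimal pure-Python sketch of the flat case only (no parentheses), not the hedtools implementation: it takes a plain list of strings instead of a pd.Series and returns a list of booleans, and the function names are hypothetical. Following the notes above, unprefixed terms are treated as anywhere words; the sketch applies that even when ‘@’ words are present, which is a simplification. Its * handling follows the syntax-rules description (the rest of a word); whether the library instead uses the .*? translation mentioned under regex is left open here.

```python
import re


def term_in_line(line, term):
    """True if term occurs in line bounded by ' ', ',', '(', ')' or a string edge."""
    # Assumption for this sketch: '*' matches the rest of a word, i.e. any run
    # of characters other than commas and parentheses.
    pattern = re.escape(term).replace(r"\*", r"[^,()]*")
    return re.search(r"(?:^|[ ,()])" + pattern + r"(?=[ ,()]|$)", line) is not None


def find_matching_flat_sketch(lines, search_string):
    """Hypothetical: one boolean per line for a search string without parentheses."""
    if "(" in search_string or ")" in search_string:
        raise ValueError("this sketch does not handle grouped searches")
    terms = [t.strip() for t in search_string.split(",") if t.strip()]
    # '~' terms must be absent; '@' and unprefixed terms must be present somewhere.
    forbidden = [t[1:] for t in terms if t.startswith("~")]
    required = [t[1:] if t.startswith("@") else t for t in terms if not t.startswith("~")]
    return [
        all(term_in_line(line, t) for t in required)
        and not any(term_in_line(line, t) for t in forbidden)
        for line in lines
    ]


lines = ["Event, Def/Cue", "Event, Item", "Item, Def/Other"]
print(find_matching_flat_sketch(lines, "@Event, ~Item"))
```

With pandas, the same per-line results could be wrapped in a Boolean Series and used to filter rows.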
- find_words(search_string)
Extract words in the search string based on their prefixes.
- Parameters:
search_string (str) – The search query string to parse. Words can be prefixed with ‘@’ or ‘~’.
- Returns:
- A list containing three lists:
Words prefixed with ‘@’
Words prefixed with ‘~’
Words with no prefix
- Return type:
list
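A sketch of the prefix classification, returning the three lists in the documented order (‘@’ words, ‘~’ words, unprefixed words). It is not the hedtools source: the name find_words_sketch is hypothetical, and it assumes words are separated by commas and parentheses, that surrounding whitespace is stripped, and that the ‘@’ or ‘~’ prefix is removed from the returned words.

```python
import re


def find_words_sketch(search_string):
    """Hypothetical: split a search string into [anywhere, negative, specific] word lists."""
    anywhere, negative, specific = [], [], []
    # Assumption: commas and parentheses separate words; spaces around them are dropped.
    for token in re.split(r"[,()]", search_string):
        word = token.strip()
        if not word:
            continue
        if word.startswith("@"):
            anywhere.append(word[1:])
        elif word.startswith("~"):
            negative.append(word[1:])
        else:
            specific.append(word)
    return [anywhere, negative, specific]


print(find_words_sketch("@Event, ~Def/X, (Item, Square)"))
```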
- reverse_and_flip_parentheses(s)
Reverse a string and flip the parentheses.
- Parameters:
s (str) – The string to be reversed and have its parentheses flipped.
- Returns:
The reversed string with flipped parentheses.
- Return type:
str
Notes
The function takes into account only the ‘(’ and ‘)’ characters for flipping.
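A one-line sketch of the documented behavior (the name reverse_and_flip_sketch is hypothetical). Flipping after reversing keeps a delimiter sequence meaningful when read in the opposite direction: a group that opened going left to right must close going right to left.

```python
def reverse_and_flip_sketch(s: str) -> str:
    """Hypothetical: reverse s, then swap every '(' with ')' and vice versa."""
    return s[::-1].translate(str.maketrans("()", ")("))


print(reverse_and_flip_sketch("(a))"))
```

For "(a))" the reversed string is "))a(", and flipping it gives "((a)".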
- verify_search_delimiters(text, specific_words, delimiter_map)
Verify that the text contains specific words with expected delimiters between them.
- Parameters:
text (str) – The text to search in.
specific_words (list of str) – Words that must appear in the text, positioned relative to one another as described by delimiter_map.
delimiter_map (dict) – A dictionary specifying expected delimiters between pairs of specific words.
- Returns:
True if every specific word is found and the delimiters between them match delimiter_map, otherwise False.
- Return type:
bool
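A simplified sketch of the check, not the hedtools implementation: the function names are hypothetical, only the first separator-bounded occurrence of each word is considered, and pairs missing from delimiter_map are treated as unconstrained. Because the map is two-way, the sketch looks up each pair in the order the words actually appear in the text, so “(B), (A)” satisfies the same constraint as “(A), (B)”.

```python
import re


def unbalanced(text):
    """Return the unmatched '(' and ')' characters of text, in original order."""
    opens, closes = [], []
    for i, ch in enumerate(text):
        if ch == "(":
            opens.append(i)
        elif ch == ")":
            if opens:
                opens.pop()
            else:
                closes.append(i)
    return "".join(text[i] for i in sorted(opens + closes))


def verify_search_delimiters_sketch(text, specific_words, delimiter_map):
    """Hypothetical: check that words exist and that the delimiters between them match."""
    spans = {}
    for word in specific_words:
        m = re.search(r"(?:^|[ ,()])(" + re.escape(word) + r")(?=[ ,()]|$)", text)
        if m is None:
            return False  # a required word is missing
        spans[word] = m.span(1)
    ordered = sorted(spans, key=lambda w: spans[w][0])
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            expected = delimiter_map.get((first, second))
            if expected is None:
                continue  # assumption: no entry means no constraint
            if unbalanced(text[spans[first][1]:spans[second][0]]) != expected:
                return False
    return True


# Map for the search "(A), (B)": a group closes and a new one opens between A and B.
search_map = {("A", "B"): ")(", ("B", "A"): ")("}
print(verify_search_delimiters_sketch("(A), (B, C)", ["A", "B"], search_map))
```

This reproduces the find_matching example above: “(A), (B, C)” passes, while “(A, B)” fails because nothing separates A and B.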