BaseInput

class BaseInput(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, name=None, allow_blank_names=True): Superclass representing a basic columnar file.

Methods

`BaseInput.__init__`(file[, file_type, ...])	Constructor for the BaseInput class.
`BaseInput.assemble`([mapper, skip_curly_braces])	Assembles the HED strings.
`BaseInput.column_metadata`()	Return the metadata for each column.
`BaseInput.combine_dataframe`(dataframe)	Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.
`BaseInput.convert_to_form`(hed_schema, tag_form)	Convert all tags in underlying dataframe to the specified form.
`BaseInput.convert_to_long`(hed_schema)	Convert all tags in underlying dataframe to long form.
`BaseInput.convert_to_short`(hed_schema)	Convert all tags in underlying dataframe to short form.
`BaseInput.expand_defs`(hed_schema, def_dict)	Expands any def tags found in the underlying dataframe.
`BaseInput.get_column_refs`()	Return a list of column refs for this file.
`BaseInput.get_def_dict`(hed_schema[, ...])	Return the definition dict for this file.
`BaseInput.get_worksheet`([worksheet_name])	Get the requested worksheet.
`BaseInput.reset_mapper`(new_mapper)	Set mapper to a different view of the file.
`BaseInput.set_cell`(row_number, ...[, tag_form])	Replace the specified cell with transformed text.
`BaseInput.shrink_defs`(hed_schema)	Shrinks any def-expand found in the underlying dataframe.
`BaseInput.to_csv`([file])	Write to file or return as a string.
`BaseInput.to_excel`(file)	Output to an Excel file.
`BaseInput.validate`(hed_schema[, ...])	Creates a SpreadsheetValidator and returns all issues with this file.

Attributes

`BaseInput.EXCEL_EXTENSION`
`BaseInput.TEXT_EXTENSION`
`BaseInput.columns`	Returns a list of the column names.
`BaseInput.dataframe`	The underlying dataframe.
`BaseInput.dataframe_a`	Return the assembled dataframe. Probably a placeholder name.
`BaseInput.has_column_names`	True if dataframe has column names.
`BaseInput.loaded_workbook`	The underlying loaded workbooks.
`BaseInput.name`	Name of the data.
`BaseInput.needs_sorting`	Return True if this has an onset column and that column needs sorting.
`BaseInput.onsets`	Return the onset column if it exists.
`BaseInput.series_a`	Return the assembled dataframe as a series.
`BaseInput.series_filtered`	Return the assembled dataframe as a series, with rows that have the same onset combined.
`BaseInput.worksheet_name`	The worksheet name.

BaseInput.__init__(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, name=None, allow_blank_names=True)

Constructor for the BaseInput class.

Parameters:

file (str or file-like or pd.DataFrame) – An xlsx/tsv file to open.
file_type (str or None) – “.xlsx” (Excel), “.tsv” or “.txt” (tab-separated text). Derived from file if file is a filename. Ignored if file is a pandas dataframe.
worksheet_name (str or None) – Name of the Excel worksheet to use. (Not applicable to tsv files.)
has_column_names (bool) – True if file has column names. This value is ignored if you pass in a pandas dataframe.
mapper (ColumnMapper or None) – Indicates which columns have HED tags. See SpreadsheetInput or TabularInput for examples of how to use a built-in ColumnMapper.
name (str or None) – Optional field for how this file will report errors.
allow_blank_names (bool) – If True, column names can be blank.

Raises:

HedFileError –
- The file is blank.
- An invalid dataframe was passed with size 0.
- An invalid extension was provided.
- A duplicate or empty column name appears.
- Cannot open the indicated file.
- The specified worksheet name does not exist.
- The sidecar file or tabular file had an invalid format and could not be read.
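
The file_type rule above (derive the type from the filename, reject unknown extensions) can be sketched in plain Python. This is only an illustration, not the library's code: `infer_file_type` is a hypothetical helper, the lowercasing of the extension is an assumption, and `ValueError` stands in for `HedFileError`.

```python
import os

# These two lists mirror the class attributes documented on this page.
EXCEL_EXTENSION = [".xlsx"]
TEXT_EXTENSION = [".tsv", ".txt"]


def infer_file_type(file, file_type=None):
    """Hypothetical helper: return the extension that decides how a file is read."""
    if file_type is None and isinstance(file, str):
        # Derive the type from the filename, as the file_type parameter describes.
        file_type = os.path.splitext(file)[1]
    # Assumption: extensions are compared case-insensitively.
    file_type = (file_type or "").lower()
    if file_type not in EXCEL_EXTENSION + TEXT_EXTENSION:
        # The real constructor raises HedFileError; ValueError is a stand-in.
        raise ValueError(f"Invalid extension: {file_type!r}")
    return file_type


print(infer_file_type("events.tsv"))
print(infer_file_type("annotations.xlsx"))
```

An explicit file_type takes precedence over the filename in this sketch, which matches the "derived from file" wording only when file_type is None.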

BaseInput.assemble(mapper=None, skip_curly_braces=False)

Assembles the HED strings.

Parameters:

mapper (ColumnMapper or None) – Generally pass None here unless you want special behavior.
skip_curly_braces (bool) – If True, do not plug curly brace values into columns.

Returns:

The assembled dataframe.

Return type:

Dataframe
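
To make the skip_curly_braces option more concrete, here is a rough sketch of what "plugging in" curly brace values could look like. It assumes that a `{column_name}` placeholder in one column's text is replaced by that row's value from the named column. The helper `plug_in_braces` and the sample data are hypothetical, and the real assembly logic may differ.

```python
import re


def plug_in_braces(template, row):
    """Replace each {column} placeholder with the row's value for that column."""
    # Assumption: an unknown column name is replaced by an empty string.
    return re.sub(r"\{([^{}]+)\}", lambda match: row.get(match.group(1), ""), template)


row = {"color": "Red", "shape": "Square"}
print(plug_in_braces("(Sensory-event, {color}, {shape})", row))
```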

BaseInput.column_metadata()

Return the metadata for each column.

Returns: Number/ColumnMeta pairs.
Return type: dict

static BaseInput.combine_dataframe(dataframe)

Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.

Parameters: dataframe (DataFrame) – The dataframe to combine.
Returns: The assembled series.
Return type: Series
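
The combining rule can be illustrated without pandas. The sketch below works on plain lists rather than a dataframe, and it assumes the non-empty cells are joined with ", " (HED tags are comma-separated). `combine_row` is a hypothetical name, not part of the library.

```python
def combine_row(row):
    """Join the non-empty cells of one row into a single comma-separated string."""
    parts = [cell.strip() for cell in row if cell and cell.strip()]
    return ", ".join(parts)


rows = [
    ["Sensory-event", "", "Visual-presentation"],
    ["", "", ""],
    ["Agent-action", "Press", ""],
]
combined = [combine_row(row) for row in rows]
print(combined)
# → ['Sensory-event, Visual-presentation', '', 'Agent-action, Press']
```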

BaseInput.convert_to_form(hed_schema, tag_form)

Convert all tags in underlying dataframe to the specified form.

Parameters:

hed_schema (HedSchema) – The schema to use to convert tags.
tag_form (str) – HedTag property to convert tags to. Most cases should use convert_to_short or convert_to_long below.

BaseInput.convert_to_long(hed_schema)

Convert all tags in underlying dataframe to long form.

Parameters: hed_schema (HedSchema or None) – The schema to use to convert tags.

BaseInput.convert_to_short(hed_schema)

Convert all tags in underlying dataframe to short form.

Parameters: hed_schema (HedSchema) – The schema to use to convert tags.

BaseInput.expand_defs(hed_schema, def_dict)

Expands any def tags found in the underlying dataframe.

Parameters:

hed_schema (HedSchema or None) – The schema to use to identify defs.
def_dict (DefinitionDict) – The definitions to expand.

BaseInput.get_column_refs()

Return a list of column refs for this file.

Default implementation returns none.

Returns: A list of unique column refs found.
Return type: list

BaseInput.get_def_dict(hed_schema, extra_def_dicts=None)

Return the definition dict for this file.

Note: The base class implementation returns just extra_def_dicts.

Parameters:

hed_schema (HedSchema) – Identifies tags to find definitions (if needed).
extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.

Returns:

A single definition dict representing all the data (and extra def dicts).

Return type:

DefinitionDict

BaseInput.get_worksheet(worksheet_name=None)

Get the requested worksheet.

Parameters: worksheet_name (str or None) – The name of the requested worksheet, or None for the first worksheet.
Returns: The requested worksheet.
Return type: openpyxl.workbook.Workbook

Notes

If None, returns the first worksheet.

Raises:

KeyError –
- The specified worksheet name does not exist.

BaseInput.reset_mapper(new_mapper)

Set mapper to a different view of the file.

Parameters: new_mapper (ColumnMapper) – A column mapper to be associated with this base input.

BaseInput.set_cell(row_number, column_number, new_string_obj, tag_form='short_tag')

Replace the specified cell with transformed text.

Parameters:

row_number (int) – The row number of the spreadsheet to set.
column_number (int) – The column number of the spreadsheet to set.
new_string_obj (HedString) – Object with text to put in the given cell.
tag_form (str) – Version of the tags (short_tag, long_tag, base_tag, etc.).

Notes

Any attribute of a HedTag that returns a string is a valid value of tag_form.

Raises:

ValueError –
- There is not a loaded dataframe.
KeyError –
- The indicated row/column does not exist.
AttributeError –
- The indicated tag_form is not an attribute of HedTag.
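
The note that any string-valued attribute of a HedTag is a valid tag_form suggests an attribute lookup by name. The sketch below shows that pattern with a stand-in class. `FakeTag` and `render` are hypothetical names used only for illustration; they are not the library's HedTag or its rendering code.

```python
from dataclasses import dataclass


@dataclass
class FakeTag:
    """Hypothetical stand-in for a HedTag with two string attributes."""
    short_tag: str
    long_tag: str


def render(tags, tag_form="short_tag"):
    """Render tags in the requested form by looking the attribute up by name."""
    # getattr raises AttributeError when tag_form names no attribute of the tag.
    return ", ".join(getattr(tag, tag_form) for tag in tags)


tags = [FakeTag(short_tag="Sensory-event", long_tag="Event/Sensory-event")]
print(render(tags))
print(render(tags, tag_form="long_tag"))
```

Passing a name such as "no_such_form" raises AttributeError in this sketch, which mirrors the AttributeError listed above.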

BaseInput.shrink_defs(hed_schema)

Shrinks any def-expand found in the underlying dataframe.

Parameters: hed_schema (HedSchema or None) – The schema to use to identify defs.

BaseInput.to_csv(file=None)

Write to file or return as a string.

Parameters:

file (str, file-like, or None) – Location to save this file. If None, return as string.

Returns:

None if file is given or the contents as a str if file is None.

Return type:

None or str

Raises:

OSError –
- Cannot open the indicated file.
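
The "write to file, or return a string when file is None" contract can be sketched with the standard library. This is an illustration only: `rows_to_text` is a hypothetical helper, the tab delimiter is an assumption based on the .tsv focus of this class, and only path strings (not file-like objects) are handled.

```python
import csv
import io


def rows_to_text(rows, file=None):
    """Write rows as tab-separated text to a path, or return the text if file is None."""
    if file is None:
        buffer = io.StringIO()
        csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
        return buffer.getvalue()
    # open() raises OSError if the path cannot be opened.
    with open(file, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
    return None


text = rows_to_text([["onset", "HED"], ["1.0", "Red"]])
print(repr(text))
# → 'onset\tHED\n1.0\tRed\n'
```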

BaseInput.to_excel(file)

Output to an Excel file.

Parameters:

file (str or file-like) – Location to save this base input.

Raises:

ValueError –
- An empty file object was passed.
OSError –
- Cannot open the indicated file.

BaseInput.validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None)

Creates a SpreadsheetValidator and returns all issues with this file.

Parameters:

hed_schema (HedSchema) – The schema to use for validation.
extra_def_dicts (list of DefDict or DefDict) – All definitions to use for validation.
name (str) – The name to report errors from this file as.
error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues found in this file.

Return type:

issues (list of dict)

BaseInput.EXCEL_EXTENSION = ['.xlsx']

BaseInput.TEXT_EXTENSION = ['.tsv', '.txt']

BaseInput.columns

Returns a list of the column names.

Empty if no column names.

Returns: The column names.
Return type: list

BaseInput.dataframe: The underlying dataframe.

BaseInput.dataframe_a

Return the assembled dataframe. Probably a placeholder name.

Returns: The assembled dataframe.
Return type: DataFrame

BaseInput.has_column_names: True if dataframe has column names.

BaseInput.loaded_workbook: The underlying loaded workbooks.

BaseInput.name: Name of the data.

BaseInput.needs_sorting: Return True if this has an onset column and that column needs sorting.

BaseInput.onsets: Return the onset column if it exists.

BaseInput.series_a

Return the assembled dataframe as a series.

Returns: The assembled dataframe with columns merged.
Return type: Series

BaseInput.series_filtered

Return the assembled dataframe as a series, with rows that have the same onset combined.

Returns: The assembled dataframe with columns merged, and the rows filtered together.
Return type: Series or None
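
As a rough picture of "rows that have the same onset combined", the sketch below groups HED strings by onset and joins each group with ", ". The function `combine_by_onset` is hypothetical, and returning a dict keyed by onset is an assumption made for readability. How the real property lays out the combined rows in the resulting Series is not described on this page.

```python
def combine_by_onset(onsets, hed_strings):
    """Group HED strings by onset and join each group into one string."""
    grouped = {}
    for onset, hed in zip(onsets, hed_strings):
        group = grouped.setdefault(onset, [])
        if hed:
            # Skip empty strings so they do not add stray separators.
            group.append(hed)
    return {onset: ", ".join(parts) for onset, parts in grouped.items()}


onsets = [0.5, 0.5, 1.2]
hed_strings = ["Sensory-event", "Red", "Agent-action"]
print(combine_by_onset(onsets, hed_strings))
# → {0.5: 'Sensory-event, Red', 1.2: 'Agent-action'}
```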

BaseInput.worksheet_name: The worksheet name.