BaseInput

class BaseInput(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, name=None, allow_blank_names=True)

Superclass representing a basic columnar file.

Methods

BaseInput.__init__(file[, file_type, ...])

Constructor for the BaseInput class.

BaseInput.assemble([mapper, skip_curly_braces])

Assembles the HED strings.

BaseInput.column_metadata()

Return the metadata for each column.

BaseInput.combine_dataframe(dataframe)

Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.

BaseInput.convert_to_form(hed_schema, tag_form)

Convert all tags in underlying dataframe to the specified form.

BaseInput.convert_to_long(hed_schema)

Convert all tags in underlying dataframe to long form.

BaseInput.convert_to_short(hed_schema)

Convert all tags in underlying dataframe to short form.

BaseInput.expand_defs(hed_schema, def_dict)

Expands any def tags found in the underlying dataframe.

BaseInput.get_column_refs()

Return a list of column refs for this file.

BaseInput.get_def_dict(hed_schema[, ...])

Return the definition dict for this file.

BaseInput.get_worksheet([worksheet_name])

Get the requested worksheet.

BaseInput.reset_mapper(new_mapper)

Set mapper to a different view of the file.

BaseInput.set_cell(row_number, ...[, tag_form])

Replace the specified cell with transformed text.

BaseInput.shrink_defs(hed_schema)

Shrinks any def-expand found in the underlying dataframe.

BaseInput.to_csv([file])

Write to file or return as a string.

BaseInput.to_excel(file)

Output to an Excel file.

BaseInput.validate(hed_schema[, ...])

Creates a SpreadsheetValidator and returns all issues with this file.

Attributes

BaseInput.EXCEL_EXTENSION

BaseInput.TEXT_EXTENSION

BaseInput.columns

Returns a list of the column names.

BaseInput.dataframe

The underlying dataframe.

BaseInput.dataframe_a

Return the assembled dataframe. Probably a placeholder name.

BaseInput.has_column_names

True if dataframe has column names.

BaseInput.loaded_workbook

The underlying loaded workbooks.

BaseInput.name

Name of the data.

BaseInput.needs_sorting

Return True if this has an onset column and it needs sorting.

BaseInput.onsets

Return the onset column if it exists.

BaseInput.series_a

Return the assembled dataframe as a series.

BaseInput.series_filtered

Return the assembled dataframe as a series, with rows that have the same onset combined.

BaseInput.worksheet_name

The worksheet name.

BaseInput.__init__(file, file_type=None, worksheet_name=None, has_column_names=True, mapper=None, name=None, allow_blank_names=True)

Constructor for the BaseInput class.

Parameters:
  • file (str or file-like or pd.DataFrame) – An xlsx/tsv file to open.

  • file_type (str or None) – “.xlsx” (Excel), “.tsv” or “.txt” (tab-separated text). Derived from file if file is a filename. Ignored if file is a pandas dataframe.

  • worksheet_name (str or None) – Name of the Excel worksheet to use. (Not applicable to tsv files.)

  • has_column_names (bool) – True if file has column names. This value is ignored if you pass in a pandas dataframe.

  • mapper (ColumnMapper or None) – Indicates which columns have HED tags. See SpreadsheetInput or TabularInput for examples of how to use a built-in ColumnMapper.

  • name (str or None) – Optional field for how this file will report errors.

  • allow_blank_names (bool) – If True, column names can be blank.

Raises:

HedFileError

  • file is blank.

  • An invalid dataframe was passed with size 0.

  • An invalid extension was provided.

  • A duplicate or empty column name appears.

  • Cannot open the indicated file.

  • The specified worksheet name does not exist.

  • The sidecar file or tabular file had an invalid format and could not be read.
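
The file_type rules and some of the error conditions above can be sketched roughly as follows. This is a minimal illustration for string filenames only, not the library's code: resolve_file_type is a hypothetical helper, ValueError stands in for HedFileError, and the case-insensitive comparison is an assumption.

```python
import os

# Extension lists as documented for BaseInput.
EXCEL_EXTENSION = ['.xlsx']
TEXT_EXTENSION = ['.tsv', '.txt']


def resolve_file_type(file, file_type=None):
    """Hypothetical helper: pick a file type the way the parameter docs describe."""
    if not file:
        raise ValueError("file is blank")
    # Derive the type from the filename only when none was given explicitly.
    if file_type is None and isinstance(file, str):
        file_type = os.path.splitext(file)[1]
    # Assumption: extensions are compared case-insensitively.
    file_type = (file_type or "").lower()
    if file_type not in EXCEL_EXTENSION + TEXT_EXTENSION:
        raise ValueError(f"invalid extension: {file_type!r}")
    return file_type
```

An explicit file_type takes precedence over the filename in this sketch, so resolve_file_type("data.bin", file_type=".xlsx") is accepted.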

BaseInput.assemble(mapper=None, skip_curly_braces=False)

Assembles the HED strings.

Parameters:
  • mapper (ColumnMapper or None) – Generally pass None here unless you want special behavior.

  • skip_curly_braces (bool) – If True, don’t plug curly brace values into columns.

Returns:

The assembled dataframe.

Return type:

Dataframe
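
As a rough illustration of what skip_curly_braces controls, the sketch below substitutes {column} placeholders in one row with the value of the named column. It is an assumption-laden toy, not the library's implementation: plug_in_curly_braces is a hypothetical name, a row is modelled as a plain dict, and unknown placeholders are simply left in place.

```python
import re


def plug_in_curly_braces(row, skip_curly_braces=False):
    """Hypothetical sketch: replace {column} placeholders with that column's value."""
    if skip_curly_braces:
        # Leave every cell exactly as it was.
        return dict(row)

    def substitute(match):
        # Unknown column names keep their braces (an assumption of this sketch).
        return row.get(match.group(1), match.group(0))

    return {name: re.sub(r"\{(\w+)\}", substitute, text) for name, text in row.items()}
```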

BaseInput.column_metadata()

Return the metadata for each column.

Returns:

Number/ColumnMeta pairs.

Return type:

dict

static BaseInput.combine_dataframe(dataframe)

Combine all columns in the given dataframe into a single HED string series, skipping empty columns and columns with empty strings.

Parameters:

dataframe (Dataframe) – The dataframe to combine.

Returns:

The assembled series.

Return type:

Series
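
The combining rule can be sketched without pandas. In this hypothetical stand-in, combine_columns takes a list of rows (each a list of cells) instead of a dataframe, and the ", " separator is an assumption rather than something stated above.

```python
def combine_columns(rows):
    """Hypothetical stand-in for combine_dataframe, using lists instead of pandas."""
    combined = []
    for row in rows:
        # Keep only cells that hold something other than None or an empty string.
        parts = [str(cell) for cell in row if cell not in (None, "")]
        # Assumption: the surviving cells are joined with ", ".
        combined.append(", ".join(parts))
    return combined
```

A row whose cells are all empty yields an empty string, so the output keeps one entry per input row.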

BaseInput.convert_to_form(hed_schema, tag_form)

Convert all tags in underlying dataframe to the specified form.

Parameters:
  • hed_schema (HedSchema) – The schema to use to convert tags.

  • tag_form (str) – HedTag property to convert tags to. Most cases should use convert_to_short or convert_to_long below.

BaseInput.convert_to_long(hed_schema)

Convert all tags in underlying dataframe to long form.

Parameters:

hed_schema (HedSchema or None) – The schema to use to convert tags.

BaseInput.convert_to_short(hed_schema)

Convert all tags in underlying dataframe to short form.

Parameters:

hed_schema (HedSchema) – The schema to use to convert tags.

BaseInput.expand_defs(hed_schema, def_dict)

Expands any def tags found in the underlying dataframe.

Parameters:
  • hed_schema (HedSchema or None) – The schema to use to identify defs.

  • def_dict (DefinitionDict) – The definitions to expand.
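
To make the expansion concrete, here is a string-level sketch that assumes the usual HED notation, in which Def/Name expands to (Def-expand/Name, (contents)). The function name and the plain dict of definitions are hypothetical; the real method works on the dataframe with a HedSchema and a DefinitionDict.

```python
import re


def expand_defs_in_string(hed_string, definitions):
    """Hypothetical sketch: turn Def/Name into (Def-expand/Name, (contents))."""
    def expand(match):
        name = match.group(1)
        contents = definitions.get(name)
        if contents is None:
            # Unknown definition: leave the tag untouched.
            return match.group(0)
        return f"(Def-expand/{name}, ({contents}))"

    # "Def-expand/..." is not matched because "Def" must be followed by "/".
    return re.sub(r"\bDef/([\w-]+)", expand, hed_string)
```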

BaseInput.get_column_refs()

Return a list of column refs for this file.

The default implementation returns no column refs.

Returns:

A list of unique column refs found.

Return type:

column_refs(list)

BaseInput.get_def_dict(hed_schema, extra_def_dicts=None)

Return the definition dict for this file.

Note: Baseclass implementation returns just extra_def_dicts.

Parameters:
  • hed_schema (HedSchema) – Identifies tags to find definitions (if needed).

  • extra_def_dicts (list, DefinitionDict, or None) – Extra dicts to add to the list.

Returns:

A single definition dict representing all the data (and extra def dicts).

Return type:

DefinitionDict

BaseInput.get_worksheet(worksheet_name=None)

Get the requested worksheet.

Parameters:

worksheet_name (str or None) – The name of the requested worksheet, or None for the first worksheet.

Returns:

The requested worksheet.

Return type:

openpyxl.workbook.Workbook

Notes

If None, returns the first worksheet.

Raises:

KeyError

  • The specified worksheet name does not exist.
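
The lookup rule (first worksheet when no name is given, KeyError for an unknown name) can be mimicked with a plain dict. This is only a sketch: pick_worksheet is a hypothetical name and the dict stands in for a loaded workbook.

```python
def pick_worksheet(workbook, worksheet_name=None):
    """Hypothetical sketch; `workbook` is a plain dict of sheet name -> sheet."""
    if worksheet_name is None:
        # Dicts preserve insertion order, so this is the first sheet added.
        return next(iter(workbook.values()))
    # Indexing raises KeyError when the name does not exist.
    return workbook[worksheet_name]
```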

BaseInput.reset_mapper(new_mapper)

Set mapper to a different view of the file.

Parameters:

new_mapper (ColumnMapper) – A column mapper to be associated with this base input.

BaseInput.set_cell(row_number, column_number, new_string_obj, tag_form='short_tag')

Replace the specified cell with transformed text.

Parameters:
  • row_number (int) – The row number of the spreadsheet to set.

  • column_number (int) – The column number of the spreadsheet to set.

  • new_string_obj (HedString) – Object with text to put in the given cell.

  • tag_form (str) – Version of the tags (short_tag, long_tag, base_tag, etc.)

Notes

Any attribute of a HedTag that returns a string is a valid value of tag_form.

Raises:
  • ValueError

    • There is not a loaded dataframe.

  • KeyError

    • The indicated row/column does not exist.

  • AttributeError

    • The indicated tag_form is not an attribute of HedTag.
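
The note about tag_form amounts to an attribute lookup by name. The sketch below shows that mechanism with a toy class; ToyTag and render_tags are hypothetical stand-ins, not HedTag or any library function, and the tag values are made up.

```python
from dataclasses import dataclass


@dataclass
class ToyTag:
    """Hypothetical stand-in for HedTag with two string-valued attributes."""
    short_tag: str
    long_tag: str


def render_tags(tags, tag_form="short_tag"):
    # getattr raises AttributeError when tag_form is not an attribute of the tag.
    return ", ".join(getattr(tag, tag_form) for tag in tags)
```

With this pattern any string-valued attribute can be selected without a dedicated branch per form, which is why an unknown tag_form surfaces as AttributeError.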

BaseInput.shrink_defs(hed_schema)

Shrinks any def-expand found in the underlying dataframe.

Parameters:

hed_schema (HedSchema or None) – The schema to use to identify defs.
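
A string-level sketch of shrinking, assuming the usual HED notation in which a (Def-expand/Name, (contents)) group shrinks to Def/Name. The function is hypothetical and uses simple parenthesis counting; the real method operates on the dataframe and uses the schema to identify defs.

```python
def shrink_defs_in_string(hed_string):
    """Hypothetical sketch: replace (Def-expand/Name, (...)) groups with Def/Name."""
    marker = "(Def-expand/"
    result = []
    pos = 0
    while True:
        start = hed_string.find(marker, pos)
        if start == -1:
            result.append(hed_string[pos:])
            break
        # Walk forward to the parenthesis that closes this group.
        depth = 0
        end = start
        for end in range(start, len(hed_string)):
            if hed_string[end] == "(":
                depth += 1
            elif hed_string[end] == ")":
                depth -= 1
                if depth == 0:
                    break
        # The definition name runs up to the first comma or closing parenthesis.
        name_end = start + len(marker)
        while name_end < len(hed_string) and hed_string[name_end] not in ",)":
            name_end += 1
        name = hed_string[start + len(marker):name_end].strip()
        result.append(hed_string[pos:start])
        result.append(f"Def/{name}")
        pos = end + 1
    return "".join(result)
```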

BaseInput.to_csv(file=None)

Write to file or return as a string.

Parameters:

file (str, file-like, or None) – Location to save this file. If None, return as string.

Returns:

None if file is given or the contents as a str if file is None.

Return type:

None or str

Raises:

OSError

  • Cannot open the indicated file.
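
The "write to file or return as a string" contract can be sketched with the standard library alone. rows_to_csv is a hypothetical helper operating on a list of rows, and the tab delimiter is an assumption made for this example.

```python
import csv
import io


def rows_to_csv(rows, file=None, delimiter="\t"):
    """Hypothetical sketch of 'write to file or return as a string'."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=delimiter, lineterminator="\n").writerows(rows)
    text = buffer.getvalue()
    if file is None:
        # No destination given: hand the contents back to the caller.
        return text
    if isinstance(file, str):
        # A path: open() raises OSError if the location cannot be written.
        with open(file, "w", newline="") as handle:
            handle.write(text)
    else:
        # An already-open file-like object.
        file.write(text)
    return None
```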

BaseInput.to_excel(file)

Output to an Excel file.

Parameters:

file (str or file-like) – Location to save this base input.

Raises:
  • ValueError

    • An empty file object was passed.

  • OSError

    • Cannot open the indicated file.

BaseInput.validate(hed_schema, extra_def_dicts=None, name=None, error_handler=None)

Creates a SpreadsheetValidator and returns all issues with this file.

Parameters:
  • hed_schema (HedSchema) – The schema to use for validation.

  • extra_def_dicts (list of DefDict or DefDict) – All definitions to use for validation.

  • name (str) – The name to report errors from this file as.

  • error_handler (ErrorHandler) – Error context to use. Creates a new one if None.

Returns:

A list of issues found in this file.

Return type:

issues (list of dict)

BaseInput.EXCEL_EXTENSION = ['.xlsx']

BaseInput.TEXT_EXTENSION = ['.tsv', '.txt']

BaseInput.columns

Returns a list of the column names.

Empty if no column names.

Returns:

The column names.

Return type:

columns(list)

BaseInput.dataframe

The underlying dataframe.

BaseInput.dataframe_a

Return the assembled dataframe. Probably a placeholder name.

Returns:

The assembled dataframe.

Return type:

Dataframe

BaseInput.has_column_names

True if dataframe has column names.

BaseInput.loaded_workbook

The underlying loaded workbooks.

BaseInput.name

Name of the data.

BaseInput.needs_sorting

Return True if this has an onset column and it needs sorting.
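
One plausible reading of this check is sketched below. The function is a hypothetical stand-in that takes the onsets as a list of numbers, or None when there is no onset column, and treats any decrease between consecutive rows as needing a sort.

```python
def needs_sorting(onsets):
    """Hypothetical sketch: True only if onsets exist and are out of order."""
    if onsets is None:
        # No onset column, so there is nothing to sort.
        return False
    # Any later value smaller than its predecessor means the rows are unsorted.
    return any(later < earlier for earlier, later in zip(onsets, onsets[1:]))
```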

BaseInput.onsets

Return the onset column if it exists.

BaseInput.series_a

Return the assembled dataframe as a series.

Returns:

The assembled dataframe with columns merged.

Return type:

Series

BaseInput.series_filtered

Return the assembled dataframe as a series, with rows that have the same onset combined.

Returns:

The assembled dataframe with columns merged, and the rows filtered together.

Return type:

Series or None
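
Combining rows that share an onset might look like the sketch below. Everything here is an assumption made for illustration: combine_same_onset is a hypothetical name, the inputs are plain lists, strings are joined with ", ", and the result has one entry per distinct onset, whereas the real property may preserve the original row count.

```python
def combine_same_onset(onsets, hed_strings):
    """Hypothetical sketch: merge the strings of rows that share an onset."""
    if onsets is None:
        # Mirrors the documented 'Series or None' return type.
        return None
    merged = {}
    for onset, text in zip(onsets, hed_strings):
        parts = merged.setdefault(onset, [])
        if text:
            # Skip empty strings so they do not add stray separators.
            parts.append(text)
    return [", ".join(parts) for parts in merged.values()]
```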

BaseInput.worksheet_name

The worksheet name.