RemapColumnsOp

class RemapColumnsOp(parameters)[source]

Map values in m columns of a columnar file into new combinations in n columns.

Required remodeling parameters:
  • source_columns (list): The key columns to map (m key columns).

  • destination_columns (list): The destination columns to have the mapped values (n destination columns).

  • map_list (list): A list of lists with the mapping.

  • ignore_missing (bool): If True, entries whose key column values are not in map_list are ignored.

Optional remodeling parameters:
  • integer_sources (list): Source columns that should be treated as integers rather than strings.

Notes

Each inner list in map_list has length m + n: the m key column values followed by the n mapped column values.
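As an illustration of the m + n layout described in the note above, here is a minimal sketch of a parameter dictionary. The column names and values are hypothetical, and the outer "operation"/"parameters" wrapper is an assumption about how the operation would appear in a remodeling file; only the parameter names and the operation name remap_columns come from this page.

```python
import json

# Hypothetical column names and values, chosen only for illustration.
parameters = {
    "source_columns": ["response_accuracy", "response_hand"],  # m = 2 key columns
    "destination_columns": ["response_type"],                  # n = 1 mapped column
    "map_list": [
        # Each entry: m key values followed by n mapped values.
        ["correct", "left", "correct_left"],
        ["correct", "right", "correct_right"],
        ["incorrect", "left", "incorrect_left"],
        ["incorrect", "right", "incorrect_right"],
    ],
    "ignore_missing": True,
}

m = len(parameters["source_columns"])
n = len(parameters["destination_columns"])

# Every map_list entry should have exactly m + n values.
all_entries_ok = all(len(entry) == m + n for entry in parameters["map_list"])

# Assumed wrapper format for a remodeling file entry (not confirmed by this page).
operation = {"operation": "remap_columns", "parameters": parameters}
operation_json = json.dumps(operation, indent=4)
```

Here m = 2 and n = 1, so each of the four entries carries three values.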

TODO: Allow wildcards

Methods

RemapColumnsOp.__init__(parameters)

Constructor for the remap columns operation.

RemapColumnsOp.do_op(dispatcher, df, name[, ...])

Remap new columns from combinations of others.

RemapColumnsOp.validate_input_data(parameters)

Validates whether operation parameters meet op-specific criteria beyond those captured in the JSON schema.

Attributes

RemapColumnsOp.NAME

RemapColumnsOp.PARAMS

RemapColumnsOp.__init__(parameters)[source]

Constructor for the remap columns operation.

Parameters:

parameters (dict) – Parameter values for required and optional parameters.

RemapColumnsOp.do_op(dispatcher, df, name, sidecar=None)[source]

Remap new columns from combinations of others.

Parameters:
  • dispatcher (Dispatcher) – Manages the operation I/O.

  • df (DataFrame) – The DataFrame to be remodeled.

  • name (str) – Unique identifier for the dataframe – often the original file path.

  • sidecar (Sidecar or file-like) – Not needed for this operation.

Returns:

A new dataframe after processing.

Return type:

DataFrame

Raises:

ValueError

  • If ignore_missing is False and source values from the data are not in the map.
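The lookup behavior described for do_op can be sketched in plain Python. This is not the library's implementation: remap_rows is a hypothetical helper that works on a list of dictionaries rather than a DataFrame, compares keys as strings (so integer_sources is not modeled), and assumes that unmatched rows receive "n/a" in the destination columns when ignore_missing is True.

```python
def remap_rows(rows, source_columns, destination_columns, map_list, ignore_missing):
    """Hypothetical sketch: return copies of rows with destination columns filled in."""
    m = len(source_columns)

    # Build a lookup from the m key values (as strings) to the n mapped values.
    lookup = {tuple(str(v) for v in entry[:m]): entry[m:] for entry in map_list}

    result = []
    for row in rows:
        key = tuple(str(row[col]) for col in source_columns)
        if key in lookup:
            mapped = lookup[key]
        elif ignore_missing:
            # Assumption: unmatched rows get "n/a" in every destination column.
            mapped = ["n/a"] * len(destination_columns)
        else:
            raise ValueError(f"Source values {key} are not in the map")
        new_row = dict(row)  # Leave the input row unmodified.
        new_row.update(zip(destination_columns, mapped))
        result.append(new_row)
    return result


# Hypothetical data: the second row has no matching key in map_list.
rows = [{"hand": "left", "code": "1"}, {"hand": "right", "code": "2"}]
remapped = remap_rows(rows, ["hand", "code"], ["label"], [["left", 1, "go"]], True)
```

Calling the same helper with ignore_missing set to False raises ValueError on the second row, mirroring the Raises entry above.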

static RemapColumnsOp.validate_input_data(parameters)[source]

Validates whether operation parameters meet op-specific criteria beyond those captured in the JSON schema.

Example: A check to see whether two input arrays are the same length.

Notes: The minimum implementation should return an empty list to indicate no errors were found.

If additional validation is necessary, the method should perform the validation and return a list of user-friendly error strings.
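A check of this kind might look like the sketch below. The function check_map_list is hypothetical and only illustrates the contract stated above (an empty list means no errors); it assumes the op-specific rule is that every map_list entry has length m + n, which the JSON schema alone cannot express.

```python
def check_map_list(parameters):
    """Hypothetical sketch: return a list of error strings, empty if none found."""
    expected = len(parameters["source_columns"]) + len(parameters["destination_columns"])
    errors = []
    for index, entry in enumerate(parameters["map_list"]):
        if len(entry) != expected:
            errors.append(
                f"map_list entry {index} has {len(entry)} values, "
                f"but {expected} are required (source plus destination columns)."
            )
    return errors


# Hypothetical parameters: the second entry is missing its mapped value.
bad_parameters = {
    "source_columns": ["hand"],
    "destination_columns": ["label"],
    "map_list": [["left", "go"], ["right"]],
    "ignore_missing": False,
}
problems = check_map_list(bad_parameters)
```

Returning strings rather than raising lets a caller collect every problem across all operations before reporting them.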

RemapColumnsOp.NAME = 'remap_columns'
RemapColumnsOp.PARAMS = {'additionalProperties': False, 'properties': {'destination_columns': {'description': 'The columns to insert new values based on a key lookup of the source columns.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}, 'ignore_missing': {'description': 'If true, insert missing source columns in the result, filled with n/a, else error.', 'type': 'boolean'}, 'integer_sources': {'description': 'A list of source column names whose values are to be treated as integers.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'map_list': {'description': 'An array of k lists each with m+n entries corresponding to the k unique keys.', 'items': {'items': {'type': ['string', 'number']}, 'minItems': 1, 'type': 'array'}, 'minItems': 1, 'type': 'array', 'uniqueItems': True}, 'source_columns': {'description': 'The columns whose values are combined to provide the remap keys.', 'items': {'type': 'string'}, 'minItems': 1, 'type': 'array'}}, 'required': ['source_columns', 'destination_columns', 'map_list', 'ignore_missing'], 'type': 'object'}