steps

This module contains classes for making “steps” in an ETL flow. Steps can be connected such that a row flows from step to step and each step does something with the row.

class pygrametl.steps.ConditionalStep(condition, whentrue, whenfalse=None, name=None)

Bases: Step

A Step that redirects rows based on a condition.

Arguments:

  • condition: A function f(row) that is evaluated for each row.

  • whentrue: The next step to use if the condition evaluates to a true value. This argument should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on.

  • whenfalse: The Step that rows are sent to when the condition evaluates to a false value. If None, the rows are silently discarded. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.
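The routing rule described for ConditionalStep can be sketched without pygrametl. The sketch below is a hedged illustration, not pygrametl's code: the Collector class and the route function are hypothetical stand-ins that only model "send the row to whentrue if condition(row) is true, otherwise to whenfalse, and drop it when the chosen target is None". With pygrametl itself, the comparable wiring would be ConditionalStep(lambda row: row["amount"] > 100, whentrue=big).

```python
from typing import Callable, Optional


class Collector:
    """Hypothetical stand-in for a Step: it simply records the rows it receives."""

    def __init__(self, name: str):
        self.name = name
        self.received: list = []

    def process(self, row: dict) -> None:
        self.received.append(row)


def route(row: dict, condition: Callable[[dict], object],
          whentrue: Optional[Collector], whenfalse: Optional[Collector] = None) -> None:
    """Send row to whentrue or whenfalse depending on condition(row).

    A target of None means the row is silently discarded, mirroring the
    documented behaviour of whenfalse=None.
    """
    target = whentrue if condition(row) else whenfalse
    if target is not None:
        target.process(row)


big = Collector("big")
rows = [{"amount": 50}, {"amount": 500}, {"amount": 1200}]
for r in rows:
    # Rows with amount <= 100 go to whenfalse=None and are dropped.
    route(r, lambda row: row["amount"] > 100, whentrue=big)

print([r["amount"] for r in big.received])  # [500, 1200]
```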

class pygrametl.steps.CopyStep(originaldest, copydest, deepcopy=False, name=None)

Bases: Step

A Step that copies each row and passes on both the copy and the original.

Arguments:

  • originaldest: The Step each given row is passed on to. This argument should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on.

  • copydest: The Step a copy of each given row is passed on to. This argument can be 1) an instance of a Step or 2) the name of a Step.

  • deepcopy: Decides whether the copy should be deep (True) or shallow (False). Default: False

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.
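The deepcopy flag matters when rows hold mutable values such as lists. The sketch below uses only Python's copy module and assumes, without confirmation from this page, that deepcopy=False corresponds to a shallow copy like copy.copy and deepcopy=True to copy.deepcopy. It shows that a shallow copy shares nested objects with the original row, so a later step modifying the copy can also change what the original destination sees.

```python
import copy

row = {"id": 1, "tags": ["new"]}

shallow = copy.copy(row)      # new dict, but the "tags" list is shared with row
deep = copy.deepcopy(row)     # new dict and an independent "tags" list

shallow["tags"].append("sale")  # mutates the list that row also refers to
deep["tags"].append("vip")      # affects only the deep copy's list

print(row["tags"])   # ['new', 'sale']
print(deep["tags"])  # ['new', 'vip']
```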

class pygrametl.steps.DimensionStep(dimension, keyfield=None, next=None, name=None)

Bases: Step

A Step that performs ensure(row) on a given dimension for each row.

Arguments:

  • dimension: The Dimension object to call ensure on.

  • keyfield: The name of the attribute that, in each row, is set to hold the key value of the dimension member. Default: None

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.
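The per-row work of DimensionStep, calling ensure(row) and storing the returned key in row[keyfield], can be illustrated with an in-memory stand-in. InMemoryDimension and dimension_worker below are hypothetical: they model ensure only as "return the key of the matching member, inserting the member first if it is missing", with surrogate keys assigned 1, 2, 3, and so on. They are not pygrametl's Dimension class, and skipping the assignment when keyfield is None is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple


class InMemoryDimension:
    """Hypothetical dimension: maps attribute-value tuples to surrogate keys."""

    def __init__(self, attributes: Tuple[str, ...]):
        self.attributes = attributes
        self.members: Dict[tuple, int] = {}

    def ensure(self, row: dict) -> int:
        """Return the member's key, inserting the member first if it is missing."""
        lookup = tuple(row[a] for a in self.attributes)
        if lookup not in self.members:
            self.members[lookup] = len(self.members) + 1  # next surrogate key
        return self.members[lookup]


def dimension_worker(row: dict, dimension: InMemoryDimension,
                     keyfield: Optional[str]) -> None:
    """Sketch of the per-row operation: ensure the member, then store its key."""
    key = dimension.ensure(row)
    if keyfield is not None:  # assumption: no keyfield means the key is not stored
        row[keyfield] = key


productdim = InMemoryDimension(("name", "category"))
rows = [{"name": "pen", "category": "office"},
        {"name": "mug", "category": "kitchen"},
        {"name": "pen", "category": "office"}]
for r in rows:
    dimension_worker(r, productdim, "productid")

print([r["productid"] for r in rows])  # [1, 2, 1]
```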

class pygrametl.steps.GarbageStep(name=None)

Bases: Step

A Step that does nothing. Rows are neither modified nor passed on.

Arguments:

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

process(row)

Discard the given row.

This overrides Step.process so that the row is neither modified nor passed on to any next step.

class pygrametl.steps.MappingStep(targets, requiretargets=True, next=None, name=None)

Bases: Step

A Step that applies functions to attributes in rows.

Arguments:

  • targets: A sequence of (name, function) pairs. For each element, row[name] is set to function(row[name]) for each row given to the step.

  • requiretargets: A flag that decides whether a KeyError should be raised if a name from targets does not exist in a row. If True, a KeyError is raised; if False, the missing attribute is ignored and not set. Default: True

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.
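The targets and requiretargets semantics of MappingStep can be modelled as a plain function. The apply_targets function below is a hypothetical sketch of the documented behaviour, not pygrametl's code: for each (name, function) pair it sets row[name] = function(row[name]), and when name is missing it either raises KeyError (requiretargets=True) or skips the attribute without creating it (requiretargets=False).

```python
from typing import Callable, Iterable, Tuple


def apply_targets(row: dict, targets: Iterable[Tuple[str, Callable]],
                  requiretargets: bool = True) -> None:
    """Set row[name] = function(row[name]) for each (name, function) pair."""
    for name, function in targets:
        if name in row:
            row[name] = function(row[name])
        elif requiretargets:
            raise KeyError(name)
        # Otherwise the missing attribute is skipped and not created.


targets = [("price", float), ("country", str.upper)]

row = {"price": "9.95", "country": "dk"}
apply_targets(row, targets)
print(row)  # {'price': 9.95, 'country': 'DK'}

partial = {"price": "3"}
apply_targets(partial, targets, requiretargets=False)
print(partial)  # {'price': 3.0}
```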

class pygrametl.steps.PrintStep(next=None, name=None)

Bases: Step

A Step that prints each given row.

Arguments:

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.

class pygrametl.steps.RenamingFromToStep(renaming, next=None, name=None)

Bases: Step

A Step that renames attributes in rows.

Arguments:

  • renaming: A dict with pairs (oldname, newname) which will be used by pygrametl.renamefromto to do the renaming.

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.
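The renaming performed by RenamingFromToStep can be illustrated with a small stand-alone function. rename_from_to below is a hypothetical sketch of the documented (oldname, newname) semantics, not the code of pygrametl.renamefromto: each old key is removed from the row and its value is stored under the new key. In this sketch, a key missing from the row raises KeyError.

```python
def rename_from_to(row: dict, renaming: dict) -> None:
    """Rename keys in place: for each (oldname, newname) pair, move
    row[oldname] to row[newname] and remove oldname."""
    for oldname, newname in renaming.items():
        row[newname] = row.pop(oldname)


row = {"cust_nm": "Ada", "cntry": "DK"}
rename_from_to(row, {"cust_nm": "customername", "cntry": "country"})
print(row)  # {'customername': 'Ada', 'country': 'DK'}
```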

pygrametl.steps.RenamingStep

alias of RenamingFromToStep

class pygrametl.steps.RenamingToFromStep(renaming, next=None, name=None)

Bases: RenamingFromToStep

Arguments:

  • renaming: A dict with pairs (newname, oldname) which will be used by pygrametl.renametofrom to do the renaming.

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.

class pygrametl.steps.SCDimensionStep(dimension, next=None, name=None)

Bases: Step

A Step that performs scdensure(row) on a given dimension for each row.

Arguments:

  • dimension: The dimension object (e.g., a SlowlyChangingDimension) to call scdensure on.

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.

class pygrametl.steps.SourceStep(source, next=None, name=None)

Bases: Step

A Step that iterates over a data source and gives each row to the next step. No rows are passed on until the start method is called.

Arguments:

  • source: The data source. Must be iterable.

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

start()

Start the iteration of the source’s rows and pass them on.
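The push model of SourceStep, where nothing flows until start() is called, can be sketched with a hypothetical stand-in. SketchSourceStep below is not pygrametl's class: its next_step is simply a callable that receives each row (in pygrametl, next is a Step or a Step name), and it only models "iterate over the source and hand each row on".

```python
from typing import Callable, Iterable, Optional


class SketchSourceStep:
    """Hypothetical stand-in: rows are pulled from source only when start() runs."""

    def __init__(self, source: Iterable[dict],
                 next_step: Optional[Callable[[dict], None]] = None):
        self.source = source
        self.next_step = next_step

    def start(self) -> None:
        for row in self.source:
            if self.next_step is not None:  # no next step: rows are not passed on
                self.next_step(row)


collected = []
step = SketchSourceStep(iter([{"id": 1}, {"id": 2}]), next_step=collected.append)
print(collected)  # [] (nothing flows before start())
step.start()
print(collected)  # [{'id': 1}, {'id': 2}]
```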

class pygrametl.steps.Step(worker=None, next=None, name=None)

Bases: object

The basic class for steps in an ETL flow.

Arguments:

  • worker: A function f(row) that performs the Step’s operation. If None, self.defaultworker is used. Default: None

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.

classmethod getstep(name)

Return the Step instance with the given name.

name()

Return the name of the Step instance.

process(row)

Perform the Step’s operation on the given row.

If the row is not explicitly redirected (see _redirect), it will be passed on to the next step if one has been set.
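The core Step behaviour documented above (a worker defaulting to defaultworker, a name registry where the most recently created instance wins, getstep, and a next step that may be a name looked up on every call) can be modelled in a few lines. SketchStep is a hypothetical, simplified model for illustration only: it is not pygrametl's implementation, it renames next to next_step to avoid shadowing Python's built-in next, and it omits explicit redirection (_redirect).

```python
from typing import Callable, Dict, Optional, Union


class SketchStep:
    """Minimal, hypothetical model of the documented Step behaviour."""

    _registry: Dict[str, "SketchStep"] = {}  # name -> most recently created step

    def __init__(self, worker: Optional[Callable[[dict], None]] = None,
                 next_step: Union["SketchStep", str, None] = None,
                 name: Optional[str] = None):
        self.worker = worker if worker is not None else self.defaultworker
        self.next_step = next_step
        self._name = name
        if name is not None:
            SketchStep._registry[name] = self  # a later step with the same name wins

    @classmethod
    def getstep(cls, name: str) -> "SketchStep":
        return cls._registry[name]

    def name(self) -> Optional[str]:
        return self._name

    def defaultworker(self, row: dict) -> None:
        pass  # subclasses override this with their own operation

    def process(self, row: dict) -> None:
        self.worker(row)
        target = self.next_step
        if isinstance(target, str):  # names are resolved on every call
            target = SketchStep.getstep(target)
        if target is not None:
            target.process(row)


seen = []
SketchStep(worker=seen.append, name="sink")
head = SketchStep(worker=lambda row: row.update(total=row["qty"] * row["price"]),
                  next_step="sink")
head.process({"qty": 2, "price": 5})
print(seen)  # [{'qty': 2, 'price': 5, 'total': 10}]

late = []
SketchStep(worker=late.append, name="sink")  # replaces the earlier "sink"
head.process({"qty": 1, "price": 3})
print(late)  # [{'qty': 1, 'price': 3, 'total': 3}]
```

Because head stores the name "sink" rather than an instance, the second call reaches the newer step, which is the dynamic lookup the next argument describes.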

class pygrametl.steps.ValueMappingStep(outputatt, inputatt, mapping, requireinput=True, defaultvalue=None, next=None, name=None)

Bases: Step

A Step that maps values to other values (e.g., DK -> Denmark).

Arguments:

  • outputatt: The attribute to write the mapped value to in each row.

  • inputatt: The attribute to map.

  • mapping: A dict with the mapping itself.

  • requireinput: A flag that decides whether a KeyError should be raised if inputatt does not exist in a given row. If True, a KeyError will be raised when the attribute is missing. If False, outputatt will be set to defaultvalue. Default: True

  • defaultvalue: The default value to use when the mapping cannot be done. Default: None

  • next: The default next step to use. This should be 1) an instance of a Step, 2) the name of a Step, or 3) None. If it is a name, the next step will be looked up dynamically each time. If it is None, no default step will exist and rows will not be passed on. Default: None

  • name: A name for the Step instance. This is used when another Step (implicitly or explicitly) passes on rows. If two instances have the same name, the name is mapped to the instance that was created most recently. Default: None

defaultworker(row)

Perform the Step’s operation on the given row.

Inheriting classes should implement this method.
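The interplay of inputatt, outputatt, mapping, requireinput, and defaultvalue in ValueMappingStep can be sketched as a plain function. map_value below is a hypothetical illustration, not pygrametl's code, and it assumes that "the mapping cannot be done" covers both a missing input attribute (when requireinput is False) and an input value that has no entry in the mapping.

```python
def map_value(row: dict, outputatt: str, inputatt: str, mapping: dict,
              requireinput: bool = True, defaultvalue=None) -> None:
    """Write mapping[row[inputatt]] to row[outputatt], falling back to defaultvalue."""
    if inputatt not in row:
        if requireinput:
            raise KeyError(inputatt)
        row[outputatt] = defaultvalue
        return
    # Assumption: values absent from the mapping also get defaultvalue.
    row[outputatt] = mapping.get(row[inputatt], defaultvalue)


countries = {"DK": "Denmark", "SE": "Sweden"}
rows = [{"code": "DK"}, {"code": "NO"}, {}]
for r in rows:
    map_value(r, "country", "code", countries,
              requireinput=False, defaultvalue="Unknown")

print([r["country"] for r in rows])  # ['Denmark', 'Unknown', 'Unknown']
```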

pygrametl.steps.connectsteps(*steps)

Set a.next = b, b.next = c, etc. when given the steps a, b, c, …
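The chaining done by connectsteps amounts to linking each consecutive pair of arguments. The sketch below uses a hypothetical Node class with a plain next attribute instead of real Step instances, so it illustrates the pairing logic only and makes no claim about how pygrametl stores the next step internally.

```python
class Node:
    """Hypothetical stand-in with a mutable next attribute."""

    def __init__(self, label: str):
        self.label = label
        self.next = None


def connect(*steps: Node) -> None:
    """Set a.next = b, b.next = c, ... for each consecutive pair of steps."""
    for current, following in zip(steps, steps[1:]):
        current.next = following


a, b, c = Node("a"), Node("b"), Node("c")
connect(a, b, c)
print(a.next.label, b.next.label, c.next)  # b c None
```

The last step keeps next set to None, so rows reaching it are not passed on unless it is connected elsewhere.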