User guide#

Introduction#

Granta MI Data Flow can trigger Python scripts at the beginning or end of a workflow step. These Python scripts can execute custom business logic, including interacting with Granta MI systems using the Granta MI Scripting Toolkit or PyGranta suite of packages. Some typical use cases include:

  • Populating attributes with values computed within the Python script

  • Analyzing data

  • Generating links to other records

  • Interacting with external systems

The dataflow-extensions package includes the code required to process the state information from Data Flow, to pass log information back to Data Flow, and to return execution to Data Flow once the script is complete.

It also includes examples which demonstrate use of the library. These examples can be extended to incorporate business logic for specific use cases. In particular, the Standalone example gives a detailed description of the core components of a typical dataflow-extensions script.

The rest of this user guide provides more detail around specific aspects of the interaction with Data Flow.

Integration with MI Data Flow#

This package is designed to be used with Granta MI Data Flow. The integration works as follows:

  1. At a defined point in a workflow, MI Data Flow triggers a Python script and pauses the workflow until the script resumes it.

  2. The Python script executes, potentially utilizing additional Ansys or third-party Python packages.

  3. At a defined point in the Python script (generally the end), the Python script instructs MI Data Flow to resume the workflow.

  4. The Python script ends.
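As an illustration of this flow, the sketch below outlines the typical shape of a dataflow-extensions script. It is a simplified sketch, not the definitive structure: the import path is inferred from the package logger name documented in Logging and debugging, the step_logic function name follows the examples described later in this guide, and the resume_bookmark() call is an assumed method name for returning execution to Data Flow, so check the package API reference or the Standalone example for the exact call and arguments.

import logging

# Import path inferred from the package logger name documented later in this guide.
from ansys.grantami.dataflow_extensions import MIDataflowIntegration

logger = logging.getLogger()


def step_logic(dataflow_integration):
    """Custom business logic goes here (step 2 in the list above)."""
    logger.info("Executing business logic")


def main():
    # Step 1: MI Data Flow has triggered this script. The constructor parses the
    # MI Data Flow payload from stdin.
    dataflow_integration = MIDataflowIntegration()

    # Step 2: execute the business logic.
    step_logic(dataflow_integration)

    # Step 3: instruct MI Data Flow to resume the workflow. The method name and
    # arguments here are assumptions; see the package API reference.
    dataflow_integration.resume_bookmark()


if __name__ == "__main__":
    main()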

MI Data Flow payload#

When MI Data Flow triggers a Python script, it provides context about the current state of the workflow as a JSON-formatted string. This string is referred to as the ‘MI Data Flow payload’, and an example is given below:

{
    "WorkflowId": "67eb55ff-363a-42c7-9793-df363f1ecc83",
    "WorkflowDefinitionId": "Example; Version=1.0.0.0",
    "TransitionName": "Python_83e51914-3752-40d0-8350-c096674873e2",
    "Record": {
        "Database": "MI_Training",
        "Table": "Metals Pedigree",
        "RecordHistoryGuid": "d2f51a3d-c274-4a1e-b7c9-8ba2976202cc"
    },
    "WorkflowUrl": "http://my_server_name/mi_dataflow",
    "AuthorizationHeader": "",
    "ClientCredentialType": "Windows",
    "Attributes": {
        "Record": {"Value": ["d2f51a3d-c274-4a1e-b7c9-8ba2976202cc+MI_Training"]},
        "TransitionId": {"Value": "9f1bf6e7-0b05-4cd3-ac61-1d2d11a1d351"}
    },
    "CustomValues": {}
}

This payload includes the following information:

  • Internal data flow identifiers, including:

    • Workflow ID

    • Workflow definition ID

    • Transition name

  • Workflow record reference and table name

  • MI Data Flow server access URL

  • Server authorization information

  • Custom values defined in the workflow definition

Note

If MI Data Flow is configured in Basic or OIDC authentication mode, the server authorization information contains an obfuscated username and password or an OIDC refresh token respectively. In these configurations, the payload should be treated as confidential.

When a dataflow-extensions-based Python script is launched by MI Data Flow, the MIDataflowIntegration constructor automatically parses the payload from stdin. However, when developing and debugging a dataflow-extensions-based script, it is recommended to run and debug the script separately from Data Flow by first generating a Data Flow payload, and then using it to instantiate the MIDataflowIntegration class. These steps are described in Business logic development best practice.

Business logic development best practice#

The steps below assume you are proficient in the use of MI Data Flow Designer and MI Data Flow Manager, and already have a workflow fully defined with all required features apart from Python script execution. For more information on working with MI Data Flow Designer, see the Granta MI Data Flow Designer documentation.

Obtaining an MI Data Flow payload for a workflow step#

Copy one of the example scripts and use it to obtain a JSON-formatted Data Flow payload, which makes development much more straightforward. The steps to obtain the payload are described below:

  1. Copy the code block from the Standalone example or Granta MI Scripting Toolkit example to a local .py file.

    • If you are starting from the Scripting Toolkit example, you must make sure that Scripting Toolkit is installed.

    • If you plan to develop PyGranta-based business logic, start from the Standalone example.

  2. Switch the script to ‘main’ mode by commenting out testing() in the if __name__ == "__main__": block and un-commenting main():

    if __name__ == "__main__":
        main()  # Used when running the script as part of a workflow
        # testing()  # Used when testing the script manually
    
  3. Upload the script into MI Data Flow Designer, and add it to the Start or End Script sections for the relevant step.

  4. Run the workflow step once in MI Data Flow Manager.

  5. Obtain the payload:

    • If you started from the Standalone example, obtain the payload from the Data Flow log. See Logging and debugging for log file locations.

    • If you started from the Scripting Toolkit example, obtain the payload from the Additional Processing Notes attribute.

You should now have a JSON-formatted string which contains information specific to your deployment of Granta MI, including the Data Flow web server URL and internal workflow identifiers.
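For reference, the mechanism by which the payload reaches the log can be sketched as follows: the step logic serializes the payload with get_payload_as_string() and prints it, and the printed output is included in the central Data Flow log when the script terminates. This is a simplified sketch; the exact arguments accepted by get_payload_as_string(), including credential handling, are described in the method documentation.

def step_logic(dataflow_integration):
    # Serialize the payload received from MI Data Flow and write it to stdout,
    # which is included in the central Data Flow log after the script completes.
    payload = dataflow_integration.get_payload_as_string()
    print(payload)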

Developing business logic#

Now the MI Data Flow payload has been obtained, it can be used to test your custom business logic separate to the workflow. This makes it much faster to re-run the script, and allows running and debugging the script in an IDE. The steps to use this payload to develop your custom business logic are described below:

  1. Optional: If you are planning to develop a PyGranta-based script, replace the code you copied previously with the PyGranta RecordLists example, and modify the PyGranta-based code as required.

  2. Paste the payload JSON into the testing() function:

    def testing():
        """Contains a static copy of an MI Data Flow data payload for testing purposes."""

        # Paste payload below
        dataflow_payload = { ... }

        # Call the MIDataflowIntegration constructor with the "dataflow_payload"
        # argument instead of reading data from MI Data Flow.
        dataflow_integration = MIDataflowIntegration.from_dict_payload(
            dataflow_payload=dataflow_payload,
            use_https=False,
        )
        step_logic(dataflow_integration)

    See the documentation for the get_payload_as_string() and get_payload_as_dict() methods for more information, including how to handle Basic and OIDC authentication.

  3. Switch back to ‘testing’ mode by commenting out main() in the if __name__ == "__main__": block and un-commenting testing():

    if __name__ == "__main__":
        # main()  # Used when running the script as part of a workflow
        testing()  # Used when testing the script manually
    
  4. Add your specific logic to the step_logic function and test locally. A brief sketch of a step_logic function is shown after this list.

  5. Once the business logic is implemented, switch back to main() in the if __name__ == "__main__": block, re-upload the file into MI Data Flow Designer, and re-add it to the Start or End Script sections.

  6. Update the workflow and test from within MI Data Flow Manager.

Repeat steps 3 to 6 as required.
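As an example of step 4, the sketch below adds minimal business logic to step_logic. It assumes that get_payload_as_dict() returns the payload with the structure shown earlier in this guide, and that a logger has been configured as described in Logging and debugging; replace the body with your own logic.

def step_logic(dataflow_integration):
    """Example business logic: log details of the record being processed."""
    payload = dataflow_integration.get_payload_as_dict()
    record = payload["Record"]
    logger.info(f"Processing record {record['RecordHistoryGuid']}")
    logger.info(f"Database: {record['Database']}, table: {record['Table']}")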

Logging and debugging#

It is generally required to log outputs from scripts to help with debugging and to understand the inner state of the script. These use cases apply to this package as well, but because the script is executed as part of MI Data Flow, the recommended best practices are different to those of a conventional Python script.

This package supports multiple approaches to logging, summarized below and detailed in the corresponding sections:

  • Stream logging: Recommended for general logging. The stdout and stderr streams are collected by MI Data Flow on script completion and included in the central Data Flow log. The streams are also available as files in the script working directory on the Granta MI server.

  • Direct logging: Recommended for specific logging in scripts expected to run for a few minutes or more. Log messages can be sent directly to MI Data Flow via the Data Flow API. Log messages sent to MI Data Flow are available immediately, and can be viewed both for the workflow instance via the MI Data Flow Manager Dashboard, and also in the central Data Flow logs. This allows the script to report progress during execution.

Logs are found in the following locations:

  • Data Flow central logs:

    • Via the API at http://my.server.name/mi_dataflow/api/logs

    • On the Granta MI server in the ProgramData folder

  • Individual workflow instance logs:

    • Available in the MI Data Flow Manager Dashboard for the workflow instance.

    • Available in the working directory associated with the run. See Python script working directory.

Stream logging#

Using print()#

A very simple approach to logging the output of a script is to use the print() function to write text to the terminal. This approach can be used with this package, and any printed messages are visible in the central Data Flow log after the script terminates.
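For example, the following messages would appear in the central Data Flow log once the script terminates:

# Printed messages are written to stdout, which MI Data Flow collects on completion
print("Starting step logic")
print("Step logic complete")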

Using the logging library#

Using the print() function offers limited control over log format and message filtering. Instead, the recommended approach is to use the Python logging module. For more information, see the Python logging documentation.

The internal operations of this package are logged to a logger with the name ansys.grantami.dataflow_extensions. By default, these messages are not output. To output the messages generated by this package and to add your own log messages, you should:

  1. Create a logger for your script.

  2. Attach a handler to the logger.

The stdout and stderr streams are collected by MI Data Flow once script execution has completed and are included in the central Data Flow log. Use the built-in Python logging module to create a logger that writes to stderr:

import logging

# Create an instance of the root logger
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Add a StreamHandler to write the output to stderr
ch = logging.StreamHandler()
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
ch.setFormatter(formatter)
logger.addHandler(ch)

Direct logging#

It is possible to send log messages directly to MI Data Flow via the Data Flow API. This is useful for long-running scripts where it is desirable to see progress messages in the MI Data Flow Manager Dashboard before the script has completed and the standard streams are collected.

This package provides two mechanisms to send log messages directly to MI Data Flow:

Individual messages#

Log messages can be sent directly to MI Data Flow via log_msg_to_instance(). Log messages sent to MI Data Flow are available immediately. This can be useful to report progress during long-running scripts. For example:

dataflow_integration = MIDataflowIntegration()
dataflow_integration.log_msg_to_instance("Script started", level="Info")

The level argument corresponds to the log levels supported by the Data Flow API.

Using a logging handler#

A custom logging handler is provided by this package via the get_api_log_handler() method. This method returns a MIDataflowApiLogHandler that can be used with the standard logging library:

dataflow_integration = MIDataflowIntegration()

# Create logger
api_logger = logging.getLogger("api_logger")
api_logger.setLevel(logging.INFO)
# Create API log handler
api_log_handler = dataflow_integration.get_api_log_handler()
api_log_handler.setLevel(logging.INFO)
# Add the handler to the logger
api_logger.addHandler(api_log_handler)

# Log messages
api_logger.info("Long running task: 50% complete.") # <- this message is sent to the Data Flow logs via an HTTP request

Log records emitted at standard Python log levels (for example logging.Logger.debug() and logging.Logger.info()) are mapped to supported MI Data Flow log levels. By default, log records emitted with a custom Python log level are not supported by MIDataflowApiLogHandler, but support can be added by defining a subclass with custom log levels.

Warning

Do not attach the MIDataflowApiLogHandler to the root logger. Instead, use a separate named logger.

If the MIDataflowApiLogHandler is attached to the root logger, all log records emitted by the Python instance will be sent to the Data Flow API as individual HTTP requests. This includes the dataflow-extensions package and any other Python dependencies that implement logging.

Logging best practices#

The script run by MI Data Flow should configure a root logger that captures all log messages, and should attach a StreamHandler to write log messages to stderr. stderr is collected by MI Data Flow and included in the central Data Flow log on script completion.

This ensures that log messages generated by both the script and this package are captured and logged centrally.

If additional custom logging is required, additional handlers can be attached to the root logger or to specific loggers as required.

Warning

Using FileHandler objects with Data Flow Python scripts is not recommended.

If you use a FileHandler you must ensure that each instance of the script writes its logs to a different file, or you may encounter a PermissionError. In certain authentication modes the script executes as the active Data Flow user, and so either multiple users could run the same script concurrently, or a user may try to append to a file created by a different user.
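If file-based logging is nevertheless required, the contention described above can be avoided by giving each run its own log file. The sketch below is a generic workaround, not a pattern provided by this package; the file location and naming are illustrative only.

import logging
import tempfile
import uuid
from pathlib import Path

logger = logging.getLogger()

# Use a per-run file name so that concurrent runs, or runs by different Data Flow
# users, never attempt to write to the same file.
log_file = Path(tempfile.gettempdir()) / f"dataflow_script_{uuid.uuid4()}.log"
file_handler = logging.FileHandler(log_file)
file_handler.setFormatter(
    logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
)
logger.addHandler(file_handler)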

Create a logger#

Python logger objects are hierarchical, and messages are passed from lower-level logger objects to higher-level ones. The root of the logger hierarchy is the root logger, which receives all messages logged by all loggers in a Python instance.

For the single-module scripts generally used with this package, it is recommended to use the root logger directly to ensure that all log messages are included in the output. To create an instance of the root logger and have it capture log messages of logging.DEBUG level and higher, use the following code:

import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

You can then add log statements to the logger at a certain log level as follows:

logger.debug("This is a debug message")
logger.info("This is an info message")

Note

Until a log handler is attached, no log messages are emitted.

Attach a stream handler#

A StreamHandler is used to write log messages to stderr or stdout.

For code using this package, it is best practice to log to stderr, which is collected by MI Data Flow and included in the central Data Flow log. To add a StreamHandler to the root logger from the previous section, use the following code:

ch = logging.StreamHandler()
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

Direct logging (optional)#

To send log messages directly to MI Data Flow, see the Direct logging section.

Python script working directory#

MI Data Flow creates a working directory on the server in %WINDIR%\TEMP\{workflow_id}_{8.3}, where {workflow_id} is the workflow ID provided in MI Data Flow Designer when uploading the workflow, and {8.3} is a random set of 8 alphanumeric characters, a period, and 3 alphanumeric characters. This path can be found by right-clicking the active workflow in MI Data Flow Manager and selecting ‘View Log’.

This directory includes the two files __stderr__ and __stdout__, which contain the Python stderr and stdout streams respectively. These files are useful when investigating Python failures during workflow execution before the logger has been initialized.
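If it is unclear which folder corresponds to a given run, the script itself can report its location. This is a generic sketch that assumes the script is executed from the working directory described above, which is worth verifying for your deployment; the output appears in __stdout__ and in the central Data Flow log.

import sys
from pathlib import Path

# Report where the script is running from. This assumes the process working
# directory is the folder created by MI Data Flow.
print(f"Current working directory: {Path.cwd()}")
print(f"Script location (sys.path[0]): {sys.path[0]}")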

Note

When the workflow resumes, this folder and all its contents are deleted. They are only persisted if the workflow is manually cancelled.

Supporting files#

It is common for Python scripts to depend on additional supporting files, for example:

  • Additional Python submodules

  • Data files, such as JSON or CSV files

  • Certificate Authority (CA) certificate files

These files can either be stored in a known location on disk and referred to explicitly via an absolute path, or they can be added to the workflow definition in MI Data Flow Designer:

Storing files externally#

If the file is stored externally (for example in a folder C:\DataflowFiles), then you should use the Path class to ensure you are using an absolute path, which is independent of the Python working directory. For example:

import pathlib

my_path = pathlib.Path(r"C:\DataflowFiles\my_data.csv")

Or in the case of providing a custom CA certificate to the MIDataflowIntegration constructor:

my_cert = pathlib.Path(r"C:\DataflowFiles\my_cert.crt")
dataflow = MIDataflowIntegration(certificate_file=my_cert)

  • The advantage of this approach is that files can easily be shared across workflow definitions, and do not need to be uploaded to each one separately.

  • The disadvantage is that the files are stored outside of the workflow definition, and do not get automatically uploaded or downloaded from the server when using MI Data Flow Manager.

Storing files within the workflow definition#

If the file is stored within the workflow definition, then MI Data Flow makes these files available on disk at script runtime. To access these files, use the supporting_files_dir property. For example, to access a CSV file which was uploaded as a supporting file to MI Data Flow:

dataflow = MIDataflowIntegration()
my_path = dataflow.supporting_files_dir / "my_data.csv"

If you are providing a custom CA certificate to the MIDataflowIntegration constructor, the filename can be provided as a string, and dataflow-extensions automatically looks for the file in this location:

my_cert = "my_cert.crt"
dataflow = MIDataflowIntegration(certificate_file=my_cert)

The advantage of this approach is that files are managed by MI Data Flow Designer and are automatically included in the workflow definition if it is uploaded or downloaded and transferred to a different system. However, the disadvantage is that each workflow definition tracks the supporting files separately, and so every workflow needs to be modified separately if a commonly used supporting file is changed.

Warning

This property relies on sys.path, specifically that sys.path[0] refers to the location of the executing script. If you intend to use supporting files with your Python scripts, you must not prepend additional paths to sys.path.
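If you do need to make additional folders importable (for example, a shared module directory), append to sys.path rather than prepending, so that sys.path[0] continues to refer to the executing script. The folder in this sketch is purely illustrative:

import sys

# Append rather than insert at position 0, so that sys.path[0] still refers to
# the location of the executing script. The path below is a hypothetical example.
sys.path.append(r"C:\SharedPythonModules")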