Context SQL Sync

The Context SQL Sync REST service allows importing context data (either one-time or via a scheduled continuous job) from an SQL database (PostgreSQL, MySQL, or MS SQL) into TrendMiner.

To conduct a synchronization, the service requires two components: a datasource and a query. The datasource defines the connection to the SQL database from which the data will be fetched. The query provides the SQL query (along with other details, as explained later in this document) that will be used to select rows based on a specified time interval. The Context SQL Sync service will create a new context item in TrendMiner (or update an existing one) for each row selected by the query.

Finally, when doing a live sync, a third component is required: a sync job. The sync job encapsulates details such as the datasource and query that should be used and the sync frequency.

The steps to sync context data are:

  • Request an access token for authentication
  • Define a datasource
  • Define a query
  • Perform a one-time sync
    - or -
  • Define and execute a continuous sync job

Authentication

Currently, only OAuth2 authentication is supported.

An access token can be requested with the TrendMiner security API.

Method: POST
Path: /security/oauth/token
Authorization header: Basic dHJlbmRtaW5lckFwaTpjdmJwa3R6SmVGUENLVk10dFN0ZVQ1ejNlYXlkeUZaYQ==
Body

grant_type: password
username: [TrendMiner username]
password: [TrendMiner password]
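
For illustration, the token request could look like this in Python (a minimal sketch using the requests library; the URL and credentials are placeholders to substitute with your own):

import requests

TM_URL = "https://my-trendminer.example.com"  # placeholder: your <TM-URL>

# Request an OAuth2 access token from the TrendMiner security API.
response = requests.post(
    f"{TM_URL}/security/oauth/token",
    headers={"Authorization": "Basic dHJlbmRtaW5lckFwaTpjdmJwa3R6SmVGUENLVk10dFN0ZVQ1ejNlYXlkeUZaYQ=="},
    data={
        "grant_type": "password",
        "username": "my-username",  # placeholder: TrendMiner username
        "password": "my-password",  # placeholder: TrendMiner password
    },
)
response.raise_for_status()
access_token = response.json()["access_token"]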

Context Sync API

A swagger interface is shipped with your appliance and is available at:

<TM-URL>/context-sql-sync/swagger-ui.html#/

When using the swagger interface, authentication can be done through the browser by logging in to the TrendHub where swagger is hosted, i.e.:

<TM-URL>

Note: When executing requests in swagger, the values in the request body should not contain newlines or tabs. This is especially important when defining the SQL query (see below), because this query is usually very long.
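
Because of this, it can help to flatten a long SQL query to a single line before pasting it into swagger. A minimal Python sketch (the query text is only an illustration):

import re

sql = """
SELECT id, componentId, type, description,
       start_event, start_time, stop_event, stop_time
FROM LOGBOOK
WHERE :startDate <= start_time AND start_time < :endDate;
"""

# Collapse every run of whitespace (newlines, tabs, spaces) into one space.
print(re.sub(r"\s+", " ", sql).strip())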

Define a datasource

Note: see the Kerberos and Integrated Authentication section below when a datasource with integrated security should be used.

The datasource contains the details necessary to connect to the SQL database, such as the type of the database, the address and port, and the credentials which should be used to authenticate. New datasources can be defined by the API:

Method: POST
Path: /context-sql-sync/datasource
Authorization header: Bearer [access_token]
Body
{
  "name": "localdb",
  "jdbcUrl": "jdbc:postgresql://localhost:5432/tm_context_sql_sync",
  "dbType": "POSTGRES",
  "username": "postgres",
  "password": "postgres"
}
Parameter Description

name
An arbitrary string.

jdbcUrl
A JDBC URL specifying the connection to the database, e.g.:
jdbc:mysql://192.0.2.1:3306/database
jdbc:postgresql://localhost:5432/database
jdbc:sqlserver://localhost:1433;databaseName=database

dbType
Specifies the type of the database. The supported values are:
MYSQL
POSTGRES
MSSQL
Contact your TrendMiner CSM in case of other databases to see how we could help out.

username/password
The credentials used to access the database; can be omitted if the database does not require authentication.

The result will contain the id of the newly created datasource. This id will be used later when executing a historical sync or when defining sync jobs.
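
As an illustration, the request above could be sent from Python (a sketch using the requests library; it assumes the access token from the Authentication section and that the response exposes the new id under an "id" field):

import requests

TM_URL = "https://my-trendminer.example.com"  # placeholder: your <TM-URL>
access_token = "..."  # token obtained as described under Authentication

datasource = {
    "name": "localdb",
    "jdbcUrl": "jdbc:postgresql://localhost:5432/tm_context_sql_sync",
    "dbType": "POSTGRES",
    "username": "postgres",
    "password": "postgres",
}

response = requests.post(
    f"{TM_URL}/context-sql-sync/datasource",
    headers={"Authorization": f"Bearer {access_token}"},
    json=datasource,
)
response.raise_for_status()
datasource_id = response.json()["id"]  # assumption: id is returned as "id"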

Define a Query

New queries can be defined by the API:

Method: POST
Path: /context-sql-sync/query
Authorization header: Bearer [access_token]
Body
{
  "name": "queryAlias",
  "supportsParameters": true,
  "supportsNamedParameters": true,
  "componentLookupStrategy": "ASSETS|TAGS|ASSETS_THEN_TAGS",
  "fieldNames": "customField1, customField2",
  "validateFieldNames": true,
  "timeZone": "string",
  "sqlString": "Select id, componentId, type, description, start_event, start_time, stop_event, stop_time, customField1, customField2 from LOGBOOK where :startDate <= start_time and start_time < :endDate;"
  "typeMapping": [
    { "PRODUCTION": "MAPPED_TYPE" },
    { "SOURCE_TYPE": "TM_TYPE" }
  ]
}
Parameter Description
name An arbitrary string.
supportsParameters

A boolean value (true|false) indicating whether the SQL query contains positional parameters. The default is true.

Positional parameters are specified with question marks '?' in the SQL query. The query should contain exactly two parameters - they will be replaced with the start and end time of the requested time interval respectively.

supportsNamedParameters

A boolean value (true|false) indicating whether the SQL query contains named parameters. The default is true. If this property is set to true, it will take precedence over supportsParameters.

Named parameters are specified with the ':startDate' and ':endDate' (without the quotes) values in the SQL query - they will be replaced with the start and end time of the requested time interval respectively.

componentLookupStrategy A value specifying how to interpret the componentId returned by the query. The allowed values are ASSETS (componentId is interpreted as the external id of an asset), TAGS (componentId is interpreted as a tag name), and ASSETS_THEN_TAGS (componentId is interpreted as an asset and, if no asset is found, then componentId is interpreted as a tag name). The default is ASSETS.
fieldNames

A comma-separated list of values, each representing the name of a column in the SQL query (see below). These columns represent custom fields to be associated with the newly created or updated context items. This property is optional and defaults to null. Bear in mind that the specified values are validated against the SQL query and the request will fail if the field names are not found in the SELECT clause.

If the fields exist on the context type associated with the item, the values of the columns specified by this property will be imported as values of the fields with the respective technical names. If they don’t exist, then the column values will be imported as general properties of the context item.

validateFieldNames

Optional. If missing or set to true, field names specified in the query will be validated. If set to false, the validation will be skipped.

timeZone

Specifies the timezone of the dates in the SQL database. The default is UTC. The list of supported timezones can be found at the following endpoint:

<TM-URL>/hps/api/monitoring/options/timezone/available

or in the list of tz database time zones:

https://en.wikipedia.org/wiki/List_of_tz_database_time_zones

 

sqlString

The string of the SQL query that will be executed against the datasource to fetch the items. The query is required to return (at least) the following columns (in order): id, componentId, type, description, startEvent, startTime, endEvent, endTime. The query can contain additional columns, but they should come after these first eight. The extra columns are often necessary when using custom fields.

Additional information about the individual columns can be found below:

 

Column Details

id

A unique identifier for each row in the database, which will be prepended with a prefix (see externalIdPrefix below) during the sync process. The result is used as the external identifier of the context item, so every row with a unique identifier will result in a single context item.

The id should not contain spaces.

If the id is not available, it can be constructed from other columns in the database. The only requirement is that it is unique and does not change between requests.

componentId

The componentId should identify an already existing asset or tag in TrendMiner (according to the currently used lookup strategy, as specified by componentLookupStrategy). For assets this should be the ExternalId; for tags, the tag name.

Note: To get an overview of the synced asset structure and the external ids of the different assets, you can use the "/assets/browse" endpoint of the TrendMiner Assets API. The external id of an asset is given by its "externalId" field.

Note: For assets imported via CSV, the ExternalId is the asset’s path.

type

Refers to the "Context Item Type" as defined in the ContextHub Config by the TrendMiner Admin. Values can be mapped to an existing type in TrendMiner in the API request. In case none of the existing types matches, new types can be configured by the Admin in ContextHub > Config > Context item types.

The type will be looked up in typeMapping (see below) and the associated value should be the technical identifier of the type to use for the context item.

description

Additional information (as a string) that can be included for the context item. Can be NULL if not present.

startEvent

Corresponds to the start-state of the workflow used. This field is currently ignored and can be hardcoded to any value, but the column should be present.

startTime

The start (timestamp) of the event in "YYYY-MM-DD hh:mm:ss" format.

endEvent

Corresponds to the end-state of the workflow used. This field is currently ignored and can be hardcoded to any value, but the column should be present.

endTime

The end (timestamp) of the event in "YYYY-MM-DD hh:mm:ss" format.

An example SQL query:

SELECT
id,
componentId,
type,
description,
start_event,
start_time,
stop_event,
stop_time
FROM some_table
WHERE :startDate <= start_time and start_time < :endDate;

In addition to the required columns, the SQL query could also select:

  • Zero or more columns to be used as custom fields. The names of these columns should also be included as a comma-separated list in the “fieldNames” property (see above).
  • The lastModifiedDate. If the datasource contains a lastModifiedDate column, the query can be modified to select records based on this column (rather than the startTime).
typeMapping

A table used to map the values in the type column (see sqlString above) to actual context types in TrendMiner. Even if the type column contains a type that exists, there should be an entry in the typeMapping table that maps the type to itself, e.g.:

"typeMapping": [
 { "PRODUCTION": "PRODUCTION" }
]

 

If the datasource contains a lastModifiedDate column, the query can be modified to select records based on their lastModifiedDate. For example:

{
...
  "sqlString": "Select id, componentId, type, description, start_event, start_time, stop_event, stop_time, last_modified_date from LOGBOOK where :startDate <= last_modified_date and last_modified_date < :endDate;"
...
}

This way, if an item is modified, it will be selected for sync, even if its start time is untouched.

The result will contain the id of the newly created query. This id will be used later.
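
Tying this together, a query could be defined from Python like so (a sketch under the same assumptions as the datasource example; the query alias is illustrative, and the SQL uses the named-parameter style shown above):

import requests

TM_URL = "https://my-trendminer.example.com"  # placeholder: your <TM-URL>
access_token = "..."  # token obtained as described under Authentication

query = {
    "name": "logbookQuery",
    "supportsNamedParameters": True,
    "componentLookupStrategy": "ASSETS",
    "timeZone": "UTC",
    # Kept on a single line, as required when pasting into swagger.
    "sqlString": (
        "SELECT id, componentId, type, description, start_event, start_time, "
        "stop_event, stop_time FROM LOGBOOK "
        "WHERE :startDate <= start_time AND start_time < :endDate;"
    ),
    # Identity mapping: the PRODUCTION type already exists in TrendMiner.
    "typeMapping": [{"PRODUCTION": "PRODUCTION"}],
}

response = requests.post(
    f"{TM_URL}/context-sql-sync/query",
    headers={"Authorization": f"Bearer {access_token}"},
    json=query,
)
response.raise_for_status()
query_id = response.json()["id"]  # assumption: id is returned as "id"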

 

Historical Sync

This endpoint can be used to run a one-time synchronization of context items over a specified time interval.

Method: POST
Path: /context-sql-sync/sync
Authorization header: Bearer [access_token]
Body
{
  "datasourceId": "1",
  "queryId": "2",
  "externalIdPrefix": "prefix",
  "startDate": "2018-10-16 12:29:01",
  "endDate": "2018-10-18 20:29:01",
  "chunkSize": "2"
}
Parameter Description
datasourceId The id of the datasource against which to run the sync.
queryId The id of the query to be used for the sync.
externalIdPrefix The prefix that will be prepended to the id obtained from the database in order to produce the external id of the context item. This prefix allows supporting the same external IDs for different data sources.
startDate The start of the sync interval. This is optional and can be omitted if the respective SQL query does not have parameters.
endDate The end of the sync interval. This is optional and can be omitted if the respective SQL query does not have parameters.
chunkSize The sync interval (endDate - startDate) will be split into subintervals whose length is specified by this value (expressed in minutes). This value is optional and defaults to null, indicating that the entire interval will be synced in one go. The value is also ignored if startDate and endDate are not specified.

The startDate and endDate can be omitted. In this case the associated query will be run only once without parameters.

If the startDate and endDate are specified, the resulting interval is split into subintervals based on chunkSize. The SQL query is then executed successively for each subinterval, with the start and end of the subinterval passed to the query as parameters.

The historical sync provides a means to perform a “dry” run, where the data is fetched from the database and returned to the caller, but no actual context items are created. To use the dry run:

Path: /context-sql-sync/sync/test
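
For example, the sync above could be triggered from Python (a sketch assuming the datasource and query ids obtained earlier; point the request at /context-sql-sync/sync/test first to verify the selected rows without creating items):

import requests

TM_URL = "https://my-trendminer.example.com"  # placeholder: your <TM-URL>
access_token = "..."  # token obtained as described under Authentication

sync_request = {
    "datasourceId": "1",  # id returned when the datasource was created
    "queryId": "2",       # id returned when the query was created
    "externalIdPrefix": "prefix",
    "startDate": "2018-10-16 12:29:01",
    "endDate": "2018-10-18 20:29:01",
    # Split the interval into 120-minute chunks; the query runs once per
    # chunk, with the chunk's start and end passed as parameters.
    "chunkSize": "120",
}

# Replace "sync" with "sync/test" for a dry run that creates no items.
response = requests.post(
    f"{TM_URL}/context-sql-sync/sync",
    headers={"Authorization": f"Bearer {access_token}"},
    json=sync_request,
)
response.raise_for_status()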

Live Sync

To use live sync, a sync job must be defined first:

Method: POST
Path: /context-sql-sync/syncjob
Authorization header: Bearer [access_token]
Body
{
  "datasourceId": "1",
  "queryId": "2",
  "externalIdPrefix": "prefix",
  "syncJobInterval": "10",
  "chunkSize": "2"
}
Parameter Description
datasourceId The id of the datasource against which to run the sync.
queryId The id of the query to be used for the sync.
externalIdPrefix The prefix that will be prepended to the id obtained from the database in order to produce the external id of the context item. This prefix allows supporting the same external IDs for different data sources.
syncJobInterval The interval (expressed in minutes) between subsequent job executions.
chunkSize The sync interval will be split into subintervals with length specified by this value (expressed in minutes).

The result will contain the id of the newly created sync job. This id can be used to start and stop the job.

To start the job:

Method: POST
Path: /context-sql-sync/syncjob/{id}/start
Authorization header: Bearer [access_token]

To stop a job:

Method: POST
Path: /context-sql-sync/syncjob/{id}/stop
Authorization header: Bearer [access_token]

After the job is started, the first time it is executed is calculated based on the syncJobInterval value. For example, if the job has syncJobInterval=10 and the job is started at 15:12, the initial job execution will happen at 15:20.

To get a list of all currently defined sync jobs:

Method: GET
Path: /context-sql-sync/syncjob
Authorization header: Bearer [access_token]

A sync job is started if its “scheduled” field is true.
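
The live-sync lifecycle could then be scripted as follows (a sketch under the same assumptions as the earlier snippets):

import requests

TM_URL = "https://my-trendminer.example.com"  # placeholder: your <TM-URL>
access_token = "..."  # token obtained as described under Authentication
headers = {"Authorization": f"Bearer {access_token}"}

# Define a job that runs every 10 minutes, syncing in 2-minute chunks.
job = {
    "datasourceId": "1",
    "queryId": "2",
    "externalIdPrefix": "prefix",
    "syncJobInterval": "10",
    "chunkSize": "2",
}
response = requests.post(f"{TM_URL}/context-sql-sync/syncjob",
                         headers=headers, json=job)
response.raise_for_status()
job_id = response.json()["id"]  # assumption: id is returned as "id"

# Start the job; stop it later via .../syncjob/{id}/stop.
requests.post(f"{TM_URL}/context-sql-sync/syncjob/{job_id}/start",
              headers=headers).raise_for_status()

# List all defined jobs; a job is running if its "scheduled" field is true.
jobs = requests.get(f"{TM_URL}/context-sql-sync/syncjob",
                    headers=headers).json()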

 

Kerberos and Integrated Authentication

When integrated security should be used to connect to the context items datasource, Kerberos must be configured and used by tm-context-sql-sync.

  • Create a Kerberos configuration file named krb5.conf. Example:

[libdefaults] 
default_realm = YYYY.CORP.CONTOSO.COM 
dns_lookup_realm = false 
dns_lookup_kdc = true 
ticket_lifetime = 24h 
forwardable = yes 

[domain_realm] 
.yyyy.corp.contoso.com = YYYY.CORP.CONTOSO.COM 
.zzzz.corp.contoso.com = ZZZZ.CORP.CONTOSO.COM  

[realms]
YYYY.CORP.CONTOSO.COM = {
  kdc = krbtgt/YYYY.CORP.CONTOSO.COM@YYYY.CORP.CONTOSO.COM
  default_domain = YYYY.CORP.CONTOSO.COM
}

ZZZZ.CORP.CONTOSO.COM = {
  kdc = krbtgt/ZZZZ.CORP.CONTOSO.COM@ZZZZ.CORP.CONTOSO.COM
  default_domain = ZZZZ.CORP.CONTOSO.COM
}

  • Copy krb5.conf to the TrendMiner appliance folder, i.e. /mnt/data/fileserve
  • In Consul’s key/value store, define the property “java.security.krb5.conf” for the tm-context-sql-sync service. Set its value to “./krb5.conf”.
  • Configure the datasource as follows:

{
  "name": "MSSQL_IntegratedSecurity",
  "jdbcUrl": "jdbc:sqlserver://tm-mssql21.yyyy.corp.contoso.com:1433;integratedSecurity=true;authenticationScheme=JavaKerberos;databaseName=tm_context",
  "dbType": "MSSQL",
  "username": "user_name@YYYY.CORP.CONTOSO.COM",
  "password": "******"
}

Important notes 

  • The access token will expire after 12 hours. Best practice is to request an access token before every API call.
  • Only SQL Server 2008 and later are supported. Earlier versions of MS SQL may give the following error when executing the SQL query: “Cannot find data type datetimeoffset.”