TH NextGen - Indexing explained

TrendMiner uses a caching mechanism, referred to as indexing, to enable fast, interactive visualization and analysis of your time-series data. The index is used in all algorithms (search, diagnose, monitor and predict). Whenever a tag is accessed for the first time on a TrendMiner setup (typically by adding it to the active tag list), the tag goes through an indexing process. Once a tag has been fully indexed, its index is available to all users of the TrendMiner installation, subject to their data access permissions. TrendMiner keeps these indexes up to date by appending data at regular intervals, without requiring any interaction from you, so you always have fast access to recent data for analysis.
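The sharing behaviour described above can be modelled as a simple installation-wide cache: the first access by any user starts indexing, later accesses reuse the same index, and permissions are still checked per user. This is a minimal, hypothetical Python sketch; the `IndexRegistry` class, its state strings and the per-user permission map are assumptions for illustration and do not reflect TrendMiner's internal implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class IndexRegistry:
    """Hypothetical model of a shared, installation-wide tag index cache."""
    indexed: Dict[str, str] = field(default_factory=dict)          # tag -> index state
    permissions: Dict[str, Set[str]] = field(default_factory=dict)  # user -> readable tags

    def access(self, user: str, tag: str) -> str:
        # The index is shared, but data access permissions are checked per user.
        if tag not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not read {tag}")
        # The first access by any user starts indexing; later accesses reuse it.
        if tag not in self.indexed:
            self.indexed[tag] = "indexing"
        return self.indexed[tag]


registry = IndexRegistry(permissions={"ann": {"TI-101"}, "bob": {"TI-101"}})
print(registry.access("ann", "TI-101"))  # indexing
registry.indexed["TI-101"] = "indexed"   # simulate indexing finishing
print(registry.access("bob", "TI-101"))  # indexed
```

Bob never triggers a second indexing run: he reuses the index Ann's first access created.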

Indexing explained

There are two important configurable parameters related to indexing:

  • the index resolution: determines how granular your analysis will be.
  • the index horizon: determines the start date of the index that will be created.

By default, the index resolution of a TrendMiner installation is set to 1 minute and the index horizon to 1 January 2015. The TrendMiner administrator can modify both parameters.

The pictures below give a conceptual explanation of how an analog tag is indexed and how the resolution setting affects the resulting index. The first image shows the original data, as stored in the database, over a period of 1 minute. Every blue dot represents a time-series value stored in the database (these values may already be compressed, depending on the database's storage settings).

IE_1.png

The configured index resolution determines how many data points TrendMiner receives from the database. With a resolution of 10 seconds, TrendMiner requests data in 10-second intervals. TrendMiner's historian connectors are designed to return the most significant data for each interval. For an OSIsoft PI connector, this means you receive up to 4 data points per interval: the start and end values, and the minimum and maximum values within the interval.
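As an illustration, the reduction below keeps at most four points per interval (first, last, minimum and maximum), mirroring the connector behaviour described above. It is a simplified sketch: the function name `summarize` and the `(timestamp, value)` representation are assumptions, and in practice this reduction is performed by the connector and historian, not by user code.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp in seconds, value)


def summarize(points: List[Point], resolution: float) -> List[Point]:
    """Keep the first, last, min and max sample of every interval.

    Hypothetical sketch of the per-interval reduction; samples are grouped
    into consecutive intervals of `resolution` seconds.
    """
    buckets: Dict[int, List[Point]] = {}
    for t, v in sorted(points):
        buckets.setdefault(int(t // resolution), []).append((t, v))
    result: List[Point] = []
    for key in sorted(buckets):
        samples = buckets[key]
        keep = {
            samples[0],                        # first value in the interval
            samples[-1],                       # last value in the interval
            min(samples, key=lambda p: p[1]),  # minimum value
            max(samples, key=lambda p: p[1]),  # maximum value
        }
        result.extend(sorted(keep))  # at most 4 points, in time order
    return result


raw = [(0, 1.0), (2, 5.0), (4, 3.0), (6, 0.0), (8, 2.0), (12, 4.0)]
print(summarize(raw, resolution=10))
```

With a 10-second resolution, the sample at 4 s is dropped because it is neither the first, last, minimum nor maximum of its interval; a coarser resolution would drop even more.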

The large red dots in the images below indicate which data would end up in TrendMiner's index with a 10-second resolution (top graph) and with a 1-minute resolution (bottom graph).

IE_2.png

Both the performance of the underlying database and the configured index resolution influence indexing speed. A finer index resolution (a shorter interval) means the database must transfer more data points, so indexing takes longer. Choosing a resolution is therefore a tradeoff between data granularity and performance.
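To make the tradeoff concrete, the arithmetic below estimates an upper bound on the number of data points transferred to build one tag's initial index, assuming every interval returns the full 4 points (flat or sparse data returns fewer). The function name and the "current time" of 1 January 2025 are arbitrary assumptions for the example.

```python
from datetime import datetime, timedelta


def estimate_points(horizon: datetime, now: datetime,
                    resolution: timedelta, points_per_interval: int = 4) -> int:
    """Upper bound on points transferred for one tag's initial index."""
    intervals = (now - horizon) // resolution  # timedelta // timedelta -> int
    return intervals * points_per_interval


horizon = datetime(2015, 1, 1)  # default index horizon
now = datetime(2025, 1, 1)      # assumed current time for the example
print(estimate_points(horizon, now, timedelta(minutes=1)))   # 21041280
print(estimate_points(horizon, now, timedelta(seconds=10)))  # 126247680
```

Moving from a 1-minute to a 10-second resolution multiplies the worst-case transfer per tag by six.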

Initial indexing always starts at the current time and progresses backwards in time to the configured horizon (this is referred to as backwards indexing). When many tags are being indexed at the same time, the most recent data always gets priority, so concurrent users can start analyzing recent data immediately.
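One way to picture this prioritization is a priority queue of chunk jobs ordered by how recent each chunk is, so the newest chunk of every tag is processed before any older chunk. This is a minimal sketch under stated assumptions: the chunk size, the job tuples and the function names are hypothetical, and TrendMiner's actual scheduler is not documented here.

```python
import heapq
from datetime import datetime, timedelta
from typing import Iterator, List, Tuple


def backward_chunks(now: datetime, horizon: datetime,
                    chunk: timedelta) -> Iterator[Tuple[datetime, datetime]]:
    """Yield (start, end) chunks from `now` back to `horizon`, newest first."""
    end = now
    while end > horizon:
        start = max(end - chunk, horizon)  # clip the last chunk at the horizon
        yield start, end
        end = start


def schedule(tags: List[str], now: datetime, horizon: datetime,
             chunk: timedelta) -> List[Tuple[str, datetime, datetime]]:
    """Order chunk jobs of all tags so the most recent data is indexed first."""
    heap = []
    for tag in tags:
        for start, end in backward_chunks(now, horizon, chunk):
            # Age of the chunk's end relative to now: zero for the newest chunk.
            heapq.heappush(heap, (now - end, tag, start, end))
    order = []
    while heap:
        _, tag, start, end = heapq.heappop(heap)
        order.append((tag, start, end))
    return order


jobs = schedule(["TI-101", "FI-202"], datetime(2024, 1, 1, 12),
                datetime(2024, 1, 1, 9), timedelta(hours=1))
for job in jobs:
    print(job)
```

Both tags get their last hour indexed before either tag's older data, which is what lets concurrent users start on recent data right away.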

As a user, you can follow the indexing progress by checking the index state in the tag details of the active tag list, or by looking at the context bar, which is plotted purely from indexed data.

Once tags have been backwards indexed, TrendMiner keeps their indexes up to date (forward indexing) by appending data at regular intervals, without requiring interaction from end users. This gives end users fast access to recent data for analytics at all times.
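Forward indexing can be sketched as computing which intervals have elapsed since the index was last extended. The sketch assumes, as a simplification, that only fully elapsed intervals are appended and the partially filled interval containing the current time waits for the next update; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def forward_intervals(indexed_until: datetime, now: datetime,
                      resolution: timedelta) -> List[Tuple[datetime, datetime]]:
    """Return the complete intervals that can be appended to an index.

    Assumption: the interval containing `now` is not yet complete, so it is
    left for the next update cycle.
    """
    intervals = []
    start = indexed_until
    while start + resolution <= now:
        intervals.append((start, start + resolution))
        start += resolution
    return intervals


for interval in forward_intervals(datetime(2024, 1, 1, 10, 0),
                                  datetime(2024, 1, 1, 10, 3, 30),
                                  timedelta(minutes=1)):
    print(interval)
```

In this example three 1-minute intervals are appended, and the index ends at 10:03 until the next cycle.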

Note: As a TrendMiner administrator, you can open the index overview page to see all indexed tags and their current state.

How does the index impact TrendMiner features?

A tag's index enables fast, interactive analysis of your data and is used in all algorithms.

Charting

For trend charting (the focus chart), index data is used to visualize long time periods quickly, specifically whenever the visualized period is longer than 300 times the index resolution (for example, 5 hours for an index resolution of 1 minute). For shorter periods, the index data might be too coarse, so data is requested directly from the database to ensure the most accurate representation possible. Whenever indexed data is not yet available, data is also requested directly from the historian.
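The charting rule above reduces to a small decision function. This sketch encodes only what the paragraph states (index data when the period exceeds 300 times the resolution and the index is available, otherwise the historian); the function name and the returned strings are hypothetical.

```python
from datetime import timedelta


def chart_source(period: timedelta, resolution: timedelta,
                 index_available: bool) -> str:
    """Pick the focus-chart data source according to the 300x rule."""
    if index_available and period > 300 * resolution:
        return "index"
    return "historian"  # short period, or index not yet built


print(chart_source(timedelta(hours=6), timedelta(minutes=1), True))   # index
print(chart_source(timedelta(hours=5), timedelta(minutes=1), True))   # historian
print(chart_source(timedelta(hours=6), timedelta(minutes=1), False))  # historian
```

With a 10-second resolution the threshold drops to 50 minutes, so index data is used for shorter windows.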

Searches

Searches are executed solely on indexed data.

The minimum duration of a search result depends on the index resolution and the search algorithm being used.

  • For value-based, digital step and area searches, the minimum duration is 2 times the index resolution, so 2 minutes for a default setup with a resolution of 1 minute. For shorter periods, TrendMiner does not have sufficient data to assess the search criteria, so it cannot retrieve them. Choosing a resolution is therefore a tradeoff between search granularity and performance. Because searches are performed on index data blocks, search result durations are always multiples of the index resolution.
  • For similarity searches, the search query must span at least 4 times the index resolution. Similarity searches also use only the indexed data to check for similarity, which keeps the analysis fast.
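The two rules above can be sketched in a few lines. The sketch assumes, for illustration only, that a value-based search first marks each index block as matching or not; runs shorter than 2 blocks are discarded, so every result lasts a whole number of blocks (n blocks means n times the resolution). The block representation and both function names are hypothetical, not TrendMiner's search engine.

```python
from typing import List, Tuple


def value_search(block_matches: List[bool],
                 min_blocks: int = 2) -> List[Tuple[int, int]]:
    """Return (first_block, block_count) for each run of matching blocks.

    `block_matches[i]` says whether index block i satisfies the condition.
    Runs shorter than `min_blocks` cannot be assessed and are dropped.
    """
    results, run_start = [], None
    for i, match in enumerate(block_matches + [False]):  # sentinel closes a final run
        if match and run_start is None:
            run_start = i
        elif not match and run_start is not None:
            if i - run_start >= min_blocks:
                results.append((run_start, i - run_start))
            run_start = None
    return results


def similarity_query_ok(query_seconds: float, resolution_seconds: float) -> bool:
    """A similarity query must span at least 4 index intervals."""
    return query_seconds >= 4 * resolution_seconds


# 1-minute blocks: the single matching block at index 0 is too short.
print(value_search([True, False, True, True, True, False, True, True]))
print(similarity_query_ok(240, 60), similarity_query_ok(180, 60))
```

With a 1-minute resolution, the results above correspond to periods of 3 and 2 minutes; a 3-minute similarity query is rejected.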

For very short periods, similarity search results might seem inaccurate. The search algorithm finds results whose index data closely matches the index data of the query period. When such short periods are plotted, however, TrendMiner retrieves data directly from the historian. If the tag has high-resolution data, many more data points are visualized than were used by the search algorithm, which explains the observed deviation.
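The effect can be reproduced with two made-up raw signals that share the same start, end, minimum and maximum within one 60-second interval. This sketch assumes an index that keeps those four values per interval; it is a simplified illustration of why the index can match while the plotted raw data does not, not TrendMiner's similarity measure.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (timestamp in seconds, value)


def interval_summary(points: List[Point]) -> Tuple[float, float, float, float]:
    """Start, end, min and max value of one interval (assumed index form)."""
    values = [v for _, v in sorted(points)]
    return values[0], values[-1], min(values), max(values)


# Two different raw signals within one 60-second index interval:
# the query peaks early, the candidate peaks late.
query = [(0, 10.0), (15, 20.0), (30, 12.0), (45, 11.0), (59, 14.0)]
match = [(0, 10.0), (10, 11.0), (20, 13.0), (50, 20.0), (59, 14.0)]

# Identical at index level, so a search on index data sees a perfect match...
print(interval_summary(query) == interval_summary(match))  # True
# ...even though the raw values plotted from the historian differ.
print(query == match)  # False
```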

Tag builder

The index of a calculated tag is built by performing the calculation on the indexes of its underlying tags.

When these tags are charted for periods shorter than 300 times the index resolution, the underlying tags are queried directly from the database and the calculated tags are evaluated on the fly.
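A simplified way to picture a calculated tag's index is to evaluate the formula point by point on the aligned index values of its underlying tags. The sketch assumes the underlying indexes are already aligned on the same timestamps and passes tag values to the formula by name; the function name and tag names are hypothetical, and TrendMiner's actual tag builder may treat index points differently.

```python
from typing import Callable, Dict, List


def calculated_index(underlying: Dict[str, List[float]],
                     formula: Callable[..., float]) -> List[float]:
    """Evaluate `formula` on aligned index values of the underlying tags."""
    names = sorted(underlying)
    length = len(underlying[names[0]])
    if any(len(underlying[n]) != length for n in names):
        raise ValueError("underlying indexes must be aligned")
    # Pass each tag's value at position i to the formula as a keyword argument.
    return [formula(**{n: underlying[n][i] for n in names}) for i in range(length)]


index = calculated_index({"flow": [1.0, 2.0, 3.0], "density": [10.0, 20.0, 30.0]},
                         lambda flow, density: flow * density)
print(index)
```

Because the result is computed from index values only, short chart windows fall back to evaluating the formula on raw historian data instead.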

Diagnose

The diagnose algorithms also use indexed data to evaluate correlation and fingerprint deviations.

As with similarity searches, this means a cross-correlation analysis can yield high correlation values over very short time periods (periods roughly equal to the index resolution), even though the visualized raw data shows little or no correlation.
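The statistical reason is easy to demonstrate: with only two points per series, any two non-constant series have a correlation of exactly +1 or -1. The sketch below uses the Pearson correlation coefficient as a stand-in; the exact correlation measure used by Diagnose is not specified in this article, and the values are made up.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sxx * syy)


# Two index points per tag: any non-constant pair correlates perfectly.
print(pearson([3.0, 7.0], [100.0, 101.0]))  # 1.0
print(pearson([3.0, 7.0], [5.0, -40.0]))    # -1.0
```

Over a window of only one or two index intervals, a high correlation value therefore says little about the raw data.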

Monitors

Monitors are built on top of searches and let users operationalize them. Monitors therefore use the same data as searches.

To make sure monitor results are received in time, the indexes of tags used in monitors (including the underlying tags of formulas) are updated every 2 minutes.
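Determining which indexes need the 2-minute refresh amounts to collecting the monitored tags plus, recursively, every tag their formulas depend on. This is a hypothetical sketch: the dependency map, tag names and function name are assumptions for illustration.

```python
from typing import Dict, List, Set

REFRESH_INTERVAL_SECONDS = 120  # monitored tags are updated every 2 minutes


def tags_to_refresh(monitored: List[str],
                    formulas: Dict[str, List[str]]) -> Set[str]:
    """Collect monitored tags plus, recursively, the tags their formulas use."""
    result: Set[str] = set()
    stack = list(monitored)
    while stack:
        tag = stack.pop()
        if tag in result:
            continue  # already collected (also guards against cycles)
        result.add(tag)
        stack.extend(formulas.get(tag, []))
    return result


formulas = {"yield": ["feed", "product"], "product": ["flow", "density"]}
print(sorted(tags_to_refresh(["yield", "TI-101"], formulas)))
```

Nested formulas are followed all the way down, so "flow" and "density" are refreshed even though the monitor only references "yield".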

 
