TrendMiner offers a custom model for multi-variate anomaly detection via its notebook and machine learning model tags functionality.
The TrendMiner Anomaly Detection Model is trained on a TrendHub View containing the normal operating conditions of your process. After learning these conditions, the model can detect anomalies in new incoming data.
The model itself is trained inside the embedded notebook functionality after loading a TrendHub View as a DataFrame. More information on the notebook functionality can be found here.
Note: The anomaly_detection package is experimental. The functions in this package can be changed or removed in future releases.
- Model initiation
- Model training
- Training evaluation
- Using the model inside notebooks
- Using the model with MLM tags
Initiating the Model
The anomaly detection model is provided via the experimental package and can be loaded as follows:
from trendminer_experimental.anomaly_detection.model import TMAnomalyModel
# Instantiate the model
model = TMAnomalyModel()
Train the model
The anomaly model is based on Self-Organising Maps (SOM). Conceptually, the model fits a flexible grid of units to the training data set. In the training phase, the algorithm attempts to minimize the distance between the n-dimensional training data points and their nearest units, while keeping the grid as smooth as possible to avoid overfitting the training data.
To train the model, we start by loading a TrendHub View that represents normal operating conditions. After the view is loaded, the model can be trained. The user can specify the number of iterations used for the training step.
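To make the idea concrete, here is a minimal, illustrative SOM training loop in plain Python. It is a sketch of the general technique only, not the implementation used by TMAnomalyModel. The function names (distance, train_som, quantization_error), the grid size, and the decay schedules for the learning rate and neighbourhood radius are all assumptions chosen for illustration.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def train_som(data, rows=3, cols=3, iterations=500, seed=0):
    """Fit a rows x cols grid of units to `data` (a list of equal-length lists).

    Hypothetical helper for illustration; not part of any TrendMiner package.
    """
    rng = random.Random(seed)
    # Initialise every unit with a randomly chosen training point.
    units = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        progress = t / iterations
        learning_rate = 0.5 * (1.0 - progress)                      # decays towards 0
        radius = 0.5 + (max(rows, cols) / 2.0) * (1.0 - progress)   # neighbourhood shrinks
        sample = rng.choice(data)
        # Best matching unit: the unit closest to the sample in data space.
        bmu = min(units, key=lambda pos: distance(units[pos], sample))
        for pos, weights in units.items():
            # Units near the best matching unit on the grid are pulled along too,
            # which is what keeps the grid smooth.
            influence = math.exp(-distance(pos, bmu) ** 2 / (2.0 * radius ** 2))
            for i, value in enumerate(sample):
                weights[i] += learning_rate * influence * (value - weights[i])
    return units


def quantization_error(data, units):
    """Mean distance from each data point to its nearest unit."""
    return sum(min(distance(w, x) for w in units.values()) for x in data) / len(data)


# Example: fit a 3x3 grid to points scattered over the unit square.
points = [[x / 9, y / 9] for x in range(10) for y in range(10)]
grid = train_som(points)
print(quantization_error(points, grid))
```

In this sketch, a point far from the training data ends up far from every unit, which is the property an anomaly score based on the nearest-unit distance relies on.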
# Loading TrendHub view: Normal Operation Conditions
from trendminer.views import Views
views = Views(client)
df_train = views.load_view('c3fa5b78-3df9-4fe4-b6d0-e0da4f19c24e')
# Training the model
q_error, topo_error = model.fit(df_train, 500)  # Training for 500 iterations
Evaluate training
The fit method returns two lists: the quantization errors and the topological errors. The quantization error measures how far the training data points are from the units in the grid, while the topological error measures the smoothness of the grid and is an indication of overfitting.
- A lower score indicates a better fit.
- The errors should show a decreasing trend over the iterations.
# Plot errors
import matplotlib.pyplot as plt
plt.plot(q_error, label='Quantization error')
plt.plot(topo_error, label='Topological error')
plt.legend();
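Besides inspecting the plot, the decreasing trend can also be checked programmatically. The helper below is a hypothetical sketch, not part of the anomaly_detection package. It assumes the errors are a plain sequence of numbers and simply compares the average of the first values with the average of the last values; the window size is an arbitrary choice.

```python
def shows_decreasing_trend(errors, window=50):
    """Return True if the mean of the last `window` errors is below the mean of the first `window`.

    Hypothetical helper for illustration only.
    """
    if len(errors) < 2:
        raise ValueError("need at least two error values")
    # Never let the two windows overlap on short lists.
    window = max(1, min(window, len(errors) // 2))
    head = sum(errors[:window]) / window
    tail = sum(errors[-window:]) / window
    return tail < head
```

Assuming q_error and topo_error are lists of numbers as described above, the check could be called as shows_decreasing_trend(q_error) and shows_decreasing_trend(topo_error). Averaging over a window rather than comparing single values makes the check less sensitive to noise between individual iterations.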
Run the model inside notebooks
The TMAnomalyModel has two output variables: Anomaly_Class and Anomaly_Score. Anomaly_Class identifies a point as anomalous or not, based on the distance from the point to the closest unit in the model and the threshold percentage you provided when converting the model to its PMML representation. Anomaly_Score is the distance between a point and the closest unit in the model.
# Loading TrendHub view: Test data or New incoming data
from trendminer.views import Views
views = Views(client)
df_test = views.load_view('c3fa5b78-3df9-4fe4-b6d0-e0da4f19c24e')
# Run model (anomaly_class)
anomaly_class = model.predict(df_test, 0.99)
# Run model (anomaly_score)
anomaly_score = model.score_samples(df_test)
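The relationship between the two outputs can be illustrated with a small standalone sketch. It is not the TMAnomalyModel implementation. In particular, it assumes that the threshold percentage is interpreted as a quantile of the scores observed on the training data, which the description above does not confirm, and all function names are hypothetical.

```python
import math


def nearest_unit_distance(point, units):
    """Anomaly score: Euclidean distance from `point` to the closest unit."""
    return min(
        math.sqrt(sum((p - u) ** 2 for p, u in zip(point, unit)))
        for unit in units
    )


def threshold_from_training(train_scores, percentage):
    """Score below which `percentage` of the training scores fall (nearest-rank quantile).

    Assumption: the threshold percentage acts as a quantile of the training scores.
    """
    ordered = sorted(train_scores)
    rank = max(1, math.ceil(percentage * len(ordered)))
    return ordered[rank - 1]


def classify(scores, threshold):
    """Anomaly class: 1 if the score exceeds the threshold, otherwise 0."""
    return [1 if score > threshold else 0 for score in scores]


# Toy example with two units and a handful of "normal" training points.
units = [(0.0, 0.0), (1.0, 1.0)]
train = [(0.0, 0.1), (0.9, 1.0), (0.1, 0.0), (1.0, 0.8)]
train_scores = [nearest_unit_distance(p, units) for p in train]
threshold = threshold_from_training(train_scores, 0.99)

new_points = [(0.05, 0.0), (5.0, 5.0)]
new_scores = [nearest_unit_distance(p, units) for p in new_points]
print(classify(new_scores, threshold))
```

Under this assumption, a higher threshold percentage makes the class output more conservative: fewer points are flagged, while the score itself is unaffected.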
Run the model as Machine Learning Model tag
Once the model is trained on normal behaviour, it can be deployed on TrendMiner’s scoring engine. This requires converting the model to PMML, for which TMAnomalyModel provides the utility method to_pmml. The method takes two arguments: the name of the model and the threshold percentage.
After deploying, the model can be selected inside the Machine Learning Model tag functionality.
# ZementisModels is assumed to have been imported already in this notebook session
model_name = 'Demo_Anomaly_Detection'
zementis = ZementisModels(client)
# Convert model to PMML
model_pmml = model.to_pmml(model_name, 0.99)
# Deploy the model
model_id = zementis.deploy_model(model_pmml)
model_details = zementis.model_details(model_id)
More information
More information can be found in the Python experimental documentation.
Current limitations
Limitations from Machine Learning Model tags apply to this model.