Anomaly Detection Service

Description

The Anomaly Detection service trains and validates models and runs inference to detect anomalies in data.

Anomaly Detection Proxy

The Anomaly Detection proxy lets the application side call functions of the Anomaly Detection service through an API.

There are three tasks in this proxy:

Train - Train the model using an unsupervised learning algorithm.
- If a label file exists, training tries different parameter combinations and trains the model with the combination that gives the best F1 score.
- If no label file exists, training uses the default configuration.
Validate - Validate the trained model.
Infer - Run inference in one of two modes:
- Predict on a feature file, given a trained model.
- Detect outliers.
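
The two training branches can be illustrated with a small, self-contained sketch. It is not the service's implementation: a simple z-score threshold detector stands in for the real model, and the names `zscore_detector`, `DEFAULT_PARAMS`, and `train`, as well as the parameter grid, are hypothetical. It only shows the idea of sweeping parameter combinations and keeping the one with the best F1 score when labels are available, and falling back to a default configuration otherwise.

```python
from itertools import product
from statistics import mean, pstdev


def f1_score(y_true, y_pred):
    """F1 for the anomaly class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def zscore_detector(values, threshold):
    """Stand-in detector: flag points more than `threshold` std devs from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0] * len(values)
    return [1 if abs(v - mu) / sigma > threshold else 0 for v in values]


DEFAULT_PARAMS = {"threshold": 3.0}  # hypothetical default configuration


def train(values, labels=None, grid=None):
    """Return (params, f1). Without labels, use the defaults and report no score."""
    if labels is None:
        return DEFAULT_PARAMS, None
    grid = grid or {"threshold": [1.0, 1.5, 2.0, 2.5, 3.0]}
    names = list(grid)
    best_params, best_f1 = None, -1.0
    # Try every combination of parameter values and keep the best F1.
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = f1_score(labels, zscore_detector(values, **params))
        if score > best_f1:
            best_params, best_f1 = params, score
    return best_params, best_f1


values = [10.0, 10.2, 9.9, 10.1, 10.0, 14.0, 9.8, 10.3]
labels = [0, 0, 0, 0, 0, 1, 0, 0]
print("with labels:   ", train(values, labels))
print("without labels:", train(values))
```

With a real unsupervised model the grid would hold that model's own hyperparameters; the selection loop stays the same.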

The Anomaly Detection proxy exposes six endpoints: a POST and a GET for each of the three task paths:

anomaly_detection/train: Train anomaly detection model
anomaly_detection/valid: Validate the trained model
anomaly_detection/infer: Run inference with a trained model

Train Endpoint

POST method

Input Schemas

request_id: uuid
model_name: str - Specify the model name to use
save_charts: bool
train_feature_file: str - Path to features data file.
train_label_file: str - Path to labels data file.

Input Assumptions

train_label_file is optional.
Both the feature and label data files are expected to contain numerical data.
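
Since train_label_file is optional, a client may want one helper that builds the request body for both cases. The sketch below is an illustration only: `build_train_payload` is a hypothetical helper, the `save_charts=False` default is an assumption, and whether the API prefers the key to be omitted or sent as null when no label file exists is not stated here (this sketch omits it).

```python
import json
import uuid
from typing import Optional


def build_train_payload(
    model_name: str,
    train_feature_file: str,
    train_label_file: Optional[str] = None,
    save_charts: bool = False,
) -> dict:
    """Build a train request body using the field names from the Input Schemas."""
    payload = {
        "request_id": str(uuid.uuid4()),
        "model_name": model_name,
        "save_charts": save_charts,
        "train_feature_file": train_feature_file,
    }
    # Assumption: the optional label file is left out entirely when not provided.
    if train_label_file:
        payload["train_label_file"] = train_label_file
    return payload


with_labels = build_train_payload(
    "DBSCAN",
    "testprojectid1/testfileid1/cogen_feature",
    train_label_file="testprojectid1/testfileid1/cogen_anomaly",
    save_charts=True,
)
without_labels = build_train_payload("DBSCAN", "testprojectid1/testfileid1/cogen_feature")

print(json.dumps(with_labels, indent=2))
print(json.dumps(without_labels, indent=2))
```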

Code Examples

import json
import uuid

import requests

url = "http://localhost:8000/anomaly_detection/train/"
id_ = str(uuid.uuid4())
data = {
    "request_id": id_,
    "model_name": "DBSCAN",
    "train_feature_file": "testprojectid1/testfileid1/cogen_feature",
    "save_charts": True,
    "train_label_file": "testprojectid1/testfileid1/cogen_anomaly"
}

# Show the JSON payload that will be sent
print(json.dumps(data))

# Submit the training request and check the result
res = requests.post(url, json=data)
print("Status Code:", res.status_code)
print("Response Content:", res.text)  # print raw response content

if res.status_code == 200:  # Ensure that the status code is 200 before parsing
    result = res.json()  # parse the JSON response body
    print(result)
else:
    print("Request failed.")

Output Schemas

_id: uuid - ID of the request
task: str - Name of the performed task
status: str - Status of the performed task
task_id: str - ID of the Celery task

GET method

Input Schemas

_id: str - ID of the request

Output Schemas

_id: uuid - ID of the request
task: str - Name of the performed task
status: str - Status of the performed task
predictions_label_file: str - Path to the prediction file
model_path: str - Path to the trained model.
predictions_charts_file: str - Path to prediction charts file.
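
The POST response carries a Celery task_id while the result paths appear only in the GET response, which suggests a submit-then-poll workflow. The sketch below shows one way a client could poll. It rests on assumptions: the possible values of status are not listed in this document, so `TERMINAL_STATUSES` uses Celery's standard SUCCESS and FAILURE state names as a guess, and `wait_for_result` and `fetch` are hypothetical names. The HTTP call is injected as a function because the way `_id` is passed to the GET endpoint (query parameter or path segment) is not specified here.

```python
import time
from typing import Callable

# Assumed terminal values; adjust to whatever the service actually reports.
TERMINAL_STATUSES = {"SUCCESS", "FAILURE"}


def wait_for_result(
    fetch: Callable[[str], dict],
    request_id: str,
    interval: float = 2.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(request_id) until the record reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        record = fetch(request_id)
        if record.get("status") in TERMINAL_STATUSES:
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"request {request_id} did not finish in {timeout}s")
        sleep(interval)


# Demo with canned responses instead of a live server (all values are made up).
responses = iter([
    {"_id": "abc", "task": "train", "status": "PENDING"},
    {"_id": "abc", "task": "train", "status": "PENDING"},
    {"_id": "abc", "task": "train", "status": "SUCCESS", "model_path": "model.pkl"},
])
calls = []


def fake_fetch(request_id: str) -> dict:
    calls.append(request_id)
    return next(responses)


result = wait_for_result(fake_fetch, "abc", sleep=lambda _seconds: None)
print(result["status"], "after", len(calls), "polls")
```

Against a running proxy, `fetch` would wrap a GET request to anomaly_detection/train and return the parsed JSON body.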

Data Storage

[Figure: Data Storage]

Database

[Figure: Database]

Validate Endpoint

POST method

Input Schemas

request_id: uuid
model_path: str - Path to a trained model
save_charts: bool
valid_feature_file: str - Path to features data file
valid_label_file: str - Path to labels data file


Input Assumptions

Both the feature and label data files are expected to contain numerical data.
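
A client can check the numerical assumption before submitting a file. The sketch below assumes the data is CSV text with a header row, which this document does not confirm; `non_numeric_cells` is a hypothetical helper that reports every cell Python's `float()` cannot parse.

```python
import csv
import io


def non_numeric_cells(text: str, has_header: bool = True) -> list:
    """Return (row, column, value) for each cell that float() cannot parse.

    Rows are numbered from 1 starting at the first data row; columns from 0.
    """
    reader = csv.reader(io.StringIO(text))
    if has_header:
        next(reader, None)  # skip the header row
    bad = []
    for row_idx, row in enumerate(reader, start=1):
        for col_idx, cell in enumerate(row):
            try:
                float(cell)
            except ValueError:
                bad.append((row_idx, col_idx, cell))
    return bad


sample = "pressure,temperature\n1.0,2.5\n3,x\n"
problems = non_numeric_cells(sample)
print(problems if problems else "all cells are numeric")
```

Empty cells are reported as well, because `float("")` raises `ValueError`.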

Code Examples

import json
import uuid

import requests

id_ = str(uuid.uuid4())
url = "http://localhost:8000/anomaly_detection/valid/"
data = {
    "request_id": id_,
    "model_path": "testprojectid1/testfileid1/anomaly_detection/1727161197_dbscan_model.pkl",
    # Key names follow the Input Schemas above (valid_feature_file, valid_label_file)
    "valid_feature_file": "testprojectid1/testfileid1/cogen_feature",
    "save_charts": True,
    "valid_label_file": "testprojectid1/testfileid1/cogen_anomaly",
}


# Show the JSON payload that will be sent
print(json.dumps(data))

# Submit the validation request and check the result
res = requests.post(url, json=data)
print("Status Code:", res.status_code)
print("Response Content:", res.text)  # print raw response content

if res.status_code == 200:  # Ensure that the status code is 200 before parsing
    result = res.json()  # parse the JSON response body
    print(result)
else:
    print("Request failed.")

Output Schemas

_id: uuid - ID of the request
task: str - Name of the performed task
status: str - Status of the performed task
task_id: str - ID of the Celery task

GET method

Input Schemas

_id: str - ID of the request

Output Schemas

_id: uuid - ID of the request
task: str - Name of the performed task
status: str - Status of the performed task
metrics: Objects - Metrics of the performed task
predictions_label_file: str - Path to the prediction file
predictions_charts_file: str - Path to validation charts file.


Data Storage

[Figure: Data Storage]

Database

[Figure: Database]

Inference Endpoint

Input Assumptions

features_file is required and is expected to contain numerical data.
model_path is required if detection_type is multivariate.
method_name is required if detection_type is univariate (outlier detection).
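
These conditional requirements can be checked on the client before a request is sent. The following is a minimal sketch with a hypothetical helper, `validate_infer_request`; it mirrors the three rules above and does not describe how the proxy itself validates input.

```python
from typing import List, Optional


def validate_infer_request(
    detection_type: str,
    features_file: str,
    method_name: Optional[str] = None,
    model_path: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    if not features_file:
        errors.append("features_file is required")
    if detection_type == "multivariate":
        if not model_path:
            errors.append("model_path is required for multivariate detection")
    elif detection_type == "univariate":
        if not method_name:
            errors.append("method_name is required for univariate detection")
    else:
        errors.append("detection_type must be 'univariate' or 'multivariate'")
    return errors


print(validate_infer_request("multivariate", "features.csv", model_path="model.pkl"))
print(validate_infer_request("univariate", "features.csv"))
print(validate_infer_request("other", ""))
```

Empty strings are treated like missing values, matching the way the examples pass `""` for the field that does not apply.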

Multivariate

POST method

Input Schemas

request_id: uuid
detection_type: str - univariate or multivariate
method_name: Optional[str] - Name of the statistical method
model_path: Optional[str] - Path to a trained model
features_file: str - Path to features data file

Code Examples

import json
import uuid

import requests

id_ = str(uuid.uuid4())
url = "http://localhost:8000/anomaly_detection/infer/"
data = {
    "request_id": id_,
    "detection_type": "multivariate",
    "method_name": "",
    "model_path": "testprojectid1/testfileid1/anomaly_detection/1728555947_dbscan_model.pkl",  # noqa: E501
    "features_file": "testprojectid1/testfileid1/cogen_feature.csv",  # key name as in Input Schemas
}

# Show the JSON payload that will be sent
print(json.dumps(data))

# Submit the inference request and check the result
res = requests.post(url, json=data)
print("Status Code:", res.status_code)
print("Response Content:", res.text)  # print raw response content

if res.status_code == 200:  # Ensure that the status code is 200 before parsing
    result = res.json()  # parse the JSON response body
    print(result)
else:
    print("Request failed.")

Output Schemas

_id: uuid - ID of the request
task: str - Name of the performed task
status: str - Status of the performed task
task_id: str - ID of the Celery task


GET method

Input Schemas

_id: str - ID of the request

Output Schemas

_id: uuid - ID of the request
task: str - Name of the performed task
status: str - Status of the performed task
predictions_label_file: str - Path to the prediction file

Data Storage

[Figure: Data Storage]

Database

[Figure: Database]

Univariate - Outlier Detection

Input Assumptions

The features_file must contain a Time column as its first column.

POST method

Input Schemas

request_id: uuid
detection_type: str - univariate or multivariate
method_name: Optional[str] - Name of the statistical method
model_path: Optional[str] - Path to a trained model
features_file: str - Path to features data file


Code Examples

import json
import uuid

import requests

id_ = str(uuid.uuid4())
url = "http://localhost:8000/anomaly_detection/infer/"
data = {
    "request_id": id_,
    "detection_type": "univariate",
    "method_name": "quantile",  # one of: iqr, quantile, persist
    "model_path": "",  # not used for univariate detection
    "features_file": "testprojectid2/testfileid1/sample_data",  # key name as in Input Schemas
}

# Show the JSON payload that will be sent
print(json.dumps(data))

# Submit the outlier detection request and check the result
res = requests.post(url, json=data)
print("Status Code:", res.status_code)
print("Response Content:", res.text)  # print raw response content

if res.status_code == 200:  # Ensure that the status code is 200 before parsing
    result = res.json()  # parse the JSON response body
    print(result)
else:
    print("Request failed.")
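
For intuition about the method names, `iqr` commonly refers to the interquartile-range rule (Tukey's fences). The sketch below applies that rule to rows of (time, value) pairs, in line with the Time-first layout described above. It is an assumption that the service's `iqr` method works this way; the multiplier `k=1.5` is the conventional default rather than a documented setting, and `iqr_outliers` is a hypothetical name.

```python
from statistics import quantiles


def iqr_outliers(rows, k=1.5):
    """Return the time stamps whose value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [value for _time, value in rows]
    q1, _median, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [time for time, value in rows if value < low or value > high]


rows = [
    ("2024-01-01 00:00", 10.0),
    ("2024-01-01 01:00", 11.0),
    ("2024-01-01 02:00", 12.0),
    ("2024-01-01 03:00", 11.0),
    ("2024-01-01 04:00", 10.0),
    ("2024-01-01 05:00", 50.0),
    ("2024-01-01 06:00", 12.0),
    ("2024-01-01 07:00", 11.0),
]
print(iqr_outliers(rows))
```

Each column of a features file would be checked independently in this way, which is what makes the detection univariate.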