Datasets

AnalyticOps lets you create datasets within dataset templates, so you can group all datasets related to a use case in one template. A dataset inherits its parameters from the dataset template, and you can override them if required. The scope of a dataset is either Training or Evaluation: a training dataset is used to train the model, and an evaluation dataset is used to measure how well the model performs on data it has not seen.
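The inheritance rule above can be pictured as a small data model. The sketch below is a hypothetical illustration, not the AnalyticOps API: the classes `DatasetTemplate`, `Dataset`, and `Scope`, the function `create_dataset`, and the example connection, query, and property names are all assumptions. It shows a dataset taking each parameter from its template unless a value is supplied, with the scope recorded per dataset.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scope(Enum):
    TRAINING = "Training"      # used to train the model
    EVALUATION = "Evaluation"  # used to measure performance on unseen data


@dataclass
class DatasetTemplate:
    # Hypothetical template: parameters shared by all datasets of a use case.
    name: str
    connection: str
    query: str
    custom_properties: dict = field(default_factory=dict)


@dataclass
class Dataset:
    name: str
    scope: Scope
    connection: str
    query: str
    custom_properties: dict


def create_dataset(template, name, scope, connection=None, query=None,
                   custom_properties=None):
    """Inherit every parameter from the template unless explicitly overridden."""
    merged = dict(template.custom_properties)  # copy, so the template is unchanged
    merged.update(custom_properties or {})     # dataset-level values win
    return Dataset(
        name=name,
        scope=scope,
        connection=template.connection if connection is None else connection,
        query=template.query if query is None else query,
        custom_properties=merged,
    )


# Hypothetical template and datasets for a churn use case.
churn = DatasetTemplate(
    name="churn",
    connection="vantage_dev",
    query="SELECT * FROM churn_features",
    custom_properties={"target": "churned"},
)
train = create_dataset(churn, "churn_train", Scope.TRAINING,
                       query="SELECT * FROM churn_features WHERE sample_id = 1")
evaluate = create_dataset(churn, "churn_eval", Scope.EVALUATION,
                          query="SELECT * FROM churn_features WHERE sample_id = 2")
```

Here the two datasets share the template's connection and custom properties but override the query, so the training and evaluation datasets can select different rows (the `sample_id` column is assumed for illustration).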

The Datasets list in the AnalyticOps UI lets you create and manage datasets. This chapter covers creating a dataset, viewing dataset details (properties, utilization, and features), and archiving a dataset.

Create a Dataset

To create a new dataset:

  1. Open a dataset template. For details, see Navigate a Dataset Template.
    The list of datasets added to the dataset template displays in the Work area.

  2. Click the Create Dataset button.
    The Create Dataset dialog displays.

    https://docs.tdaoa.com/images/ug_dataset_create.png
  3. In the Basic tab, set the properties:

    Name: Specifies the dataset name.
    Description: Specifies a description of the dataset.
    Connection: Displays the connection defined for the template. To use a different connection, select one from the list.
    Scope: Specifies the scope of the dataset:
      Training: The dataset is used for model training.
      Evaluation: The dataset is used for model evaluation.
    Query: Displays the query defined for the template. You can edit it if required.
    Tags: Adds tags to the dataset.

    https://docs.tdaoa.com/images/ug_dataset_basic.png
  4. In the CONFIG tab, set the properties for the defined scope:

    Custom Properties: Displays the custom properties defined in the template. You can add or update custom properties as key/value pairs.
    JSON: Lets you define the custom properties in JSON instead of as key/value pairs.
    Query Preview: Shows a read-only preview of the query defined in the Basic tab.

    https://docs.tdaoa.com/images/ug_dataset_advanced.png
  5. Click Create.
    The new dataset is created with the name specified in the Name field.
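The Custom Properties and JSON fields in the CONFIG tab describe the same settings in two forms. Assuming the JSON form is a single top-level object whose keys and values mirror the key/value pairs (the guide does not specify the schema), the correspondence can be sketched as follows. The function names and property names are hypothetical.

```python
import json


def pairs_to_json(pairs):
    """Serialize key/value pairs into the equivalent JSON object."""
    return json.dumps(dict(pairs), indent=2)


def json_to_pairs(text):
    """Parse a JSON object back into key/value pairs."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("custom properties must be a JSON object")
    return list(data.items())


# Hypothetical custom properties, as they might be entered in the CONFIG tab.
pairs = [("target", "churned"), ("entity_key", "customer_id")]
text = pairs_to_json(pairs)
```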

View Dataset Details

You can view the details of a dataset, including its defined parameters and, depending on its scope, its use in training or evaluation jobs.

To view details of a dataset:

  1. In the Datasets list, click the Details icon for a dataset.

    The Dataset details page displays.

Dataset Properties

To view the dataset properties:

  1. Click the Basic tab. The properties defined while creating the dataset display in read-only format.

    For details of properties, see Create a Dataset.

Dataset Utilization

To view the dataset utilization in training or evaluation jobs:

  1. Click the Activity tab. The list shows all jobs executed with the selected dataset, and you can review the progress of each job.

    The following details display for each of the jobs:

    Model Version ID: Specifies the model version ID for which the job is executed.
    Model Name: Specifies the model name for which the job is executed.
    Job ID: Specifies the job ID of the training or evaluation job, depending on the scope of the selected dataset.
    Status: Shows the status of the job: Created, Scheduled, Trained, Evaluated, Completed, or Error. For more information, see Jobs.
    Data Features: Lets you view the data features for the selected job. For details, see Dataset Features.

Dataset Features

AnalyticOps lets you view dataset features and statistics for each job executed with the selected dataset.

To view the dataset features for a job:

  1. In the Activity tab, click the Data Features icon for a job.

    The Data Features page displays.

    The Data Features page shows the following details.

Features

Features are the basic building blocks of datasets. The quality of the features in your dataset has a major impact on the quality of the insights you will gain when you use that dataset for training and evaluating models.

AnalyticOps automatically detects each feature’s data type (categorical, continuous) and performs basic statistical analysis (mean, median, standard deviation, and more) on each feature. Additionally, AnalyticOps automatically generates a histogram for each feature.
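The guide does not say how AnalyticOps classifies a feature or bins its histogram. The sketch below uses one common heuristic as an assumption: a feature is treated as categorical if any value is non-numeric or if it has at most `max_categories` distinct values (a hypothetical threshold), and continuous otherwise. Categorical features are summarized as counts per value; continuous features use equal-width bins. All function names are hypothetical.

```python
from collections import Counter


def infer_feature_type(values, max_categories=10):
    """Hypothetical heuristic: non-numeric or low-cardinality => categorical."""
    numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool)
                  for v in values)
    if not numeric or len(set(values)) <= max_categories:
        return "categorical"
    return "continuous"


def histogram(values, bins=10):
    """Equal-width histogram: returns (bin edges, counts per bin)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def distribution(values, max_categories=10, bins=10):
    """Summarize a feature as value counts or as an equal-width histogram."""
    if infer_feature_type(values, max_categories) == "categorical":
        return "categorical", Counter(values)
    return "continuous", histogram(values, bins)
```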

The Features table displays the following details.

Name: Specifies the name of the feature.
Type: Specifies the data type of the feature as Continuous or Categorical.
Importance: Specifies the feature importance, measured as the increase in the model's prediction error after the feature's values are permuted. Permuting breaks the relationship between the feature and the true outcome. A feature is important if shuffling its values increases the model error, because the model relied on the feature for its predictions. A feature is unimportant if shuffling its values leaves the model error unchanged, because the model ignored the feature.
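Permutation feature importance, as described for the Importance column, can be sketched in a few lines. This is a generic illustration, not the AnalyticOps implementation: mean squared error as the prediction-error metric, the number of repeats, the random seed, the toy model, and the function names are all assumptions.

```python
import random
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error (the assumed prediction-error metric)."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return, per feature, the mean increase in error after shuffling it.

    predict maps a list of rows (lists of feature values) to predictions.
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # breaks the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances.append(fmean(increases))
    return importances


def toy_model(rows):
    # Hypothetical model: uses feature 0 and ignores feature 1.
    return [2 * row[0] for row in rows]


X = [[float(i), float(10 - i)] for i in range(10)]
y = toy_model(X)
importances = permutation_importance(toy_model, X, y)
```

Because the toy model ignores feature 1, shuffling that column leaves every prediction unchanged and its importance is exactly zero, while shuffling feature 0 increases the error. Averaging over several shuffles reduces the noise from any single permutation.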

To view details of a feature:

  1. Select a feature in the Features table. The right section of the page displays the Distribution histogram and Dataset statistics for the selected feature.

    The Distribution histogram displays the feature value on the x-axis and the count on the y-axis.

    https://docs.tdaoa.com/images/ug_dataset_features.png

    The dataset statistics display the following measures for the selected feature.

    • count (cnt)

    • minimum (min)

    • maximum (max)

    • mean

    • standard deviation (std)

    • skewness (skew)

    • kurtosis (kurt)

    • standard error (ste)

    • coefficient of variation (cv)

    • variance (var)

    • sum

    • uncorrected sum of squares (uss)

    • corrected sum of squares (css)

    https://docs.tdaoa.com/images/ug_dataset_stats.png
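The measures listed in the dataset statistics can be computed from a feature's values as sketched below. The guide does not state which formulas AnalyticOps uses, so this assumes common conventions: sample (n - 1) variance and standard deviation, standard error as std / sqrt(n), coefficient of variation as std / mean, and the bias-adjusted skewness and excess kurtosis formulas used by Excel's SKEW and KURT functions. The function name `describe` is hypothetical.

```python
import math
import statistics


def describe(values):
    """Return the univariate measures for a numeric feature.

    Assumed conventions (AnalyticOps may differ): sample (n - 1) variance,
    and the bias-adjusted skewness and excess kurtosis used by Excel's
    SKEW and KURT functions.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for skewness and kurtosis")
    mean = statistics.fmean(values)
    css = sum((x - mean) ** 2 for x in values)  # corrected sum of squares
    if css == 0:
        raise ValueError("all values are identical; shape measures are undefined")
    uss = sum(x * x for x in values)            # uncorrected sum of squares
    var = css / (n - 1)                         # sample variance
    std = math.sqrt(var)                        # sample standard deviation
    z = [(x - mean) / std for x in values]      # standardized values
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {
        "cnt": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": std,
        "skew": skew,
        "kurt": kurt,
        "ste": std / math.sqrt(n),                        # standard error of the mean
        "cv": std / mean if mean != 0 else float("nan"),  # coefficient of variation
        "var": var,
        "sum": sum(values),
        "uss": uss,
        "css": css,
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

For the values 1 through 5, which are symmetric, the skewness is zero (up to floating-point rounding) and the excess kurtosis is -1.2.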

Predictions

Predictions are the output of a trained model. AnalyticOps lets you view the predictions generated on a dataset and a distribution plot of their values.

The Predictions table displays the following details.

Name: Specifies the name of the prediction.
Type: Specifies the data type of the prediction as Continuous or Categorical, depending on the type of problem being solved.

To view details of a prediction:

  1. Select a prediction in the Predictions table. The right section of the page displays the Distribution plot for the selected prediction.

    The Distribution plot displays the prediction categories on the x-axis and count on the y-axis.

    https://docs.tdaoa.com/images/ug_dataset_predictions.png
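For a categorical prediction, the distribution plot amounts to a count of each predicted category. A minimal text-only sketch of that aggregation is shown below, with hypothetical function names and sample labels; it is not how AnalyticOps renders the plot.

```python
from collections import Counter


def prediction_distribution(predictions):
    """Count each predicted category (x-axis) and its frequency (y-axis)."""
    return sorted(Counter(predictions).items())


def render_bars(distribution, width=40):
    """Render a plain-text bar chart scaled so the largest count fills `width`."""
    peak = max(count for _, count in distribution)
    lines = []
    for category, count in distribution:
        bar = "#" * round(count / peak * width)
        lines.append(f"{category!s:>8} | {bar} {count}")
    return "\n".join(lines)


# Hypothetical predictions from a binary classifier.
dist = prediction_distribution(["yes", "no", "yes", "yes", "no"])
print(render_bars(dist, width=20))
```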

Archive a Dataset

The archiving feature allows you to hide a dataset from the list to better organize your datasets. You can view the archived datasets and unarchive them if required.

Note: Archiving is available in multiple modules, including projects, models, model versions, dataset templates, datasets, and connections.

To archive a dataset:

  1. Select a dataset in the list.
    The Actions button is enabled.

  2. Click the Actions button.
    The Actions menu displays.

    https://docs.tdaoa.com/images/ug_dataset_archive_action.png
  3. Click Archive Dataset.
    The dataset is archived and hidden from the current list. A confirmation message displays at the top.

  4. To view archived datasets, click the Show Archived option at the top.
    Archived datasets display in the list with an Archived label.

To unarchive a dataset:

  1. Select an archived dataset in the list.
    The Actions button is enabled.

  2. Click the Actions button.
    The Actions menu displays.

    https://docs.tdaoa.com/images/ug_dataset_unarchive_action.png
  3. Click Un-Archive Dataset.
    The dataset is unarchived and its Archived label is removed. A confirmation message displays at the top.