Datasets

AnalyticOps allows you to create datasets within dataset templates. You can group all datasets related to a use case together in a dataset template. A dataset inherits parameters from the dataset template and you can update them if required. The scope of a dataset can be Training or Evaluation. The training dataset is used to train the model, and the evaluation dataset is used to see how well the model performs on data it has not seen.
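
The inheritance described above can be pictured as a simple override of template values by dataset values. The following is a minimal sketch, not AnalyticOps code: the function `resolve_dataset_params` and the parameter names (`query`, `scope`, `sample_rate`) are hypothetical and chosen only for illustration.

```python
from copy import deepcopy


def resolve_dataset_params(template_params: dict, dataset_overrides: dict) -> dict:
    """Return the effective parameters of a dataset.

    Template values act as defaults; any key set on the dataset wins.
    The template itself is never modified.
    """
    effective = deepcopy(template_params)
    effective.update(dataset_overrides)
    return effective


# Hypothetical template parameters shared by all datasets of one use case.
template = {"query": "SELECT * FROM demo_table", "scope": None, "sample_rate": 1.0}

# A training dataset that overrides the inherited query.
training = resolve_dataset_params(
    template,
    {"scope": "Training", "query": "SELECT * FROM demo_table WHERE split = 'train'"},
)

# An evaluation dataset that keeps the inherited query.
evaluation = resolve_dataset_params(template, {"scope": "Evaluation"})
```

Here `evaluation["query"]` is still the template query, while `training["query"]` is the overridden one.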

The Datasets list in the AnalyticOps UI lets you create and manage datasets. This chapter covers the following tasks: creating, editing, viewing the details of, and archiving a dataset.

Create a Dataset

To create a new dataset:

  1. Open a Dataset Template. For details, see Navigate a Dataset Template.
    The list of datasets added to the dataset template displays in the Work area.

  2. Click the Create Dataset button.
    The Create Dataset dialog that displays depends entirely on the Catalog Type (Vantage or None) of the selected dataset template.

  3. For a dataset template with Catalog Type None, set the following properties in Step 1 - BASIC:

    Property | Description
    Name | Specifies the dataset name.
    Description | Specifies the description of the dataset.
    Scope | Specifies the scope of the dataset. Training: the dataset will be used for model training. Evaluation: the dataset will be used for model evaluation.
    Query | The query defined for the template displays here. You can update it if required.
    Tags | Allows you to add tags to the dataset.

    https://docs.tdaoa.com/images/v6/ug_dataset_edit_dialog_basic.png

  4. For a dataset template with Catalog Type None, set the following properties in Step 2 - CONFIG for the defined scope:

    Property | Description
    Custom Properties | The custom properties defined in the template display here. You can add or update custom properties as key/value pairs.
    JSON | Lets you write the custom properties as JSON instead of defining them as key/value pairs.
    Query Preview | Shows a read-only preview of the query defined in the Basic step.

    https://docs.tdaoa.com/images/v6/ug_dataset_edit_dialog_none_config.png

    Click Create to save the dataset.

  5. For a dataset template with Catalog Type Vantage, Step 1 - BASIC looks like this:

    https://docs.tdaoa.com/images/v6/ug_dataset_edit_dialog_vantage_basic.png

    The property definitions are the same as for a dataset template with Catalog Type None.

  6. For a dataset template with Catalog Type Vantage, Step 1 - CATALOG is followed by Step 2 - ENTITY & TARGET:

    https://docs.tdaoa.com/images/v6/ug_dataset_dialog_entityAndTarget.png

    Property | Description
    Query | Displays the SQL query defined in the dataset template to select the variables (entity and targets) for the catalog.
    Variables | Displays the metadata of the provided query and the selected targets.

    The displayed SQL query for the entity and targets is the one defined when the related dataset template was created. You can update the SQL as long as the existing entity and target variables remain valid. To check this, click the VALIDATE button to validate the entered query. A message reports whether the validation succeeded or failed:

    https://docs.tdaoa.com/images/v6/ug_dataset_dialog_entityAndTarget_validation.png

    To retrieve a sample of the entity result from the SQL query, together with the JOIN result of the target variables and the features defined when creating the dataset template, click the PREVIEW DATA button. The following dialog displays:

    https://docs.tdaoa.com/images/v6/ug_create_template_entityAndTarget_result.png

    Navigate to the COMBINED QUERY RESULT tab to view the combined JOIN SQL query of the features and target variables, along with its result:

    https://docs.tdaoa.com/images/v6/ug_create_template_entityAndTarget_data.png

    Closing this dialog returns you to the main dialog.

  7. Click Create.
    The new dataset is created with the name specified in the Name field.
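
The CONFIG step above accepts custom properties either as key/value pairs or as JSON. The sketch below shows how the two forms can represent the same information. It is an illustration only: the property names (`max_rows`, `label_column`) and both helper functions are hypothetical, and the exact JSON shape that AnalyticOps expects is not specified in this guide.

```python
import json

# Hypothetical custom properties entered as key/value pairs.
custom_properties = [("max_rows", "10000"), ("label_column", "target")]


def properties_to_json(pairs):
    """Serialize key/value pairs as a JSON object with stable key order."""
    return json.dumps(dict(pairs), indent=2, sort_keys=True)


def json_to_properties(text):
    """Parse a JSON object back into sorted key/value pairs."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("custom properties must be a JSON object")
    return sorted(data.items())


as_json = properties_to_json(custom_properties)
round_trip = json_to_properties(as_json)
print(as_json)
```

Converting to JSON and back yields the same pairs, so either entry mode can be used without losing information in this sketch.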

Edit a Dataset

To edit an existing dataset:

  1. Select a dataset from the list.
    The Edit Dataset button is enabled.

  2. Click the Edit Dataset button.
    The Edit Dataset dialog displays in the layout for the catalog type (Vantage or None) of the dataset template. Modify the dataset properties as needed.

    https://docs.tdaoa.com/images/v6/ug_dataset_edit_dialog_vantage_basic.png
  3. After modifying the desired properties, click Update.
    The dataset is saved with the latest modifications.

View Dataset Details

You can view the details of a dataset including the defined parameters and the utilization of the dataset in training or evaluation jobs based on the defined scope.

To view details of a dataset:

  1. In the Datasets list, click the Details icon for a dataset.

    The Dataset details page displays. Its layout depends on whether the Catalog Type of the dataset template is Vantage or None.

Dataset Properties

To view the dataset properties:

  1. Click the Basic tab. The properties defined when the dataset was created display in read-only format.

    For details of properties, see Create a Dataset.

Dataset Utilization

To view the dataset utilization in training or evaluation jobs:

  1. Click the Activity tab. All jobs executed using the selected dataset display in the list. You can review the progress of each job that is executing with the dataset.

    The following details display for each of the jobs:

    Property | Description
    Model Version ID | Specifies the model version ID for which the job is executed.
    Model Name | Specifies the model name for which the job is executed.
    Job ID | Specifies the job ID for the training or evaluation job, depending on the scope of the selected dataset.
    Status | Shows the status of the job: Created, Scheduled, Trained, Evaluated, Completed, Error. For more information, see Jobs.
    Data Features | Lets you view the Data Features for the selected job. For details, see Dataset Features.

Dataset Features

AnalyticOps allows you to view the dataset features and statistics for each job executed with the selected dataset.

To view the dataset features for a job:

  1. In the Activity tab, click the Data Features icon for a job.

    The Data Features page displays and shows the following details.

Features

Features are the basic building blocks of datasets. The quality of the features in your dataset has a major impact on the quality of the insights you will gain when you use that dataset for training and evaluating models.

AnalyticOps automatically detects each feature’s data type (categorical, continuous) and performs basic statistical analysis (mean, median, standard deviation, and more) on each feature. Additionally, AnalyticOps automatically generates a histogram for each feature.

The Features table displays the following details.

Property | Description
Name | Specifies the name of the feature.
Type | Specifies the data type of the feature: Continuous or Categorical.
Importance | Specifies the feature importance, measured as the increase in the model's prediction error after permuting the feature's values, which breaks the relationship between the feature and the true outcome.

A feature is important if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction. A feature is unimportant if shuffling its values leaves the model error unchanged, because in this case the model ignored the feature for the prediction.
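
The permutation-based definition above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the AnalyticOps one: the guide does not say which error metric or how many shuffles the product uses, so mean squared error and `n_repeats=10` are assumptions, and `toy_model` is a hypothetical stand-in for a trained model.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, feature_index, n_repeats=10, seed=0):
    """Mean increase in error after shuffling one feature column.

    rows is a list of tuples; predict maps one row to one prediction.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y_true, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        # Shuffle only the chosen column, leaving the others untouched.
        column = [r[feature_index] for r in rows]
        rng.shuffle(column)
        permuted = [
            r[:feature_index] + (v,) + r[feature_index + 1:]
            for r, v in zip(rows, column)
        ]
        error = mean_squared_error(y_true, [predict(r) for r in permuted])
        increases.append(error - baseline)
    return sum(increases) / n_repeats


def toy_model(row):
    """Hypothetical model that relies on feature 0 and ignores feature 1."""
    return 2.0 * row[0]


rows = [(float(i), float((i * 7) % 5)) for i in range(20)]
y_true = [2.0 * r[0] for r in rows]

importance_used = permutation_importance(toy_model, rows, y_true, feature_index=0)
importance_ignored = permutation_importance(toy_model, rows, y_true, feature_index=1)
```

Shuffling the feature the toy model relies on raises the error, so `importance_used` is positive. Shuffling the ignored feature leaves the error unchanged, so `importance_ignored` is exactly 0.0.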

To view details of a feature:

  1. Select a feature in the Features table. The right section of the page displays the Distribution histogram and Dataset statistics for the selected feature.

    The Distribution histogram displays the feature value on the x-axis and the count on the y-axis.

    https://docs.tdaoa.com/images/ug_dataset_features.png

    The dataset statistics display the following measures for the selected feature.

    • count (cnt)

    • minimum (min)

    • maximum (max)

    • mean

    • standard deviation (std)

    • skewness (skew)

    • kurtosis (kurt)

    • standard error (ste)

    • coefficient of variance (cv)

    • variance (var)

    • sum

    • uncorrected sum of squares (uss)

    • corrected sum of squares (css)

    https://docs.tdaoa.com/images/ug_dataset_stats.png
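
To make the abbreviations above concrete, the following sketch computes each measure for a small list of numbers. The formulas are common textbook definitions and are assumptions here: the guide does not state whether AnalyticOps uses sample or population variance, whether kurtosis is reported as excess kurtosis, or whether cv is a ratio or a percentage. The function name `describe` is hypothetical.

```python
import math


def describe(values):
    """Compute summary statistics for a list of numbers.

    Assumptions: sample variance (n - 1), moment-based skewness,
    excess kurtosis, and cv as the ratio std / mean.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    total = sum(values)
    mean = total / n
    uss = sum(x * x for x in values)            # uncorrected sum of squares
    css = sum((x - mean) ** 2 for x in values)  # corrected sum of squares
    var = css / (n - 1)
    std = math.sqrt(var)
    # Central moments used for skewness and kurtosis.
    m2 = css / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    nan = float("nan")
    return {
        "cnt": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": std,
        "skew": m3 / m2 ** 1.5 if m2 else nan,
        "kurt": m4 / m2 ** 2 - 3.0 if m2 else nan,
        "ste": std / math.sqrt(n),
        "cv": std / mean if mean else nan,
        "var": var,
        "sum": total,
        "uss": uss,
        "css": css,
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
```

For these sample values, `cnt` is 8, `sum` is 40, `mean` is 5.0, `uss` is 232, and `css` is 32.0. Note that `css` equals `uss` minus `cnt` times the squared mean.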

Predictions

Predictions are the output of a model after it has been trained. AnalyticOps lets you view the predictions for a dataset and a distribution plot of their values.

The Predictions table displays the following details.

Property | Description
Name | Specifies the name of the prediction.
Type | Specifies the data type of the prediction as Continuous or Categorical, depending on the type of problem being solved.

To view details of a prediction:

  1. Select a prediction in the Predictions table. The right section of the page displays the Distribution plot for the selected prediction.

    The Distribution plot displays the prediction categories on the x-axis and count on the y-axis.

    https://docs.tdaoa.com/images/ug_dataset_predictions.png
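
The data behind such a distribution plot is a count per prediction category. The sketch below computes those counts and prints a rough text version of the plot. The function name `prediction_distribution` and the category labels are hypothetical, and this is not how AnalyticOps renders the plot.

```python
from collections import Counter


def prediction_distribution(predictions):
    """Return (category, count) pairs sorted by category name."""
    counts = Counter(predictions)
    return sorted(counts.items())


# Hypothetical categorical predictions from a classification model.
predictions = ["churn", "stay", "stay", "churn", "stay"]
distribution = prediction_distribution(predictions)
# distribution == [('churn', 2), ('stay', 3)]

# Categories on one axis, counts on the other, as a simple text bar chart.
for category, count in distribution:
    print(f"{category:<6} {'#' * count} ({count})")
```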

Archive a Dataset

The archiving feature allows you to hide a dataset from the list to better organize your datasets. You can view the archived datasets and unarchive them if required.

Note: The archiving feature is available for multiple modules, including projects, models, model versions, dataset templates, datasets, and connections.

To archive a dataset:

  1. Select a dataset in the list.
    The Actions button is enabled.

  2. Click the Actions button.
    The Actions menu displays.

    https://docs.tdaoa.com/images/ug_dataset_archive_action.png
  3. Click Archive Dataset.
    The dataset is archived and hidden from the current list. A confirmation message displays at the top.

  4. To view an archived dataset, click the Show Archived option at the top.
    The archived dataset displays in the list with an Archived label.

To un-archive a dataset:

  1. Select an archived dataset in the list.
    The Actions button is enabled.

  2. Click the Actions button.
    The Actions menu displays.

    https://docs.tdaoa.com/images/ug_dataset_unarchive_action.png
  3. Click Un-Archive Dataset.
    The dataset is un-archived and the Archived label is removed. A confirmation message displays at the top.