Datasets¶
AnalyticOps allows you to create datasets within dataset templates. You can group all datasets related to a use case together in a dataset template. A dataset inherits parameters from the dataset template and you can update them if required. The scope of a dataset can be Training or Evaluation. The training dataset is used to train the model, and the evaluation dataset is used to see how well the model performs on data it has not seen.
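The inheritance described above can be pictured as a dictionary merge. The following is a minimal sketch only: the parameter names (`query`, `tags`, `custom_properties`) and the `build_dataset` helper are hypothetical and do not represent the AnalyticOps API or its storage format.

```python
# Hypothetical template parameters; the real template schema may differ.
template_params = {
    "query": "SELECT * FROM demo_table",
    "tags": ["churn"],
    "custom_properties": {"sample_rate": "0.1"},
}


def build_dataset(template, name, scope, overrides=None):
    """Return dataset parameters: template defaults plus explicit overrides."""
    if scope not in ("Training", "Evaluation"):
        raise ValueError(f"unknown scope: {scope}")
    dataset = dict(template)         # inherit every parameter from the template
    dataset.update(overrides or {})  # replace only what this dataset changes
    dataset["name"] = name
    dataset["scope"] = scope
    return dataset


# A training dataset that keeps the template query unchanged.
train_ds = build_dataset(template_params, "churn_train", "Training")

# An evaluation dataset that overrides the inherited query.
eval_ds = build_dataset(
    template_params,
    "churn_eval",
    "Evaluation",
    overrides={"query": "SELECT * FROM demo_table_holdout"},
)
```

In this sketch the template itself is never modified, so several datasets for the same use case can diverge independently.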
The Datasets list in the AnalyticOps UI organizes datasets and simplifies their creation and management. This chapter covers the following details:
Create a Dataset¶
To create a new dataset:
Open a Dataset Template. For details, see Navigate a Dataset Template. The list of datasets added to the dataset template displays in the Work area.
Click the Create Dataset button. The Create Dataset dialog that displays depends entirely on the Catalog Type (Vantage or None) of the selected dataset template.
For a dataset template with Catalog Type as None, you can set the following properties in Step 1 - Basic:
Property | Description |
---|---|
Name | Specifies the dataset name. |
Description | Specifies the description of the dataset. |
Scope | Specifies the scope of the dataset. Training: The dataset will be used for model training. Evaluation: The dataset will be used for model evaluation. |
Query | The query defined for the template displays here. You can update it if required. |
Tags | Allows you to add tags to the dataset. |
For a dataset template with Catalog Type as None, you can set the following properties in Step 2 - CONFIG for the defined scope:
Property | Description |
---|---|
Custom Properties | The custom properties defined in the template display here. You can add or update custom properties as key/value pairs. |
JSON | Lets you write the custom properties as JSON instead of defining them as key/value pairs. |
Query Preview | Shows a read-only preview of the query defined in the Basic tab. |
Click Create to save the dataset.
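The two ways of entering custom properties in the CONFIG step, key/value pairs and JSON, carry the same information. As a rough illustration, the sketch below writes a set of key/value pairs as JSON and parses it back. The property names are hypothetical, and the exact JSON shape that AnalyticOps expects is an assumption.

```python
import json

# Hypothetical custom properties, as they might be entered key by key.
custom_properties = {"sample_rate": "0.1", "label_column": "churned"}

# The same properties written as a JSON document.
json_text = json.dumps(custom_properties, indent=2, sort_keys=True)
print(json_text)

# Parsing the JSON recovers the original key/value pairs.
parsed = json.loads(json_text)
```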
For a dataset template with Catalog Type as Vantage, Step 1 - BASIC looks like this:
The property definitions are the same as for a dataset template with Catalog Type as None.
For Catalog Type as Vantage, Step 1 - CATALOG is followed by Step 2 - ENTITY & TARGET, which displays the following properties:
Property | Description |
---|---|
Query | Displays the SQL query defined in the dataset template to select the catalog variables (entity and targets). |
Variables | Displays the metadata of the provided query and the selected targets. |
The displayed SQL query for entity and targets is the one defined while creating the related dataset template. You can update the SQL as long as the existing entity and target variables remain valid. To check this, click the VALIDATE button to validate the entered query. An information message appears indicating whether validation succeeded or failed.
Click the PREVIEW DATA button to retrieve a sample of the entity result from the SQL query, together with the JOIN result of the target variables and the features defined while creating the dataset template. A preview dialog displays.
Navigate to the COMBINED QUERY RESULT tab to view the combined JOIN SQL query of features and target variables. You can also view its result.
Closing this dialog returns you to the main dialog.
Click Create. The new dataset is created with the name specified in the Name field.
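To make the entity and target query and the combined JOIN query more concrete, here is a small sketch that uses Python's built-in `sqlite3` module in place of Vantage. The table names, column names, and the column check are all hypothetical. The check only illustrates the idea of confirming that an edited query still returns the expected entity and target columns, and is not how the VALIDATE button is implemented.

```python
import sqlite3

# In-memory SQLite database standing in for Vantage (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, churned INTEGER);
    CREATE TABLE customer_features (customer_id INTEGER, tenure REAL, plan TEXT);
    INSERT INTO customers VALUES (1, 0), (2, 1), (3, 0);
    INSERT INTO customer_features VALUES
        (1, 12.0, 'basic'), (2, 3.5, 'pro'), (3, 24.0, 'basic');
""")

# Entity and target query: the entity key plus the target column.
entity_target_sql = "SELECT customer_id, churned FROM customers"

# Illustrative check: does the query still return the expected variables?
cursor = conn.execute(entity_target_sql)
columns = [col[0] for col in cursor.description]
is_valid = {"customer_id", "churned"}.issubset(columns)

# Combined query: join the entity/target rows to their features.
combined_sql = f"""
    SELECT et.customer_id, f.tenure, f.plan, et.churned
    FROM ({entity_target_sql}) AS et
    JOIN customer_features AS f ON f.customer_id = et.customer_id
    ORDER BY et.customer_id
"""
rows = conn.execute(combined_sql).fetchall()
```

Each row in `rows` pairs one entity with its feature values and its target value, which is the kind of result the COMBINED QUERY RESULT tab previews.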
Edit a Dataset¶
To edit an existing dataset:
Select a dataset from the list. The Edit Dataset button becomes enabled.
Click the Edit Dataset button. The Edit Dataset dialog displays according to the catalog type (Vantage or None), and you can modify the selected dataset.
After modifying the desired properties, click Update. The dataset is saved with all the latest modifications.
View Dataset Details¶
You can view the details of a dataset including the defined parameters and the utilization of the dataset in training or evaluation jobs based on the defined scope.
To view details of a dataset:
In the Datasets list, click the Details icon for a dataset.
For a dataset template with Catalog Type as Vantage, the following Dataset details page displays.
For a dataset template with Catalog Type as None, the following page displays.
Dataset Properties¶
To view the dataset properties:
Click the Basic tab. The properties defined while creating the dataset display in read-only format.
For details of properties, see Create a Dataset.
Dataset Utilization¶
To view the dataset utilization in training or evaluation jobs:
Click the Activity tab. All the jobs executed using the selected dataset display in the list. You can review the progress of each job being executed with the dataset.
The following details display for each of the jobs:
Property | Description |
---|---|
Model Version ID | Specifies the model version ID for which the job is executed. |
Model Name | Specifies the model name for which the job is executed. |
Job ID | Specifies the job ID for training or evaluation job depending on the scope of the selected dataset. |
Status | Shows the status of the job: Created, Scheduled, Trained, Evaluated, Completed, Error. For more information, see Jobs. |
Data Features | Lets you view the Data Features for the selected job. For details, see Dataset Features. |
Dataset Features¶
AnalyticOps allows you to view the dataset features and statistics for each job executed with the selected dataset.
To view the dataset features for a job:
In the Activity tab, click the Data Features icon for a job.
The Data Features page displays.
The Data Features page shows the following details.
Features¶
Features are the basic building blocks of datasets. The quality of the features in your dataset has a major impact on the quality of the insights you will gain when you use that dataset for training and evaluating models.
AnalyticOps automatically detects each feature’s data type (categorical, continuous) and performs basic statistical analysis (mean, median, standard deviation, and more) on each feature. Additionally, AnalyticOps automatically generates a histogram for each feature.
The Features table displays the following details.
Property | Description |
---|---|
Name | Specifies the name of the feature. |
Type | Specifies the data type of feature as Continuous or Categorical. |
Importance | Specifies the feature importance, measured as the increase in the model's prediction error after the feature's values are permuted, which breaks the relationship between the feature and the true outcome. A feature is important if shuffling its values increases the model error, because in this case the model relied on the feature for the prediction. A feature is unimportant if shuffling its values leaves the model error unchanged, because in this case the model ignored the feature for the prediction. |
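The permutation-based importance described in the table can be sketched in a few lines of plain Python. This is an illustration of the general technique under stated assumptions, not the AnalyticOps implementation: the error metric (mean squared error), the number of repeats, and the toy model are all choices made for this example.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, feature_index,
                           n_repeats=10, seed=0):
    """Average increase in error after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y_true, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        # Shuffle only the chosen column; all other columns stay in place.
        column = [r[feature_index] for r in rows]
        rng.shuffle(column)
        permuted = [
            r[:feature_index] + (v,) + r[feature_index + 1:]
            for r, v in zip(rows, column)
        ]
        error = mean_squared_error(y_true, [predict(r) for r in permuted])
        increases.append(error - baseline)
    return sum(increases) / n_repeats


# Toy model (an assumption): uses feature 0 and ignores feature 1.
def predict(row):
    return 2.0 * row[0]


rows = [(float(i), float(i % 3)) for i in range(20)]
y_true = [2.0 * r[0] for r in rows]

importance_used = permutation_importance(predict, rows, y_true, feature_index=0)
importance_ignored = permutation_importance(predict, rows, y_true, feature_index=1)
```

Shuffling feature 0 raises the error, so `importance_used` is positive, while shuffling feature 1 leaves the predictions unchanged, so `importance_ignored` is zero.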
To view details of a feature:
Select a feature in the Features table. The right section of the page displays the Distribution histogram and Dataset statistics for the selected feature.
The Distribution histogram displays the feature value on the x-axis and the count on the y-axis.
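For a continuous feature, a histogram of this kind can be derived by counting values per bin. The sketch below assumes equal-width bins and a caller-chosen bin count. How AnalyticOps actually chooses bins is not stated here, so treat both as assumptions.

```python
def histogram(values, bins=5):
    """Return (bin_edges, counts) for equal-width bins over the value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value falls into the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 10]
edges, counts = histogram(values, bins=3)
```

Here `edges` supplies the x-axis positions (feature values) and `counts` supplies the y-axis heights.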
The dataset statistics display the following measures for the selected feature.
count (cnt)
minimum (min)
maximum (max)
mean
standard deviation (std)
skewness (skew)
kurtosis (kurt)
standard error (ste)
coefficient of variation (cv)
variance (var)
sum
uncorrected sum of squares (uss)
corrected sum of squares (css)
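As a reference for how these measures relate to each other, the following sketch computes them for a list of numbers. The exact formulas used by AnalyticOps are not given in this chapter, so the conventions here are assumptions: sample variance (dividing by n - 1), moment-based skewness, excess kurtosis, and a coefficient of variation expressed as a ratio rather than a percentage. The function requires at least two non-identical values and a non-zero mean.

```python
import math


def describe(values):
    n = len(values)
    total = sum(values)
    mean = total / n
    uss = sum(v * v for v in values)            # uncorrected sum of squares
    css = sum((v - mean) ** 2 for v in values)  # corrected sum of squares
    var = css / (n - 1)                         # sample variance (assumed)
    std = math.sqrt(var)
    # Central moments used for skewness and kurtosis (population form, assumed).
    m2 = css / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "cnt": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": std,
        "skew": m3 / m2 ** 1.5,
        "kurt": m4 / m2 ** 2 - 3.0,  # excess kurtosis
        "ste": std / math.sqrt(n),   # standard error of the mean
        "cv": std / mean,            # coefficient of variation as a ratio
        "var": var,
        "sum": total,
        "uss": uss,
        "css": css,
    }


stats = describe([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The uncorrected and corrected sums of squares differ by n times the squared mean, which is a quick consistency check on any implementation.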
Predictions¶
Predictions refer to the output of a model after it has been trained. AnalyticOps lets you view the predictions of a dataset and a distribution plot of their values.
The Predictions table displays the following details.
Property | Description |
---|---|
Name | Specifies the name of the prediction. |
Type | Specifies the data type of prediction as Continuous or Categorical depending on the type of problem being solved. |
To view details of a prediction:
Archive a Dataset¶
The archiving feature allows you to hide a dataset from the list to better organize your datasets. You can view the archived datasets and unarchive them if required.
Note: The Archiving feature is provided with multiple modules including projects, models, model versions, dataset templates, datasets, and connections.
To archive a dataset:
Select a dataset in the list. The Actions button enables.
Click the Actions button. The Actions menu displays.
Click Archive Dataset. The dataset is archived and hidden from the current list. A confirmation message displays at the top.
To view an archived dataset, click the Show Archived option at the top. The archived dataset displays in the list along with an Archived label.
To unarchive a dataset: