Command Line Interface

The AnalyticOps Accelerator Command Line Interface is a tool for interacting with the AnalyticOps platform from a terminal.

Find the full and updated documentation here.

Installation

You can install the CLI with pip. Python 3.5 or later is required.

pip install aoa

CLI

The CLI supports a number of actions and guides the user through each of them.

> aoa -h
usage: aoa [-h] [--debug] [--version] {list,add,retire,run,init,clone,configure,message,connection} ...

AOA CLI

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging
  --version             Display the version of this tool

actions:
  valid actions

  {list,add,retire,run,init,clone,configure,message,connection}
    list                List projects, models, local models or datasets
    add                 Add model to working dir
    retire              Retire active deployments for all users/projects
    run                 Train and Evaluate model locally
    init                Initialize model directory with basic structure
    clone               Clone Project Repository
    configure           Configure AOA client
    message             Send a message to AOA message broker
    connection          Manage local connections

aoa configure

If you have not already done so, start by configuring the client for your user and your environment. This sets the AOA API endpoint and the authentication information for the client (basic or Kerberos). The CLI stores this configuration in the user's home directory under ~/.aoa/config.yaml. Note that if you are using Kerberos, you will need to install an additional library (see the Kerberos section).

You can also use the configure command with the --repo argument to set repository level configuration such as the projectId of the repo. This only needs to be set once and can be committed and pushed to the repository. Note that this configuration is stored in the .aoa/config.yaml of the repository directory!

> aoa configure -h
usage: aoa configure [-h] [--debug] [--repo]

optional arguments:
-h, --help  show this help message and exit
--debug     Enable debug logging
--repo      Configure the repo only
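The two configuration locations described above can be resolved from Python as follows. This is a minimal sketch: the helper names user_config_path and repo_config_path are hypothetical and are not part of the aoa package, and the sketch only checks whether the files exist because the keys stored inside them are not documented here.

```python
from pathlib import Path


def user_config_path() -> Path:
    """User-level configuration, stored under the home directory."""
    return Path.home() / ".aoa" / "config.yaml"


def repo_config_path(repo_dir: str = ".") -> Path:
    """Repository-level configuration, written by `aoa configure --repo`."""
    return Path(repo_dir) / ".aoa" / "config.yaml"


# Report which of the two files are present on this machine.
for label, path in (("user", user_config_path()), ("repo", repo_config_path())):
    status = "found" if path.is_file() else "missing"
    print("{} config: {} ({})".format(label, path, status))
```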

To configure the client for the first time, log in to the AnalyticOps browser interface and open Session Details at the top right.

From there you can copy the OAuth CLI Environment Variable that aoa needs to connect to the AnalyticOps service. Paste and execute the copied code in the command line where aoa is installed.

aoa clone

The clone command provides a convenient way to perform a git clone of the repository associated with a given project. The command can be run interactively and will allow you to select the project you wish to clone. Note that by default it clones into the current working directory, so either create an empty folder and run the command from within it, or provide the --path argument.

> aoa clone -h
usage: aoa clone [-h] [--debug] [-id PROJECT_ID] [-p PATH]

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging
  -id PROJECT_ID, --project-id PROJECT_ID
                        Id of Project to clone
  -p PATH, --path PATH  Path to clone repository to

aoa init

When you create a git repository, it is empty by default. The init command allows you to initialize the repository with the structure required by the AOA. It also adds a default README.md and HOWTO.md.

> aoa init -h
usage: aoa init [-h] [--debug]

optional arguments:
  -h, --help  show this help message and exit
  --debug     Enable debug logging

aoa list

Lists the aoa resources. When listing models (pushed / committed) or datasets, it prompts the user to select a project before showing the results. When listing local models, it shows both committed and non-committed models.

> aoa list -h
usage: aoa list [-h] [--debug] [-p] [-m] [-lm] [-t] [-d] [-c]

optional arguments:
  -h, --help           show this help message and exit
  --debug              Enable debug logging
  -p, --projects       List projects
  -m, --models         List registered models (committed / pushed)
  -lm, --local-models  List local models. Includes registered and non-
                       registered (non-committed / non-pushed)
  -t, --templates      List dataset templates
  -d, --datasets       List datasets
  -c, --connections    List local connections

All results are shown in the format

[index] (id of the resource) name

for example:

List of models for project Demo:
--------------------------------
[0] (03c9a01f-bd46-4e7c-9a60-4282039094e6) Diabetes Prediction
[1] (74eca506-e967-48f1-92ad-fb217b07e181) IMDB Sentiment Analysis
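If you need to consume this output in a script, the documented "[index] (id) name" format can be parsed with a regular expression. The following is a minimal sketch: the names ListedResource and parse_list_output are hypothetical helpers, not part of the aoa package, and the sample text is the example output shown above.

```python
import re
from typing import List, NamedTuple


class ListedResource(NamedTuple):
    index: int
    resource_id: str
    name: str


# Matches lines of the form: [index] (id of the resource) name
LINE_PATTERN = re.compile(r"^\[(\d+)\]\s+\(([^)]+)\)\s+(.*)$")


def parse_list_output(output: str) -> List[ListedResource]:
    """Extract resources from `aoa list` output, skipping header lines."""
    resources = []
    for line in output.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            index, resource_id, name = match.groups()
            resources.append(ListedResource(int(index), resource_id, name))
    return resources


sample = """\
List of models for project Demo:
--------------------------------
[0] (03c9a01f-bd46-4e7c-9a60-4282039094e6) Diabetes Prediction
[1] (74eca506-e967-48f1-92ad-fb217b07e181) IMDB Sentiment Analysis
"""

for resource in parse_list_output(sample):
    print(resource.index, resource.resource_id, resource.name)
```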

aoa add

Adding a new model to a given repository requires a number of steps. You need to create the folder structure, configuration files, generate a modelId, etc. The add command is intended to simplify this for the user. It will interactively prompt you for the model name, language, description and even allow you to use a model template to get you started. This can really help reduce the boilerplate required and ensure you get started developing quicker while maintaining a standard repository structure.

> aoa add
model name: my new model
model description: to show adding new models
These languages are supported: R, python, sql
model language: python
templates available for python: empty, pyspark, sklearn
template type (leave blank for the default one): 

aoa run

The cli can be used to validate the model training and evaluation logic locally before committing to git. This simplifies the development lifecycle and allows you to test and validate many options. It also enables you to avoid creating the dataset definitions in the AOA UI until you are ready and have a finalised version.

> aoa run -h
usage: aoa run [-h] [--debug] [-id MODEL_ID] [-m MODE] [-d DATASET_ID]
                         [-t DATASET_TEMPLATE_ID] [-ld LOCAL_DATASET] [-lt LOCAL_DATASET_TEMPLATE]
                         [-c CONNECTION]

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging
  -id MODEL_ID, --model-id MODEL_ID
                        Id of model
  -m MODE, --mode MODE  Mode (train or evaluate)
  -d DATASET_ID, --dataset-id DATASET_ID
                        Remote datasetId
  -t DATASET_TEMPLATE_ID, --dataset-template-id DATASET_TEMPLATE_ID
                        Remote datasetTemplateId
  -ld LOCAL_DATASET, --local-dataset LOCAL_DATASET
                        Path to local dataset metadata file
  -lt LOCAL_DATASET_TEMPLATE, --local-dataset-template LOCAL_DATASET_TEMPLATE
                        Path to local dataset template metadata file
  -c CONNECTION, --connection CONNECTION
                        Local connection id

You can supply all of the arguments to run this as a single command, or supply only some of them (or none) and run interactively.

For example, to run the CLI fully interactively, execute aoa run with no arguments. To run it non-interactively and train a given model with a given datasetId, you would execute

> aoa run -id <modelId> -m <mode> -d <datasetId>

And if you wanted to select the model interactively but use a specific local dataset definition, you would execute

> aoa run -ld /path/to/my_test_dataset.json
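When driving aoa run from a script, it can help to assemble the argument list programmatically. The sketch below only builds the command from the flags documented in the help output above. The function build_run_command is a hypothetical helper, not part of the aoa package, and the sketch does not execute anything.

```python
import shlex
from typing import List, Optional


def build_run_command(
    model_id: Optional[str] = None,
    mode: Optional[str] = None,
    dataset_id: Optional[str] = None,
    local_dataset: Optional[str] = None,
    connection: Optional[str] = None,
) -> List[str]:
    """Build an `aoa run` argument list; omitted options are left to the CLI."""
    command = ["aoa", "run"]
    options = [
        ("-id", model_id),
        ("-m", mode),
        ("-d", dataset_id),
        ("-ld", local_dataset),
        ("-c", connection),
    ]
    for flag, value in options:
        if value is not None:
            command.extend([flag, value])
    return command


# Model selected interactively, dataset taken from a local definition file.
command = build_run_command(local_dataset="/path/to/my_test_dataset.json")
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a machine where aoa is installed and configured.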

aoa retire

The CLI can be used to retire deployments and to archive projects. Deployments can be retired for a specific project id, for projects whose names match a filter, or for all active deployments. In addition, those projects can also be archived.

> aoa retire -h
usage: aoa retire [-h] [--debug] [-A] [-a] [-r REGEX] [-p PROJECT]

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging
  -A, --all             Retire all
  -a, --archive         Retire and archive project
  -r REGEX, --regex REGEX
                        Filter project name by regex
  -p PROJECT, --project PROJECT
                        Retire from specified project

The archive (-a) and regex (-r) arguments can be combined with the project (-p) and retire all (-A) arguments. However, retire all (-A) takes precedence over project (-p).
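The precedence rule can be illustrated with a small argparse sketch that mirrors the documented flags. This is not the actual aoa implementation: build_retire_parser and describe_scope are hypothetical names, and the behaviour when neither -A nor -p is given is not documented here, so the sketch simply reports that no scope was given.

```python
import argparse


def build_retire_parser() -> argparse.ArgumentParser:
    """Parser that mirrors the flags shown in the `aoa retire -h` output."""
    parser = argparse.ArgumentParser(prog="aoa retire")
    parser.add_argument("--debug", action="store_true", help="Enable debug logging")
    parser.add_argument("-A", "--all", action="store_true", help="Retire all")
    parser.add_argument("-a", "--archive", action="store_true", help="Retire and archive project")
    parser.add_argument("-r", "--regex", help="Filter project name by regex")
    parser.add_argument("-p", "--project", help="Retire from specified project")
    return parser


def describe_scope(args: argparse.Namespace) -> str:
    """Describe what would be retired; -A is checked first, so it wins over -p."""
    if args.all:
        scope = "all projects"
    elif args.project:
        scope = "project " + args.project
    else:
        scope = "no scope given"
    if args.regex:
        scope += " matching /" + args.regex + "/"
    if args.archive:
        scope += ", then archive"
    return scope


parser = build_retire_parser()
print(describe_scope(parser.parse_args(["-A", "-p", "demo"])))
print(describe_scope(parser.parse_args(["-p", "demo", "-a"])))
```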

aoa connection

For security reasons, the connection credentials stored in the core service cannot be accessed locally through the client. To work around this, users can list, create, remove and export local connection credentials for use with the run command.

> aoa connection -h
usage: aoa connection [-h] {list,add,remove,export} ...

optional arguments:
  -h, --help         show this help message and exit

actions:
  valid actions

  {list,add,remove,export}
    list             List all local connections
    add              Add a local connection
    remove           Remove a local connection
    export           Export a local connection to be used as a shell script

aoa connection list

List all local connections.

> aoa connection list -h
usage: aoa connection list [-h] [--debug]

optional arguments:
  -h, --help  show this help message and exit
  --debug     Enable debug logging

aoa connection add

Add a local connection.

> aoa connection add -h
usage: aoa connection add [-h] [--debug] [-n NAME] [-H HOST] [-u USERNAME] [-p PASSWORD]

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging
  -n NAME, --name NAME  Connection name
  -H HOST, --host HOST  Connection host
  -u USERNAME, --username USERNAME
                        Connection username
  -p PASSWORD, --password PASSWORD
                        Connection password

aoa connection remove

Remove a local connection.

> aoa connection remove -h
usage: aoa connection remove [-h] [--debug] [-c CONNECTION]

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging
  -c CONNECTION, --connection CONNECTION
                        Local connection id

aoa connection export

Export a local connection to be used as a shell script.

> aoa connection export -h
usage: aoa connection export [-h] [--debug] [-c CONNECTION]

optional arguments:
  -h, --help            show this help message and exit
  --debug               Enable debug logging
  -c CONNECTION, --connection CONNECTION
                        Local connection id

pyspark

When using the aoa CLI to train and evaluate pyspark models, there are a few additional points to be aware of. To run a Spark model, the CLI configures PYSPARK_SUBMIT_ARGS, which is what Spark uses when creating the Spark context in the model code. We also use the findspark library to find and configure Spark based on the SPARK_HOME environment variable.

PYSPARK_SUBMIT_ARGS="--master <master> <args> --py-files <modules.zip> $AOA_SPARK_CONF"

The master and args come from the same location that the main AOA automation uses, i.e., model.json -> resources -> training.

As you can see, the AOA_SPARK_CONF environment variable is appended to the end of PYSPARK_SUBMIT_ARGS, which means you can override any of the other values that go before it. You can specify any Spark configuration option you want here and it will be passed to Spark.

As an example, suppose you use conda pack with pyspark to ensure that the Python libraries you use on the driver node are automatically available across the cluster with the job. You can add this information to AOA_SPARK_CONF so that it is applied automatically when running via the CLI. The variable can be set in the user's bash profile so that it does not need to be set manually every time, whether in a standard data science environment or on a laptop.

AOA_SPARK_CONF="--conf spark.pyspark.driver.python=python --conf spark.pyspark.python=./environment/bin/python --archives conda-env.tar.gz#environment"
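The way PYSPARK_SUBMIT_ARGS is composed can be sketched in Python as follows. This only reproduces the template shown in this section and is not the actual aoa implementation. The function build_pyspark_submit_args and the example values ("yarn", "--deploy-mode client", "modules.zip") are assumptions for illustration.

```python
import os
from typing import Optional


def build_pyspark_submit_args(
    master: str,
    args: str,
    py_files: str,
    spark_conf: Optional[str] = None,
) -> str:
    """Compose the submit arguments, with AOA_SPARK_CONF appended last."""
    if spark_conf is None:
        # Fall back to the environment variable; an unset variable adds nothing.
        spark_conf = os.environ.get("AOA_SPARK_CONF", "")
    parts = ["--master", master, args, "--py-files", py_files, spark_conf]
    # Drop empty pieces so the result has no doubled spaces.
    return " ".join(part for part in parts if part)


submit_args = build_pyspark_submit_args(
    master="yarn",
    args="--deploy-mode client",
    py_files="modules.zip",
    spark_conf="--conf spark.pyspark.python=./environment/bin/python",
)
print(submit_args)
```

Because the extra configuration is placed at the end of the string, any option repeated there comes after the earlier value for the same option.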

Client API

We have a client implementation for all entities exposed in the AOA API. We provide the RESTful and RPC client usage for this. We’ll show an example of the Dataset API here, but the same applies for all.

By default, creating an instance of AoaClient() uses the user's aoa configuration stored in ~/.aoa/config.yaml. You can override these values by passing the relevant constructor arguments or by setting environment variables.

from aoa import AoaClient
from aoa import DatasetApi


client = AoaClient()
client.set_project_id("23e1df4b-b630-47a1-ab80-7ad5385fcd8d")

dataset_api = DatasetApi(aoa_client=client)

Now, find all datasets or a specific dataset

import pprint

datasets = dataset_api.find_all()
pprint.pprint(datasets)

dataset = dataset_api.find_by_id("11e1df4b-b630-47a1-ab80-7ad5385fcd8c")
pprint.pprint(dataset)

Add a dataset

dataset_definition = {
    "name": "my dataset",
    "description": "adding sample dataset",
    "metadata": {
        "url": "http://nrvis.com/data/mldata/pima-indians-diabetes.csv",
        "test_split": "0.2"
    }
}

dataset = dataset_api.save(dataset=dataset_definition)
pprint.pprint(dataset)

Kerberos

If you are using Kerberos, you will need to install some libraries separately. We do not include them as a default dependency because they have a large dependency stack and are not trivial to install, which would be a burden for non-Kerberos installations, so we leave this to the specific environment. Note that on OSX, you should use version 1.1.14 of pykerberos. On Linux, the required version may vary.

First install the libraries with:

sudo apt update && sudo apt install -y krb5-multidev

Then install or reinstall the package with the kerberos option:

pip install --force-reinstall --upgrade aoa[kerberos]

NOTE: additional libraries may be required on the host OS for Kerberos to be fully functional.