Introduction to Azure Machine Learning

Data Science Course

Exercise - Getting started with Azure Machine Learning

Now it's your chance to try out Azure Machine Learning for yourself.

In this exercise, you will:

  • Provision an Azure Machine Learning workspace.
  • Create a compute instance.
  • Run a Python-based experiment.

Instructions

Follow these instructions to complete the exercise.

  1. Azure Machine Learning (Azure ML) is a Microsoft Azure-based service for running data science and machine learning workloads at scale in the cloud. To use Azure Machine Learning, you will need an Azure subscription. If you do not already have an Azure subscription, sign up for a free trial at https://azure.microsoft.com.
  2. Complete the Getting Started with Azure Machine Learning lab from the Lab repo in GitHub at https://aka.ms/mslearn-aml-labs (also included inline).

Getting Started with Azure Machine Learning

In this exercise, you will create an Azure Machine Learning workspace and a compute instance, and clone the lab files to your workspace. You'll then run a simple Python experiment in your workspace.

Create an Azure Machine Learning Workspace

As its name suggests, a workspace is a centralized place to manage all of the Azure ML assets you need to work on a machine learning project.

  1. Sign in to the Azure portal and create a new resource: search for "machine learning" and select Machine Learning. Specify a unique workspace name, create a new resource group in the region nearest to your location, and select the Enterprise workspace edition.
    Note:

    Basic edition workspaces have lower cost, but don't include capabilities like Auto ML, the Visual Designer, and data drift monitoring. For more details, see Azure Machine Learning pricing.

  2. When the workspace and its associated resources have been created, view the workspace in the portal. You can manage workspace assets in the Azure portal, but much of what it shows relates to managing general Azure resources and is irrelevant to data scientists. An alternative, Azure ML-specific web interface for managing workspaces is available.
  3. In the Azure portal blade for your Azure Machine Learning workspace, click the link to launch Azure Machine Learning studio, or open https://ml.azure.com in a new browser tab. If prompted, sign in using the Microsoft account associated with your Azure subscription and select your Azure subscription and workspace.
  4. View the Azure Machine Learning studio interface for your workspace - you can manage all of the assets in your workspace from here.

Create a Compute Instance

You can perform many machine learning tasks in the studio interface, but it's also important to be able to script configuration tasks and data experiments so that they are easier to repeat and automate. A compute instance provides a virtual machine that you can use as a hosted development workstation for this purpose.

  1. In the Azure Machine Learning studio web interface for your workspace, view the Compute page. This is where you'll manage all the compute targets for your data science activities.
  2. On the Compute Instances tab, add a new compute instance, giving it a unique name and using the STANDARD_DS3_V2 VM type template. You'll use this VM as a development environment.
  3. If necessary, click Refresh periodically until the compute instance you created has started. Then click its Jupyter link to open Jupyter Notebooks on the VM.
  4. In the notebook environment, on the New menu, click Terminal. This will open a new tab with a command shell.
  5. The Azure Machine Learning SDK is already installed in the compute instance image, but it's worth ensuring you have the latest version, with the optional packages you'll need in this lab; so enter the following command to update the SDK packages:
    More Information:

    For more details about installing the Azure ML SDK and its optional components, see the Azure ML SDK Documentation.

  6. Next, run the following commands to change the current directory to the Users directory, and retrieve the notebooks you will use in this lab:
  7. After the command has completed, close the terminal tab and view the home page in your Jupyter notebook file explorer. Then open the Users folder - it should contain an mslearn-aml-labs folder, containing the files you will use in the rest of this lab.

Use the Azure Machine Learning SDK in a Notebook

  1. In the Users/mslearn-aml-labs folder, open the 01-Getting_Started_with_Azure_ML.ipynb notebook. Then read the notes in the notebook (also included inline), running each code cell in turn.
  2. When you have finished the lab, close all Jupyter tabs and Stop your compute instance to avoid incurring unnecessary costs.

Azure Machine Learning

Azure Machine Learning (Azure ML) is a cloud-based service for creating and managing machine learning solutions. It's designed to help data scientists leverage their existing data processing and model development skills and frameworks, and help them scale their workloads to the cloud. The Azure ML SDK for Python provides classes you can use to work with Azure ML in your Azure subscription.

Check the Azure ML SDK Version

Let's start by importing the azureml-core package and checking the version of the SDK that is installed.
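
The notebook cell for this step is not reproduced here. As a minimal sketch: with the SDK installed, the version should be available as azureml.core.VERSION. The variant below reads the installed package metadata instead, so it also runs on a machine without the SDK and reports that it is missing. The helper name installed_version is illustrative and not part of the SDK.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


sdk_version = installed_version("azureml-core")
if sdk_version is None:
    print("azureml-core is not installed in this environment")
else:
    # With the SDK installed, azureml.core.VERSION should report the same value.
    print("Ready to use Azure ML", sdk_version)
```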

Connect to Your Workspace

All experiments and associated resources are managed within your Azure ML workspace. You can connect to an existing workspace, or create a new one using the Azure ML SDK.

In most cases, you should store the workspace configuration in a JSON configuration file. This makes it easier to reconnect without needing to remember details like your Azure subscription ID. You can download the JSON configuration file from the blade for your workspace in the Azure portal, but if you're using a Compute Instance within your workspace, the configuration file has already been downloaded to the root folder.

The code below uses the configuration file to connect to your workspace. The first time you run it in a notebook session, you'll be prompted to sign in to Azure by clicking the https://microsoft.com/devicelogin link and entering an automatically generated code. After you have successfully signed in, you can close the browser tab that was opened and return to this notebook.
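
The connection cell is not reproduced here, so the following is a sketch rather than the lab's own code. It assumes the configuration file is named config.json and holds the subscription ID, resource group, and workspace name. The helper names read_workspace_config and connect_to_workspace are illustrative. Workspace.from_config is the SDK call that reads the file and connects.

```python
import json
from pathlib import Path

# Keys expected in the workspace configuration file (assumed layout).
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def read_workspace_config(path="config.json"):
    """Load a workspace configuration file and check its required keys."""
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config file is missing: " + ", ".join(missing))
    return config


def connect_to_workspace():
    """Connect through the SDK; needs azureml-core and a reachable config file."""
    # Imported lazily so the helper above can run without the SDK installed.
    from azureml.core import Workspace

    ws = Workspace.from_config()
    print(ws.name, "loaded")
    return ws
```

On a compute instance, calling connect_to_workspace() should return a workspace object that later cells can reuse as ws.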

Run an Experiment

One of the most fundamental tasks that data scientists need to perform is to create and run experiments that process and analyze data. In this exercise, you'll learn how to use an Azure ML experiment to run Python code and record values extracted from data. In this case, you'll use a simple dataset that contains details of patients who have been tested for diabetes. You'll run an experiment to explore the data, extracting statistics, visualizations, and data samples. Most of the code you'll use is fairly generic Python, such as you might run in any data exploration process. However, with the addition of a few lines, the code uses an Azure ML experiment to log details of the run.
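
The experiment cell is not reproduced here. The sketch below illustrates the idea of generic exploration code plus a few logging lines. It uses a tiny made-up dataset with hypothetical column names and a local stand-in class (LocalRun) in place of a real run, so it works without a workspace. The commented lines at the end show how the same function might be driven by an SDK run created with Experiment.start_logging. The experiment name there is an assumption.

```python
import csv
import io
from collections import Counter

# Made-up sample data; the column names are assumptions for illustration.
SAMPLE_CSV = """PatientID,Age,Diabetic
1,34,0
2,51,1
3,29,0
4,62,1
5,45,0
"""


class LocalRun:
    """Stand-in that mimics the log(name, value) shape of an Azure ML run."""

    def __init__(self):
        self.metrics = {}

    def log(self, name, value):
        self.metrics[name] = value


def explore(rows, run):
    """Generic exploration code; only the run.log calls are Azure ML specific."""
    run.log("observations", len(rows))
    label_counts = Counter(row["Diabetic"] for row in rows)
    for label, count in sorted(label_counts.items()):
        run.log("Label:" + label, count)
    return label_counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
run = LocalRun()
explore(rows, run)
print(run.metrics)

# With the SDK and a connected workspace ws, the run would instead come from:
#   from azureml.core import Experiment
#   experiment = Experiment(workspace=ws, name="diabetes-experiment")
#   run = experiment.start_logging()
#   explore(rows, run)
#   run.complete()
```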

View Experiment Results

After the experiment has finished, you can use the run object to get information about the run and its outputs:
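
The original cell is missing here. As a hedged sketch, the SDK run object offers get_metrics and get_file_names, and a small helper can gather both. FakeRun is a hypothetical stand-in with made-up values so the example runs without a workspace. In the notebook you would pass the real run instead.

```python
import json


def describe_run(run):
    """Collect the logged metrics and the file names recorded for a run."""
    return {
        "metrics": run.get_metrics(),
        "files": run.get_file_names(),
    }


class FakeRun:
    """Hypothetical stand-in exposing the two methods used above."""

    def get_metrics(self):
        return {"observations": 5}

    def get_file_names(self):
        return ["outputs/sample.csv"]


print(json.dumps(describe_run(FakeRun()), indent=2))
```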

In Jupyter Notebooks, you can use the RunDetails widget to get a better visualization of the run details, while the experiment is running or after it has finished.

Note that the RunDetails widget includes a link to view the run in Azure Machine Learning studio. Click this to open a new browser tab with the run details (you can also just open Azure Machine Learning studio and find the run on the Experiments page). When viewing the run in Azure Machine Learning studio, note the following:

  • The Properties tab contains the general properties of the experiment run.
  • The Metrics tab enables you to select logged metrics and view them as tables or charts.
  • The Images tab enables you to select and view any images or plots that were logged in the experiment (in this case, the Label Distribution plot).
  • The Child Runs tab lists any child runs (in this experiment there are none).
  • The Outputs tab shows the output files generated by the experiment.
  • The Logs tab shows any logs that were generated by the compute context for the experiment (in this case, the experiment was run inline so there are no logs).
  • The Snapshots tab contains all files in the folder where the experiment code was run (in this case, everything in the same folder as this notebook).
  • The Raw JSON tab shows a JSON representation of the experiment details.
  • The Explanations tab is used to show model explanations generated by the experiment (in this case, there are none).

Run an Experiment Script

In the previous example, you ran an experiment inline in this notebook. A more flexible solution is to create a separate script for the experiment, store it in a folder along with any other files it needs, and then use Azure ML to run the experiment based on the script in that folder.

First, let's create a folder for the experiment files, and copy the data into it:
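
The cell for this step is missing. A minimal sketch using only the standard library follows. The folder name diabetes-experiment-files and the file name diabetes.csv are assumptions, and the demonstration works in a temporary directory with a made-up file so that it can run anywhere.

```python
import os
import shutil
import tempfile


def prepare_experiment_folder(folder_name, data_file):
    """Create the experiment folder if needed and copy the data file into it."""
    os.makedirs(folder_name, exist_ok=True)
    destination = os.path.join(folder_name, os.path.basename(data_file))
    shutil.copy(data_file, destination)
    return destination


# Demonstration in a temporary directory with a tiny made-up CSV file.
work_dir = tempfile.mkdtemp()
source_file = os.path.join(work_dir, "diabetes.csv")
with open(source_file, "w") as f:
    f.write("PatientID,Diabetic\n1,0\n2,1\n")

experiment_folder = os.path.join(work_dir, "diabetes-experiment-files")
copied = prepare_experiment_folder(experiment_folder, source_file)
print("Copied data to", copied)
```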

Now we'll create a Python script containing the code for our experiment, and save it in the experiment folder.

Note:

Running the following cell just creates the script file - it doesn't run it!

This code is a simplified version of the inline code used before. However, note the following:

  • It uses the Run.get_context() method to retrieve the experiment run context when the script is run.
  • It loads the diabetes data from the folder where the script is located.
  • It creates a folder named outputs and writes the sample file to it - this folder is automatically uploaded to the experiment run.
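
The script itself is not reproduced here. The points above can be sketched roughly as follows, under stated assumptions: the data file is named diabetes.csv, the sample file is named sample.csv, and the helper names are illustrative. The commented lines show where Run.get_context would fit in the real script. The runnable part uses made-up data in a temporary folder.

```python
import csv
import os
import tempfile


def load_rows(script_dir, file_name):
    """Read the CSV file stored in the same folder as the script."""
    with open(os.path.join(script_dir, file_name), newline="") as f:
        return list(csv.DictReader(f))


def save_sample(rows, sample_size, output_dir="outputs"):
    """Write the first rows to the outputs folder (expects at least one row)."""
    os.makedirs(output_dir, exist_ok=True)
    sample_path = os.path.join(output_dir, "sample.csv")
    with open(sample_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows[:sample_size])
    return sample_path


# In the experiment script, the helpers might be used like this:
#   from azureml.core import Run
#   run = Run.get_context()
#   rows = load_rows(os.path.dirname(os.path.abspath(__file__)), "diabetes.csv")
#   run.log("observations", len(rows))
#   save_sample(rows, 100)
#   run.complete()

# Local demonstration in a temporary folder with made-up data.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "diabetes.csv"), "w", newline="") as f:
    f.write("PatientID,Diabetic\n1,0\n2,1\n3,0\n")

demo_rows = load_rows(demo_dir, "diabetes.csv")
demo_sample = save_sample(demo_rows, 2, output_dir=os.path.join(demo_dir, "outputs"))
print("Rows:", len(demo_rows), "Sample written to:", demo_sample)
```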

Now you're almost ready to run the experiment. There are just a few configuration issues you need to deal with:

  1. Create a Run Configuration that defines the Python code execution environment for the script - in this case, it will automatically create a Conda environment with some default Python packages installed.
  2. Create a Script Configuration that identifies the Python script file to be run in the experiment, and the environment in which to run it.

The following cell sets up these configuration objects, and then submits the experiment.
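
The cell is missing from this copy. The sketch below wraps the two configuration objects and the submission in one function. RunConfiguration, ScriptRunConfig, Experiment.submit, and wait_for_completion are SDK calls, while the function name and its parameters are illustrative. Nothing is submitted until the function is called with a connected workspace.

```python
def submit_script_experiment(ws, experiment_folder, script_name, experiment_name):
    """Submit a script-based experiment and wait for it to finish."""
    # Imported inside the function so this code can be loaded without the SDK.
    from azureml.core import Experiment, RunConfiguration, ScriptRunConfig

    # Run configuration: the Python execution environment for the script.
    run_config = RunConfiguration()

    # Script configuration: which script to run, from which folder,
    # and in which environment.
    script_config = ScriptRunConfig(
        source_directory=experiment_folder,
        script=script_name,
        run_config=run_config,
    )

    experiment = Experiment(workspace=ws, name=experiment_name)
    run = experiment.submit(config=script_config)
    run.wait_for_completion()
    return run
```

A call might look like submit_script_experiment(ws, "diabetes-experiment-files", "diabetes_experiment.py", "diabetes-experiment"), where the folder, script, and experiment names are assumptions.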

Note:

This will take a little longer to run the first time, as the Conda environment must be created.

As before, you can use the widget or the link to the experiment in Azure Machine Learning studio to view the outputs generated by the experiment, and you can also write code to retrieve the metrics and files it generated:

View Experiment Run History

Now that you've run the same experiment multiple times, you can view the history in Azure Machine Learning studio and explore each logged run. Or you can retrieve an experiment by name from the workspace and iterate through its runs using the SDK:
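
The history cell is not included here. As a sketch, an experiment can be looked up in the ws.experiments dictionary, and its get_runs method yields the logged runs. The helper below collects each run's ID and metrics. FakeRun and FakeExperiment are hypothetical stand-ins with made-up values so the example runs without a workspace.

```python
def summarize_history(experiment):
    """Return (run id, metrics) pairs for every run logged under an experiment."""
    return [
        (logged_run.id, logged_run.get_metrics())
        for logged_run in experiment.get_runs()
    ]


class FakeRun:
    """Hypothetical stand-in exposing id and get_metrics like an SDK run."""

    def __init__(self, run_id, metrics):
        self.id = run_id
        self._metrics = metrics

    def get_metrics(self):
        return self._metrics


class FakeExperiment:
    """Hypothetical stand-in exposing get_runs like an SDK experiment."""

    def get_runs(self):
        yield FakeRun("run-2", {"observations": 5})
        yield FakeRun("run-1", {"observations": 5})


for run_id, metrics in summarize_history(FakeExperiment()):
    print("Run ID:", run_id, metrics)

# With the SDK, the experiment would come from the workspace instead:
#   experiment = ws.experiments["diabetes-experiment"]  # name is an assumption
```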

Now you've seen how to use the Azure ML SDK to view the resources in your workspace and run experiments.

Clean Up

On the File menu, click Close and Halt to close this notebook. Then close all Jupyter tabs in your browser and stop your compute instance to minimize costs.