Installation

Puddle is currently supported on Microsoft Azure, and accounts are tied to each user’s Azure account. Additional cloud offerings will be added at a later date.

Azure VPC Setup Guide

This topic describes how to set up Puddle on Azure. It is divided into the following sections:

  • Puddle for Azure Architecture
  • Setting up Azure resources such as a resource group, virtual network, and so on
  • Setting up runtime dependencies such as the PostgreSQL database and Redis
  • Setting up the Puddle Application

Puddle for Azure Architecture

The image below describes the components that work together to build and run Puddle for Azure.

Puddle/Azure architecture

Set up Azure Resources

The first step is to set up Azure resources. All of these operations require you to be logged in at portal.azure.com.

Create a Resource Group

  1. After you are logged in to Azure, go to the Resource groups blade.
  2. Click Add.
  3. Fill in the form.
    • Remember the name. We will need the name in later steps.
    • Remember the location. We will need the location in later steps.
  4. Click Create.

Create a Network Security Group

  1. Search for Network security groups. Note: Do not use the option with the (classic) suffix.
  2. Click Add.
  3. Fill in the form.
    • Place the Network security group into the Resource group created in the first step.
    • Set the location of the Network security group to the same value that you specified when creating the Resource group.
    • Remember the name of the Network security group. We will need the name in later steps.
  4. Click Create.
  5. Select the newly created Network security group.
  6. Select Inbound security rules.
  7. Click Add.
  8. Set 22, 8888, 12345, 54321 as the Destination port ranges.
    • 22 is SSH. We need to open this port to be able to SSH into Virtual machines launched by Puddle.
    • 8888 is Jupyter. We need to open this port to be able to access Jupyter.
    • 12345 is the Driverless AI UI. We need to open this port to be able to access Driverless AI.
    • 54321 is H2O-3. We need to open this port to be able to access H2O-3 and H2O Flow.
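
Once a System launched by Puddle is up, one way to confirm that the inbound rule took effect is to probe the four ports from your own machine. The following is a minimal Python sketch and is not part of Puddle. The port list mirrors the rule above; the names PUDDLE_PORTS and check_ports and the 3-second timeout are assumptions made for illustration.

```python
import socket

# Ports opened by the inbound rule above, with the service behind each one.
PUDDLE_PORTS = {
    22: "SSH",
    8888: "Jupyter",
    12345: "Driverless AI",
    54321: "H2O-3 / H2O Flow",
}


def check_ports(host, ports=PUDDLE_PORTS, timeout=3.0):
    """Return {port: bool}, True when a TCP connection to the port succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection completes a TCP handshake; the socket is
            # closed again as soon as the with-block exits.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused, filtered, timed-out, and unresolvable hosts.
            results[port] = False
    return results
```

A port is reported as False both when the rule is missing and when nothing is listening yet, so run the check only against a System that has finished starting.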

Create Virtual Network and Subnet

  1. Go to the Virtual networks blade.
  2. Click Add.
  3. Fill in the form.
    • Place the Virtual network into the Resource group created in the first step.
    • Set the location of the Virtual network to the same value that you specified when creating the Resource group.
    • Remember the name of the Virtual network. We will need the name in later steps.
    • Remember the name of the Subnet. We will need the name in later steps.

Create a Virtual Machine

  1. Go to the Virtual Machines blade.

  2. Click Add.

  3. Fill in the form.

    • Place the Virtual machine into the Resource group created in the first step.
    • Set the location of the Virtual machine to the same location that you specified in the first step.
    • Use Ubuntu Server 18.04 LTS as Image.
    • Select Standard B2s as Size.
    • Select the Authentication type that best suits your needs. Note that the SSH public key is strongly recommended.
  4. Click Next: Disks.

  5. Under OS disk type, select Standard SSD.

  6. Click Next: Networking.

  7. Under Virtual network, specify the Virtual network created in the previous step.

  8. Under Subnet, specify the Subnet created in the previous step.

  9. Set NIC network security group to Advanced.

  10. Click Review + create.

  11. Click Create.

  12. Wait for provisioning to complete.

  13. Go to the Virtual machines blade.

  14. Select the newly created Virtual machine.

  15. Click on Configure next to DNS name.

  16. Set Static under Assignment.

  17. Pick a DNS name. We will need this DNS name later.

  18. Click Save.

  19. Go back to the newly created Virtual machine.

  20. Select Networking.

  21. There should be two security groups available. Make sure the one that you did not create explicitly has rules that allow inbound traffic on ports 22, 80, and 443.

    • If you do not want to allow HTTP connections, then port 80 should not be allowed.

    • If you need to add a rule, then click Add inbound port rule and fill in the form.

      • Set 22, 443, and possibly 80 under Destination port ranges.
      • Click Add.

Create App Registration and Enterprise Application

  1. Go to the Azure Active Directory blade.
  2. Select App Registrations.
  3. Click New registration.
  4. Fill in the form.
    • In Supported account types, select Accounts in this organizational directory only (msmarketplaceh2o (Default Directory)).
    • Remember the name of the app registration. We will need the name in later steps.
    • This will create the Enterprise application as well.
  5. Click Register.
  6. Select Manifest.
  7. Set the “appRoles” key to the following value:
"appRoles": [
    {
        "allowedMemberTypes": [
            "User"
        ],
        "description": "Users have basic set of permissions in Puddle.",
        "displayName": "User",
        "id": "77e5fac4-3f2a-497d-a70f-1c3e9ac72c83",
        "isEnabled": true,
        "lang": null,
        "origin": "Application",
        "value": "User"
    },
    {
        "allowedMemberTypes": [
            "User"
        ],
        "description": "Administrators have extended permissions in Puddle.",
        "displayName": "Administrator",
        "id": "d1c2ade8-98f8-45fd-aa4a-6d06b947c66f",
        "isEnabled": true,
        "lang": null,
        "origin": "Application",
        "value": "Administrator"
    }
]

Each role definition in this manifest must have a different valid GUID for the id key. We will need the id of the Administrator role in later steps.
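
Because every role needs its own valid GUID, it can be convenient to generate the appRoles value rather than edit the ids by hand. The sketch below is one possible way to do that with the Python standard library. The function names (make_app_role, build_app_roles, admin_role_id, render_manifest_fragment) are hypothetical helpers, not part of Puddle or Azure, and the field values simply mirror the manifest shown above.

```python
import json
import uuid


def make_app_role(display_name, description):
    """Build one appRoles entry with a freshly generated GUID as its id."""
    return {
        "allowedMemberTypes": ["User"],
        "description": description,
        "displayName": display_name,
        "id": str(uuid.uuid4()),  # random version-4 GUID
        "isEnabled": True,
        "lang": None,
        "origin": "Application",
        "value": display_name,
    }


def build_app_roles():
    """Return the two roles used in this guide and check the ids differ."""
    roles = [
        make_app_role("User", "Users have basic set of permissions in Puddle."),
        make_app_role(
            "Administrator",
            "Administrators have extended permissions in Puddle.",
        ),
    ]
    ids = [role["id"] for role in roles]
    if len(set(ids)) != len(ids):
        raise ValueError("appRoles ids must be unique")
    return roles


def admin_role_id(roles):
    """Return the id of the Administrator role, which is needed in later steps."""
    return next(r["id"] for r in roles if r["value"] == "Administrator")


def render_manifest_fragment(roles):
    """Serialize the roles as JSON that can be pasted into the manifest."""
    return json.dumps({"appRoles": roles}, indent=4)
```

Printing render_manifest_fragment(build_app_roles()) yields JSON with the same shape as the value above, and admin_role_id returns the id to write down for later.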

  1. Click Save.
  2. Select Authentication.
  3. Add Redirect URI with a value of https://<Puddle Server DNS>/login-azure-callback.
  4. Enable both Access tokens and ID Tokens.
  5. Click Save.
  6. Go to the Azure Active Directory blade.
  7. Select Enterprise applications.
  8. Select the newly created application.
  9. Select the Properties blade.
  10. Set Yes for the Enabled for users to sign-in? option.
  11. If you want only selected users to be able to log in, then set User assignment required? to Yes.
  12. Set Visible to users? to Yes.
  13. Click Save.

Create First Administrator

  1. Go to the Azure Active Directory blade.
  2. Select Enterprise applications.
  3. Select Users and groups.
  4. Click Add user.
  5. Select the desired user.
  6. Click Select.
  7. Assign the Administrator role.
  8. Click Select.
  9. Click Assign.

These steps can be used to add as many Administrators as required. To revoke Administrator access for a user, assign the User role to that user instead.

Add Roles to Service Principal

  1. Go to the Resource groups blade.
  2. Select the newly created Resource group.
  3. Select Access control (IAM).
  4. Select Role assignments.
  5. Click Add.
  6. Set Owner as Role.
  7. Enter the name of the App registration in the Select field.
  8. Click Save.
  9. Click Add.
  10. Set User Access Administrator as Role.
  11. Enter the name of the App registration in the Select field.
  12. Click Save.

Runtime Dependencies

After the basic setup of Azure resources is completed, the next step is to set up runtime dependencies for Puddle.

PostgreSQL Database

Run the following steps to provision the PostgreSQL database.

  1. Search for Azure Database for PostgreSQL servers.
  2. Click Add.
  3. Fill in the form.
    • Place the PostgreSQL database into the Resource group created in the first step.
    • Set the location of the PostgreSQL database to the location that you specified in the first step.
    • Remember the Server admin login name and Password. We will need them in later steps.
    • Set the version to 10.
    • Click on Pricing tier.
      • Select Basic.
      • Set vCore to 2 vCores.
      • Set storage to 50GB.
  4. Click Create to begin provisioning.

Provisioning of the PostgreSQL database will take a few minutes, but we can continue with other steps.

Redis

Run the following steps to provision Redis.

  1. Search for Azure Cache for Redis.
  2. Click Add.
  3. Fill in the form.
    • Place the Redis cache into the Resource group created in the first step.
    • Set the location of the Redis cache to the same location that you specified in the first step.
    • Set Pricing tier to Standard C1.
  4. Click Create.

Provisioning of the Redis cache will take a few minutes, but we can continue with other steps.

Puddle Application

For this part, we will need to create a Virtual machine where the Puddle application will run. Then we will configure nginx and create a configuration file for Puddle. After those are complete, we can start Puddle.

Additional PostgreSQL Configuration

  1. Search for Azure Database for PostgreSQL servers.
  2. Select the newly created database.
  3. Select Connection security.
  4. Click Add client IP.
  5. Use the Public IP of the Virtual machine as the Start IP and End IP.
  6. Click Save.

Review the Resource Group

The newly created Resource group should now contain these items (some of them are created implicitly):

  • Azure Database for PostgreSQL server
  • Azure Cache for Redis
  • Virtual machine
  • Disk
  • Network interface
  • Public IP address
  • Network security group
  • Storage account
  • Network security group

Install Terraform and Packer

Puddle needs Terraform and Packer for provisioning and destroying the resources in Azure. To install Terraform and Packer, run the following. Note: Do not use Terraform from snap.

# unzip is required to extract both archives, so install it first
sudo apt install unzip

# Terraform 0.11.13
wget https://releases.hashicorp.com/terraform/0.11.13/terraform_0.11.13_linux_amd64.zip
unzip terraform_0.11.13_linux_amd64.zip
rm terraform_0.11.13_linux_amd64.zip
sudo mv terraform /usr/bin/
terraform -v

# Packer 1.4.0
wget https://releases.hashicorp.com/packer/1.4.0/packer_1.4.0_linux_amd64.zip
unzip packer_1.4.0_linux_amd64.zip
rm packer_1.4.0_linux_amd64.zip
sudo mv packer /usr/bin/
packer -v
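
Since a snap-installed Terraform must be avoided, it can help to check which binary is actually picked up before starting Puddle. The following sketch is an illustration only. The helper name locate_tool is hypothetical, and treating any path with a "snap" directory component as a snap install is an assumption about how snap lays out its files.

```python
import os
import shutil


def locate_tool(name, search_path=None):
    """Return the resolved path of `name`, rejecting snap-installed binaries.

    search_path uses the same format as the PATH variable; when it is None,
    the current PATH is searched.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        raise FileNotFoundError(f"{name} was not found on the search path")
    # Follow symlinks, because snap commands are usually exposed as links.
    resolved = os.path.realpath(found)
    for candidate in (found, resolved):
        # Assumption: a "snap" path component indicates a snap install.
        if "snap" in candidate.split(os.sep):
            raise RuntimeError(f"{name} comes from snap: {candidate}")
    return resolved
```

With the commands above, locate_tool("terraform") and locate_tool("packer") would be expected to return paths under /usr/bin, which is also where the configuration file looks by default.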

Puddle needs SSH keys to communicate with the Systems that it launches. Generate a key pair with the following command:

ssh-keygen -t rsa -b 4096

Create the License File

  1. SSH into the Virtual machine.
  2. Create a file containing the license. Remember the path to this file. We will need it in a later step.

Create a Configuration File

Now we will need to create the config.yaml file. The config.yaml file should contain the following:

redis:
  connection:
    protocol:               # protocol
    address:                # address with port
    password:               # password if any
    tls:                    # whether to use tls

db:
  connection:
    drivername:             # driver, only postgres is supported at the moment
    host:                   # hostname without port
    port:                   # port
    user:                   # db user
    dbname:                 # name of the database
    sslmode:                # whether to use ssl or not
    password:               # db user password

tls:
  certFile:                 # optional, path to PEM encoded cert file. HTTP is used if not set.
  keyFile:                  # optional, path to PEM encoded key file. HTTP is used if not set.

connection:
  port:                     # port where backend should be running. If HTTPS is enabled defaults to 443. If HTTPS is disabled defaults to 80.

license:
  file:                     # path to file with license key, must be readable

ssh:
  publicKey:                # path to public key; used by Puddle to talk to Systems (upload config.toml, etc.)
  privateKey:               # path to private key; used by Puddle to talk to Systems (upload config.toml, etc.)

auth:
  token:
    secret:                 # secret used when signing tokens for clients
  azureAD:
    enabled:                # whether to enable authentication using Azure ActiveDirectory. False by default.

packer:
  path:                     # path to packer binary, defaults to /usr/bin/packer

terraform:
  path:                     # path to terraform binary, defaults to /usr/bin/terraform

providers:
  azure:
    authority:              # authority used to get Azure Active Directory tokens
    location:               # azure location, for example eastus
    rg:                     # resource group name, for example puddle-RG
    vnet:                   # virtual network name, for example puddle-vnet
    sg:                     # network security group name, for example puddle-SG
    subnet:                 # subnet name, for example puddle-subnet
    adminRoleId:            # role id of the administrator

products:
  dai:
    configTomlTemplatePath: # path to config.toml which is used as default in all Systems. If unset, then an empty file is used.

logs:
  dir:                      # directory where logs should be placed

  1. SSH into the Virtual machine.
  2. Create a new empty directory, and in this directory create a file named config.yaml.
  3. Fill in all the fields in the config.yaml.
  • Values for redis.connection.* can be found in the following way:
    • Search for Azure Cache for Redis.
    • Select the newly created Redis instance.
    • Select Access keys.
  • Values for db.connection.* can be found in the following way:
    • Search for Azure Database for PostgreSQL servers.
    • Select the newly created PostgreSQL instance.
    • Select Connection strings.
    • Use the password that was provided when creating the PostgreSQL database.
  • tls.certFile should point to the PEM encoded .crt file if you want to use HTTPS. If you don’t want to use HTTPS, leave this property empty. If you set this property, then tls.keyFile must be set as well.
  • tls.keyFile should point to the PEM encoded .key file if you want to use HTTPS. If you don’t want to use HTTPS, leave this property empty. If you set this property, then tls.certFile must be set as well.
  • connection.port should contain the port where Puddle should be running. This defaults to 80 in the case of HTTP and 443 in the case of HTTPS. Please note that for some ports (like 80 or 443), you might need root privileges.
  • license.file should be the path to the file containing the license (created in the previous step).
  • auth.token.secret should be a random string. It is used to sign the tokens exchanged between the backend and frontend.
  • auth.azureAD.enabled should be true/false and is false by default. If true, then authentication using Azure Active Directory is enabled.
  • ssh.publicKey should be the path to the SSH public key (for example /home/myuser/.ssh/id_rsa.pub), which will be used by Puddle to talk to the Systems. If this SSH key is changed, Puddle won’t be able to talk to the Systems created with the old key, and these will have to be destroyed.
  • ssh.privateKey should be the path to the SSH private key (for example /home/myuser/.ssh/id_rsa), which will be used by Puddle to talk to the Systems. If this SSH key is changed, Puddle won’t be able to talk to the Systems created with the old key, and these will have to be destroyed.
  • packer.path should point to the packer binary. Defaults to /usr/bin/packer.
  • terraform.path should point to the terraform binary. Defaults to /usr/bin/terraform.
  • providers.azure.authority should be set to https://login.microsoftonline.com/<Azure ActiveDirectory Name>.onmicrosoft.com.
  • The Azure Active Directory name can be found in the following way:
    • Go to the Azure Active Directory blade.
    • Select Overview.
  • providers.azure.location should be set to the same value that was specified for the Resource group, for example eastus.
  • providers.azure.rg should be set to the name of the newly created Resource group.
  • providers.azure.vnet should be set to the name of the newly created Virtual network.
  • providers.azure.sg should be set to the name of the newly created Network security group.
  • providers.azure.subnet should be set to the name of the newly created Subnet.
  • providers.azure.adminRoleId should be set to the ID of the newly created Administrator role in the App registration manifest.
  • The Administrator role ID can be found in the following way:
    • Go to the Azure Active Directory blade.
    • Select App registrations (preview).
    • Select the newly created App registration.
    • Select Manifest.
    • Search for the Administrator role under appRoles and use the ID of this role.
  • products.dai.configTomlTemplatePath should be the path to custom config.toml file, which will be used as default configuration for all new Driverless AI Systems. If not set, the default file is used.
  • logs.dir should be set to a directory where logs should be placed. If a Docker image is used, this directory has to be writable by everyone (or at least Puddle user, which has uid 1000 in the Docker image).
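
Several of the rules above can be checked mechanically before Puddle is started. The sketch below is a hedged illustration, not Puddle's own validation. It assumes that config.yaml has already been parsed into a Python dictionary (for example with a YAML library such as PyYAML's yaml.safe_load), that empty YAML values arrive as None, and that all providers.azure fields are required. The names check_config and effective_port are hypothetical.

```python
# Fields under providers.azure that this guide asks you to fill in.
REQUIRED_AZURE_KEYS = (
    "authority", "location", "rg", "vnet", "sg", "subnet", "adminRoleId",
)


def check_config(cfg):
    """Return a list of human-readable problems found in a parsed config."""
    problems = []

    tls = cfg.get("tls") or {}
    cert, key = tls.get("certFile"), tls.get("keyFile")
    # certFile and keyFile must be set together or left empty together.
    if bool(cert) != bool(key):
        problems.append(
            "tls.certFile and tls.keyFile must both be set or both be empty"
        )

    azure = (cfg.get("providers") or {}).get("azure") or {}
    for name in REQUIRED_AZURE_KEYS:
        if not azure.get(name):
            problems.append(f"providers.azure.{name} is empty")

    if not (cfg.get("license") or {}).get("file"):
        problems.append("license.file is empty")
    return problems


def effective_port(cfg):
    """Explicit connection.port, else 443 when HTTPS is configured, else 80."""
    port = (cfg.get("connection") or {}).get("port")
    if port:
        return int(port)
    tls = cfg.get("tls") or {}
    return 443 if tls.get("certFile") and tls.get("keyFile") else 80
```

An empty list from check_config means that none of these particular rules is violated; it does not guarantee that the values themselves are correct.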

Configuring Environment Variables

The next step is to configure environment variables:

export AZURE_SUBSCRIPTION_ID="<YOUR SUBSCRIPTION ID>"
export AZURE_TENANT_ID="<YOUR TENANT ID>"
export AZURE_CLIENT_ID="<APP REGISTRATION ID>"
export AZURE_CLIENT_SECRET="<APP REGISTRATION SECRET>"

export PUDDLE_CONFIG_DIR="/home/vmadmin/h2o-puddle-conf/"

  • AZURE_SUBSCRIPTION_ID is the ID of the subscription that should be used.
    • This value can be found in the following way:
      • Search for Subscriptions.
      • Use the SUBSCRIPTION ID of the subscription you want to use.
  • AZURE_TENANT_ID is the ID of the tenant that should be used.
    • This value can be found in the following way:
      • Select the Azure Active Directory blade.
      • Select App registrations (preview).
      • Select the newly created App registration.
      • Use Directory (tenant) ID.
  • AZURE_CLIENT_ID is the Application ID that should be used.
    • This value can be found in the following way:
      • Select the Azure Active Directory blade.
      • Select App registrations (preview).
      • Select the newly created App registration.
      • Use Application (client) ID.
  • AZURE_CLIENT_SECRET is the client secret that should be used.
    • This value can be created in the following way:
      • Select the Azure Active Directory blade.
      • Select App registrations (preview).
      • Select the newly created App registration.
      • Select Certificates & Secrets.
      • Click New client secret.
      • Fill in the form and click Add.
      • The secret value should now be visible. Copy it immediately, because it is no longer shown after the page is refreshed and cannot be recovered.
  • PUDDLE_CONFIG_DIR is the directory where the config.yaml file is located.
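
A missing variable is easy to overlook, so a short preflight check can save a failed start. The sketch below assumes only the five variable names listed above. The helper name missing_settings is hypothetical, and checking for config.yaml inside PUDDLE_CONFIG_DIR is an extra convenience rather than something Puddle is documented to do in this form.

```python
import os

REQUIRED_VARS = (
    "AZURE_SUBSCRIPTION_ID",
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "PUDDLE_CONFIG_DIR",
)


def missing_settings(env=None):
    """Return names of unset or empty variables, plus a note when
    config.yaml is not present in PUDDLE_CONFIG_DIR."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    config_dir = env.get("PUDDLE_CONFIG_DIR")
    if config_dir and not os.path.isfile(os.path.join(config_dir, "config.yaml")):
        missing.append("config.yaml (not found in PUDDLE_CONFIG_DIR)")
    return missing
```

Calling missing_settings() with no argument inspects the current environment; an empty list means that all five variables are set and the configuration file exists.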

Running Puddle

After all of the previous steps are successfully completed, we can now start Puddle. Execute the following command in the Go Binary directory to start the server and web UI:

./puddle-server

Puddle is accessible on port 8088.

First Steps

Before using Puddle for the first time, perform the following initialization steps:

  1. Log in to Puddle as the Administrator.
  2. Go to Administration > Check Updates.
  3. Either use the update plan from the default URL location, or specify a custom update plan file.
  4. Click Submit.
  5. Review the plan and click Apply.
  6. Go to Administration > Images.
  7. Build all the images you want to use. Please be aware this can take up to 1 hour.

Once the images are built, your Puddle instance is ready.

Stats Board (Optional)

The stats board is an optional component. It’s distributed as a Python wheel, and it requires Python 3.6. It’s recommended (although not necessary) to run the board inside a virtual environment.

Some initial configuration is required for the stats board. The easiest way is to add the values below to the existing config.yaml. However, if the stats board is running on a separate machine, then create a new config.yaml file containing the following:

db:
  connection:
    drivername:                     # driver, only postgres is supported at the moment
    host:                           # hostname without port
    port:                           # port
    user:                           # db user
    dbname:                         # name of the database
    sslmode:                        # whether to use ssl or not, possible values are require, disable, verify-full, verify-ca
    password:                       # db user password
stats:
  board:
    host:                           # Host where the board is running. Defaults to localhost.

logs:
  dir: /home/vmadmin/h2o-puddle/log # Directory where to place logs

Please note that the machine where the stats board is running needs access to the PostgreSQL database.
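
To verify that access from the stats board machine, the db.connection values can be turned into a standard PostgreSQL keyword/value connection string and tried with any PostgreSQL client (for example, by passing the string to psql). The sketch below is not how the stats board itself connects, which this guide does not describe. The names build_dsn, ALLOWED_SSLMODES, and DSN_KEYS are hypothetical, and the list of allowed sslmode values is taken from the comment in the configuration above.

```python
# sslmode values listed in the configuration comment above.
ALLOWED_SSLMODES = ("require", "disable", "verify-full", "verify-ca")
# db.connection keys that map directly onto connection-string keywords.
DSN_KEYS = ("host", "port", "user", "dbname", "sslmode", "password")


def _quote(value):
    """Single-quote a value, escaping backslashes and single quotes."""
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_dsn(db_connection):
    """Turn the db.connection mapping into a 'key=value ...' string."""
    sslmode = db_connection.get("sslmode")
    if sslmode not in ALLOWED_SSLMODES:
        raise ValueError(f"unsupported sslmode: {sslmode!r}")
    parts = []
    for key in DSN_KEYS:
        value = db_connection.get(key)
        if value in (None, ""):
            raise ValueError(f"db.connection.{key} is empty")
        parts.append(f"{key}={_quote(value)}")
    return " ".join(parts)
```

Quoting every value keeps the helper simple and is valid in the keyword/value format; the drivername key is ignored because it has no counterpart in the connection string.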

You must also set the environment variable PUDDLE_CONFIG_DIR, which points to the directory where the config.yaml file is present (similar to the case of Puddle).

Use the following to install the stats board. Please note that this command will install dependencies as well:

pip install puddle_stats_board-1.0.0-py3-none-any.whl

Use the following to run the stats board:

python -m puddle.stats.board.app

The stats board runs on port 8050 and is accessible from the Puddle UI at http://<PUDDLE_SERVER_ADDRESS>/board. (There is a link in the Administration menu as well.) A reverse proxy inside the Puddle backend takes care of forwarding requests to the correct host and port.