Introduction

Users have requested the ability to update datasets programmatically. Here we describe the actions that can be carried out with the appropriate credentials in place. Rather than a full manual of actions, this information is provided as a series of code snippets to help users get started with the most commonly needed actions. The snippets are written in Python, but the functions are URL endpoints, so any code that can call a URL with a POST method will work.

Access

Most of these actions require a Provider role or higher; the dataset:view action can be completed by someone with a Viewer role. To carry out these actions you need your organisation API key and secret. These can be found by going to your profile in uSmart: under Settings you will see your Key and Secret. We do not store secrets, so if you do not know yours you can safely regenerate it, although any code that relies on the old secret will then need updating.

The code snippets below authenticate by passing the key and secret in request headers, but we would advise against writing production code with the API key secret in plain text. If an API key secret is believed to have been exposed, we recommend regenerating it immediately.
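
For example, credentials can be read from environment variables instead of being embedded in source (a sketch; the variable names USMART_KEY_ID and USMART_KEY_SECRET are our own choice, not part of uSmart):

import os

# Build the auth headers from environment variables rather than plain text.
headers = {
    'api-key-id': os.environ['USMART_KEY_ID'],
    'api-key-secret': os.environ['USMART_KEY_SECRET'],
    'Content-Type': 'application/json'
}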

Organisation GUID

Programmatic access works by sending a POST request to an action endpoint, and the path to the action includes your Organisation GUID. This can be found in your dcat "@id" key: it is the alphanumeric string between "….io/org/" and "/dcat/…", so "28ccd497-7acd-4470-bd17-721d5cbbd6ef" in the example below. Note that this GUID is also visible in the dcat URL and is required when making API calls to datasets.

[Screenshot: dcat "@id" value with the Organisation GUID highlighted]
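
If you are scripting against several organisations, the GUID can be sliced out of the "@id" string programmatically. A minimal sketch (the URL below is illustrative, with the tail of the path elided):

# Extract the Organisation GUID from a dcat "@id" URL.
dcat_id = "https://data.usmart.io/org/28ccd497-7acd-4470-bd17-721d5cbbd6ef/dcat/..."
org_guid = dcat_id.split("/org/")[1].split("/")[0]
print(org_guid)  # 28ccd497-7acd-4470-bd17-721d5cbbd6ef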

uSmart Actions

dataset:view

The simplest action, and one required to validate the others:

import json
import requests

# A function to return dataset information.
def dataset_view(dataset_guid):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/dataset:view"

    payload = json.dumps({
        "datasetGUID": dataset_guid
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json'
    }

    dataset = requests.request("POST", url, headers=headers, data=payload)

    return dataset.json()

# Example function call
dataset_guid = "[your dataset GUID]"
output = dataset_view(dataset_guid)
print(output)
# Output file
output_file_path = "dataset_view_output.json"
with open(output_file_path, 'w', encoding='utf-8') as f:
    json.dump(output, f, indent=4)
print(f"Output written to {output_file_path}")

The dataset GUID can also be found in the dcat and is used when accessing a dataset's API; an example is highlighted below.

[Screenshot: dcat entry with the dataset GUID highlighted]

The return from the dataset_view function above is a JSON object including all of the metadata and descriptions of the APIs and files that make up a dataset, and it provides useful information for many of the other actions discussed below.
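
For example, the resourceContainers array in this response is what the resourceContainer:process section below relies on. A minimal sketch of walking it (any field names beyond resourceContainerGUID and status would need checking against your own response):

# Walk the output pipelines reported by dataset:view.
dataset = dataset_view(dataset_guid)
for rc in dataset["result"]["resourceContainers"]:
    print(rc["resourceContainerGUID"], rc["status"])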

file:create

Add a data file to an existing dataset. You need the dataset GUID and the file to call this function. Two functions are presented here, but they could be refactored into one. The s3:generatePutRequest action generates a signed URL that is used with the file:create action to upload a file to our AWS S3 service; this action is also used by our file:updateRevision action.

import json
import os
import requests

# Generates the signed request to load the data into AWS
def generatePutRequest_from_s3(fileName):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/s3:generatePutRequest"
    payload = json.dumps({
        "fileName": fileName
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }

    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()
    
def dataset_file_create(filename, dataset_guid):
    getS3Putrequest = generatePutRequest_from_s3(filename)
    signedRequest_url = getS3Putrequest["result"]["signedRequest"]
    with open(filename, 'rb') as f:
        headers = {
            'Content-Type': getS3Putrequest["result"]["contentType"]
        }
        http_response = requests.put(signedRequest_url, headers=headers, data=f)
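        # Consider checking http_response.status_code here to confirm the S3 upload succeeded.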
    fileSize = os.path.getsize(filename)

    url = "https://data.usmart.io/org/[Your Organisation GUID]/file:create"

    payload = json.dumps({
        "datasetGUID": dataset_guid,
        "files": [
            {
                "reference": getS3Putrequest["result"]["reference"],
                "fileName": filename,
                "fileSize": fileSize
            }
        ]
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()

# Example function call
filename = r"[Your file here]"
dataset_guid = "[Your Dataset GUID]"
output = dataset_file_create(filename, dataset_guid)
print(output)

Note that filename above is a path to the file, and this string is what shows in the UI, so you will expose your local folder structure if you run this code from a directory other than the one containing the data. The upload is designed around the UI, where the path is simply the filename.
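
One way to avoid exposing folder structure (a sketch, assuming the upload only needs the name used to open the file rather than a full path) is to change into the file's directory and upload using the bare filename:

import os

# Hypothetical workaround: upload using only the file's base name so no
# directory structure appears in the UI.
full_path = r"[Your file here]"
directory, basename = os.path.split(full_path)
os.chdir(directory)  # make the bare filename resolvable by open()
output = dataset_file_create(basename, dataset_guid)
print(output)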

resourceContainer:create

Create an output pipeline to enable data sharing. You need the dataset GUID and a pipeline ID to call this function.

import requests
import json

def create_output_pipeline(dataset_guid, pipelineId=1, pipelineParameters=[]):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/resourceContainer:create"

    payload = json.dumps({
        "datasetGUID": dataset_guid,
        "pipelineId": pipelineId,
        "pipelineParameters": pipelineParameters
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    return response.json()
    
# Example function call
dataset_guid = "[Your Dataset GUID]"

# pipelineId is used to determine the type of output required
# pipelineId = 1: "Plain API"
# pipelineId = 2: "Spatial API"
# pipelineId = 3: "Raw File Download"
# pipelineId = 4: "CSV, JSON and XML Download"
# pipelineId = 5: "Spatial API from Spatial File"
# pipelineId = 6: "Real-time Data API"

pipelineId = 1  # choose from the list above
output = create_output_pipeline(dataset_guid, pipelineId)
print(output)

The status of this action can be checked with dataset:view. Typically it will complete in less than a minute, but for large datasets, or those with a spatial API, it could take over 30 minutes. Polling dataset:view to check on the status is advisable.
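
A minimal polling sketch, assuming (as in the resourceContainer:process example below) that each entry in resourceContainers reports a "status" field; the "processing" value used here is illustrative, so check the actual status strings your pipelines report:

import time

def wait_for_pipelines(dataset_guid, interval=30, timeout=1800):
    # Poll dataset:view until no pipeline reports an in-progress status.
    deadline = time.time() + timeout
    while time.time() < deadline:
        dataset = dataset_view(dataset_guid)
        statuses = [rc["status"] for rc in dataset["result"]["resourceContainers"]]
        if "processing" not in statuses:  # illustrative status value
            return statuses
        time.sleep(interval)
    raise TimeoutError("Output pipelines did not finish within the timeout")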

While we have provided the full list of pipelineIds, we are not currently supporting the Real-time Data API, as other actions may be required to enable it. We may look to support this in the future.

file:updateRevision

Use this action to replace an existing dataset file with a new one. You need the file GUID, which you can get with the dataset:view action, and the new file. This uses the s3:generatePutRequest action that was also required for file:create.

import json
import os
import requests

# Generates the signed request to load the data into AWS
def generatePutRequest_from_s3(fileName):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/s3:generatePutRequest"
    payload = json.dumps({
        "fileName": fileName
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }

    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()
    
def dataset_file_update(filename, file_guid):
    getS3Putrequest = generatePutRequest_from_s3(filename)
    signedRequest_url = getS3Putrequest["result"]["signedRequest"]
    with open(filename, 'rb') as f:
        headers = {
            'Content-Type': getS3Putrequest["result"]["contentType"]
        }
        http_response = requests.put(signedRequest_url, headers=headers, data=f)
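        # Consider checking http_response.status_code here to confirm the S3 upload succeeded.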
    fileSize = os.path.getsize(filename)

    url = "https://data.usmart.io/org/[Your Organisation GUID]/file:updateRevision"

    payload = json.dumps({
        "reference": getS3Putrequest["result"]["reference"],
        "fileName": filename,
        "fileGUID": file_guid,
        "fileSize": fileSize,
        "action": "file:updateRevision"
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    return response.json()

# Example function call
filename = r"[Your file here]"
file_guid = "[your file GUID]"
output = dataset_file_update(filename, file_guid)
print(output)

This uses the file GUID to determine the specific file to be updated; it can be found by reviewing the dataset:view response, as in the example below.

[Screenshot: dataset:view response with the file GUID highlighted]
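
If you prefer to find it in code, a hedged sketch (the "files", "fileGUID" and "fileName" key names here are assumptions; verify them against your own dataset:view response):

# List a dataset's files so the right fileGUID can be picked out.
dataset = dataset_view(dataset_guid)
for f in dataset["result"]["files"]:  # key names are assumptions
    print(f["fileGUID"], f["fileName"])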

resourceContainer:process

Use this action to update an output pipeline after updating a file. The function below is called with a resourceContainerGUID, which can be sourced from the response to dataset:view. The snippet also includes a function that refreshes all output pipelines of a dataset using the dataset_view function above, and is called with the dataset GUID.

import requests
import json

# Refresh the pipelines after updating a dataset
def resourceContainer_refresh(resourceContainerGUID, pipelineParameters=[]):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/resourceContainer:process"

    payload = json.dumps({
        "pipelineParameters": pipelineParameters,
        "resourceContainerGUID": resourceContainerGUID
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    return response.json()
    
# Requires the dataset_view function defined in the dataset:view section above.
def refresh_all_resourceContainer_from_dataset(dataset_guid):
    dataset = dataset_view(dataset_guid)
    responses = []
    for resourceContainer in dataset["result"]["resourceContainers"]:
        if resourceContainer["status"] == "expired":
            responses.append(resourceContainer_refresh(resourceContainer["resourceContainerGUID"]))
    return responses

# Example function call
dataset_guid = "[Your Dataset GUID]"
output = refresh_all_resourceContainer_from_dataset(dataset_guid)
print(output)

The resourceContainerGUID can be found in the response to the dataset:view action; an example is highlighted below:

[Screenshot: dataset:view response with the resourceContainerGUID highlighted]
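
Putting the pieces together, a typical update run using the functions defined on this page might look like this (a sketch; substitute your own GUIDs and file path):

# End-to-end sketch: replace a file, then refresh any expired pipelines.
dataset_guid = "[Your Dataset GUID]"
file_guid = "[your file GUID]"
dataset_file_update(r"[Your file here]", file_guid)
print(refresh_all_resourceContainer_from_dataset(dataset_guid))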

Closing thoughts and future

These are the most commonly used actions from the UI and should enable most use cases. We will look to document and support other actions in the future depending on demand. We have not provided support for enabling data access via Redshift and SQL at this point, as more actions are currently required to set up the schema and update Redshift from S3.
