Introduction
Users have requested the ability to update datasets programmatically. Here we describe actions that can be carried out once appropriate credentials are in place. Rather than a full manual of actions, this information is provided as a series of code snippets to help users get started with the most commonly requested actions. The snippets are written in Python, but each action is a URL endpoint, so any code that can call a URL with the POST method will work.
Access
Most of these actions require a user with Provider-level access or higher; the dataset:view action can be completed by someone with a Viewer role. To carry out these actions you need your organisation API key and secret. Both can be found under Settings in your uSmart profile. We do not store secrets, so if you do not know yours you can safely regenerate it, although any code that relies on the old secret will then need updating.
These snippets pass the key and secret in the request headers for simplicity, but we advise against writing production code with the API key secret in plain text. If an API key is believed to have been exposed, we recommend regenerating the secret immediately.
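One way to keep the secret out of source code is to read it from environment variables at run time. The following is a minimal sketch, not a uSmart-provided helper: the function name and the variable names USMART_API_KEY_ID and USMART_API_KEY_SECRET are assumptions made for this example.

```python
import os


def build_auth_headers(environ=os.environ):
    """Build the request headers from environment variables.

    USMART_API_KEY_ID and USMART_API_KEY_SECRET are hypothetical names
    chosen for this sketch; use whatever your deployment provides.
    """
    try:
        key_id = environ["USMART_API_KEY_ID"]
        secret = environ["USMART_API_KEY_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None
    return {
        "api-key-id": key_id,
        "api-key-secret": secret,
        "Content-Type": "application/json",
    }
```

The returned dictionary could replace the hard-coded headers in the snippets in this article.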
Because this uses a personal API key, we suggest creating dedicated team or dataset users to generate the API key. That way, if the person who set up a workflow leaves, the workflow will continue to function.
Organisation GUID
Programmatic access works by sending a POST request to an action. The path to each action includes your Organisation GUID, which can be found in the “@id” key of your dcat: it is the alphanumeric string between “….io/org/” and “/dcat/…”, for example “28ccd497-7acd-4470-bd17-721d5cbbd6ef”. The same GUID appears in the dcat URL and is required when making API calls to datasets.
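The GUID can also be pulled out of an “@id” value in code. This is a sketch under two assumptions: the GUID is a standard 36-character hexadecimal GUID, and the example URL below is hypothetical apart from the “/org/<GUID>/dcat/” portion described above.

```python
import re

# Matches the path segment between "/org/" and "/dcat/".
_ORG_GUID_PATTERN = re.compile(r"/org/([0-9a-fA-F-]{36})/dcat/")


def organisation_guid_from_dcat_id(dcat_id):
    """Return the Organisation GUID embedded in a dcat "@id" URL."""
    match = _ORG_GUID_PATTERN.search(dcat_id)
    if match is None:
        raise ValueError(f"No organisation GUID found in: {dcat_id}")
    return match.group(1)


# Hypothetical "@id" value; only the "/org/<GUID>/dcat/" part follows the text above.
example_id = "https://data.usmart.io/org/28ccd497-7acd-4470-bd17-721d5cbbd6ef/dcat/example"
print(organisation_guid_from_dcat_id(example_id))
```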
uSmart Actions
dataset:view
This is the simplest action, and it is required to validate other actions:
import json
import requests


# A function to return dataset information.
def dataset_view(dataset_guid):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/dataset:view"
    payload = json.dumps({
        "datasetGUID": dataset_guid
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json'
    }
    dataset = requests.request("POST", url, headers=headers, data=payload)
    return dataset.json()


# Example function call
dataset_guid = "[your dataset GUID]"
output = dataset_view(dataset_guid)
print(output)

# Output file
output_file_path = "dataset_view_output.json"
with open(output_file_path, 'w', encoding='utf-8') as f:
    json.dump(output, f, indent=4)
print(f"Output written to {output_file_path}")
The dataset GUID can also be found in the dcat and is used when accessing a dataset's API; an example is highlighted below.
The return value of the dataset_view function above is a JSON document containing all of the dataset's metadata, including descriptions of the APIs and files that make up the dataset. It provides useful information for many of the other actions discussed below.
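As an illustration of reading that response, the sketch below lists each output pipeline with its status. The field names "result", "resourceContainers", "resourceContainerGUID" and "status" are the ones used by the resourceContainer:process snippet in this article; the helper name, the sample GUIDs and the "ready" status value are made up for the example.

```python
def summarise_resource_containers(dataset_view_response):
    """Return (resourceContainerGUID, status) pairs from a dataset:view response."""
    containers = dataset_view_response.get("result", {}).get("resourceContainers", [])
    return [(c.get("resourceContainerGUID"), c.get("status")) for c in containers]


# Hand-written stand-in for a real response; GUIDs and "ready" are invented.
sample_response = {
    "result": {
        "resourceContainers": [
            {"resourceContainerGUID": "guid-1", "status": "expired"},
            {"resourceContainerGUID": "guid-2", "status": "ready"},
        ]
    }
}

for guid, status in summarise_resource_containers(sample_response):
    print(guid, status)
```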
file:create
Add a data file to an existing dataset. You need the dataset GUID and the file to call this function. Two functions are presented here, but they could be refactored into a single function. The s3:generatePutRequest action generates a signed URL that is used with the file:create action to upload a file to our AWS S3 service; this action is also used by the file:updateRevision action.
import json
import os
import requests


# Generates the signed request to load the data into AWS
def generatePutRequest_from_s3(fileName):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/s3:generatePutRequest"
    payload = json.dumps({
        "fileName": fileName
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()


def dataset_file_create(filename, dataset_guid):
    getS3Putrequest = generatePutRequest_from_s3(filename)
    signedRequest_url = getS3Putrequest["result"]["signedRequest"]

    # Upload the file contents to S3 using the signed URL
    with open(filename, 'rb') as f:
        headers = {
            'Content-Type': getS3Putrequest["result"]["contentType"]
        }
        http_response = requests.put(signedRequest_url, headers=headers, data=f)

    fileSize = os.path.getsize(filename)

    # Register the uploaded file against the dataset
    url = "https://data.usmart.io/org/[Your Organisation GUID]/file:create"
    payload = json.dumps({
        "datasetGUID": dataset_guid,
        "files": [
            {
                "reference": getS3Putrequest["result"]["reference"],
                "fileName": filename,
                "fileSize": fileSize
            }
        ]
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()


# Example function call
filename = r"[Your file here]"
dataset_guid = "[Your Dataset GUID]"
output = dataset_file_create(filename, dataset_guid)
print(output)
Note that filename above is a path to the file, and this is what is shown in the UI, so you will expose your folder structure if you run this code from a directory other than the one where the data is located. The action was designed for the UI, where the uploaded path is simply the file name.
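One possible way to avoid exposing folders is to send only the final path component as the file name while still opening the file by its full path. The helper below is a sketch with a hypothetical name; whether the service accepts a bare name in its "fileName" fields has not been verified here, so treat that as an assumption and test it on a non-production dataset first.

```python
import os


def display_file_name(path):
    """Return only the final component of a path, dropping any folders."""
    return os.path.basename(os.path.normpath(path))


print(display_file_name("exports/2024/sales.csv"))
```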
resourceContainer:create
Create an output pipeline to enable data sharing. You need the dataset GUID and a pipeline ID to call this function.
import requests
import json


def create_output_pipeline(dataset_guid, pipelineId=1, pipelineParameters=[]):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/resourceContainer:create"
    payload = json.dumps({
        "datasetGUID": dataset_guid,
        "pipelineId": pipelineId,
        "pipelineParameters": pipelineParameters
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()


# Example function call
dataset_guid = "[Your Dataset GUID]"

# pipelineId is used to determine the type of output required
# pipelineId = 1: "Plain API"
# pipelineId = 2: "Spatial API"
# pipelineId = 3: "Raw File Download"
# pipelineId = 4: "CSV, JSON and XML Download"
# pipelineId = 5: "Spatial API from Spatial File"
# pipelineId = 6: "Real-time Data API"
pipelineId = 1

output = create_output_pipeline(dataset_guid, pipelineId)
print(output)
The status of this action can be checked with dataset:view. Typically it completes in less than a minute, but for large datasets or those with a spatial API it could take over 30 minutes. Polling dataset:view to check the status is advisable.
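A polling loop could be sketched as follows. The helper is generic and hypothetical: it takes a fetch function and a completion check rather than hard-coding status values, because the status values that indicate completion are not listed in this article.

```python
import time


def wait_for(fetch, is_done, interval_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Call fetch() repeatedly until is_done(result) is true or the timeout passes."""
    waited = 0
    while True:
        result = fetch()
        if is_done(result):
            return result
        if waited >= timeout_seconds:
            raise TimeoutError(f"Condition not met after {waited} seconds")
        # Pause before asking again so the service is not flooded with requests.
        sleep(interval_seconds)
        waited += interval_seconds
```

Here fetch could be `lambda: dataset_view(dataset_guid)` and is_done a check on the status fields of the response; inspect a real dataset:view response to decide which status values count as finished.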
Although we have provided the full list of pipelineId values, we do not currently support the Real-time Data API, as other actions may be required to enable it. We may support this in the future.
file:updateRevision
Use this action to replace an existing dataset file with a new file. You need the new file and the file GUID, which you can get with the dataset:view action. This uses the s3:generatePutRequest action that is also required for file:create.
import json
import os
import requests


# Generates the signed request to load the data into AWS
def generatePutRequest_from_s3(fileName):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/s3:generatePutRequest"
    payload = json.dumps({
        "fileName": fileName
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()


def dataset_file_update(filename, file_guid):
    getS3Putrequest = generatePutRequest_from_s3(filename)
    signedRequest_url = getS3Putrequest["result"]["signedRequest"]

    # Upload the new file contents to S3 using the signed URL
    with open(filename, 'rb') as f:
        headers = {
            'Content-Type': getS3Putrequest["result"]["contentType"]
        }
        http_response = requests.put(signedRequest_url, headers=headers, data=f)

    fileSize = os.path.getsize(filename)

    # Point the existing file at the newly uploaded revision
    url = "https://data.usmart.io/org/[Your Organisation GUID]/file:updateRevision"
    payload = json.dumps({
        "reference": getS3Putrequest["result"]["reference"],
        "fileName": filename,
        "fileGUID": file_guid,
        "fileSize": fileSize,
        "action": "file:updateRevision"
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()


# Example function call
filename = r"[Your file here]"
file_guid = "[your file GUID]"
output = dataset_file_update(filename, file_guid)
print(output)
The file GUID identifies the specific file to be updated. It can be found in the response to the dataset:view action; an example is provided below.
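If you prefer to locate file GUIDs in code, a generic search over the dataset:view response is one option. The sketch below assumes the response stores them under a key spelled "fileGUID", matching the request payload above; that key name is an assumption about the response, and the helper and sample data are hypothetical.

```python
def find_values(node, key):
    """Recursively collect every value stored under `key` in nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                found.append(value)
            found.extend(find_values(value, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found


# Hand-written stand-in for a response; the structure and GUIDs are invented.
sample = {"result": {"files": [{"fileGUID": "file-1"}, {"meta": {"fileGUID": "file-2"}}]}}
print(find_values(sample, "fileGUID"))
```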
resourceContainer:process
Use this action to update an output pipeline after updating a file. The first function below is called with a resourceContainerGUID, which can be sourced from the response to dataset:view. The snippet also includes a function that refreshes all expired output pipelines of a dataset; it uses the dataset_view function above and is called with the dataset GUID.
import requests
import json


# Refresh the pipelines after updating a dataset
def resourceContainer_refresh(resourceContainerGUID, pipelineParameters=[]):
    url = "https://data.usmart.io/org/[Your Organisation GUID]/resourceContainer:process"
    payload = json.dumps({
        "pipelineParameters": pipelineParameters,
        "resourceContainerGUID": resourceContainerGUID
    })
    headers = {
        'api-key-secret': '[your api-key-secret]',
        'api-key-id': '[your api-key-id]',
        'Content-Type': 'application/json',
    }
    response = requests.request("POST", url, headers=headers, data=payload)
    return response.json()


# Uses the dataset_view function defined under dataset:view above
def refresh_all_resourceContainer_from_dataset(dataset_guid):
    dataset = dataset_view(dataset_guid)
    responses = []
    for resourceContainer in dataset["result"]["resourceContainers"]:
        # Only refresh pipelines that are out of date
        if resourceContainer["status"] == "expired":
            responses.append(resourceContainer_refresh(resourceContainer["resourceContainerGUID"]))
    # Return the responses so the caller can inspect each refresh
    return responses


# Example function call
dataset_guid = "[Your Dataset GUID]"
output = refresh_all_resourceContainer_from_dataset(dataset_guid)
print(output)
The resourceContainerGUID can be found in the response to the dataset:view action and an example is highlighted below:
Closing thoughts and future
These are the most commonly used actions from the UI and should cover most use cases. We will look to document and support other actions in the future depending on demand. We have not yet provided support for enabling data access to Redshift and SQL, as more actions are required to set up the schema and update Redshift from S3.