Client¶

The tdclient.client.Client class is the public interface of tdclient. It provides methods for calling the REST API.

tdclient.client¶

class tdclient.client.Client(*args, **kwargs)[source]¶

Bases: object

API Client for Treasure Data Service
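As a minimal sketch of how a client might be constructed, the helper below resolves an API key and passes it to the constructor. It assumes the constructor accepts an `apikey` keyword argument and that the key may be kept in a `TD_API_KEY` environment variable; `open_client` itself is a hypothetical name, not part of tdclient.

```python
import os


def open_client(apikey=None, **kwargs):
    """Return a tdclient.Client, falling back to TD_API_KEY for the API key.

    `open_client` is an illustrative helper and not part of tdclient.
    """
    apikey = apikey or os.environ.get("TD_API_KEY")
    if not apikey:
        raise ValueError("no API key given and TD_API_KEY is not set")
    # Imported lazily so this module loads even where tdclient is not installed.
    import tdclient

    return tdclient.Client(apikey=apikey, **kwargs)


# Typical use (requires network access and a valid key):
#     client = open_client()
#     try:
#         print(client.server_status())
#     finally:
#         client.close()
```

Calling close() in a finally block releases the API connections even if a request fails.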

add_apikey(name)[source]¶

Parameters: name (str) – name of the user
Returns: True if success

add_user(name, org, email, password)[source]¶

Add a new user

Parameters

name (str) – name of the user
org (str) – organization
email (str) – e-mail address
password (str) – password

Returns

True if success

bulk_import(name)[source]¶

Get a bulk import session

Parameters: name (str) – name of a bulk import session
Returns: tdclient.models.BulkImport

bulk_import_delete_part(name, part_name)[source]¶

Delete a part from a bulk import session

Parameters

name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session

Returns

True if success

bulk_import_error_records(name)[source]¶

Parameters: name (str) – name of a bulk import session
Returns: an iterator of error records

bulk_import_upload_file(name, part_name, format, file, **kwargs)[source]¶

Upload a part to Bulk Import session, from an existing file on filesystem.

Parameters

name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session
format (str) – format of data type (e.g. “msgpack”, “json”, “csv”, “tsv”)
file (str or file-like) – the name of a file, or a file-like object, containing the data
**kwargs – extra arguments.

There is more documentation on format, file and **kwargs at file import parameters.

In particular, for “csv” and “tsv” data, you can change how data columns are parsed using the dtypes and converters arguments.

dtypes is a dictionary used to specify a datatype for individual columns, for instance {"col1": "int"}. The available datatypes are "bool", "float", "int", "str" and "guess". If a column is also mentioned in converters, then the function will be used, NOT the datatype.
converters is a dictionary used to specify a function that will be used to parse individual columns, for instance {"col1": int}.

The default behaviour is "guess", which makes a best-effort to decide the column datatype. See file import parameters for more details.
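The dtypes and converters arguments described above could be passed as in the following sketch. The column names `user_id` and `active`, the converter `parse_flag`, and the helper `upload_csv_part` are all hypothetical and only serve as illustration.

```python
def parse_flag(text):
    """Hypothetical converter: map yes/no style text to a bool."""
    return text.strip().lower() in ("yes", "y", "true", "1")


def upload_csv_part(client, session_name, part_name, path):
    """Upload a CSV file as one part, overriding how two columns are parsed."""
    return client.bulk_import_upload_file(
        session_name,
        part_name,
        "csv",
        path,
        # Parse "user_id" as an integer instead of letting it be guessed.
        dtypes={"user_id": "int"},
        # Values in "active" are passed through the converter function.
        converters={"active": parse_flag},
    )
```

Columns not mentioned in either dictionary keep the default "guess" behaviour.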

bulk_import_upload_part(name, part_name, bytes_or_stream, size)[source]¶

Upload a part to a bulk import session

Parameters

name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session
bytes_or_stream (file-like) – a file-like object containing the part
size (int) – the size of the part

bulk_imports()[source]¶

List bulk import sessions

Returns: a list of tdclient.models.BulkImport

change_database(db_name, table_name, new_db_name)[source]¶

Move a target table from its original database to a new destination database.

Parameters

db_name (str) – Target database name.
table_name (str) – Target table name.
new_db_name (str) – Name of the destination database the table is moved to.

Returns

True if succeeded.

Return type

bool

close()[source]¶: Close opened API connections.

commit_bulk_import(name)[source]¶

Commit a bulk import session

Parameters: name (str) – name of a bulk import session
Returns: True if success

create_bulk_import(name, database, table, params=None)[source]¶

Create a new bulk import session

Parameters

name (str) – name of new bulk import session
database (str) – name of a database
table (str) – name of a table

Returns

tdclient.models.BulkImport
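The bulk import methods on this class are typically used together. The sketch below strings them into one possible sequence: create, upload, freeze, perform, commit. It assumes that the tdclient.models.Job object returned by perform_bulk_import has a `wait()` method that blocks until the job finishes; `run_bulk_import` and its `parts` argument are hypothetical.

```python
def run_bulk_import(client, session_name, database, table, parts):
    """Create a session, upload CSV parts, then freeze, perform and commit it.

    `parts` maps a part name to the path of a CSV file (illustrative only).
    """
    client.create_bulk_import(session_name, database, table)
    for part_name, path in parts.items():
        client.bulk_import_upload_file(session_name, part_name, "csv", path)
    client.freeze_bulk_import(session_name)
    job = client.perform_bulk_import(session_name)
    # Assumption: Job.wait() blocks until the perform job has finished.
    job.wait()
    client.commit_bulk_import(session_name)
    return job
```

A production version would likely inspect bulk_import_error_records(session_name) before committing and call delete_bulk_import(session_name) on failure.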

create_database(db_name, **kwargs)[source]¶

Parameters: db_name (str) – name of a database to create
Returns: True if success

create_log_table(db_name, table_name)[source]¶

Parameters

db_name (str) – name of a database
table_name (str) – name of a table to create

Returns

True if success

create_result(name, url, params=None)[source]¶

Create a new authentication with the specified name.

Parameters

name (str) – Authentication name.
url (str) – URL of the authentication to be created. e.g. “ftp://test.com/”
params (dict, optional) – Extra parameters.

Returns

True if succeeded.

Return type

bool

create_schedule(name, params=None)[source]¶

Create a new scheduled query with the specified name.

Parameters

name (str) – Scheduled query name.
params (dict, optional) –
Extra parameters.
- type (str):
  Query type. {“presto”, “hive”}. Default: “hive”
- database (str):
  Target database name.
- timezone (str):
  Scheduled query’s timezone. e.g. “UTC” For details, see also: https://gist.github.com/frsyuki/4533752
- cron (str, optional):
  Schedule of the query. {"@daily", "@hourly", "10 * * * *" (custom cron)} See also: https://support.treasuredata.com/hc/en-us/articles/360001451088-Scheduled-Jobs-Web-Console
- delay (int, optional):
  A delay ensures all buffered events are imported before running the query. Default: 0
- query (str):
  SQL query to be run. See also: https://support.treasuredata.com/hc/en-us/articles/360012069493-SQL-Examples-of-Scheduled-Queries
- priority (int, optional):
  Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0
- retry_limit (int, optional):
  Automatic retry count. Default: 0
- engine_version (str, optional):
  Engine version to be used. If none is specified, the account’s default engine version is used. {“stable”, “experimental”}
- pool_name (str, optional):
  For Presto only. Pool name to be used. If not specified, the default pool is used.
- result (str, optional):
  Location where to store the result of the query. e.g. ‘tableau://user:password@host.com:1234/datasource’

Returns

Start date time.

Return type

datetime.datetime
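The params dictionary for create_schedule could be assembled as in this sketch, which only uses keys listed above. The helper name `create_daily_schedule` and the chosen values are illustrative assumptions.

```python
def create_daily_schedule(client, name, database, query):
    """Create a Presto query scheduled to run once a day in UTC."""
    params = {
        "type": "presto",
        "database": database,
        "timezone": "UTC",
        "cron": "@daily",
        "query": query,
        "priority": 0,     # documented range: -2 (very low) to 2 (very high)
        "retry_limit": 1,  # retry once automatically on failure
    }
    # Per the documentation above, the return value is the start date time.
    return client.create_schedule(name, params)
```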

database(db_name)[source]¶

Parameters: db_name (str) – name of a database
Returns: tdclient.models.Database

databases()[source]¶

Returns: a list of tdclient.models.Database

delete_bulk_import(name)[source]¶

Delete a bulk import session

Parameters: name (str) – name of a bulk import session
Returns: True if success

delete_database(db_name)[source]¶

Parameters: db_name (str) – name of database to delete
Returns: True if success

delete_result(name)[source]¶

Delete the authentication having the specified name.

Parameters: name (str) – Authentication name.
Returns: True if succeeded.
Return type: bool

delete_schedule(name)[source]¶

Delete the scheduled query with the specified name.

Parameters: name (str) – Target scheduled query name.
Returns: Tuple of cron and query.
Return type: (str, str)

delete_table(db_name, table_name)[source]¶

Delete a table

Parameters

db_name (str) – name of a database
table_name (str) – name of a table

Returns

a string representing the type of the deleted table

export_data(db_name, table_name, storage_type, params=None)[source]¶

Export data from Treasure Data Service

Parameters

db_name (str) – name of a database
table_name (str) – name of a table
storage_type (str) – type of the storage
params (dict) –
optional parameters. Assuming the following keys:
- access_key_id (str):
  ID to access the information to be exported.
- secret_access_key (str):
  Password for the access_key_id.
- file_prefix (str, optional):
  Filename of exported file. Default: “<database_name>/<table_name>”
- file_format (str, optional):
  File format of the information to be exported. {“jsonl.gz”, “tsv.gz”, “json.gz”}
- from (int, optional):
  From Time of the data to be exported in Unix epoch format.
- to (int, optional):
  End Time of the data to be exported in Unix epoch format.
- assume_role (str, optional): Assume role.
- bucket (str):
  Name of bucket to be used.
- domain_key (str, optional):
  Job domain key.
- pool_name (str, optional):
  For Presto only. Pool name to be used. If not specified, the default pool is used.

Returns

tdclient.models.Job
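A hedged sketch of an export call follows. It assumes that "s3" is an accepted value for storage_type, which this page does not confirm; the helper name `export_table` and its arguments are hypothetical. Only parameter keys listed above are used.

```python
def export_table(client, db_name, table_name, bucket, access_key_id,
                 secret_access_key, start, end):
    """Export rows with timestamps between start and end (Unix epoch seconds)."""
    params = {
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "bucket": bucket,
        "file_format": "jsonl.gz",
        "from": start,
        "to": end,
    }
    # Assumption: "s3" is a valid storage_type value.
    return client.export_data(db_name, table_name, "s3", params)
```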

freeze_bulk_import(name)[source]¶

Freeze a bulk import session

Parameters: name (str) – name of a bulk import session
Returns: True if success

history(name, _from=None, to=None)[source]¶

Get the history details of the saved query for the past 90 days.

Parameters

name (str) – Target name of the scheduled query.
_from (int, optional) – Indicates from which nth record in the run history would be fetched. Default: 0. Note: Count starts from zero. This means that the first record in the list has a count of zero.
to (int, optional) – Indicates up to which nth record in the run history would be fetched. Default: 20

Returns

[tdclient.models.ScheduledJob]

import_data(db_name, table_name, format, bytes_or_stream, size, unique_id=None)[source]¶

Import data into Treasure Data Service

Parameters

db_name (str) – name of a database
table_name (str) – name of a table
format (str) – format of data type (e.g. “msgpack.gz”)
bytes_or_stream (str or file-like) – a byte string or a file-like object containing the data
size (int) – the length of the data
unique_id (str) – a unique identifier of the data

Returns

the elapsed time of the import in seconds, as a float

import_file(db_name, table_name, format, file, unique_id=None)[source]¶

Import data into Treasure Data Service, from an existing file on filesystem.

This method will decompress/deserialize records from the given file, and then convert them into a format accepted by Treasure Data Service (“msgpack.gz”).

Parameters

db_name (str) – name of a database
table_name (str) – name of a table
format (str) – format of data type (e.g. “msgpack”, “json”)
file (str or file-like) – the name of a file, or a file-like object containing the data
unique_id (str) – a unique identifier of the data

Returns

a float representing the elapsed time of the import
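As an illustration of import_file, the sketch below writes in-memory records to a temporary file and imports it. It assumes that the "json" format expects one JSON object per line; `import_records` is a hypothetical helper.

```python
import json
import os
import tempfile


def import_records(client, db_name, table_name, records):
    """Write records as newline-delimited JSON, then import the file."""
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
        path = f.name
    try:
        # Returns the elapsed time reported by import_file.
        return client.import_file(db_name, table_name, "json", path)
    finally:
        # Remove the temporary file whether or not the import succeeded.
        os.remove(path)
```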

job(job_id)[source]¶

Get a job from job_id

Parameters: job_id (str) – job id
Returns: tdclient.models.Job

job_result(job_id)[source]¶

Parameters: job_id (str) – job id
Returns: a list of the rows in the result set

job_result_each(job_id)[source]¶

Parameters: job_id (str) – job id
Returns: an iterator over the rows of the result set

job_result_format(job_id, format)[source]¶

Parameters

job_id (str) – job id
format (str) – output format of result set

Returns

a list of the rows in the result set

job_result_format_each(job_id, format)[source]¶

Parameters

job_id (str) – job id
format (str) – output format of result set

Returns

an iterator over the rows of the result set

job_status(job_id)[source]¶

Parameters: job_id (str) – job id
Returns: a string representing the status of the job (“success”, “error”, “killed”, “queued”, “running”)
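Because job_status returns one of a fixed set of strings, it lends itself to simple polling. The following sketch treats "success", "error" and "killed" as finished states; the helper `wait_for_job` and its interval and timeout defaults are assumptions, not part of tdclient.

```python
import time

# Statuses after which the job will not change state again.
FINISHED = {"success", "error", "killed"}


def wait_for_job(client, job_id, interval=2.0, timeout=600.0):
    """Poll job_status until the job finishes or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.job_status(job_id)
        if status in FINISHED:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "job %s is still %s after %s seconds" % (job_id, status, timeout)
            )
        time.sleep(interval)
```

The caller still has to check whether the returned status is "success" before fetching results with job_result or job_result_each.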

jobs(_from=None, to=None, status=None, conditions=None)[source]¶

List jobs

Parameters

_from (int, optional) – Gets the Job from the nth index in the list. Default: 0.
to (int, optional) – Gets the Job up to the nth index in the list. By default, the first 20 jobs in the list are displayed.
status (str, optional) – Filter by given status. {“queued”, “running”, “success”, “error”}
conditions (str, optional) – Condition for TIMESTAMPDIFF() to search for slow queries. Avoid using this parameter as it can be dangerous.

Returns

a list of tdclient.models.Job

kill(job_id)[source]¶

Parameters: job_id (str) – job id
Returns: a string representing the former status of the killed job (“queued”, “running”)

list_apikeys(name)[source]¶

Parameters: name (str) – name of the user
Returns: a list of API key strings

list_bulk_import_parts(name)[source]¶

List parts of a bulk import session

Parameters: name (str) – name of a bulk import session
Returns: a list of strings representing the names of the parts

partial_delete(db_name, table_name, to, _from, params=None)[source]¶

Create a job to partially delete the contents of the table with the given time range.

Parameters

db_name (str) – Target database name.
table_name (str) – Target table name.
to (int) – Time in Unix Epoch format indicating the End date and time of the data to be deleted. Should be set only by the hour. Minutes and seconds values will not be accepted.
_from (int) – Time in Unix Epoch format indicating the Start date and time of the data to be deleted. Should be set only by the hour. Minutes and seconds values will not be accepted.
params (dict, optional) –
Extra parameters.
- pool_name (str, optional):
  Indicates the resource pool to execute this job. If not provided, the account’s default resource pool would be used.
- domain_key (str, optional):
  Domain key that will be assigned to the partial delete job to be created

Returns

tdclient.models.Job
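Since both time bounds must fall exactly on the hour, it can help to round timestamps down before calling partial_delete. This is a minimal sketch; `floor_to_hour` and `delete_range` are hypothetical helpers, and the datetimes are assumed to be timezone-aware.

```python
import datetime


def floor_to_hour(dt):
    """Return the Unix timestamp of dt rounded down to a whole hour."""
    ts = int(dt.timestamp())
    return ts - ts % 3600


def delete_range(client, db_name, table_name, start, end):
    """Create a partial delete job for the hour-aligned range [start, end]."""
    # Keyword arguments avoid mixing up the order: the signature lists
    # `to` before `_from`.
    return client.partial_delete(
        db_name,
        table_name,
        to=floor_to_hour(end),
        _from=floor_to_hour(start),
    )
```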

perform_bulk_import(name)[source]¶

Perform a bulk import session

Parameters: name (str) – name of a bulk import session
Returns: tdclient.models.Job

query(db_name, q, result_url=None, priority=None, retry_limit=None, type='hive', **kwargs)[source]¶

Run a query on the specified database.

Parameters

db_name (str) – name of a database
q (str) – a query string
result_url (str) – result output URL. e.g., postgresql://<username>:<password>@<hostname>:<port>/<database>/<table>
priority (int or str) – priority (e.g. “NORMAL”, “HIGH”, etc.)
retry_limit (int) – retry limit
type (str) – name of a query engine

Returns

tdclient.models.Job

Raises

ValueError – if an unknown query type has been specified
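A short sketch of issuing a query and collecting its rows is shown below. It relies on two methods of the returned tdclient.models.Job that are not documented on this page, `wait()` and `result()`, so treat both as assumptions; `run_query` is a hypothetical helper.

```python
def run_query(client, db_name, sql, engine="presto"):
    """Run a query, wait for it to finish and return its rows as a list."""
    job = client.query(db_name, sql, type=engine)
    # Assumption: Job.wait() blocks until the job has finished.
    job.wait()
    # Assumption: Job.result() yields the rows of the result set.
    return list(job.result())
```

For large result sets, iterating over client.job_result_each(job_id) instead of building a list keeps memory use bounded.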

remove_apikey(name, apikey)[source]¶

Parameters

name (str) – name of the user
apikey (str) – an API key to remove

Returns

True if success

remove_user(name)[source]¶

Remove a user

Parameters: name (str) – name of the user
Returns: True if success

results()[source]¶

Get the list of all the available authentications.

Returns: a list of tdclient.models.Result

run_schedule(name, time, num)[source]¶

Execute the specified query.

Parameters

name (str) – Target scheduled query name.
time (int) – Time in Unix epoch format that would be set as TD_SCHEDULED_TIME
num (int) – Indicates how many times the query will be executed. Value should be 9 or less.

Returns

[tdclient.models.ScheduledJob]

schedules()[source]¶

Get the list of all the scheduled queries.

Returns: [tdclient.models.Schedule]

server_status()[source]¶

Returns: a string representing the current server status.

swap_table(db_name, table_name1, table_name2)[source]¶

Parameters

db_name (str) – name of a database
table_name1 (str) – original table name
table_name2 (str) – table name you want to rename to

Returns

True if success

table(db_name, table_name)[source]¶

Parameters

db_name (str) – name of a database
table_name (str) – name of a table

Returns

tdclient.models.Table

Raises

tdclient.api.NotFoundError – if the table doesn’t exist

tables(db_name)[source]¶

List existing tables

Parameters: db_name (str) – name of a database
Returns: a list of tdclient.models.Table

tail(db_name, table_name, count, to=None, _from=None, block=None)[source]¶

Get the contents of the table in reverse order based on the registered time (last data first).

Parameters

db_name (str) – Target database name.
table_name (str) – Target table name.
count (int) – Number of records to show from the end.
to – Deprecated parameter.
_from – Deprecated parameter.
block – Deprecated parameter.

Returns

Contents of the table.

Return type

[dict]

unfreeze_bulk_import(name)[source]¶

Unfreeze a bulk import session

Parameters: name (str) – name of a bulk import session
Returns: True if success

update_expire(db_name, table_name, expire_days)[source]¶

Set the expiration date of a table

Parameters

db_name (str) – name of a database
table_name (str) – name of a table
expire_days (int) – expiration date in days from today

Returns

True if success

update_schedule(name, params=None)[source]¶

Update the scheduled query.

Parameters

name (str) – Target scheduled query name.
params (dict) –
Extra parameters.
- type (str):
  Query type. {“presto”, “hive”}. Default: “hive”
- database (str):
  Target database name.
- timezone (str):
  Scheduled query’s timezone. e.g. “UTC” For details, see also: https://gist.github.com/frsyuki/4533752
- cron (str, optional):
  Schedule of the query. {"@daily", "@hourly", "10 * * * *" (custom cron)} See also: https://support.treasuredata.com/hc/en-us/articles/360001451088-Scheduled-Jobs-Web-Console
- delay (int, optional):
  A delay ensures all buffered events are imported before running the query. Default: 0
- query (str):
  SQL query to be run. See also: https://support.treasuredata.com/hc/en-us/articles/360012069493-SQL-Examples-of-Scheduled-Queries
- priority (int, optional):
  Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0
- retry_limit (int, optional):
  Automatic retry count. Default: 0
- engine_version (str, optional):
  Engine version to be used. If none is specified, the account’s default engine version is used. {“stable”, “experimental”}
- pool_name (str, optional):
  For Presto only. Pool name to be used. If not specified, the default pool is used.
- result (str, optional):
  Location where to store the result of the query. e.g. ‘tableau://user:password@host.com:1234/datasource’

update_schema(db_name, table_name, schema)[source]¶

Updates the schema of a table

Parameters

db_name (str) – name of a database
table_name (str) – name of a table

schema (list) –

a list of column definitions representing the schema (will be converted to JSON), e.g.

[
    ["member_id", # column name
     "string", # data type
     "mem_id", # alias of the column name
    ],
    ["row_index", "long", "row_ind"],
    ...
]

Returns

True if success
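The nested-list schema shown above can be built programmatically. The sketch below converts (name, type, alias) tuples into that form and rejects malformed entries early; `build_schema` and `apply_schema` are hypothetical helpers.

```python
def build_schema(columns):
    """Convert (name, data_type, alias) tuples into the nested-list form."""
    schema = []
    for column in columns:
        if len(column) != 3:
            raise ValueError("expected (name, data_type, alias), got %r" % (column,))
        name, data_type, alias = column
        schema.append([name, data_type, alias])
    return schema


def apply_schema(client, db_name, table_name, columns):
    """Update the schema of a table from a sequence of column tuples."""
    return client.update_schema(db_name, table_name, build_schema(columns))
```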

users()[source]¶

List users

Returns: a list of tdclient.models.User

property api¶: an instance of tdclient.api.API

property apikey¶: API key string.

tdclient.client.job_from_dict(client, dd, **values)[source]¶