Client

The tdclient.client.Client class is the public interface of tdclient. It provides methods that execute calls against the REST API.

tdclient.client

class tdclient.client.Client(*args: Any, **kwargs: Any)[source]

Bases: object

API Client for Treasure Data Service

add_apikey(name: str) bool[source]
Parameters:

name (str) – name of the user

Returns:

True if success

add_user(name: str, org: str, email: str, password: str) bool[source]

Add a new user

Parameters:
  • name (str) – name of the user

  • org (str) – organization

  • email (str) – e-mail address

  • password (str) – password

Returns:

True if success

bulk_import(name: str) BulkImport[source]

Get a bulk import session

Parameters:

name (str) – name of a bulk import session

Returns:

tdclient.models.BulkImport

bulk_import_delete_part(name: str, part_name: str) bool[source]

Delete a part from a bulk import session

Parameters:
  • name (str) – name of a bulk import session

  • part_name (str) – name of a part of the bulk import session

Returns:

True if success

bulk_import_error_records(name: str) Iterator[dict[str, Any]][source]
Parameters:

name (str) – name of a bulk import session

Returns:

an iterator of error records
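
Because bulk_import_error_records returns an iterator, a caller can inspect only the first few records without reading them all. The following is a minimal sketch; the helper name sample_error_records and its limit parameter are hypothetical, and client is assumed to be a tdclient.client.Client instance.

```python
from itertools import islice
from typing import Any


def sample_error_records(client: Any, session: str, limit: int = 10) -> list:
    """Return at most `limit` error records from a bulk import session."""
    # islice stops consuming the iterator once `limit` records were read.
    return list(islice(client.bulk_import_error_records(session), limit))
```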

bulk_import_upload_file(name: str, part_name: str, format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], file: str | bytes | IO[bytes], **kwargs: Any) None[source]

Upload a part to a bulk import session from an existing file on the filesystem.

Parameters:
  • name (str) – name of a bulk import session

  • part_name (str) – name of a part of the bulk import session

  • format (str) – format of data type (e.g. “msgpack”, “json”, “csv”, “tsv”)

  • file (str or file-like) – the name of a file, or a file-like object, containing the data

  • **kwargs – extra arguments.

There is more documentation on format, file and **kwargs at file import parameters.

In particular, for “csv” and “tsv” data, you can change how data columns are parsed using the dtypes and converters arguments.

  • dtypes is a dictionary used to specify a datatype for individual columns, for instance {"col1": "int"}. The available datatypes are "bool", "float", "int", "str" and "guess". If a column is also mentioned in converters, then the function will be used, NOT the datatype.

  • converters is a dictionary used to specify a function that will be used to parse individual columns, for instance {"col1": int}.

The default behaviour is "guess", which makes a best-effort to decide the column datatype. See file import parameters for more details.
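
As a sketch of how the dtypes and converters arguments might be passed, the helper below forwards both to bulk_import_upload_file as keyword arguments. The helper names (parse_yes_no, upload_csv_part) and the column names are hypothetical, and client is assumed to be a tdclient.client.Client instance.

```python
from typing import Any


def parse_yes_no(value: str) -> bool:
    """Hypothetical converter: treat "yes", "y", "true" and "1" as True."""
    return value.strip().lower() in {"yes", "y", "true", "1"}


def upload_csv_part(client: Any, session: str, part: str, path: str) -> None:
    """Upload a CSV file as one part, overriding how some columns are parsed."""
    # Columns not listed here keep the default "guess" behaviour.
    dtypes = {"user_id": "int", "zip_code": "str"}
    # If a column appears in both dictionaries, the converter is used.
    converters = {"is_active": parse_yes_no}
    client.bulk_import_upload_file(
        session, part, "csv", path, dtypes=dtypes, converters=converters
    )
```

Declaring zip_code as "str" is the kind of override that matters in practice, since guessing could turn a value with a leading zero into a number.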

bulk_import_upload_part(name: str, part_name: str, bytes_or_stream: bytes | bytearray | IO[bytes], size: int) None[source]

Upload a part to a bulk import session

Parameters:
  • name (str) – name of a bulk import session

  • part_name (str) – name of a part of the bulk import session

  • bytes_or_stream (bytes or file-like) – a byte string or a file-like object containing the part

  • size (int) – the size of the part

bulk_imports() list[BulkImport][source]

List bulk import sessions

Returns:

a list of tdclient.models.BulkImport

change_database(db_name: str, table_name: str, new_db_name: str) bool[source]

Move a target table from its original database to a new destination database.

Parameters:
  • db_name (str) – Target database name.

  • table_name (str) – Target table name.

  • new_db_name (str) – Name of the destination database to move the table to.

Returns:

True if succeeded.

Return type:

bool

close() None[source]

Close opened API connections.
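
One way to make sure close() is always called is the standard library's contextlib.closing, which only requires that the wrapped object has a close() method. This is a sketch; the helper count_databases is hypothetical and client is assumed to be a tdclient.client.Client instance.

```python
from contextlib import closing
from typing import Any


def count_databases(client: Any) -> int:
    """Return the number of databases, then close the client's connections."""
    # closing() calls client.close() on exit, even if databases() raises.
    with closing(client) as td:
        return len(td.databases())
```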

commit_bulk_import(name: str) bool[source]

Commit a bulk import session

Parameters:

name (str) – name of a bulk import session

Returns:

True if success

create_bulk_import(name: str, database: str, table: str, params: BulkImportParams | None = None) BulkImport[source]

Create new bulk import session

Parameters:
  • name (str) – name of new bulk import session

  • database (str) – name of a database

  • table (str) – name of a table

Returns:

tdclient.models.BulkImport
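
The bulk import methods on this class are typically used together. The sketch below assumes the order create, upload, freeze, perform, which is not stated in this reference; the helper name start_bulk_import and the parts mapping are hypothetical, and client is assumed to be a tdclient.client.Client instance.

```python
from typing import Any, Dict


def start_bulk_import(
    client: Any, session: str, database: str, table: str, parts: Dict[str, str]
) -> Any:
    """Create a session, upload CSV parts, freeze it, and start the import job.

    `parts` maps a part name to the path of a CSV file (hypothetical layout).
    Returns whatever perform_bulk_import returns (a tdclient.models.Job).
    """
    client.create_bulk_import(session, database, table)
    for part_name, path in parts.items():
        client.bulk_import_upload_file(session, part_name, "csv", path)
    # Assumed order: freeze the session before performing it.
    client.freeze_bulk_import(session)
    return client.perform_bulk_import(session)
```

Once the returned job has finished successfully, commit_bulk_import(session) would complete the session, and bulk_import_error_records(session) can be used to inspect rejected records.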

create_database(db_name: str, **kwargs: Any) bool[source]
Parameters:

db_name (str) – name of a database to create

Returns:

True if success

create_log_table(db_name: str, table_name: str) bool[source]
Parameters:
  • db_name (str) – name of a database

  • table_name (str) – name of a table to create

Returns:

True if success

create_result(name: str, url: str, params: ResultParams | None = None) bool[source]

Create a new authentication with the specified name.

Parameters:
  • name (str) – Authentication name.

  • url (str) – URL of the authentication to be created, e.g. “ftp://test.com/”

  • params (dict, optional) – Extra parameters.

Returns:

True if succeeded.

Return type:

bool

create_schedule(name: str, params: ScheduleParams | None = None) datetime | None[source]

Create a new scheduled query with the specified name.

Parameters:
  • name (str) – Scheduled query name.

  • params (dict, optional) –

    Extra parameters.

    • type (str):

      Query type. {“presto”, “hive”}. Default: “hive”

    • database (str):

      Target database name.

    • timezone (str):

      Scheduled query’s timezone, e.g. “UTC”. For details, see also: https://gist.github.com/frsyuki/4533752

    • cron (str, optional):

      Schedule of the query. {"@daily", "@hourly", "10 * * * *" (custom cron)} See also: https://docs.treasuredata.com/articles/#!pd/Scheduling-Jobs-Using-TD-Console

    • delay (int, optional):

      A delay ensures all buffered events are imported before running the query. Default: 0

    • query (str):

      Query to run on the schedule, written in the query language of the selected type. See also: https://docs.treasuredata.com/articles/#!pd/SQL-Examples-of-Scheduled-Queries

    • priority (int, optional):

      Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0

    • retry_limit (int, optional):

      Automatic retry count. Default: 0

    • engine_version (str, optional):

      Engine version to be used. If none is specified, the account’s default engine version would be set. {“stable”, “experimental”}

    • pool_name (str, optional):

      For Presto only. Pool name to be used; if not specified, the default pool is used.

    • result (str, optional):

      Location where to store the result of the query. e.g. ‘tableau://user:password@host.com:1234/datasource’

Returns:

Start date time.

Return type:

datetime.datetime
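
The params dictionary for create_schedule can be assembled from the keys listed above. The sketch below builds one for a daily Presto query and checks the documented priority range; the helper name daily_schedule_params and the chosen values are illustrative assumptions.

```python
from typing import Any, Dict


def daily_schedule_params(database: str, query: str, priority: int = 0) -> Dict[str, Any]:
    """Build a params dict for a query that runs once a day in UTC."""
    # The documented priority range is -2 (very low) to 2 (very high).
    if not -2 <= priority <= 2:
        raise ValueError("priority must be between -2 and 2")
    return {
        "type": "presto",
        "database": database,
        "timezone": "UTC",
        "cron": "@daily",
        "query": query,
        "priority": priority,
        "retry_limit": 0,
    }
```

With a client instance, the result would be passed as client.create_schedule("daily_count", daily_schedule_params("sample_db", "SELECT COUNT(1) FROM www_access")), where the schedule, database and table names are placeholders.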

database(db_name: str) Database[source]
Parameters:

db_name (str) – name of a database

Returns:

tdclient.models.Database

databases() list[Database][source]
Returns:

a list of tdclient.models.Database

delete_bulk_import(name: str) bool[source]

Delete a bulk import session

Parameters:

name (str) – name of a bulk import session

Returns:

True if success

delete_database(db_name: str) bool[source]
Parameters:

db_name (str) – name of database to delete

Returns:

True if success

delete_result(name: str) bool[source]

Delete the authentication having the specified name.

Parameters:

name (str) – Authentication name.

Returns:

True if succeeded.

Return type:

bool

delete_schedule(name: str) tuple[str, str][source]

Delete the scheduled query with the specified name.

Parameters:

name (str) – Target scheduled query name.

Returns:

Tuple of cron and query.

Return type:

(str, str)

delete_table(db_name: str, table_name: str) str[source]

Delete a table

Parameters:
  • db_name (str) – name of a database

  • table_name (str) – name of a table

Returns:

a string representing the type of the deleted table

download_job_result(job_id: str | int, path: str, num_threads: int = 4) bool[source]

Save the job result into a msgpack.gz file.

Parameters:
  • job_id (str) – job id

  • path (str) – path to save the result

  • num_threads (int, optional) – number of threads to download the result. Default: 4

Returns:

True if success

export_data(db_name: str, table_name: str, storage_type: str, params: ExportParams | None = None) Job[source]

Export data from Treasure Data Service

Parameters:
  • db_name (str) – name of a database

  • table_name (str) – name of a table

  • storage_type (str) – type of the storage

  • params (dict) –

    optional parameters. Assuming the following keys:

    • access_key_id (str):

      ID to access the information to be exported.

    • secret_access_key (str):

      Password for the access_key_id.

    • file_prefix (str, optional):

      Filename of exported file. Default: “<database_name>/<table_name>”

    • file_format (str, optional):

      File format of the information to be exported. {“jsonl.gz”, “tsv.gz”, “json.gz”}

    • from (int, optional):

      From Time of the data to be exported in Unix epoch format.

    • to (int, optional):

      End Time of the data to be exported in Unix epoch format.

    • assume_role (str, optional): Assume role.

    • bucket (str):

      Name of bucket to be used.

    • domain_key (str, optional):

      Job domain key.

    • pool_name (str, optional):

      For Presto only. Pool name to be used; if not specified, the default pool is used.

Returns:

tdclient.models.Job
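
Since from and to are Unix epoch integers, and from is a reserved word in Python, the export parameters have to be passed as dictionary keys. The sketch below converts timezone-aware datetimes into such a dictionary; the helper name export_params and the choice of "jsonl.gz" are assumptions for illustration.

```python
from datetime import datetime
from typing import Any, Dict


def export_params(
    bucket: str,
    access_key_id: str,
    secret_access_key: str,
    start: datetime,
    end: datetime,
) -> Dict[str, Any]:
    """Build a params dict that exports rows in the time range start..end."""
    # Naive datetimes would be interpreted in local time, so reject them.
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be later than start")
    return {
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "bucket": bucket,
        "file_format": "jsonl.gz",
        "from": int(start.timestamp()),
        "to": int(end.timestamp()),
    }
```

The dictionary would then be passed as the params argument of export_data, together with a storage_type value supported by the service.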

freeze_bulk_import(name: str) bool[source]

Freeze a bulk import session

Parameters:

name (str) – name of a bulk import session

Returns:

True if success

history(name: str, _from: int | None = None, to: int | None = None) list[ScheduledJob][source]

Get the history details of the saved query for the past 90 days.

Parameters:
  • name (str) – Target name of the scheduled query.

  • _from (int, optional) – Indicates from which nth record in the run history would be fetched. Default: 0. Note: Count starts from zero. This means that the first record in the list has a count of zero.

  • to (int, optional) – Indicates up to which nth record in the run history would be fetched. Default: 20

Returns:

[tdclient.models.ScheduledJob]

import_data(db_name: str, table_name: str, format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], bytes_or_stream: bytes | bytearray | IO[bytes], size: int, unique_id: str | None = None) float[source]

Import data into Treasure Data Service

Parameters:
  • db_name (str) – name of a database

  • table_name (str) – name of a table

  • format (str) – format of data type (e.g. “msgpack.gz”)

  • bytes_or_stream (bytes or file-like) – a byte string or a file-like object containing the data

  • size (int) – the length of the data

  • unique_id (str) – a unique identifier of the data

Returns:

elapsed time of the import in seconds, as a float

import_file(db_name: str, table_name: str, format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], file: str | bytes | IO[bytes], unique_id: str | None = None) float[source]

Import data into Treasure Data Service, from an existing file on filesystem.

This method decompresses and deserializes the records in the given file, then converts them into the format accepted by Treasure Data Service (“msgpack.gz”).

Parameters:
  • db_name (str) – name of a database

  • table_name (str) – name of a table

  • format (str) – format of data type (e.g. “msgpack”, “json”)

  • file (str or file-like) – the name of a file, or a file-like object, containing the data

  • unique_id (str) – a unique identifier of the data

Returns:

elapsed time of the import in seconds, as a float

job(job_id: str | int) Job[source]

Get a job from job_id

Parameters:

job_id (str) – job id

Returns:

tdclient.models.Job

job_result(job_id: str | int) list[Any][source]
Parameters:

job_id (str) – job id

Returns:

a list of the rows in the result set

job_result_each(job_id: str | int) Iterator[dict[str, Any]][source]
Parameters:

job_id (str) – job id

Returns:

an iterator of result set

job_result_format(job_id: str | int, format: Literal['msgpack', 'json', 'csv', 'tsv'], header: bool = False) list[Any][source]
Parameters:
  • job_id (str) – job id

  • format (str) – output format of result set

Returns:

a list of the rows in the result set

job_result_format_each(job_id: str | int, format: Literal['msgpack', 'json', 'csv', 'tsv'], header: bool = False, store_tmpfile: bool = False, num_threads: int = 4) Iterator[dict[str, Any]][source]
Parameters:
  • job_id (str) – job id

  • format (str) – output format of result set

  • header (bool, optional) – include header in the result set. Default: False

  • store_tmpfile (bool, optional) – store result to a temporary file. Works only when format is “msgpack”. Default is False.

  • num_threads (int, optional) – number of threads to download result. Works only when store_tmpfile is True. Default is 4.

Returns:

an iterator of rows in result set

job_status(job_id: str | int) str[source]
Parameters:

job_id (str) – job id

Returns:

a string representing the status of the job (“success”, “error”, “killed”, “queued”, “running”)
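
A common use of job_status is to poll until a job leaves the “queued” and “running” states. The sketch below treats “success”, “error” and “killed” as terminal, which is an assumption based on the status names above; the helper name wait_for_job and its poll_seconds and timeout parameters are hypothetical.

```python
import time
from typing import Any

# Assumed terminal states, taken from the documented status names.
TERMINAL_STATUSES = {"success", "error", "killed"}


def wait_for_job(
    client: Any, job_id: str, poll_seconds: float = 2.0, timeout: float = 600.0
) -> str:
    """Poll job_status until the job reaches a terminal status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = client.job_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} is still {status!r} after {timeout} seconds")
        time.sleep(poll_seconds)
```

A fixed polling interval keeps the sketch short; a longer-running caller might prefer to back off between requests.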

jobs(_from: int | None = None, to: int | None = None, status: str | None = None, conditions: dict[str, Any] | None = None) list[Job][source]

List jobs

Parameters:
  • _from (int, optional) – Gets the Job from the nth index in the list. Default: 0.

  • to (int, optional) – Gets the Job up to the nth index in the list. By default, the first 20 jobs in the list are displayed

  • status (str, optional) – Filter by given status. {“queued”, “running”, “success”, “error”}

  • conditions (dict[str, Any], optional) – Condition for TIMESTAMPDIFF() to search for slow queries. Avoid using this parameter as it can be dangerous.

Returns:

a list of tdclient.models.Job

kill(job_id: str | int) str | None[source]
Parameters:

job_id (str) – job id

Returns:

a string representing the status of the killed job (“queued”, “running”)

list_apikeys(name: str) list[str][source]
Parameters:

name (str) – name of the user

Returns:

a list of API key strings

list_bulk_import_parts(name: str) list[str][source]

List parts of a bulk import session

Parameters:

name (str) – name of a bulk import session

Returns:

a list of strings representing the names of the parts

perform_bulk_import(name: str) Job[source]

Perform a bulk import session

Parameters:

name (str) – name of a bulk import session

Returns:

tdclient.models.Job

query(db_name: str, q: str, result_url: str | None = None, priority: Literal[-2, -1, 0, 1, 2, 'VERY LOW', 'LOW', 'NORMAL', 'HIGH', 'VERY HIGH'] | None = None, retry_limit: int | None = None, type: str = 'hive', **kwargs: Any) Job[source]

Run a query on the specified database.

Parameters:
  • db_name (str) – name of a database

  • q (str) – a query string

  • result_url (str) – result output URL. e.g., postgresql://<username>:<password>@<hostname>:<port>/<database>/<table>

  • priority (int or str) – priority (e.g. “NORMAL”, “HIGH”, etc.)

  • retry_limit (int) – retry limit

  • type (str) – name of a query engine

Returns:

tdclient.models.Job

Raises:

ValueError – if unknown query type has been specified
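
The priority argument accepts either an integer or a name. The sketch below normalizes both forms before calling query; the pairing of names to integers is an assumption inferred from the order of the values in the signature, and the helper names resolve_priority and run_presto_query are hypothetical.

```python
from typing import Any, Union

# Assumed pairing of priority names and integers.
PRIORITY_BY_NAME = {"VERY LOW": -2, "LOW": -1, "NORMAL": 0, "HIGH": 1, "VERY HIGH": 2}


def resolve_priority(priority: Union[int, str]) -> int:
    """Turn a priority name or integer into an integer between -2 and 2."""
    if isinstance(priority, str):
        try:
            return PRIORITY_BY_NAME[priority.upper()]
        except KeyError:
            raise ValueError(f"unknown priority name: {priority!r}") from None
    if priority not in PRIORITY_BY_NAME.values():
        raise ValueError(f"priority out of range: {priority!r}")
    return priority


def run_presto_query(
    client: Any, db_name: str, sql: str, priority: Union[int, str] = "NORMAL"
) -> Any:
    """Submit a Presto query and return the resulting job object."""
    return client.query(db_name, sql, priority=resolve_priority(priority), type="presto")
```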

remove_apikey(name: str, apikey: str) bool[source]
Parameters:
  • name (str) – name of the user

  • apikey (str) – an API key to remove

Returns:

True if success

remove_user(name: str) bool[source]

Remove a user

Parameters:

name (str) – name of the user

Returns:

True if success

results() list[Result][source]

Get the list of all the available authentications.

Returns:

a list of tdclient.models.Result

run_schedule(name: str, time: int, num: int | None = None) list[ScheduledJob][source]

Execute the specified query.

Parameters:
  • name (str) – Target scheduled query name.

  • time (int) – Time in Unix epoch format that would be set as TD_SCHEDULED_TIME

  • num (int) – Indicates how many times the query will be executed. Value should be 9 or less.

Returns:

[tdclient.models.ScheduledJob]
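
Because time is a Unix epoch integer and num is limited to 9, a thin wrapper can validate both before calling run_schedule. This is a sketch; the helper name run_schedule_at and the lower bound of 1 for num are assumptions, and client is assumed to be a tdclient.client.Client instance.

```python
from datetime import datetime
from typing import Any


def run_schedule_at(client: Any, name: str, scheduled_at: datetime, runs: int = 1) -> Any:
    """Run a scheduled query with TD_SCHEDULED_TIME set to `scheduled_at`."""
    # The upper bound of 9 is documented; the lower bound of 1 is assumed.
    if not 1 <= runs <= 9:
        raise ValueError("runs must be between 1 and 9")
    # Naive datetimes would be interpreted in local time, so reject them.
    if scheduled_at.tzinfo is None:
        raise ValueError("scheduled_at must be timezone-aware")
    return client.run_schedule(name, int(scheduled_at.timestamp()), runs)
```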

schedules() list[Schedule][source]

Get the list of all the scheduled queries.

Returns:

[tdclient.models.Schedule]

server_status() str[source]
Returns:

a string representing the current server status.

swap_table(db_name: str, table_name1: str, table_name2: str) bool[source]
Parameters:
  • db_name (str) – name of a database

  • table_name1 (str) – original table name

  • table_name2 (str) – table name you want to rename to

Returns:

True if success

table(db_name: str, table_name: str) Table[source]
Parameters:
  • db_name (str) – name of a database

  • table_name (str) – name of a table

Returns:

tdclient.models.Table

Raises:

tdclient.api.NotFoundError – if the table doesn’t exist

tables(db_name: str) list[Table][source]

List existing tables

Parameters:

db_name (str) – name of a database

Returns:

a list of tdclient.models.Table

tail(db_name: str, table_name: str, count: int, to: None = None, _from: None = None, block: None = None) list[dict[str, Any]][source]

Get the contents of the table in reverse order based on the registered time (last data first).

Parameters:
  • db_name (str) – Target database name.

  • table_name (str) – Target table name.

  • count (int) – Number of records to show, counted from the end.

  • to – Deprecated parameter.

  • _from – Deprecated parameter.

  • block – Deprecated parameter.

Returns:

Contents of the table.

Return type:

[dict]

unfreeze_bulk_import(name: str) bool[source]

Unfreeze a bulk import session

Parameters:

name (str) – name of a bulk import session

Returns:

True if success

update_expire(db_name: str, table_name: str, expire_days: int) bool[source]

Set expiration date to a table

Parameters:
  • db_name (str) – name of a database

  • table_name (str) – name of a table

  • expire_days (int) – expiration date in days from today

Returns:

True if success

update_schedule(name: str, params: ScheduleParams | None = None) None[source]

Update the scheduled query.

Parameters:
  • name (str) – Target scheduled query name.

  • params (dict) –

    Extra parameters.

    • type (str):

      Query type. {“presto”, “hive”}. Default: “hive”

    • database (str):

      Target database name.

    • timezone (str):

      Scheduled query’s timezone, e.g. “UTC”. For details, see also: https://gist.github.com/frsyuki/4533752

    • cron (str, optional):

      Schedule of the query. {"@daily", "@hourly", "10 * * * *" (custom cron)} See also: https://docs.treasuredata.com/articles/#!pd/Scheduling-Jobs-Using-TD-Console

    • delay (int, optional):

      A delay ensures all buffered events are imported before running the query. Default: 0

    • query (str):

      Query to run on the schedule, written in the query language of the selected type. See also: https://docs.treasuredata.com/articles/#!pd/SQL-Examples-of-Scheduled-Queries

    • priority (int, optional):

      Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0

    • retry_limit (int, optional):

      Automatic retry count. Default: 0

    • engine_version (str, optional):

      Engine version to be used. If none is specified, the account’s default engine version would be set. {“stable”, “experimental”}

    • pool_name (str, optional):

      For Presto only. Pool name to be used; if not specified, the default pool is used.

    • result (str, optional):

      Location where to store the result of the query. e.g. ‘tableau://user:password@host.com:1234/datasource’

update_schema(db_name: str, table_name: str, schema: list[list[str]]) bool[source]

Updates the schema of a table

Parameters:
  • db_name (str) – name of a database

  • table_name (str) – name of a table

  • schema (list) –

    a list of lists representing the schema definition (will be converted to JSON), e.g.

    [
        ["member_id", # column name
         "string", # data type
         "mem_id", # alias of the column name
        ],
        ["row_index", "long", "row_ind"],
        ...
    ]
    

Returns:

True if success
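
The nested list expected by update_schema can be built from (name, type, alias) tuples. The sketch below also rejects duplicate column names, which is an extra check not required by this reference; the helper name build_schema is hypothetical.

```python
from typing import Iterable, List, Tuple


def build_schema(columns: Iterable[Tuple[str, str, str]]) -> List[List[str]]:
    """Convert (column name, data type, alias) tuples into the schema list."""
    schema: List[List[str]] = []
    seen = set()
    for name, data_type, alias in columns:
        # Extra safety check (an assumption): column names should be unique.
        if name in seen:
            raise ValueError(f"duplicate column name: {name!r}")
        seen.add(name)
        schema.append([name, data_type, alias])
    return schema
```

The result would be passed as the schema argument, for example client.update_schema("sample_db", "members", build_schema(columns)), where the database and table names are placeholders.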

users()[source]

List users

Returns:

a list of tdclient.models.User

property api: API

an instance of tdclient.api.API

property apikey: str | None

API key string.

tdclient.client.job_from_dict(client: Client, dd: dict[str, Any], **values: Any) Job[source]