Client
The tdclient.client.Client class is the public interface of tdclient.
It provides methods for calling the Treasure Data REST API.
tdclient.client
- class tdclient.client.Client(*args: Any, **kwargs: Any)
Bases: object
API client for Treasure Data Service
- add_apikey(name: str) bool
Add a new API key to a user
- Parameters:
name (str) – name of the user
- Returns:
True if success
- add_user(name: str, org: str, email: str, password: str) bool
Add a new user
- Parameters:
name (str) – name of the user
org (str) – organization
email (str) – e-mail address
password (str) – password
- Returns:
True if success
- bulk_import(name: str) BulkImport
Get a bulk import session
- Parameters:
name (str) – name of a bulk import session
- Returns:
a tdclient.models.BulkImport
- bulk_import_delete_part(name: str, part_name: str) bool
Delete a part from a bulk import session
- Parameters:
name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session
- Returns:
True if success
- bulk_import_error_records(name: str) Iterator[dict[str, Any]]
Get the error records of a bulk import session
- Parameters:
name (str) – name of a bulk import session
- Returns:
an iterator of error records
- bulk_import_upload_file(name: str, part_name: str, format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], file: str | bytes | IO[bytes], **kwargs: Any) None
Upload a part to a bulk import session from an existing file on the filesystem.
- Parameters:
name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session
format (str) – format of the data (e.g. “msgpack”, “json”, “csv”, “tsv”, optionally with a “.gz” suffix)
file (str or file-like) – the name of a file, or a file-like object, containing the data
**kwargs – extra arguments.
There is more documentation on format, file and **kwargs at file import parameters.
In particular, for “csv” and “tsv” data, you can change how data columns are parsed using the dtypes and converters arguments.
- dtypes is a dictionary used to specify a datatype for individual columns, for instance {"col1": "int"}. The available datatypes are "bool", "float", "int", "str" and "guess". If a column is also mentioned in converters, then the function will be used, NOT the datatype.
- converters is a dictionary used to specify a function that will be used to parse individual columns, for instance {"col1": int}.
The default behaviour is "guess", which makes a best-effort attempt to decide the column datatype. See file import parameters for more details.
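A minimal sketch of passing dtypes and converters through **kwargs, assuming client is a tdclient.client.Client and the CSV file has a header row. The helper upload_csv_part and the column names user_id, score, active and raw_code are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable


def upload_csv_part(client: Any, session: str, part: str, path: str) -> None:
    """Upload one CSV part, overriding the default "guess" column parsing.

    `client` is assumed to be a tdclient.client.Client; the column names
    below are hypothetical.
    """
    # Explicit datatypes for columns that "guess" might get wrong.
    dtypes: dict[str, str] = {"user_id": "int", "score": "float", "active": "bool"}
    # Converters win over dtypes for the same column; here: strip whitespace.
    converters: dict[str, Callable[[str], Any]] = {"raw_code": str.strip}
    client.bulk_import_upload_file(
        session, part, "csv", path, dtypes=dtypes, converters=converters
    )
```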
- bulk_import_upload_part(name: str, part_name: str, bytes_or_stream: bytes | bytearray | IO[bytes], size: int) None
Upload a part to a bulk import session
- Parameters:
name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session
bytes_or_stream (bytes or file-like) – a byte string or a file-like object containing the part
size (int) – the size of the part, in bytes
- bulk_imports() list[BulkImport]
List bulk import sessions
- Returns:
a list of tdclient.models.BulkImport
- change_database(db_name: str, table_name: str, new_db_name: str) bool
Move a table from its original database to a new destination database.
- Parameters:
db_name (str) – Target database name.
table_name (str) – Target table name.
new_db_name (str) – Name of the destination database.
- Returns:
True if succeeded.
- Return type:
bool
- commit_bulk_import(name: str) bool
Commit a bulk import session
- Parameters:
name (str) – name of a bulk import session
- Returns:
True if success
- create_bulk_import(name: str, database: str, table: str, params: BulkImportParams | None = None) BulkImport
Create a new bulk import session
- Parameters:
name (str) – name of the new bulk import session
database (str) – name of a database
table (str) – name of a table
params (dict, optional) – extra parameters
- Returns:
a tdclient.models.BulkImport
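The bulk import methods on this page are usually chained: create a session, upload parts, freeze, perform, then commit. The sketch below strings them together using only Client methods documented here, and assumes the Job returned by perform_bulk_import exposes a job_id attribute that job_status accepts. The helper run_bulk_import, the polling interval, and deleting the session on failure are illustrative choices, not part of tdclient; you may also want to inspect bulk_import_error_records before committing.

```python
from __future__ import annotations

import time
from typing import Any

# Job states after which job_status no longer changes.
TERMINAL_STATES = {"success", "error", "killed"}


def run_bulk_import(client: Any, name: str, db: str, table: str,
                    parts: dict[str, str], poll_seconds: float = 5.0) -> bool:
    """Create, fill, freeze, perform and commit one bulk import session.

    `parts` maps part names to local files in "msgpack.gz" format.
    """
    client.create_bulk_import(name, db, table)
    try:
        for part_name, path in parts.items():
            client.bulk_import_upload_file(name, part_name, "msgpack.gz", path)
        client.freeze_bulk_import(name)  # stop accepting new parts
        job = client.perform_bulk_import(name)
        # Poll the perform job until it finishes.
        while (status := client.job_status(job.job_id)) not in TERMINAL_STATES:
            time.sleep(poll_seconds)
        if status != "success":
            raise RuntimeError(f"perform job ended with status {status!r}")
        return client.commit_bulk_import(name)
    except Exception:
        # Do not leave a half-finished session behind.
        client.delete_bulk_import(name)
        raise
```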
- create_database(db_name: str, **kwargs: Any) bool
Create a new database
- Parameters:
db_name (str) – name of a database to create
- Returns:
True if success
- create_log_table(db_name: str, table_name: str) bool
Create a new log table
- Parameters:
db_name (str) – name of a database
table_name (str) – name of a table to create
- Returns:
True if success
- create_result(name: str, url: str, params: ResultParams | None = None) bool
Create a new authentication with the specified name.
- Parameters:
name (str) – Authentication name.
url (str) – URL of the authentication to be created, e.g. “ftp://test.com/”
params (dict, optional) – Extra parameters.
- Returns:
True if succeeded.
- Return type:
bool
- create_schedule(name: str, params: ScheduleParams | None = None) datetime | None
Create a new scheduled query with the specified name.
- Parameters:
name (str) – Scheduled query name.
params (dict, optional) –
Extra parameters.
- type (str):
Query type. {“presto”, “hive”}. Default: “hive”
- database (str):
Target database name.
- timezone (str):
Scheduled query’s timezone, e.g. “UTC”. For details, see also: https://gist.github.com/frsyuki/4533752
- cron (str, optional):
Schedule of the query. {"@daily", "@hourly", "10 * * * *" (custom cron)} See also: https://docs.treasuredata.com/articles/#!pd/Scheduling-Jobs-Using-TD-Console
- delay (int, optional):
A delay that ensures all buffered events are imported before the query runs. Default: 0
- query (str):
The query to run. See also: https://docs.treasuredata.com/articles/#!pd/SQL-Examples-of-Scheduled-Queries
- priority (int, optional):
Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0
- retry_limit (int, optional):
Automatic retry count. Default: 0
- engine_version (str, optional):
Engine version to use. If none is specified, the account’s default engine version is used. {“stable”, “experimental”}
- pool_name (str, optional):
For Presto only. Name of the pool to use. If not specified, the default pool is used.
- result (str, optional):
Location where the result of the query is stored, e.g. ‘tableau://user:password@host.com:1234/datasource’
- Returns:
Start date and time of the scheduled query.
- Return type:
datetime.datetime or None
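A sketch of building the params dictionary for create_schedule from the keys documented above. The helper daily_presto_schedule is hypothetical, and the delay value is assumed to be in seconds, which this page does not state.

```python
from __future__ import annotations

from typing import Any


def daily_presto_schedule(database: str, query: str, timezone: str = "UTC",
                          result: str | None = None) -> dict[str, Any]:
    """Build a params dict for Client.create_schedule using documented keys."""
    params: dict[str, Any] = {
        "type": "presto",
        "database": database,
        "timezone": timezone,
        "cron": "@daily",
        "delay": 600,       # assumed to be seconds (10 minutes)
        "query": query,
        "priority": 0,      # -2 (very low) .. 2 (very high)
        "retry_limit": 1,   # retry once on failure
    }
    if result is not None:
        params["result"] = result  # optional result output location
    return params
```

For example, start = client.create_schedule("daily_event_count", daily_presto_schedule("mydb", "SELECT COUNT(1) FROM events")), where the schedule, database and table names are hypothetical.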
- databases() list[Database]
List databases
- Returns:
a list of tdclient.models.Database
- delete_bulk_import(name: str) bool
Delete a bulk import session
- Parameters:
name (str) – name of a bulk import session
- Returns:
True if success
- delete_database(db_name: str) bool
Delete a database
- Parameters:
db_name (str) – name of database to delete
- Returns:
True if success
- delete_result(name: str) bool
Delete the authentication having the specified name.
- Parameters:
name (str) – Authentication name.
- Returns:
True if succeeded.
- Return type:
bool
- delete_schedule(name: str) tuple[str, str]
Delete the scheduled query with the specified name.
- Parameters:
name (str) – Target scheduled query name.
- Returns:
Tuple of cron and query.
- Return type:
(str, str)
- delete_table(db_name: str, table_name: str) str
Delete a table
- Parameters:
db_name (str) – name of a database
table_name (str) – name of a table
- Returns:
a string representing the type of the deleted table
- download_job_result(job_id: str | int, path: str, num_threads: int = 4) bool
Save the job result into a msgpack.gz file.
- Parameters:
job_id (str or int) – job id
path (str) – path to save the result
num_threads (int, optional) – number of threads used to download the result. Default: 4
- Returns:
True if success
- export_data(db_name: str, table_name: str, storage_type: str, params: ExportParams | None = None) Job
Export data from Treasure Data Service
- Parameters:
db_name (str) – name of a database
table_name (str) – name of a table
storage_type (str) – type of the storage
params (dict, optional) –
Optional parameters. The following keys are accepted:
- access_key_id (str):
Access key ID for the export destination.
- secret_access_key (str):
Secret access key paired with access_key_id.
- file_prefix (str, optional):
Filename prefix of the exported files. Default: “<database_name>/<table_name>”
- file_format (str, optional):
File format of the exported data. {“jsonl.gz”, “tsv.gz”, “json.gz”}
- from (int, optional):
Start time of the data to be exported, in Unix epoch format.
- to (int, optional):
End time of the data to be exported, in Unix epoch format.
- assume_role (str, optional):
Assume role.
- bucket (str):
Name of the bucket to export to.
- domain_key (str, optional):
Job domain key.
- pool_name (str, optional):
For Presto only. Name of the pool to use. If not specified, the default pool is used.
- Returns:
a tdclient.models.Job
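A sketch of exporting the 24 hours of data before a given time to Amazon S3 with the keys listed above. It assumes "s3" is an accepted storage_type value, which this page does not confirm; the helper export_last_day_to_s3 and its arguments are hypothetical.

```python
from __future__ import annotations

from typing import Any

ONE_DAY = 24 * 60 * 60  # seconds


def export_last_day_to_s3(client: Any, db: str, table: str, bucket: str,
                          access_key_id: str, secret_access_key: str,
                          now: int) -> Any:
    """Export the 24 hours before `now` (Unix epoch seconds); returns the Job."""
    params: dict[str, Any] = {
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "bucket": bucket,
        "file_prefix": f"{db}/{table}",
        "file_format": "jsonl.gz",
        "from": now - ONE_DAY,
        "to": now,
    }
    # "s3" as the storage_type value is an assumption.
    return client.export_data(db, table, "s3", params)
```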
- freeze_bulk_import(name: str) bool
Freeze a bulk import session
- Parameters:
name (str) – name of a bulk import session
- Returns:
True if success
- history(name: str, _from: int | None = None, to: int | None = None) list[ScheduledJob]
Get the run history of the scheduled query for the past 90 days.
- Parameters:
name (str) – Target name of the scheduled query.
_from (int, optional) – Index of the first record in the run history to fetch. Default: 0. Note: Indexing starts from zero, so the first record in the list has index zero.
to (int, optional) – Index up to which records in the run history are fetched. Default: 20
- Returns:
a list of tdclient.models.ScheduledJob
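Because history returns one slice of records per call, reading a schedule's full run history means paging with _from and to. This sketch advances by the number of records actually returned and stops on an empty page, so it works whether to is treated as inclusive or exclusive (this page does not say). The generator iter_history is hypothetical.

```python
from __future__ import annotations

from typing import Any, Iterator


def iter_history(client: Any, name: str, page_size: int = 20) -> Iterator[Any]:
    """Yield every ScheduledJob in a schedule's run history, one page at a time."""
    start = 0
    while True:
        page = client.history(name, _from=start, to=start + page_size)
        if not page:
            return  # an empty page means there is nothing left
        yield from page
        # Advance by what was actually returned, so an inclusive or an
        # exclusive `to` both work without skipping or repeating records.
        start += len(page)
```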
- import_data(db_name: str, table_name: str, format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], bytes_or_stream: bytes | bytearray | IO[bytes], size: int, unique_id: str | None = None) float
Import data into Treasure Data Service
- Parameters:
db_name (str) – name of a database
table_name (str) – name of a table
format (str) – format of the data (e.g. “msgpack.gz”)
bytes_or_stream (bytes or file-like) – a byte string or a file-like object containing the data
size (int) – the length of the data, in bytes
unique_id (str, optional) – a unique identifier of the data
- Returns:
elapsed time of the import, in seconds (float)
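import_data sends bytes that are already in the target format, and size must match their length. This sketch wraps an already-encoded payload in io.BytesIO; it does not encode records itself. The helper import_payload is hypothetical, and reusing the same unique_id when retrying a payload assumes the service uses it to deduplicate imports, which this page does not confirm.

```python
from __future__ import annotations

import io
import uuid
from typing import Any


def import_payload(client: Any, db: str, table: str, payload: bytes,
                   fmt: str = "msgpack.gz", unique_id: str | None = None) -> float:
    """Send bytes already encoded in `fmt` with Client.import_data.

    Returns the elapsed time reported by import_data, in seconds.
    """
    if unique_id is None:
        # Pass the same id explicitly when retrying the same payload.
        unique_id = uuid.uuid4().hex
    stream = io.BytesIO(payload)
    # `size` must be the exact byte length of the data being sent.
    return client.import_data(db, table, fmt, stream, len(payload),
                              unique_id=unique_id)
```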
- import_file(db_name: str, table_name: str, format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], file: str | bytes | IO[bytes], unique_id: str | None = None) float
Import data into Treasure Data Service from an existing file on the filesystem.
This method decompresses and deserializes the records in the given file, and then converts them into the format accepted by Treasure Data Service (“msgpack.gz”).
- Parameters:
db_name (str) – name of a database
table_name (str) – name of a table
format (str) – format of the data (e.g. “msgpack”, “json”)
file (str or file-like) – the name of a file, or a file-like object, containing the data
unique_id (str, optional) – a unique identifier of the data
- Returns:
elapsed time of the import, in seconds (float)
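A sketch of loading in-memory records through import_file by writing them to a temporary CSV file first, so that import_file handles the conversion to “msgpack.gz”. The helper import_rows_as_csv is hypothetical, and the rows are assumed to include a time column in Unix epoch seconds, as Treasure Data tables usually expect.

```python
from __future__ import annotations

import csv
import os
import tempfile
from typing import Any


def import_rows_as_csv(client: Any, db: str, table: str,
                       rows: list[dict[str, Any]]) -> float:
    """Write rows to a temporary CSV file, then load it with import_file."""
    if not rows:
        raise ValueError("rows must not be empty")
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
            writer.writeheader()  # the header row names the columns
            writer.writerows(rows)
        # import_file converts the CSV into "msgpack.gz" before uploading.
        return client.import_file(db, table, "csv", path)
    finally:
        os.remove(path)
```

For example, import_rows_as_csv(client, "mydb", "events", [{"time": 1700000000, "name": "signup"}]), with hypothetical database, table and column names.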
- job(job_id: str | int) Job
Get a job by its job_id
- Parameters:
job_id (str or int) – job id
- Returns:
a tdclient.models.Job
- job_result(job_id: str | int) list[Any]
Get the result of a job
- Parameters:
job_id (str or int) – job id
- Returns:
a list of the rows in the result set
- job_result_each(job_id: str | int) Iterator[dict[str, Any]]
Iterate over the result of a job
- Parameters:
job_id (str or int) – job id
- Returns:
an iterator over the rows in the result set
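- job_result_format(job_id: str | int, format: Literal['msgpack', 'json', 'csv', 'tsv'], header: bool = False) list[Any]
Get the result of a job in the specified format
- Parameters:
job_id (str or int) – job id
format (str) – output format of the result set
header (bool, optional) – include header in the result set. Default: False
- Returns:
a list of the rows in the result set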
- job_result_format_each(job_id: str | int, format: Literal['msgpack', 'json', 'csv', 'tsv'], header: bool = False, store_tmpfile: bool = False, num_threads: int = 4) Iterator[dict[str, Any]]
Iterate over the result of a job in the specified format
- Parameters:
job_id (str or int) – job id
format (str) – output format of the result set
header (bool, optional) – include header in the result set. Default: False
store_tmpfile (bool, optional) – store the result in a temporary file. Works only when format is “msgpack”. Default: False
num_threads (int, optional) – number of threads used to download the result. Works only when store_tmpfile is True. Default: 4
- Returns:
an iterator over the rows in the result set
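For large results, combining the “msgpack” format with store_tmpfile=True downloads the result to a temporary file using num_threads threads and then yields rows one at a time. The sketch below processes each row with a callback instead of building a list; the helper process_large_result and the handle_row callback are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable


def process_large_result(client: Any, job_id: str | int,
                         handle_row: Callable[[Any], None],
                         num_threads: int = 4) -> int:
    """Stream a job result row by row; returns the number of rows handled."""
    rows = client.job_result_format_each(
        job_id, "msgpack", store_tmpfile=True, num_threads=num_threads
    )
    count = 0
    for row in rows:  # rows arrive one at a time, never as one big list
        handle_row(row)
        count += 1
    return count
```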
- job_status(job_id: str | int) str
Get the status of a job
- Parameters:
job_id (str or int) – job id
- Returns:
a string representing the status of the job (“success”, “error”, “killed”, “queued”, “running”)
- jobs(_from: int | None = None, to: int | None = None, status: str | None = None, conditions: dict[str, Any] | None = None) list[Job]
List jobs
- Parameters:
_from (int, optional) – Gets jobs starting from the nth index in the list. Default: 0
to (int, optional) – Gets jobs up to the nth index in the list. By default, the first 20 jobs in the list are returned.
status (str, optional) – Filter by the given status. {“queued”, “running”, “success”, “error”}
conditions (dict[str, Any], optional) – Condition for TIMESTAMPDIFF() to search for slow queries. Avoid using this parameter, as it can be dangerous.
- Returns:
a list of tdclient.models.Job
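A sketch of using the status filter together with _from and to. It assumes each returned Job exposes a job_id attribute; the helper recent_failed_job_ids is hypothetical.

```python
from __future__ import annotations

from typing import Any


def recent_failed_job_ids(client: Any, limit: int = 20) -> list[str]:
    """Return the ids of failed jobs among the first `limit` list entries."""
    jobs = client.jobs(_from=0, to=limit, status="error")
    return [str(job.job_id) for job in jobs]
```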
- kill(job_id: str | int) str | None
Kill a job
- Parameters:
job_id (str or int) – job id
- Returns:
a string representing the status of the killed job (“queued”, “running”)
- list_apikeys(name: str) list[str]
List the API keys of a user
- Parameters:
name (str) – name of the user
- Returns:
a list of API key strings
- list_bulk_import_parts(name: str) list[str]
List parts of a bulk import session
- Parameters:
name (str) – name of a bulk import session
- Returns:
a list of strings representing the names of the parts
- perform_bulk_import(name: str) Job
Perform a bulk import session
- Parameters:
name (str) – name of a bulk import session
- Returns:
a tdclient.models.Job
- query(db_name: str, q: str, result_url: str | None = None, priority: Literal[-2, -1, 0, 1, 2, 'VERY LOW', 'LOW', 'NORMAL', 'HIGH', 'VERY HIGH'] | None = None, retry_limit: int | None = None, type: str = 'hive', **kwargs: Any) Job
Run a query on the specified database.
- Parameters:
db_name (str) – name of a database
q (str) – a query string
result_url (str) – result output URL, e.g. postgresql://<username>:<password>@<hostname>:<port>/<database>/<table>
priority (int or str) – priority (e.g. “NORMAL”, “HIGH”, etc.)
retry_limit (int) – retry limit
type (str) – name of a query engine (e.g. “hive”, “presto”). Default: “hive”
- Returns:
a tdclient.models.Job
- Raises:
ValueError – if an unknown query type has been specified
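query returns as soon as the job is submitted. This sketch waits for completion by polling job_status until the job reaches one of the terminal states listed for that method, then reads rows with job_result_each. It assumes the returned Job exposes job_id; the helper run_query, the default engine "presto" and the timeout handling are illustrative choices.

```python
from __future__ import annotations

import time
from typing import Any

# Terminal values documented for job_status.
TERMINAL_STATES = {"success", "error", "killed"}


def run_query(client: Any, db: str, sql: str, engine: str = "presto",
              poll_seconds: float = 5.0, timeout: float | None = None) -> list[Any]:
    """Submit a query, wait for it to finish, and return all result rows."""
    job = client.query(db, sql, type=engine)
    deadline = None if timeout is None else time.monotonic() + timeout
    while (status := client.job_status(job.job_id)) not in TERMINAL_STATES:
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"job {job.job_id} is still {status!r}")
        time.sleep(poll_seconds)
    if status != "success":
        raise RuntimeError(f"job {job.job_id} ended with status {status!r}")
    return list(client.job_result_each(job.job_id))
```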
- remove_apikey(name: str, apikey: str) bool
Remove an API key from a user
- Parameters:
name (str) – name of the user
apikey (str) – an API key to remove
- Returns:
True if success
- remove_user(name: str) bool
Remove a user
- Parameters:
name (str) – name of the user
- Returns:
True if success
- results() list[Result]
Get the list of all the available authentications.
- Returns:
a list of tdclient.models.Result
- run_schedule(name: str, time: int, num: int | None = None) list[ScheduledJob]
Run the specified scheduled query.
- Parameters:
name (str) – Target scheduled query name.
time (int) – Time in Unix epoch format that is set as TD_SCHEDULED_TIME
num (int, optional) – Number of times the query is executed. The value should be 9 or less.
- Returns:
a list of tdclient.models.ScheduledJob
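A sketch of triggering a scheduled query for a specific TD_SCHEDULED_TIME. It requires a timezone-aware datetime so the conversion to Unix epoch seconds is unambiguous, and checks the documented limit of 9 runs before calling the API. The helper rerun_schedule is hypothetical.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any

MAX_RUNS = 9  # documented upper limit for `num`


def rerun_schedule(client: Any, name: str, when: datetime, num: int = 1) -> list[Any]:
    """Run scheduled query `name` with TD_SCHEDULED_TIME set to `when`."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    if not 1 <= num <= MAX_RUNS:
        raise ValueError(f"num must be between 1 and {MAX_RUNS}")
    epoch = int(when.timestamp())  # Unix epoch seconds
    return client.run_schedule(name, epoch, num)
```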
- swap_table(db_name: str, table_name1: str, table_name2: str) bool
Swap the contents of two tables in the same database
- Parameters:
db_name (str) – name of a database
table_name1 (str) – name of the first table
table_name2 (str) – name of the second table
- Returns:
True if success
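A common use of swap_table is replacing a table's data without writing into the live table while loading: fill a staging table, swap it with the live table, then drop the staging table, which now holds the old data. This sketch assumes swap_table exchanges the contents of the two tables; the staging-table naming, the load callback and the helper replace_table_contents are hypothetical.

```python
from __future__ import annotations

from typing import Any, Callable


def replace_table_contents(client: Any, db: str, table: str,
                           load: Callable[[str], None]) -> None:
    """Load new data into a staging table, then swap it with `table`."""
    staging = f"{table}_staging"  # hypothetical naming convention
    client.create_log_table(db, staging)
    try:
        load(staging)                          # fill the staging table
        client.swap_table(db, table, staging)  # new data now lives in `table`
    finally:
        # After a successful swap, the staging name holds the old data;
        # after a failed load, it holds partial new data. Drop it either way.
        client.delete_table(db, staging)
```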
- table(db_name: str, table_name: str) Table
Get a table
- Parameters:
db_name (str) – name of a database
table_name (str) – name of a table
- Returns:
a tdclient.models.Table
- Raises:
tdclient.api.NotFoundError – if the table doesn’t exist
- tables(db_name: str) list[Table]
List existing tables
- Parameters:
db_name (str) – name of a database
- Returns:
a list of tdclient.models.Table
- tail(db_name: str, table_name: str, count: int, to: None = None, _from: None = None, block: None = None) list[dict[str, Any]]
Get the contents of the table in reverse order of registration time (most recent data first).
- Parameters:
db_name (str) – Target database name.
table_name (str) – Target table name.
count (int) – Number of records to return, counting from the end.
to – Deprecated parameter.
_from – Deprecated parameter.
block – Deprecated parameter.
- Returns:
Contents of the table.
- Return type:
[dict]
- unfreeze_bulk_import(name: str) bool
Unfreeze a bulk import session
- Parameters:
name (str) – name of a bulk import session
- Returns:
True if success
- update_expire(db_name: str, table_name: str, expire_days: int) bool
Set the expiration period of a table
- Parameters:
db_name (str) – name of a database
table_name (str) – name of a table
expire_days (int) – expiration period of the table’s data, in days
- Returns:
True if success
- update_schedule(name: str, params: ScheduleParams | None = None) None
Update the scheduled query.
- Parameters:
name (str) – Target scheduled query name.
params (dict, optional) –
Extra parameters.
- type (str):
Query type. {“presto”, “hive”}. Default: “hive”
- database (str):
Target database name.
- timezone (str):
Scheduled query’s timezone, e.g. “UTC”. For details, see also: https://gist.github.com/frsyuki/4533752
- cron (str, optional):
Schedule of the query. {"@daily", "@hourly", "10 * * * *" (custom cron)} See also: https://docs.treasuredata.com/articles/#!pd/Scheduling-Jobs-Using-TD-Console
- delay (int, optional):
A delay that ensures all buffered events are imported before the query runs. Default: 0
- query (str):
The query to run. See also: https://docs.treasuredata.com/articles/#!pd/SQL-Examples-of-Scheduled-Queries
- priority (int, optional):
Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0
- retry_limit (int, optional):
Automatic retry count. Default: 0
- engine_version (str, optional):
Engine version to use. If none is specified, the account’s default engine version is used. {“stable”, “experimental”}
- pool_name (str, optional):
For Presto only. Name of the pool to use. If not specified, the default pool is used.
- result (str, optional):
Location where the result of the query is stored, e.g. ‘tableau://user:password@host.com:1234/datasource’
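A sketch of sending only the fields you want to change, rejecting keys that are not in the list above. It assumes update_schedule leaves omitted fields unchanged, which this page does not state; the helper update_schedule_fields is hypothetical.

```python
from __future__ import annotations

from typing import Any

# Keys documented for the params argument of update_schedule.
SCHEDULE_KEYS = {
    "type", "database", "timezone", "cron", "delay", "query",
    "priority", "retry_limit", "engine_version", "pool_name", "result",
}


def update_schedule_fields(client: Any, name: str, **changes: Any) -> dict[str, Any]:
    """Send only the given fields to update_schedule; returns the params sent."""
    unknown = set(changes) - SCHEDULE_KEYS
    if unknown:
        raise ValueError(f"unknown schedule keys: {sorted(unknown)}")
    params = {key: value for key, value in changes.items() if value is not None}
    client.update_schedule(name, params)
    return params
```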
- update_schema(db_name: str, table_name: str, schema: list[list[str]]) bool
Update the schema of a table
- Parameters:
db_name (str) – name of a database
table_name (str) – name of a table
schema (list) –
a list of column definitions representing the schema (converted to JSON), e.g.
[
    ["member_id",  # column name
     "string",     # data type
     "mem_id",     # alias of the column name
    ],
    ["row_index", "long", "row_ind"],
    ...
]
- Returns:
True if success
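A sketch of building the schema argument from a mapping of column names to types. Columns without an alias are sent as two-element entries, which assumes the API treats the alias as optional; the helper schema_from_columns is hypothetical.

```python
from __future__ import annotations


def schema_from_columns(columns: dict[str, str],
                        aliases: dict[str, str] | None = None) -> list[list[str]]:
    """Turn {name: type} (plus optional aliases) into update_schema's list format."""
    aliases = aliases or {}
    schema: list[list[str]] = []
    for name, data_type in columns.items():
        entry = [name, data_type]
        if name in aliases:
            entry.append(aliases[name])  # optional third element: alias
        schema.append(entry)
    return schema
```

For example, client.update_schema("mydb", "members", schema_from_columns({"member_id": "string", "row_index": "long"}, {"member_id": "mem_id", "row_index": "row_ind"})) sends the same schema as the example above; the database and table names are hypothetical.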
- users()
List users
- Returns:
a list of tdclient.models.User
- property api: API
an instance of tdclient.api.API
- property apikey: str | None
API key string.