Model

Some methods of tdclient.client.Client return model objects that represent results from the REST API.

tdclient.model

class tdclient.model.Model(client: Client)[source]

Bases: object

property client: Client

Returns:

a tdclient.client.Client instance

tdclient.models

tdclient.models.BulkImport = <class 'tdclient.bulk_import_model.BulkImport'>[source]

Bulk-import session on Treasure Data Service

tdclient.models.Database = <class 'tdclient.database_model.Database'>[source]

Database on Treasure Data Service

tdclient.models.Schema = <class 'tdclient.job_model.Schema'>[source]

Schema of a database table on Treasure Data Service

tdclient.models.Job = <class 'tdclient.job_model.Job'>[source]

Job on Treasure Data Service

tdclient.models.Result = <class 'tdclient.result_model.Result'>[source]

Result on Treasure Data Service

tdclient.models.ScheduledJob = <class 'tdclient.schedule_model.ScheduledJob'>[source]

Scheduled job on Treasure Data Service

tdclient.models.Schedule = <class 'tdclient.schedule_model.Schedule'>[source]

Schedule on Treasure Data Service

tdclient.models.Table = <class 'tdclient.table_model.Table'>[source]

Database table on Treasure Data Service

tdclient.models.User = <class 'tdclient.user_model.User'>[source]

User on Treasure Data Service

tdclient.bulk_import_model

class tdclient.bulk_import_model.BulkImport(client: Client, **kwargs: Any)[source]

Bases: Model

Bulk-import session on Treasure Data Service

commit(wait: bool = False, wait_interval: int = 5, timeout: float | None = None) bool[source]

Commit bulk import

delete() bool[source]

Delete bulk import

delete_part(part_name: str) bool[source]

Delete a part of a Bulk Import session

Parameters:

part_name (str) – name of a part of the bulk import session

Returns:

True if succeeded.

error_record_items() Iterator[dict[str, Any]][source]

Fetch error record rows.

Yields:

Error record
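Because error_record_items() is an iterator, a caller can stop early instead of fetching every error row. The following is a minimal sketch under that assumption; the helper name sample_error_records and its limit parameter are hypothetical and not part of tdclient.

```python
from itertools import islice
from typing import Any


def sample_error_records(session: Any, limit: int = 10) -> list[dict[str, Any]]:
    """Return at most `limit` error records from a bulk import session.

    `session` is expected to behave like tdclient's BulkImport; only
    error_record_items() is used. This helper is illustrative only.
    """
    # islice stops pulling rows from the iterator once `limit` rows were read.
    return list(islice(session.error_record_items(), limit))
```

Passing a small limit is usually enough to see why records were rejected without reading the full error set.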

freeze() bool[source]

Freeze bulk import

list_parts() list[str][source]

Return the list of available parts uploaded through bulk_import_upload_part().

Returns:

The list of bulk import part names.

Return type:

[str]

perform(wait: bool = False, wait_interval: int = 5, wait_callback: Callable[[Job], None] | None = None, timeout: float | None = None) Job[source]

Perform bulk import

Parameters:
  • wait (bool, optional) – Whether to wait for the bulk import job to finish. Default False.

  • wait_interval (int, optional) – Wait interval in seconds. Default 5.

  • wait_callback (callable, optional) – A callable to be called on every tick of the wait interval.

  • timeout (float, optional) – Timeout in seconds. No timeout by default.
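A typical sequence once all parts are uploaded is freeze, perform, then commit. The sketch below shows one way to chain these calls with wait=True and to check the record counters in between. The helper name perform_and_commit and the decision to raise when nothing was imported are assumptions for illustration, not library behaviour.

```python
import warnings
from typing import Any


def perform_and_commit(session: Any, wait_interval: int = 5) -> int:
    """Freeze, perform and commit a bulk import session.

    `session` is expected to behave like tdclient's BulkImport.
    Returns the number of valid records. Hypothetical helper.
    """
    session.freeze()
    # wait=True blocks until the bulk import job has finished.
    session.perform(wait=True, wait_interval=wait_interval)

    # Both counters are typed `int | None`, so treat None as zero.
    errors = session.error_records or 0
    valid = session.valid_records or 0
    if errors > 0:
        warnings.warn(f"detected {errors} error records in {session.name}")
    if valid == 0:
        # Assumption: committing an empty import is not useful, so stop here.
        raise RuntimeError(f"no records have been imported: {session.name}")

    session.commit(wait=True, wait_interval=wait_interval)
    return valid
```

Whether to delete the session afterwards with delete() is left to the caller.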

unfreeze() bool[source]

Unfreeze bulk import

update() None[source]
upload_file(part_name: str, fmt: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], file_like: str | bytes | IO[bytes], **kwargs: Any) None[source]

Upload a part to a bulk import session from an existing file on the filesystem.

Parameters:
  • part_name (str) – name of a part of the bulk import session

  • fmt (str) – format of data type (e.g. “msgpack”, “json”, “csv”, “tsv”)

  • file_like (str or file-like) – the name of a file, or a file-like object, containing the data

  • **kwargs – extra arguments.

There is more documentation on fmt, file_like and **kwargs at file import parameters.

In particular, for “csv” and “tsv” data, you can change how data columns are parsed using the dtypes and converters arguments.

  • dtypes is a dictionary used to specify a datatype for individual columns, for instance {"col1": "int"}. The available datatypes are "bool", "float", "int", "str" and "guess". If a column is also mentioned in converters, then the function will be used, NOT the datatype.

  • converters is a dictionary used to specify a function that will be used to parse individual columns, for instance {"col1": int}.

The default behaviour is "guess", which makes a best-effort attempt to decide the column datatype. See file import parameters for more details.
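As an illustration of dtypes and converters, the sketch below uploads a CSV part while forcing two columns to fixed datatypes and parsing a third with a custom function. The column names (user_id, score, active), the parse_flag converter, and the assumption that converters receive the raw cell as a string are all hypothetical.

```python
from typing import Any


def parse_flag(value: str) -> bool:
    """Hypothetical converter: treat 'yes', 'y', '1' and 'true' as True."""
    return value.strip().lower() in {"yes", "y", "1", "true"}


def upload_csv_part(session: Any, part_name: str, path: str) -> None:
    """Upload one CSV file as a part, overriding how some columns are parsed.

    `session` is expected to behave like tdclient's BulkImport.
    """
    session.upload_file(
        part_name,
        "csv",
        path,
        # Datatype by name; columns not listed keep the default "guess".
        dtypes={"user_id": "str", "score": "float"},
        # One callable per column; a converter takes priority over dtypes.
        converters={"active": parse_flag},
    )
```

Forcing "str" for identifier-like columns is a common reason to override the guess, since values such as 00123 would otherwise be read as numbers.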

upload_part(part_name: str, bytes_or_stream: bytes | bytearray | IO[bytes], size: int) None[source]

Upload a part to bulk import session

Parameters:
  • part_name (str) – name of a part of the bulk import session

  • bytes_or_stream (bytes or file-like) – a byte string or a file-like object containing the part

  • size (int) – the size of the part

STATUS_COMMITTED = 'committed'
STATUS_COMMITTING = 'committing'
STATUS_PERFORMING = 'performing'
STATUS_READY = 'ready'
STATUS_UPLOADING = 'uploading'
property database: str | None

The name of the database that the bulk import session is working on

property error_parts: int | None

The number of error parts.

property error_records: int | None

The number of error records.

property job_id: str | None

Job ID

property name: str

The name of the bulk import session

property status: str | None

The status of the bulk import session, as a string

property table: str | None

The name of the table that the bulk import session is working on

property upload_frozen: bool | None

Whether uploading to the bulk import session is frozen.

property valid_parts: int | None

The number of valid parts.

property valid_records: int | None

The number of valid records.

tdclient.database_model

class tdclient.database_model.Database(client: Client, db_name: str, **kwargs: Any)[source]

Bases: Model

Database on Treasure Data Service

create_log_table(name: str) bool[source]
Parameters:

name (str) – name of new log table

Returns:

True if succeeded

delete() bool[source]

Delete the database

Returns:

True if success

query(q: str, **kwargs: Any) Job[source]

Run a query on the database

Parameters:

q (str) – a query string

Returns:

tdclient.models.Job
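A query issued through a Database object returns a Job, which has to finish before its rows can be read. The following sketch chains query(), wait(), success() and result(). The helper name run_query is hypothetical, and raising RuntimeError on a non-successful job is a choice made for this example only.

```python
from typing import Any, Optional


def run_query(db: Any, sql: str, timeout: Optional[float] = None) -> list[Any]:
    """Run `sql` on a database and return all result rows as a list.

    `db` is expected to behave like tdclient's Database. Hypothetical helper.
    """
    job = db.query(sql)
    # Block until the job reaches a finished status (or the timeout applies).
    job.wait(timeout=timeout)
    if not job.success():
        raise RuntimeError(f"job {job.job_id} ended with status {job.status()}")
    # result() is an iterator; materialise it for convenience.
    return list(job.result())
```

For large result sets, iterating over job.result() directly avoids holding every row in memory.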

table(table_name: str) Table[source]
Parameters:

table_name (str) – name of a table

Returns:

tdclient.models.Table

tables() list[Table][source]
Returns:

a list of tdclient.models.Table
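Since tables() returns Table objects, their properties can be read directly. A small sketch, assuming only the name and count properties documented for Table; the helper name table_counts is hypothetical.

```python
from typing import Any, Optional


def table_counts(db: Any) -> dict[str, Optional[int]]:
    """Map each table name in a database to its count property.

    `db` is expected to behave like tdclient's Database. Hypothetical helper.
    """
    # Table.count is typed `int | None`, so None values are passed through.
    return {table.name: table.count for table in db.tables()}
```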

PERMISSIONS = ['administrator', 'full_access', 'import_only', 'query_only']
PERMISSION_LIST_TABLES = ['administrator', 'full_access']
property count: int | None

Total record count of the database.

Type:

int

property created_at: datetime | None

datetime.datetime

property name: str

a name of the database

Type:

str

property org_name: str | None

organization name

Type:

str

property permission: str | None

permission for the database (e.g. “administrator”, “full_access”, etc.)

Type:

str

property updated_at: datetime | None

datetime.datetime

tdclient.job_model

class tdclient.job_model.Job(client: Client, job_id: str, type: str, query: str | None, **kwargs: Any)[source]

Bases: Model

Job on Treasure Data Service

error() bool[source]
Returns:

True if the job finished with an error

finished() bool[source]
Returns:

True if the job has finished, whether in success, in error, or by being killed

kill() str | None[source]

Kill the job

Returns:

a string representing the status of the killed job (“queued”, “running”)

killed() bool[source]
Returns:

True if the job was killed

queued() bool[source]
Returns:

True if the job is queued

result() Iterator[dict[str, Any]][source]
Yields:

an iterator of rows in result set

result_format(fmt: ResultFormat, store_tmpfile: bool = False, num_threads: int = 4) Iterator[dict[str, Any]][source]
Parameters:
  • fmt (str) – output format of result set

  • store_tmpfile (bool, optional) – store result to a temporary file. Works only when fmt is “msgpack”. Default is False.

  • num_threads (int, optional) – number of threads to download result. Works only when store_tmpfile is True. Default is 4.

Yields:

an iterator of rows in result set

running() bool[source]
Returns:

True if the job is running

status() str | None[source]
Returns:

a string representing the status of the job (“success”, “error”, “killed”, “queued”, “running”)

Return type:

str

success() bool[source]
Returns:

True if the job finished successfully

update() None[source]

Update all fields of the job

wait(timeout: float | None = None, wait_interval: int = 5, wait_callback: Callable[[Job], None] | None = None) None[source]

Sleep until the job has finished

Parameters:
  • timeout (float, optional) – Timeout in seconds. No timeout by default.

  • wait_interval (int, optional) – Wait interval in seconds. Default 5.

  • wait_callback (callable, optional) – A callable to be called on every tick of the wait interval.
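The wait_callback argument can be used to report progress while a job is polled. The sketch below passes a callback that receives the job, as the signature Callable[[Job], None] suggests, and returns the final status. The helper name wait_with_progress and its log parameter are hypothetical.

```python
import time
from typing import Any, Callable, Optional


def wait_with_progress(
    job: Any,
    timeout: Optional[float] = None,
    log: Callable[[str], Any] = print,
) -> Optional[str]:
    """Wait for a job, logging a line on every polling tick.

    `job` is expected to behave like tdclient's Job. Hypothetical helper.
    """
    started = time.monotonic()

    def on_tick(current_job: Any) -> None:
        # Invoked by wait() with the job itself on each tick.
        elapsed = time.monotonic() - started
        log(f"job {current_job.job_id}: still waiting after {elapsed:.0f}s")

    job.wait(timeout=timeout, wait_interval=5, wait_callback=on_tick)
    return job.status()
```

The returned value can be compared against entries of FINISHED_STATUS to decide what to do next.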

FINISHED_STATUS = ['success', 'error', 'killed']
JOB_PRIORITY = {-2: 'VERY LOW', -1: 'LOW', 0: 'NORMAL', 1: 'HIGH', 2: 'VERY HIGH'}
STATUS_BOOTING = 'booting'
STATUS_ERROR = 'error'
STATUS_KILLED = 'killed'
STATUS_QUEUED = 'queued'
STATUS_RUNNING = 'running'
STATUS_SUCCESS = 'success'
property database: str | None

a string representing the name of the database that the job is running on

property debug: dict[str, Any] | None

a dict of debug output (e.g. “cmdout”, “stderr”)

property id: str

a string representing the identifier of the job

property job_id: str

a string representing the identifier of the job

property linked_result_export_job_id: str | None

Linked result export job ID from query job

property num_records: int | None

the number of records of job result

property org_name: str | None

organization name

property priority: str

a string representing the priority of the job (e.g. “NORMAL”, “HIGH”, etc.)

property query: str | None

the query string of the job

property result_export_target_job_id: str | None

Associated query job ID from result export job ID

property result_schema: list[list[str]] | None

an array of arrays representing the types of the result columns (Hive specific) (e.g. [[“_c1”, “string”], [“_c2”, “bigint”]])

property result_size: int | None

the length of job result

property result_url: str | None

the URL of the result on Treasure Data Service, as a string

property retry_limit: int | None

the automatic retry count

property type: str

a string representing the engine type of the job (e.g. “hive”, “presto”, etc.)

property url: str | None

the URL of the job on Treasure Data Service, as a string

property user_name: str | None

executing user name

class tdclient.job_model.Schema(fields: list[Field] | None = None)[source]

Bases: object

Schema of a database table on Treasure Data Service

class Field(name: str, type: str)[source]

Bases: object

property name: str

TODO: add docstring

property type: str

TODO: add docstring

add_field(name: str, type: str) None[source]

TODO: add docstring

property fields: list[Field]

TODO: add docstring
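The Schema docstrings are still to be written, so the following sketch relies only on the signatures shown above: add_field(name, type) appends a field, and fields exposes Field objects with name and type properties. The helper names and the column names _c0 and _c1 are hypothetical.

```python
from typing import Any


def schema_pairs(schema: Any) -> list[tuple[str, str]]:
    """Flatten a Schema into (name, type) pairs, in field order."""
    return [(field.name, field.type) for field in schema.fields]


def build_example_schema() -> Any:
    """Build a two-column Schema. Requires tdclient to be installed."""
    # Imported lazily so this sketch can be loaded without tdclient.
    from tdclient.job_model import Schema

    schema = Schema()
    schema.add_field("_c0", "string")
    schema.add_field("_c1", "bigint")
    return schema
```

With tdclient installed, schema_pairs(build_example_schema()) should list the two fields in the order they were added, assuming add_field appends.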

tdclient.result_model

class tdclient.result_model.Result(client: Client, name: str, url: str, org_name: str | None)[source]

Bases: Model

Result on Treasure Data Service

property name: str

a name for an authentication

Type:

str

property org_name: str | None

organization name

Type:

str

property url: str

a result output URL

Type:

str

tdclient.schedule_model

class tdclient.schedule_model.Schedule(client: Client, *args: Any, **kwargs: Any)[source]

Bases: Model

Schedule on Treasure Data Service

run(time: int, num: int | None = None) list[ScheduledJob][source]

Run a scheduled job

Parameters:
  • time (int) – Time in Unix epoch format that would be set as TD_SCHEDULED_TIME

  • num (int, optional) – Indicates how many times the query will be executed. Value should be 9 or less.

Returns:

[tdclient.models.ScheduledJob]
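Since run() takes the time as a Unix epoch integer, a datetime has to be converted first. The sketch below does that conversion and checks num against the documented upper bound of 9. The helper name run_schedule_at, the lower bound of 1, and the requirement for a timezone-aware datetime are assumptions made for this example.

```python
from datetime import datetime, timezone
from typing import Any


def run_schedule_at(schedule: Any, when: datetime, num: int = 1) -> list[Any]:
    """Run a schedule as if it fired at `when`, `num` times.

    `schedule` is expected to behave like tdclient's Schedule.
    Hypothetical helper.
    """
    if not 1 <= num <= 9:
        # The upper bound comes from the documentation; the lower bound is assumed.
        raise ValueError("num should be between 1 and 9")
    if when.tzinfo is None:
        # A naive datetime would be interpreted in local time by timestamp().
        raise ValueError("when must be timezone-aware")
    # This value is what the documentation says becomes TD_SCHEDULED_TIME.
    return schedule.run(int(when.timestamp()), num)


# Example argument: midnight UTC on 1 January 2024.
EXAMPLE_TIME = datetime(2024, 1, 1, tzinfo=timezone.utc)
```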

property created_at: datetime | None

Create date

Type:

datetime.datetime

property cron: str | None

The configured schedule of a scheduled job.

Returns a string representing the schedule in cron form, or None if the job is not scheduled to run (saved query).

property database: str | None

The target database of a scheduled job

property delay: int | None

A delay ensures all buffered events are imported before running the query.

property name: str | None

The name of a scheduled job

property next_time: datetime | None

Schedule for next run

Type:

datetime.datetime

property org_name: str | None

TODO: add docstring

property priority: str

The priority of a scheduled job

property query: str | None

The query string of a scheduled job

property result_url: str | None

The result output configuration in URL form of a scheduled job

property retry_limit: int | None

Automatic retry count.

property timezone: str | None

The time zone of a scheduled job

property type: str | None

Query type. {“presto”, “hive”}.

property user_name: str | None

User name of a scheduled job

class tdclient.schedule_model.ScheduledJob(client: Client, scheduled_at: datetime, job_id: str, type: str, query: str | None, **kwargs: Any)[source]

Bases: Job

Scheduled job on Treasure Data Service

property scheduled_at: datetime

a datetime.datetime representing the scheduled time of the next invocation of the job

tdclient.table_model

class tdclient.table_model.Table(*args: Any, **kwargs: Any)[source]

Bases: Model

Database table on Treasure Data Service

delete() str[source]

Delete the table

Returns:

a string representing the type of the deleted table

export_data(storage_type: str, **kwargs: Any) Job[source]

Export data from Treasure Data Service

Parameters:
  • storage_type (str) – type of the storage

  • **kwargs (dict) –

    Optional parameters. The following keys are recognized:

    • access_key_id (str):

      ID to access the information to be exported.

    • secret_access_key (str):

      Password for the access_key_id.

    • file_prefix (str, optional):

      Filename of exported file. Default: “<database_name>/<table_name>”

    • file_format (str, optional):

      File format of the information to be exported. {“jsonl.gz”, “tsv.gz”, “json.gz”}

    • from (int, optional):

      Start time of the data to be exported, in Unix epoch format.

    • to (int, optional):

      End time of the data to be exported, in Unix epoch format.

    • assume_role (str, optional):

      Assume role.

    • bucket (str):

      Name of bucket to be used.

    • domain_key (str, optional):

      Job domain key.

    • pool_name (str, optional):

      For Presto only. Pool name to be used; if not specified, the default pool is used.

Returns:

tdclient.models.Job
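One practical detail of the keys above: from is a reserved word in Python, so it cannot be written as a keyword argument and has to be passed through an unpacked dictionary. The sketch below illustrates this. The helper name export_range is hypothetical, and "s3" as the storage_type value is an assumption rather than something stated in this reference.

```python
from typing import Any


def export_range(
    table: Any,
    bucket: str,
    access_key_id: str,
    secret_access_key: str,
    start: int,
    end: int,
) -> Any:
    """Export rows between two Unix timestamps. Hypothetical helper.

    `table` is expected to behave like tdclient's Table.
    """
    options = {
        "bucket": bucket,
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "file_format": "jsonl.gz",
        # "from" is a Python keyword, so `export_data(..., from=start)` would
        # be a syntax error; it can only travel inside an unpacked dict.
        "from": start,
        "to": end,
    }
    # Assumption: "s3" is an accepted storage type.
    return table.export_data("s3", **options)
```

The returned Job can then be waited on like any other job.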

import_data(format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], bytes_or_stream: bytes | bytearray | IO[bytes], size: int, unique_id: str | None = None) float[source]

Import data into Treasure Data Service

Parameters:
  • format (str) – format of data type (e.g. “msgpack.gz”)

  • bytes_or_stream (bytes or file-like) – a byte string or a file-like object containing the data

  • size (int) – the length of the data

  • unique_id (str, optional) – a unique identifier of the data

Returns:

the elapsed time to import the data, in seconds, as a float

import_file(format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], file: str | bytes | IO[bytes], unique_id: str | None = None) float[source]

Import data into Treasure Data Service, from an existing file on filesystem.

This method decompresses and deserializes records from the given file, then converts them into the format accepted by Treasure Data Service (“msgpack.gz”).

Parameters:
  • format (str) – format of data type (e.g. “msgpack”, “json”, “csv”, “tsv”)

  • file (str or file-like) – the name of a file, or a file-like object containing the data

  • unique_id (str, optional) – a unique identifier of the data

Returns:

the elapsed time to import the data, as a float
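A short sketch of calling import_file() with a generated unique_id follows. The helper name import_csv is hypothetical, and using a random UUID as the identifier is only one possible choice; this reference does not say how the service interprets unique_id.

```python
import uuid
from typing import Any


def import_csv(table: Any, path: str) -> tuple[str, float]:
    """Import a CSV file into a table and return (unique_id, elapsed).

    `table` is expected to behave like tdclient's Table. Hypothetical helper.
    """
    # Assumption: any string works as an identifier; a UUID avoids collisions.
    unique_id = str(uuid.uuid4())
    elapsed = table.import_file("csv", path, unique_id=unique_id)
    return unique_id, elapsed
```

Keeping the returned identifier alongside the source file name can help when tracing which upload produced which rows.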

tail(count: int, to: int | None = None, _from: int | None = None) list[dict[str, Any]][source]
Parameters:
  • count (int) – Number of records to show from the end.

  • to – Deprecated parameter.

  • _from – Deprecated parameter.

Returns:

the contents of the table in reverse order based on the registered time (last data first).

property count: int | None

total number of records in the table

Type:

int

property created_at: datetime | None

Created datetime

Type:

datetime.datetime

property database_name: str

a string representing the name of the database

property db_name: str

a string representing the name of the database

property estimated_storage_size: int | None

estimated storage size

property estimated_storage_size_string: str

a string representing the estimated size of the table in human-readable format

property expire_days: int | None

an int representing the number of days until expiration

property identifier: str

a string identifier of the table

property last_import: datetime | None

datetime.datetime

property last_log_timestamp: datetime | None

datetime.datetime

property name: str

a string representing the name of the table

property permission: str | None

permission for the database (e.g. “administrator”, “full_access”, etc.)

Type:

str

property primary_key: str | None

TODO: add docstring

property primary_key_type: str | None

TODO: add docstring

property schema: list[tuple[str, str, str]] | None

The list of columns in the schema

Type:

[[column_name:str, column_type:str, alias:str]]
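According to its annotation, the schema property is a list of (column_name, column_type, alias) entries, or None. The sketch below turns it into a name-to-type mapping. The helper name schema_columns is hypothetical, and indexing each entry instead of unpacking three values is a defensive choice in case an entry carries no alias.

```python
from typing import Any


def schema_columns(table: Any) -> dict[str, str]:
    """Map column names to column types for a table.

    `table` is expected to behave like tdclient's Table. Hypothetical helper.
    """
    # schema may be None, so fall back to an empty list.
    # Only the first two positions of each entry are read.
    return {entry[0]: entry[1] for entry in (table.schema or [])}
```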

property table_name: str

a string representing the name of the table

property type: str | None

a string representing the type of the table

property updated_at: datetime | None

Updated datetime

Type:

datetime.datetime