Model
Some methods of tdclient.client.Client return model objects, which represent results from the REST API.
tdclient.model
tdclient.models
- tdclient.models.BulkImport = <class 'tdclient.bulk_import_model.BulkImport'>[source]
Bulk-import session on Treasure Data Service
- tdclient.models.Database = <class 'tdclient.database_model.Database'>[source]
Database on Treasure Data Service
- tdclient.models.Schema = <class 'tdclient.job_model.Schema'>[source]
Schema of a database table on Treasure Data Service
- tdclient.models.Result = <class 'tdclient.result_model.Result'>[source]
Result on Treasure Data Service
- tdclient.models.ScheduledJob = <class 'tdclient.schedule_model.ScheduledJob'>[source]
Scheduled job on Treasure Data Service
- tdclient.models.Schedule = <class 'tdclient.schedule_model.Schedule'>[source]
Schedule on Treasure Data Service
tdclient.bulk_import_model
- class tdclient.bulk_import_model.BulkImport(client: Client, **kwargs: Any)[source]
Bases: Model
Bulk-import session on Treasure Data Service
- commit(wait: bool = False, wait_interval: int = 5, timeout: float | None = None) bool[source]
Commit bulk import
- delete_part(part_name: str) bool[source]
Delete a part of a Bulk Import session
- Parameters:
part_name (str) – name of a part of the bulk import session
- Returns:
True if succeeded.
- error_record_items() Iterator[dict[str, Any]][source]
Fetch error record rows.
- Yields:
Error record
- list_parts() list[str][source]
Return the list of available parts uploaded through bulk_import_upload_part().
- Returns:
The list of bulk import part names.
- Return type:
[str]
- perform(wait: bool = False, wait_interval: int = 5, wait_callback: Callable[[Job], None] | None = None, timeout: float | None = None) Job[source]
Perform bulk import
- Parameters:
wait (bool, optional) – Flag to wait for the bulk import job to finish. Default False.
wait_interval (int, optional) – wait interval in seconds. Default 5.
wait_callback (callable, optional) – A callable to be called on every tick of wait interval.
timeout (int, optional) – Timeout in seconds. No timeout by default.
- upload_file(part_name: str, fmt: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], file_like: str | bytes | IO[bytes], **kwargs: Any) None[source]
Upload a part to Bulk Import session, from an existing file on filesystem.
- Parameters:
part_name (str) – name of a part of the bulk import session
fmt (str) – format of data type (e.g. “msgpack”, “json”, “csv”, “tsv”)
file_like (str or file-like) – the name of a file, or a file-like object, containing the data
**kwargs – extra arguments.
There is more documentation on fmt, file_like and **kwargs at file import parameters.
In particular, for “csv” and “tsv” data, you can change how data columns are parsed using the dtypes and converters arguments.
- dtypes is a dictionary used to specify a datatype for individual columns, for instance {"col1": "int"}. The available datatypes are "bool", "float", "int", "str" and "guess". If a column is also mentioned in converters, then the function will be used, NOT the datatype.
- converters is a dictionary used to specify a function that will be used to parse individual columns, for instance {"col1": int}.
The default behaviour is "guess", which makes a best-effort attempt to decide the column datatype. See file import parameters for more details.
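The dtypes and converters options described above are passed to upload_file as keyword arguments. The following is a minimal sketch rather than an official example: the column names "user_id" and "is_active", the parse_flag converter and the upload_csv_part helper are all hypothetical, session is assumed to be an existing BulkImport object, and the converter is assumed to receive the raw cell text as a string.

```python
from typing import Any


def parse_flag(value: str) -> bool:
    """Hypothetical converter: treat "yes", "true" and "1" as True."""
    return value.strip().lower() in {"yes", "true", "1"}


def upload_csv_part(session: Any, part_name: str, path: str) -> None:
    """Upload a CSV file, overriding how two assumed columns are parsed."""
    session.upload_file(
        part_name,
        "csv",
        path,
        # Datatype names come from the list above; "user_id" is a made-up column.
        dtypes={"user_id": "int"},
        # Per the text above, a converter wins over dtypes for the same column.
        converters={"is_active": parse_flag},
    )
```

A call such as upload_csv_part(session, "part1", "data.csv") would then upload the file as one part of the session.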
- upload_part(part_name: str, bytes_or_stream: bytes | bytearray | IO[bytes], size: int) None[source]
Upload a part to bulk import session
- Parameters:
part_name (str) – name of a part of the bulk import session
bytes_or_stream (bytes or file-like) – a byte string or a file-like object containing the part
size (int) – the size of the part
- STATUS_COMMITTED = 'committed'
- STATUS_COMMITTING = 'committing'
- STATUS_PERFORMING = 'performing'
- STATUS_READY = 'ready'
- STATUS_UPLOADING = 'uploading'
- property database: str | None
A database name in a string which the bulk import session is working on
- property error_parts: int | None
The number of error parts.
- property error_records: int | None
The number of error records.
- property job_id: str | None
Job ID
- property name: str
A name of the bulk import session
- property status: str | None
The status of the bulk import session in a string
- property table: str | None
A table name in a string which the bulk import session is working on
- property upload_frozen: bool | None
Whether uploading to the session is frozen.
- property valid_parts: int | None
The number of valid parts.
- property valid_records: int | None
The number of valid records.
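The methods and properties of BulkImport can be combined into a simple "perform, check, commit" step. The sketch below is an illustration under assumptions, not the library's prescribed workflow: perform_and_commit and max_error_records are hypothetical names, session is assumed to be a BulkImport whose parts are already uploaded and which is ready to be performed, and error_records is assumed to reflect the finished job once perform(wait=True) returns.

```python
from typing import Any


def perform_and_commit(session: Any, max_error_records: int = 0) -> bool:
    """Perform a bulk import and commit it only if few enough records failed."""
    # Block until the bulk import job has finished.
    session.perform(wait=True)

    # error_records may be None, so treat a missing value as zero.
    errors = session.error_records or 0
    if errors > max_error_records:
        # Show the rejected rows instead of committing.
        for record in session.error_record_items():
            print("error record:", record)
        return False

    # commit() returns a bool according to its signature.
    return session.commit(wait=True)
```

Returning False without committing leaves the decision about the session (retry, delete parts, inspect errors) to the caller.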
tdclient.database_model
- class tdclient.database_model.Database(client: Client, db_name: str, **kwargs: Any)[source]
Bases: Model
Database on Treasure Data Service
- create_log_table(name: str) bool[source]
- Parameters:
name (str) – name of new log table
- Returns:
True if succeeded.
- query(q: str, **kwargs: Any) Job[source]
Run a query on the database
- Parameters:
q (str) – a query string
- Returns:
tdclient.model.Job
- table(table_name: str) Table[source]
- Parameters:
table_name (str) – name of a table
- Returns:
tdclient.model.Table
- PERMISSIONS = ['administrator', 'full_access', 'import_only', 'query_only']
- PERMISSION_LIST_TABLES = ['administrator', 'full_access']
- property count: int | None
Total record counts in a database.
- Type:
int
- property created_at: datetime | None
datetime.datetime
- property name: str
a name of the database
- Type:
str
- property org_name: str | None
organization name
- Type:
str
- property permission: str | None
permission for the database (e.g. “administrator”, “full_access”, etc.)
- Type:
str
- property updated_at: datetime | None
datetime.datetime
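The permission property can be compared against the class constants listed above. The sketch below assumes that PERMISSION_LIST_TABLES names the permissions that allow listing tables, which its name suggests but this page does not state; can_list_tables is a hypothetical helper, and database is assumed to be a Database object.

```python
from typing import Any


def can_list_tables(database: Any) -> bool:
    """Return True if the database permission appears in PERMISSION_LIST_TABLES."""
    # permission may be None; membership in the list is then simply False.
    return database.permission in database.PERMISSION_LIST_TABLES
```

Reading the constant from the object rather than copying the list keeps the check in step with whatever the installed library version defines.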
tdclient.job_model
- class tdclient.job_model.Job(client: Client, job_id: str, type: str, query: str | None, **kwargs: Any)[source]
Bases: Model
Job on Treasure Data Service
- kill() str | None[source]
Kill the job
- Returns:
a string representing the status of the killed job (“queued”, “running”)
- result_format(fmt: ResultFormat, store_tmpfile: bool = False, num_threads: int = 4) Iterator[dict[str, Any]][source]
- Parameters:
fmt (str) – output format of result set
store_tmpfile (bool, optional) – store result to a temporary file. Works only when fmt is “msgpack”. Default is False.
num_threads (int, optional) – number of threads to download result. Works only when store_tmpfile is True. Default is 4.
- Yields:
an iterator of rows in result set
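Because result_format yields rows lazily, a caller can stop after the first few. The following sketch assumes job is a Job that has already finished; first_rows is a hypothetical helper, and "msgpack" is used because it is the format named in the parameter description above.

```python
from itertools import islice
from typing import Any


def first_rows(job: Any, limit: int = 10) -> list:
    """Return at most `limit` rows from a finished job's result set."""
    rows = job.result_format(
        "msgpack",
        store_tmpfile=True,  # documented to work only with "msgpack"
        num_threads=4,       # documented to apply only when store_tmpfile is True
    )
    # islice stops consuming the iterator once `limit` rows were read.
    return list(islice(rows, limit))
```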
- status() str | None[source]
- Returns:
a string representing the status of the job (“success”, “error”, “killed”, “queued”, “running”)
- Return type:
str
- wait(timeout: float | None = None, wait_interval: int = 5, wait_callback: Callable[[Job], None] | None = None) None[source]
Sleep until the job has finished
- Parameters:
timeout (int, optional) – Timeout in seconds. No timeout by default.
wait_interval (int, optional) – wait interval in seconds. Default 5 seconds.
wait_callback (callable, optional) – A callable to be called on every tick of wait interval.
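A sketch of how wait and status might be used together follows. It is illustrative only: wait_for_success and report are hypothetical names, job is assumed to be a Job object, and what happens when the timeout elapses is not described on this page, so the sketch does not handle that case.

```python
from typing import Any


def wait_for_success(job: Any, timeout: float = 600.0) -> bool:
    """Block until the job finishes, then report whether it succeeded."""

    def report(current_job: Any) -> None:
        # Called on every tick of the wait interval with the job itself.
        print(f"job {current_job.job_id} is still in progress")

    job.wait(timeout=timeout, wait_interval=5, wait_callback=report)
    # "success" is one of the status strings returned by status().
    return job.status() == "success"
```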
- FINISHED_STATUS = ['success', 'error', 'killed']
- JOB_PRIORITY = {-2: 'VERY LOW', -1: 'LOW', 0: 'NORMAL', 1: 'HIGH', 2: 'VERY HIGH'}
- STATUS_BOOTING = 'booting'
- STATUS_ERROR = 'error'
- STATUS_KILLED = 'killed'
- STATUS_QUEUED = 'queued'
- STATUS_RUNNING = 'running'
- STATUS_SUCCESS = 'success'
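The constants above lend themselves to small lookups. In the sketch below the two values are copied from this listing so the snippet runs on its own; in real code they would be read from the Job class. The helpers is_finished and priority_label are hypothetical.

```python
from typing import Optional

# Copied from the Job class constants listed above.
FINISHED_STATUS = ["success", "error", "killed"]
JOB_PRIORITY = {-2: "VERY LOW", -1: "LOW", 0: "NORMAL", 1: "HIGH", 2: "VERY HIGH"}


def is_finished(status: Optional[str]) -> bool:
    """True when the status string is one of the terminal states."""
    return status in FINISHED_STATUS


def priority_label(level: int) -> str:
    """Translate a numeric priority into its label; raises KeyError if unknown."""
    return JOB_PRIORITY[level]
```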
- property database: str | None
a string representing the name of the database that the job is running on
- property debug: dict[str, Any] | None
a dict of debug output (e.g. “cmdout”, “stderr”)
- property id: str
a string representing the identifier of the job
- property job_id: str
a string representing the identifier of the job
- property linked_result_export_job_id: str | None
Linked result export job ID from query job
- property num_records: int | None
the number of records of job result
- property org_name: str | None
organization name
- property priority: str
a string representing the priority of the job (e.g. “NORMAL”, “HIGH”, etc.)
- property query: str | None
a string representing the query string of the job
- property result_export_target_job_id: str | None
Associated query job ID from result export job ID
- property result_schema: list[list[str]] | None
an array of arrays representing the types of the result columns (Hive specific) (e.g. [[“_c1”, “string”], [“_c2”, “bigint”]])
- property result_size: int | None
the length of job result
- property result_url: str | None
the URL of the result on Treasure Data Service, as a string
- property retry_limit: int | None
a number for automatic retry count
- property type: str
a string representing the engine type of the job (e.g. “hive”, “presto”, etc.)
- property url: str | None
the URL of the job on Treasure Data Service, as a string
- property user_name: str | None
executing user name
tdclient.result_model
- class tdclient.result_model.Result(client: Client, name: str, url: str, org_name: str | None)[source]
Bases: Model
Result on Treasure Data Service
- property name: str
a name for an authentication
- Type:
str
- property org_name: str | None
organization name
- Type:
str
- property url: str
a result output URL
- Type:
str
tdclient.schedule_model
- class tdclient.schedule_model.Schedule(client: Client, *args: Any, **kwargs: Any)[source]
Bases: Model
Schedule on Treasure Data Service
- run(time: int, num: int | None = None) list[ScheduledJob][source]
Run a scheduled job
- Parameters:
time (int) – Time in Unix epoch format that would be set as TD_SCHEDULED_TIME
num (int) – Indicates how many times the query will be executed. Value should be 9 or less.
- Returns:
a list of ScheduledJob objects
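Since run expects the time as a Unix epoch integer and limits num to 9 or less, a thin wrapper can perform both conversions and checks. The following is a sketch under assumptions: run_at and MAX_RUNS are hypothetical names, schedule is assumed to be a Schedule object, and treating a naive datetime as UTC is a choice made here, not something the library specifies.

```python
from datetime import datetime, timezone
from typing import Any

MAX_RUNS = 9  # "Value should be 9 or less" per the parameter description


def run_at(schedule: Any, when: datetime, num: int = 1) -> list:
    """Run a schedule as if it fired at `when`, at most MAX_RUNS times."""
    if not 1 <= num <= MAX_RUNS:
        raise ValueError(f"num must be between 1 and {MAX_RUNS}, got {num}")
    if when.tzinfo is None:
        # Assumption of this sketch: naive datetimes are interpreted as UTC.
        when = when.replace(tzinfo=timezone.utc)
    # The integer epoch value is what would be set as TD_SCHEDULED_TIME.
    return schedule.run(int(when.timestamp()), num)
```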
- property created_at: datetime | None
Create date
- Type:
datetime.datetime
- property cron: str | None
The configured schedule of a scheduled job.
Returns a string representing the schedule in cron form, or None if the job is not scheduled to run (saved query)
- property database: str | None
The target database of a scheduled job
- property delay: int | None
A delay ensures all buffered events are imported before running the query.
- property name: str | None
The name of a scheduled job
- property next_time: datetime | None
Schedule for next run
- Type:
datetime.datetime
- property org_name: str | None
organization name
- Type:
str
- property priority: str
The priority of a scheduled job
- property query: str | None
The query string of a scheduled job
- property result_url: str | None
The result output configuration in URL form of a scheduled job
- property retry_limit: int | None
Automatic retry count.
- property timezone: str | None
The time zone of a scheduled job
- property type: str | None
Query type. {“presto”, “hive”}.
- property user_name: str | None
User name of a scheduled job
- class tdclient.schedule_model.ScheduledJob(client: Client, scheduled_at: datetime, job_id: str, type: str, query: str | None, **kwargs: Any)[source]
Bases: Job
Scheduled job on Treasure Data Service
- property scheduled_at: datetime
a datetime.datetime representing the schedule of the next invocation of the job
tdclient.table_model
- class tdclient.table_model.Table(*args: Any, **kwargs: Any)[source]
Bases: Model
Database table on Treasure Data Service
- export_data(storage_type: str, **kwargs: Any) Job[source]
Export data from Treasure Data Service
- Parameters:
storage_type (str) – type of the storage
**kwargs (dict) –
optional parameters. Assuming the following keys:
- access_key_id (str):
ID to access the information to be exported.
- secret_access_key (str):
Password for the access_key_id.
- file_prefix (str, optional):
Filename of exported file. Default: “<database_name>/<table_name>”
- file_format (str, optional):
File format of the information to be exported. {“jsonl.gz”, “tsv.gz”, “json.gz”}
- from (int, optional):
From Time of the data to be exported in Unix epoch format.
- to (int, optional):
End Time of the data to be exported in Unix epoch format.
- assume_role (str, optional):
Assume role.
- bucket (str):
Name of bucket to be used.
- domain_key (str, optional):
Job domain key.
- pool_name (str, optional):
For Presto only. Pool name to be used, if not specified, default pool would be used.
- Returns:
tdclient.model.Job
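One practical detail of the keys listed above is that from is a reserved word in Python, so it cannot be written as a literal keyword argument and has to be passed through dictionary unpacking. The sketch below illustrates this; build_export_params and export_table are hypothetical helpers, table is assumed to be a Table object, and "s3" as the storage_type value is an assumption not confirmed by this page.

```python
from typing import Any


def build_export_params(
    bucket: str,
    access_key_id: str,
    secret_access_key: str,
    start: int,
    end: int,
    file_format: str = "jsonl.gz",
) -> dict:
    """Collect export options; start and end are Unix epoch seconds."""
    return {
        "bucket": bucket,
        "access_key_id": access_key_id,
        "secret_access_key": secret_access_key,
        "file_format": file_format,
        # "from" is a Python keyword, so it can only be supplied as a dict key.
        "from": start,
        "to": end,
    }


def export_table(table: Any, params: dict) -> Any:
    """Start an export job; the "s3" storage type is an assumed value."""
    return table.export_data("s3", **params)
```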
- import_data(format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], bytes_or_stream: bytes | bytearray | IO[bytes], size: int, unique_id: str | None = None) float[source]
Import data into Treasure Data Service
- Parameters:
format (str) – format of data type (e.g. “msgpack.gz”)
bytes_or_stream (str or file-like) – a byte string or a file-like object containing the data
size (int) – the length of the data
unique_id (str) – a unique identifier of the data
- Returns:
the elapsed time to import the data, in seconds, as a float
- import_file(format: Literal['msgpack', 'msgpack.gz', 'json', 'json.gz', 'csv', 'csv.gz', 'tsv', 'tsv.gz'], file: str | bytes | IO[bytes], unique_id: str | None = None) float[source]
Import data into Treasure Data Service, from an existing file on filesystem.
This method will decompress/deserialize records from the given file, and then convert them into a format accepted by Treasure Data Service (“msgpack.gz”).
- Parameters:
file (str or file-like) – the name of a file, or a file-like object containing the data
unique_id (str) – a unique identifier of the data
- Returns:
a float representing the elapsed time to import the data
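A minimal sketch of calling import_file follows. The import_csv helper is hypothetical, table is assumed to be a Table object, and generating unique_id with uuid4 is only one possible way to produce a unique identifier; this page does not say how the service uses it.

```python
import uuid
from typing import Any, Tuple


def import_csv(table: Any, path: str) -> Tuple[str, float]:
    """Import a CSV file and return the identifier used plus the elapsed seconds."""
    # A random 32-character hex string serves as the unique identifier here.
    unique_id = uuid.uuid4().hex
    elapsed = table.import_file("csv", path, unique_id=unique_id)
    return unique_id, elapsed
```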
- tail(count: int, to: int | None = None, _from: int | None = None) list[dict[str, Any]][source]
- Parameters:
count (int) – Number of records to show, counted from the end.
to – Deprecated parameter.
_from – Deprecated parameter.
- Returns:
the contents of the table in reverse order based on the registered time (last data first).
- property count: int | None
total number of records in the table
- Type:
int
- property created_at: datetime | None
Created datetime
- Type:
datetime.datetime
- property database_name: str
a string representing the name of the database
- property db_name: str
a string representing the name of the database
- property estimated_storage_size: int | None
estimated storage size
- property estimated_storage_size_string: str
a string representing the estimated size of the table in human-readable format
- property expire_days: int | None
an int representing the days until expiration
- property identifier: str
a string identifier of the table
- property last_import: datetime | None
datetime.datetime
- property last_log_timestamp: datetime | None
datetime.datetime
- property name: str
a string representing the name of the table
- property permission: str | None
permission for the database (e.g. “administrator”, “full_access”, etc.)
- Type:
str
- property primary_key: str | None
add docstring
- Type:
TODO
- property primary_key_type: str | None
add docstring
- Type:
TODO
- property schema: list[tuple[str, str, str]] | None
The list of a schema
- Type:
[[column_name:str, column_type:str, alias:str]]
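Each schema entry holds a column name, a column type and an alias, so it can be unpacked into a more readable structure. The sketch below uses a hypothetical helper, describe_schema, and takes the schema value as a plain argument so that it runs without a live Table object; the dictionary keys are chosen for this example only.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def describe_schema(
    schema: Optional[Iterable[Tuple[str, str, str]]],
) -> List[Dict[str, str]]:
    """Turn (column_name, column_type, alias) entries into dictionaries."""
    # The property may be None, in which case an empty list is returned.
    return [
        {"name": name, "type": column_type, "alias": alias}
        for name, column_type, alias in (schema or [])
    ]
```

With a Table object named table, the call would be describe_schema(table.schema).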
- property table_name: str
a string representing the name of the table
- property type: str | None
a string representing the type of the table
- property updated_at: datetime | None
Updated datetime
- Type:
datetime.datetime