Treasure Data API library for Python¶
Install¶
You can install the releases from PyPI.
$ pip install td-client
Installing certifi is recommended to enable SSL certificate verification.
$ pip install certifi
Examples¶
See also the examples in the Treasure Data Documentation.
For the API reference, see the API document below.
Listing jobs¶
The Treasure Data API key is read from the environment variable TD_API_KEY if none is given via the apikey= argument passed to tdclient.Client. The Treasure Data API endpoint https://api.treasuredata.com is used by default. You can override it with the environment variable TD_API_SERVER, which in turn can be overridden via the endpoint= argument passed to tdclient.Client. A list of available Treasure Data sites and corresponding API endpoints can be found here.
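The resolution order described above can be sketched with small helpers; resolve_endpoint and resolve_apikey are hypothetical illustrations of the precedence, not part of tdclient:

```python
import os

def resolve_endpoint(endpoint=None):
    # Precedence sketch: explicit endpoint= argument, then the
    # TD_API_SERVER environment variable, then the default endpoint.
    return endpoint or os.environ.get("TD_API_SERVER") or "https://api.treasuredata.com"

def resolve_apikey(apikey=None):
    # Precedence sketch: explicit apikey= argument, then TD_API_KEY.
    return apikey or os.environ.get("TD_API_KEY")
```

Passing apikey= and endpoint= to tdclient.Client() explicitly therefore takes precedence over both environment variables.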
import tdclient
with tdclient.Client() as td:
    for job in td.jobs():
        print(job.job_id)
Running jobs¶
Running jobs on Treasure Data.
import tdclient
with tdclient.Client() as td:
    job = td.query("sample_datasets", "SELECT COUNT(1) FROM www_access", type="hive")
    job.wait()
    for row in job.result():
        print(repr(row))
Running jobs via DBAPI2¶
td-client-python implements PEP 0249, the Python Database API v2.0. You can use td-client-python with external libraries that support the Database API, such as pandas.
import pandas
import tdclient

def on_waiting(cursor):
    print(cursor.job_status())

with tdclient.connect(db="sample_datasets", type="presto", wait_callback=on_waiting) as td:
    data = pandas.read_sql("SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol", td)
    print(repr(data))
We offer another package for pandas named pytd with some advanced features. You may prefer it if you need to do more complicated things, such as exporting result data to Treasure Data or printing a job's progress during long execution.
Importing data¶
Importing data into Treasure Data in a streaming manner, similar to what fluentd does.
import sys
import tdclient

with tdclient.Client() as td:
    # sys.argv[1:] skips the script name itself
    for file_name in sys.argv[1:]:
        td.import_file("mydb", "mytbl", "csv", file_name)
Warning
Importing data in a streaming manner requires a certain amount of time before the data is ready to query, since the schema update is executed with a delay.
Bulk import¶
Importing data into Treasure Data in a batch manner.
import sys
import uuid
import warnings

import tdclient

if len(sys.argv) <= 1:
    sys.exit(0)

with tdclient.Client() as td:
    session_name = "session-{}".format(uuid.uuid1())
    bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
    try:
        for file_name in sys.argv[1:]:
            part_name = "part-{}".format(file_name)
            bulk_import.upload_file(part_name, "json", file_name)
        bulk_import.freeze()
    except:
        bulk_import.delete()
        raise
    bulk_import.perform(wait=True)
    if 0 < bulk_import.error_records:
        warnings.warn("detected {} error records.".format(bulk_import.error_records))
    if 0 < bulk_import.valid_records:
        print("imported {} records.".format(bulk_import.valid_records))
    else:
        raise RuntimeError("no records have been imported: {}".format(bulk_import.name))
    bulk_import.commit(wait=True)
    bulk_import.delete()
If you want to import data in msgpack format, you can write it as follows:
import io
import time
import uuid
import warnings

import tdclient

t1 = int(time.time())
l1 = [{"a": 1, "b": 2, "time": t1}, {"a": 3, "b": 9, "time": t1}]

with tdclient.Client() as td:
    session_name = "session-{}".format(uuid.uuid1())
    bulk_import = td.create_bulk_import(session_name, "mydb", "mytbl")
    try:
        _bytes = tdclient.util.create_msgpack(l1)
        bulk_import.upload_file("part", "msgpack", io.BytesIO(_bytes))
        bulk_import.freeze()
    except:
        bulk_import.delete()
        raise
    bulk_import.perform(wait=True)
    # same as the above example
Development¶
Running tests (tox)¶
You can run tests against all supported Python versions. We recommend installing pyenv to manage Python versions.
$ pyenv shell system
$ for version in $(cat .python-version); do [ -d "$(pyenv root)/versions/${version}" ] || pyenv install "${version}"; done
$ pyenv shell --unset
Install tox.
$ pip install tox
Then, run tox.
$ tox
Release¶
Release to PyPI. Ensure you have twine installed.
$ python setup.py bdist_wheel sdist
$ twine upload dist/*
License¶
Apache Software License, Version 2.0
API Reference¶
Client¶
tdclient.client.Client
class is a public interface for tdclient.
It provides methods for executing REST API calls.
tdclient.client¶
-
class
tdclient.client.
Client
(*args, **kwargs)[source]¶ Bases:
object
API Client for Treasure Data Service
-
add_user
(name, org, email, password)[source]¶ Add a new user
- Parameters
name (str) – name of the user
org (str) – organization
email (str) – e-mail address
password (str) – password
- Returns
True if success
-
bulk_import
(name)[source]¶ Get a bulk import session
- Parameters
name (str) – name of a bulk import session
- Returns
-
bulk_import_delete_part
(name, part_name)[source]¶ Delete a part from a bulk import session
- Parameters
name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session
- Returns
True if success
-
bulk_import_error_records
(name)[source]¶ - Parameters
name (str) – name of a bulk import session
- Returns
an iterator of error records
-
bulk_import_upload_file
(name, part_name, format, file)[source]¶ Upload a part to Bulk Import session, from an existing file on filesystem.
- Parameters
name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session
format (str) – format of data type (e.g. “msgpack”, “json”)
file (str or file-like) – a name of a file, or a file-like object contains the data
-
bulk_import_upload_part
(name, part_name, bytes_or_stream, size)[source]¶ Upload a part to a bulk import session
- Parameters
name (str) – name of a bulk import session
part_name (str) – name of a part of the bulk import session
bytes_or_stream (file-like) – a file-like object contains the part
size (int) – the size of the part
-
bulk_imports
()[source]¶ List bulk import sessions
- Returns
a list of
tdclient.models.BulkImport
-
change_database
(db_name, table_name, new_db_name)[source]¶ Move a target table from its original database to a new destination database.
- Parameters
db_name (str) – Target database name.
table_name (str) – Target table name.
new_db_name (str) – Destination database name to be moved.
- Returns
True if succeeded.
- Return type
bool
-
commit_bulk_import
(name)[source]¶ Commit a bulk import session
- Parameters
name (str) – name of a bulk import session
- Returns
True if success
-
create_bulk_import
(name, database, table, params=None)[source]¶ Create new bulk import session
- Parameters
name (str) – name of new bulk import session
database (str) – name of a database
table (str) – name of a table
- Returns
-
create_database
(db_name, **kwargs)[source]¶ - Parameters
db_name (str) – name of a database to create
- Returns
True if success
-
create_log_table
(db_name, table_name)[source]¶ - Parameters
db_name (str) – name of a database
table_name (str) – name of a table to create
- Returns
True if success
-
create_result
(name, url, params=None)[source]¶ Create a new authentication with the specified name.
- Parameters
name (str) – Authentication name.
url (str) – URL of the authentication to be created, e.g. “ftp://test.com/”
params (dict, optional) – Extra parameters.
- Returns
True if succeeded.
- Return type
bool
-
create_schedule
(name, params=None)[source]¶ Create a new scheduled query with the specified name.
- Parameters
name (str) – Scheduled query name.
params (dict, optional) –
Extra parameters.
- type (str):
Query type. {“presto”, “hive”}. Default: “hive”
- database (str):
Target database name.
- timezone (str):
Scheduled query’s timezone. e.g. “UTC” For details, see also: https://gist.github.com/frsyuki/4533752
- cron (str, optional):
Schedule of the query. {
"@daily"
,"@hourly"
,"10 * * * *"
(custom cron)} See also: https://support.treasuredata.com/hc/en-us/articles/360001451088-Scheduled-Jobs-Web-Console
- delay (int, optional):
A delay ensures all buffered events are imported before running the query. Default: 0
- query (str):
The query string to be executed. See also: https://support.treasuredata.com/hc/en-us/articles/360012069493-SQL-Examples-of-Scheduled-Queries
- priority (int, optional):
Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0
- retry_limit (int, optional):
Automatic retry count. Default: 0
- engine_version (str, optional):
Engine version to be used. If none is specified, the account’s default engine version would be set. {“stable”, “experimental”}
- pool_name (str, optional):
For Presto only. Pool name to be used; if not specified, the default pool is used.
- result (str, optional):
Location where to store the result of the query. e.g. ‘tableau://user:password@host.com:1234/datasource’
- Returns
Start date time.
- Return type
datetime.datetime
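As a sketch, the parameters above can be collected into a dict and passed via params=; the schedule name and query below are illustrative placeholders, and the API call only runs when an API key is configured:

```python
import os

# Illustrative parameters for a daily Presto query, using keys from the list above.
params = {
    "type": "presto",
    "database": "sample_datasets",
    "query": "SELECT COUNT(1) FROM www_access",
    "cron": "@daily",
    "timezone": "UTC",
    "priority": 0,
    "retry_limit": 1,
}

# Only contact the service when an API key is configured.
if os.environ.get("TD_API_KEY"):
    import tdclient

    with tdclient.Client() as td:
        first_run = td.create_schedule("daily-www-access-count", params=params)
        print(first_run)  # datetime.datetime of the first scheduled run
```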
-
databases
()[source]¶ - Returns
a list of
tdclient.models.Database
-
delete_bulk_import
(name)[source]¶ Delete a bulk import session
- Parameters
name (str) – name of a bulk import session
- Returns
True if success
-
delete_database
(db_name)[source]¶ - Parameters
db_name (str) – name of database to delete
- Returns
True if success
-
delete_result
(name)[source]¶ Delete the authentication having the specified name.
- Parameters
name (str) – Authentication name.
- Returns
True if succeeded.
- Return type
bool
-
delete_schedule
(name)[source]¶ Delete the scheduled query with the specified name.
- Parameters
name (str) – Target scheduled query name.
- Returns
Tuple of cron and query.
- Return type
(str, str)
-
delete_table
(db_name, table_name)[source]¶ Delete a table
- Parameters
db_name (str) – name of a database
table_name (str) – name of a table
- Returns
a string representing the type of the deleted table
-
export_data
(db_name, table_name, storage_type, params=None)[source]¶ Export data from Treasure Data Service
- Parameters
db_name (str) – name of a database
table_name (str) – name of a table
storage_type (str) – type of the storage
params (dict) –
optional parameters. Assuming the following keys:
- access_key_id (str):
ID to access the information to be exported.
- secret_access_key (str):
Password for the access_key_id.
- file_prefix (str, optional):
Filename of exported file. Default: “<database_name>/<table_name>”
- file_format (str, optional):
File format of the information to be exported. {“jsonl.gz”, “tsv.gz”, “json.gz”}
- from (int, optional):
Start time of the data to be exported, in Unix epoch format.
- to (int, optional):
End time of the data to be exported, in Unix epoch format.
- assume_role (str, optional):
Assume role.
- bucket (str):
Name of the bucket to be used.
- domain_key (str, optional):
Job domain key.
- pool_name (str, optional):
For Presto only. Pool name to be used, if not specified, default pool would be used.
- Returns
-
freeze_bulk_import
(name)[source]¶ Freeze a bulk import session
- Parameters
name (str) – name of a bulk import session
- Returns
True if success
-
history
(name, _from=None, to=None)[source]¶ Get the history details of the saved query for the past 90 days.
- Parameters
name (str) – Target name of the scheduled query.
_from (int, optional) – Indicates from which nth record in the run history would be fetched. Default: 0. Note: Count starts from zero. This means that the first record in the list has a count of zero.
to (int, optional) – Indicates up to which nth record in the run history would be fetched. Default: 20
- Returns
-
import_data
(db_name, table_name, format, bytes_or_stream, size, unique_id=None)[source]¶ Import data into Treasure Data Service
- Parameters
db_name (str) – name of a database
table_name (str) – name of a table
format (str) – format of data type (e.g. “msgpack.gz”)
bytes_or_stream (str or file-like) – a byte string or a file-like object contains the data
size (int) – the length of the data
unique_id (str) – a unique identifier of the data
- Returns
a float representing the elapsed time in seconds to import the data
-
import_file
(db_name, table_name, format, file, unique_id=None)[source]¶ Import data into Treasure Data Service, from an existing file on filesystem.
This method will decompress/deserialize records from the given file, and then convert them into a format acceptable to the Treasure Data Service (“msgpack.gz”).
- Parameters
db_name (str) – name of a database
table_name (str) – name of a table
format (str) – format of data type (e.g. “msgpack”, “json”)
file (str or file-like) – a name of a file, or a file-like object contains the data
unique_id (str) – a unique identifier of the data
- Returns
float represents the elapsed time to import data
-
job_result
(job_id)[source]¶ - Parameters
job_id (str) – job id
- Returns
a list of rows in the result set
-
job_result_format
(job_id, format)[source]¶ - Parameters
job_id (str) – job id
format (str) – output format of result set
- Returns
a list of rows in the result set
-
job_result_format_each
(job_id, format)[source]¶ - Parameters
job_id (str) – job id
format (str) – output format of result set
- Returns
an iterator of rows in result set
-
job_status
(job_id)[source]¶ - Parameters
job_id (str) – job id
- Returns
a string representing the status of the job (“success”, “error”, “killed”, “queued”, “running”)
-
jobs
(_from=None, to=None, status=None, conditions=None)[source]¶ List jobs
- Parameters
_from (int, optional) – Gets the Job from the nth index in the list. Default: 0.
to (int, optional) – Gets the Job up to the nth index in the list. By default, the first 20 jobs in the list are displayed
status (str, optional) – Filter by given status. {“queued”, “running”, “success”, “error”}
conditions (str, optional) – Condition for
TIMESTAMPDIFF()
to search for slow queries. Avoid using this parameter as it can be dangerous.
- Returns
a list of
tdclient.models.Job
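A sketch of listing recent failed jobs; checked_status is a hypothetical local guard, not part of the library, and the API call only runs when an API key is configured:

```python
import os

# Statuses accepted by the status filter, per the parameter list above.
JOB_STATUSES = {"queued", "running", "success", "error"}

def checked_status(status):
    # Hypothetical guard: catch typos locally before calling the API.
    if status not in JOB_STATUSES:
        raise ValueError("unknown job status: {!r}".format(status))
    return status

# Only contact the service when an API key is configured.
if os.environ.get("TD_API_KEY"):
    import tdclient

    with tdclient.Client() as td:
        # List the ten most recent failed jobs.
        for job in td.jobs(_from=0, to=10, status=checked_status("error")):
            print(job.job_id, job.status())
```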
-
kill
(job_id)[source]¶ - Parameters
job_id (str) – job id
- Returns
a string representing the status of the killed job (“queued”, “running”)
-
list_apikeys
(name)[source]¶ - Parameters
name (str) – name of the user
- Returns
a list of API key strings
-
list_bulk_import_parts
(name)[source]¶ List parts of a bulk import session
- Parameters
name (str) – name of a bulk import session
- Returns
a list of strings representing the names of parts
-
partial_delete
(db_name, table_name, to, _from, params=None)[source]¶ Create a job to partially delete the contents of the table with the given time range.
- Parameters
db_name (str) – Target database name.
table_name (str) – Target table name.
to (int) – Time in Unix Epoch format indicating the End date and time of the data to be deleted. Should be set only by the hour. Minutes and seconds values will not be accepted.
_from (int) – Time in Unix Epoch format indicating the Start date and time of the data to be deleted. Should be set only by the hour. Minutes and seconds values will not be accepted.
params (dict, optional) –
Extra parameters.
- pool_name (str, optional):
Indicates the resource pool to execute this job. If not provided, the account’s default resource pool would be used.
- domain_key (str, optional):
Domain key that will be assigned to the partial delete job to be created
- Returns
-
perform_bulk_import
(name)[source]¶ Perform a bulk import session
- Parameters
name (str) – name of a bulk import session
- Returns
-
query
(db_name, q, result_url=None, priority=None, retry_limit=None, type='hive', **kwargs)[source]¶ Run a query on specified database table.
- Parameters
db_name (str) – name of a database
q (str) – a query string
result_url (str) – result output URL. e.g.,
postgresql://<username>:<password>@<hostname>:<port>/<database>/<table>
priority (int or str) – priority (e.g. “NORMAL”, “HIGH”, etc.)
retry_limit (int) – retry limit
type (str) – name of a query engine
- Returns
- Raises
ValueError – if unknown query type has been specified
-
remove_apikey
(name, apikey)[source]¶ - Parameters
name (str) – name of the user
apikey (str) – an API key to remove
- Returns
True if success
-
remove_user
(name)[source]¶ Remove a user
- Parameters
name (str) – name of the user
- Returns
True if success
-
results
()[source]¶ Get the list of all the available authentications.
- Returns
a list of
tdclient.models.Result
-
run_schedule
(name, time, num)[source]¶ Execute the specified query.
- Parameters
name (str) – Target scheduled query name.
time (int) – Time in Unix epoch format that would be set as TD_SCHEDULED_TIME
num (int) – Indicates how many times the query will be executed. Value should be 9 or less.
- Returns
-
swap_table
(db_name, table_name1, table_name2)[source]¶ - Parameters
db_name (str) – name of a database
table_name1 (str) – original table name
table_name2 (str) – table name you want to rename to
- Returns
True if success
-
table
(db_name, table_name)[source]¶ - Parameters
db_name (str) – name of a database
table_name (str) – name of a table
- Returns
- Raises
tdclient.api.NotFoundError – if the table doesn’t exist
-
tables
(db_name)[source]¶ List existing tables
- Parameters
db_name (str) – name of a database
- Returns
a list of
tdclient.models.Table
-
tail
(db_name, table_name, count, to=None, _from=None, block=None)[source]¶ Get the contents of the table in reverse order based on the registered time (last data first).
- Parameters
db_name (str) – Target database name.
table_name (str) – Target table name.
count (int) – Number for record to show up from the end.
to – Deprecated parameter.
_from – Deprecated parameter.
block – Deprecated parameter.
- Returns
Contents of the table.
- Return type
[dict]
-
unfreeze_bulk_import
(name)[source]¶ Unfreeze a bulk import session
- Parameters
name (str) – name of a bulk import session
- Returns
True if success
-
update_expire
(db_name, table_name, expire_days)[source]¶ Set expiration date to a table
- Parameters
db_name (str) – name of a database
table_name (str) – name of a table
expire_days (int) – expiration date in days from today
- Returns
True if success
-
update_schedule
(name, params=None)[source]¶ Update the scheduled query.
- Parameters
name (str) – Target scheduled query name.
params (dict) –
Extra parameters.
- type (str):
Query type. {“presto”, “hive”}. Default: “hive”
- database (str):
Target database name.
- timezone (str):
Scheduled query’s timezone. e.g. “UTC” For details, see also: https://gist.github.com/frsyuki/4533752
- cron (str, optional):
Schedule of the query. {
"@daily"
,"@hourly"
,"10 * * * *"
(custom cron)} See also: https://support.treasuredata.com/hc/en-us/articles/360001451088-Scheduled-Jobs-Web-Console
- delay (int, optional):
A delay ensures all buffered events are imported before running the query. Default: 0
- query (str):
The query string to be executed. See also: https://support.treasuredata.com/hc/en-us/articles/360012069493-SQL-Examples-of-Scheduled-Queries
- priority (int, optional):
Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0
- retry_limit (int, optional):
Automatic retry count. Default: 0
- engine_version (str, optional):
Engine version to be used. If none is specified, the account’s default engine version would be set. {“stable”, “experimental”}
- pool_name (str, optional):
For Presto only. Pool name to be used; if not specified, the default pool is used.
- result (str, optional):
Location where to store the result of the query. e.g. ‘tableau://user:password@host.com:1234/datasource’
-
update_schema
(db_name, table_name, schema)[source]¶ Updates the schema of a table
- Parameters
db_name (str) – name of a database
table_name (str) – name of a table
schema (list) –
a list of lists representing the schema definition (will be converted to JSON), where each entry is [column name, data type, alias of the column name], e.g.
[["member_id", "string", "mem_id"], ["row_index", "long", "row_ind"], ...]
- Returns
True if success
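The expected schema argument is a list of [column_name, data_type, alias] entries, as in this sketch (the database and table names are placeholders, and the API call only runs when an API key is configured):

```python
import os

# Each entry is [column name, data type, alias of the column name].
schema = [
    ["member_id", "string", "mem_id"],
    ["row_index", "long", "row_ind"],
]

# Only contact the service when an API key is configured.
if os.environ.get("TD_API_KEY"):
    import tdclient

    with tdclient.Client() as td:
        td.update_schema("mydb", "mytbl", schema)
```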
-
users
()[source]¶ List users
- Returns
a list of
tdclient.models.User
-
property
api
¶ an instance of
tdclient.api.API
-
property
apikey
¶ API key string.
-
DB API¶
tdclient¶
-
tdclient.
connect
(*args, **kwargs)[source]¶ Returns a DBAPI compatible connection object
- Parameters
type (str) – query engine type. “hive” by default.
db (str) – the name of database on Treasure Data
result_url (str) – result output URL
priority (str) – job priority
retry_limit (int) – job retry limit
wait_interval (int) – job wait interval to check status
wait_callback (callable) – a callback to be called on every ticks of job wait
- Returns
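The connection returned by connect() follows the PEP 249 flow: obtain a cursor, execute, then fetch. A guarded sketch (the query and database are illustrative, and the API call only runs when an API key is configured):

```python
import os

SQL = "SELECT symbol, COUNT(1) AS c FROM nasdaq GROUP BY symbol LIMIT 5"

# Only contact the service when an API key is configured.
if os.environ.get("TD_API_KEY"):
    import tdclient

    conn = tdclient.connect(db="sample_datasets", type="presto")
    cursor = conn.cursor()
    cursor.execute(SQL)            # runs the query as a Treasure Data job
    for row in cursor.fetchall():  # a sequence of row sequences
        print(row)
    conn.close()
```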
tdclient.connection¶
tdclient.cursor¶
-
class
tdclient.cursor.
Cursor
(api, wait_interval=5, wait_callback=None, **kwargs)[source]¶ Bases:
object
-
fetchall
()[source]¶ Fetch all (remaining) rows of a query result, returning them as a sequence of sequences (e.g. a list of tuples). Note that the cursor’s arraysize attribute can affect the performance of this operation.
-
fetchmany
(size=None)[source]¶ Fetch the next set of rows of a query result, returning a sequence of sequences (e.g. a list of tuples). An empty sequence is returned when no more rows are available.
-
fetchone
()[source]¶ Fetch the next row of a query result set, returning a single sequence, or None when no more data is available.
-
job_status
()[source]¶ Show job status
- Returns
The status information of the given job id at last execution.
-
show_job
()[source]¶ Returns detailed information of a Job
- Returns
Detailed information of a job
- Return type
dict
-
property
api
¶
-
property
description
¶
-
property
rowcount
¶
-
Model¶
Some methods of tdclient.client.Client
returns model object which represents results from REST API.
tdclient.model¶
-
class
tdclient.model.
Model
(client)[source]¶ Bases:
object
-
property
client
¶ a
tdclient.client.Client
instance
-
property
tdclient.models¶
-
tdclient.models.
BulkImport
= <class 'tdclient.bulk_import_model.BulkImport'>[source]¶ Bulk-import session on Treasure Data Service
-
tdclient.models.
Database
= <class 'tdclient.database_model.Database'>[source]¶ Database on Treasure Data Service
-
tdclient.models.
Schema
= <class 'tdclient.job_model.Schema'>[source]¶ Schema of a database table on Treasure Data Service
-
tdclient.models.
Result
= <class 'tdclient.result_model.Result'>[source]¶ Result on Treasure Data Service
-
tdclient.models.
ScheduledJob
= <class 'tdclient.schedule_model.ScheduledJob'>[source]¶ Scheduled job on Treasure Data Service
-
tdclient.models.
Schedule
= <class 'tdclient.schedule_model.Schedule'>[source]¶ Schedule on Treasure Data Service
tdclient.bulk_import_model¶
-
class
tdclient.bulk_import_model.
BulkImport
(client, **kwargs)[source]¶ Bases:
tdclient.model.Model
Bulk-import session on Treasure Data Service
-
delete_part
(part_name)[source]¶ Delete a part of a Bulk Import session
- Parameters
part_name (str) – name of a part of the bulk import session
- Returns
True if succeeded.
-
list_parts
()[source]¶ Return the list of available parts uploaded through
bulk_import_upload_part()
.- Returns
The list of bulk import part name.
- Return type
[str]
-
perform
(wait=False, wait_interval=5, wait_callback=None)[source]¶ Perform bulk import
- Parameters
wait (bool, optional) – Flag to wait for the bulk import job to finish. Default: False.
wait_interval (int, optional) – wait interval in second. Default 5.
wait_callback (callable, optional) – A callable to be called on every tick of wait interval.
-
upload_file
(part_name, fmt, file_like)[source]¶ Upload a part to Bulk Import session, from an existing file on filesystem.
- Parameters
part_name (str) – name of a part of the bulk import session
fmt (str) – format of data type (e.g. “msgpack”, “json”)
file_like (str or file-like) – a name of a file, or a file-like object contains the data
-
upload_part
(part_name, bytes_or_stream, size)[source]¶ Upload a part to bulk import session
- Parameters
part_name (str) – name of a part of the bulk import session
bytes_or_stream (file-like) – a file-like object contains the part
size (int) – the size of the part
-
STATUS_COMMITTED
= 'committed'¶
-
STATUS_COMMITTING
= 'committing'¶
-
STATUS_PERFORMING
= 'performing'¶
-
STATUS_READY
= 'ready'¶
-
STATUS_UPLOADING
= 'uploading'¶
-
property
database
¶ The name of the database that the bulk import session is working on
-
property
error_parts
¶ The number of error parts.
-
property
error_records
¶ The number of error records.
-
property
job_id
¶ Job ID
-
property
name
¶ A name of the bulk import session
-
property
status
¶ The status of the bulk import session in a string
-
property
table
¶ The name of the table that the bulk import session is working on
-
property
upload_frozen
¶ Whether the upload has been frozen.
-
property
valid_parts
¶ The number of valid parts.
-
property
valid_records
¶ The number of valid records.
-
tdclient.database_model¶
-
class
tdclient.database_model.
Database
(client, db_name, **kwargs)[source]¶ Bases:
tdclient.model.Model
Database on Treasure Data Service
-
create_log_table
(name)[source]¶ - Parameters
name (str) – name of new log table
- Returns
tdclient.model.Table
-
query
(q, **kwargs)[source]¶ Run a query on the database
- Parameters
q (str) – a query string
- Returns
tdclient.model.Job
-
table
(table_name)[source]¶ - Parameters
table_name (str) – name of a table
- Returns
tdclient.model.Table
-
PERMISSIONS
= ['administrator', 'full_access', 'import_only', 'query_only']¶
-
PERMISSION_LIST_TABLES
= ['administrator', 'full_access']¶
-
property
count
¶ Total record counts in a database.
- Type
int
-
property
created_at
¶ datetime.datetime
-
property
name
¶ a name of the database
- Type
str
-
property
org_name
¶ organization name
- Type
str
-
property
permission
¶ permission for the database (e.g. “administrator”, “full_access”, etc.)
- Type
str
-
property
updated_at
¶ datetime.datetime
-
tdclient.job_model¶
-
class
tdclient.job_model.
Job
(client, job_id, type, query, **kwargs)[source]¶ Bases:
tdclient.model.Model
Job on Treasure Data Service
-
kill
()[source]¶ Kill the job
- Returns
a string representing the status of the killed job (“queued”, “running”)
-
result_format
(fmt)[source]¶ - Parameters
fmt (str) – output format of result set
- Yields
an iterator of rows in result set
-
status
()[source]¶ - Returns
a string representing the status of the job (“success”, “error”, “killed”, “queued”, “running”)
- Return type
str
-
wait
(timeout=None, wait_interval=5, wait_callback=None)[source]¶ Sleep until the job has been finished
- Parameters
timeout (int, optional) – Timeout in seconds. No timeout by default.
wait_interval (int, optional) – wait interval in second. Default 5 seconds.
wait_callback (callable, optional) – A callable to be called on every tick of wait interval.
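wait_callback receives the Job object on every poll tick, which makes it easy to log progress during long-running queries. A sketch; report_progress is an illustrative callback, and the API call only runs when an API key is configured:

```python
import os
import time

def report_progress(job):
    # Called on every tick of wait_interval with the Job being polled.
    print("{} job {} is {}".format(time.strftime("%H:%M:%S"), job.job_id, job.status()))

# Only contact the service when an API key is configured.
if os.environ.get("TD_API_KEY"):
    import tdclient

    with tdclient.Client() as td:
        job = td.query("sample_datasets", "SELECT COUNT(1) FROM www_access", type="presto")
        job.wait(timeout=600, wait_interval=10, wait_callback=report_progress)
```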
-
FINISHED_STATUS
= ['success', 'error', 'killed']¶
-
JOB_PRIORITY
= {-2: 'VERY LOW', -1: 'LOW', 0: 'NORMAL', 1: 'HIGH', 2: 'VERY HIGH'}¶
-
STATUS_BOOTING
= 'booting'¶
-
STATUS_ERROR
= 'error'¶
-
STATUS_KILLED
= 'killed'¶
-
STATUS_QUEUED
= 'queued'¶
-
STATUS_RUNNING
= 'running'¶
-
STATUS_SUCCESS
= 'success'¶
-
property
database
¶ a string representing the name of the database that the job is running on
-
property
debug
¶ a
dict
of debug output (e.g. “cmdout”, “stderr”)
-
property
id
¶ a string represents the identifier of the job
-
property
job_id
¶ a string represents the identifier of the job
-
property
linked_result_export_job_id
¶ Linked result export job ID from query job
-
property
num_records
¶ the number of records of job result
-
property
org_name
¶ organization name
-
property
priority
¶ a string represents the priority of the job (e.g. “NORMAL”, “HIGH”, etc.)
-
property
query
¶ a string represents the query string of the job
-
property
result_export_target_job_id
¶ Associated query job ID from result export job ID
-
property
result_schema
¶ an array of array represents the type of result columns (Hive specific) (e.g. [[“_c1”, “string”], [“_c2”, “bigint”]])
-
property
result_size
¶ the length of job result
-
property
result_url
¶ a string of URL of the result on Treasure Data Service
-
property
retry_limit
¶ a number for automatic retry count
-
property
type
¶ a string represents the engine type of the job (e.g. “hive”, “presto”, etc.)
-
property
url
¶ a string of URL of the job on Treasure Data Service
-
property
user_name
¶ executing user name
-
tdclient.result_model¶
tdclient.schedule_model¶
-
class
tdclient.schedule_model.
Schedule
(client, *args, **kwargs)[source]¶ Bases:
tdclient.model.Model
Schedule on Treasure Data Service
-
run
(time, num=None)[source]¶ Run a scheduled job
- Parameters
time (int) – Time in Unix epoch format that would be set as TD_SCHEDULED_TIME
num (int) – Indicates how many times the query will be executed. Value should be 9 or less.
- Returns
-
property
created_at
¶ Create date
- Type
datetime.datetime
-
property
cron
¶ The configured schedule of a scheduled job.
Returns a string representing the schedule in cron form, or None if the job is not scheduled to run (saved query)
-
property
database
¶ The target database of a scheduled job
-
property
delay
¶ A delay ensures all buffered events are imported before running the query.
-
property
name
¶ The name of a scheduled job
-
property
next_time
¶ Schedule for next run
- Type
datetime.datetime
-
property
org_name
¶ add docstring
- Type
TODO
-
property
priority
¶ The priority of a scheduled job
-
property
query
¶ The query string of a scheduled job
-
property
result_url
¶ The result output configuration in URL form of a scheduled job
-
property
retry_limit
¶ Automatic retry count.
-
property
timezone
¶ The time zone of a scheduled job
-
property
type
¶ Query type. {“presto”, “hive”}.
-
property
user_name
¶ User name of a scheduled job
-
-
class
tdclient.schedule_model.
ScheduledJob
(client, scheduled_at, job_id, type, query, **kwargs)[source]¶ Bases:
tdclient.job_model.Job
Scheduled job on Treasure Data Service
-
property
scheduled_at
¶ a
datetime.datetime
represents the schedule of next invocation of the job
-
property
tdclient.table_model¶
-
class
tdclient.table_model.
Table
(*args, **kwargs)[source]¶ Bases:
tdclient.model.Model
Database table on Treasure Data Service
-
export_data
(storage_type, **kwargs)[source]¶ Export data from Treasure Data Service
- Parameters
storage_type (str) – type of the storage
**kwargs (dict) –
optional parameters. Assuming the following keys:
- access_key_id (str):
ID to access the information to be exported.
- secret_access_key (str):
Password for the access_key_id.
- file_prefix (str, optional):
Filename of exported file. Default: “<database_name>/<table_name>”
- file_format (str, optional):
File format of the information to be exported. {“jsonl.gz”, “tsv.gz”, “json.gz”}
- from (int, optional):
Start time of the data to be exported, in Unix epoch format.
- to (int, optional):
End time of the data to be exported, in Unix epoch format.
- assume_role (str, optional):
Assume role.
- bucket (str):
Name of bucket to be used.
- domain_key (str, optional):
Job domain key.
- pool_name (str, optional):
For Presto only. Pool name to be used, if not specified, default pool would be used.
- Returns
-
import_data
(format, bytes_or_stream, size, unique_id=None)[source]¶ Import data into Treasure Data Service
- Parameters
format (str) – format of data type (e.g. “msgpack.gz”)
bytes_or_stream (str or file-like) – a byte string or a file-like object contains the data
size (int) – the length of the data
unique_id (str) – a unique identifier of the data
- Returns
a float representing the elapsed time in seconds to import the data
-
import_file
(format, file, unique_id=None)[source]¶ Import data into Treasure Data Service, from an existing file on filesystem.
This method will decompress/deserialize records from the given file, and then convert them into a format acceptable to the Treasure Data Service (“msgpack.gz”).
- Parameters
file (str or file-like) – a name of a file, or a file-like object contains the data
unique_id (str) – a unique identifier of the data
- Returns
float represents the elapsed time to import data
-
tail
(count, to=None, _from=None)[source]¶ - Parameters
count (int) – Number of records to show from the end.
to – Deprecated parameter.
_from – Deprecated parameter.
- Returns
the contents of the table in reverse order based on the registered time (last data first).
-
property
count
¶ total number of records in the table
- Type
int
-
property
created_at
¶ Created datetime
- Type
datetime.datetime
-
property
database_name
¶ a string represents the name of the database
-
property
db_name
¶ a string represents the name of the database
-
property
estimated_storage_size
¶ estimated storage size
-
property
estimated_storage_size_string
¶ a string represents estimated size of the table in human-readable format
-
property
expire_days
¶ an int represents the days until expiration
-
property
identifier
¶ a string identifier of the table
-
property
last_import
¶ datetime of the last import
- Type
datetime.datetime
-
property
last_log_timestamp
¶ datetime of the last log timestamp
- Type
datetime.datetime
-
property
name
¶ a string represents the name of the table
-
property
permission
¶ permission for the database (e.g. “administrator”, “full_access”, etc.)
- Type
str
-
property
primary_key
¶ add docstring
- Type
TODO
-
property
primary_key_type
¶ add docstring
- Type
TODO
-
property
schema
¶ The list of a schema
- Type
[[column_name:str, column_type:str, alias:str]]
-
property
table_name
¶ a string represents the name of the table
-
property
type
¶ a string represents the type of the table
-
property
updated_at
¶ Updated datetime
- Type
datetime.datetime
-
tdclient.user_model¶
-
class
tdclient.user_model.
User
(client, name, org_name, role_names, email, **kwargs)[source]¶ Bases:
tdclient.model.Model
User on Treasure Data Service
-
property
email
¶ e-mail address
- Type
str
-
property
name
¶ name of the user
- Type
str
-
property
org_name
¶ organization name
- Type
str
-
property
role_names
¶ add docstring
- Type
TODO
-
API¶
tdclient.api.API
class is an internal class that represents the API.
tdclient.api¶
-
class
tdclient.api.
API
(apikey=None, user_agent=None, endpoint=None, headers=None, retry_post_requests=False, max_cumul_retry_delay=600, http_proxy=None, **kwargs)[source]¶ Bases:
tdclient.bulk_import_api.BulkImportAPI,
tdclient.connector_api.ConnectorAPI,
tdclient.database_api.DatabaseAPI,
tdclient.export_api.ExportAPI,
tdclient.import_api.ImportAPI,
tdclient.job_api.JobAPI,
tdclient.partial_delete_api.PartialDeleteAPI,
tdclient.result_api.ResultAPI,
tdclient.schedule_api.ScheduleAPI,
tdclient.server_status_api.ServerStatusAPI,
tdclient.table_api.TableAPI,
tdclient.user_api.UserAPI
Internal API class
- Parameters
apikey (str) – the API key of Treasure Data Service. If None is given, TD_API_KEY will be used if available.
user_agent (str) – custom User-Agent.
endpoint (str) – custom endpoint URL. If None is given, TD_API_SERVER will be used if available.
headers (dict) – custom HTTP headers.
retry_post_requests (bool) – Specify whether allowing API client to retry POST requests. False by default.
max_cumul_retry_delay (int) – maximum retry limit in seconds. 600 seconds by default.
http_proxy (str) – HTTP proxy setting. if None is given, HTTP_PROXY will be used if available.
-
DEFAULT_ENDPOINT
= 'https://api.treasuredata.com/'¶
-
DEFAULT_IMPORT_ENDPOINT
= 'https://api-import.treasuredata.com/'¶
-
property
apikey
¶
-
property
endpoint
¶
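Although tdclient.api.API is an internal class (tdclient.Client is the public entry point), its constructor arguments document how configuration flows through the library. A sketch of constructing it explicitly, with a placeholder API key:

```python
def make_api(apikey="1/0123456789abcdef"):
    """Construct the internal API object with explicit settings.

    The apikey value here is a placeholder; in practice TD_API_KEY is
    read from the environment when apikey is None.
    """
    import tdclient.api

    api = tdclient.api.API(
        apikey=apikey,
        endpoint="https://api.treasuredata.com/",  # same as DEFAULT_ENDPOINT
        retry_post_requests=False,   # default: POST requests are not retried
        max_cumul_retry_delay=600,   # default: stop retrying after 600 seconds
    )
    return api.apikey, api.endpoint
```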
tdclient.bulk_import_api¶
-
class
tdclient.bulk_import_api.
BulkImportAPI
[source]¶ Bases:
object
Enable bulk importing of data to the targeted database and table.
This class is inherited by
tdclient.api.API
.-
bulk_import_delete_part
(name, part_name, params=None)[source]¶ Delete the imported information with the specified name.
- Parameters
name (str) – Bulk import name.
part_name (str) – Bulk import part name.
params (dict, optional) – Extra parameters.
- Returns
True if succeeded.
-
bulk_import_error_records
(name, params=None)[source]¶ List the records that have errors under the specified bulk import name.
- Parameters
name (str) – Bulk import name.
params (dict, optional) – Extra parameters.
- Yields
Row of the data
-
bulk_import_upload_file
(name, part_name, format, file, **kwargs)[source]¶ Upload a file to the bulk import with the specified name.
- Parameters
name (str) – Bulk import name.
part_name (str) – Bulk import part name.
format (str) – Format name. {msgpack, json, csv, tsv}
file (file-like) – Byte string or file-like object contains the data.
**kwargs – Extra arguments.
-
bulk_import_upload_part
(name, part_name, stream, size)[source]¶ Upload a part to the bulk import with the specified name and part name.
- Parameters
name (str) – Bulk import name.
part_name (str) – Bulk import part name.
stream (str or file-like) – Byte string or file-like object containing the data.
size (int) – The length of the data.
-
commit_bulk_import
(name, params=None)[source]¶ Commit the bulk import information having the specified name.
- Parameters
name (str) – Bulk import name.
params (dict, optional) – Extra parameters.
- Returns
True if succeeded.
-
create_bulk_import
(name, db, table, params=None)[source]¶ Create a bulk import session for the targeted database and table, stored in the default resource pool. The default expiration for a bulk import is 30 days.
- Parameters
name (str) – Name of the bulk import.
db (str) – Name of target database.
table (str) – Name of target table.
params (dict, optional) – Extra parameters.
- Returns
True if succeeded
-
delete_bulk_import
(name, params=None)[source]¶ Delete the imported information with the specified name
- Parameters
name (str) – Name of bulk import.
params (dict, optional) – Extra parameters.
- Returns
True if succeeded
-
freeze_bulk_import
(name, params=None)[source]¶ Freeze the bulk import with the specified name.
- Parameters
name (str) – Bulk import name.
params (dict, optional) – Extra parameters.
- Returns
True if succeeded.
-
list_bulk_import_parts
(name, params=None)[source]¶ Return the list of available parts uploaded through
bulk_import_upload_part()
.- Parameters
name (str) – Name of bulk import.
params (dict, optional) – Extra parameters.
- Returns
The list of bulk import part name.
- Return type
[str]
-
list_bulk_imports
(params=None)[source]¶ Return the list of available bulk imports.
- Parameters
params (dict, optional) – Extra parameters.
- Returns
The list of available bulk import details.
- Return type
[dict]
-
perform_bulk_import
(name, params=None)[source]¶ Execute a job to perform the bulk import with the indicated priority, using the specified resource pool if given, otherwise the account’s default.
- Parameters
name (str) – Bulk import name.
params (dict, optional) – Extra parameters.
- Returns
Job ID
- Return type
str
-
show_bulk_import
(name)[source]¶ Show the details of the bulk import with the specified name
- Parameters
name (str) – Name of bulk import.
- Returns
Detailed information of the bulk import.
- Return type
dict
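The bulk import methods above compose into a create → upload → freeze → perform → commit lifecycle. A sketch, where `api` is a tdclient.api.API instance and the session, database, and table names are placeholders:

```python
def bulk_import_csv(api, path):
    """Walk one CSV file through the bulk import lifecycle.

    `api` is a tdclient.api.API instance; "session_1", "sample_db",
    and "www_access" are placeholder names.
    """
    session = "session_1"
    api.create_bulk_import(session, "sample_db", "www_access")
    try:
        with open(path, "rb") as f:
            # Upload a single part; large inputs would be split into
            # several parts with distinct part names.
            api.bulk_import_upload_file(session, "part_1", "csv", f)
        api.freeze_bulk_import(session)            # no more parts accepted
        job_id = api.perform_bulk_import(session)  # returns the job ID
        # ...wait for the perform job to finish, then:
        api.commit_bulk_import(session)
        return job_id
    except Exception:
        # Clean up the half-finished session on failure.
        api.delete_bulk_import(session)
        raise
```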
-
tdclient.connector_api¶
-
class
tdclient.connector_api.
ConnectorAPI
[source]¶ Bases:
object
Access Data Connector API which handles Data Connector.
This class is inherited by
tdclient.api.API
.-
connector_create
(name, database, table, job, params=None)[source]¶ Create a Data Connector session.
- Parameters
name (str) – name of the connector job
database (str) – name of the database to perform connector job
table (str) – name of the table to perform connector job
job (dict) –
dict
representation of load.ymlparams (dict, optional) –
Extra parameters
- config (str):
Embulk configuration as JSON format. See also https://www.embulk.org/docs/built-in.html#embulk-configuration-file-format
- cron (str, optional):
Schedule of the query. {
"@daily"
,"@hourly"
,"10 * * * *"
(custom cron)} See also: https://support.treasuredata.com/hc/en-us/articles/360001451088-Scheduled-Jobs-Web-Console
- delay (int, optional):
A delay ensures all buffered events are imported before running the query. Default: 0
- database (str):
Target database for the Data Connector session
- name (str):
Name of the Data Connector session
- table (str):
Target table for the Data Connector session
- time_column (str, optional):
Column in the table for registering config.out.time
- timezone (str):
Timezone for scheduled Data Connector session. See here for list of supported timezones https://gist.github.com/frsyuki/4533752
- Returns
dict
-
connector_delete
(name)[source]¶ Delete a Data Connector session.
- Parameters
name (str) – name of the connector job
- Returns
dict
-
connector_guess
(job)[source]¶ Guess the Data Connector configuration
- Parameters
job (dict) –
dict
representation of seed.yml- Returns
dict
-
connector_history
(name)[source]¶ Show the execution history of the Data Connector session.
- Parameters
name (str) – name of the connector job
- Returns
list
-
connector_issue
(db, table, job)[source]¶ Create a Data Connector job.
- Parameters
db (str) – name of the database to perform connector job
table (str) – name of the table to perform connector job
job (dict) –
dict
representation of load.yml
- Returns
job Id
- Return type
str
-
connector_preview
(job)[source]¶ Show the preview of the Data Connector job.
- Parameters
job (dict) –
dict
representation of load.yml- Returns
dict
-
connector_run
(name, **kwargs)[source]¶ Create a job to execute Data Connector session.
- Parameters
name (str) – name of the connector job
**kwargs (optional) –
Extra parameters.
- scheduled_time (int):
Time in Unix epoch format that would be set as TD_SCHEDULED_TIME.
- domain_key (str):
Job domain key which is assigned to a single job.
- Returns
dict
-
connector_show
(name)[source]¶ Show the information of a specific Data Connector session.
- Parameters
name (str) – name of the connector job
- Returns
dict
-
connector_update
(name, job)[source]¶ Update a specific Data Connector session.
- Parameters
name (str) – name of the connector job
job (dict) –
dict
representation of load.yml. For detailed format, see also: https://www.embulk.org/docs/built-in.html#embulk-configuration-file-format
- Returns
dict
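A common Data Connector flow is to guess a full load configuration from a seed, preview the parsed rows, then issue a one-shot load job. A sketch, where `api` is a tdclient.api.API instance and all credentials, bucket, database, and table names are placeholders:

```python
def run_connector_load(api):
    """Guess a load config from a seed, preview it, then issue the job.

    `api` is a tdclient.api.API instance; the S3 credentials, bucket,
    and the target database/table names are placeholders.
    """
    seed = {
        "in": {
            "type": "s3",
            "access_key_id": "XXXXXXXXXX",
            "secret_access_key": "YYYYYYYYYY",
            "bucket": "sample-bucket",
            "path_prefix": "path/to/sample_file",
        },
        "out": {"mode": "append"},
    }
    load = api.connector_guess(seed)     # fills in parser settings etc.
    print(api.connector_preview(load))   # inspect the first parsed rows
    # Issue the load against placeholder database/table names.
    job_id = api.connector_issue("sample_db", "www_access", load)
    return job_id
```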
-
tdclient.database_api¶
-
class
tdclient.database_api.
DatabaseAPI
[source]¶ Bases:
object
Access to Database of Treasure Data Service.
This class is inherited by
tdclient.api.API
.-
create_database
(db, params=None)[source]¶ Create a new database with the given name.
- Parameters
db (str) – Target database name.
params (dict) – Extra parameters.
- Returns
True if succeeded.
- Return type
bool
-
tdclient.export_api¶
-
class
tdclient.export_api.
ExportAPI
[source]¶ Bases:
object
Access to Export API.
This class is inherited by
tdclient.api.API
.-
export_data
(db, table, storage_type, params=None)[source]¶ Creates a job to export the contents from the specified database and table names.
- Parameters
db (str) – Target database name.
table (str) – Target table name.
storage_type (str) – Name of storage type. e.g. “s3”
params (dict) –
Extra parameters. Assuming the following keys:
- access_key_id (str):
ID to access the information to be exported.
- secret_access_key (str):
Password for the access_key_id.
- file_prefix (str, optional):
Filename of exported file. Default: “<database_name>/<table_name>”
- file_format (str, optional):
File format of the information to be exported. {“jsonl.gz”, “tsv.gz”, “json.gz”}
- from (int, optional):
Start time of the data to be exported, in Unix epoch format.
- to (int, optional):
End time of the data to be exported, in Unix epoch format.
- assume_role (str, optional):
Assume role.
- bucket (str):
Name of bucket to be used.
- domain_key (str, optional):
Job domain key.
- pool_name (str, optional):
For Presto only. Pool name to be used; if not specified, the default pool is used.
- Returns
Job ID.
- Return type
str
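The parameters above map directly onto a call like the following sketch, where `api` is a tdclient.api.API instance and the keys, bucket, database, and table names are placeholders:

```python
def export_to_s3(api):
    """Create a job exporting a table to S3; returns the job ID.

    `api` is a tdclient.api.API instance; the credentials, bucket,
    and database/table names are placeholders.
    """
    return api.export_data(
        "sample_db",
        "www_access",
        "s3",  # storage type
        params={
            "access_key_id": "XXXXXXXXXX",
            "secret_access_key": "YYYYYYYYYY",
            "bucket": "sample-bucket",
            "file_format": "jsonl.gz",  # one of jsonl.gz / tsv.gz / json.gz
        },
    )
```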
-
tdclient.import_api¶
-
class
tdclient.import_api.
ImportAPI
[source]¶ Bases:
object
Import data into Treasure Data Service.
This class is inherited by
tdclient.api.API
.-
import_data
(db, table, format, bytes_or_stream, size, unique_id=None)[source]¶ Import data into Treasure Data Service
This method expects data from a file-like object formatted with “msgpack.gz”.
- Parameters
db (str) – name of a database
table (str) – name of a table
format (str) – format of data type (e.g. “msgpack.gz”)
bytes_or_stream (str or file-like) – a byte string or a file-like object containing the data
size (int) – the length of the data
unique_id (str) – a unique identifier of the data
- Returns
a float representing the elapsed time of the import, in seconds
-
import_file
(db, table, format, file, unique_id=None, **kwargs)[source]¶ Import data into Treasure Data Service, from an existing file on filesystem.
This method will decompress/deserialize records from the given file, and then convert them into a format acceptable to Treasure Data Service (“msgpack.gz”). This method is a wrapper function to import_data.
- Parameters
db (str) – name of a database
table (str) – name of a table
format (str) – format of data type (e.g. “msgpack”, “json”)
file (str or file-like) – the name of a file, or a file-like object containing the data
unique_id (str) – a unique identifier of the data
- Returns
a float representing the elapsed time of the import, in seconds
-
tdclient.job_api¶
-
class
tdclient.job_api.
JobAPI
[source]¶ Bases:
object
Access to Job API
This class is inherited by
tdclient.api.API
.-
job_result
(job_id)[source]¶ Return the job result.
- Parameters
job_id (int) – Job ID
- Returns
Job result in
list
-
job_result_each
(job_id)[source]¶ Yield a row of the job result.
- Parameters
job_id (int) – Job ID
- Yields
Row in a result
-
job_result_format
(job_id, format)[source]¶ Return the job result with specified format.
- Parameters
job_id (int) – Job ID
format (str) – Output format of the job result information. “json” or “msgpack”
- Returns
The query result of the specified job, in the given format.
-
job_result_format_each
(job_id, format)[source]¶ Yield a row of the job result with specified format.
- Parameters
job_id (int) – job ID
format (str) – Output format of the job result information. “json” or “msgpack”
- Yields
A row of the query result of the specified job, in the given format.
-
job_status
(job_id)[source]¶ Show job status.
- Parameters
job_id (str) – job ID
- Returns
The status information of the given job id at last execution.
-
kill
(job_id)[source]¶ Stop the specific job if it is running.
- Parameters
job_id (str) – Job Id to kill
- Returns
Job status before killing
-
list_jobs
(_from=0, to=None, status=None, conditions=None)[source]¶ Show the list of Jobs.
- Parameters
_from (int) – Gets the Job from the nth index in the list. Default: 0
to (int, optional) – Gets the Job up to the nth index in the list. By default, the first 20 jobs in the list are displayed
status (str, optional) – Filter by given status. {“queued”, “running”, “success”, “error”}
conditions (str, optional) – Condition for
TIMESTAMPDIFF()
to search for slow queries. Avoid using this parameter as it can be dangerous.
- Returns
a list of
dict
which represents a job
-
query
(q, type='hive', db=None, result_url=None, priority=None, retry_limit=None, **kwargs)[source]¶ Create a job for given query.
- Parameters
q (str) – Query string.
type (str) – Query type. hive, presto, bulkload. Default: hive
db (str) – Database name.
result_url (str) – Result output URL. e.g.,
postgresql://<username>:<password>@<hostname>:<port>/<database>/<table>
priority (int or str) – Job priority. In str, “Normal”, “Very low”, “Low”, “High”, “Very high”. In int, the number in the range of -2 to 2.
retry_limit (int) – Automatic retry count.
**kwargs – Extra options.
- Returns
Job ID issued for the query
- Return type
str
-
show_job
(job_id)[source]¶ Return detailed information of a Job.
- Parameters
job_id (str) – job ID
- Returns
Detailed information of a job
- Return type
dict
-
JOB_PRIORITY
= {'HIGH': 1, 'LOW': -1, 'NORM': 0, 'NORMAL': 0, 'VERY HIGH': 2, 'VERY LOW': -2, 'VERY-HIGH': 2, 'VERY-LOW': -2, 'VERY_HIGH': 2, 'VERY_LOW': -2}¶
-
tdclient.partial_delete_api¶
-
class
tdclient.partial_delete_api.
PartialDeleteAPI
[source]¶ Bases:
object
Create a job to partially delete the contents of the table with the given time range.
This class is inherited by
tdclient.api.API
.-
partial_delete
(db, table, to, _from, params=None)[source]¶ Create a job to partially delete the contents of the table with the given time range.
- Parameters
db (str) – Target database name.
table (str) – Target table name.
to (int) – Time in Unix Epoch format indicating the End date and time of the data to be deleted. Should be set only by the hour. Minutes and seconds values will not be accepted.
_from (int) – Time in Unix Epoch format indicating the Start date and time of the data to be deleted. Should be set only by the hour. Minutes and seconds values will not be accepted.
params (dict, optional) –
Extra parameters.
- pool_name (str, optional):
Indicates the resource pool to execute this job. If not provided, the account’s default resource pool would be used.
- domain_key (str, optional):
Domain key that will be assigned to the partial delete job to be created
- Returns
Job ID.
- Return type
str
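Because the time range must fall on hour boundaries, midnight timestamps are a safe choice. A sketch deleting one day of data, where `api` is a tdclient.api.API instance and the database/table names are placeholders:

```python
from datetime import datetime, timezone

def delete_one_day(api):
    """Partially delete one day of data from a table; returns the job ID.

    `api` is a tdclient.api.API instance; "sample_db"/"www_access" are
    placeholders. Times must be set on the hour, so midnight UTC
    timestamps are used here.
    """
    start = int(datetime(2019, 1, 1, tzinfo=timezone.utc).timestamp())
    end = int(datetime(2019, 1, 2, tzinfo=timezone.utc).timestamp())
    # Note the argument order: `to` (end) comes before `_from` (start).
    return api.partial_delete("sample_db", "www_access", end, start)
```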
-
tdclient.result_api¶
-
class
tdclient.result_api.
ResultAPI
[source]¶ Bases:
object
Access to Result API.
This class is inherited by
tdclient.api.API
.-
create_result
(name, url, params=None)[source]¶ Create a new authentication with the specified name.
- Parameters
name (str) – Authentication name.
url (str) – Url of the authentication to be created. e.g. “ftp://test.com/”
params (dict, optional) – Extra parameters.
- Returns
True if succeeded.
- Return type
bool
-
tdclient.schedule_api¶
-
class
tdclient.schedule_api.
ScheduleAPI
[source]¶ Bases:
object
Access to Schedule API
This class is inherited by
tdclient.api.API
.-
create_schedule
(name, params=None)[source]¶ Create a new scheduled query with the specified name.
- Parameters
name (str) – Scheduled query name.
params (dict, optional) –
Extra parameters.
- type (str):
Query type. {“presto”, “hive”}. Default: “hive”
- database (str):
Target database name.
- timezone (str):
Scheduled query’s timezone. e.g. “UTC” For details, see also: https://gist.github.com/frsyuki/4533752
- cron (str, optional):
Schedule of the query. {
"@daily"
,"@hourly"
,"10 * * * *"
(custom cron)} See also: https://support.treasuredata.com/hc/en-us/articles/360001451088-Scheduled-Jobs-Web-Console
- delay (int, optional):
A delay ensures all buffered events are imported before running the query. Default: 0
- query (str):
The query to be executed on schedule. See also: https://support.treasuredata.com/hc/en-us/articles/360012069493-SQL-Examples-of-Scheduled-Queries
- priority (int, optional):
Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0
- retry_limit (int, optional):
Automatic retry count. Default: 0
- engine_version (str, optional):
Engine version to be used. If none is specified, the account’s default engine version would be set. {“stable”, “experimental”}
- pool_name (str, optional):
For Presto only. Pool name to be used; if not specified, the default pool is used.
- result (str, optional):
Location where to store the result of the query. e.g. ‘tableau://user:password@host.com:1234/datasource’
- Returns
Start date time.
- Return type
datetime.datetime
-
delete_schedule
(name)[source]¶ Delete the scheduled query with the specified name.
- Parameters
name (str) – Target scheduled query name.
- Returns
Tuple of cron and query.
- Return type
(str, str)
-
history
(name, _from=0, to=None)[source]¶ Get the history details of the saved query for the past 90 days.
- Parameters
name (str) – Target name of the scheduled query.
_from (int, optional) – Indicates from which nth record in the run history would be fetched. Default: 0. Note: Count starts from zero. This means that the first record in the list has a count of zero.
to (int, optional) – Indicates up to which nth record in the run history would be fetched. Default: 20
- Returns
History of the scheduled query.
- Return type
dict
-
list_schedules
()[source]¶ Get the list of all the scheduled queries.
- Returns
The list of all the scheduled queries.
- Return type
[(name:str, cron:str, query:str, database:str, result_url:str)]
-
run_schedule
(name, time, num=None)[source]¶ Execute the specified query.
- Parameters
name (str) – Target scheduled query name.
time (int) – Time in Unix epoch format that would be set as TD_SCHEDULED_TIME
num (int, optional) – Indicates how many times the query will be executed. Value should be 9 or less. Default: 1
- Returns
[(job_id:int, type:str, scheduled_at:str)]
- Return type
list of tuple
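Creating a schedule and triggering an immediate run combines create_schedule and run_schedule. A sketch, where `api` is a tdclient.api.API instance and the schedule and database names are placeholders:

```python
import time

def schedule_daily_count(api):
    """Create a daily scheduled query, then trigger one run immediately.

    `api` is a tdclient.api.API instance; the schedule name and
    "sample_datasets" database are placeholders.
    """
    api.create_schedule(
        "daily_www_access_count",
        params={
            "type": "presto",
            "database": "sample_datasets",
            "timezone": "UTC",
            "cron": "@daily",
            "query": "SELECT COUNT(1) FROM www_access",
        },
    )
    # run_schedule sets TD_SCHEDULED_TIME to the given epoch time and
    # returns [(job_id, type, scheduled_at)].
    return api.run_schedule("daily_www_access_count", int(time.time()))
```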
-
update_schedule
(name, params=None)[source]¶ Update the scheduled query.
- Parameters
name (str) – Target scheduled query name.
params (dict) –
Extra parameters.
- type (str):
Query type. {“presto”, “hive”}. Default: “hive”
- database (str):
Target database name.
- timezone (str):
Scheduled query’s timezone. e.g. “UTC” For details, see also: https://gist.github.com/frsyuki/4533752
- cron (str, optional):
Schedule of the query. {
"@daily"
,"@hourly"
,"10 * * * *"
(custom cron)} See also: https://support.treasuredata.com/hc/en-us/articles/360001451088-Scheduled-Jobs-Web-Console
- delay (int, optional):
A delay ensures all buffered events are imported before running the query. Default: 0
- query (str):
The query to be executed on schedule. See also: https://support.treasuredata.com/hc/en-us/articles/360012069493-SQL-Examples-of-Scheduled-Queries
- priority (int, optional):
Priority of the query. Range is from -2 (very low) to 2 (very high). Default: 0
- retry_limit (int, optional):
Automatic retry count. Default: 0
- engine_version (str, optional):
Engine version to be used. If none is specified, the account’s default engine version would be set. {“stable”, “experimental”}
- pool_name (str, optional):
For Presto only. Pool name to be used; if not specified, the default pool is used.
- result (str, optional):
Location where to store the result of the query. e.g. ‘tableau://user:password@host.com:1234/datasource’
-
tdclient.server_status_api¶
-
class
tdclient.server_status_api.
ServerStatusAPI
[source]¶ Bases:
object
Access to Server Status API
This class is inherited by
tdclient.api.API
.
tdclient.table_api¶
-
class
tdclient.table_api.
TableAPI
[source]¶ Bases:
object
Access to Table API
This class is inherited by
tdclient.api.API
.-
change_database
(db, table, dest_db)[source]¶ Move a target table from its original database to a new destination database.
- Parameters
db (str) – Target database name.
table (str) – Target table name.
dest_db (str) – Destination database name.
- Returns
True if succeeded
- Return type
bool
-
create_log_table
(db, table)[source]¶ Create a new table in the database and register it in PlazmaDB.
- Parameters
db (str) – Target database name.
table (str) – Target table name.
- Returns
True if succeeded.
- Return type
bool
-
delete_table
(db, table)[source]¶ Delete the specified table.
- Parameters
db (str) – Target database name.
table (str) – Target table name.
- Returns
Type information of the table (e.g. “log”).
- Return type
str
-
list_tables
(db)[source]¶ Get the list of tables in the database.
- Parameters
db (str) – Target database name.
- Returns
Detailed table information.
- Return type
dict
Examples
>>> td.api.list_tables("my_db")
{'iris': {'id': 21039862, 'name': 'iris', 'estimated_storage_size': 1236,
    'counter_updated_at': '2019-09-18T07:14:28Z',
    'last_log_timestamp': datetime.datetime(2019, 1, 30, 5, 34, 42, tzinfo=tzutc()),
    'delete_protected': False,
    'created_at': datetime.datetime(2019, 1, 30, 5, 34, 42, tzinfo=tzutc()),
    'updated_at': datetime.datetime(2019, 1, 30, 5, 34, 46, tzinfo=tzutc()),
    'type': 'log', 'include_v': True, 'count': 150,
    'schema': [['sepal_length', 'double', 'sepal_length'],
               ['sepal_width', 'double', 'sepal_width'],
               ['petal_length', 'double', 'petal_length'],
               ['petal_width', 'double', 'petal_width'],
               ['species', 'string', 'species']],
    'expire_days': None,
    'last_import': datetime.datetime(2019, 9, 18, 7, 14, 28, tzinfo=tzutc())}}
-
swap_table
(db, table1, table2)[source]¶ Swap the names of the two specified tables belonging to the same database.
- Parameters
db (str) – Target database name
table1 (str) – First target table for the swap.
table2 (str) – Second target table for the swap.
- Returns
True if succeeded.
- Return type
bool
-
tail
(db, table, count, to=None, _from=None, block=None)[source]¶ Get the contents of the table in reverse order based on the registered time (last data first).
- Parameters
db (str) – Target database name.
table (str) – Target table name.
count (int) – Number of records to show from the end.
to – Deprecated parameter.
_from – Deprecated parameter.
block – Deprecated parameter.
- Returns
Contents of the table.
- Return type
[dict]
-
update_expire
(db, table, expire_days)[source]¶ Update the expire days for the specified table
- Parameters
db (str) – Target database name.
table (str) – Target table name.
expire_days (int) – Number of days after which the contents of the specified table expire.
- Returns
True if succeeded.
- Return type
bool
-
update_schema
(db, table, schema_json)[source]¶ Update the table schema.
- Parameters
db (str) – Target database name.
table (str) – Target table name.
schema_json (str) – Schema format JSON string. See also: Client.update_schema. e.g. ‘[[“sep_len”, “long”, “sep_len”], [“sep_wid”, “long”, “sep_wid”]]’
- Returns
True if succeeded.
- Return type
bool
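The schema JSON is an array of [column_name, column_type, alias] triples, so building it from Python lists is natural. A sketch, where `api` is a tdclient.api.API instance and the database/table names are placeholders:

```python
import json

def set_iris_schema(api):
    """Set a two-column schema on a table; returns True on success.

    `api` is a tdclient.api.API instance; "sample_db"/"iris" are
    placeholder names. Each schema entry is [name, type, alias].
    """
    schema = [
        ["sep_len", "double", "sep_len"],
        ["sep_wid", "double", "sep_wid"],
    ]
    return api.update_schema("sample_db", "iris", json.dumps(schema))
```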
-
tdclient.user_api¶
-
class
tdclient.user_api.
UserAPI
[source]¶ Bases:
object
Access to User API.
This class is inherited by
tdclient.api.API
.-
add_apikey
(name)[source]¶ Create a new apikey for the specified email address.
- Parameters
name (str) – User’s email address
- Returns
True if succeeded.
- Return type
bool
-
add_user
(name, org, email, password)[source]¶ Add a new user to the current account and send an invitation.
- Parameters
name (str) – User’s name
org (str) – Not used
email (str) – User’s email address
password (str) – User’s temporary password for logging-in
- Returns
True if succeeded.
- Return type
bool
-
authenticate
(user, password)[source]¶ Authenticate the indicated email address which is not authenticated via SSO.
- Parameters
user (str) – Email of the user to be authenticated.
password (str) – Must contain at least 1 letter, 1 number, and 1 special character such as the following:
r'[!#$%-_=+<>0-9a-zA-Z]'
- Returns
API key
- Return type
str
-
list_apikeys
(name)[source]¶ Get the API keys of the specified user.
- Parameters
name (str) – User’s email address
- Returns
List of API keys
- Return type
[str]
-
list_users
()[source]¶ Get the list of users for the account.
- Returns
The list of users for the account.
- Return type
[[name:str, organization:str, [user:str]]]
-
Misc¶
tdclient.errors¶
-
exception
tdclient.errors.
AlreadyExistsError
[source]¶ Bases:
tdclient.errors.APIError
-
exception
tdclient.errors.
AuthError
[source]¶ Bases:
tdclient.errors.APIError
-
exception
tdclient.errors.
DatabaseError
[source]¶ Bases:
tdclient.errors.Error
-
exception
tdclient.errors.
ForbiddenError
[source]¶ Bases:
tdclient.errors.APIError
-
exception
tdclient.errors.
InterfaceError
[source]¶ Bases:
tdclient.errors.Error
-
exception
tdclient.errors.
NotFoundError
[source]¶ Bases:
tdclient.errors.APIError
tdclient.util¶
-
tdclient.util.
create_msgpack
(items)[source]¶ Create msgpack streaming bytes from list
- Parameters
items (list of dict) – target list
- Returns
Converted msgpack streaming (bytes)
Examples
>>> t1 = int(time.time())
>>> l1 = [{"a": 1, "b": 2, "time": t1}, {"a": 3, "b": 6, "time": t1}]
>>> create_msgpack(l1)
b'¡a¡b¤timeÎ]¥X¡¡a¡b¤timeÎ]¥X¡'
-
tdclient.util.
create_url
(tmpl, **values)[source]¶ Create a URL from a template and values.
- Parameters
tmpl (str) – url template
values (dict) – values for url
Version History¶
Unreleased¶
v1.1.0 (2019-10-16)¶
Move
normalized_msgpack()
fromtdclient.api
totdclient.util
module (#79)Add
tdclient.util.create_msgpack()
to support creating msgpack streaming from list (#79)
v1.0.1 (2019-10-10)¶
Fix
wait_interval
handling forBulkImport.perform
appropriately (#74)Use
io.TextIOWrapper
to prevent"x85"
issue creating None (#77)
v1.0.0 (2019-09-27)¶
Drop Python 2 support (#60)
Remove deprecated functions as follows (#76):
TableAPI.create_item_table
UserAPI.change_email
,UserAPI.change_password
, andUserAPI.change_my_password
JobAPI.hive_query
, andJobAPI.pig_query
Support
TableAPI.tail
andTableAPI.change_database
(#64, #71)Introduce documentation site (#65, #66, #70, #72)
v0.14.0 (2019-07-11)¶
Remove ACL and account APIs (#56, #58)
Fix PyOpenSSL issue which causes pandas-td error (#59)
v0.13.0 (2019-03-29)¶
Change msgpack-python to msgpack (#50)
Dropped 3.3 support as it has already been EOL’d (#52)
Set urllib3 minimum version as v1.24.1 (#51)
v0.12.0 (2018-05-31)¶
Avoided declaring library dependencies too tightly, since this is a library project (#42)
Got rid of all configurations for Python 2.6 completely (#42)
v0.11.1 (2018-05-21)¶
Added 3.6 as test target. No functional changes have been applied since 0.11.0 (#41)
v0.11.0 (2018-05-21)¶
Support missing parameters in JOB API (#39, #40)
v0.10.0 (2017-11-01)¶
Ignore empty string in job’s
start_at
andend_at
(#35, #36)
v0.9.0 (2017-02-27)¶
Add validation to part names for bulk upload
v0.8.0 (2016-12-22)¶
Fix unicode encoding issues on Python 2.x (#27, #28, #29)
v0.7.0 (2016-12-06)¶
Fix for tdclient tables data not populating
TableAPI.list_tables
now returns a dictionary instead of a tuple
v0.6.0 (2016-09-27)¶
Generate universal wheel by default since there’s no binary in this package
Add missing support for
created_time
anduser_name
from/v3/schedule/list
API (#20, #21)Use keyword arguments for initializing model attributes (#22)
v0.5.0 (2016-06-10)¶
Prevent retry after PUT request failures. This is the same behavior as https://github.com/treasure-data/td-client-ruby (#16)
Support HTTP proxy authentication (#17)
v0.4.2 (2016-03-15)¶
Catch exceptions on parsing date time string
v0.4.1 (2016-01-19)¶
Fix Data Connector APIs based on latest td-client-ruby’s implementation (#14)
v0.4.0 (2015-12-14)¶
Avoid an exception raised when a
start
is not set for a schedule (#12)Fix getting database names of job objects (#13)
Add Data Connector APIs
Add deprecation warnings on the usage of “item tables”
Show
cumul_retry_delay
in retry messages
v0.3.2 (2015-08-01)¶
Fix bugs in
ScheduledJob
andSchedule
models
v0.3.1 (2015-07-10)¶
Fix
OverflowError
on importing integer values longer than 64 bits, which are not supported by the msgpack specification. Those values will be converted into strings.
v0.3.0 (2015-07-03)¶
Add Python Database API (PEP 0249) compatible connection and cursor.
Add validation to the part name of a bulk import. It must not contain ‘/’.
Changed default wait interval of job models from 1 second to 5 seconds.
Fix many potential problems/warnings found by landscape.io.
v0.2.1 (2015-06-20)¶
Set default timeout of API client as 60 seconds.
Change the timeout of API client from
sum(connect_timeout, read_timeout, send_timeout)
tomax(connect_timeout, read_timeout, send_timeout)
Change default user-agent of client from
TD-Client-Python:{version}
toTD-Client-Python/{version}
to comply with RFC 2616
v0.2.0 (2015-05-28)¶
Improve the job model. Now it retrieves the job values automatically after the invocation of
wait
,result
andkill
.Add a property
result_schema
toJob
model to provide the schema of job resultImprove the bulk import model. Add a convenient method named
upload_file
to upload a part from file-like object.Support CSV/TSV format on both streaming import and bulk import
Change module name;
tdclient.model
->tdclient.models
v0.1.11 (2015-05-17)¶
Fix API client to retry POST requests properly if
retry_post_requests
is set toTrue
(#5)Show warnings if imported data don’t have
time
column
v0.1.10 (2015-03-30)¶
Fixed a JSON parse error in
job.result_format("json")
with multiple result rows (#4)
Refactored model classes and tests
v0.1.9 (2015-02-26)¶
Stopped using syntax added in recent Python releases
v0.1.8 (2015-02-26)¶
Fix SSL verification errors on Python 2.7 on Windows environment. Now it uses
certifi
to verify SSL certificates if it is available.
v0.1.7 (2015-02-26)¶
Fix support for Windows environments
Fix byte encoding problem in
tdclient.api.API#import_file
on Python 3.x
v0.1.6 (2015-02-12)¶
Support specifying job priority in its name (e.g. “NORMAL”, “HIGH”, etc.)
Convert job priority number to its name (e.g. 0 => “NORMAL”, 1 => “HIGH”, etc.)
Fix a broken behavior in
tdclient.model.Job#wait
when specifying timeoutFix broken
tdclient.client.Client#database()
which is used fromtdclient.model.Table#permission()
Fix broken
tdclient.Client.Client#results()
v0.1.5 (2015-02-10)¶
Fix local variable scope problem in
tdclient.api.show_job
(#2)Fix broken multiple assignment in
tdclient.model.Job#_update_status
(#3)
v0.1.4 (2015-02-06)¶
Add new data import function of
tdclient.api.import_file
to allow importing data from file-like object or an existing file on filesystem.Fix an encoding error in
tdclient.api.import_data
on Python 2.xAdd missing import to fix broken
tdclient.model.Job#wait
Use
td.api.DEFAULT_ENDPOINT
for all requests
v0.1.3 (2015-01-24)¶
Support PEP 343 in
tdclient.Client
and removecontextlib
from exampleAdd deprecation warnings to
hive_query
andpig_query
oftdclient.api.API
Add
tdclient.model.Job#id
as an alias oftdclient.model.Job#job_id
Parse datetime properly returned from
tdclient.Client#create_schedule
Changed
tdclient.model.Job#query
as a property since it won’t be modified during the executionAllow specifying query options from
tdclient.model.Database#query
v0.1.2 (2015-01-21)¶
Fix broken PyPI identifiers
Update documentation
v0.1.1 (2015-01-21)¶
Improve the verification of SSL certificates on RedHat and variants
Implement
wait
andkill
intdclient.model.Job
Change the “Development Status” from Alpha to Beta
v0.1.0 (2015-01-15)¶
Initial public release