Misc

tdclient.errors

exception tdclient.errors.APIError
Bases: Exception

exception tdclient.errors.AlreadyExistsError
Bases: tdclient.errors.APIError

exception tdclient.errors.AuthError
Bases: tdclient.errors.APIError

exception tdclient.errors.DataError
Bases: tdclient.errors.DatabaseError

exception tdclient.errors.DatabaseError
Bases: tdclient.errors.Error

exception tdclient.errors.Error
Bases: Exception

exception tdclient.errors.ForbiddenError
Bases: tdclient.errors.APIError

exception tdclient.errors.IntegrityError
Bases: tdclient.errors.DatabaseError

exception tdclient.errors.InterfaceError
Bases: tdclient.errors.Error

exception tdclient.errors.InternalError
Bases: tdclient.errors.DatabaseError

exception tdclient.errors.NotFoundError
Bases: tdclient.errors.APIError

exception tdclient.errors.NotSupportedError
Bases: tdclient.errors.DatabaseError

exception tdclient.errors.OperationalError
Bases: tdclient.errors.DatabaseError

exception tdclient.errors.ParameterValidationError
Bases: Exception

exception tdclient.errors.ProgrammingError
Bases: tdclient.errors.DatabaseError
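
The base classes listed above determine which except clause catches which error. The sketch below mirrors part of that hierarchy with local stand-in classes so that it runs without tdclient installed; with the library available, the same names would be imported from tdclient.errors instead. The classify helper is hypothetical and exists only for illustration.

```python
# Local stand-ins mirroring the base classes listed in this section.
# With tdclient installed, import these names from tdclient.errors instead.
class APIError(Exception):
    pass


class NotFoundError(APIError):
    pass


class Error(Exception):
    pass


class DatabaseError(Error):
    pass


class ProgrammingError(DatabaseError):
    pass


class ParameterValidationError(Exception):
    pass


def classify(exc):
    """Hypothetical helper: report which family handler catches exc."""
    try:
        raise exc
    except APIError:
        return "api"
    except Error:
        return "error"
    except Exception:
        return "other"


print(classify(NotFoundError("no such table")))           # api
print(classify(ProgrammingError("bad query")))            # error
print(classify(ParameterValidationError("bad argument")))  # other
```

Note that ParameterValidationError derives directly from Exception, so a handler for APIError or Error does not catch it.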

tdclient.util

tdclient.util.create_msgpack(items)

Create msgpack streaming bytes from list

Parameters: items (list of dict) – target list
Returns: Converted msgpack streaming (bytes)

Examples

>>> import time
>>> from tdclient.util import create_msgpack
>>> t1 = int(time.time())
>>> l1 = [{"a": 1, "b": 2, "time": t1}, {"a": 3, "b": 6, "time": t1}]
>>> create_msgpack(l1)
b'\x83\xa1a\x01\xa1b\x02\xa4time\xce]\xa5X\xa1\x83\xa1a\x03\xa1b\x06\xa4time\xce]\xa5X\xa1'

tdclient.util.create_url(tmpl, **values)

Create url with values

Parameters

tmpl (str) – url template
values (dict) – values for url

tdclient.util.csv_dict_record_reader(file_like, encoding, dialect)

Yield records from a CSV input using csv.DictReader.

This is a reader suitable for use by tdclient.util.read_csv_records.

It is used to read CSV data when the column names are read from the first row in the CSV data.

Parameters

file_like – acts like an instance of io.BufferedIOBase. Reading from it returns bytes.
encoding (str) – the name of the encoding to use when turning those bytes into strings.
dialect (str) – the name of the CSV dialect to use.

Yields

For each row of CSV data read from file_like, yields a dictionary whose keys are column names (determined from the first row in the CSV data) and whose values are the column values.
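
The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: dict_records is a hypothetical name, and the use of io.TextIOWrapper for decoding is an assumption.

```python
import csv
import io


def dict_records(file_like, encoding, dialect):
    """Sketch: decode bytes from file_like and yield one dict per CSV row."""
    # Assumption: the bytes are decoded by wrapping the binary stream.
    text = io.TextIOWrapper(file_like, encoding=encoding, newline="")
    # csv.DictReader takes the column names from the first row.
    for row in csv.DictReader(text, dialect=dialect):
        yield dict(row)


data = io.BytesIO(b"time,name\n1700000000,alice\n1700000001,bob\n")
records = list(dict_records(data, "utf-8", "excel"))
print(records)
```

Every value in the yielded dictionaries is still a string at this point; any type conversion happens in a later step.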

tdclient.util.csv_text_record_reader(file_like, encoding, dialect, columns)

Yield records from a CSV input using csv.reader and explicit column names.

This is a reader suitable for use by tdclient.util.read_csv_records.

It is used to read CSV data when the column names are supplied as an explicit columns parameter.

Parameters

file_like – acts like an instance of io.BufferedIOBase. Reading from it returns bytes.
encoding (str) – the name of the encoding to use when turning those bytes into strings.
dialect (str) – the name of the CSV dialect to use.
columns – the column names to use as the dictionary keys for each row.

Yields

For each row of CSV data read from file_like, yields a dictionary whose keys are column names (determined by columns) and whose values are the column values.
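
When the CSV data has no header row, the column names have to come from the caller. A minimal sketch of that idea using csv.reader follows; text_records is a hypothetical name, and pairing names with values via zip is an assumption about how the mapping is built.

```python
import csv
import io


def text_records(file_like, encoding, dialect, columns):
    """Sketch: yield one dict per CSV row, keyed by the supplied columns."""
    text = io.TextIOWrapper(file_like, encoding=encoding, newline="")
    for row in csv.reader(text, dialect=dialect):
        # Assumption: names and values are paired positionally.
        yield dict(zip(columns, row))


# No header row here: the first line is already data.
data = io.BytesIO(b"1700000000,alice\n1700000001,bob\n")
rows = list(text_records(data, "utf-8", "excel", ["time", "name"]))
print(rows)
```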

tdclient.util.get_or_else(hashmap, key, default_value=None)

Get value or default value

It differs from the standard dict get method in its behaviour when key is present but has a value that is an empty string or a string of only spaces.

Parameters

hashmap (dict) – target
key (Any) – key
default_value (Any) – default value

Example

>>> get_or_else({'k': 'nonspace'}, 'k', 'default')
'nonspace'
>>> get_or_else({'k': ''}, 'k', 'default')
'default'
>>> get_or_else({'k': '    '}, 'k', 'default')
'default'

Returns: The value of key or default_value
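
To make the contrast with dict.get concrete, here is a minimal sketch of the behaviour shown in the example. get_or_else_sketch is a hypothetical stand-in, not the library's code, and returning non-string values unchanged is an assumption.

```python
def get_or_else_sketch(hashmap, key, default_value=None):
    """Sketch: like dict.get, but blank strings also fall back to the default."""
    value = hashmap.get(key, default_value)
    # Assumption: only str values are checked for blankness.
    if isinstance(value, str) and not value.strip():
        return default_value
    return value


settings = {"k": "    "}
print(repr(settings.get("k", "default")))                # '    '
print(repr(get_or_else_sketch(settings, "k", "default")))  # 'default'
```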

tdclient.util.guess_csv_value(s)

Determine the most appropriate type for s and return it.

Tries to interpret s as a more specific datatype, in the following order, and returns the first that succeeds:

As an integer
As a floating point value
If it is “false” or “true” (case insensitive), then as a boolean
If it is “” or “none” or “null” (case insensitive), then as None
As the string itself, unaltered

Parameters: s (str) – a string value, assumed to have been read from a CSV file.
Returns: A good guess at a more specific value (int, float, str, bool or None)
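
The order of the attempts matters, as the following sketch of the rules above illustrates. guess_value is a hypothetical re-implementation written from the description, not the library's code.

```python
def guess_value(s):
    """Sketch: try int, then float, then bool, then None, else keep the string."""
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        pass
    lowered = s.lower()
    if lowered in ("false", "true"):
        return lowered == "true"
    if lowered in ("", "none", "null"):
        return None
    return s


for text in ["10", "1.5", "TRUE", "Null", "hello"]:
    print(repr(text), "->", repr(guess_value(text)))
```

Because the integer attempt comes first, the string "1" becomes the integer 1 rather than a boolean.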

tdclient.util.merge_dtypes_and_converters(dtypes=None, converters=None)

Generate a merged dictionary from those given.

Parameters

dtypes (optional dict) – A dictionary mapping column name to “dtype” (datatype), where “dtype” may be any of the strings ‘bool’, ‘float’, ‘int’, ‘str’ or ‘guess’.
converters (optional dict) – A dictionary mapping column name to a callable. The callable should take a string as its single argument, and return the result of parsing that string.

Internally, the dtypes dictionary is converted to a temporary dictionary of the same form as converters - that is, mapping column names to callables. The “data type” string values in dtypes are converted to the Python builtins of the same name, and the value “guess” is converted to the tdclient.util.guess_csv_value callable.

Example

>>> merge_dtypes_and_converters(
...    dtypes={'col1': 'int', 'col2': 'float'},
...    converters={'col2': int},
... )
{'col1': int, 'col2': int}

Returns: (dict) A dictionary which maps column names to callables. If a column name occurs in both input dictionaries, the callable specified in converters is used.
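
The merge described above can be sketched in a few lines. This is an illustration under assumptions: merge_sketch and guess_stub are hypothetical names, guess_stub stands in for tdclient.util.guess_csv_value, and the handling of unknown dtype names (a KeyError here) is not taken from the source.

```python
def guess_stub(s):
    """Placeholder standing in for tdclient.util.guess_csv_value."""
    return s


# The dtype names listed above, mapped to callables.
DTYPE_TO_CALLABLE = {
    "bool": bool,
    "float": float,
    "int": int,
    "str": str,
    "guess": guess_stub,
}


def merge_sketch(dtypes=None, converters=None):
    """Sketch: convert dtype names to callables, then let converters win."""
    merged = {}
    for column, dtype in (dtypes or {}).items():
        merged[column] = DTYPE_TO_CALLABLE[dtype]
    # Entries in converters override entries derived from dtypes.
    merged.update(converters or {})
    return merged


merged = merge_sketch(
    dtypes={"col1": "int", "col2": "float"},
    converters={"col2": int},
)
print(merged == {"col1": int, "col2": int})  # True
```

One consequence of mapping 'bool' to the builtin of the same name: bool("false") is True in Python, since any non-empty string is truthy. A custom callable in converters may be a better fit for boolean columns.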

tdclient.util.normalize_connector_config(config)

Normalize connector config

This is a port of TD CLI’s ConnectorConfigNormalizer#normalized_config. See also: https://github.com/treasure-data/td/blob/15495f12d8645a7b3f6804098f8f8aca72de90b9/lib/td/connector_config_normalizer.rb#L7-L30

Parameters: config (dict) – A config to be normalized
Returns: Normalized configuration
Return type: dict

Examples

With only the in key in a config:

>>> config = {"in": {"type": "s3"}}
>>> normalize_connector_config(config)
{'in': {'type': 's3'}, 'out': {}, 'exec': {}, 'filters': []}

With in, out, exec, and filters in a config:

>>> config = {
...     "in": {"type": "s3"},
...     "out": {"mode": "append"},
...     "exec": {"guess_plugins": ["json"]},
...     "filters": [{"type": "speedometer"}],
... }
>>> normalize_connector_config(config)
{'in': {'type': 's3'}, 'out': {'mode': 'append'}, 'exec': {'guess_plugins': ['json']}, 'filters': [{'type': 'speedometer'}]}

tdclient.util.normalized_msgpack(value)

Recursively convert int to str if the int “overflows”.

Parameters: value (list, dict, int, float, str, bool or None) – value to be normalized

If value is a list, then all elements in the list are (recursively) normalized.

If value is a dictionary, then all the dictionary keys and values are (recursively) normalized.

If value is an integer, and outside the range -(1 << 63) to (1 << 64), then it is converted to a string.

Otherwise, value is returned unchanged.

Returns: Normalized value
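
The recursion described above can be sketched as follows. normalize_sketch is a hypothetical name, and treating the two endpoints themselves as out of range is an assumption, since the text does not say whether the bounds are inclusive.

```python
def normalize_sketch(value):
    """Sketch: recursively turn out-of-range ints into strings."""
    if isinstance(value, list):
        return [normalize_sketch(item) for item in value]
    if isinstance(value, dict):
        # Both keys and values are normalized.
        return {normalize_sketch(k): normalize_sketch(v) for k, v in value.items()}
    if isinstance(value, int):
        # Assumption: strict bounds, so the endpoints count as out of range.
        if -(1 << 63) < value < (1 << 64):
            return value
        return str(value)
    return value


record = {"id": 1 << 64, "counts": [1, -(1 << 70)], "name": "x"}
print(normalize_sketch(record))
```

In the printed result, id and the second element of counts come back as strings, while the small integer and the name are unchanged.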

tdclient.util.parse_csv_value(k, s, converters=None)

Given a CSV (string) value, work out an actual value.

Parameters

k (str) – The name of the column that the value belongs to.
s (str) – The value as read from the CSV input.
converters (optional dict) – A dictionary mapping column name to callable.

If converters is given, and there is a key matching k in converters, then converters[k](s) will be called to work out the return value. Otherwise, tdclient.util.guess_csv_value will be called with s as its argument.

Warning

No attempt is made to cope with any errors occurring in a callable from the converters dictionary. So if int is called on the string "not-an-int" the resulting ValueError is not caught.

Example

>>> parse_csv_value('col1', 'A string')
'A string'
>>> parse_csv_value('col1', '10')
10
>>> parse_csv_value('col1', '10', {'col1': float, 'col2': int})
10.0

Returns: The value for the CSV column, after parsing by a callable from converters, or after parsing by tdclient.util.guess_csv_value.
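
Since errors raised by a converter are not caught (see the warning above), one option is to wrap a converter before putting it in the converters dictionary. The sketch below shows that pattern; safe_converter is a hypothetical helper, not part of tdclient, and falling back to None is only one possible choice.

```python
def safe_converter(convert, default=None):
    """Wrap convert so that a ValueError yields default instead of propagating."""
    def wrapped(s):
        try:
            return convert(s)
        except ValueError:
            return default
    return wrapped


# A converters dictionary of the form described above.
converters = {"col1": safe_converter(int)}

print(converters["col1"]("10"))          # 10
print(converters["col1"]("not-an-int"))  # None
```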

tdclient.util.parse_date(s)

Parse date from str to datetime

TODO: parse datetime using an optional format string

For now, this does not use a format string, since the API may return dates in an ambiguous format.

Parameters: s (str) – target str
Returns: datetime

tdclient.util.read_csv_records(csv_reader, dtypes=None, converters=None, **kwargs)

Read records using csv_reader and yield the results.

tdclient.util.validate_record(record)

Check that record contains a key called “time”.

Parameters

record (dict) – a dictionary representing a data record, where the keys name the "columns".

Returns

True if there is a key called “time” (it actually checks for "time" (a string) and b"time" (a binary)). False if there is no key called “time”.
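
The check described above amounts to a membership test on two possible keys. A minimal sketch, using the hypothetical name has_time_key rather than the library function:

```python
def has_time_key(record):
    """Sketch: True if record has a "time" key, as either str or bytes."""
    return "time" in record or b"time" in record


print(has_time_key({"time": 1700000000, "a": 1}))  # True
print(has_time_key({b"time": 1700000000}))         # True
print(has_time_key({"a": 1}))                      # False
```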