Misc
tdclient.errors
- exception tdclient.errors.AlreadyExistsError
Bases: APIError
Exception raised when a resource already exists (HTTP 409).
- exception tdclient.errors.AuthError
Bases: APIError
Exception raised for authentication errors (HTTP 401).
- exception tdclient.errors.DataError
Bases: DatabaseError
Exception for errors due to problems with the processed data (PEP 249).
- exception tdclient.errors.DatabaseError
Bases: Error
Exception for errors related to the database (PEP 249).
- exception tdclient.errors.Error
Bases: Exception
Base class for database-related errors (PEP 249).
- exception tdclient.errors.ForbiddenError
Bases: APIError
Exception raised for forbidden access errors (HTTP 403).
- exception tdclient.errors.IntegrityError
Bases: DatabaseError
Exception for errors related to relational integrity (PEP 249).
- exception tdclient.errors.InterfaceError
Bases: Error
Exception for errors related to the database interface (PEP 249).
- exception tdclient.errors.InternalError
Bases: DatabaseError
Exception for internal database errors (PEP 249).
- exception tdclient.errors.NotFoundError
Bases: APIError
Exception raised when a resource is not found (HTTP 404).
- exception tdclient.errors.NotSupportedError
Bases: DatabaseError
Exception for unsupported operations (PEP 249).
- exception tdclient.errors.OperationalError
Bases: DatabaseError
Exception for errors related to database operation (PEP 249).
- exception tdclient.errors.ParameterValidationError
Bases: Exception
Exception raised when parameter validation fails.
- exception tdclient.errors.ProgrammingError
Bases: DatabaseError
Exception for programming errors (PEP 249).
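The HTTP status codes listed for the API errors suggest a simple mapping from status code to exception type. The following is a minimal sketch of that idea, not tdclient's own dispatch code: the classes are local stand-ins for the ones in tdclient.errors, the base of APIError is assumed to be Exception, and `STATUS_TO_ERROR` and `error_for_status` are hypothetical names.

```python
# Local stand-ins for illustration; the real classes live in tdclient.errors.
class APIError(Exception):
    """Stand-in base class (deriving from Exception is an assumption)."""


class AuthError(APIError):
    """HTTP 401."""


class ForbiddenError(APIError):
    """HTTP 403."""


class NotFoundError(APIError):
    """HTTP 404."""


class AlreadyExistsError(APIError):
    """HTTP 409."""


# Hypothetical lookup table built from the status codes documented above.
STATUS_TO_ERROR = {
    401: AuthError,
    403: ForbiddenError,
    404: NotFoundError,
    409: AlreadyExistsError,
}


def error_for_status(status, message):
    """Return an exception instance for an HTTP status (hypothetical helper)."""
    # Fall back to the generic stand-in base class for unlisted codes.
    return STATUS_TO_ERROR.get(status, APIError)(message)


try:
    raise error_for_status(404, "no such table")
except NotFoundError as exc:
    handled = f"not found: {exc}"
```

Because the four specific classes share the APIError base, a caller can catch a specific subclass first and fall back to `except APIError` for everything else.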
tdclient.util
- tdclient.util.create_msgpack(items: list[dict[str, Any]]) -> bytes
Create msgpack streaming bytes from list
- Parameters:
items (list of dict) – target list
- Returns:
Converted msgpack streaming (bytes)
Examples
>>> t1 = int(time.time())
>>> l1 = [{"a": 1, "b": 2, "time": t1}, {"a": 3, "b": 6, "time": t1}]
>>> create_msgpack(l1)
b'\x83\xa1a\x01\xa1b\x02\xa4time\xce]\xa5X\xa1\x83\xa1a\x03\xa1b\x06\xa4time\xce]\xa5X\xa1'
- tdclient.util.create_url(tmpl: str, **values: Any) -> str
Create url with values
- Parameters:
tmpl (str) – url template
values (dict) – values for url
- tdclient.util.csv_dict_record_reader(file_like: BinaryIO, encoding: str, dialect: str | type[Dialect]) -> Iterator[dict[str, str]]
Yield records from a CSV input using csv.DictReader.
This is a reader suitable for use by tdclient.util.read_csv_records.
It is used to read CSV data when the column names are read from the first row in the CSV data.
- Parameters:
file_like – acts like an instance of io.BufferedIOBase. Reading from it returns bytes.
encoding (str) – the name of the encoding to use when turning those bytes into strings.
dialect (str | type[csv.Dialect]) – the name of the CSV dialect to use, or a Dialect class.
- Yields:
For each row of CSV data read from file_like, yields a dictionary whose keys are column names (determined from the first row in the CSV data) and whose values are the column values.
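The behaviour described above can be approximated with the standard library alone. This is a minimal sketch under the stated description, not the library's source; `dict_record_reader_sketch` is a hypothetical name.

```python
import csv
import io


def dict_record_reader_sketch(file_like, encoding, dialect):
    """Yield one dict per CSV row; keys come from the first row."""
    # The input yields bytes, so decode it to text before using the csv module.
    text = io.TextIOWrapper(file_like, encoding=encoding, newline="")
    for row in csv.DictReader(text, dialect=dialect):
        yield dict(row)


dict_data = io.BytesIO(b"name,age\nalice,30\nbob,25\n")
dict_records = list(dict_record_reader_sketch(dict_data, "utf-8", "excel"))
```

Note that every value is still a string at this stage; turning "30" into an integer is a separate step.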
- tdclient.util.csv_text_record_reader(file_like: BinaryIO, encoding: str, dialect: str | type[Dialect], columns: list[str]) -> Iterator[dict[str, str]]
Yield records from a CSV input using csv.reader and explicit column names.
This is a reader suitable for use by tdclient.util.read_csv_records.
It is used to read CSV data when the column names are supplied as an explicit columns parameter.
- Parameters:
file_like – acts like an instance of io.BufferedIOBase. Reading from it returns bytes.
encoding (str) – the name of the encoding to use when turning those bytes into strings.
dialect (str | type[csv.Dialect]) – the name of the CSV dialect to use, or a Dialect class.
columns (list[str]) – the column names to use as keys in each yielded dictionary.
- Yields:
For each row of CSV data read from file_like, yields a dictionary whose keys are column names (determined by columns) and whose values are the column values.
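A comparable sketch for the explicit-columns case, again using only the standard library. `text_record_reader_sketch` is a hypothetical name, and pairing names with values through `zip` (which silently drops extra values when a row is longer than columns) is an assumption of this sketch rather than documented behaviour.

```python
import csv
import io


def text_record_reader_sketch(file_like, encoding, dialect, columns):
    """Yield one dict per CSV row; keys come from the columns argument."""
    text = io.TextIOWrapper(file_like, encoding=encoding, newline="")
    for row in csv.reader(text, dialect=dialect):
        # Pair each value with the caller-supplied column name.
        yield dict(zip(columns, row))


text_data = io.BytesIO(b"alice,30\nbob,25\n")
text_records = list(
    text_record_reader_sketch(text_data, "utf-8", "excel", ["name", "age"])
)
```

Here the first row is treated as data, since the column names do not come from the file.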
- tdclient.util.get_or_else(hashmap: dict[str, str], key: str, default_value: str | None = None) -> str | None
Get value or default value
It differs from the standard dict get method in its behaviour when key is present but has a value that is an empty string or a string of only spaces.
- Parameters:
hashmap (dict) – target dictionary with string values
key (str) – key to look up
default_value (str | None) – default value to return if key is missing or value is empty/whitespace
Example
>>> get_or_else({'k': 'nonspace'}, 'k', 'default')
'nonspace'
>>> get_or_else({'k': ''}, 'k', 'default')
'default'
>>> get_or_else({'k': ' '}, 'k', 'default')
'default'
- Returns:
The value of key or default_value
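To make the contrast with dict.get concrete, here is a minimal sketch of the documented behaviour. `get_or_else_sketch` is a hypothetical stand-in, and using `str.strip()` to detect blank values is an assumption about how "empty or whitespace" is tested.

```python
def get_or_else_sketch(hashmap, key, default_value=None):
    """Return hashmap[key], or default_value if missing or blank."""
    value = hashmap.get(key)
    # A missing key, an empty string and an all-whitespace string
    # are all treated the same way.
    if value is None or value.strip() == "":
        return default_value
    return value


blank = {"k": "   "}
plain_get = blank.get("k", "default")                   # dict.get keeps the blank value
sketch_get = get_or_else_sketch(blank, "k", "default")  # the sketch falls back
```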
- tdclient.util.guess_csv_value(s: str) -> int | float | str | bool | None
Determine the most appropriate type for s and return it.
Tries to interpret s as a more specific datatype, in the following order, and returns the first that succeeds:
As an integer
As a floating point value
If it is “false” or “true” (case insensitive), then as a boolean
If it is “” or “none” or “null” (case insensitive), then as None
As the string itself, unaltered
- Parameters:
s (str) – a string value, assumed to have been read from a CSV file.
- Returns:
A good guess at a more specific value (int, float, str, bool or None)
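The order of attempts listed above can be sketched as a chain of fallbacks. This is an illustrative reimplementation from the description, not the library's source; `guess_value_sketch` is a hypothetical name.

```python
def guess_value_sketch(s):
    """Try int, float, bool, None in that order; else return s unchanged."""
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        pass
    lowered = s.lower()
    if lowered in ("false", "true"):
        return lowered == "true"
    if lowered in ("", "none", "null"):
        return None
    return s


guessed = [guess_value_sketch(v) for v in ["10", "1.5", "TRUE", "", "Null", "abc"]]
```

The order matters: "10" parses as both an integer and a float, and the integer attempt comes first, so it stays an int.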
- tdclient.util.merge_dtypes_and_converters(dtypes: dict[str, str] | None = None, converters: dict[str, Callable[[str], Any]] | None = None) -> dict[str, Callable[[str], Any]]
Generate a merged dictionary from those given.
- Parameters:
dtypes (optional dict) – A dictionary mapping column name to “dtype” (datatype), where “dtype” may be any of the strings ‘bool’, ‘float’, ‘int’, ‘str’ or ‘guess’.
converters (optional dict) – A dictionary mapping column name to a callable. The callable should take a string as its single argument, and return the result of parsing that string.
Internally, the dtypes dictionary is converted to a temporary dictionary of the same form as converters - that is, mapping column names to callables. The “data type” string values in dtypes are converted to the Python builtins of the same name, and the value “guess” is converted to the tdclient.util.guess_csv_value callable.
Example
>>> merge_dtypes_and_converters(
...     dtypes={'col1': 'int', 'col2': 'float'},
...     converters={'col2': int},
... )
{'col1': int, 'col2': int}
- Returns:
(dict) A dictionary which maps column names to callables. If a column name occurs in both input dictionaries, the callable specified in converters is used.
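The merge rule (dtypes first, converters taking precedence) can be sketched in a few lines. The names `merge_sketch`, `DTYPE_TO_CALLABLE` and `guess_placeholder` are hypothetical; `guess_placeholder` only stands in for tdclient.util.guess_csv_value so that the block is self-contained, and raising KeyError for an unknown dtype string is an assumption of this sketch.

```python
def guess_placeholder(s):
    """Hypothetical stand-in for tdclient.util.guess_csv_value."""
    return s


# Each dtype string maps to the builtin of the same name, or to the guesser.
DTYPE_TO_CALLABLE = {
    "bool": bool,
    "float": float,
    "int": int,
    "str": str,
    "guess": guess_placeholder,
}


def merge_sketch(dtypes=None, converters=None):
    """Map column names to callables; converters override dtypes."""
    merged = {col: DTYPE_TO_CALLABLE[dtype] for col, dtype in (dtypes or {}).items()}
    # Applying converters last means they win when a column is in both inputs.
    merged.update(converters or {})
    return merged


merged_example = merge_sketch(
    dtypes={"col1": "int", "col2": "float"},
    converters={"col2": int},
)
```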
- tdclient.util.normalize_connector_config(config: dict[str, Any]) -> dict[str, Any]
Normalize connector config
This is a port of TD CLI’s ConnectorConfigNormalizer#normalized_config. See also: https://github.com/treasure-data/td/blob/15495f12d8645a7b3f6804098f8f8aca72de90b9/lib/td/connector_config_normalizer.rb#L7-L30
- Parameters:
config (dict) – A config to be normalized
- Returns:
Normalized configuration
- Return type:
dict
Examples
Only with in key in a config.
>>> config = {"in": {"type": "s3"}}
>>> normalize_connector_config(config)
{'in': {'type': 's3'}, 'out': {}, 'exec': {}, 'filters': []}
With in, out, exec, and filters in a config.
>>> config = {
...     "in": {"type": "s3"},
...     "out": {"mode": "append"},
...     "exec": {"guess_plugins": ["json"]},
...     "filters": [{"type": "speedometer"}],
... }
>>> normalize_connector_config(config)
{'in': {'type': 's3'}, 'out': {'mode': 'append'}, 'exec': {'guess_plugins': ['json']}, 'filters': [{'type': 'speedometer'}]}
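For the two cases shown in the examples, normalization amounts to filling in defaults for the missing sections. The sketch below covers only configs that already have a top-level in key; any other input shapes the real function accepts are not modelled here, and `normalize_sketch` is a hypothetical name.

```python
def normalize_sketch(config):
    """Fill in empty defaults for out, exec and filters (in-key case only)."""
    return {
        "in": config["in"],
        "out": config.get("out", {}),
        "exec": config.get("exec", {}),
        "filters": config.get("filters", []),
    }


only_in = normalize_sketch({"in": {"type": "s3"}})
```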
- tdclient.util.normalized_msgpack(value: Any) -> Any
Recursively convert int to str if the int “overflows”.
- Parameters:
value (list, dict, int, float, str, bool or None) – value to be normalized
If value is a list, then all elements in the list are (recursively) normalized.
If value is a dictionary, then all the dictionary keys and values are (recursively) normalized.
If value is an integer, and outside the range -(1 << 63) to (1 << 64), then it is converted to a string.
Otherwise, value is returned unchanged.
- Returns:
Normalized value
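The recursion described above can be sketched as follows. `normalized_sketch` is a hypothetical name, and the exact treatment of the two boundary values is an assumption: this sketch keeps integers satisfying -(1 << 63) <= value < (1 << 64), and the real function may treat the endpoints differently.

```python
def normalized_sketch(value):
    """Recursively replace out-of-range integers with their string form."""
    if isinstance(value, list):
        return [normalized_sketch(item) for item in value]
    if isinstance(value, dict):
        # Both keys and values are normalized.
        return {normalized_sketch(k): normalized_sketch(v) for k, v in value.items()}
    # Boundary handling here is an assumption of this sketch.
    if isinstance(value, int) and not (-(1 << 63) <= value < (1 << 64)):
        return str(value)
    return value


big = 1 << 70
normalized_example = normalized_sketch({"id": big, "tags": [1, big], "ok": True})
```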
- tdclient.util.parse_csv_value(k: str, s: str, converters: dict[str, Callable[[str], Any]] | None = None) -> Any
Given a CSV (string) value, work out an actual value.
- Parameters:
k (str) – The name of the column that the value belongs to.
s (str) – The value as read from the CSV input.
converters (optional dict) – A dictionary mapping column name to callable.
If converters is given, and there is a key matching k in converters, then converters[k](s) will be called to work out the return value. Otherwise, tdclient.util.guess_csv_value will be called with s as its argument.
Warning
No attempt is made to cope with any errors occurring in a callable from the converters dictionary. So if int is called on the string "not-an-int" the resulting ValueError is not caught.
Example
>>> repr(parse_csv_value('col1', 'A string'))
'A string'
>>> repr(parse_csv_value('col1', '10'))
10
>>> repr(parse_csv_value('col1', '10', {'col1': float, 'col2': int}))
10.0
- Returns:
The value for the CSV column, after parsing by a callable from converters, or after parsing by tdclient.util.guess_csv_value.
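The dispatch rule and the warning about uncaught errors can be illustrated with a short sketch. `parse_value_sketch` is a hypothetical name, and its extra `guess` parameter (defaulting to `str`) is an invention of this sketch that stands in for tdclient.util.guess_csv_value so the block runs on its own.

```python
def parse_value_sketch(k, s, converters=None, guess=str):
    """Use converters[k] when present, else fall back to the guess callable."""
    if converters is not None and k in converters:
        # Errors raised by the converter are deliberately not caught.
        return converters[k](s)
    return guess(s)


converted = parse_value_sketch("col1", "10", {"col1": float, "col2": int})

# A failing converter propagates its exception to the caller.
try:
    parse_value_sketch("col1", "not-an-int", {"col1": int})
except ValueError as exc:
    caught = exc
```

A caller that cannot trust its input therefore has to wrap the call, or supply converters that handle bad values themselves.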
- tdclient.util.parse_date(s: str | None) -> datetime | None
Parse date from str to datetime
TODO: parse datetime using an optional format string
For now, this does not use a format string since API may return date in ambiguous format :(
- Parameters:
s (str | None) – target str, or None
- Returns:
datetime or None
- tdclient.util.read_csv_records(csv_reader: Iterator[dict[str, str]], dtypes: dict[str, str] | None = None, converters: dict[str, Callable[[str], Any]] | None = None, **kwargs: Any) -> Iterator[dict[str, Any]]
Read records using csv_reader and yield the results.
- tdclient.util.validate_record(record: dict[str, Any]) -> bool
Check that record contains a key called “time”.
- Parameters:
record (dict) – a dictionary representing a data record, where the keys name the "columns".
- Returns:
True if there is a key called “time” (it actually checks for "time" (a string) and b"time" (a binary)). False if there is no key called “time”.