File import parameters

str or file-like parameters specify where to read the input data from. They can be:

  • a file name.

  • a file object, representing a file opened in binary mode.

  • an object that acts like an instance of io.BufferedIOBase. Reading from it returns bytes.
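As a minimal sketch of how such a value might be handled (the helper name open_source is illustrative, not part of the documented interface):

    def open_source(source):
        # A file name: open the file in binary mode.
        if isinstance(source, str):
            return open(source, "rb")
        # Otherwise the object is assumed to behave like io.BufferedIOBase,
        # so reading from it returns bytes.
        return source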

format is a string specifying an input format. The following input formats are supported:

  • “msgpack” - the data is MessagePack serialized.

  • “json” - the data is JSON serialized.

  • “csv” - the data is CSV, and will be read using the Python CSV module.

  • “tsv” - the data is TSV (tab separated data), and will be read using the Python CSV module with dialect=csv.excel_tab explicitly set.

If .gz is appended to the format name (for instance, "json.gz") then the data is assumed to be gzip compressed, and will be uncompressed as it is read.
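For example, the ".gz" handling could be sketched as follows, using only the standard gzip module (wrap_for_format is an illustrative name):

    import gzip

    def wrap_for_format(fileobj, format):
        # Strip a trailing ".gz" and wrap the stream so the data is
        # uncompressed as it is read.
        if format.endswith(".gz"):
            format = format[: -len(".gz")]
            fileobj = gzip.GzipFile(fileobj=fileobj, mode="rb")
        return fileobj, format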

Both MessagePack and JSON data are composed of an array of records, where each record is a dictionary (hash or mapping) of column name to column value.

In all import formats, every record must have a column named “time”.
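For example, a two-record input deserialises to the following structure (the column names other than “time”, and the value types, are illustrative):

    records = [
        {"time": "2021-01-01T00:00:00Z", "col1": 1, "col2": "a"},
        {"time": "2021-01-01T00:00:05Z", "col1": 2, "col2": "b"},
    ]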

JSON data

JSON data is read using the utf-8 encoding.
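A minimal sketch of that step, assuming the binary input has already been opened and any gzip wrapping applied (read_json_records is an illustrative name):

    import json

    def read_json_records(fileobj):
        # Decode the bytes as utf-8, then parse the JSON array of records.
        return json.loads(fileobj.read().decode("utf-8"))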

CSV data

When reading CSV data, the following parameters may also be supplied, all of which are optional:

  • dialect specifies the CSV dialect. The default is csv.excel.

  • encoding specifies the encoding that will be used to turn the binary input data into string data. The default encoding is "utf-8".

  • columns is a list of strings, giving names for the CSV columns. The default is None, meaning that the column names will be taken from the first record in the CSV data.

  • dtypes is a dictionary used to specify a datatype for individual columns, for instance {"col1": "int"}. The available datatypes are "bool", "float", "int", "str" and "guess", where "guess" means to use the function guess_csv_value.

  • converters is a dictionary used to specify a function that will be used to parse individual columns, for instance {"col1": int}. The function must take a string as its single input parameter, and return a value of the required type.

If a column is named in both dtypes and converters, then the function given in converters will be used to parse that column.

If a column is not named in either dtypes or converters, then it will be assumed to have datatype "guess", and will be parsed with guess_csv_value.

Note that errors raised when calling a function from the converters dictionary will not be caught. So if converters={"col1": int} and “col1” contains "not-an-int", the resulting ValueError will not be caught.
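The rules above could be sketched like this (choose_parser is an illustrative name, and the guess_csv_value shown here is only a simplified stand-in for the library function named above):

    def guess_csv_value(text):
        # Simplified stand-in for the real guess_csv_value: try int,
        # then float, otherwise keep the string.
        for parse in (int, float):
            try:
                return parse(text)
            except ValueError:
                pass
        return text

    def choose_parser(column, dtypes=None, converters=None):
        dtypes = dtypes or {}
        converters = converters or {}
        if column in converters:
            # converters wins when a column is named in both dictionaries.
            return converters[column]
        if column in dtypes:
            # Map the dtype name to a parsing function; the "bool"
            # lambda is only illustrative.
            return {"bool": lambda s: s.strip().lower() in ("1", "true"),
                    "float": float, "int": int, "str": str,
                    "guess": guess_csv_value}[dtypes[column]]
        # Named in neither dictionary: datatype "guess".
        return guess_csv_value

    parse = choose_parser("col1", converters={"col1": int})
    parse("1")             # -> 1
    # parse("not-an-int") would raise ValueError, and it is not caught.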

To summarise, the default for reading CSV files is:

dialect=csv.excel, encoding="utf-8", columns=None, dtypes=None, converters=None
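Putting the pieces together, reading CSV data with those defaults could look roughly like the following (read_csv_records is an illustrative name; the per-column dtypes/converters parsing from the previous sketch is omitted for brevity):

    import csv
    import io

    def read_csv_records(fileobj, dialect=csv.excel, encoding="utf-8",
                         columns=None):
        text = io.TextIOWrapper(fileobj, encoding=encoding, newline="")
        rows = csv.reader(text, dialect=dialect)
        if columns is None:
            # Column names are taken from the first record in the data.
            columns = next(rows)
        return [dict(zip(columns, row)) for row in rows]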

TSV data

When reading TSV data, the parameters that may be used are the same as for CSV, except that:

  • dialect may not be specified, and csv.excel_tab will be used.

The default for reading TSV files is:

encoding="utf-8", columns=None, dtypes=None, converters=None
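A minimal self-contained example of the TSV path (the column names and values are illustrative, and no dtype guessing or conversion is applied, so all values stay as strings):

    import csv
    import io

    raw = b"time\tcol1\n2021-01-01T00:00:00Z\t1\n2021-01-01T00:00:05Z\t2\n"
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline="")
    records = list(csv.DictReader(text, dialect=csv.excel_tab))
    # records == [{"time": "2021-01-01T00:00:00Z", "col1": "1"},
    #             {"time": "2021-01-01T00:00:05Z", "col1": "2"}]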