File import parameters¶
str or file-like
parameters specify where to read the input data from. They can be:
- a file name,
- a file object, representing a file opened in binary mode, or
- an object that acts like an instance of io.BufferedIOBase, such that reading from it returns bytes.
format
is a string specifying an input format. The following input formats are supported:
- “msgpack” - the data is MessagePack serialized.
- “json” - the data is JSON serialized.
- “csv” - the data is CSV, and will be read using the Python CSV module.
- “tsv” - the data is TSV (tab-separated data), and will be read using the Python CSV module with dialect=csv.excel_tab explicitly set.
If ".gz" is appended to the format name (for instance, "json.gz"), then the data is assumed to be gzip compressed, and will be uncompressed as it is read.
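The ".gz" handling described above can be sketched as follows. This is an illustrative reconstruction, not the library's actual internals: open_for_format is a hypothetical helper showing how a reader might strip the ".gz" suffix and wrap the stream in a gzip decompressor.

```python
import gzip
import io
import json

def open_for_format(source, fmt):
    """Hypothetical sketch: if the format name ends in ".gz", wrap the
    binary stream so reads are transparently uncompressed, and strip
    the suffix to recover the underlying format name."""
    if fmt.endswith(".gz"):
        source = gzip.GzipFile(fileobj=source, mode="rb")
        fmt = fmt[:-3]  # "json.gz" -> "json"
    return source, fmt

# A gzip-compressed JSON array of records, held in memory for the example.
records = [{"time": 1, "value": 10}]
raw = gzip.compress(json.dumps(records).encode("utf-8"))

stream, fmt = open_for_format(io.BytesIO(raw), "json.gz")
assert fmt == "json"
assert json.loads(stream.read().decode("utf-8")) == records
```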
Both MessagePack and JSON data are composed of an array of records, where each record is a dictionary (hash or mapping) of column name to column value.
In all import formats, every record must have a column named “time”.
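A MessagePack or JSON payload therefore looks like the following (shown here as JSON; the column names other than “time” are illustrative):

```python
import json

# An array of records, where each record is a dictionary of
# column name to column value; every record includes "time".
records = [
    {"time": 1700000000, "sensor": "a", "value": 1.5},
    {"time": 1700000060, "sensor": "b", "value": 2.0},
]
payload = json.dumps(records)
assert all("time" in record for record in json.loads(payload))
```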
JSON data¶
JSON data is read using the utf-8 encoding.
CSV data¶
When reading CSV data, the following parameters may also be supplied, all of which are optional:
dialect
specifies the CSV dialect. The default is csv.excel.
encoding
specifies the encoding that will be used to turn the binary input data into string data. The default encoding is "utf-8".
columns
is a list of strings, giving names for the CSV columns. The default is None, meaning that the column names will be taken from the first record in the CSV data.
dtypes
is a dictionary used to specify a datatype for individual columns, for instance {"col1": "int"}. The available datatypes are "bool", "float", "int", "str" and "guess", where "guess" means to use the function guess_csv_value.
converters
is a dictionary used to specify a function that will be used to parse individual columns, for instance {"col1": int}. The function must take a string as its single input parameter, and return a value of the required type.
If a column is named in both dtypes and converters, then the function given in converters will be used to parse that column. If a column is not named in either dtypes or converters, then it will be assumed to have datatype "guess", and will be parsed with guess_csv_value.
Note that errors raised when calling a function from the converters dictionary will not be caught. So if converters={"col1": int} and “col1” contains "not-an-int", the resulting ValueError will not be caught.
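The dtypes/converters precedence can be sketched as below. This is an illustration of the rules just described, not the library's implementation: parse_rows is a hypothetical helper, and guess_value is a stand-in for guess_csv_value (assumed here to try int, then float, then fall back to the string).

```python
import csv
import io

# Parsers for the named dtypes (excluding "guess").
DTYPE_FUNCS = {
    "bool": lambda s: s.lower() in ("1", "true"),
    "float": float,
    "int": int,
    "str": str,
}

def guess_value(s):
    """Stand-in for guess_csv_value: try int, then float, else keep the string."""
    for func in (int, float):
        try:
            return func(s)
        except ValueError:
            pass
    return s

def parse_rows(text, dtypes=None, converters=None):
    """Hypothetical sketch of the column-parsing rules described above."""
    dtypes = dtypes or {}
    converters = converters or {}
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        parsed = {}
        for col, value in row.items():
            if col in converters:
                # converters take precedence over dtypes...
                parsed[col] = converters[col](value)
            elif col in dtypes and dtypes[col] != "guess":
                parsed[col] = DTYPE_FUNCS[dtypes[col]](value)
            else:
                # ...and unnamed columns default to "guess".
                parsed[col] = guess_value(value)
        rows.append(parsed)
    return rows

text = "time,value,label\n1,2.5,x\n2,3.5,y\n"
rows = parse_rows(text, dtypes={"time": "int"}, converters={"value": float})
assert rows[0] == {"time": 1, "value": 2.5, "label": "x"}
```

Note that, as described above, an exception raised inside a converter (for example int("not-an-int")) would propagate out of parse_rows uncaught.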
To summarise, the default for reading CSV files is:
dialect=csv.excel, encoding="utf-8", columns=None, dtypes=None, converters=None
TSV data¶
When reading TSV data, the parameters that may be used are the same as for CSV, except that dialect may not be specified, and csv.excel_tab will be used.
The default for reading TSV files is:
encoding="utf-8", columns=None, dtypes=None, converters=None
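The effect of the fixed csv.excel_tab dialect can be seen with the standard library directly. This is a minimal sketch using csv.DictReader on an in-memory string, not the library's own reader:

```python
import csv
import io

# With dialect=csv.excel_tab, fields are split on tabs rather than commas.
tsv_text = "time\tvalue\n1\t2.5\n2\t3.5\n"
rows = list(csv.DictReader(io.StringIO(tsv_text), dialect=csv.excel_tab))
assert rows[0] == {"time": "1", "value": "2.5"}
```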