pyphot.io package#
IO module#
This module provides functions for reading and writing data in various formats.
It is adapted from SimpleTable (v2.0; mfouesneau/simpletable) and has minimal dependencies.
For all formats, reading and writing preserve metadata.
The data are given as pd.DataFrame objects and the header as a HeaderInfo object.
- from_file(fname, *, format=None, **kwargs)[source]#
Read a file into a DataFrame and its header information (HeaderInfo).
- Parameters:
fname (str) – File name to read.
format (str, optional) – File format to read. If not provided, the format is inferred from the file extension.
**kwargs – Additional keyword arguments to pass to the reader.
- Returns:
df (pd.DataFrame) – DataFrame containing the data.
hdr (HeaderInfo) – Header information.
- Return type:
tuple[DataFrame, HeaderInfo]
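A minimal usage sketch for from_file (the file name is illustrative; the import assumes from_file is exported by pyphot.io as documented above):
```python
from pyphot.io import from_file

# Format inferred from the extension; "table.csv" is an illustrative file name.
df, hdr = from_file("table.csv")
print(df.head())
print(hdr.units)  # column -> unit mapping extracted from the header
```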
Submodules#
pyphot.io.ascii module#
Export dataframe to ASCII format while preserving attrs
- ascii_generate_header(df, comments='#', delimiter=' ', commented_header=True)[source]#
Generate the corresponding ascii Header that contains all necessary info
- Parameters:
df (pd.DataFrame) – table to export
comments (str) – string to prepend header lines
delimiter (str, optional) – The string used to separate values. By default, this is any whitespace.
commented_header (bool, optional) – if set, the last line of the header is expected to be the column titles
- Returns:
hdr – string that will be written at the beginning of the file
- Return type:
str
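A short sketch for ascii_generate_header, assuming the submodule import path above and that column units live in df.attrs (an assumed attrs layout, not confirmed here):
```python
import pandas as pd
from pyphot.io.ascii import ascii_generate_header

df = pd.DataFrame({"wavelength": [1.0, 2.0], "flux": [3.0, 4.0]})
df.attrs["units"] = {"wavelength": "nm", "flux": "Jy"}  # assumed attrs layout

header_block = ascii_generate_header(df, comments="#", delimiter=" ", commented_header=True)
print(header_block)  # commented lines ending with the column titles
```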
- ascii_read_header(fname, *, commentchar='#', delimiter=',', commented_header=True, **kwargs)[source]#
Read ASCII/CSV header
- Parameters:
fname (str, FilePath, BaseBuffer) – File, filename, or generator to read. Note that generators should return byte strings.
commentchar (str, optional) – The character used to indicate the start of a comment; default: '#'. ('' is equivalent to None)
delimiter (str, optional) – The string used to separate values. By default, this is any whitespace.
commented_header (bool, optional) – if set, the last line of the header is expected to be the column titles (with comment character); otherwise, the first line of the data will be the column titles
- Returns:
nlines (int) – number of lines from the header
info (HeaderInfo) – header information (header, alias, units, comments)
names (List[str]) – first data line after the header, expected to contain the column names.
- Return type:
Tuple[int, HeaderInfo, List[str]]
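A usage sketch for ascii_read_header on a hypothetical commented-header CSV file:
```python
from pyphot.io.ascii import ascii_read_header

nlines, info, names = ascii_read_header(
    "table.csv", commentchar="#", delimiter=",", commented_header=True
)
print(nlines, names)   # number of header lines and the column names
print(info.comments)   # per-column descriptions, if present in the header
```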
- from_ascii(filepath_or_buffer, *, commented_header=False, **kwargs)[source]#
Read an ASCII file into a DataFrame.
Same as from_csv with the delimiter set to " " (a single space) by default.
- Parameters:
filepath_or_buffer (str, path object or file-like object) – Any valid string path is acceptable. The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.csv.
commented_header (bool, default False) – Whether the header is commented or not.
**kwargs (dict) – Additional keyword arguments passed to pd.read_csv.
- Returns:
DataFrame (pd.DataFrame) – The parsed data as a pd.DataFrame.
header (HeaderInfo) – The header information extracted from the file.
See also
from_csv – Read a CSV file into a DataFrame
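A sketch for from_ascii on a whitespace-separated table with a commented header (file name illustrative):
```python
from pyphot.io.ascii import from_ascii

df, hdr = from_ascii("table.dat", commented_header=True)
print(df.columns.tolist(), hdr.units)
```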
- from_csv(filepath_or_buffer, *, commented_header=False, **kwargs)[source]#
Read a CSV file into a DataFrame while preserving header information
Equivalent to pd.read_csv with preserved header information.
Also supports optionally iterating over the file or breaking it into chunks.
Additional help can be found in the online docs for IO Tools.
- Parameters:
filepath_or_buffer (str, path object or file-like object) – Any valid string path is acceptable.
commented_header (bool, default False) – Whether the column definition header line starts with a comment character.
commentchar (str, default '#') – Character to treat as a comment character.
sep (str, default ',') – Character or regex pattern to treat as the delimiter. If sep=None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and automatically detect the separator from only the first valid row of the file by Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
- Returns:
DataFrame (pd.DataFrame) – The parsed data as a pd.DataFrame.
header (HeaderInfo) – The header information extracted from the file.
See also
pd.read_csv – Read a CSV file into a DataFrame
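A sketch for from_csv reading a plain CSV and a semicolon-separated variant (file names illustrative; extra keywords are forwarded to pd.read_csv):
```python
from pyphot.io.ascii import from_csv

df, hdr = from_csv("table.csv")                                      # header line not commented
df2, hdr2 = from_csv("table2.csv", commented_header=True, sep=";")   # commented header, ';' delimiter
```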
- to_ascii(self, filepath_or_buffer, *, sep=' ', commentchar='#', **kwargs)[source]#
Write object to an ASCII values file while preserving attrs
Equivalent to to_csv with default sep set to a space.
See also
to_csv – Write object to a CSV file while preserving attrs
- Parameters:
self (DataFrame)
filepath_or_buffer (str | PathLike[str] | BaseBuffer)
sep (str)
commentchar (str)
- Return type:
str | None
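A sketch calling to_ascii as a plain function with the DataFrame as first argument, per the signature above (output file name illustrative; index=False is forwarded to to_csv):
```python
import pandas as pd
from pyphot.io.ascii import to_ascii

df = pd.DataFrame({"wavelength": [1.0, 2.0], "flux": [3.0, 4.0]})
to_ascii(df, "table.dat", sep=" ", commentchar="#", index=False)
```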
- to_csv(self, filepath_or_buffer, *, sep=',', commentchar='#', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression='infer', quoting=None, quotechar='"', lineterminator=None, chunksize=None, date_format=None, doublequote=True, escapechar=None, decimal='.', errors='strict', storage_options=None)[source]#
Write object to a comma-separated values (csv) file while preserving attrs
Falls back to pd.DataFrame.to_csv if there is no attrs content.
- Parameters:
path_or_buf (str, path object, file-like object, or None, default None) – String, path object (implementing os.PathLike[str]), or file-like object implementing a write() function. If None, the result is returned as a string. If a non-binary file object is passed, it should be opened with newline=’’, disabling universal newlines. If a binary file object is passed, mode might need to contain a ‘b’.
sep (str, default ',') – String of length 1. Field delimiter for the output file.
commentchar (str, default '#') – Character starting a comment line for the output file.
na_rep (str, default '') – Missing data representation.
float_format (str, Callable, default None) – Format string for floating point numbers. If a Callable is given, it takes precedence over other numeric formatting parameters, like decimal.
columns (sequence, optional) – Columns to write.
header (bool or list of str, default True) – Write out the column names. If a list of strings is given it is assumed to be aliases for the column names.
index (bool, default True) – Write row names (index).
index_label (str or sequence, or False, default None) – Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the object uses MultiIndex. If False do not print fields for index names. Use index_label=False for easier importing in R.
mode ({'w', 'x', 'a'}, default 'w') –
Forwarded to either open(mode=) or fsspec.open(mode=) to control the file opening. Typical values include:
'w', truncate the file first.
'x', exclusive creation, failing if the file already exists.
'a', append to the end of file if it exists.
encoding (str, optional) – A string representing the encoding to use in the output file, defaults to ‘utf-8’. encoding is not supported if path_or_buf is a non-binary file object.
compression (str or dict, default 'infer') –
For on-the-fly compression of the output data. If 'infer' and 'path_or_buf' is path-like, then detect compression from the following extensions: '.gz', '.bz2', '.zip', '.xz', '.zst', '.tar', '.tar.gz', '.tar.xz' or '.tar.bz2' (otherwise no compression). Set to None for no compression. Can also be a dict with key 'method' set to one of {'zip', 'gzip', 'bz2', 'zstd', 'xz', 'tar'}; other key-value pairs are forwarded to zipfile.ZipFile, gzip.GzipFile, bz2.BZ2File, zstandard.ZstdCompressor, lzma.LZMAFile or tarfile.TarFile, respectively. As an example, the following could be passed for faster compression and to create a reproducible gzip archive: compression={'method': 'gzip', 'compresslevel': 1, 'mtime': 1}. May also be a dict with key 'method' as compression mode and other entries as additional compression options if compression mode is 'zip'.
Passing compression options as keys in dict is supported for compression modes 'gzip', 'bz2', 'zstd', and 'zip'.
quoting (optional constant from csv module) – Defaults to csv.QUOTE_MINIMAL. If you have set a float_format then floats are converted to strings and thus csv.QUOTE_NONNUMERIC will treat them as non-numeric.
quotechar (str, default '"') – String of length 1. Character used to quote fields.
lineterminator (str, optional) – The newline character or character sequence to use in the output file. Defaults to os.linesep, which depends on the OS in which this method is called (e.g. '\n' for Linux, '\r\n' for Windows).
chunksize (int or None) – Rows to write at a time.
date_format (str, default None) – Format string for datetime objects.
doublequote (bool, default True) – Control quoting of quotechar inside a field.
escapechar (str, default None) – String of length 1. Character used to escape sep and quotechar when appropriate.
decimal (str, default '.') – Character recognized as decimal separator. E.g. use ‘,’ for European data.
errors (str, default 'strict') – Specifies how encoding and decoding errors are to be handled. See the errors argument for open() for a full list of options.
storage_options (dict, optional) – Extra options that make sense for a particular storage connection, e.g. host, port, username, password, etc. For HTTP(S) URLs the key-value pairs are forwarded to urllib.request.Request as header options. For other URLs (e.g. starting with "s3://" and "gcs://") the key-value pairs are forwarded to fsspec.open. Please see fsspec and urllib for more details.
self (DataFrame)
filepath_or_buffer (str | PathLike[str] | BaseBuffer)
- Returns:
If path_or_buf is None, returns the resulting csv format as a string. Otherwise returns None.
- Return type:
None or str
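A sketch for to_csv writing a DataFrame whose attrs carry per-column comments (the attrs layout is an assumption; the output name is illustrative):
```python
import pandas as pd
from pyphot.io.ascii import to_csv

df = pd.DataFrame({"wavelength": [1.0, 2.0], "flux": [3.0, 4.0]})
df.attrs["comments"] = {"flux": "observed flux"}  # assumed attrs layout

# attrs content is serialized into commented header lines before the data.
to_csv(df, "table.csv", sep=",", commentchar="#", index=False)
```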
pyphot.io.fits module#
Module for reading and writing FITS files
Important
This module relies on astropy.io.fits
- fits_generate_hdu(df, index=True)[source]#
Generate a FITS BinTableHDU from a DataFrame.
- Parameters:
df (pd.DataFrame) – The DataFrame to convert.
index (bool, optional) – Whether to include the index in the table, by default True.
- Returns:
The generated HDU.
- Return type:
BinTableHDU
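A sketch for fits_generate_hdu on a small DataFrame (requires astropy; the data are made up):
```python
import pandas as pd
from pyphot.io.fits import fits_generate_hdu

df = pd.DataFrame({"wavelength": [1.0, 2.0], "flux": [3.0, 4.0]})
hdu = fits_generate_hdu(df, index=False)  # astropy BinTableHDU
print(hdu.columns)
```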
- fits_generate_header(df)[source]#
Generate the corresponding fits Header that contains all necessary info
- Parameters:
df (pd.DataFrame) – DataFrame or HeaderInfo instance
- Returns:
hdr – header instance
- Return type:
fits.Header
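A sketch for fits_generate_header alone (requires astropy; the data are made up):
```python
import pandas as pd
from pyphot.io.fits import fits_generate_header

df = pd.DataFrame({"wavelength": [1.0, 2.0], "flux": [3.0, 4.0]})
hdr = fits_generate_header(df)  # astropy fits.Header built from the DataFrame and its attrs
print(repr(hdr))
```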
- fits_read_header(hdr)[source]#
Convert an astropy.io.fits header into a HeaderInfo with the relevant values
- Parameters:
hdr (fits.Header) – FITS header unit
- Returns:
headerinfo – extracted information from header
- Return type:
HeaderInfo
- fix_endian_issue(arr)[source]#
Fix the byte-order (endian) issue in an array, which often occurs when reading FITS files
- Parameters:
arr (ndarray[tuple[Any, ...], dtype[_ScalarT]] | Any)
- Return type:
ndarray[tuple[Any, …], dtype[_ScalarT]]
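A sketch for fix_endian_issue with a big-endian array, as typically returned when reading FITS data:
```python
import numpy as np
from pyphot.io.fits import fix_endian_issue

big_endian = np.arange(3, dtype=">f8")   # explicit big-endian dtype
native = fix_endian_issue(big_endian)    # same values in a pandas-friendly byte order
```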
- from_fits(filename, extension_number=1)[source]#
Load a DataFrame from a FITS file.
- Parameters:
filename (str) – The path to the FITS file.
extension_number (int, optional) – The extension number to load, by default 1.
- Returns:
The loaded DataFrame and its header information.
- Return type:
Tuple[npt.NDArray, HeaderInfo]
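A sketch for from_fits loading the binary table in extension 1 (file name hypothetical):
```python
from pyphot.io.fits import from_fits

data, hdr = from_fits("photometry.fits", extension_number=1)
print(hdr.header)   # keyword/value pairs from the FITS header
```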
- to_fits(df, filename, extension_number=1, header_info=None, output_verify='exception', checksum=False, index=True, overwrite=False, append=False, **kwargs)[source]#
Save a DataFrame to a FITS file.
- Parameters:
df (pd.DataFrame) – The DataFrame to save. Header information is taken from df.attrs, or from header_info if provided.
filename (str) – The path to the FITS file.
extension_number (int, optional) – The extension number to save, by default 1.
header_info (Optional[HeaderInfo], optional) – Header information to save with the FITS file. By default None, in which case it is taken from df.attrs; overrides df.attrs if provided.
output_verify (str) – Output verification option. Must be one of "fix", "silentfix", "ignore", "warn", or "exception". May also be any combination of "fix" or "silentfix" with "+ignore", "+warn", or "+exception" (e.g. "fix+warn").
checksum (bool, optional) – If True, adds both DATASUM and CHECKSUM cards to the headers of all HDUs written to the file.
index (bool, optional) – If True, includes the index in the FITS file. Default is True.
append (bool, optional) – If True, appends the DataFrame to the FITS file. Default is False.
overwrite (bool, optional) – If True, overwrites the DataFrame in the FITS file. Default is False.
**kwargs (dict) – Additional keyword arguments to pass to the FITS writer.
- Return type:
None
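A sketch for to_fits writing a small DataFrame to a new FITS file (file name illustrative):
```python
import pandas as pd
from pyphot.io.fits import to_fits

df = pd.DataFrame({"wavelength": [1.0, 2.0], "flux": [3.0, 4.0]})
to_fits(df, "photometry.fits", extension_number=1, index=False, overwrite=True)
```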
pyphot.io.hdf module#
Read and write HDF5 files with PyTables (tables, https://www.pytables.org/) while preserving metadata.
Important
This module relies on pytables
- from_hdf5(filename, tablename=None, *, silent=True, **kwargs)[source]#
Read an HDF5 table into a DataFrame and its header information.
- Parameters:
filename (str) – file to read from
tablename (str) – node containing the table
silent (bool) – skip verbose messages
- Returns:
df (pd.DataFrame) – DataFrame containing the data.
hdr (HeaderInfo) – Header information.
- Return type:
Tuple[DataFrame, HeaderInfo]
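A sketch for from_hdf5, assuming the reader follows the package convention of returning a DataFrame and a HeaderInfo (file and node names are hypothetical):
```python
from pyphot.io.hdf import from_hdf5

df, hdr = from_hdf5("library.hdf5", tablename="/data")
```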
- to_hdf5(df, filename, *, tablename=None, header_info=None, mode='w', append=False, **kwargs)[source]#
Write a pandas DataFrame to an HDF5 file.
- Parameters:
df (pd.DataFrame) – The DataFrame to write.
filename (str or tables.File or PathLike) – The filename or open HDF5 file to write to.
tablename (str, optional) – The name of the table to write to.
header_info (HeaderInfo, optional) – The header information to write. Defaults to the content of df.attrs.
mode ({'r', 'w', 'a', 'r+'}, default 'w') – The mode to open the file in.
append (bool, default False) – Whether to append data to an existing file.
**kwargs – Additional keyword arguments to pass to tables.open_file.
- Raises:
Exception – If the HDF backend does not implement stream.
tables.FileModeError – If the file is already opened in a different mode.
ValueError – If something went wrong and pytables did not provide further information.
- Return type:
None
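A sketch for to_hdf5 writing a small table to a new HDF5 file (file and node names are hypothetical; requires pytables):
```python
import pandas as pd
from pyphot.io.hdf import to_hdf5

df = pd.DataFrame({"wavelength": [1.0, 2.0], "flux": [3.0, 4.0]})
to_hdf5(df, "library.hdf5", tablename="/data", mode="w")
```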
pyphot.io.header module#
Defines the HeaderInfo class that contains the metadata of a file.
- class HeaderInfo[source]#
Bases: object
Extracted information from FITS header
- __init__(header, alias, units, comments)#
- Parameters:
header (Dict[Hashable, Any])
alias (Dict[Hashable, str])
units (Dict[Hashable, str])
comments (Dict[Hashable, str])
- Return type:
None
- alias: Dict[Hashable, str]#
Alias dictionary which contains potential mappings of data columns to aliases
- comments: Dict[Hashable, str]#
Comments/description dictionary containing potential mappings of data columns to comments
- header: Dict[Hashable, Any]#
Header dictionary containing any metadata from a file input
- units: Dict[Hashable, str]#
Units dictionary containing potential mappings of data columns to units
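A sketch constructing a HeaderInfo by hand with the four dictionaries described above (all values illustrative):
```python
from pyphot.io.header import HeaderInfo

info = HeaderInfo(
    header={"NAME": "example table"},
    alias={"wl": "wavelength"},
    units={"wavelength": "nm", "flux": "Jy"},
    comments={"flux": "observed flux"},
)
print(info.units["flux"])  # 'Jy'
```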
pyphot.io.votable module#
VOTable parser for astronomical tabular data.
VOTable is the standard XML format for astronomical tabular data. This module implements a custom VOTableParser that relies only on XML parsing, with no additional dependencies. from_votable provides the standard interface of the io operations (pandas.DataFrame, HeaderInfo).
- class VOTableParser[source]#
Bases: object
A custom VOTable parser using XML parsing.
VOTable is the standard XML format for astronomical tabular data. This class parses the structure and extracts the data.
Initialize VOTable parser
- Parameters:
source (str or bytes) – Either a file path, URL, or XML string/bytes
is_url (bool) – If True, treat source as URL to fetch
- __init__(source, is_url=False)[source]#
Initialize VOTable parser
- Parameters:
source (str or bytes) – Either a file path, URL, or XML string/bytes
is_url (bool) – If True, treat source as URL to fetch
- from_votable(fname, *, table_index=0, is_url=False)[source]#
Read a VOTable file and return a pandas DataFrame and header information.
- Parameters:
fname (str, bytes, IOBase, PathLike) – The filename or file-like object to read.
table_index (int, optional) – The index of the table to read, by default 0.
is_url (bool, optional) – Whether the file is a URL, by default False.
- Returns:
A tuple containing the pandas DataFrame and header information.
- Return type:
Tuple[pd.DataFrame, HeaderInfo]
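A sketch for from_votable reading the first table of a local VOTable (file name hypothetical; set is_url=True to fetch a remote resource instead):
```python
from pyphot.io.votable import from_votable

df, hdr = from_votable("catalog.vot", table_index=0)
print(df.shape, hdr.units)
```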