utils Package

Intro

copyright: 2010 Marcelo Lima
license: BSD-3-Clause

Data helper

This module reads .fits files and returns the data as Curve objects, a format better suited to working with the data. Alternatively, one may choose to receive the data in types as simple as possible, such as lists and dictionaries; by choosing this option, one loses the ability to use the pre-processing features inherited from the Curve format.

class utils.data_helper.Reader

Bases: object

The Reader class is responsible for reading the raw .fits files and providing the data in a friendlier data structure for the Python environment. It supports both single .fits files and a batch-processing approach that uses folders containing several .fits files.

Using the reader is as simple as instantiating a class. To import it and use its methods, it is only necessary to do:

import utils.data_helper as dh
myReader = dh.Reader()

Retrieved data structure

The retrieved data will be either Curve objects or simple list and dict outputs. All methods have a simple_out parameter, which defaults to simple_out=False and therefore returns the data as Curve objects. If simple_out=True is set, the data is returned as dictionaries of indexed curves in lists, which is not advisable.

New in version 0.1: The from_file and from_folder methods were added.

from_file(path=None, label=None, index=None, feature=None)

Get the white light curve from the provided .fits file, label it with the provided label (or with the folder name, if label=None), and return it as a data_struc.Curve.

Parameters:
  • path (str) – Path to the desired .fits file
  • label (str) – The label for the returned curve
  • index (int) – The HDU table index
  • feature (str) – The light curve specific feature name
Returns:

The requested light curve feature in a more suitable format

Return type:

data_struc.Curve
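The labeling rule described for from_file (use the provided label, otherwise fall back to the folder name) can be sketched with the standard library alone. The helper name resolve_label and the example path are hypothetical and not part of utils.data_helper; the sketch only illustrates the fallback, not how Reader implements it.

```python
from pathlib import Path
from typing import Optional


def resolve_label(path: str, label: Optional[str] = None) -> str:
    """Return `label` if given, else the name of the folder holding `path`.

    Hypothetical helper: it illustrates the documented fallback only.
    """
    if label is not None:
        return label
    # The name of the directory that contains the .fits file is the label.
    return Path(path).parent.name


print(resolve_label("data/confirmed/curve_001.fits"))                  # confirmed
print(resolve_label("data/confirmed/curve_001.fits", label="planet"))  # planet
```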

from_folder(folder=None, label=None, index=None, items=None)

Get the white light curve of each .fits file inside the provided folder, label each one with either the provided label or the folder name, and return a list with one data_struc.Curve for each .fits file.

Parameters:
  • folder (str) – Path to the folder with the .fits files
  • label (str) – The label for the returned light curves
  • index (int) – The HDU table index
  • items (int) – The number of random files to read from the folder
Returns:

A list of data_struc.Curve

Return type:

list
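The items parameter is described as the number of random files to read from the folder. A minimal sketch of that selection step, assuming files are matched by the .fits extension, could look like the following. The function pick_fits_files and its seed parameter are hypothetical and added only for illustration; the actual Reader may select files differently.

```python
import random
import tempfile
from pathlib import Path
from typing import List, Optional


def pick_fits_files(folder: str, items: Optional[int] = None,
                    seed: Optional[int] = None) -> List[Path]:
    """List the .fits files in `folder`, optionally sampling `items` of them.

    Hypothetical helper; `seed` is an added parameter for reproducibility.
    """
    files = sorted(Path(folder).glob("*.fits"))
    if items is None or items >= len(files):
        return files
    # Sample without replacement so no file is read twice.
    return random.Random(seed).sample(files, items)


# Demonstration on a temporary folder with five empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        (Path(tmp) / f"curve_{i}.fits").touch()
    (Path(tmp) / "notes.txt").touch()
    names_all = [f.name for f in pick_fits_files(tmp)]
    names_sampled = [f.name for f in pick_fits_files(tmp, items=2, seed=0)]
```

Sampling without replacement keeps the result free of duplicates, and ignoring non-.fits files avoids passing unrelated files to the FITS parser.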

from_txt()

list_features(folders=None)

Get only the list of features for each .fits curve file inside the listed folders.

Parameters:
  • folders (list) – Paths to the folders with the .fits files
Returns:

The list of features available for each .fits file found

Return type:

list

Data structure

This module gathers all the custom data structures created to replace the .fits format provided by the raw data-set.

It provides a direct interface from the complex representation of the astropy.table module to simple Python array variables.

class utils.data_struc.Curve(hdu=None, index=None, label=None)

Bases: object

This object holds the light curve HDU table and acts as an interface between the HDU tables of the astropy library and the Python environment. In other words, it is a simple interface that transcribes the HDU table information into simple Python variables such as dict, list and array.

For example, to create a curve object and extract data from it, the user just needs to:

import utils.data_struc as ds
curve = ds.Curve(hdu=hduTable, index=hduTableIndex, label=curveLabel)
feature = curve['FEATURE NAME']

Using the Reader makes this even more transparent. When creating a Curve directly, you need to provide:

Parameters:
  • hdu (astropy.table.Table) – The HDU table from the astropy library
  • index (int) – The desired HDU table index to be used
  • label (str) – The desired label for this light curve

Then, by indexing the curve with a column name of the table, the user gets the values of that column as an ndarray.
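The column-name access can be illustrated with a toy stand-in. The class MiniCurve below is hypothetical: it stores plain dicts of lists instead of astropy tables, returns lists rather than ndarray values, and uses made-up column names (TIME, FLUX). It only sketches the idea of keeping one table and exposing its columns by name.

```python
from typing import Dict, List, Optional


class MiniCurve:
    """Toy stand-in for a Curve: keeps one table and exposes its columns."""

    def __init__(self, tables: List[Dict[str, List[float]]],
                 index: int = -1, label: Optional[str] = None) -> None:
        # Keep only the selected table; the others are not stored.
        self.raw_table = tables[index]
        self.label = label

    def __getitem__(self, column: str) -> List[float]:
        # Indexing by column name returns that column's values.
        return self.raw_table[column]


# Illustrative data only: two "tables", the second holding the light curve.
tables = [
    {"HEADER": [0.0]},
    {"TIME": [1.0, 2.0, 3.0], "FLUX": [10.0, 10.5, 9.8]},
]
curve = MiniCurve(tables, index=1, label="example")
flux = curve["FLUX"]
```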

Advantages

One might ask why to use this particular structure instead of just using astropy.table.Table objects. Since most of the time only the last HDU table is used (usually index=2), this structure automatically removes all other tables and keeps only the desired one in a memory-mapped format, which performs better.

This approach is memory efficient because the removed information is usually about ten times larger than the retained information; that is, the first and second tables are at least ten times bigger than the third table (which is the most used one, index=2). Therefore, for this application, this approach is more efficient than loading all the data and handling it as astropy tables.

New in version 0.1: The from_file and from_folder methods were added.

index_tables(index=None)

Manually select the HDU table of the Curve object by index, replacing the raw_table variable with only the desired HDU table.

Parameters:
  • index (int) – The index of the desired HDU table

julian_to_stdtime()

Change the Julian date variable of the HDU tables to a standard time representation.
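As a rough illustration of what such a conversion involves, the sketch below turns a full Julian Date into a UTC datetime using the fact that the Unix epoch (1970-01-01T00:00:00 UTC) corresponds to Julian Date 2440587.5. The function julian_to_datetime is hypothetical and is not the method's actual implementation. It assumes the stored value is a full Julian Date; if the table stores an offset date, the offset would have to be added first, and this documentation does not say which case applies.

```python
from datetime import datetime, timedelta, timezone

# Julian Date of the Unix epoch, 1970-01-01T00:00:00 UTC.
UNIX_EPOCH_JD = 2440587.5


def julian_to_datetime(jd: float) -> datetime:
    """Convert a full Julian Date to a timezone-aware UTC datetime.

    Simplification: leap seconds and time-scale differences are ignored.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(days=jd - UNIX_EPOCH_JD)


print(julian_to_datetime(2451545.0).isoformat())  # 2000-01-01T12:00:00+00:00
```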

Visualization

This module simplifies the use of the bokeh library by reducing the amount of code needed to plot figures. For example, without this module, plotting a simple line with bokeh requires something like:

from bokeh.palettes import Magma
from bokeh.layouts import column
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.io import output_notebook, push_notebook

# x_data and y_data are the sequences of values to plot
p = figure(
    title="Some title",
    width=400,   # named plot_width / plot_height before Bokeh 3.0
    height=600)

# Style the figure image
p.grid.grid_line_alpha = 0.1
p.xgrid.band_fill_alpha = 0.1
p.xgrid.band_fill_color = Magma[10][1]
p.yaxis.axis_label = "Some label for y axis"
p.xaxis.axis_label = "Some label for x axis"

# Place the information on plot
p.line(x_data, y_data,
       legend_label="My legend label",
       line_width=2,
       color=Magma[10][2],
       muted_alpha=0.1,
       line_cap='round')  # valid caps: 'butt', 'round', 'square'
p.legend.location = "top_right"
p.legend.click_policy = "mute"  # valid policies: 'none', 'hide', 'mute'

show(p)

Which, with the visual module, can be done with just:

import utils.visual as visual

p = visual.line_plot(x_data, y_data,
                     legend_label='My legend label',
                     title='Some title',
                     y_axis={'label': 'Some label for y axis'},
                     x_axis={'label': 'Some label for x axis'})
visual.show_plot(p)

Simple as that. Each plot follows a default plotting style, defined in /utils/configs/plot.yaml, and the user only needs to pass the parameters that differ from this default style. The module also provides some pre-computation for more complex graphs, such as box plots and histograms.

Note

This is just a library to simplify most plots used in the notebooks, so that the study is not cluttered with unnecessary code. The user can use any other library to produce the same plots.

The user can also change the plot.yaml file to set any desired default plot style. For that, check the file in /utils/configs/plot.yaml.

utils.visual.box_plot(score=None, labels=None, opts=None, **kwargs)

Create a Bokeh figure object and populate its box plot properties with the provided information. A box plot in Bokeh is built by combining two segment objects, a vbar and a rect; this method does that for you. It also computes the median, the mean and each quantile.

Parameters:
  • score (list) – The list with all values of the distributions
  • labels (list) – The list with the group label for each value of score
  • opts (dict) – The desired options of the plot.yaml in dictionary format
  • kwargs – The desired options of the plot.yaml in directive format
Returns:

A Bokeh figure object with the necessary box plot properties filled

Return type:

bokeh.Figure
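The statistics that a box plot needs can be sketched with the standard library. The function box_stats below is hypothetical and is not the code used by box_plot. It assumes quartiles are computed with linear interpolation over the observed range (the "inclusive" method of statistics.quantiles); the actual quantile method and the whisker computation are not described in this documentation.

```python
from collections import defaultdict
from statistics import mean, quantiles
from typing import Dict, List


def box_stats(score: List[float], labels: List[str]) -> Dict[str, Dict[str, float]]:
    """Group `score` by `labels` and compute per-group box plot statistics."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for value, label in zip(score, labels):
        groups[label].append(value)

    stats = {}
    for label, values in groups.items():
        # Quartiles need at least two values per group.
        q1, q2, q3 = quantiles(values, n=4, method="inclusive")
        stats[label] = {"q1": q1, "median": q2, "q3": q3, "mean": mean(values)}
    return stats


# One score per observation, one group label per score.
stats = box_stats([1, 2, 3, 4, 5, 10, 20, 30], ["a"] * 5 + ["b"] * 3)
```

The q1 and q3 values would bound the box (the vbar), while the median marks the line inside it.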

utils.visual.handle_opts(default, provided)

Merge the default (set by the plot.yaml file) and user provided plot options into one dictionary.

Parameters:
  • default (dict) – The default style guide dict from plot.yaml
  • provided (dict) – The user provided properties
Returns:

A dict with the merged default and provided plot options

Return type:

dict
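A merge of this kind can be sketched as follows. The function merge_opts is a hypothetical stand-in for handle_opts, and the option keys shown are made up. The recursive handling of nested dictionaries is an assumption: the documentation only states that the two dictionaries are merged, with user options presumably taking precedence.

```python
from typing import Any, Dict


def merge_opts(default: Dict[str, Any], provided: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict where `provided` overrides `default`, key by key.

    Nested dicts are merged recursively; neither input is modified.
    """
    merged = dict(default)
    for key, value in provided.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested option groups instead of replacing them whole.
            merged[key] = merge_opts(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical option names, for illustration only.
default = {"title": "", "y_axis": {"label": "y", "type": "linear"}}
provided = {"title": "Some title", "y_axis": {"label": "Flux"}}
merged = merge_opts(default, provided)
```

Merging nested groups rather than replacing them lets a call such as y_axis={'label': ...} change one setting while keeping the remaining defaults of that group.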

utils.visual.hist_plot(hist=None, edges=None, opts=None, **kwargs)

Create a Bokeh figure object and populate its histogram plot properties with the provided information. A histogram plot requires setting the correct properties of the quad object from Bokeh; this method does that for you. It also computes the correct values and creates the bins correctly.

Parameters:
  • hist (ndarray) – The hist output from numpy.histogram
  • edges (ndarray) – The histogram edges output from numpy.histogram
  • opts (dict) – The desired options of the plot.yaml in dictionary format
  • kwargs – The desired options of the plot.yaml in directive format
Returns:

A Bokeh figure object with the histogram properties filled

Return type:

bokeh.Figure
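The mapping from numpy.histogram output to quad rectangles can be sketched without any plotting code. numpy.histogram returns the counts and the bin edges, with one more edge than counts; each bin then becomes a rectangle from one edge to the next. The helper quad_coords is hypothetical, and the resulting values correspond to the top, bottom, left and right arguments of Bokeh's quad glyph method.

```python
from typing import Dict, List, Sequence


def quad_coords(hist: Sequence[float], edges: Sequence[float]) -> Dict[str, List[float]]:
    """Map histogram counts and bin edges to quad rectangle coordinates."""
    if len(edges) != len(hist) + 1:
        raise ValueError("edges must have exactly one more entry than hist")
    return {
        "top": list(hist),            # bar height is the bin count
        "bottom": [0.0] * len(hist),  # bars start at zero
        "left": list(edges[:-1]),     # left edge of each bin
        "right": list(edges[1:]),     # right edge of each bin
    }


# Three bins covering [0, 1), [1, 2) and [2, 3].
coords = quad_coords([2, 5, 1], [0.0, 1.0, 2.0, 3.0])
```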

utils.visual.line_plot(x_data=None, y_data=None, opts=None, **kwargs)

Create a Bokeh figure object and populate its line properties with the provided information.

Parameters:
  • x_data (ndarray) – The ndarray with x axis values
  • y_data (ndarray) – The ndarray with y axis values
  • opts (dict) – The desired options of the plot.yaml in dictionary format
  • kwargs – The desired options of the plot.yaml in directive format
Returns:

A Bokeh figure object with the line properties filled

Return type:

bokeh.Figure

utils.visual.multline_plot(x_data=None, y_data=None, opts=None, **kwargs)

Create a Bokeh figure object and populate a line object of the bokeh library for each line data provided in the y_data list parameter of this function.

Parameters:
  • x_data (list) – The list with an ndarray of x axis data for each line
  • y_data (list) – The list with an ndarray of y axis data for each line
  • opts (dict) – The desired options of the plot.yaml in dictionary format
  • kwargs – The desired options of the plot.yaml in directive format
Returns:

A Bokeh figure object with the line properties filled

Return type:

bokeh.Figure

utils.visual.show_plot(*args)

This function shows the figures provided as arguments, arranged in a column by default.

Parameters:
  • args – The bokeh.Figure objects to be shown