hdf5r

HDF5 for R Reloaded
R/Finance 2018

Mario Annau

June 1, 2018

What is HDF5?

  • Store large amounts of data, e.g. tick data
  • Programming language independent file format
  • Partial I/O: Retrieve subsets of data into memory
  • High performance

2 years ago …

  • Presentation of the h5 package at R/Finance 2016
  • Built on Rcpp to interface with the HDF5 C++ API
  • Basic HDF5 features implemented

… 2 months later …

On June 21, 2016 Holger wrote:

… my name is Holger Hoefling, I have developed a new version of a wrapper library for hdf5 (R6 Classes, almost all function calls wrapped, full support for all datatypes including tables etc) …

And I replied:

On June 21, 2016 Mario wrote:

sounds interesting!

What’s different in hdf5r?

  • Automatic code generation against HDF5 C API
  • Usage of R6 (instead of S4) classes
  • Close connections during garbage collection
  • Broad coverage of low-level library features
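
The third point, closing connections during garbage collection, can be illustrated with a small language-neutral sketch. The Python below is not hdf5r's code; it only shows the general pattern of attaching a cleanup callback to an object's lifetime. The names `Handle`, `open_ids`, and `_close` are hypothetical stand-ins for a wrapper object, the set of identifiers the library holds open, and a low-level close call.

```python
import gc
import weakref

# Hypothetical registry standing in for identifiers held open by a C library.
open_ids = set()


def _close(handle_id):
    # Stand-in for a low-level close call; removing twice is harmless.
    open_ids.discard(handle_id)


class Handle:
    """Hypothetical wrapper around a low-level resource identifier."""

    def __init__(self, handle_id):
        self.id = handle_id
        open_ids.add(handle_id)
        # Register cleanup to run when this object is garbage-collected.
        # The callback receives the id, not self, so it does not keep
        # the object alive.
        self._finalizer = weakref.finalize(self, _close, handle_id)

    def close(self):
        # Explicit close remains available; the finalizer runs at most once.
        self._finalizer()


h = Handle(1)
assert 1 in open_ids
del h          # drop the last reference
gc.collect()   # force a collection in case the runtime defers it
assert 1 not in open_ids
```

With this pattern a forgotten handle does not stay open for the rest of the session, while an explicit `close()` is still possible when deterministic cleanup matters.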

Other Considerations

Having many packages is nice…

… but some may be redundant?

Merging codebases

  • Maintain high-level interface and test cases from h5
  • Get low-level HDF5 support within R

On Oct 10, 2016 Holger wrote:

thanks - merged!

Exchange Time Series Data

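import quandl  # to_hdf below also needs pandas with PyTables installed

# Download the CHRIS/CME_ES1 series from Quandl as a pandas DataFrame
mydata = quandl.get("CHRIS/CME_ES1")
# Write the DataFrame to the HDF5 file es1.h5 under the key "ES1"
mydata.to_hdf("es1.h5", key="ES1")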

Conclusion

  • hdf5r provides broad coverage of HDF5 library features
  • Facilitates data exchange between languages

Outlook

  • Custom table class
  • API for dplyr
  • In-memory datasets
  • Fixes, performance improvements

https://CRAN.R-project.org/package=hdf5r
https://github.com/hhoeflin/hdf5r