Working with R’s OOPs¶
Object-Oriented Programming can be achieved in R, but in more than one way. Beside the official S3 and S4 systems, there is a rich ecosystem of alternative implementations of objects, like aroma, or proto.
S3 objects are default R objects (i.e., not S4 instances) for which an attribute “class” has been added.
>>> x = robjects.IntVector((1, 3)) >>> tuple(x.rclass) ('integer',)
Making the object x an instance of a class pair, itself inheriting from integer is only a matter of setting the attribute:
>>> x.rclass = robjects.StrVector(("pair", "integer")) >>> tuple(x.rclass) ('pair', 'integer')
Methods for S3 classes are simply R functions with a name such as name.<class_name>, the dispatch being made at run-time from the first argument in the function call.
For example, the function plot.lm plots objects of class lm. The call plot(something) makes R extract the class name of the object something, and see if a function plot.<class_of_something> is in the search path.
This rule is not strict as there can exist functions with a dot in their name and the part after the dot not correspond to an S3 class name.
S4 objects are a little more formal regarding their class definition, and all instances belong to the low-level R type SEXPS4.
The definition of methods for a class can happen anytime after the class has been defined (a practice something referred to as monkey patching or duck punching in the Python world).
There are obviously many ways to try having a mapping between R classes and Python
classes, and the one proposed here is to make Python classes that inherit
Before looking at automated ways to reflect R classes as Python classes, we look at how a class definition in Python can be made to reflect an R S4 class. We take the R class lmList in the package lme4 and show how to write a Python wrapper for it.
Manual R-in-Python class definition¶
The R package lme4 is not distributed with R, and will have to be installed for this example to work.
First, a bit of boilerplate code is needed. We
import the higher-level interface and the function
rpy2.robjects.packages.importr(). The R class we want to represent
is defined in the
rpy2 modules and utilities.
import rpy2.robjects as robjects import rpy2.rinterface as rinterface from rpy2.robjects.packages import importr lme4 = importr("lme4") getmethod = robjects.baseenv.get("getMethod") StrVector = robjects.StrVector
Once done, the Python class definition can be written. In the first part of that code, we choose a static mapping of the R-defined methods. The advantage for doing so is a bit of speed (as the S4 dispatch mechanism has a cost), and the disadvantage is that a modification of the method at the R level would require a refresh of the mappings concerned. The second part of the code is a wrapper to those mappings, where Python-to-R operations prior to calling the R method can be performed. In the last part of the class definition, a static method is defined. This is one way to have polymorphic constructors implemented.
class LmList(robjects.methods.RS4): """ Reflection of the S4 class 'lmList'. """ _coef = getmethod("coef", signature = StrVector(["lmList", ]), where = "package:lme4") _confint = getmethod("confint", signature = StrVector(["lmList", ]), where = "package:lme4") _formula = getmethod("formula", signature = StrVector(["lmList", ]), where = "package:lme4") _lmfit_from_formula = getmethod("lmList", signature = StrVector(["formula", "data.frame"]), where = "package:lme4") def _call_get(self): return self.do_slot("call") def _call_set(self, value): return self.do_slot("call", value) call = property(_call_get, _call_set, None, "Get or set the RS4 slot 'call'.") def coef(self): """ fitted coefficients """ return self._coef(self) def confint(self): """ confidence interval """ return self._confint(self) def formula(self): """ formula used to fit the model """ return self._formula(self) @staticmethod def from_formula(formula, data = rinterface.MissingArg, family = rinterface.MissingArg, subset = rinterface.MissingArg, weights = rinterface.MissingArg): """ Build an LmList from a formula """ res = LmList._lmfit_from_formula(formula, data, family = family, subset = subset, weights = weights) res = LmList(res) return res
Creating a instance of
LmList can now be achieved by specifying
a model as a
Formula and a dataset.
sleepstudy = lme4.sleepstudy formula = robjects.Formula('Reaction ~ Days | Subject') lml1 = LmList.from_formula(formula, sleepstudy)
A drawback of the approach above is that the R “call” is stored,
and as we are passing the
(and as it is believed to to be an anonymous structure by R) the call
is verbose: it comprises the explicit structure of the data frame
(try to print lml1). This becomes hardly acceptable as datasets grow bigger.
An alternative to that is to store the columns of the data frame into
the environment for the
Formula, as shown below:
sleepstudy = lme4.sleepstudy formula = robjects.Formula('Reaction ~ Days | Subject') for varname in ('Reaction', 'Days', 'Subject'): formula.environment[varname] = sleepstudy.rx2(varname) lml1 = LmList.from_formula(formula)
Automated R-in-Python class definitions¶
The S4 system allows polymorphic definitions of methods, that is, there can be several methods with the same name but different number and types of arguments. (This is like Clojure’s multimethods). Mapping R methods to Python methods automatically and reliably requires a bit of work, and we chose to concatenate the method name with the type of the parameters in the signature.
Using the automatic mapping is very simple. One only needs to declare a Python class with the following attributes:
|__rname__||mandatory||the name of the R class|
|__rpackagename__||optional||the R package in which the class is declared|
from rpy2.robjects.packages import importr stats4 = importr('stats4') from rpy2.robjects.methods import RS4Auto_Type # use "six" for Python2/Python3 compatibility import six class MLE(six.with_metaclass(RS4Auto_Type)): __rname__ = 'mle' __rpackagename__ = 'stats4'
The class MLE just defined has all attributes and methods needed to represent all slots (attributes in the S4 nomenclature) and methods defined for the class when the class is declared (remember that class methods can be declared afterwards, or even in a different R package).