When I wanted to implement a Python iterator from C using the Python C API, there was almost no information on it. At a time when we move to using more languages - to leverage each one’s specific strengths - it is crucial to have a clear and approachable interop story for Python.
In this article, I implement a custom iterator for Freestyle - a stylized edge render engine that is integrated into Blender. The example should generalize. In addition, I cover bridging between Python and C++ iterators.
Primer: Python iterators
For an object to work as an iterator, it has to implement the iterator protocol. The standard way of implementing this protocol is by providing a __iter__ and a __next__ method. The Python for you and me book gives a basic example implementation:
class Counter(object):
    def __init__(self, low, high):
        self.current = low
        self.high = high

    def __iter__(self):
        'Returns itself as an iterator object'
        return self

    def __next__(self):
        'Returns the next value till current is lower than high'
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1
In this example, __iter__ just returns self. This means that the object is its own iterator. Some types - like list and tuple - have special iterator types: list only implements __iter__, while list_iterator implements __next__ (plus an __iter__ that returns itself). Having a separate iterator type keeps the iteration logic from mixing with other bits of your data type, and may lend itself to more optimization.
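The split between list and its iterator type can be checked directly from Python. This is a small sketch; the variable names are only for illustration.

```python
numbers = [1, 2, 3]
it = iter(numbers)

# The list itself is only iterable: it has no __next__.
list_has_next = hasattr(numbers, "__next__")

# The object returned by iter() is of a separate type...
iterator_type_name = type(it).__name__

# ...and that iterator is its own iterator.
iterator_is_own_iterator = iter(it) is it

# Every iter() call on the list hands out a fresh, independent iterator.
first, second = iter(numbers), iter(numbers)
next(first)  # advance only the first iterator
independent = (next(first) == 2) and (next(second) == 1)
```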
So, how is this implemented in C?
The Python C API
Python types are declared in C as a PyTypeObject: a large struct of fields that store methods and attributes of the object. Examples are implementations of hash, print, repr and setattr. In Python, there are standard implementations of these functions for new objects. C instead is very explicit, so all of these methods have to be specified or set to 0 (NULL). The PyTypeObject struct has the fields tp_iter and tp_iternext, which correspond to the __iter__ and __next__ methods. These are the functions that need to be implemented in order for an object to work as an iterator.
PyTypeObject StrokeVertexIterator_Type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "StrokeVertexIterator",                       /* tp_name */
    sizeof(BPy_StrokeVertexIterator),             /* tp_basicsize */
    ...
    (getiterfunc)StrokeVertexIterator_iter,       /* tp_iter */
    (iternextfunc)StrokeVertexIterator_iternext,  /* tp_iternext */
    ...
};
The type declaration of StrokeVertexIterator, taken from the Blender/Freestyle code base
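To see what the two slots are used for, it helps to write out what a for loop does with them. The sketch below is a Python-level approximation, and manual_for is a hypothetical helper name; the interpreter does the equivalent in C by calling tp_iter once and then tp_iternext repeatedly.

```python
def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)        # calls __iter__ (tp_iter in C)
    while True:
        try:
            item = next(iterator)    # calls __next__ (tp_iternext in C)
        except StopIteration:
            break                    # iterator is exhausted: leave the loop
        body(item)


collected = []
manual_for("abc", collected.append)

nothing = []
manual_for([], nothing.append)       # the body never runs for an empty iterable
```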
So the Python wrapper for a C structure needs:
- Iterator initialisation: This can be a return self, but you (most likely) do need to increment the reference counter.
- Stepping function: A function that advances the iterator to the next state. Below is an example that also shows how to signal StopIteration from C.
In the following sections, I will describe how C++ iterators can be exposed to Python.
Python iterators vs. C++ iterators
As it turns out, there is a big difference between C++ and Python iterators.
Python
- __iter__ returns an iterator object. The first object of the iterable is not yet available.
- __next__ returns the next object of the iterator, or raises StopIteration when there is none.
C++
- upon initialization, the first object is immediately available.
- advance modifies the iterator, the next object is now available. If there is no next object, looking for it anyway is undefined behaviour, and the application will typically segfault.
This seems pretty compatible, but there is a major problem: The initialization does not produce the same result. In C++, the first object of the iterator is available after initialisation, in Python it is not. A naive bridge doesn’t synchronise the two, which will (and in the case of Freestyle, did) cause problems.
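To illustrate how the two can drift apart, here is a small Python model. Both classes are hypothetical stand-ins and are not taken from Freestyle: CppStyleIterator mimics a C++ iterator whose first element is available right after construction, and NaiveBridge is one plausible way to get the bridge wrong, by treating __next__ as "increment, then dereference".

```python
class CppStyleIterator:
    """Hypothetical stand-in for a C++ iterator over a sequence."""

    def __init__(self, items):
        self.items = items
        self.pos = 0                 # already points at the first element

    def is_end(self):
        return self.pos >= len(self.items)

    def deref(self):
        return self.items[self.pos]

    def increment(self):
        self.pos += 1


class NaiveBridge:
    """Hypothetical bridge that always increments before dereferencing."""

    def __init__(self, cpp_it):
        self.cpp_it = cpp_it

    def __iter__(self):
        return self

    def __next__(self):
        self.cpp_it.increment()      # skips the element that was already available
        if self.cpp_it.is_end():
            raise StopIteration
        return self.cpp_it.deref()


naive_result = list(NaiveBridge(CppStyleIterator(["a", "b", "c"])))
```

In this model the first element is silently skipped: naive_result comes out as ["b", "c"].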
My fix is to add a field at_start to the Python wrapper that is set to true when the iterator is created. When the Python iterator is incremented and at_start == true, at_start is set to false and an object is returned, but the C++ structure is not incremented. The C++ and Python iterators are now in sync.
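The idea can be modelled in a few lines of Python. This is a minimal sketch under assumptions: CppStyleIterator and SyncedBridge are hypothetical names, and only the at_start field comes from the actual wrapper.

```python
class CppStyleIterator:
    """Hypothetical stand-in for a C++ iterator over a sequence."""

    def __init__(self, items):
        self.items = items
        self.pos = 0                 # already points at the first element

    def is_end(self):
        return self.pos >= len(self.items)

    def deref(self):
        return self.items[self.pos]

    def increment(self):
        self.pos += 1


class SyncedBridge:
    """Hypothetical bridge that uses an at_start flag to stay in sync."""

    def __init__(self, cpp_it):
        self.cpp_it = cpp_it
        self.at_start = True         # set when the iterator is created

    def __iter__(self):
        return self

    def __next__(self):
        if self.at_start:
            self.at_start = False    # first call: return without advancing
        elif not self.cpp_it.is_end():
            self.cpp_it.increment()  # later calls: advance, then return
        if self.cpp_it.is_end():
            raise StopIteration
        return self.cpp_it.deref()


synced_result = list(SyncedBridge(CppStyleIterator(["a", "b", "c"])))
empty_result = list(SyncedBridge(CppStyleIterator([])))
```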
Freestyle’s iterators
The most commonly used (at least from Python) Freestyle iterator is the StrokeVertexIterator, which walks over the vertices of a Stroke object. As with built-in types, it is more efficient/convenient to have a separate iterator type. This means that Stroke only has an __iter__ method, and StrokeVertexIterator has both an __iter__ and a __next__ method.
The Python StrokeVertexIterator type is an interface for the C++ StrokeVertexIterator type, which is at its core a standard C++ iterator over a deque.
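In pure Python, the same container/iterator split would look roughly like the sketch below. ToyStroke and ToyStrokeVertexIterator are made-up names for illustration only; they are not the Freestyle classes, and the vertices are plain tuples.

```python
class ToyStrokeVertexIterator:
    """Hypothetical iterator type: has both __iter__ and __next__."""

    def __init__(self, vertices):
        self._vertices = vertices
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._vertices):
            raise StopIteration
        vertex = self._vertices[self._index]
        self._index += 1
        return vertex


class ToyStroke:
    """Hypothetical container type: only has __iter__."""

    def __init__(self, vertices):
        self._vertices = list(vertices)

    def __iter__(self):
        # every loop over the stroke gets its own iterator state
        return ToyStrokeVertexIterator(self._vertices)


stroke = ToyStroke([(0, 0), (1, 0), (1, 1)])
first_pass = list(stroke)
second_pass = list(stroke)
```

Because the iteration state lives in the iterator, the stroke can be walked any number of times, and two loops over the same stroke do not interfere with each other.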
__iter__ for StrokeVertexIterator and Stroke
static PyObject *StrokeVertexIterator_iter(
        BPy_StrokeVertexIterator *self)
{
    // bookkeeping for the garbage collector
    Py_INCREF(self);
    // synchronize Python and C++ iterator
    self->at_start = true;
    // C equivalent of 'return self'
    return (PyObject *) self;
}

static PyObject *Stroke_iter(PyObject *self)
{
    // create new StrokeVertexIterator pointing to the first vertex in the stroke
    StrokeInternal::StrokeVertexIterator sv_it( ((BPy_Stroke *)self)->s->strokeVerticesBegin() );
    // wrap it in a python type
    return BPy_StrokeVertexIterator_from_StrokeVertexIterator(sv_it, false);
}
The __iter__ function takes one argument, the object that is being iterated over, and returns a PyObject * pointing to an iterator for it. For StrokeVertexIterator that is the object itself; for Stroke it is a new StrokeVertexIterator.
__next__ for StrokeVertexIterator, edited slightly for clarity
static PyObject *StrokeVertexIterator_iternext(
        BPy_StrokeVertexIterator *self)
{
    // Note that 'self' is a python object, self->sv_it is a C++ object

    /* If sv_it.isEnd() is true, the iterator can't be incremented.*/
    if (self->sv_it->isEnd()) {
        PyErr_SetNone(PyExc_StopIteration);
        return NULL;
    }
    /* If at the start of the iterator, only return the object
     * and don't increment, to keep for-loops in sync */
    else if (self->at_start) {
        self->at_start = false;
    }
    /* If sv_it.atLast() is true, the iterator is currently
     * pointing to the final valid element. Incrementing it
     * further would result in an invalid state. */
    else if (self->sv_it->atLast()) {
        PyErr_SetNone(PyExc_StopIteration);
        return NULL;
    }
    else {
        self->sv_it->increment();
    }
    StrokeVertex *sv = self->sv_it->operator->();
    return BPy_StrokeVertex_from_StrokeVertex(*sv);
}
In general, iternext takes a pointer to the iterator object and returns a PyObject * of the iterated-over type, while leaving the iterator in an incremented state. In this case, the Python/C++ misalignment also has to be taken care of.
If the C++ iterator is
- exhausted, the Python iterator should be too.
- at its first element, the object should be returned, but the iterator not incremented.
- pointing to the last element (which has already been returned), the Python iterator is exhausted.
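These cases can be checked against a small Python model that follows the same branch order as the C function. This is a sketch under assumptions: CppStyleIterator and BridgedIterator are hypothetical stand-ins, with is_end, at_last and increment playing the roles of isEnd(), atLast() and increment().

```python
class CppStyleIterator:
    """Hypothetical stand-in for the C++ iterator."""

    def __init__(self, items):
        self.items = items
        self.pos = 0                       # already points at the first element

    def is_end(self):
        return self.pos >= len(self.items)

    def at_last(self):
        return self.pos == len(self.items) - 1

    def deref(self):
        return self.items[self.pos]

    def increment(self):
        self.pos += 1


class BridgedIterator:
    """Hypothetical Python-side wrapper, modelled on the iternext logic."""

    def __init__(self, cpp_it):
        self.cpp_it = cpp_it
        self.at_start = True

    def __iter__(self):
        self.at_start = True               # mirrors StrokeVertexIterator_iter
        return self

    def __next__(self):
        if self.cpp_it.is_end():           # exhausted (e.g. empty sequence)
            raise StopIteration
        elif self.at_start:                # first element: return, don't advance
            self.at_start = False
        elif self.cpp_it.at_last():        # last element was already returned
            raise StopIteration
        else:
            self.cpp_it.increment()
        return self.cpp_it.deref()


def drain(items):
    """Collect everything the bridged iterator yields for `items`."""
    return list(BridgedIterator(CppStyleIterator(items)))


empty_case = drain([])
single_case = drain(["a"])
normal_case = drain(["a", "b", "c"])
```

The single-element case is the interesting one: the iterator is at its first and its last element at the same time, so the at_start check has to come before the at_last check for the element to be returned at all.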
The Python iterators start in a ‘nothing’ state, and thus have to be incremented before the first iteration. So: Python iterators are incremented before the body runs, C++ iterators after the body. During the iteration this makes little difference, but for the first and last iteration this difference is crucial.
Conclusion
Different programming languages have different ideas of how iterators should work. Beware of these differences when interfacing between languages.
On a more general note: The Python C API is actually quite nice to work with, once the infrastructure has been set up. For a more lightweight approach, interfacing with C (or Rust, C++, …) can be done with the cffi library.