Friday, May 6, 2011

C++ Iterators Considered Harmful?

At the Boost library conference today, Andrei Alexandrescu author of the book Modern C++ Design and the Loki C++ library, spoke about why iterators are bad, and he had a better solution.

I tried to read the presentation slides, but could not get much out of them. I have these questions for the StackOverflow community:

  1. Are iterators bad?
  2. Is his replacement really better?
  3. Will C++ implementators pick up his ideas?

http://www.boostcon.com/site-media/var/sphene/sphwiki/attachment/2009/05/08/iterators-must-go.pdf

From stackoverflow
  • I agree with him that iterators are mostly inferior to ranges, and I don't know if 'something better' will get picked up.

    "The good is the enemy of the best" is strongly at play here, as it usually is. Iterators are useful and firmly entrenched, so it's hard to know if something better like ranges can supplant them in a reasonable amount of time.

  • I think C++ implementors will have their hands full producing full working support for C++0x, without implementing new, non-standard paradigms.

    Unknown : Aren't ranges standard in some other languages though?
    anon : So what? Lots of things are available in other languages that are not, and probably never will be, avaiolable in C++.
    1. Sometimes
    2. Probably
    3. Not likely, at least not for many years
    Binary Worrier : Almost a haiku, well done twentieth century boy, give us more?
  • First, to answer your questions:

    1. No. In fact, I argued elsewhere that iterators are the most important/fundamental concept of computer science ever. I (unlike Andrei) also think that iterators are intuitive.
    2. Yes, definitely but that shouldn't come as a surprise.
    3. Hmm. Looking at Boost.Range and C++0x – haven't they already?

    Andrei's big contribution here is just to say: drop the concept of iterators altogether, see ranges not just as a convenience wrapper but rather as a core construct. Other languages have already done this (much of Andrei's concepts just echo .NET's LINQ or Python's iterators) but they all only offer output ranges. Andrei argues for different types of ranges, much like the conventional iterator categories.

    In that light, it's odd that he starts by mocking the arbitrariness of these iterator categories.

    I also think that his examples are off, especially his file copying: yes, the iterator variant is a huge improvement over the 1975 code. It reduces a loop with complicated break condition down to one statement. What he's really taking issue with here is just the syntax. Well, excuse me: we're talking about C++ here – of course the syntax is ugly. And yes, using ranges here is an improvement – but only syntactically.

    I also think that Andrei's find implementation is off. What he really defines there is the DropUntil operation (naming is hard!) from LINQ. The find operation should really return either one or zero elements (or an iterator!). Shunning iterators here isn't helpful in my opinion since we might want to modify the value directly instead of copying it. Returning a one-element range here only adds overhead without a benefit. Doing it Andrei's way is bad because then the name of the method is just wrong and misleading.

    That said, I essentially agree with Andrei in almost all points. Iterators, while being my pet concept from computer science, are certainly a big syntactical burden and many ranges (especially infinite generators) can (and should) be implemented conveniently without them.

    James Hopkin : +1: Not sure I agree with everything here, but very interesting to read. Almost as provocative as the great Alexandrescu himself :-)
    wilhelmtell : That's why I love programming more than math. There are opinions and we get to argue. Much more interesting this way. Also: the stream iterators solved the infinite range problem, so it IS possible. Of course, with an extra member variable for the default constructor signaling the end iterator does impose an overhead, as compared to a `while( true )` loop, but how significant is this overhead?
    Konrad Rudolph : @wilhelmtell: You’re right, infinite ranges are absolutely expressible via iterators but the syntax is cumbersome since we have to define some arbitrary “end” iterator whose basic semantics is just to be unequal to all other iterators. That’s quite artificial. I don’t believe that the runtime overhead is significant here – possibly non-existent (no idea really). I just find it weird to call an operation with a start and an end iterator when in reality there’s no end.
  • I think we should use ranges next to iterators, i.e. we should choose the evolution way, not revolution way.

  • C++0x is already making the first steps:

    • rvalue references solve some problems with treating containers as ranges
    • ranges have been added to the core library, including range concepts

    Transitioning to ranges without losing any iterator functionality (think of all the combinations of iterator categories, const-ness and rvalue-ness) is hard, especially if you try to factor in infinite and mutable ranges.

    1. Most of us make a simple use of them in what have become well known idioms, like in for loops to iterate through an std::vector. A developer reads it and knows what's going on. In our everyday coding life, iterators are not good or bad, they're just "what gets the job done".
    2. Probably, yes.
    3. I don't think so.
  • Andrei at times can be a bit provocative. Iterators are a reasonable concept, and quite fundamental in the sense that bits are. But just like most bits in C++ are not bools, but part of larger types,most iterators should be dealt with at a high level. Andrei is right that the proper level to do so is the range object. But not all ranges are properly exposed as iterator ranges, as the istream_iterator sentinel shows. That's just a hack to create an artificial end iterator. I don't think his ideas will be picked up by implementations, though. C++1x will be as relevant as C99.

  • The only argument I can see from that presentation is the inability to define ranges, and the c++0x "Range for statement" proposal seems to eliminate that problem to some extent anyway. maybe it shouldn't be an argument about if iterators should / shouldn't be used at all, but more what situations should / shouldn't they be used for?

  • Like any API or function, if misused can create many problems of identification difficult. Iterators have used in many projects, but always maintaining the necessary care required according to their characteristics. Its use should be preceded by a good understanding of their limitations. Iterators can be very useful if user properly.
    This questions are related :
    Is there any way to check if an iterator is valid?
    Should I prefer iterators over const_iterators?

  • I disagree with both Andrei and Konrad and myself :-)

    The most fundamental concept is an interface not an iterator and that is pretty obvious in any work anyone does today (which is all about cross-library, cross-language, cross-compiler, cross-OS, cross-platform, you cross-name it :-)

    Neither iterator or range (apart from source-level use) offer anything more than a clean and simple, non intrusive or intrusive, non shared or shared, non unique or unique: pointer ! Clean pointer to typed data is simply put universal and you can make data mutable or immutable and many other things. All interface is is just another level of indirection to it while still being friendly to machine and compiler of all sorts, plus far safer, relegating iterators and range usage to an implementation detail.

    To that extent IEnumerable and IQueryable do the half 'right thing' TM but they are clearly inferior in their concepts of iteration and much more to what you can do with STL, retain control and so on and on (but otoh, they have better metadata and hence a better, cleaner model). Point being with interfaces you can build any abstraction you want and satisfy, well probably contraversial but essentially a no-brainer: optimal, and runtime or compile-time neutral data representation and code (heck essential to algorithms and compilers and VMs and what not).

    It is even possible to optimise it for 'dynamic'/component systems down to 'runtime' inlining (screw HotSpot VM:-).. To that extent, the advance to 1975 is minimal as evident by a huge interop industry workload (it's everywhere you look, including this site, its use of proprietary and open tech, etc; in computer science idealism, well, this type of interfacing 'work' should not exist should it)..

    Konrad Rudolph : The point is that you need a *well-defined* interface (or a set thereof). So the question is really: *which* interface to use? The one offered by iterators? Ranges? …
    rama-jka toti : Completely agree.. imho, something that won't ever be technology or single-idea specific is the way forward, domain and model driven tools will have the same tough time there. Plus the world is such it enforces one proprietary solution over another for vendor lock in especially with VMs and their own ideas of integration (market share). But it all still favours K&R idea as minimal and complete, much like templates and concepts in C++ push forward, or indeed metadata of execution environments help another aspect of 'reshape'/reuse. Protocols are simply hard to do and follow.
    rama-jka toti : On a side note, what's worst, it makes tweats like Linus right in C++ being a moving target for almost fundamental abstraction change that impacts client code considerably (or to point of ease-of-breakage). Then again, Linux kernel is in such a state it wouldn't be suitable for anything but OS and down-to-metal building (which is sufficient I guess but still inflexible for future). Not that other OS-es are not copying and reinventing *nix, ie. W6.0, W7 especially, but modularity doesn't come from dependency breaking solely.[someone fix the crappy ASP.NET HTML constraints for this site PLEASE]
    rama-jka toti : Reuse goes beyond module/assembly and OO here and it applies to protocols and mechanisms that are effectively algorithms or algorithm friendly (CLR isn't; ie. numerics ). Sequences are a problem for every protocol with complex, non-primitive types, heck even simple arrays can exhibit similar issues. LINQ won't cut it everywhere as evident with continous streaming attempts too, it is as bloated and heavy as any high-level protocol.
    jalf : I don't think you answer the question. Yes, we need an interface. What should it look like? Iterators define an interface. So do ranges. But "interface" does not.
    rama-jka toti : Sure, no answer.. but what you certainly want to see is change of the interface in the model rathen than source. As it currently stands, any change simply breaks code and idioms. Something many libraries are popular for; all I want is to change it as say underlying hardware might prefer something else. Why should it be a specific interface? So somebody can come and say look that abstraction is now 'outdated' yet again? It is typical of C++ and will be typical of C++ reinventing itself which is what the question is about.
    rama-jka toti : Note that I'm not in favour of iterator or range or IEnumerable or IQueryable.. my argument is for neutrality and none of those give me anyone of it cross-language, cross-compiler, or anything similar. Which is just the sign of the times, you pick yours, someone else picks another one, depending on the phase of the moon. I'd prefer to be able to say I don't care whether it is an iterator or a range, but that might be just me..
  • Isn't Andrei trying to do some hidden marketing for the D language (currently he is working with it)...?

    Andrei states that containers are ok, but iterators are ugly, non-intuitive, error-prone and dangerous, hard to implement (well this last one seems to be rather true...) And what do we have in C++... pointers? Aren't they ugly/.../dangerous? But we happily embraced them and live with them.

    Which one is more intuitive to write:

    for(auto i=foo.begin();i!=foo.end();++i)
        bar(*i);
    

    or

    for (auto r=foo.all(); !foo.empty(); foo.popFront())
            bar(r.front());
    

    Iterators concept can be complemented with ranges and other ideas, but I think that they have their place and won't be replaced.

    Unknown : "Isn't Andrei trying to do some hidden marketing for the D language" I was wondering exactly the same thing.
    rama-jka toti : He is,and he is also behind some of the constructs that got far more refined.Boost is continously expanding and rewriting itself the compile-time-STL way but it does break interfaces and it brings complexity that kills with plenty of bugs.I don't care whether it is C++, D or C#, iterator or not. I just want maintainable code without going into that level of detail, just like I don't want to go into MPL in detail (it's a suicide, something machine translation should be doing on your intent an not a brain picking over syntax to grasp its meaning with usual keyboard-reflex :). They will 'go'.
    1. No, they are not bad, they are very clever idea in fact. However, they are not ideal and there is room for improvements in the concept of iterator.

    2. It solves number of real-life problems with iterators. For instance, it's tedious (also error prone) in many cases to query two separate objects, iterators, from a single containers and then pass them as still two separate objects to an algorithm. Why not to pass a single object around? Even std::pair<iterator, iterator> would make for a crude range which is easier to manipulate - one object, not two. Also, it's a good idea to consider a range is an iterator. That's in fact what Andrei suggests. By the way, some of these problems have been already solved by Boost.Range.

    3. I would expect it happened, but it will not be a revolution, rather evolution.

1 comments:

Simon Matthew said...

Splendid! That was a fortune list of suggestions offered in the conversation. I understood how you had shown concern for enhancing your skill sets as a full-stack engineer. It definitely helps at improving the skills to have a promising career. The emerging platform, such as Eiliana.com, offers a wide range of projects to interested freelancers to work upon and accentuate their skill sets.

Post a Comment