Pipeline is an incubation effort to mimic LINQ-to-Objects to a great extent, with few differences such as Pipeline supports composition in a way that LINQ doesn’t, as a result of which you can reuse a query written in Pipeline with as many data source as you want. In LINQ, that is not possible.

Other differences arise from their semantics and the naming convention which is explained in the next section. Pipeline mixes quite well with Expression library. In fact, they’re so intertwined together that they are put in the same namespace called foam::composition.

LINQ vis-a-vis Pipeline

Pipeline believes in calling a spade a spade. For example, Pipeline calls a filter a filter, whereas LINQ calls a filter a Where due to reification of the idea of filtering. Likewise, Pipeline calls a transform a transform instead of Select and accumulate instead of Aggregate.

Composability

Consider this LINQ query,

var names = participants                           //bind to source
           .Where(p => p.Age >= 13 && p.Age <= 19) //filtering
           .Select(p => p.Name)                    //transformation

//Or using query-operators
var names = from p in participants                //bind to source
            where p => p.Age >= 13 && p.Age <= 19 //filtering
            select p.Name                         //transformation

In this snippet, the query is bound to participants which is a collection of person. If you want names from another collection of person, then you’ve to write the whole query again. Pipeline avoids this problem by allowing you to compose different pieces of sub-query to form your query, without requiring you to bind the resulting query to a collection of entities. This enables reuse, because once you’ve your unbounded query, you can bind it to a collection of your choice later on.

The equivalent of above LINQ query in Pipeline would be this,

auto names =  from (participants)                                      //bind to source
           | filter([](person p) { return p.age >= 13 && p.age <=19 }) //filtering
           | transform([](person p) { return p.name; } );              //transformation

Now after removing the source from the above query (i.e the from clause), what we’re left with is, reusuable query, as it is not bound to any data source. This is what I mean:

auto upipeline = filter([](person p) { return p.age >= 13 && p.age <=19 }) //filtering
               | transform([](person p) { return p.name; });               //transformation

As we can see, there is no participants collection which the upipeline is bounded to; the result ‘upipeline’ is referred to as an unbounded pipeline. Once we’ve unbounded pipeline, we can reuse it as:

auto names1 = from(participants) | upipeline; //bind to a collection, then execute the query.
auto names2 = from(students) | upipeline;     //bind to a different collection and execute.
Power of unbounded pipelines

Unbounded pipelines are very flexible; they are essentially functors. So whatever you can do with functors, you can do with unbounded pipelines. For example, we can rewrite the above code as,

auto names1 = upipeline(participants);  //execute for participants
auto names2 = upipeline(students);      //execute for students

That is a powerful feature, because you can pass around upipeline to functions which accept functor and execute it passing different collections. Here are few interesting examples,

//a function template which accepts any callable entity : function pointers or functors.
template<typename Functor>
void f(Functor && get_names)
{
   auto participantNames = get_names(get_participants());  
   auto studentNames     = get_names(get_students());      
   //...
}

std::vector<string> get_names(std::vector<person> const & persons);

f(get_names); //pass function pointer.

f(upipeline); //pass the unbounded pipeline!
f(upipeline | take(4) ); //heck! you can do even this!

//we can do these too : create sub-query/unbounded pipeline/composable functors!
auto name_selector = transform([](person p) { return p.name; }); 
auto teen_filter = filter([](person p) { return p.age >= 13 && p.age <=19 }); 

f(name_selector);               //get name of ALL persons (since no filter is applied!)
f(teen_filter | name_selector); //get name of teenagers only (filter is applied!) 

The last example shows we can apply filter later on also. In general, we can compose and apply different kinds of filter, transform, and other pipes on the fly!

Semantics

There are many semantical differences between LINQ and Pipeline, the most interesting being between LINQ’s Range() and Pipeline’s range(). Consider generating the following sequence of natural numbers:

[5, 6, 7, 8, 9, 10, 11, 12]

In LINQ, you would do this:

var numbers = Enumerable.Range(5,8); //gives [5, 6, 7, ...12]

What is the second argument? Why is it 8?

While the first argument is the first value in the sequence, the second argument is the “number of sequential integers to generate” as per the documentation. The sequence has 8 numbers, therefore the second argument is 8. It is pretty much straightforward. So far so good.

Now considering generating this sequence (which is reverse of the previous one):

[12, 11, 10, 9, 8, 7, 6, 5]

LINQ has problem with it. It cannot generate it! Instead you have to generate the sequence in increasing order, then reverse it:

var numbers = Enumerable.Range(5,8).Reverse(); //gives [12, 11, 10, ...5]

That is most certainly not efficient. If it has to be in decreasing order, why does it not generate in that order to begin with?

Pipeline’s range solves this problem by interpreting the second argument differently. The second argument is not the “number of sequential integers to generate” anymore. It is rather past-the-last element to generate. That is,

  • If the number to generate is 12 in an increasing sequence, then the second argument is 13.
  • If the number to generate is 5 in a decreasing sequence, then the second argument is 4.

So in Pipeline, the code would be this:

auto increasingNumbers = range(5,13); //gives [5, 6, 7, ...12]
auto decreasingNumbers = range(12,4); //gives [12, 11, 10, ...5]

which is quite elegant and concise.

Even mathematically speaking, the range [5,6,7,...12] would be represented by the following conventions:

a)  [5,12] i.e 5 <= i <= 12
b)  [5,13) i.e 5 <= i < 13
c)  (4,13) i.e 4 < i < 13
d)  (4,12] i.e 4 < i <= 12

While all are valid and widely-used conventions, Pipeline has adopted the second convention i.e [5,13), and there is a reason to prefer this over the others! Edsger W. Dijkstra has given an excellent argument as to why the second convention is to be preferred, in this tiny article:

which is a must-read.

Note that LINQ doesn’t follow any of these conventions.

Deferred Execution

Like LINQ, Pipeline also supports deferred execution. It can be best understood using an example,

std::vector<int> xs {23,42,89,34};
auto results = from(xs)
	      | filter([](int i) {
			  std::cout << "apply filtering on " << i << std::endl;
			  return i < 50 ;
		      })
	      | transform([](int i) { 
			  std::cout << i << " being transformed to " << (i*10) << std::endl; 
			  return i*10;
  		      });

std::cout << "BEFORE LOOP" << std::endl;
for(auto && r : results )
   std::cout << "r = " << r << std::endl;

Without deferred execution, the output would have been this:

apply filtering on 23
apply filtering on 42
apply filtering on 89
apply filtering on 34
23 being transformed to 230
42 being transformed to 420
34 being transformed to 340
BEFORE LOOP
r = 230
r = 420
r = 340

But the actual output is this:

BEFORE LOOP
apply filtering on 23
23 being transformed to 230
r = 230
apply filtering on 42
42 being transformed to 420
r = 420
apply filtering on 89
apply filtering on 34
34 being transformed to 340
r = 340

Note “BEFORE LOOP” which marks the beginning of the actual output. It shows that the pipeline is not executed before the for loop even though it is created before it. As soon as it enters into the loop and iterates over the results, the pipeline starts executing in a back-and-forth manner. No iteration means no execution, no actual filtering, no actual transformation. Furthermore, only as many items go through the pipeline and get processed as the iteration asked for.

Pipeline + Expression = Elegance

Consider this code which uses lambda,

std::vector<int> v {1,2,3,4,5,6,7,8,9,10};

auto evens = from(v)
           | filter ( [](int i) { return i % 2 == 0; } ); //filter even integers

auto triples = from(v)
             | filter ( [](int i) { return i % 2 == 0; } ); //filter even integers
             | transform ( [](int i) { return i *3 ; } );   //then triple them

Using Expression, the above code would become,

expression<int> x; //declare a variable!

auto evens = from(v) 
           | filter(x % 2 == 0); //filter even integers

auto triples = from(v) 
             | filter(x % 2 == 0)  //filter even integers
             | transform(x * 3);   //then triple them.

which is quite concise and clean.

Since Expression works with class members also, so we can convert this code,

auto names =  from (participants)                                       //bind to source
            | filter([](person p) { return p.age >= 13 && p.age <=19 }) //filtering
            | transform([](person p) { return p.name; } );              //transformation

into this,

//assume you've these variables (one-time creation)
auto age = memexp(&person::age);    //create member expression for person::age
auto name = memexp(&person::name);  //create member expression for person::name

auto names =  from (participants)             //bind to source
            | filter( age >= 13 && age <=19 ) //filtering
            | transform( name );              //transformation

That is effectively concise and clean, since once you have age and name, you can reuse them in your entire project.


Documentation