Java Advent Calendar: CMS Pipelines ... for NetRexx on the JVM

Sunday	Monday	Tuesday	Wednesday	Thursday	Friday	Saturday
29	30	1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31	1	2

This year I want to tell you about a new and exciting addition to NetRexx (which, incidentally just turned 19 years old the day before yesterday). NetRexx, as some of you know, is the first alternative language for the JVM, stems from IBM, and is free and open source since 2011 (http://www.netrexx.org). It is a happy marriage of the Rexx Language (Michael Cowlishaw, IBM, 1979) and the JVM. NetRexx can run compiled ahead of time, as .class files for maximum performance, or interpreted, for a quick development cycle, or very dynamic production of code. After the addition of Scripting in version 3.03 last year, the new release (3.04, somewhere at the end of 2014) include Pipes.

We know what pipes are, I hear you say, but what are Pipes? A Pipeline, also called a Hartmann Pipeline, is a concept that extends and improves pipes as they are known from Unix and other operating systems. The name pipe indicates an inter- process communication mechanism, as well as the programming paradigm it has introduced. Compared to Unix pipes, Hartmann Pipelines offer multiple input- and output streams, more complex pipe topologies, and a lot more, too much for this short article but worthy of your study.

Pipelines were first implemented on VM/CMS, one of IBM’s mainframe operating systems. This version was later ported to TSO to run under MVS and has been part of several product configurations. Pipelines are widely used by VM users, in a symbiotic relationship with REXX, the interpreted language that also has its origins on this platform. Pipes in the NetRexx version are compile by a special Pipes Compiler that has been integrated with NetRexx. The resulting code can run on every platform that has a JVM (Java Virtual Machine), including z/VM and z/OS for that matter. This portable version of Pipelines was started by Ed Tomlinson in 1997 under the name of njpipes, when NetRexx was still very new, and was open sourced in 2011, soon after the NetRexx translator itself. It was integrated into the NetRexx translator in 2014 and will be released integrated in the NetRexx distribution for the first time with version 3.04. It answers the eternal question posed to the development team by every z/VM programmer we ever met: "But ... Does It Have Pipes?" It also marks the first time that a non-charge Pipelines product runs on z/OS. But of course most of you will be running Linux, Windows or OSX, where NetRexx and Pipes also run splendidly.

NetRexx users are very cautious of code size and peformance - for example because applications also run on limited JVM specifications as JavaME, in Lego Robots and on Androids and Raspberries, and generally are proud and protective of the NetRexx runtime, which weighs in at 37K (yes, 37 kilobytes, it even shrunk a few bytes over the years). For this reason, the Pipes Compiler and the Stages are packaged in the NetRexxF.jar - F is for Full, and this jar also includes the eclipse Java compiler which makes NetRexx a standalone package that only needs a JRE for development. There is a NetRexxC.jar for those who have a working Java SDK and only want to compile NetRexx. So we have NetRexxR.jar at 37K, NetRexxC.jar at 322K, and the full NetRexx kaboodle in 2.8MB - still small compared to some other JVM Languages.

The pipeline terminology is a metaphore derived from plumbing. Fitting two or more pipe segments together yield a pipeline. Water flows in one direction through the pipeline. There is a source, which could be a well or a water tower; water is pumped through the pipe into the first segment, then through the other segments until it reaches a tap, and most of it will end up in the sink. A pipeline can be increased in length with more segments of pipe, and this illustrates the modular concept of the pipeline. When we discuss pipelines in relation to computing we have the same basic structure, but instead of water that passes through the pipeline, data is passed through a series of programs (stages) that act as filters. Data must come from some place and go to some place. Analogous to the well or the water tower there are device drivers that act as a source of the data, where the tap or the sink represents the place the data is going to, for example to some output device as your terminal window or a file on disk, or a network destination. Just as water, data in a pipeline flows in one direction, by convention from the left to the right.

A program that runs in a pipeline is called a stage. A program can run in more than one place in a pipeline - these occurrences function independent of each other. The pipeline specification is processed by the pipeline compiler, and it must be contained in a character string; on the commandline, it needs to be between quotes, while when contained in a file, it needs to be between the delimiters of a NetRexx string. An exclamation mark (!) is used as stage separator, while the solid vertical bar | can be used as an option when specifiying the local option for the pipe, after the pipe name. When looking a two adjaced segments in a pipeline, we call the left stage the producer and the stage on the right the consumer, with the stage separator as the connector.

A device driver reads from a device (for instance a file, the command prompt, a machine console or a network connection) or writes to a device; in some cases it can both read and write. An example of a device drivers are diskr for diskread and diskw for diskwrite; these read and write data from and to files. A pipeline can take data from one input device and write it to a different device. Within the pipeline, data can be modified in almost any way imaginable by the programmer. The simplest process for the pipeline is to read data from the input side and copy it unmodified to the output side. The pipeline compiler connects these programs; it uses one program for each device and connects them together. All pipeline segments run on their own thread and are scheduled by the pipeline scheduler. The inherent characteristic of the pipeline is that any program can be connected to any other program because each obtains data and sends data throug a device independent standard interface. The pipeline usually processes one record (or line) at a time. The pipeline reads a record for the input, processes it and sends it to the output. It continues until the input source is drained.

Until now everything was just theory, but now we are going to show how to compile and run a pipeline. The executable script pipe is included in the NetRexx distribution to specify a pipeline and to compile NetRexx source that contains pipelines. Pipelines can be specified on the command line or in a file, but will always be compiled to a .class file for execution in the JVM.

pipe ”(hello) literal ”hello world” ! console”

This specifies a pipeline consisting of a source stage literal that puts a string (“hello world”) into the pipeline, and a console sink, that puts the string on the screen. The pipe compiler will echo the source of the pipe to the screen - or issue messages when something was mistyped. The name of the classfile is the name of the pipe, here specified between parentheses. Options also go there. We call execute the pipe by typing:

java hello

Now we have shown the obligatory example, we can make it more interesting by adding a reverse stage in between:

pipe ”(hello) literal ”hello world” ! reverse ! console

When this is executed, it dutifully types “dlrow olleh”. If we replace the string after literal with arg(), we then can start the hello pipeline with a an argument to reverse: and we run it with:

java hello a man a plan a canal panama

and it will respond:

amanap lanac a nalp a nam a

which goes to show that without ignoring space no palindrome is very convincing - which we can remedy with a change to the pipeline: use the change stage to take out the spaces:

pipe”(hello) literal arg() ! change /” ”// ! console”

Now for the interesting parts. Whole pipeline topologies can be added, webservers can be built, relational databases (all with a jdbc driver) can be queried. For people that are familiar with the z/VM CMS Pipelines product, most of its reference manual is relevant for this implementation. We are working on the new documentation to go with NetRexx 3.04.

Pipes for NetRexx are the work of Ed Tomlinson, Jeff Hennick, with contributions by Chuck Moore, myself, and others. Pipes were the first occasion I have laid eyes on NetRexx, and I am happy they now have found their place in the NetRexx open source distribution. To have a look at it, download the NetRexx source from the Kenai site (https://kenai.com/projects/netrexx ) and build a 3.04 version yourself. Alternatively, wait until the 3.04 package hits http://www.netrexx.org.
This post is part of the Java Advent Calendar and is licensed under the Creative Commons 3.0 Attribution license. If you like it, please spread the word by sharing, tweeting, FB, G+ and so on!

Java Advent Calendar

Saturday, December 13, 2014

CMS Pipelines ... for NetRexx on the JVM

No comments:

Post a Comment