Pipeline Creation Language (PCL)

Building pipelines can be tedious and error-prone. Using Moses scripts to build pipelines is hampered by the fact that each script needs to be able to parse the output of the previous one. Moving scripts to different positions in the pipeline is therefore tricky and may require a code change! It would be better if scripts were re-usable without change, so that users could build up a library of computational pieces that can be used in any pipeline, in any position.

Since pipelines are widely used in machine translation, and given the problem outlined above, a more convenient and less error-prone way of building pipelines quickly, with re-usable components, would aid their construction.

A domain-specific language called Pipeline Creation Language (PCL) has been developed as part of the MosesCore project (European Commission Grant Number 288487 under the 7th Framework Programme). PCL enables users to gather components into libraries, or packages, and re-use them in pipelines. Each component defines inputs and outputs, which the PCL compiler checks to verify that components are compatible with each other.
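The compatibility check can be pictured with a small model in which every component is just a set of named input ports and a set of named output ports. The sketch below is hypothetical and is not PCL's compiler: the `Component` class, `check_sequence` function and the simplified rule (every input of the second component must be an output of the first) are assumptions made for illustration. The port names are borrowed from the tokeniser wrapper used later in this section, and the `counter` component is invented.

```python
# Hypothetical model of PCL-style port checking (not the PCL compiler itself).
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    inputs: frozenset    # names of input ports
    outputs: frozenset   # names of output ports


class IncompatibleComponents(Exception):
    """Raised when a component's inputs are not supplied by its predecessor."""


def check_sequence(first: Component, second: Component) -> Component:
    """Check `first >>> second` and return the composite component.

    Simplified rule (an assumption): every input port of `second`
    must be produced as an output port of `first`.
    """
    missing = second.inputs - first.outputs
    if missing:
        raise IncompatibleComponents(
            f"{second.name} needs {sorted(missing)}, "
            f"which {first.name} does not produce")
    # The composite takes the first component's inputs and
    # exposes the second component's outputs.
    return Component(f"{first.name} >>> {second.name}",
                     first.inputs, second.outputs)


# Port names follow the tokeniser wrapper; "counter" is purely hypothetical.
tokeniser = Component("tokeniser", frozenset({"filename"}),
                      frozenset({"tokenised_filename"}))
counter = Component("counter", frozenset({"tokenised_filename"}),
                    frozenset({"counts"}))

print(check_sequence(tokeniser, counter).name)  # tokeniser >>> counter
try:
    check_sequence(counter, tokeniser)  # wrong order: ports do not line up
except IncompatibleComponents as err:
    print("rejected:", err)
```

Because the check only looks at port names, it can reject a badly ordered pipeline before any script is run, which is the benefit the paragraph above describes.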

PCL is a general-purpose language that can be used to construct non-recurrent software pipelines. To adapt your existing programs and scripts for use with PCL, a Python wrapper must be defined for each program. This builds up a library of components which are combined with others in PCL files. The Python wrapper scripts must implement the following function interface:

Once your library of components has been written, the components can be combined using the PCL language. A PCL file defines one component, which uses other defined components. For example, the following file defines a component that performs tokenisation for source and target files.

 #
 # Component definition: 2 input ports, 2 output ports
 #
 #                 +---------+
 # src_filename -->+         +--> tokenised_src_filename
 #                 |         |
 # trg_filename -->+         +--> tokenised_trg_filename
 #                 +---------+
 #
 import wrappers.tokenizer.tokenizer as tokeniser

 component src_trg_tokeniser
  inputs (src_filename), (trg_filename)
  outputs (tokenised_src_filename), (tokenised_trg_filename)
  configuration tokeniser.src.language,
                tokeniser.src.tokenisation_dir,
                tokeniser.trg.language,
                tokeniser.trg.tokenisation_dir,
                tokeniser.moses.installation
  declare
    src_tokeniser := new tokeniser with
      tokeniser.src.language -> language,
      tokeniser.src.tokenisation_dir -> tokenisation_dir,
      tokeniser.moses.installation -> moses_installation_dir
    trg_tokeniser := new tokeniser with
      tokeniser.trg.language -> language,
      tokeniser.trg.tokenisation_dir -> tokenisation_dir,
      tokeniser.moses.installation -> moses_installation_dir
  as
    wire (src_filename -> filename),
         (trg_filename -> filename) >>>
    (src_tokeniser *** trg_tokeniser) >>>
    wire (tokenised_filename -> tokenised_src_filename),
         (tokenised_filename -> tokenised_trg_filename)
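The `wrappers.tokenizer.tokenizer` module imported in this example is one such Python wrapper. The sketch below is hypothetical: the function names `get_inputs`, `get_outputs`, `get_configuration` and `initialise`, and the `runner` parameter, are assumptions rather than the interface PCL requires (see the PCL manual for that). Only the port names (`filename`, `tokenised_filename`) and configuration names (`language`, `tokenisation_dir`, `moses_installation_dir`) are taken from the PCL file above. It drives the Moses `scripts/tokenizer/tokenizer.perl` script, which reads standard input and writes standard output.

```python
# Hypothetical tokeniser wrapper; function names are assumptions,
# port and configuration names match the PCL example.
import os
import subprocess


def get_inputs():
    return ["filename"]


def get_outputs():
    return ["tokenised_filename"]


def get_configuration():
    return ["language", "tokenisation_dir", "moses_installation_dir"]


def initialise(config, runner=subprocess.run):
    """Return a function mapping input-port values to output-port values.

    `runner` is injectable (hypothetical) so the command can be faked in tests.
    """
    script = os.path.join(config["moses_installation_dir"],
                          "scripts", "tokenizer", "tokenizer.perl")

    def process(inputs):
        src = inputs["filename"]
        # tokenisation_dir is assumed to exist already.
        out = os.path.join(config["tokenisation_dir"],
                           os.path.basename(src) + ".tok")
        # tokenizer.perl reads stdin and writes stdout; -l sets the language.
        with open(src) as fin, open(out, "w") as fout:
            runner(["perl", script, "-l", config["language"]],
                   stdin=fin, stdout=fout, check=True)
        return {"tokenised_filename": out}

    return process
```

In the PCL file, `src_tokeniser` and `trg_tokeniser` would then be two instances of this wrapper, each given its own `language` and `tokenisation_dir` through the `declare` block.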

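A PCL file is composed of the following parts, all of which appear in the example above: import statements, which bring wrapped components into scope under a name (here tokeniser); the component declaration, which names the component being defined; its inputs and outputs; its configuration keys; a declare block, which instantiates imported components and maps configuration keys onto each instance's own configuration; and an as block, which defines the pipeline itself.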

The definition of a component can use the following pre-defined components:

The combinator operators used to compose the pipeline are:

Examples in the PCL Git repository show the usage of these operators and pre-defined components. In addition, an example Moses training pipeline is available in the contrib/arrow-pipelines directory of the mosesdecoder Git repository. Please see contrib/arrow-pipelines/README for details of how to compile and run this pipeline.
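The `as` block of the tokeniser example uses `>>>` and `***`, which share their names with the arrow combinators in Haskell's Control.Arrow: `f >>> g` runs `f` and feeds its result to `g`, and `f *** g` applies `f` to the first element of a pair and `g` to the second. The Python model below is a sketch under stated assumptions, not PCL's implementation. In particular, `wire_split` and `wire_merge` are hypothetical helpers that mimic the two `wire` steps of the example (one dictionary split into a pair, then a pair merged back into one dictionary), and `fake_tokeniser` stands in for the real wrapper.

```python
# Hypothetical model of PCL's >>> and *** over dicts of named values.
from typing import Any, Callable

Fn = Callable[[Any], Any]


def seq(f: Fn, g: Fn) -> Fn:
    """f >>> g: run f, then feed its result to g."""
    return lambda x: g(f(x))


def par(f: Fn, g: Fn) -> Fn:
    """f *** g: apply f to the first element of a pair, g to the second."""
    return lambda pair: (f(pair[0]), g(pair[1]))


def wire_split(left: dict[str, str], right: dict[str, str]) -> Fn:
    """Rename keys of one dict into a pair of dicts (assumed semantics)."""
    return lambda d: ({new: d[old] for old, new in left.items()},
                      {new: d[old] for old, new in right.items()})


def wire_merge(left: dict[str, str], right: dict[str, str]) -> Fn:
    """Rename keys of a pair of dicts into one dict (assumed semantics)."""
    def merge(pair):
        out = {new: pair[0][old] for old, new in left.items()}
        out.update({new: pair[1][old] for old, new in right.items()})
        return out
    return merge


def fake_tokeniser(d):
    """Stand-in for the tokeniser wrapper: only derives an output name."""
    return {"tokenised_filename": d["filename"] + ".tok"}


# Mirrors: wire ... >>> (src_tokeniser *** trg_tokeniser) >>> wire ...
src_trg_tokeniser = seq(
    seq(wire_split({"src_filename": "filename"},
                   {"trg_filename": "filename"}),
        par(fake_tokeniser, fake_tokeniser)),
    wire_merge({"tokenised_filename": "tokenised_src_filename"},
               {"tokenised_filename": "tokenised_trg_filename"}))

print(src_trg_tokeniser({"src_filename": "corpus.fr",
                         "trg_filename": "corpus.en"}))
# {'tokenised_src_filename': 'corpus.fr.tok', 'tokenised_trg_filename': 'corpus.en.tok'}
```

The renaming steps are what let the same tokeniser, with its single `filename` input, serve both the source and the target side without any change to the wrapper.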

For more details on how to use PCL, please see the latest manual at

 contrib/arrow-pipelines/python/pcl/documentation/pcl-manual.latest.pdf