While large language models have been shown to be very useful, they easily outstrip available computing resources. There is already a language model server implementation as part of Moses, but it is not well integrated with the decoding algorithm: using the LM server adds considerable overhead because of the latency of the individual TCP/IP requests.
It would therefore be preferable to make these requests in large batches instead of one at a time. This would require organizing the decoding algorithm around such packed requests. The algorithm would create hypotheses in two stages: first building them without language model scores while collecting the n-grams whose probabilities are needed, then sending those n-grams to the server in one batch and folding the returned scores back into the hypotheses (see the sketch below).
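A minimal sketch of what such two-stage, batch-oriented expansion might look like follows. All types and functions here (Hypothesis, TranslationOption, LMClient, ScoreBatch, ExpandStack) are hypothetical placeholders for illustration, not part of Moses or its LM server, and many details of real stack decoding (recombination, pruning, distortion) are omitted.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// An n-gram whose probability has to be fetched from the LM server.
using NGram = std::vector<std::string>;

// A phrase-table entry applied during expansion (heavily simplified).
struct TranslationOption {
    std::vector<std::string> targetWords;
    double score = 0.0;  // translation-model score
};

struct Hypothesis {
    std::vector<std::string> targetWords;  // target words produced so far
    double score = 0.0;                    // score without the LM component
};

// Hypothetical client: one packed TCP request carrying many n-grams,
// returning their log-probabilities in the same order.
struct LMClient {
    std::vector<double> ScoreBatch(const std::vector<NGram>& batch) {
        // Placeholder: a real client would serialize the batch, send it in
        // a single TCP request, and parse the server's response.
        return std::vector<double>(batch.size(), -1.0);
    }
};

// Expand a stack of hypotheses with a set of translation options.
std::vector<Hypothesis> ExpandStack(const std::vector<Hypothesis>& stack,
                                    const std::vector<TranslationOption>& options,
                                    LMClient& lm, std::size_t order = 3) {
    // Stage 1: create the new hypotheses without LM scores, collecting the
    // n-grams whose probabilities will be needed.
    std::vector<Hypothesis> expanded;
    std::vector<NGram> batch;
    std::vector<std::size_t> owner;  // which new hypothesis each n-gram scores
    for (const Hypothesis& hyp : stack) {
        for (const TranslationOption& opt : options) {
            Hypothesis next = hyp;
            next.score += opt.score;
            for (const std::string& word : opt.targetWords) {
                next.targetWords.push_back(word);
                // n-gram ending in the newly added word (up to `order` words)
                std::size_t len = std::min(order, next.targetWords.size());
                owner.push_back(expanded.size());
                batch.emplace_back(next.targetWords.end() - len,
                                   next.targetWords.end());
            }
            expanded.push_back(next);
        }
    }

    // One packed request instead of one TCP round trip per n-gram.
    const std::vector<double> logProbs = lm.ScoreBatch(batch);

    // Stage 2: fold the returned LM scores back into the hypotheses.
    for (std::size_t i = 0; i < logProbs.size(); ++i) {
        expanded[owner[i]].score += logProbs[i];
    }
    return expanded;
}
```

The point of this organization is that all LM probabilities for one stack expansion travel in a single request, so the per-request TCP latency is paid once per expansion rather than once per n-gram.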
There are a couple of ways to optimize this further, such as using a randomized LM to filter requests or using multi-threading so that no time is wasted waiting for responses, but these would be beyond the scope of a short project.
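For the first of those optimizations, the idea would be to consult a small local structure before sending the batch, so that n-grams the full LM certainly does not contain never reach the server. The sketch below assumes a single-hash, Bloom-filter style stand-in for a randomized LM; RandomizedLM, MayContain and FilterBatch are hypothetical names, not existing Moses components.

```cpp
#include <bitset>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using NGram = std::vector<std::string>;

// Single-hash, Bloom-filter style stand-in for a randomized LM, used only as
// an approximate membership test over the n-grams of the full model.
class RandomizedLM {
 public:
    // Record an n-gram that exists in the full LM (done once, offline).
    void Insert(const NGram& ng) { bits_.set(Index(ng)); }

    // False positives are possible, false negatives are not: a "false"
    // answer means the n-gram is certainly absent, so the server request
    // can be skipped and a local back-off estimate used instead.
    bool MayContain(const NGram& ng) const { return bits_.test(Index(ng)); }

 private:
    static std::size_t Index(const NGram& ng) {
        std::string joined;
        for (const std::string& w : ng) { joined += w; joined += ' '; }
        return std::hash<std::string>{}(joined) % kBits;
    }
    static constexpr std::size_t kBits = 1 << 20;  // 128 KB of bits
    std::bitset<kBits> bits_;
};

// Drop requests that the filter rules out before sending the batch.
std::vector<NGram> FilterBatch(const std::vector<NGram>& batch,
                               const RandomizedLM& filter) {
    std::vector<NGram> kept;
    for (const NGram& ng : batch) {
        if (filter.MayContain(ng)) kept.push_back(ng);
    }
    return kept;
}
```

Multi-threading could then be layered on top, for example by letting one thread assemble the next batch while another waits for the server's reply, but as noted above that is outside the scope of a short project.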