The way I'm thinking of approaching it is to make a metaprogramming language that takes individual simple programs and gives the computer two pieces of information: which things can be done in parallel, and which parts of the meta-program depend on which other parts. Here's a diagram:
This meta-program consists of 5 smaller programs arranged in columns from left to right. Every column depends on results from one or more programs in a column to its left. Each program has a list of things it needs to do, and the meta-program handles communication between them. The parallelization comes in, for instance, in function f at the top left. Because its to-do list is unordered, the meta-program knows that everything on the list can happen at the same time, so in this case it runs 3 simultaneous copies of f, one per item. Everything else in the first column can also run concurrently, since programs in that column have no dependencies (there is nothing further to the left). But h at the bottom left, unlike f, has an ordered list, so its items have to happen sequentially.
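As a minimal sketch of that distinction, here is how a meta-scheduler might treat the two kinds of to-do lists, assuming Python's `concurrent.futures` as the execution backend. The bodies of f and h are hypothetical stand-ins; the point is only that the unordered list fans out across workers while the ordered list stays sequential.

```python
from concurrent.futures import ThreadPoolExecutor

def f(item):
    # One copy of program f handling a single to-do item (hypothetical work).
    return f"f({item})"

def h(item):
    # One step of program h; steps must run in order (hypothetical work).
    return f"h({item})"

f_todo = {"a", "b", "c"}   # unordered list: safe to run concurrently
h_todo = ["x", "y", "z"]   # ordered list: must run one after another

# Unordered: launch one copy of f per item, all at the same time.
with ThreadPoolExecutor(max_workers=3) as pool:
    f_results = set(pool.map(f, f_todo))

# Ordered: run h's steps strictly in sequence.
h_results = [h(item) for item in h_todo]
```

The scheduler only needs the ordered/unordered tag on each list to make this choice; the programs themselves stay unchanged.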
g is an indexed function: multiple copies of the same program, each waiting on a different input from one of the copies of f. And j is an example of a program that depends on everything to its left and is supposed to reply back to h sequentially to get its next input. The meta-program keeps track of when all of j's inputs have been satisfied so that it can be run; that happens once every copy of g finishes and j has received an input from h.
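That dependency tracking could be sketched with futures and a message queue, again assuming Python. The arithmetic inside f, g, and j is made up purely for illustration; what matters is that each indexed copy of g waits on one copy of f, and j fires only after all g copies finish and one h input arrives.

```python
from concurrent.futures import ThreadPoolExecutor, wait
import queue

def f(i):
    return i * 2          # hypothetical work for copy i of f

def g(x):
    return x + 1          # one indexed copy of g, consuming one f result

def j(g_results, h_input):
    # j runs only once every copy of g has finished and h has sent an input.
    return sum(g_results) + h_input

h_to_j = queue.Queue()     # channel the meta-program uses for h -> j messages

with ThreadPoolExecutor() as pool:
    # One copy of f per to-do item; one indexed copy of g per copy of f.
    f_futures = [pool.submit(f, i) for i in range(3)]
    g_futures = [pool.submit(lambda fut=fut: g(fut.result()))
                 for fut in f_futures]

    h_to_j.put(10)         # h hands j its next input

    # The meta-program tracks j's dependencies: all g copies plus one h input.
    wait(g_futures)
    result = j([fut.result() for fut in g_futures], h_to_j.get())
```

A real implementation would generalize this bookkeeping instead of hard-coding it, but the shape is the same: the meta-program, not the individual programs, owns the knowledge of who waits on whom.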
Depending on how many processors the system has, the meta-program could run many times faster than the same work executed sequentially, step by step.
So the next step would be to find a good syntax for the metaprogramming language and write an implementation. I think this is the gist of a good way to approach the problem, though.
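Before settling on a syntax, one option is to prototype the spec as plain data. The sketch below is a hypothetical declarative form (not a proposed final syntax) covering the four programs named above, plus a small scheduler that groups programs into columns: everything whose dependencies are already satisfied can run in the same wave.

```python
# Hypothetical declarative spec: each program declares its to-do kind
# ("unordered" may parallelize, "ordered" may not, "indexed" fans out
# per upstream copy) and which programs it depends on.
meta_program = {
    "f": {"todo": "unordered", "deps": []},
    "h": {"todo": "ordered",   "deps": []},
    "g": {"todo": "indexed",   "deps": ["f"]},
    "j": {"todo": "ordered",   "deps": ["g", "h"]},
}

def schedule(spec):
    """Group programs into waves (the diagram's columns): a program joins
    a wave once all of its dependencies appear in earlier waves."""
    done, waves = set(), []
    while len(done) < len(spec):
        wave = sorted(name for name, prog in spec.items()
                      if name not in done
                      and all(d in done for d in prog["deps"]))
        if not wave:
            raise ValueError("cyclic dependency in meta-program")
        waves.append(wave)
        done.update(wave)
    return waves
```

Running `schedule(meta_program)` reproduces the left-to-right column structure of the diagram, which suggests the eventual syntax mainly needs to express these two facts per program: its to-do kind and its dependency list.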