Assignment 2
Exercise:
Implementing a parallel command runner.
Change Log
- Assignment Released in draft form (2023-10-30)
- Turned "no busy-waiting" into a design excellence challenge (2023-11-04)
- Clarified error behaviour (2023-11-04)
- Released assignment, with clarification about autotests and updates to examples to clarify EOL buffering. (2023-11-09)
- Updated Lazy/Eager to be lowercase, and fixed typos (2023-11-11).
Pars
The autotests are still unfortunately delayed due to some technical issues we're having with the CSE servers. We're working on it, but the rest of the assignment is now out of draft.
A common requirement for programmers is to run multiple programs concurrently. Several tools can do this, but one of the most popular is “GNU Parallel”. It is a Perl script which takes a list of commands and runs them concurrently. The most impressive part is that it can run commands concurrently not only on your machine, but also on other computers that it connects to. Within UNSW, we use this feature to mark hundreds of assignments concurrently; elsewhere it's used for research and for hobby projects as well.
In this assignment, you will be implementing a subset of the functionality of GNU Parallel in Rust.
There are 6 parts to this assignment:
- Run commands in sequence.
- Run commands concurrently, with a limit to the number of lines running at once.
- Add support for termination control (halt soon vs. halt immediately)
- Run all commands on a single remote host, in sequence.
- Run all commands on a single remote host, concurrently. You may only connect to each remote host once at a time.
- Run commands across multiple hosts, concurrently.
There is also an extension section. Marks you gain in the extension section can make up for lost marks elsewhere in the assignment.
The goals of this assignment are:
- To experience concurrent programming in Rust.
- To practice designing and structuring a larger Rust program.
- To focus on skills and design patterns that would actually be used when solving modern programming problems (i.e. writing a pull request, writing re-usable code, etc.)
- To have fun making a useful application.
We want to also be explicit about what the goals aren’t:
- To teach, or assess, any content related to networking. You do not need any knowledge of TCP, IP, or any other networking concepts. If something seems related to that, and you think you need to understand it, please ask on the forums.
- To teach, or assess, any content related to using terminals. You will need to know what a terminal command is, and how to use rust to run one; but nothing more complex.
- To assess your ability to write large amounts of useless cruft solely for us to mark you on. Where you’re writing code, we have a reason for it. Where you’re writing text, it’s because we want to genuinely understand your thinking.
What you will submit
At the end you must submit a tarfile that passes the autotests. This means it must:
- Contain a crate named ‘pars’ which creates a ‘pars’ binary.
- Contain a mark_request.txt file.
Reference Solution
This assignment has a reference solution, available at: 6991 pars. To test it with birdie, you can run: 6991 start-birdie-reference (port), which will start a birdie machine on the given port. The reference solution will then work correctly on that birdie machine.
Required Knowledge
This section contains the required knowledge for this assignment.
There are some important commands we use regularly in the tests for this assignment:
- echo "some text": this is the equivalent of println!("some text"); in Rust.
- sleep 3: this command does nothing, but waits for 3 seconds before returning.
- cat filename: this command outputs the contents of the file filename.
- /bin/false: this command always fails -- it returns an exit code other than 0.
We will also use other commands we've created in the tests, most of which will just print a standard output. Your system must be able to run any command (even though our tests only test a limited subset).
When you type a line of text into a terminal, it is interpreted as a series of commands, each split into what are called “arguments”. The first argument on a command line is the name of a program to run. After it, more arguments can follow, and these modify the behaviour of the program. For example, cargo run is calling the cargo program with the additional argument run. Arguments are usually separated by spaces.
You will receive input until you reach end-of-file (EOF). In a terminal, you can usually signal EOF with Ctrl-D on Linux.
Commands can be separated by a semicolon (;). Separate commands are executed entirely separately; so sleep 1; echo "hello" waits for 1 second, and then prints “hello”.
On a command line, you can use 'single quotes' and "double quotes" to make a collection of words into a single argument. Operators (like the ; operator which you will be asked to implement in this assignment) are ignored inside of quotes.
You may also hear the following terms:
- stdin: “Standard Input” is the way that programs receive input from a user while they’re running. This is any text that is typed into the terminal.
- stdout: This is anything the program prints (for example, using println!()).
- stderr: This is a special output for error messages. Rust writes to this using eprintln!().
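As a rough illustration of reading input until EOF, here is a minimal sketch. The helper name for_each_line is hypothetical (it is not part of the starter code), and a byte slice stands in for the terminal so the example runs on its own; a real program would pass io::stdin().lock() instead.

```rust
use std::io::{self, BufRead};

/// Calls `handle` once per line of `input`, stopping at end-of-file.
/// `for_each_line` is a hypothetical helper, not part of the starter code.
fn for_each_line<R: BufRead>(input: R, mut handle: impl FnMut(String)) -> io::Result<()> {
    // `lines()` yields each line without its trailing newline and ends at EOF.
    for line in input.lines() {
        handle(line?);
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Assumption: a byte slice stands in for stdin here. In a real program,
    // pass `io::stdin().lock()`; the loop ends when the user sends EOF.
    let fake_stdin = "echo hello\nsleep 1; echo world\n".as_bytes();
    for_each_line(fake_stdin, |line| println!("got line: {line}"))
}
```

Handling each line as it arrives (rather than collecting them all first) matters later on, because lines must start executing as soon as possible.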
Later in the assignment, you will be expected to use the command ssh. This is a command line tool that connects to another machine, and opens a terminal or runs a command. To connect to a remote machine, it needs a hostname (like login.cse.unsw.edu.au), and a port number (like 22). SSH also requires either a password, or a keyfile. In this assignment, we’ll only use keyfiles. A keyfile is basically a password in the form of a file, rather than text.
This command below shows you how you might open a connection to a remote machine:
ssh -i ~/path/to/keyfile -p 22 localhost
And to run a command on a remote machine:
ssh -i ~/path/to/keyfile -p 22 localhost echo "this command is run remotely"
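If you are curious how such an invocation might look from Rust, the sketch below builds (but does not run) the same ssh command with std::process::Command. The function name ssh_command and its parameters are hypothetical; the provided library already wraps this for you, so treat this only as an illustration of how arguments map onto a Command.

```rust
use std::process::Command;

/// Builds, without running, an `ssh` invocation for one remote command.
/// `ssh_command` is a hypothetical helper used only for illustration.
fn ssh_command(keyfile: &str, port: u16, host: &str, remote_argv: &[&str]) -> Command {
    let mut cmd = Command::new("ssh");
    cmd.arg("-i")
        .arg(keyfile)
        .arg("-p")
        .arg(port.to_string())
        .arg(host)
        .args(remote_argv);
    cmd
}

fn main() {
    let cmd = ssh_command(
        "/path/to/keyfile",
        22,
        "localhost",
        &["echo", "this command is run remotely"],
    );
    // `{:?}` shows the program and its arguments; nothing is executed here.
    println!("{cmd:?}");
}
```

Note that Command does not go through a shell, so a path beginning with ~ is not expanded for you.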
We will discuss this more in the second part of the assignment.
Part 1: Pars on a single machine
You have been provided starter code, which is very minimal. You will need to write almost all the Rust code yourself.
Your pars command in Part 1 should always take a -J argument, which tells you the number of jobs to run concurrently. For example, pars -J 4 should run 4 jobs concurrently. This argument will be optional in Part 2.
1.1: Running one command at a time (10%)
Each line of standard input represents a command or series of commands. You should treat each line individually. You can expect that you will never receive a command split by a newline.
You can use the pars_lib::parse_line function to get a Vec<Vec<String>>, which contains a list of all the commands on a given line, each split into its individual arguments.
Finally, use the Command struct from the std::process module to execute the commands, in order. The first element of each split vector is the command, and the rest of the elements are the arguments.
The order in which you execute your commands is important: commands listed on the same line must be executed sequentially (one after the other). Commands on different lines can be executed concurrently. You must start executing a line as soon as you are able to. You must always start lines in the order they were given (i.e. you can't reorder commands).
You should execute the commands in the order listed. If one of the commands exits unsuccessfully (i.e. with a non-zero exit status), do not execute any of the following commands on that line. You should still execute the commands on following lines.
Once all commands from one line have been executed, start the next line.
$ pars -J 1
echo "hello"; echo "world"
hello
world
echo "you can see this"; /bin/false; echo "can't see this"
you can see this
echo "cheeky; echo semicolon"
cheeky; echo semicolon
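The behaviour in the session above can be sketched roughly as follows. This is a minimal sketch, not the reference implementation: run_line is a hypothetical name, and the hand-built vector stands in for the Vec<Vec<String>> you would get from parsing a line.

```rust
use std::io;
use std::process::Command;

/// Runs the commands of one parsed line in order, stopping at the first
/// failure. Returns Ok(true) only if every command exited successfully.
/// `run_line` is a hypothetical helper name.
fn run_line(commands: &[Vec<String>]) -> io::Result<bool> {
    for argv in commands {
        // The first element is the program; the rest are its arguments.
        let Some((program, args)) = argv.split_first() else {
            continue; // skip an empty command
        };
        let status = Command::new(program).args(args).status()?;
        if !status.success() {
            return Ok(false); // do not run the rest of this line
        }
    }
    Ok(true)
}

fn main() -> io::Result<()> {
    // Hand-built stand-in for a parsed line (assumption: same shape as the
    // parser's output).
    let line = vec![
        vec!["echo".to_string(), "you can see this".to_string()],
        vec!["/bin/false".to_string()],
        vec!["echo".to_string(), "can't see this".to_string()],
    ];
    let succeeded = run_line(&line)?;
    println!("line succeeded: {succeeded}");
    Ok(())
}
```

Here the child processes inherit the terminal's stdout directly; from 1.2 onwards you will want to capture the output instead so it can be buffered.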
1.2: Running multiple commands at a time (20%)
You will now extend your program so that you can run commands from different lines at the same time. While commands on different lines can be executed concurrently, commands on the same line should be executed in sequence.
If a command from one line fails, do not execute the rest of the commands from that line. However, you should execute commands from all the other lines.
$ pars -J 2
echo "hello"
hello
sleep 2
echo "world"
world
You should keep the standard output of a command buffered, and only print it when the entire line is finished. We will not test your program’s behaviour around standard error. We won't test your program with any commands which are non-existent, or with lines which can't be parsed by the parse_line function.
You can see an example of this below:
$ pars -J 2
sleep 2; echo "hello"
echo "world"
world
hello
$ pars -J 2
sleep 2; echo "hello"
echo "world"; sleep 1; /bin/false; sleep 1; echo "hello"
world
hello
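One possible shape for this, sketched under some assumptions: a fixed pool of worker threads (one per -J job) pulls lines from a shared channel in order, buffers each line's stdout, and sends the finished buffer to the output side. The names run_buffered and run_pool are hypothetical, and a real program would feed lines in as they are read rather than all at once.

```rust
use std::process::Command;
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

type Line = Vec<Vec<String>>;

/// Runs one line's commands in order and returns its buffered stdout.
fn run_buffered(line: &[Vec<String>]) -> String {
    let mut buffer = String::new();
    for argv in line {
        let Some((program, args)) = argv.split_first() else { continue };
        let Ok(output) = Command::new(program).args(args).output() else { break };
        buffer.push_str(&String::from_utf8_lossy(&output.stdout));
        if !output.status.success() {
            break; // skip the rest of this line after a failure
        }
    }
    buffer
}

/// Runs `lines` on `jobs` worker threads; returns outputs in completion order.
fn run_pool(lines: Vec<Line>, jobs: usize) -> Vec<String> {
    let (job_tx, job_rx) = mpsc::channel::<Line>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (out_tx, out_rx) = mpsc::channel::<String>();

    let workers: Vec<_> = (0..jobs)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let out_tx = out_tx.clone();
            thread::spawn(move || loop {
                // The lock is released at the end of this statement, so it
                // is never held while a line is running.
                let next = job_rx.lock().unwrap().recv();
                let Ok(line) = next else { break }; // queue closed: no more work
                if out_tx.send(run_buffered(&line)).is_err() {
                    break;
                }
            })
        })
        .collect();
    drop(out_tx); // only the workers hold output senders now

    for line in lines {
        job_tx.send(line).expect("job queue receiver is still alive");
    }
    drop(job_tx); // closing the queue lets idle workers exit

    let finished: Vec<String> = out_rx.iter().collect();
    for worker in workers {
        worker.join().expect("worker thread panicked");
    }
    finished
}

fn main() {
    let argv = |words: &[&str]| words.iter().map(|w| w.to_string()).collect::<Vec<_>>();
    let lines = vec![
        vec![argv(&["sleep", "1"]), argv(&["echo", "hello"])],
        vec![argv(&["echo", "world"])],
    ];
    for text in run_pool(lines, 2) {
        print!("{text}");
    }
}
```

Because workers take lines from a single queue, lines always start in input order, and every wait is a blocking call rather than a polling loop.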
1.3: Termination Control (20%)
You will now implement an extra command line argument, called -e or --halt. This can be followed by one of three arguments: never, lazy and eager (these should be lowercase, though a previous update of the spec had them as uppercase; we won't penalise for either, but we recommend lowercase). Each of these controls the behaviour when a command you run fails (i.e. returns a non-zero exit code).
Here is what each of these means:
- never is the default, and is what you have already implemented up to this stage. It means that if there is an unsuccessful command on one line, you should not execute the rest of the line, but you should continue to execute the other lines as normal.
- lazy means that once a command fails on one line, you should not start any new lines. Lines that are already running will finish, but no further lines should be executed.
- eager means that once a failure occurs, commands that are already running can finish, but no new commands should begin (either on the same line, or on other lines).
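The three modes above could be modelled along these lines. This is only a sketch: HaltMode and HaltState are hypothetical names, and accepting uppercase spellings as well as lowercase is a choice made here, not a requirement.

```rust
use std::str::FromStr;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical type for the value given to -e / --halt.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum HaltMode {
    #[default]
    Never,
    Lazy,
    Eager,
}

impl FromStr for HaltMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: either case is accepted; lowercase is recommended.
        match s.to_ascii_lowercase().as_str() {
            "never" => Ok(Self::Never),
            "lazy" => Ok(Self::Lazy),
            "eager" => Ok(Self::Eager),
            other => Err(format!("unknown halt mode: {other}")),
        }
    }
}

/// Shared between threads (e.g. behind an Arc) to record the first failure.
struct HaltState {
    mode: HaltMode,
    failed: AtomicBool,
}

impl HaltState {
    fn new(mode: HaltMode) -> Self {
        Self { mode, failed: AtomicBool::new(false) }
    }

    fn record_failure(&self) {
        self.failed.store(true, Ordering::SeqCst);
    }

    /// lazy and eager both stop new lines after a failure.
    fn may_start_line(&self) -> bool {
        self.mode == HaltMode::Never || !self.failed.load(Ordering::SeqCst)
    }

    /// Only eager also stops new commands on lines that are already running.
    fn may_start_command(&self) -> bool {
        self.mode != HaltMode::Eager || !self.failed.load(Ordering::SeqCst)
    }
}

fn main() {
    let state = HaltState::new("lazy".parse().expect("valid mode"));
    state.record_failure();
    println!("new line allowed: {}", state.may_start_line());
    println!("new command allowed: {}", state.may_start_command());
}
```

The rule that a failing line never runs its own remaining commands applies in every mode, so it is left to whatever code runs a single line.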
Part 2: Remote Execution
For this part of the assignment, you will need to deal with remote machines. Rather than all your programs running concurrently on your own machine, in remote mode your program should only connect to other machines, and all the actual computation should happen there.
Your program's output should appear exactly the same as if the programs were running locally.
In theory, the assignment should work with any remote machine; we suggest using our tool birdie to create a virtual machine on CSE which only you can connect to.
To create a virtual machine running at a particular port, use the command 6991 start-birdie <port>.
Notes on Birdie
Note that this creates an SSH key at ~/.ssh/cs6991/. You can use this key to SSH into that machine at any time. The RemoteCommand implementation in pars_libs (provided) already knows that this key exists, and will automatically use it to connect. This means that you can simply create a Command struct, and call remote_spawn on it to connect to another machine.
Birdie may work on your own local machine; but we don’t guarantee it does.
If you start birdie on a particular port, it will create a new home directory for the virtual machine which is unique to that port. Therefore:
$ 6991 start-birdie 12345
(and on a separate terminal)
$ 6991 start-birdie 12346
will create two different virtual machines, each with their own home directory. Separately,
$ 6991 start-birdie 12345
Ctrl+C
$ 6991 start-birdie 12345
will re-use the same home directory for both machines (so any files you created in the previous VM will be retained).
2.1: Run commands on a remote machine (10%)
The first step in this part of the assignment is to accept a new argument: -r or --remote. It should take a string like address.of.machine:6991/1. The part before the : is the machine’s hostname, and the 6991 is the port. After the / is the number of concurrent processes to run on that machine. To complete this part, you will only need to run one process on a remote machine at a time. In other words, you are guaranteed that the number after the slash will be 1. If -r is provided, you will never be given a -J argument.
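Parsing a remote string of this shape might look like the sketch below. RemoteSpec and its field names are hypothetical (the provided library may already offer a type for this), and the error messages are only illustrative.

```rust
use std::str::FromStr;

/// Hypothetical parsed form of a string like "address.of.machine:6991/1".
#[derive(Debug, Clone, PartialEq, Eq)]
struct RemoteSpec {
    host: String,
    port: u16,
    slots: usize,
}

impl FromStr for RemoteSpec {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Split from the right: "<host>:<port>" then "/<slots>".
        let (address, slots) = s
            .rsplit_once('/')
            .ok_or_else(|| format!("missing '/' in {s:?}"))?;
        let (host, port) = address
            .rsplit_once(':')
            .ok_or_else(|| format!("missing ':' in {s:?}"))?;
        if host.is_empty() {
            return Err(format!("missing hostname in {s:?}"));
        }
        let port = port
            .parse::<u16>()
            .map_err(|e| format!("bad port {port:?}: {e}"))?;
        let slots = slots
            .parse::<usize>()
            .map_err(|e| format!("bad process count {slots:?}: {e}"))?;
        Ok(Self { host: host.to_string(), port, slots })
    }
}

fn main() {
    match "address.of.machine:6991/1".parse::<RemoteSpec>() {
        Ok(spec) => println!("{spec:?}"),
        Err(message) => eprintln!("invalid remote: {message}"),
    }
}
```

Implementing FromStr means the same type can be parsed directly by most argument-parsing crates, and malformed input becomes an error value rather than a panic.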
2.2: Run commands concurrently on a remote machine, only having one connection (20%)
To complete this section of the assignment, all you need to do is change your remote implementation so that you run multiple commands on the same remote machine. In other words, you should correctly support a remote string with a number greater than 1 after the /.
Importantly, you may only have one connection to a remote machine at a time. You must not initiate multiple connections concurrently. You must also follow the rules for the ordering of commands (i.e. you must start commands in order, as soon as you can).
In order to help solve this problem, note that birdie has the --install-binary option. This allows you to place a binary from your computer onto the remote machine. In other words, if you said --install-binary path/to/binary, then you'd be able to use binary on the remote machine just as if it were installed.
The simplest approach is to build your pars binary so it can function both as a server and a client, depending on where it has been started. You can, alternatively, write a separate binary for the client.
This means you can write another Rust program (or re-use your existing one!) to run multiple remote programs at once.
2.3: Run commands on multiple remote machines, sharing the load as required (20%)
To complete this section of the assignment, you should support connecting to multiple remote machines. In other words, your program should be able to accept more than one --remote flag.
Your program should continue to work the same as before: particularly, you must execute processes in the order they are given on standard input. You must execute them as soon as you can (i.e. you can't pre-allocate some processes to a particular machine). You must support the eager and lazy modes as before.
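One way to avoid pre-allocating work is to treat every free process slot on every machine as a token in a channel. The sketch below assumes that approach; dispatch is a hypothetical name, and the spawned threads only pretend to run a line so that the example stays self-contained.

```rust
use std::sync::mpsc;
use std::thread;

/// Assigns each line, in input order, to whichever remote slot frees up first.
/// Returns (line, host) pairs in completion order. Hypothetical sketch: the
/// spawned thread only records the pairing instead of running anything.
fn dispatch(lines: Vec<String>, remotes: &[(&str, usize)]) -> Vec<(String, String)> {
    // With no slots at all, waiting for a free one would block forever.
    if remotes.iter().map(|&(_, slots)| slots).sum::<usize>() == 0 {
        return Vec::new();
    }

    // One token per process slot; holding a token means "may run on host".
    let (slot_tx, slot_rx) = mpsc::channel::<String>();
    for &(host, slots) in remotes {
        for _ in 0..slots {
            slot_tx.send(host.to_string()).expect("slot receiver is alive");
        }
    }

    let (done_tx, done_rx) = mpsc::channel();
    for line in lines {
        // Blocks (no polling) until any machine has a free slot.
        let host = slot_rx.recv().expect("slot sender is alive");
        let (slot_tx, done_tx) = (slot_tx.clone(), done_tx.clone());
        thread::spawn(move || {
            // A real program would run `line` on `host` here.
            let _ = done_tx.send((line, host.clone()));
            let _ = slot_tx.send(host); // hand the slot back
        });
    }
    drop(done_tx); // the receiver ends once every worker has finished
    done_rx.iter().collect()
}

fn main() {
    let lines: Vec<String> = (1..=5).map(|i| format!("echo {i}")).collect();
    for (line, host) in dispatch(lines, &[("machine-a", 1), ("machine-b", 2)]) {
        println!("{line:?} ran on {host}");
    }
}
```

Lines are still handed out strictly in input order; only the choice of machine depends on which slot happens to be free.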
Extension: Using libssh
This section is extension work. Marks gained in this section can offset up to 10 marks lost in any other part of the assignment. We're not going to be able to give detailed help for this section, and it'll require your own research and work.
So far in this assignment, you’ve used the ssh command in order to connect to remote computers. To complete the extension, remove all usages of the ssh command, and use the libssh2_sys crate instead. The example code provided here, as well as the documentation here, will likely be very useful to you.
Design Excellence
If you choose not to do the extension task, you may choose to complete a design excellence task to get full marks in the design section of this assignment.
- Implement this assignment using only asynchronous Rust (i.e. async/await, using a system like Tokio).
- Implement the assignment using no busy-waiting or polling. In other words, the entire assignment should use only blocking calls, and rely on channels or mutexes to wake threads when they're ready.
- Please add any other design excellence suggestions in the forums.
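To illustrate what "no busy-waiting" can look like in practice, here is a minimal sketch of a counting semaphore built from a Mutex and a Condvar. The Slots type is hypothetical; the point is that acquire sleeps inside wait_while until another thread calls release, instead of repeatedly checking a flag in a loop.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical counting semaphore limiting how many jobs run at once.
struct Slots {
    free: Mutex<usize>,
    changed: Condvar,
}

impl Slots {
    fn new(count: usize) -> Self {
        Self { free: Mutex::new(count), changed: Condvar::new() }
    }

    /// Blocks until a slot is free, then takes it.
    fn acquire(&self) {
        let guard = self.free.lock().unwrap();
        // The thread sleeps here; it is woken only by `notify_one`.
        let mut guard = self.changed.wait_while(guard, |free| *free == 0).unwrap();
        *guard -= 1;
    }

    /// Returns a slot and wakes one waiting thread, if any.
    fn release(&self) {
        *self.free.lock().unwrap() += 1;
        self.changed.notify_one();
    }
}

fn main() {
    let slots = Arc::new(Slots::new(2));
    let handles: Vec<_> = (0..5)
        .map(|job| {
            let slots = Arc::clone(&slots);
            thread::spawn(move || {
                slots.acquire();
                println!("job {job} is running");
                slots.release();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Blocking channel receives achieve the same effect and are often simpler; which tool fits best depends on your design.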
Other Information
How to design your system
We're hoping the design for the first part of the assignment is relatively straightforward: you'll need an input thread, output thread, and some threads for running commands. The threads will need to talk to each other using the tools we've discussed in lectures so far.
The second part of the assignment is more interesting: you'll need to have part of your program running on your own machine. The part on your own machine (we'll call it the "server") will need to start up a "client" (another program) on the remote machine. You can communicate to the client over standard input and output, and you will need to develop a protocol (a shared language) through which the client and server can talk. The client will need to run programs, whereas the server will manage getting input, printing output, and telling which clients to run which programs.
Below you can see a diagram which shows the basic structure of the system which we suggest for part 2:
         SERVER THREADS              |          CLIENT THREADS
                                     |
+------------------------------+     |
| Input Thread (scans in text) |     |
+------------------------------+     |
               |                     |
               v                     |
+------------------------------+     |        +----------------+
| Control Thread               |     |   +--->| Client Thread  |---+
| (reads from input, starts    |-----|---+    | (runs program) |   |
| a thread to run programs)    |     |   |    +----------------+   |
+------------------------------+     |   |                         |
                                     |   |    +----------------+   |
+------------------------------+     |   +--->| Client Thread  |---+
| Output Thread                |     |        | (runs program) |   |
| (takes text from clients and |     |        +----------------+   |
| prints it out)               |<----|-----------------------------+
+------------------------------+     |
We suggest that your client and server should be the same program, and you might use a command-line argument like --client to indicate to the program it should run in "client mode". You can alternatively write a separate Rust program to act as the client. When you run 6991 start-birdie it will automatically make the pars binary you're working on run on the remote machine.
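As one possible starting point for the protocol, the sketch below frames each reply from the remote client as a header line followed by a fixed number of bytes. Everything here is an assumption made for illustration: the Reply type, its fields, and the header layout are hypothetical, and you are free to design something quite different.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical reply sent from the remote client back to the server.
#[derive(Debug, PartialEq, Eq)]
struct Reply {
    id: u64,
    success: bool,
    stdout: Vec<u8>,
}

/// Writes "<id> <0|1> <length>\n" followed by exactly <length> bytes.
fn write_reply<W: Write>(mut w: W, reply: &Reply) -> io::Result<()> {
    writeln!(w, "{} {} {}", reply.id, u8::from(reply.success), reply.stdout.len())?;
    w.write_all(&reply.stdout)?;
    w.flush()
}

/// Reads one reply, or returns Ok(None) at a clean end-of-file.
fn read_reply<R: BufRead>(mut r: R) -> io::Result<Option<Reply>> {
    let mut header = String::new();
    if r.read_line(&mut header)? == 0 {
        return Ok(None);
    }
    let bad = || io::Error::new(io::ErrorKind::InvalidData, "malformed reply header");
    let mut fields = header.split_whitespace();
    let id = fields.next().and_then(|f| f.parse::<u64>().ok()).ok_or_else(bad)?;
    let success = match fields.next() {
        Some("1") => true,
        Some("0") => false,
        _ => return Err(bad()),
    };
    let len = fields.next().and_then(|f| f.parse::<usize>().ok()).ok_or_else(bad)?;
    // The length prefix lets the buffered output contain newlines safely.
    let mut stdout = vec![0; len];
    r.read_exact(&mut stdout)?;
    Ok(Some(Reply { id, success, stdout }))
}

fn main() -> io::Result<()> {
    let reply = Reply { id: 7, success: true, stdout: b"hello\nworld\n".to_vec() };
    // A Vec<u8> stands in for the pipe between the two programs.
    let mut wire = Vec::new();
    write_reply(&mut wire, &reply)?;
    let decoded = read_reply(wire.as_slice())?;
    println!("round trip ok: {}", decoded.as_ref() == Some(&reply));
    Ok(())
}
```

Because the functions are generic over Write and BufRead, the same code works on an in-memory buffer in tests and on a child process's stdin and stdout in the real program.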
Submission
See the instructions down the bottom of the page.
Using Other Crates
We are happy for you to use any crate that has been published on crates.io under three conditions:
- The crate must not have been authored by anyone else in the course.
- The crate must have at least 1000 downloads, excluding the last 30 days.
- The crate must not impose license restrictions which require you to share your own code.
If you are in doubt (or think these restrictions unfairly constrain you from using a reasonable crate), ask on the course forum.
Marking Scheme
There are 3 things on which you will be marked:
- Mechanical Style (10% of the total marks for this assignment.)
- Functional Correctness (40% of the total marks for this assignment, times the percentage of the assignment completed)
- Idiomatic Design (50% of the total marks for this assignment, times the percentage of the assignment completed)
And a detailed analysis is shown below:
1. Mechanical Style (10%):
We will look at your crates, and make sure they:
- Compile, with no warnings or errors.
- Raise no issues with 6991 cargo clippy.
- Are formatted with rustfmt (you can run 6991 cargo fmt to auto-format your crate).
- Have any tests written for them pass.
If they do all of the above, you get full marks. Otherwise, we will award partial marks. This is meant to be the "easy marks" of programming.
2. Functional Correctness (40%):
You should pass the provided test cases. We will vary the test case very slightly during marking, to ensure you haven't just hard-coded things; but we're not going to do anything that's not just changing around some commands and re-ordering things.
3. Idiomatic Design (50%):
Your code should be well designed. This is where we will spend most of our time when marking. To help you, we have provided "design excellence" suggestions, which are ideas to make your design really excellent. You don't have to do them, but they would be good ways of getting a great design.
The following list of properties will be marked in your program:
- Code is abstracted appropriately.
- Types are used appropriately to express data in the program.
- The design does not impose unnecessary constraints on either the caller or callee through borrowing, lifetimes or ownership.
- Uses traits sensibly to add expressiveness and avoid unnecessary code.
- Data structures used are appropriate to store data.
- Functions perform error handling; cases that are expected do not panic.
- Code is sensibly organised, and split into appropriate modules.
- Documentation, where provided, is correct and readable.
- The crate does not use loops to wait for events on other threads. It should use appropriate concurrency tools like Mutexes and channels. (now a design excellence)
- (optional) Uses external crates effectively to achieve the above goals.
- (optional) Where code is designed in a sub-optimal way, comments about how to improve it are made under "Design Limitations".
IMPORTANT: your marks for the assignment are not the percentage of tests which you pass. We'll scale the tests to fit in with the weights described above.
You must complete the checklist in the mark_request file faithfully. If you do not fill in the file, you may receive reduced Idiomatic Design marks.
Your mark will be calculated based on the feedback you have received:
100% of available marks | No negative criteria, and one design excellence suggestion has been implemented. |
85% of available marks | Some minor comments are made about some of the above criteria. Above this mark, one design excellence suggestion will have been implemented. |
75% of available marks | Major comments are made about one or two criteria, with multiple small comments in different areas. |
65% of available marks | Major comments are made about three or more criteria. |
50% of available marks | Many areas have major comments made. |
below 50% of available marks | Assignments in this category are likely written as "translations from C", and ignore many Rust features and design patterns. |
Note that the following penalties apply to your total mark for plagiarism:
0 for the assignment | Knowingly providing your work to anyone and it is subsequently submitted (by anyone). |
0 for the assignment | Submitting any other person's work. This includes joint work. |
0 FL for COMP6991 | Paying another person to complete work. Submitting another person's work without their consent. |
Formal Stuff
Assignment Conditions
-
Joint work is not permitted on this assignment.
This is an individual assignment.
The work you submit must be entirely your own work. Submission of any work even partly written by any other person is not permitted.
The only exception is if you use small amounts (< 10 lines) of general purpose code (not specific to the assignment) obtained from a site such as Stack Overflow or other publicly available resources. You should attribute the source of this code clearly in an accompanying comment.
Assignment submissions will be examined, both automatically and manually for work written by others.
Do not request help from anyone other than the teaching staff of COMP6991.
Do not post your assignment code to the course forum.
Rationale: this assignment is an individual piece of work. It is designed to develop the skills needed to produce an entire working program. Using code written by or taken from other people will stop you learning these skills.
-
The use of code-synthesis tools is permitted on this assignment; however, beware -- the code they create can be subtly broken or introduce design flaws. It is your job to figure out what code is good. Your code is your responsibility. If your AI assistant blatantly plagiarises code from another author which you then submit, you will be held accountable.
Rationale: this assignment is intended to mimic the real world. These tools are available in the real world. However, you must be careful to use these tools cautiously and ethically.
-
Sharing, publishing, distributing your assignment work is not permitted.
Do not provide or show your assignment work to any other person, other than the teaching staff of COMP6991. For example, do not share your work with friends.
Do not publish your assignment code via the internet. For example, do not place your assignment in a public GitHub repository. You can publish Workshops or Labs (after they are due), but assignments are large investments for the course and worth a significant amount; so publishing them makes it harder for us and tempts future students.
Rationale: by publishing or sharing your work you are facilitating other students to use your work, which is not permitted. If they submit your work, you may become involved in an academic integrity investigation.
-
Sharing, publishing, distributing your assignment work after the completion of COMP6991 is not permitted.
For example, do not place your assignment in a public GitHub repository after COMP6991 is over.
Rationale: COMP6991 sometimes reuses assignment themes, using similar concepts and content. If students in future terms can find your code and use it, which is not permitted, you may become involved in an academic integrity investigation.
Violation of the above conditions may result in an academic integrity investigation with possible penalties, up to and including a mark of 0 in COMP6991 and exclusion from UNSW.
Relevant scholarship authorities will be informed if students holding scholarships are involved in an incident of plagiarism or other misconduct. If you knowingly provide or show your assignment work to another person for any reason, and work derived from it is submitted - you may be penalised, even if the work was submitted without your knowledge or consent. This may apply even if your work is submitted by a third party unknown to you.
If you have not shared your assignment, you will not be penalised if your work is taken without your consent or knowledge.
For more information, read the UNSW Student Code, or contact the course account.
When you are finished working on this exercise, you must submit your work by running give:
This exercise cannot be submitted with 6991 give-crate. Instead, please package your crate(s) (along with all other files intended for submission) up into a tar file named crate.tar.
tar cvf crate.tar <path1> <path2> <path3> ...
e.g.:
tar cvf crate.tar ./my_crate1/ ./my_crate2/
Finally, submit with the following command:
give cs6991 assign02_pars crate.tar
The due date for this exercise is Week 11 Monday 21:00:00.
You must run give before Week 11 Monday 21:00:00 to obtain the marks for this exercise. Note that this is an individual exercise; the work you submit with give must be entirely your own.