Issue
I'm trying to develop a module that calls user-defined grading scripts to grade programming homework.
There are at least 2 roles of users -- teachers and students. Teachers upload the grading scripts and students upload the programming homework.
The following code is a much-simplified version of it.
    @PostMapping("/upload/spring")
    @ResponseBody
    String uploadSpringHomework(Authentication authentication, @RequestParam("file") MultipartFile file) throws IOException {
        CustomUserDetails userDetails = (CustomUserDetails) authentication.getPrincipal();
        User user = userRepository.findById(userDetails.getId()).get();
        String fileName = StringUtils.cleanPath(file.getOriginalFilename());
        saveHomework(file, fileName);
        Grade grade = gradeHomework(user, fileName);
        saveGrade(grade, user);
        return grade.toString();
    }
There are mainly 3 steps in the grading process: receiving the homework (saveHomework()), grading the uploaded homework (gradeHomework()), and saving the grade (saveGrade()).
I guess the following 2 parts would support concurrency well if I add a bit more code, e.g. creating an isolated directory for each user.
    private static void saveHomework(MultipartFile file, String fileName) {
        Path path = Paths.get(TEST_UPLOAD_DIR + fileName);
        try {
            Files.copy(file.getInputStream(), path, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void saveGrade(Grade grade, User user){
        grade.setProblemName(TEST_PROBLE_MNAME);
        userRepository.save(user);
        grade.setUser(user);
        grade.setSubmittedTime(new Date());
        gradeRepository.save(grade);
    }
The main concern is the following part (gradeHomework()).
For the sake of teachers' convenience, the module takes python scripts as grading scripts that save grade and details in a file (TEST_GRADING_REPORT), which seems to be a bad design.
    private static Grade gradeHomework(User user, String fileName)
            throws StreamReadException, DatabindException, IOException {
        ProcessBuilder processBuilder = new ProcessBuilder("python3", TEST_UPLOAD_DIR + TEST_GRADING_SCRIPT, fileName,
                user.getUsername());
        processBuilder.redirectErrorStream(true);
        try {
            Process process = processBuilder.start();
            List<String> results = readProcessOutput(process.getInputStream());
            System.out.println("results: " + results);
        } catch (IOException e) {
            e.printStackTrace();
        }
        ObjectMapper mapper = new ObjectMapper();
        Grade grade = mapper.readValue(Paths.get(TEST_UPLOAD_DIR + TEST_GRADING_REPORT).toFile(),
                Grade.class);
        return grade;
    }

    private static List<String> readProcessOutput(InputStream inputStream) throws IOException {
        try (BufferedReader output = new BufferedReader(new InputStreamReader(inputStream))) {
            return output.lines()
                    .collect(Collectors.toList());
        }
    }
I guess there are several sub-parts that need improvement to support concurrency, e.g. running python3 <script_name>, getting the grade, etc. Could someone give me a hint on how to do this?
In conclusion, how do I make calling a Python script support concurrency in Spring Boot? By concurrency, I mean allowing multiple students to upload their homework and get their grades at the same time.
PS: Each grading task (Python script) takes about 50ms on average.
Solution
Since you are running this in a web container, each web request will be naturally concurrent. The container keeps the requests isolated and provides basic thread-safety. No concurrency problems for you there ...
The obvious way to deal with the Python scripts (potentially) running simultaneously is:
- create a unique subdirectory for each "grading task",
- run the task with a separate python command, and
- use the subdirectory as the command's current / working directory.
Refer to the ProcessBuilder javadocs to see how to specify the child process's current directory.
Then all you need to do is to process the "TEST_GRADING_REPORT" file written to the respective grading task's current directory when the task has completed. And then (maybe) clean up.
Note: I am assuming that the grading script writes the report file into its current directory. To me, that is implied by your description of the problem. (If my assumption is incorrect, please update the question to explain precisely where the report file is written.)
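As a rough illustration of the steps above, here is a minimal sketch, not a definitive implementation. The class name `IsolatedGradingTask`, the method `runInOwnDirectory`, and the log file name `script-output.log` are hypothetical, and it relies on the same assumption as the note above: the script writes its report into its current directory. Because the working directory changes for each task, the script path and homework path in the command would need to be absolute.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper: runs one grading task in its own directory.
class IsolatedGradingTask {

    /**
     * Runs the command in a fresh subdirectory of baseDir and returns the text
     * of the report file that the command is assumed to write there.
     */
    static String runInOwnDirectory(Path baseDir, List<String> command, String reportName)
            throws IOException, InterruptedException {
        Files.createDirectories(baseDir);
        // Unique directory per task, so simultaneous tasks never share a report file.
        Path taskDir = Files.createTempDirectory(baseDir, "grading-");
        try {
            Process process = new ProcessBuilder(command)
                    .directory(taskDir.toFile())   // the child's working directory
                    .redirectErrorStream(true)
                    // Send output to a file so the child cannot block on a full pipe.
                    .redirectOutput(taskDir.resolve("script-output.log").toFile())
                    .start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("Grading script exited with code " + exitCode);
            }
            byte[] report = Files.readAllBytes(taskDir.resolve(reportName));
            return new String(report, StandardCharsets.UTF_8);
        } finally {
            deleteRecursively(taskDir);            // clean up, even on failure
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
        }
    }
}
```

In the question's code, the command would be something like `python3`, the absolute path of the grading script, the absolute path of the uploaded file, and the username; the returned text could then be handed to `ObjectMapper.readValue` in place of reading a shared file.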
There are potentially other issues that you need to address. For example:
- What happens if the student's code goes into an infinite loop and the marking script doesn't complete? You need some timeout mechanism to detect this and (potentially) kill the script. Hint: check the Process javadocs ...
- What happens if the marking is going to take longer than the timeout on the submission web request? Should you actually be doing the marking asynchronously? (And if that is the case, do you really need to run the script simultaneously at all? You could have a queue and a single worker thread to run the scripts ...)
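Both points can be sketched together. The following is only one possible shape, under stated assumptions: the class name `TimedGradingQueue`, its methods, and the log file name are hypothetical. It uses `Process.waitFor(long, TimeUnit)` and `Process.destroyForcibly()` for the timeout, and a single-threaded `ExecutorService` as the queue with one worker.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: queues grading commands and runs them one at a time.
class TimedGradingQueue {

    // A single worker thread; submitted tasks wait in the executor's queue.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    /** Queues a command; the Future eventually yields the script's exit code. */
    Future<Integer> submit(List<String> command, File workingDir, long timeoutMillis) {
        Callable<Integer> task = () -> runWithTimeout(command, workingDir, timeoutMillis);
        return worker.submit(task);
    }

    /** Runs the command and kills it if it has not finished within the timeout. */
    static int runWithTimeout(List<String> command, File workingDir, long timeoutMillis)
            throws IOException, InterruptedException, TimeoutException {
        Process process = new ProcessBuilder(command)
                .directory(workingDir)
                .redirectErrorStream(true)
                // Write output to a file so nothing blocks on an unread pipe.
                .redirectOutput(new File(workingDir, "script-output.log"))
                .start();
        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            process.destroyForcibly();   // e.g. the student's code loops forever
            process.waitFor();           // wait until the process has actually exited
            throw new TimeoutException("Grading script exceeded " + timeoutMillis + " ms");
        }
        return process.exitValue();
    }

    /** Stops accepting new tasks; already queued tasks still run. */
    void shutdown() {
        worker.shutdown();
    }
}
```

A timed-out task surfaces as an `ExecutionException` whose cause is a `TimeoutException` when `Future.get()` is called. Note that `destroyForcibly()` terminates the script process itself; any processes the script started may need separate handling.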
You wrote:
For the sake of teachers' convenience, the module takes python scripts as grading scripts that save grade and details in a file (TEST_GRADING_REPORT), which seems to be a bad design.
On the contrary, this is a common pattern for simple stand-alone applications. Not bad design. And it is not a difficult problem for your code to deal with.
Answered By - Stephen C
Answer Checked By - Mildred Charles (JavaFixing Admin)