Message communication between supervisor and child processes in Elixir
Processes are one of the core concepts in Elixir/Erlang, and sending and receiving messages between them is crucial. So how should the communication be done in the spider project I introduced in my last blog post?
Let us take a look at the different scenarios below.
Messages for itself
Spider.Fetcher has only one job: it asks Spider.Manager for one URL, crawls the page content and sends back the result.
This job should run regularly, at a configured interval, both after initialization and after crawling each URL. Hence, Spider.Fetcher sends a message to itself at these two points, as you can see below.
```elixir
def init(state = {spider_config, index, manager_pid}) do
```
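Only the first line of the original snippet survives, so here is a minimal sketch of the same self-scheduling idea. It is not the project's Spider.Fetcher: the module name SketchFetcher, the :fetch message and the use of Process.send_after/3 are assumptions for illustration. The process schedules its first message in init/1 and the next one after handling each :fetch.

```elixir
defmodule SketchFetcher do
  # Hypothetical module illustrating a process that sends messages to itself.
  use GenServer

  def start_link(interval_ms), do: GenServer.start_link(__MODULE__, interval_ms)

  @impl true
  def init(interval_ms) do
    # Point 1: schedule the first fetch right after initialization.
    schedule_fetch(interval_ms)
    {:ok, %{interval: interval_ms, fetched: 0}}
  end

  @impl true
  def handle_info(:fetch, state) do
    # A real fetcher would ask its manager for a URL and crawl it here;
    # this sketch only counts how many times it ran.
    state = %{state | fetched: state.fetched + 1}
    # Point 2: schedule the next fetch after finishing one crawl.
    schedule_fetch(state.interval)
    {:noreply, state}
  end

  @impl true
  def handle_call(:fetched, _from, state), do: {:reply, state.fetched, state}

  # Send :fetch to this very process after interval_ms milliseconds.
  defp schedule_fetch(interval_ms), do: Process.send_after(self(), :fetch, interval_ms)
end
```

Process.send_after/3 returns a timer reference; keeping it in the state would allow cancelling a pending fetch with Process.cancel_timer/1.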
Normal Messages between Parent and Child
As we can see in the code above, Spider.Fetcher sends two different types of messages to its Spider.Manager process:
- :next_url for asking for the next URL to crawl.
- :result for sending back the crawled result.
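A rough sketch of this child-to-parent exchange follows. It assumes a hypothetical protocol in which the manager answers :next_url with {:url, url}; the ChildToParent.* modules and every message shape other than :next_url and :result are made up for illustration, and plain send/2 stands in for whatever the original uses (it may be GenServer.cast or call). The fetcher receives the manager pid at start-up and keeps it in its state.

```elixir
defmodule ChildToParent.Manager do
  # Hypothetical manager: hands out URLs and collects crawl results.
  use GenServer

  def start_link(urls), do: GenServer.start_link(__MODULE__, urls)

  @impl true
  def init(urls), do: {:ok, %{urls: urls, results: []}}

  @impl true
  def handle_info({:next_url, fetcher_pid}, %{urls: [url | rest]} = state) do
    send(fetcher_pid, {:url, url})
    {:noreply, %{state | urls: rest}}
  end

  # No URLs left: ignore the request.
  def handle_info({:next_url, _fetcher_pid}, state), do: {:noreply, state}

  def handle_info({:result, url, body}, state) do
    {:noreply, %{state | results: [{url, body} | state.results]}}
  end

  @impl true
  def handle_call(:results, _from, state), do: {:reply, Enum.reverse(state.results), state}
end

defmodule ChildToParent.Fetcher do
  use GenServer

  # The manager pid is passed in once and kept in the state, so no name
  # lookup is needed to reach the parent later.
  def start_link(manager_pid), do: GenServer.start_link(__MODULE__, manager_pid)

  @impl true
  def init(manager_pid) do
    send(manager_pid, {:next_url, self()})
    {:ok, manager_pid}
  end

  @impl true
  def handle_info({:url, url}, manager_pid) do
    # Stand-in for real crawling.
    send(manager_pid, {:result, url, "body of " <> url})
    send(manager_pid, {:next_url, self()})
    {:noreply, manager_pid}
  end
end
```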
The manager_pid is captured in the fetcher's state, because leaking the process naming logic into the fetcher and using name registration to look the manager up again are unnecessary.
For the same reason, the process ids of the child processes are captured when they are created. I can send messages to them if I need to collect the child processes’ status or start/stop them.
For example, in SpiderEngine.Manager, all_engines is a map that uses the spider name as the key and the spider process id as the value.
```elixir
def handle_info(:kickoff, _) do
```
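A sketch of keeping child pids in a map like all_engines, assuming the children are started under a DynamicSupervisor. An Agent stands in for a real spider, and EngineMapSketch and its functions are hypothetical. The captured pids are used directly to query each child's status and to stop a single child.

```elixir
defmodule EngineMapSketch do
  # Start one Agent per spider name and keep the returned pids in a map
  # of name => pid.
  def start_all(sup, names) do
    Map.new(names, fn name ->
      spec = {Agent, fn -> %{name: name, status: :running} end}
      {:ok, pid} = DynamicSupervisor.start_child(sup, spec)
      {name, pid}
    end)
  end

  # Use the captured pids to collect each child's status.
  def statuses(all_engines) do
    Map.new(all_engines, fn {name, pid} -> {name, Agent.get(pid, & &1.status)} end)
  end

  # Stop one child through its pid, then drop it from the map.
  def stop(sup, all_engines, name) do
    :ok = DynamicSupervisor.terminate_child(sup, Map.fetch!(all_engines, name))
    Map.delete(all_engines, name)
  end
end
```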
Message for Parent Supervisor Termination
Why the case
It might seem counter-intuitive to let a child process terminate the whole supervisor, but it makes sense in a real scenario.
During spider initialization, if the spider encounters an unrecoverable issue, say an unreachable target website or a 401 status code because it has been blocked, you might prefer to take the spider down rather than keep retrying.
Which direction
In my case, Spider.Manager.Facebook decides whether to bring down the whole supervision tree below Spider.Facebook. But how? Should it send a signal directly to Spider.Facebook and ask it to exit? Or should it go up the ladder and ask SpiderEngine.Manager to terminate it?
TL;DR: Never call Process.exit with the parent process id in a child process. Use Supervisor.stop/3 or DynamicSupervisor.terminate_child/2 instead.
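A minimal sketch of the second route, under stated assumptions: a hypothetical StopViaParent.Manager owns a DynamicSupervisor of spider supervisors, and a worker deep inside one spider's tree asks the manager, through an asynchronous message, to remove its whole spider with DynamicSupervisor.terminate_child/2. All module and message names here are illustrative, not the project's.

```elixir
defmodule StopViaParent.Worker do
  use GenServer

  def start_link({manager, name}), do: GenServer.start_link(__MODULE__, {manager, name})

  @impl true
  def init(arg), do: {:ok, arg}

  # On an unrecoverable error, ask the manager (outside this subtree) to
  # stop the whole spider instead of calling Process.exit on a parent.
  @impl true
  def handle_cast(:give_up, {manager, name} = state) do
    send(manager, {:stop_spider, name})
    {:noreply, state}
  end
end

defmodule StopViaParent.Spider do
  # One supervisor per spider, with a single worker below it.
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: Supervisor.init([{StopViaParent.Worker, arg}], strategy: :one_for_one)
end

defmodule StopViaParent.Manager do
  use GenServer

  def start_link(engines_sup), do: GenServer.start_link(__MODULE__, engines_sup)

  @impl true
  def init(engines_sup), do: {:ok, %{sup: engines_sup, spiders: %{}}}

  @impl true
  def handle_call({:start_spider, name}, _from, state) do
    {:ok, pid} = DynamicSupervisor.start_child(state.sup, {StopViaParent.Spider, {self(), name}})
    {:reply, pid, put_in(state.spiders[name], pid)}
  end

  def handle_call(:spiders, _from, state), do: {:reply, state.spiders, state}

  # Terminating the spider supervisor also shuts down everything below it.
  @impl true
  def handle_info({:stop_spider, name}, state) do
    {pid, spiders} = Map.pop(state.spiders, name)
    if pid, do: DynamicSupervisor.terminate_child(state.sup, pid)
    {:noreply, %{state | spiders: spiders}}
  end
end
```

The worker sends a message rather than making a synchronous call because stopping the spider also terminates the worker, so a reply would never arrive.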
Process.exit
I vaguely remembered an API called Process.exit/2, so I wrote some modules to test it.
```elixir
defmodule Test.Root do
```
Here are the test code and its result.
```elixir
iex(1)> [root, mgr, sup] = [Test.Root, Test.Manager, Test.Supervisor] |> Enum.map(&Process.whereis/1)
```
Step 5 shows that all the created processes are alive. But shouldn’t Test.Supervisor and two of its child processes have been terminated? Checking the children of the root and sup processes in steps 6 and 7 shows that:
- The sup and child_a processes are still alive.
- Process child_a is no longer under the supervision of Test.Supervisor.
After reading the documentation of Process.exit/2 more carefully and learning what trapping exits means, I finally understood the reasons behind this:
- Supervisors trap exits, so the exit signal is transformed into an {:EXIT, from, :normal} message.
- Process.exit(sup_pid, :normal) called from child_a makes Test.Supervisor think that child_a has terminated.
- Test.Supervisor removes child_a from its supervision tree even though the child_a process is actually alive.
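The behavior described in these points can be reproduced with a short sketch. It assumes a DynamicSupervisor and a :temporary worker, which is a guess at the original setup rather than the blog's actual Test.* modules; NormalExitSketch.Worker is hypothetical. The supervisor survives the :normal exit signal, the worker keeps running, and the supervisor no longer lists it as a child.

```elixir
defmodule NormalExitSketch.Worker do
  # :temporary children are removed, not restarted, when they "exit".
  use GenServer, restart: :temporary

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}

  # The child sends a :normal exit signal to its own supervisor. Because
  # the supervisor traps exits, it receives {:EXIT, child_pid, :normal}
  # and concludes that this child has terminated.
  @impl true
  def handle_call({:down, sup_pid}, _from, state) do
    Process.exit(sup_pid, :normal)
    {:reply, :ok, state}
  end
end
```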
According to the API doc, using :kill as the exit reason kills the process unconditionally. Does it work in this case?
If I change the handling of the :down message in Test.Worker as below:
```elixir
def handle_call({:down, sup_pid}, _from, state) do
```
The test result is:
```elixir
iex(1)> [root, mgr, sup] = [Test.Root, Test.Manager, Test.Supervisor] |> Enum.map(&Process.whereis/1)
```
In step 5, the supervision tree of sup is terminated. However, when I checked the children of root, there was a new process for Test.Supervisor: root restarted it because it did not exit normally.
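A sketch of the :kill variant, assuming a root Supervisor that starts a named DynamicSupervisor (KillSketch.Sup, a hypothetical name) as a permanent child; KillSketch.Worker is hypothetical too. The worker uses GenServer.cast because a call would never get its reply: the worker is linked to the supervisor and dies with it. The root then restarts the supervisor under a new pid.

```elixir
defmodule KillSketch.Worker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}

  # :kill cannot be trapped, so the supervisor dies unconditionally with
  # reason :killed, and this linked worker is taken down with it.
  @impl true
  def handle_cast({:down, sup_pid}, state) do
    Process.exit(sup_pid, :kill)
    {:noreply, state}
  end
end
```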
Impact of Restart Strategy
It is worth noting that if we keep using Process.exit(sup_pid, :normal) in Test.Worker but change the child spec of Test.Worker in Test.Supervisor by omitting the restart strategy, which defaults to :permanent, the test result might be misleading.
```elixir
def start_worker(id) do
```
Test result:
```elixir
iex(1)> [root, mgr, sup] = [Test.Root, Test.Manager, Test.Supervisor] |> Enum.map(&Process.whereis/1)
```
It looks like the supervisor process sup is terminated correctly. However, if you start your application with SASL reports enabled (iex --logger-sasl-reports true -S mix), you will see output similar to the following after step 4:
```
08:01:10.976 [error] Child :undefined of Supervisor Test.Supervisor terminated
```
Test.Supervisor actually tries to restart child_a, because it interprets Process.exit(sup_pid, :normal) as child_a terminating, but the restart keeps failing. In the end, Test.Supervisor shuts itself down due to :reached_max_restart_intensity. The final outcome seems to be what you need, but it does not happen the way you thought.
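A sketch reproducing this restart loop, under assumptions: the supervisor is a DynamicSupervisor and the worker registers itself under a fixed name, so every restart fails with an already-started error while the original child_a still holds that name. This is one plausible cause of the repeated failures, not necessarily the one in the original test; RestartLoopSketch.Worker is hypothetical.

```elixir
defmodule RestartLoopSketch.Worker do
  # No restart option given, so the child spec uses the default :permanent.
  use GenServer

  # Registering under a fixed name is an assumption made to reproduce a
  # restart that keeps failing: the original process keeps the name.
  def start_link(id), do: GenServer.start_link(__MODULE__, id, name: id)

  @impl true
  def init(id), do: {:ok, id}

  # The supervisor reads this :normal signal as "child_a exited" and,
  # since the child is :permanent, tries to start a replacement.
  @impl true
  def handle_call({:down, sup_pid}, _from, state) do
    Process.exit(sup_pid, :normal)
    {:reply, :ok, state}
  end
end
```

With the default limits (max_restarts: 3 within max_seconds: 5), the failing restarts quickly exhaust the restart intensity and the supervisor stops.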
Running Extrace with the command below before executing the test code shows what really happens behind the scenes.
```elixir
iex(1)> Extrace.calls([
```
