Message communication between supervisor and child processes in Elixir
Processes are one of the core concepts in Elixir/Erlang, and sending and receiving messages between them is crucial. So how should the communication be done in the spider project I introduced in my last blog post?
Let us take a look at the different scenarios below.
Messages for itself
Spider.Fetcher has only one job: it asks Spider.Manager for one URL, crawls the page content, and sends back the result.
This job should run at a configured interval, both after initialization and after each URL has been crawled. Hence, Spider.Fetcher should send a message to itself at these two points, as you can see below.
def init(state = {spider_config, index, manager_pid}) do
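As a minimal sketch of the self-messaging idea (not the actual Spider.Fetcher code), a GenServer can schedule a message to itself with Process.send_after/3 at both points. The module name FetcherSketch, the state shape, and the message shapes {:next_url, pid}, {:url, url} and {:result, url, body} are assumptions made up for this illustration.

```elixir
defmodule FetcherSketch do
  use GenServer

  # Hypothetical state shape: {interval_ms, manager_pid}.
  def start_link(interval_ms, manager_pid) do
    GenServer.start_link(__MODULE__, {interval_ms, manager_pid})
  end

  @impl true
  def init({interval_ms, _manager_pid} = state) do
    # First point: schedule the first round right after initialization.
    Process.send_after(self(), :crawl, interval_ms)
    {:ok, state}
  end

  @impl true
  def handle_info(:crawl, {_interval_ms, manager_pid} = state) do
    # Ask the manager for the next URL; the answer arrives as a message.
    send(manager_pid, {:next_url, self()})
    {:noreply, state}
  end

  def handle_info({:url, url}, {interval_ms, manager_pid} = state) do
    # Stand-in for the real crawl: send back a fake result.
    send(manager_pid, {:result, url, "content of " <> url})
    # Second point: schedule the next round once this URL is done.
    Process.send_after(self(), :crawl, interval_ms)
    {:noreply, state}
  end
end

# Usage: the calling process plays the manager role.
{:ok, fetcher} = FetcherSketch.start_link(10, self())

receive do
  {:next_url, ^fetcher} -> send(fetcher, {:url, "http://example.com"})
after
  1_000 -> :timeout
end

first_result =
  receive do
    {:result, url, body} -> {url, body}
  after
    1_000 -> :timeout
  end
```

The fetcher never needs an external timer: each round ends by scheduling the next one, so the interval is counted from the end of the previous crawl.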
Normal Messages between Parent and Child
As we can see in the code above, Spider.Fetcher sends two different types of messages to its Spider.Manager process.
- :next_url asks for the next URL to crawl.
- :result sends back the crawled result.
The manager_pid is captured in the fetcher's state, so there is no need to leak the process naming logic into the fetcher or to look the manager up again through name registration.
For the same reason, the process ids of the child processes are captured during creation. Messages can be sent through them if I need to collect the child processes' status or start/stop them.
For example, in SpiderEngine.Manager, all_engines is a map that uses the spider name as the key and the spider process id as the value.
def handle_info(:kickoff, _) do
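To illustrate the idea of capturing child pids at creation time, here is a simplified sketch. It is not the real SpiderEngine.Manager: the module name EngineManagerSketch and the {:status, name} call are hypothetical, an Agent stands in for each spider process, and the children are started in init/1 rather than in a :kickoff handler to keep the example short.

```elixir
defmodule EngineManagerSketch do
  use GenServer

  def start_link(spider_names) do
    GenServer.start_link(__MODULE__, spider_names)
  end

  # Hypothetical API: look up a spider by name and ask it for its status.
  def status(manager, name), do: GenServer.call(manager, {:status, name})

  @impl true
  def init(spider_names) do
    # Capture each child's pid when it is created, keyed by spider name.
    all_engines =
      Map.new(spider_names, fn name ->
        # An Agent stands in for the real spider process.
        {:ok, pid} = Agent.start_link(fn -> :idle end)
        {name, pid}
      end)

    {:ok, all_engines}
  end

  @impl true
  def handle_call({:status, name}, _from, all_engines) do
    reply =
      case Map.fetch(all_engines, name) do
        {:ok, pid} -> {:ok, Agent.get(pid, & &1)}
        :error -> {:error, :unknown_spider}
      end

    {:reply, reply, all_engines}
  end
end

{:ok, manager} = EngineManagerSketch.start_link(["facebook", "twitter"])
facebook_status = EngineManagerSketch.status(manager, "facebook")
missing_status = EngineManagerSketch.status(manager, "unknown")
```

Because the pids live in the manager's state, no child has to register a name, and the manager is the only place that knows how spiders are identified.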
Message for Parent Supervisor Termination
Why the case
It might seem counter-intuitive for a child process to ask for the whole supervisor to be terminated, but it is reasonable in a real scenario.
If a spider encounters unrecoverable issues during initialization, say the target website is unreachable or it receives a 401 status code because it is blocked, you might prefer to take down the spider instead of letting it keep trying.
Which direction
In my case, Spider.Manager.Facebook makes the decision to bring down the whole supervision tree below Spider.Facebook. But how? Should it directly send a signal to Spider.Facebook and ask it to exit? Or should it go one level higher and ask SpiderEngine.Manager to terminate it?
TL;DR: never call Process.exit with the parent process id in a child process. Use Supervisor.stop/3 or DynamicSupervisor.terminate_child/2 instead.
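A minimal sketch of the recommended route, assuming the spider's supervisor was started under a DynamicSupervisor: the termination is carried out by a process outside the spider's tree, here the script process playing the engine manager. The names engines and spider_sup are hypothetical, and an empty Supervisor stands in for a real spider tree.

```elixir
# A DynamicSupervisor plays the role of the engine-level parent.
{:ok, engines} = DynamicSupervisor.start_link(strategy: :one_for_one)

# An empty Supervisor stands in for one spider's supervision tree.
# restart: :temporary is an assumption so that it is not restarted afterwards.
spider_spec = %{
  id: :spider,
  start: {Supervisor, :start_link, [[], [strategy: :one_for_one]]},
  type: :supervisor,
  restart: :temporary
}

{:ok, spider_sup} = DynamicSupervisor.start_child(engines, spider_spec)

# Monitor the spider supervisor so we can observe its termination.
ref = Process.monitor(spider_sup)

# The parent terminates the child supervisor, including everything below it.
:ok = DynamicSupervisor.terminate_child(engines, spider_sup)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^spider_sup, reason} -> reason
  after
    1_000 -> :timeout
  end

remaining = DynamicSupervisor.which_children(engines)
```

In the spider project, the child would only send a request message to the manager that holds the supervisor pid; the manager then makes the call shown above, so the shutdown goes through the normal supervisor protocol.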
Process.exit
I vaguely remembered an API called Process.exit/2, so I wrote some modules to test it.
defmodule Test.Root do
Here is the testing code and result.
iex(1)> [root, mgr, sup] = [Test.Root, Test.Manager, Test.Supervisor] |> Enum.map(&Process.whereis/1)
Step 5 shows that all the created processes are still alive. But shouldn't Test.Supervisor and its two child processes have been terminated? Checking the children of the root and sup processes in steps 6 and 7 shows:
- The processes for sup and child_a are still alive.
- The child_a process is no longer under the supervision of Test.Supervisor.
After going through the documentation for Process.exit/2 more carefully and learning what trapping exits means, I finally found the reasons behind this.
- A supervisor traps exits, so the exit signal is transformed into an {:EXIT, from, :normal} message.
- Process.exit(sup_pid, :normal) called from child_a makes Test.Supervisor think that child_a has terminated.
- Test.Supervisor removes child_a from its supervision tree even though the child_a process is actually alive.
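The trapping behaviour can be reproduced without any supervisor. The sketch below uses two plain spawned processes (trapper and plain are names invented for this example): one traps exits like a supervisor does, the other does not.

```elixir
parent = self()

# A process that traps exits, as a supervisor does.
trapper =
  spawn(fn ->
    Process.flag(:trap_exit, true)
    send(parent, :ready)

    receive do
      # The exit signal shows up as an ordinary message.
      {:EXIT, from, reason} -> send(parent, {:trapped, from, reason})
    end
  end)

# A process that does not trap exits.
plain =
  spawn(fn ->
    send(parent, :ready)

    receive do
      :stop -> :ok
    end
  end)

# Wait until both processes are set up before sending exit signals.
for _ <- 1..2 do
  receive do
    :ready -> :ok
  after
    1_000 -> :timeout
  end
end

Process.exit(trapper, :normal)
Process.exit(plain, :normal)

trapped =
  receive do
    {:trapped, from, reason} -> {from, reason}
  after
    1_000 -> :timeout
  end

# A :normal exit signal sent to a non-trapping process is ignored.
plain_alive? = Process.alive?(plain)
```

Note that from is the pid of the caller of Process.exit/2, not of a process that actually died. That is why a supervisor receiving this message from one of its children concludes that the child has exited.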
According to the API doc, using :kill as the exit reason kills the process unconditionally. Does it work in this case?
Suppose I change the handling of the :down message in Test.Worker as below:
def handle_call({:down, sup_pid}, _from, state) do
The test result will be:
iex(1)> [root, mgr, sup] = [Test.Root, Test.Manager, Test.Supervisor] |> Enum.map(&Process.whereis/1)
In step 5, the supervision tree of sup is terminated. However, when I checked the children of root, there was a new process for Test.Supervisor: it had been restarted because it did not exit normally.
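The restart after :kill can be reproduced with a small stand-alone sketch. KillSketch.Inner is a hypothetical supervisor with no children that stands in for Test.Supervisor, and the polling helper wait_for_new_pid/2 exists only to make the example deterministic.

```elixir
defmodule KillSketch.Inner do
  use Supervisor

  def start_link(_arg), do: Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: Supervisor.init([], strategy: :one_for_one)

  # Poll until a process other than old_pid is registered under this name.
  def wait_for_new_pid(_old_pid, 0), do: :timeout

  def wait_for_new_pid(old_pid, attempts) do
    case Process.whereis(__MODULE__) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(20)
        wait_for_new_pid(old_pid, attempts - 1)
    end
  end
end

# The root supervises the inner supervisor with the default :permanent restart.
{:ok, _root} = Supervisor.start_link([KillSketch.Inner], strategy: :one_for_one)

old_pid = Process.whereis(KillSketch.Inner)
ref = Process.monitor(old_pid)

# :kill cannot be trapped, so the inner supervisor dies immediately.
Process.exit(old_pid, :kill)

down_reason =
  receive do
    {:DOWN, ^ref, :process, ^old_pid, reason} -> reason
  after
    1_000 -> :timeout
  end

# The root sees an abnormal exit and starts a fresh inner supervisor.
new_pid = KillSketch.Inner.wait_for_new_pid(old_pid, 50)
```

The old process is gone, but a new one takes its place under the same name, so :kill does not achieve a permanent shutdown either.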
Impact of Restart Strategy
It is worth noting that if we keep using Process.exit(sup_pid, :normal) in Test.Worker but change the child spec of Test.Worker in Test.Supervisor by omitting the restart option, which is :permanent by default, the test result might be misleading.
def start_worker(id) do
Test result:
iex(1)> [root, mgr, sup] = [Test.Root, Test.Manager, Test.Supervisor] |> Enum.map(&Process.whereis/1)
It looks like the supervisor process sup is terminated correctly. However, if you start your application with SASL logging on (iex --logger-sasl-reports true -S mix), you can see output similar to the following after step 4:
08:01:10.976 [error] Child :undefined of Supervisor Test.Supervisor terminated
Test.Supervisor actually tries to restart child_a, whose termination was signaled by Process.exit(sup_pid, :normal), but the restart keeps failing. In the end, Test.Supervisor shuts itself down due to :reached_max_restart_intensity. The final outcome looks like what you wanted, but it does not come about the way you might think.
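For reference, the restart value that drives this behaviour is part of the child spec. The sketch below is not the Test.Supervisor code: it uses Agents as stand-in children with the hypothetical ids :child_a and :child_b, and reads the effective specs back with Erlang's :supervisor.get_childspec/2.

```elixir
# No :restart override is given, so the default of :permanent applies.
default_spec = Supervisor.child_spec({Agent, fn -> :ok end}, id: :child_a)

# Explicit override: a :temporary child is never restarted.
temporary_spec =
  Supervisor.child_spec({Agent, fn -> :ok end}, id: :child_b, restart: :temporary)

{:ok, sup} = Supervisor.start_link([default_spec, temporary_spec], strategy: :one_for_one)

# Ask the supervisor which specs it actually uses.
{:ok, spec_a} = :supervisor.get_childspec(sup, :child_a)
{:ok, spec_b} = :supervisor.get_childspec(sup, :child_b)

restarts = {spec_a.restart, spec_b.restart}
```

A :permanent child is restarted whatever the exit reason, including :normal, which is why the forged exit message makes the supervisor attempt a restart at all.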
Running the Extrace command below before executing the test code shows what really happens behind the scenes.
iex(1)> Extrace.calls([