The concepts in Erlang and Elixir that fascinate me most are:
- Everything can be a process.
- Let it crash.
If you don’t need a process, then you don’t need a process. Use processes only to model runtime properties, such as mutable state, concurrency and failures, never for code organization.
quoted from GenServer, Elixir API Documentation
To fully embrace these ideas in every program I write, learning OTP, GenServer, and Supervisor is a must.
But every senior developer knows one thing very well: there is a large gap between knowing something from a book and using it.
When I read「Programming Elixir ≥ 1.6」, everything in the chapters on OTP Servers and Supervisors made sense to me. But when I tried to implement my own version for a more sophisticated scenario, I realized how big that gap was for me.
In this article, I would like to help you bridge that gap using my own experience and a scenario that is sophisticated enough to be useful and is seldom found in other tutorials.
Many tutorials on Supervisor first tell us to create an application with supervision support by running the command mix new test --sup. The application.ex created in the project will be similar to the one below. The application itself becomes a Supervisor that contains a defined set of children.
defmodule Test.Application do
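For reference, here is a minimal sketch of the kind of application.ex that the generator produces. The exact comments and layout depend on the Elixir version, and Test.Service is a placeholder worker borrowed from the hierarchy diagram further down, stubbed here only so that the sketch runs on its own as a script.

```elixir
# Placeholder worker (an assumption), stubbed so the sketch is self-contained.
defmodule Test.Service do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule Test.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # A fixed list of children, started in order when the application boots.
    children = [
      {Test.Service, []}
    ]

    # The application's top-level process is itself a supervisor.
    opts = [strategy: :one_for_one, name: Test.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

# The runtime normally invokes start/2; it is called by hand here to try the sketch.
{:ok, sup} = Test.Application.start(:normal, [])
```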
Later, when the tutorial introduces the concept of DynamicSupervisor, it may define a module and modify application.ex like this:
defmodule Test.MyDynamicSupervisor do
# Modified in application.ex to include the Test.MyDynamicSupervisor
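A sketch of what such a module and the modified children list might look like follows. Test.MyWorker and the start_worker/1 helper are illustrative assumptions, and the supervisor is started directly in the script instead of through application.ex so the example can be run on its own. In the hierarchy shown next, Test.Service would sit alongside it in the same children list.

```elixir
# Hypothetical worker that the dynamic supervisor starts on demand.
defmodule Test.MyWorker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule Test.MyDynamicSupervisor do
  use DynamicSupervisor

  def start_link(init_arg) do
    DynamicSupervisor.start_link(__MODULE__, init_arg, name: __MODULE__)
  end

  # Children are not listed up front; they are added at runtime.
  # start_worker/1 is a hypothetical convenience wrapper.
  def start_worker(arg) do
    DynamicSupervisor.start_child(__MODULE__, {Test.MyWorker, arg})
  end

  @impl true
  def init(_init_arg) do
    DynamicSupervisor.init(strategy: :one_for_one)
  end
end

# In application.ex, the children list would then include the new module:
children = [
  {Test.MyDynamicSupervisor, []}
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)
{:ok, _worker1} = Test.MyDynamicSupervisor.start_worker(:first)
{:ok, _worker2} = Test.MyDynamicSupervisor.start_worker(:second)
```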
The process hierarchy for the above example is:
graph TD
  A(Application) --> B(Test.Service) & C(Test.MyDynamicSupervisor)
  C(Test.MyDynamicSupervisor) --> D(Test.MyWorker) & E(Test.MyWorker)
In short, most tutorials about DynamicSupervisor only go one level deep.
But the real world is much more than that.
My application, which involves multi-level supervision, covers a classic scenario for many personal and large corporate projects: web crawling. It is more realistic and practical than most of the examples in the tutorials out there. The process hierarchy of the application design is:
graph TD
  A(Spider Application) --> B(SpiderEngine)
  B --> C(SpiderEngine.Manager) & D(SpiderEngine.Supervisor) & E(SpiderStorage)
  D --> F(Spider.Facebook) & G(Spider.Instagram)
  F --> H(Spider.Manager.Facebook) & I(Spider.Supervisor.Facebook)
  G --> J(Spider.Manager.Instagram) & K(Spider.Supervisor.Instagram)
  I --> L(Spider.Fetcher.Facebook.1) & M(Spider.Fetcher.Facebook.2)
  K --> N(Spider.Fetcher.Instagram.1)
As you may already know, most of the modules are normal GenServer processes except these:
sequenceDiagram
  participant SpiderStorage
  participant SpiderEngine.Manager
  participant SpiderEngine.Supervisor
  participant Spider
  participant Spider.Manager
  participant Spider.Supervisor
  participant Spider.Fetcher
  SpiderEngine.Manager->>SpiderStorage: What spiders to start?
  SpiderEngine.Supervisor-->>Spider: Dynamically starts spider engine
  Spider.Manager->>SpiderStorage: The base domain and URLs to fetch?
  Spider.Supervisor-->>Spider.Fetcher: Dynamically starts fetcher
  Spider.Fetcher->>Spider.Manager: Asks for next URL and/or sends back more URLs
  SpiderEngine.Manager->>Spider.Manager: Reset fetcher count, speed, etc.
SpiderEngine is the root of the whole web crawler.
The reason for not using the Application directly as the root is that the application might contain other components, such as a Phoenix API and a web UI that help manage the crawler.
The benefit of grouping all crawling behavior under one module is the freedom to move it all at once if necessary.
SpiderEngine must be able to supervise different types of crawlers, depending on the nature of the websites or systems you want to crawl.
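The types of supported spiders should be configurable, so something has to know which spiders to start; in the sequence diagram, SpiderEngine.Manager asks SpiderStorage for that list.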
A Supervisor/DynamicSupervisor is kept separate because it exists purely for process supervision. There can be no other message passing between the Supervisor/DynamicSupervisor and its child workers. Hence, all of the child workers' control logic has to live in the other module, the manager.
Spider.Instagram and the other possible crawlers for different websites are supervisors in their own right, each managing its own crawling policy, frequency, fetcher count, etc.
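To make the top of the hierarchy concrete, here is a rough sketch of how SpiderEngine could be wired up. The module names come from the diagram; everything inside them is stubbed, and the child order (storage first, manager last) is my assumption, not something the design dictates.

```elixir
# Stubs standing in for the real modules; only their place in the tree matters here.
defmodule SpiderStorage do
  use Agent

  def start_link(_arg), do: Agent.start_link(fn -> %{} end, name: __MODULE__)
end

defmodule SpiderEngine.Manager do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule SpiderEngine.Supervisor do
  use DynamicSupervisor

  def start_link(arg), do: DynamicSupervisor.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(_arg), do: DynamicSupervisor.init(strategy: :one_for_one)
end

defmodule SpiderEngine do
  use Supervisor

  def start_link(arg), do: Supervisor.start_link(__MODULE__, arg, name: __MODULE__)

  @impl true
  def init(_arg) do
    # Assumed order: storage first, so the manager can query it once it starts.
    children = [
      SpiderStorage,
      SpiderEngine.Supervisor,
      SpiderEngine.Manager
    ]

    Supervisor.init(children, strategy: :one_for_one)
  end
end

{:ok, engine} = SpiderEngine.start_link([])
```

The manager holds the control logic, while SpiderEngine.Supervisor only starts and restarts spiders.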
I hope the process hierarchy and the sequence diagram are clear enough. Let's move on to the real code. There are really only two things you need to understand clearly in order to keep your thinking straight while building the process hierarchy.
The most important thing to be aware of is the name option in start_link/3. It is used for Name Registration, and it causes trouble if you do not pay attention to it when constructing multi-level supervision.
In my scenario, modules such as SpiderEngine.Supervisor have only one running process each. Hence, their calls to start_link/3 can simply be:
GenServer.start_link(__MODULE__, :no_args, name: __MODULE__)
However, modules such as Spider.Supervisor have multiple processes, one for each type of spider. Hence, their names must be differentiated.
# for Spider
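One way to do that, sketched here under the assumption that each spider is identified by an atom such as :Facebook, is to derive the registered name from the module and the spider name. This matches names like Spider.Manager.Facebook in the hierarchy diagram; process_name/1 is a hypothetical helper, not part of any library.

```elixir
defmodule Spider.Manager do
  use GenServer

  # `spider_name` is a hypothetical atom such as :Facebook or :Instagram.
  def start_link(spider_name) do
    GenServer.start_link(__MODULE__, spider_name, name: process_name(spider_name))
  end

  # Builds Spider.Manager.Facebook, Spider.Manager.Instagram, and so on.
  def process_name(spider_name), do: Module.concat(__MODULE__, spider_name)

  @impl true
  def init(spider_name), do: {:ok, %{spider: spider_name}}
end

# Two processes of the same module, registered under different names.
{:ok, facebook} = Spider.Manager.start_link(:Facebook)
{:ok, instagram} = Spider.Manager.start_link(:Instagram)
```

Since registered names are atoms, this approach suits a small, fixed set of spiders; a Registry with via tuples would be an alternative for names created freely at runtime.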
What about Spider.Fetcher? There are also multiple processes of it, but do we have to start them with different names?
I think it is a judgement call. Spider.Fetcher and Spider.Manager communicate through PIDs. Hence, if you have saved the children's PIDs somewhere when Spider.Manager starts them, or the messages are always sent from the fetchers to Spider.Manager, it is not really necessary to name this type of child worker, especially when there are many of them.
# for Fetcher, you can omit the name option or simply use a combined index as name
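Both options can be sketched as follows. The :manager and :name options are hypothetical, and the fetcher does no real work here; the point is only that the name is optional.

```elixir
defmodule Spider.Fetcher do
  use GenServer

  # `opts` may carry a :name; when it is absent the process is simply anonymous.
  def start_link(opts) do
    GenServer.start_link(__MODULE__, opts, Keyword.take(opts, [:name]))
  end

  @impl true
  def init(opts), do: {:ok, %{manager: Keyword.get(opts, :manager)}}
end

# Anonymous fetcher: whoever started it keeps the PID.
{:ok, anonymous} = Spider.Fetcher.start_link(manager: self())

# Named fetcher: spider name plus an index, like Spider.Fetcher.Facebook.1 in the diagram.
name = Module.concat([Spider.Fetcher, :Facebook, "1"])
{:ok, named} = Spider.Fetcher.start_link(manager: self(), name: name)
```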
A child specification can take different forms when we start children under a Supervisor:
- A map
- A tuple with a module as first element and the start argument as second
- A module
The three samples below are equivalent:
# I just present three forms here.
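As an illustration, the three forms can be written out for a hypothetical Test.Worker and compared after normalizing them with Supervisor.child_spec/2. They only coincide when the start argument is the empty list, because the bare module form is shorthand for passing [].

```elixir
# Hypothetical worker used only to compare the three forms.
defmodule Test.Worker do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

# 1. A map
map_form = %{id: Test.Worker, start: {Test.Worker, :start_link, [[]]}}

# 2. A tuple of module and start argument
tuple_form = {Test.Worker, []}

# 3. A bare module, which is shorthand for {Test.Worker, []}
module_form = Test.Worker

# Supervisor.child_spec/2 turns the tuple and module forms into a map
# by calling Test.Worker.child_spec/1, which `use GenServer` defines.
normalized =
  Enum.map([map_form, tuple_form, module_form], &Supervisor.child_spec(&1, []))
```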
If you are using the map form, it is important to know that the id field is used by the supervisor internally to identify the child. It only has to be unique among the children of the same supervisor, and a DynamicSupervisor requires the field but ignores its value. Hence, even in our multi-level scenario, we do not need to add the spider name to the id; the module name is enough.
This is where you should pay attention. The first argument of start_child/2 is the supervisor's reference. It can be a PID or any of the name forms used in Name Registration.
So in this spider scenario, the only place that needs care is Spider.Supervisor, where it starts the fetchers.
# The spec can be any one of the three forms above.
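A sketch of that call, assuming each Spider.Supervisor process is registered under a per-spider name as in the hierarchy diagram. start_fetcher/2 and process_name/1 are hypothetical helpers, and the fetcher is a stub.

```elixir
# Stub fetcher; the real one would do the crawling.
defmodule Spider.Fetcher do
  use GenServer

  def start_link(arg), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(arg), do: {:ok, arg}
end

defmodule Spider.Supervisor do
  use DynamicSupervisor

  # One supervisor process per spider, each registered under its own name.
  def start_link(spider_name) do
    DynamicSupervisor.start_link(__MODULE__, spider_name, name: process_name(spider_name))
  end

  def process_name(spider_name), do: Module.concat(__MODULE__, spider_name)

  # The first argument must refer to this spider's supervisor process.
  # Passing __MODULE__ would fail, because no process is registered under that name.
  def start_fetcher(spider_name, arg) do
    DynamicSupervisor.start_child(process_name(spider_name), {Spider.Fetcher, arg})
  end

  @impl true
  def init(_spider_name), do: DynamicSupervisor.init(strategy: :one_for_one)
end

{:ok, _} = Spider.Supervisor.start_link(:Facebook)
{:ok, _} = Spider.Supervisor.start_link(:Instagram)
{:ok, _} = Spider.Supervisor.start_fetcher(:Facebook, [])
{:ok, _} = Spider.Supervisor.start_fetcher(:Facebook, [])
{:ok, _} = Spider.Supervisor.start_fetcher(:Instagram, [])
```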
I hope this more sophisticated, real-world example helps you understand OTP supervision more clearly.
Getting used to modeling a system with processes, and modeling them correctly, is just the first step in learning OTP. Next comes the message passing between them for coordination. I will share more as this web crawling project progresses.
Please don't hesitate to leave a comment if you have anything to add or any good advice.