Contents
  1. Through Documentation
  2. Through AST
    2.1. Module AST structure
    2.2. Module Block AST structure
    2.3. Function AST structure

One day, I was asked to extract the public function information from an Elixir module. I had no clue how to do it at first, but it turned out to be an interesting experience.

Through Documentation

Anyone who learns Elixir should be familiar with Hex Docs. My first reaction was therefore to see how it works, because there had to be a way to extract function signatures and documentation. I found that I can use Code.fetch_docs as below:

{:docs_v1, _, _, _, %{"en" => module_doc}, _meta, doc_elements} =
  Code.fetch_docs(module_or_path)

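Note: if a module name is provided, the module must have been compiled from a file. It does not work for a module defined directly in Livebook, because Code.fetch_docs reads the docs from the module's compiled .beam file.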

doc_elements contains most of the information I need: the function signatures and documentation. However, one of the functions, get_best_block_height/2, has multiple clauses, and Code.fetch_docs returns a single entry per function name and arity, so the individual clauses do not show up in the result below. How can I overcome this?

[
  {{:function, :get_best_block_height, 1}, 35, ["get_best_block_height(endpoint)"],
   %{
     "en" => "Get best **block height** of Ethereum-type network from given endpoint\n"
   }, %{}},
  {{:function, :get_best_block_height, 2}, 14, ["get_best_block_height(network, endpoint)"],
   %{
     "en" =>
       "Get best **block height** from given endpoint of either:\n* Ethereum\n* Arweave\n\nNot implemented for BTC yet\n"
   }, %{}},
  {{:function, :get_module_doc, 0}, 9, ["get_module_doc()"], %{"en" => "Get module doc\n"}, %{}}
]
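To make the shape of doc_elements concrete, here is a minimal sketch that walks the list shown above and prints one summary line per entry. The data is copied from the listing; the variable names (summaries, first_line) are my own and not part of any API.

```elixir
# doc_elements as returned by Code.fetch_docs/1 in the listing above.
doc_elements = [
  {{:function, :get_best_block_height, 1}, 35, ["get_best_block_height(endpoint)"],
   %{"en" => "Get best **block height** of Ethereum-type network from given endpoint\n"}, %{}},
  {{:function, :get_best_block_height, 2}, 14, ["get_best_block_height(network, endpoint)"],
   %{
     "en" =>
       "Get best **block height** from given endpoint of either:\n* Ethereum\n* Arweave\n\nNot implemented for BTC yet\n"
   }, %{}},
  {{:function, :get_module_doc, 0}, 9, ["get_module_doc()"], %{"en" => "Get module doc\n"}, %{}}
]

# Each entry is {{kind, name, arity}, line, signatures, docs, metadata}.
summaries =
  for {{:function, name, arity}, _line, [signature | _], %{"en" => doc}, _meta} <- doc_elements do
    # Keep only the first line of the documentation text.
    first_line = doc |> String.split("\n", parts: 2) |> hd()
    {"#{name}/#{arity}", signature, first_line}
  end

Enum.each(summaries, fn {id, signature, doc} ->
  IO.puts("#{id}  #{signature}  # #{doc}")
end)
```

There is exactly one entry for get_best_block_height/2, which is why the separate clauses cannot be recovered this way.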

Through AST

Anyone with a few years of programming experience has probably heard of the AST (Abstract Syntax Tree), a data structure widely used in compilers to represent the structure of program code.

Elixir has strong metaprogramming capabilities, and its AST can do the job. I did not turn to the AST at first because I was intimidated by what looked like an advanced and complicated technique.

However, it is actually not as complex as I thought, at least for what I want to do. According to Elixir’s documentation:

The building block of Elixir’s AST is a call, such as:

sum(1, 2, 3)

which is represented as a tuple with three elements:

{:sum, meta, [1, 2, 3]}

the first element is an atom (or another tuple), the second element is a list of two-element tuples with metadata (such as line numbers) and the third is a list of arguments.

We can retrieve the AST for any Elixir expression by calling quote:

quote do
  sum()
end
#=> {:sum, [], []}
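As a small illustration of the three-element tuple described above, the following sketch quotes a call, takes it apart with pattern matching, and turns it back into source text with Macro.to_string/1. The variable names are mine.

```elixir
# quote returns the AST of the expression without evaluating it.
ast = quote do: sum(1, 2, 3)

# Every call node is a three-element tuple: {name, metadata, arguments}.
{name, meta, args} = ast

IO.inspect(name, label: "name")
IO.inspect(meta, label: "meta")
IO.inspect(args, label: "args")

# Macro.to_string/1 goes the other way, from AST back to source text.
source = Macro.to_string(ast)
IO.puts(source)
```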

The AST of a single call is easy to understand and to generate. What about the AST of a module? quote only works on code written inside the quote block, so it cannot return the AST of a module that already exists in a source file. Instead, we can load the AST from the source file using:

source_path |> File.read!() |> Code.string_to_quoted()
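Here is a minimal sketch of what that call returns. To keep it self-contained, it parses an inline string instead of reading a file; the Demo.Greeter module is a hypothetical stand-in for real source code.

```elixir
# Hypothetical stand-in for the contents of a source file.
source = """
defmodule Demo.Greeter do
  @moduledoc "Example module"

  def hello(name), do: "Hello, " <> name
  def ping, do: :pong
end
"""

# string_to_quoted/1 only parses the text; nothing is compiled or loaded.
# It returns {:ok, ast} on success and {:error, reason} on a syntax error.
{:ok, ast} = Code.string_to_quoted(source)

{:defmodule, meta, [_alias, [do: _body]]} = ast
IO.inspect(meta, label: "defmodule metadata")

# Source with a missing `end` yields an error tuple instead of raising.
{:error, _reason} = Code.string_to_quoted("defmodule Broken do")
```

If you prefer an exception over an error tuple, Code.string_to_quoted!/1 returns the AST directly and raises on invalid source.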

Module AST structure

[Figure: Elixir Module AST structure]

Comparing the Elixir source code with the generated AST, we can see that the AST of the whole module follows the basic three-element structure above: the :defmodule atom, the metadata, and an argument list that holds the module alias and a do block:

{:defmodule, [line: 1],
 [
   {:__aliases__, [line: 1], [:BestBlockHeightGetter]},
   [
     do: {:__block__, [], []}
   ]
 ]}
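As a quick sketch of how to take this structure apart, the snippet below pattern matches on the module AST shown above and rebuilds the module name from the alias segments with Module.concat/1. The variable names are my own.

```elixir
# The module AST from the listing above (with the block contents left empty).
module_ast =
  {:defmodule, [line: 1],
   [
     {:__aliases__, [line: 1], [:BestBlockHeightGetter]},
     [do: {:__block__, [], []}]
   ]}

# The argument list holds the alias node and a keyword list with the do block.
{:defmodule, _meta, [{:__aliases__, _, segments}, [do: body]]} = module_ast

# Module.concat/1 joins the alias segments into the module atom.
module_name = Module.concat(segments)

IO.inspect(module_name, label: "module")
IO.inspect(body, label: "body")
```

A nested name such as Foo.Bar would appear as [:Foo, :Bar] in the alias node, and Module.concat/1 handles that case as well.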

Module Block AST structure

The block AST shows clearly that its argument list contains a separate AST for every attribute (starting with the :@ atom) and every function (starting with the :def atom).
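The sketch below shows one way to classify the statements of a module block by their leading atom. Demo.Getter is a hypothetical module used only for illustration. One thing I would watch out for: when a module body has a single expression, the parser does not wrap it in :__block__, so the sketch normalises both shapes.

```elixir
# Hypothetical module source, parsed only (never compiled).
source = """
defmodule Demo.Getter do
  @moduledoc "Example module"

  @doc "Returns the module doc"
  def get_module_doc, do: @moduledoc

  def get_height(endpoint), do: endpoint

  defp helper(x), do: x
end
"""

{:ok, {:defmodule, _, [_alias, [do: body]]}} = Code.string_to_quoted(source)

# A body with a single expression is not wrapped in :__block__.
statements =
  case body do
    {:__block__, _, list} -> list
    single -> [single]
  end

# Classify each statement by the atom its AST node starts with.
kinds =
  Enum.map(statements, fn
    {:@, _, [{attr, _, _}]} -> {:attribute, attr}
    {kind, _, [{name, _, _} | _]} when kind in [:def, :defp] -> {kind, name}
    other -> {:other, other}
  end)

IO.inspect(kinds)
```

Note that a guarded function would report :when as its name here, because the head is wrapped in a :when node.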

Function AST structure

As we can see, for simple functions without a guard:

  • If the function has no parameters, like get_module_doc ([line: 12]), its argument list in the AST is nil.
  • If it has parameters, like the get_best_block_height clauses ([line: 26], [line: 31] and [line: 39]), each parameter is either a literal value or an AST node.

A function with a guard is a bit more complex, but you should still be able to recognize the components: the head is wrapped in a :when node that holds the call and the guard expression.

# get_module_doc
{:def, [line: 12],
 [
   {:get_module_doc, [line: 12], nil},
   [do: {:@, [line: 12], [{:moduledoc, [line: 12], nil}]}]
 ]}

# get_best_block_height
{:def, [line: 26],
 [
   {:get_best_block_height, [line: 26],
    ["ethereum", {:\\, [line: 26], [{:endpoint, [line: 26], nil}, "eth"]}]},
   [
     do: {:__block__, [], []}
   ]
 ]}

{:def, [line: 31],
 [
   {:get_best_block_height, [line: 31], ["arweave", {:endpoint, [line: 31], nil}]},
   [
     do: {:__block__, [], []}
   ]
 ]}

{:def, [line: 39],
 [
   {:get_best_block_height, [line: 39], [{:endpoint, [line: 39], nil}]},
   [
     do: {:__block__, [], []}
   ]
 ]}

# With Guards
{:def, [line: 22],
 [
   {:when, [line: 22],
    [
      {:get_best_block_height, [line: 22],
       [{:network, [line: 22], nil}, {:_endpoint, [line: 22], nil}]},
      {:==, [line: 22], [{:network, [line: 22], nil}, "btc"]}
    ]},
   [do: {:-, [line: 23], [1]}]
 ]}
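To illustrate the guarded case, here is a small sketch that parses a single guarded clause and separates the call from the guard. The one-line clause is my own condensed version of the function above, and the variable names are assumptions for the example.

```elixir
# ~S keeps the text verbatim (no escapes or interpolation).
source = ~S"""
def get_best_block_height(network, _endpoint) when network == "btc", do: -1
"""

{:ok, {:def, _meta, [head, _body]}} = Code.string_to_quoted(source)

# With a guard, the head is a :when node wrapping the call and the guard.
{call, guard} =
  case head do
    {:when, _, [call, guard]} -> {call, guard}
    call -> {call, nil}
  end

{name, _, args} = call

# args is nil for functions declared without parentheses.
arity = length(args || [])

IO.puts("#{name}/#{arity}")
IO.puts("guard: #{Macro.to_string(guard)}")
```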

Based on this generated AST, I can reconstruct the function signatures with the following snippet (I know the part handling the when case is a bit ugly).

# source_file_path is the path to the module's source file.
{:ok, {:defmodule, _meta, [_, [do: {:__block__, _, block_statements}]]}} =
  source_file_path |> File.read!() |> Code.string_to_quoted()

# Wrap string literals in double quotes; leave everything else untouched.
quote_arg = fn arg ->
  if is_binary(arg), do: "\"#{arg}\"", else: arg
end

parse_arg = fn arg ->
  case arg do
    # Parameter with a default value, e.g. endpoint \\ "eth"
    {:\\, _, [{arg_name, _, _}, actual]} ->
      "#{arg_name} \\\\ #{quote_arg.(actual)}"

    # Plain variable
    {arg_name, _, _} ->
      arg_name

    # Literal value, e.g. "ethereum"
    _ ->
      quote_arg.(arg)
  end
end

block_statements
# Keep only the public function definitions.
|> Enum.filter(&match?({:def, _meta, _args}, &1))
|> Enum.map(fn {:def, _meta, [signature, _block]} -> signature end)
|> Enum.map(fn fun_ast ->
  case fun_ast do
    # Guarded function; only handles guards of the form `arg1 operator arg2`.
    {:when, _meta, [{fun_name, _, args}, {operator, _, [arg1, arg2]}]} ->
      "#{fun_name}(#{Enum.map_join(args, ", ", parse_arg)}) when #{parse_arg.(arg1)} #{operator} #{parse_arg.(arg2)}"

    # Function without parameters
    {fun_name, _meta, nil} ->
      "#{fun_name}()"

    {fun_name, _meta, args} ->
      "#{fun_name}(#{Enum.map_join(args, ", ", parse_arg)})"
  end
end)
|> Enum.each(&IO.puts/1)

The output would be:

get_module_doc()
get_best_block_height(network, _endpoint) when network == "btc"
get_best_block_height("ethereum", endpoint \\ "eth")
get_best_block_height("arweave", endpoint)
get_best_block_height(endpoint)
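
As a possible alternative to formatting the arguments and guards by hand, the signature AST can be rendered with Macro.to_string/1, which should cope with guards of any shape. This is only a sketch under my own assumptions: Demo.Heights is a hypothetical module that mirrors the clauses discussed above, and with_parens is a helper I introduced so that zero-arity heads print with parentheses.

```elixir
# Hypothetical module source; ~S keeps the \\ default-argument operator verbatim.
source = ~S"""
defmodule Demo.Heights do
  def get_module_doc, do: "doc"
  def get_best_block_height(network, _endpoint) when network == "btc", do: -1
  def get_best_block_height("ethereum", endpoint \\ "eth"), do: endpoint
  def get_best_block_height(endpoint), do: endpoint
end
"""

{:ok, {:defmodule, _, [_alias, [do: {:__block__, _, statements}]]}} =
  Code.string_to_quoted(source)

# Heads written without parentheses carry nil args and would otherwise
# be printed like a bare variable name.
with_parens = fn
  {name, meta, nil} -> {name, meta, []}
  other -> other
end

signatures =
  for {:def, _, [head, _body]} <- statements do
    head =
      case head do
        {:when, meta, [call, guard]} -> {:when, meta, [with_parens.(call), guard]}
        call -> with_parens.(call)
      end

    # Render the head (call plus optional guard) back to source text.
    Macro.to_string(head)
  end

Enum.each(signatures, &IO.puts/1)
```

The trade-off is less control over the exact formatting, since the text comes from Elixir's own printer rather than from hand-written string building.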