Recently I have implemented a small piece of functionality, which is suitable to be a small challenge.
Description
Given a string with opening and closing tags (e.g., <mark></mark>
), return a string without tags and indices of opening and closing tags.
For example, given a string: "we eat <mark>healthy</mark> and <mark>tasty</mark> food."
We expect to receive a string: "we eat healthy and tasty food."
, and an array with pairs of indices: [[7, 14], [19, 24]]
. There are two pairs of indices, because there were two pairs of <mark></mark>
tags.
Ruby
First, I tried to solve this problem imperative way, which did not work very well. Solving the problem with recursion was easier:
def remove_tag(str)
str.sub(/<(\/?)mark>/, '')
end
def compute_text_with_indices(str, indices = [])
if /<(\/?)mark>/.match(str) == nil
{text: str, indices: indices.each_slice(2).to_a}
else
indices << /<(\/?)mark>/.match(str).begin(0)
compute_text_with_indices(remove_tag(str), indices)
end
end
The trick here is to compute position of each subsequent tag after the previous one has been removed. It is needed to account for the shifting of the marked part of the string after removal of tags.
Ruby provides a convenient method to retrieve index of the _n_th match - begin(n)
(where n is index of an element in the match array). There is also end(n)
method.
Elixir (Ruby-ish)
Then I tried to solve the same problem with Elixir. My first attempt was to use only pattern matching and recursion, but that didn’t look as pretty as expected (next solution), so I went on to try to repeat my ruby solution.
defmodule MyString do
def remove_tags(str, indices \\ []) do
unless String.contains?(str, ["<mark>", "</mark>"]) do
%{ text: str, indices: Enum.chunk(Enum.reverse(indices), 2) }
else
remove_tags(remove_tag(str),
[ get_match_index(str, ~r/<(\/?)mark>/) ] ++ indices
)
end
end
def remove_tag(str) do
Regex.replace(~r/<(\/?)mark>/,str , "", global: false)
end
def get_match_index(str, regex) do
Regex.run(regex, str, return: :index, capture: :first)
|> List.first
|> Tuple.to_list
|> List.first
end
end
With Elixir I had to write a small function to extract beginning of a match - get_match_index/2
.
It was unclear for me what exactly Regex.run/2
returns in [{integer, integer}]
. At first I assumed it was beginning and end of a match. Later, when I saw [{12, 7}] it became clear that it probably returns beginning of the match and it’s length.
Elixir
Since Elixir is a functional language I expected to write code without any if\unless
statements easily. Even though I managed to do this, I have a feeling that it does look beautiful:
defmodule MyString do
def remove_tags(str), do: remove_tags(str, [], has_match?(str) )
def remove_tags(str, acc, false), do: %{ text: str, indices: Enum.chunk(Enum.reverse(acc), 2) }
def remove_tags(str, acc, true) do
remove_tags(remove_tag(str),
[ get_match_index(str, ~r/<(\/?)mark>/) ] ++ acc,
has_match?(remove_tag(str))
)
end
def remove_tag(""), do: []
def remove_tag(str), do: Regex.replace(~r/<(\/?)mark>/,str , "", global: false)
def has_match?(str) do
String.contains?(str, ["<mark>", "</mark>"])
end
def get_match_index(str, regex) do
Regex.run(regex, str, return: :index, capture: :first)
|> List.first
|> Tuple.to_list
|> List.first
end
end
In this version I use pattern matching instead of explicit conditional to decide whether the program should iterate more or terminate.
Conclusion
Even such a small problem as removing tags from a string can take some time to get right. We see that Elixir can be written very similar to Ruby and can be written very differently.