Rill Stage 1

Exploring Twitch API

The goal of the Rill project is to collect data about online streams from Twitch (and possibly other streaming platforms) for further analysis.

Before anything else, set up the Twitch client ID as an environment variable, following: http://blog.danielberkompas.com/elixir/2015/03/21/manage-env-vars-in-elixir.html

The process to obtain data about streams for a particular user looks like this:

1) Find the user’s username (e.g., from a Twitch URL).
2) Make a request to the Twitch API to convert the username to a stream ID.
3) Make a request to the Twitch API to obtain data about the user’s stream (is there a live stream, is a recording being played).

In stage 1 we will write simple functions to explore the Twitch API.

To issue requests to the Twitch API we will use the HTTPoison library. Step one is to obtain a stream ID from a username. Given the Twitch URL https://www.twitch.tv/nuke73, the username is nuke73. Let’s use Elixir’s awesome OptionParser module to receive usernames as a command-line argument; we expect a comma-separated list of usernames. To issue a request to the Twitch API we set up the proper headers (with the Twitch Client ID) and the URL. Then we send the request using HTTPoison and display the response on the command line. If an error occurs, we write it to stderr with

  IO.puts(:stderr, reason)
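As a quick illustration of the argument handling, here is a minimal sketch of parsing the --usernames switch with OptionParser and splitting the comma-separated value (the full parse_args function appears later in this post):

```elixir
# Minimal sketch: parse --usernames with OptionParser and split the
# comma-separated value into a list of usernames.
args = ["--usernames=nuke73,richard_hammer"]

{options, _argv, _errors} =
  OptionParser.parse(args, strict: [usernames: :string])

usernames = String.split(options[:usernames], ",")
IO.inspect(usernames)
# ["nuke73", "richard_hammer"]
```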

To build an executable with escript we must add an escript configuration parameter that points escript to the right module (in mix.exs):

def project do
  [
    ...
    escript: [main_module: Rill],
    ...
  ]
end

Now running our fresh CLI script: ./rill --usernames=nuke73,richard_hammer returns JSON with data about two Twitch users. The _id field can be used to fetch data about the corresponding streams.
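Under the hood, escript calls main/1 on the module named in main_module. Here is a minimal sketch of such an entry point; the stubbed parse_args/1 and process/1 stand in for the real functions shown later in this post:

```elixir
# Sketch of the escript entry point: escript passes the command-line
# arguments to main/1 of the configured main_module.
defmodule Rill do
  def main(args) do
    args
    |> parse_args()
    |> process()
  end

  # Stub: the real parse_args/1 with all three switches is shown later.
  defp parse_args(args) do
    {options, _, _} = OptionParser.parse(args, strict: [usernames: :string])
    options
  end

  # Stub: the real process/1 with the decision logic is shown later.
  defp process([]), do: IO.puts(:stderr, "No arguments given")
  defp process(options), do: IO.inspect(options)
end

Rill.main(["--usernames=nuke73"])
# [usernames: "nuke73"]
```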

Let’s write functions that fetch data about a user’s stream.

First, the Twitch API returns its responses in JSON format, so we need to parse JSON in Elixir. For this we will use the Poison library. Parsing a response is very simple:

Poison.Parser.parse!(body)

Poison will parse response into something like:

%{"_total" => 1,
  "users" => [%{"_id" => "64341520", "bio" => nil,
     "created_at" => "2014-06-14T11:45:40.184851Z", "display_name" => "Nuke_73",
     "logo" => nil, "name" => "nuke_73", "type" => "user",
     "updated_at" => "2017-08-28T23:02:50.810351Z"}]}

Now we should loop over each user map and extract the value of the “_id” field. To extract this field I have used a list comprehension:

for element <- source, into: [], do: Map.get(element, field)

Here source is the list of all user maps, element is an element of this list, and field is the required “_id” field. This list comprehension returns a list of user IDs, which also serve as channel IDs.
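Wrapped in a throwaway module, the comprehension can be sketched and exercised like this (using the extract_field/2 shape that appears in the refactored code later):

```elixir
# Sketch: pull one field out of every user map with a comprehension.
defmodule Extract do
  def extract_field(field, source) do
    for element <- source, into: [], do: Map.get(element, field)
  end
end

users = [
  %{"_id" => "64341520", "name" => "nuke_73"},
  %{"_id" => "11111111", "name" => "tornis"}
]

IO.inspect(Extract.extract_field("_id", users))
# ["64341520", "11111111"]
```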

Obtaining data about streams is easy once the channel IDs are extracted. The only thing left is to issue a request to the correct endpoint for each channel ID. This is achieved with another function built on a list comprehension:

  def get_streams(body) do
    channel_ids = extract_channel_id(body)
    for channel_id <- channel_ids do
      Process.sleep(1000)
      get_stream_by_user(channel_id)
    end
  end

To make our script play nicely with potential rate limiting, we add a one-second delay between API calls.

Functions for fetching data about users and streams are very similar:

  def get_user(options) do
    usernames = options[:usernames] |> String.downcase()
    headers = ["Client-ID": Application.get_env(:rill, :twitch_client_id),
               "Accept": "application/vnd.twitchtv.v5+json"]
    api_url = "https://api.twitch.tv/kraken/users?login=#{usernames}"
    case HTTPoison.get(api_url, headers) do
      {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
        body
      {:ok, %HTTPoison.Response{status_code: 400, body: body}} ->
        IO.puts(:stderr, body)
      {:error, %HTTPoison.Error{reason: reason}} ->
        IO.puts(:stderr, reason)
    end
  end

We set up the correct headers and API endpoint for our request and send the request with HTTPoison. If everything is fine, the body of the response is returned; otherwise the body of a 400 response, or the error reason, is printed to stderr.

As an extra step I have added command-line switches to print data only about users, only about streams, or both.

The function that parses arguments now contains definitions for three switches:

  defp parse_args(args) do
    {options, _, _} = OptionParser.parse(args,
      strict: [usernames: :string, with_user: :boolean, with_stream: :boolean])
    options
  end

With those switches, a call to our command-line application looks like: ./rill --usernames=nuke73,tornis --with-stream --with-user

To choose which path to take (based on the provided switches), I have created a function process, which holds the decision logic:

  def process([]), do: IO.puts(:stderr, "No arguments given")
  def process(options) do
    cond do
      options[:usernames] == nil ->
        IO.puts(:stderr, "Comma-separated list of usernames is expected")
      options[:with_user] && options[:with_stream] ->
        options |> get_users |> print_to_console |> get_streams |> IO.puts
      options[:with_stream] ->
        options |> get_users |> get_streams |> IO.puts
      true ->
        options |> get_users |> IO.puts
    end
  end

There are just three pipelines that execute functions in order, depending on which switches were enabled. Printing to the console with IO.puts would break the pipeline, because IO.puts returns :ok rather than the data; that’s why I have added a small function called print_to_console, which calls IO.puts and returns the data that was passed to it:

  defp print_to_console(data) do
    IO.puts data
    data
  end

Refactoring

1) The definition of the headers is repeated twice in the code (once for the users endpoint and once for the streams endpoint). Let’s extract the definition into a module attribute:

defmodule Rill do
  @moduledoc """
  Rill is an application to collect data about online streams.
  """

  @headers ["Client-ID": Application.get_env(:rill, :twitch_client_id),
            "Accept": "application/vnd.twitchtv.v5+json"]

and just use @headers in the code (note that module attributes are evaluated at compile time, so :twitch_client_id must be available when the project is compiled):

case HTTPoison.get(api_url, @headers)

2) I would like to output data about users on separate lines (one line per user), and do the same for streams. To achieve this we will add functions to parse the responses from the users and streams endpoints, add a function to print users on separate lines, and change print_to_console/1 to work with lists. First, the pipeline that obtains and prints data is changed. For example, when the arguments --with-user and --with-stream are provided, the pipeline looks as follows:

options[:with_user] && options[:with_stream] ->
  options
  |> get_users
  |> parse_users
  |> print_users
  |> get_streams
  |> parse_streams
  |> print_to_console

3) Currently, API responses for user and stream information do not contain a timestamp of when the response was returned. This timestamp is included in the response headers in the “Date” field: {"Date", "Sun, 12 Nov 2017 15:02:03 GMT"}. We can extract the value of this field with a list comprehension and attach it to every user. This is done in a separate function, add_request_date_to_users:

  defp add_request_date_to_users({headers, users}) do
    date = for { "Date", date } <- headers, into: "", do: date
    users
    |> Enum.map(fn(user) -> Map.put(user, :request_date, date) end)
  end

In a similar way, we add the request date to every stream. In the case of streams, get_streams/1 is changed to parse every stream and add the request date to each of them:

  def get_streams(users) do
    channel_ids = extract_field("_id", users)
    for channel_id <- channel_ids, into: [] do
      Process.sleep(1000)
      channel_id
      |> get_stream_by_user
      |> parse_stream
      |> add_request_date_to_stream
    end
  end

Conclusion

In this part we have explored the Twitch API to obtain data about user profiles and their streams. There is plenty of further work to do.

One task is to extract the giantbomb ID and make a request to the giantbomb API to obtain data about a game. Another is to cache user profile information, because it does not change often (querying profile information once a day is enough), and, more generally, to think about a retry strategy. Finally, the result of a request should be saved to a file. Apart from these, there is the task of switching to the latest Twitch API, which was released in September 2017.