The goal of the Rill project is to collect data about online streams from Twitch (and, possibly, other streaming platforms) for further analysis.
- Set up Twitch client ID according to: http://blog.danielberkompas.com/elixir/2015/03/21/manage-env-vars-in-elixir.html
The process to obtain data about streams for a particular user looks like this:
- Find user’s username (e.g., from a Twitch URL)
- Make a request to the Twitch API to convert the username to a stream id.
- Make a request to the Twitch API to obtain data about the user’s stream (is there a live stream, is a recording being played).
In stage 1 we will write simple functions to explore the Twitch API.
To issue requests to the Twitch API we will use the HTTPoison library.
Step one is to obtain the stream ID from a username.
Given the Twitch URL https://www.twitch.tv/nuke73, the username is nuke73.
Let’s use Elixir’s awesome OptionParser module to receive usernames as a command-line argument.
We expect to receive a comma-separated list of usernames.
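A minimal sketch of this step, using only the standard library. The module name UsernameArgs and the function parse_usernames/1 are hypothetical names chosen for illustration, not Rill's actual code, and the sketch assumes the switch is called --usernames:

```elixir
defmodule UsernameArgs do
  # Hypothetical helper: turn `--usernames=a,b` into a list of usernames.
  def parse_usernames(args) do
    {options, _rest, _invalid} =
      OptionParser.parse(args, strict: [usernames: :string])

    case options[:usernames] do
      # Switch not given (or given without a value): no usernames.
      nil -> []
      # Split the comma-separated value, dropping empty entries.
      value -> String.split(value, ",", trim: true)
    end
  end
end

IO.inspect(UsernameArgs.parse_usernames(["--usernames=nuke73,richard_hammer"]))
```

Splitting is optional here: the comma-separated string can also be passed to the API as is, so the split list is mainly useful for validation or per-user processing.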
To issue a request to the Twitch API, we set up the proper headers (including the Twitch Client ID) and the URL.
Then we send the request using HTTPoison and display the response on the command line.
If an error occurs, we write it to stderr with
IO.puts(:stderr, reason)
To build an executable with escript, we must add the escript configuration parameter, which points escript to the right module (in mix.exs):
def project do
  [
    ...
    escript: [main_module: Rill],
    ...
  ]
end
Now running our fresh CLI script:
./rill --usernames=nuke73,richard_hammer
returns JSON with data about two Twitch users.
The _id field can be used to fetch data about the corresponding streams.
Let’s write functions that fetch data about user’s stream.
First, the Twitch API returns its response in JSON format, so we need to parse JSON in Elixir. For that we will use the Poison library. Parsing the response is very simple:
Poison.Parser.parse!(body)
Poison will parse response into something like:
%{"_total" => 1,
  "users" => [%{"_id" => "64341520", "bio" => nil,
     "created_at" => "2014-06-14T11:45:40.184851Z", "display_name" => "Nuke_73",
     "logo" => nil, "name" => "nuke_73", "type" => "user",
     "updated_at" => "2017-08-28T23:02:50.810351Z"}]}
Now we loop over the user maps and extract the value of the "_id" field from each. To do this I used a list comprehension:
for element <- source, into: [], do: Map.get(element, field)
Here source is the list of all user maps, element is an element of that list, and field is the required "_id" field. The comprehension returns a list of user ids, which also serve as channel ids.
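The comprehension can be tried on its own with a trimmed copy of the parsed response. This is a sketch: the module name FieldExtractor is hypothetical, and the sample map only keeps a few of the fields shown earlier.

```elixir
defmodule FieldExtractor do
  # Collect the value stored under `field` in every map of `source`.
  # Maps without the field contribute nil, because Map.get/2 defaults to nil.
  def extract_field(field, source) do
    for element <- source, do: Map.get(element, field)
  end
end

# Sample shaped like the parsed response (trimmed for brevity).
parsed = %{
  "_total" => 1,
  "users" => [%{"_id" => "64341520", "name" => "nuke_73", "type" => "user"}]
}

IO.inspect(FieldExtractor.extract_field("_id", parsed["users"]))
```

A comprehension collects into a list by default, so the into: [] option can be left out without changing the result.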
Obtaining data about streams is easy once the channel ids are extracted. The only thing left is to issue a request to the correct endpoint for each channel id. This is done by another function that uses a list comprehension:
def get_streams(body) do
  channel_ids = extract_channel_id(body)
  for channel_id <- channel_ids do
    Process.sleep(1000)
    get_stream_by_user(channel_id)
  end
end
To make our script play nicely with potential rate limiting, we add a 1 second delay between API calls.
Functions for fetching data about users and streams are very similar:
def get_users(options) do
  usernames = options[:usernames] |> String.downcase()
  headers = ["Client-ID": Application.get_env(:rill, :twitch_client_id),
             "Accept": "application/vnd.twitchtv.v5+json"]
  api_url = "https://api.twitch.tv/kraken/users?login=#{usernames}"
  case HTTPoison.get(api_url, headers) do
    {:ok, %HTTPoison.Response{status_code: 200, body: body}} ->
      body
    # Any other status code: print the response body to stderr
    {:ok, %HTTPoison.Response{body: body}} ->
      IO.puts(:stderr, body)
    {:error, %HTTPoison.Error{reason: reason}} ->
      # reason may be an atom or a tuple, so inspect it before printing
      IO.puts(:stderr, inspect(reason))
  end
end
We set up the correct headers and API endpoint for our request and send it with HTTPoison. If everything is fine, the body of the response is returned. Otherwise, either the body of a response with a status code other than 200 or the error reason is printed to stderr.
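The three outcomes can also be expressed as function clauses. This is a sketch under stated assumptions, not Rill's code: ResponseHandler and handle/1 are hypothetical names, plain maps stand in for the HTTPoison structs so that the example runs without the dependency, and the function returns tagged tuples instead of printing, which is a different design from the function above.

```elixir
defmodule ResponseHandler do
  # Plain maps stand in for %HTTPoison.Response{} and %HTTPoison.Error{}.
  # The map patterns below also match structs that have the same fields.

  # Success: hand back the body.
  def handle({:ok, %{status_code: 200, body: body}}), do: {:ok, body}

  # The request went through, but the API answered with another status code.
  def handle({:ok, %{status_code: status, body: body}}),
    do: {:error, "HTTP #{status}: #{body}"}

  # Transport-level failure; inspect/1 copes with reasons that are not strings.
  def handle({:error, %{reason: reason}}), do: {:error, inspect(reason)}
end

IO.inspect(ResponseHandler.handle({:ok, %{status_code: 200, body: "{}"}}))
IO.inspect(ResponseHandler.handle({:ok, %{status_code: 404, body: "not found"}}))
IO.inspect(ResponseHandler.handle({:error, %{reason: :nxdomain}}))
```

Returning a tagged tuple lets the caller decide whether to print the error, retry, or stop the pipeline.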
As an extra step I have added command-line switches to print data only about users or streams or both.
Function to parse arguments now contains definitions for three switches:
defp parse_args(args) do
  {options, _, _} = OptionParser.parse(args,
    strict: [usernames: :string, with_user: :boolean, with_stream: :boolean])
  options
end
With those switches, a call to our command-line application looks like:
./rill --usernames=nuke73,tornis --with-stream --with-user
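To see what OptionParser makes of this invocation, the same switch definitions can be run against the argument list directly. A small sketch, using only the standard library:

```elixir
# The arguments as the escript would receive them.
args = ["--usernames=nuke73,tornis", "--with-stream", "--with-user"]

{options, _rest, invalid} =
  OptionParser.parse(args,
    strict: [usernames: :string, with_user: :boolean, with_stream: :boolean]
  )

# Dashes in switch names become underscores in the keys:
# --with-stream is stored under :with_stream.
IO.inspect(options)

# With strict:, switches that are not declared end up in this list.
IO.inspect(invalid)
```

Because the switches are declared with strict:, a mistyped switch is reported as invalid rather than silently accepted, although parse_args/1 above discards that list.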
In order to choose which path to take (based on the provided switches), I have created a function, process, which holds the decision logic:
def process([]), do: IO.puts(:stderr, "No arguments given")
def process(options) do
  cond do
    options[:usernames] == nil ->
      IO.puts(:stderr, "Comma-separated list of usernames is expected")
    options[:with_user] && options[:with_stream] ->
      options |> get_users |> print_to_console |> get_streams |> IO.puts
    options[:with_stream] ->
      options |> get_users |> get_streams |> IO.puts
    true ->
      options |> get_users |> IO.puts
  end
end
There are just three pipelines that execute functions in order, depending on which switches were enabled. IO.puts returns :ok rather than the data it printed, so using it in the middle of a pipeline would pass :ok to the next function. That is why I have added a small function called print_to_console, which calls IO.puts and returns the data that was passed as an argument:
defp print_to_console(data) do
  IO.puts data
  data
end
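The difference between the two return values is easy to check in isolation. In this sketch the module name ConsoleTap is hypothetical, and String.upcase/1 merely stands in for the next pipeline stage:

```elixir
defmodule ConsoleTap do
  # Same idea as print_to_console/1: print, then return the input unchanged.
  def print_to_console(data) do
    IO.puts(data)
    data
  end
end

# IO.puts/1 returns :ok, not the string it printed.
:ok = IO.puts("users")

# The helper hands the data on to the next pipeline stage.
"USERS" = "users" |> ConsoleTap.print_to_console() |> String.upcase()

# Kernel.tap/2 (Elixir 1.12 and later) gives the same behaviour without a helper.
"USERS" = "users" |> tap(&IO.puts/1) |> String.upcase()
```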
Refactoring
- The definition of headers is repeated twice in the code (once for the users endpoint and once for the streams endpoint). Let’s extract the definition into a module attribute:
defmodule Rill do
  @moduledoc """
  Rill is an application to collect data about online streams.
  """
  @headers ["Client-ID": Application.get_env(:rill, :twitch_client_id),
            "Accept": "application/vnd.twitchtv.v5+json"]
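and just use @headers in the code (module attributes are evaluated at compile time, so the client ID is read when the project is compiled):
case HTTPoison.get(api_url, @headers) do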
- I would like to output data about users on separate lines (one line per user), and do the same for streams.
To achieve this we will add functions to parse the responses from the users and streams endpoints, add a function to print users on separate lines, and change the function print_to_console/1 to work with lists. First, the pipeline to obtain and print data changes. For example, when the arguments --with-user and --with-stream are provided, the pipeline looks as follows:
options[:with_user] && options[:with_stream] ->
  options
  |> get_users
  |> parse_users
  |> print_users
  |> get_streams
  |> parse_streams
  |> print_to_console
- Currently, the API responses for user and stream information do not contain a timestamp of when the response was returned. This timestamp is included in the response headers, in the “Date” field:
{"Date", "Sun, 12 Nov 2017 15:02:03 GMT"}
We can extract the value of this field using a list comprehension and add it to every user. This operation is done in a separate function, add_request_date_to_users:
defp add_request_date_to_users({headers, users}) do
  date = for {"Date", date} <- headers, into: "", do: date
  users
  |> Enum.map(fn(user) -> Map.put(user, :request_date, date) end)
end
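The header extraction can be run against a hand-written header list. This is a sketch: the module name RequestDate and the function names are hypothetical, and the headers are sample data in the {name, value} tuple form shown above.

```elixir
defmodule RequestDate do
  # The pattern in the generator acts as a filter: only tuples whose first
  # element is exactly "Date" are kept, and their values are joined into a string.
  def from_headers(headers) do
    for {"Date", date} <- headers, into: "", do: date
  end

  # Attach the request date to every user map under the :request_date key.
  def add_to_users({headers, users}) do
    date = from_headers(headers)
    Enum.map(users, fn user -> Map.put(user, :request_date, date) end)
  end
end

headers = [
  {"Content-Type", "application/json"},
  {"Date", "Sun, 12 Nov 2017 15:02:03 GMT"}
]

users = [%{"_id" => "64341520"}]

IO.inspect(RequestDate.add_to_users({headers, users}))
```

The match is case-sensitive, so a header named "date" would be skipped and the resulting date would be an empty string.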
In a similar way, we add the request date to every stream. In the case of streams, the function get_streams/1 is changed to parse every stream and add the request date to each of them:
def get_streams(users) do
  channel_ids = extract_field("_id", users)
  for channel_id <- channel_ids, into: [] do
    Process.sleep(1000)
    channel_id
    |> get_stream_by_user
    |> parse_stream
    |> add_request_date_to_stream
  end
end
Conclusion
In this part we have explored the Twitch API to obtain data about user profiles and their streams. There is plenty of further work to do.
One thing is to extract the giantbomb id and make a request to the giantbomb API to obtain data about a game. Another is to cache user profile information, because it does not change often (querying profile information once a day is enough), and, in general, to think about a retry strategy. Finally, the result of a request should be saved to a file. Apart from these, there is the task of switching to the latest Twitch API, which was released in September 2017.