MCD emulating memcached with Erlang/OTP using ets, statem and socket


mcd is a memcached compatible API server using some old and new features from Erlang/OTP.

From the about memcached page:

memcached is a high-performance, distributed memory object caching system, generic in nature, but originally intended for use in speeding up dynamic web applications by alleviating database load.

You can think of it as a short-term memory for your applications.

There is a really nice story-based tutorial that describes the itch it scratches.

It has a TCP API that comes in 3 flavours: the classic text protocol, the newer meta text protocol, and the binary protocol.

While the binary protocol is now deprecated, it will be supported by memcached for the foreseeable future.

It also has a UDP protocol, which we are going to ignore for the purposes of this article.

mcd

mcd draws on several old and new features of Erlang/OTP:

  • ETS as the in-memory key-value store for the cache;
  • socket, which allows asynchronous accept and recv for all communication;
  • send_request for asynchronous request and response;
  • the timeout feature of statem to expire items from the cache;
  • a custom OTP behaviour for the protocol, so that we can plug in different backend implementations of the cache.
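
To make the statem timeout idea concrete, here is a minimal sketch of a gen_statem that stores items in ETS and uses one generic timeout per key to drop them again. The module name expiry_sketch, the functions store/3 and fetch/1, and the millisecond TTL are assumptions made for illustration; this is not how mcd itself is written.

```erlang
-module(expiry_sketch).
-behaviour(gen_statem).

-export([start_link/0, store/3, fetch/1]).
-export([init/1, callback_mode/0, handle_event/4]).

start_link() ->
    gen_statem:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Store Value under Key, to be dropped after TTL milliseconds.
store(Key, Value, TTL) ->
    gen_statem:call(?MODULE, {store, Key, Value, TTL}).

%% Read directly from the (protected, named) table.
fetch(Key) ->
    case ets:lookup(?MODULE, Key) of
        [{Key, Value}] -> {ok, Value};
        [] -> not_found
    end.

callback_mode() ->
    handle_event_function.

init([]) ->
    Table = ets:new(?MODULE, [named_table, protected]),
    {ok, ready, #{table => Table}}.

handle_event({call, From}, {store, Key, Value, TTL}, ready, #{table := Table}) ->
    ets:insert(Table, {Key, Value}),
    %% A generic timeout named after the key; setting it again for the
    %% same key restarts that key's timer.
    {keep_state_and_data,
     [{reply, From, ok},
      {{timeout, Key}, TTL, expire}]};

handle_event({timeout, Key}, expire, ready, #{table := Table}) ->
    %% The timer for Key fired, so remove the item from the cache.
    ets:delete(Table, Key),
    keep_state_and_data.
```

Naming each timeout after its key means a repeated store for the same key simply replaces the pending timer, so no separate bookkeeping of timer references is needed.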

Let's start by implementing a very small part of the memcached text protocol, enough to store and get a value:

telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

set foo 12321 3600 6 
fooval
STORED
get foo
VALUE foo 12321 6
fooval
END

Here 12321 is the client flags value, 3600 is the time to live for foo in seconds, and 6 is the length of the data block in bytes.

In Erlang, we can pattern match with the bit syntax to decode this fragment of the protocol:

decode(<<"set ", Remainder/bytes>>) ->
    mcd_protocol_text:decode(set, Remainder);

decode(<<"get ", Remainder/bytes>>) ->
    mcd_protocol_text:decode(get, Remainder).

The above two function clauses will match anything that starts with either “set ” or “get ”, followed by zero or more further bytes.
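
As a self-contained illustration of that prefix matching, the sketch below returns the command and the remaining bytes instead of calling into mcd_protocol_text. The module name prefix_sketch, the function command/1 and the catch-all clause are assumptions added for the example.

```erlang
-module(prefix_sketch).
-export([command/1]).

%% A binary literal at the head of a bit syntax pattern matches only
%% when the input starts with exactly those bytes; the final segment
%% binds whatever is left, which may be empty.
command(<<"set ", Rest/bytes>>) -> {set, Rest};
command(<<"get ", Rest/bytes>>) -> {get, Rest};
%% Hypothetical fallback for anything else.
command(Other) when is_binary(Other) -> {unknown, Other}.
```

Note that the trailing space is part of the pattern, so a request such as "gets foo" does not match the "get " clause and falls through to the fallback.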

To decode a “set” we use a regular expression after splitting the command line from the data block:

decode(set = Command, Remainder) ->
    [CommandLine, DataLine] = split(Remainder),
    data_line(
      re_run(
        #{command => Command,
          subject => CommandLine,
          re => "(?<key>[^\\s]+) "
                "(?<flags>\\d+) "
                "(?<expiry>\\d+) "
                "(?<bytes>\\d+)"
                "( (?<noreply>noreply))?\\s*"}),
      DataLine);

The re_run helper compiles the regular expression and matches it against the command line, returning the components in a map. So our original “set” request of:

set foo 12321 3600 6 
fooval

is decoded and represented as the following map:

#{command => set,
  data => <<"fooval">>,
  expiry => 3600,
  flags => 12321,
  key => <<"foo">>,
  noreply => false}.
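
The re_run helper itself is not shown here, so the following is only a sketch of how named capture groups can be collected into a map with the standard re module. The module name re_sketch and the function named_captures/2 are hypothetical, and unlike the map above the values are left as binaries rather than converted to integers or booleans.

```erlang
-module(re_sketch).
-export([named_captures/2]).

%% Match Subject against RE and return the named groups as a map of
%% binaries, or nomatch. A group that did not take part in the match
%% (such as an absent noreply) comes back as an empty binary.
named_captures(Subject, RE) ->
    {ok, MP} = re:compile(RE),
    %% The names are returned in the same order that
    %% {capture, all_names, ...} uses for the values.
    {namelist, Names} = re:inspect(MP, namelist),
    case re:run(Subject, MP, [{capture, all_names, binary}]) of
        {match, Values} ->
            maps:from_list(lists:zip(Names, Values));
        nomatch ->
            nomatch
    end.
```

A real decoder would then post-process this map, for example with binary_to_integer/1 on the numeric fields and a comparison against the empty binary for noreply.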

In our pluggable storage layer, the set is handled using ETS as follows:

recv(#{message := #{command := set,
                    key := Key,
                    data := Data,
                    expiry := Expiry,
                    flags := Flags},
       data := #{table := Table}}) ->
    ets:insert(Table,
               #entry{key = Key,
                      flags = Flags,
                      expiry = Expiry,
                      data = Data}),
    {continue,
     [{encode, #{command => stored}},
      {expire, #{key => Key, seconds => Expiry}}]};
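
The definition of the entry record and the creation of the table are not shown above. A minimal sketch of both, assuming the record has only the four fields used in the insert, might look like this; the module name ets_sketch and the functions new/0, store/5 and lookup/2 are hypothetical.

```erlang
-module(ets_sketch).
-export([new/0, store/5, lookup/2]).

%% Assumed record layout; the real definition is not part of this article.
-record(entry, {key, flags, expiry, data}).

new() ->
    %% A record is a tuple whose first element is the record name, so
    %% keypos must point at the key field rather than element 1.
    ets:new(?MODULE, [set, public, {keypos, #entry.key}]).

store(Table, Key, Flags, Expiry, Data) ->
    true = ets:insert(Table,
                      #entry{key = Key,
                             flags = Flags,
                             expiry = Expiry,
                             data = Data}),
    stored.

lookup(Table, Key) ->
    case ets:lookup(Table, Key) of
        [#entry{flags = Flags, data = Data}] ->
            {value, Flags, Data};
        [] ->
            not_found
    end.
```

Without the keypos option ETS would use the first tuple element, the atom entry, as the key, and every insert would overwrite the previous one.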

In a later article we will look at how continue is handled: encode sends the stored reply back to the client, and expire handles the TTL.

In the meantime, take a look at postpone resource allocation, an article exploring some other features of statem.