The “memcached” Explained

Before we dive into memcached, a little disclaimer. This text is written for my own reference and study. If you are unsure about something, or it does not make sense to you, please read the memcached wiki page.

TL;DR: How does memcached work?

Memcached sits between your application and the storage device. Let’s say you have a server with NGINX as the web server. A client requests an image from your server. NGINX accepts the connection, looks at the request headers, finds the directory, finds the image, and serves it back to the client. That is OK if your website has 1000 connections per minute or so, but it is terrible for sites that serve millions of clients per second, because every request goes to the storage device, and storage is slow as hell compared to RAM (even an SSD is slow compared to RAM). Memcached, on the other hand, can easily handle 200,000 requests per second and can serve a request in less than a millisecond (holy moly).

It would be great if the image were already in RAM, waiting to be cherry-picked by NGINX. Normally, whatever the client requests, the web server fetches it from the storage device or from the database. To avoid this, memcached holds data in RAM, ready to be served to any process that asks for it. If the image is not found in RAM, it is fetched from the database or storage device as usual, and it can then be stored in memcached for next time. Basically, you store the most frequently served data inside the memcached pool so that any process that looks for it gets it right away. That’s it.

How does memcached work?

  • Memcached is an in-memory key-value store for small pieces of data such as strings, objects, or even images, which usually come from the results of database queries.
  • Memcached sits between a process and the storage device and keeps commonly read data in RAM. This is done for speed and performance: if the data is already in RAM, it is served right away. If the data is not in RAM (that is, not in the memcached pool), the database is queried, which is slow as hell compared to RAM.
  • Memcached associates an ID with every piece of data it stores. The stored data is called the value and the ID is called the key, which explains the key-value storage definition of memcached.
  • When a client requests data, the application first checks the memcached cache (pool). If the data is there, it is returned to the client right away. If it is not, the application queries the storage device or database, returns the data to the client, and usually stores a copy in memcached for next time. Memcached never talks to the database itself, so when the stored data changes, the application has to update or delete the cached copy so that the cache keeps serving an exact copy of what is in the database or storage device.
  • Memcached is best suited for large websites with heavy traffic and for servers that get frequent queries (requests). If the most frequent requests are cached, that is, held inside the pool controlled by memcached, website load time drops drastically.
  • Memcached is not suited for websites with frequent updates, because the cached copies have to be refreshed all the time. This constant refreshing slows down page load time.
  • By default, memcached cannot store objects (data) larger than 1 MB (the -I flag changes this limit), and keys cannot be longer than 250 bytes.
  • The system on which memcached is installed (the server) does not care what your data looks like and does not understand data types. An item is made of a key, which is a string of up to 250 bytes, an expiration time in seconds (0 for never), optional flags, and the raw data (value).
  • You can have multiple memcached instances on one system. However, instances are unaware of each other: they do not share data, do not talk to each other, do not sync, and there is no replication (data duplication). Read more about mcrouter if you need cross-talk; mcrouter is, simply put, a load balancer for memcached instances.
  • Memcached is an LRU (Least Recently Used) cache. This means the least recently used data is evicted from the cache when more space is needed for new items. Items can also expire after a specific amount of time, or be set to never expire. For example, we can expire items after one minute to limit stale data being returned to clients, or even flush the whole cache so that it refills with fresh data and the data clients ask for most frequently.
  • Memcached is not officially supported on Windows. It is best suited for *nix operating systems, preferably Linux.
  • Memcached is low on CPU usage and responds very, very fast. It is a multi-threaded process that uses 4 threads by default, but most installations need only one memcached thread. It takes as much memory as you give it, that is, the size of your pool.
  • The goal of memcached is to gather memory from multiple hosts and make your process see it as one large pool of memory. The more memory, the better.
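To make the LRU and expiry ideas from the list above a bit more concrete, here is a toy sketch in Python. It is not how memcached is implemented internally: `TinyLRUCache`, `max_items` and the injectable `clock` are names I made up for illustration, and the real thing limits memory in MB, not item count.

```python
import time
from collections import OrderedDict


class TinyLRUCache:
    """Toy LRU cache with per-item expiry. Not memcached's real internals."""

    def __init__(self, max_items, clock=time.monotonic):
        self.max_items = max_items
        self.clock = clock                 # injectable so expiry can be tested
        self.items = OrderedDict()         # key -> (value, expires_at or None)
        self.evictions = 0

    def set(self, key, value, exptime=0):
        """Store a value; exptime is in seconds, 0 means never expire."""
        expires_at = self.clock() + exptime if exptime > 0 else None
        if key in self.items:
            del self.items[key]            # overwrite, re-inserted as newest
        elif len(self.items) >= self.max_items:
            self.items.popitem(last=False)  # drop the least recently used item
            self.evictions += 1
        self.items[key] = (value, expires_at)

    def get(self, key):
        """Return the value, or None on a miss or an expired item."""
        entry = self.items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self.clock() >= expires_at:
            del self.items[key]            # expired: treat it as a miss
            return None
        self.items.move_to_end(key)        # mark as most recently used
        return value


if __name__ == "__main__":
    cache = TinyLRUCache(max_items=2)
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")          # "a" is now more recently used than "b"
    cache.set("c", 3)       # pool is full, so "b" gets evicted
    print(cache.get("b"), cache.get("a"), cache.get("c"), cache.evictions)
```

Reading "a" before inserting "c" is what saves it: the item that was touched least recently ("b") is the one that gets ditched.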

Command-line arguments:

 # memcached -h // prints help with all available options
 # memcached -m 1000 // tells memcached how much RAM to use for item storage (in MB)
 # memcached -d // tells memcached to daemonize (to run in background)
 # memcached -v // verbose output to STDOUT/STDERR; -vv is even more verbose
 # memcached -p 11222 // TCP port on which memcached will listen (default is 11211)
 # memcached -U 11223 // UDP port on which memcached will listen (-U 0 to disable UDP)
 # memcached -l 127.0.0.1 // IP address on which memcached will listen

NOTE: NEVER EXPOSE MEMCACHED DIRECTLY TO THE INTERNET.
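In practice these flags are combined into one command line. As a rough sketch, here is a small Python helper that builds such a command from settings. The function name and its defaults are my own choices, not anything memcached ships with; it only uses flags mentioned in this text (-m, -p, -U, -l, -c, -d) and binds to 127.0.0.1 by default, in the spirit of the note above.

```python
def memcached_argv(memory_mb=64, tcp_port=11211, udp_port=0,
                   listen="127.0.0.1", max_conns=1024, daemonize=True):
    """Build a memcached command line (hypothetical helper, for illustration)."""
    argv = ["memcached",
            "-m", str(memory_mb),    # RAM for item storage, in MB
            "-p", str(tcp_port),     # TCP port
            "-U", str(udp_port),     # UDP port, 0 disables UDP
            "-l", listen,            # address to listen on
            "-c", str(max_conns)]    # max simultaneous connections
    if daemonize:
        argv.append("-d")            # run in the background
    return argv


if __name__ == "__main__":
    # Print the command instead of running it, so nothing starts by accident.
    print(" ".join(memcached_argv(memory_mb=1000, tcp_port=11222)))
```

Printing instead of executing is deliberate: you can eyeball the result before pasting it into a shell or a service file.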

Connection Limit:

  • By default, the maximum number of simultaneous connections is 1024 (the -c flag changes it). If more arrive, they hang and wait for slots to free up. Use the stats command to see if the system is running out of connections. Also, look at listen_disabled_num, which should be zero or close to zero.
  • Threading is used to scale memcached across multiple CPU cores. Each thread handles many connections in parallel. This is unlike Apache, which traditionally uses one process or thread per active client connection.
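The connection check described above is easy to automate once you have the stats as name/value pairs. A minimal sketch in Python, assuming the stats are already in a dict of strings; the function name and the 80% `threshold` are my own arbitrary choices, not memcached recommendations.

```python
def connection_warnings(stats, max_conns=1024, threshold=0.8):
    """Return human-readable warnings based on connection-related stats.

    stats: dict of stat name -> string value, as reported by "stats".
    max_conns: the -c setting of the instance (1024 by default).
    threshold: made-up cut-off, warn above this fraction of max_conns.
    """
    warnings = []
    curr = int(stats.get("curr_connections", 0))
    refused = int(stats.get("listen_disabled_num", 0))
    if curr >= max_conns * threshold:
        warnings.append(f"{curr} of {max_conns} connections in use")
    if refused > 0:
        warnings.append(f"connection limit was hit {refused} time(s)")
    return warnings


if __name__ == "__main__":
    # Made-up numbers, just to show both warnings firing.
    sample = {"curr_connections": "900", "listen_disabled_num": "3"}
    for line in connection_warnings(sample):
        print("WARNING:", line)
```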

Let’s use memcached:

First, let’s connect to our memcached instance with telnet on port 11211 and type stats to see what info memcached returns.

 # apt install memcached
 # service memcached status
 # telnet localhost 11211
 stats

This will show a ton of statistics. Right now, let’s dive into the most important ones:

  • curr_connections – number of clients currently connected. Always make sure this number does not come too close to the max connection setting (-c flag)
  • listen_disabled_num – number of times memcached has hit its connection limit. In other words, how many times memcached had the maximum number of clients connected to it
  • accepting_conns – if 0, memcached is not accepting new connections
  • limit_maxbytes – size of memcached’s pool in bytes, as set with the -m flag.
  • cmd_flush – counts flush_all commands. Every time someone types flush_all, all items inside the cache are invalidated and this counter increments by one. Watch this in production and sound the alarms if it starts moving, because it means your data is being erased from the pool.
  • curr_items – number of items currently held by memcached. Those are the key-value pairs that are served to a process when asked.
  • evictions – number of items that were evicted (ditched) before they expired, to make room for new items. If this number keeps growing, the pool is probably too small for your data.
  • threads – how many threads memcached uses
    Read more about these on the official memcached wiki page.
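If you would rather read these numbers from a script than from a telnet session, here is a rough Python sketch. It assumes the plain text protocol, where the answer to `stats` is a series of `STAT name value` lines terminated by `END`. The function names are mine, and the sample response is canned and made up so that the sketch runs without a server.

```python
import socket


def parse_stats(raw):
    """Turn 'STAT name value' lines into a dict, stopping at 'END'."""
    stats = {}
    for line in raw.splitlines():
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host="localhost", port=11211, timeout=2.0):
    """Send 'stats' over TCP, the same thing the telnet session does."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"stats\r\n")
        data = b""
        while not data.endswith(b"END\r\n"):
            chunk = sock.recv(4096)
            if not chunk:        # server closed the connection early
                break
            data += chunk
    return parse_stats(data.decode("ascii", errors="replace"))


if __name__ == "__main__":
    # Canned, made-up response; call fetch_stats() against a real instance.
    sample = ("STAT curr_connections 10\r\n"
              "STAT curr_items 42\r\n"
              "STAT evictions 0\r\n"
              "END\r\n")
    for name, value in parse_stats(sample).items():
        print(name, "=", value)
```

All values come back as strings, so convert them with int() before comparing or graphing them.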
