Wrong I18n key was read from memcached, when using unicorn

Recently I got a error from my project, which reported by hoptoad, the error information like:
I18n::MissingInterpolationArgument: missing interpolation argument in "I ran %{distance}km on %{date}. It took %{time}" ({:pace=>"..."} given)
You will find I was passing the data  {:pace=>"..."} to the translation key: "I ran %{distance}km on %{date}. It took %{time}".  this translation key required 3 parameters, but I only given one. And when I look into the code, I found there is no such a mistake as I mentioned above. Then I gave up to fix this problem. Couple of weeks past, one day, when I was driving back from shanghai to hangzhou, I overheard my colleagues was discussing about this issue, they found this problem is referred to some multi-process issue. Thereafter, I did a more deeper investigation, then finally found the reason and knew how to recur it.

How to recur it?

  1. memcached -vv
  2. watch lsof -i :11211
  3. unicorn -c config/unicorn.conf.rb -E production
  4. ab -n 1000 -c 100 http://localhost:8080 (You have to run this code as soon as possible after ran unicorn command.)
  5. tail -f log/production.log | grep ActionView::Template::Error
Then you would see some error messages in production.log which is the same to message wroten above.

What's the cause?

If you look carefully, when you start unicorn server, all the unicorn workers are using the same memcached connection.
memcached 11700 michaelhe   30u  IPv6 0xffffff800bbe3600      0t0  TCP localhost:11211->localhost:59310 (ESTABLISHED)
ruby      19672 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19719 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19720 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19721 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19722 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19723 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19724 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19725 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19726 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
Then 2 http requests come, they are processed by 2 unicorn processes, then request A send many "get" requests to memcached, as well as request B. but they are using the same memcached connection at beginning, so in extreme condition, request B will get the values which is respond for the request send from request A, whereafter the tragedy happened.   However, after several errors was raised, everything goes well, no error was raised for all the time. I notice one thing:
memcached 11700 michaelhe   30u  IPv6 0xffffff800bbe3600      0t0  TCP localhost:11211->localhost:59310 (ESTABLISHED)
memcached 11700 michaelhe   31u  IPv6 0xffffff800f2b7140      0t0  TCP localhost:11211->localhost:59979 (ESTABLISHED)
memcached 11700 michaelhe   32u  IPv6 0xffffff800bbe54a0      0t0  TCP localhost:11211->localhost:59980 (ESTABLISHED)
memcached 11700 michaelhe   33u  IPv6 0xffffff800f2b7d80      0t0  TCP localhost:11211->localhost:59984 (ESTABLISHED)
memcached 11700 michaelhe   34u  IPv6 0xffffff800f2b83a0      0t0  TCP localhost:11211->localhost:59986 (ESTABLISHED)
memcached 11700 michaelhe   35u  IPv6 0xffffff800f2bfb70      0t0  TCP localhost:11211->localhost:59987 (ESTABLISHED)
memcached 11700 michaelhe   36u  IPv6 0xffffff800f2c0dd0      0t0  TCP localhost:11211->localhost:59991 (ESTABLISHED)
memcached 11700 michaelhe   37u  IPv6 0xffffff800f2b7450      0t0  TCP localhost:11211->localhost:59994 (ESTABLISHED)
ruby      19672 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19719 michaelhe   10u  IPv6 0xffffff800f2b8cd0      0t0  TCP localhost:59984->localhost:11211 (ESTABLISHED)
ruby      19720 michaelhe   11u  IPv6 0xffffff800bbe3f30      0t0  TCP localhost:59986->localhost:11211 (ESTABLISHED)
ruby      19721 michaelhe   10u  IPv6 0xffffff800f2b7a70      0t0  TCP localhost:59979->localhost:11211 (ESTABLISHED)
ruby      19722 michaelhe   10u  IPv6 0xffffff800bbe4550      0t0  TCP localhost:59987->localhost:11211 (ESTABLISHED)
ruby      19723 michaelhe   10u  IPv6 0xffffff800f2bef30      0t0  TCP localhost:59991->localhost:11211 (ESTABLISHED)
ruby      19724 michaelhe   10u  IPv6 0xffffff800f2bec20      0t0  TCP localhost:59980->localhost:11211 (ESTABLISHED)
ruby      19725 michaelhe   10u  IPv6 0xffffff800bbe3910      0t0  TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby      19726 michaelhe   10u  IPv6 0xffffff800f2bf860      0t0  TCP localhost:59994->localhost:11211 (ESTABLISHED)
The connections to memcached of each unicorn worker was reset, each worker has its own connection, so that's why it goes correctly.

How to solve it?

After a little research online, I found the comments in unicorn config file:
after_fork do |server, worker|
  ##
  # Unicorn master loads the app then forks off workers - because of the way
  # Unix forking works, we need to make sure we aren't using any of the parent's
  # sockets, e.g. db connection
Yes, the magic code to fix this problem is:
Rails.cache.reset
I18n.cache_store.reset
add them into the after_fork block in unicorn config file. And what happened then?
memcached 11700 michaelhe   30u  IPv6 0xffffff800f2b7140      0t0  TCP localhost:11211->localhost:60687 (ESTABLISHED)
ruby      24046 michaelhe   10u  IPv6 0xffffff800bbe3600      0t0  TCP localhost:60687->localhost:11211 (ESTABLISHED)
Any other unicorn workers won't use the connection of unicorn master, instead, they will generate a new connection when on demand.
BTW,  this bug happen to both ruby 1.8.7 and REE.
blog comments powered by Disqus