Wrong I18n key was read from memcached, when using unicorn
Recently I got a error from my project, which reported by hoptoad, the error information like:
I18n::MissingInterpolationArgument: missing interpolation argument in "I ran %{distance}km on %{date}. It took %{time}" ({:pace=>"..."} given)You will find I was passing the data {:pace=>"..."} to the translation key: "I ran %{distance}km on %{date}. It took %{time}". this translation key required 3 parameters, but I only given one. And when I look into the code, I found there is no such a mistake as I mentioned above. Then I gave up to fix this problem. Couple of weeks past, one day, when I was driving back from shanghai to hangzhou, I overheard my colleagues was discussing about this issue, they found this problem is referred to some multi-process issue. Thereafter, I did a more deeper investigation, then finally found the reason and knew how to recur it.
How to recur it?
- memcached -vv
- watch lsof -i :11211
- unicorn -c config/unicorn.conf.rb -E production
- ab -n 1000 -c 100 http://localhost:8080 (You have to run this code as soon as possible after ran unicorn command.)
- tail -f log/production.log | grep ActionView::Template::Error
What's the cause?
If you look carefully, when you start unicorn server, all the unicorn workers are using the same memcached connection.memcached 11700 michaelhe 30u IPv6 0xffffff800bbe3600 0t0 TCP localhost:11211->localhost:59310 (ESTABLISHED)
ruby 19672 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19719 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19720 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19721 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19722 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19723 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19724 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19725 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19726 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
Then 2 http requests come, they are processed by 2 unicorn processes, then request A send many "get" requests to memcached, as well as request B. but they are using the same memcached connection at beginning, so in extreme condition, request B will get the values which is respond for the request send from request A, whereafter the tragedy happened.
However, after several errors was raised, everything goes well, no error was raised for all the time. I notice one thing:
memcached 11700 michaelhe 30u IPv6 0xffffff800bbe3600 0t0 TCP localhost:11211->localhost:59310 (ESTABLISHED)
memcached 11700 michaelhe 31u IPv6 0xffffff800f2b7140 0t0 TCP localhost:11211->localhost:59979 (ESTABLISHED)
memcached 11700 michaelhe 32u IPv6 0xffffff800bbe54a0 0t0 TCP localhost:11211->localhost:59980 (ESTABLISHED)
memcached 11700 michaelhe 33u IPv6 0xffffff800f2b7d80 0t0 TCP localhost:11211->localhost:59984 (ESTABLISHED)
memcached 11700 michaelhe 34u IPv6 0xffffff800f2b83a0 0t0 TCP localhost:11211->localhost:59986 (ESTABLISHED)
memcached 11700 michaelhe 35u IPv6 0xffffff800f2bfb70 0t0 TCP localhost:11211->localhost:59987 (ESTABLISHED)
memcached 11700 michaelhe 36u IPv6 0xffffff800f2c0dd0 0t0 TCP localhost:11211->localhost:59991 (ESTABLISHED)
memcached 11700 michaelhe 37u IPv6 0xffffff800f2b7450 0t0 TCP localhost:11211->localhost:59994 (ESTABLISHED)
ruby 19672 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19719 michaelhe 10u IPv6 0xffffff800f2b8cd0 0t0 TCP localhost:59984->localhost:11211 (ESTABLISHED)
ruby 19720 michaelhe 11u IPv6 0xffffff800bbe3f30 0t0 TCP localhost:59986->localhost:11211 (ESTABLISHED)
ruby 19721 michaelhe 10u IPv6 0xffffff800f2b7a70 0t0 TCP localhost:59979->localhost:11211 (ESTABLISHED)
ruby 19722 michaelhe 10u IPv6 0xffffff800bbe4550 0t0 TCP localhost:59987->localhost:11211 (ESTABLISHED)
ruby 19723 michaelhe 10u IPv6 0xffffff800f2bef30 0t0 TCP localhost:59991->localhost:11211 (ESTABLISHED)
ruby 19724 michaelhe 10u IPv6 0xffffff800f2bec20 0t0 TCP localhost:59980->localhost:11211 (ESTABLISHED)
ruby 19725 michaelhe 10u IPv6 0xffffff800bbe3910 0t0 TCP localhost:59310->localhost:11211 (ESTABLISHED)
ruby 19726 michaelhe 10u IPv6 0xffffff800f2bf860 0t0 TCP localhost:59994->localhost:11211 (ESTABLISHED)
The connections to memcached of each unicorn worker was reset, each worker has its own connection, so that's why it goes correctly.
How to solve it?
After a little research online, I found the comments in unicorn config file:Yes, the magic code to fix this problem is:after_fork do |server, worker|### Unicorn master loads the app then forks off workers - because of the way# Unix forking works, we need to make sure we aren't using any of the parent's# sockets, e.g. db connection
Rails.cache.reset
I18n.cache_store.reset
add them into the after_fork block in unicorn config file. And what happened then?
memcached 11700 michaelhe 30u IPv6 0xffffff800f2b7140 0t0 TCP localhost:11211->localhost:60687 (ESTABLISHED)
ruby 24046 michaelhe 10u IPv6 0xffffff800bbe3600 0t0 TCP localhost:60687->localhost:11211 (ESTABLISHED)
Any other unicorn workers won't use the connection of unicorn master, instead, they will generate a new connection when on demand.
BTW, this bug happen to both ruby 1.8.7 and REE.