小塌客

Shitou's Blog

MongoDB client IP

Recently mongodb has been hitting a large number of slow queries. Since we have many app servers in front of it, we needed to pinpoint the exact machine, but by default the mongodb log carries no client IP:
Mon Apr 16 01:41:38 [conn327586] update db.coll query: ...... upsert:1 112ms
It does, however, carry the connection ID of the current operation, here conn327586. Every time a client establishes a connection, the mongodb server logs the client's IP together with the connection ID it assigns, so searching back through the log for "accepted from.*conn327586" turns up the corresponding client IP.
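For example (the connection ID and pattern come straight from the lines above; the exact log wording can differ between mongodb versions, so treat it as a starting point):
grep -E "accepted from.*conn327586" mongodb.log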

One group operation in our system caches its query results with memcache, yet recently we still saw a flood of group operations locking up the process and causing serious performance problems. On inspection the memcache layer was working, but a large number of processes all hit the cache-expiry moment at the same time and refreshed the cache simultaneously, and that is where the problem came from. Here is the statistics script:
awk '/group/{print $5, $4}' mongodb.log | sort -k 2| awk '{split(a[$1], b, " "); b[length(b)+1]=$2; a[$1]=""; for(i in b){a[$1]=sprintf("%s %s", a[$1], b[i])}};END{for(i in a){printf "%s %s\n", i, a[i]}}' > sta.txt
This command lists, for each connection (connection ID), all of its group execution times, and with that output the problem described above is easy to spot. One small flaw is that the command does not put the times in chronological order, so the result needs one more processing step; I could not work out a clean way to do that in shell, though a possible sketch follows.
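One candidate (a sketch, untested against the real log; it assumes the timestamp is field 4 and the connection ID is field 5, exactly as in the command above) is to sort by connection ID first and timestamp second before aggregating, so each connection's times come out in order:
awk '/group/{print $5, $4}' mongodb.log | sort -k1,1 -k2,2 | awk '{a[$1] = a[$1] " " $2} END{for (c in a) print c, a[c]}' > sta.txt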

MongoDB Sharding Design

Shitou wrote this article (in English) on the wiki of his current company, a foreign one. It covers the design of mongodb sharding and the main issues to weigh; overly detailed or basic topics are not touched here, and for those you can refer to the official mongodb documentation.


Sharding Key

Most of our queries are based on monitor_id, timestamp and location, and according to the sharding key design principles:

  • A single read should be finished in one shard.
  • Reads overall should be distributed across all shards.
  • Writes should be distributed across all shards.

so the sharding key will be: {xxx_id: 1, xxx: 1, xxx: 1}. Remember the order of the fields when creating the index; the order is extremely important, because database indexes follow the LEFT MATCH (prefix) principle.
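For illustration only, using the field names from the query description above (the actual key is redacted, and the database and collection names here are made up):

$mongo #connect to mongos

>use admin;

>db.runCommand({shardCollection: "monitordb.samples", key: {monitor_id: 1, timestamp: 1, location: 1}}) //names are illustrative only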

 

Query

Most queries will hit the sharding key. However, we cannot cover every situation, requirements keep changing, and we cannot change the sharding key once it has been created in the database. A query that does not hit the sharding key will scan all the shards, which is slower than one that does, but that is acceptable; we do not have the resources to benchmark this properly, but the official mongodb documentation says it will still be fast while the number of shard servers is under 10.

When a non-sharding-key query scans all shards, we need to make sure that it still hits an index; otherwise it will definitely be very slow.
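A quick way to check that, sketched with hypothetical names: build the extra index and confirm via explain() that the query uses an index cursor rather than a full scan:

$mongo #connect to mongos

>use monitordb;

>db.samples.ensureIndex({location: 1});

>db.samples.find({location: "bos"}).explain(); //expect "cursor" : "BtreeCursor location_1"; "BasicCursor" would mean a full scan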

 

Chunks

The default chunk size is 64MB. When one chunk grows bigger than that number, mongodb splits the chunk into two and may move one of them to another shard.

  • If the number is too small, mongodb will be kept busy splitting chunks, especially when our data grows fast.
  • If the number is too big, moving a chunk to another server will cost mongodb too much time.

We will use the default number in the beginning, and then see whether we need to increase or decrease it.
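If we do decide to change it later, the chunk size is just a setting in the config database:

$mongo #connect to mongos

>use config;

>db.settings.save({_id: "chunksize", value: 128}) //value in MB; 128 is only an example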

 

When mongodb moves chunks from one shard to another, the server load grows, which may cause performance problems. But we can control the chunk-moving process, namely the balancer: for example, if we notice that the chunk-moving process causes us a problem every time, we can stop the balancer:

 

$mongo #connect to mongos

>use config;

>db.settings.update({_id: "balancer"}, {$set: {stopped: true}}, true)

 

Then mongodb will stop moving chunks. It will still do the chunk-splitting job, and it leaves it to us to decide when to move chunks manually:

 

$mongo #connect to mongos

>use admin;

>db.printShardingStatus();

>db.runCommand({moveChunk: "testdb.users", find: {login: "rock26944"}, to: "shard0000"})

 

We can also write a cron job to do this.
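A minimal sketch of such a job (the script path and the schedule are assumptions): put the moveChunk commands shown above into a .js file and let the mongo shell run it against mongos:

#crontab -e

#run the (hypothetical) chunk-moving script against mongos every night at 3am
0 3 * * * /usr/bin/mongo localhost/admin /data/move_chunks.js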

 

Failover

When one shard is down, any query that hits that shard will fail, so we will make every shard a replica set. For replica sets, please refer to other documentation.

When one shard (an entire replica set) is down and a query hits both that shard and another, the query will fail by default. We can add a query option ({partial: 1}) to make the query return just the partial data instead of raising a socket error. The ruby mongo driver currently supports this feature, but the option cannot be put into a config file, so it is difficult to use right now. Still, I think replica sets will give us a lot of insurance.
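For reference, the same flag can also be set from the mongo shell (a sketch; whether your driver exposes it this conveniently is another matter, as noted above):

>db.users.find().addOption(DBQuery.Option.partial) //shell equivalent of the driver's {partial: 1}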

 

Problems

  1. How much memory does mongos need? Will it become a bottleneck?
  2. Does mongos support :read_secondary internally?
  3. How much volume is the monitor system designed to support? We need to estimate the data growth rate.

 

Odds and Ends

In a sharding environment we cannot use the group command; map/reduce is the substitute.
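For example, a simple count-per-field group could be rewritten roughly like this (the collection and field names are made up):

>m = function() { emit(this.location, 1); }; //map: one vote per document

>r = function(key, values) { return Array.sum(values); }; //reduce: add the votes up

>db.samples.mapReduce(m, r, {out: {inline: 1}});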

All indexes must be ascending; they cannot be descending.

Once you shard a collection, it is very difficult to un-shard it (you still can, but it is tough).

 

Scripts and Step-by-step

1. I have written a script to start a mongodb sharding environment easily. Put this function in your ~/.profile:

mongod_sharding_restart() {
  killall mongod
  killall mongos
  sleep 2

  [ ! -d /data/configdb ] && mkdir -p /data/configdb
  [ ! -d /data/db ] && mkdir -p /data/db
  [ ! -d /data/shard2_db ] && mkdir -p /data/shard2_db
  rm -f /data/configdb/mongod.lock
  rm -f /data/db/mongod.lock
  rm -f /data/shard2_db/mongod.lock
  #config server (--configsvr defaults to port 27019 and dbpath /data/configdb)
  mongod --fork --logpath /data/shard_config.log --logappend --configsvr &
  #wait for config server
  sleep 3

  #mongos
  mongos --fork --logpath /data/shard_mongos.log --logappend --configdb localhost:27019 &

  #shard1 (mongod; --shardsvr defaults to port 27018 and dbpath /data/db)
  mongod --fork --logpath /data/shard_shard1.log --logappend --shardsvr &
  #shard2 (mongod, custom port and dbpath; its directory was already created above)
  mongod --fork --logpath /data/shard_shard2.log --logappend --shardsvr --port 37018 --dbpath /data/shard2_db &

  sleep 2
  ps aux | egrep "mongod|mongos"
}
export -f mongod_sharding_restart

and then:

#source ~/.profile

 

Then you can start mongos, the config server, and both shards with one command:

#mongod_sharding_restart 


2. Connect to mongos:

#mongo

>use admin;

>db.printShardingStatus();

>db.runCommand({addShard: "localhost:27018"});

>db.runCommand({addShard: "localhost:37018"});

>db.printShardingStatus();

  ...

  shards:

      { "_id" : "shard0000", "host" : "localhost:27018" }

      { "_id" : "shard0001", "host" : "localhost:37018" }

  ...

  ...

>db.runCommand({enableSharding: "testdb"});

>db.runCommand({shardCollection: "testdb.collection", key: {_id: 1}})

>db.printShardingStatus();

 

You can check the mongos and shard logs to see the status.

 

>use testdb;

>db.printCollectionStats();

Tri-survive - HTML5 Game

This game actually dates back to July 2011, when shitou was studying impactjs. I have been busy ever since, and what I had then was rough and full of problems, so I never published it here. Now I am putting it out for anyone interested in impactjs to take a look at.

 

http://www.ccok.me/photo/image/83/medium/logo.png

The game is called Tri-survive and was developed with the impactjs engine. The image assets were Photoshopped by shitou himself, with some lifted from other games; the sound effects come from http://www.partnersinrhyme.com/pir/PIRsfx.shtml and http://opengameart.org/. The resources on both sites are open source, and for a programmer they could hardly be better.

 

There are still plenty of rough edges; it barely counts as half-finished. Click the HELP button at the top right of the game to see the controls: Tri-survive

Cut the rope - HTML5 Version

Microsoft has made another move recently: it teamed up with Cut the Rope to release an HTML5 version of the game, and the development team has also published a development log.

 

cut the rope

 

json_formatter

Recently I have been building API and OAuth2 things with sinatra, debugging with curl. The responses come back as JSON, and by default the string is printed to the console as one long line, which is hard to read, like this:
{"accounts":[{"name":null,"id":"4ea060742c76682ab800a807","api_url":"https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts/4ea060742c76682ab800a807","email":"skidave32289@gmail.com"},{"name":null,"id":"4ea06b5e2c76682aa6002ac5","api_url":"https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts/4ea06b5e2c76682aa6002ac5","email":"skidave32289+20@gmail.com"},{"name":null,"id":"4ea071862c766848c3000055","api_url":"https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts/4ea071862c766848c3000055","email":"skidave32289+21@gmail.com"},{"name":null,"id":"4ea075e02c766848c8000ae5","api_url":"https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts/4ea075e02c766848c8000ae5","email":"skidave32289+22@gmail.com"}]}
Today I saw a nice tool gem on rubyflow: json_formatter. After a gem install it can be used straight from the command line:
#curl -H 'EXAMPLE-API-KEY: b60b5400dd6f012eg16c12313b0ecdxc2' "https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts?per_page=100" | json_formatter 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5812  100  5812    0     0   3358      0  0:00:01  0:00:01 --:--:-- 11049
{
  "accounts": [
    {
      "name": null,
      "id": "4ea060742c76682ab800a807",
      "email": "skidave32289@gmail.com",
      "api_url": "https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts/4ea060742c76682ab800a807"
    },
    {
      "name": null,
      "id": "4ea06b5e2c76682aa6002ac5",
      "email": "skidave32289+20@gmail.com",
      "api_url": "https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts/4ea06b5e2c76682aa6002ac5"
    },
    {
      "name": null,
      "id": "4ea071862c766848c3000055",
      "email": "skidave32289+21@gmail.com",
      "api_url": "https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts/4ea071862c766848c3000055"
    }
  ]
}

#a string can also be passed in directly
#json_formatter "{}"

ps: about curl
Since curl came up above, one more note: when using curl to POST data, be aware that -d "name=value" does not url-encode the value automatically. If you want curl to escape the data, use the --data-urlencode option:
#curl -d "site=www.ccok.me" --data-urlencode "email=li+lei.hand-some2@yottaa.com" -H 'EXAMPLE-API-KEY: b60b5400dd6f012ec57c123112b0ecda2' "https://api-dev.example.com/partners/4ea05b812c76682ab800a0b0/accounts"

Google Analytics Got an Update

Almost half a year without an update again.. just to prove that shitou's blog is still alive, here are a few words to pad the count..

 

Google has been busy again lately: acquiring Moto's handset business, releasing Android 4.0, Google Music, the new UI..

 

Today I received an email from Google Analytics announcing updates, the main ones being:

Real-time data updates. You can now see user activity from just a few seconds ago.

Google Realtime

 

Multi-Channel Funnels, aimed at e-commerce sites, showing conversion rates for the various sources of traffic.

Google Multi-Channel Funnels

 

Mobile Reporting, a strengthening of mobile statistics. GA already supported viewing the UA (User Agent) of visiting traffic, but the feature was fairly basic; this update builds on it, letting you see visitors' phone models directly, complete with pictures of the phones.

Google Mobile Reporting

 

Flow Visualization, also an enhancement of an existing feature, letting you view visitor traffic in a more intuitive flow graph.

Google Flow Visualization

 

The images above will probably get b-l-o-c-k-e-d, uncle fu****