
A Hands-On Test of Hadoop's Disaster Recovery Capability

2015-08-10 12:06 Source: ChinaStor.com (中国存储网)
Summary: In short, this Hadoop disaster-recovery test puts a file into HDFS, shuts down Hadoop datanodes, and checks whether the file can still be retrieved intact.

In short, the test went like this (a command-level sketch of the whole run follows the list):

1. Put a 600 MB file; it is split into 9 blocks with 3 replicas each (27 block replicas in total) and spread across 4 datanodes.

2. I shut down two of the datanodes, leaving most blocks alive on only a single datanode; but because the 9 blocks were spread out so well, the file could still be retrieved correctly (checksums are used to verify the retrieved contents).

3. The hadoop namenode very quickly re-replicated the blocks that were down to a single replica; fsck soon reported each of them as "Target Replicas is 3 but found 2 replica(s)".

4. I shut down one more datanode. It turned out the blocks had been assigned to the datanodes very evenly, so even with only one datanode left, the earlier re-replication to 2 replicas meant a copy of every block still survived, and the file stayed healthy.

5. I deleted one blk from that sole surviving datanode, and the namenode reported the file as corrupt. (I had honestly been hoping the cluster would drop into safemode, but -safemode get kept returning OFF.)

6. Then I started another datanode. In under 30 seconds, the missing block came back from this newly started node and was quickly "expanded" to 2 replicas again.
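For anyone who wants to replay the run end to end, here is a rough command-level sketch. The daemon scripts and the destination path are assumptions based on a stock Hadoop 0.20 tarball install and the fsck output later in this article; adjust to your own layout.

    # 1. put a ~600 MB file into HDFS (destination path taken from the fsck output below)
    hadoop fs -put ~/Documents/IMMAUSWX201304 /user/hadoop_admin/in/bigfile

    # 2. inspect block layout and replica placement
    hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations

    # 3. on each datanode being "killed" (h2, h3): stop only the datanode daemon
    bin/hadoop-daemon.sh stop datanode

    # 4. from a surviving node, confirm the file still comes back intact
    hadoop fs -get /user/hadoop_admin/in/bigfile/USWX201304 /tmp/USWX201304.copy
    md5sum ~/Documents/IMMAUSWX201304 /tmp/USWX201304.copy   # sums should match

    # 5. watch the namenode re-replicate whatever is under-replicated
    hadoop fsck /user/hadoop_admin/in/bigfile | grep -c "Under replicated"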

The fault tolerance is extremely dependable. With at least three racks, the data would be rock solid. My trust in HADOOP just leveled up!

First, a quick review of some HDFS fundamentals.

HDFS design assumptions and goals

Hardware failure is the norm, so redundancy is required
Streaming data access: data is read in bulk rather than randomly read and written; Hadoop is good at data analysis, not transaction processing
Large-scale data sets
Simple consistency model: to keep the system simple, files follow write-once, read-many logic; once a file has been written and closed, it can never be modified (see the sketch after this list)
Programs are scheduled on the "move computation close to the data" locality principle
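A tiny illustration of the write-once model, reusing this article's test file (the exact error text varies by version):

    # a second put onto an existing HDFS path is rejected rather than overwriting it
    hadoop fs -put ~/Documents/IMMAUSWX201304 /user/hadoop_admin/in/bigfile
    # put: Target ... already exists
    # the only way to "change" a file is to delete it and write a new one
    hadoop fs -rm /user/hadoop_admin/in/bigfile/USWX201304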
HDFS architecture

                NameNode
                DataNode
Edit log (transaction log)
Image file (fsimage)
                SecondaryNameNode
                Namenode

Manages the filesystem namespace
Records, for each file, the location and replica information of its data blocks on each Datanode
Coordinates client access to files
Records changes to the namespace and to namespace properties
The Namenode uses the edit log (transaction log) to record changes to HDFS metadata, and the image file (fsimage) to store the filesystem namespace, including the file-to-block mapping and file attributes
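Both structures sit on the namenode's local disk. A quick peek, assuming dfs.name.dir points at /hadoop_run/name (a guess patterned on this cluster's /hadoop_run/data directory that appears later):

    # on h1, the namenode
    ls /hadoop_run/name/current
    # edits  fsimage  fstime  VERSION    <- edit log and image file side by side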
                Datanode

Handles storage management on its own physical node
Write once, read many (no updates)
Files are made up of data blocks; the typical block size is 64 MB (an example follows this list)
Data blocks are spread across the nodes as evenly as possible
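That 64 MB is just the default of dfs.block.size. As a sketch (FsShell in 0.20 runs through ToolRunner, so generic -D options should work, but verify on your version), a single put can use bigger blocks:

    # write this one file with 128 MB blocks instead of the default 64 MB
    hadoop fs -D dfs.block.size=134217728 -put ~/Documents/IMMAUSWX201304 /user/hadoop_admin/in/bigfile
    # the cluster-wide default lives in conf/hdfs-site.xml (dfs.block.size, in bytes)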
The data read flow

A client wants to read a file stored in HDFS
It first obtains from the namenode the list of block locations that make up the file
From that list it learns which datanodes hold each block
It then contacts those datanodes directly to fetch the data
The Namenode never takes part in the actual data transfer
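This read path is what makes the "get the file back" checks later in this article meaningful: the client reassembles the file purely from datanode block reads, with checksums guarding each block. A simple end-to-end check, assuming the original local copy is still around (the article's local and HDFS file names differ slightly; adjust as needed):

    hadoop fs -get /user/hadoop_admin/in/bigfile/USWX201304 /tmp/USWX201304.copy
    md5sum ~/Documents/IMMAUSWX201304 /tmp/USWX201304.copy   # the two sums should match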
HDFS reliability

Redundant replica strategy
Rack awareness strategy
Heartbeat mechanism
Safe mode
Block checksums (Checksum) to verify file integrity
Trash (recycle bin)
Metadata protection
Snapshot mechanism
I separately tried the redundant replica strategy, the heartbeat mechanism, safe mode, and the trash. The experiment below is about the redundant replica strategy (safe-mode and trash commands are sketched right after this).
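For reference, safe mode and the trash are driven by commands like these (standard in 0.20; the trash only takes effect once fs.trash.interval is set to a nonzero number of minutes in core-site.xml):

    hadoop dfsadmin -safemode get     # prints Safe mode is ON / OFF
    hadoop dfsadmin -safemode enter   # force the namenode into read-only safe mode
    hadoop dfsadmin -safemode leave
    # with the trash enabled, -rm first moves files under /user/<user>/.Trash
    hadoop fs -rm /user/hadoop_admin/in/bigfile/USWX201304
    hadoop fs -expunge                # empty the trash immediately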

Environment:

Namenode/Master/jobtracker: h1 / 192.168.221.130
SecondaryNameNode: h1s / 192.168.221.131
Datanodes: four in total, h1s plus h2~h4 (IPs 131 and 142~144; the fsck output below confirms h1s also runs a datanode)
To keep the file from being so small that it occupies only one file block (block/blk), we prepare a somewhat larger file (600 MB) so that its blocks spread across several datanodes, then stop some of them and see whether anything breaks.
First, put the file (for convenience, consider appending hadoop/bin to your $PATH variable):
hadoop fs -put ~/Documents/IMMAUSWX201304
When it finishes, check how the blocks were laid out: either in the web UI, or with the fsck command on the namenode. For fsck:
bin/hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations > ~/hadoopfiles/log1.txt
The printout below shows that the 600 MB file was split into 9 blocks of 64 MB and spread across all 4 of my current datanodes, fairly evenly:

                /user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  OK 
                0. blk_-4541681964616523124_1011 len=67108864 repl=3 [192.168.221.131:50010, 192.168.221.142:50010, 192.168.221.144:50010] 
                1. blk_4347039731705448097_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.131:50010, 192.168.221.144:50010] 
                2. blk_-4962604929782655181_1011 len=67108864 repl=3 [192.168.221.142:50010, 192.168.221.143:50010, 192.168.221.144:50010] 
                3. blk_2055128947154747381_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010] 
                4. blk_-2280734543774885595_1011 len=67108864 repl=3 [192.168.221.131:50010, 192.168.221.142:50010, 192.168.221.144:50010] 
                5. blk_6802612391555920071_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010] 
                6. blk_1890624110923458654_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.142:50010, 192.168.221.144:50010] 
                7. blk_226084029380457017_1011 len=67108864 repl=3 [192.168.221.143:50010, 192.168.221.131:50010, 192.168.221.144:50010] 
                8. blk_-1230960090596945446_1011 len=60768970 repl=3 [192.168.221.142:50010, 192.168.221.143:50010, 192.168.221.144:50010]

                Status: HEALTHY 
                Total size:    597639882 B 
                Total dirs:    0 
                Total files:   1 
                Total blocks (validated):      9 (avg. block size 66404431 B) 
                Minimally replicated blocks:   9 (100.0 %) 
                Over-replicated blocks:        0 (0.0 %) 
                Under-replicated blocks:       0 (0.0 %) 
                Mis-replicated blocks:         0 (0.0 %) 
                Default replication factor:    3 
                Average block replication:     3.0 
                Corrupt blocks:                0 
                Missing replicas:              0 (0.0 %) 
                Number of data-nodes:          4 
                Number of racks:               1

All four DNs (h1s, h2, h3, h4) were in service. I went to h2 (142) and h3 (143) and stopped their datanodes, then ran a get from h4. To my surprise the file came back, and at first glance the size was correct. Even with both of those nodes DEAD, every blk still had a live source to fetch from, so the data after the GET was still complete. From this alone hadoop is clearly powerful: the load balancing is done well, the data holds up, and the fault tolerance is solid.
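The article never shows how the datanodes were stopped; with a stock 0.20 tarball install, one way is the per-daemon script on the node being killed:

    # on h2 and on h3: stop just the datanode daemon, leaving the machine up
    bin/hadoop-daemon.sh stop datanode
    # bring it back later with
    bin/hadoop-daemon.sh start datanode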

Checking again: I had originally wanted to test safemode, but when I refreshed a moment later, the blocks that had only 1 live replica moments before had all been copied again, so each was back to a guaranteed 2!

                hadoop_admin@h1:~/hadoop-0.20.2$ hadoop fsck /user/hadoop_admin/in/bigfile  -files -blocks -locations 
                /user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  
                Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 2 replica(s). 
                0. blk_-4541681964616523124_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010] 
                1. blk_4347039731705448097_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
                2. blk_-4962604929782655181_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010] 
                3. blk_2055128947154747381_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
                4. blk_-2280734543774885595_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010] 
                5. blk_6802612391555920071_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
                6. blk_1890624110923458654_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
                7. blk_226084029380457017_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010] 
                8. blk_-1230960090596945446_1011 len=60768970 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
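To watch that re-replication happen live instead of catching it after the fact, a crude polling loop over fsck is enough (path as in this article):

    # print the count of under-replicated blocks every 10 seconds
    while true; do
      hadoop fsck /user/hadoop_admin/in/bigfile | grep -c "Under replicated"
      sleep 10
    done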

I decided to stop yet another datanode. Then I waited quite a while and the namenode still had not noticed it was dead. This is the heartbeat mechanism: every 3 seconds each datanode sends a heartbeat to the namenode to show it is still alive, and only when the namenode has heard nothing for a long time (5~10 minutes, depending on configuration) does it consider the NODE dead and start copying its BLOCKs elsewhere, to keep enough replicas for fault tolerance.
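The knobs behind this, as a hedged sketch (0.20-era property names; the dead-node timeout is not set directly but derived as roughly 2 x heartbeat.recheck.interval + 10 x dfs.heartbeat.interval, about 10.5 minutes with the defaults shown):

    <!-- inside <configuration> in conf/hdfs-site.xml -->
    <property><name>dfs.heartbeat.interval</name><value>3</value></property>          <!-- seconds -->
    <property><name>heartbeat.recheck.interval</name><value>300000</value></property> <!-- milliseconds -->

Once the timeout finally fired, printing again showed that with only one live datanode left, every blk was down to a single replica: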

                hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations 
                /user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 1 replica(s). 

Now I move one BLK off this sole surviving Datanode to corrupt the file; I want to test whether restarting a DATANODE afterwards will restore it.
                hadoop_admin@h4:/hadoop_run/data/current$ mv blk_4347039731705448097_1011* ~/Documents/ 
Then, to avoid waiting the ~8 minutes for the DN to send its next block report, I manually changed h4's dfs.blockreport.intervalMsec to 30000, stopped the datanode, and started it again (also: you really should add hadoop/bin to your PATH so you can run hadoop commands without the full path). The change and restart are sketched below.
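A hedged sketch of that step on h4 (stock 0.20 layout assumed):

    # add to conf/hdfs-site.xml on h4, inside <configuration>:
    #   <property><name>dfs.blockreport.intervalMsec</name><value>30000</value></property>
    bin/hadoop-daemon.sh stop datanode
    bin/hadoop-daemon.sh start datanode

Sure enough, once the block report came in, fsck detected the damage: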
                hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations 

                /user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 1 replica(s).

                /user/hadoop_admin/in/bigfile/USWX201304: CORRUPT block blk_4347039731705448097
                Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 1 replica(s). 
                Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 1 replica(s). 
                MISSING 1 blocks of total size 67108864 B
                0. blk_-4541681964616523124_1011 len=67108864 repl=1 [192.168.221.144:50010] 
                1. blk_4347039731705448097_1011 len=67108864 MISSING!
                2. blk_-4962604929782655181_1011 len=67108864 repl=1 [192.168.221.144:50010] 
                3. blk_2055128947154747381_1011 len=67108864 repl=1 [192.168.221.144:50010] 
                4. blk_-2280734543774885595_1011 len=67108864 repl=1 [192.168.221.144:50010] 
                5. blk_6802612391555920071_1011 len=67108864 repl=1 [192.168.221.144:50010] 
                6. blk_1890624110923458654_1011 len=67108864 repl=1 [192.168.221.144:50010] 
                7. blk_226084029380457017_1011 len=67108864 repl=1 [192.168.221.144:50010] 
                8. blk_-1230960090596945446_1011 len=60768970 repl=1 [192.168.221.144:50010]

                Status: CORRUPT
                Total size:    597639882 B 
                Total dirs:    0 
                Total files:   1 
                Total blocks (validated):      9 (avg. block size 66404431 B) 
   ********************************
                   CORRUPT FILES:        1 
                   MISSING BLOCKS:       1 
                   MISSING SIZE:         67108864 B 
                   CORRUPT BLOCKS:       1 
   ********************************
                Minimally replicated blocks:   8 (88.888885 %) 
                Over-replicated blocks:        0 (0.0 %) 
                Under-replicated blocks:       8 (88.888885 %) 
                Mis-replicated blocks:         0 (0.0 %) 
                Default replication factor:    3 
                Average block replication:     0.8888889 
                Corrupt blocks:                1 
                Missing replicas:              16 (200.0 %) 
                Number of data-nodes:          1 
                Number of racks:               1


                The filesystem under path '/user/hadoop_admin/in/bigfile' is CORRUPT
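At this point an operator would have to decide what to do with the corrupt file; fsck itself offers two blunt options (both present in 0.20), though the author instead brings a datanode back below:

    hadoop fsck /user/hadoop_admin/in/bigfile -move     # move corrupt files to /lost+found
    hadoop fsck /user/hadoop_admin/in/bigfile -delete   # or delete them outright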

Now I start one more DATANODE, h1s (131). Very quickly, in under 30 seconds, hadoop revived the file on the spot at full HP: every blk has two replicas again.
                hadoop_admin@h1:~$ hadoop fsck /user/hadoop_admin/in/bigfile -files -blocks -locations 
                /user/hadoop_admin/in/bigfile/USWX201304 597639882 bytes, 9 block(s):  Under replicated blk_-4541681964616523124_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_4347039731705448097_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_-4962604929782655181_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_2055128947154747381_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_-2280734543774885595_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_6802612391555920071_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_1890624110923458654_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_226084029380457017_1011. Target Replicas is 3 but found 2 replica(s). 
                Under replicated blk_-1230960090596945446_1011. Target Replicas is 3 but found 2 replica(s). 
                0. blk_-4541681964616523124_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010] 
                1. blk_4347039731705448097_1011 len=67108864 repl=2 [192.168.221.131:50010, 192.168.221.144:50010]
                2. blk_-4962604929782655181_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010] 
                3. blk_2055128947154747381_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
                4. blk_-2280734543774885595_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010] 
                5. blk_6802612391555920071_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
                6. blk_1890624110923458654_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]
                7. blk_226084029380457017_1011 len=67108864 repl=2 [192.168.221.144:50010, 192.168.221.131:50010] 
                8. blk_-1230960090596945446_1011 len=60768970 repl=2 [192.168.221.144:50010, 192.168.221.131:50010]

The locations show that the missing block was successfully copied from 131 back to 144 (h4).

Conclusion: HADOOP's fault tolerance is rock solid. I am now a firm believer!

One more thing I did not paste above: the h4 datanode had quite a few bad-link blocks left over from an earlier re-format, and when I re-put the same file, hadoop deleted all of those stale leftover block files. So it does have the ability to clean out invalid bad blocks.
