Hadoop 2.3 + Hive 0.12 Cluster Deployment
Published: 2019-06-07


0 Machine Overview

 

IP               Role
192.168.1.106    NameNode, DataNode, NodeManager, ResourceManager
192.168.1.107    SecondaryNameNode, NodeManager, DataNode
192.168.1.108    NodeManager, DataNode
192.168.1.106    HiveServer

1 Set Up Passwordless SSH

    To configure HDFS, passwordless SSH between the machines has to be set up first. For convenience, we configure bidirectional passwordless SSH among all the machines.

(1) Generate an RSA key pair

ssh-keygen -t rsa

Press Enter at every prompt until the key's random-art image is printed. This produces the RSA private key id_rsa and public key id_rsa.pub in the /home/user/.ssh directory.

(2) Append the SSH public keys of all nodes to the /home/user/.ssh/authorized_keys file; do this on all three machines, for example as sketched below.
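A minimal sketch of collecting the keys, assuming the login user is sdc (the user name is an assumption; the IPs come from the machine list above):

# Run on each node: append this node's public key to authorized_keys on every host.
for host in 192.168.1.106 192.168.1.107 192.168.1.108; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub sdc@$host
done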

(3) As the root user, edit /etc/ssh/sshd_config and set:

RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile      .ssh/authorized_keys

(4) Restart the SSH service: service sshd restart

(5) Use ssh to log in to the other nodes remotely, for example as below; if it logs in without asking for a password, the SSH configuration succeeded.
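A quick check (the user and host are assumptions matching the sketch above):

ssh sdc@192.168.1.107 hostname    # should print the remote hostname without prompting for a password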

2 Install Hadoop 2.3

     After extracting the Hadoop 2.3 tarball locally, the main work is editing the configuration files under etc/hadoop. The key files are listed below.

(1)core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/sdc/tmp/hadoop-${user.name}</value>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.1.106:9000</value>
    </property>
</configuration>

(2)hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.1.107:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/sdc/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/sdc/dfs/data</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

(3)hadoop-env.sh 

The main change is setting JAVA_HOME:

export JAVA_HOME=/usr/local/jdk1.6.0_27

(4)mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>centos1:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>centos1:19888</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>100</value>
    </property>
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>50</value>
    </property>
    <property>
        <name>mapred.system.dir</name>
        <value>file:/home/sdc/Data/mr/system</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>file:/home/sdc/Data/mr/local</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024M</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx1536M</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>512</value>
    </property>
</configuration>

  Note the memory settings above: the container sizes (mapreduce.map.memory.mb / mapreduce.reduce.memory.mb) should generally stay below the maximum that can be requested from YARN, and each task's JVM heap (-Xmx in the java.opts values) should fit inside its container; otherwise the MapReduce tasks may fail to run.

(5)yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>centos1:8080</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>centos1:8081</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>centos1:8082</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>${hadoop.tmp.dir}/nodemanager/remote</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>${hadoop.tmp.dir}/nodemanager/logs</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>centos1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>centos1:8088</value>
    </property>
</configuration>

 

    In addition, after setting the HADOOP_HOME environment variable, distribute the Hadoop directory to all nodes. The sbin directory contains the start-all.sh script; run it to start the cluster, for example as sketched below.
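A minimal sketch of the distribution and startup steps (installation paths, user name, and host addresses are assumptions; format the NameNode only once):

# Copy the configured Hadoop directory to the other nodes.
scp -r /home/sdc/hadoop-2.3.0 sdc@192.168.1.107:/home/sdc/
scp -r /home/sdc/hadoop-2.3.0 sdc@192.168.1.108:/home/sdc/

# On 192.168.1.106: format HDFS once, then start all daemons.
bin/hdfs namenode -format
sbin/start-all.sh

# Check the running Java daemons on each node.
jps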

    After startup, the following two web UIs are available:

http://192.168.1.106:8088/cluster

 

http://192.168.1.106:50070/dfshealth.html

 

Check HDFS with a few of the simplest commands, for example:
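A few basic commands that exercise HDFS (the directory and file used here are illustrative assumptions):

bin/hdfs dfs -mkdir -p /tmp/test
bin/hdfs dfs -put etc/hadoop/core-site.xml /tmp/test
bin/hdfs dfs -ls /tmp/test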

3 Install Hive 0.12

    After extracting the Hive tarball, first set the HIVE_HOME environment variable, then modify a few configuration files:

(1)hive-env.sh

Set the HADOOP_HOME variable in it to the value used on this system, for example as sketched below.
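A minimal sketch covering both the HIVE_HOME variable mentioned above and the hive-env.sh change (all paths are assumptions):

# ~/.bashrc (or /etc/profile)
export HIVE_HOME=/home/sdc/hive-0.12.0
export PATH=$PATH:$HIVE_HOME/bin

# conf/hive-env.sh: point Hive at the existing Hadoop installation
export HADOOP_HOME=/home/sdc/hadoop-2.3.0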

(2)hive-site.xml

  • Modify the hive.server2.thrift.sasl.qop property, changing it to:

 

  • Set hive.metastore.schema.verification to false

        This property enforces metastore schema consistency. When enabled, it verifies that the schema version stored in the metastore matches the version in the Hive jars and disables automatic schema migration, so the user must upgrade Hive and migrate the schema manually. When disabled, it only prints a warning when the versions differ.

  • Change where Hive stores its metadata, switching the metastore to MySQL:
<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?characterEncoding=UTF-8</value>
    <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
</property>
<property>
    <name>javax.jdo.PersistenceManagerFactoryClass</name>
    <value>org.datanucleus.api.jdo.JDOPersistenceManagerFactory</value>
    <description>class implementing the jdo persistence</description>
</property>
<property>
    <name>javax.jdo.option.DetachAllOnCommit</name>
    <value>true</value>
    <description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>
<property>
    <name>javax.jdo.option.NonTransactionalRead</name>
    <value>true</value>
    <description>reads outside of transactions</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
</property>
<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
    <description>password to use against metastore database</description>
</property>
    

    Start the hive shell from the bin directory and run a few Hive statements, for example:
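A minimal smoke test (the table name and columns are illustrative assumptions):

bin/hive
hive> create table test_tbl (id int, name string);
hive> show tables;
hive> select * from test_tbl;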

4 Install MySQL 5.6

 See the separate post on installing MySQL 5.6.
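Once MySQL is installed, the metastore database and user referenced in hive-site.xml above still have to exist; a minimal sketch (the database name, user, and password mirror the configuration above, everything else is an assumption):

mysql -u root -p
mysql> create database hive;
mysql> grant all on hive.* to hive@'localhost' identified by '123';
mysql> grant all on hive.* to hive@'%' identified by '123';
mysql> flush privileges;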

5 Running the Pi Example and a Hive Table Computation

     In the bin subdirectory of the Hadoop installation, run the bundled pi example with:

./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 10 10

The run log is:

Number of Maps  = 10
Samples per Map = 10
14/03/20 23:50:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
14/03/20 23:50:06 INFO client.RMProxy: Connecting to ResourceManager at centos1/192.168.1.106:8080
14/03/20 23:50:07 INFO input.FileInputFormat: Total input paths to process : 10
14/03/20 23:50:07 INFO mapreduce.JobSubmitter: number of splits:10
14/03/20 23:50:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1395323769116_0001
14/03/20 23:50:08 INFO impl.YarnClientImpl: Submitted application application_1395323769116_0001
14/03/20 23:50:08 INFO mapreduce.Job: The url to track the job: http://centos1:8088/proxy/application_1395323769116_0001/
14/03/20 23:50:08 INFO mapreduce.Job: Running job: job_1395323769116_0001
14/03/20 23:50:18 INFO mapreduce.Job: Job job_1395323769116_0001 running in uber mode : false
14/03/20 23:50:18 INFO mapreduce.Job:  map 0% reduce 0%
14/03/20 23:52:21 INFO mapreduce.Job:  map 10% reduce 0%
14/03/20 23:52:27 INFO mapreduce.Job:  map 20% reduce 0%
14/03/20 23:52:32 INFO mapreduce.Job:  map 30% reduce 0%
14/03/20 23:52:34 INFO mapreduce.Job:  map 40% reduce 0%
14/03/20 23:52:37 INFO mapreduce.Job:  map 50% reduce 0%
14/03/20 23:52:41 INFO mapreduce.Job:  map 60% reduce 0%
14/03/20 23:52:43 INFO mapreduce.Job:  map 70% reduce 0%
14/03/20 23:52:46 INFO mapreduce.Job:  map 80% reduce 0%
14/03/20 23:52:48 INFO mapreduce.Job:  map 90% reduce 0%
14/03/20 23:52:51 INFO mapreduce.Job:  map 100% reduce 0%
14/03/20 23:52:59 INFO mapreduce.Job:  map 100% reduce 100%
14/03/20 23:53:02 INFO mapreduce.Job: Job job_1395323769116_0001 completed successfully
14/03/20 23:53:02 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=226
        FILE: Number of bytes written=948145
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=2670
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=43
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=10
        Launched reduce tasks=1
        Data-local map tasks=10
        Total time spent by all maps in occupied slots (ms)=573584
        Total time spent by all reduces in occupied slots (ms)=20436
        Total time spent by all map tasks (ms)=286792
        Total time spent by all reduce tasks (ms)=10218
        Total vcore-seconds taken by all map tasks=286792
        Total vcore-seconds taken by all reduce tasks=10218
        Total megabyte-seconds taken by all map tasks=440512512
        Total megabyte-seconds taken by all reduce tasks=20926464
    Map-Reduce Framework
        Map input records=10
        Map output records=20
        Map output bytes=180
        Map output materialized bytes=280
        Input split bytes=1490
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=280
        Reduce input records=20
        Reduce output records=0
        Spilled Records=40
        Shuffled Maps =10
        Failed Shuffles=0
        Merged Map outputs=10
        GC time elapsed (ms)=710
        CPU time spent (ms)=71800
        Physical memory (bytes) snapshot=6531928064
        Virtual memory (bytes) snapshot=19145916416
        Total committed heap usage (bytes)=5696757760
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1180
    File Output Format Counters
        Bytes Written=97
Job Finished in 175.556 seconds
Estimated value of Pi is 3.20000000000000000000

    If it does not run at all, something is wrong with the HDFS configuration!

    Executing statements such as count in Hive triggers MapReduce jobs, for example:
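For instance, a count over the table created in the Hive smoke test above (the table name is an illustrative assumption):

hive> select count(*) from test_tbl;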

 

    If an error similar to the following appears at run time:

Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

it means something is wrong with the metadata store, possibly for one of two reasons:

(1) The HDFS directories Hive needs have a problem (missing or lacking permissions); create them and make them group-writable:

$HADOOP_HOME/bin/hadoop fs -mkdir       /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir       /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w   /user/hive/warehouse

(2) The MySQL grants are wrong:

Run the following commands in MySQL; they grant privileges on the Hive database:

grant all on db.* to hive@'%' identified by 'password';          (lets the user connect to MySQL remotely)
grant all on db.* to hive@'localhost' identified by 'password';  (lets the user connect to MySQL locally)
flush privileges;

    To determine which cause applies, check the Hive log, for example as below.
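By default Hive writes its log under /tmp for the current user; a quick way to inspect it (the exact location depends on hive-log4j.properties and is an assumption):

tail -n 100 /tmp/$USER/hive.log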

 


Reposted from: https://www.cnblogs.com/Scott007/p/3614960.html
