Smart Article System in Practice - Hive Data Warehouse (15)


1. Hive installation

http://www.wangfeilong.cn/server/118.html

2. Review pvlog.txt, the data cleaned with Hadoop in the previous part. Each line holds three tab-separated fields: the client IP, a Unix timestamp, and the requested URL.
[root@localhost hive]# cat /tmp/pvlog.txt 
192.168.100.1	1530460833	http://news.demo.com/h5.php?action=show&id=89
192.168.100.1	1530460803	http://news.demo.com/h5.php?action=show&id=128

 

3. Start Hive and create the database

[root@localhost hive]# hive
which: no hbase in (/usr/local/soft/hive/bin:/usr/local/soft/Hadoop/hadoop/bin:/usr/local/soft/Hadoop/hadoop/sbin:/usr/local/soft/jdk1.8.0_17/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/soft/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/soft/Hadoop/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/usr/local/soft/hive/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive> CREATE DATABASE IF NOT EXISTS article;
OK
Time taken: 7.758 seconds

hive> show databases;
OK
article
default
demo
wordcount
Time taken: 0.229 seconds, Fetched: 4 row(s)


hive> use article;
OK
Time taken: 0.068 seconds

 

4. The detail log table in Hive

4.1 Create the detail table. The table is partitioned by stat_date, so each day's cleaned log file can be loaded into, and queried from, its own partition.

hive> create table pvlog(
    > ip string,
    > times string, 
    > url string)
    > PARTITIONED BY (stat_date string) 
    > row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.582 seconds

 

4.2 Load the text file into the Hive table

hive> load data local inpath '/tmp/pvlog.txt' overwrite into table pvlog partition(stat_date='2018-07-01');
Loading data to table article.pvlog partition (stat_date=2018-07-01)
OK
Time taken: 2.383 seconds
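
To double-check that the load created the expected partition (a quick verification, not part of the original walkthrough):

-- should list stat_date=2018-07-01
show partitions pvlog;

With the default warehouse location, the loaded file sits under /user/hive/warehouse/article.db/pvlog/stat_date=2018-07-01/ on HDFS; the exact path depends on hive.metastore.warehouse.dir.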

 

4.3 Query the detail table

hive> select * from pvlog where stat_date = '2018-07-01';
OK
192.168.100.1	1530460833	http://news.demo.com/h5.php?action=show&id=89	2018-07-01
192.168.100.1	1530460803	http://news.demo.com/h5.php?action=show&id=128	2018-07-01
Time taken: 4.96 seconds, Fetched: 2 row(s)
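
Since every URL carries the article id as a query parameter (id=89, id=128), the detail table can also answer per-article questions. A small illustration, not part of the original walkthrough, using Hive's built-in parse_url function to count page views per article:

-- page views per article id on 2018-07-01
select parse_url(url, 'QUERY', 'id') as article_id,
       count(*) as pv
from pvlog
where stat_date = '2018-07-01'
group by parse_url(url, 'QUERY', 'id');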

 

5. Statistics

5.1 Create the statistics table. pv will hold the day's total page views and ip the number of distinct visitor IPs.

hive> create table stat(
    > stat_date string,
    > pv int, 
    > ip int
    > )
    > row format delimited fields terminated by '\t' stored as textfile;
OK
Time taken: 0.26 seconds

 

5.2 Compute the statistics

hive> insert into stat   
    > select stat_date,count(*) as pv,count(distinct(ip)) as ip from pvlog where stat_date = '2018-07-01' group by stat_date;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20180710175116_136d9e36-a8fc-4d0d-9f91-93dd71aba321
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1531202649478_0010, Tracking URL = http://localhost:8088/proxy/application_1531202649478_0010/
Kill Command = /usr/local/soft/Hadoop/hadoop/bin/hadoop job  -kill job_1531202649478_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-07-10 17:51:36,560 Stage-1 map = 0%,  reduce = 0%
2018-07-10 17:51:52,289 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.23 sec
2018-07-10 17:52:07,262 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.86 sec
MapReduce Total cumulative CPU time: 5 seconds 860 msec
Ended Job = job_1531202649478_0010
Loading data to table article.stat
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.86 sec   HDFS Read: 9792 HDFS Write: 83 SUCCESS
Total MapReduce CPU Time Spent: 5 seconds 860 msec
OK
Time taken: 53.74 seconds
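
Note that stat is not partitioned: re-running the INSERT INTO above for the same day would append a duplicate row, while INSERT OVERWRITE TABLE stat would replace the rows of every other day as well. If daily recomputation is needed, one option (a sketch, not something this post does; stat_p is a made-up name) is to partition the statistics table by stat_date too, so a single day can be overwritten in isolation:

-- stat_p is a hypothetical table name used only for this sketch
create table stat_p(
  pv int,
  ip int)
PARTITIONED BY (stat_date string)
row format delimited fields terminated by '\t' stored as textfile;

-- recompute one day without touching the others
insert overwrite table stat_p partition(stat_date='2018-07-01')
select count(*) as pv, count(distinct ip) as ip
from pvlog where stat_date = '2018-07-01';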

 

5.3 Query the statistics result

hive> select * from stat;
OK
2018-07-01	2	1
Time taken: 0.36 seconds, Fetched: 1 row(s)

 

6. View the statistics data file
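
Because stat was created as a plain text table, its rows are stored as tab-delimited text files in the table's warehouse directory on HDFS. Assuming the default warehouse location /user/hive/warehouse (adjust the path if hive.metastore.warehouse.dir is set differently), the file can be inspected directly:

# list and print the data file(s) of the stat table
hdfs dfs -ls /user/hive/warehouse/article.db/stat
hdfs dfs -cat /user/hive/warehouse/article.db/stat/*

The output should be the same row as in 5.3: 2018-07-01, 2 and 1, separated by tabs.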

 

7. Import the statistics into MySQL
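
The export step itself is presumably continued in the next part of the series. As a pointer, one common way to push a Hive text table into MySQL is Sqoop's export tool. A minimal sketch, assuming Sqoop is installed, MySQL is reachable on localhost with a database named article, and a stat table with matching columns (stat_date, pv, ip) already exists there; the credentials are placeholders:

# export the tab-delimited stat files from HDFS into the MySQL table article.stat
sqoop export \
  --connect jdbc:mysql://localhost:3306/article \
  --username <mysql-user> --password <mysql-password> \
  --table stat \
  --export-dir /user/hive/warehouse/article.db/stat \
  --input-fields-terminated-by '\t'

--input-fields-terminated-by '\t' matches the tab delimiter the stat table was created with in 5.1.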