How do you append to a Hive array?
I have a Hive table with a user ID and a ts column holding a time series, stored as an array. I want to maintain the time series as a window of the most recent values.
(a) How do I append a new number to the end of each array, taken from a table joined on ID? (b) How do I drop the leading number?
Data in Hive is typically stored in HDFS, and HDFS has limited append capabilities. If constant modification of data is at the core of your analytics system, perhaps you should consider using alternatives such as HBase or Cassandra.
However, if data updates are only a small part of your workflow, I would encourage you to continue using Hive (in order to make use of its SQL functionality) but reconsider your design for storing these updates.
A quick solution to the above problem is to have more than one record per user ID in the table. Each record holds a time series entry for the corresponding user ID. When you want to do a last-N analysis on the time series, select from the table using DISTRIBUTE BY on the user ID column. A custom reducer can then pick out the last N (or fewer, if the time series has fewer than N entries) timestamps and return them.
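To make the reducer idea concrete, here is a minimal Python sketch of the "keep the last N per user" logic. It is not taken from the original answer. It assumes each input line has the form "user_id<TAB>value", and that lines for the same user arrive together and already in time order (for example after DISTRIBUTE BY on the user ID plus a sort on the timestamp). The function name last_n_per_user and the sample data are made up for illustration.

```python
from collections import deque
from itertools import groupby


def last_n_per_user(lines, n):
    """Yield (user_id, values) keeping only the last n values per user.

    Assumption: each line is "user_id<TAB>value", and lines for one user
    are contiguous and already ordered by time.
    """
    # Split each non-empty line into [user_id, value].
    rows = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for user_id, group in groupby(rows, key=lambda row: row[0]):
        # A bounded deque silently drops the oldest entries once it is full,
        # so users with fewer than n entries simply keep all of them.
        window = deque((value for _, value in group), maxlen=n)
        yield user_id, list(window)


# Hypothetical sample input: user u1 has four entries, user u2 has one.
sample = ["u1\t1\n", "u1\t2\n", "u1\t3\n", "u1\t4\n", "u2\t9\n"]
for user_id, values in last_n_per_user(sample, 3):
    print(user_id + "\t" + ",".join(values))
```

In a real job the same function could be fed sys.stdin instead of the sample list, since Hive streaming scripts read tab-separated rows from standard input and write rows to standard output.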
Harish Butani did some work on windowing functions in Hive. You can take a look at his work and the associated documentation to gain more insight. Good luck, Alexy!