Monitoring Of Hives – What Exactly It Is
If you are planning to monitor multiple Hive instances and the nodes that do
the monitoring and auditing work in an SQL environment, the next few lines
should be useful. They may also help those who plan to run Hive queries
through an SQL driver such as ODBC. First, though, it helps to have some basic
information about how this monitoring works.
How To Go About It
Monitoring multiple Hive instances and nodes is entirely possible, but a few
things must be taken into account. To begin with, the quality, quantity and
type of input data play a big role. A Hive query can result in different types
of jobs depending on the query you run. The size of the input data also
matters. If the input is small enough to be handled by a single map task,
then irrespective of how many nodes you have in the cluster, you will see only
one node being used. At the other end of the spectrum, if the input data is
split, multiple mappers are launched, and in that case you could see more than
one node being used.
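The relationship between input size and mapper count can be sketched roughly in Python. This is a simplified model, not Hive's actual planner: it assumes each non-empty file is cut into fixed-size splits, that splits never span files, and that each split gets one map task. The function name and the 128 MB split size are assumptions made for illustration; real clusters configure their own split size, and Hive can be set up to combine small files, which would change the count.

```python
import math

# Hypothetical split size for illustration only; real clusters set their own.
ASSUMED_SPLIT_BYTES = 128 * 1024 * 1024


def estimate_map_tasks(file_sizes, split_bytes=ASSUMED_SPLIT_BYTES):
    """Estimate map tasks as one per split.

    Simplified model: every non-empty file is cut into
    ceil(size / split_bytes) splits, and splits never span files.
    """
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")
    return sum(math.ceil(size / split_bytes) for size in file_sizes if size > 0)


# A single 50 MB file fits in one split, so one map task is expected.
small = estimate_map_tasks([50 * 1024 * 1024])
# A single 1 GB file is cut into several splits, so several map tasks.
large = estimate_map_tasks([1024 * 1024 * 1024])
print(small, large)
```

Under these assumptions the small input yields a single map task however large the cluster is, while the larger input yields several that can be spread across nodes.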
What Does This Mean
This aspect of Hive monitoring has a practical consequence. In many cases you
might not see any difference in completion time whether one node or a
different number of nodes is available. This is because only one map task may
be created, and therefore only one node is used in both cases.
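A minimal sketch of why extra nodes may not help, assuming each map task runs on exactly one node and ignoring reducers and scheduling details. The function name is hypothetical and is not part of Hive or Hadoop.

```python
def nodes_in_use(map_tasks: int, cluster_nodes: int) -> int:
    """Upper bound on nodes doing map work, assuming one node per map task."""
    if map_tasks < 0 or cluster_nodes < 1:
        raise ValueError("map_tasks must be >= 0 and cluster_nodes >= 1")
    return min(map_tasks, cluster_nodes)


# With a single map task, adding nodes does not put more of them to work.
for cluster_nodes in (1, 4, 16):
    print(cluster_nodes, nodes_in_use(1, cluster_nodes))
```

In this model a query that produces one map task keeps one node busy on a 1-node, 4-node or 16-node cluster alike, which is consistent with seeing no change in completion time.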
If you wish to know how many mappers a Hive query will use, there is no simple
answer. It depends on the query and on how the input data is split.