
Cloudera HBase Certification Questions and Answers (Dumps and Practice Questions)



Question :

Your client connects to HBase for the first time to read a row user_1234 located in a table Users. What
process does your client use to find the correct RegionServer to which it should send the request?

  :
1. The client looks up the location of ROOT, in which it looks up the location of META, in which it looks up the
location of the correct Users region.
2. The client looks up the location of the master, in which it looks up the location of META, in which it looks up
the location of the correct Users region.
3. Access Mostly Uused Products by 50000+ Subscribers
4. The client queries the master to find the location of the Users table.




Correct Answer :

*The general flow is that a new client first contacts the ZooKeeper quorum (a separate cluster of ZooKeeper
nodes) to find a particular row key. It does so by retrieving from ZooKeeper the server name (i.e. host name)
that hosts the -ROOT- region. With that information it can query that server to get the server that hosts the
.META. table. Both of these details are cached and only looked up once. Lastly, it can query the .META. server
and retrieve the server that has the row the client is looking for.

*The HBase client HTable is responsible for finding RegionServers that are serving the particular row range of
interest. It does this by querying the .META. and -ROOT- catalog tables. After locating the required region(s),
the client directly contacts the RegionServer serving that region (i.e., it does not go through the master) and
issues the read or write request. This information is cached in the client so that subsequent requests need not
go through the lookup process. Should a region be reassigned either by the master load balancer or because
a RegionServer has died, the client will requery the catalog tables to determine the new location of the user
region.
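
The lookup chain described above can be sketched as a toy model. This is not the HBase client API: the classes `CatalogTable`, `ToyCluster` and `ToyClient`, the server names `rs1` to `rs4`, and the `Users,<row>` key layout are all hypothetical and chosen only to illustrate the ZooKeeper, -ROOT-, .META. order and the client-side cache.

```python
import bisect


class CatalogTable:
    """Toy catalog: sorted region start keys mapped to a location."""

    def __init__(self, entries):
        # entries: list of (start_key, location), sorted by start_key
        self.start_keys = [key for key, _ in entries]
        self.locations = [loc for _, loc in entries]

    def locate(self, key):
        # The owning region is the last one whose start key is <= key.
        idx = bisect.bisect_right(self.start_keys, key) - 1
        return self.locations[idx]


class ToyCluster:
    """Hypothetical cluster layout, for illustration only."""

    def __init__(self):
        # ZooKeeper knows which server hosts the -ROOT- region.
        self.zookeeper = {"root_server": "rs1"}
        # -ROOT- (on rs1) knows which server hosts .META.
        self.root = {"rs1": CatalogTable([("", "rs2")])}
        # .META. (on rs2) knows which server hosts each user region.
        self.meta = {
            "rs2": CatalogTable([("Users,", "rs3"), ("Users,user_5000", "rs4")])
        }


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.cache = {}           # simplification: cached per row, not per region
        self.catalog_lookups = 0  # counts full ZooKeeper/-ROOT-/.META. walks

    def locate(self, table, row):
        meta_key = f"{table},{row}"
        if meta_key in self.cache:
            return self.cache[meta_key]
        self.catalog_lookups += 1
        root_server = self.cluster.zookeeper["root_server"]
        meta_server = self.cluster.root[root_server].locate(meta_key)
        region_server = self.cluster.meta[meta_server].locate(meta_key)
        self.cache[meta_key] = region_server
        return region_server

    def invalidate(self, table, row):
        # Called when a region has moved; the next locate() requeries the catalogs.
        self.cache.pop(f"{table},{row}", None)


client = ToyClient(ToyCluster())
server = client.locate("Users", "user_1234")
```

Note that the master never appears in the lookup path, which matches the explanation above: the client goes from the catalog tables straight to the RegionServer.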





Question :

Your data load application is maintaining a custom versioning scheme (not using the timestamp as the version
number). You accidentally executed three writes to a given cell all with the same version during which time no
flushes have occurred. Which of the three data writes will HBase maintain?
  :
1. None of the writes to cell
2. The last write to cell
3. Access Mostly Uused Products by 50000+ Subscribers
4. All of the writes to cell




Correct Answer :

If multiple writes to a cell have the same version, are all versions maintained or just the last?
Currently, only the last written is fetchable. See also Kevin Peterson's answer in https://issues.apache.org/jira/browse/HBASE-2406:

If multiple writes to a cell have the same timestamp, one of those versions will be maintained, and it is undefined which version will be maintained.

Hence none of the options given in the question is strictly correct, so the best fit is B (the last write to the cell).


Similarly, HBase: The Definitive Guide (page 384) states:

When using your own timestamp values, you need to test your solution
thoroughly, as this approach has not been used widely in production.
Be aware that negative timestamp values are untested and, while they
have been discussed a few times in HBase developer circles, they have
never been confirmed to work properly.
Make sure to avoid collisions caused by using the same value for two separate
updates to the same cell. Usually the last saved value is visible afterward.

Remark 8 in the HBase book (http://hbase.apache.org/book/versions.html#ftn.d4029e5669) says: Currently, only the last written is fetchable.
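
To make the "only the last written is fetchable" behaviour concrete, here is a minimal sketch, assuming a cell is identified by (row, column, version). `ToyCellStore` is a hypothetical in-memory model, not HBase's MemStore, and it deliberately ignores flushes, where the sources quoted above say the outcome may be undefined.

```python
class ToyCellStore:
    """Toy model: one value per (row, column, version) coordinate."""

    def __init__(self):
        self._cells = {}  # (row, column, version) -> value

    def put(self, row, column, version, value):
        # A write to an existing coordinate replaces the earlier value.
        self._cells[(row, column, version)] = value

    def get_versions(self, row, column):
        # Return (version, value) pairs for the cell, newest version first.
        found = [
            (version, value)
            for (r, c, version), value in self._cells.items()
            if r == row and c == column
        ]
        return sorted(found, key=lambda pair: pair[0], reverse=True)


store = ToyCellStore()
for value in ("first", "second", "third"):
    store.put("row1", "cf:col", 5, value)  # same custom version each time
visible = store.get_versions("row1", "cf:col")
```

In this model, three writes with the same version leave exactly one fetchable value, the last one written.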







Question :

Your client application needs to write a row to a region that has recently split. Where will the row be written?


  :
1. One of the daughter regions
2. The original region
3. Access Mostly Uused Products by 50000+ Subscribers
4. The HMaster



Correct Answer :

*With a roughly uniform data distribution and growth, eventually all the regions in the table will need to be split
at the same time. Immediately following a split, compactions will run on the daughter regions to rewrite their
data into separate files. This causes a large amount of disk I/O and network traffic.

*Splits run unaided on the RegionServer; i.e. the Master does not participate. The RegionServer splits a
region, offlines the split region and then adds the daughter regions to META, opens daughters on the parent's
hosting RegionServer and then reports the split to the Master.
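
As an illustration of where a row lands after a split, the sketch below models a region as a half-open row-key range. The `Region` class and the `split` and `route` helpers are hypothetical names for this example; the real split also involves META updates and compactions, as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: str  # inclusive start row key
    end: str    # exclusive end row key; "" means unbounded

    def contains(self, row):
        return self.start <= row and (self.end == "" or row < self.end)


def split(parent, split_key):
    """Split a parent region into two daughters at split_key."""
    if split_key == parent.start or not parent.contains(split_key):
        raise ValueError("split key must fall strictly inside the parent range")
    return Region(parent.start, split_key), Region(split_key, parent.end)


def route(regions, row):
    """Return the online region whose key range covers the row."""
    for region in regions:
        if region.contains(row):
            return region
    raise KeyError(row)


parent = Region("a", "z")
daughters = split(parent, "m")   # the parent goes offline after the split
target = route(daughters, "q")   # the write goes to one of the daughters
```

Because the parent is offline once the split completes, only the daughter regions are candidates for the write.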



Related Questions


Question :

While inserting the data to HBase using Put

 :
1. True
2. False





Question :

During schema design, which of the following is a valid way to keep StoreFile indices small?



 :
1. Keep ColumnFamily names as small as possible
2. Avoid long verbose attribute names
3. Access Mostly Uused Products by 50000+ Subscribers
4. All 1,2 and 3 are correct
5. Only 1 and 3 are correct




Question :

For storing the data in HBase
Anything that can be converted to an array of bytes can be stored
  :
1. True
2. False
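
As a small illustration of "converted to an array of bytes", the sketch below round-trips a string and a 64-bit integer through raw bytes. The helper names are hypothetical; the big-endian 8-byte layout is an assumption chosen to mirror how HBase's Java `Bytes` utility class encodes a long.

```python
import struct


def str_to_bytes(text):
    return text.encode("utf-8")


def bytes_to_str(raw):
    return raw.decode("utf-8")


def long_to_bytes(number):
    # 8-byte big-endian signed integer
    return struct.pack(">q", number)


def bytes_to_long(raw):
    return struct.unpack(">q", raw)[0]


row_key = str_to_bytes("user_1234")
counter = long_to_bytes(42)
```

The store itself only ever sees byte arrays; interpreting them as strings or numbers is up to the application.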




Question :
When storing values in HBase, how does cell size matter?


 :
1. Cell Size, Practical limits to the size of values
2. In general, cell size should not consistently be above 10MB
3. Access Mostly Uused Products by 50000+ Subscribers
4. Both 1 and 2 are wrong


Question :

For the large cell size in HBase

 :
1. Increase the block size
2. Increase the maximum region size for the table
3. Access Mostly Uused Products by 50000+ Subscribers
4. All 1,2 and 3 are correct
5. Only 1 and 3 are correct




Question :

In the case of counters, synchronization is done on the RegionServer and not on the client side.

 :
1. True
2. False
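
To show what server-side synchronization of a counter means, here is a minimal sketch in which a hypothetical `ToyRegionServer` performs the read-modify-write under its own lock, so concurrent clients never need to coordinate with each other. The class, the key name and the thread counts are assumptions made for this example only.

```python
import threading


class ToyRegionServer:
    """Toy server that applies counter increments atomically."""

    def __init__(self):
        self._counters = {}
        self._lock = threading.Lock()

    def increment(self, key, amount=1):
        # Read-modify-write happens under the server's lock.
        with self._lock:
            new_value = self._counters.get(key, 0) + amount
            self._counters[key] = new_value
            return new_value

    def get(self, key):
        with self._lock:
            return self._counters.get(key, 0)


def run_clients(server, key, clients, increments_each):
    """Start several client threads that each send increments to the server."""

    def client():
        for _ in range(increments_each):
            server.increment(key)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return server.get(key)


total = run_clients(ToyRegionServer(), "page_hits", clients=8, increments_each=1000)
```

Each client only sends "add 1"; no client ever reads the value and writes it back, which is where lost updates would otherwise come from.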