
MapR (HP) HBase Developer Certification Questions and Answers (Dumps and Practice Questions)



Question : The QuickTechie.com website provides a feature that lets any software professional create, update, and delete articles. You decided to use HBase rather than HDFS to store these articles,
and you created an ARTICLES table in HBase to store all versions of each article. Which Column Family attribute settings will always retain at least one version of an article
but expire all other versions that are older than 1 month (30 days)?
1. LENGTH = 30, MIN_VERSIONS = 1
2. TTL = 30, VERSIONS = 1

3. Access Mostly Uused Products by 50000+ Subscribers

4. TTL = 2592000 , MIN_VERSIONS = 1




Correct Answer : 4
Explanation: ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. This applies to all versions
of a row, even the current one. The TTL time encoded in HBase for the row is specified in UTC. Store files which contain only expired rows are deleted on minor compaction.
Setting hbase.store.delete.expired.storefile to false disables this feature. Setting the minimum number of versions to a value other than 0 also disables this. See HColumnDescriptor
for more information. Recent versions of HBase also support setting a time to live on a per-cell basis. See HBASE-10560 for more information. Cell TTLs are submitted as an attribute
on mutation requests (Appends, Increments, Puts, etc.) using Mutation#setTTL. If the TTL attribute is set, it will be applied to all cells updated on the server by the operation.
There are two notable differences between cell TTL handling and ColumnFamily TTLs: cell TTLs are expressed in units of milliseconds instead of seconds, and a cell TTL cannot extend
the effective lifetime of a cell beyond a ColumnFamily-level TTL setting.
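As a minimal sketch of the per-cell TTL attribute (an already-open Connection is assumed, and the table, row key, and column names below are only illustrative assumptions), a Put can carry its own TTL via Mutation#setTTL:

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CellTtlExample {
    // assumes an already-open Connection; the table, row key, and column names are hypothetical
    static void putWithCellTtl(Connection connection) throws IOException {
        try (Table table = connection.getTable(TableName.valueOf("ARTICLES"))) {
            Put put = new Put(Bytes.toBytes("article-42"));
            put.addColumn(Bytes.toBytes("content"), Bytes.toBytes("body"),
                    Bytes.toBytes("article text"));
            put.setTTL(60L * 60 * 1000);   // cell TTLs are expressed in MILLISECONDS (here, one hour)
            table.put(put);
        }
    }
}

Even with this cell TTL set, the cell cannot outlive a shorter ColumnFamily-level TTL.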
The maximum number of row versions to store is configured per column family via HColumnDescriptor. The default for max versions is 1 (3 in older HBase releases). HBase does not
overwrite row values, but rather stores different values per row by time (and qualifier). Excess versions are removed during major compactions. The number of max versions may need
to be increased or decreased depending on application needs.
It is not recommended to set the number of max versions to an exceedingly high level (e.g., hundreds or more) unless those old values are very dear to you, because this will
greatly increase StoreFile size.
HBase keeps track of a timestamp for each cell; TTL (time-to-live) and MIN_VERSIONS are used to control how long data is kept and how many versions survive major compactions.
TTL is specified in seconds, so 5 days equals 432000 (5 x 24 x 60 x 60) seconds and 30 days equals 2592000 (30 x 24 x 60 x 60) seconds. MIN_VERSIONS controls the minimum number
of versions to keep. For example, MIN_VERSIONS = 1 instructs HBase to keep at least one copy.
Minimum Number of Versions : Like maximum number of row versions, the minimum number of row versions to keep is configured per column family via HColumnDescriptor.
The default for min versions is 0, which means the feature is disabled. The minimum number of row versions parameter is used together with the time-to-live parameter
and can be combined with the number of row versions parameter to allow configurations such as "keep the last T minutes worth of data, at most N versions, but keep at
least M versions around" (where M is the value for minimum number of row versions, M less than N). This parameter should only be set when time-to-live is enabled for a
column family and must be less than the number of row versions.
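Putting this together for the ARTICLES table in the question, a minimal sketch of creating the column family with TTL = 2592000 and MIN_VERSIONS = 1 (assuming the classic HColumnDescriptor/HTableDescriptor API; the column family name "content" is only an illustrative assumption):

import java.io.IOException;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateArticlesTable {
    public static void main(String[] args) throws IOException {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = connection.getAdmin()) {
            HColumnDescriptor cf = new HColumnDescriptor("content");   // hypothetical family name
            cf.setTimeToLive(2592000);   // 30 days, in seconds
            cf.setMinVersions(1);        // always keep at least one version, even after the TTL has expired
            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("ARTICLES"));
            table.addFamily(cf);
            admin.createTable(table);
        }
    }
}

The equivalent HBase shell command would be: create 'ARTICLES', {NAME => 'content', TTL => 2592000, MIN_VERSIONS => 1}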
"ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. This applies to all versions of a row - even
the current one. The TTL time encoded in the HBase for the row is specified in UTC . By default, delete markers extend back to the beginning of time. Therefore, Get or Scan
operations will not see a deleted cell (row or column), even when the Get or Scan operation indicates a time range before the delete marker was placed. ColumnFamilies can
optionally keep deleted cells. In this case, deleted cells can still be retrieved, as long as these operations specify a time range that ends before the timestamp of any
delete that would affect the cells. This allows for point-in-time queries even in the presence of deletes.
Deleted cells are still subject to TTL and there will never be more than "maximum number of versions" deleted cells. A new "raw" scan options returns all deleted rows and
the delete markers. Example 18. Change the Value of KEEP_DELETED_CELLS Using HBase Shell
hbase> alter 't1', NAME => 'f1', KEEP_DELETED_CELLS => true
Example 19. Change the Value of KEEP_DELETED_CELLS Using the API
...
HColumnDescriptor.setKeepDeletedCells(true);






Question : Given that the following is your entire data set:

23 column=Engineers:FirstName, timestamp=1331314762084, value=Shobhit
23 column=Engineers:Payment, timestamp=1331314762086, value=800000
23 column=TechnicalSkills:1_FirstSkill, timestamp=1331314762089, value=J2EE
23 column=TechnicalSkills:2_AnotherSkill, timestamp=1331314762092, value=Java

How many sets of physical files will be read during a scan of the entire data set immediately following a major compaction?
1. One
2. Two
3. Access Mostly Uused Products by 50000+ Subscribers
4. Four



Correct Answer : 2

This table consists of one unique row key (23) and 2 different column families (Engineers, TechnicalSkills). Since each column family gets its own Store, and a major compaction
leaves a single StoreFile per Store, a scan of the entire data set immediately after a major compaction reads two sets of physical files. All data for a given row in the table is
managed together in a region. Region size is configurable between 256 MB and 20 GB. In this example, the 4 cells easily fit within the lowest default region size, 256 MB,
so there is one region for this dataset.

Regions are the basic element of availability and distribution for tables, and are comprised of a Store per Column Family. The hierarchy of objects is as follows:
Table (HBase table)
Region (Regions for the table)
Store (Store per ColumnFamily for each Region for the table)
MemStore (MemStore for each Store for each Region for the table)
StoreFile (StoreFiles for each Store for each Region for the table)
Block (Blocks within a StoreFile within a Store for each Region for the table)
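As a sketch of what the question describes (assuming an already-open Connection and a hypothetical table named "Employees" that holds the Engineers and TechnicalSkills column families), a major compaction followed by a full scan reads one StoreFile per Store, i.e. one set of files per column family:

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class FullScanAfterMajorCompaction {
    static void compactAndScan(Connection connection) throws IOException {
        TableName name = TableName.valueOf("Employees");   // hypothetical table name
        try (Admin admin = connection.getAdmin()) {
            admin.majorCompact(name);   // requests a major compaction (runs asynchronously)
        }
        try (Table table = connection.getTable(name);
             ResultScanner scanner = table.getScanner(new Scan())) {
            for (Result row : scanner) {
                System.out.println(row);   // a full scan touches both column families' Stores
            }
        }
    }
}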

Determining the "right" region size can be tricky, and there are a few factors to consider:

HBase scales by having regions across many servers. Thus, if you have 2 regions for 16 GB of data on a 20 node cluster, your data will be concentrated on just a few machines and
nearly the entire cluster will be idle. This really can't be stressed enough, since a common problem is loading 200 MB of data into HBase and then wondering why your awesome
10 node cluster isn't doing anything.

On the other hand, high region count has been known to make things slow. This is getting better with each release of HBase, but it is probably better to have 700 regions than
3000 for the same amount of data.

There is not much memory footprint difference between 1 region and 10 in terms of indexes, etc, held by the RegionServer.

When starting off, it's probably best to stick to the default region-size, perhaps going smaller for hot tables (or manually split hot regions to spread the load over the cluster),
or go with larger region sizes if your cell sizes tend to be largish (100k and up).
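If one particular table holds largish cells and should use bigger regions than the rest of the cluster, the split threshold can also be raised per table; a minimal sketch, assuming the classic HTableDescriptor API and an illustrative 10 GB target:

import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

public class LargerRegionsForOneTable {
    public static void main(String[] args) {
        HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("ARTICLES"));   // hypothetical table
        // value is in bytes and overrides hbase.hregion.max.filesize for this table only;
        // pass the descriptor to Admin.createTable(...) or Admin.modifyTable(...) to apply it
        desc.setMaxFileSize(10L * 1024 * 1024 * 1024);
        System.out.println(desc);
    }
}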







Question : As an HBase administrator at Acmeshell.com, you have configured a maximum number of versions to store on a Column Family called Acmeshell.
You have inserted 7 versions of your data into this Column Family. At what point are the older versions removed from Acmeshell?
1. Never, the older version has to be manually deleted.
2. The older versions are removed at major compaction.
3. Access Mostly Uused Products by 50000+ Subscribers
4. The older versions are removed at minor compaction.



Correct Answer : 2
Explanation: Deletion in HBase
When a Delete command is issued through the HBase client, no data is actually deleted. Instead, a tombstone marker is set, making the deleted cells effectively invisible.
User Scans and Gets automatically filter out deleted cells until they are removed. HBase periodically removes deleted cells during compactions.

The tombstone markers are only deleted during major compactions (which compact all store files into a single one), because in order to prove that a tombstone marker has no
effect, HBase needs to look at all cells. There are three types of tombstone markers (see the sketch after this list):
version delete marker
Marks a single version of a column for deletion
column delete marker
Marks all versions of a column for deletion
family delete marker
Marks all versions of all columns for a column family for deletion
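A minimal sketch of how the three marker types map onto the client-side Delete API (an open Connection is assumed, and the table, row key, family, and qualifier names are hypothetical):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class TombstoneMarkerExamples {
    static void deleteWithMarkers(Connection connection) throws IOException {
        try (Table table = connection.getTable(TableName.valueOf("Employees"))) {
            Delete d = new Delete(Bytes.toBytes("23"));
            d.addColumn(Bytes.toBytes("Engineers"), Bytes.toBytes("Payment"));    // version delete marker (latest version only)
            d.addColumns(Bytes.toBytes("Engineers"), Bytes.toBytes("Payment"));   // column delete marker (all versions)
            d.addFamily(Bytes.toBytes("Engineers"));                              // family delete marker (all columns in the family)
            table.delete(d);
        }
    }
}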

It is also possible to add a maximum timestamp to column and family delete markers, in which case only versions with a lower timestamp are affected by the delete marker.
HBase allows you to perform time-range queries in order to see only the versions in a specified range of time. For example, to see the data "as of time T" the range would be set to
[0, T+1) (T+1, because in HBase the end time is exclusive).
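A sketch of such a time-range query (an open Connection is assumed, and the table name and the timestamp T are hypothetical):

import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class AsOfTimeScan {
    static void scanAsOf(Connection connection, long t) throws IOException {
        Scan scan = new Scan();
        scan.setTimeRange(0, t + 1);   // the end timestamp is exclusive, so [0, T+1) means "as of time T"
        try (Table table = connection.getTable(TableName.valueOf("t1"));
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result r : scanner) {
                System.out.println(r);
            }
        }
    }
}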

There is one snag, though. Once a delete marker is set, all cells affected by that marker are no longer visible. If a Put for a column C was issued at time T and is followed
by a column delete at time T+X, issuing a time range scan for [0, T+1) will return no data, as deleted cells are never shown.
The write operation will continue to insert new data. The delete operation marks rows as deleted with tombstones, and the tombstoned data is eventually removed when a major
compaction runs. Major compactions remove delete records, tombstones, and old versions.

HBase writes out immutable files as data is added and accumulates more files as time passes; eventually, your read operations get slower.
HBase compaction rewrites several files into one in order to make reads faster. Major compactions rewrite all files within a column family for a region into a single
new one and remove the older versions. Minor compactions rewrite the last few files into one larger one, and not all of the older versions will be removed.
HBASE-4536 addresses that issue. It is now possible to instruct a column family to retain deleted cells and treat them exactly like ordinary, undeleted cells (which means they
will still contribute to version counts, and can expire if a TTL was set for the column family). This can be done in the Java client by calling
HColumnDescriptor.setKeepDeletedCells(true) or through the HBase shell by setting KEEP_DELETED_CELLS => true for a column family.

When this setting is enabled for a column family, deleted cells are visible to time range scans and gets as long as the requested range does not include the delete marker.

So in the case above a Scan or Get for [0, T+1) will return the Put that was marked as deleted. A Scan or Get for the range [0, T+X+1) will not return the Put as the range
does include the delete marker.

This is very useful for providing full "as-of-time" queries, for example on backup replicas of production data in case a user accidentally deleted some data.



Related Questions


Question : MapR-DB stores structured data as a __________
1. nested series of arrays
2. nested series of maps
3. nested series of lists
4. nested series of sets
5. nested series of linkedsets



Question : In MapR-DB, what is the maximum supported size of a row key?

1. 128 Bytes

2. 1 KB

3. 64 KB

4. 1 MB


Question : In MapR-DB, place the elements of a table in descending order of granularity.

1. Key
2. Column family
3. Timestamp
4. Row
5. Column
6. Value


1. 1,3,4,5,6,2

2. 1,4,2,5,3,6

3. 3,6,4,5,2,1

4. 3,6,4,2,5,1


Question : In MapR-DB, what is the maximum size of a row?

1. 1 GB

2. 2 GB

3. Equal to minimum size of RAM in entire cluster

4. Equal to maximum size of RAM in entire cluster


Question : In MapR-DB, Rows span one or more column families and columns.

1. True
2. False


Question : In HBase, data is not updated in place?
1. True
2. False