Question : You have a website, www.QuickTechie.com, for which you have one month of user profile update logs. For classification analysis you want to save all of the data in a single file called QT31012015.log, which is approximately 30 GB in size. Using a MapReduce ETL job you push this full file into a directory on HDFS as /log/QT/QT31012015.log. The permissions in HDFS are as follows:
QT31012015.log -> rw-rw-r-x
Which is the correct statement for the file (QT31012015.log)?
4. HDFS runs in userspace, which makes all users with access to the namespace able to read, write, and modify all files.
5. The owner and group cannot delete the file, but others can.
Correct Answer : Explanation: ACL entries consist of a type, an optional name, and a permission string; for display purposes, ':' is used as the delimiter between the fields. In the example ACL discussed here, the file owner has read-write access, the file group has read-execute access, and others have read access, which so far is equivalent to setting the file's permission bits to 654. Additionally, there are two extended ACL entries, for the named user bruce and the named group sales, both granted full access. The mask is a special ACL entry that filters the permissions granted to all named user entries, all named group entries, and the unnamed group entry. In the example the mask has only read permission, so the effective permissions of several ACL entries are filtered accordingly. Every ACL must have a mask; if the user does not supply one while setting an ACL, a mask is inserted automatically by calculating the union of the permissions on all entries that the mask would filter. Running chmod on a file that has an ACL actually changes the permissions of the mask. Since the mask acts as a filter, this effectively constrains the permissions of all extended ACL entries, instead of changing just the group entry and possibly missing other extended ACL entries. The model also distinguishes between an "access ACL", which defines the rules enforced during permission checks, and a "default ACL", which defines the ACL entries that new child files or sub-directories receive automatically at creation time.

For the file in this question, the permissions (rw-rw-r-x) show that it can be read from and written to (appended to) by the owner and by anyone in the owner's group, and read by anyone else (it is 'world readable'). The execute permission on a file in HDFS is ignored. Note that the file's existing contents cannot be modified by anyone, because HDFS is a write-once filesystem: once a file has been written, its contents cannot be changed.
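As a concrete illustration of the example ACL described above (owner rw-, group r-x, other r--, extended entries for the named user bruce and the named group sales, and a mask of r--), the sketch below applies it with the Hadoop FileSystem Java API. This is only a minimal, illustrative sketch: the file path is taken from the question, while the class name and the decision to set the mask explicitly are assumptions, not part of the original answer.

    // Minimal, illustrative sketch: apply the example ACL to the file from the question.
    import java.util.Arrays;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.AclEntry;
    import org.apache.hadoop.fs.permission.AclEntryScope;
    import org.apache.hadoop.fs.permission.AclEntryType;
    import org.apache.hadoop.fs.permission.FsAction;

    public class SetAccessAclExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/log/QT/QT31012015.log");   // path taken from the question

            // Base entries (owner rw-, group r-x, other r--) plus the two extended
            // entries (user:bruce:rwx, group:sales:rwx) and an explicit mask of r--.
            List<AclEntry> acl = Arrays.asList(
                entry(AclEntryType.USER,  null,    FsAction.READ_WRITE),
                entry(AclEntryType.USER,  "bruce", FsAction.ALL),
                entry(AclEntryType.GROUP, null,    FsAction.READ_EXECUTE),
                entry(AclEntryType.GROUP, "sales", FsAction.ALL),
                entry(AclEntryType.MASK,  null,    FsAction.READ),
                entry(AclEntryType.OTHER, null,    FsAction.READ));

            fs.setAcl(file, acl);                       // replaces any existing ACL on the file
            System.out.println(fs.getAclStatus(file));  // inspect what was stored
        }

        private static AclEntry entry(AclEntryType type, String name, FsAction perm) {
            AclEntry.Builder b = new AclEntry.Builder()
                .setScope(AclEntryScope.ACCESS)
                .setType(type)
                .setPermission(perm);
            if (name != null) {
                b.setName(name);   // named user/group entries only
            }
            return b.build();
        }
    }

Because the mask here grants only read, the effective permissions of bruce, sales, and the file group are filtered down to read, matching the explanation; running chmod on this file would then change the mask entry rather than the group entry.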
Only directories may have a default ACL. When a new file or sub-directory is created, it automatically copies the default ACL of its parent into its own access ACL; a new sub-directory also copies it into its own default ACL. In this way the default ACL is propagated down through arbitrarily deep levels of the file system tree as new sub-directories are created. The exact permission values in the new child's access ACL are subject to filtering by the mode parameter; with the default umask of 022, this is typically 755 for new directories and 644 for new files. The mode parameter filters the copied permission values for the unnamed user (file owner), the mask, and other. Using this particular example ACL, creating a new sub-directory with a mode of 755 means the mode filtering has no effect on the final result. However, for creation of a file with a mode of 644, the mode filtering causes the new file's ACL to receive read-write for the unnamed user (file owner), read for the mask, and read for others; this mask also means that the effective permissions of the named user bruce and the named group sales are read only. Note that the copy occurs at the time of creation of the new file or sub-directory; subsequent changes to the parent's default ACL do not change existing children. The default ACL must have all of the minimum required ACL entries, including the unnamed user (file owner), unnamed group (file group), and other entries. If the user does not supply one of these entries while setting a default ACL, the entries are inserted automatically by copying the corresponding permissions from the access ACL, or from the permission bits if there is no access ACL. The default ACL must also have a mask; as described above, if the mask is unspecified, one is inserted automatically by calculating the union of the permissions on all entries that the mask would filter.
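To make the inheritance behaviour concrete, here is a small hypothetical sketch that adds a default ACL entry to the /log/QT directory from the question; the named group sales and the chosen permission are illustrative assumptions. Files and sub-directories created under /log/QT afterwards copy this entry into their own access ACLs, subject to the mode filtering described above.

    // Illustrative sketch: give the /log/QT directory a default ACL entry.
    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.AclEntry;
    import org.apache.hadoop.fs.permission.AclEntryScope;
    import org.apache.hadoop.fs.permission.AclEntryType;
    import org.apache.hadoop.fs.permission.FsAction;

    public class DefaultAclExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path dir = new Path("/log/QT");   // directory from the question

            // Only directories may carry a default ACL. This entry grants the named
            // group "sales" (an assumed example) read-execute by default; children
            // created later copy it into their own access ACL, filtered by the mode.
            AclEntry defaultSales = new AclEntry.Builder()
                .setScope(AclEntryScope.DEFAULT)
                .setType(AclEntryType.GROUP)
                .setName("sales")
                .setPermission(FsAction.READ_EXECUTE)
                .build();

            // modifyAclEntries adds/updates entries without replacing the whole ACL;
            // missing required default entries (user::, group::, other::, mask::) are
            // filled in automatically, as the explanation above notes.
            fs.modifyAclEntries(dir, Arrays.asList(defaultSales));
            System.out.println(fs.getAclStatus(dir));
        }
    }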
Question : You have a website, www.QuickTechie.com, for which you have one month of user profile update logs. For classification analysis you want to save all of the data in a single file called QT31012015.log, which is approximately 30 GB in size. Using a MapReduce ETL job you push this full file into a directory on HDFS as /log/QT/QT31012015.log. The permissions for your file and directory in HDFS are as follows:
QT31012015.log -> rw-r--r--
/log/QT -> rwxr-xr-x
Which is the correct statement for the file (QT31012015.log)?
1. The file cannot be deleted by anyone but the owner
2. The file cannot be deleted by anyone
Correct Answer : Explanation: File permissions in HDFS: HDFS has a permissions model for files and directories that is much like POSIX. There are three types of permission: the read permission (r), the write permission (w), and the execute permission (x). The read permission is required to read a file or to list the contents of a directory. The write permission is required to write a file, or, for a directory, to create or delete files or directories in it. The execute permission is ignored for a file, since you cannot execute a file on HDFS (unlike POSIX), and for a directory it is required to access its children.

Each file and directory has an owner, a group, and a mode. The mode is made up of the permissions for the user who is the owner, the permissions for users who are members of the group, and the permissions for users who are neither the owner nor members of the group. By default, a client's identity is determined by the username and groups of the process it is running in. Because clients are remote, this makes it possible to become an arbitrary user simply by creating an account of that name on the remote system. Thus, permissions should be used only in a cooperative community of users, as a mechanism for sharing filesystem resources and for avoiding accidental data loss, and not for securing resources in a hostile environment. (Note, however, that recent versions of Hadoop support Kerberos authentication, which removes these restrictions; see Security.) Despite these limitations, it is worthwhile having permissions enabled (as it is by default; see the dfs.permissions property) to avoid accidental modification or deletion of substantial parts of the filesystem, either by users or by automated tools or programs. When permission checking is enabled, the owner permissions are checked if the client's username matches the owner, and the group permissions are checked if the client is a member of the group; otherwise, the other permissions are checked. There is a concept of a super-user, which is the identity of the namenode process; permission checks are not performed for the super-user.

For this question, the permissions (rw-r--r-- on the file and rwxr-xr-x on /log/QT) show that the file can be read from and written to (appended to) by the owner, and read by anyone in the owner's group and by anyone else (it is 'world readable'). Deleting the file requires write permission on its parent directory, and because neither the group nor others have write permission on /log/QT, only the owner (or the super-user) can delete it. Note that the file's contents cannot be modified by the owner (other than by appending to the file), because HDFS is a write-once filesystem: once a file has been written, its existing contents cannot be changed.
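As a minimal, hedged sketch of these permission bits in the Hadoop FileSystem Java API (the 644 mode and the path come from the question scenario; the class name and everything else are illustrative assumptions):

    // Illustrative sketch: set rw-r--r-- on the log file and inspect it.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsAction;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class PermissionExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/log/QT/QT31012015.log");

            // 644: owner rw-, group r--, other r-- (the execute bit is ignored for files).
            fs.setPermission(file,
                new FsPermission(FsAction.READ_WRITE, FsAction.READ, FsAction.READ));
            System.out.println(fs.getFileStatus(file).getPermission()); // prints rw-r--r--

            // Deleting the file needs write permission on the parent directory
            // (/log/QT -> rwxr-xr-x), so the call below succeeds for the owner or the
            // super-user and would typically fail with AccessControlException for
            // a group member or any other user.
            // fs.delete(file, false);
        }
    }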
Question : You have a website, www.QuickTechie.com, for which you have one month of user profile update logs. For classification analysis you want to save all of the data in a single file called QT31012015.log, which is approximately 30 GB in size. Using a MapReduce ETL job you push this full file into a directory on HDFS as /log/QT/QT31012015.log. You want to ensure that the data in /log/QT/QT31012015.log is not compromised (lost), so how does HDFS help with this?
1. Storing multiple replicas of data blocks on different DataNodes.
4. DataNodes make copies of their data blocks, and put them on different local disks.
Correct Answer : Explanation: The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems; however, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project and is part of the Apache Hadoop Core project. HDFS provides reliability by splitting a file into multiple blocks and replicating each block on multiple different machines (3 by default). Although it is possible to use RAID on DataNodes, this is not a recommended configuration, as it reduces the amount of raw disk available for data storage and is not necessary.

Hardware Failure : Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.

Streaming Data Access : Applications that run on HDFS need streaming access to their data sets. They are not general-purpose applications that typically run on general-purpose file systems. HDFS is designed more for batch processing than for interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications targeted at HDFS, and POSIX semantics in a few key areas has been traded to increase data throughput rates.

Large Data Sets : Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size, so HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster, and it should support tens of millions of files in a single instance.

Simple Coherency Model : HDFS applications need a write-once-read-many access model for files. A file, once created, written, and closed, need not be changed. This assumption simplifies data coherency issues and enables high-throughput data access. A MapReduce application or a web crawler application fits perfectly with this model. There is a plan to support appending writes to files in the future.

Moving Computation is Cheaper than Moving Data : A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge, as it minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located than to move the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located.

Portability Across Heterogeneous Hardware and Software Platforms : HDFS has been designed to be easily portable from one platform to another.
This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications.
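The sketch below is an illustrative (not authoritative) example of how this block replication can be inspected and adjusted per file through the Hadoop FileSystem Java API; the replication factor of 4 and the class name are assumptions, while the file path and the default of 3 come from the question and explanation.

    // Illustrative sketch: check and change the replication factor of the log file.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Default replication for files created with this configuration;
            // 3 is the stock value mentioned in the explanation.
            conf.setInt("dfs.replication", 3);

            FileSystem fs = FileSystem.get(conf);
            Path file = new Path("/log/QT/QT31012015.log");

            // Current replication factor of the 30 GB log file.
            short current = fs.getFileStatus(file).getReplication();
            System.out.println("replication = " + current);

            // Raise it to 4 copies (an assumed value); the NameNode schedules the
            // extra block replicas on different DataNodes in the background.
            fs.setReplication(file, (short) 4);
        }
    }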
Question : What is Hive?
1. Hive is part of the Apache Hadoop project that enables in-memory analysis of real-time streams of data
2. Hive is a way to add data from the local file system to HDFS
3. ...
4. Hive is a part of the Apache Hadoop project that provides a SQL-like interface for data processing
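For context on what a "SQL-like interface" means in practice, here is a small hypothetical sketch that submits a HiveQL query through the HiveServer2 JDBC driver; the connection URL, credentials, and the table name qt_profile_updates are assumptions made for illustration only and are not part of the question.

    // Illustrative sketch: run a HiveQL query over HiveServer2 via JDBC.
    // Requires the hive-jdbc driver (and its dependencies) on the classpath.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");   // assumed host/user
            Statement stmt = conn.createStatement();

            // Familiar SQL syntax; qt_profile_updates is a hypothetical table that
            // could be defined over log data loaded into HDFS, as in the earlier questions.
            ResultSet rs = stmt.executeQuery(
                "SELECT COUNT(*) FROM qt_profile_updates WHERE update_date = '2015-01-31'");
            while (rs.next()) {
                System.out.println("updates = " + rs.getLong(1));
            }
            conn.close();
        }
    }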