
Hue - Hadoop UI blog

@gethue-blog / gethue-blog.tumblr.com

Hue is the open source UI that makes Apache Hadoop easier to use. http://gethue.com This blog features posts, tutorials and examples for Hue, which includes a File Browser for HDFS, a Job Designer/Browser for MapReduce, query editors for Hive, Pig, Cloudera Impala, Solr Search and Sqoop2, and an HBase browser. It also ships with an Oozie application for creating workflows, various Shells and a collection of Hadoop APIs.

How to fix the MultipleObjectsReturned error in Hue

When opening the Home page (/home) in Hue 3.0, this error can appear:

MultipleObjectsReturned: get() returned more than one DocumentPermission -- it returned 2! Lookup parameters were {'perms': 'read', 'doc': <Document: saved query Sample: Job loss sample>}

This is fixed in Hue 3.6 and here is a way to repair it:

1. Back up the Hue database

2. Run the cleanup script

from desktop.models import DocumentPermission, Document

for document in Document.objects.all():
  try:
    perm, created = DocumentPermission.objects.get_or_create(doc=document, perms=DocumentPermission.READ_PERM)
  except DocumentPermission.MultipleObjectsReturned, ex:
    # We can delete duplicate perms of a document
    dups = DocumentPermission.objects.filter(doc=document, perms=DocumentPermission.READ_PERM)
    perm = dups[0]
    for dup in dups[1:]:
      print 'Deleting duplicate %s' % dup
      dup.delete()
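The keep-the-first, delete-the-rest idea of the script can be tried outside Hue. The sketch below is only an illustration in plain Python 3 on made-up (id, doc, perms) tuples; `find_duplicates` is a hypothetical helper and not part of Hue or Django.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Return the ids of rows that duplicate an earlier (doc, perms) pair.

    rows is a list of (row_id, doc, perms) tuples. The first row of each
    (doc, perms) group is kept; every later one is reported for deletion.
    """
    groups = defaultdict(list)
    for row_id, doc, perms in rows:
        groups[(doc, perms)].append(row_id)
    to_delete = []
    for ids in groups.values():
        to_delete.extend(ids[1:])  # keep ids[0], drop the rest
    return sorted(to_delete)


# Made-up sample data: row 2 duplicates row 1.
rows = [
    (1, 'Sample: Job loss sample', 'read'),
    (2, 'Sample: Job loss sample', 'read'),
    (3, 'Sample: Salary growth', 'read'),
]
print(find_duplicates(rows))
```

Listing the duplicates before deleting anything is a cheap way to check what the cleanup would remove.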


How to change or reset a forgotten password?

Here are two easy ways. On the Hue machine, go to the Hue home directory (/usr/lib/hue by default) and use one of the commands below.

To change the password of the currently logged-in Unix user:

build/env/bin/hue  changepassword

If you don't remember the admin username, create a new Hue admin (you will then be able to log in and change the password of any other user in Hue):

build/env/bin/hue  createsuperuser


Join the Hue Team!

Hue 1

Hue 2

Hue 3

Hue N+1

Your Turn

Do you love UI/UX and making an impact?

Join the Team!


Hadoop Tutorial: Schedule your Hadoop jobs intuitively with the new Oozie crontab!

Hue takes advantage of a new way to specify the frequency of a coordinator in Oozie (OOZIE-1306). Here is how to put it into practice:

The crontab requires Oozie 4. In order to use the previous Frequency drop-down from Oozie 3, the feature can be disabled in hue.ini:

[oozie]
  # Use Cron format for defining the frequency of a Coordinator instead of the old frequency number/unit.
  enable_cron_scheduling=false
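To get a feel for the difference between the two styles, here is a minimal sketch that turns an old-style number/unit frequency into a five-field cron string (minute, hour, day of month, month, day of week). The helper `to_cron` is hypothetical and covers only a few simple cases; the expressions Oozie accepts should be checked against the Oozie coordinator documentation.

```python
def to_cron(number, unit):
    """Translate 'every <number> <unit>' into a five-field cron expression.

    Hypothetical helper for illustration: supports minutes, hours and
    a single day only, and raises ValueError for anything else.
    """
    if number < 1:
        raise ValueError('number must be positive')
    if unit == 'minutes' and number < 60:
        return '0/%d * * * *' % number      # every <number> minutes
    if unit == 'hours' and number < 24:
        return '0 0/%d * * *' % number      # at minute 0, every <number> hours
    if unit == 'days' and number == 1:
        return '0 0 * * *'                  # every day at midnight
    raise ValueError('unsupported frequency: %s %s' % (number, unit))


print(to_cron(10, 'minutes'))
print(to_cron(6, 'hours'))
```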

As usual feel free to comment on the hue-user list or @gethue!


Secure your YARN cluster and access the jobs information safely

Hue can authenticate with Kerberos in YARN and guarantee that a user cannot access someone else’s MapReduce information.

As usual feel free to comment on the hue-user list or @gethue!


How to use the new file types icons with the Hue SDK or in standalone

Hue 3.5+ ships with two font icon sets: Font Awesome 4 (http://fontawesome.io/) and the Hue Filetypes font that includes some basic file types you might need.

The icons are available in the Hue master or in this zip file:

When you want to use the new icons in your app, you first have to import the Hue Filetypes CSS in your .mako template:

<link href="/static/ext/css/hue-filetypes.css" rel="stylesheet">

and then define your icons the same way you would with Font Awesome.

In our case you need to use a different prefix (hfo instead of fa):

<i class="hfo .."></i>

and then you can specify the icon you want. For instance, to render a JSON file icon, use:

<i class="hfo hfo-file-json"></i>

You can also use the modifiers from Font Awesome, so you can create a larger rotated JSON icon like this:

<i class="hfo hfo-file-json fa-2x fa-rotate-90"></i>

Which other icons would you like to see implemented? We would also be glad to contribute them back. Please let us know or comment!


Making Hadoop Accessible to your Employees with LDAP

Hue easily integrates with your corporation’s existing identity management systems and provides authentication mechanisms for SSO providers. By changing a few configuration parameters, your employees can start doing big data analysis in their browser by leveraging an existing security policy.

This blog post details the various features and capabilities available in Hue for LDAP:

  1. Authentication
     1.1. Search bind
     1.2. Direct bind
  2. Importing users
  3. Importing groups
  4. Synchronizing users and groups
     4.1. Attributes synchronized
     4.2. Useradmin interface
     4.3. Command line interface
  5. LDAP search
  6. Case sensitivity
  7. LDAPS/StartTLS support
  8. Notes
  9. Summary

1.    Authentication

The typical authentication scheme for Hue takes the form of the following image:

Passwords are saved in the Hue database.

With the Hue LDAP integration, users can use their LDAP credentials to authenticate and inherit their existing groups transparently. There is no need to save or duplicate any employee password in Hue:

There are several other ways to authenticate with Hue: PAM, SPNEGO, OpenID, OAuth, SAML2, etc. This section details how Hue can authenticate against an LDAP directory server.

When authenticating via LDAP, Hue validates login credentials against a directory service if configured with this authentication backend:

[desktop]
  [[auth]]
  backend=desktop.auth.backend.LdapBackend

The LDAP authentication backend will automatically create users that don’t exist in Hue by default. Hue needs to import users in order to properly perform the authentication. The password is never imported when importing users. The following configuration can be used to disable automatic import:

[desktop]
  [[ldap]]
  create_users_on_login=false

Disabling the automatic import restricts login to a predefined list of manually imported users.

The case sensitivity of the authentication process is defined in the “Case sensitivity” section below.

There are two different ways to authenticate with a directory service through Hue:

  1. Search bind
  2. Direct bind

1.1.    Search bind

The search bind mechanism for authenticating will perform an ldapsearch against the directory service and bind using the found distinguished name (DN) and password provided. This is, by default, used when authenticating with LDAP. The configurations that affect this mechanism are outlined in “LDAP search”.

1.2.    Direct bind

The direct bind mechanism for authenticating will bind to the LDAP server using the username and password provided at login. There are two options that can be used to choose how Hue binds:

  1. nt_domain - Domain component for User Principal Names (UPN) in Active Directory. This Active Directory specific idiom allows Hue to authenticate with Active Directory without having to follow LDAP references to other partitions. This typically maps to the email address of the user or the user's ID in conjunction with the domain.
  2. ldap_username_pattern - Provides a template for the DN that will ultimately be sent to the directory service when authenticating.

If ‘nt_domain’ is provided, then Hue will use a UPN to bind to the LDAP service:

[desktop]
  [[ldap]]
  nt_domain=example.com

Otherwise, the ‘ldap_username_pattern’ configuration is used (the <username> parameter will be replaced with the username provided at login):

[desktop]
  [[ldap]]
  ldap_username_pattern="uid=<username>,ou=People,DC=hue-search,DC=ent,DC=cloudera,DC=com"

Typical attributes to search for include:

  1. uid
  2. sAMAccountName

To enable direct bind authentication, the ‘search_bind_authentication’ configuration must be set to false:

[desktop]
  [[ldap]]
  search_bind_authentication=false

2.    Importing users

If an LDAP user needs to be part of a certain group and have a particular set of permissions, then this user can be imported via the Useradmin interface:

As you can see, there are two options available when importing:

  1. Distinguished name
  2. Create home directory

If ‘Create home directory’ is checked, when the user is imported their home directory in HDFS will automatically be created, if it doesn’t already exist.

If ‘Distinguished name’ is checked, then the username provided must be a full distinguished name (e.g. uid=hue,ou=People,dc=gethue,dc=com). Otherwise, the username provided should be a fragment of a Relative Distinguished Name (rDN) (e.g. the username “hue” maps to the rDN “uid=hue”). In that case Hue takes the provided username and creates a search filter using the ‘user_filter’ and ‘user_name_attr’ configurations. For more information on how Hue performs LDAP searches, see the “LDAP search” section.

The case sensitivity of the search and import processes is defined in the “Case sensitivity” section.

3.    Importing groups

Groups can be imported via the Useradmin interface. Users can then be added to a group, which provides them with a set of permissions (e.g. accessing the Impala application). This function works almost exactly the same way as user importing, but has a couple of extra features.

As the above image portrays, not only can groups be discovered via DN and rDN search, but users that are members of the group and members of the group’s subordinate groups can be imported as well. Posix groups and members are automatically imported if the group found has the object class “posixGroup”.

4.    Synchronizing users and groups

Users and groups can be synchronized with the directory service via the Useradmin interface or via a command line utility. The images from the previous sections use the word “Sync” to indicate that when the name of a user or group that already exists in Hue is added, it will in fact be synchronized instead. In the case of importing users for a particular group, new users will be imported and existing users will be synchronized. Note: Users that have been deleted from the directory service will not be deleted from Hue. Those users can be manually deactivated in Hue via the Useradmin interface.

4.1.    Attributes synchronized

Currently, only the first name, last name, and email address are synchronized. Hue looks for the LDAP attributes ‘givenName’, ‘sn’, and ‘mail’ when synchronizing.  Also, the ‘user_name_attr’ config is used to appropriately choose the username in Hue. For instance, if ‘user_name_attr’ is set to “uid”, then the “uid” returned by the directory service will be used as the username of the user in Hue.
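As an illustration of the mapping just described, here is a minimal sketch in plain Python. It assumes an LDAP entry is available as a simple dict of attribute to value, and that the Hue user fields are named `first_name`, `last_name` and `email` as on the Django user model; `sync_attributes` is a hypothetical helper, not Hue's actual code.

```python
# LDAP attribute -> user field (field names assumed from the Django user model).
LDAP_TO_HUE = {
    'givenName': 'first_name',
    'sn': 'last_name',
    'mail': 'email',
}


def sync_attributes(ldap_entry, user_name_attr='uid'):
    """Build the synchronized user fields from one LDAP entry (a dict)."""
    user = {'username': ldap_entry[user_name_attr]}
    for ldap_attr, hue_field in LDAP_TO_HUE.items():
        # Attributes missing from the directory become empty strings.
        user[hue_field] = ldap_entry.get(ldap_attr, '')
    return user


entry = {'uid': 'hue', 'givenName': 'Hue', 'sn': 'User',
         'mail': 'hue@example.com', 'telephoneNumber': '555-0100'}
print(sync_attributes(entry))
```

Any other attribute of the entry (the phone number here) is simply left out.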

4.2.    Useradmin interface

The “Sync LDAP users/groups” button in the Useradmin interface will automatically synchronize all users and groups.

4.3.    Command line interface

Here’s a quick example of how to use the command line interface to synchronize users and groups:

<hue root>/build/env/bin/hue sync_ldap_users_and_groups

5.    LDAP search

There are two configurations for restricting the search process:

  1. user_filter - General LDAP filter to restrict the search.
  2. user_name_attr - Which attribute will be considered the username to search against.

Here is an example configuration:

[desktop]
  [[ldap]]
    [[[users]]]
    user_filter="objectClass=*"
    user_name_attr=uid

With the above configuration, the LDAP search filter will take on the form:

(&(objectClass=*)(uid=<user entered username>))
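The way the two settings combine into one filter can be sketched in a few lines of Python. `build_search_filter` and `escape_filter_value` are hypothetical helpers for illustration; the escaping of special characters (as described in RFC 4515) is an addition of this sketch, and the post does not say whether Hue does the same.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an LDAP filter value."""
    replacements = {'\\': r'\5c', '*': r'\2a', '(': r'\28', ')': r'\29', '\0': r'\00'}
    return ''.join(replacements.get(ch, ch) for ch in value)


def build_search_filter(username, user_filter='objectClass=*', user_name_attr='uid'):
    """Combine user_filter and user_name_attr into one AND filter."""
    user_filter = user_filter.strip()
    if not user_filter.startswith('('):
        user_filter = '(%s)' % user_filter  # wrap a bare filter in parentheses
    return '(&%s(%s=%s))' % (user_filter, user_name_attr, escape_filter_value(username))


print(build_search_filter('hue'))
print(build_search_filter('hue', user_name_attr='sAMAccountName'))
```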

6.    Case sensitivity

Hue can be configured to ignore the case of usernames as well as force usernames to lower case via the ‘ignore_username_case’ and ‘force_username_lowercase’ configurations. These two configurations are recommended to be used in conjunction with each other. This is useful when integrating with a directory service containing usernames in capital letters and unix usernames in lowercase letters (which is a Hadoop requirement). Here is an example of configuring them:

[desktop]
  [[ldap]]
  ignore_username_case=true
  force_username_lowercase=true
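The effect of the two flags can be pictured with a small sketch. The helpers `normalize_username` and `find_user` are hypothetical and only model the behaviour described above: one flag lowercases the name that gets stored, the other makes lookups of existing users case-insensitive.

```python
def normalize_username(username, force_username_lowercase=True):
    """Return the username as it would be stored."""
    return username.lower() if force_username_lowercase else username


def find_user(existing_usernames, username, ignore_username_case=True):
    """Return the matching existing username, or None if there is none."""
    for existing in existing_usernames:
        if existing == username:
            return existing
        if ignore_username_case and existing.lower() == username.lower():
            return existing
    return None


# The directory returns 'ROMAIN' while the Unix account is 'romain'.
print(normalize_username('ROMAIN'))
print(find_user(['romain'], 'ROMAIN'))
```

Using both together avoids creating a second account that differs from an existing one only by case.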

7.    LDAPS/StartTLS support

Secure communication with LDAP is provided via the SSL/TLS and StartTLS protocols. It allows Hue to validate the directory service it’s going to converse with. Practically speaking, if a Certificate Authority Certificate file is provided, Hue will communicate via LDAPS:

[desktop]
  [[ldap]]
  ldap_cert=/etc/hue/ca.crt

The StartTLS protocol can be used as well (step up to SSL/TLS):

[desktop]
  [[ldap]]
  use_start_tls=true

8.    Notes

  1. Setting "search_bind_authentication=true" in the hue.ini will tell Hue to perform an LDAP search using the bind credentials specified in the hue.ini (bind_dn, bind_password). Hue will then search using the base DN specified in "base_dn" for an entry with the attribute defined in "user_name_attr" and the value of the short name provided in the login page. The search filter defined in "user_filter" will also be used to limit the search. Hue will search the entire subtree starting from the base DN.
  2. Setting "search_bind_authentication=false" in the hue.ini will tell Hue to perform a direct bind to LDAP using the credentials provided at login (not the bind_dn and bind_password specified in the hue.ini). There are two effective modes here:
     a. nt_domain is specified in the hue.ini: This is used to connect to an Active Directory directory service. In this case, the UPN (User Principal Name) is used to perform a direct bind. Hue forms the UPN by concatenating the short name provided at login and the nt_domain like so: "<short name>@<nt_domain>". The 'ldap_username_pattern' config is completely ignored.
     b. nt_domain is NOT specified in the hue.ini: This is used to connect to all other directory services (it can even handle Active Directory, but nt_domain is the preferred way for AD). In this case, 'ldap_username_pattern' is used and it should take on the form "cn=<username>,dc=example,dc=com" where <username> will be replaced with whatever is provided at the login page.
  3. The UserAdmin app will always perform an LDAP search when managing LDAP entries and will then always use the "bind_dn", "bind_password", "base_dn", etc. as defined in the hue.ini.
  4. At this point in time, no bind semantics other than SIMPLE_AUTH are supported. For instance, we do not yet support DIGEST-MD5, NEGOTIATE, etc. We definitely want to hear from folks what they use so we can prioritize these things accordingly!
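The decision described in these notes can be summarized in a short sketch. `bind_identity` is a hypothetical helper, not Hue's actual code: it returns the identity that would be used for a direct bind, or None when search bind is active and the DN comes from an LDAP search instead.

```python
def bind_identity(username, search_bind_authentication=True,
                  nt_domain=None, ldap_username_pattern=None):
    """Return the identity used for a direct bind, or None for search bind."""
    if search_bind_authentication:
        # Search bind: the DN is looked up with bind_dn/bind_password first.
        return None
    if nt_domain:
        # Active Directory style UPN; ldap_username_pattern is ignored.
        return '%s@%s' % (username, nt_domain)
    if ldap_username_pattern:
        return ldap_username_pattern.replace('<username>', username)
    raise ValueError('direct bind needs nt_domain or ldap_username_pattern')


print(bind_identity('hue', search_bind_authentication=False, nt_domain='example.com'))
print(bind_identity('hue', search_bind_authentication=False,
                    ldap_username_pattern='cn=<username>,dc=example,dc=com'))
```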

9.    Summary

The Hue team is working hard on improving security. Upcoming LDAP features include importing nested LDAP groups and multidomain support for Active Directory. We hope this brief overview of LDAP in Hue will help you make your system more secure, more compliant with current security standards, and open up big data analysis to many more users!

As always, feel free to contact us at hue-user@ or @gethue!


How to manage the Hue database with the shell

First, back up the database. By default, it is this SQLite file:

cp /var/lib/hue/desktop.db ~/

Then if using CM, export this variable in order to point to the correct database:

HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/<id>-hue-HUE_SERVER
export HUE_CONF_DIR

Where <id> is the most recent ID in that process directory for hue-HUE_SERVER.
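Finding the most recent directory by hand is error-prone, so here is a minimal Python sketch that looks it up. It assumes that `<id>` is a number and that the highest number is the most recent one; `latest_hue_conf_dir` is a hypothetical helper, not something shipped with Hue or Cloudera Manager.

```python
import os
import re


def latest_hue_conf_dir(process_root='/var/run/cloudera-scm-agent/process'):
    """Return the <id>-hue-HUE_SERVER directory with the highest id, or None."""
    if not os.path.isdir(process_root):
        return None
    pattern = re.compile(r'^(\d+)-hue-HUE_SERVER$')
    candidates = []
    for name in os.listdir(process_root):
        match = pattern.match(name)
        if match and os.path.isdir(os.path.join(process_root, name)):
            candidates.append((int(match.group(1)), name))
    if not candidates:
        return None
    # Compare ids as integers so that 101 sorts after 12.
    return os.path.join(process_root, max(candidates)[1])


if __name__ == '__main__':
    print(latest_hue_conf_dir())
```

The printed path is the value to export as HUE_CONF_DIR.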

Then open the database shell. From the Hue root (/usr/lib/hue by default):

root@hue:hue# build/env/bin/hue dbshell

And you can start typing SQL queries:

sqlite> .tables
auth_group                         oozie_dataset
auth_group_permissions             oozie_decision
auth_permission                    oozie_decisionend
auth_user                          oozie_distcp
auth_user_groups                   oozie_email
auth_user_user_permissions         oozie_end
beeswax_metainstall                oozie_fork
beeswax_queryhistory               oozie_fs
beeswax_savedquery                 oozie_generic
beeswax_session                    oozie_history
desktop_document                   oozie_hive
desktop_document_tags              oozie_java
desktop_documentpermission         oozie_job
desktop_documentpermission_groups  oozie_join
desktop_documentpermission_users   oozie_kill
desktop_documenttag                oozie_link
desktop_settings                   oozie_mapreduce
desktop_userpreferences            oozie_node
django_admin_log                   oozie_pig
django_content_type                oozie_shell
django_openid_auth_association     oozie_sqoop
django_openid_auth_nonce           oozie_ssh
django_openid_auth_useropenid      oozie_start
django_session                     oozie_streaming
django_site                        oozie_subworkflow
jobsub_checkforsetup               oozie_workflow
jobsub_jobdesign                   pig_document
jobsub_jobhistory                  pig_pigscript
jobsub_oozieaction                 search_collection
jobsub_ooziedesign                 search_facet
jobsub_ooziejavaaction             search_result
jobsub_ooziemapreduceaction        search_sorting
jobsub_ooziestreamingaction        south_migrationhistory
oozie_bundle                       useradmin_grouppermission
oozie_bundledcoordinator           useradmin_huepermission
oozie_coordinator                  useradmin_ldapgroup
oozie_datainput                    useradmin_userprofile
oozie_dataoutput

You can also migrate the database manually:

build/env/bin/hue syncdb
build/env/bin/hue migrate

If you want to switch to another database (we recommend MySQL), this guide details the migration process.

The database settings in Hue are located in the hue.ini.

Note: you can also query the database by pointing the DB Query App to the Hue database.

In developer mode (runserver command), you can also access the /admin page for using the Django Admin:


Solving the Hue 2.X hanging problem

In Hue versions before 3, Hue sometimes becomes slow and gets "stuck". To fix this problem, it is recommended to switch Hue to the CherryPy server instead of Spawning. In the hue.ini or the Hue Safety Valve in CM, enter:

[desktop]
  use_cherrypy_server = true

Cause:

Most of the time, timeout or Thrift errors can be seen in the Hue logs (/logs page). These errors are due to Beeswax crashing or being very slow and blocking all the requests, as the Spawning server is not perfectly greenified in Hue 2 (its single thread is blocked in the RPC IO call). This is fixed in CDH5 and improved in CDH4.5 by switching to HiveServer2.

Note: switching to CherryPy will disable the Shell Application, but it is replaced by the HBase Browser, Sqoop2 Editor and Pig Editor applications.


Using Hadoop MR2 and YARN with an alternative Job Browser interface

Since version 3, Hue defaults to using YARN.

First, it is a bit simpler to configure Hue with MR2 than with MR1, as Hue does not need the Job Tracker plugin since YARN provides a REST API. YARN is also going to provide an equivalent of Job Tracker HA with YARN-149.

Here is how to configure the clusters in hue.ini. If you are using a pseudo-distributed cluster, it will work by default. If not, you just need to replace every localhost with the hostnames of the Resource Manager and History Server:

[hadoop]
  ...
  # Configuration for YARN (MR2)
  # ------------------------------------------------------------------------
  [[yarn_clusters]]
    [[[default]]]
      # Enter the host on which you are running the ResourceManager
      resourcemanager_host=localhost
      # The port where the ResourceManager IPC listens on
      resourcemanager_port=8032
      # Whether to submit jobs to this cluster
      submit_to=True
      # URL of the ResourceManager API
      resourcemanager_api_url=http://localhost:8088
      # URL of the ProxyServer API
      proxy_api_url=http://localhost:8088
      # URL of the HistoryServer API
      history_server_api_url=http://localhost:19888

  # Configuration for MapReduce (MR1)
  # ------------------------------------------------------------------------
  [[mapred_clusters]]
    [[[default]]]
      # Whether to submit jobs to this cluster
      submit_to=False
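For a real cluster, the localhost values have to be replaced consistently. The sketch below derives the settings from two hostnames using the default ports shown in the configuration above. `yarn_settings` is a hypothetical helper for illustration, and it assumes the proxy server runs on the ResourceManager host unless told otherwise.

```python
def yarn_settings(resourcemanager_host, history_server_host, proxy_host=None):
    """Return the [[[default]]] YARN settings for the given hosts."""
    # Assumption: the proxy runs on the ResourceManager host by default.
    proxy_host = proxy_host or resourcemanager_host
    return {
        'resourcemanager_host': resourcemanager_host,
        'resourcemanager_port': 8032,
        'submit_to': True,
        'resourcemanager_api_url': 'http://%s:8088' % resourcemanager_host,
        'proxy_api_url': 'http://%s:8088' % proxy_host,
        'history_server_api_url': 'http://%s:19888' % history_server_host,
    }


# Example hostnames are made up.
for key, value in yarn_settings('rm.example.com', 'jhs.example.com').items():
    print('%s=%s' % (key, value))
```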

And that’s it! You can now look at jobs in Job Browser, get logs and submit jobs to Yarn!

As usual feel free to comment on the hue-user list or @gethue!


A better PyGreSql support for Django

With the release of django-pygresql, the Hue team has taken a first stab at PyGreSQL support in Django!

The ‘Why’

The open source world has many different kinds of licenses and it can be confusing to know which one makes sense for you. PyGreSQL is a PostgreSQL client with a license permissive enough that it can be packaged and shipped.

The ‘How’

PyGreSQL has some minor differences from the provided postgresql backend. It required a few changes including:

  • Massaging Date/Datetime/Time types to work with Django.
  • A custom cursor for massaging data.
  • Custom autocommit management.

To install this backend:

  1. Download django-pygresql.
  2. Run: unzip master.zip && cd django-pygresql-master && <hue root>/build/env/bin/python setup.py install
  3. At the bottom of <hue root>/desktop/core/src/desktop/settings.py, add the following code:

if DATABASES['default']['ENGINE'] == 'django_pygresql':
  SOUTH_DATABASE_ADAPTERS = {
    'default': 'south.db.postgresql_psycopg2'
  }

  4. In the hue.ini, set desktop->database->engine to “django_pygresql”. Then, add the normal postgresql configuration parameters.

Summary

This is an initial implementation of a backend for Django to communicate with PostgreSQL via PyGreSQL. We hope this helps other members of the community.

Write to us at hue-user mailing list or @gethue!


A new Spark Web UI: Spark App

Hi Spark Makers!

A Hue Spark application was recently created. It lets users execute and monitor Spark jobs directly from their browser and be more productive.

We previously released the app with an Oozie submission backend but switched to the Spark Job Server (SPARK-818) contributed by Ooyala and Evan’s team at the last Spark Summit. This new server will enable real interactivity with Spark and is closer to the community.

We hope to work with the community and have support for Python, Java, direct script submission without compiling/uploading and other improvements in the future!

As usual feel free to comment on the hue-user list or @gethue! For questions directly related to the Job Server, participate on the pull request, SPARK-818 or the Spark user list!

Get Started!

Currently only Scala jobs are supported and programs need to implement this trait and be packaged into a jar. Here is a WordCount example. To learn more about Spark Job Server, check its README.

Requirements

We assume you have Scala installed on your system.

Get Spark Job Server

Currently on github on this branch:

git clone https://github.com/ooyala/incubator-spark.git spark-server
cd spark-server
git checkout -b jobserver-preview-2013-12 origin/jobserver-preview-2013-12

Then type:

sbt/sbt project jobserver re-start

Get Hue

Currently only on github (will be in CDH5b2):

If Hue and Spark Job Server are not on the same machine, update this hue.ini property in desktop/conf/pseudo-distributed.ini:

[spark]
  # URL of the Spark Job Server.
  server_url=http://localhost:8090/

Get a Spark example to run

Then follow this walk-through and create the example jar that is used in the video demo.


JobTracker High Availability (HA) in MR1

When the Job Tracker goes down, Hue cannot display the jobs in Job Browser or submit to the correct cluster.

In MR1, Hadoop can support two Job Trackers, a master Job Tracker that can fail over to a standby Job Tracker and hence provide Job Tracker HA. Let's see how Hue 3.5 and CDH5beta1 (and probably CDH4.6) can take advantage of this.

Note: in MR1 Hue is using a plugin to communicate with the Job Tracker. This can be configured in CDH or Hadoop 0.23 / 1.2.0 (MAPREDUCE-461).

We configure two Job Trackers in the hue.ini:

[hadoop]
  ...
  [[mapred_clusters]]
    [[[default]]]
      # Enter the host on which you are running the Hadoop JobTracker
      jobtracker_host=host-1
      # Whether to submit jobs to this cluster
      submit_to=True

    [[[ha-standby]]]
      # Enter the host on which you are running the Hadoop JobTracker
      jobtracker_host=host-2
      # Whether to submit jobs to this cluster
      submit_to=True

And that's it! Hue will communicate with the available Job Tracker automatically!
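Conceptually, picking the available Job Tracker amounts to trying each configured cluster in turn. The following sketch only illustrates that idea and is not Hue's implementation; `pick_job_tracker` and the `is_alive` probe are hypothetical names.

```python
def pick_job_tracker(clusters, is_alive):
    """Return the first (name, host) pair whose JobTracker answers.

    clusters is an ordered list of (name, host) pairs and is_alive is a
    callable that takes a host and returns True if it responds.
    """
    for name, host in clusters:
        if is_alive(host):
            return name, host
    raise RuntimeError('No JobTracker is reachable')


clusters = [('default', 'host-1'), ('ha-standby', 'host-2')]

# Pretend host-1 is down: the standby gets picked.
print(pick_job_tracker(clusters, is_alive=lambda host: host != 'host-1'))
```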

Notice that in the case of Oozie jobs, Oozie will try to re-submit the job but will need a logical name (HUE-1631). To enable this in Hue, specify it in each MapReduce cluster, e.g.:

[hadoop]
  [[mapred_clusters]]
    [[[default]]]
      # JobTracker logical name.
      ## logical_name=MY_NAME

As usual feel free to comment on the hue-user list or @gethue!


Use the Impala App with Sentry for real security

Apache Sentry (incubating) is the new way to provide security (e.g. privileges on SQL statements SELECT, CREATE...) when querying data in Hadoop. Impala offers fast SQL for Apache Hadoop and can leverage Sentry. Here is how to use it in Hue.

First, enable impersonation in the hue.ini so that permissions are checked against the current user and not ‘hue’, which acts as a proxy:

[impala]
  impersonation_enabled=True

Then you might hit this error:

User 'hue' is not authorized to impersonate 'romain'. User impersonation is disabled.

This is because Hue is not authorized to be a proxy. To fix it, start Impala with this flag:

--authorized_proxy_user_config=hue=*

Note: if you use Cloudera Manager, add it to the ‘Impalad Command Line Argument Safety Valve’
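To make the meaning of hue=* concrete, here is a small sketch of how such a value could be interpreted: the user on the left may impersonate the users listed on the right, and * means everyone. Only the hue=* form comes from this post. Separating several proxy users with ';' and several delegated users with ',' is an assumption of this sketch that should be checked against the Impala documentation, and both helpers are hypothetical.

```python
def parse_proxy_config(value):
    """Parse 'proxy=user1,user2;other=*' into {proxy: set of users}."""
    config = {}
    for entry in value.split(';'):
        if not entry.strip():
            continue
        proxy_user, _, delegates = entry.partition('=')
        config[proxy_user.strip()] = {d.strip() for d in delegates.split(',') if d.strip()}
    return config


def can_impersonate(config, proxy_user, target_user):
    """Return True if proxy_user may act on behalf of target_user."""
    allowed = config.get(proxy_user, set())
    return '*' in allowed or target_user in allowed


config = parse_proxy_config('hue=*')
print(can_impersonate(config, 'hue', 'romain'))
```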

And that’s it! You can now benefit from real security similar to Hive!

As usual feel free to comment on the hue-user list or @gethue!

Note: if you are on CDH4/Hue 2.x, make sure that Hue is configured to talk to Impala with the HiveServer2 API:

[impala]
  # Host of the Impala Server (one of the Impalad)
  server_host=nightly-1.ent.cloudera.com

  # The backend to contact for queries/metadata requests.
  # Choices are 'beeswax' or 'hiveserver2' (default).
  # 'hiveserver2' supports log, progress information, query cancellation
  # 'beeswax' requires Beeswax to run for proxying the metadata requests
  server_interface=hiveserver2

  # Port of the Impala Server
  # Default is 21050 as HiveServer2 Thrift interface is the default.
  # Use 21000 when using Beeswax Thrift interface.
  server_port=21050

  # Kerberos principal
  ## impala_principal=impala/hostname.foo.com

  impersonation_enabled=True