You can use CSV data and index files stored in HDFS for distributed analysis.
Register an HDFS directory
Steps:
- On the iServer service management page (http://supermapiserver:8090/iserver/manager), click "Cluster" > "Data Registration" to open the data registration page;
- Click "Register data storage" to register a data store;
- Enter a "Storage ID", select "Big data file share" from the "Data storage type" drop-down list, and select "HDFS directory" from the "Shared data type" drop-down list;
- When configuring the "HDFS directory", enter the path that matches your case:
  - To register a single CSV file stored on HDFS, enter the HDFS path followed by the CSV file name, for example: hdfs://{ip}:9000/data/newyork_taxi_2013-01_14k.csv
  - To register a directory of CSV files, that is, a folder such as csvfolder that contains several CSV files (the HDFS directory can only be opened in read-only mode):
    - If the CSV files all have the same fields and attribute formats, enter the parent directory of the folder that holds the CSV files, for example hdfs://{ip}:9000/data, where "data" is the parent directory of csvfolder.
    - If the CSV files have different fields or attribute formats, enter the folder that holds the CSV files, for example hdfs://{ip}:9000/data/csvfolder.
  - To register a directory that contains an index file, enter the directory where the index file is located, for example hdfs://{ip}:9000/data/indexfolder, where "indexfolder" contains an index file.
- If Kerberos authentication is enabled for the HDFS directory you are registering, check "HDFS cluster has Kerberos authentication turned on" and configure the following items:
  - principal name: username@domain, for example iServer@SUPERMAP1.COM
    - username: must be a system user on the master node of the HDFS cluster.
    - domain: the domain set by the Kerberos service (must be the same as the domain of the HDFS cluster).
  - principal keytab path: the location on the machine running the iServer service to which the keytab file is copied. The keytab file is generated when the cluster is built.
  - config file path: the location of the Kerberos client config file on the machine running the iServer service.
- Click the "Register data Storage" button to complete the registration.
You can now use the registered data in distributed analysis services.
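
The path rules in the steps above can be summarized as a small helper. This is a minimal sketch in Python and not part of iServer: the function name `hdfs_registration_path`, its parameters, the `kind` labels, and the host name in the usage lines are all hypothetical. It only illustrates which value to type into the "HDFS directory" field. Port 9000 is taken from the examples above.

```python
import posixpath


def hdfs_registration_path(namenode, location, kind, port=9000):
    """Return the value to enter in the "HDFS directory" field.

    Hypothetical helper for illustration only; not an iServer API.
    kind is one of:
      "single_csv"       - location is the CSV file itself
      "same_schema_dir"  - location is a folder of CSV files with identical fields
      "mixed_schema_dir" - location is a folder of CSV files with differing fields
      "index_dir"        - location is a folder that contains an index file
    """
    location = "/" + location.strip("/")
    if kind == "same_schema_dir":
        # Identical fields: register the parent of the CSV folder.
        location = posixpath.dirname(location)
    elif kind not in ("single_csv", "mixed_schema_dir", "index_dir"):
        raise ValueError(f"unknown kind: {kind!r}")
    return f"hdfs://{namenode}:{port}{location}"


# Usage with a placeholder host name (an assumption, replace with your own).
print(hdfs_registration_path("namenode.example.com", "/data/csvfolder", "same_schema_dir"))
print(hdfs_registration_path("namenode.example.com", "/data/csvfolder", "mixed_schema_dir"))
```

With identical fields the helper returns the parent directory (`.../data`), and with differing fields it returns the folder itself (`.../data/csvfolder`), mirroring the two directory cases in the steps.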
Note:
1. If you are registering CSV files, the data must be verified before it can be used by the distributed analysis service. For details, see: csv data verification.
2. If you are registering an HDFS directory with Kerberos authentication enabled, it is available for distributed analysis only when the distributed analysis service uses a Hadoop Yarn cluster that also has Kerberos authentication enabled.
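
Before filling in the Kerberos settings, it can help to confirm that the principal name follows the username@domain form described in the steps. The following is a minimal sketch in Python, not an iServer or Kerberos API: the function name `split_principal` is hypothetical, and the pattern covers only the two-part form shown on this page.

```python
import re

# Assumed shape of the principal name: username@domain, as in iServer@SUPERMAP1.COM.
_PRINCIPAL_RE = re.compile(r"(?P<username>[^@\s/]+)@(?P<domain>[^@\s]+)")


def split_principal(principal):
    """Split 'username@domain' into (username, domain).

    Hypothetical helper for illustration; raises ValueError when the
    text does not match the two-part form.
    """
    match = _PRINCIPAL_RE.fullmatch(principal)
    if match is None:
        raise ValueError(f"expected username@domain, got {principal!r}")
    return match.group("username"), match.group("domain")


username, domain = split_principal("iServer@SUPERMAP1.COM")
print(username)  # the part that must be a system user on the HDFS master node
print(domain)    # the part that must match the domain of the HDFS cluster
```

The check is purely syntactic. Whether the user exists on the master node and whether the domain matches the cluster still has to be confirmed on the cluster itself.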