ADMINISTRATOR'S MANUAL
Made by Neit Consulting, s.r.o.
Date of last update: 10. 4. 2017
Version: 2.
Contents
A. Preface
B. Filetype administrator
B.1. Filetype definition
B.2. Definitions of tags
B.3. Filetypes
C. System admin
C.1. Storage definitions
C.2. Authorized IPs
C.3. Servers
C.4. Statistics
D. User account admin
D.1. User management
D.2. Groups management
E. Filter results admin
F. File admin
A. Preface
This manual describes the administration of the CloudStorage web interface. It describes in detail the individual administrator roles that can be assigned to an account. Administrator roles can be combined, and the User account admin is responsible for distributing them. In this manual, no groups with FTDs are assigned to the individual administrator accounts.
The System admin creates and maintains storage accounts and is responsible for authorizing new IP addresses and maintaining those already authorized. He is also responsible for adding servers, monitoring their load and maintaining them.
The User account admin takes care of the complete administration of administrator roles, both for existing user accounts and for new accounts created by this administrator. He also manages groups containing file type definitions, which he then assigns to user accounts.
The Filetype administrator takes care of creating, setting up and managing file type definitions, tag definitions and file types.
The file auditor (File admin) has the right to see all files. However, he has no right to change or search these files.
The filter result auditor (Filter results admin) can see all filters created over the files, but he cannot search or create his own filters. He also sees file names in running processes.
B. Filetype administrator
The Filetype administrator has rights to manage filetype definitions (hereafter FTD), tag definitions and filetypes. He can modify their values, create new ones, delete them and lock them.
B.1. Filetype definition
An FTD is required to upload a file to the repository successfully. After entering all values, note the FTD identifier, because you will need it to upload files through the CLI. You can edit, lock and unlock an FTD.
The first screen lists all FTDs and allows them to be edited, deleted, locked or unlocked. It also lets you filter the individual items, which saves time when searching for a particular FTD.
When editing an existing FTD, the file type, indexing and selected storage cannot be modified, for backward compatibility. After selecting a particular FTD, you can delete all files stored under it.
Figure 1: FTD Admin menu
Figure 2: List of filetype definitions
Setup for filetype definition:
Identifier: FTD name.
Delete after [Days]: Number of days after which the FTD is automatically removed. To prevent automatic removal, enter 0.
Delete temporary copy in [Days]: Number of days after which temporary copies of files under this FTD are deleted. To prevent automatic removal, enter 0.
Storage definition: Select the account: OracleCloud, LocalStorage or Hadoop.
Compress algorithm: Select the compression algorithm: GZIP or XZ.
Compress ratio: Select the compression level.
Allow all filetypes: Enabling all file types for the created FTD allows you to upload all files whose extensions are listed under "File Types" in the menu.
Figure 3: Addition of filetype definition
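The retention and compression settings above can be illustrated with a short Python sketch. It assumes that "Compress ratio" maps directly to the level of Python's standard gzip and lzma (XZ) modules and that the "Delete after" countdown starts on the upload date; neither mapping is documented for CloudStorage, and the function names are hypothetical.

```python
import gzip
import lzma
from datetime import date, timedelta
from typing import Optional


def compress(data: bytes, algorithm: str, level: int) -> bytes:
    """Compress data with the FTD's algorithm (hypothetical helper).

    Assumption: the FTD "Compress ratio" is passed straight through as the
    gzip compresslevel or the XZ preset, both of which accept 0 to 9.
    """
    if not 0 <= level <= 9:
        raise ValueError(f"compression level must be 0-9, got {level}")
    if algorithm == "GZIP":
        return gzip.compress(data, compresslevel=level)
    if algorithm == "XZ":
        return lzma.compress(data, format=lzma.FORMAT_XZ, preset=level)
    raise ValueError(f"unsupported algorithm: {algorithm!r}")


def removal_date(uploaded: date, delete_after_days: int) -> Optional[date]:
    """Return the automatic removal date, or None when the value is 0.

    Assumption: the countdown starts on the upload date.
    """
    if delete_after_days < 0:
        raise ValueError("days must not be negative")
    if delete_after_days == 0:
        return None  # 0 disables automatic removal, as in the FTD form
    return uploaded + timedelta(days=delete_after_days)


if __name__ == "__main__":
    payload = b"id;name\n1;Alice\n" * 100
    packed = compress(payload, "XZ", 6)
    assert lzma.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
    print(removal_date(date(2017, 4, 10), 30))  # 2017-05-10
    print(removal_date(date(2017, 4, 10), 0))   # None
```

Higher levels trade CPU time for smaller files; XZ usually compresses text better than GZIP but is slower.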
Allowed suffixes
If the FTD does not have all file types enabled, you need to add at least one file type extension (suffix). For each suffix you can specify an index type or lock it. These extensions are expected and required for the FTD when uploading. The index type defines whether the file should be indexed; when indexing is enabled, the indexing method must be specified. For CSV files, the columns to be indexed can be defined by assigning columns.
The following options delete suffixes, indexes or files in bulk:
Delete Suffix: Deletes the suffix from the FTD's allowed extensions. This option is available only if no files with that suffix have been uploaded under the FTD.
Delete Index: Deletes all data from the index. Uploaded files are left in the storage.
Delete Files: Deletes all data from the index and deletes from the storage all files uploaded under that suffix.
Figure 4: Allowed suffixes
All allowed filetypes
Checking "Enable All File Types" allows the user to upload files of all file types. However, the extensions actually in use should still be added to the allowed suffixes. When files whose extensions are not among the allowed suffixes are uploaded, a "Delete other files" button appears in the FTD settings together with the number of such files. The button deletes these files. For example, with this FTD you can upload csv and pdf files; the csv suffix is included, while pdf is not.
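The "Delete other files" counter described above can be sketched as a suffix check. The function name and the case-insensitive comparison are assumptions for illustration, not CloudStorage behaviour confirmed by this manual; the example reuses the csv/pdf scenario.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def other_files(filenames: Iterable[str], allowed_suffixes: Iterable[str]) -> List[str]:
    """Return files whose suffix is not among the FTD's allowed suffixes.

    Assumption: suffixes are compared case-insensitively and without the
    leading dot; a file without any suffix counts as an "other" file.
    """
    allowed = {s.lower().lstrip(".") for s in allowed_suffixes}
    return [
        name
        for name in filenames
        if PurePosixPath(name).suffix.lower().lstrip(".") not in allowed
    ]


if __name__ == "__main__":
    uploaded = ["sales.csv", "contract.pdf", "export.CSV", "scan.pdf"]
    others = other_files(uploaded, ["csv"])
    print(f"Delete other files ({len(others)})")  # Delete other files (2)
    print(others)  # ['contract.pdf', 'scan.pdf']
```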
Index types
Text by lines: Every line is indexed as a separate document.
Text by chunks: A certain number of lines is indexed as one chunk. The number of lines in one document depends on their size.
Text file content: All text in the file (without, e.g., pictures) is indexed into a single document. During indexing, the file content is extracted by the Tika parser.
File content by lines: Every line is indexed into a separate document. During indexing, the file content is extracted by the Tika parser.
CSV by documents: Selected values in a row are indexed, together with the column names, into a separate document. The index format is column_name:row_value. For a CSV file it is possible to index only certain columns, by entering the names of columns that the file contains in its header.
CSV by lines: Selected values of a row are indexed into a separate document. Individual values are separated by a space. The index format is: row_value1: row_value2.
CSV by chunks: Indexes the CSV by lines and stores the indexes in chunks. The number of created documents depends on the size and number of rows in a chunk.
Divided documents: For this type of indexing, the separator by which the individual values in a line are separated must be specified (e.g. "," or "-"). If the file has no header, the columns to be indexed can be defined by their order, prefixed with #. For example, the entry for indexing the second column is #2. The index format is then: #1: line_value #2: line_value. If the file contains a header, column names can be entered instead of # numbers; the entry then looks the same as for CSV by documents indexing. Each row is indexed into a separate document.
Divided lines: Works on the same principle as the previous method. The delimiter must be specified, and you must choose whether the file contains a header. In both cases (with and without a header) the record format is: row_value1: row_value2. Every row is indexed into a separate document.
XML Documents: When indexing XML documents, you can enter paths to the individual entities that the valid XML document contains. These parts of the document are addressed with XPath. Paths can be combined freely, provided that they share the same root. For example, xml/doc/value and pdf/value cannot be combined, because they do not share the same root. Two valid examples:
xml/doc/value: indexes the values that the element value contains.
xml/doc[name]: indexes the values of the attribute name in the element doc.
List of tag definitions
Every FTD can contain any number of tags, by which file names can be searched and filtered. For each tag, its identifier, its description and whether it is compulsory are displayed.
Figure 5: List of tag definitions
List of authorized IP addresses
Any number of authorized IP addresses that can access that FTD through the CLI can be added to the list.
Figure 6: List of authorized IP addresses
B.2. Definitions of tags
To upload files to the storage, you also need to define tags. Tag definitions are used to filter file names; a file name can contain values of data types such as text, date, boolean, real number, integer or IP address. Tags can be filtered by their identifier. When editing, all tag parameters can be changed.
A tag definition contains:
Identifier: Tag name.
Compulsory: Whether the tag is checked as compulsory in the name of an uploaded or searched file.
Data type: The data type that the value in the file name must correspond to.
Regular expression: The standard value must match the regular expression. Use only regex metacharacters for this purpose.
Standard value: Optional. The value that the file must contain.
Description: Optional. Description of the tag definition, for example what it is used for or which values it matches.
Figure 7: List of tag definitions
Figure 8: Edit and save tag definition
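A tag definition combines a data type check with an optional regular expression. The Python sketch below shows one way such a definition could be validated. The data type names, the ISO date format and full-match regex semantics are assumptions. How CloudStorage extracts tag values from file names is not covered by this manual, so the sketch starts from an already extracted value.

```python
import ipaddress
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

DATA_TYPES = {"text", "integer", "real", "boolean", "date", "ip"}


@dataclass
class TagDefinition:
    identifier: str
    compulsory: bool
    data_type: str
    regex: Optional[str] = None
    standard_value: Optional[str] = None


def matches_type(value: str, data_type: str) -> bool:
    """Check that a tag value parses as the tag's data type."""
    if data_type not in DATA_TYPES:
        raise ValueError(f"unknown data type: {data_type!r}")
    try:
        if data_type == "integer":
            int(value)
        elif data_type == "real":
            float(value)
        elif data_type == "boolean":
            return value.lower() in ("true", "false")
        elif data_type == "date":
            datetime.strptime(value, "%Y-%m-%d")  # assumed date format
        elif data_type == "ip":
            ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


def validate_tag(defn: TagDefinition, value: Optional[str]) -> bool:
    """Validate one tag value taken from a file name."""
    if value is None:
        return not defn.compulsory  # a missing tag is fine only if optional
    if not matches_type(value, defn.data_type):
        return False
    # Assumption: the whole value must match, not just a substring.
    return defn.regex is None or re.fullmatch(defn.regex, value) is not None


def check_definition(defn: TagDefinition) -> bool:
    """The standard value, if given, must match the regular expression."""
    if defn.standard_value is None or defn.regex is None:
        return True
    return re.fullmatch(defn.regex, defn.standard_value) is not None
```

For example, a compulsory tag TagDefinition("year", True, "integer", r"\d{4}", "2017") accepts "2017" but rejects "17" and a missing value.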
B.3. Filetypes
Filetypes are predefined. For some of them you can change whether they are indexed and how. You can delete unused filetypes. If a filetype is missing, you can extend the list yourself with the Add button. The list of filetypes can be filtered by index or by filetype name.
Figure 9: List of filetypes
When adding a file type, it is advisable to choose the possible ways of indexing the files. The indexing options set here are offered as indexing options when new FTDs are created.
Figure 10: Add file type
File type: File extension. Compulsory.
Ways of indexing:
Index as text: Text by lines, Text by chunks
Index by file content: Text file content, File content by lines
Index as CSV: CSV by documents, CSV by lines, CSV by chunks
Index using delimiter: Divided documents, Divided lines
Index as XML: XML Documents
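To make the CSV index types offered here more concrete, the Python sketch below reproduces the documented output format of "CSV by documents" (column_name:row_value, optionally limited to selected header columns) and of "Divided documents" for files without a header (columns addressed as #N). It is an illustration under assumptions. Each document is modelled as a list of entries, because the manual does not say how entries are joined, and the function names are hypothetical.

```python
import csv
import io
from typing import List, Optional, Sequence


def csv_by_documents(text: str, columns: Optional[Sequence[str]] = None,
                     delimiter: str = ",") -> List[List[str]]:
    """One document per row; each entry has the form column_name:row_value.

    `columns` selects header columns to index; None indexes all of them.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None:
        return []  # empty input, no header
    selected = list(columns) if columns else list(reader.fieldnames)
    missing = [c for c in selected if c not in reader.fieldnames]
    if missing:
        raise ValueError(f"columns not in header: {missing}")
    return [[f"{c}:{row[c]}" for c in selected] for row in reader]


def divided_documents(text: str, positions: Sequence[str],
                      delimiter: str) -> List[List[str]]:
    """Headerless variant: columns are chosen by position, e.g. "#2".

    Assumption: positions are 1-based, as in the manual's "#2" example.
    """
    indexes = [int(p.lstrip("#")) for p in positions]
    docs = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if row:  # skip empty lines
            docs.append([f"#{i}:{row[i - 1]}" for i in indexes])
    return docs


if __name__ == "__main__":
    sample = "id,name,city\n1,Alice,Brno\n2,Bob,Praha\n"
    print(csv_by_documents(sample, ["name", "city"]))
    # [['name:Alice', 'city:Brno'], ['name:Bob', 'city:Praha']]
    print(divided_documents("a-b-c\nd-e-f\n", ["#1", "#3"], "-"))
    # [['#1:a', '#3:c'], ['#1:d', '#3:f']]
```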
C. System admin
C.1. Storage definitions
Accounts serve as storage definitions. When defining a new account, you can choose one of three service providers: Oracle Cloud, LocalStorage and Hadoop. When creating a new account, you must enter an identifier, a storage type and a prescription (container prefix). The other parameters differ for each service provider.
Figure 11: System admin menu
Mandatory parameters for all storage definitions:
Storage type: Select the storage type for this account.
Identifier: Account name.
Prescription: The prefix that separates individual containers from those of other accounts, to avoid confusion or unwanted deletion. Containers are stored with this prefix.
Oracle Cloud
Mandatory parameters for an OracleCloud account:
Storage type: Select the storage type for this account.
Password: Password for the cloud account.
Host: The address through which you access the Oracle Cloud repository.
Service name: The name of the service that is created for you on Oracle Cloud for storage purposes.
When editing, you can only edit the identifier, cloud password, prescription and username.
Figure 12: OracleCloud account
Local Storage
Mandatory parameters for creating a LocalStorage account:
Folder for record: Path to the folder where the data will be stored.
When editing, you can only edit the identifier.
Figure 13: LocalStorage account
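The prescription (prefix) keeps each account's containers apart. The Python sketch below illustrates the idea with a hypothetical naming scheme in which the prefix is simply prepended to the container name; the manual does not state the exact scheme, so the function names and the "acme-" prefix are illustrative only.

```python
from typing import Iterable, List


def container_name(prefix: str, name: str) -> str:
    """Hypothetical naming scheme: the account prefix is prepended to the name."""
    return f"{prefix}{name}"


def own_containers(prefix: str, containers: Iterable[str]) -> List[str]:
    """Return only the containers that belong to the account with this prefix."""
    return [c for c in containers if c.startswith(prefix)]


def delete_container(prefix: str, container: str) -> None:
    """Refuse to touch containers of other accounts (actual deletion omitted)."""
    if not container.startswith(prefix):
        raise PermissionError(f"{container!r} does not belong to prefix {prefix!r}")
    print(f"would delete {container}")


if __name__ == "__main__":
    existing = ["acme-logs", "acme-csv", "beta-logs"]
    print(own_containers("acme-", existing))  # ['acme-logs', 'acme-csv']
```

Ending a prefix with a separator such as "-" avoids an account prefixed "acme" also matching containers of an account prefixed "acmecorp".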
Hadoop
Mandatory parameters for creating a Hadoop account:
Username: The username required to access the Hadoop file system.
Host: The IP address through which the account connects to Hadoop.
Port: Port of the Hadoop WebHDFS service (the NameNode HTTP interface). The default port number is 50070.
Byte for checksum: Number of bytes covered by one checksum, used to detect errors that occur during transmission or storage. The default value is 512.
Checksum per block: The size of the file system block. The default value is 128 MB.
Figure 14: Hadoop account
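The Hadoop parameters correspond to standard WebHDFS and HDFS settings. The sketch below shows how a client could build a WebHDFS REST URL from the Host, Port and Username fields and how the default checksum and block sizes relate. It is not CloudStorage's own connector, and the host, path and username in the example are hypothetical. Note that 50070 is the NameNode HTTP port in Hadoop 2; Hadoop 3 uses 9870 by default.

```python
from urllib.parse import quote, urlencode

DEFAULT_PORT = 50070                    # default WebHDFS port in the form
DEFAULT_BYTES_PER_CHECKSUM = 512        # default "Byte for checksum"
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024  # default block size, 128 MB


def webhdfs_url(host: str, path: str, op: str, username: str,
                port: int = DEFAULT_PORT) -> str:
    """Build a WebHDFS REST URL: http://<host>:<port>/webhdfs/v1/<path>?op=..."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, "user.name": username})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def checksums_per_block(block_size: int = DEFAULT_BLOCK_SIZE,
                        bytes_per_checksum: int = DEFAULT_BYTES_PER_CHECKSUM) -> int:
    """Number of checksums stored for one full block."""
    return block_size // bytes_per_checksum


if __name__ == "__main__":
    # Hypothetical host and username, for illustration only.
    print(webhdfs_url("10.0.0.5", "/cloudstorage/data", "LISTSTATUS", "cs"))
    # http://10.0.0.5:50070/webhdfs/v1/cloudstorage/data?op=LISTSTATUS&user.name=cs
    print(checksums_per_block())  # 262144
```

With the defaults, every full 128 MB block carries 262144 checksums of 512 bytes each; a smaller "Byte for checksum" detects errors at a finer granularity at the cost of more checksum data.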
C.2. Authorized IPs
Authorized IP addresses are the list of addresses that have CLI access to CloudStorage. Users of the web interface are not restricted by this list.
IP address options:
IP address is allowed: Access allowed.
IP address is banned: Access denied.
IP address not found: Waiting for approval.
Delete IP address: An IP address cannot be deleted if it is assigned to an FTD.
Access prevention: Instead of removing the address, you can only ban it, which prevents that IP address from accessing the application via CLI.
First attempt to connect via CLI: A new address that tried to access the application via CLI is displayed in the list with the pending approval status.
Setup address mask: Allowed addresses may have a mask set. The mask specifies a larger range of allowed addresses.
Figure 15: Authorized IP addresses
Figure 16: IP address edit
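The interplay of allowed, banned and unknown addresses, including masks, can be sketched with Python's standard ipaddress module. The rule list, the status names and the policy that a ban wins over a matching allow are assumptions for illustration; the manual does not describe how CloudStorage evaluates overlapping entries.

```python
import ipaddress
from enum import Enum
from typing import Iterable, Tuple


class IpStatus(Enum):
    ALLOWED = "access allowed"
    BANNED = "access denied"
    PENDING = "waiting for approval"


def cli_access(client_ip: str, rules: Iterable[Tuple[str, IpStatus]]) -> IpStatus:
    """Decide CLI access for one client address.

    rules holds (address or address/mask, status) pairs. Assumptions: a ban
    wins over any matching allow, and an address matching no rule is treated
    as a first connection attempt, i.e. it waits for approval.
    """
    addr = ipaddress.ip_address(client_ip)
    result = IpStatus.PENDING
    for network, status in rules:
        # strict=False accepts masks written with host bits set, e.g. 10.0.0.5/24
        if addr in ipaddress.ip_network(network, strict=False):
            if status is IpStatus.BANNED:
                return IpStatus.BANNED
            if status is IpStatus.ALLOWED:
                result = IpStatus.ALLOWED
    return result


if __name__ == "__main__":
    rules = [("192.168.10.0/24", IpStatus.ALLOWED), ("192.168.10.66", IpStatus.BANNED)]
    for ip in ("192.168.10.5", "192.168.10.66", "10.1.2.3"):
        print(ip, cli_access(ip, rules).value)
```

Here the mask /24 allows the whole 192.168.10.x range, while the single banned address inside it is still rejected.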
C.3. Servers
Each server must have a server role set: master or slave. For individual servers, you can see how their threads are loaded, and you can turn off individual actions. The administrator can also see the status of each server: whether it is enabled, connected, or has created threads. The usage column shows the current percentage of usage and the number of transferred files. Servers can be added, edited, enabled, disabled and deleted.
Figure 17: Servers
There are two server roles. The master is the main server, and there is only one; its ID is always master. The second role is slave, which is used to distribute the load when receiving new data.
To connect a new slave server, first add a new server in the application and then set the individual values in the configuration file. For the server to work properly, the values of the Host and UID parameters must match the values set in cloudstorage_slave.properties. The other parameters are optional and depend on your discretion and the server's capabilities.
Configuration file settings and the corresponding application settings:
slave.uid: Server name (UID).
slave.uri: Server address (Host).
master.uri: Address on which the master server is running.
Mandatory parameters for the server:
UID: Server name.
Host: Address on which the server will run.
Transmission threads: The number of threads that transfer the data.
Computing threads: The number of threads dedicated to computations.
Used space [GiB]: The disk space that the server reserves for its operation.
Allow: On/Off. This applies only to uploading data to the server.
Figure 18: Add/Edit server
C.4. Statistics
Under this tab, you can monitor the disk and RAM usage of the MASTER server. There are also graphs of the usage of the individual data repositories.
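Because the Host and UID entered in the application must match cloudstorage_slave.properties (C.3), a pre-flight check can catch mismatches before a slave is connected. The Python sketch below assumes simple key=value lines and that slave.uri holds either a bare host or a URI. The real layout of the file and the addresses and ports in the sample are not documented here and are hypothetical.

```python
from typing import Dict, List
from urllib.parse import urlparse


def parse_properties(text: str) -> Dict[str, str]:
    """Minimal reader for simple key=value lines.

    Skips blank lines and # or ! comments; line continuations, escapes and
    ":" separators of the full Java .properties format are not handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        props[key.strip()] = value.strip() if sep else ""
    return props


def host_of(uri: str) -> str:
    """Accept either a bare host or a URI such as http://host:port."""
    if "://" in uri:
        return urlparse(uri).hostname or ""
    return uri


def check_slave_config(props: Dict[str, str], uid: str, host: str) -> List[str]:
    """Compare the properties file with the UID and Host entered in the application."""
    problems = []
    if props.get("slave.uid") != uid:
        problems.append(f"slave.uid is {props.get('slave.uid')!r}, expected {uid!r}")
    if host_of(props.get("slave.uri", "")) != host:
        problems.append(f"slave.uri {props.get('slave.uri')!r} does not point to {host!r}")
    if not props.get("master.uri"):
        problems.append("master.uri is missing")
    return problems


if __name__ == "__main__":
    sample = """
    # hypothetical values
    slave.uid=slave-01
    slave.uri=http://10.0.0.7:8080
    master.uri=http://10.0.0.2:8080
    """
    print(check_slave_config(parse_properties(sample), "slave-01", "10.0.0.7"))  # []
```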
D. User account admin
The User account admin creates and maintains user accounts. For each account, he adds administrator roles or allocates the appropriate FTD groups. When managing groups, he takes care of creating them and then allocating FTDs to them. For individual FTDs in a group, permissions can be assigned that define which actions can be performed over that FTD. Individual groups and their permissions can be edited and deleted.
D.1. User management
An administrator with the "User Account Manager" role can view and manage individual user accounts. The welcome screen of User management allows you to add, edit, delete, disable and re-enable user accounts.
Figure 19: User account admin menu
User account options:
Edit: Allows you to customize user data and to add or remove admin roles. You can also assign groups to the account or remove them. The group leader sees the filters created over the group; otherwise he has the same rights as any other user of that group.
Ban: Disables the use of the user account.
Allow: Enables a disabled user account.
Delete: Deletes the user account.
Figure 20: User management
Figure 21: User edit
D.2. Groups management
The opening screen of the "Groups management" tab only allows you to edit or remove a group. The button for adding a new group is in the upper right corner.
When editing or adding a group, enter the group identifier and select the individual FTDs to be included in the group. For each FTD, you can manage the permissions that users with the assigned group will have over that FTD.
File type definition permissions:
Identifier: Name of the file type definition.
Search: Allows users to search in files.
Filter: Authorizes users to create filters over the searched files.
Download: Permission to download files.
Upload: Permission to upload files.
Reindex: Entitles the user to re-index the selected file.
Manage tags: Allows users to edit individual tags of files.
Delete: Permission to delete files.
Figure 22: Groups management
Figure 23: Edit group
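The per-FTD permissions above map naturally onto bit flags. The Python sketch below models them with enum.Flag and checks whether a user may perform an action on an FTD. The rule that a user in several groups receives the union of their grants is an assumption, as are the group and FTD names in the example.

```python
from enum import Flag, auto
from typing import Dict, Iterable


class Perm(Flag):
    """Per-FTD permissions from the group edit form."""
    NONE = 0
    SEARCH = auto()
    FILTER = auto()
    DOWNLOAD = auto()
    UPLOAD = auto()
    REINDEX = auto()
    MANAGE_TAGS = auto()
    DELETE = auto()


# group name -> FTD identifier -> granted permissions
Groups = Dict[str, Dict[str, Perm]]


def effective_permissions(user_groups: Iterable[str], groups: Groups, ftd: str) -> Perm:
    """Assumption: a user in several groups gets the union of their grants."""
    perms = Perm.NONE
    for name in user_groups:
        perms |= groups.get(name, {}).get(ftd, Perm.NONE)
    return perms


def is_allowed(user_groups: Iterable[str], groups: Groups, ftd: str, action: Perm) -> bool:
    """True when every bit of `action` is granted for the FTD."""
    return action in effective_permissions(user_groups, groups, ftd)


if __name__ == "__main__":
    groups: Groups = {  # hypothetical groups and FTD identifier
        "analysts": {"invoices": Perm.SEARCH | Perm.FILTER | Perm.DOWNLOAD},
        "loaders": {"invoices": Perm.UPLOAD},
    }
    print(is_allowed(["analysts"], groups, "invoices", Perm.UPLOAD))             # False
    print(is_allowed(["analysts", "loaders"], groups, "invoices", Perm.UPLOAD))  # True
```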
E. Filter results admin
The Filter results admin sees all filtering results, regardless of which FTDs have been allocated to him. He has no rights to work with any filter: he cannot view, download or delete them. He can see the author of each filter and the group under which the filter was created. The role is used to navigate through the filters.
Figure 24: Filter results admin menu
F. File admin
The File admin sees all files stored in all repositories, regardless of allocated FTDs. He has no rights to perform any action on these files: he cannot download, add, edit or delete them. The role is designed for simply viewing files in the application.
Figure 25: File admin menu