Comprehend-NER

Watch the video below for a quick overview of Comprehend NER project creation. Each step and option is described in detail further down this page.

Create Project

  • On the Projects menu in the left-side navigation pane, click Create New Project

  • Enter a unique Project Name and a relevant Project Description

  • From the project type drop-down, select Comprehend-NER and click Next

_images/Comprehend-NER-Project.gif

File Upload

  • To add text documents to this project from your local workstation, click Drop files here or click to upload, and choose one or more files to upload. Click Next to proceed.

  • Files for the project can also be referenced from an AWS S3 bucket as the source. Refer to the AWS Security section for more information.

Labels

Textract processing

  • Extract Tables using Amazon Textract - extracts tables from the documents using Amazon Textract.
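As a rough illustration of what table extraction involves, the sketch below rebuilds table grids from the Blocks list of an Amazon Textract AnalyzeDocument response requested with the TABLES feature. It assumes the standard response shape, in which TABLE blocks point to CELL blocks through CHILD relationships and CELL blocks point to WORD blocks. How the workbench itself stores extracted tables is not described here, and the function name `tables_from_textract` is hypothetical.

```python
from typing import Any


def tables_from_textract(response: dict[str, Any]) -> list[list[list[str]]]:
    """Rebuild each TABLE block of a Textract response as a grid of cell strings."""
    blocks = {block["Id"]: block for block in response.get("Blocks", [])}

    def child_ids(block: dict[str, Any]) -> list[str]:
        # Collect Ids from CHILD relationships; other relationship types are ignored.
        ids: list[str] = []
        for rel in block.get("Relationships", []):
            if rel.get("Type") == "CHILD":
                ids.extend(rel.get("Ids", []))
        return ids

    def children_of_type(block: dict[str, Any], block_type: str) -> list[dict[str, Any]]:
        return [blocks[i] for i in child_ids(block)
                if blocks.get(i, {}).get("BlockType") == block_type]

    tables: list[list[list[str]]] = []
    for block in blocks.values():
        if block.get("BlockType") != "TABLE":
            continue
        cells = children_of_type(block, "CELL")
        if not cells:
            tables.append([])
            continue
        n_rows = max(cell["RowIndex"] for cell in cells)
        n_cols = max(cell["ColumnIndex"] for cell in cells)
        grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
        for cell in cells:
            # RowIndex/ColumnIndex are 1-based; cell text is its WORD children joined by spaces.
            words = [word.get("Text", "") for word in children_of_type(cell, "WORD")]
            grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
        tables.append(grid)
    return tables
```

Merged cells and selection elements are ignored in this sketch to keep it short.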

Labels

  • Enable Labels

  • Enter Labels

  • Click Next

Advanced Settings

The Advanced Settings section contains options that control the behaviour of the task workflow.

Note

Configuration in Advanced Settings applies only to the selected project.

The following options are available in the Advanced Settings section; each is explained in detail below.

Alternative text Alternative text

Project Attributes

Each Project Attribute setting is explained below.

Allow text mode

This setting enables icons in the task that show the position of text, labels, and relationships in the document, depending on the values selected in the related Default toolbar checkbox values setting. If the Text option is checked, text mode is enabled by default in the task. If the Labels and Relationships options are checked, labels and relationships are displayed by default in the task.

Allow table mode

This setting enables the Table icon in the task.

To create a table in a task, click the Table icon at the top, then select the area where the table should be created.

Alternative text

The user can edit the table by right-clicking on cells. Multiple cells can also be selected and right-clicked together.

The context menu shows the available options.

Alternative text

Disable tag overlap

When selected, this setting disallows overlapping annotations in the document. When not selected, overlapping annotations are allowed.
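To make the overlap rule concrete, the sketch below treats each annotation as a half-open character span [begin, end) with a label and reports every pair of spans that overlap, which is the situation this setting blocks when it is selected. The `Span` type and the `find_overlaps` function are hypothetical illustrations, not the workbench's internal data model.

```python
from typing import NamedTuple


class Span(NamedTuple):
    begin: int  # inclusive start offset; spans are assumed non-empty (begin < end)
    end: int    # exclusive end offset
    label: str


def find_overlaps(spans: list[Span]) -> list[tuple[Span, Span]]:
    """Return every pair of spans whose character ranges overlap."""
    ordered = sorted(spans, key=lambda s: (s.begin, s.end))
    overlaps: list[tuple[Span, Span]] = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.begin >= first.end:
                break  # sorted by begin, so no later span can overlap `first`
            overlaps.append((first, second))
    return overlaps
```

Spans that merely touch (one ends where the next begins) are not treated as overlapping in this sketch.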

Disable Quality Audit

If this option is unchecked, tasks submitted by the annotator move to the Submitted for review state and must be reviewed by the QC/Reviewer. If this option is checked, there is no reviewer in the workflow: upon submission by the annotator, the task transitions directly to the Submitted state.

QC may request adjustment

This option is visible only if the Disable Quality Audit option is unchecked. When QC may request adjustment is unchecked, the following two options are available to the reviewer in the task:
  1. Annotations are ok - when the reviewer selects this option, the task transitions to the Approved state.

  2. Reject annotations - when the reviewer selects this option, the task transitions to the Rejected state.

When the QC may request adjustment option is checked, an additional option, Request Adjustment, is available to the reviewer alongside Annotations are ok and Reject annotations. If the reviewer selects Request Adjustment, the task is returned to the annotator with the Adjustment Requested status.
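The review workflow described above can be summarized as a small state machine. The sketch below encodes the transitions stated in this section, using the state names given here; the `next_state` function, the `"submit"` action string, and the idea that the workbench models this as a lookup are assumptions made only for illustration.

```python
def next_state(action: str, disable_quality_audit: bool, qc_may_request_adjustment: bool) -> str:
    """Return the task state after `action` under the rules in this section (hypothetical API)."""
    if action == "submit":
        # Without a reviewer, submission completes the task; otherwise it goes to QC.
        return "Submitted" if disable_quality_audit else "Submitted for review"
    if disable_quality_audit:
        raise ValueError("review actions are unavailable when Disable Quality Audit is checked")
    review_outcomes = {
        "Annotations are ok": "Approved",
        "Reject annotations": "Rejected",
    }
    if qc_may_request_adjustment:
        # The extra reviewer option sends the task back to the annotator.
        review_outcomes["Request Adjustment"] = "Adjustment Requested"
    if action not in review_outcomes:
        raise ValueError(f"action {action!r} is not available with these settings")
    return review_outcomes[action]
```

For example, with both checkboxes unchecked, `next_state("submit", False, False)` returns `"Submitted for review"`, and a reviewer's `"Request Adjustment"` raises an error because that option is not offered.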

Allow new label input

When this option is selected, an Add label option is available to users in the task label list, allowing them to add a new label from within the task.

Use Amazon Textract for OCR

Amazon Textract is a service that automatically extracts text, handwriting, and data from scanned documents.

It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.
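For reference, Textract returns its OCR output as a list of Blocks, and LINE blocks carry the recognized text line by line. The sketch below pulls plain text out of a DetectDocumentText or AnalyzeDocument response dictionary. It only illustrates the response structure; whether and how the workbench post-processes this output is not covered here, and the `text_lines` function is hypothetical.

```python
from typing import Any, Optional


def text_lines(response: dict[str, Any], page: Optional[int] = None) -> list[str]:
    """Return the text of LINE blocks in response order, optionally for one 1-based page."""
    lines: list[str] = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue
        # Treat a missing Page field as page 1.
        if page is not None and block.get("Page", 1) != page:
            continue
        lines.append(block.get("Text", ""))
    return lines
```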

List all tasks to Annotator/Review Team

If Full view is set, all tasks in the project are listed on the annotator's jobs page. The user can view all tasks, but some may be locked depending on ownership and task state.

Maximum concurrent tasks per annotator

This setting controls the maximum number of concurrent tasks that will be allotted per annotator.

Note

When the List all tasks to Annotator/Review Team setting is enabled, the Maximum concurrent tasks per annotator option is not available.
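One way to picture the Maximum concurrent tasks per annotator limit: an annotator who already holds the maximum number of open tasks receives no more until some are finished. The sketch below is a hypothetical round-robin allocator written only to illustrate that rule; the function name is an assumption, and the workbench's actual assignment order is not documented in this section.

```python
def allot_tasks(pending: list[str], open_counts: dict[str, int], max_concurrent: int) -> dict[str, list[str]]:
    """Assign task ids round-robin so no annotator exceeds max_concurrent open tasks."""
    assignments: dict[str, list[str]] = {name: [] for name in open_counts}
    queue = list(pending)  # copy so the caller's list is not modified
    progress = True
    while queue and progress:
        progress = False
        for name in open_counts:
            if not queue:
                break
            # Count tasks already open plus those assigned in this call.
            if open_counts[name] + len(assignments[name]) < max_concurrent:
                assignments[name].append(queue.pop(0))
                progress = True
    return assignments
```

Tasks left over once every annotator is at the limit simply stay unassigned in this sketch.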

Project Metadata

Under Project Metadata, labels and values can be set; these metadata values are preserved in the task, the export manifest file, and the project descriptor JSON file. This can be used to track and filter useful metadata related to the project.

Page and Document Attribute

This setting lets a user assign a customized status to an individual page of the document or to the entire document.

Page attribute appears as below in the task.

Alternative text

Document attribute appears as below in the task.

Alternative text

Webhook

Webhooks are automated messages that can be sent to a configured server URI in response to a specific event. Currently, webhooks are supported for Task update and Project update.
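A webhook endpoint is an HTTP server that accepts POST requests at the configured URI. The sketch below uses only Python's standard library to accept a JSON body and dispatch on an event name. The payload field `event` and its values `task_update` and `project_update` are placeholders; the actual payload schema and any request signing used by the workbench are not described in this section, so confirm them before relying on this.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_event(payload: dict) -> str:
    """Dispatch on a hypothetical `event` field; adapt to the real payload schema."""
    event = payload.get("event")
    if event == "task_update":
        return "task updated"
    if event == "project_update":
        return "project updated"
    return "ignored"


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # covers malformed JSON and undecodable bytes
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        body = json.dumps({"result": handle_event(payload)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Block forever, accepting POST requests on all interfaces at `port`."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve(8080)` starts the listener; in practice it would sit behind HTTPS at the URI configured in the Webhook setting.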

Annotation Format

There are three annotation formats available for the NER project.

  1. Comprehend Format Full

  2. Comprehend Format with linked Blocks

  3. ADL Bounding Box Format
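To illustrate offset-based entity annotations, the sketch below writes entities using the column layout that Amazon Comprehend documents for custom entity recognizer CSV annotation files (File, Line, Begin Offset, End Offset, Type). It is not the workbench's Comprehend Format Full or Comprehend Format with linked Blocks export, whose exact schemas are not given in this section. The `Entity` type and `to_comprehend_csv` function are hypothetical, and the offset conventions should be checked against the Comprehend documentation.

```python
import csv
import io
from typing import Iterable, NamedTuple


class Entity(NamedTuple):
    file: str    # name of the file containing the document
    line: int    # line number of the document within the file
    begin: int   # character offset where the entity starts
    end: int     # character offset where the entity ends
    label: str   # entity type, e.g. one of the project labels


def to_comprehend_csv(entities: Iterable[Entity]) -> str:
    """Serialize entities as CSV text with a header row of Comprehend-style column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["File", "Line", "Begin Offset", "End Offset", "Type"])
    for e in entities:
        writer.writerow([e.file, e.line, e.begin, e.end, e.label])
    return buffer.getvalue()
```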

AWS Security

This section defines how the workbench authenticates with the AWS account when the project is configured to reference data from a source S3 bucket. Three authentication methods are available:

  1. Use Global IAM Credentials - the workbench uses the global IAM credentials to authenticate to the S3 bucket.

  2. AWS Credentials - the user configures an AWS Access key, Secret key, and Region so that the Newton workbench can authenticate with AWS.

  3. ARN Role - the user specifies an AWS Role ARN and External Id for the workbench to authenticate with AWS. This is the most secure and preferred method.

For all of the above options, also specify the Intermediate S3 Path, which the workbench uses for task processing.
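With the ARN Role option, the role in your AWS account is typically protected by a trust policy that allows a trusted AWS account to assume it only when it presents the agreed External Id. The sketch below builds such a trust policy document with the standard `sts:ExternalId` condition key, assuming the workbench runs in a separate AWS account. The account ID and External Id values are placeholders, and the `trust_policy` function is hypothetical; use the values provided for your workbench deployment.

```python
import json


def trust_policy(trusted_account_id: str, external_id: str) -> str:
    """Build an IAM trust policy letting `trusted_account_id` assume the role only with `external_id`."""
    if not (len(trusted_account_id) == 12 and trusted_account_id.isdigit()):
        raise ValueError("AWS account IDs are 12 digits")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Must match the External Id entered in the workbench's ARN Role settings.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

For example, `print(trust_policy("111122223333", "example-external-id"))` prints a JSON document that can be attached to the role as its trust relationship. The role still needs a separate permissions policy granting access to the source bucket and the Intermediate S3 Path.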

Note

The user must click Save for any changes in Advanced Settings to take effect.

Teams

In this section, specify the users who will collaborate and work on the project tasks.

  1. Add Team member

    • Enter a username or email ID in the text box to filter the list, then select the desired user.

    • Choose the role: Annotator, Reviewer, or Supervisor.

    • Click Add Collaborator

  2. Delete Team member

    • Click the x symbol next to the user you want to remove from the project.

    • Click Yes to confirm; the user is removed from the project.

Browse tasks

This section displays a summary of the tasks in the selected project.

Alternative text