Input and Output Data

Input Data

The input data for uploading items to a dataset consists of manifest, CSV, and TXT files.

These files are used during dataset upload, specifically when the Select Manifest/CSV/TXT option is selected.

For the dataset types and their compatible file types, see the section Upload files to Dataset.

During dataset item upload, the manifest file must follow a specific format to be processed successfully. Additionally, metadata can be included in both manifest and CSV files.

  1. Use an input manifest file

The following is an example of a manifest file for files stored in an Amazon S3 bucket:

{"Sr_No":11,"source":"s3://EXAMPLE-BUCKET/example1.tiff","presigned":"http://abc.com"}
{"Sr_No":12,"source":"s3://EXAMPLE-BUCKET/example2.pdf","presigned":"http://abd.com"}

  2. Use an input CSV file

The following is an example of a CSV file for files stored in an Amazon S3 bucket:

id  File  Batch  Pages  Source
1   tif   1      1      s3://objectways-ergo-poc/input_documents/MLK_F4BKRD3R00FI2OA.tif
2   tif   2      2      s3://objectways-ergo-poc/input_documents/MLK_REZ_multipage.tif
3   tif   3      1      s3://objectways-ergo-poc/input_documents/AmbA_J3Y61NDT0021050L2.tif

The following is an example of a CSV file that uses text as the source:

origin      text
wikipedia   ChatGPT[a] is an artificial intelligence (AI) chatbot developed by OpenAI and released in November 2022. It is built on top of OpenAI's GPT-3.5 and GPT-4 foundational large language models (LLMs) and has been fine-tuned (an approach to transfer learning) using both supervised and reinforcement learning techniques.
wikipedia   ChatGPT launched as a prototype on November 30, 2022, and garnered attention for its detailed responses and articulate answers across many domains of knowledge.[3] Its propensity, at times, to confidently provide factually incorrect responses, however, has been identified as a significant drawback.[4] In 2023, following the release of ChatGPT, OpenAI's valuation was estimated at US$29 billion.[5] The advent of the chatbot has increased competition within the space, motivating the creation of Google's Bard and Meta's LLaMA.
wikipedia   The original release of ChatGPT was based on GPT-3.5. A version based on GPT-4, the newest OpenAI model, was released on March 14, 2023, and is available for paid subscribers on a limited basis.
wikipedia   ChatGPT is a member of the generative pre-trained transformer (GPT) family of language models. It was fine-tuned over an improved version of OpenAI's GPT-3 known as "GPT-3.5".[6]
wikipedia   The fine-tuning process leveraged both supervised learning as well as reinforcement learning in a process called reinforcement learning from human feedback (RLHF).[7][8] Both approaches use human trainers to improve the model's performance. In the case of supervised learning, the model was provided with conversations in which the trainers played both sides: the user and the AI assistant. In the reinforcement learning step, human trainers first ranked responses that the model had created in a previous conversation.[9] These rankings were used to create "reward models" that were used to fine-tune the model further by using several iterations of Proximal Policy Optimization (PPO).[7][10]
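
The CSV examples above are shown as rendered tables. The sketch below writes the same columns with Python's standard `csv` module, assuming the uploader accepts an ordinary comma-separated file with a header row. The helper name `rows_to_csv` is hypothetical.

```python
import csv
import io

def rows_to_csv(rows, fieldnames):
    """Render a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# File-based items: each row points at an object in S3.
file_rows = [
    {"id": 1, "File": "tif", "Batch": 1, "Pages": 1,
     "Source": "s3://objectways-ergo-poc/input_documents/MLK_F4BKRD3R00FI2OA.tif"},
    {"id": 2, "File": "tif", "Batch": 2, "Pages": 2,
     "Source": "s3://objectways-ergo-poc/input_documents/MLK_REZ_multipage.tif"},
]
file_csv = rows_to_csv(file_rows, ["id", "File", "Batch", "Pages", "Source"])

# Text items: the csv module quotes any field that contains a comma.
text_rows = [
    {"origin": "wikipedia",
     "text": "A version based on GPT-4, the newest OpenAI model, was released on "
             "March 14, 2023, and is available for paid subscribers on a limited basis."},
]
text_csv = rows_to_csv(text_rows, ["origin", "text"])
```

Letting the `csv` module handle quoting avoids broken rows when the text column contains commas or quotation marks.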

  3. Use an input TXT file

The following is an example of a TXT file for files stored in an Amazon S3 bucket:

s3://EXAMPLE-BUCKET/example1.pdf
s3://EXAMPLE-BUCKET/example2.pdf
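
A TXT listing like the one above can be checked before upload. The sketch below assumes one `s3://bucket/key` URI per line and ignores blank lines. The function name `parse_txt_listing` and the validation rules are illustrative assumptions, not documented requirements.

```python
from urllib.parse import urlparse

def parse_txt_listing(text):
    """Return the S3 URIs found in a TXT listing, one per non-empty line."""
    sources = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parsed = urlparse(line)
        # Assumption: each entry is s3://<bucket>/<key> with a non-empty key.
        if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
            raise ValueError(f"line {line_no}: not an S3 URI: {line!r}")
        sources.append(line)
    return sources

listing = "s3://EXAMPLE-BUCKET/example1.pdf\ns3://EXAMPLE-BUCKET/example2.pdf\n"
sources = parse_txt_listing(listing)
```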

Output Data

Dataset exports

There are three export formats available for datasets:

  1. Dataset Export: This format allows you to export the dataset in its original form, containing the raw data without any specific model run or ground truth annotations.

  2. Dataset Export with Model Run: This format includes the dataset along with the results of a specific model run. It captures the model’s predictions, classifications, or other outputs generated by applying the trained model to the dataset.

  3. Dataset Export with Ground Truth Project: This format exports the dataset along with the annotations created in a ground truth project. A ground truth project involves manual annotation or labeling of data by human annotators. The export includes both the original dataset and the annotations, providing valuable labeled data that can be used for training or validating machine learning models.
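
The three formats can be told apart by the keys present in an exported record. The sketch below relies on the key names seen in the sample exports in this section (`modelRuns` for a model run export, `annotations` for a ground truth project export). The function name and the returned labels are illustrative and not part of the product.

```python
def classify_export_record(record):
    """Guess which export format a record came from, based on its keys."""
    # Ground truth exports carry human annotations (plus project and taskId).
    if "annotations" in record:
        return "ground_truth"
    # Model run exports carry a list of model runs with predicted labels.
    if "modelRuns" in record:
        return "model_run"
    # Otherwise assume the plain dataset export.
    return "dataset"

plain = {"source": "s3://EXAMPLE-BUCKET/example1.pdf", "tags": []}
with_run = {**plain, "modelRuns": [{"modelRunId": "2023-08-04T03:44:10.5248871", "tags": []}]}
with_gt = {**plain, "project": "7b3020dd437ce2a30bae1c5a", "annotations": []}

kinds = [classify_export_record(r) for r in (plain, with_run, with_gt)]
```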

1. Scanned (OCR) Dataset

1. Dataset Export
{
     "source": "https://********/presigned/92d1f5a0b8667c883e797608571c8616.pdf?sig=62cc37fc5ec6d91864ae062e4da9f6ad81dda083b2207b0b12517f6d5d37a1be4400dd381a64d25ec5c4fda7c6a76103ca54f8434687b948b0a6175007fc82d3:6ee530eca4b69d17633726d4ad1220b2:64cddaf9:1243ecda3ebadc9a1ce6fa1fea8f3808",
     "name": "ABSTRACT - Axia.tiff",
     "itemId": "4611f8e6331908278b5160ca",
     "datasetId": "e3773b85655ea8646005158a",
     "type": "application/pdf",
     "tags": [
         "invoice tag"
     ],
     "metadata": {
         "ocr_model": "Textract (default)",
         "use-textract-only": true,
         "source_ref": "/uploads/e3773b85655ea8646005158a/4611f8e6331908278b5160ca",
         "document_id": "4611f8e6331908278b5160ca"
     },
     "active": true,
     "ext": "pdf"
}
Table: Scanned (OCR) Dataset Export Summary

Field Name           Type    Description
source               str     The presigned URL or S3 path of the data source
name                 str     The name of the dataset item
itemId               str     The ID of the dataset item
datasetId            str     The ID of the dataset
type                 str     The type of the dataset item (application/pdf in the example above)
tags                 list    List of tags associated with the dataset item
metadata             dict    The metadata associated with the dataset and dataset item
ocr_model            str     The OCR model used for processing (inside metadata)
use-textract-only    bool    Indicates whether only Textract is used for processing (inside metadata)
source_ref           str     Reference to the source of the dataset item (inside metadata)
document_id          str     The ID of the document (inside metadata)
active               bool    Indicates whether the dataset item is currently active
ext (local files)    str     Extension of local files, if any
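
The fields in the summary above map naturally onto a small typed structure. The sketch below parses one exported record into a dataclass. The class name `DatasetItem`, its attribute names, and the helper `item_from_record` are hypothetical. The `source` value is a placeholder S3 path rather than a real presigned URL.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DatasetItem:
    source: str
    name: str
    item_id: str
    dataset_id: str
    item_type: str
    active: bool
    tags: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    ext: Optional[str] = None  # only present for local files

def item_from_record(record):
    """Build a DatasetItem from one exported JSON object."""
    return DatasetItem(
        source=record["source"],
        name=record["name"],
        item_id=record["itemId"],
        dataset_id=record["datasetId"],
        item_type=record["type"],
        active=record["active"],
        tags=list(record.get("tags", [])),
        # Metadata keys such as "use-textract-only" contain hyphens,
        # so they stay in a dict instead of becoming attributes.
        metadata=dict(record.get("metadata", {})),
        ext=record.get("ext"),
    )

raw = """
{
  "source": "s3://EXAMPLE-BUCKET/example1.pdf",
  "name": "ABSTRACT - Axia.tiff",
  "itemId": "4611f8e6331908278b5160ca",
  "datasetId": "e3773b85655ea8646005158a",
  "type": "application/pdf",
  "tags": ["invoice tag"],
  "metadata": {"ocr_model": "Textract (default)", "use-textract-only": true},
  "active": true,
  "ext": "pdf"
}
"""
item = item_from_record(json.loads(raw))
```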

2. With Model Run
{
    "source": "https://sandboxdocuments.tensoract.com/presigned/92d1f5a0b8667c883e797608571c8616.pdf?sig=7da49bf64299dc09cb3405ff33cb0444ed6d310208f13655ae773407d405b9d003db5d7f980679f85f0949151c4f9904c837f90eb434a5ba2bfbd2520f1d43b9:4887b3566f7f75b587ad0ea9ebe0e6dc:64cdc5b3:37f85a3533b89761e7282654214ba4bf",
    "name": "ABSTRACT - Axia.tiff",
    "itemId": "4611f8e6331908278b5160ca",
    "datasetId": "e3773b85655ea8646005158a",
    "type": "application/pdf",
    "tags": [],
    "metadata": {
        "ocr_model": "Textract (default)",
        "use-textract-only": true,
        "source_ref": "/uploads/e3773b85655ea8646005158a/4611f8e6331908278b5160ca",
        "document_id": "4611f8e6331908278b5160ca"
    },
    "active": true,
    "modelRuns": [
        {
            "modelRunId": "2023-08-04T03:44:10.5248871",
            "tags": [
                {
                    "type": "Organization Entity",
                    "text": "Women's Health",
                    "page": 1,
                    "boxes": [
                        [
                            0.861730694770813,
                            0.08322431892156601,
                            0.937999427318573,
                            0.09277199301868677
                        ],
                        [
                            0.06499018520116806,
                            0.11739349365234375,
                            0.07347860559821129,
                            0.12546881940215826
                        ]
                    ],
                    "kv_type": "value"
                },
                {
                    "type": "Organization Entity",
                    "text": "Womens Health",
                    "page": 1,
                    "boxes": [
                        [
                            0.0935770571231842,
                            0.11707708239555359,
                            0.11941905505955219,
                            0.1253887191414833
                        ],
                        [
                            0.12276646494865417,
                            0.11710146069526672,
                            0.17684946581721306,
                            0.1254600789397955
                        ]
                    ],
                    "kv_type": "value"
                },
                {
                    "type": "Organization Entity",
                    "text": "HP Main Line LLC",
                    "page": 1,
                    "boxes": [
                        [
                            0.5065937042236328,
                            0.11725140362977982,
                            0.5151590080931783,
                            0.1253813849762082
                        ],
                        [
                            0.5505043864250183,
                            0.11691059917211533,
                            0.7601913809776306,
                            0.1277432944625616
                        ],
                        [
                            0.5505043864250183,
                            0.1170012354850769,
                            0.6015823110938072,
                            0.1273022061213851
                        ],
                        [
                            0.6051791906356812,
                            0.11715823411941528,
                            0.6570645309984684,
                            0.12562930211424828
                        ]
                    ],
                    "kv_type": "value"
                },
                {
                    "type": "Location Entity",
                    "text": "Laurel Road",
                    "page": 1,
                    "boxes": [
                        [
                            0.06458062678575516,
                            0.13079734146595,
                            0.0742951761931181,
                            0.1387380100786686
                        ],
                        [
                            0.09427980333566666,
                            0.13045595586299896,
                            0.19984688609838486,
                            0.1387722697108984
                        ]
                    ],
                    "kv_type": "value"
                },
                {
                    "type": "Location Entity",
                    "text": "Bryn Mawr",
                    "page": 1,
                    "boxes": [
                        [
                            0.5502040982246399,
                            0.13049453496932983,
                            0.5714401658624411,
                            0.13857773877680302
                        ],
                        [
                            0.5759334564208984,
                            0.13051286339759827,
                            0.6116651147603989,
                            0.13872544467449188
                        ]
                    ],
                    "kv_type": "value"
                },
                {
                    "type": "Organization Entity",
                    "text": "Regional Womens Health Management",
                    "page": 1,
                    "boxes": [
                        [
                            0.6013859510421753,
                            0.14393498003482819,
                            0.6286550257354975,
                            0.15307497046887875
                        ],
                        [
                            0.6331984400749207,
                            0.14391060173511505,
                            0.6625380869954824,
                            0.1522916592657566
                        ],
                        [
                            0.6665995121002197,
                            0.14416542649269104,
                            0.6881919391453266,
                            0.1522371843457222
                        ],
                        [
                            0.06526166200637817,
                            0.15757058560848236,
                            0.07337938901036978,
                            0.16564789321273565
                        ]
                    ],
                    "kv_type": "value"
                },
                {
                    "type": "Organization Entity",
                    "text": "ABA",
                    "page": 1,
                    "boxes": [
                        [
                            0.5443362593650818,
                            0.23810118436813354,
                            0.574140515178442,
                            0.2483070008456707
                        ]
                    ],
                    "kv_type": "value"
                },
                {
                    "type": "Organization Entity",
                    "text": "ABA",
                    "page": 1,
                    "boxes": [
                        [
                            0.22924329340457916,
                            0.2950398325920105,
                            0.28950661048293114,
                            0.3033293457701802
                        ]
                    ],
                    "kv_type": "value"
                }
            ]
        }
    ],
    "ext": "pdf"
}
Table: Scanned (OCR) Dataset Export With Model Run Summary

Field Name           Type    Description
source               str     The presigned URL or S3 path of the data source
name                 str     The name of the dataset item
itemId               str     The ID of the dataset item
datasetId            str     The ID of the dataset
type                 str     The type of the dataset item (application/pdf in the example above)
tags                 list    List of tags associated with the dataset item
metadata             dict    Metadata associated with the dataset and dataset item
ocr_model            str     The OCR model used for processing (inside metadata)
use-textract-only    bool    Indicates whether only Textract is used for processing (inside metadata)
source_ref           str     Reference to the source of the dataset item (inside metadata)
document_id          str     The ID of the document (inside metadata)
active               bool    Indicates whether the dataset item is currently active
modelRuns            list    List of dictionaries, one per model run, containing the predicted labels
modelRunId           str     The ID of the model run (inside each modelRuns entry)
tags                 list    List of dictionaries containing the predicted labels (inside each modelRuns entry)
type                 str     The type of the predicted label
text                 str     The text selected for the prediction
page                 int     Page number associated with the text
boxes                list    List of bounding box coordinates for the OCRed words
kv_type              str     Indicates whether the tag is a key or a value ("key" or "value" in the examples)
ext (local files)    str     Extension of local files, if any
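
Predicted labels in a model run export can be flattened into one row per label. The sketch below also merges each label's word boxes into a single enclosing box. It assumes each box is `[x_min, y_min, x_max, y_max]` in page-normalized coordinates, which matches the sample values but is not stated in this document. The function names are hypothetical.

```python
def union_box(boxes):
    """Smallest box enclosing all word boxes (assumed [x_min, y_min, x_max, y_max])."""
    return [
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    ]

def iter_predictions(record):
    """Yield one flat dict per predicted label across all model runs."""
    for run in record.get("modelRuns", []):
        for tag in run.get("tags", []):
            yield {
                "modelRunId": run["modelRunId"],
                "type": tag["type"],
                "text": tag["text"],
                "page": tag["page"],
                "kv_type": tag.get("kv_type"),
                "bbox": union_box(tag["boxes"]),
            }

# One label taken from the sample export above.
record = {
    "modelRuns": [{
        "modelRunId": "2023-08-04T03:44:10.5248871",
        "tags": [{
            "type": "Location Entity",
            "text": "Bryn Mawr",
            "page": 1,
            "boxes": [
                [0.5502040982246399, 0.13049453496932983, 0.5714401658624411, 0.13857773877680302],
                [0.5759334564208984, 0.13051286339759827, 0.6116651147603989, 0.13872544467449188],
            ],
            "kv_type": "value",
        }],
    }],
}
predictions = list(iter_predictions(record))
```

A label whose words wrap across lines produces a large enclosing box, so the per-word boxes remain the more precise representation.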

3. With Ground Truth Project
{
    "source": "https://**********/presigned/a01f5c95d843b4fd4f890570e5cac51c.pdf?sig=838fefa7e55ab214cfa71b70d36d19ee3a263b5c750f49d8ddb105d90f81b82668548ecec76dc79f3df8195c45a7e2702e543611f7f210e761755db7a6c1ea86:3c4f5271ef6c34813cb136a93ba8e7bd:64cdea6d:ae53c052c424a59ee74995c52cc94222",
    "name": "ABSTRACT - Axia.tiff",
    "itemId": "9cebea4c95edc877ca6f2603",
    "datasetId": "e3773b85655ea8646005158a",
    "type": "application/pdf",
    "tags": [
        "invoice tag"
    ],
    "metadata": {
        "ocr_model": "Textract (default)",
        "use-textract-only": true,
        "source_ref": "/uploads/e3773b85655ea8646005158a/9cebea4c95edc877ca6f2603",
        "document_id": "9cebea4c95edc877ca6f2603"
    },
    "active": true,
    "project": "7b3020dd437ce2a30bae1c5a",
    "taskId": "0931952ce4a27f53a3678cfe",
    "annotations": [
        {
            "email": "q1@qc.com",
            "messages": [],
            "role": "nlp_qc",
            "elapsedTime": 14,
            "date": "2023-08-04T06:21:00.589Z",
            "content": {
                "pdf_fingerprint": "c04f692d342c06d433f751ac32c6d8b1",
                "metadata": {
                    "File": "ABSTRACT - Axia.tiff",
                    "TaskId": "0931952ce4a27f53a3678cfe",
                    "ocr_model": "Textract (default)",
                    "use-textract-only": true,
                    "source_ref": "/uploads/e3773b85655ea8646005158a/9cebea4c95edc877ca6f2603",
                    "document_id": "9cebea4c95edc877ca6f2603",
                    "Type of Project": "OCR"
                },
                "tags": [
                    {
                        "page": 1,
                        "text": "N A M E",
                        "id": 1,
                        "type": "Name",
                        "kv_type": "key",
                        "words": [
                            "N",
                            "A",
                            "M",
                            "E"
                        ],
                        "boxes": [
                            [
                                0.06499018520116806,
                                0.11739349365234375,
                                0.07347860559821129,
                                0.12546881940215826
                            ],
                            [
                                0.06458062678575516,
                                0.13079734146595,
                                0.0742951761931181,
                                0.1387380100786686
                            ],
                            [
                                0.06520503759384155,
                                0.14403623342514038,
                                0.07536023296415806,
                                0.15211013052612543
                            ],
                            [
                                0.06526166200637817,
                                0.15757058560848236,
                                0.07337938901036978,
                                0.16564789321273565
                            ]
                        ],
                        "range": [
                            [
                                71,
                                72
                            ],
                            [
                                126,
                                127
                            ],
                            [
                                165,
                                166
                            ],
                            [
                                194,
                                195
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Axia Women's Health",
                        "id": 2,
                        "type": "Name",
                        "textAdjust": "Axia Women's",
                        "kv_type": "value",
                        "words": [
                            "Axia",
                            "Women's",
                            "Health"
                        ],
                        "boxes": [
                            [
                                0.0935770571231842,
                                0.11707708239555359,
                                0.11941905505955219,
                                0.1253887191414833
                            ],
                            [
                                0.12276646494865417,
                                0.11710146069526672,
                                0.17684946581721306,
                                0.1254600789397955
                            ],
                            [
                                0.18119750916957855,
                                0.11732043325901031,
                                0.21823260188102722,
                                0.12542327493429184
                            ]
                        ],
                        "range": [
                            [
                                73,
                                77
                            ],
                            [
                                78,
                                85
                            ],
                            [
                                86,
                                92
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "BILL TO",
                        "id": 3,
                        "type": "Name",
                        "rawBox": true,
                        "kv_type": "key",
                        "words": [
                            "BILL TO"
                        ],
                        "boxes": [
                            [
                                0.4980276134122288,
                                0.10967250571210967,
                                0.5374753451676528,
                                0.1706016755521706
                            ]
                        ],
                        "range": []
                    },
                    {
                        "page": 1,
                        "text": "Regional Womens Health",
                        "id": 4,
                        "type": "Name",
                        "rotate": 24,
                        "rawBox": true,
                        "kv_type": "value",
                        "words": [
                            "Regional Womens Health"
                        ],
                        "boxes": [
                            [
                                0.5473372781065089,
                                0.11119573495811119,
                                0.7682445759368837,
                                0.12795125666412796
                            ]
                        ],
                        "range": []
                    },
                    {
                        "page": 1,
                        "text": "Cat.",
                        "id": 5,
                        "type": "Name",
                        "table": {
                            "id": 4,
                            "x": 0,
                            "y": 1,
                            "cell": true
                        },
                        "kv_type": "key",
                        "words": [
                            "Cat."
                        ],
                        "boxes": [
                            [
                                0.39583876729011536,
                                0.3084534704685211,
                                0.4190108198672533,
                                0.31684120278805494
                            ]
                        ],
                        "range": [
                            [
                                543,
                                547
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Cat.",
                        "id": 6,
                        "type": "TABLEHEADER",
                        "table": {
                            "id": 4,
                            "x": 0,
                            "y": 1,
                            "cell": true
                        },
                        "words": [
                            "Cat."
                        ],
                        "boxes": [
                            [
                                0.39583876729011536,
                                0.3084534704685211,
                                0.4190108198672533,
                                0.31684120278805494
                            ]
                        ],
                        "range": [
                            [
                                543,
                                547
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Description",
                        "id": 7,
                        "type": "Name",
                        "table": {
                            "id": 4,
                            "x": 1,
                            "y": 1,
                            "cell": true
                        },
                        "kv_type": "key",
                        "words": [
                            "Description"
                        ],
                        "boxes": [
                            [
                                0.4328092038631439,
                                0.3084268271923065,
                                0.49752890318632126,
                                0.3184952298179269
                            ]
                        ],
                        "range": [
                            [
                                548,
                                559
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Description",
                        "id": 8,
                        "type": "TABLEHEADER",
                        "table": {
                            "id": 4,
                            "x": 1,
                            "y": 1,
                            "cell": true
                        },
                        "words": [
                            "Description"
                        ],
                        "boxes": [
                            [
                                0.4328092038631439,
                                0.3084268271923065,
                                0.49752890318632126,
                                0.3184952298179269
                            ]
                        ],
                        "range": [
                            [
                                548,
                                559
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Effective",
                        "id": 9,
                        "type": "TABLEHEADER",
                        "table": {
                            "id": 4,
                            "x": 3,
                            "y": 0,
                            "cell": true
                        },
                        "words": [
                            "Effective"
                        ],
                        "boxes": [
                            [
                                0.6239141225814819,
                                0.2947663366794586,
                                0.6735980845987797,
                                0.30344805866479874
                            ]
                        ],
                        "range": [
                            [
                                476,
                                485
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Sqft.",
                        "id": 10,
                        "type": "Name",
                        "table": {
                            "id": 4,
                            "x": 2,
                            "y": 1,
                            "cell": true
                        },
                        "kv_type": "key",
                        "words": [
                            "Sqft."
                        ],
                        "boxes": [
                            [
                                0.5750880241394043,
                                0.30830204486846924,
                                0.6010445598512888,
                                0.3183623990043998
                            ]
                        ],
                        "range": [
                            [
                                560,
                                565
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Sqft.",
                        "id": 11,
                        "type": "TABLEHEADER",
                        "table": {
                            "id": 4,
                            "x": 2,
                            "y": 1,
                            "cell": true
                        },
                        "words": [
                            "Sqft."
                        ],
                        "boxes": [
                            [
                                0.5750880241394043,
                                0.30830204486846924,
                                0.6010445598512888,
                                0.3183623990043998
                            ]
                        ],
                        "range": [
                            [
                                560,
                                565
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "ABA",
                        "id": 12,
                        "type": "TABLECELL",
                        "table": {
                            "id": 4,
                            "x": 0,
                            "y": 2,
                            "cell": true
                        },
                        "words": [
                            "ABA"
                        ],
                        "boxes": [
                            [
                                0.3953396677970886,
                                0.3291471600532532,
                                0.42196371778845787,
                                0.3373938351869583
                            ]
                        ],
                        "range": [
                            [
                                626,
                                629
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Date",
                        "id": 13,
                        "type": "Name",
                        "table": {
                            "id": 4,
                            "x": 3,
                            "y": 1,
                            "cell": true
                        },
                        "kv_type": "key",
                        "words": [
                            "Date"
                        ],
                        "boxes": [
                            [
                                0.6240901350975037,
                                0.3085164725780487,
                                0.6510729901492596,
                                0.31685456447303295
                            ]
                        ],
                        "range": [
                            [
                                566,
                                570
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Date",
                        "id": 14,
                        "type": "TABLEHEADER",
                        "table": {
                            "id": 4,
                            "x": 3,
                            "y": 1,
                            "cell": true
                        },
                        "words": [
                            "Date"
                        ],
                        "boxes": [
                            [
                                0.6240901350975037,
                                0.3085164725780487,
                                0.6510729901492596,
                                0.31685456447303295
                            ]
                        ],
                        "range": [
                            [
                                566,
                                570
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Rent Abatements/Cor",
                        "id": 15,
                        "type": "TABLECELL",
                        "table": {
                            "id": 4,
                            "x": 1,
                            "y": 2,
                            "cell": true
                        },
                        "words": [
                            "Rent",
                            "Abatements/Cor"
                        ],
                        "boxes": [
                            [
                                0.4329037368297577,
                                0.3290809392929077,
                                0.4603371527045965,
                                0.3374354373663664
                            ],
                            [
                                0.46285462379455566,
                                0.32896438241004944,
                                0.5594801902770996,
                                0.3374544633552432
                            ]
                        ],
                        "range": [
                            [
                                630,
                                634
                            ],
                            [
                                635,
                                649
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "4,850",
                        "id": 16,
                        "type": "TABLECELL",
                        "table": {
                            "id": 4,
                            "x": 2,
                            "y": 2,
                            "cell": true
                        },
                        "words": [
                            "4,850"
                        ],
                        "boxes": [
                            [
                                0.5759893655776978,
                                0.3291241228580475,
                                0.6087189093232155,
                                0.3381931884214282
                            ]
                        ],
                        "range": [
                            [
                                650,
                                655
                            ]
                        ]
                    },
                    {
                        "page": 1,
                        "text": "6/15/2021",
                        "id": 17,
                        "type": "TABLECELL",
                        "table": {
                            "id": 4,
                            "x": 3,
                            "y": 2,
                            "cell": true
                        },
                        "words": [
                            "6/15/2021"
                        ],
                        "boxes": [
                            [
                                0.6162644028663635,
                                0.32898813486099243,
                                0.6728598773479462,
                                0.3374910345301032
                            ]
                        ],
                        "range": [
                            [
                                656,
                                665
                            ]
                        ]
                    }
                ],
                "pageOffsets": [
                    0,
                    3355,
                    5983
                ],
                "links": [
                    {
                        "page": 1,
                        "id1": 1,
                        "id2": 2,
                        "relationship": "key-pair"
                    },
                    {
                        "page": 1,
                        "id1": 3,
                        "id2": 4,
                        "relationship": "key-pair"
                    }
                ],
                "attributes": {
                    "Is document damaged": "No"
                },
                "pageAttributes": [
                    {
                        "Is page damaged?": "No"
                    }
                ],
                "tables": [
                    {
                        "x": [
                            0.3953396677970886,
                            0.4273864608258009,
                            0.567284107208252,
                            0.6124916560947895,
                            0.6735980845987797
                        ],
                        "y": [
                            0.2947663366794586,
                            0.305875051766634,
                            0.32372980611398816,
                            0.3381931884214282
                        ],
                        "rows": 3,
                        "cols": 4,
                        "box": [
                            0.3953396677970886,
                            0.2947663366794586,
                            0.6735980845987797,
                            0.3381931884214282
                        ],
                        "id": 4,
                        "page": 1,
                        "description": "Table 1"
                    }
                ],
                "plainText": {
                    "1": "Lease Id: PR0001 - 000222 Lease Profile Master Occupant Id: 00000162-1 N Axia Women's Health B Regional Womens Health Managem A HP Main Line LLC I T 227 Laurel Road M L o Echelon One, Suite 300 E Bryn Mawr PA 19010 L Voorhees NJ 08043 Legal Name: Regional Womens Health Management Tenant Id: Contact Name: Jenni Witters Tenant Type Id: Phone No: SIC Group: Fax No: NAICS Code Lease Stop: No Suite Information Current Recurring Charges Building Id: PR0001 Execution: 3/15/2021 Effective Monthly Annual Amount Suite Id: 401 Beginning: 6/15/2021 Cat. Description Sqft. Date Amount Amount PSF Lease Id: 000222 Occupancy: 9/1/2021 ABA Rent Abatements/Cor 4,850 6/15/2021 -12,125.00 -145,500.00 -30.00 Leased Sqft: 4,850 Rent Start: 6/15/2021 ABA Rent Abatements/Cor 4,850 12/1/2021 0.00 0.00 0.00 Pro-Rata Share: 0.17 Expiration: 9/30/2028 ROF Base Rent Office 4,850 6/15/2021 12,125.00 145,500.00 30.00 Ann. Mkt. Rent PSF: 0.00 Vacate: TIC Tenant Improvement 4,850 11/1/2021 3,059.54 36,714.48 7.57 UTI Utility Reimbursement 4,850 6/15/2021 808.33 9,699.96 2.00 Occupancy Status: Current Rate Change Schedule Effective Monthly Annual Amount Cat. Description Sqft. 
Date Amount Amount PSF ABA Rent Abatements/Con 4,850 11/1/2021 -2,575.00 -30,900.00 -6.37 ROF Base Rent Office 4,850 7/1/2022 12,367.50 148,410.00 30.60 ROF Base Rent Office 4,850 7/1/2023 12,614.04 151,368.48 31.21 ROF Base Rent Office 4,850 7/1/2024 12,868.67 154,424.04 31.84 ROF Base Rent Office 4,850 7/1/2025 13,123.29 157,479.48 32.47 ROF Base Rent Office 4,850 7/1/2026 13,386.00 160,632.00 33.12 ROF Base Rent Office 4,850 7/1/2027 13,652.75 163,833.00 33.78 ROF Base Rent - Office 4,850 7/1/2028 13,927.58 167,130.96 34.46 Lease Notes Effective Date Ref 1 Ref 2 Note 3/15/2021 ALTERTN Article 8 of Lease Landlord's consent required for any alterations, other than cosmetic Alterations which do not cost more than $1,000 per alteration and which do not affect (i) the structural portions or roof of the Premises or the 3/15/2021 ASGNSUB Article 9 Landlord consent required for any assignment/sublease. Landlord has 30 days after receipt of notice from Tenant to either approve assignment/sublease, not approve assignment/sublease, recapture the Premises 3/15/2021 DEFAULT Article 18 of Lease 1. If Tenant does not make payment within 5 days after date due, provided that, Landlord shall not more than 1 time per 12 full calendar month period of the term, deliver written notice to Tenant with respect to 3/15/2021 ESTOPEL Article 17 of Lease Estoppel required to be provided within 10 days after request. In the form set forth in Exhibit D 3/15/2021 HOLDOVR Section 19 (b) of Lease Landlord may either (i) increase Rent to 200% of the highest monthly aggregate Fixed Rent and additional 3/15/2021 INS Article 11 - Landlord responsible for repairs to all plumbing and other fixtures, equipment and systems (including replacement, if necessary) in or serving the Premises. Landlord to provide janitorial services (Exhibit E) and pest control as needed. 
3/15/2021 LATECHG Article 3 of Lease Tenant shall pay Landlord a service and handling charge equal to five percent (5%) of any Rent not paid within five (5) days after the date first due, which shall apply cumulatively each month with respect to Report Id WEBX_PROFILE Database HAVERFORD Reported by Joe Staugaard 1/7/2022 11:50 Page 1"
                },
                "dimensions": [
                    {
                        "width": 1275,
                        "height": 1650
                    },
                    {
                        "width": 1275,
                        "height": 1650
                    }
                ],
                "review": {
                    "rate": "Ok",
                    "note": "",
                    "reviewerId": "61685a5eb492d0845eb5e6b4"
                },
                "jobStart": 1691128396,
                "sessionTime": 14,
                "elapsedTime": 86,
                "updateTime": 1691130059,
                "selectBoundingBox": true,
                "lastUpdate": 1691130060583
            }
        }
    ],
    "ext": "pdf"
}
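In the example above, each tag that belongs to a table carries a `table` object, and its `x` and `y` values appear to be 0-based column and row indexes into the grid of the matching `tables` entry. For instance, the "Date" header at `x: 3, y: 1` falls between the last two vertical grid lines. Under that assumption, the following minimal Python sketch arranges the cell tags of one table into rows. The function name `build_grid` is illustrative and not part of any export API.

```python
from __future__ import annotations

from typing import Any


def build_grid(content: dict[str, Any], table_id: int) -> list[list[str]]:
    """Arrange the text of one table's cell tags into rows and columns."""
    # Look up the table definition to learn the grid size.
    table = next(t for t in content["tables"] if t["id"] == table_id)
    grid = [["" for _ in range(table["cols"])] for _ in range(table["rows"])]

    for tag in content["tags"]:
        cell = tag.get("table")
        if not cell or cell.get("id") != table_id or not cell.get("cell"):
            continue  # tag is not a cell of this table
        row, col = cell["y"], cell["x"]  # assumed to be 0-based indexes
        # Several tags can share one cell (for example a key tag and a
        # header tag on the same word); keep the first text seen.
        if not grid[row][col]:
            grid[row][col] = tag["text"]
    return grid


if __name__ == "__main__":
    sample = {
        "tables": [{"id": 4, "rows": 3, "cols": 4}],
        "tags": [
            {"text": "Date", "table": {"id": 4, "x": 3, "y": 1, "cell": True}},
            {"text": "4,850", "table": {"id": 4, "x": 2, "y": 2, "cell": True}},
            {"text": "6/15/2021", "table": {"id": 4, "x": 3, "y": 2, "cell": True}},
        ],
    }
    for grid_row in build_grid(sample, table_id=4):
        print(grid_row)
```

Cells without a tag stay as empty strings, so the result always has `rows` lists of `cols` entries.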
Table: Scanned (OCR) Dataset Export With GroundTruth Project Summary:

Field Names         Type    Description
source              str     The presigned URL or S3 path of the data source
name                str     The name of the dataset item
itemId              str     The Id of the dataset item
datasetId           str     The Id of the dataset
type                str     The type of the dataset
tags                list    List of tags associated with the dataset item
metadata            dict    Metadata associated with the dataset and dataset item
ocr_model           str     The OCR model used for processing
use_textract_only   bool    Indicates if only Textract is used for processing
source_ref          str     Reference to the source of the dataset item
document_id         str     The Id of the document
active              bool    Indicates whether the dataset item is currently active
project             str     The project associated with the dataset item
taskId              str     The Id of the task
annotations         list    List of dictionaries containing details of annotations
email               str     The email associated with the user
messages            list    The messages associated with the user
role                str     The role associated with the user
elapsedTime         int     The elapsed time of the annotation
date                str     The date of the annotation
content             dict    The content of the annotation
pdf_fingerprint     str     The fingerprint of the document
metadata            dict    The metadata associated with the task and project
File                str     The name of the file
TaskId              str     The Id of the task
ocr_model           str     The OCR model used for processing
use_textract_only   bool    Indicates if only Textract is used for processing
source_ref          str     Reference to the source of the dataset item
document_id         str     The Id of the document
tags                list    List of dictionaries containing the annotated tags
page                int     The page number of the selected text
text                str     The selected text for annotation
id                  int     The Id of the selected text for annotation
type                str     The type of the label
kv_type             str     Flag to indicate whether the tag is a key or a value (KEY/VAL; the example shows "key")
words               list    The words in the selected text
boxes               list    List of bounding box coordinates for OCRed words
range               list    List of start and end offsets of the selected text in plainText, one pair per word
textAdjust          str     Modified OCRed text
rawbox              str     Flag to indicate if the bounding box was created manually
rotate              str     The angle of bounding box rotation (degrees)
table               dict    The table information
id                  int     The Id of the table
x                   int     The column index of the cell in the table grid
y                   int     The row index of the cell in the table grid
cell                bool    Flag to indicate if the current object is a cell of the table
pageOffsets         list    The list of page offsets
links               list    The list of relationships
page                int     The page number associated with the key and value fields
id1                 int     The Id of the key field
id2                 int     The Id of the value field
relationship        str     The name of the relationship
attributes          dict    The document attributes associated with the task
pageAttributes      list    List of dictionaries containing the attributes for each page
tables              list    List of dictionaries containing table information
x                   list    The x coordinates of the vertical grid lines
y                   list    The y coordinates of the horizontal grid lines
rows                int     The number of rows in the table
cols                int     The number of columns in the table
box                 list    The bounding box coordinates of the table
id                  int     The Id of the table
page                int     The page number of the table
description         str     The title of the table
plainText           dict    Dictionary containing page numbers and the corresponding plain text extracted from the file
dimensions          list    The dimensions of the pages
width               float   The width of the page
height              float   The height of the page
review              dict    The review details
rate                str     The rating given in the review
note                str     The note associated with the reviewer
reviewerId          str     The Id of the reviewer
jobStart            int     The start time of the annotation
sessionTime         int     The session time of the annotation
elapsedTime         int     The elapsed time of the annotation
updateTime          int     The update time of the annotation
lastUpdate          int     The last update time
ext                 str     The extension of the local file
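The `boxes`, `range`, `plainText` and `dimensions` fields can be combined to locate an annotation on the page. The sketch below rests on assumptions inferred from the sample values rather than a documented contract. Box values look like fractions of the page size in `[left, top, right, bottom]` order. Each `range` pair looks like an end-exclusive character offset into the page's `plainText` entry: the "Date" tag spans offsets 566 to 570, four characters. Whether offsets on later pages also need the matching `pageOffsets` value is not confirmed by the sample. Both helper names are hypothetical.

```python
from __future__ import annotations


def box_to_pixels(
    box: list[float], width: float, height: float
) -> tuple[int, int, int, int]:
    """Scale a normalized [left, top, right, bottom] box to pixel units."""
    left, top, right, bottom = box
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


def words_from_ranges(page_text: str, ranges: list[list[int]]) -> list[str]:
    """Slice each [start, end) offset pair out of one page's plain text."""
    return [page_text[start:end] for start, end in ranges]


if __name__ == "__main__":
    # Page size taken from the "dimensions" entry of the sample export.
    page = {"width": 1275, "height": 1650}
    box = [0.6240901350975037, 0.3085164725780487,
           0.6510729901492596, 0.31685456447303295]
    print(box_to_pixels(box, page["width"], page["height"]))

    # Shortened stand-in for a page's "plainText" entry.
    print(words_from_ranges("Sqft. Date Amount", [[6, 10]]))
```

Rounding to whole pixels is a choice made for this sketch; keep the floating-point products if sub-pixel precision matters.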

2. PDF Dataset

1. Dataset Export
{
    "source": "s3://EXAMPLE-BUCKET/testna.pdf",
    "name": "testna.pdf",
    "itemId": "aec104ce48aa0eece0a94c1b",
    "datasetId": "8d9736f30411ae81fa4983d4",
    "type": "application/pdf",
    "tags": [],
    "metadata": {
        "xxx": 14,
        "presigned": "http://aaa.com"
    },
    "active": true
}
Table: PDF Dataset Export Summary:

Field Names         Type    Description
source              str     The presigned URL or S3 path of the data source
name                str     The name of the dataset item
itemId              str     The Id of the dataset item
datasetId           str     The Id of the dataset
type                str     The type of the dataset
tags                list    List of tags associated with the dataset item
metadata            dict    Metadata associated with the dataset and dataset item
active              bool    Indicates whether the dataset item is currently active
ext (local files)   str     Extension of the local file, if any
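The field summary above can serve as a lightweight schema when consuming an export. The following Python sketch checks a single exported record against those field names and types. It assumes the record has already been parsed into a dictionary, for example with `json.loads`, and that `ext` is optional because it only applies to local files. `EXPECTED_TYPES` and `check_record` are illustrative names, not part of the product.

```python
from __future__ import annotations

from typing import Any

# Field name -> expected Python type, transcribed from the summary above.
EXPECTED_TYPES: dict[str, type] = {
    "source": str,
    "name": str,
    "itemId": str,
    "datasetId": str,
    "type": str,
    "tags": list,
    "metadata": dict,
    "active": bool,
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; empty means the record fits."""
    problems: list[str] = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    # "ext" is treated as optional (assumed to be present for local files only).
    if "ext" in record and not isinstance(record["ext"], str):
        problems.append("ext: expected str")
    return problems


if __name__ == "__main__":
    record = {
        "source": "s3://EXAMPLE-BUCKET/testna.pdf",
        "name": "testna.pdf",
        "itemId": "aec104ce48aa0eece0a94c1b",
        "datasetId": "8d9736f30411ae81fa4983d4",
        "type": "application/pdf",
        "tags": [],
        "metadata": {"xxx": 14, "presigned": "http://aaa.com"},
        "active": True,
    }
    print(check_record(record))
```

Returning a list of problems instead of raising on the first mismatch makes it easier to report every issue in a large export at once.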

2. With GroundTruth Project
{
    "source": "https://sandboxdocuments.tensoract.com/presigned/33e268b66cb90138b84cc627a501afa2.pdf?sig=cc753891da92d55d769969ebf280f7aabaa8847de2ff31141c7b1869900a6c84f3b09f5fb2f4d32e27dc442f9f2841dfc94983f1e42df8569b849cb9153c866a:9ad06a28916bab71cf5140fedd06ae74:64b65760:d5e5b898249da98bf428147b361c0094",
    "name": "1810.04805.pdf",
    "itemId": "0ed98ab31666242a417504f9",
    "datasetId": "8d9736f30411ae81fa4983d4",
    "type": "application/pdf",
    "tags": [
        "dataset tag 1"
    ],
    "metadata": {
        "Dataset": "PDF"
    },
    "active": true,
    "project": "866ad732042bde9b94929cc3",
    "taskId": "d6aae2114d0947b1bfe5dcd3",
    "annotations": [
        {
            "email": "yannevarsha6@gmail.com",
            "messages": [],
            "role": "nlp_qc",
            "elapsedTime": 18,
            "date": "2023-07-17T09:11:08.530Z",
            "content": {
                "pdf_fingerprint": "dccb9bc542f22b2bdd94110918c68f96",
                "metadata": {
                    "File": "1810.04805.pdf",
                    "TaskId": "d6aae2114d0947b1bfe5dcd3",
                    "Type of Project": "NER"
                },
                "tags": [
                    {
                        "page": 1,
                        "text": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
                        "id": 1,
                        "type": "DATE",
                        "box": [
                            0.1957394553114858,
                            0.08355623157419612,
                            0.8080743211552288,
                            0.11953028994286674
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Jacob Devlin",
                        "id": 2,
                        "type": "PERSON",
                        "box": [
                            0.20464120844784606,
                            0.15506947083348188,
                            0.31506005550366556,
                            0.16926990081839677
                        ]
                    },
                    {
                        "page": 1,
                        "text": "Ming-Wei Chang",
                        "id": 3,
                        "type": "PERSON",
                        "box": [
                            0.34016437686048157,
                            0.15506947083348188,
                            0.48781795335273054,
                            0.16926990081839677
                        ]
                    },
                    {
                        "page": 1,
                        "text": "2018a",
                        "id": 4,
                        "type": "DATE",
                        "box": [
                            0.3736872717865327,
                            0.3484841506610129,
                            0.4145903056733348,
                            0.36031776312819985
                        ]
                    },
                    {
                        "page": 2,
                        "text": "(2018a)",
                        "id": 5,
                        "type": "DATE",
                        "box": [
                            0.3769863661562031,
                            0.3271071821734426,
                            0.4339806024432365,
                            0.3400650507786048
                        ]
                    }
                ],
                "pageOffsets": [
                    0,
                    3988,
                    8509,
                    12206,
                    17069,
                    20918,
                    25368,
                    29080,
                    33539,
                    37641,
                    42160,
                    46926,
                    50816,
                    54525,
                    58589,
                    60965,
                    64088
                ],
                "links": [
                    {
                        "page": 1,
                        "id1": 2,
                        "id2": 3,
                        "relationship": "Precede"
                    },
                    {
                        "page": 1,
                        "id1": 4,
                        "id2": 5,
                        "relationship": "Precede"
                    }
                ],
                "attributes": {
                    "tags": [],
                    "links": [],
                    "Doc Ok?": "Yes"
                },
                "pageAttributes": [
                    {
                        "Page OK?": null
                    },
                    {
                        "Page OK?": "Yes"
                    }
                ],
                "boxes": [
                    {
                        "page": 1,
                        "box": [
                            0.6285714285714286,
                            0.1505226480836237,
                            0.8216748768472907,
                            0.178397212543554
                        ],
                        "label": "Bounding_box"
                    },
                    {
                        "page": 2,
                        "box": [
                            0.10246305418719212,
                            0.3797909407665505,
                            0.49064039408866994,
                            0.4961672473867596
                        ],
                        "label": "Bounding_box",
                        "rotate": 22
                    }
                ],
                "plainText": {
                    "1": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova Google AI Language {jacobdevlin,mingweichang,kentonl,kristout}@google.com Abstract We introduce a new language representa- tion model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language repre- sentation models (Peters et al., 2018a; Rad- ford et al., 2018), BERT is designed to pre- train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a re- sult, the pre-trained BERT model can be fine- tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task- specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art re- sults on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answer- ing Test F1 to 93.2 (1.5 point absolute im- provement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement). 1 Introduction Language model pre-training has been shown to be effective for improving many natural language processing tasks (Dai and Le, 2015; Peters et al., 2018a; Radford et al., 2018; Howard and Ruder, 2018). 
These include sentence-level tasks such as natural language inference (Bowman et al., 2015; Williams et al., 2018) and paraphrasing (Dolan and Brockett, 2005), which aim to predict the re- lationships between sentences by analyzing them holistically, as well as token-level tasks such as named entity recognition and question answering, wheremodels are required to produce fine-grained output at the token level (Tjong Kim Sang and DeMeulder, 2003; Rajpurkar et al., 2016). There are two existing strategies for apply- ing pre-trained language representations to down- stream tasks: feature-based and fine-tuning. The feature-based approach, such as ELMo (Peters et al., 2018a), uses task-specific architectures that include the pre-trained representations as addi- tional features. The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre- trained parameters. The two approaches share the same objective function during pre-training,where they use unidirectional language models to learn general language representations. We argue that current techniques restrict the power of the pre-trained representations, espe- cially for the fine-tuning approaches. The ma- jor limitation is that standard language models are unidirectional, and this limits the choice of archi- tectures that can be used during pre-training. For example, inOpenAIGPT, the authors use a left-to- right architecture, where every token can only at- tend to previous tokens in the self-attention layers of the Transformer (Vaswani et al., 2017). Such re- strictions are sub-optimal for sentence-level tasks, and could be very harmful when applying fine- tuning based approaches to token-level tasks such as question answering, where it is crucial to incor- porate context from both directions. 
In this paper, we improve the fine-tuning based approaches by proposing BERT: Bidirectional Encoder Representations from Transformers. BERT alleviates the previously mentioned unidi- rectionality constraint by using a “masked lan- guage model” (MLM) pre-training objective, in- spired by the Cloze task (Taylor, 1953). The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked a r X i v : 1 8 1 0 . 0 4 8 0 5 v 2     [ c s . C L ]     2 4   M a y   2 0 1 9",
                    "2": "word based only on its context. Unlike left-to- right language model pre-training, the MLM ob- jective enables the representation to fuse the left and the right context, which allows us to pre- train a deep bidirectional Transformer. In addi- tion to the masked language model, we also use a “next sentence prediction” task that jointly pre- trains text-pair representations. The contributions of our paper are as follows: • We demonstrate the importance of bidirectional pre-training for language representations. Un- like Radford et al. (2018), which uses unidirec- tional language models for pre-training, BERT uses masked language models to enable pre- trained deep bidirectional representations. This is also in contrast to Peters et al. (2018a), which uses a shallow concatenation of independently trained left-to-right and right-to-left LMs. • We show that pre-trained representations reduce the need for many heavily-engineered task- specific architectures. BERT is the first fine- tuning based representationmodel that achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks, outper- forming many task-specific architectures. • BERT advances the state of the art for eleven NLP tasks. The code and pre-trained mod- els are available at https://github.com/ google-research/bert. 2 RelatedWork There is a long history of pre-training general lan- guage representations, and we briefly review the most widely-used approaches in this section. 2.1 Unsupervised Feature-based Approaches Learning widely applicable representations of words has been an active area of research for decades, including non-neural (Brown et al., 1992; Ando and Zhang, 2005; Blitzer et al., 2006) and neural (Mikolov et al., 2013; Pennington et al., 2014) methods. Pre-trained word embeddings are an integral part of modern NLP systems, of- fering significant improvements over embeddings learned from scratch (Turian et al., 2010). 
To pre- train word embedding vectors, left-to-right lan- guage modeling objectives have been used (Mnih and Hinton, 2009), as well as objectives to dis- criminate correct from incorrect words in left and right context (Mikolov et al., 2013). These approaches have been generalized to coarser granularities, such as sentence embed- dings (Kiros et al., 2015; Logeswaran and Lee, 2018) or paragraph embeddings (Le andMikolov, 2014). To train sentence representations, prior work has used objectives to rank candidate next sentences (Jernite et al., 2017; Logeswaran and Lee, 2018), left-to-right generation of next sen- tence words given a representation of the previous sentence (Kiros et al., 2015), or denoising auto- encoder derived objectives (Hill et al., 2016). ELMo and its predecessor (Peters et al., 2017, 2018a) generalize traditional word embedding re- search along a different dimension. They extract context-sensitive features from a left-to-right and a right-to-left language model. The contextual rep- resentation of each token is the concatenation of the left-to-right and right-to-left representations. When integrating contextual word embeddings with existing task-specific architectures, ELMo advances the state of the art for severalmajor NLP benchmarks (Peters et al., 2018a) including ques- tion answering (Rajpurkar et al., 2016), sentiment analysis (Socher et al., 2013), and named entity recognition (Tjong Kim Sang and De Meulder, 2003). Melamud et al. (2016) proposed learning contextual representations through a task to pre- dict a single word from both left and right context using LSTMs. Similar to ELMo, their model is feature-based and not deeply bidirectional. Fedus et al. (2018) shows that the cloze task can be used to improve the robustness of text generation mod- els. 
2.2 Unsupervised Fine-tuning Approaches As with the feature-based approaches, the first works in this direction only pre-trained word em- bedding parameters from unlabeled text (Col- lobert andWeston, 2008). More recently, sentence or document encoders which produce contextual token representations have been pre-trained from unlabeled text and fine-tuned for a supervised downstream task (Dai and Le, 2015; Howard and Ruder, 2018; Radford et al., 2018). The advantage of these approaches is that few parameters need to be learned from scratch. At least partly due to this advantage, OpenAI GPT (Radford et al., 2018) achieved pre- viously state-of-the-art results on many sentence- level tasks from the GLUE benchmark (Wang et al., 2018a). Left-to-right language model-"
                },
                "dimensions": [
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    }
                ],
                "review": {
                    "rate": "Ok",
                    "note": "",
                    "reviewerId": "61685a5eb492d0845eb5e6b4"
                },
                "jobStart": 1689583831,
                "sessionTime": 18,
                "elapsedTime": 31,
                "updateTime": 1689585066,
                "lastUpdate": 1689585068525
            }
        }
    ],
    "ext": "pdf"
}
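In the example above, each entry in `links` refers to two tags through `id1` and `id2`. The Python sketch below turns those references into readable triples. It assumes tag ids are unique within one `content` object, which holds in the sample: the link from id 4 to id 5 joins tags on different pages. The function name `resolve_links` is hypothetical.

```python
from __future__ import annotations

from typing import Any


def resolve_links(content: dict[str, Any]) -> list[tuple[str, str, str]]:
    """Replace the tag ids in each link with the text of the linked tags."""
    # Index tags by id so each link lookup is a dictionary access.
    tags_by_id = {tag["id"]: tag for tag in content.get("tags", [])}

    resolved = []
    for link in content.get("links", []):
        first = tags_by_id.get(link["id1"])
        second = tags_by_id.get(link["id2"])
        if first is None or second is None:
            continue  # skip links that point at a tag missing from the export
        resolved.append((first["text"], link["relationship"], second["text"]))
    return resolved


if __name__ == "__main__":
    sample = {
        "tags": [
            {"page": 1, "text": "Jacob Devlin", "id": 2, "type": "PERSON"},
            {"page": 1, "text": "Ming-Wei Chang", "id": 3, "type": "PERSON"},
        ],
        "links": [
            {"page": 1, "id1": 2, "id2": 3, "relationship": "Precede"},
        ],
    }
    for triple in resolve_links(sample):
        print(triple)
```

Links whose tags cannot be found are skipped silently here; a stricter consumer might log or raise instead.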
table:PDF Dataset Export With GroundTruth Project Summary:

Field Names

Type

Description

source

str

The presigned URL or S3 path of the data source

name

str

The name of the dataset item

itemId

str

The Id of the dataset item

datasetId

str

The Id of the dataset

type

str

The type of the dataset

tags

list

List of tags associated with the dataset item

metadata

dict

Metadata associated with the dataset and dataset item

active

bool

Indicates the dataset item is currently active

project

str

The project associated with the dataset item

taskId

str

The Id of the task

annotations

list

List of dictionaries containing details of annotations

email

str

The email associated with user

messages

list

The messages associated with the user

role

str

The role associated with the user

elapsedTime

int

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

The content of the annotation

pdf_fingerprint

str

The fingerprint of the document

metadata

dict

The metadata associated with task and project

File

str

The name of the file

TaskId

str

The Id of the task

Type of Project

str

The metadata added in advanced setting of project

tags

list

List of dictionaries containing the annotated tags

page

int

The page number for selected text

text

str

The selected text for annotation

id

str

The Id of selected text for annotation

type

str

The type of the label

box

list

The annotation bounding box

pageOffsets

list

List of page offsets

links

list

The list of relationships

attributes

dict

The document attributes associated with task

pageAttributes

list

List of dictionaries containing the attributes for each page

plainText

dict

Dictionary containing page numbers and the corresponding plain text extracted from the file

dimensions

list

The dimensions of the pages

width

float

The width of the page

height

float

The height of the page

review

dict

The review details

rate

str

The rate of the review

note

str

The note associated with the reviewer

reviewerId

str

The Id of the reviewer

jobStart

int

The start time of the annotation

sessionTime

int

The session time of the annotation

elapsedTime

int

The elapsed time of the annotation

updateTime

int

The update time of the annotation

lastUpdate

int

The last update time

ext

str

The extension of the local file, if any
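
The timing fields in the sample above appear to use two different units: `jobStart` and `updateTime` look like Unix timestamps in seconds, while `lastUpdate` has three more digits and looks like milliseconds. The Python sketch below converts them under that assumption. The helper name `to_datetime` and the magnitude check it uses are illustrative only and are not part of the export format.

```python
from datetime import datetime, timezone

# Timing values copied from the sample annotation above.
content = {
    "jobStart": 1689583831,
    "sessionTime": 18,
    "elapsedTime": 31,
    "updateTime": 1689585066,
    "lastUpdate": 1689585068525,
}


def to_datetime(value: int) -> datetime:
    """Convert an epoch value to a timezone-aware UTC datetime.

    Assumption: values of 1e12 or more are milliseconds,
    smaller values are seconds.
    """
    seconds = value / 1000 if value >= 1_000_000_000_000 else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


job_start = to_datetime(content["jobStart"])
update_time = to_datetime(content["updateTime"])
last_update = to_datetime(content["lastUpdate"])

print("job started:", job_start.isoformat())
print("updated:    ", update_time.isoformat())
print("last update:", last_update.isoformat())
```

If the platform documents the units explicitly, prefer that over the magnitude check used here.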

3. TXT Dataset

1. Dataset Export
{
     "source": "https://********/presigned/b423bc857fcb780860add83807e61316.txt?sig=e61dcf794cd5bf4dadcca9e964de63f8b9a4f07f57ce1a65628ba45a976ab99c759c8b7fe315002b58911afddc00ff7b3e2ea51169a5389901b15c9c850f5d7f:fcc4c466036fd1ca58bfa36f53ea4507:64b761d1:6d5745dabe02fc0123cf535b1fd5cb9c",
     "name": "ca_newspapers_en_ab_the_calgary_herald_1950_05_29_issue1_page_0008.txt",
     "itemId": "214fb51145ff6524a7c5fa23",
     "datasetId": "414855a6e615c76816fba51f",
     "type": "text/plain",
     "tags": [
         "dataset tag 1"
     ],
     "metadata": {
         "Dataset": "TXT"
     },
     "active": true,
     "ext": "txt"
 }
table:TXT Dataset Export Summary:

Field Names

Type

Description

source

str

The presigned URL or S3 path of the data source

name

str

The name of the dataset item

itemId

str

The Id of the dataset item

datasetId

str

The Id of the dataset

type

str

The type of the dataset

tags

list

List of tags associated with the dataset item

metadata

dict

Metadata associated with the dataset and dataset item

active

bool

Indicates the dataset item is currently active

ext (local files)

str

Extension of local files, if any
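
As a minimal sketch of how an exported record with the fields above could be consumed, the Python example below maps one record onto a small data class. The class name `DatasetItem`, the function `parse_item`, and the placeholder `source` URL are assumptions made for illustration. `ext` is read with `.get()` because it is only present for local files.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DatasetItem:
    """Hypothetical container for one exported dataset item."""
    source: str
    name: str
    item_id: str
    dataset_id: str
    mime_type: str
    tags: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)
    active: bool = True
    ext: Optional[str] = None  # only present for local files


def parse_item(record: Dict[str, Any]) -> DatasetItem:
    """Build a DatasetItem from one exported JSON object."""
    return DatasetItem(
        source=record["source"],
        name=record["name"],
        item_id=record["itemId"],
        dataset_id=record["datasetId"],
        mime_type=record["type"],
        tags=list(record.get("tags", [])),
        metadata=dict(record.get("metadata", {})),
        active=bool(record.get("active", True)),
        ext=record.get("ext"),
    )


# Shortened copy of the sample above; the source URL is a placeholder.
raw = """
{
    "source": "https://example.com/presigned/sample.txt",
    "name": "ca_newspapers_en_ab_the_calgary_herald_1950_05_29_issue1_page_0008.txt",
    "itemId": "214fb51145ff6524a7c5fa23",
    "datasetId": "414855a6e615c76816fba51f",
    "type": "text/plain",
    "tags": ["dataset tag 1"],
    "metadata": {"Dataset": "TXT"},
    "active": true,
    "ext": "txt"
}
"""

item = parse_item(json.loads(raw))
print(item.name, item.mime_type, item.tags, item.ext)
```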

2. With GroundTruth NER-project
{
    "source": "https://********/presigned/ffd1f1fd01f23a07150051eb3a0ba3ed.txt?sig=fa417407505cfdc13f08a4144f12c7a42c2794470913d6f21d4f3a9ce71f92d4714be51eed1f21f2865f6d2947fe90f4fad19710e7d91a5604a931fb1f4d064b:8daf874146dea01203e9966672b452af:64b76a36:365a5794f4451398d62e10cb08f82929",
    "name": "ca_newspapers_en_ab_edmonton_journal_1928_02_16_issue1_page_0010.txt",
    "itemId": "17a66e34546c77d6a6ed095a",
    "datasetId": "414855a6e615c76816fba51f",
    "type": "text/plain",
    "tags": [],
    "metadata": {
        "Dataset": "TXT"
    },
    "active": true,
    "project": "866ad732042bde9b94929cc3",
    "taskId": "52755d415dd68822fbdafc20",
    "annotations": [
        {
            "email": "q1@qc.com",
            "messages": [],
            "role": "nlp_qc",
            "elapsedTime": 8,
            "date": "2023-07-18T04:40:32.847Z",
            "content": {
                "metadata": {
                    "File": "ca_newspapers_en_ab_edmonton_journal_1928_02_16_issue1_page_0010.txt",
                    "TaskId": "52755d415dd68822fbdafc20",
                    "Type of Project": "NER"
                },
                "absoluteOffsets": true,
                "tags": [
                    {
                        "page": 1,
                        "text": "BRITAIN WILL HONOR",
                        "id": 1,
                        "type": "PERSON"
                    },
                    {
                        "page": 1,
                        "text": "Grent Britain",
                        "id": 2,
                        "type": "PERSON"
                    },
                    {
                        "page": 1,
                        "text": "February 21",
                        "id": 3,
                        "type": "DATE"
                    },
                    {
                        "page": 1,
                        "text": "Earl of Oxford",
                        "id": 4,
                        "type": "ORGANIZATION"
                    },
                    {
                        "page": 1,
                        "text": "SUTTON COURTENAY",
                        "id": 5,
                        "type": "ORGANIZATION"
                    }
                ],
                "pageOffsets": [
                    0
                ],
                "links": [
                    {
                        "page": 1,
                        "id1": 1,
                        "id2": 2,
                        "relationship": "Precede"
                    },
                    {
                        "page": 1,
                        "id1": 4,
                        "id2": 5,
                        "relationship": "Precede"
                    }
                ],
                "attributes": {
                    "tags": [],
                    "links": [],
                    "Doc Ok?": "Yes"
                },
                "pageAttributes": [
                    {
                        "Page OK?": null
                    },
                    {
                        "Page OK?": "Yes"
                    }
                ],
                "plainText": {
                    "1": "BRITAIN WILL HONOR EARL AT ABBEY SERVICE ! But Great British Statesman Is to Be Buried Privately SUTTON COURTENAT. England, Feb. eminent men and the press of Grent Britain praised the Earl of Oxford's life of service ed mourned his death, the body of the aged state man, who died at his home here early yesterday, was carried last night to the parish church of Sutton Courtenny. The early will be buried privately and not in Il'estminster Abbey. Tals announcement was made last night by the family. and the decision was in accordance with the special wish expressed by Lord Oxford some time ago Memorial Service A memorial service for the former premier, however, will be held In the abbey at noon February 21. A simple service Tor the family w! be held In the parish church Saturday morning. Praise of the Earl of Oxford and Asquith as a great parliamentarian, a forceful, gracious debater and an the selfish servant of the nation's welfare is contained in thousands of messages of condolence published and received my his widow. All recall his activities In the early days of the war. when. as  ==========  man Is to Be Buried Privately SUTTON COURTENAY. England, Feb. 16. -Wl'hlle eminent men and the press of Grent Britain praised the Earl of Oxford's life of service ed mourned his death, the body of the aged states man, who died at his home here early yesterday, was carried last night to the parish church of Sutton Courtenay, The early will be burled privately and not in l'estminster Abbey. Tals announcement was made last night by the family. and the decision was in accordance with the special wish is. pressed by Lord Oxford some time ago Memorial Service A memorial service for the former premier, however, will be held In the abbey at noon February 21. A simple service Tor the family n! be held in the parish church Saturday morning. 
Praise of the Earl of Oxford and Asquith as a great parliament, a forceful, gracious debater and an the selfish servant of the nation's welfare is contained in thousands of messages of condolence published and received my his widow. All recall his activities In the early days of the war. when. prime minister, he breathed the Britisn ---------- Recall Declaration Many proudly remember his declaration In the face of Germany's seemingly irresistible advance when the  \"* We shall never sheathe the sword which we have not lightly drawn until Beiglum recovers in full measure all and more than all. she had sacrificed. until France is adequately secured against the menace of aggression ;  until the rights of the stiller nationalists of Europe are placed upon an unassailable foundation, and until the mill ward domination of Prussia Is wholly and finally destroyed  \"  ==========  Ottawa House Pays Mead of Tribute OTTAWA, Feb. 16. --Tho prime minister at the opening of the house or commons yesterday afternoon rose to suggest that the house should pause In the midst* of its duties to pay, tribute to the memory of Lord Oxtord and Asquith, Mr. King reminded the house that Lord Oxford's career cox tended over the greater part of half a century and that he had held the post of prime minister continuously for 0 longer period than any who had over held that office. As to his part in the war Premier King stated that the burden of responsibility undoubtedly affected the constitution of the former prime of Britaln and hastened his death. \"  was fitting that members of the Can'1dian committee should join with the members of l'estminster in extending sympathy to the people of Great Britain for the great old that bad been created.  ---------- Bennett Adds Word Hon. R. L. Bennett, leader of the opposition, said that it fell to the leader of the house, the prime minister to extend the sympathy of the people. 
Un behalf of those who sat in opposition he desired to Join In the sympathy that had been expressed. The prime minister of Canada, Mr. Bennett said, might feel that he was u worthy disciple of Mr. Asquith because the latter had held office for some time with the aid of conflicting groups in the house of commons. Mr. Asquith had been a great scholar, n great orator, and had well maintained the noble traditions of parliament. The empire had lost a very fine citizen but he had left behind him n most Inspiring legacy. Robert Gardiner (U. F /  Aendin) speaking on behalf of his Grotius joined In the tribute to a man who would he best remembered r. s. the man who had  \" at heart the Interests of the common people \""
                },
                "review": {
                    "rate": "Ok",
                    "note": "",
                    "reviewerId": "62de356a2f027ab62a00bef1"
                },
                "jobStart": 1689655223,
                "sessionTime": 8,
                "elapsedTime": 8,
                "updateTime": 1689655231,
                "lastUpdate": 1689655232841
            }
        }
    ],
    "ext": "txt"
}
table:TXT Dataset Export With GroundTruth NER Project Summary:

Field Names

Type

Description

source

str

The presigned URL or S3 path of the data source

name

str

The name of the dataset item

itemId

str

The Id of the dataset item

datasetId

str

The Id of the dataset

type

str

The type of the dataset

tags

list

List of tags associated with the dataset item

metadata

dict

Metadata associated with the dataset and dataset item

active

bool

Indicates the dataset item is currently active

project

str

The project associated with the dataset item

taskId

str

The Id of the task

annotations

list

List of dictionaries containing details of annotations

email

str

The email associated with user

messages

list

The messages associated with the user

role

str

The role associated with the user

elapsedTime

int

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

Dictionary containing all the details of the task, such as metadata, tags, and pageOffsets

metadata

dict

The metadata associated with task and project

File

str

The name of the file

TaskId

str

The Id of the task

Type of Project

str

The metadata added in advanced setting of project

absoluteOffsets

bool

Indicates annotation format is absolute entity offset

tags

list

List of dictionaries containing the annotated tags

page

int

Page number of selected text

text

str

The selected text for annotation

id

int

The Id of selected text for annotation

type

str

The type of the label

links

list

The list of relationships

attributes

dict

The document attributes associated with task

pageAttributes

list

List of dictionaries containing the attributes for each page

plainText

dict

Dictionary containing page numbers and the corresponding plain text extracted from the file

dimensions

list

The dimensions of the pages

width

float

The width of the page

height

float

The height of the page

review

dict

The review details

rate

str

The rate of the review

note

str

The note associated with the reviewer

reviewerId

str

The Id of the reviewer

jobStart

int

The start time of the annotation

sessionTime

int

The session time of the annotation

elapsedTime

int

The elapsed time of the annotation

updateTime

int

The update time of the annotation

lastUpdate

int

The last update time

ext

str

The extension of the local file, if any
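
The `tags` and `links` lists in the NER sample above can be combined to read each relationship as text. The Python sketch below assumes that `id1` and `id2` in a link refer to the `id` values of entries in `tags`, which is what the sample suggests but is not stated explicitly. The variable names are illustrative.

```python
from collections import defaultdict

# Tags and links copied from the NER sample above.
content = {
    "tags": [
        {"page": 1, "text": "BRITAIN WILL HONOR", "id": 1, "type": "PERSON"},
        {"page": 1, "text": "Grent Britain", "id": 2, "type": "PERSON"},
        {"page": 1, "text": "February 21", "id": 3, "type": "DATE"},
        {"page": 1, "text": "Earl of Oxford", "id": 4, "type": "ORGANIZATION"},
        {"page": 1, "text": "SUTTON COURTENAY", "id": 5, "type": "ORGANIZATION"},
    ],
    "links": [
        {"page": 1, "id1": 1, "id2": 2, "relationship": "Precede"},
        {"page": 1, "id1": 4, "id2": 5, "relationship": "Precede"},
    ],
}

# Index tags by id so links can be resolved (assumed id1/id2 -> tag id).
tags_by_id = {tag["id"]: tag for tag in content["tags"]}

# Group the annotated text spans by label type.
entities_by_type = defaultdict(list)
for tag in content["tags"]:
    entities_by_type[tag["type"]].append(tag["text"])

# Resolve each link into (source text, relationship, target text).
relations = [
    (
        tags_by_id[link["id1"]]["text"],
        link["relationship"],
        tags_by_id[link["id2"]]["text"],
    )
    for link in content["links"]
]

for label, texts in entities_by_type.items():
    print(label, texts)
for source_text, relationship, target_text in relations:
    print(f"{source_text!r} -[{relationship}]-> {target_text!r}")
```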

3. With GroundTruth Classification Project
{
        "source": "https://sandboxdocuments.tensoract.com/presigned/3c9a446180a44faa43ab2464d45633c7.txt?sig=e5128796abbbc2f33ba9b83af2a755207d9b247a3a624c7992bf8c70a5af621d95cd95c7b63b940d14832d295814310e2b7fff4622944e9ac9815d52ed507311:2e41603c382003a3c456e47b2768981f:64b7848f:2650c588fc2cced10e6118802086d776",
        "name": "business_2.txt",
        "itemId": "2d2020aa8e3deb383fb7c74f",
        "datasetId": "60974d4e9e7759842cdff3be",
        "type": "text/plain",
        "tags": [
            "dataset tag"
        ],
        "metadata": {
            "Dataset Type": "TXT"
        },
        "active": true,
        "project": "591351b938d008ca0745510a",
        "taskId": "8c939483b594e0de5d5efb54",
        "annotations": [
            {
                "email": "johndoe@me.com",
                "messages": [],
                "role": "nlp_qc",
                "elapsedTime": 8,
                "date": "2023-07-18T06:03:46.657Z",
                "content": {
                    "metadata": {
                        "File": "business_2.txt",
                        "TaskId": "8c939483b594e0de5d5efb54",
                        "Type of Project": "Classification"
                    },
                    "classificationTypes": {
                        "Select Type of Document": "select",
                        "Type of Documents": "multi",
                        "Put a note": "text"
                    },
                    "classifications": {
                        "Select Type of Document": [
                            "Technology"
                        ],
                        "Type of Documents": [
                            "Graphics",
                            "Bussiness"
                        ],
                        "Put a note": [
                            "Multi-type document"
                        ]
                    },
                    "plainText": {
                        "1": "Japanese growth grinds to a halt  Growth in Japan evaporated in the three months to September, sparking renewed concern about an economy not long out of a decade-long trough.  Output in the period grew just 0.1%, an annual rate of 0.3%. Exports - the usual engine of recovery - faltered, while domestic demand stayed subdued and corporate investment also fell short. The growth falls well short of expectations, but does mark a sixth straight quarter of expansion.  The economy had stagnated throughout the 1990s, experiencing only brief spurts of expansion amid long periods in the doldrums. One result was deflation - prices falling rather than rising - which made Japanese shoppers cautious and kept them from spending.  The effect was to leave the economy more dependent than ever on exports for its recent recovery. But high oil prices have knocked 0.2% off the growth rate, while the falling dollar means products shipped to the US are becoming relatively more expensive.  The performance for the third quarter marks a sharp downturn from earlier in the year. The first quarter showed annual growth of 6.3%, with the second showing 1.1%, and economists had been predicting as much as 2% this time around. \"Exports slowed while capital spending became weaker,\" said Hiromichi Shirakawa, chief economist at UBS Securities in Tokyo. \"Personal consumption looks good, but it was mainly due to temporary factors such as the Olympics. \"The amber light is flashing.\" The government may now find it more difficult to raise taxes, a policy it will have to implement when the economy picks up to help deal with Japan's massive public debt. "
                    },
                    "review": {
                        "rate": "Ok",
                        "note": "",
                        "reviewerId": "614b55be8af65dcf41da535b"
                    },
                    "jobStart": 1689660217,
                    "sessionTime": 8,
                    "elapsedTime": 8,
                    "updateTime": 1689660225,
                    "pageOffsets": [
                        0
                    ],
                    "lastUpdate": 1689660226654
                }
            }
        ],
        "ext": "txt"
    }
table:TXT Dataset Export With GroundTruth Classification Project Summary:

Field Names

Type

Description

source

str

The presigned URL or S3 path of the data source

name

str

The name of the dataset item

itemId

str

The Id of the dataset item

datasetId

str

The Id of the dataset

type

str

The type of the dataset

tags

list

List of tags associated with the dataset item

metadata

dict

Metadata associated with the dataset and dataset item

active

bool

Indicates the dataset item is currently active

project

str

The project associated with the dataset item

taskId

str

The Id of the task

annotations

list

List of dictionaries containing details of annotations

email

str

The email associated with user

messages

list

The messages associated with the user

role

str

The role associated with the user

elapsedTime

int

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

The content of the annotation

metadata

dict

The metadata associated with task and project

File

str

The name of the file

TaskId

str

The Id of the task

Type of Project

str

The metadata added in advanced setting of project

classificationTypes

dict

Dictionary containing the labels defined in the project

Select Type of Document

str

Single Select Label

Type of Documents

str

Multi Select Label

Put a note

str

Plain Text Label

classifications

dict

Dictionary containing the classification labels selected in the task

plainText

dict

Dictionary containing page numbers and the corresponding plain text extracted from the file

review

dict

The review details

rate

str

The rate of the review

note

str

The note associated with the reviewer

reviewerId

str

The Id of the reviewer

jobStart

int

The start time of the annotation

sessionTime

int

The session time of the annotation

elapsedTime

int

The elapsed time of the annotation

updateTime

int

The update time of the annotation

lastUpdate

int

The last update time

ext

str

The extension of the local file
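
In the classification sample above, every entry in `classifications` is a list, and `classificationTypes` records whether the label is `select`, `multi`, or `text`. The Python sketch below uses that information to flatten single-valued labels into plain values. The function name `flatten_classifications` is hypothetical, and treating `select` and `text` as single-valued is an assumption drawn from the sample only.

```python
from typing import Any, Dict

# Label definitions and values copied from the sample above.
content = {
    "classificationTypes": {
        "Select Type of Document": "select",
        "Type of Documents": "multi",
        "Put a note": "text",
    },
    "classifications": {
        "Select Type of Document": ["Technology"],
        "Type of Documents": ["Graphics", "Bussiness"],
        "Put a note": ["Multi-type document"],
    },
}


def flatten_classifications(content: Dict[str, Any]) -> Dict[str, Any]:
    """Return one value per label: a list for "multi", a scalar otherwise."""
    label_types = content.get("classificationTypes", {})
    flat: Dict[str, Any] = {}
    for label, values in content.get("classifications", {}).items():
        if label_types.get(label) == "multi":
            flat[label] = list(values)
        else:
            # Assumed: "select" and "text" labels carry at most one value.
            flat[label] = values[0] if values else None
    return flat


flat = flatten_classifications(content)
for label, value in flat.items():
    print(f"{label}: {value}")
```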

4. Image Dataset

1. Dataset Export
{
    "source": "s3://newton-ai-internal-share/files/car1.jpeg",
    "name": "car1.jpeg",
    "itemId": "9a0498002663227e1e7d5e14",
    "datasetId": "25987f46e5febb50484e8497",
    "type": "image/jpeg",
    "tags": [
        "dataset tag"
    ],
    "metadata": {
        "Dataset Type": "Image",
        "xxx": 12,
        "presigned": "http://aaa.com"
    },
    "active": true
}
table:Image Dataset Export:

Field Names

Type

Description

source

str

The presigned URL or S3 path of the data source

name

str

The name of the dataset item

itemId

str

The Id of the dataset item

datasetId

str

The Id of the dataset

type

str

Type of the dataset

tags

list

List of tags associated with the dataset item

metadata

dict

Metadata associated with the dataset and dataset item

active

bool

Indicates the dataset item is currently active

ext (local files)

str

Extension of local files, if any

2. With GroundTruth Bulk Image Classification Project
{
    "source": "https://**********/presigned/b2b6d70656c53b10b0a194296a3598b5.tiff?sig=d3c6212eb811deb8c325f38e829d6f542a32e8a171c062d7be0e2125ff7c62628e05b3e66d996edf089745ee35699c48bc7fb7c3db86ada01667cb3f18c54990:6d4aa6c56aa19b8ac45d6630fb465e4d:64b7b5c5:ea2ae71fa9cc9950931e3335ced82926",
    "name": "cyan.tiff",
    "itemId": "93007aa49a9288b9b460528b",
    "datasetId": "25987f46e5febb50484e8497",
    "type": "image/tiff",
    "tags": [
        "color images"
    ],
    "metadata": {
        "Dataset Type": "Image",
        "color": "cyan"
    },
    "active": true,
    "project": "e3c9b4a1dd6df1c4c7091895",
    "taskId": "ff5e6111d67ec09cb7578577",
    "annotations": [],
    "classification": "Cyan",
    "ext": "tiff"
}
table:Image Dataset Export With GroundTruth Bulk Image Classification Project Summary:

Field Names

Type

Description

source

str

The presigned URL or S3 path of the data source

name

str

The name of the dataset item

itemId

str

The Id of the dataset item

datasetId

str

The Id of the dataset

type

str

The type of the dataset

tags

list

List of tags associated with the dataset item

metadata

dict

Metadata associated with the dataset and dataset item

active

bool

Indicates the dataset item is currently active

project

str

The project associated with the dataset item

taskId

str

The Id of the task

classification

str

The classified label of the task

ext

str

The extension of the local file
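
For bulk image classification exports, the label sits in the top-level `classification` field rather than inside `annotations`. The Python sketch below counts labels across a few records. Only the first record comes from the sample above; the other two, and the idea that an unlabeled item simply lacks the `classification` key, are hypothetical and used for illustration.

```python
from collections import Counter

records = [
    # Taken from the sample above (other fields omitted for brevity).
    {"name": "cyan.tiff", "classification": "Cyan"},
    # Hypothetical records, not from the original export.
    {"name": "magenta.tiff", "classification": "Magenta"},
    {"name": "unlabeled.tiff"},
]

# Count items per label; items without a label are grouped separately.
label_counts = Counter(
    record.get("classification", "<unclassified>") for record in records
)

for label, count in label_counts.items():
    print(label, count)
```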

3. With GroundTruth Object Detection Project
{
    "source": "https://sandboxdocuments.tensoract.com/presigned/2ba176b5957d922c0b5867cf99a6895a.jpg?sig=6ef9b9444985b2d637dc53869e894b6b64779f65dc5dfa0a1d23129ed89621c2859f32988ca8fb76ed3d292a441c167dd85dd9987b27ed6997e90e4cd1051584:59e6f8bc53146080d740b01e67328244:64b8a815:f5df07dd0c032aa261ffa1470b799829",
    "name": "im00.jpg",
    "itemId": "9c8b14f93c1fbc5b2fe996ae",
    "datasetId": "abd204685e5c074b282d6744",
    "type": "image/jpeg",
    "tags": [
        "dataset tag 1"
    ],
    "metadata": {
        "Dataset": "Image Dataset"
    },
    "active": true,
    "project": "0461637f62c18082f3c14cc3",
    "taskId": "dff56fb67e79f0cb887263cb",
    "annotations": [
        {
            "email": "jdoeqa@acme.org",
            "messages": [],
            "role": "nlp_qc",
            "elapsedTime": 13,
            "date": "2023-06-17T10:10:29.789Z",
            "content": {
                "url": "https://sandboxdocuments.tensoract.com/presigned/2ba176b5957d922c0b5867cf99a6895a.jpeg?sig=35d229ffb09200bbd28eb9c0ab00d7c2a446f0c85daa6b6204e0b1f043229c1e0b25b6633b477b2145d77d96a0ddd6cb7922e104e72df00583df7c9bed233058:1f88e35afe77cfb705c7dfabce067f20:648ed802:fc875ac8ba21580b86f20874bafee1d9",
                "imageWidth": 720,
                "imageHeight": 1280,
                "selected": null,
                "boxes": [
                    {
                        "x1": 310.7026,
                        "y1": 468.443,
                        "x2": 507.9414,
                        "y2": 1021.1236,
                        "id": "b0",
                        "type": "box",
                        "oid": "b1",
                        "outside_image": {},
                        "occluded": {},
                        "invisible": false,
                        "attrs": {},
                        "title": "",
                        "label": "Group 1",
                        "sub_labels": [
                            {
                                "x1": 357.9577,
                                "y1": 495.1525,
                                "id": "b1",
                                "type": "keypoint",
                                "oid": "b2",
                                "outside_image": {},
                                "occluded": {},
                                "invisible": false,
                                "title": "",
                                "label": "Top of head",
                                "sub_labels": []
                            }
                        ]
                    }
                ],
                "image_attrs": {},
                "review": {
                    "rate": "Rejected",
                    "note": ""
                },
                "jobStart": 1686996615,
                "sessionTime": 13,
                "elapsedTime": 13,
                "tsSeconds": true,
                "updateTime": 1686996628,
                "lastUpdate": 1686996629784
            }
        }
    ],
    "ext": "jpg"
}
table:Image Dataset Export With GroundTruth Object Detection Project Summary:

Field Names

Type

Description

source

str

The presigned URL or S3 path of the data source

name

str

The name of the dataset item

itemId

str

The Id of the dataset item

datasetId

str

The Id of the dataset

type

str

The type of the dataset

tags

list

List of tags associated with the dataset item

metadata

dict

Metadata associated with the dataset and dataset item

active

bool

Indicates the dataset item is currently active

project

str

The project associated with the dataset item

taskId

str

The Id of the task

annotations

list

List of dictionaries representing the annotations

email

str

The email associated with the user

messages

list

The messages associated with the user

role

str

The role associated with user

elapsedTime

int

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

Dictionary containing all the details of the task, such as url, imageWidth, imageHeight, boxes, and review

url

str

The presigned URL or S3 path of the task

imageWidth

int

The width of the image in pixels

imageHeight

int

The height of the image in pixels

boxes

list

List of bounding boxes drawn around objects in the image

x1

float

The x-coordinate of the top-left corner of the bounding box

y1

float

The y-coordinate of the top-left corner of the bounding box

x2

float

The x-coordinate of the bottom-right corner of the bounding box

y2

float

The y-coordinate of the bottom-right corner of the bounding box

id

str

The Id of bounding box

type

str

Indicates whether the object is a box or a keypoint

outside_image

dict

Indicates whether the object extends beyond the boundaries of the image

occluded

dict

Indicates whether the object is occluded or partially hidden

attrs

dict

Represents any additional attributes or properties associated with the object

label

str

The label assigned to the bounding box

sub_labels

list

Represents any sub-labels or sub-categories associated with the object

x1

float

The x-coordinate of the keypoint

y1

float

The y-coordinate of the keypoint

id

str

The Id of keypoint

type

str

Indicates whether the object is a box or a keypoint

outside_image

dict

Indicates whether the object extends beyond the boundaries of the image

occluded

dict

Indicates whether the object is occluded or partially hidden

label

str

The label or category assigned to keypoint

sub_labels

list

Represents any sub-labels associated with the object

image_attrs

dict

The image attributes associated with the task

review

dict

The review details

rate

str

The rate of the review

note

str

The note associated with the reviewer

elapsedTime

int

The elapsed time of the annotation

updateTime

int

The update time of the annotation

lastUpdate

int

The last update time

ext

str

Extension of local files, if any
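
The box coordinates in the object detection sample above can be converted into other common representations. The Python sketch below derives a normalized center/size form from `x1`, `y1`, `x2`, `y2` and lists the keypoints stored in `sub_labels`. It assumes the coordinates are in pixels with the origin at the top-left of the image; the function names `normalize_box` and `keypoints_of` are illustrative and not part of the export.

```python
from typing import Any, Dict, List, Tuple

# Geometry copied from the sample above (other fields omitted).
content = {
    "imageWidth": 720,
    "imageHeight": 1280,
    "boxes": [
        {
            "x1": 310.7026, "y1": 468.443,
            "x2": 507.9414, "y2": 1021.1236,
            "id": "b0", "type": "box", "label": "Group 1",
            "sub_labels": [
                {
                    "x1": 357.9577, "y1": 495.1525,
                    "id": "b1", "type": "keypoint",
                    "label": "Top of head", "sub_labels": [],
                }
            ],
        }
    ],
}


def normalize_box(box: Dict[str, Any], image_width: int, image_height: int) -> Dict[str, Any]:
    """Convert corner coordinates to a normalized center/size form (0..1)."""
    width = box["x2"] - box["x1"]
    height = box["y2"] - box["y1"]
    return {
        "label": box["label"],
        "x_center": (box["x1"] + width / 2) / image_width,
        "y_center": (box["y1"] + height / 2) / image_height,
        "width": width / image_width,
        "height": height / image_height,
    }


def keypoints_of(box: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Return (label, x, y) for every keypoint stored in sub_labels."""
    return [
        (sub["label"], sub["x1"], sub["y1"])
        for sub in box.get("sub_labels", [])
        if sub.get("type") == "keypoint"
    ]


for box in content["boxes"]:
    if box.get("type") == "box":
        print(normalize_box(box, content["imageWidth"], content["imageHeight"]))
        print(keypoints_of(box))
```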

5. Video Dataset

1. Dataset Export
 {
     "source": "s3://test-pocs/mira640.mp4",
     "name": "mira640.mp4",
     "itemId": "a90bf16c2ba7b6e9ac1a6d9d",
     "datasetId": "84431d90c6497d6ab6425dfc",
     "type": "video/mp4",
     "tags": [
     "dataset tag"
     ],
     "metadata": {
         "xxx": 11,
         "presigned": "http://aaa.com"
     },
     "active": true
}
table:Video Dataset Export Summary:

Field Names

Type

Description

source

str

The presigned URL or S3 path of the data source

name

str

The name of the dataset item

itemId

str

The Id of the dataset item

datasetId

str

The Id of the dataset

type

str

Type of the dataset

tags

list

List of tags associated with the dataset item

metadata

dict

The metadata associated with the dataset and dataset item

active

bool

Indicates the dataset item is currently active

ext (local files)

str

Extension of local files, if any

2. With GroundTruth Media Transcription Project
{
         "source": "https://sandboxdocuments.tensoract.com/presigned/da00a38a91f62483658e2126e789f63e.mp4?sig=6ab60805f723d8565a15b6dfacc057e8953f99fbb21d39ff30a8a0da2f39b1cbefb9f28139b8eedc55545d6cd0fadaf56ed14a450ead8f79a48b1d233083ba63:f116065b73cb90ba8f217d0bfee72ab5:64b8acc9:3ab540299e282e90e6ece2f127f7d15a",
         "name": "Video 1.mp4",
         "itemId": "0b9ff34daa6a2f6f95c59bb3",
         "datasetId": "1ac8fb72573008ce5626bbfb",
         "type": "video/mp4",
         "tags": [
             "dataset tag1"
         ],
         "metadata": {
             "Dataset Type": "Video"
         },
         "active": true,
         "project": "c859cc0a92b7bd4d6d166707",
         "taskId": "c53ca1cace7d7e784705b631",
         "annotations": [
             {
                 "email": "jdoeqa@acme.org",
                 "messages": [],
                 "role": "nlp_qc",
                 "elapsedTime": 62,
                 "date": "2023-06-17T12:16:38.937Z",
                 "content": {
                     "review": {
                         "rate": "Ok",
                         "note": "",
                         "reviewerId": "63ca81cd31d698c1825328f3"
                     },
                     "videoSource": "https://sandboxdocuments.tensoract.com/presigned/da00a38a91f62483658e2126e789f63e.mp4?sig=9f13caf4ac583fbc9c074b91770e579120196b3f08e192ad641290c246a7add58f6d6f40ce1aa63e031ada96ff8c996038e4ed46516ffce1f56262cc6a435eeb:062d59d190dc03e090dcd5ee5ff17faa:648ef563:ac1a510bd2fb6c0ab328a3202eb9c846",
                     "streams": {
                         "Transcription": [
                             {
                                 "start": 0.025000260441083017,
                                 "end": 0.9500098967611547,
                                 "confidence": 1,
                                 "text": "Bonjoi"
                             },
                             {
                                 "start": 0.9500098967611547,
                                 "end": 2.0812716817201613,
                                 "confidence": 1,
                                 "text": "Tava Tuti"
                             },
                             {
                                 "start": 2.1187720723817858,
                                 "end": 3.156282880686731,
                                 "confidence": 1,
                                 "text": "Hello"
                             },
                             {
                                 "start": 3.156282880686731,
                                 "end": 4.13629310905087,
                                 "confidence": 1,
                                 "text": "Ola"
                             },
                             {
                                 "start": 4.165043351337061,
                                 "end": 4.608797974166285,
                                 "confidence": 1,
                                 "text": "Tutu beng"
                             },
                             {
                                 "start": 10.453858750849383,
                                 "end": 12.07887567951978,
                                 "confidence": 1,
                                 "text": "oye tutubeng"
                             }
                         ],
                         "Language Segmentation": [
                             {
                                 "start": 0.018750195330812264,
                                 "end": 0.5687559250346386,
                                 "confidence": 1,
                                 "tag": "French"
                             },
                             {
                                 "start": 0.5937561854757216,
                                 "end": 2.0937718119407025,
                                 "confidence": 1,
                                 "tag": "German"
                             },
                             {
                                 "start": 2.0937718119407025,
                                 "end": 3.1650329694568993,
                                 "confidence": 1,
                                 "tag": "English"
                             },
                             {
                                 "start": 3.2437837922305217,
                                 "end": 4.156293298330052,
                                 "confidence": 1,
                                 "tag": "French"
                             },
                             {
                                 "start": 4.250044274984113,
                                 "end": 6.318815826483732,
                                 "confidence": 1,
                                 "tag": "Russian"
                             },
                             {
                                 "start": 6.8338213441595235,
                                 "end": 8.190085473088276,
                                 "confidence": 1,
                                 "tag": "French"
                             },
                             {
                                 "start": 8.321336840403962,
                                 "end": 9.865102922640839,
                                 "confidence": 1,
                                 "tag": "English"
                             },
                             {
                                 "start": 9.915103443523005,
                                 "end": 11.071365488923096,
                                 "confidence": 1,
                                 "tag": "German"
                             },
                             {
                                 "start": 11.233867181790135,
                                 "end": 12.6088815060497,
                                 "confidence": 1,
                                 "tag": "Arabic"
                             }
                         ]
                     },
                     "mediaAttributes": {
                         "Is Video Clear?": "Yes",
                         "Aditional Notes": ""
                     },
                     "jobStart": 1687003443,
                     "sessionTime": 62,
                     "elapsedTime": 93.075,
                     "tsSeconds": true,
                     "updateTime": 1687004194,
                     "metadata": {
                         "File": "Video 1.mp4",
                         "TaskId": "c53ca1cace7d7e784705b631",
                         "Type": "Media Transcription"
                     },
                     "lastUpdate": 1687004198934
                 }
             }
         ],
         "ext": "mp4"
     }
table:Video Dataset Export With GroundTruth Media Transcription Project Summary:

Field Names             Type    Description
source                  str     The presigned URL or S3 path of the data source
name                    str     The name of the dataset item
itemId                  str     The Id of the dataset item
datasetId               str     The Id of the dataset
type                    str     The MIME type of the dataset item (for example, video/mp4)
tags                    list    List of tags associated with the dataset item
metadata                dict    The metadata associated with the dataset and dataset item
active                  bool    Indicates whether the dataset item is currently active
project                 str     The project associated with the dataset item
taskId                  str     The Id of the task
annotations             list    List of dictionaries representing the annotations
email                   str     The email associated with the user
messages                list    The messages associated with the user
role                    str     The role associated with the user
elapsedTime             int     The elapsed time of the annotation
date                    str     The date and time of the annotation (ISO 8601, UTC)
content                 dict    Dictionary containing the review, video source, streams, media attributes, and timing details
review                  dict    The review details
rate                    str     The rating given in the review
note                    str     The note added by the reviewer
reviewerId              str     The Id of the reviewer
videoSource             str     The presigned URL or S3 path of the video used in the task
streams                 dict    Dictionary of the streams within the video, each containing a list of segments
Transcription           list    The stream containing the text transcribed from the video
start                   float   The starting timestamp (in seconds) of the transcribed text segment
end                     float   The ending timestamp (in seconds) of the transcribed text segment
confidence              int     The confidence level of the transcribed text segment
text                    str     The transcribed text for the corresponding segment
Language Segmentation   list    The stream containing information about the segments in the video
start                   float   The starting timestamp (in seconds) of the segment
end                     float   The ending timestamp (in seconds) of the segment
confidence              int     The confidence level of the segment
tag                     str     The tag assigned to the corresponding segment
mediaAttributes         dict    The media attributes associated with the task
jobStart                int     The start time of the annotation (Unix timestamp, in seconds)
sessionTime             int     The session time of the annotation
elapsedTime             float   The elapsed time of the annotation
tsSeconds               bool    Indicates whether the stream timestamps are expressed in seconds
updateTime              int     The update time of the annotation (Unix timestamp, in seconds)
metadata                dict    Dictionary containing metadata of the task and the project
lastUpdate              int     The last update time (Unix timestamp, in milliseconds)
ext (local files)       str     Extension of local files, if any
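
As a minimal sketch of how the streams described above might be read, the following Python snippet walks the annotations of one export record and yields every segment of a named stream. The record is a trimmed copy of the sample above with rounded timestamps, and the helper name iter_segments is hypothetical; it is not part of the platform.

```python
import json

# Trimmed copy of the export sample above (timestamps rounded for brevity).
record_json = """
{
  "annotations": [
    {
      "email": "jdoeqa@acme.org",
      "content": {
        "streams": {
          "Transcription": [
            {"start": 0.025, "end": 0.95, "confidence": 1, "text": "Bonjoi"},
            {"start": 0.95, "end": 2.081, "confidence": 1, "text": "Tava Tuti"}
          ],
          "Language Segmentation": [
            {"start": 0.019, "end": 0.569, "confidence": 1, "tag": "French"}
          ]
        }
      }
    }
  ]
}
"""


def iter_segments(record, stream_name):
    """Yield (start, end, label) for each segment of one stream.

    Hypothetical helper: transcription segments carry "text",
    segmentation segments carry "tag", so either is used as the label.
    """
    for annotation in record.get("annotations", []):
        streams = annotation.get("content", {}).get("streams", {})
        for segment in streams.get(stream_name, []):
            label = segment.get("text", segment.get("tag", ""))
            yield segment["start"], segment["end"], label


record = json.loads(record_json)
for start, end, label in iter_segments(record, "Transcription"):
    print(f"{start:7.3f} - {end:7.3f}  {label}")
```

Passing "Language Segmentation" as the stream name returns the tagged segments in the same way. Stream names are taken from the sample and may differ between projects.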

6. Audio Dataset

1. Dataset Export
{
    "source": "s3://test-pocs/mira640.mp3",
    "name": "mira640.mp3",
    "itemId": "4012452ecf60608061c2baed",
    "datasetId": "a38d3e9d800fc55e079a3b1d",
    "type": "audio/mpeg",
    "tags": [
        "dataset tag 1"
    ],
    "metadata": {
        "DATASET Type": "Audio",
        "xxx": 11,
        "presigned": "http://aaa.com"
    },
    "active": true
}
table:Audio Dataset Export Summary:

Field Names             Type    Description
source                  str     The presigned URL or S3 path of the data source
name                    str     The name of the dataset item
itemId                  str     The Id of the dataset item
datasetId               str     The Id of the dataset
type                    str     The MIME type of the dataset item (for example, audio/mpeg)
tags                    list    List of tags associated with the dataset item
metadata                dict    The metadata associated with the dataset and dataset item
active                  bool    Indicates whether the dataset item is currently active
ext (local files)       str     Extension of local files, if any
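
The field types listed above can be checked mechanically before a record is processed further. The sketch below is one possible way to do that in Python, using the sample record above; the names EXPECTED_TYPES, check_record, and split_s3_source are hypothetical and only illustrate the idea. The optional ext field is left out of the check because it is present only for local files.

```python
import json
from urllib.parse import urlparse

# Expected Python types for the required fields, as listed in the table above.
EXPECTED_TYPES = {
    "source": str, "name": str, "itemId": str, "datasetId": str,
    "type": str, "tags": list, "metadata": dict, "active": bool,
}


def check_record(record):
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for field, expected in EXPECTED_TYPES.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


def split_s3_source(source):
    """Split an s3://bucket/key path into (bucket, key); None for other URLs."""
    parsed = urlparse(source)
    if parsed.scheme != "s3":
        return None
    return parsed.netloc, parsed.path.lstrip("/")


record = json.loads("""
{
    "source": "s3://test-pocs/mira640.mp3",
    "name": "mira640.mp3",
    "itemId": "4012452ecf60608061c2baed",
    "datasetId": "a38d3e9d800fc55e079a3b1d",
    "type": "audio/mpeg",
    "tags": ["dataset tag 1"],
    "metadata": {"DATASET Type": "Audio", "xxx": 11, "presigned": "http://aaa.com"},
    "active": true
}
""")

print(check_record(record))
print(split_s3_source(record["source"]))
```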

3. With GroundTruth Media Transcription Project
{
         "source": "s3://test-pocs/mira640.mp3",
         "name": "mira640.mp3",
         "itemId": "1c944e058058dbe48c980ead",
         "datasetId": "edd9c7d7ae5c1b0c2bc73643",
         "type": "audio/mpeg",
         "tags": [
             "dataset tag"
         ],
         "metadata": {
             "Dataset Type": "Audio"
         },
         "active": true,
         "project": "c859cc0a92b7bd4d6d166707",
         "taskId": "eba777874d6d268ece56b33a",
         "annotations": [
             {
                 "email": "johndoe@me.com",
                 "messages": [],
                 "role": "nlp_qc",
                 "elapsedTime": 6,
                 "date": "2023-07-19T04:01:20.036Z",
                 "content": {
                     "review": {
                         "rate": "Ok",
                         "note": "",
                         "reviewerId": "614b55be8af65dcf41da535b"
                     },
                     "audioSource": "https://test-pocs.s3.amazonaws.com/mira640.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUD4REC47DTY4PF7A%2F20230719%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230719T040111Z&X-Amz-Expires=7200&X-Amz-Signature=10048222e39f53cd9dbbb2b97b8aa210e323d8fc544a9ea18ff46e5922974f44&X-Amz-SignedHeaders=host",
                     "streams": {
                         "Transcription": [
                             {
                                 "start": 0.018749917353218702,
                                 "end": 0.6312472175583629,
                                 "confidence": 1,
                                 "text": "Bonjoi"
                             },
                             {
                                 "start": 0.7187468318733835,
                                 "end": 1.881241707772943,
                                 "confidence": 1,
                                 "text": "Tava Tuti"
                             },
                             {
                                 "start": 2.0124911292454737,
                                 "end": 2.4874890355270143,
                                 "confidence": 1,
                                 "text": "Hello"
                             }
                         ],
                         "Language Segmentation": [
                             {
                                 "start": 0.006249972451072901,
                                 "end": 0.4437480440261759,
                                 "confidence": 1,
                                 "tag": "French"
                             },
                             {
                                 "start": 0.5437476032433424,
                                 "end": 1.8687417628707974,
                                 "confidence": 1,
                                 "tag": "German"
                             },
                             {
                                 "start": 1.9187415424793806,
                                 "end": 2.6687382366081285,
                                 "confidence": 1,
                                 "tag": "English"
                             }
                         ]
                     },
                     "mediaAttributes": {
                         "Is Video Clear?": "Yes",
                         "Aditional Notes": ""
                     },
                     "jobStart": 1689739272,
                     "sessionTime": 6,
                     "elapsedTime": 6,
                     "tsSeconds": true,
                     "updateTime": 1689739278,
                     "lastUpdate": 1689739280033
                 }
             }
         ]
     }
table:Audio Dataset Export With GroundTruth Media Transcription Project Summary:

Field Names             Type    Description
source                  str     The presigned URL or S3 path of the data source
name                    str     The name of the dataset item
itemId                  str     The Id of the dataset item
datasetId               str     The Id of the dataset
type                    str     The MIME type of the dataset item (for example, audio/mpeg)
tags                    list    List of tags associated with the dataset item
metadata                dict    The metadata associated with the dataset and dataset item
active                  bool    Indicates whether the dataset item is currently active
project                 str     The project associated with the dataset item
taskId                  str     The Id of the task
annotations             list    List of dictionaries representing the annotations
email                   str     The email associated with the user
messages                list    The messages associated with the user
role                    str     The role associated with the user
elapsedTime             int     The elapsed time of the annotation
date                    str     The date and time of the annotation (ISO 8601, UTC)
content                 dict    Dictionary containing the review, audio source, streams, media attributes, and timing details
review                  dict    The review details
rate                    str     The rating given in the review
note                    str     The note added by the reviewer
reviewerId              str     The Id of the reviewer
audioSource             str     The presigned URL or S3 path of the audio used in the task
streams                 dict    Dictionary of the streams within the audio, each containing a list of segments
Transcription           list    The stream containing the text transcribed from the audio
start                   float   The starting timestamp (in seconds) of the transcribed text segment
end                     float   The ending timestamp (in seconds) of the transcribed text segment
confidence              int     The confidence level of the transcribed text segment
text                    str     The transcribed text for the corresponding segment
Language Segmentation   list    The stream containing information about the segments in the audio
start                   float   The starting timestamp (in seconds) of the segment
end                     float   The ending timestamp (in seconds) of the segment
confidence              int     The confidence level of the segment
tag                     str     The tag assigned to the corresponding segment
mediaAttributes         dict    The media attributes associated with the task
jobStart                int     The start time of the annotation (Unix timestamp, in seconds)
sessionTime             int     The session time of the annotation
elapsedTime             int     The elapsed time of the annotation
tsSeconds               bool    Indicates whether the stream timestamps are expressed in seconds
updateTime              int     The update time of the annotation (Unix timestamp, in seconds)
metadata                dict    Dictionary containing metadata of the task and the project
lastUpdate              int     The last update time (Unix timestamp, in milliseconds)
ext (local files)       str     Extension of local files, if any
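
The time fields in content do not all use the same unit. In the sample above, jobStart and updateTime look like Unix timestamps in seconds, while lastUpdate has thirteen digits and matches the date field only when read as milliseconds. The following Python sketch converts all three to UTC datetimes under that assumption; the helper to_utc and the threshold it uses are hypothetical, inferred from the sample values rather than from a documented rule.

```python
from datetime import datetime, timedelta, timezone

# Time fields copied from the "content" object of the sample above.
content = {
    "jobStart": 1689739272,
    "updateTime": 1689739278,
    "lastUpdate": 1689739280033,
}

# Assumption: values this large are milliseconds, smaller ones are seconds.
MILLIS_THRESHOLD = 10**11


def to_utc(value):
    """Convert an epoch value in seconds or milliseconds to an aware UTC datetime."""
    if value >= MILLIS_THRESHOLD:
        seconds, millis = divmod(value, 1000)
    else:
        seconds, millis = value, 0
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


for field in ("jobStart", "updateTime", "lastUpdate"):
    print(f"{field:<11}{to_utc(content[field]).isoformat()}")
```

Integer arithmetic is used for the millisecond case so that the fractional part is not affected by floating-point rounding.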

Project Exports

1. OCR Project

{
     "project_id": "7b3020dd437ce2a30bae1c5a",
     "project_name": "Test-OCR-Project-1",
     "project_type": "OCR",
     "datasetId": "e3773b85655ea8646005158a",
     "itemId": "9cebea4c95edc877ca6f2603",
     "file_name": "ABSTRACT - Axia.tiff",
     "file_type": "application/pdf",
     "source": "https://sandboxdocuments.tensoract.com/presigned/a01f5c95d843b4fd4f890570e5cac51c.pdf?sig=fc8cd601e4b6d76de180378b1663c1b8b1ac21c2a82fd7909bf959bce43964344830f137475dd53602d458cbd780b08626379b8329f5dea96f7bdf78b727d5f2:60847ae533c350f801adca47a54b6cfb:64cdee0b:bad3e8fb52c965453a0f9fd8ffde6c9e",
     "state": 4,
     "task_id": "0931952ce4a27f53a3678cfe",
     "state_description": "Approved",
     "annotations": [
         {
             "email": "yannevarsha6@gmail.com",
             "messages": [],
             "role": "Reviewer",
             "elapsedTime": 14,
             "date": "2023-08-04T06:21:00.589Z",
             "content": {
                 "pdf_fingerprint": "c04f692d342c06d433f751ac32c6d8b1",
                 "metadata": {
                     "ocr_model": "Textract (default)",
                     "use-textract-only": true,
                     "source_ref": "/uploads/e3773b85655ea8646005158a/9cebea4c95edc877ca6f2603",
                     "document_id": "9cebea4c95edc877ca6f2603",
                     "Type of Project": "OCR"
                 },
                 "tags": [
                     {
                         "page": 1,
                         "text": "N A M E",
                         "id": 1,
                         "type": "Name",
                         "kv_type": "key",
                         "words": [
                             "N",
                             "A",
                             "M",
                             "E"
                         ],
                         "boxes": [
                             [
                                 0.06499018520116806,
                                 0.11739349365234375,
                                 0.07347860559821129,
                                 0.12546881940215826
                             ],
                             [
                                 0.06458062678575516,
                                 0.13079734146595,
                                 0.0742951761931181,
                                 0.1387380100786686
                             ],
                             [
                                 0.06520503759384155,
                                 0.14403623342514038,
                                 0.07536023296415806,
                                 0.15211013052612543
                             ],
                             [
                                 0.06526166200637817,
                                 0.15757058560848236,
                                 0.07337938901036978,
                                 0.16564789321273565
                             ]
                         ],
                         "range": [
                             [
                                 71,
                                 72
                             ],
                             [
                                 126,
                                 127
                             ],
                             [
                                 165,
                                 166
                             ],
                             [
                                 194,
                                 195
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Axia Women's Health",
                         "id": 2,
                         "type": "Name",
                         "textAdjust": "Axia Women's",
                         "kv_type": "value",
                         "words": [
                             "Axia",
                             "Women's",
                             "Health"
                         ],
                         "boxes": [
                             [
                                 0.0935770571231842,
                                 0.11707708239555359,
                                 0.11941905505955219,
                                 0.1253887191414833
                             ],
                             [
                                 0.12276646494865417,
                                 0.11710146069526672,
                                 0.17684946581721306,
                                 0.1254600789397955
                             ],
                             [
                                 0.18119750916957855,
                                 0.11732043325901031,
                                 0.21823260188102722,
                                 0.12542327493429184
                             ]
                         ],
                         "range": [
                             [
                                 73,
                                 77
                             ],
                             [
                                 78,
                                 85
                             ],
                             [
                                 86,
                                 92
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "BILL TO",
                         "id": 3,
                         "type": "Name",
                         "rawBox": true,
                         "kv_type": "key",
                         "words": [
                             "BILL TO"
                         ],
                         "boxes": [
                             [
                                 0.4980276134122288,
                                 0.10967250571210967,
                                 0.5374753451676528,
                                 0.1706016755521706
                             ]
                         ],
                         "range": []
                     },
                     {
                         "page": 1,
                         "text": "Regional Womens Health",
                         "id": 4,
                         "type": "Name",
                         "rotate": 24,
                         "rawBox": true,
                         "kv_type": "value",
                         "words": [
                             "Regional Womens Health"
                         ],
                         "boxes": [
                             [
                                 0.5473372781065089,
                                 0.11119573495811119,
                                 0.7682445759368837,
                                 0.12795125666412796
                             ]
                         ],
                         "range": []
                     },
                     {
                         "page": 1,
                         "text": "Cat.",
                         "id": 5,
                         "type": "Name",
                         "table": {
                             "id": 4,
                             "x": 0,
                             "y": 1,
                             "cell": true
                         },
                         "kv_type": "key",
                         "words": [
                             "Cat."
                         ],
                         "boxes": [
                             [
                                 0.39583876729011536,
                                 0.3084534704685211,
                                 0.4190108198672533,
                                 0.31684120278805494
                             ]
                         ],
                         "range": [
                             [
                                 543,
                                 547
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Cat.",
                         "id": 6,
                         "type": "TABLEHEADER",
                         "table": {
                             "id": 4,
                             "x": 0,
                             "y": 1,
                             "cell": true
                         },
                         "words": [
                             "Cat."
                         ],
                         "boxes": [
                             [
                                 0.39583876729011536,
                                 0.3084534704685211,
                                 0.4190108198672533,
                                 0.31684120278805494
                             ]
                         ],
                         "range": [
                             [
                                 543,
                                 547
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Description",
                         "id": 7,
                         "type": "Name",
                         "table": {
                             "id": 4,
                             "x": 1,
                             "y": 1,
                             "cell": true
                         },
                         "kv_type": "key",
                         "words": [
                             "Description"
                         ],
                         "boxes": [
                             [
                                 0.4328092038631439,
                                 0.3084268271923065,
                                 0.49752890318632126,
                                 0.3184952298179269
                             ]
                         ],
                         "range": [
                             [
                                 548,
                                 559
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Description",
                         "id": 8,
                         "type": "TABLEHEADER",
                         "table": {
                             "id": 4,
                             "x": 1,
                             "y": 1,
                             "cell": true
                         },
                         "words": [
                             "Description"
                         ],
                         "boxes": [
                             [
                                 0.4328092038631439,
                                 0.3084268271923065,
                                 0.49752890318632126,
                                 0.3184952298179269
                             ]
                         ],
                         "range": [
                             [
                                 548,
                                 559
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Effective",
                         "id": 9,
                         "type": "TABLEHEADER",
                         "table": {
                             "id": 4,
                             "x": 3,
                             "y": 0,
                             "cell": true
                         },
                         "words": [
                             "Effective"
                         ],
                         "boxes": [
                             [
                                 0.6239141225814819,
                                 0.2947663366794586,
                                 0.6735980845987797,
                                 0.30344805866479874
                             ]
                         ],
                         "range": [
                             [
                                 476,
                                 485
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Sqft.",
                         "id": 10,
                         "type": "Name",
                         "table": {
                             "id": 4,
                             "x": 2,
                             "y": 1,
                             "cell": true
                         },
                         "kv_type": "key",
                         "words": [
                             "Sqft."
                         ],
                         "boxes": [
                             [
                                 0.5750880241394043,
                                 0.30830204486846924,
                                 0.6010445598512888,
                                 0.3183623990043998
                             ]
                         ],
                         "range": [
                             [
                                 560,
                                 565
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Sqft.",
                         "id": 11,
                         "type": "TABLEHEADER",
                         "table": {
                             "id": 4,
                             "x": 2,
                             "y": 1,
                             "cell": true
                         },
                         "words": [
                             "Sqft."
                         ],
                         "boxes": [
                             [
                                 0.5750880241394043,
                                 0.30830204486846924,
                                 0.6010445598512888,
                                 0.3183623990043998
                             ]
                         ],
                         "range": [
                             [
                                 560,
                                 565
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "ABA",
                         "id": 12,
                         "type": "TABLECELL",
                         "table": {
                             "id": 4,
                             "x": 0,
                             "y": 2,
                             "cell": true
                         },
                         "words": [
                             "ABA"
                         ],
                         "boxes": [
                             [
                                 0.3953396677970886,
                                 0.3291471600532532,
                                 0.42196371778845787,
                                 0.3373938351869583
                             ]
                         ],
                         "range": [
                             [
                                 626,
                                 629
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Date",
                         "id": 13,
                         "type": "Name",
                         "table": {
                             "id": 4,
                             "x": 3,
                             "y": 1,
                             "cell": true
                         },
                         "kv_type": "key",
                         "words": [
                             "Date"
                         ],
                         "boxes": [
                             [
                                 0.6240901350975037,
                                 0.3085164725780487,
                                 0.6510729901492596,
                                 0.31685456447303295
                             ]
                         ],
                         "range": [
                             [
                                 566,
                                 570
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Date",
                         "id": 14,
                         "type": "TABLEHEADER",
                         "table": {
                             "id": 4,
                             "x": 3,
                             "y": 1,
                             "cell": true
                         },
                         "words": [
                             "Date"
                         ],
                         "boxes": [
                             [
                                 0.6240901350975037,
                                 0.3085164725780487,
                                 0.6510729901492596,
                                 0.31685456447303295
                             ]
                         ],
                         "range": [
                             [
                                 566,
                                 570
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "Rent Abatements/Cor",
                         "id": 15,
                         "type": "TABLECELL",
                         "table": {
                             "id": 4,
                             "x": 1,
                             "y": 2,
                             "cell": true
                         },
                         "words": [
                             "Rent",
                             "Abatements/Cor"
                         ],
                         "boxes": [
                             [
                                 0.4329037368297577,
                                 0.3290809392929077,
                                 0.4603371527045965,
                                 0.3374354373663664
                             ],
                             [
                                 0.46285462379455566,
                                 0.32896438241004944,
                                 0.5594801902770996,
                                 0.3374544633552432
                             ]
                         ],
                         "range": [
                             [
                                 630,
                                 634
                             ],
                             [
                                 635,
                                 649
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "4,850",
                         "id": 16,
                         "type": "TABLECELL",
                         "table": {
                             "id": 4,
                             "x": 2,
                             "y": 2,
                             "cell": true
                         },
                         "words": [
                             "4,850"
                         ],
                         "boxes": [
                             [
                                 0.5759893655776978,
                                 0.3291241228580475,
                                 0.6087189093232155,
                                 0.3381931884214282
                             ]
                         ],
                         "range": [
                             [
                                 650,
                                 655
                             ]
                         ]
                     },
                     {
                         "page": 1,
                         "text": "6/15/2021",
                         "id": 17,
                         "type": "TABLECELL",
                         "table": {
                             "id": 4,
                             "x": 3,
                             "y": 2,
                             "cell": true
                         },
                         "words": [
                             "6/15/2021"
                         ],
                         "boxes": [
                             [
                                 0.6162644028663635,
                                 0.32898813486099243,
                                 0.6728598773479462,
                                 0.3374910345301032
                             ]
                         ],
                         "range": [
                             [
                                 656,
                                 665
                             ]
                         ]
                     }
                 ],
                 "pageOffsets": [
                     0,
                     3355,
                     5983
                 ],
                 "links": [
                     {
                         "page": 1,
                         "id1": 1,
                         "id2": 2,
                         "relationship": "key-pair"
                     },
                     {
                         "page": 1,
                         "id1": 3,
                         "id2": 4,
                         "relationship": "key-pair"
                     }
                 ],
                 "attributes": {
                     "Is document damaged": "No"
                 },
                 "pageAttributes": [
                     {
                         "Is page damaged?": "No"
                     }
                 ],
                 "tables": [
                     {
                         "x": [
                             0.3953396677970886,
                             0.4273864608258009,
                             0.567284107208252,
                             0.6124916560947895,
                             0.6735980845987797
                         ],
                         "y": [
                             0.2947663366794586,
                             0.305875051766634,
                             0.32372980611398816,
                             0.3381931884214282
                         ],
                         "rows": 3,
                         "cols": 4,
                         "box": [
                             0.3953396677970886,
                             0.2947663366794586,
                             0.6735980845987797,
                             0.3381931884214282
                         ],
                         "id": 4,
                         "page": 1,
                         "mergedList": null,
                         "description": "Table 1"
                     }
                 ],
                 "plainText": {
                     "1": "Lease Id: PR0001 - 000222 Lease Profile Master Occupant Id: 00000162-1 N Axia Women's Health B Regional Womens Health Managem A HP Main Line LLC I T 227 Laurel Road M L o Echelon One, Suite 300 E Bryn Mawr PA 19010 L Voorhees NJ 08043 Legal Name: Regional Womens Health Management Tenant Id: Contact Name: Jenni Witters Tenant Type Id: Phone No: SIC Group: Fax No: NAICS Code Lease Stop: No Suite Information Current Recurring Charges Building Id: PR0001 Execution: 3/15/2021 Effective Monthly Annual Amount Suite Id: 401 Beginning: 6/15/2021 Cat. Description Sqft. Date Amount Amount PSF Lease Id: 000222 Occupancy: 9/1/2021 ABA Rent Abatements/Cor 4,850 6/15/2021 -12,125.00 -145,500.00 -30.00 Leased Sqft: 4,850 Rent Start: 6/15/2021 ABA Rent Abatements/Cor 4,850 12/1/2021 0.00 0.00 0.00 Pro-Rata Share: 0.17 Expiration: 9/30/2028 ROF Base Rent Office 4,850 6/15/2021 12,125.00 145,500.00 30.00 Ann. Mkt. Rent PSF: 0.00 Vacate: TIC Tenant Improvement 4,850 11/1/2021 3,059.54 36,714.48 7.57 UTI Utility Reimbursement 4,850 6/15/2021 808.33 9,699.96 2.00 Occupancy Status: Current Rate Change Schedule Effective Monthly Annual Amount Cat. Description Sqft. 
Date Amount Amount PSF ABA Rent Abatements/Con 4,850 11/1/2021 -2,575.00 -30,900.00 -6.37 ROF Base Rent Office 4,850 7/1/2022 12,367.50 148,410.00 30.60 ROF Base Rent Office 4,850 7/1/2023 12,614.04 151,368.48 31.21 ROF Base Rent Office 4,850 7/1/2024 12,868.67 154,424.04 31.84 ROF Base Rent Office 4,850 7/1/2025 13,123.29 157,479.48 32.47 ROF Base Rent Office 4,850 7/1/2026 13,386.00 160,632.00 33.12 ROF Base Rent Office 4,850 7/1/2027 13,652.75 163,833.00 33.78 ROF Base Rent - Office 4,850 7/1/2028 13,927.58 167,130.96 34.46 Lease Notes Effective Date Ref 1 Ref 2 Note 3/15/2021 ALTERTN Article 8 of Lease Landlord's consent required for any alterations, other than cosmetic Alterations which do not cost more than $1,000 per alteration and which do not affect (i) the structural portions or roof of the Premises or the 3/15/2021 ASGNSUB Article 9 Landlord consent required for any assignment/sublease. Landlord has 30 days after receipt of notice from Tenant to either approve assignment/sublease, not approve assignment/sublease, recapture the Premises 3/15/2021 DEFAULT Article 18 of Lease 1. If Tenant does not make payment within 5 days after date due, provided that, Landlord shall not more than 1 time per 12 full calendar month period of the term, deliver written notice to Tenant with respect to 3/15/2021 ESTOPEL Article 17 of Lease Estoppel required to be provided within 10 days after request. In the form set forth in Exhibit D 3/15/2021 HOLDOVR Section 19 (b) of Lease Landlord may either (i) increase Rent to 200% of the highest monthly aggregate Fixed Rent and additional 3/15/2021 INS Article 11 - Landlord responsible for repairs to all plumbing and other fixtures, equipment and systems (including replacement, if necessary) in or serving the Premises. Landlord to provide janitorial services (Exhibit E) and pest control as needed. 
3/15/2021 LATECHG Article 3 of Lease Tenant shall pay Landlord a service and handling charge equal to five percent (5%) of any Rent not paid within five (5) days after the date first due, which shall apply cumulatively each month with respect to Report Id WEBX_PROFILE Database HAVERFORD Reported by Joe Staugaard 1/7/2022 11:50 Page 1"
                 },
                 "dimensions": [
                     {
                         "width": 1275,
                         "height": 1650
                     },
                     {
                         "width": 1275,
                         "height": 1650
                     }
                 ],
                 "review": {
                     "rate": "Ok",
                     "note": "",
                     "reviewerId": "61685a5eb492d0845eb5e6b4"
                 },
                 "jobStart": 1691128396,
                 "sessionTime": 14,
                 "elapsedTime": 86,
                 "updateTime": 1691130059,
                 "selectBoundingBox": true,
                 "lastUpdate": 1691130060583
             }
         }
     ]
 }
table:OCR-Project-Manifest:

Field Names | Type | Description
project_id | str | The Id of the project
project_name | str | The name of the project
project_type | str | The type of the project
datasetId | str | The Id of the dataset
itemId | str | The Id of the dataset item
file_name | str | The name of the file
file_type | str | The type of the file
source | str | Internal source file reference on local storage disk
state | int | The state of the task
task_id | str | The Id of the task
state_description | str | The state description of the task
annotations | list | List of dictionaries representing the annotations
email | str | The email associated with the user
messages | list | The messages associated with the user
role | str | The role associated with the user
elapsed_time | str | The elapsed time of the annotation
date | str | The date of the annotation
content | dict | Dictionary containing all the details of the task, such as metadata, tags, and page offsets
pdf_fingerprint | str | The fingerprint of the document
metadata | dict | Dictionary containing metadata of the task and the project
ocr_model | str | The OCR model used for processing
use_textract_only | bool | Indicates whether only Textract is used for processing
source_ref | str | The reference to the source of the document
document_id | str | The Id of the document
tags | list | List of dictionaries containing the tags added in the task
page | int | The page number for selected text
text | str | The selected text for annotation
id | int | The Id of selected text for annotation
type | str | The type of the label
kv_type | str | Flag to indicate whether tag is key or value (KEY/VAL)
words | list | The words in the selected text
boxes | list | List of bounding box coordinates for OCRed words
range | list | List of selected text box start offset and end offset using plaintext
textAdjust | str | Modified OCRed text
rawbox | bool | Flag to indicate if bounding box is created manually
rotate | str | The angle of bounding box rotation (degrees)
table | dict | The table information
id | int | The Id of the table
x | int | The vertical grid coordinates
y | int | The horizontal grid coordinates
cell | bool | Flag to indicate if the current object is a cell of the table
pageOffsets | list | List of page offsets
links | list | List containing the relationships added in the task
page | int | The page number associated with key and value field
id1 | int | The Id of the key field
id2 | int | The Id of the value field
relationship | str | The name of the relationship
attributes | dict | The document attributes associated with the task
pageAttributes | list | List of dictionaries containing the attributes for each page
tables | list | List of dictionaries containing table information
x | list | The vertical grid coordinates
y | list | The horizontal grid coordinates
rows | int | The number of rows in the table
cols | int | The number of columns in the table
box | list | List of bounding box coordinates of the table
id | int | The Id of the table
page | int | The page number of the table
description | str | The title of the table
plainText | dict | Dictionary containing page numbers and the corresponding plain text extracted from the file
dimensions | list | List of dictionaries containing dimensions of pages in the task
width | float | The width of the page
height | float | The height of the page
review | dict | The review details
rate | str | The rate of the review
note | str | The note associated with the reviewer
reviewerId | str | The Id of the reviewer
jobStart | int | The start time of the annotation
sessionTime | int | The session time of the annotation
elapsedTime | int | The elapsed time of the annotation
updateTime | int | The update time of the annotation
lastUpdate | int | The last update time
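
As an illustration of how the OCR fields above fit together, the following Python sketch groups tagged table cells into a grid and scales a bounding box to pixels. It is not part of the product; the helper names `table_cells` and `box_to_pixels` are hypothetical. It assumes, based only on the example manifest, that `table.x` and `table.y` on a tag are column and row positions, that boxes are `[x_min, y_min, x_max, y_max]` fractions of the page size, and that `page` is 1-based while `dimensions` is a 0-based list.

```python
from collections import defaultdict

# Trimmed excerpt of one annotation's "content" object, copied from the example above.
content = {
    "tags": [
        {"page": 1, "text": "Date", "id": 14, "type": "TABLEHEADER",
         "table": {"id": 4, "x": 3, "y": 1, "cell": True},
         "boxes": [[0.6240901350975037, 0.3085164725780487,
                    0.6510729901492596, 0.31685456447303295]]},
        {"page": 1, "text": "4,850", "id": 16, "type": "TABLECELL",
         "table": {"id": 4, "x": 2, "y": 2, "cell": True},
         "boxes": [[0.5759893655776978, 0.3291241228580475,
                    0.6087189093232155, 0.3381931884214282]]},
        {"page": 1, "text": "6/15/2021", "id": 17, "type": "TABLECELL",
         "table": {"id": 4, "x": 3, "y": 2, "cell": True},
         "boxes": [[0.6162644028663635, 0.32898813486099243,
                    0.6728598773479462, 0.3374910345301032]]},
    ],
    "dimensions": [{"width": 1275, "height": 1650}],
}


def table_cells(content):
    """Group tagged cells by table id, keyed by (y, x).

    Assumption: y is the row position and x is the column position.
    """
    grids = defaultdict(dict)
    for tag in content.get("tags", []):
        cell = tag.get("table")
        if cell and cell.get("cell"):
            grids[cell["id"]][(cell["y"], cell["x"])] = tag["text"]
    return dict(grids)


def box_to_pixels(box, page_size):
    """Scale a fractional [x_min, y_min, x_max, y_max] box to whole pixels.

    Assumption: box values are fractions of the page width and height.
    """
    width, height = page_size["width"], page_size["height"]
    x_min, y_min, x_max, y_max = box
    return [round(x_min * width), round(y_min * height),
            round(x_max * width), round(y_max * height)]


if __name__ == "__main__":
    for table_id, cells in table_cells(content).items():
        for (row, col), text in sorted(cells.items()):
            print(table_id, row, col, text)

    first_tag = content["tags"][0]
    # Assumption: "page" is 1-based, so page 1 maps to dimensions[0].
    page_size = content["dimensions"][first_tag["page"] - 1]
    print(box_to_pixels(first_tag["boxes"][0], page_size))
```

If the coordinate conventions in your export differ from these assumptions, adjust the key order in `table_cells` and the scaling in `box_to_pixels` accordingly.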

2. NER Project

{
    "project_id": "866ad732042bde9b94929cc3",
    "project_name": "NER-Project-DB",
    "project_type": "NER",
    "datasetId": "8d9736f30411ae81fa4983d4",
    "itemId": "0ed98ab31666242a417504f9",
    "file_name": "1810.04805.pdf",
    "file_type": "application/pdf",
    "source": "https://sandboxdocuments.tensoract.com/presigned/33e268b66cb90138b84cc627a501afa2.pdf?sig=64e0f921a163164ebdac2b74a35f80c4dee52434a405f990a9163ea306ebb99cb1ee12cb6fba3d313a531f39c0f9195083dbb2582d9d397a00553ea403d7cc4e:102e036d864eee6141450c9ad545cf66:64b8d6cf:1e215800c51838b6308d8fb24fc60adc",
    "state": 4,
    "task_id": "d6aae2114d0947b1bfe5dcd3",
    "state_description": "Approved",
    "annotations": [
        {
            "email": "q1@qc.com",
            "messages": [],
            "role": "Reviewer",
            "elapsedTime": 18,
            "date": "2023-07-17T09:11:08.530Z",
            "content": {
                "pdf_fingerprint": "dccb9bc542f22b2bdd94110918c68f96",
                "metadata": {
                    "File": "1810.04805.pdf",
                    "TaskId": "d6aae2114d0947b1bfe5dcd3",
                    "Type of Project": "NER"
                },
                "tags": [
                    {
                        "page": 1,
                        "range": [
                            0,
                            80
                        ],
                        "text": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
                        "id": 1,
                        "type": "DATE",
                        "box": [
                            0.1957394553114858,
                            0.08355623157419612,
                            0.8080743211552288,
                            0.11953028994286674
                        ]
                    },
                    {
                        "page": 1,
                        "range": [
                            81,
                            93
                        ],
                        "text": "Jacob Devlin",
                        "id": 2,
                        "type": "PERSON",
                        "box": [
                            0.20464120844784606,
                            0.15506947083348188,
                            0.31506005550366556,
                            0.16926990081839677
                        ]
                    },
                    {
                        "page": 1,
                        "range": [
                            94,
                            108
                        ],
                        "text": "Ming-Wei Chang",
                        "id": 3,
                        "type": "PERSON",
                        "box": [
                            0.34016437686048157,
                            0.15506947083348188,
                            0.48781795335273054,
                            0.16926990081839677
                        ]
                    },
                    {
                        "page": 1,
                        "range": [
                            423,
                            428
                        ],
                        "text": "2018a",
                        "id": 4,
                        "type": "DATE",
                        "box": [
                            0.3736872717865327,
                            0.3484841506610129,
                            0.4145903056733348,
                            0.36031776312819985
                        ]
                    },
                    {
                        "page": 2,
                        "range": [
                            743,
                            750
                        ],
                        "text": "(2018a)",
                        "id": 5,
                        "type": "DATE",
                        "box": [
                            0.3769863661562031,
                            0.3271071821734426,
                            0.4339806024432365,
                            0.3400650507786048
                        ]
                    }
                ],
                "pageOffsets": [
                    0,
                    3988,
                    8509,
                    12206,
                    17069,
                    20918,
                    25368,
                    29080,
                    33539,
                    37641,
                    42160,
                    46926,
                    50816,
                    54525,
                    58589,
                    60965,
                    64088
                ],
                "links": [
                    {
                        "page": 1,
                        "id1": 2,
                        "id2": 3,
                        "relationship": "Precede"
                    },
                    {
                        "page": 1,
                        "id1": 4,
                        "id2": 5,
                        "relationship": "Precede"
                    }
                ],
                "attributes": {
                    "tags": [],
                    "links": [],
                    "Doc Ok?": "Yes"
                },
                "pageAttributes": [
                    {
                        "Page OK?": null
                    },
                    {
                        "Page OK?": "Yes"
                    }
                ],
                "boxes": [
                    {
                        "page": 1,
                        "box": [
                            0.6285714285714286,
                            0.1505226480836237,
                            0.8216748768472907,
                            0.178397212543554
                        ],
                        "label": "Bounding_box"
                    },
                    {
                        "page": 2,
                        "box": [
                            0.10246305418719212,
                            0.3797909407665505,
                            0.49064039408866994,
                            0.4961672473867596
                        ],
                        "label": "Bounding_box",
                        "rotate": 22
                    }
                ],
                "plainText": {
                    "1": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova Google AI Language {jacobdevlin,mingweichang,kentonl,kristout}@google.com Abstract We introduce a new language representa- tion model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language repre- sentation models (Peters et al., 2018a; Rad- ford et al., 2018), BERT is designed to pre- train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a re- sult, the pre-trained BERT model can be fine- tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task- specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art re- sults on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answer- ing Test F1 to 93.2 (1.5 point absolute im- provement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement). 1 Introduction Language model pre-training has been shown to be effective for improving many natural language processing tasks (Dai and Le, 2015; Peters et al., 2018a; Radford et al., 2018; Howard and Ruder, 2018). 
These include sentence-level tasks such as natural language inference (Bowman et al., 2015; Williams et al., 2018) and paraphrasing (Dolan and Brockett, 2005), which aim to predict the re- lationships between sentences by analyzing them holistically, as well as token-level tasks such as named entity recognition and question answering, wheremodels are required to produce fine-grained output at the token level (Tjong Kim Sang and DeMeulder, 2003; Rajpurkar et al., 2016). There are two existing strategies for apply- ing pre-trained language representations to down- stream tasks: feature-based and fine-tuning. The feature-based approach, such as ELMo (Peters et al., 2018a), uses task-specific architectures that include the pre-trained representations as addi- tional features. The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre- trained parameters. The two approaches share the same objective function during pre-training,where they use unidirectional language models to learn general language representations. We argue that current techniques restrict the power of the pre-trained representations, espe- cially for the fine-tuning approaches. The ma- jor limitation is that standard language models are unidirectional, and this limits the choice of archi- tectures that can be used during pre-training. For example, inOpenAIGPT, the authors use a left-to- right architecture, where every token can only at- tend to previous tokens in the self-attention layers of the Transformer (Vaswani et al., 2017). Such re- strictions are sub-optimal for sentence-level tasks, and could be very harmful when applying fine- tuning based approaches to token-level tasks such as question answering, where it is crucial to incor- porate context from both directions. 
In this paper, we improve the fine-tuning based approaches by proposing BERT: Bidirectional Encoder Representations from Transformers. BERT alleviates the previously mentioned unidi- rectionality constraint by using a “masked lan- guage model” (MLM) pre-training objective, in- spired by the Cloze task (Taylor, 1953). The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked a r X i v : 1 8 1 0 . 0 4 8 0 5 v 2     [ c s . C L ]     2 4   M a y   2 0 1 9",
                    "2": "word based only on its context. Unlike left-to- right language model pre-training, the MLM ob- jective enables the representation to fuse the left and the right context, which allows us to pre- train a deep bidirectional Transformer. In addi- tion to the masked language model, we also use a “next sentence prediction” task that jointly pre- trains text-pair representations. The contributions of our paper are as follows: • We demonstrate the importance of bidirectional pre-training for language representations. Un- like Radford et al. (2018), which uses unidirec- tional language models for pre-training, BERT uses masked language models to enable pre- trained deep bidirectional representations. This is also in contrast to Peters et al. (2018a), which uses a shallow concatenation of independently trained left-to-right and right-to-left LMs. • We show that pre-trained representations reduce the need for many heavily-engineered task- specific architectures. BERT is the first fine- tuning based representationmodel that achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks, outper- forming many task-specific architectures. • BERT advances the state of the art for eleven NLP tasks. The code and pre-trained mod- els are available at https://github.com/ google-research/bert. 2 RelatedWork There is a long history of pre-training general lan- guage representations, and we briefly review the most widely-used approaches in this section. 2.1 Unsupervised Feature-based Approaches Learning widely applicable representations of words has been an active area of research for decades, including non-neural (Brown et al., 1992; Ando and Zhang, 2005; Blitzer et al., 2006) and neural (Mikolov et al., 2013; Pennington et al., 2014) methods. Pre-trained word embeddings are an integral part of modern NLP systems, of- fering significant improvements over embeddings learned from scratch (Turian et al., 2010). 
To pre- train word embedding vectors, left-to-right lan- guage modeling objectives have been used (Mnih and Hinton, 2009), as well as objectives to dis- criminate correct from incorrect words in left and right context (Mikolov et al., 2013). These approaches have been generalized to coarser granularities, such as sentence embed- dings (Kiros et al., 2015; Logeswaran and Lee, 2018) or paragraph embeddings (Le andMikolov, 2014). To train sentence representations, prior work has used objectives to rank candidate next sentences (Jernite et al., 2017; Logeswaran and Lee, 2018), left-to-right generation of next sen- tence words given a representation of the previous sentence (Kiros et al., 2015), or denoising auto- encoder derived objectives (Hill et al., 2016). ELMo and its predecessor (Peters et al., 2017, 2018a) generalize traditional word embedding re- search along a different dimension. They extract context-sensitive features from a left-to-right and a right-to-left language model. The contextual rep- resentation of each token is the concatenation of the left-to-right and right-to-left representations. When integrating contextual word embeddings with existing task-specific architectures, ELMo advances the state of the art for severalmajor NLP benchmarks (Peters et al., 2018a) including ques- tion answering (Rajpurkar et al., 2016), sentiment analysis (Socher et al., 2013), and named entity recognition (Tjong Kim Sang and De Meulder, 2003). Melamud et al. (2016) proposed learning contextual representations through a task to pre- dict a single word from both left and right context using LSTMs. Similar to ELMo, their model is feature-based and not deeply bidirectional. Fedus et al. (2018) shows that the cloze task can be used to improve the robustness of text generation mod- els. 
2.2 Unsupervised Fine-tuning Approaches As with the feature-based approaches, the first works in this direction only pre-trained word em- bedding parameters from unlabeled text (Col- lobert andWeston, 2008). More recently, sentence or document encoders which produce contextual token representations have been pre-trained from unlabeled text and fine-tuned for a supervised downstream task (Dai and Le, 2015; Howard and Ruder, 2018; Radford et al., 2018). The advantage of these approaches is that few parameters need to be learned from scratch. At least partly due to this advantage, OpenAI GPT (Radford et al., 2018) achieved pre- viously state-of-the-art results on many sentence- level tasks from the GLUE benchmark (Wang et al., 2018a). Left-to-right language model-"
                },
                "dimensions": [
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    },
                    {
                        "width": 595.276,
                        "height": 841.89
                    }
                ],
                "review": {
                    "rate": "Ok",
                    "note": "",
                    "reviewerId": "61685a5eb492d0845eb5e6b4"
                },
                "jobStart": 1689583831,
                "sessionTime": 18,
                "elapsedTime": 31,
                "updateTime": 1689585066,
                "lastUpdate": 1689585068525
            }
        }
    ]
}
table:NER-Project-Manifest:

Field Names

Type

Description

project_id

str

The Id of the project

project_name

str

The name of the project

project_type

str

The type of the project

datasetId

str

The Id of the dataset

itemId

str

The Id of the dataset item

file_name

str

The name of the file

file_type

str

The type of the file

source

str

Internal source file reference on local storage disk

state

int

The state of the task

task_id

str

The Id of the task

state_description

str

The state description of the task

annotations

list

List of dictionaries representing the annotations

email

str

The email associated with the user

messages

list | The messages associated with the user

role

str

The role associated with user

elapsed_time

str

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

Dictionary containing all the details of the the task as metadata,tags,pageoffsets etc.

pdf_fingerprint

str

The fingerprint of the document

metadata

dict

Dictionary containing metadata of the task and the project

File

str

The name of the file

Task_id

str

The Id of the task

Type of Project

str

The metadata added in advanced setting of project

tags

list

List of dictionaries containing the tags added in the task

page

int

The page number for selected text

range

list

Selected text box start offset and end offset using plaintext

text

str

The selected text for annotation

id

int

The id of selected text for annotation

type

str

The type of the label

box

list

The annotation bounding box

pageoffsets

list

List of page offsets

link

list

List containing the relationships added in the task

id1

number

The Id of the first annotation field

id2

number

The Id of the second annotation field

relationship

str

The name of the relationship

attributes

dict

The document attributes associated with the task

pageattributes

list

List of dictionaries containing the attributes for each page

boxes

list

List of dictionaries containing details of bounding box

page

int

The page number in which bounding box is created

box

list

The annotation bounding box

labels

str

The type of label

rotate

int

The angle of bounding box rotation (degrees)

plaintext

dict

Dictionary containing page numbers and the corresponding plain text extracted from the file

dimensions

list

List of dictionaries containing dimensions of pages in the task

width

float

The width of the page

height

float

The height of the page

review

dict

The review details

rate

str

The rate of the review

note

str

The note associated with the reviewer

reviewerId

str

The Id of the reviewer

jobStart

int

The start time of the annotation

sessionTime

int

The session time of the annotation

elapsedTime

int

The elapsed time of the annotation

updateTime

int

The update time of the annotation

lastUpdate

int

The last update time
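
The time fields in the example above appear to use two different scales: jobStart and updateTime look like Unix epoch seconds, while lastUpdate looks like epoch milliseconds. This is inferred from the magnitude of the sample values rather than stated by the manifest, so treat the sketch below as an assumption to verify against your own exports. The helper name to_utc is illustrative only.

```python
from datetime import datetime, timezone


def to_utc(value, unit="s"):
    """Convert an epoch timestamp to a timezone-aware UTC datetime."""
    seconds = value / 1000 if unit == "ms" else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Values copied from the NER example above.
content = {
    "jobStart": 1689583831,
    "updateTime": 1689585066,
    "lastUpdate": 1689585068525,
}

job_start = to_utc(content["jobStart"])                 # assumed seconds
update_time = to_utc(content["updateTime"])             # assumed seconds
last_update = to_utc(content["lastUpdate"], unit="ms")  # assumed milliseconds

print(job_start.isoformat())                      # 2023-07-17T08:50:31+00:00
print((update_time - job_start).total_seconds())  # 1235.0
```

Converting to timezone-aware values up front avoids mixing the two scales when comparing jobStart, updateTime, and lastUpdate.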

3. Classification Project

{
    "project_id": "591351b938d008ca0745510a",
    "project_name": "Classification-Project-1",
    "project_type": "Classification",
    "datasetId": "60974d4e9e7759842cdff3be",
    "itemId": "2d2020aa8e3deb383fb7c74f",
    "file_name": "business_2.txt",
    "file_type": "text/plain",
    "source": "https://sandboxdocuments.tensoract.com/presigned/3c9a446180a44faa43ab2464d45633c7.txt?sig=bc0af1e30e3bec94f7780fe1638f77b7d7080b5508545d891bc2a229d3e2c20e8179c3b4690e2ac7a19af4a978c363af312b570f2e89efa03ae41171e066c65d:670927f13e9c689722e1e85fe5649347:64b78055:aef5974f70f486095f3cd5f4cc922486",
    "state": 4,
    "task_id": "8c939483b594e0de5d5efb54",
    "state_description": "Approved",
    "annotations": [
        {
                    "email": "johndoe@me.com",
                    "messages": [],
                    "role": "Reviewer",
                    "elapsedTime": 8,
                    "date": "2023-07-18T06:03:46.657Z",
                    "content": {
                        "metadata": {
                            "File": "business_2.txt",
                            "TaskId": "8c939483b594e0de5d5efb54",
                            "Type of Project": "Classification"
                        },
                        "classificationTypes": {
                            "Select Type of Document": "select",
                            "Type of Documents": "multi",
                            "Put a note": "text"
                        },
                        "classifications": {
                            "Select Type of Document": [
                                "Technology"
                            ],
                            "Type of Documents": [
                                "Graphics",
                                "Bussiness"
                            ],
                            "Put a note": [
                                "Multi-type document"
                            ]
                        },
                        "plainText": {
                            "1": "Japanese growth grinds to a halt  Growth in Japan evaporated in the three months to September, sparking renewed concern about an economy not long out of a decade-long trough.  Output in the period grew just 0.1%, an annual rate of 0.3%. Exports - the usual engine of recovery - faltered, while domestic demand stayed subdued and corporate investment also fell short. The growth falls well short of expectations, but does mark a sixth straight quarter of expansion.  The economy had stagnated throughout the 1990s, experiencing only brief spurts of expansion amid long periods in the doldrums. One result was deflation - prices falling rather than rising - which made Japanese shoppers cautious and kept them from spending.  The effect was to leave the economy more dependent than ever on exports for its recent recovery. But high oil prices have knocked 0.2% off the growth rate, while the falling dollar means products shipped to the US are becoming relatively more expensive.  The performance for the third quarter marks a sharp downturn from earlier in the year. The first quarter showed annual growth of 6.3%, with the second showing 1.1%, and economists had been predicting as much as 2% this time around. \"Exports slowed while capital spending became weaker,\" said Hiromichi Shirakawa, chief economist at UBS Securities in Tokyo. \"Personal consumption looks good, but it was mainly due to temporary factors such as the Olympics. \"The amber light is flashing.\" The government may now find it more difficult to raise taxes, a policy it will have to implement when the economy picks up to help deal with Japan's massive public debt. "
                        },
                        "review": {
                            "rate": "Ok",
                            "note": "",
                            "reviewerId": "614b55be8af65dcf41da535b"
                        },
                        "jobStart": 1689660217,
                        "sessionTime": 8,
                        "elapsedTime": 8,
                        "updateTime": 1689660225,
                        "pageOffsets": [
                            0
                        ],
                        "lastUpdate": 1689660226654
                    }
                }
            ]
        }
table:Classification Project Manifest Summary:

Field Names

Type

Description

project_id

str

The Id of the project

project_name

str

The name of the project

project_type

str

The type of the project

datasetId

str

The Id of the dataset

itemId

str

The Id of the dataset item

file_name

str

The name of the file

file_type

str

The type of the file

source

str

Internal source file reference on local storage disk

state

int

The state of the task

task_id

str

The Id of the task

state_description

str

The state description of the task

annotations

list

List of dictionaries representing the annotations

email

str

The email associated with the user

messages

list

The messages associated with the user

role

str

The role associated with the user

elapsedTime

int

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

Dictionary containing all the details of the task, such as metadata, classifications, and page offsets

metadata

dict

Dictionary containing metadata of the task and the project

File

str

The name of the file

TaskId

str

The Id of the task

Type of Project

str

The metadata added in advanced setting of project

classificationTypes

dict

Dictionary containing the labels defined in the project

Select Type of Document

str

Single Select Label

Type of Documents

str

Multi Select Label

Put a note

str

Plain Text Label

classifications

dict

Dictionary containing the classifications labels in the task

plainText

dict

Dictionary containing page numbers and the corresponding plain text extracted from the file

review

dict

The review details

rate

str

The rate of the review

note

str

The note associated with the reviewer

reviewerId

str

The Id of the reviewer

jobStart

int

The start time of the annotation

sessionTime

int

The session time of the annotation

elapsedTime

int

The elapsed time of the annotation

updateTime

int

The update time of the annotation

lastUpdate

int

The last update time
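
To make the relationship between classificationTypes and classifications concrete, the following is a minimal sketch that flattens one Classification record into rows. The record is a trimmed copy of the example above; the function name flatten_classifications and the row layout are illustrative assumptions, not part of the export.

```python
import json

# Trimmed-down copy of the Classification example above.
record = json.loads("""
{
  "task_id": "8c939483b594e0de5d5efb54",
  "file_name": "business_2.txt",
  "annotations": [
    {
      "role": "Reviewer",
      "content": {
        "classificationTypes": {
          "Select Type of Document": "select",
          "Type of Documents": "multi",
          "Put a note": "text"
        },
        "classifications": {
          "Select Type of Document": ["Technology"],
          "Type of Documents": ["Graphics", "Bussiness"],
          "Put a note": ["Multi-type document"]
        }
      }
    }
  ]
}
""")


def flatten_classifications(record):
    """Return one row per classification label in the record."""
    rows = []
    for annotation in record.get("annotations", []):
        content = annotation.get("content", {})
        types = content.get("classificationTypes", {})
        for label, values in content.get("classifications", {}).items():
            kind = types.get(label, "unknown")
            # Multi-select labels keep the full list; other kinds keep the first value.
            value = values if kind == "multi" else (values[0] if values else None)
            rows.append({
                "task_id": record.get("task_id"),
                "label": label,
                "kind": kind,
                "value": value,
            })
    return rows


for row in flatten_classifications(record):
    print(row["label"], "->", row["value"])
```

Every value in classifications is a list in the sample, including single-select and text labels, which is why the sketch unwraps the first element for those kinds.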

4. Bulk Image Classification Project

{
    "project_id": "e3c9b4a1dd6df1c4c7091895",
    "project_name": "Bulk Image Classification DB",
    "project_type": "Bulk Image Classification",
    "datasetId": "25987f46e5febb50484e8497",
    "itemId": "1431d151a693edeb3baade14",
    "file_name": "green.tiff",
    "file_type": "image/tiff",
    "source": "https://sandboxdocuments.tensoract.com/presigned/6cc95fbb44dccfacbacc923fbd24091e.tiff?sig=7748be40ae97b0c4559e0c9de0016e925d5e49a89bc1415ae03370f986b067ac42fc06ba2b89848f1245f5aacbf37c44a9d69ea7ff4d952e6d1ebb2cc7bff2e8:188babf42f5690cdb9fee99f58ee209f:64b8db5e:73a9d14256d5247bb72703427484eab4",
    "state": 4,
    "task_id": "b5ac251d8209079a300868f1",
    "state_description": "Approved",
    "annotations": [
        {
            "email": "q1@qc.com",
            "messages": [],
            "role": "Reviewer",
            "elapsedTime": 4.333333333333333,
            "date": "2023-07-18T09:08:27.935Z",
            "content": {
                "redOffset": 1,
                "greenOffset": 1,
                "brightness": 1,
                "selected": false,
                "classification": "Green",
                "review": {
                    "rate": "Ok"
                },
                "elapsedTime": 4.333333333333333,
                "updateTime": 1689671308,
                "lastUpdate": 1689671307935,
                "metadata": {
                    "color": "green"
                }
            }
        }
    ],
    "dataset_id": "25987f46e5febb50484e8497",
    "item_id": "1431d151a693edeb3baade14",
    "item_metadata": {
        "color": "green"
    },
    "project_metada": {
        "Type of Project": "Bulk"
    },
    "classification": "Green"
}
table:Bulk Image Classification Project Manifest Summary:

Field Names

Type

Description

project_id

str

The Id of the project

project_name

str

The name of the project

project_type

str

The type of the project

datasetId

str

The Id of the dataset

itemId

str

The Id of the dataset item

file_name

str

The name of the file

file_type

str

The type of the file

source

str

Internal source file reference on local storage disk

state

int

The state of the task

task_id

str

The Id of the task

state_description

str

The state description of the task

annotations

list

List of dictionaries representing the annotations

email

str

The email associated with the user

messages

list

The messages associated with the user

role

str

The role associated with the user

elapsedTime

float

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

Dictionary containing the annotation details of the task, such as color offsets, brightness, classification, review, and metadata

redOffset

int

An integer value representing the offset or adjustment applied to the red color channel.

greenOffset

int

An integer value representing the offset or adjustment applied to the green color channel.

brightness

int

An integer value representing the overall brightness adjustment applied to the image.

classification

str

The classified label

review

dict

The review details

rate

str

The rate of the review

note

str

The note associated with the reviewer

elapsedTime

int

The elapsed time of the annotation

updateTime

int

The update time of the annotation

lastUpdate

int

The last update time

metadata

dict

The metadata of the task

dataset_id

str

The Id of the dataset

item_id

str

The Id of the dataset item

item_metadata

dict

The metadata of the dataset item

project_metada

dict

The metadata of project

classification

str

The classified label
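
Because each Bulk Image Classification record carries the final label in the top-level classification field, a set of records can be summarized without walking the annotations list. The sketch below assumes the records have already been loaded into a Python list; the second and third records, and both function names, are hypothetical and only serve to illustrate the idea.

```python
from collections import Counter

records = [
    # First record mirrors the example above; the other two are hypothetical.
    {"file_name": "green.tiff", "classification": "Green",
     "item_metadata": {"color": "green"}},
    {"file_name": "red_01.tiff", "classification": "Red",
     "item_metadata": {"color": "red"}},
    {"file_name": "red_02.tiff", "classification": "Green",
     "item_metadata": {"color": "red"}},
]


def label_counts(records):
    """Count how many records received each classification label."""
    return Counter(r.get("classification") for r in records)


def metadata_mismatches(records, key="color"):
    """List files whose label differs from an item_metadata value (case-insensitive)."""
    mismatched = []
    for r in records:
        expected = r.get("item_metadata", {}).get(key)
        label = r.get("classification")
        if expected is not None and label is not None:
            if expected.lower() != label.lower():
                mismatched.append(r.get("file_name"))
    return mismatched


print(label_counts(records))
print(metadata_mismatches(records))
```

The mismatch check only makes sense when the dataset item metadata happens to contain the expected label, as the color key does in the sample; it is a quality-control idea, not a feature of the export.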

5. Object Detection Project

{
        "project_id": "0461637f62c18082f3c14cc3",
        "project_name": "Object-Detection-Project-2",
        "project_type": "Pose Estimation",
        "datasetId": "abd204685e5c074b282d6744",
        "itemId": "ab22598223c9ad06a7cb7fbc",
        "file_name": "im05.jpg",
        "file_type": "image/jpeg",
        "source": "https://sandboxdocuments.tensoract.com/presigned/eec676782f87ae20ff1ce9d282043b55.jpg?sig=4bb851b3ea1bee96136f64ec7a0a23aa514068356434dfddb3b9dbbdefb3568d8b6d246a5de7112e3afe8c261ab8647bff8db558f52b5502df86a0984f4d666b:d9838174235badf2412366a794eafbe4:64b7dd9e:54f78a7b1d2c36e89b2f8a43898db7fd",
        "state": 4,
        "task_id": "528cb44ce376c753590c3b07",
        "state_description": "Approved",
        "annotations": [
            {
                "email": "jdoeqa@acme.org",
                "messages": [],
                "role": "Reviewer",
                "elapsedTime": 29,
                "date": "2023-06-17T10:08:34.324Z",
                "content": {
                    "url": "https://sandboxdocuments.tensoract.com/presigned/eec676782f87ae20ff1ce9d282043b55.jpeg?sig=a4ecbc5cdcc4745f78f23897c8179a4f0fcd2398a1bf7b0aaf8cd40df2c6a44781ad2222a108af1865c87c7f0027f6d930646c83714f22f94f1f1d4b479e59b6:17fd9429b9bd1ee881c403895971765e:648ed781:84eaba2d2ce4610c8ec1989f2a3ef0fa",
                    "imageWidth": 720,
                    "imageHeight": 1280,
                    "selected": null,
                    "boxes": [
                        {
                            "x1": 97.0271,
                            "y1": 242.4398,
                            "x2": 715.4532,
                            "y2": 1280,
                            "id": "b0",
                            "type": "box",
                            "oid": "b24",
                            "outside_image": {},
                            "occluded": {},
                            "invisible": false,
                            "attrs": {},
                            "title": "",
                            "label": "Group 1",
                            "sub_labels": [
                                {
                                    "x1": 329.1937,
                                    "y1": 289.695,
                                    "id": "b1",
                                    "type": "keypoint",
                                    "oid": "b27",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Top of head",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 360.0123,
                                    "y1": 357.496,
                                    "id": "b2",
                                    "type": "keypoint",
                                    "oid": "b28",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Nose",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 345.6303,
                                    "y1": 421.1878,
                                    "id": "b3",
                                    "type": "keypoint",
                                    "oid": "b29",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Chin",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 300.4297,
                                    "y1": 421.1878,
                                    "id": "b4",
                                    "type": "keypoint",
                                    "oid": "b30",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Neck",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 454.5226,
                                    "y1": 454.061,
                                    "id": "b5",
                                    "type": "keypoint",
                                    "oid": "b31",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Left Shoulder",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 191.5374,
                                    "y1": 521.862,
                                    "id": "b6",
                                    "type": "keypoint",
                                    "oid": "b32",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Right Shoulder",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 575.7423,
                                    "y1": 509.5345,
                                    "id": "b7",
                                    "type": "keypoint",
                                    "oid": "b33",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Left Elbow",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 197.7011,
                                    "y1": 597.8812,
                                    "id": "b8",
                                    "type": "keypoint",
                                    "oid": "b34",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Right Elbow",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 296.3206,
                                    "y1": 667.7368,
                                    "id": "b9",
                                    "type": "keypoint",
                                    "oid": "b36",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Right Wrist",
                                    "sub_labels": []
                                },
                                {
                                    "x1": 339.4666,
                                    "y1": 673.9005,
                                    "id": "b10",
                                    "type": "keypoint",
                                    "oid": "b37",
                                    "outside_image": {},
                                    "occluded": {},
                                    "invisible": false,
                                    "title": "",
                                    "label": "Right Hand",
                                    "sub_labels": []
                                }
                            ]
                        }
                    ],
                    "image_attrs": {
                        "Is Image clear?": "Yes"
                    },
                    "review": {
                        "rate": "Ok",
                        "note": ""
                    },
                    "jobStart": 1686996484,
                    "sessionTime": 29,
                    "elapsedTime": 29,
                    "tsSeconds": true,
                    "updateTime": 1686996513,
                    "lastUpdate": 1686996514320,
                    "metadata": {}
                }
            }
        ]
    }
table:Object Detection Classification Project Manifest Summary:

Field Names

Type

Description

project_id

str

The Id of the project

project_name

str

The name of the project

project_type

str

The type of the project

datasetId

str

The Id of the dataset

itemId

str

The Id of the dataset item

file_name

str

The name of the file

file_type

str

The type of the file

source

str

Internal source file reference on local storage disk

state

int

The state of the task

task_id

str

The Id of the task

state_description

str

The state description of the task

annotations

list

List of dictionaries representing the annotations

email

str

The email associated with the user

messages

list

The messages associated with the user

role

str

The role associated with the user

elapsedTime

int

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

Dictionary containing the annotation details of the task, such as the image URL, image dimensions, boxes, image attributes, review, and metadata

url

str

The presigned URL or S3 path of the task

imageWidth

int

The width of the image in pixels

imageHeight

int

The height of the image in pixels

boxes

list

List of bounding boxes drawn around objects in the image

x1

float

The x-coordinate of the top-left corner of the bounding box

y1

float

The y-coordinate of the top-left corner of the bounding box

x2

float

The x-coordinate of the bottom-right corner of the bounding box

y2

float

The y-coordinate of the bottom-right corner of the bounding box

id

str

The Id of bounding box

type

str

Flag to indicate whether it is a box or a keypoint

outside_image

dict

Indicates whether the object extends beyond the boundaries of the image

occluded

dict

Indicates whether the object is occluded or partially hidden

attrs

dict

Represents any additional attributes or properties associated with the object

label

str

The label or category assigned to the object

sub_labels

list

Represents any sub-labels or sub-categories associated with the object

x1

float

The x-coordinate of the keypoint

y1

float

The y-coordinate of the keypoint

id

str

The Id of keypoint

type

str

Flag to indicate whether it is a box or a keypoint

outside_image

dict

Indicates whether the object extends beyond the boundaries of the image

occluded

dict

Indicates whether the object is occluded or partially hidden

label

str

The label or category assigned to keypoint

sub_labels

list

Represents any sub-labels associated with the object

image_attrs

dict

The image attributes associated with the task

review

dict

The review details

rate

str

The rate of the review

note

str

The note associated with the reviewer

jobStart

int

The start time of the annotation

sessionTime

int

The session time of the annotation

elapsedTime

int

The elapsed time of the annotation

updateTime

int

The update time of the annotation

metadata

dict

The metadata of the task and project
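
The boxes list nests keypoints inside each bounding box through sub_labels. The following sketch, which uses a shortened copy of the example above, shows one way to convert a box from corner coordinates to (x, y, width, height) and to collect its keypoints by label. The function names are illustrative assumptions, and the coordinates are assumed to be in pixels relative to the top-left corner of the image.

```python
# Shortened copy of one entry from "boxes" in the example above.
box = {
    "x1": 97.0271, "y1": 242.4398, "x2": 715.4532, "y2": 1280,
    "id": "b0", "type": "box", "label": "Group 1",
    "sub_labels": [
        {"x1": 329.1937, "y1": 289.695, "id": "b1",
         "type": "keypoint", "label": "Top of head", "sub_labels": []},
        {"x1": 360.0123, "y1": 357.496, "id": "b2",
         "type": "keypoint", "label": "Nose", "sub_labels": []},
    ],
}


def box_to_xywh(box):
    """Convert corner coordinates (x1, y1, x2, y2) to (x, y, width, height)."""
    x, y = box["x1"], box["y1"]
    return (x, y, box["x2"] - x, box["y2"] - y)


def keypoints_by_label(box):
    """Map each keypoint label under a box to its (x, y) position."""
    return {
        sub["label"]: (sub["x1"], sub["y1"])
        for sub in box.get("sub_labels", [])
        if sub.get("type") == "keypoint"
    }


x, y, width, height = box_to_xywh(box)
print(round(width, 4), round(height, 4))
print(keypoints_by_label(box))
```

If two keypoints under the same box share a label, the dictionary keeps only the last one; a list of tuples would be a safer structure in that case.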

6. Media Transcription Project

  1. Video Files

{
    "project_id": "c859cc0a92b7bd4d6d166707",
    "project_name": "Video-Project-3",
    "project_type": "Media Transcription",
    "datasetId": "1ac8fb72573008ce5626bbfb",
    "itemId": "0b9ff34daa6a2f6f95c59bb3",
    "file_name": "Video 1.mp4",
    "file_type": "video/mp4",
    "source": "https://sandboxdocuments.tensoract.com/presigned/da00a38a91f62483658e2126e789f63e.mp4?sig=41de6d036aafbad36d075616c0d0b8d56a460c26f769a5128a57f411d3a47c0562f33c3ae0ec77e1adb8f6c51c46a6b11863d9b0bcfb9a8ec48b9c02a6f1d220:5f547425b01d3a41ecc069ae0dc15acc:64b7ddb9:f0dbc85b6ebaf920717d096daa954cc2",
    "state": 4,
    "task_id": "c53ca1cace7d7e784705b631",
    "state_description": "Approved",
    "annotations": [
        {
            "email": "jdoeqa@acme.org",
            "messages": [],
            "role": "Reviewer",
            "elapsedTime": 62,
            "date": "2023-06-17T12:16:38.937Z",
            "content": {
                "review": {
                    "rate": "Ok",
                    "note": "",
                    "reviewerId": "63ca81cd31d698c1825328f3"
                },
                "videoSource": "https://sandboxdocuments.tensoract.com/presigned/da00a38a91f62483658e2126e789f63e.mp4?sig=9f13caf4ac583fbc9c074b91770e579120196b3f08e192ad641290c246a7add58f6d6f40ce1aa63e031ada96ff8c996038e4ed46516ffce1f56262cc6a435eeb:062d59d190dc03e090dcd5ee5ff17faa:648ef563:ac1a510bd2fb6c0ab328a3202eb9c846",
                "streams": {
                    "Transcription": [
                        {
                            "start": 0.025000260441083017,
                            "end": 0.9500098967611547,
                            "confidence": 1,
                            "text": "Bonjoi"
                        },
                        {
                            "start": 0.9500098967611547,
                            "end": 2.0812716817201613,
                            "confidence": 1,
                            "text": "Tava Tuti"
                        },
                        {
                            "start": 2.1187720723817858,
                            "end": 3.156282880686731,
                            "confidence": 1,
                            "text": "Hello"
                        },
                        {
                            "start": 3.156282880686731,
                            "end": 4.13629310905087,
                            "confidence": 1,
                            "text": "Ola"
                        },
                        {
                            "start": 4.165043351337061,
                            "end": 4.608797974166285,
                            "confidence": 1,
                            "text": "Tutu beng"
                        },
                        {
                            "start": 10.453858750849383,
                            "end": 12.07887567951978,
                            "confidence": 1,
                            "text": "oye tutubeng"
                        }
                    ],
                    "Language Segmentation": [
                        {
                            "start": 0.018750195330812264,
                            "end": 0.5687559250346386,
                            "confidence": 1,
                            "tag": "French"
                        },
                        {
                            "start": 0.5937561854757216,
                            "end": 2.0937718119407025,
                            "confidence": 1,
                            "tag": "German"
                        },
                        {
                            "start": 2.0937718119407025,
                            "end": 3.1650329694568993,
                            "confidence": 1,
                            "tag": "English"
                        },
                        {
                            "start": 3.2437837922305217,
                            "end": 4.156293298330052,
                            "confidence": 1,
                            "tag": "French"
                        },
                        {
                            "start": 4.250044274984113,
                            "end": 6.318815826483732,
                            "confidence": 1,
                            "tag": "Russian"
                        },
                        {
                            "start": 6.8338213441595235,
                            "end": 8.190085473088276,
                            "confidence": 1,
                            "tag": "French"
                        },
                        {
                            "start": 8.321336840403962,
                            "end": 9.865102922640839,
                            "confidence": 1,
                            "tag": "English"
                        },
                        {
                            "start": 9.915103443523005,
                            "end": 11.071365488923096,
                            "confidence": 1,
                            "tag": "German"
                        },
                        {
                            "start": 11.233867181790135,
                            "end": 12.6088815060497,
                            "confidence": 1,
                            "tag": "Arabic"
                        }
                    ]
                },
                "mediaAttributes": {
                    "Is Video Clear?": "Yes",
                    "Aditional Notes": ""
                },
                "jobStart": 1687003443,
                "sessionTime": 62,
                "elapsedTime": 93.075,
                "tsSeconds": true,
                "updateTime": 1687004194,
                "metadata": {
                    "File": "Video 1.mp4",
                    "TaskId": "c53ca1cace7d7e784705b631",
                    "Type": "Media Transcription"
                },
                "lastUpdate": 1687004198934
            }
        }
    ]
}
table:Media Transcription Project Manifest Summary:

Field Names

Type

Description

project_id

str

The Id of the project

project_name

str

The name of the project

project_type

str

The type of the project

datasetId

str

The Id of the dataset

itemId

str

The Id of the dataset item

file_name

str

The name of the file

file_type

str

The type of the file

source

str

Internal source file reference on local storage disk

state

int

The state of the task

task_id

str

The Id of the task

state_description

str

The state description of the task

annotations

list

List of dictionaries representing the annotations

email

str

The email associated with the user

messages

list

The messages associated with the user

role

str

The role associated with the user

elapsedTime

int

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

Dictionary containing the annotation details of the task, such as review, video source, streams, media attributes, and metadata

review

dict

The review details

rate

str

The rate of the review

reviewerId

str

The Id of reviewer

videoSource

str

The presigned URL or S3 path of the task

streams

dict

Dictionary of different streams within the video, each containing specific information

Transcription

list

The stream containing transcribed text from the video

start

float

The starting timestamp (in seconds) of the transcribed text segment

end

float

The ending timestamp (in seconds) of the transcribed text segment

confidence

int

Indicates the confidence level

text

str

The actual transcribed text for the corresponding segment

Language Segmentation

list

The stream containing information about the segmentations in the video

start

float

The starting timestamp of the segment

end

float

The ending timestamp of the segment

confidence

int

Indicates the confidence level

tag

str

The tag in the corresponding segment

mediaAttributes

dict

The media attributes associated with the task

jobStart

int

The start time of the annotation

sessionTime

int

The session time of the annotation

elapsedTime

int

The elapsed time of the annotation

tsseconds

bool

update time

int

The update time of the annotation

lastUpdate

int

The last update time
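
The timing fields in the table above can be turned into readable timestamps. The following is a minimal sketch, not part of the platform itself. It assumes that jobStart and updateTime are Unix epoch values in seconds and that lastUpdate is in milliseconds; this is inferred only from the magnitude of the values in the manifest examples. The helper names and the sample values are illustrative.

```python
from datetime import datetime, timedelta, timezone


def epoch_seconds_to_utc(seconds: int) -> datetime:
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def epoch_millis_to_utc(millis: int) -> datetime:
    """Convert epoch milliseconds to UTC without floating-point rounding."""
    whole_seconds, remainder_ms = divmod(millis, 1000)
    return epoch_seconds_to_utc(whole_seconds) + timedelta(milliseconds=remainder_ms)


def annotation_times(content: dict) -> dict:
    """Read the timing fields of an annotation's content dictionary.

    Assumption: jobStart and updateTime are in seconds, lastUpdate in milliseconds.
    """
    return {
        "jobStart": epoch_seconds_to_utc(content["jobStart"]),
        "updateTime": epoch_seconds_to_utc(content["updateTime"]),
        "lastUpdate": epoch_millis_to_utc(content["lastUpdate"]),
    }


# Illustrative values shaped like the content dictionary of a manifest record.
content = {
    "jobStart": 1689739272,
    "updateTime": 1689739278,
    "lastUpdate": 1689739280033,
}
times = annotation_times(content)
working_time = times["updateTime"] - times["jobStart"]
print(times["jobStart"].isoformat(), working_time)
```

Integer arithmetic is used for the millisecond value so that the sub-second part is not altered by float conversion.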

  1. Audio Files

{
    "project_id": "c859cc0a92b7bd4d6d166707",
    "project_name": "Video-Project-3",
    "project_type": "Media Transcription",
    "datasetId": "edd9c7d7ae5c1b0c2bc73643",
    "itemId": "1c944e058058dbe48c980ead",
    "file_name": "mira640.mp3",
    "file_type": "audio/mpeg",
    "source": "s3://test-pocs/mira640.mp3",
    "state": 4,
    "task_id": "eba777874d6d268ece56b33a",
    "state_description": "Approved",
    "annotations": [
        {
            "email": "johndoe@me.com",
            "messages": [],
            "role": "Reviewer",
            "elapsedTime": 6,
            "date": "2023-07-19T04:01:20.036Z",
            "content": {
                "review": {
                    "rate": "Ok",
                    "note": "",
                    "reviewerId": "614b55be8af65dcf41da535b"
                },
                "audioSource": "https://test-pocs.s3.amazonaws.com/mira640.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUD4REC47DTY4PF7A%2F20230719%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230719T040111Z&X-Amz-Expires=7200&X-Amz-Signature=10048222e39f53cd9dbbb2b97b8aa210e323d8fc544a9ea18ff46e5922974f44&X-Amz-SignedHeaders=host",
                "streams": {
                    "Transcription": [
                        {
                            "start": 0.018749917353218702,
                            "end": 0.6312472175583629,
                            "confidence": 1,
                            "text": "Bonjoi"
                        },
                        {
                            "start": 0.7187468318733835,
                            "end": 1.881241707772943,
                            "confidence": 1,
                            "text": "Tava Tuti"
                        },
                        {
                            "start": 2.0124911292454737,
                            "end": 2.4874890355270143,
                            "confidence": 1,
                            "text": "Hello"
                        }
                    ],
                    "Language Segmentation": [
                        {
                            "start": 0.006249972451072901,
                            "end": 0.4437480440261759,
                            "confidence": 1,
                            "tag": "French"
                        },
                        {
                            "start": 0.5437476032433424,
                            "end": 1.8687417628707974,
                            "confidence": 1,
                            "tag": "German"
                        },
                        {
                            "start": 1.9187415424793806,
                            "end": 2.6687382366081285,
                            "confidence": 1,
                            "tag": "English"
                        }
                    ]
                },
                "mediaAttributes": {
                    "Is Video Clear?": "Yes",
                    "Aditional Notes": ""
                },
                "jobStart": 1689739272,
                "sessionTime": 6,
                "elapsedTime": 6,
                "tsSeconds": true,
                "updateTime": 1689739278,
                "lastUpdate": 1689739280033
            }
        }
    ]
}
table:Media Transcription Project Manifest Summary:

Field Names

Type

Description

project_id

str

The Id of the project

project_name

str

The name of the project

project_type

str

The type of the project

datasetId

str

The Id of the dataset

itemId

str

The Id of the dataset item

file_name

str

The name of the file

file_type

str

The type of the file

source

str

Internal source file reference on local storage disk

state

int

The state of the task

task_id

str

The Id of the task

state_description

str

The state description of the task

annotations

list

List of dictionaries representing the annotations

email

str

The email associated with the user

messages

list

The messages associated with the user

role

str

The role associated with the user

elapsedTime

int

The elapsed time of the annotation

date

str

The date of the annotation

content

dict

Dictionary containing the review details, media source, streams, media attributes, and timing metadata

review

dict

The review details

rate

str

The rate of the review

reviewerId

str

The Id of reviewer

audioSource

str

The presigned URL or S3 path of the task

streams

dict

Dictionary of different streams within the audio, each containing specific information

Transcription

list

The stream containing transcribed text from the audio

start

float

The starting timestamp (in seconds) of the transcribed text segment

end

float

The ending timestamp (in seconds) of the transcribed text segment

confidence

int

Indicates the confidence level

text

str

The actual transcribed text for the corresponding segment

Segmentation

list

The stream containing information about the segmentations in the audio

start

float

The starting timestamp of the segment

end

float

The ending timestamp of the segment

confidence

int

Indicates the confidence level

tag

str

The tag in the corresponding segment

mediaAttributes

dict

The media attributes associated with the task

jobStart

int

The start time of the annotation

sessionTime

int

The session time of the annotation

elapsedTime

int

The elapsed time of the annotation

tsSeconds

bool

Whether timestamps are expressed in seconds

updateTime

int

The update time of the annotation

lastUpdate

int

The last update time
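
The streams in the audio manifest share one timeline, so a transcribed segment can be matched to the segmentation tag it overlaps most. The following is a minimal sketch under stated assumptions: the stream names "Transcription" and "Language Segmentation" are taken from the audio example above and may differ per project, the function names are hypothetical, and the timestamps are shortened copies of the example values.

```python
from typing import List, Optional, Tuple


def overlap(a: dict, b: dict) -> float:
    """Length in seconds of the time span shared by two segments."""
    return max(0.0, min(a["end"], b["end"]) - max(a["start"], b["start"]))


def tag_transcription(
    streams: dict,
    text_stream: str = "Transcription",
    tag_stream: str = "Language Segmentation",
) -> List[Tuple[str, Optional[str]]]:
    """Pair each transcribed text with the tag of the best-overlapping segment."""
    tagged = []
    for segment in streams.get(text_stream, []):
        best = max(
            streams.get(tag_stream, []),
            key=lambda candidate: overlap(segment, candidate),
            default=None,
        )
        # Keep the tag only if the two segments actually overlap.
        has_overlap = best is not None and overlap(segment, best) > 0
        tagged.append((segment["text"], best["tag"] if has_overlap else None))
    return tagged


# Shortened values from the audio example (timestamps rounded for readability).
streams = {
    "Transcription": [
        {"start": 0.0187, "end": 0.6312, "confidence": 1, "text": "Bonjoi"},
        {"start": 0.7187, "end": 1.8812, "confidence": 1, "text": "Tava Tuti"},
        {"start": 2.0125, "end": 2.4875, "confidence": 1, "text": "Hello"},
    ],
    "Language Segmentation": [
        {"start": 0.0062, "end": 0.4437, "confidence": 1, "tag": "French"},
        {"start": 0.5437, "end": 1.8687, "confidence": 1, "tag": "German"},
        {"start": 1.9187, "end": 2.6687, "confidence": 1, "tag": "English"},
    ],
}
tagged = tag_transcription(streams)
print(tagged)
```

Choosing the largest overlap rather than the first overlap matters for segments such as "Bonjoi", which touches both the French and the German span.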

Model Integration: Request Payloads and Responses

1. NER Labeling

A NER (Named Entity Recognition) labeling model automatically identifies and classifies named entities (such as people, organizations, and locations) in text.

Request payload

{
    "text": {
        "1": "This is a text by Michael Smith",
        "2": "A paper from Oxford University"
    }
}
table:NER Labeling Request Payload:

Field Names

Type

Description

text

dict

Dictionary containing the text to label. Each key is a page number, and the corresponding value is the text of that page.

1

str

The text of page 1

2

str

The text of page 2
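
The request payload described above can be assembled from a list of page texts. The following is a minimal sketch; the function name build_ner_payload is hypothetical, and it assumes that page numbers start at 1 and are sent as string keys, as in the example payload.

```python
import json
from typing import List


def build_ner_payload(pages: List[str]) -> dict:
    """Map each page text to a string page number, starting at "1"."""
    return {"text": {str(number): text for number, text in enumerate(pages, start=1)}}


payload = build_ner_payload(
    [
        "This is a text by Michael Smith",
        "A paper from Oxford University",
    ]
)
# Serialize to the JSON body that would be sent to the model.
print(json.dumps(payload, indent=4))
```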

Response

{
    "entities": {
        "1": [
            {
                "type": "Person Entity",
                "text": "Michael Smith",
                "range": [
                    18,
                    31
                ]
            }
        ],
        "2": [
            {
                "type": "Organization Entity",
                "text": "Oxford University",
                "range": [
                    13,
                    30
                ]
            }
        ]
    }
}
table:NER Labeling Response:

Field Names

Type

Description

entities

dict

A dictionary containing entity annotations, keyed by page number

1

list

The entities found on page 1

type

str

The type of the label

text

str

The selected text for labeling

range

list

Start and end character offsets of the selected text within the page's plain text

2

list

The entities found on page 2

type

str

The type of the label

text

str

The selected text for labeling

range

list

Start and end character offsets of the selected text within the page's plain text
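
A response can be checked against the request by slicing each page's text with the returned range. The following is a minimal sketch; it assumes that offsets are zero-based with an exclusive end, which matches the example above ("Michael Smith" occupies characters 18 to 31 of page 1). The function name check_entities is hypothetical.

```python
from typing import List, Tuple


def check_entities(pages: dict, entities: dict) -> List[Tuple[str, str, str]]:
    """Return (page, expected text, sliced text) for every range that does not match."""
    mismatches = []
    for page, items in entities.items():
        page_text = pages.get(page, "")
        for item in items:
            start, end = item["range"]
            sliced = page_text[start:end]
            if sliced != item["text"]:
                mismatches.append((page, item["text"], sliced))
    return mismatches


# Request pages and response entities copied from the examples above.
pages = {
    "1": "This is a text by Michael Smith",
    "2": "A paper from Oxford University",
}
response = {
    "entities": {
        "1": [{"type": "Person Entity", "text": "Michael Smith", "range": [18, 31]}],
        "2": [{"type": "Organization Entity", "text": "Oxford University", "range": [13, 30]}],
    }
}
mismatches = check_entities(pages, response["entities"])
print(mismatches)
```

An empty result means every range points at exactly the text reported for that entity.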

2. OCR (Tesseract) Model

An OCR (Optical Character Recognition) model is employed for extracting text from images, such as scanned documents. It processes visual content to recognize characters and convert them into editable and searchable text.

Request payload

{
    "source": "https://sandbox.tensoract.com/testfiles/test_text.png"
}
table:OCR Model Request Payload:

Field Names

Type

Description

source

str

The URL pointing to the source image (e.g. a scanned document) for OCR extraction

Response

    {
        "pages": [
            {
                "page": 1,
                "dimentions": {
                    "width": 2484,
                    "height": 3509
                },
                "words": [
                    {
                        "box": [
                            0.12198067632850242,
                            0.09204901681390709,
                            0.20128824476650564,
                            0.11456255343402678
                        ],
                        "text": "This"
                    },
                    {
                        "box": [
                            0.2177938808373591,
                            0.09204901681390709,
                            0.24476650563607086,
                            0.11456255343402678
                        ],
                        "text": "is"
                    },
                    {
                        "box": [
                            0.26006441223832527,
                            0.09803362781419207,
                            0.2805958132045089,
                            0.11456255343402678
                        ],
                        "text": "a"
                    },
                    {
                        "box": [
                            0.29589371980676327,
                            0.09290396124251923,
                            0.3647342995169082,
                            0.11456255343402678
                        ],
                        "text": "test"
                    },
                    {
                        "box": [
                            0.3788244766505636,
                            0.09204901681390709,
                            0.539049919484702,
                            0.11456255343402678
                        ],
                        "text": "scanned"
                    },
                    {
                        "box": [
                            0.5559581320450886,
                            0.09204901681390709,
                            0.7455716586151369,
                            0.11456255343402678
                        ],
                        "text": "document"
                    },
                    {
                        "box": [
                            0.12198067632850242,
                            0.14990025648332858,
                            0.17149758454106281,
                            0.15958962667426618
                        ],
                        "text": "Lorem"
                    },
                    {
                        "box": [
                            0.17914653784219,
                            0.14990025648332858,
                            0.22584541062801933,
                            0.16186947848389854
                        ],
                        "text": "ipsum"
                    },
                    {
                        "box": [
                            0.23309178743961353,
                            0.14990025648332858,
                            0.27375201288244766,
                            0.15958962667426618
                        ],
                        "text": "dolor"
                    },
                    {
                        "box": [
                            0.27938808373590984,
                            0.14990025648332858,
                            0.2966988727858293,
                            0.15958962667426618
                        ],
                        "text": "sit"
                    },
                    {
                        "box": [
                            0.3027375201288245,
                            0.15018523795953262,
                            0.3466183574879227,
                            0.1610145340552864
                        ],
                        "text": "amet,"
                    },
                    {
                        "box": [
                            0.3538647342995169,
                            0.15018523795953262,
                            0.44887278582930756,
                            0.15958962667426618
                        ],
                        "text": "consectetur"
                    },
                    {
                        "box": [
                            0.45450885668276975,
                            0.14990025648332858,
                            0.534219001610306,
                            0.16215445996010258
                        ],
                        "text": "adipiscing"
                    },
                    {
                        "box": [
                            0.5414653784219001,
                            0.14990025648332858,
                            0.5680354267310789,
                            0.1610145340552864
                        ],
                        "text": "elit,"
                    },
                    {
                        "box": [
                            0.5752818035426731,
                            0.14990025648332858,
                            0.6030595813204509,
                            0.15958962667426618
                        ],
                        "text": "sed"
                    },
                    {
                        "box": [
                            0.6103059581320451,
                            0.14990025648332858,
                            0.6292270531400966,
                            0.15958962667426618
                        ],
                        "text": "do"
                    },
                    {
                        "box": [
                            0.6356682769726248,
                            0.14990025648332858,
                            0.7033011272141707,
                            0.15958962667426618
                        ],
                        "text": "eiusmod"
                    },
                    {
                        "box": [
                            0.7101449275362319,
                            0.15018523795953262,
                            0.7677133655394525,
                            0.16186947848389854
                        ],
                        "text": "tempor"
                    },
                    {
                        "box": [
                            0.7737520128824477,
                            0.14990025648332858,
                            0.8498389694041868,
                            0.15958962667426618
                        ],
                        "text": "incididunt"
                    },
                    {
                        "box": [
                            0.856682769726248,
                            0.15018523795953262,
                            0.8703703703703703,
                            0.15958962667426618
                        ],
                        "text": "ut"
                    },
                    {
                        "box": [
                            0.12198067632850242,
                            0.1669991450555714,
                            0.1710950080515298,
                            0.17668851524650897
                        ],
                        "text": "labore"
                    },
                    {
                        "box": [
                            0.177938808373591,
                            0.16728412653177543,
                            0.19202898550724637,
                            0.17668851524650897
                        ],
                        "text": "et"
                    },
                    {
                        "box": [
                            0.19806763285024154,
                            0.1669991450555714,
                            0.24798711755233493,
                            0.17668851524650897
                        ],
                        "text": "dolore"
                    },
                    {
                        "box": [
                            0.25523349436392917,
                            0.1695639783414078,
                            0.30917874396135264,
                            0.1792533485323454
                        ],
                        "text": "magna"
                    },
                    {
                        "box": [
                            0.31602254428341386,
                            0.1669991450555714,
                            0.36835748792270534,
                            0.17896836705614136
                        ],
                        "text": "aliqua."
                    },
                    {
                        "box": [
                            0.37640901771336555,
                            0.1669991450555714,
                            0.4396135265700483,
                            0.17668851524650897
                        ],
                        "text": "Porttitor"
                    },
                    {
                        "box": [
                            0.44565217391304346,
                            0.1669991450555714,
                            0.5092592592592593,
                            0.17668851524650897
                        ],
                        "text": "rhoncus"
                    },
                    {
                        "box": [
                            0.5161030595813204,
                            0.1669991450555714,
                            0.5563607085346216,
                            0.17668851524650897
                        ],
                        "text": "dolor"
                    },
                    {
                        "box": [
                            0.5623993558776168,
                            0.1695639783414078,
                            0.606682769726248,
                            0.17896836705614136
                        ],
                        "text": "purus"
                    },
                    {
                        "box": [
                            0.6139291465378421,
                            0.1695639783414078,
                            0.6421095008051529,
                            0.17668851524650897
                        ],
                        "text": "non"
                    },
                    {
                        "box": [
                            0.6493558776167472,
                            0.1669991450555714,
                            0.6920289855072463,
                            0.17668851524650897
                        ],
                        "text": "enim."
                    },
                    {
                        "box": [
                            0.7000805152979066,
                            0.1669991450555714,
                            0.7801932367149759,
                            0.17668851524650897
                        ],
                        "text": "Habitasse"
                    },
                    {
                        "box": [
                            0.7870370370370371,
                            0.1669991450555714,
                            0.8349436392914654,
                            0.17896836705614136
                        ],
                        "text": "platea"
                    },
                    {
                        "box": [
                            0.1215780998389694,
                            0.18438301510401825,
                            0.1888083735909823,
                            0.19407238529495582
                        ],
                        "text": "dictumst"
                    },
                    {
                        "box": [
                            0.19524959742351047,
                            0.18438301510401825,
                            0.2584541062801932,
                            0.1963522371045882
                        ],
                        "text": "quisque"
                    },
                    {
                        "box": [
                            0.2648953301127214,
                            0.18438301510401825,
                            0.32085346215780997,
                            0.19663721858079225
                        ],
                        "text": "sagittis"
                    },
                    {
                        "box": [
                            0.3276972624798712,
                            0.18694784838985465,
                            0.3719806763285024,
                            0.1963522371045882
                        ],
                        "text": "purus"
                    },
                    {
                        "box": [
                            0.3784219001610306,
                            0.18438301510401825,
                            0.3961352657004831,
                            0.19407238529495582
                        ],
                        "text": "sit"
                    },
                    {
                        "box": [
                            0.40217391304347827,
                            0.1846679965802223,
                            0.4420289855072464,
                            0.19407238529495582
                        ],
                        "text": "amet"
                    },
                    {
                        "box": [
                            0.44806763285024154,
                            0.18438301510401825,
                            0.5116747181964574,
                            0.1963522371045882
                        ],
                        "text": "volutpat"
                    },
                    {
                        "box": [
                            0.5181159420289855,
                            0.1846679965802223,
                            0.605877616747182,
                            0.1963522371045882
                        ],
                        "text": "consequat."
                    },
                    {
                        "box": [
                            0.6139291465378421,
                            0.18438301510401825,
                            0.6501610305958132,
                            0.19663721858079225
                        ],
                        "text": "Eget"
                    },
                    {
                        "box": [
                            0.6561996779388084,
                            0.1846679965802223,
                            0.6799516908212561,
                            0.19407238529495582
                        ],
                        "text": "est"
                    },
                    {
                        "box": [
                            0.6863929146537843,
                            0.18438301510401825,
                            0.7302737520128825,
                            0.19407238529495582
                        ],
                        "text": "lorem"
                    },
                    {
                        "box": [
                            0.7379227053140096,
                            0.18438301510401825,
                            0.784621578099839,
                            0.1963522371045882
                        ],
                        "text": "ipsum"
                    },
                    {
                        "box": [
                            0.7914653784219001,
                            0.18438301510401825,
                            0.8321256038647343,
                            0.19407238529495582
                        ],
                        "text": "dolor"
                    },
                    {
                        "box": [
                            0.8377616747181964,
                            0.18438301510401825,
                            0.855072463768116,
                            0.19407238529495582
                        ],
                        "text": "sit"
                    },
                    {
                        "box": [
                            0.1215780998389694,
                            0.20205186662866914,
                            0.16143317230273752,
                            0.21145625534340268
                        ],
                        "text": "amet"
                    },
                    {
                        "box": [
                            0.16747181964573268,
                            0.20205186662866914,
                            0.26247987117552335,
                            0.21145625534340268
                        ],
                        "text": "consectetur"
                    },
                    {
                        "box": [
                            0.26811594202898553,
                            0.2017668851524651,
                            0.3526570048309179,
                            0.2140210886292391
                        ],
                        "text": "adipiscing."
                    },
                    {
                        "box": [
                            0.3603059581320451,
                            0.2017668851524651,
                            0.4355877616747182,
                            0.21145625534340268
                        ],
                        "text": "Senectus"
                    },
                    {
                        "box": [
                            0.4420289855072464,
                            0.20205186662866914,
                            0.45652173913043476,
                            0.21145625534340268
                        ],
                        "text": "et"
                    },
                    {
                        "box": [
                            0.46296296296296297,
                            0.20205186662866914,
                            0.5060386473429952,
                            0.21145625534340268
                        ],
                        "text": "netus"
                    },
                    {
                        "box": [
                            0.5128824476650563,
                            0.20205186662866914,
                            0.5273752012882448,
                            0.21145625534340268
                        ],
                        "text": "et"
                    },
                    {
                        "box": [
                            0.533816425120773,
                            0.2017668851524651,
                            0.6219806763285024,
                            0.21145625534340268
                        ],
                        "text": "malesuada"
                    },
                    {
                        "box": [
                            0.6280193236714976,
                            0.2017668851524651,
                            0.677536231884058,
                            0.21145625534340268
                        ],
                        "text": "fames"
                    },
                    {
                        "box": [
                            0.6839774557165862,
                            0.2043317184383015,
                            0.7024959742351047,
                            0.21145625534340268
                        ],
                        "text": "ac"
                    },
                    {
                        "box": [
                            0.7081320450885669,
                            0.2017668851524651,
                            0.7564412238325282,
                            0.21373610715303507
                        ],
                        "text": "turpis."
                    },
                    {
                        "box": [
                            0.7640901771336553,
                            0.2017668851524651,
                            0.8268921095008052,
                            0.21145625534340268
                        ],
                        "text": "Gravida"
                    },
                    {
                        "box": [
                            0.8337359098228664,
                            0.2043317184383015,
                            0.8663446054750402,
                            0.21145625534340268
                        ],
                        "text": "cum"
                    },
                    {
                        "box": [
                            0.1215780998389694,
                            0.2188657737247079,
                            0.16586151368760063,
                            0.2285551439156455
                        ],
                        "text": "sociis"
                    },
                    {
                        "box": [
                            0.17310789049919484,
                            0.21915075520091193,
                            0.23792270531400966,
                            0.23083499572527785
                        ],
                        "text": "natoque"
                    },
                    {
                        "box": [
                            0.24476650563607086,
                            0.2188657737247079,
                            0.32286634460547503,
                            0.23083499572527785
                        ],
                        "text": "penatibus"
                    },
                    {
                        "box": [
                            0.32930756843800324,
                            0.21915075520091193,
                            0.34782608695652173,
                            0.2285551439156455
                        ],
                        "text": "et."
                    },
                    {
                        "box": [
                            0.35426731078904994,
                            0.2188657737247079,
                            0.41626409017713367,
                            0.2285551439156455
                        ],
                        "text": "Aenean"
                    },
                    {
                        "box": [
                            0.4243156199677939,
                            0.2188657737247079,
                            0.49074074074074076,
                            0.23083499572527785
                        ],
                        "text": "pharetra"
                    },
                    {
                        "box": [
                            0.49798711755233493,
                            0.22143060701054432,
                            0.5523349436392915,
                            0.2311199772014819
                        ],
                        "text": "magna"
                    },
                    {
                        "box": [
                            0.5591787439613527,
                            0.22143060701054432,
                            0.5772946859903382,
                            0.2285551439156455
                        ],
                        "text": "ac"
                    },
                    {
                        "box": [
                            0.5841384863123994,
                            0.2188657737247079,
                            0.6481481481481481,
                            0.23083499572527785
                        ],
                        "text": "placerat"
                    },
                    {
                        "box": [
                            0.6541867954911433,
                            0.2188657737247079,
                            0.7451690821256038,
                            0.2285551439156455
                        ],
                        "text": "vestibulum."
                    },
                    {
                        "box": [
                            0.7536231884057971,
                            0.2188657737247079,
                            0.8132045088566827,
                            0.2311199772014819
                        ],
                        "text": "Feugiat"
                    },
                    {
                        "box": [
                            0.8192431561996779,
                            0.2188657737247079,
                            0.8470209339774557,
                            0.2285551439156455
                        ],
                        "text": "sed"
                    },
                    {
                        "box": [
                            0.12198067632850242,
                            0.23624964377315474,
                            0.1678743961352657,
                            0.24593901396409235
                        ],
                        "text": "lectus"
                    },
                    {
                        "box": [
                            0.17431561996779388,
                            0.23624964377315474,
                            0.2608695652173913,
                            0.24593901396409235
                        ],
                        "text": "vestibulum"
                    },
                    {
                        "box": [
                            0.26851851851851855,
                            0.23624964377315474,
                            0.31561996779388085,
                            0.24593901396409235
                        ],
                        "text": "mattis"
                    },
                    {
                        "box": [
                            0.322463768115942,
                            0.23624964377315474,
                            0.42028985507246375,
                            0.2482188657737247
                        ],
                        "text": "ullamcorper."
                    }
                ]
            }
        ]
    }
Table: OCR Model Response Payload

Field Name    Type           Description
pages         list           A list of page objects, each representing a page in the scanned document.
page          int            The page number within the document.
dimentions    dict           A dictionary containing the dimensions (width and height) of the page in pixels.
width         int            The width of the page in pixels.
height        int            The height of the page in pixels.
words         list           A list of word objects, each representing a word found on the page.
box           list[float]    The bounding box coordinates of the OCRed word.
text          str            The text content of the word.
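
The fields above can be read with a few lines of Python. The following is a minimal sketch, not a definitive implementation. It assumes that each box is ordered as [x_min, y_min, x_max, y_max] and that the values are fractions of the page width and height, which the sample values (all between 0 and 1) suggest but the payload description does not state. The helper names page_text and box_to_pixels, the page number 1, and the 1000 x 1400 page size are hypothetical and used only for illustration.

```python
import json

# Trimmed sample payload: the two word objects are copied from the response
# above. The page number (1) is a placeholder, and the other page-level
# fields are omitted for brevity.
SAMPLE = json.loads("""
{
    "pages": [
        {
            "page": 1,
            "words": [
                {"box": [0.49798711755233493, 0.22143060701054432,
                         0.5523349436392915, 0.2311199772014819],
                 "text": "magna"},
                {"box": [0.5591787439613527, 0.22143060701054432,
                         0.5772946859903382, 0.2285551439156455],
                 "text": "ac"}
            ]
        }
    ]
}
""")


def page_text(page):
    """Join the text of every word object on a page, in payload order."""
    return " ".join(word["text"] for word in page["words"])


def box_to_pixels(box, width, height):
    """Scale a box to pixel coordinates.

    Assumption: box is [x_min, y_min, x_max, y_max] with values expressed
    as fractions of the page width and height.
    """
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


if __name__ == "__main__":
    first_page = SAMPLE["pages"][0]
    print(page_text(first_page))
    for word in first_page["words"]:
        # 1000 x 1400 is a made-up page size; in practice, take the width
        # and height from the page's dimensions dictionary.
        print(word["text"], box_to_pixels(word["box"], 1000, 1400))
```

Passing the width and height in explicitly keeps the sketch independent of how the dimensions dictionary is keyed in a real response.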