Input and Output Data¶
Input Data¶
The input data for uploading items to a dataset includes manifest, CSV, and TXT files.
These files are used during the upload process, specifically when the Select Manifest/CSV/TXT option is selected.
For a detailed view of the dataset types and their compatible file types, refer to the section Upload files to Dataset.
During dataset-item upload, the manifest file must follow a specific format to be processed successfully. Additionally, metadata can be included in both manifest and CSV files.
Use an input manifest file
The following is an example of a manifest file for files stored in an Amazon S3 bucket:
{"Sr_No":11,"source":"s3://EXAMPLE-BUCKET/example1.tiff","presigned":"http://abc.com"}
{"Sr_No":12,"source":"s3://EXAMPLE-BUCKET/example2.pdf","presigned":"http://abd.com"}
Use an input CSV file
The following is an example of a CSV file for files stored in an Amazon S3 bucket:
id,File,Batch,Pages,Source
1,tif,1,1,s3://objectways-ergo-poc/input_documents/MLK_F4BKRD3R00FI2OA.tif
2,tif,2,2,s3://objectways-ergo-poc/input_documents/MLK_REZ_multipage.tif
3,tif,3,1,s3://objectways-ergo-poc/input_documents/AmbA_J3Y61NDT0021050L2.tif
The following is an example of a CSV file for Text as source:
origin text
wikipedia ChatGPT[a] is an artificial intelligence (AI) chatbot developed by OpenAI and released in November 2022. It is built on top of OpenAI's GPT-3.5 and GPT-4 foundational large language models (LLMs) and has been fine-tuned (an approach to transfer learning) using both supervised and reinforcement learning techniques.
wikipedia ChatGPT launched as a prototype on November 30, 2022, and garnered attention for its detailed responses and articulate answers across many domains of knowledge.[3] Its propensity, at times, to confidently provide factually incorrect responses, however, has been identified as a significant drawback.[4] In 2023, following the release of ChatGPT, OpenAI's valuation was estimated at US$29 billion.[5] The advent of the chatbot has increased competition within the space, motivating the creation of Google's Bard and Meta's LLaMA.
wikipedia The original release of ChatGPT was based on GPT-3.5. A version based on GPT-4, the newest OpenAI model, was released on March 14, 2023, and is available for paid subscribers on a limited basis.
wikipedia ChatGPT is a member of the generative pre-trained transformer (GPT) family of language models. It was fine-tuned over an improved version of OpenAI's GPT-3 known as "GPT-3.5".[6]
wikipedia The fine-tuning process leveraged both supervised learning as well as reinforcement learning in a process called reinforcement learning from human feedback (RLHF).[7][8] Both approaches use human trainers to improve the model's performance. In the case of supervised learning, the model was provided with conversations in which the trainers played both sides: the user and the AI assistant. In the reinforcement learning step, human trainers first ranked responses that the model had created in a previous conversation.[9] These rankings were used to create "reward models" that were used to fine-tune the model further by using several iterations of Proximal Policy Optimization (PPO).[7][10]
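Because the text column in this layout can itself contain commas, a CSV writer that quotes fields is the safest way to produce such a file. A minimal sketch using Python's standard csv module, with shortened versions of the example rows:

```python
import csv

# Shortened versions of the example rows above; the first text deliberately
# contains a comma to show why quoting matters.
rows = [
    {"origin": "wikipedia", "text": "ChatGPT is an AI chatbot developed by OpenAI, released in November 2022."},
    {"origin": "wikipedia", "text": "The original release of ChatGPT was based on GPT-3.5."},
]

# csv.DictWriter quotes any field that contains commas or newlines,
# so free-form text survives a round trip through the file.
with open("text_source.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["origin", "text"])
    writer.writeheader()
    writer.writerows(rows)
```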
Use an input TXT file
The following is an example of a TXT file for files stored in an Amazon S3 bucket:
s3://EXAMPLE-BUCKET/example1.pdf
s3://EXAMPLE-BUCKET/example2.pdf
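A TXT input file is simply one S3 URI per line. A small illustrative sketch (the bucket name and keys follow the example above):

```python
# Build a TXT input file: one S3 URI per line.
# Bucket name and keys are illustrative.
bucket = "EXAMPLE-BUCKET"
keys = ["example1.pdf", "example2.pdf"]

with open("input_files.txt", "w") as f:
    for key in keys:
        f.write(f"s3://{bucket}/{key}\n")
```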
Output Data¶
Dataset exports¶
There are three export formats available for datasets:
Dataset Export: This format allows you to export the dataset in its original form, containing the raw data without any specific model run or ground truth annotations.
Dataset Export with Model Run: This format includes the dataset along with the results of a specific model run. It captures the model’s predictions, classifications, or other outputs generated by applying the trained model to the dataset.
Dataset Export with Ground Truth Project: This format exports the dataset along with the annotations created in a ground truth project. A ground truth project involves manual annotation or labeling of data by human annotators. The export includes both the original dataset and the annotations, providing valuable labeled data that can be used for training or validating machine learning models.
1. Scanned (OCR) Dataset¶
1. Dataset Export¶
{
"source": "https://********/presigned/92d1f5a0b8667c883e797608571c8616.pdf?sig=62cc37fc5ec6d91864ae062e4da9f6ad81dda083b2207b0b12517f6d5d37a1be4400dd381a64d25ec5c4fda7c6a76103ca54f8434687b948b0a6175007fc82d3:6ee530eca4b69d17633726d4ad1220b2:64cddaf9:1243ecda3ebadc9a1ce6fa1fea8f3808",
"name": "ABSTRACT - Axia.tiff",
"itemId": "4611f8e6331908278b5160ca",
"datasetId": "e3773b85655ea8646005158a",
"type": "application/pdf",
"tags": [
"invoice tag"
],
"metadata": {
"ocr_model": "Textract (default)",
"use-textract-only": true,
"source_ref": "/uploads/e3773b85655ea8646005158a/4611f8e6331908278b5160ca",
"document_id": "4611f8e6331908278b5160ca"
},
"active": true,
"ext": "pdf"
}
Scanned (OCR) Dataset Export Summary:

| Field Names | Type | Description |
|---|---|---|
| source | str | The presigned URL or S3 path of the data source |
| name | str | The name of the dataset item |
| itemId | str | The Id of the dataset item |
| datasetId | str | The Id of the dataset |
| type | str | The type of the dataset |
| tags | list | List of tags associated with the dataset item |
| metadata | dict | The metadata associated with the dataset and dataset item |
| ocr_model | str | The OCR model used for processing |
| use-textract-only | bool | Indicates if only Textract is used for processing |
| source_ref | str | Reference to the source of the dataset item |
| document_id | str | The Id of the document |
| active | bool | Indicates whether the dataset item is currently active |
| ext (local files) | str | Extension of local files, if any |
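As an illustration of consuming this export, the sketch below parses a record like the one above and indexes it by itemId; the JSON literal is an abbreviated version of the example (the presigned source URL is omitted):

```python
import json

# Abbreviated version of the export record shown above; the presigned
# source URL is omitted for brevity.
export_json = """
{
  "name": "ABSTRACT - Axia.tiff",
  "itemId": "4611f8e6331908278b5160ca",
  "datasetId": "e3773b85655ea8646005158a",
  "type": "application/pdf",
  "tags": ["invoice tag"],
  "metadata": {"ocr_model": "Textract (default)", "use-textract-only": true},
  "active": true,
  "ext": "pdf"
}
"""

item = json.loads(export_json)

# A typical first step: keep only active items and index them by itemId.
index = {item["itemId"]: item} if item["active"] else {}
print(index["4611f8e6331908278b5160ca"]["name"])  # ABSTRACT - Axia.tiff
```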
2. With Model Run¶
{
"source": "https://sandboxdocuments.tensoract.com/presigned/92d1f5a0b8667c883e797608571c8616.pdf?sig=7da49bf64299dc09cb3405ff33cb0444ed6d310208f13655ae773407d405b9d003db5d7f980679f85f0949151c4f9904c837f90eb434a5ba2bfbd2520f1d43b9:4887b3566f7f75b587ad0ea9ebe0e6dc:64cdc5b3:37f85a3533b89761e7282654214ba4bf",
"name": "ABSTRACT - Axia.tiff",
"itemId": "4611f8e6331908278b5160ca",
"datasetId": "e3773b85655ea8646005158a",
"type": "application/pdf",
"tags": [],
"metadata": {
"ocr_model": "Textract (default)",
"use-textract-only": true,
"source_ref": "/uploads/e3773b85655ea8646005158a/4611f8e6331908278b5160ca",
"document_id": "4611f8e6331908278b5160ca"
},
"active": true,
"modelRuns": [
{
"modelRunId": "2023-08-04T03:44:10.5248871",
"tags": [
{
"type": "Organization Entity",
"text": "Women's Health",
"page": 1,
"boxes": [
[
0.861730694770813,
0.08322431892156601,
0.937999427318573,
0.09277199301868677
],
[
0.06499018520116806,
0.11739349365234375,
0.07347860559821129,
0.12546881940215826
]
],
"kv_type": "value"
},
{
"type": "Organization Entity",
"text": "Womens Health",
"page": 1,
"boxes": [
[
0.0935770571231842,
0.11707708239555359,
0.11941905505955219,
0.1253887191414833
],
[
0.12276646494865417,
0.11710146069526672,
0.17684946581721306,
0.1254600789397955
]
],
"kv_type": "value"
},
{
"type": "Organization Entity",
"text": "HP Main Line LLC",
"page": 1,
"boxes": [
[
0.5065937042236328,
0.11725140362977982,
0.5151590080931783,
0.1253813849762082
],
[
0.5505043864250183,
0.11691059917211533,
0.7601913809776306,
0.1277432944625616
],
[
0.5505043864250183,
0.1170012354850769,
0.6015823110938072,
0.1273022061213851
],
[
0.6051791906356812,
0.11715823411941528,
0.6570645309984684,
0.12562930211424828
]
],
"kv_type": "value"
},
{
"type": "Location Entity",
"text": "Laurel Road",
"page": 1,
"boxes": [
[
0.06458062678575516,
0.13079734146595,
0.0742951761931181,
0.1387380100786686
],
[
0.09427980333566666,
0.13045595586299896,
0.19984688609838486,
0.1387722697108984
]
],
"kv_type": "value"
},
{
"type": "Location Entity",
"text": "Bryn Mawr",
"page": 1,
"boxes": [
[
0.5502040982246399,
0.13049453496932983,
0.5714401658624411,
0.13857773877680302
],
[
0.5759334564208984,
0.13051286339759827,
0.6116651147603989,
0.13872544467449188
]
],
"kv_type": "value"
},
{
"type": "Organization Entity",
"text": "Regional Womens Health Management",
"page": 1,
"boxes": [
[
0.6013859510421753,
0.14393498003482819,
0.6286550257354975,
0.15307497046887875
],
[
0.6331984400749207,
0.14391060173511505,
0.6625380869954824,
0.1522916592657566
],
[
0.6665995121002197,
0.14416542649269104,
0.6881919391453266,
0.1522371843457222
],
[
0.06526166200637817,
0.15757058560848236,
0.07337938901036978,
0.16564789321273565
]
],
"kv_type": "value"
},
{
"type": "Organization Entity",
"text": "ABA",
"page": 1,
"boxes": [
[
0.5443362593650818,
0.23810118436813354,
0.574140515178442,
0.2483070008456707
]
],
"kv_type": "value"
},
{
"type": "Organization Entity",
"text": "ABA",
"page": 1,
"boxes": [
[
0.22924329340457916,
0.2950398325920105,
0.28950661048293114,
0.3033293457701802
]
],
"kv_type": "value"
}
]
}
],
"ext": "pdf"
}
Scanned (OCR) Dataset Export with Model Run Summary:

| Field Names | Type | Description |
|---|---|---|
| source | str | The presigned URL or S3 path of the data source |
| name | str | The name of the dataset item |
| itemId | str | The Id of the dataset item |
| datasetId | str | The Id of the dataset |
| type | str | The type of the dataset |
| tags | list | List of tags associated with the dataset item |
| metadata | dict | Metadata associated with the dataset and dataset item |
| ocr_model | str | The OCR model used for processing |
| use-textract-only | bool | Indicates if only Textract is used for processing |
| source_ref | str | Reference to the source of the dataset item |
| document_id | str | The Id of the document |
| active | bool | Indicates whether the dataset item is currently active |
| modelRuns | list | List of dictionaries containing details of predicted labels |
| modelRunId | str | The Id of the model run |
| tags | list | List of dictionaries containing the predicted labels |
| type | str | The type of the label |
| text | str | The text selected for prediction |
| page | int | The page number associated with the text |
| boxes | list | List of bounding box coordinates for the OCRed words |
| kv_type | str | Flag indicating whether the tag is a key or a value |
| ext (local files) | str | Extension of local files, if any |
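The box values in the example are fractions of the page size, which suggests they can be mapped back to pixel coordinates once the page dimensions are known. The sketch below groups predicted tags by entity type and converts one box to pixels; it assumes the coordinate order is [x_min, y_min, x_max, y_max] and borrows the 1275x1650 page size from the ground-truth example's dimensions field, so both should be verified against your own exports:

```python
from collections import defaultdict

# Two predicted tags condensed from the model-run example above
# (box values truncated to four decimals).
tags = [
    {"type": "Organization Entity", "text": "Women's Health", "page": 1,
     "boxes": [[0.8617, 0.0832, 0.9380, 0.0928]], "kv_type": "value"},
    {"type": "Location Entity", "text": "Laurel Road", "page": 1,
     "boxes": [[0.0646, 0.1308, 0.0743, 0.1387]], "kv_type": "value"},
]

# Group predictions by entity type.
by_type = defaultdict(list)
for tag in tags:
    by_type[tag["type"]].append(tag["text"])

# Convert one normalized box to pixels. ASSUMPTION: coordinates are
# [x_min, y_min, x_max, y_max] fractions of the page width/height; the
# 1275x1650 page size is borrowed from the ground-truth example's
# "dimensions" field.
width, height = 1275, 1650
x0, y0, x1, y1 = tags[0]["boxes"][0]
pixel_box = (round(x0 * width), round(y0 * height),
             round(x1 * width), round(y1 * height))
```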
3. With GroundTruth Project¶
{
"source": "https://**********/presigned/a01f5c95d843b4fd4f890570e5cac51c.pdf?sig=838fefa7e55ab214cfa71b70d36d19ee3a263b5c750f49d8ddb105d90f81b82668548ecec76dc79f3df8195c45a7e2702e543611f7f210e761755db7a6c1ea86:3c4f5271ef6c34813cb136a93ba8e7bd:64cdea6d:ae53c052c424a59ee74995c52cc94222",
"name": "ABSTRACT - Axia.tiff",
"itemId": "9cebea4c95edc877ca6f2603",
"datasetId": "e3773b85655ea8646005158a",
"type": "application/pdf",
"tags": [
"invoice tag"
],
"metadata": {
"ocr_model": "Textract (default)",
"use-textract-only": true,
"source_ref": "/uploads/e3773b85655ea8646005158a/9cebea4c95edc877ca6f2603",
"document_id": "9cebea4c95edc877ca6f2603"
},
"active": true,
"project": "7b3020dd437ce2a30bae1c5a",
"taskId": "0931952ce4a27f53a3678cfe",
"annotations": [
{
"email": "q1@qc.com",
"messages": [],
"role": "nlp_qc",
"elapsedTime": 14,
"date": "2023-08-04T06:21:00.589Z",
"content": {
"pdf_fingerprint": "c04f692d342c06d433f751ac32c6d8b1",
"metadata": {
"File": "ABSTRACT - Axia.tiff",
"TaskId": "0931952ce4a27f53a3678cfe",
"ocr_model": "Textract (default)",
"use-textract-only": true,
"source_ref": "/uploads/e3773b85655ea8646005158a/9cebea4c95edc877ca6f2603",
"document_id": "9cebea4c95edc877ca6f2603",
"Type of Project": "OCR"
},
"tags": [
{
"page": 1,
"text": "N A M E",
"id": 1,
"type": "Name",
"kv_type": "key",
"words": [
"N",
"A",
"M",
"E"
],
"boxes": [
[
0.06499018520116806,
0.11739349365234375,
0.07347860559821129,
0.12546881940215826
],
[
0.06458062678575516,
0.13079734146595,
0.0742951761931181,
0.1387380100786686
],
[
0.06520503759384155,
0.14403623342514038,
0.07536023296415806,
0.15211013052612543
],
[
0.06526166200637817,
0.15757058560848236,
0.07337938901036978,
0.16564789321273565
]
],
"range": [
[
71,
72
],
[
126,
127
],
[
165,
166
],
[
194,
195
]
]
},
{
"page": 1,
"text": "Axia Women's Health",
"id": 2,
"type": "Name",
"textAdjust": "Axia Women's",
"kv_type": "value",
"words": [
"Axia",
"Women's",
"Health"
],
"boxes": [
[
0.0935770571231842,
0.11707708239555359,
0.11941905505955219,
0.1253887191414833
],
[
0.12276646494865417,
0.11710146069526672,
0.17684946581721306,
0.1254600789397955
],
[
0.18119750916957855,
0.11732043325901031,
0.21823260188102722,
0.12542327493429184
]
],
"range": [
[
73,
77
],
[
78,
85
],
[
86,
92
]
]
},
{
"page": 1,
"text": "BILL TO",
"id": 3,
"type": "Name",
"rawBox": true,
"kv_type": "key",
"words": [
"BILL TO"
],
"boxes": [
[
0.4980276134122288,
0.10967250571210967,
0.5374753451676528,
0.1706016755521706
]
],
"range": []
},
{
"page": 1,
"text": "Regional Womens Health",
"id": 4,
"type": "Name",
"rotate": 24,
"rawBox": true,
"kv_type": "value",
"words": [
"Regional Womens Health"
],
"boxes": [
[
0.5473372781065089,
0.11119573495811119,
0.7682445759368837,
0.12795125666412796
]
],
"range": []
},
{
"page": 1,
"text": "Cat.",
"id": 5,
"type": "Name",
"table": {
"id": 4,
"x": 0,
"y": 1,
"cell": true
},
"kv_type": "key",
"words": [
"Cat."
],
"boxes": [
[
0.39583876729011536,
0.3084534704685211,
0.4190108198672533,
0.31684120278805494
]
],
"range": [
[
543,
547
]
]
},
{
"page": 1,
"text": "Cat.",
"id": 6,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 0,
"y": 1,
"cell": true
},
"words": [
"Cat."
],
"boxes": [
[
0.39583876729011536,
0.3084534704685211,
0.4190108198672533,
0.31684120278805494
]
],
"range": [
[
543,
547
]
]
},
{
"page": 1,
"text": "Description",
"id": 7,
"type": "Name",
"table": {
"id": 4,
"x": 1,
"y": 1,
"cell": true
},
"kv_type": "key",
"words": [
"Description"
],
"boxes": [
[
0.4328092038631439,
0.3084268271923065,
0.49752890318632126,
0.3184952298179269
]
],
"range": [
[
548,
559
]
]
},
{
"page": 1,
"text": "Description",
"id": 8,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 1,
"y": 1,
"cell": true
},
"words": [
"Description"
],
"boxes": [
[
0.4328092038631439,
0.3084268271923065,
0.49752890318632126,
0.3184952298179269
]
],
"range": [
[
548,
559
]
]
},
{
"page": 1,
"text": "Effective",
"id": 9,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 3,
"y": 0,
"cell": true
},
"words": [
"Effective"
],
"boxes": [
[
0.6239141225814819,
0.2947663366794586,
0.6735980845987797,
0.30344805866479874
]
],
"range": [
[
476,
485
]
]
},
{
"page": 1,
"text": "Sqft.",
"id": 10,
"type": "Name",
"table": {
"id": 4,
"x": 2,
"y": 1,
"cell": true
},
"kv_type": "key",
"words": [
"Sqft."
],
"boxes": [
[
0.5750880241394043,
0.30830204486846924,
0.6010445598512888,
0.3183623990043998
]
],
"range": [
[
560,
565
]
]
},
{
"page": 1,
"text": "Sqft.",
"id": 11,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 2,
"y": 1,
"cell": true
},
"words": [
"Sqft."
],
"boxes": [
[
0.5750880241394043,
0.30830204486846924,
0.6010445598512888,
0.3183623990043998
]
],
"range": [
[
560,
565
]
]
},
{
"page": 1,
"text": "ABA",
"id": 12,
"type": "TABLECELL",
"table": {
"id": 4,
"x": 0,
"y": 2,
"cell": true
},
"words": [
"ABA"
],
"boxes": [
[
0.3953396677970886,
0.3291471600532532,
0.42196371778845787,
0.3373938351869583
]
],
"range": [
[
626,
629
]
]
},
{
"page": 1,
"text": "Date",
"id": 13,
"type": "Name",
"table": {
"id": 4,
"x": 3,
"y": 1,
"cell": true
},
"kv_type": "key",
"words": [
"Date"
],
"boxes": [
[
0.6240901350975037,
0.3085164725780487,
0.6510729901492596,
0.31685456447303295
]
],
"range": [
[
566,
570
]
]
},
{
"page": 1,
"text": "Date",
"id": 14,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 3,
"y": 1,
"cell": true
},
"words": [
"Date"
],
"boxes": [
[
0.6240901350975037,
0.3085164725780487,
0.6510729901492596,
0.31685456447303295
]
],
"range": [
[
566,
570
]
]
},
{
"page": 1,
"text": "Rent Abatements/Cor",
"id": 15,
"type": "TABLECELL",
"table": {
"id": 4,
"x": 1,
"y": 2,
"cell": true
},
"words": [
"Rent",
"Abatements/Cor"
],
"boxes": [
[
0.4329037368297577,
0.3290809392929077,
0.4603371527045965,
0.3374354373663664
],
[
0.46285462379455566,
0.32896438241004944,
0.5594801902770996,
0.3374544633552432
]
],
"range": [
[
630,
634
],
[
635,
649
]
]
},
{
"page": 1,
"text": "4,850",
"id": 16,
"type": "TABLECELL",
"table": {
"id": 4,
"x": 2,
"y": 2,
"cell": true
},
"words": [
"4,850"
],
"boxes": [
[
0.5759893655776978,
0.3291241228580475,
0.6087189093232155,
0.3381931884214282
]
],
"range": [
[
650,
655
]
]
},
{
"page": 1,
"text": "6/15/2021",
"id": 17,
"type": "TABLECELL",
"table": {
"id": 4,
"x": 3,
"y": 2,
"cell": true
},
"words": [
"6/15/2021"
],
"boxes": [
[
0.6162644028663635,
0.32898813486099243,
0.6728598773479462,
0.3374910345301032
]
],
"range": [
[
656,
665
]
]
}
],
"pageOffsets": [
0,
3355,
5983
],
"links": [
{
"page": 1,
"id1": 1,
"id2": 2,
"relationship": "key-pair"
},
{
"page": 1,
"id1": 3,
"id2": 4,
"relationship": "key-pair"
}
],
"attributes": {
"Is document damaged": "No"
},
"pageAttributes": [
{
"Is page damaged?": "No"
}
],
"tables": [
{
"x": [
0.3953396677970886,
0.4273864608258009,
0.567284107208252,
0.6124916560947895,
0.6735980845987797
],
"y": [
0.2947663366794586,
0.305875051766634,
0.32372980611398816,
0.3381931884214282
],
"rows": 3,
"cols": 4,
"box": [
0.3953396677970886,
0.2947663366794586,
0.6735980845987797,
0.3381931884214282
],
"id": 4,
"page": 1,
"description": "Table 1"
}
],
"plainText": {
"1": "Lease Id: PR0001 - 000222 Lease Profile Master Occupant Id: 00000162-1 N Axia Women's Health B Regional Womens Health Managem A HP Main Line LLC I T 227 Laurel Road M L o Echelon One, Suite 300 E Bryn Mawr PA 19010 L Voorhees NJ 08043 Legal Name: Regional Womens Health Management Tenant Id: Contact Name: Jenni Witters Tenant Type Id: Phone No: SIC Group: Fax No: NAICS Code Lease Stop: No Suite Information Current Recurring Charges Building Id: PR0001 Execution: 3/15/2021 Effective Monthly Annual Amount Suite Id: 401 Beginning: 6/15/2021 Cat. Description Sqft. Date Amount Amount PSF Lease Id: 000222 Occupancy: 9/1/2021 ABA Rent Abatements/Cor 4,850 6/15/2021 -12,125.00 -145,500.00 -30.00 Leased Sqft: 4,850 Rent Start: 6/15/2021 ABA Rent Abatements/Cor 4,850 12/1/2021 0.00 0.00 0.00 Pro-Rata Share: 0.17 Expiration: 9/30/2028 ROF Base Rent Office 4,850 6/15/2021 12,125.00 145,500.00 30.00 Ann. Mkt. Rent PSF: 0.00 Vacate: TIC Tenant Improvement 4,850 11/1/2021 3,059.54 36,714.48 7.57 UTI Utility Reimbursement 4,850 6/15/2021 808.33 9,699.96 2.00 Occupancy Status: Current Rate Change Schedule Effective Monthly Annual Amount Cat. Description Sqft. 
Date Amount Amount PSF ABA Rent Abatements/Con 4,850 11/1/2021 -2,575.00 -30,900.00 -6.37 ROF Base Rent Office 4,850 7/1/2022 12,367.50 148,410.00 30.60 ROF Base Rent Office 4,850 7/1/2023 12,614.04 151,368.48 31.21 ROF Base Rent Office 4,850 7/1/2024 12,868.67 154,424.04 31.84 ROF Base Rent Office 4,850 7/1/2025 13,123.29 157,479.48 32.47 ROF Base Rent Office 4,850 7/1/2026 13,386.00 160,632.00 33.12 ROF Base Rent Office 4,850 7/1/2027 13,652.75 163,833.00 33.78 ROF Base Rent - Office 4,850 7/1/2028 13,927.58 167,130.96 34.46 Lease Notes Effective Date Ref 1 Ref 2 Note 3/15/2021 ALTERTN Article 8 of Lease Landlord's consent required for any alterations, other than cosmetic Alterations which do not cost more than $1,000 per alteration and which do not affect (i) the structural portions or roof of the Premises or the 3/15/2021 ASGNSUB Article 9 Landlord consent required for any assignment/sublease. Landlord has 30 days after receipt of notice from Tenant to either approve assignment/sublease, not approve assignment/sublease, recapture the Premises 3/15/2021 DEFAULT Article 18 of Lease 1. If Tenant does not make payment within 5 days after date due, provided that, Landlord shall not more than 1 time per 12 full calendar month period of the term, deliver written notice to Tenant with respect to 3/15/2021 ESTOPEL Article 17 of Lease Estoppel required to be provided within 10 days after request. In the form set forth in Exhibit D 3/15/2021 HOLDOVR Section 19 (b) of Lease Landlord may either (i) increase Rent to 200% of the highest monthly aggregate Fixed Rent and additional 3/15/2021 INS Article 11 - Landlord responsible for repairs to all plumbing and other fixtures, equipment and systems (including replacement, if necessary) in or serving the Premises. Landlord to provide janitorial services (Exhibit E) and pest control as needed. 
3/15/2021 LATECHG Article 3 of Lease Tenant shall pay Landlord a service and handling charge equal to five percent (5%) of any Rent not paid within five (5) days after the date first due, which shall apply cumulatively each month with respect to Report Id WEBX_PROFILE Database HAVERFORD Reported by Joe Staugaard 1/7/2022 11:50 Page 1"
},
"dimensions": [
{
"width": 1275,
"height": 1650
},
{
"width": 1275,
"height": 1650
}
],
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "61685a5eb492d0845eb5e6b4"
},
"jobStart": 1691128396,
"sessionTime": 14,
"elapsedTime": 86,
"updateTime": 1691130059,
"selectBoundingBox": true,
"lastUpdate": 1691130060583
}
}
],
"ext": "pdf"
}
Scanned (OCR) Dataset Export with GroundTruth Project Summary:

| Field Names | Type | Description |
|---|---|---|
| source | str | The presigned URL or S3 path of the data source |
| name | str | The name of the dataset item |
| itemId | str | The Id of the dataset item |
| datasetId | str | The Id of the dataset |
| type | str | The type of the dataset |
| tags | list | List of tags associated with the dataset item |
| metadata | dict | Metadata associated with the dataset and dataset item |
| ocr_model | str | The OCR model used for processing |
| use-textract-only | bool | Indicates if only Textract is used for processing |
| source_ref | str | Reference to the source of the dataset item |
| document_id | str | The Id of the document |
| active | bool | Indicates whether the dataset item is currently active |
| project | str | The project associated with the dataset item |
| taskId | str | The Id of the task |
| annotations | list | List of dictionaries containing details of annotations |
| email | str | The email address associated with the annotator |
| messages | list | The messages associated with the user |
| role | str | The role associated with the user |
| elapsedTime | int | The elapsed time of the annotation |
| date | str | The date of the annotation |
| content | dict | The content of the annotation |
| pdf_fingerprint | str | The fingerprint of the document |
| metadata | dict | The metadata associated with the task and project |
| File | str | The name of the file |
| TaskId | str | The Id of the task |
| ocr_model | str | The OCR model used for processing |
| use-textract-only | bool | Indicates if only Textract is used for processing |
| source_ref | str | Reference to the source of the dataset item |
| document_id | str | The Id of the document |
| tags | list | List of dictionaries containing the annotated tags |
| page | int | The page number of the selected text |
| text | str | The selected text for annotation |
| id | int | The Id of the selected text for annotation |
| type | str | The type of the label |
| kv_type | str | Flag indicating whether the tag is a key or a value |
| words | list | The words in the selected text |
| boxes | list | List of bounding box coordinates for the OCRed words |
| range | list | Start and end character offsets of the selected words within the plain text |
| textAdjust | str | Modified OCRed text |
| rawBox | bool | Flag indicating that the bounding box was drawn manually |
| rotate | int | The angle of bounding box rotation, in degrees |
| table | dict | The table information for the tag |
| id | int | The Id of the table |
| x | int | The vertical grid coordinate |
| y | int | The horizontal grid coordinate |
| cell | bool | Flag indicating that the current object is a cell of the table |
| pageOffsets | list | The list of page offsets into the plain text |
| links | list | The list of relationships between tags |
| page | int | The page number associated with the key and value fields |
| id1 | int | The Id of the key field |
| id2 | int | The Id of the value field |
| relationship | str | The name of the relationship |
| attributes | dict | The document-level attributes associated with the task |
| pageAttributes | list | List of dictionaries containing the attributes for each page |
| tables | list | List of dictionaries containing table information |
| x | list | The vertical grid coordinates of the table columns |
| y | list | The horizontal grid coordinates of the table rows |
| rows | int | The number of rows in the table |
| cols | int | The number of columns in the table |
| box | list | The bounding box coordinates of the table |
| id | int | The Id of the table |
| page | int | The page number of the table |
| description | str | The title of the table |
| plainText | dict | Dictionary mapping page numbers to the plain text extracted from the file |
| dimensions | list | The dimensions of the pages |
| width | int | The width of the page |
| height | int | The height of the page |
| review | dict | The review details |
| rate | str | The rating given by the reviewer |
| note | str | The note associated with the reviewer |
| reviewerId | str | The Id of the reviewer |
| jobStart | int | The start time of the annotation (Unix timestamp) |
| sessionTime | int | The session time of the annotation, in seconds |
| elapsedTime | int | The elapsed time of the annotation, in seconds |
| updateTime | int | The update time of the annotation (Unix timestamp) |
| lastUpdate | int | The last update time (Unix timestamp, in milliseconds) |
| ext | str | The extension of the local file |
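In the ground-truth export above, the links array pairs key and value tags through their id fields. A sketch that resolves those links into a key-text to value-text mapping (tag ids and texts are taken from the example):

```python
# Minimal subset of the ground-truth annotation content shown above.
tags = [
    {"id": 1, "text": "N A M E", "kv_type": "key"},
    {"id": 2, "text": "Axia Women's Health", "kv_type": "value"},
    {"id": 3, "text": "BILL TO", "kv_type": "key"},
    {"id": 4, "text": "Regional Womens Health", "kv_type": "value"},
]
links = [
    {"page": 1, "id1": 1, "id2": 2, "relationship": "key-pair"},
    {"page": 1, "id1": 3, "id2": 4, "relationship": "key-pair"},
]

# Resolve each "key-pair" link into a key-text -> value-text mapping:
# id1 points at the key tag, id2 at the value tag.
by_id = {tag["id"]: tag for tag in tags}
pairs = {
    by_id[link["id1"]]["text"]: by_id[link["id2"]]["text"]
    for link in links
    if link["relationship"] == "key-pair"
}
print(pairs)  # {'N A M E': "Axia Women's Health", 'BILL TO': 'Regional Womens Health'}
```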
2. PDF Dataset¶
1. Dataset Export¶
{
"source": "s3://EXAMPLE-BUCKET/testna.pdf",
"name": "testna.pdf",
"itemId": "aec104ce48aa0eece0a94c1b",
"datasetId": "8d9736f30411ae81fa4983d4",
"type": "application/pdf",
"tags": [],
"metadata": {
"xxx": 14,
"presigned": "http://aaa.com"
},
"active": true
}
PDF Dataset Export Summary:

| Field Names | Type | Description |
|---|---|---|
| source | str | The presigned URL or S3 path of the data source |
| name | str | The name of the dataset item |
| itemId | str | The Id of the dataset item |
| datasetId | str | The Id of the dataset |
| type | str | The type of the dataset |
| tags | list | List of tags associated with the dataset item |
| metadata | dict | Metadata associated with the dataset and dataset item |
| active | bool | Indicates whether the dataset item is currently active |
| ext (local files) | str | Extension of local files, if any |
2. With GroundTruth Project¶
{
"source": "https://sandboxdocuments.tensoract.com/presigned/33e268b66cb90138b84cc627a501afa2.pdf?sig=cc753891da92d55d769969ebf280f7aabaa8847de2ff31141c7b1869900a6c84f3b09f5fb2f4d32e27dc442f9f2841dfc94983f1e42df8569b849cb9153c866a:9ad06a28916bab71cf5140fedd06ae74:64b65760:d5e5b898249da98bf428147b361c0094",
"name": "1810.04805.pdf",
"itemId": "0ed98ab31666242a417504f9",
"datasetId": "8d9736f30411ae81fa4983d4",
"type": "application/pdf",
"tags": [
"dataset tag 1"
],
"metadata": {
"Dataset": "PDF"
},
"active": true,
"project": "866ad732042bde9b94929cc3",
"taskId": "d6aae2114d0947b1bfe5dcd3",
"annotations": [
{
"email": "yannevarsha6@gmail.com",
"messages": [],
"role": "nlp_qc",
"elapsedTime": 18,
"date": "2023-07-17T09:11:08.530Z",
"content": {
"pdf_fingerprint": "dccb9bc542f22b2bdd94110918c68f96",
"metadata": {
"File": "1810.04805.pdf",
"TaskId": "d6aae2114d0947b1bfe5dcd3",
"Type of Project": "NER"
},
"tags": [
{
"page": 1,
"text": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
"id": 1,
"type": "DATE",
"box": [
0.1957394553114858,
0.08355623157419612,
0.8080743211552288,
0.11953028994286674
]
},
{
"page": 1,
"text": "Jacob Devlin",
"id": 2,
"type": "PERSON",
"box": [
0.20464120844784606,
0.15506947083348188,
0.31506005550366556,
0.16926990081839677
]
},
{
"page": 1,
"text": "Ming-Wei Chang",
"id": 3,
"type": "PERSON",
"box": [
0.34016437686048157,
0.15506947083348188,
0.48781795335273054,
0.16926990081839677
]
},
{
"page": 1,
"text": "2018a",
"id": 4,
"type": "DATE",
"box": [
0.3736872717865327,
0.3484841506610129,
0.4145903056733348,
0.36031776312819985
]
},
{
"page": 2,
"text": "(2018a)",
"id": 5,
"type": "DATE",
"box": [
0.3769863661562031,
0.3271071821734426,
0.4339806024432365,
0.3400650507786048
]
}
],
"pageOffsets": [
0,
3988,
8509,
12206,
17069,
20918,
25368,
29080,
33539,
37641,
42160,
46926,
50816,
54525,
58589,
60965,
64088
],
"links": [
{
"page": 1,
"id1": 2,
"id2": 3,
"relationship": "Precede"
},
{
"page": 1,
"id1": 4,
"id2": 5,
"relationship": "Precede"
}
],
"attributes": {
"tags": [],
"links": [],
"Doc Ok?": "Yes"
},
"pageAttributes": [
{
"Page OK?": null
},
{
"Page OK?": "Yes"
}
],
"boxes": [
{
"page": 1,
"box": [
0.6285714285714286,
0.1505226480836237,
0.8216748768472907,
0.178397212543554
],
"label": "Bounding_box"
},
{
"page": 2,
"box": [
0.10246305418719212,
0.3797909407665505,
0.49064039408866994,
0.4961672473867596
],
"label": "Bounding_box",
"rotate": 22
}
],
"plainText": {
"1": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova Google AI Language {jacobdevlin,mingweichang,kentonl,kristout}@google.com Abstract We introduce a new language representa- tion model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language repre- sentation models (Peters et al., 2018a; Rad- ford et al., 2018), BERT is designed to pre- train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a re- sult, the pre-trained BERT model can be fine- tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task- specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art re- sults on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answer- ing Test F1 to 93.2 (1.5 point absolute im- provement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement). 1 Introduction Language model pre-training has been shown to be effective for improving many natural language processing tasks (Dai and Le, 2015; Peters et al., 2018a; Radford et al., 2018; Howard and Ruder, 2018). 
These include sentence-level tasks such as natural language inference (Bowman et al., 2015; Williams et al., 2018) and paraphrasing (Dolan and Brockett, 2005), which aim to predict the re- lationships between sentences by analyzing them holistically, as well as token-level tasks such as named entity recognition and question answering, wheremodels are required to produce fine-grained output at the token level (Tjong Kim Sang and DeMeulder, 2003; Rajpurkar et al., 2016). There are two existing strategies for apply- ing pre-trained language representations to down- stream tasks: feature-based and fine-tuning. The feature-based approach, such as ELMo (Peters et al., 2018a), uses task-specific architectures that include the pre-trained representations as addi- tional features. The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre- trained parameters. The two approaches share the same objective function during pre-training,where they use unidirectional language models to learn general language representations. We argue that current techniques restrict the power of the pre-trained representations, espe- cially for the fine-tuning approaches. The ma- jor limitation is that standard language models are unidirectional, and this limits the choice of archi- tectures that can be used during pre-training. For example, inOpenAIGPT, the authors use a left-to- right architecture, where every token can only at- tend to previous tokens in the self-attention layers of the Transformer (Vaswani et al., 2017). Such re- strictions are sub-optimal for sentence-level tasks, and could be very harmful when applying fine- tuning based approaches to token-level tasks such as question answering, where it is crucial to incor- porate context from both directions. 
In this paper, we improve the fine-tuning based approaches by proposing BERT: Bidirectional Encoder Representations from Transformers. BERT alleviates the previously mentioned unidi- rectionality constraint by using a “masked lan- guage model” (MLM) pre-training objective, in- spired by the Cloze task (Taylor, 1953). The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked a r X i v : 1 8 1 0 . 0 4 8 0 5 v 2 [ c s . C L ] 2 4 M a y 2 0 1 9",
"2": "word based only on its context. Unlike left-to- right language model pre-training, the MLM ob- jective enables the representation to fuse the left and the right context, which allows us to pre- train a deep bidirectional Transformer. In addi- tion to the masked language model, we also use a “next sentence prediction” task that jointly pre- trains text-pair representations. The contributions of our paper are as follows: • We demonstrate the importance of bidirectional pre-training for language representations. Un- like Radford et al. (2018), which uses unidirec- tional language models for pre-training, BERT uses masked language models to enable pre- trained deep bidirectional representations. This is also in contrast to Peters et al. (2018a), which uses a shallow concatenation of independently trained left-to-right and right-to-left LMs. • We show that pre-trained representations reduce the need for many heavily-engineered task- specific architectures. BERT is the first fine- tuning based representationmodel that achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks, outper- forming many task-specific architectures. • BERT advances the state of the art for eleven NLP tasks. The code and pre-trained mod- els are available at https://github.com/ google-research/bert. 2 RelatedWork There is a long history of pre-training general lan- guage representations, and we briefly review the most widely-used approaches in this section. 2.1 Unsupervised Feature-based Approaches Learning widely applicable representations of words has been an active area of research for decades, including non-neural (Brown et al., 1992; Ando and Zhang, 2005; Blitzer et al., 2006) and neural (Mikolov et al., 2013; Pennington et al., 2014) methods. Pre-trained word embeddings are an integral part of modern NLP systems, of- fering significant improvements over embeddings learned from scratch (Turian et al., 2010). 
To pre- train word embedding vectors, left-to-right lan- guage modeling objectives have been used (Mnih and Hinton, 2009), as well as objectives to dis- criminate correct from incorrect words in left and right context (Mikolov et al., 2013). These approaches have been generalized to coarser granularities, such as sentence embed- dings (Kiros et al., 2015; Logeswaran and Lee, 2018) or paragraph embeddings (Le andMikolov, 2014). To train sentence representations, prior work has used objectives to rank candidate next sentences (Jernite et al., 2017; Logeswaran and Lee, 2018), left-to-right generation of next sen- tence words given a representation of the previous sentence (Kiros et al., 2015), or denoising auto- encoder derived objectives (Hill et al., 2016). ELMo and its predecessor (Peters et al., 2017, 2018a) generalize traditional word embedding re- search along a different dimension. They extract context-sensitive features from a left-to-right and a right-to-left language model. The contextual rep- resentation of each token is the concatenation of the left-to-right and right-to-left representations. When integrating contextual word embeddings with existing task-specific architectures, ELMo advances the state of the art for severalmajor NLP benchmarks (Peters et al., 2018a) including ques- tion answering (Rajpurkar et al., 2016), sentiment analysis (Socher et al., 2013), and named entity recognition (Tjong Kim Sang and De Meulder, 2003). Melamud et al. (2016) proposed learning contextual representations through a task to pre- dict a single word from both left and right context using LSTMs. Similar to ELMo, their model is feature-based and not deeply bidirectional. Fedus et al. (2018) shows that the cloze task can be used to improve the robustness of text generation mod- els. 
2.2 Unsupervised Fine-tuning Approaches As with the feature-based approaches, the first works in this direction only pre-trained word em- bedding parameters from unlabeled text (Col- lobert andWeston, 2008). More recently, sentence or document encoders which produce contextual token representations have been pre-trained from unlabeled text and fine-tuned for a supervised downstream task (Dai and Le, 2015; Howard and Ruder, 2018; Radford et al., 2018). The advantage of these approaches is that few parameters need to be learned from scratch. At least partly due to this advantage, OpenAI GPT (Radford et al., 2018) achieved pre- viously state-of-the-art results on many sentence- level tasks from the GLUE benchmark (Wang et al., 2018a). Left-to-right language model-"
},
"dimensions": [
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
}
],
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "61685a5eb492d0845eb5e6b4"
},
"jobStart": 1689583831,
"sessionTime": 18,
"elapsedTime": 31,
"updateTime": 1689585066,
"lastUpdate": 1689585068525
}
}
],
"ext": "pdf"
}
- table:PDF Dataset Export With GroundTruth Project Summary:
Field Names | Type | Description |
---|---|---|
source | str | The presigned URL or S3 path of the data source |
name | str | The name of the dataset item |
itemId | str | The Id of the dataset item |
datasetId | str | The Id of the dataset |
type | str | The type of the dataset |
tags | list | List of tags associated with the dataset item |
metadata | dict | Metadata associated with the dataset and dataset item |
active | bool | Indicates whether the dataset item is currently active |
project | str | The project associated with the dataset item |
taskId | str | The Id of the task |
annotations | list | List of dictionaries containing details of annotations |
email | str | The email associated with the user |
messages | list | The messages associated with the user |
role | str | The role associated with the user |
elapsedTime | int | The elapsed time of the annotation, in seconds |
date | str | The date of the annotation |
content | dict | The content of the annotation |
pdf_fingerprint | str | The fingerprint of the document |
metadata | dict | The metadata associated with the task and project |
File | str | The name of the file |
TaskId | str | The Id of the task |
Type of Project | str | The metadata added in the advanced settings of the project |
tags | list | List of dictionaries containing the annotated tags |
pages | int | The page number of the selected text |
text | str | The selected text for annotation |
id | str | The Id of the selected text for annotation |
type | str | The type of the label |
box | list | The annotation bounding box |
pageOffsets | list | List of page offsets |
links | list | The list of relationships |
attributes | dict | The document attributes associated with the task |
pageAttributes | list | List of dictionaries containing the attributes for each page |
plainText | dict | Dictionary containing page numbers and the corresponding plain text extracted from the file |
dimensions | list | The dimensions of the pages |
width | float | The width of the page |
height | float | The height of the page |
review | dict | The review details |
rate | str | The rating of the review |
note | str | The note associated with the reviewer |
reviewerId | str | The Id of the reviewer |
jobStart | int | The start time of the annotation (Unix timestamp) |
sessionTime | int | The session time of the annotation, in seconds |
elapsedTime | int | The elapsed time of the annotation, in seconds |
updateTime | int | The update time of the annotation (Unix timestamp) |
lastUpdate | int | The last update time (Unix timestamp, in milliseconds) |
ext | str | The extension of the local file, if any |
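The export records above nest the useful annotation details (tags, review ratings) several levels deep. As a minimal post-processing sketch — assuming exports are stored one JSON object per line, and using the hypothetical helper names `summarize_record`/`summarize_export` — they can be flattened like this:

```python
import json

def summarize_record(record: dict) -> dict:
    """Collect the annotated tags and review ratings from one export record."""
    summary = {"name": record.get("name"), "tags": [], "reviews": []}
    for annotation in record.get("annotations", []):
        content = annotation.get("content", {})
        for tag in content.get("tags", []):
            summary["tags"].append((tag.get("text"), tag.get("type")))
        review = content.get("review", {})
        if review:
            summary["reviews"].append(review.get("rate"))
    return summary

def summarize_export(path: str) -> list:
    # Assumes one JSON record per line; adjust if the export is a single array.
    with open(path) as f:
        return [summarize_record(json.loads(line)) for line in f if line.strip()]
```

The `.get()` calls keep the sketch tolerant of records that lack optional fields, such as items that were never annotated.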
3. TXT Dataset¶
1. Dataset Export¶
{
"source": "https://********/presigned/b423bc857fcb780860add83807e61316.txt?sig=e61dcf794cd5bf4dadcca9e964de63f8b9a4f07f57ce1a65628ba45a976ab99c759c8b7fe315002b58911afddc00ff7b3e2ea51169a5389901b15c9c850f5d7f:fcc4c466036fd1ca58bfa36f53ea4507:64b761d1:6d5745dabe02fc0123cf535b1fd5cb9c",
"name": "ca_newspapers_en_ab_the_calgary_herald_1950_05_29_issue1_page_0008.txt",
"itemId": "214fb51145ff6524a7c5fa23",
"datasetId": "414855a6e615c76816fba51f",
"type": "text/plain",
"tags": [
"dataset tag 1"
],
"metadata": {
"Dataset": "TXT"
},
"active": true,
"ext": "txt"
}
- table:TXT Dataset Export Summary:
Field Names | Type | Description |
---|---|---|
source | str | The presigned URL or S3 path of the data source |
name | str | The name of the dataset item |
itemId | str | The Id of the dataset item |
datasetId | str | The Id of the dataset |
type | str | The type of the dataset |
tags | list | List of tags associated with the dataset item |
metadata | dict | Metadata associated with the dataset and dataset item |
active | bool | Indicates whether the dataset item is currently active |
ext (local files) | str | The extension of the local file, if any |
2. With GroundTruth NER Project¶
{
"source": "https://********/presigned/ffd1f1fd01f23a07150051eb3a0ba3ed.txt?sig=fa417407505cfdc13f08a4144f12c7a42c2794470913d6f21d4f3a9ce71f92d4714be51eed1f21f2865f6d2947fe90f4fad19710e7d91a5604a931fb1f4d064b:8daf874146dea01203e9966672b452af:64b76a36:365a5794f4451398d62e10cb08f82929",
"name": "ca_newspapers_en_ab_edmonton_journal_1928_02_16_issue1_page_0010.txt",
"itemId": "17a66e34546c77d6a6ed095a",
"datasetId": "414855a6e615c76816fba51f",
"type": "text/plain",
"tags": [],
"metadata": {
"Dataset": "TXT"
},
"active": true,
"project": "866ad732042bde9b94929cc3",
"taskId": "52755d415dd68822fbdafc20",
"annotations": [
{
"email": "q1@qc.com",
"messages": [],
"role": "nlp_qc",
"elapsedTime": 8,
"date": "2023-07-18T04:40:32.847Z",
"content": {
"metadata": {
"File": "ca_newspapers_en_ab_edmonton_journal_1928_02_16_issue1_page_0010.txt",
"TaskId": "52755d415dd68822fbdafc20",
"Type of Project": "NER"
},
"absoluteOffsets": true,
"tags": [
{
"page": 1,
"text": "BRITAIN WILL HONOR",
"id": 1,
"type": "PERSON"
},
{
"page": 1,
"text": "Grent Britain",
"id": 2,
"type": "PERSON"
},
{
"page": 1,
"text": "February 21",
"id": 3,
"type": "DATE"
},
{
"page": 1,
"text": "Earl of Oxford",
"id": 4,
"type": "ORGANIZATION"
},
{
"page": 1,
"text": "SUTTON COURTENAY",
"id": 5,
"type": "ORGANIZATION"
}
],
"pageOffsets": [
0
],
"links": [
{
"page": 1,
"id1": 1,
"id2": 2,
"relationship": "Precede"
},
{
"page": 1,
"id1": 4,
"id2": 5,
"relationship": "Precede"
}
],
"attributes": {
"tags": [],
"links": [],
"Doc Ok?": "Yes"
},
"pageAttributes": [
{
"Page OK?": null
},
{
"Page OK?": "Yes"
}
],
"plainText": {
"1": "BRITAIN WILL HONOR EARL AT ABBEY SERVICE ! But Great British Statesman Is to Be Buried Privately SUTTON COURTENAT. England, Feb. eminent men and the press of Grent Britain praised the Earl of Oxford's life of service ed mourned his death, the body of the aged state man, who died at his home here early yesterday, was carried last night to the parish church of Sutton Courtenny. The early will be buried privately and not in Il'estminster Abbey. Tals announcement was made last night by the family. and the decision was in accordance with the special wish expressed by Lord Oxford some time ago Memorial Service A memorial service for the former premier, however, will be held In the abbey at noon February 21. A simple service Tor the family w! be held In the parish church Saturday morning. Praise of the Earl of Oxford and Asquith as a great parliamentarian, a forceful, gracious debater and an the selfish servant of the nation's welfare is contained in thousands of messages of condolence published and received my his widow. All recall his activities In the early days of the war. when. as ========== man Is to Be Buried Privately SUTTON COURTENAY. England, Feb. 16. -Wl'hlle eminent men and the press of Grent Britain praised the Earl of Oxford's life of service ed mourned his death, the body of the aged states man, who died at his home here early yesterday, was carried last night to the parish church of Sutton Courtenay, The early will be burled privately and not in l'estminster Abbey. Tals announcement was made last night by the family. and the decision was in accordance with the special wish is. pressed by Lord Oxford some time ago Memorial Service A memorial service for the former premier, however, will be held In the abbey at noon February 21. A simple service Tor the family n! be held in the parish church Saturday morning. 
Praise of the Earl of Oxford and Asquith as a great parliament, a forceful, gracious debater and an the selfish servant of the nation's welfare is contained in thousands of messages of condolence published and received my his widow. All recall his activities In the early days of the war. when. prime minister, he breathed the Britisn ---------- Recall Declaration Many proudly remember his declaration In the face of Germany's seemingly irresistible advance when the \"* We shall never sheathe the sword which we have not lightly drawn until Beiglum recovers in full measure all and more than all. she had sacrificed. until France is adequately secured against the menace of aggression ; until the rights of the stiller nationalists of Europe are placed upon an unassailable foundation, and until the mill ward domination of Prussia Is wholly and finally destroyed \" ========== Ottawa House Pays Mead of Tribute OTTAWA, Feb. 16. --Tho prime minister at the opening of the house or commons yesterday afternoon rose to suggest that the house should pause In the midst* of its duties to pay, tribute to the memory of Lord Oxtord and Asquith, Mr. King reminded the house that Lord Oxford's career cox tended over the greater part of half a century and that he had held the post of prime minister continuously for 0 longer period than any who had over held that office. As to his part in the war Premier King stated that the burden of responsibility undoubtedly affected the constitution of the former prime of Britaln and hastened his death. \" was fitting that members of the Can'1dian committee should join with the members of l'estminster in extending sympathy to the people of Great Britain for the great old that bad been created. ---------- Bennett Adds Word Hon. R. L. Bennett, leader of the opposition, said that it fell to the leader of the house, the prime minister to extend the sympathy of the people. 
Un behalf of those who sat in opposition he desired to Join In the sympathy that had been expressed. The prime minister of Canada, Mr. Bennett said, might feel that he was u worthy disciple of Mr. Asquith because the latter had held office for some time with the aid of conflicting groups in the house of commons. Mr. Asquith had been a great scholar, n great orator, and had well maintained the noble traditions of parliament. The empire had lost a very fine citizen but he had left behind him n most Inspiring legacy. Robert Gardiner (U. F / Aendin) speaking on behalf of his Grotius joined In the tribute to a man who would he best remembered r. s. the man who had \" at heart the Interests of the common people \""
},
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "62de356a2f027ab62a00bef1"
},
"jobStart": 1689655223,
"sessionTime": 8,
"elapsedTime": 8,
"updateTime": 1689655231,
"lastUpdate": 1689655232841
}
}
],
"ext": "txt"
}
- table:TXT Dataset Export With GroundTruth NER Project Summary:
Field Names | Type | Description |
---|---|---|
source | str | The presigned URL or S3 path of the data source |
name | str | The name of the dataset item |
itemId | str | The Id of the dataset item |
datasetId | str | The Id of the dataset |
type | str | The type of the dataset |
tags | list | List of tags associated with the dataset item |
metadata | dict | Metadata associated with the dataset and dataset item |
active | bool | Indicates whether the dataset item is currently active |
project | str | The project associated with the dataset item |
taskId | str | The Id of the task |
annotations | list | List of dictionaries containing details of annotations |
email | str | The email associated with the user |
messages | list | The messages associated with the user |
role | str | The role associated with the user |
elapsedTime | int | The elapsed time of the annotation, in seconds |
date | str | The date of the annotation |
content | dict | Dictionary containing all the details of the task, such as metadata, tags, pageOffsets, etc. |
metadata | dict | The metadata associated with the task and project |
File | str | The name of the file |
TaskId | str | The Id of the task |
Type of Project | str | The metadata added in the advanced settings of the project |
absoluteOffsets | bool | Indicates the annotation format uses absolute entity offsets |
tags | list | List of dictionaries containing the annotated tags |
page | int | The page number of the selected text |
text | str | The selected text for annotation |
id | int | The Id of the selected text for annotation |
type | str | The type of the label |
links | list | The list of relationships |
attributes | dict | The document attributes associated with the task |
pageAttributes | list | List of dictionaries containing the attributes for each page |
plainText | dict | Dictionary containing page numbers and the corresponding plain text extracted from the file |
dimensions | list | The dimensions of the pages |
width | float | The width of the page |
height | float | The height of the page |
review | dict | The review details |
rate | str | The rating of the review |
note | str | The note associated with the reviewer |
reviewerId | str | The Id of the reviewer |
jobStart | int | The start time of the annotation (Unix timestamp) |
sessionTime | int | The session time of the annotation, in seconds |
elapsedTime | int | The elapsed time of the annotation, in seconds |
updateTime | int | The update time of the annotation (Unix timestamp) |
lastUpdate | int | The last update time (Unix timestamp, in milliseconds) |
ext | str | The extension of the local file, if any |
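In the NER export, `links` reference entities by their `id1`/`id2` values, so relationships have to be resolved against the `tags` list. A minimal sketch of that resolution, assuming the record structure shown above (the function name `extract_entities_and_links` is hypothetical):

```python
def extract_entities_and_links(record: dict):
    """Resolve NER tags and links into (entity, relation) lists."""
    entities, relations = [], []
    for annotation in record.get("annotations", []):
        content = annotation.get("content", {})
        # Index tags by id so links can be resolved to their text spans.
        by_id = {tag["id"]: tag for tag in content.get("tags", [])}
        entities.extend((tag["text"], tag["type"]) for tag in by_id.values())
        for link in content.get("links", []):
            src, dst = by_id.get(link["id1"]), by_id.get(link["id2"])
            if src and dst:
                relations.append((src["text"], link["relationship"], dst["text"]))
    return entities, relations
```

Links whose endpoints cannot be found in the tag list are skipped rather than raising, which keeps the sketch robust to partially edited annotations.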
3. With GroundTruth Classification Project¶
{
"source": "https://sandboxdocuments.tensoract.com/presigned/3c9a446180a44faa43ab2464d45633c7.txt?sig=e5128796abbbc2f33ba9b83af2a755207d9b247a3a624c7992bf8c70a5af621d95cd95c7b63b940d14832d295814310e2b7fff4622944e9ac9815d52ed507311:2e41603c382003a3c456e47b2768981f:64b7848f:2650c588fc2cced10e6118802086d776",
"name": "business_2.txt",
"itemId": "2d2020aa8e3deb383fb7c74f",
"datasetId": "60974d4e9e7759842cdff3be",
"type": "text/plain",
"tags": [
"dataset tag"
],
"metadata": {
"Dataset Type": "TXT"
},
"active": true,
"project": "591351b938d008ca0745510a",
"taskId": "8c939483b594e0de5d5efb54",
"annotations": [
{
"email": "johndoe@me.com",
"messages": [],
"role": "nlp_qc",
"elapsedTime": 8,
"date": "2023-07-18T06:03:46.657Z",
"content": {
"metadata": {
"File": "business_2.txt",
"TaskId": "8c939483b594e0de5d5efb54",
"Type of Project": "Classification"
},
"classificationTypes": {
"Select Type of Document": "select",
"Type of Documents": "multi",
"Put a note": "text"
},
"classifications": {
"Select Type of Document": [
"Technology"
],
"Type of Documents": [
"Graphics",
"Bussiness"
],
"Put a note": [
"Multi-type document"
]
},
"plainText": {
"1": "Japanese growth grinds to a halt Growth in Japan evaporated in the three months to September, sparking renewed concern about an economy not long out of a decade-long trough. Output in the period grew just 0.1%, an annual rate of 0.3%. Exports - the usual engine of recovery - faltered, while domestic demand stayed subdued and corporate investment also fell short. The growth falls well short of expectations, but does mark a sixth straight quarter of expansion. The economy had stagnated throughout the 1990s, experiencing only brief spurts of expansion amid long periods in the doldrums. One result was deflation - prices falling rather than rising - which made Japanese shoppers cautious and kept them from spending. The effect was to leave the economy more dependent than ever on exports for its recent recovery. But high oil prices have knocked 0.2% off the growth rate, while the falling dollar means products shipped to the US are becoming relatively more expensive. The performance for the third quarter marks a sharp downturn from earlier in the year. The first quarter showed annual growth of 6.3%, with the second showing 1.1%, and economists had been predicting as much as 2% this time around. \"Exports slowed while capital spending became weaker,\" said Hiromichi Shirakawa, chief economist at UBS Securities in Tokyo. \"Personal consumption looks good, but it was mainly due to temporary factors such as the Olympics. \"The amber light is flashing.\" The government may now find it more difficult to raise taxes, a policy it will have to implement when the economy picks up to help deal with Japan's massive public debt. "
},
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "614b55be8af65dcf41da535b"
},
"jobStart": 1689660217,
"sessionTime": 8,
"elapsedTime": 8,
"updateTime": 1689660225,
"pageOffsets": [
0
],
"lastUpdate": 1689660226654
}
}
],
"ext": "txt"
}
- table:TXT Dataset Export With GroundTruth Classification Project Summary:
Field Names | Type | Description |
---|---|---|
source | str | The presigned URL or S3 path of the data source |
name | str | The name of the dataset item |
itemId | str | The Id of the dataset item |
datasetId | str | The Id of the dataset |
type | str | The type of the dataset |
tags | list | List of tags associated with the dataset item |
metadata | dict | Metadata associated with the dataset and dataset item |
active | bool | Indicates whether the dataset item is currently active |
project | str | The project associated with the dataset item |
taskId | str | The Id of the task |
annotations | list | List of dictionaries containing details of annotations |
email | str | The email associated with the user |
messages | list | The messages associated with the user |
role | str | The role associated with the user |
elapsedTime | int | The elapsed time of the annotation, in seconds |
date | str | The date of the annotation |
content | dict | The content of the annotation |
metadata | dict | The metadata associated with the task and project |
File | str | The name of the file |
TaskId | str | The Id of the task |
Type of Project | str | The metadata added in the advanced settings of the project |
classificationTypes | dict | Dictionary containing the labels defined in the project |
Select Type of Document | str | Single-select label |
Type of Documents | str | Multi-select label |
Put a note | str | Plain-text label |
classifications | dict | Dictionary containing the classification labels in the task |
plainText | dict | Dictionary containing page numbers and the corresponding plain text extracted from the file |
review | dict | The review details |
rate | str | The rating of the review |
note | str | The note associated with the reviewer |
reviewerId | str | The Id of the reviewer |
jobStart | int | The start time of the annotation (Unix timestamp) |
sessionTime | int | The session time of the annotation, in seconds |
elapsedTime | int | The elapsed time of the annotation, in seconds |
updateTime | int | The update time of the annotation (Unix timestamp) |
lastUpdate | int | The last update time (Unix timestamp, in milliseconds) |
ext | str | The extension of the local file, if any |
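`classificationTypes` declares each label's widget type (select, multi, text) while `classifications` holds the values actually chosen, keyed by the same label names. A minimal sketch that pairs the two, assuming the `content` structure shown above (the helper name `flatten_classifications` is hypothetical):

```python
def flatten_classifications(content: dict) -> dict:
    """Pair each classification label with its declared type and selected values."""
    types = content.get("classificationTypes", {})
    values = content.get("classifications", {})
    return {
        label: {"type": types.get(label), "values": values.get(label, [])}
        for label in types
    }
```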
4. Image Dataset¶
1. Dataset Export¶
{
"source": "s3://newton-ai-internal-share/files/car1.jpeg",
"name": "car1.jpeg",
"itemId": "9a0498002663227e1e7d5e14",
"datasetId": "25987f46e5febb50484e8497",
"type": "image/jpeg",
"tags": [
"dataset tag"
],
"metadata": {
"Dataset Type": "Image",
"xxx": 12,
"presigned": "http://aaa.com"
},
"active": true
}
- table:Image Dataset Export:
Field Names | Type | Description |
---|---|---|
source | str | The presigned URL or S3 path of the data source |
name | str | The name of the dataset item |
itemId | str | The Id of the dataset item |
datasetId | str | The Id of the dataset |
type | str | The type of the dataset |
tags | list | List of tags associated with the dataset item |
metadata | dict | Metadata associated with the dataset and dataset item |
active | bool | Indicates whether the dataset item is currently active |
ext (local files) | str | The extension of the local file, if any |
2. With GroundTruth Bulk Image Classification Project¶
{
"source": "https://**********/presigned/b2b6d70656c53b10b0a194296a3598b5.tiff?sig=d3c6212eb811deb8c325f38e829d6f542a32e8a171c062d7be0e2125ff7c62628e05b3e66d996edf089745ee35699c48bc7fb7c3db86ada01667cb3f18c54990:6d4aa6c56aa19b8ac45d6630fb465e4d:64b7b5c5:ea2ae71fa9cc9950931e3335ced82926",
"name": "cyan.tiff",
"itemId": "93007aa49a9288b9b460528b",
"datasetId": "25987f46e5febb50484e8497",
"type": "image/tiff",
"tags": [
"color images"
],
"metadata": {
"Dataset Type": "Image",
"color": "cyan"
},
"active": true,
"project": "e3c9b4a1dd6df1c4c7091895",
"taskId": "ff5e6111d67ec09cb7578577",
"annotations": [],
"classification": "Cyan",
"ext": "tiff"
}
- table:Image Dataset Export With GroundTruth Bulk Image Classification Project Summary:
Field Names | Type | Description |
---|---|---|
source | str | The presigned URL or S3 path of the data source |
name | str | The name of the dataset item |
itemId | str | The Id of the dataset item |
datasetId | str | The Id of the dataset |
type | str | The type of the dataset |
tags | list | List of tags associated with the dataset item |
metadata | dict | Metadata associated with the dataset and dataset item |
active | bool | Indicates whether the dataset item is currently active |
project | str | The project associated with the dataset item |
taskId | str | The Id of the task |
classification | str | The classified label of the task |
ext | str | The extension of the local file, if any |
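Because bulk classification exports carry the assigned label as a top-level `classification` field, grouping items by label is a one-pass operation. A minimal sketch, assuming a list of records in the format shown above (the function name `group_by_label` is hypothetical):

```python
from collections import defaultdict

def group_by_label(records: list) -> dict:
    """Map each classification label to the item names that received it."""
    groups = defaultdict(list)
    for record in records:
        label = record.get("classification")
        if label is not None:  # Skip items that were never classified.
            groups[label].append(record["name"])
    return dict(groups)
```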
3. With GroundTruth Object Detection Project¶
{
"source": "https://sandboxdocuments.tensoract.com/presigned/2ba176b5957d922c0b5867cf99a6895a.jpg?sig=6ef9b9444985b2d637dc53869e894b6b64779f65dc5dfa0a1d23129ed89621c2859f32988ca8fb76ed3d292a441c167dd85dd9987b27ed6997e90e4cd1051584:59e6f8bc53146080d740b01e67328244:64b8a815:f5df07dd0c032aa261ffa1470b799829",
"name": "im00.jpg",
"itemId": "9c8b14f93c1fbc5b2fe996ae",
"datasetId": "abd204685e5c074b282d6744",
"type": "image/jpeg",
"tags": [
"dataset tag 1"
],
"metadata": {
"Dataset": "Image Dataset"
},
"active": true,
"project": "0461637f62c18082f3c14cc3",
"taskId": "dff56fb67e79f0cb887263cb",
"annotations": [
{
"email": "jdoeqa@acme.org",
"messages": [],
"role": "nlp_qc",
"elapsedTime": 13,
"date": "2023-06-17T10:10:29.789Z",
"content": {
"url": "https://sandboxdocuments.tensoract.com/presigned/2ba176b5957d922c0b5867cf99a6895a.jpeg?sig=35d229ffb09200bbd28eb9c0ab00d7c2a446f0c85daa6b6204e0b1f043229c1e0b25b6633b477b2145d77d96a0ddd6cb7922e104e72df00583df7c9bed233058:1f88e35afe77cfb705c7dfabce067f20:648ed802:fc875ac8ba21580b86f20874bafee1d9",
"imageWidth": 720,
"imageHeight": 1280,
"selected": null,
"boxes": [
{
"x1": 310.7026,
"y1": 468.443,
"x2": 507.9414,
"y2": 1021.1236,
"id": "b0",
"type": "box",
"oid": "b1",
"outside_image": {},
"occluded": {},
"invisible": false,
"attrs": {},
"title": "",
"label": "Group 1",
"sub_labels": [
{
"x1": 357.9577,
"y1": 495.1525,
"id": "b1",
"type": "keypoint",
"oid": "b2",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Top of head",
"sub_labels": []
}
]
}
],
"image_attrs": {},
"review": {
"rate": "Rejected",
"note": ""
},
"jobStart": 1686996615,
"sessionTime": 13,
"elapsedTime": 13,
"tsSeconds": true,
"updateTime": 1686996628,
"lastUpdate": 1686996629784
}
}
],
"ext": "jpg"
}
- table:Image Dataset Export With GroundTruth Object Detection Project Summary:
Field Names | Type | Description |
---|---|---|
source | str | The presigned URL or S3 path of the data source |
name | str | The name of the dataset item |
itemId | str | The Id of the dataset item |
datasetId | str | The Id of the dataset |
type | str | The type of the dataset |
tags | list | List of tags associated with the dataset item |
metadata | dict | Metadata associated with the dataset and dataset item |
active | bool | Indicates whether the dataset item is currently active |
project | str | The project associated with the dataset item |
taskId | str | The Id of the task |
annotations | list | List of dictionaries representing the annotations |
email | str | The email associated with the user |
messages | list | The messages associated with the user |
role | str | The role associated with the user |
elapsedTime | int | The elapsed time of the annotation, in seconds |
date | str | The date of the annotation |
content | dict | Dictionary containing all the details of the task, such as boxes, image attributes, etc. |
url | str | The presigned URL or S3 path of the task |
imageWidth | int | The width of the image in pixels |
imageHeight | int | The height of the image in pixels |
boxes | list | List of bounding boxes drawn around objects in the image |
x1 | float | The x-coordinate of the top-left corner of the bounding box |
y1 | float | The y-coordinate of the top-left corner of the bounding box |
x2 | float | The x-coordinate of the bottom-right corner of the bounding box |
y2 | float | The y-coordinate of the bottom-right corner of the bounding box |
id | str | The Id of the bounding box |
type | str | Flag to indicate whether it is a box or a keypoint |
outside_image | dict | Indicates whether the object extends beyond the boundaries of the image |
occluded | dict | Indicates whether the object is occluded or partially hidden |
attrs | dict | Any additional attributes or properties associated with the object |
label | str | The label assigned to the bounding box |
sub_labels | list | Any sub-labels or sub-categories associated with the object |
x1 | float | The x-coordinate of the keypoint |
y1 | float | The y-coordinate of the keypoint |
id | str | The Id of the keypoint |
type | str | Flag to indicate whether it is a box or a keypoint |
outside_image | dict | Indicates whether the object extends beyond the boundaries of the image |
occluded | dict | Indicates whether the object is occluded or partially hidden |
label | str | The label or category assigned to the keypoint |
sub_labels | list | Any sub-labels associated with the object |
image_attrs | dict | The image attributes associated with the task |
review | dict | The review details |
rate | str | The rating of the review |
note | str | The note associated with the reviewer |
elapsedTime | int | The elapsed time of the annotation, in seconds |
updateTime | int | The update time of the annotation (Unix timestamp) |
lastUpdate | int | The last update time (Unix timestamp, in milliseconds) |
ext | str | Extension of local files, if any |
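The boxes in this export use corner coordinates (`x1`, `y1`, `x2`, `y2`), while many training pipelines expect `[x, y, width, height]`. A minimal conversion sketch, assuming the `content` structure shown above (the function name `boxes_to_xywh` is hypothetical):

```python
def boxes_to_xywh(content: dict) -> list:
    """Convert corner-based boxes (x1, y1, x2, y2) to [x, y, width, height]."""
    results = []
    for box in content.get("boxes", []):
        if box.get("type") != "box":
            continue  # Skip keypoints, which carry only x1/y1.
        results.append({
            "label": box.get("label"),
            "bbox": [box["x1"], box["y1"],
                     box["x2"] - box["x1"], box["y2"] - box["y1"]],
        })
    return results
```

Nested `sub_labels` (such as keypoints attached to a box) are ignored here; a fuller converter would recurse into them.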
5. Video Dataset¶
1. Dataset Export¶
{
"source": "s3://test-pocs/mira640.mp4",
"name": "mira640.mp4",
"itemId": "a90bf16c2ba7b6e9ac1a6d9d",
"datasetId": "84431d90c6497d6ab6425dfc",
"type": "video/mp4",
"tags": [
"dataset tag"
],
"metadata": {
"xxx": 11,
"presigned": "http://aaa.com"
},
"active": true
}
- table:Video Dataset Export Summary:
Field Names | Type | Description |
---|---|---|
source | str | The presigned URL or S3 path of the data source |
name | str | The name of the dataset item |
itemId | str | The Id of the dataset item |
datasetId | str | The Id of the dataset |
type | str | The type of the dataset |
tags | list | List of tags associated with the dataset item |
metadata | dict | The metadata associated with the dataset and dataset item |
active | bool | Indicates whether the dataset item is currently active |
ext (local files) | str | The extension of the local file, if any |
2. With GroundTruth Media Transcription Project¶
{
"source": "https://sandboxdocuments.tensoract.com/presigned/da00a38a91f62483658e2126e789f63e.mp4?sig=6ab60805f723d8565a15b6dfacc057e8953f99fbb21d39ff30a8a0da2f39b1cbefb9f28139b8eedc55545d6cd0fadaf56ed14a450ead8f79a48b1d233083ba63:f116065b73cb90ba8f217d0bfee72ab5:64b8acc9:3ab540299e282e90e6ece2f127f7d15a",
"name": "Video 1.mp4",
"itemId": "0b9ff34daa6a2f6f95c59bb3",
"datasetId": "1ac8fb72573008ce5626bbfb",
"type": "video/mp4",
"tags": [
"dataset tag1"
],
"metadata": {
"Dataset Type": "Video"
},
"active": true,
"project": "c859cc0a92b7bd4d6d166707",
"taskId": "c53ca1cace7d7e784705b631",
"annotations": [
{
"email": "jdoeqa@acme.org",
"messages": [],
"role": "nlp_qc",
"elapsedTime": 62,
"date": "2023-06-17T12:16:38.937Z",
"content": {
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "63ca81cd31d698c1825328f3"
},
"videoSource": "https://sandboxdocuments.tensoract.com/presigned/da00a38a91f62483658e2126e789f63e.mp4?sig=9f13caf4ac583fbc9c074b91770e579120196b3f08e192ad641290c246a7add58f6d6f40ce1aa63e031ada96ff8c996038e4ed46516ffce1f56262cc6a435eeb:062d59d190dc03e090dcd5ee5ff17faa:648ef563:ac1a510bd2fb6c0ab328a3202eb9c846",
"streams": {
"Transcription": [
{
"start": 0.025000260441083017,
"end": 0.9500098967611547,
"confidence": 1,
"text": "Bonjoi"
},
{
"start": 0.9500098967611547,
"end": 2.0812716817201613,
"confidence": 1,
"text": "Tava Tuti"
},
{
"start": 2.1187720723817858,
"end": 3.156282880686731,
"confidence": 1,
"text": "Hello"
},
{
"start": 3.156282880686731,
"end": 4.13629310905087,
"confidence": 1,
"text": "Ola"
},
{
"start": 4.165043351337061,
"end": 4.608797974166285,
"confidence": 1,
"text": "Tutu beng"
},
{
"start": 10.453858750849383,
"end": 12.07887567951978,
"confidence": 1,
"text": "oye tutubeng"
}
],
"Language Segmentation": [
{
"start": 0.018750195330812264,
"end": 0.5687559250346386,
"confidence": 1,
"tag": "French"
},
{
"start": 0.5937561854757216,
"end": 2.0937718119407025,
"confidence": 1,
"tag": "German"
},
{
"start": 2.0937718119407025,
"end": 3.1650329694568993,
"confidence": 1,
"tag": "English"
},
{
"start": 3.2437837922305217,
"end": 4.156293298330052,
"confidence": 1,
"tag": "French"
},
{
"start": 4.250044274984113,
"end": 6.318815826483732,
"confidence": 1,
"tag": "Russian"
},
{
"start": 6.8338213441595235,
"end": 8.190085473088276,
"confidence": 1,
"tag": "French"
},
{
"start": 8.321336840403962,
"end": 9.865102922640839,
"confidence": 1,
"tag": "English"
},
{
"start": 9.915103443523005,
"end": 11.071365488923096,
"confidence": 1,
"tag": "German"
},
{
"start": 11.233867181790135,
"end": 12.6088815060497,
"confidence": 1,
"tag": "Arabic"
}
]
},
"mediaAttributes": {
"Is Video Clear?": "Yes",
"Aditional Notes": ""
},
"jobStart": 1687003443,
"sessionTime": 62,
"elapsedTime": 93.075,
"tsSeconds": true,
"updateTime": 1687004194,
"metadata": {
"File": "Video 1.mp4",
"TaskId": "c53ca1cace7d7e784705b631",
"Type": "Media Transcription"
},
"lastUpdate": 1687004198934
}
}
],
"ext": "mp4"
}
- table:Video Dataset Export With GroundTruth Media Transcription Project Summary:
Field Names |
Type |
Description |
---|---|---|
source |
str |
The presigned URL or S3 path of the data source |
name |
str |
The name of the dataset item |
itemId |
str |
The Id of the dataset item |
datasetId |
str |
The Id of the dataset |
type |
str |
The type of the dataset |
tags |
list |
List of tags associated with the dataset item |
metadata |
dict |
The metadata associated with the dataset and dataset item |
active |
bool |
Indicates whether the dataset item is currently active |
project |
str |
The project associated with the dataset item |
taskId |
str |
The Id of the task |
annotations |
list |
List of dictionaries representing the annotations |
email |
str |
The email associated with the user |
messages |
list |
The messages associated with the user |
role |
str |
The role associated with user |
elapsedTime |
int |
The elapsed time of the annotation |
date |
str |
The date of the annotation |
content |
dict |
Dictionary containing the annotation content, such as the review, streams, and media attributes |
review |
dict |
The review details |
rate |
str |
The rate of the review |
note |
str |
The note associated with the reviewer |
reviewerId |
str |
The Id of the reviewer |
videoSource |
str |
The presigned URL or S3 path of the task |
streams |
dict |
Dictionary of different streams within the video, each containing specific information |
Transcription |
list |
The stream containing the transcribed text segments |
start |
float |
The starting timestamp (in seconds) of the transcribed text segment |
end |
float |
The ending timestamp (in seconds) of the transcribed text segment |
confidence |
int |
Indicates the confidence level |
text |
str |
The actual transcribed text for the corresponding segment |
Language Segmentation |
list |
The stream containing information about the segmentations in the video |
start |
float |
The starting timestamp of the segment |
end |
float |
The ending timestamp of the segment |
confidence |
int |
Indicates the confidence level |
tag |
str |
The tag in the corresponding segment |
mediaAttributes |
dict |
The media attributes associated with the task |
jobStart |
int |
The start time of the annotation |
sessionTime |
int |
The session time of the annotation |
elapsedTime |
int |
The elapsed time of the annotation |
tsSeconds |
bool |
Indicates whether the timestamps are expressed in seconds |
updateTime |
int |
The update time of the annotation |
metadata |
dict |
The metadata associated with the annotation |
lastUpdate |
int |
The last update time |
ext (local files) |
str |
Extension of local files, if any |
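Each stream in `content["streams"]` is a list of timestamped entries, so it flattens naturally into tuples. A hedged sketch; `content` is trimmed to the streams, with values copied from the example export above:

```python
# Hedged sketch: `content` mirrors the annotation "content" documented above,
# trimmed to the streams; values are copied from the example export.
content = {
    "streams": {
        "Transcription": [
            {"start": 0.025, "end": 0.950, "confidence": 1, "text": "Bonjoi"},
            {"start": 0.950, "end": 2.081, "confidence": 1, "text": "Tava Tuti"},
        ],
        "Language Segmentation": [
            {"start": 0.019, "end": 0.569, "confidence": 1, "tag": "French"},
        ],
    }
}

# Flatten each stream into (start, end, payload) triples
segments = [(s["start"], s["end"], s["text"])
            for s in content["streams"].get("Transcription", [])]
languages = [(s["start"], s["end"], s["tag"])
             for s in content["streams"].get("Language Segmentation", [])]
print(segments)
print(languages)
```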
6. Audio Dataset¶
1. Dataset Export¶
{
    "source": "s3://test-pocs/mira640.mp3",
    "name": "mira640.mp3",
    "itemId": "4012452ecf60608061c2baed",
    "datasetId": "a38d3e9d800fc55e079a3b1d",
    "type": "audio/mpeg",
    "tags": [
        "dataset tag 1"
    ],
    "metadata": {
        "DATASET Type": "Audio",
        "xxx": 11,
        "presigned": "http://aaa.com"
    },
    "active": true
}
- table:Audio Dataset Export Summary:
Field Names |
Type |
Description |
---|---|---|
source |
str |
The presigned URL or S3 path of the data source |
name |
str |
The name of the dataset item |
itemId |
str |
The Id of the dataset item |
datasetId |
str |
The Id of the dataset |
type |
str |
Type of the dataset |
tags |
list |
List of tags associated with the dataset item |
metadata |
dict |
The metadata associated with the dataset and dataset item |
active |
bool |
Indicates whether the dataset item is currently active |
ext (local files) |
str |
Extension of local files, if any |
2. With GroundTruth Media Transcription Project¶
{
"source": "s3://test-pocs/mira640.mp3",
"name": "mira640.mp3",
"itemId": "1c944e058058dbe48c980ead",
"datasetId": "edd9c7d7ae5c1b0c2bc73643",
"type": "audio/mpeg",
"tags": [
"dataset tag"
],
"metadata": {
"Dataset Type": "Audio"
},
"active": true,
"project": "c859cc0a92b7bd4d6d166707",
"taskId": "eba777874d6d268ece56b33a",
"annotations": [
{
"email": "johndoe@me.com",
"messages": [],
"role": "nlp_qc",
"elapsedTime": 6,
"date": "2023-07-19T04:01:20.036Z",
"content": {
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "614b55be8af65dcf41da535b"
},
"audioSource": "https://test-pocs.s3.amazonaws.com/mira640.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUD4REC47DTY4PF7A%2F20230719%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230719T040111Z&X-Amz-Expires=7200&X-Amz-Signature=10048222e39f53cd9dbbb2b97b8aa210e323d8fc544a9ea18ff46e5922974f44&X-Amz-SignedHeaders=host",
"streams": {
"Transcription": [
{
"start": 0.018749917353218702,
"end": 0.6312472175583629,
"confidence": 1,
"text": "Bonjoi"
},
{
"start": 0.7187468318733835,
"end": 1.881241707772943,
"confidence": 1,
"text": "Tava Tuti"
},
{
"start": 2.0124911292454737,
"end": 2.4874890355270143,
"confidence": 1,
"text": "Hello"
}
],
"Language Segmentation": [
{
"start": 0.006249972451072901,
"end": 0.4437480440261759,
"confidence": 1,
"tag": "French"
},
{
"start": 0.5437476032433424,
"end": 1.8687417628707974,
"confidence": 1,
"tag": "German"
},
{
"start": 1.9187415424793806,
"end": 2.6687382366081285,
"confidence": 1,
"tag": "English"
}
]
},
"mediaAttributes": {
"Is Video Clear?": "Yes",
"Aditional Notes": ""
},
"jobStart": 1689739272,
"sessionTime": 6,
"elapsedTime": 6,
"tsSeconds": true,
"updateTime": 1689739278,
"lastUpdate": 1689739280033
}
}
]
}
- table:Audio Dataset Export With GroundTruth Media Transcription Project Summary:
Field Names |
Type |
Description |
---|---|---|
source |
str |
The presigned URL or S3 path of the data source |
name |
str |
The name of the dataset item |
itemId |
str |
The Id of the dataset item |
datasetId |
str |
The Id of the dataset |
type |
str |
The type of the dataset |
tags |
list |
List of tags associated with the dataset item |
metadata |
dict |
The metadata associated with the dataset and dataset item |
active |
bool |
Indicates whether the dataset item is currently active |
project |
str |
The project associated with the dataset item |
taskId |
str |
The Id of the task |
annotations |
list |
List of dictionaries representing the annotations |
email |
str |
The email associated with the user |
messages |
list |
The messages associated with the user |
role |
str |
The role associated with user |
elapsedTime |
int |
The elapsed time of the annotation |
date |
str |
The date of the annotation |
content |
dict |
Dictionary containing the annotation content, such as the review, streams, and media attributes |
review |
dict |
The review details |
rate |
str |
The rate of the review |
note |
str |
The note associated with the reviewer |
reviewerId |
str |
The Id of the reviewer |
audioSource |
str |
The presigned URL or S3 path of the task |
streams |
dict |
Dictionary of different streams within the audio file, each containing specific information |
Transcription |
list |
The stream containing the transcribed text segments |
start |
float |
The starting timestamp (in seconds) of the transcribed text segment |
end |
float |
The ending timestamp (in seconds) of the transcribed text segment |
confidence |
int |
Indicates the confidence level |
text |
str |
The actual transcribed text for the corresponding segment |
Language Segmentation |
list |
The stream containing information about the segmentations in the audio |
start |
float |
The starting timestamp of the segment |
end |
float |
The ending timestamp of the segment |
confidence |
int |
Indicates the confidence level |
tag |
str |
The tag in the corresponding segment |
mediaAttributes |
dict |
The media attributes associated with the task |
jobStart |
int |
The start time of the annotation |
sessionTime |
int |
The session time of the annotation |
elapsedTime |
int |
The elapsed time of the annotation |
updateTime |
int |
The update time of the annotation |
metadata |
dict |
The metadata associated with the annotation |
lastUpdate |
int |
The last update time |
ext (local files) |
str |
Extension of local files, if any |
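Because every `Transcription` entry carries `start`/`end` timestamps in seconds, the stream can be rewritten into a simple SRT-style subtitle block. A sketch under those assumptions; the segment values are copied from the audio example above:

```python
def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def stream_to_srt(transcription):
    """Render a Transcription stream (list of start/end/text dicts) as SRT."""
    blocks = []
    for i, seg in enumerate(transcription, start=1):
        blocks.append(
            f"{i}\n{to_srt_time(seg['start'])} --> {to_srt_time(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(blocks)

# Values mirror the audio example above
transcription = [
    {"start": 0.0187, "end": 0.6312, "confidence": 1, "text": "Bonjoi"},
    {"start": 0.7187, "end": 1.8812, "confidence": 1, "text": "Tava Tuti"},
]
print(stream_to_srt(transcription))
```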
Project Exports¶
1. OCR Project¶
{
"project_id": "7b3020dd437ce2a30bae1c5a",
"project_name": "Test-OCR-Project-1",
"project_type": "OCR",
"datasetId": "e3773b85655ea8646005158a",
"itemId": "9cebea4c95edc877ca6f2603",
"file_name": "ABSTRACT - Axia.tiff",
"file_type": "application/pdf",
"source": "https://sandboxdocuments.tensoract.com/presigned/a01f5c95d843b4fd4f890570e5cac51c.pdf?sig=fc8cd601e4b6d76de180378b1663c1b8b1ac21c2a82fd7909bf959bce43964344830f137475dd53602d458cbd780b08626379b8329f5dea96f7bdf78b727d5f2:60847ae533c350f801adca47a54b6cfb:64cdee0b:bad3e8fb52c965453a0f9fd8ffde6c9e",
"state": 4,
"task_id": "0931952ce4a27f53a3678cfe",
"state_description": "Approved",
"annotations": [
{
"email": "yannevarsha6@gmail.com",
"messages": [],
"role": "Reviewer",
"elapsedTime": 14,
"date": "2023-08-04T06:21:00.589Z",
"content": {
"pdf_fingerprint": "c04f692d342c06d433f751ac32c6d8b1",
"metadata": {
"ocr_model": "Textract (default)",
"use-textract-only": true,
"source_ref": "/uploads/e3773b85655ea8646005158a/9cebea4c95edc877ca6f2603",
"document_id": "9cebea4c95edc877ca6f2603",
"Type of Project": "OCR"
},
"tags": [
{
"page": 1,
"text": "N A M E",
"id": 1,
"type": "Name",
"kv_type": "key",
"words": [
"N",
"A",
"M",
"E"
],
"boxes": [
[
0.06499018520116806,
0.11739349365234375,
0.07347860559821129,
0.12546881940215826
],
[
0.06458062678575516,
0.13079734146595,
0.0742951761931181,
0.1387380100786686
],
[
0.06520503759384155,
0.14403623342514038,
0.07536023296415806,
0.15211013052612543
],
[
0.06526166200637817,
0.15757058560848236,
0.07337938901036978,
0.16564789321273565
]
],
"range": [
[
71,
72
],
[
126,
127
],
[
165,
166
],
[
194,
195
]
]
},
{
"page": 1,
"text": "Axia Women's Health",
"id": 2,
"type": "Name",
"textAdjust": "Axia Women's",
"kv_type": "value",
"words": [
"Axia",
"Women's",
"Health"
],
"boxes": [
[
0.0935770571231842,
0.11707708239555359,
0.11941905505955219,
0.1253887191414833
],
[
0.12276646494865417,
0.11710146069526672,
0.17684946581721306,
0.1254600789397955
],
[
0.18119750916957855,
0.11732043325901031,
0.21823260188102722,
0.12542327493429184
]
],
"range": [
[
73,
77
],
[
78,
85
],
[
86,
92
]
]
},
{
"page": 1,
"text": "BILL TO",
"id": 3,
"type": "Name",
"rawBox": true,
"kv_type": "key",
"words": [
"BILL TO"
],
"boxes": [
[
0.4980276134122288,
0.10967250571210967,
0.5374753451676528,
0.1706016755521706
]
],
"range": []
},
{
"page": 1,
"text": "Regional Womens Health",
"id": 4,
"type": "Name",
"rotate": 24,
"rawBox": true,
"kv_type": "value",
"words": [
"Regional Womens Health"
],
"boxes": [
[
0.5473372781065089,
0.11119573495811119,
0.7682445759368837,
0.12795125666412796
]
],
"range": []
},
{
"page": 1,
"text": "Cat.",
"id": 5,
"type": "Name",
"table": {
"id": 4,
"x": 0,
"y": 1,
"cell": true
},
"kv_type": "key",
"words": [
"Cat."
],
"boxes": [
[
0.39583876729011536,
0.3084534704685211,
0.4190108198672533,
0.31684120278805494
]
],
"range": [
[
543,
547
]
]
},
{
"page": 1,
"text": "Cat.",
"id": 6,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 0,
"y": 1,
"cell": true
},
"words": [
"Cat."
],
"boxes": [
[
0.39583876729011536,
0.3084534704685211,
0.4190108198672533,
0.31684120278805494
]
],
"range": [
[
543,
547
]
]
},
{
"page": 1,
"text": "Description",
"id": 7,
"type": "Name",
"table": {
"id": 4,
"x": 1,
"y": 1,
"cell": true
},
"kv_type": "key",
"words": [
"Description"
],
"boxes": [
[
0.4328092038631439,
0.3084268271923065,
0.49752890318632126,
0.3184952298179269
]
],
"range": [
[
548,
559
]
]
},
{
"page": 1,
"text": "Description",
"id": 8,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 1,
"y": 1,
"cell": true
},
"words": [
"Description"
],
"boxes": [
[
0.4328092038631439,
0.3084268271923065,
0.49752890318632126,
0.3184952298179269
]
],
"range": [
[
548,
559
]
]
},
{
"page": 1,
"text": "Effective",
"id": 9,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 3,
"y": 0,
"cell": true
},
"words": [
"Effective"
],
"boxes": [
[
0.6239141225814819,
0.2947663366794586,
0.6735980845987797,
0.30344805866479874
]
],
"range": [
[
476,
485
]
]
},
{
"page": 1,
"text": "Sqft.",
"id": 10,
"type": "Name",
"table": {
"id": 4,
"x": 2,
"y": 1,
"cell": true
},
"kv_type": "key",
"words": [
"Sqft."
],
"boxes": [
[
0.5750880241394043,
0.30830204486846924,
0.6010445598512888,
0.3183623990043998
]
],
"range": [
[
560,
565
]
]
},
{
"page": 1,
"text": "Sqft.",
"id": 11,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 2,
"y": 1,
"cell": true
},
"words": [
"Sqft."
],
"boxes": [
[
0.5750880241394043,
0.30830204486846924,
0.6010445598512888,
0.3183623990043998
]
],
"range": [
[
560,
565
]
]
},
{
"page": 1,
"text": "ABA",
"id": 12,
"type": "TABLECELL",
"table": {
"id": 4,
"x": 0,
"y": 2,
"cell": true
},
"words": [
"ABA"
],
"boxes": [
[
0.3953396677970886,
0.3291471600532532,
0.42196371778845787,
0.3373938351869583
]
],
"range": [
[
626,
629
]
]
},
{
"page": 1,
"text": "Date",
"id": 13,
"type": "Name",
"table": {
"id": 4,
"x": 3,
"y": 1,
"cell": true
},
"kv_type": "key",
"words": [
"Date"
],
"boxes": [
[
0.6240901350975037,
0.3085164725780487,
0.6510729901492596,
0.31685456447303295
]
],
"range": [
[
566,
570
]
]
},
{
"page": 1,
"text": "Date",
"id": 14,
"type": "TABLEHEADER",
"table": {
"id": 4,
"x": 3,
"y": 1,
"cell": true
},
"words": [
"Date"
],
"boxes": [
[
0.6240901350975037,
0.3085164725780487,
0.6510729901492596,
0.31685456447303295
]
],
"range": [
[
566,
570
]
]
},
{
"page": 1,
"text": "Rent Abatements/Cor",
"id": 15,
"type": "TABLECELL",
"table": {
"id": 4,
"x": 1,
"y": 2,
"cell": true
},
"words": [
"Rent",
"Abatements/Cor"
],
"boxes": [
[
0.4329037368297577,
0.3290809392929077,
0.4603371527045965,
0.3374354373663664
],
[
0.46285462379455566,
0.32896438241004944,
0.5594801902770996,
0.3374544633552432
]
],
"range": [
[
630,
634
],
[
635,
649
]
]
},
{
"page": 1,
"text": "4,850",
"id": 16,
"type": "TABLECELL",
"table": {
"id": 4,
"x": 2,
"y": 2,
"cell": true
},
"words": [
"4,850"
],
"boxes": [
[
0.5759893655776978,
0.3291241228580475,
0.6087189093232155,
0.3381931884214282
]
],
"range": [
[
650,
655
]
]
},
{
"page": 1,
"text": "6/15/2021",
"id": 17,
"type": "TABLECELL",
"table": {
"id": 4,
"x": 3,
"y": 2,
"cell": true
},
"words": [
"6/15/2021"
],
"boxes": [
[
0.6162644028663635,
0.32898813486099243,
0.6728598773479462,
0.3374910345301032
]
],
"range": [
[
656,
665
]
]
}
],
"pageOffsets": [
0,
3355,
5983
],
"links": [
{
"page": 1,
"id1": 1,
"id2": 2,
"relationship": "key-pair"
},
{
"page": 1,
"id1": 3,
"id2": 4,
"relationship": "key-pair"
}
],
"attributes": {
"Is document damaged": "No"
},
"pageAttributes": [
{
"Is page damaged?": "No"
}
],
"tables": [
{
"x": [
0.3953396677970886,
0.4273864608258009,
0.567284107208252,
0.6124916560947895,
0.6735980845987797
],
"y": [
0.2947663366794586,
0.305875051766634,
0.32372980611398816,
0.3381931884214282
],
"rows": 3,
"cols": 4,
"box": [
0.3953396677970886,
0.2947663366794586,
0.6735980845987797,
0.3381931884214282
],
"id": 4,
"page": 1,
"mergedList": null,
"description": "Table 1"
}
],
"plainText": {
"1": "Lease Id: PR0001 - 000222 Lease Profile Master Occupant Id: 00000162-1 N Axia Women's Health B Regional Womens Health Managem A HP Main Line LLC I T 227 Laurel Road M L o Echelon One, Suite 300 E Bryn Mawr PA 19010 L Voorhees NJ 08043 Legal Name: Regional Womens Health Management Tenant Id: Contact Name: Jenni Witters Tenant Type Id: Phone No: SIC Group: Fax No: NAICS Code Lease Stop: No Suite Information Current Recurring Charges Building Id: PR0001 Execution: 3/15/2021 Effective Monthly Annual Amount Suite Id: 401 Beginning: 6/15/2021 Cat. Description Sqft. Date Amount Amount PSF Lease Id: 000222 Occupancy: 9/1/2021 ABA Rent Abatements/Cor 4,850 6/15/2021 -12,125.00 -145,500.00 -30.00 Leased Sqft: 4,850 Rent Start: 6/15/2021 ABA Rent Abatements/Cor 4,850 12/1/2021 0.00 0.00 0.00 Pro-Rata Share: 0.17 Expiration: 9/30/2028 ROF Base Rent Office 4,850 6/15/2021 12,125.00 145,500.00 30.00 Ann. Mkt. Rent PSF: 0.00 Vacate: TIC Tenant Improvement 4,850 11/1/2021 3,059.54 36,714.48 7.57 UTI Utility Reimbursement 4,850 6/15/2021 808.33 9,699.96 2.00 Occupancy Status: Current Rate Change Schedule Effective Monthly Annual Amount Cat. Description Sqft. 
Date Amount Amount PSF ABA Rent Abatements/Con 4,850 11/1/2021 -2,575.00 -30,900.00 -6.37 ROF Base Rent Office 4,850 7/1/2022 12,367.50 148,410.00 30.60 ROF Base Rent Office 4,850 7/1/2023 12,614.04 151,368.48 31.21 ROF Base Rent Office 4,850 7/1/2024 12,868.67 154,424.04 31.84 ROF Base Rent Office 4,850 7/1/2025 13,123.29 157,479.48 32.47 ROF Base Rent Office 4,850 7/1/2026 13,386.00 160,632.00 33.12 ROF Base Rent Office 4,850 7/1/2027 13,652.75 163,833.00 33.78 ROF Base Rent - Office 4,850 7/1/2028 13,927.58 167,130.96 34.46 Lease Notes Effective Date Ref 1 Ref 2 Note 3/15/2021 ALTERTN Article 8 of Lease Landlord's consent required for any alterations, other than cosmetic Alterations which do not cost more than $1,000 per alteration and which do not affect (i) the structural portions or roof of the Premises or the 3/15/2021 ASGNSUB Article 9 Landlord consent required for any assignment/sublease. Landlord has 30 days after receipt of notice from Tenant to either approve assignment/sublease, not approve assignment/sublease, recapture the Premises 3/15/2021 DEFAULT Article 18 of Lease 1. If Tenant does not make payment within 5 days after date due, provided that, Landlord shall not more than 1 time per 12 full calendar month period of the term, deliver written notice to Tenant with respect to 3/15/2021 ESTOPEL Article 17 of Lease Estoppel required to be provided within 10 days after request. In the form set forth in Exhibit D 3/15/2021 HOLDOVR Section 19 (b) of Lease Landlord may either (i) increase Rent to 200% of the highest monthly aggregate Fixed Rent and additional 3/15/2021 INS Article 11 - Landlord responsible for repairs to all plumbing and other fixtures, equipment and systems (including replacement, if necessary) in or serving the Premises. Landlord to provide janitorial services (Exhibit E) and pest control as needed. 
3/15/2021 LATECHG Article 3 of Lease Tenant shall pay Landlord a service and handling charge equal to five percent (5%) of any Rent not paid within five (5) days after the date first due, which shall apply cumulatively each month with respect to Report Id WEBX_PROFILE Database HAVERFORD Reported by Joe Staugaard 1/7/2022 11:50 Page 1"
},
"dimensions": [
{
"width": 1275,
"height": 1650
},
{
"width": 1275,
"height": 1650
}
],
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "61685a5eb492d0845eb5e6b4"
},
"jobStart": 1691128396,
"sessionTime": 14,
"elapsedTime": 86,
"updateTime": 1691130059,
"selectBoundingBox": true,
"lastUpdate": 1691130060583
}
}
]
}
- table:OCR-Project-Manifest:
Field Names |
Type |
Description |
---|---|---|
project_id |
str |
The Id of the project |
project_name |
str |
The name of the project |
project_type |
str |
The type of the project |
datasetId |
str |
The Id of the dataset |
itemId |
str |
The Id of the dataset item |
file_name |
str |
The name of the file |
file_type |
str |
The type of the file |
source |
str |
The presigned URL or S3 path of the data source |
state |
int |
The state of the task |
task_id |
str |
The Id of the task |
state_description |
str |
The state description of the task |
annotations |
list |
List of dictionaries representing the annotations |
email |
str |
The email associated with the user |
messages |
list |
The messages associated with the user |
role |
str |
The role associated with user |
elapsedTime |
int |
The elapsed time of the annotation |
date |
str |
The date of the annotation |
content |
dict |
Dictionary containing all the details of the task, such as metadata, tags, and page offsets |
pdf_fingerprint |
str |
The fingerprint of the document |
metadata |
dict |
Dictionary containing metadata of the task and the project |
ocr_model |
str |
The OCR model used for processing |
use-textract-only |
bool |
Indicates whether only Textract is used for processing |
source_ref |
str |
The reference to the source of the document |
document_id |
str |
The Id of the document |
tags |
list |
List of dictionaries containing the tags added in the task |
page |
int |
The page number for selected text |
text |
str |
The selected text for annotation |
id |
int |
The Id of selected text for annotation |
type |
str |
The type of the label |
kv_type |
str |
Flag to indicate whether the tag is a key or a value (key/value) |
words |
list |
The words in the selected text |
boxes |
list |
List of bounding box coordinates for OCRed words |
range |
list |
List of [start, end] offsets of the selected text within the plain text |
textAdjust |
str |
Modified OCRed text |
rawBox |
bool |
Flag to indicate if bounding box is created manually |
rotate |
int |
The angle of bounding box rotation (degrees) |
table |
dict |
The table information |
id |
int |
The id of the table |
x |
int |
The vertical grid coordinates |
y |
int |
The horizontal grid coordinates |
cell |
bool |
Flag to indicate if the current object is a cell of the table |
pageOffsets |
list |
List of page offsets |
links |
list |
List containing the relationships added in the task |
page |
int |
The page number associated with key and value field |
id1 |
int |
The Id of the key field |
id2 |
int |
The Id of the value field |
relationship |
str |
The name of the relationship |
attributes |
dict |
The document attributes associated with the task |
pageAttributes |
list |
List of dictionaries containing the attributes for each page |
tables |
list |
List of dictionaries containing table information |
x |
int |
The vertical grid coordinates |
y |
int |
The horizontal grid coordinates |
rows |
int |
The number of rows in the table |
cols |
int |
The number of columns in the table |
box |
list |
List of bounding box coordinates of the table |
id |
int |
The Id of the table |
page |
int |
The page number of the table |
description |
str |
The title of the table |
plainText |
dict |
Dictionary containing page numbers and the corresponding plain text extracted from the file |
dimensions |
list |
List of dictionaries containing dimensions of pages in the task |
width |
float |
The width of the page |
height |
float |
The height of the page |
review |
dict |
The review details |
rate |
str |
The rate of the review |
note |
str |
The note associated with the reviewer |
reviewerId |
str |
The Id of the reviewer |
jobStart |
int |
The start time of the annotation |
sessionTime |
int |
The session time of the annotation |
elapsedTime |
int |
The elapsed time of the annotation |
updateTime |
int |
The update time of the annotation |
lastUpdate |
int |
The last update time |
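The links list joins a key tag (id1) to its value tag (id2). A hedged sketch of resolving those pairs back to text via the tag ids; the tag dicts are trimmed to the fields needed, with values copied from the OCR example above:

```python
# Minimal tag/link shapes copied from the OCR example above
tags = [
    {"id": 1, "text": "N A M E", "kv_type": "key"},
    {"id": 2, "text": "Axia Women's Health", "kv_type": "value"},
    {"id": 3, "text": "BILL TO", "kv_type": "key"},
    {"id": 4, "text": "Regional Womens Health", "kv_type": "value"},
]
links = [
    {"page": 1, "id1": 1, "id2": 2, "relationship": "key-pair"},
    {"page": 1, "id1": 3, "id2": 4, "relationship": "key-pair"},
]

# Index the tags by id, then resolve each key-pair link to (key text, value text)
by_id = {t["id"]: t for t in tags}
pairs = {by_id[l["id1"]]["text"]: by_id[l["id2"]]["text"]
         for l in links if l["relationship"] == "key-pair"}
print(pairs)
# {'N A M E': "Axia Women's Health", 'BILL TO': 'Regional Womens Health'}
```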
2. NER Project¶
{
"project_id": "866ad732042bde9b94929cc3",
"project_name": "NER-Project-DB",
"project_type": "NER",
"datasetId": "8d9736f30411ae81fa4983d4",
"itemId": "0ed98ab31666242a417504f9",
"file_name": "1810.04805.pdf",
"file_type": "application/pdf",
"source": "https://sandboxdocuments.tensoract.com/presigned/33e268b66cb90138b84cc627a501afa2.pdf?sig=64e0f921a163164ebdac2b74a35f80c4dee52434a405f990a9163ea306ebb99cb1ee12cb6fba3d313a531f39c0f9195083dbb2582d9d397a00553ea403d7cc4e:102e036d864eee6141450c9ad545cf66:64b8d6cf:1e215800c51838b6308d8fb24fc60adc",
"state": 4,
"task_id": "d6aae2114d0947b1bfe5dcd3",
"state_description": "Approved",
"annotations": [
{
"email": "q1@qc.com",
"messages": [],
"role": "Reviewer",
"elapsedTime": 18,
"date": "2023-07-17T09:11:08.530Z",
"content": {
"pdf_fingerprint": "dccb9bc542f22b2bdd94110918c68f96",
"metadata": {
"File": "1810.04805.pdf",
"TaskId": "d6aae2114d0947b1bfe5dcd3",
"Type of Project": "NER"
},
"tags": [
{
"page": 1,
"range": [
0,
80
],
"text": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding",
"id": 1,
"type": "DATE",
"box": [
0.1957394553114858,
0.08355623157419612,
0.8080743211552288,
0.11953028994286674
]
},
{
"page": 1,
"range": [
81,
93
],
"text": "Jacob Devlin",
"id": 2,
"type": "PERSON",
"box": [
0.20464120844784606,
0.15506947083348188,
0.31506005550366556,
0.16926990081839677
]
},
{
"page": 1,
"range": [
94,
108
],
"text": "Ming-Wei Chang",
"id": 3,
"type": "PERSON",
"box": [
0.34016437686048157,
0.15506947083348188,
0.48781795335273054,
0.16926990081839677
]
},
{
"page": 1,
"range": [
423,
428
],
"text": "2018a",
"id": 4,
"type": "DATE",
"box": [
0.3736872717865327,
0.3484841506610129,
0.4145903056733348,
0.36031776312819985
]
},
{
"page": 2,
"range": [
743,
750
],
"text": "(2018a)",
"id": 5,
"type": "DATE",
"box": [
0.3769863661562031,
0.3271071821734426,
0.4339806024432365,
0.3400650507786048
]
}
],
"pageOffsets": [
0,
3988,
8509,
12206,
17069,
20918,
25368,
29080,
33539,
37641,
42160,
46926,
50816,
54525,
58589,
60965,
64088
],
"links": [
{
"page": 1,
"id1": 2,
"id2": 3,
"relationship": "Precede"
},
{
"page": 1,
"id1": 4,
"id2": 5,
"relationship": "Precede"
}
],
"attributes": {
"tags": [],
"links": [],
"Doc Ok?": "Yes"
},
"pageAttributes": [
{
"Page OK?": null
},
{
"Page OK?": "Yes"
}
],
"boxes": [
{
"page": 1,
"box": [
0.6285714285714286,
0.1505226480836237,
0.8216748768472907,
0.178397212543554
],
"label": "Bounding_box"
},
{
"page": 2,
"box": [
0.10246305418719212,
0.3797909407665505,
0.49064039408866994,
0.4961672473867596
],
"label": "Bounding_box",
"rotate": 22
}
],
"plainText": {
"1": "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova Google AI Language {jacobdevlin,mingweichang,kentonl,kristout}@google.com Abstract We introduce a new language representa- tion model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language repre- sentation models (Peters et al., 2018a; Rad- ford et al., 2018), BERT is designed to pre- train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a re- sult, the pre-trained BERT model can be fine- tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task- specific architecture modifications. BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art re- sults on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answer- ing Test F1 to 93.2 (1.5 point absolute im- provement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement). 1 Introduction Language model pre-training has been shown to be effective for improving many natural language processing tasks (Dai and Le, 2015; Peters et al., 2018a; Radford et al., 2018; Howard and Ruder, 2018). 
These include sentence-level tasks such as natural language inference (Bowman et al., 2015; Williams et al., 2018) and paraphrasing (Dolan and Brockett, 2005), which aim to predict the re- lationships between sentences by analyzing them holistically, as well as token-level tasks such as named entity recognition and question answering, wheremodels are required to produce fine-grained output at the token level (Tjong Kim Sang and DeMeulder, 2003; Rajpurkar et al., 2016). There are two existing strategies for apply- ing pre-trained language representations to down- stream tasks: feature-based and fine-tuning. The feature-based approach, such as ELMo (Peters et al., 2018a), uses task-specific architectures that include the pre-trained representations as addi- tional features. The fine-tuning approach, such as the Generative Pre-trained Transformer (OpenAI GPT) (Radford et al., 2018), introduces minimal task-specific parameters, and is trained on the downstream tasks by simply fine-tuning all pre- trained parameters. The two approaches share the same objective function during pre-training,where they use unidirectional language models to learn general language representations. We argue that current techniques restrict the power of the pre-trained representations, espe- cially for the fine-tuning approaches. The ma- jor limitation is that standard language models are unidirectional, and this limits the choice of archi- tectures that can be used during pre-training. For example, inOpenAIGPT, the authors use a left-to- right architecture, where every token can only at- tend to previous tokens in the self-attention layers of the Transformer (Vaswani et al., 2017). Such re- strictions are sub-optimal for sentence-level tasks, and could be very harmful when applying fine- tuning based approaches to token-level tasks such as question answering, where it is crucial to incor- porate context from both directions. 
In this paper, we improve the fine-tuning based approaches by proposing BERT: Bidirectional Encoder Representations from Transformers. BERT alleviates the previously mentioned unidi- rectionality constraint by using a “masked lan- guage model” (MLM) pre-training objective, in- spired by the Cloze task (Taylor, 1953). The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of the masked a r X i v : 1 8 1 0 . 0 4 8 0 5 v 2 [ c s . C L ] 2 4 M a y 2 0 1 9",
"2": "word based only on its context. Unlike left-to- right language model pre-training, the MLM ob- jective enables the representation to fuse the left and the right context, which allows us to pre- train a deep bidirectional Transformer. In addi- tion to the masked language model, we also use a “next sentence prediction” task that jointly pre- trains text-pair representations. The contributions of our paper are as follows: • We demonstrate the importance of bidirectional pre-training for language representations. Un- like Radford et al. (2018), which uses unidirec- tional language models for pre-training, BERT uses masked language models to enable pre- trained deep bidirectional representations. This is also in contrast to Peters et al. (2018a), which uses a shallow concatenation of independently trained left-to-right and right-to-left LMs. • We show that pre-trained representations reduce the need for many heavily-engineered task- specific architectures. BERT is the first fine- tuning based representationmodel that achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks, outper- forming many task-specific architectures. • BERT advances the state of the art for eleven NLP tasks. The code and pre-trained mod- els are available at https://github.com/ google-research/bert. 2 RelatedWork There is a long history of pre-training general lan- guage representations, and we briefly review the most widely-used approaches in this section. 2.1 Unsupervised Feature-based Approaches Learning widely applicable representations of words has been an active area of research for decades, including non-neural (Brown et al., 1992; Ando and Zhang, 2005; Blitzer et al., 2006) and neural (Mikolov et al., 2013; Pennington et al., 2014) methods. Pre-trained word embeddings are an integral part of modern NLP systems, of- fering significant improvements over embeddings learned from scratch (Turian et al., 2010). 
To pre- train word embedding vectors, left-to-right lan- guage modeling objectives have been used (Mnih and Hinton, 2009), as well as objectives to dis- criminate correct from incorrect words in left and right context (Mikolov et al., 2013). These approaches have been generalized to coarser granularities, such as sentence embed- dings (Kiros et al., 2015; Logeswaran and Lee, 2018) or paragraph embeddings (Le andMikolov, 2014). To train sentence representations, prior work has used objectives to rank candidate next sentences (Jernite et al., 2017; Logeswaran and Lee, 2018), left-to-right generation of next sen- tence words given a representation of the previous sentence (Kiros et al., 2015), or denoising auto- encoder derived objectives (Hill et al., 2016). ELMo and its predecessor (Peters et al., 2017, 2018a) generalize traditional word embedding re- search along a different dimension. They extract context-sensitive features from a left-to-right and a right-to-left language model. The contextual rep- resentation of each token is the concatenation of the left-to-right and right-to-left representations. When integrating contextual word embeddings with existing task-specific architectures, ELMo advances the state of the art for severalmajor NLP benchmarks (Peters et al., 2018a) including ques- tion answering (Rajpurkar et al., 2016), sentiment analysis (Socher et al., 2013), and named entity recognition (Tjong Kim Sang and De Meulder, 2003). Melamud et al. (2016) proposed learning contextual representations through a task to pre- dict a single word from both left and right context using LSTMs. Similar to ELMo, their model is feature-based and not deeply bidirectional. Fedus et al. (2018) shows that the cloze task can be used to improve the robustness of text generation mod- els. 
2.2 Unsupervised Fine-tuning Approaches As with the feature-based approaches, the first works in this direction only pre-trained word em- bedding parameters from unlabeled text (Col- lobert andWeston, 2008). More recently, sentence or document encoders which produce contextual token representations have been pre-trained from unlabeled text and fine-tuned for a supervised downstream task (Dai and Le, 2015; Howard and Ruder, 2018; Radford et al., 2018). The advantage of these approaches is that few parameters need to be learned from scratch. At least partly due to this advantage, OpenAI GPT (Radford et al., 2018) achieved pre- viously state-of-the-art results on many sentence- level tasks from the GLUE benchmark (Wang et al., 2018a). Left-to-right language model-"
},
"dimensions": [
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
},
{
"width": 595.276,
"height": 841.89
}
],
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "61685a5eb492d0845eb5e6b4"
},
"jobStart": 1689583831,
"sessionTime": 18,
"elapsedTime": 31,
"updateTime": 1689585066,
"lastUpdate": 1689585068525
}
}
]
}
- table:NER-Project-Manifest:
Field Names | Type | Description
---|---|---
project_id | str | The Id of the project
project_name | str | The name of the project
project_type | str | The type of the project
datasetId | str | The Id of the dataset
itemId | str | The Id of the dataset item
file_name | str | The name of the file
file_type | str | The type of the file
source | str | Internal source file reference on local storage disk
state | int | The state of the task
task_id | str | The Id of the task
state_description | str | The state description of the task
annotations | list | List of dictionaries representing the annotations
email | str | The email associated with the user
messages | list | The messages associated with the user
role | str | The role associated with the user
elapsed_time | str | The elapsed time of the annotation
date | str | The date of the annotation
content | dict | Dictionary containing all the details of the task, such as metadata, tags, and page offsets
pdf_fingerprint | str | The fingerprint of the document
metadata | dict | Dictionary containing metadata of the task and the project
File | str | The name of the file
TaskId | str | The Id of the task
Type of Project | str | The metadata added in the advanced settings of the project
tags | list | List of dictionaries containing the tags added in the task
page | int | The page number of the selected text
range | list | Start and end offsets of the selected text within the plain text
text | str | The selected text for annotation
id | int | The id of the selected text annotation
type | str | The type of the label
box | list | The annotation bounding box
pageoffsets | list | List of page offsets
link | list | List containing the relationships added in the task
id1 | number | The Id of the first annotation field
id2 | number | The Id of the second annotation field
relationship | str | The name of the relationship
attributes | dict | The document attributes associated with the task
pageattributes | list | List of dictionaries containing the attributes for each page
boxes | list | List of dictionaries containing details of the bounding box
page | int | The page number on which the bounding box is created
box | list | The annotation bounding box
labels | str | The type of label
rotate | int | The angle of bounding box rotation (degrees)
plaintext | dict | Dictionary mapping page numbers to the plain text extracted from the file
dimensions | list | List of dictionaries containing the dimensions of the pages in the task
width | float | The width of the page
height | float | The height of the page
review | dict | The review details
rate | str | The rating given in the review
note | str | The note left by the reviewer
reviewerId | str | The Id of the reviewer
jobStart | int | The start time of the annotation
sessionTime | int | The session time of the annotation
elapsedTime | int | The elapsed time of the annotation
updateTime | int | The update time of the annotation
lastUpdate | int | The last update time
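Once exported, an NER manifest record can be post-processed with a few lines of Python. The sketch below is illustrative only (the `extract_entities` helper and the trimmed sample record are not part of any SDK); it uses the field names from the example above to collect each tagged span as a (text, label, page) triple:

```python
import json

def extract_entities(record: dict) -> list[tuple[str, str, int]]:
    # Walk every annotation pass and collect its tagged spans.
    # Field names follow the NER manifest example above; adjust
    # them if your export differs.
    triples = []
    for annotation in record.get("annotations", []):
        for tag in annotation.get("content", {}).get("tags", []):
            triples.append((tag["text"], tag["type"], tag["page"]))
    return triples

# A trimmed, hypothetical record for illustration.
record = json.loads("""
{
  "task_id": "8c939483b594e0de5d5efb54",
  "annotations": [
    {"content": {"tags": [
      {"page": 1, "range": [18, 31], "text": "Michael Smith", "type": "Person"}
    ]}}
  ]
}
""")
print(extract_entities(record))  # [('Michael Smith', 'Person', 1)]
```

The same loop extends naturally to `link` relationships or `boxes` if your project uses them.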
3. Classification Project¶
{
"project_id": "591351b938d008ca0745510a",
"project_name": "Classification-Project-1",
"project_type": "Classification",
"datasetId": "60974d4e9e7759842cdff3be",
"itemId": "2d2020aa8e3deb383fb7c74f",
"file_name": "business_2.txt",
"file_type": "text/plain",
"source": "https://sandboxdocuments.tensoract.com/presigned/3c9a446180a44faa43ab2464d45633c7.txt?sig=bc0af1e30e3bec94f7780fe1638f77b7d7080b5508545d891bc2a229d3e2c20e8179c3b4690e2ac7a19af4a978c363af312b570f2e89efa03ae41171e066c65d:670927f13e9c689722e1e85fe5649347:64b78055:aef5974f70f486095f3cd5f4cc922486",
"state": 4,
"task_id": "8c939483b594e0de5d5efb54",
"state_description": "Approved",
"annotations": [
{
"email": "johndoe@me.com",
"messages": [],
"role": "Reviewer",
"elapsedTime": 8,
"date": "2023-07-18T06:03:46.657Z",
"content": {
"metadata": {
"File": "business_2.txt",
"TaskId": "8c939483b594e0de5d5efb54",
"Type of Project": "Classification"
},
"classificationTypes": {
"Select Type of Document": "select",
"Type of Documents": "multi",
"Put a note": "text"
},
"classifications": {
"Select Type of Document": [
"Technology"
],
"Type of Documents": [
"Graphics",
"Bussiness"
],
"Put a note": [
"Multi-type document"
]
},
"plainText": {
"1": "Japanese growth grinds to a halt Growth in Japan evaporated in the three months to September, sparking renewed concern about an economy not long out of a decade-long trough. Output in the period grew just 0.1%, an annual rate of 0.3%. Exports - the usual engine of recovery - faltered, while domestic demand stayed subdued and corporate investment also fell short. The growth falls well short of expectations, but does mark a sixth straight quarter of expansion. The economy had stagnated throughout the 1990s, experiencing only brief spurts of expansion amid long periods in the doldrums. One result was deflation - prices falling rather than rising - which made Japanese shoppers cautious and kept them from spending. The effect was to leave the economy more dependent than ever on exports for its recent recovery. But high oil prices have knocked 0.2% off the growth rate, while the falling dollar means products shipped to the US are becoming relatively more expensive. The performance for the third quarter marks a sharp downturn from earlier in the year. The first quarter showed annual growth of 6.3%, with the second showing 1.1%, and economists had been predicting as much as 2% this time around. \"Exports slowed while capital spending became weaker,\" said Hiromichi Shirakawa, chief economist at UBS Securities in Tokyo. \"Personal consumption looks good, but it was mainly due to temporary factors such as the Olympics. \"The amber light is flashing.\" The government may now find it more difficult to raise taxes, a policy it will have to implement when the economy picks up to help deal with Japan's massive public debt. "
},
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "614b55be8af65dcf41da535b"
},
"jobStart": 1689660217,
"sessionTime": 8,
"elapsedTime": 8,
"updateTime": 1689660225,
"pageOffsets": [
0
],
"lastUpdate": 1689660226654
}
}
]
}
- table:Classification Project Manifest Summary:
Field Names | Type | Description
---|---|---
project_id | str | The Id of the project
project_name | str | The name of the project
project_type | str | The type of the project
datasetId | str | The Id of the dataset
itemId | str | The Id of the dataset item
file_name | str | The name of the file
file_type | str | The type of the file
source | str | Internal source file reference on local storage disk
state | int | The state of the task
task_id | str | The Id of the task
state_description | str | The state description of the task
annotations | list | List of dictionaries representing the annotations
email | str | The email associated with the user
messages | list | The messages associated with the user
role | str | The role associated with the user
elapsed_time | str | The elapsed time of the annotation
date | str | The date of the annotation
content | dict | Dictionary containing all the details of the task, such as metadata, classifications, and page offsets
metadata | dict | Dictionary containing metadata of the task and the project
File | str | The name of the file
TaskId | str | The Id of the task
Type of Project | str | The metadata added in the advanced settings of the project
classificationTypes | dict | Dictionary containing the labels defined in the project
Select Type of Document | str | Single-select label
Type of Documents | str | Multi-select label
Put a note | str | Plain-text label
classifications | dict | Dictionary containing the classification labels in the task
plainText | dict | Dictionary mapping page numbers to the plain text extracted from the file
review | dict | The review details
rate | str | The rating given in the review
note | str | The note left by the reviewer
reviewerId | str | The Id of the reviewer
jobStart | int | The start time of the annotation
sessionTime | int | The session time of the annotation
elapsedTime | int | The elapsed time of the annotation
updateTime | int | The update time of the annotation
lastUpdate | int | The last update time
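The nested `classifications` dicts are straightforward to flatten for downstream use. A minimal sketch, assuming records shaped like the example above (the `collect_classifications` helper is illustrative, not part of any SDK):

```python
def collect_classifications(record: dict) -> dict[str, list[str]]:
    # Merge the `classifications` dict from every annotation pass
    # into one label -> values mapping.
    merged: dict[str, list[str]] = {}
    for annotation in record.get("annotations", []):
        content = annotation.get("content", {})
        for label, values in content.get("classifications", {}).items():
            merged.setdefault(label, []).extend(values)
    return merged

# Trimmed sample record, following the export example above.
record = {
    "file_name": "business_2.txt",
    "annotations": [{"content": {"classifications": {
        "Select Type of Document": ["Technology"],
        "Type of Documents": ["Graphics", "Bussiness"],
    }}}],
}
print(collect_classifications(record)["Select Type of Document"])  # ['Technology']
```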
4. Bulk Image Classification Project¶
{
"project_id": "e3c9b4a1dd6df1c4c7091895",
"project_name": "Bulk Image Classification DB",
"project_type": "Bulk Image Classification",
"datasetId": "25987f46e5febb50484e8497",
"itemId": "1431d151a693edeb3baade14",
"file_name": "green.tiff",
"file_type": "image/tiff",
"source": "https://sandboxdocuments.tensoract.com/presigned/6cc95fbb44dccfacbacc923fbd24091e.tiff?sig=7748be40ae97b0c4559e0c9de0016e925d5e49a89bc1415ae03370f986b067ac42fc06ba2b89848f1245f5aacbf37c44a9d69ea7ff4d952e6d1ebb2cc7bff2e8:188babf42f5690cdb9fee99f58ee209f:64b8db5e:73a9d14256d5247bb72703427484eab4",
"state": 4,
"task_id": "b5ac251d8209079a300868f1",
"state_description": "Approved",
"annotations": [
{
"email": "q1@qc.com",
"messages": [],
"role": "Reviewer",
"elapsedTime": 4.333333333333333,
"date": "2023-07-18T09:08:27.935Z",
"content": {
"redOffset": 1,
"greenOffset": 1,
"brightness": 1,
"selected": false,
"classification": "Green",
"review": {
"rate": "Ok"
},
"elapsedTime": 4.333333333333333,
"updateTime": 1689671308,
"lastUpdate": 1689671307935,
"metadata": {
"color": "green"
}
}
}
],
"dataset_id": "25987f46e5febb50484e8497",
"item_id": "1431d151a693edeb3baade14",
"item_metadata": {
"color": "green"
},
"project_metada": {
"Type of Project": "Bulk"
},
"classification": "Green"
}
- table:Bulk Image Classification Project Manifest Summary:
Field Names | Type | Description
---|---|---
project_id | str | The Id of the project
project_name | str | The name of the project
project_type | str | The type of the project
datasetId | str | The Id of the dataset
itemId | str | The Id of the dataset item
file_name | str | The name of the file
file_type | str | The type of the file
source | str | Internal source file reference on local storage disk
state | int | The state of the task
task_id | str | The Id of the task
state_description | str | The state description of the task
annotations | list | List of dictionaries representing the annotations
email | str | The email associated with the user
messages | list | The messages associated with the user
role | str | The role associated with the user
elapsed_time | str | The elapsed time of the annotation
date | str | The date of the annotation
content | dict | Dictionary containing the annotation details and metadata
redOffset | int | The offset or adjustment applied to the red color channel
greenOffset | int | The offset or adjustment applied to the green color channel
brightness | int | The overall brightness adjustment applied to the image
classification | str | The classified label
review | dict | The review details
rate | str | The rating given in the review
note | str | The note left by the reviewer
elapsedTime | int | The elapsed time of the annotation
updateTime | int | The update time of the annotation
lastUpdate | int | The last update time
metadata | dict | The metadata of the task
dataset_id | str | The Id of the dataset
item_id | str | The Id of the dataset item
item_metadata | dict | The metadata of the dataset item
project_metada | dict | The metadata of the project
classification | str | The classified label
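For bulk projects it is often useful to summarize the reviewed labels across all exported records. A small sketch, assuming the export is a list of records shaped like the example above (`label_counts` is an illustrative helper, not part of any SDK):

```python
from collections import Counter

def label_counts(records: list[dict]) -> Counter:
    # Tally the top-level `classification` label across exported records;
    # items without one are counted as "Unlabeled".
    return Counter(r.get("classification", "Unlabeled") for r in records)

# Hypothetical trimmed records for illustration.
records = [
    {"file_name": "green.tiff", "classification": "Green"},
    {"file_name": "teal.tiff", "classification": "Green"},
    {"file_name": "red.tiff", "classification": "Red"},
]
print(label_counts(records))  # Counter({'Green': 2, 'Red': 1})
```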
5. Object Detection Project¶
{
"project_id": "0461637f62c18082f3c14cc3",
"project_name": "Object-Detection-Project-2",
"project_type": "Pose Estimation",
"datasetId": "abd204685e5c074b282d6744",
"itemId": "ab22598223c9ad06a7cb7fbc",
"file_name": "im05.jpg",
"file_type": "image/jpeg",
"source": "https://sandboxdocuments.tensoract.com/presigned/eec676782f87ae20ff1ce9d282043b55.jpg?sig=4bb851b3ea1bee96136f64ec7a0a23aa514068356434dfddb3b9dbbdefb3568d8b6d246a5de7112e3afe8c261ab8647bff8db558f52b5502df86a0984f4d666b:d9838174235badf2412366a794eafbe4:64b7dd9e:54f78a7b1d2c36e89b2f8a43898db7fd",
"state": 4,
"task_id": "528cb44ce376c753590c3b07",
"state_description": "Approved",
"annotations": [
{
"email": "jdoeqa@acme.org",
"messages": [],
"role": "Reviewer",
"elapsedTime": 29,
"date": "2023-06-17T10:08:34.324Z",
"content": {
"url": "https://sandboxdocuments.tensoract.com/presigned/eec676782f87ae20ff1ce9d282043b55.jpeg?sig=a4ecbc5cdcc4745f78f23897c8179a4f0fcd2398a1bf7b0aaf8cd40df2c6a44781ad2222a108af1865c87c7f0027f6d930646c83714f22f94f1f1d4b479e59b6:17fd9429b9bd1ee881c403895971765e:648ed781:84eaba2d2ce4610c8ec1989f2a3ef0fa",
"imageWidth": 720,
"imageHeight": 1280,
"selected": null,
"boxes": [
{
"x1": 97.0271,
"y1": 242.4398,
"x2": 715.4532,
"y2": 1280,
"id": "b0",
"type": "box",
"oid": "b24",
"outside_image": {},
"occluded": {},
"invisible": false,
"attrs": {},
"title": "",
"label": "Group 1",
"sub_labels": [
{
"x1": 329.1937,
"y1": 289.695,
"id": "b1",
"type": "keypoint",
"oid": "b27",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Top of head",
"sub_labels": []
},
{
"x1": 360.0123,
"y1": 357.496,
"id": "b2",
"type": "keypoint",
"oid": "b28",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Nose",
"sub_labels": []
},
{
"x1": 345.6303,
"y1": 421.1878,
"id": "b3",
"type": "keypoint",
"oid": "b29",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Chin",
"sub_labels": []
},
{
"x1": 300.4297,
"y1": 421.1878,
"id": "b4",
"type": "keypoint",
"oid": "b30",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Neck",
"sub_labels": []
},
{
"x1": 454.5226,
"y1": 454.061,
"id": "b5",
"type": "keypoint",
"oid": "b31",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Left Shoulder",
"sub_labels": []
},
{
"x1": 191.5374,
"y1": 521.862,
"id": "b6",
"type": "keypoint",
"oid": "b32",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Right Shoulder",
"sub_labels": []
},
{
"x1": 575.7423,
"y1": 509.5345,
"id": "b7",
"type": "keypoint",
"oid": "b33",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Left Elbow",
"sub_labels": []
},
{
"x1": 197.7011,
"y1": 597.8812,
"id": "b8",
"type": "keypoint",
"oid": "b34",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Right Elbow",
"sub_labels": []
},
{
"x1": 296.3206,
"y1": 667.7368,
"id": "b9",
"type": "keypoint",
"oid": "b36",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Right Wrist",
"sub_labels": []
},
{
"x1": 339.4666,
"y1": 673.9005,
"id": "b10",
"type": "keypoint",
"oid": "b37",
"outside_image": {},
"occluded": {},
"invisible": false,
"title": "",
"label": "Right Hand",
"sub_labels": []
}
]
}
],
"image_attrs": {
"Is Image clear?": "Yes"
},
"review": {
"rate": "Ok",
"note": ""
},
"jobStart": 1686996484,
"sessionTime": 29,
"elapsedTime": 29,
"tsSeconds": true,
"updateTime": 1686996513,
"lastUpdate": 1686996514320,
"metadata": {}
}
}
]
}
- table:Object Detection Project Manifest Summary:
Field Names | Type | Description
---|---|---
project_id | str | The Id of the project
project_name | str | The name of the project
project_type | str | The type of the project
datasetId | str | The Id of the dataset
itemId | str | The Id of the dataset item
file_name | str | The name of the file
file_type | str | The type of the file
source | str | Internal source file reference on local storage disk
state | int | The state of the task
task_id | str | The Id of the task
state_description | str | The state description of the task
annotations | list | List of dictionaries representing the annotations
email | str | The email associated with the user
messages | list | The messages associated with the user
role | str | The role associated with the user
elapsed_time | str | The elapsed time of the annotation
date | str | The date of the annotation
content | dict | Dictionary containing the annotation details and metadata
url | str | The presigned URL or S3 path of the task
imageWidth | int | The width of the image in pixels
imageHeight | int | The height of the image in pixels
boxes | list | List of bounding boxes drawn around objects in the image
x1 | float | The x-coordinate of the top-left corner of the bounding box
y1 | float | The y-coordinate of the top-left corner of the bounding box
x2 | float | The x-coordinate of the bottom-right corner of the bounding box
y2 | float | The y-coordinate of the bottom-right corner of the bounding box
id | str | The Id of the bounding box
type | str | Flag indicating whether the annotation is a box or a keypoint
outside_image | dict | Indicates whether the object extends beyond the boundaries of the image
occluded | dict | Indicates whether the object is occluded or partially hidden
attrs | dict | Any additional attributes or properties associated with the object
label | str | The label or category assigned to the object
sub_labels | list | Any sub-labels or sub-categories associated with the object
x1 | float | The x-coordinate of the keypoint
y1 | float | The y-coordinate of the keypoint
id | str | The Id of the keypoint
type | str | Flag indicating whether the annotation is a box or a keypoint
outside_image | dict | Indicates whether the keypoint lies beyond the boundaries of the image
occluded | dict | Indicates whether the keypoint is occluded or partially hidden
label | str | The label or category assigned to the keypoint
sub_labels | list | Any sub-labels associated with the keypoint
image_attrs | dict | The image attributes associated with the task
review | dict | The review details
rate | str | The rating given in the review
note | str | The note left by the reviewer
jobStart | int | The start time of the annotation
sessionTime | int | The session time of the annotation
elapsedTime | int | The elapsed time of the annotation
updateTime | int | The update time of the annotation
metadata | dict | The metadata of the task and project
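Because keypoints are nested under their parent box as `sub_labels`, one quick consistency check is to confirm that each keypoint's pixel coordinates fall inside the box. A minimal sketch using the coordinate fields from the example above (`keypoints_in_box` is an illustrative helper, not part of any SDK):

```python
def keypoints_in_box(box: dict) -> list[tuple[str, bool]]:
    # For each keypoint in `sub_labels`, report whether its (x1, y1)
    # position lies inside the parent box. Coordinates are pixels,
    # as in the export example above.
    results = []
    for kp in box.get("sub_labels", []):
        inside = (box["x1"] <= kp["x1"] <= box["x2"]
                  and box["y1"] <= kp["y1"] <= box["y2"])
        results.append((kp["label"], inside))
    return results

# Trimmed, hypothetical box with one valid and one stray keypoint.
box = {
    "x1": 97.0, "y1": 242.4, "x2": 715.5, "y2": 1280.0,
    "sub_labels": [
        {"label": "Nose", "x1": 360.0, "y1": 357.5},
        {"label": "Stray", "x1": 900.0, "y1": 100.0},
    ],
}
print(keypoints_in_box(box))  # [('Nose', True), ('Stray', False)]
```

Keypoints flagged `outside_image` are expected to fail this check, so filter those first if your export uses that field.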
6. Media Transcription Project¶
Video Files
{
"project_id": "c859cc0a92b7bd4d6d166707",
"project_name": "Video-Project-3",
"project_type": "Media Transcription",
"datasetId": "1ac8fb72573008ce5626bbfb",
"itemId": "0b9ff34daa6a2f6f95c59bb3",
"file_name": "Video 1.mp4",
"file_type": "video/mp4",
"source": "https://sandboxdocuments.tensoract.com/presigned/da00a38a91f62483658e2126e789f63e.mp4?sig=41de6d036aafbad36d075616c0d0b8d56a460c26f769a5128a57f411d3a47c0562f33c3ae0ec77e1adb8f6c51c46a6b11863d9b0bcfb9a8ec48b9c02a6f1d220:5f547425b01d3a41ecc069ae0dc15acc:64b7ddb9:f0dbc85b6ebaf920717d096daa954cc2",
"state": 4,
"task_id": "c53ca1cace7d7e784705b631",
"state_description": "Approved",
"annotations": [
{
"email": "jdoeqa@acme.org",
"messages": [],
"role": "Reviewer",
"elapsedTime": 62,
"date": "2023-06-17T12:16:38.937Z",
"content": {
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "63ca81cd31d698c1825328f3"
},
"videoSource": "https://sandboxdocuments.tensoract.com/presigned/da00a38a91f62483658e2126e789f63e.mp4?sig=9f13caf4ac583fbc9c074b91770e579120196b3f08e192ad641290c246a7add58f6d6f40ce1aa63e031ada96ff8c996038e4ed46516ffce1f56262cc6a435eeb:062d59d190dc03e090dcd5ee5ff17faa:648ef563:ac1a510bd2fb6c0ab328a3202eb9c846",
"streams": {
"Transcription": [
{
"start": 0.025000260441083017,
"end": 0.9500098967611547,
"confidence": 1,
"text": "Bonjoi"
},
{
"start": 0.9500098967611547,
"end": 2.0812716817201613,
"confidence": 1,
"text": "Tava Tuti"
},
{
"start": 2.1187720723817858,
"end": 3.156282880686731,
"confidence": 1,
"text": "Hello"
},
{
"start": 3.156282880686731,
"end": 4.13629310905087,
"confidence": 1,
"text": "Ola"
},
{
"start": 4.165043351337061,
"end": 4.608797974166285,
"confidence": 1,
"text": "Tutu beng"
},
{
"start": 10.453858750849383,
"end": 12.07887567951978,
"confidence": 1,
"text": "oye tutubeng"
}
],
"Language Segmentation": [
{
"start": 0.018750195330812264,
"end": 0.5687559250346386,
"confidence": 1,
"tag": "French"
},
{
"start": 0.5937561854757216,
"end": 2.0937718119407025,
"confidence": 1,
"tag": "German"
},
{
"start": 2.0937718119407025,
"end": 3.1650329694568993,
"confidence": 1,
"tag": "English"
},
{
"start": 3.2437837922305217,
"end": 4.156293298330052,
"confidence": 1,
"tag": "French"
},
{
"start": 4.250044274984113,
"end": 6.318815826483732,
"confidence": 1,
"tag": "Russian"
},
{
"start": 6.8338213441595235,
"end": 8.190085473088276,
"confidence": 1,
"tag": "French"
},
{
"start": 8.321336840403962,
"end": 9.865102922640839,
"confidence": 1,
"tag": "English"
},
{
"start": 9.915103443523005,
"end": 11.071365488923096,
"confidence": 1,
"tag": "German"
},
{
"start": 11.233867181790135,
"end": 12.6088815060497,
"confidence": 1,
"tag": "Arabic"
}
]
},
"mediaAttributes": {
"Is Video Clear?": "Yes",
"Aditional Notes": ""
},
"jobStart": 1687003443,
"sessionTime": 62,
"elapsedTime": 93.075,
"tsSeconds": true,
"updateTime": 1687004194,
"metadata": {
"File": "Video 1.mp4",
"TaskId": "c53ca1cace7d7e784705b631",
"Type": "Media Transcription"
},
"lastUpdate": 1687004198934
}
}
]
}
- table:Media Transcription Project Manifest Summary:
Field Names | Type | Description
---|---|---
project_id | str | The Id of the project
project_name | str | The name of the project
project_type | str | The type of the project
datasetId | str | The Id of the dataset
itemId | str | The Id of the dataset item
file_name | str | The name of the file
file_type | str | The type of the file
source | str | Internal source file reference on local storage disk
state | int | The state of the task
task_id | str | The Id of the task
state_description | str | The state description of the task
annotations | list | List of dictionaries representing the annotations
email | str | The email associated with the user
messages | list | The messages associated with the user
role | str | The role associated with the user
elapsed_time | str | The elapsed time of the annotation
date | str | The date of the annotation
content | dict | Dictionary containing the annotation details and metadata
review | dict | The review details
rate | str | The rating given in the review
reviewerId | str | The Id of the reviewer
videoSource | str | The presigned URL or S3 path of the task
streams | dict | Dictionary of the streams within the video, each containing specific information
Transcription | list | The stream containing text transcribed from the video
start | float | The starting timestamp (in seconds) of the transcribed text segment
end | float | The ending timestamp (in seconds) of the transcribed text segment
confidence | int | Indicates the confidence level
text | str | The transcribed text for the corresponding segment
Language Segmentation | list | The stream containing the segmentations in the video
start | float | The starting timestamp of the segment
end | float | The ending timestamp of the segment
confidence | int | Indicates the confidence level
tag | str | The tag for the corresponding segment
mediaAttributes | dict | The media attributes associated with the task
jobStart | int | The start time of the annotation
sessionTime | int | The session time of the annotation
elapsedTime | int | The elapsed time of the annotation
tsSeconds | bool | Flag indicating that timestamps are expressed in seconds
updateTime | int | The update time of the annotation
lastUpdate | int | The last update time
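The `Transcription` stream maps naturally onto subtitle formats. A minimal sketch, assuming segments shaped like the example above with `start`/`end` in seconds (the `to_srt` helper is illustrative, not part of any SDK):

```python
def to_srt(segments: list[dict]) -> str:
    # Render a `Transcription` stream (start/end in seconds) as SubRip text.

    def stamp(t: float) -> str:
        # SRT timestamps are HH:MM:SS,mmm.
        ms = round(t * 1000)
        h, rem = divmod(ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(f"{i}\n{stamp(seg['start'])} --> {stamp(seg['end'])}\n{seg['text']}\n")
    return "\n".join(blocks)

# First two segments from the video example above, rounded.
segments = [
    {"start": 0.025, "end": 0.95, "text": "Bonjoi"},
    {"start": 0.95, "end": 2.081, "text": "Tava Tuti"},
]
print(to_srt(segments))
```

The `Language Segmentation` stream can be rendered the same way by substituting `tag` for `text`.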
Audio Files
{
"project_id": "c859cc0a92b7bd4d6d166707",
"project_name": "Video-Project-3",
"project_type": "Media Transcription",
"datasetId": "edd9c7d7ae5c1b0c2bc73643",
"itemId": "1c944e058058dbe48c980ead",
"file_name": "mira640.mp3",
"file_type": "audio/mpeg",
"source": "s3://test-pocs/mira640.mp3",
"state": 4,
"task_id": "eba777874d6d268ece56b33a",
"state_description": "Approved",
"annotations": [
{
"email": "johndoe@me.com",
"messages": [],
"role": "Reviewer",
"elapsedTime": 6,
"date": "2023-07-19T04:01:20.036Z",
"content": {
"review": {
"rate": "Ok",
"note": "",
"reviewerId": "614b55be8af65dcf41da535b"
},
"audioSource": "https://test-pocs.s3.amazonaws.com/mira640.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUD4REC47DTY4PF7A%2F20230719%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20230719T040111Z&X-Amz-Expires=7200&X-Amz-Signature=10048222e39f53cd9dbbb2b97b8aa210e323d8fc544a9ea18ff46e5922974f44&X-Amz-SignedHeaders=host",
"streams": {
"Transcription": [
{
"start": 0.018749917353218702,
"end": 0.6312472175583629,
"confidence": 1,
"text": "Bonjoi"
},
{
"start": 0.7187468318733835,
"end": 1.881241707772943,
"confidence": 1,
"text": "Tava Tuti"
},
{
"start": 2.0124911292454737,
"end": 2.4874890355270143,
"confidence": 1,
"text": "Hello"
}
],
"Language Segmentation": [
{
"start": 0.006249972451072901,
"end": 0.4437480440261759,
"confidence": 1,
"tag": "French"
},
{
"start": 0.5437476032433424,
"end": 1.8687417628707974,
"confidence": 1,
"tag": "German"
},
{
"start": 1.9187415424793806,
"end": 2.6687382366081285,
"confidence": 1,
"tag": "English"
}
]
},
"mediaAttributes": {
"Is Video Clear?": "Yes",
"Aditional Notes": ""
},
"jobStart": 1689739272,
"sessionTime": 6,
"elapsedTime": 6,
"tsSeconds": true,
"updateTime": 1689739278,
"lastUpdate": 1689739280033
}
}
]
}
- table:Media Transcription Project Manifest Summary:
Field Names | Type | Description
---|---|---
project_id | str | The Id of the project
project_name | str | The name of the project
project_type | str | The type of the project
datasetId | str | The Id of the dataset
itemId | str | The Id of the dataset item
file_name | str | The name of the file
file_type | str | The type of the file
source | str | Internal source file reference on local storage disk
state | int | The state of the task
task_id | str | The Id of the task
state_description | str | The state description of the task
annotations | list | List of dictionaries representing the annotations
email | str | The email associated with the user
messages | list | The messages associated with the user
role | str | The role associated with the user
elapsed_time | str | The elapsed time of the annotation
date | str | The date of the annotation
content | dict | Dictionary containing the annotation details and metadata
review | dict | The review details
rate | str | The rating given in the review
reviewerId | str | The Id of the reviewer
audioSource | str | The presigned URL or S3 path of the task
streams | dict | Dictionary of the streams within the audio file, each containing specific information
Transcription | list | The stream containing text transcribed from the audio
start | float | The starting timestamp (in seconds) of the transcribed text segment
end | float | The ending timestamp (in seconds) of the transcribed text segment
confidence | int | Indicates the confidence level
text | str | The transcribed text for the corresponding segment
Language Segmentation | list | The stream containing the segmentations in the audio
start | float | The starting timestamp of the segment
end | float | The ending timestamp of the segment
confidence | int | Indicates the confidence level
tag | str | The tag for the corresponding segment
mediaAttributes | dict | The media attributes associated with the task
jobStart | int | The start time of the annotation
sessionTime | int | The session time of the annotation
elapsedTime | int | The elapsed time of the annotation
tsSeconds | bool | Flag indicating that timestamps are expressed in seconds
updateTime | int | The update time of the annotation
lastUpdate | int | The last update time
Model Integration: Request Payloads and Responses¶
1. NER Labeling¶
An NER (Named Entity Recognition) labeling model automatically identifies and classifies named entities (such as names of people, organizations, and locations) in text.
Request payload
{
"text": {
"1": "This is a text by Michael Smith",
"2": "A paper from Oxford University"
}
}
- table:NER Labeling Request Payload:
Field Names | Type | Description
---|---|---
text | dict | Dictionary containing the text. Each key is a page number, and the corresponding value is that page's text
1 | str | The text of page 1
2 | str | The text of page 2
Response
{
"entities": {
"1": [
{
"type": "Person Entity",
"text": "Michael Smith",
"range": [
18,
31
]
}
],
"2": [
{
"type": "Organization Entity",
"text": "Oxford University",
"range": [
13,
30
]
}
]
}
}
- table:NER Labeling Response:
Field Names | Type | Description
---|---|---
entities | dict | A dictionary of entity annotations, keyed by page number
1 | list | The entities identified on page 1
type | str | The type of the label
text | str | The selected text for labeling
range | list | The start and end offsets of the selected text within the plain text
2 | list | The entities identified on page 2
type | str | The type of the label
text | str | The selected text for labeling
range | list | The start and end offsets of the selected text within the plain text
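The `range` offsets in the response index directly into the page text from the request; in the sample payloads the offsets behave as half-open `[start, end)` slices. A minimal consistency check, using only the example data above:

```python
# Sample request text and response entities, taken from the examples above.
request_text = {
    "1": "This is a text by Michael Smith",
    "2": "A paper from Oxford University",
}
response_entities = {
    "1": [{"type": "Person Entity", "text": "Michael Smith", "range": [18, 31]}],
    "2": [{"type": "Organization Entity", "text": "Oxford University", "range": [13, 30]}],
}

# For each page, slicing the page text with the entity's [start, end)
# offsets reproduces the entity's "text" field.
for page, entities in response_entities.items():
    for entity in entities:
        start, end = entity["range"]
        assert request_text[page][start:end] == entity["text"]

print("offsets match the entity text")
```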
2. OCR (Tesseract) Model¶
An OCR (Optical Character Recognition) model extracts text from images, such as scanned documents. It processes the visual content to recognize characters and converts them into editable and searchable text.
Request payload
{
"source": "https://sandbox.tensoract.com/testfiles/test_text.png"
}
- table:OCR Model Request Payload:
Field Names | Type | Description
---|---|---
source | str | The URL pointing to the source image (e.g., a scanned document) for OCR extraction
Response
{
"pages": [
{
"page": 1,
"dimentions": {
"width": 2484,
"height": 3509
},
"words": [
{
"box": [
0.12198067632850242,
0.09204901681390709,
0.20128824476650564,
0.11456255343402678
],
"text": "This"
},
{
"box": [
0.2177938808373591,
0.09204901681390709,
0.24476650563607086,
0.11456255343402678
],
"text": "is"
},
{
"box": [
0.26006441223832527,
0.09803362781419207,
0.2805958132045089,
0.11456255343402678
],
"text": "a"
},
{
"box": [
0.29589371980676327,
0.09290396124251923,
0.3647342995169082,
0.11456255343402678
],
"text": "test"
},
{
"box": [
0.3788244766505636,
0.09204901681390709,
0.539049919484702,
0.11456255343402678
],
"text": "scanned"
},
{
"box": [
0.5559581320450886,
0.09204901681390709,
0.7455716586151369,
0.11456255343402678
],
"text": "document"
},
{
"box": [
0.12198067632850242,
0.14990025648332858,
0.17149758454106281,
0.15958962667426618
],
"text": "Lorem"
},
{
"box": [
0.17914653784219,
0.14990025648332858,
0.22584541062801933,
0.16186947848389854
],
"text": "ipsum"
},
{
"box": [
0.23309178743961353,
0.14990025648332858,
0.27375201288244766,
0.15958962667426618
],
"text": "dolor"
},
{
"box": [
0.27938808373590984,
0.14990025648332858,
0.2966988727858293,
0.15958962667426618
],
"text": "sit"
},
{
"box": [
0.3027375201288245,
0.15018523795953262,
0.3466183574879227,
0.1610145340552864
],
"text": "amet,"
},
{
"box": [
0.3538647342995169,
0.15018523795953262,
0.44887278582930756,
0.15958962667426618
],
"text": "consectetur"
},
{
"box": [
0.45450885668276975,
0.14990025648332858,
0.534219001610306,
0.16215445996010258
],
"text": "adipiscing"
},
{
"box": [
0.5414653784219001,
0.14990025648332858,
0.5680354267310789,
0.1610145340552864
],
"text": "elit,"
},
{
"box": [
0.5752818035426731,
0.14990025648332858,
0.6030595813204509,
0.15958962667426618
],
"text": "sed"
},
{
"box": [
0.6103059581320451,
0.14990025648332858,
0.6292270531400966,
0.15958962667426618
],
"text": "do"
},
{
"box": [
0.6356682769726248,
0.14990025648332858,
0.7033011272141707,
0.15958962667426618
],
"text": "eiusmod"
},
{
"box": [
0.7101449275362319,
0.15018523795953262,
0.7677133655394525,
0.16186947848389854
],
"text": "tempor"
},
{
"box": [
0.7737520128824477,
0.14990025648332858,
0.8498389694041868,
0.15958962667426618
],
"text": "incididunt"
},
{
"box": [
0.856682769726248,
0.15018523795953262,
0.8703703703703703,
0.15958962667426618
],
"text": "ut"
},
{
"box": [
0.12198067632850242,
0.1669991450555714,
0.1710950080515298,
0.17668851524650897
],
"text": "labore"
},
{
"box": [
0.177938808373591,
0.16728412653177543,
0.19202898550724637,
0.17668851524650897
],
"text": "et"
},
{
"box": [
0.19806763285024154,
0.1669991450555714,
0.24798711755233493,
0.17668851524650897
],
"text": "dolore"
},
{
"box": [
0.25523349436392917,
0.1695639783414078,
0.30917874396135264,
0.1792533485323454
],
"text": "magna"
},
{
"box": [
0.31602254428341386,
0.1669991450555714,
0.36835748792270534,
0.17896836705614136
],
"text": "aliqua."
},
{
"box": [
0.37640901771336555,
0.1669991450555714,
0.4396135265700483,
0.17668851524650897
],
"text": "Porttitor"
},
{
"box": [
0.44565217391304346,
0.1669991450555714,
0.5092592592592593,
0.17668851524650897
],
"text": "rhoncus"
},
{
"box": [
0.5161030595813204,
0.1669991450555714,
0.5563607085346216,
0.17668851524650897
],
"text": "dolor"
},
{
"box": [
0.5623993558776168,
0.1695639783414078,
0.606682769726248,
0.17896836705614136
],
"text": "purus"
},
{
"box": [
0.6139291465378421,
0.1695639783414078,
0.6421095008051529,
0.17668851524650897
],
"text": "non"
},
{
"box": [
0.6493558776167472,
0.1669991450555714,
0.6920289855072463,
0.17668851524650897
],
"text": "enim."
},
{
"box": [
0.7000805152979066,
0.1669991450555714,
0.7801932367149759,
0.17668851524650897
],
"text": "Habitasse"
},
{
"box": [
0.7870370370370371,
0.1669991450555714,
0.8349436392914654,
0.17896836705614136
],
"text": "platea"
},
{
"box": [
0.1215780998389694,
0.18438301510401825,
0.1888083735909823,
0.19407238529495582
],
"text": "dictumst"
},
{
"box": [
0.19524959742351047,
0.18438301510401825,
0.2584541062801932,
0.1963522371045882
],
"text": "quisque"
},
{
"box": [
0.2648953301127214,
0.18438301510401825,
0.32085346215780997,
0.19663721858079225
],
"text": "sagittis"
},
{
"box": [
0.3276972624798712,
0.18694784838985465,
0.3719806763285024,
0.1963522371045882
],
"text": "purus"
},
{
"box": [
0.3784219001610306,
0.18438301510401825,
0.3961352657004831,
0.19407238529495582
],
"text": "sit"
},
{
"box": [
0.40217391304347827,
0.1846679965802223,
0.4420289855072464,
0.19407238529495582
],
"text": "amet"
},
{
"box": [
0.44806763285024154,
0.18438301510401825,
0.5116747181964574,
0.1963522371045882
],
"text": "volutpat"
},
{
"box": [
0.5181159420289855,
0.1846679965802223,
0.605877616747182,
0.1963522371045882
],
"text": "consequat."
},
{
"box": [
0.6139291465378421,
0.18438301510401825,
0.6501610305958132,
0.19663721858079225
],
"text": "Eget"
},
{
"box": [
0.6561996779388084,
0.1846679965802223,
0.6799516908212561,
0.19407238529495582
],
"text": "est"
},
{
"box": [
0.6863929146537843,
0.18438301510401825,
0.7302737520128825,
0.19407238529495582
],
"text": "lorem"
},
{
"box": [
0.7379227053140096,
0.18438301510401825,
0.784621578099839,
0.1963522371045882
],
"text": "ipsum"
},
{
"box": [
0.7914653784219001,
0.18438301510401825,
0.8321256038647343,
0.19407238529495582
],
"text": "dolor"
},
{
"box": [
0.8377616747181964,
0.18438301510401825,
0.855072463768116,
0.19407238529495582
],
"text": "sit"
},
{
"box": [
0.1215780998389694,
0.20205186662866914,
0.16143317230273752,
0.21145625534340268
],
"text": "amet"
},
{
"box": [
0.16747181964573268,
0.20205186662866914,
0.26247987117552335,
0.21145625534340268
],
"text": "consectetur"
},
{
"box": [
0.26811594202898553,
0.2017668851524651,
0.3526570048309179,
0.2140210886292391
],
"text": "adipiscing."
},
{
"box": [
0.3603059581320451,
0.2017668851524651,
0.4355877616747182,
0.21145625534340268
],
"text": "Senectus"
},
{
"box": [
0.4420289855072464,
0.20205186662866914,
0.45652173913043476,
0.21145625534340268
],
"text": "et"
},
{
"box": [
0.46296296296296297,
0.20205186662866914,
0.5060386473429952,
0.21145625534340268
],
"text": "netus"
},
{
"box": [
0.5128824476650563,
0.20205186662866914,
0.5273752012882448,
0.21145625534340268
],
"text": "et"
},
{
"box": [
0.533816425120773,
0.2017668851524651,
0.6219806763285024,
0.21145625534340268
],
"text": "malesuada"
},
{
"box": [
0.6280193236714976,
0.2017668851524651,
0.677536231884058,
0.21145625534340268
],
"text": "fames"
},
{
"box": [
0.6839774557165862,
0.2043317184383015,
0.7024959742351047,
0.21145625534340268
],
"text": "ac"
},
{
"box": [
0.7081320450885669,
0.2017668851524651,
0.7564412238325282,
0.21373610715303507
],
"text": "turpis."
},
{
"box": [
0.7640901771336553,
0.2017668851524651,
0.8268921095008052,
0.21145625534340268
],
"text": "Gravida"
},
{
"box": [
0.8337359098228664,
0.2043317184383015,
0.8663446054750402,
0.21145625534340268
],
"text": "cum"
},
{
"box": [
0.1215780998389694,
0.2188657737247079,
0.16586151368760063,
0.2285551439156455
],
"text": "sociis"
},
{
"box": [
0.17310789049919484,
0.21915075520091193,
0.23792270531400966,
0.23083499572527785
],
"text": "natoque"
},
{
"box": [
0.24476650563607086,
0.2188657737247079,
0.32286634460547503,
0.23083499572527785
],
"text": "penatibus"
},
{
"box": [
0.32930756843800324,
0.21915075520091193,
0.34782608695652173,
0.2285551439156455
],
"text": "et."
},
{
"box": [
0.35426731078904994,
0.2188657737247079,
0.41626409017713367,
0.2285551439156455
],
"text": "Aenean"
},
{
"box": [
0.4243156199677939,
0.2188657737247079,
0.49074074074074076,
0.23083499572527785
],
"text": "pharetra"
},
{
"box": [
0.49798711755233493,
0.22143060701054432,
0.5523349436392915,
0.2311199772014819
],
"text": "magna"
},
{
"box": [
0.5591787439613527,
0.22143060701054432,
0.5772946859903382,
0.2285551439156455
],
"text": "ac"
},
{
"box": [
0.5841384863123994,
0.2188657737247079,
0.6481481481481481,
0.23083499572527785
],
"text": "placerat"
},
{
"box": [
0.6541867954911433,
0.2188657737247079,
0.7451690821256038,
0.2285551439156455
],
"text": "vestibulum."
},
{
"box": [
0.7536231884057971,
0.2188657737247079,
0.8132045088566827,
0.2311199772014819
],
"text": "Feugiat"
},
{
"box": [
0.8192431561996779,
0.2188657737247079,
0.8470209339774557,
0.2285551439156455
],
"text": "sed"
},
{
"box": [
0.12198067632850242,
0.23624964377315474,
0.1678743961352657,
0.24593901396409235
],
"text": "lectus"
},
{
"box": [
0.17431561996779388,
0.23624964377315474,
0.2608695652173913,
0.24593901396409235
],
"text": "vestibulum"
},
{
"box": [
0.26851851851851855,
0.23624964377315474,
0.31561996779388085,
0.24593901396409235
],
"text": "mattis"
},
{
"box": [
0.322463768115942,
0.23624964377315474,
0.42028985507246375,
0.2482188657737247
],
"text": "ullamcorper."
}
]
}
]
}
- table:OCR Model Response Payload:
Field Names | Type | Description
---|---|---
pages | list | A list of page objects, each representing a page in the scanned document.
page | int | The page number within the document.
dimentions | dict | A dictionary containing the dimensions (width and height) of the page in pixels.
width | int | The width of the page in pixels.
height | int | The height of the page in pixels.
words | list | A list of word objects, each representing a word found on the page.
box | list[float] | The bounding-box coordinates of the OCRed word, normalized to the page dimensions.
text | str | The text content of the word.
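The `box` values in the sample response are fractions of the page size, so converting a word's box to pixel coordinates only needs the page's `dimentions`. A small sketch (the field names and the sample box are taken from the response above):

```python
def box_to_pixels(box, width, height):
    """Convert a normalized [x0, y0, x1, y1] box to integer pixel coordinates."""
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))

# The box for the word "This" from the sample response, on a 2484 x 3509 page.
box = [0.12198067632850242, 0.09204901681390709,
       0.20128824476650564, 0.11456255343402678]
print(box_to_pixels(box, 2484, 3509))
# -> (303, 323, 500, 402)
```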