
API Documentation

Introduction

Welcome to our API, which transforms texts and documents into structured data. It is intended for developers and businesses that want to automate data extraction and transformation from various text sources.


Prerequisites

  • Obtain your API key from our sales representative.

  • Basic understanding of HTTP methods and JSON is beneficial.


Texts Transformation Endpoint

Description

This endpoint transforms texts into structured data. Batch jobs are supported: a single request can carry multiple texts.
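Because one request accepts several texts, a large collection can be split into smaller batches before sending. The following is a minimal sketch; the helper name `chunk_texts` and the batch size are illustrative assumptions, since this page does not state a per-request limit.

```python
def chunk_texts(texts, batch_size):
    """Split a list of texts into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


# batch_size=2 is an arbitrary value chosen for the example, not an API limit.
batches = chunk_texts(["a", "b", "c", "d", "e"], batch_size=2)
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Each batch can then be sent as the texts array of its own request.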

Endpoint URL

/api/transform/text/

Method

POST

Data Parameters

  • schema: An array of schema objects that defines the structure of the output data. Each schema object includes:

    • name: The column name for the value you would like to extract.

    • description: A brief description of the column content (optional).

  • texts: An array of text strings to be transformed.
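As an illustration of the parameters above, the request body could be assembled as follows. This is a sketch only: `build_text_payload` is a hypothetical helper, and the empty-input checks are client-side assumptions rather than documented API rules.

```python
import json


def build_text_payload(columns, texts):
    """Build the JSON body for the texts endpoint.

    columns: list of (name, description) pairs; description may be None.
    texts: list of strings to transform.
    """
    # These checks are local conveniences, not rules stated by the API.
    if not columns:
        raise ValueError("schema needs at least one column")
    if not texts:
        raise ValueError("texts must contain at least one string")

    schema = []
    for name, description in columns:
        entry = {"name": name}
        if description:  # description is optional, so omit it when empty
            entry["description"] = description
        schema.append(entry)
    return json.dumps({"schema": schema, "texts": list(texts)})


body = build_text_payload(
    [("date", "Date mentioned in the text"), ("person_name", None)],
    ["June 12th event with John Doe"],
)
print(body)
```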

Success Response

  • Code: 200 OK

  • Content: An array of objects, one per input text, each containing the original text and the columns defined in your schema.

Error Responses

  • 400 Bad Request: Occurs when there's a problem with the request format.

  • 500 Internal Server Error: Indicates a server-side issue.
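One way a client might react to these status codes is sketched below. The names `call_with_retry` and `send` are hypothetical, and retrying on 500 is an assumption: this page does not say whether repeating a request is safe, so adjust to your needs.

```python
import time


def call_with_retry(send, attempts=3, delay_seconds=1.0):
    """Call send() until it succeeds, retrying only on 500 responses.

    send: zero-argument callable returning (status_code, body).
    """
    for attempt in range(1, attempts + 1):
        status, body = send()
        if status == 200:
            return body
        if status == 400:
            # A malformed request fails the same way again, so stop here.
            raise ValueError(f"400 Bad Request: {body}")
        if status == 500 and attempt < attempts:
            time.sleep(delay_seconds)  # brief pause before the next attempt
            continue
        raise RuntimeError(f"request failed with status {status}: {body}")


# Simulated responses stand in for real HTTP calls in this example.
responses = iter([(500, "server error"), (200, "[]")])
result = call_with_retry(lambda: next(responses), delay_seconds=0)
print(result)  # []
```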

Request Details

Headers

  • Content-Type: application/json

  • X-API-Key: {YOUR-API-KEY}

Body Structure

{
    "schema": [
        {"name": "column1", "description": "[optional] description for column1"},
        {"name": "column2", "description": "[optional] description for column2"},
        ...
    ],
    "texts": ["text1", "text2", ...]
}

curl Example

curl -X POST \
     -H "Content-Type: application/json" \
     -H "X-API-Key: YOUR_API_KEY" \
     -d '{"schema": [{"name": "date", "description": "Date mentioned in the text"}, {"name": "person_name", "description": "Name of individuals mentioned"}], "texts": ["June 12th event with John Doe", "Meeting scheduled on July 4th with Jane Smith"]}' \
     https://dataku.ai/api/transform/text
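For reference, the same request can be sketched in Python using only the standard library. The URL, headers, and body are taken from the curl example; the function name `build_text_request` is hypothetical. The network call is left commented out so the sketch runs offline.

```python
import json
import urllib.request

# URL as it appears in the curl example.
TEXT_URL = "https://dataku.ai/api/transform/text"


def build_text_request(api_key, schema, texts, url=TEXT_URL):
    """Prepare (but do not send) a POST request for the texts endpoint."""
    body = json.dumps({"schema": schema, "texts": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


request = build_text_request(
    "YOUR_API_KEY",
    [
        {"name": "date", "description": "Date mentioned in the text"},
        {"name": "person_name", "description": "Name of individuals mentioned"},
    ],
    ["June 12th event with John Doe", "Meeting scheduled on July 4th with Jane Smith"],
)

# To send the request, uncomment:
# with urllib.request.urlopen(request) as response:
#     rows = json.loads(response.read().decode("utf-8"))
```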

Example Success Response

[
    {
        "text": "June 12th event with John Doe",
        "date": "June 12th",
        "person_name": "John Doe"
    },
    {
        "text": "Meeting scheduled on July 4th with Jane Smith",
        "date": "July 4th",
        "person_name": "Jane Smith"
    }
]
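A response shaped like the example above can be flattened into CSV for use in a spreadsheet. This is a sketch that assumes the response is a JSON array of flat objects, as in the example; `rows_to_csv` is a hypothetical helper.

```python
import csv
import io
import json

# Response body copied from the example success response.
response_text = """[
    {"text": "June 12th event with John Doe", "date": "June 12th", "person_name": "John Doe"},
    {"text": "Meeting scheduled on July 4th with Jane Smith", "date": "July 4th", "person_name": "Jane Smith"}
]"""


def rows_to_csv(rows, columns):
    """Render the chosen columns of each row as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for row in rows:
        # Missing columns become empty cells instead of raising an error.
        writer.writerow({column: row.get(column, "") for column in columns})
    return buffer.getvalue()


rows = json.loads(response_text)
csv_text = rows_to_csv(rows, ["date", "person_name"])
print(csv_text)
```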

Documents Transformation Endpoint

Description

This endpoint transforms documents of various formats into structured data.

Endpoint URL

/api/transform/doc/

Method

POST

Supported Document Types

txt, docx, pdf.
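Files can be checked against this list before uploading. The sketch below matches on the file extension, ignoring case; that is a client-side assumption, since this page does not describe how the server detects the document type. `split_by_support` is a hypothetical helper.

```python
from pathlib import Path

# Extensions taken from the supported document types listed above.
SUPPORTED_EXTENSIONS = {".txt", ".docx", ".pdf"}


def split_by_support(paths):
    """Separate paths into (supported, unsupported) based on file extension."""
    supported, unsupported = [], []
    for path in paths:
        if Path(path).suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, skipped = split_by_support(["a.txt", "b.DOCX", "c.doc", "d.pdf"])
print(ok)       # ['a.txt', 'b.DOCX', 'd.pdf']
print(skipped)  # ['c.doc']
```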

Form Parameters

  • schema: A JSON-encoded array of schema objects that defines the structure of the output data. Each schema object includes:

    • name: The column name for the value you would like to extract.

    • description: A brief description of the column content (optional).

  • files: The documents to be uploaded and transformed.

Success and Error Responses

Similar to the Texts Transformation Endpoint.

Request Details

Headers

  • Content-Type: multipart/form-data

  • X-API-Key: {YOUR-API-KEY}

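Parameters

key       value
schema    [{"name": "column1", "description": "[optional] description for column1"}, {"name": "column2", "description": "[optional] description for column2"}, ...]
files     file1.txt, file2.docx, file3.pdf, ...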

curl Example

curl -X POST \
     -H "X-API-Key: YOUR_API_KEY" \
     -F "schema=[{\"name\": \"year\", \"description\": \"The year this report was written\"}, {\"name\": \"author\", \"description\": \"Name of the document's author\"}]" \
     -F "file1=@/path/to/annual_report.txt" \
     https://dataku.ai/api/transform/file
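A comparable upload can be sketched in Python with the standard library by building the multipart body by hand. The form field names (`schema`, `file1`) and the URL are copied from the curl example; note that the Endpoint URL section lists /api/transform/doc/, so confirm which path applies to your account. `build_doc_request` and the in-memory file content are illustrative placeholders.

```python
import json
import urllib.request
import uuid


def build_doc_request(api_key, schema, files, url):
    """Prepare (but do not send) a multipart/form-data POST request.

    files: list of (field_name, filename, content_bytes) tuples.
    """
    boundary = uuid.uuid4().hex
    # The schema travels as a plain form field holding JSON text.
    schema_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="schema"\r\n\r\n'
        f"{json.dumps(schema)}\r\n"
    )
    parts = [schema_part.encode("utf-8")]
    for field_name, filename, content in files:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_doc_request(
    "YOUR_API_KEY",
    [{"name": "year", "description": "The year this report was written"}],
    [("file1", "annual_report.txt", b"placeholder file content")],
    url="https://dataku.ai/api/transform/file",
)

# To send the request, uncomment:
# with urllib.request.urlopen(request) as response:
#     rows = json.loads(response.read().decode("utf-8"))
```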

Example Success Response

[
    {
        "file": "annual_report.txt",
        "year": "2023",
        "author": "Alex Johnson"
    }
]
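When several documents are uploaded at once, it can help to look results up by file name. The sketch below assumes every row carries a `file` key, as in the example response; `index_by_file` is a hypothetical helper.

```python
import json

# Response body copied from the example success response.
response_text = """[
    {"file": "annual_report.txt", "year": "2023", "author": "Alex Johnson"}
]"""


def index_by_file(rows):
    """Return {file name: extracted columns} for a documents response."""
    return {
        row["file"]: {key: value for key, value in row.items() if key != "file"}
        for row in rows
    }


results = index_by_file(json.loads(response_text))
print(results["annual_report.txt"]["author"])  # Alex Johnson
```

In the example response the extracted values are strings, so "2023" would need converting if a number is required.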

Best Practices and Tips

  • Ensure your texts and documents are clearly formatted for optimal results.

  • Use descriptive names in your schema for easier data handling.

  • Add descriptions to the schema when you want to specify the format of the extracted value.
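For instance, a schema following these tips might look like the sketch below. The column names and description wording are illustrative, and how closely the output follows a requested format is not specified on this page.

```python
import json

# Descriptive column names, with descriptions that spell out the wanted format.
schema = [
    {"name": "event_date", "description": "Date of the event, formatted as YYYY-MM-DD"},
    {"name": "person_name", "description": "Full name of the person mentioned"},
]

print(json.dumps(schema, indent=2))
```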

Troubleshooting and Support

  • If a request fails, verify the request format and the validity of your API key.

  • For additional help, contact support.
