Extract Data from Content
Overview
The extraction page extracts structured information from three input types: text, documents (such as DOC, DOCX, TXT, and PDF), and tables in CSV format. You upload or paste the content, then define custom extraction criteria that describe the data you want returned.
Features
Multiple Input Types: Supports text input, file upload (documents), and CSV table upload.
Custom Extraction Schema: Users can define a custom schema for data extraction, specifying field names and descriptions.
File Validation: Checks the type and size of each uploaded file.
AI Schema Detection: Offers an AI-based option to automatically define the extraction schema.
Responsive UI: Provides a user-friendly interface adaptable to various screen sizes.
How to Use
Providing Content
Select Input Type: Choose between text, file, or table.
Input Data:
For text: Paste the text in the provided textarea.
For files: Drag and drop or browse to upload document files.
For tables: Drag and drop or browse to upload a single-column CSV file.
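If your source data lives in a script or spreadsheet export, a single-column CSV can be produced with a few lines of Python. This is a minimal sketch, not part of the product: the function name `to_single_column_csv` and the `text` header are illustrative assumptions, and this page does not say whether a header row is expected.

```python
import csv
import io


def to_single_column_csv(rows, header="text"):
    """Serialize a list of strings as a single-column CSV string.

    The header name is an assumption; adjust or drop it to suit your data.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for row in rows:
        # One value per row keeps the table single-column; the csv module
        # quotes values that contain commas so they are not split.
        writer.writerow([row])
    return buffer.getvalue()


sample = to_single_column_csv(["Invoice 1001, due 2024-03-01", "Plain note"])
print(sample)
```

Writing through the `csv` module rather than joining strings by hand matters here, because free text often contains commas that would otherwise turn one column into several.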
Defining Extraction Criteria
Add Schema Fields: Click 'Add' to create more fields in your extraction schema.
Enter Field Details: Provide a name and an optional description for each field.
AI-Define: Use the AI-Define feature to automatically suggest a schema based on your input.
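It can help to draft the schema before entering it on the page. The sketch below models a schema as a list of fields, each with a name and an optional description, and runs two sanity checks. The `SchemaField` class, the `validate_schema` helper, and the example field names are all hypothetical; the page itself may or may not enforce the same rules.

```python
from dataclasses import dataclass


@dataclass
class SchemaField:
    name: str
    description: str = ""  # optional, as on the extraction page


def validate_schema(fields):
    """Return a list of problems found in a draft schema (empty if none)."""
    problems = []
    seen = set()
    for index, field in enumerate(fields):
        name = field.name.strip()
        if not name:
            problems.append(f"field {index}: name is empty")
        elif name.lower() in seen:
            problems.append(f"field {index}: duplicate name '{name}'")
        seen.add(name.lower())
    return problems


# Hypothetical schema for invoice-like text.
schema = [
    SchemaField("vendor", "Company that issued the invoice"),
    SchemaField("amount", "Total amount due, including currency"),
    SchemaField("due_date"),
]
print(validate_schema(schema))
```

Short, specific descriptions such as "Total amount due, including currency" tend to leave less room for ambiguity than a bare field name.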
Extraction Process
Click 'Extract': Once the data and schema are set, click 'Extract' to initiate the extraction process.
View Results: The extracted data is displayed as a table below the extraction sections.
Download and Copy: Copy the extracted data or download it in CSV format using the buttons at the top-right corner.
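Once downloaded, the CSV can be loaded for further processing. The sketch below assumes the file has a header row whose columns match your schema field names; that layout is an assumption, and the `vendor` and `amount` columns are invented for illustration.

```python
import csv
import io


def load_extracted_rows(csv_text):
    """Parse downloaded CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


# Stand-in for the contents of a downloaded file (hypothetical data).
example_download = "vendor,amount\nAcme,120.50\nGlobex,75.00\n"

rows = load_extracted_rows(example_download)
for row in rows:
    print(row["vendor"], row["amount"])
```

For a real file, read its contents with `open(path, newline="", encoding="utf-8")` and pass the text to the same function.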
Tips and Best Practices
Ensure file formats and sizes are within the specified limits for successful uploads.
Use the AI-Define feature to generate a schema quickly, especially for large or complex datasets.
Regularly review and update the schema for precision in data extraction.
Troubleshooting
File Upload Errors: Check whether the file exceeds the 20 MB size limit or uses an unsupported format.
Extraction Errors: Make sure the schema fields match the format and content of your data. Errors can also occur when the input content is too large or the schema is too complex.
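To catch upload problems before they happen, a file can be checked locally against the limits mentioned on this page. This is a rough sketch under stated assumptions: the extension sets mirror the formats named in the Overview (which may not be exhaustive), "20 MB" is interpreted as 20 × 1024 × 1024 bytes, and `check_upload` is a hypothetical helper rather than anything the product exposes.

```python
from pathlib import Path

MAX_BYTES = 20 * 1024 * 1024  # assumption: 20 MB interpreted as 20 MiB
DOCUMENT_EXTENSIONS = {".doc", ".docx", ".txt", ".pdf"}  # from the Overview
TABLE_EXTENSIONS = {".csv"}


def check_upload(filename, size_bytes, input_type):
    """Return a list of likely upload problems (empty if none are found).

    input_type is "file" for documents or "table" for CSV uploads.
    """
    allowed = DOCUMENT_EXTENSIONS if input_type == "file" else TABLE_EXTENSIONS
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in allowed:
        problems.append(f"unsupported format: '{extension or 'none'}'")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the {MAX_BYTES}-byte limit")
    return problems


print(check_upload("report.pdf", 1_024, "file"))
print(check_upload("data.xlsx", 1_024, "table"))
```

For a file on disk, `Path(path).stat().st_size` gives the value to pass as `size_bytes`.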
Support
For further assistance or to report issues, contact support@dataku.ai