OpenDCAI/DataFlow is a data preparation and training system. It generates, refines, evaluates, and filters high-quality data for AI applications from noisy inputs such as PDFs, plain text, and low-quality question-answer datasets.
Its goal is to improve the performance of large language models (LLMs) in specific domains such as healthcare, finance, law, and academic research through targeted training.
DataFlow uses an operator-based design: each data-processing step is an operator, and operators are chained into pipelines that are reproducible, reusable, and shareable. The project positions this design as a foundation for the Data-Centric AI community.
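To make the operator idea concrete, here is a minimal sketch in plain Python. It is not DataFlow's actual API: the Operator class, the run_pipeline helper, and the two cleaning functions are hypothetical names invented for illustration, and the records are made-up examples. It only shows the general pattern of small, named steps chained into a pipeline that gives the same result on every run.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# One data sample, e.g. {"question": "...", "answer": "..."} (assumed shape).
Record = Dict[str, str]


@dataclass(frozen=True)
class Operator:
    """Hypothetical operator: a named step that maps records to records."""
    name: str
    fn: Callable[[List[Record]], List[Record]]

    def run(self, records: List[Record]) -> List[Record]:
        return self.fn(records)


def strip_whitespace(records: List[Record]) -> List[Record]:
    # Refinement step: trim stray whitespace from every field.
    return [{key: value.strip() for key, value in r.items()} for r in records]


def drop_short_answers(records: List[Record], min_chars: int = 10) -> List[Record]:
    # Filtering step: discard pairs whose answer is too short to be useful.
    return [r for r in records if len(r["answer"]) >= min_chars]


def run_pipeline(operators: List[Operator], records: List[Record]) -> List[Record]:
    # Apply each operator in order; the input list itself is never modified.
    for op in operators:
        records = op.run(records)
    return records


# The pipeline is just an ordered list of operators, so it can be
# stored, shared, and re-run on new data.
PIPELINE = [
    Operator("strip_whitespace", strip_whitespace),
    Operator("drop_short_answers", drop_short_answers),
]

raw = [
    {"question": " What is 2 + 2? ", "answer": " 2 + 2 equals 4. "},
    {"question": "Capital of France?", "answer": " idk "},
]
cleaned = run_pipeline(PIPELINE, raw)
print(cleaned)
```

Because each step is a separate operator, a new pipeline can be built by reordering the list or adding another Operator without touching the existing steps.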
DataFlow also includes an agent that can build new pipelines dynamically, either by combining existing operators or by writing new ones as needed.
Pipelines can be assembled visually with little code and arranged flexibly for different domains and use cases, turning raw data into LLM training datasets.
The tool includes features for generating text, math, and code data, along with tools such as AgenticRAG and Text2SQL for data generation. Other capabilities include large-scale PDF-to-QA conversion and structured data retrieval.
Pros:
Produces high-quality data
Improves noisy sources
Assesses data quality
Cons:
Unclear how noisy data is refined
Limited language support
Lacks multi-platform support