Dataset Creation and Anonymization
This guide walks through creating and anonymizing datasets in AI Studio: uploading data, selecting tables and columns, applying filters, anonymizing sensitive fields, and exporting the result.
Steps to Create a Dataset with Integration
The dataset creation process consists of five key steps. Let’s go through each of them in detail.
Step 1: Configuration
- Upload a Dataset File: In the first step, you can either upload a new dataset or select an existing one from the database. You have two options:
- File Upload: Upload files directly by clicking or dragging them to the provided area.
- Database Integration: Select a pre-configured database (like MySQL) from which to pull the dataset.
You can upload new files or use datasets already available in your workspace.
Step 2: Select Tables and Columns
- Choose Tables and Columns: Once the dataset is uploaded or selected, the next step involves selecting the specific tables and columns you want to include in your dataset.
- You can choose a single table or select all tables and columns.
- Customize the dataset by specifying which columns are relevant.
Selecting only the columns you need keeps the dataset focused and easier to work with.
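As a rough illustration of what column selection does to the data, the sketch below keeps only the chosen columns from each row. It is plain Python, not the AI Studio interface; the `rows` sample and the `select_columns` helper are hypothetical names introduced for this example.

```python
# Hypothetical in-memory stand-in for one table; not an AI Studio API.
rows = [
    {"id": 1, "name": "Ada", "email": "ada@example.com", "country": "UK"},
    {"id": 2, "name": "Linus", "email": "linus@example.com", "country": "FI"},
]


def select_columns(rows, columns):
    """Return new rows that contain only the requested columns."""
    return [{col: row[col] for col in columns} for row in rows]


# Keep two of the four columns; the original rows are left unchanged.
selected = select_columns(rows, ["id", "country"])
print(selected)
```

Every other column is dropped from the result, which is the effect of leaving it unselected in this step.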
Step 3: Filters
- Apply Filters: To further refine your dataset, apply filters to the selected columns.
- Add multiple filters based on specific conditions to extract the most relevant data.
Filters narrow the dataset to the data that matches your conditions and leave out the rest.
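To make the idea of stacking several filters concrete, here is a minimal sketch in plain Python. It assumes that multiple filters are combined with AND, which this guide does not state, and the `(column, operator, value)` filter shape and the `apply_filters` helper are hypothetical, not part of AI Studio.

```python
import operator

# Comparison operators a filter may use (an illustrative subset).
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def apply_filters(rows, filters):
    """Keep rows that satisfy every (column, op, value) filter (AND is assumed)."""
    return [
        row
        for row in rows
        if all(OPS[op](row[col], value) for col, op, value in filters)
    ]


rows = [
    {"id": 1, "country": "UK", "age": 36},
    {"id": 2, "country": "FI", "age": 28},
    {"id": 3, "country": "UK", "age": 19},
]

# Two conditions: country equals "UK" and age is at least 21.
filtered = apply_filters(rows, [("country", "==", "UK"), ("age", ">=", 21)])
print(filtered)
```

With these two conditions only the first row remains, since the second fails the country check and the third fails the age check.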
Step 4: Anonymization
- Data Anonymization: In this step, apply anonymization to sensitive data for privacy reasons.
- Set transformers on fields that contain sensitive values, such as personal information.
- Choose the appropriate anonymization strategy based on your data privacy needs.
Anonymization helps keep sensitive data protected while leaving the dataset usable for analysis.
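The sketch below shows one way per-column transformers could work, assuming a transformer is simply a function applied to every value in a column. The `mask_value`, `hash_value`, and `anonymize` helpers and the salt are hypothetical and are not AI Studio's built-in transformers. Note that salted hashing is closer to pseudonymization than to full anonymization, so the right strategy depends on your privacy requirements.

```python
import hashlib


def mask_value(value):
    """Keep the first character and replace the rest with asterisks."""
    text = str(value)
    return text[:1] + "*" * (len(text) - 1)


def hash_value(value, salt="example-salt"):
    """Replace a value with a short salted SHA-256 digest (illustrative salt)."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:12]


def keep_value(value):
    """Default transformer: leave the value untouched."""
    return value


def anonymize(rows, transformers):
    """Apply the transformer registered for each column; others pass through."""
    return [
        {col: transformers.get(col, keep_value)(val) for col, val in row.items()}
        for row in rows
    ]


rows = [{"id": 1, "name": "Ada", "email": "ada@example.com"}]
anonymized = anonymize(rows, {"name": mask_value, "email": hash_value})
print(anonymized)
```

Columns without a registered transformer, such as `id` here, are copied through unchanged, and the same input always hashes to the same output, so joins on the hashed column still line up.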
Step 5: Export
- Export the Dataset: In the final step, you can export the anonymized dataset in your preferred format:
- JSON
- CSV
- JSONL
You can also choose the export type, such as creating a table or exporting the dataset file directly. Once you have chosen the format and export type, generate the final dataset from your configuration.
Export the anonymized dataset in the format you need for further use.
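For reference, the three formats differ mainly in layout: JSON holds one array of records, JSONL holds one JSON object per line, and CSV holds a header row followed by one line per record. The sketch below produces each from the same rows using only the Python standard library; the helper names are hypothetical, and the exact output written by AI Studio may differ in details such as indentation or quoting.

```python
import csv
import io
import json


def to_json(rows):
    """Serialize all rows as a single JSON array."""
    return json.dumps(rows, indent=2)


def to_jsonl(rows):
    """Serialize each row as one JSON object per line."""
    return "\n".join(json.dumps(row) for row in rows)


def to_csv(rows):
    """Serialize rows as CSV with a header taken from the first row's keys."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [{"id": 1, "country": "UK"}, {"id": 2, "country": "FI"}]

print(to_json(rows))
print(to_jsonl(rows))
print(to_csv(rows))
```

JSONL is often convenient for large datasets because each line can be read and processed independently, while CSV suits flat, tabular data.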
Generating Your Dataset
After completing the five steps, you can generate your customized and anonymized dataset in the format you’ve selected. Whether you choose JSON, CSV, or JSONL, the generated dataset will be ready for use in your project.