Data Store Import and Export


Introduction

Catalyst enables you to perform bulk read and bulk write operations on the tables in the Catalyst Data Store using the CLI.

  • Data Store Import: The import operation can fetch thousands of records from a CSV file uploaded in the File Store and bulk write them to a specific table in the Data Store. After the import operation has executed successfully, a CSV report file containing the details of the job execution will be generated at a download URL.
  • Data Store Export: The export operation can bulk read thousands of records from a specific table in the Data Store, and generate a CSV file containing the results of the read operation if the job is successful.

The specific requirements for both import and export operations can be defined in an optional input JSON configuration file, which you can pass along with your CLI commands. If you do not pass a configuration file, you must specify the mandatory information in the options of the command executions.

You can configure a callback URL for both operations to have the job status responses automatically posted to it. You can also manually check the job execution status whenever you need.
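As an illustration, a minimal HTTP receiver for these status callbacks could look like the following Python sketch. The port, endpoint behavior, and payload fields are assumptions; the documentation only guarantees that a JSON body describing the job is sent to the callback URL via HTTP POST.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def parse_status(body: bytes) -> dict:
    """Decode the JSON job-status payload POSTed by Catalyst."""
    return json.loads(body.decode("utf-8"))

class CallbackHandler(BaseHTTPRequestHandler):
    """Accepts the POSTed job-status JSON and acknowledges it."""
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status = parse_status(self.rfile.read(length))
        print("Job status update:", status)
        self.send_response(200)  # acknowledge receipt
        self.end_headers()

# To run the receiver locally (blocking call), uncomment:
# HTTPServer(("", 8000), CallbackHandler).serve_forever()
```

The URL you register as the callback must be reachable from the internet for Catalyst to post to it.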

Note: Catalyst also provides APIs for Data Store import and export operations. You can create the jobs, check their statuses, and download the results using individual APIs for both operations. Refer to the Bulk Read API and Bulk Write API documentation for more information.
 

Data Store Import

The general syntax of the Data Store import command is:

catalyst ds:import [CSV_file_path] [options]
 

Before we discuss the command execution, let us look at the operations and configurations available for this command in detail.

Operations

There are three operations that can be performed using the Data Store import command:

  • Insert: Inserts new rows of data into the table from the CSV file
  • Update: Updates existing rows of data in the table
  • Upsert: Enables both insert and update operations at the same time. If the record already exists, the row is updated. If the record does not exist, a new row with the data is inserted.
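To make the three operations concrete, here is a small Python sketch of their semantics against an in-memory table keyed by a hypothetical find_by column. The real operations run server-side in the Data Store; this is only an illustration of the insert/update/upsert rules described above.

```python
def apply_operation(table, rows, operation, find_by="EmployeeID"):
    """table: dict mapping find_by value -> record; rows: records read from the CSV."""
    for row in rows:
        key = row[find_by]
        exists = key in table
        if operation == "insert":
            if not exists:            # insert skips records that already exist
                table[key] = row
        elif operation == "update":
            if exists:                # update touches only existing records
                table[key] = row
        elif operation == "upsert":   # update if present, insert otherwise
            table[key] = row
    return table
```

For example, running "upsert" over a table that already contains a record with the same EmployeeID overwrites that record and inserts any new ones.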

You can specify the operation to be executed in the input JSON configuration file.

Input CSV File

As mentioned earlier, the import operation bulk writes records from a CSV file to a table. You can provide the CSV file in two ways:

  1. You can upload the CSV file containing the records in the File Store before you begin executing this operation. In this case, you must specify the unique ID generated for the file in the File Store, in the JSON configuration file that you pass during the command execution.
  2. If you do not pass any JSON configuration file with the command, you must specify the path of the CSV file in your system in your command execution, as shown in the general syntax above. Catalyst will then upload the CSV file you provide to your project's File Store and automatically obtain its File ID for the execution. You can select the folder to save the file in the File Store.

Catalyst gives the CSV file specified with the command higher priority over the File ID specified in the JSON file. If the CSV file is not specified in either place, the CLI will display an error message during command execution.

Note:
  1. The CSV file's first row must be the header row. In other words, the column names must be present as the header row.
  2. The CSV file must contain data in the format that adheres to CSV file standards. If the file is not in the standard format, the data written to the table may be corrupted.
  3. You can insert up to 5000 records in each table per project in the development environment. If the CSV file contains more records than that, only the first 5000 records will be written to the table; the rest will be skipped in the development environment. There are no upper limits for data insertion in the production environment.
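For instance, an illustrative standards-compliant CSV file, with the column names as the mandatory header row, might look like this (the column names and values are placeholders):

```csv
EmployeeID,Name,Department
1001,Anita Desai,Marketing
1002,Rahul Mehta,Sales
```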
 

JSON Configuration File Structure

A sample JSON configuration file specifying the requirements of the import operation, which you can optionally pass during the command execution, is shown below:

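The following is an illustrative configuration built from the parameters documented below. All values are placeholders, and the representation of fk_mapping as a list of mapping objects is an assumption based on the format described for that parameter.

```json
{
  "table_identifier": "EmpDetails",
  "operation": "upsert",
  "file_id": 10395000000123456,
  "find_by": "EmployeeID",
  "fk_mapping": [
    {
      "local_column": "DepartmentID",
      "reference_column": "DeptID"
    }
  ],
  "callback": {
    "url": "https://example.com/import-status",
    "headers": {
      "src": "ZohoCatalyst"
    },
    "params": {
      "project_name": "EmployeeDB"
    }
  }
}
```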
The parameters supported by the JSON configuration file are defined below:

  • table_identifier (String; mandatory): The unique ID or the name of the table that the data must be imported to. You can also pass the table identifier using the --table option.
  • operation (String; optional): The operation to be executed. Allowed values: insert, update, upsert. Default operation: insert.
  • file_id (Numerical; mandatory): The unique ID of the CSV file in the File Store. You can also specify the CSV file path with the command instead.
  • callback (JSON; optional): The section where you can define the properties of the callback URL, to which automatic JSON responses of the job statuses will be sent.

callback Properties

  • url (String; mandatory if you specify a callback): The URL to which the automatic responses will be sent using the HTTP POST method, each time the job status changes. Information about the job details will be contained in the body of the JSON response. The CLI will also display a live stream of the execution and the final job status.
  • headers (String; optional): Headers that you require to be passed in the JSON response to the callback URL. You must specify the headers as:
    {
      "{header_name_1}" : "{header_value_1}",
      "{header_name_2}" : "{header_value_2}"
    }
    and so on.
  • params (String; optional): Parameters that you require to be appended to the callback URL. You must specify the parameters as:
    {
      "{param_name_1}" : "{param_value_1}",
      "{param_name_2}" : "{param_value_2}"
    }
    and so on.
The remaining parameters of the configuration file are:

  • find_by (String; mandatory for update and upsert operations, optional for insert): The unique column using which the records are identified.

    For example, if you specify a unique column called 'EmployeeID' as the find_by value, Catalyst will search the records in the Data Store table using the EmployeeID value. For the update operation, Catalyst will update the rows that match the EmployeeID value from the CSV file. For the upsert operation, Catalyst will update existing rows, and insert new rows if there are no records with matching EmployeeID values.

    Update: This can be a column with unique values as configured in the Data Store, or the ROWID generated by Catalyst for the records.

    Upsert: You can only specify a column with unique values as configured in the Data Store. You cannot specify the ROWID as the find_by value, because upsert inserts new rows when matching ones are not found, and ROWID values are not available for rows that do not yet exist in the Data Store.

    Insert: If you specify a find_by column for an insert operation, any records matching the column values from the CSV file will be skipped and not inserted into the Data Store table, because insert cannot update a record that already exists.

  • fk_mapping (JSON; optional): Foreign key mapping for the columns that are primary keys in other tables. This is specified if any columns imported from the CSV file should be configured as foreign keys. You must map foreign keys in the following format:
    {
      "local_column" : "local_column_name_1",
      "reference_column" : "reference_column_name_1"
    }
    where local_column is the name of the column in the table where the bulk write operation is processed, and reference_column is the name of the column in the table in which it is a primary key.

Note: If you don't specify the mandatory CSV file and table identifier parameters in the JSON file, or if you don't pass a JSON file during the command execution, you must specify the CSV file path and the table identifier as import options.
 

Import Job Processing States

There are three job processing states for the import operation:

  • In-Progress: The job enters this state as soon as you execute the import command. The CLI will display a job_id parameter that you can use to check the status of the job execution for subsequent responses manually, if you have not configured a callback URL in the JSON.

    If you have configured a callback URL, a state response with the job_id will also be posted to the URL in the format you defined.
  • Success: If the import operation is successful, Catalyst will provide a download URL where you can download a CSV report file that contains the details of the job execution. Catalyst also enables you to download the report file directly to your system through the CLI. If you have configured a callback URL, this response will be posted to it as well.
  • Failed: If the import operation fails, the CLI will display the details of the errors that occurred. If you have configured a callback URL, this response will be posted to it as well.
 

Import Options

This section describes the import command execution process in detail for each option. The Data Store import command supports the following options:

  1. --config <path>

    The --config option enables you to define the path of the JSON configuration file in your system.

    For example, you can execute the import operation and pass the JSON file path as shown below:
    catalyst ds:import --config import_config.json

    The CLI will then automatically schedule the job execution and mark it as "in-progress". It will display a live stream of the execution process, along with the job_id.

    If the job executes successfully, the CLI will prompt you to download the report file to your system. Type 'y', then press Enter to download.

    The ZIP file containing the report file will be downloaded to your project directory.

    You can use the download URL provided by Catalyst to download the report even when you are not working in the CLI, by sending an HTTP request to the URL (for example, with an API client or cURL).
     
    Note: The download URL will be valid for 1 day after its generation.
    The CSV report file will list all the records that were skipped from being written, along with the reasons for skipping them.
  2. --table <name|id>

    The --table option enables you to specify the table name or Table ID of the table, to which the records must be written.

    As mentioned earlier, if you don't specify the table identifier in the JSON file, or if you don't pass a JSON file during the command execution, you must specify it using this option.

    Catalyst gives the table specified in the option higher priority over the one specified in the JSON file. If the table is not specified in either place, the CLI will display an error message during command execution.

    For example, you can specify the table identifier along with the CSV file path in the following way:
    catalyst ds:import EmpRecords.csv --table EmpDetails

    If you specify the CSV file path, you must select the folder in your project's File Store to which the file must be uploaded. The CLI will then execute the import operation, bulk write the records to the specified table, and provide the report file in the same way.
    Note: Your project's File Store must contain at least one folder to save the file in.
  3. --production

    The --production option enables you to directly execute the import operation in the production environment of your Catalyst project. If you use this option, the records from the input CSV file will be bulk written to the specified table directly in production.
     
    Note: You can only use this option if the production environment is already enabled for your project.
 

Data Store Export

The general syntax of the Data Store export command is:

catalyst ds:export [options]
 

JSON Configuration File Structure

The Data Store Export command will obtain records from the table you specify, which match the criteria you define in the input JSON configuration file. If you don't pass the JSON file with the command, or if you don't specify any criteria, all records from the table will be read by default.

A sample JSON configuration file specifying the requirements of the export operation, which you can optionally pass during the command execution, is shown below:

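The following is an illustrative configuration built from the parameters documented below. All values are placeholders, and the nesting of page, select_columns, and criteria under query reflects the parameter descriptions that follow.

```json
{
  "table_identifier": "EmpDetails",
  "query": {
    "page": 1,
    "select_columns": ["EmployeeID", "Name", "Department"],
    "criteria": {
      "group_operator": "AND",
      "group": [
        {
          "column_name": "Department",
          "comparator": "equal",
          "value": "Marketing"
        },
        {
          "column_name": "Name",
          "comparator": "starts_with",
          "value": "A"
        }
      ]
    }
  },
  "callback": {
    "url": "https://example.com/export-status"
  }
}
```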
The parameters supported by the JSON configuration file are defined below:

  • table_identifier (String; mandatory): The unique ID or the name of the table that the data must be exported from. You can also pass the table identifier using the --table option.
  • query (JSON; optional): The section where you can define the conditions and criteria for the export job.
  • page (Numerical; optional): The CSV file generated as the result of the bulk read process contains two hundred thousand (2 lakh) records per page. A page value of '1' indicates that the first 200,000 records matching your query will be exported. You can fetch subsequent records by increasing the page value. For example, to fetch records in the range 200,001 to 400,000, specify the page value as '2'. Default value: 1. You can also specify the page value using the --page option.
  • select_columns (Array; optional): Specific columns in the records that need to be fetched from the table. If you do not specify the columns, all columns will be selected and fetched. You can include multiple columns in an array.
  • criteria (JSON; optional): A set of conditions based on which the records will be fetched.

criteria Properties

  • group_operator (String; mandatory if you specify the criteria): The operator that binds the criteria groups together. Supported operators: AND, OR. If you define two or more criteria groups, you can use the AND or the OR operator. For example, you can specify a criteria like "column1 equal value 1 AND column2 contains value 2" in the format specified in the previous section.

    Note: You cannot combine both operators. You can use either AND or OR in a single command execution.
  • group (JSON; mandatory if you specify the criteria): You can include up to 25 criteria sets in this section. The sample JSON file shows 2 criteria sets.

    Note: You can include only one overall group with one group operator in a single execution.

group Properties

  • column_name (String; mandatory if you specify the criteria): Name of the specific column from the table that the criteria should be defined for.
  • comparator (String; mandatory if you specify the criteria): The comparison operator that matches the column name with the criteria value. Supported comparators: equal, not_equal, greater_than, greater_equal, less_than, less_equal, starts_with, ends_with, contains, not_contains, in, not_in, between, not_between.

    Note:
    1. You can specify multiple values for the in and not_in comparators, separated by commas.
    2. Catalyst supports only the equal comparator for an encrypted column. You will not be able to use other comparators as the criteria for an encrypted column.
  • value (String; mandatory if you specify the criteria): The value for the specific column in the record that you require to be defined as a criteria.

callback Properties

  • callback (JSON; optional): The section where you can define the properties of the callback URL, to which automatic JSON responses of the job statuses will be sent.
  • url (String; mandatory if you specify the callback URL): The URL to which the automatic responses will be sent using the HTTP POST method, each time the job status changes. Information about the job details will be contained in the body of the JSON response. The CLI will also display a live stream of the execution and the final job status.
  • headers (String; optional): Headers that you require to be passed in the JSON response to the callback URL. You must specify the headers as:
    {
      "{header_name_1}" : "{header_value_1}",
      "{header_name_2}" : "{header_value_2}"
    }
    and so on.
  • params (String; optional): Parameters that you require to be appended to the callback URL. You must specify the parameters as:
    {
      "{param_name_1}" : "{param_value_1}",
      "{param_name_2}" : "{param_value_2}"
    }
    and so on.

Note: If you don't specify the mandatory table identifier parameter in the JSON file, or if you don't pass a JSON file during the command execution, you must specify it using the --table option of the export command.
 

Export Job Processing States

There are three job processing states for the export operation:

  • In-Progress: The job enters this state as soon as you execute the export command. The CLI will display a job_id parameter which you can use to check the status of the job execution for subsequent responses manually, if you have not configured a callback URL in the JSON.

    If you have configured a callback URL, a state response with the job_id will also be posted to the URL in the format defined by you.
  • Success: If the export operation is successful, Catalyst will provide a download URL where you can download the CSV file that contains the records matching your query. Catalyst also enables you to download the result file directly to your system through the CLI. If you have configured a callback URL, this response will be posted to it as well.
  • Failed: If the export operation fails, the CLI will display the details of the errors that occurred. If you have configured a callback URL, this response will be posted to it as well.
 

Export Options

This section describes the export command execution process in detail for each option. The Data Store export command supports the following options:

  1. --config <path>

    The --config option enables you to define the path of the JSON configuration file in your system.

    For example, you can execute the export operation and pass the JSON file path as shown below:
    catalyst ds:export --config export_config.json

    The CLI will then automatically schedule the job execution and mark it as "in-progress". It will display a live stream of the execution process, along with the job_id.

    If the job executes successfully, the CLI will prompt you to download the result file to your system. Type 'y', then press Enter to download.

    The ZIP file containing the result file will be downloaded to your project directory.

    You can use the download URL provided by Catalyst to download the result file even when you are not working in the CLI, by sending an HTTP request to the URL (for example, with an API client or cURL).
     
    Note: The download URL will be valid for 1 day after its generation.
    The CSV result file will contain a list of all the records matching your export job query.
  2. --table <name|id>

    The --table option enables you to specify the table name or Table ID of the table from which the records must be read.

    As mentioned earlier, if you don't specify the table identifier in the JSON file, or if you don't pass a JSON file during the command execution, you must specify it using this option.

    Catalyst gives the table specified in the option higher priority over the one specified in the JSON file. If the table is not specified in either place, the CLI will display an error message during command execution.

    For example, you can specify the table identifier in the following way:
    catalyst ds:export --table EmpDetails

    The CLI will then execute the export operation, bulk read the records from the specified table, and provide the results in the same way.
  3. --page <page>

    As discussed in the JSON parameters section, you can specify a page value to indicate the range of records to fetch from the table.

    For example, you can fetch records from the range 200,001 to 400,000 by executing the following command:
    catalyst ds:export --page 2
  4. --production

    The --production option enables you to directly execute the export operation in the production environment of your Catalyst project. If you use this option, the records will be bulk read from the specified table in the production environment.
     
    Note: You can only use this option if the production environment is already enabled for your project.
 

Check the Status of an Import or Export Operation

You can manually check the status of a Data Store import or export operation that you initiated previously any time you need.

If you had not configured a callback URL in your JSON file, or if you had terminated the CLI import or export command execution, you can check the job status using this command. The CLI will inform you if the import or export job is in progress, has successfully executed, or failed.

You must use the job_id that was assigned to the import or export job when you initiated the operation, in this command.

The general syntax of the import or export job status check command is:

catalyst ds:status [options] <operation> [job_id]
 

The value for <operation> must be either import or export, and it is mandatory. The job_id is optional. If you don't specify the job_id, all jobs of the specified operation executed in the last 24 hours will be listed.

For example, you can list all export jobs by specifying the operation without a job_id:

catalyst ds:status export

The CLI will then display the list of export jobs executed in the last 24 hours.

You can then select a specific job id from the list and press Enter, to view its detailed status.

The CLI will follow the same process of providing a download URL and enabling you to download the response file if the job executed successfully, as discussed in the previous sections.

If you provide the job_id with the command, the CLI will directly fetch and display the status of that particular job.

The --production Option

The import/export status check command supports one option: --production

This option enables you to check the status of an operation that was executed in the production environment of your Catalyst project. If you used the --production option in an import or export command execution, you must use the option for checking its status as well.

Note: You can only use this option if the production environment is already enabled for your project.

