1. Overview

File Transfer Protocol (FTP) is a standard network protocol for transferring files between computers. An FTP setup consists of two parts:

  • FTP server: stores files and makes them available over FTP.
  • FTP client: connects to the FTP server over FTP to access its resources.

After configuring the FTP data source, you can use the Data Fusion > Task Management function to import its data into a Sensors data table or entity, so that it can be used in reports, analysis models, intelligent operations, and other modules.

Before configuring the data source, confirm that your FTP data source meets the following requirements:

  • Data source type: Object storage
  • Data source name: FTP
  • Version/protocol requirements: FTP or SFTP
  • User permission requirements: at least read access to the folder path
  • Other requirements: data files must be in txt or csv format

2. Add FTP data source

  1. Select Data Fusion > Universal Data Access > Data Source Management.
  2. Click the All Data Sources tab.
  3. Click the FTP data source.
  4. Click the Create button in the upper right corner.
  5. Fill in the FTP connection information.
    1. Data Source Connection Name: This is a customizable field and serves as the unique identifier for the data source connection within the platform.
    2. Protocol Type: Supports FTP and SFTP.
    3. Server: The IP address of the FTP server; multiple addresses are supported in a cluster environment.
    4. Port Number: The port number for the data source connection.
    5. Base Path: This path is the absolute path to the root directory, for example: /home/sa_cluster.
    6. File Type: Specifies the types of data files to be read; currently supports txt and csv. During data synchronization, only files of the specified types will be read.
    7. Username: The valid username for the data source connection.
    8. Password: The valid password corresponding to the username.
  6. Click the Test Connection button.
  7. Click the Submit button.
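The Test Connection step can be approximated outside the platform when troubleshooting. The following is a minimal Python sketch, assuming plain FTP (SFTP would need an SSH library, which is not shown) and using the standard-library ftplib module. FtpConfig and check_connection are hypothetical names, not part of the product. The sketch logs in with the configured username and password and then changes into the Base Path, which is roughly what a successful test should confirm.

```python
import ftplib
from dataclasses import dataclass


@dataclass
class FtpConfig:
    # Hypothetical container mirroring the connection form fields.
    server: str
    port: int
    base_path: str
    username: str
    password: str


def check_connection(cfg: FtpConfig, ftp_factory=ftplib.FTP, timeout: float = 10.0) -> bool:
    """Return True if we can connect, log in, and enter the base path over plain FTP."""
    ftp = ftp_factory()  # injectable so the check can be exercised without a real server
    try:
        ftp.connect(cfg.server, cfg.port, timeout=timeout)
        ftp.login(cfg.username, cfg.password)
        ftp.cwd(cfg.base_path)  # fails if the path is missing or not accessible
        return True
    except ftplib.all_errors:
        return False
    finally:
        ftp.close()  # safe to call even if connect() never succeeded
```

For example, `check_connection(FtpConfig("192.0.2.10", 21, "/home/sa_cluster", "user", "secret"))` returns True only if all three steps succeed. Injecting `ftp_factory` keeps the network dependency out of the logic, so the same function can be checked against a stub.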

2.1. FTP Dataset Configuration Method

To ingest data from an FTP data source, organize paths, folders, and files as follows.

2.1.1. Path Rule Definition

When importing a data set, configure the path according to the following rule, for example: /home/dataGroupFile/dataFile

  • /home: the base path
  • dataGroupFile: the data set group folder
  • dataFile: the data set folder
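The path rule can be expressed as a small parser. This Python sketch assumes the full path is always exactly {base path}/{group folder}/{data set folder}; split_dataset_path and DatasetPath are illustrative names only.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class DatasetPath(NamedTuple):
    base_path: str  # e.g. /home (the connection's Base Path)
    group: str      # data set group folder, e.g. dataGroupFile
    dataset: str    # data set folder, e.g. dataFile


def split_dataset_path(full_path: str, base_path: str) -> DatasetPath:
    """Split {base_path}/{group}/{dataset} into its three parts."""
    full = PurePosixPath(full_path)
    base = PurePosixPath(base_path)
    rel = full.relative_to(base)  # raises ValueError if full_path is not under base_path
    if len(rel.parts) != 2:
        raise ValueError(f"expected <group>/<dataset> under {base}, got {rel}")
    group, dataset = rel.parts
    return DatasetPath(str(base), group, dataset)
```

With the example above, `split_dataset_path("/home/dataGroupFile/dataFile", "/home")` yields `DatasetPath(base_path='/home', group='dataGroupFile', dataset='dataFile')`.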

2.1.2. Contents of a single data set

A data set consists of the following items:

01 Data set group folder (required)
  • Purpose: Groups data sets, playing a role similar to a database (DB) in a structured database.
  • Constraints: The name is not restricted and can be customized.

02 Data set folder (required)
  • Purpose: Each folder represents one data set and contains three types of files: a metadata file, data files, and ready files.
  • Constraints: The folder name can contain only letters, digits, and underscores (_), must start with a letter, and can be at most 100 characters long.
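The folder naming rule maps directly onto a regular expression. A minimal Python sketch, assuming "letters" means ASCII A-Z and a-z; is_valid_name is a hypothetical helper, not a platform function. The same rule applies to metadata field names.

```python
import re

# One leading letter, then up to 99 letters, digits, or underscores (1 to 100 characters total).
# Assumption: "letters" means ASCII letters only.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,99}")


def is_valid_name(name: str) -> bool:
    """Check a data set folder name or metadata field name against the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None
```

Using `fullmatch` rather than `match` ensures that trailing characters such as a hyphen cause the whole name to be rejected.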

03 Metadata file (required)
  • Purpose: Describes the data structure of the current data set. A data set folder can contain only one metadata file.
  • Constraints:
    • File format: yml
    • File name: the same as the data set folder name
    • Field names: can contain only letters, digits, and underscores, must start with a letter, and can be at most 100 characters long
    • Field types: for the field types that can be set, see 4. Mapping rules for field types
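A quick pre-upload check for the metadata file rules might look like the following Python sketch. It assumes the metadata file uses the .yml extension and is the only .yml file in the folder; the YAML content itself is not validated. find_metadata_file is a hypothetical name.

```python
from pathlib import PurePosixPath


def find_metadata_file(folder_name: str, file_names: list[str]) -> str:
    """Return the data set's metadata file name, or raise ValueError if the rules are broken."""
    # Assumption: metadata files end in .yml and no other .yml files live in the folder.
    yml_files = [n for n in file_names if PurePosixPath(n).suffix == ".yml"]
    if len(yml_files) != 1:
        raise ValueError(f"expected exactly one .yml metadata file, found {len(yml_files)}")
    meta = yml_files[0]
    if PurePosixPath(meta).stem != folder_name:
        raise ValueError(f"metadata file must be named {folder_name}.yml, got {meta}")
    return meta
```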

04 Data file (optional)
  • Purpose: Stores the data of the current data set. A data set folder can contain multiple data files.
    Note: Data files must strictly follow the naming rule below; otherwise, the system cannot read them.
  • Constraints:
    • File format: txt or csv
    • Separator: comma
    • Character encoding: UTF-8
    • File naming rule: {fileName}_{dataTime}, where:
      • fileName: the same as the data set folder name
      • dataTime: the business time of the data in the file. It must follow the last underscore in the file name, in the format {yyyymmddHH}, where HH is the hour in 24-hour format (00-23).
  • Update modes: FTP data sources support only three update modes: full overwrite, full append, and incremental append. By default, incremental append reads data incrementally, using {dataTime} as the incremental identifier.
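The naming rule and the default incremental identifier can be sketched together in Python. This sketch assumes each data file carries a .txt or .csv extension after {fileName}_{dataTime}, and it models incremental append as "read files whose dataTime is later than the last one already read"; the platform's actual behavior may differ. parse_data_file_name and select_incremental are hypothetical names.

```python
from datetime import datetime
from pathlib import PurePosixPath
from typing import Optional


def parse_data_file_name(name: str, dataset: str) -> Optional[datetime]:
    """Return the dataTime in '{fileName}_{dataTime}.<ext>', or None if the name is invalid."""
    path = PurePosixPath(name)
    if path.suffix not in (".txt", ".csv"):  # assumption: extension matches the File Type
        return None
    file_name, sep, data_time = path.stem.rpartition("_")  # dataTime follows the LAST underscore
    if not sep or file_name != dataset or len(data_time) != 10 or not data_time.isdigit():
        return None
    try:
        return datetime.strptime(data_time, "%Y%m%d%H")  # yyyymmddHH, HH in 00-23
    except ValueError:
        return None


def select_incremental(names: list[str], dataset: str, last_read: Optional[datetime]) -> list[str]:
    """Hypothetical watermark logic: valid files newer than last_read, oldest first."""
    picked = []
    for n in names:
        t = parse_data_file_name(n, dataset)
        if t is not None and (last_read is None or t > last_read):
            picked.append((t, n))
    return [n for _, n in sorted(picked)]
```

Because `rpartition` splits on the last underscore, data set names that themselves contain underscores (such as `my_data`) are still parsed correctly.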

05 Ready file (required when the data set contains data files)
  • Purpose: Signals that the corresponding data file has been fully generated and can be read. The content of the file is not restricted.
    Note: Each data file corresponds to one ready file; data files without a ready file are not read.
  • Constraints:
    • File format: verf
    • File name: the same as the name of the corresponding data file
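The ready-file rule amounts to pairing each data file with a marker file of the same name. A minimal Python sketch, assuming a ready file shares its data file's base name and swaps the extension for .verf (for example, dataFile_2023061509.csv paired with dataFile_2023061509.verf); readable_data_files is a hypothetical name.

```python
from pathlib import PurePosixPath


def readable_data_files(file_names: list[str]) -> list[str]:
    """Return the data files that have a matching ready file, sorted by name."""
    # Assumption: ready file = data file base name + ".verf".
    ready_stems = {PurePosixPath(n).stem for n in file_names if PurePosixPath(n).suffix == ".verf"}
    return sorted(
        n for n in file_names
        if PurePosixPath(n).suffix in (".txt", ".csv") and PurePosixPath(n).stem in ready_stems
    )
```

Writing the ready file only after the data file upload finishes is what lets a reader like this skip files that are still being transferred.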

3. Manage FTP data source

  1. Select Data Fusion > Universal Data Access > Data Source Management.
  2. Click the Added Data Sources tab.
  3. Click the FTP data source. The following operations are available:
    1. Edit: Modify any of the connection's configuration parameters.
    2. Delete: Delete the current connection.

If the current data connection is used by a task, modifying the parameters or deleting the connection will cause the task to fail.

4. Mapping rules for field types

When importing data from an FTP data source into a Sensors data table, incorrect field type mappings may cause content conversion errors or task failures. To ensure safe data conversion, configure field mappings according to the following rules:

Original field type    Data table field type
tinyint                NUMBER / INT / BIGINT
smallint               NUMBER / INT / BIGINT
mediumint              NUMBER / INT / BIGINT
int                    NUMBER / INT / BIGINT
bigint                 NUMBER / BIGINT
float                  NUMBER
double                 NUMBER
decimal                NUMBER
char                   STRING
enum                   STRING
longtext               STRING
mediumtext             STRING
string                 STRING
text                   STRING
tinytext               STRING
varchar                STRING
year                   STRING
date                   TIMESTAMP
datetime               TIMESTAMP
timestamp              TIMESTAMP
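The mapping table can be encoded as a lookup so that a planned mapping is checked before a task runs. This Python sketch copies the table above; ALLOWED_TARGETS and is_safe_mapping are hypothetical names, and types not in the table are treated as unsafe.

```python
# Allowed data table field types per original field type, copied from the table above.
ALLOWED_TARGETS: dict[str, tuple[str, ...]] = {
    **dict.fromkeys(("tinyint", "smallint", "mediumint", "int"), ("NUMBER", "INT", "BIGINT")),
    "bigint": ("NUMBER", "BIGINT"),
    **dict.fromkeys(("float", "double", "decimal"), ("NUMBER",)),
    **dict.fromkeys(
        ("char", "enum", "longtext", "mediumtext", "string", "text", "tinytext", "varchar", "year"),
        ("STRING",),
    ),
    **dict.fromkeys(("date", "datetime", "timestamp"), ("TIMESTAMP",)),
}


def is_safe_mapping(source_type: str, target_type: str) -> bool:
    """Return True if mapping source_type to target_type follows the table (case-insensitive)."""
    return target_type.strip().upper() in ALLOWED_TARGETS.get(source_type.strip().lower(), ())
```

Note the one asymmetry in the table: bigint cannot map to INT, presumably because its values may not fit.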