Common data error types

Attribute type error, error_type: PROPERTY_WITH_WRONG_TYPE

The reported attribute type must be consistent with the attribute's original type in the system. If the types are inconsistent, the system tries to force-convert the value; if the conversion fails, an error is reported. For example, if an attribute of the datetime type is sent with an empty string value, an error is reported.
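
For illustration, the sketch below (Python; the event name and the register_time property are hypothetical) shows a record that would trigger this error, alongside a corrected version:

    # Minimal sketch: the event name and the register_time property are
    # hypothetical; only the record shape follows the JSON format used
    # elsewhere in this document.
    import json

    bad_record = {
        "distinct_id": "a", "time": 1471586368135, "type": "track",
        "event": "ViewProduct",
        # register_time was created as a datetime property; an empty string
        # cannot be force-converted, so PROPERTY_WITH_WRONG_TYPE is reported.
        "properties": {"register_time": ""},
    }
    good_record = {
        "distinct_id": "a", "time": 1471586368135, "type": "track",
        "event": "ViewProduct",
        # A well-formed datetime string converts successfully.
        "properties": {"register_time": "2021-01-01 12:00:00"},
    }
    print(json.dumps(bad_record))
    print(json.dumps(good_record))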

The data's time is not within the valid data interval, error_type: EXPIRED_RECORD

Cause: by default, app data whose time is more than 10 days in the past or more than 1 hour in the future is treated as expired. When this error is reported, the entire record is not stored.
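
As an illustration of the default window described above (10 days back, 1 hour ahead), the following sketch checks whether a millisecond timestamp would be accepted; the helper is not part of any SDK:

    # Illustrative check of the default receiving window; not part of any SDK.
    from datetime import datetime, timedelta

    def is_within_default_window(event_time_ms, now=None):
        now = now or datetime.now()
        event_time = datetime.fromtimestamp(event_time_ms / 1000)
        return now - timedelta(days=10) <= event_time <= now + timedelta(hours=1)

    # A record whose time is 11 days old would be rejected as EXPIRED_RECORD.
    old_ms = int((datetime.now() - timedelta(days=11)).timestamp() * 1000)
    print(is_within_default_window(old_ms))  # False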

Solution:

(1) If only a small amount of data expires because it was reported late and cannot be stored, this is normal and no action is required.

(2) If a large amount of data is delayed, confirm the app's usage scenario with the customer. If the customer's app really does report a lot of data more than 10 days late, ask Sensors Analytics operations staff to adjust the valid time window. Note: the customer must be informed that after the window is modified, querying data from several days ago may return different results at different times.

Reasons for setting the valid receiving window on the APP side:

(1) To keep business metrics stable: if the time range for late-reported data is not limited, delayed data can cause the same query to return different results at different times. The default of accepting data from the last 10 days and up to 1 hour in the future is based on industry experience; it lets data within a reasonable time range enter the system while keeping historical query results stable.
(2) Implementation: the HDFS files that hold the underlying data do not support appending and can only be written once, so storing newly arrived data requires creating new files. If the validity period of received data were not limited, new files would constantly be created in the underlying storage just to hold late-arriving data.

In summary, setting a valid time range for incoming data is the more reasonable choice.

An attribute already exists whose name differs only in case from the attribute you want to create, error_type: PROPERTY_NAME_CASE_INSENSITIVE_DUPLICATE

The error indicates that another attribute already exists whose name is identical except for letter case. Rename the newly reported attribute: either make it exactly match the existing attribute's name or use a completely new attribute name.
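
A simple pre-flight check along these lines could look like the sketch below (illustration only; the existing property names are made up):

    # Detect names that differ from an existing property only in letter case.
    existing_properties = {"OrderAmount", "user_level"}

    def case_insensitive_clash(new_name, existing=existing_properties):
        lowered = {name.lower(): name for name in existing}
        clash = lowered.get(new_name.lower())
        return clash if clash and clash != new_name else None

    print(case_insensitive_clash("orderamount"))  # 'OrderAmount': rename before reporting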

The preset property (starting with $) does not exist, error_type: PRESET_PROPERTY_NAME_INVALID

User-defined properties cannot start with $. User-defined properties that start with $ are discarded (they do not appear in the JSON error message), and an error is shown in the tracking management page; the other fields are stored normally. The error_reason field in the JSON data describes which property name could not be created.
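
For illustration, a client-side cleanup step might drop such keys before reporting. This is a sketch only; KNOWN_PRESET is a partial, assumed list, not the product's full set of preset properties:

    # Drop user-defined keys that start with '$' before reporting.
    KNOWN_PRESET = {"$os", "$app_version", "$ip"}  # assumed partial list

    def drop_invalid_dollar_keys(properties):
        return {
            key: value for key, value in properties.items()
            if not key.startswith("$") or key in KNOWN_PRESET
        }

    print(drop_invalid_dollar_keys({"$os": "iOS", "$my_field": 1, "price": 9.9}))
    # {'$os': 'iOS', 'price': 9.9}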

Batch import the same directory multiple times

You are advised to use different directories for each batch import to facilitate re-import if a problem occurs.

Real-time data cannot be imported properly

  1. Check whether the target address that the SDK or LogAgent sends data to is filled in correctly. Pay attention to the port number and the /sa path; the address is generally http://sa_host:8106/sa (a reachability sketch follows this list).
  2. Check whether the data conforms to the required data format.
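
A quick reachability check of the receiving address can look like the following sketch (it does not validate the payload format, and sa_host is a placeholder for your own host):

    # Reachability sketch only; it does not validate the payload format.
    import urllib.error
    import urllib.request

    url = "http://sa_host:8106/sa"  # placeholder receiver address
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print("receiver reachable, HTTP status:", resp.status)
    except urllib.error.HTTPError as exc:
        # An HTTP error response still proves the port and path are reachable.
        print("receiver reachable, HTTP status:", exc.code)
    except (urllib.error.URLError, OSError) as exc:
        print("cannot reach receiver:", exc)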

The property cannot be seen on the frontend

Possible reasons:

  1. Type mismatch: using a different data type from before;
  2. Data is too long: for example, a string can be at most 255 bytes long;
  3. Invalid type: using a data type not defined in the data format, such as a property whose value is a JSON object. In this case, extract the fields of the JSON object as separate properties (see the sketch after this list);
  4. Non-intrinsic field with $ sign: only the keys of intrinsic fields defined in the data format can start with $.
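
For reason 3 in the list above, the fields of a JSON-object value can be extracted into separate properties, as in this sketch (the field names are hypothetical):

    # Extract the fields of a JSON-object value into separate properties.
    nested = {"order": {"id": "A001", "amount": 99.0}}

    flattened = {"order_" + key: value for key, value in nested["order"].items()}
    print(flattened)  # {'order_id': 'A001', 'order_amount': 99.0}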

You can check whether there is any erroneous data in event management; if there is, the specific reason is provided.

How to clear an event?

To clear an event, hide the event that is no longer needed and import a new event with a different name. Once there is data stored for an event, the event name cannot be deleted.

What is the purpose of track_signup?

The purpose of track_signup is to associate an anonymous ID with a login ID. Only for this type is original_id a required and meaningful field. distinct_id is also required; if it is missing, the data is considered invalid.
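
For illustration, a track_signup record might look like the sketch below; the IDs and the event name are placeholders, and the record shape follows the JSON format shown in the "Data Token" section below:

    # Sketch of a track_signup record; IDs and event name are placeholders.
    import json
    import time

    signup_record = {
        "type": "track_signup",
        "event": "$SignUp",              # placeholder event name
        "time": int(time.time() * 1000),
        "distinct_id": "login_id_123",   # the login ID (required)
        "original_id": "anonymous_abc",  # the previous anonymous ID (required)
        "properties": {},
    }
    print(json.dumps(signup_record))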

What is the role of a data import token?

Starting from Sensors Analytics 1.6, we have added data import tokens. There are currently two types of tokens:

  • Normal Token: This token can be used to import events that already exist (with only existing attributes) and user property data. If the event does not exist or if a certain attribute of the event or user has not been established in Sensors Analytics, the data will be discarded;
  • Super Token: This token can be used to import data as well as create events and attributes that do not exist in Sensors Analytics;

Others:

  • Custom data import tokens are not supported in the cloud version;
  • By default, the values of Normal Token and Super Token are empty;
  • Different projects can have different tokens;

Check the Token type of the data

The process of determining which Token to use for a piece of data is as follows:

  1. Initialize: mark the data as Invalid Data;
  2. Compare the data's Token with the Normal Token. If the values are the same or the Normal Token value is empty, mark the data as Normal Data;
  3. Compare the data's Token with the Super Token. If the values are the same or the Super Token value is empty, mark the data as Super Data;

In this 3-step process, a piece of data will try to obtain the maximum permission available. If it still ends up as "Invalid Data", the data will be discarded.

Where:

  • Super Data: can create events, attributes, etc. based on the data;
  • Normal Data: can be imported but cannot create events or attributes; if the data contains non-existent events or attributes, it is discarded;
  • Invalid Data: the data is discarded.

For example:

| Super Token | Normal Token | Data Token | Reason for the type                        | Data type    |
|-------------|--------------|------------|--------------------------------------------|--------------|
| (empty)     | (empty)      | (empty)    | "Super Token value is empty" in step 3     | Super Data   |
| (empty)     | (empty)      | ABC        | "Super Token value is empty" in step 3     | Super Data   |
| (empty)     | 123          | (empty)    | "Super Token value is empty" in step 3     | Super Data   |
| (empty)     | 123          | ABC        | "Super Token value is empty" in step 3     | Super Data   |
| (empty)     | 123          | 123        | "Super Token value is empty" in step 3     | Super Data   |
| XYZ         | (empty)      | (empty)    | "Normal Token value is empty" in step 2    | Normal Data  |
| XYZ         | (empty)      | ABC        | "Normal Token value is empty" in step 2    | Normal Data  |
| XYZ         | (empty)      | XYZ        | "Same as the Super Token value" in step 3  | Super Data   |
| XYZ         | 123          | (empty)    | Still marked as invalid after step 1       | Invalid Data |
| XYZ         | 123          | ABC        | Still marked as invalid after step 1       | Invalid Data |
| XYZ         | 123          | 123        | "Same as the Normal Token value" in step 2 | Normal Data  |
| XYZ         | 123          | XYZ        | "Same as the Super Token value" in step 3  | Super Data   |

Data Token

The process of obtaining the data's Token is as follows:

  1. Retrieve the 'token' field in the data (at the same level as the 'type' field), for example:

    {"distinct_id":"a","time":1471586368135,"type":"track","event":"ViewProduct","properties":{},"token":"my_token"}
  2. If it is not retrieved in the previous step, retrieve the token value from the data access URL, for example: http://SA_HOST:8106/sa?token=my_token

  3. If it is still not retrieved, consider the value as empty.
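
The lookup order can be sketched as follows (illustration only; it assumes the record has already been parsed into a dictionary):

    # Sketch of the lookup order: the record's own "token" field first,
    # then the token query parameter of the access URL, otherwise empty.
    from urllib.parse import parse_qs, urlparse

    def get_data_token(record, access_url):
        if record.get("token"):
            return record["token"]
        params = parse_qs(urlparse(access_url).query)
        return params.get("token", [""])[0]

    record = {"distinct_id": "a", "type": "track", "event": "ViewProduct", "properties": {}}
    print(get_data_token(record, "http://SA_HOST:8106/sa?token=my_token"))  # my_token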

Set Token

Use Project Management to set Normal Token and Super Token.

An error is found in tracking management when importing with the SDK; how do I retrieve the erroneous data?

Error logs are kept in the system's log directory for a period of time; you need to log in to the machine to view them. The specific steps are as follows:

  1. Log in to the machine and switch to the sa_cluster account. Note that for the cluster version you need to log in to every machine to collect all the error data. The cloud version currently does not allow logging in to the backend machines; contact us to perform the operation.
  2. Get the log directory.
    1. If the Sensors Analytics major version is earlier than 1.14, run grep 'sensors_analytics.log_dir' ~/sa/conf/sensors_analytics.property, go to the directory in the result and cd extractor; you will find a series of files whose names start with invalid_records, which are the retained error logs.
    2. If the Sensors Analytics version is 1.14, run grep 'sensors_data.log_dir' ~/conf/sensors_data.property, go to the directory in the result and cd sa/extractor; you will find a series of files whose names start with invalid_records, which are the retained error logs.
    3. If the Sensors Analytics version is 1.15 or later, run grep 'sensors_data.log_dir' ~/conf/sensors_data.property, go to the directory in the result and cd sp/extractor; you will find a series of files whose names start with invalid_records, which are the retained error logs.
    4. If the Sensors Analytics version is 1.17 or later, run grep 'sensors_data.log_dir' ~/conf/sensors_data.property and go to the directory in the result. If the sdf directory exists, cd sdf/extractor; otherwise cd sp/extractor. You will find a series of files whose names start with invalid_records, which are the retained error logs.
    5. If no invalid_records file is found in the directory from step 4, the erroneous data is saved to Kafka (after installing SDF 2.2+) and must be queried with the sdfadmin invalid_record command.
      1. Read (the terminal exits after reading): sdfadmin invalid_record read [--start] [--end]
        1. --start is optional: the start time (yyyy-MM-dd HH:mm:ss or yyyy-MM-dd). By default, reading starts from the earliest saved data; data is stored for at most 7 days.
        2. --end is optional: the end time (yyyy-MM-dd HH:mm:ss or yyyy-MM-dd). Defaults to the latest data.
        3. For example, read the invalid_record data of the entire cluster from 2021-01-01 to now (at most 7 days):

          sdfadmin invalid_record read --start 2021-01-01 | grep xxx > $SENSORS_DATAFLOW_LOG_DIR/kafka.log

          Here xxx is the filter condition; the matching records are written to $SENSORS_DATAFLOW_LOG_DIR/kafka.log.


      2. Continuous read (the terminal keeps waiting): sdfadmin invalid_record tailf
        1. For example, continuously consume the invalid_record data of the entire cluster:

          sdfadmin invalid_record tailf

Note:

  1. Error logs are only retained for a limited period of time, 14 days by default. However, if there is a disk usage alert, some logs may be cleaned up earlier.
  2. Error log files contain all errors from all projects. Please parse and select the logs you need.
  3. When data from an extractor's invalid_records log is re-imported, the error_type and error_reason fields must be removed during data cleaning; otherwise the re-import will fail (a cleaning sketch follows these notes).
  4. You can ask the on-duty team for the version number of the Sensors Analytics platform.
  5. The sdfadmin invalid_record tool retains data for a maximum of 7 days.
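
As mentioned in note 3, the following sketch strips the error_type and error_reason fields before re-import (the file names are placeholders, and each log line is assumed to hold one JSON record):

    # Sketch for note 3: remove error_type and error_reason before re-import.
    import json

    with open("invalid_records.log") as src, open("reimport.json", "w") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            record.pop("error_type", None)
            record.pop("error_reason", None)
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")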