[Draft] Preliminary Whitelisting Policy

Preliminary Policy Draft: Project and URL Whitelisting for Data Integrity and Security

Objective: This draft outlines a proactive approach to whitelisting crypto projects and their associated URLs, so that our data pipeline, users, and brand are protected from risks such as scam or phishing links. The goal is to maintain data integrity and security while enabling a streamlined Quality Assurance (QA) process that can scale. This draft is a starting point for team discussions toward a formal policy.

1. Overview

To safeguard the integrity of our data, we will create a curated whitelist for the top 300 crypto projects. This list will include approved URLs for white papers, technical documents, and key resources. The URLs must meet specific criteria to ensure they are secure, trustworthy, and accessible without errors. This policy will also define a straightforward QA process to support the scaling of the dataset with outsourced talent.

2. Whitelisting Criteria

The following criteria will guide the selection of URLs for the whitelist:

  • Direct Access to Documents:

    • URLs must provide direct access to white papers and technical documents, either as PDF downloads or as files properly hosted on platforms like GitHub.

    • URLs should link directly to the correct HTML or GitBook pages without returning 403 (Forbidden) or 404 (Not Found) errors.

  • Security and Reliability:

    • URLs must not lead to phishing pages or redirect to suspicious domains. Every domain must be served over HTTPS and verified as legitimate.

  • Continuous Monitoring:

    • Approved URLs will be continuously monitored to ensure they remain compliant with the whitelisting criteria. Any issues will trigger a re-evaluation and potential removal from the whitelist.
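
As a starting point for discussion, the criteria above could be expressed as a small check. This is a minimal sketch, not an agreed implementation: the `APPROVED_DOMAINS` set, the `FetchResult` structure, and the `check_criteria` function are all hypothetical names, and the sketch assumes the fetch (including following redirects) has already happened elsewhere.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical set of verified domains; the policy does not name any yet.
APPROVED_DOMAINS = {"github.com", "example-project.org"}


@dataclass
class FetchResult:
    """Outcome of fetching a candidate URL (hypothetical structure)."""
    requested_url: str
    final_url: str     # URL reached after following any redirects
    status_code: int   # HTTP status of the final response


def check_criteria(result: FetchResult) -> list[str]:
    """Return the list of criteria violations; an empty list means the URL passes."""
    problems = []
    requested = urlparse(result.requested_url)
    final = urlparse(result.final_url)

    # Security: both the requested and the final URL must use HTTPS.
    if requested.scheme != "https" or final.scheme != "https":
        problems.append("not served over HTTPS")

    # Security: the URL must end up on a verified domain, even after redirects.
    if final.hostname not in APPROVED_DOMAINS:
        problems.append(f"final domain not approved: {final.hostname}")

    # Direct access: 403, 404, and any other HTTP error fail the check.
    if result.status_code >= 400:
        problems.append(f"HTTP error {result.status_code}")

    return problems
```

Returning a list of reasons rather than a single pass/fail flag is a design choice made here so that a reviewer can see why a URL was rejected.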

3. Quality Assurance Process [TBD]

To scale the dataset effectively while maintaining quality, we will implement a straightforward QA process:

  • Documentation for Outsourced Talent:

    • Develop clear guidelines and checklists to assist outsourced QA teams in validating URLs against the whitelisting criteria.

    • Regularly update the guidelines to reflect any changes in the criteria or processes.

  • Automated Validation Tools:

    • Use automated tools to periodically check the status of whitelisted URLs, identifying any errors or security issues.

  • Flagging and Review System:

    • Establish a system for flagging URLs that may no longer meet the criteria, triggering a manual review by the QA team.
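
The automated check and the flagging step could fit together roughly as follows. This is a sketch under assumptions: `Flag`, `sweep`, and the `checker` callable are hypothetical names, and the checker is passed in so that the actual validation logic (still TBD) can be swapped without changing the flagging code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable


@dataclass
class Flag:
    """A whitelisted URL that needs manual review (hypothetical structure)."""
    url: str
    reasons: list[str]
    flagged_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def sweep(urls: Iterable[str], checker: Callable[[str], list[str]]) -> list[Flag]:
    """Run the checker over every URL and collect the ones that report problems."""
    flags = []
    for url in urls:
        reasons = checker(url)
        if reasons:  # any reported problem sends the URL to manual review
            flags.append(Flag(url=url, reasons=reasons))
    return flags
```

A scheduler (cron or similar) would call `sweep` periodically and hand the resulting flags to the QA team's review queue; neither the schedule nor the queue is defined in this draft.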

4. Implications for Data Engineering

  • Data Pipeline Adjustments:

    • Data Engineers will create a validation layer within the data pipeline to ensure that only whitelisted projects and validated URLs are ingested. Automated checks against the whitelist should run before any data processing or storage.

    • Implement procedures for re-validating existing data periodically to maintain compliance with the whitelist, ensuring that any changes in project status are accurately reflected in the dataset.

  • Data Integrity and Historical Accuracy:

    • Develop methods for handling projects that fall off the whitelist, such as archiving historical data to preserve continuity while keeping the active database free of potentially compromised projects.

    • Ensure that all data transformations and validations adhere to the whitelisting policy, particularly for projects that enter or exit the top project lists.
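
One possible shape for the validation layer and the archiving step is sketched below. The record layout (dictionaries with `project` and `url` keys), the `WHITELIST` mapping, and both function names are assumptions made for illustration; the real pipeline schema is not defined in this draft.

```python
# Hypothetical whitelist: project identifier -> set of approved URLs.
WHITELIST = {
    "project-a": {"https://example.org/project-a/whitepaper.pdf"},
}


def split_by_whitelist(records, whitelist):
    """Partition incoming records into (accepted, rejected) before storage.

    A record is accepted only if its project is whitelisted and its URL
    is one of the URLs approved for that project.
    """
    accepted, rejected = [], []
    for record in records:
        approved_urls = whitelist.get(record["project"], set())
        if record["url"] in approved_urls:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected


def archive_delisted(active_records, whitelist):
    """Separate records of projects that have fallen off the whitelist.

    Returns (still_active, archived) so historical data is preserved
    while the active set only contains whitelisted projects.
    """
    still_active = [r for r in active_records if r["project"] in whitelist]
    archived = [r for r in active_records if r["project"] not in whitelist]
    return still_active, archived
```

Rejected and archived records are returned rather than dropped, so they can be logged or stored separately for review.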

5. Implications for DevOps

  • Infrastructure Security:

    • DevOps must implement and maintain a secure infrastructure that supports the whitelisting process, including automated tools for scanning and verifying URLs prior to their ingestion into the database.

    • Monitoring tools should be deployed to continuously assess the status and safety of whitelisted URLs, with alerts set up for any detected anomalies or potential risks.

  • Incident Response:

    • Develop an incident response plan that allows for the swift removal of compromised projects or URLs from the whitelist, with immediate actions to prevent potential spread or damage. This plan should include clear communication channels and predefined steps to minimize impact.

  • Access Control:

    • Enforce strict access controls so that only authorized personnel can add to or modify the whitelist, reducing the risk of internal errors or malicious actions.
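
To make the access-control and incident-response points concrete, the sketch below shows a whitelist store that rejects changes from unauthorized users and records every change. `WhitelistStore` and its methods are hypothetical; a real deployment would more likely rely on the access controls of the underlying database or identity provider.

```python
class WhitelistStore:
    """In-memory whitelist with an editor allow-list and an audit trail."""

    def __init__(self, authorized_editors):
        self._editors = frozenset(authorized_editors)
        self._urls = set()
        self.audit_log = []  # one tuple per successful change

    def _require_editor(self, user):
        # Only users on the allow-list may change the whitelist.
        if user not in self._editors:
            raise PermissionError(f"{user} may not modify the whitelist")

    def add(self, user, url):
        self._require_editor(user)
        self._urls.add(url)
        self.audit_log.append(("add", user, url))

    def remove(self, user, url, reason):
        """Remove a URL, for example during an incident, and record why."""
        self._require_editor(user)
        self._urls.discard(url)
        self.audit_log.append(("remove", user, url, reason))

    def __contains__(self, url):
        return url in self._urls
```

For example, after `store = WhitelistStore({"alice"})`, a call to `store.add("mallory", url)` raises `PermissionError`, while `store.remove("alice", url, "domain compromised")` removes the URL and leaves an audit entry.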
