# \[Draft] Preliminary Whitelisting Policy

**Objective:**\
This draft outlines a proactive approach to whitelisting crypto projects and their associated URLs to protect our data pipeline, users, and brand from risks such as scam or phishing links. The goal is to maintain data integrity and security while enabling a streamlined Quality Assurance (QA) process for scaling the dataset. <mark style="background-color:green;">This draft is a starting point</mark> for team discussions toward a formal policy.

#### **1. Overview**

To safeguard the integrity of our data, we will create a curated whitelist for the top 300 crypto projects. This list will include approved URLs for white papers, technical documents, and key resources. The URLs must meet specific criteria to ensure they are secure, trustworthy, and accessible without errors. This policy will also define a straightforward QA process to support the scaling of the dataset with outsourced talent.

#### **2. Whitelisting Criteria**

The following criteria will guide the selection of URLs for the whitelist:

* **Direct Access to Documents:**
  * URLs must provide direct access to white papers and technical documents, either as PDF downloads or as files properly hosted on platforms like GitHub.
  * URLs should link directly to the correct HTML or GitBook pages without triggering 403 (Forbidden) or 404 (Not Found) errors.
* **Security and Reliability:**
  * All URLs must be free from phishing scams or redirects to suspicious domains. Domains must be secured with HTTPS and verified to be legitimate.
* **Continuous Monitoring:**
  * Approved URLs will be continuously monitored to ensure they remain compliant with the whitelisting criteria. Any issues will trigger a re-evaluation and potential removal from the whitelist.
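
As an illustration only, the criteria above could be expressed as a pure check over the outcome of a fetch. This is a minimal sketch, not an agreed implementation: `FetchResult`, `criteria_violations`, and the `trusted_hosts` set are hypothetical names, and treating every non-2xx status as a violation is an assumption that goes beyond the 403/404 cases named in the criteria.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class FetchResult:
    """Outcome of fetching a candidate URL (hypothetical structure)."""
    requested_url: str
    final_url: str    # URL after following any redirects
    status_code: int  # HTTP status of the final response


def criteria_violations(result: FetchResult, trusted_hosts: set[str]) -> list[str]:
    """Return the whitelisting criteria this fetch result violates (empty if none)."""
    violations = []
    requested = urlsplit(result.requested_url)
    final = urlsplit(result.final_url)

    # Security: both the requested and the final URL must use HTTPS.
    if requested.scheme != "https" or final.scheme != "https":
        violations.append("not served over HTTPS")

    # Direct access: 403 and 404 are named explicitly in the criteria;
    # rejecting every other non-2xx status is an assumption of this sketch.
    if result.status_code in (403, 404):
        violations.append(f"HTTP {result.status_code}")
    elif not 200 <= result.status_code < 300:
        violations.append(f"unexpected HTTP status {result.status_code}")

    # Reliability: redirects must land on a host we have verified.
    if (final.hostname or "") not in trusted_hosts:
        violations.append("final host is not in the trusted set")

    return violations
```

Keeping the check free of network calls means the same function can be reused by whatever fetcher or monitoring job the team settles on.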

#### **3. Quality Assurance Process \[**<mark style="background-color:green;">**TBD**</mark>**]**

To scale the dataset effectively while maintaining quality, we will implement a straightforward QA process:

* **Documentation for Outsourced Talent:**
  * Develop clear guidelines and checklists to assist outsourced QA teams in validating URLs against the **whitelisting criteria**.
  * Regularly update the guidelines to reflect any changes in the criteria or processes.
* **Automated Validation Tools:**
  * Use automated tools to periodically check the status of whitelisted URLs, identifying any errors or security issues.
* **Flagging and Review System:**
  * Establish a system for flagging URLs that may no longer meet the criteria, triggering a manual review by the QA team.
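
Since this section is still TBD, the following is only one possible shape for the flagging step: automated checks record their findings, and any URL with findings stays in a queue until a reviewer resolves it. `ReviewFlag` and `ReviewQueue` are hypothetical names, and the rule that a later passing check does not clear a flag is an assumption of this sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ReviewFlag:
    url: str
    reasons: list[str]
    flagged_at: datetime


class ReviewQueue:
    """In-memory queue of URLs awaiting manual QA review (illustrative only)."""

    def __init__(self) -> None:
        self._flags: dict[str, ReviewFlag] = {}

    def record_check(self, url: str, violations: list[str]) -> None:
        """Flag the URL if an automated check reported any violations."""
        if not violations:
            return  # Assumption: only a reviewer can clear an existing flag.
        existing = self._flags.get(url)
        # Keep the original flag time so the oldest issues surface first.
        flagged_at = existing.flagged_at if existing else datetime.now(timezone.utc)
        self._flags[url] = ReviewFlag(url, list(violations), flagged_at)

    def pending(self) -> list[ReviewFlag]:
        """Return open flags, oldest first."""
        return sorted(self._flags.values(), key=lambda flag: flag.flagged_at)

    def resolve(self, url: str) -> None:
        """Mark the manual review of a URL as complete."""
        self._flags.pop(url, None)
```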

#### **4. Implications for Data Engineering**

* **Data Pipeline Adjustments:**
  * Data Engineers will create a validation layer within the data pipeline to ensure only **whitelisted** projects and validated URLs are ingested. Automated checks against the whitelist should occur before data processing or storage.
  * Implement procedures for re-validating existing data periodically to maintain compliance with the whitelist, ensuring that any changes in project status are accurately reflected in the dataset.
* **Data Integrity and Historical Accuracy:**
  * Develop methods for handling projects that fall off the whitelist, such as archiving historical data to preserve continuity while keeping the active database free of potentially compromised projects.
  * Ensure that all data transformations and validations adhere to the whitelisting policy, particularly for projects that enter or exit the top project lists.
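
A minimal sketch of such a validation layer, assuming each incoming record is a dictionary with hypothetical `project_id` and `url` keys and that the whitelist maps each approved project to its set of approved URLs. Rejected records are returned rather than dropped so they can be logged or sent for review.

```python
from typing import Iterable

Record = dict[str, str]


def partition_by_whitelist(
    records: Iterable[Record],
    whitelist: dict[str, set[str]],
) -> tuple[list[Record], list[Record]]:
    """Split records into (accepted, rejected) before processing or storage."""
    accepted: list[Record] = []
    rejected: list[Record] = []
    for record in records:
        # A project missing from the whitelist has no approved URLs at all.
        approved_urls = whitelist.get(record.get("project_id", ""), set())
        if record.get("url") in approved_urls:
            accepted.append(record)
        else:
            rejected.append(record)
    return accepted, rejected
```

The same function could be run over stored data during periodic re-validation, with the rejected list feeding whatever archiving step the team adopts for projects that fall off the whitelist.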

#### **5. Implications for DevOps**

* **Infrastructure Security:**
  * DevOps must implement and maintain a secure infrastructure that supports the whitelisting process, including automated tools for scanning and verifying URLs prior to their ingestion into the database.
  * Monitoring tools should be deployed to continuously assess the status and safety of whitelisted URLs, with alerts set up for any detected anomalies or potential risks.
* **Incident Response:**
  * Develop an incident response plan that allows for the swift removal of compromised projects or URLs from the whitelist, with immediate actions to prevent potential spread or damage. This plan should include clear communication channels and predefined steps to minimize impact.
* **Access Control:**
  * Ensure strict access controls are in place to limit the ability to modify or add to the whitelist to authorized personnel only, thereby reducing the risk of internal errors or malicious actions.
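
To make the access-control and removal requirements concrete, here is a hedged sketch of a whitelist store that only authorized editors can modify and that records every change for later audit. `WhitelistStore` and its methods are hypothetical, and a production system would more likely rely on the access controls of the underlying database or identity provider.

```python
from datetime import datetime, timezone


class WhitelistStore:
    """In-memory whitelist with editor checks and an audit trail (illustrative only)."""

    def __init__(self, authorized_editors: set[str]) -> None:
        self._authorized = frozenset(authorized_editors)
        self._urls: set[str] = set()
        # Each entry: (timestamp, actor, action, url)
        self.audit_log: list[tuple[datetime, str, str, str]] = []

    def _require_authorized(self, actor: str) -> None:
        if actor not in self._authorized:
            raise PermissionError(f"{actor} may not modify the whitelist")

    def _log(self, actor: str, action: str, url: str) -> None:
        self.audit_log.append((datetime.now(timezone.utc), actor, action, url))

    def add(self, actor: str, url: str) -> None:
        self._require_authorized(actor)
        self._urls.add(url)
        self._log(actor, "add", url)

    def remove(self, actor: str, url: str) -> None:
        """Remove a URL, for example during incident response."""
        self._require_authorized(actor)
        self._urls.discard(url)
        self._log(actor, "remove", url)

    def __contains__(self, url: str) -> bool:
        return url in self._urls
```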

