Data Engineering
Data Pipeline Verification
Run the pipeline to load data and verify that it completes without errors.
Ensure data cleaning and integrity checks are running and passing.
Confirm that the database is populated with up-to-date and accurate data.
Validate that CMS tables are correctly filled with token project data.
Ensure that all necessary data transformations are performed correctly.
Implement dynamic project list generation:
Update project rankings automatically, either daily or on every data ingestion, so that the pipeline reflects ranking changes as they occur.
Establish a flagging system to identify projects near the inclusion/exclusion threshold (e.g., top 300 projects by market cap).
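The ranking and flagging steps above could be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the `MARGIN` band width, and the use of slugs as project keys are all assumptions, and only the top-300 cutoff comes from the checklist.

```python
from typing import Dict, List

THRESHOLD = 300  # inclusion cutoff from the checklist (top 300 by market cap)
MARGIN = 10      # hypothetical band width around the cutoff; tune as needed


def rank_by_market_cap(market_caps: Dict[str, float]) -> Dict[str, int]:
    """Return 1-based ranks, largest market cap first; ties are broken by slug."""
    ordered = sorted(market_caps.items(), key=lambda kv: (-kv[1], kv[0]))
    return {slug: position for position, (slug, _) in enumerate(ordered, start=1)}


def flag_near_threshold(
    ranks: Dict[str, int],
    threshold: int = THRESHOLD,
    margin: int = MARGIN,
) -> List[str]:
    """Return slugs ranked within `margin` places of the cutoff, ordered by rank."""
    near = [slug for slug, rank in ranks.items() if abs(rank - threshold) <= margin]
    return sorted(near, key=lambda slug: ranks[slug])
```

Flagging both sides of the cutoff covers projects about to drop out as well as projects about to enter, so either case can be reviewed before the list changes.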
Define and apply a policy for inclusion/exclusion:
Set clear criteria for when a project should be included (e.g., top 300 by market cap) and when it should be excluded.
Implement soft deletion or archiving strategies for projects that fall out of the top 300, ensuring historical data is preserved while keeping the main database current and performant.
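One possible shape for the soft-deletion policy above, assuming a hypothetical `projects` table with a nullable `archived_at` column (neither name comes from the actual schema). Rows are never deleted: a project leaving the top 300 is only stamped with an archive date, and a returning project has the stamp cleared.

```python
import sqlite3
from typing import Iterable


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the hypothetical projects table used in this sketch."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS projects ("
        "slug TEXT PRIMARY KEY, archived_at TEXT)"
    )


def apply_inclusion_policy(
    conn: sqlite3.Connection, top_slugs: Iterable[str], as_of: str
) -> None:
    """Archive projects outside the current top list and restore those back in it."""
    top = set(top_slugs)
    rows = conn.execute("SELECT slug, archived_at FROM projects").fetchall()
    for slug, archived_at in rows:
        if slug in top and archived_at is not None:
            # Project re-entered the list: clear the archive stamp.
            conn.execute(
                "UPDATE projects SET archived_at = NULL WHERE slug = ?", (slug,)
            )
        elif slug not in top and archived_at is None:
            # Project fell out of the list: soft-delete, keeping the row.
            conn.execute(
                "UPDATE projects SET archived_at = ? WHERE slug = ?", (as_of, slug)
            )
    known = {slug for slug, _ in rows}
    for slug in sorted(top - known):
        # Newly seen project: insert it as active.
        conn.execute(
            "INSERT INTO projects (slug, archived_at) VALUES (?, NULL)", (slug,)
        )
    conn.commit()
```

Queries that serve current data would then filter on `archived_at IS NULL`, while historical queries read the full table.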
Define and apply a policy for whitelisting.
Ensure transformation consistency:
Apply the same data transformation rules to all projects, whether newly included or previously included, to maintain consistency across the dataset.
Implement procedures to backfill data for projects re-entering the top 300.
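As a sketch of the backfill step, the helper below lists the days for which a re-entering project has no data, so that only those days need to be fetched again. It assumes one record per project per day; the function name and the daily granularity are illustrative assumptions, not details taken from the pipeline.

```python
from datetime import date, timedelta
from typing import Iterable, List


def missing_dates(present: Iterable[date], start: date, end: date) -> List[date]:
    """Return every day in [start, end] that has no record in `present`."""
    have = set(present)
    span = (end - start).days
    # An end date before the start date yields an empty range and an empty result.
    candidates = (start + timedelta(days=offset) for offset in range(span + 1))
    return [day for day in candidates if day not in have]
```

Running the backfilled days through the same transformation code as regular ingestion keeps the rules identical for new and returning projects.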
Data Access Testing
Confirm that the agent can access and retrieve data from the database as required.
Test that user queries correctly match and return data related to hashtags, slugs, and CMS entries.
Ensure data retrieval times are within acceptable limits and optimized for performance.
Account for project fluctuations:
Regularly test the system's response to project rank changes to ensure accurate and efficient data retrieval even as projects enter and exit the top 300.
Data Quality Checks
Conduct random sampling to verify data accuracy and integrity.
Ensure that data deduplication processes are effective.
Validate that data retention policies are implemented and adhered to.
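The sampling and deduplication checks above might look like the sketch below. It assumes rows are available as a list of dictionaries and that a duplicate means two rows sharing the same key fields; the field names passed in are whatever the real schema uses.

```python
import random
from collections import Counter
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


def duplicate_keys(rows: Sequence[Row], key_fields: Sequence[str]) -> Dict[Tuple, int]:
    """Return each key tuple that occurs more than once, with its occurrence count."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def sample_rows(rows: Sequence[Row], k: int, seed: int) -> List[Row]:
    """Draw up to k rows without replacement; a fixed seed makes the sample repeatable."""
    rng = random.Random(seed)
    return rng.sample(list(rows), min(k, len(rows)))
```

An empty result from `duplicate_keys` indicates that deduplication held for the chosen key, and a seeded sample lets two reviewers inspect the same rows.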
Implement regular audits:
Audit historical data on a regular schedule so that records for projects excluded from the current top 300 remain accurate in the database.
Monitor project movements:
Generate regular reports tracking which projects have entered or exited the top 300, providing insights into market trends and the overall performance of the data pipeline.
Set up automated alerts for significant rank fluctuations, enabling quick identification and resolution of potential data issues.
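The report and alert items above can both be derived from two consecutive ranking snapshots. The sketch below is illustrative only: the function names and the `max_jump` default of 50 places are assumptions, and a real alert would be routed to whatever notification channel the team uses.

```python
from typing import Dict, List, Set, Tuple


def membership_changes(
    prev_top: Set[str], curr_top: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return (entered, exited) slugs between two top-list snapshots."""
    entered = sorted(curr_top - prev_top)
    exited = sorted(prev_top - curr_top)
    return entered, exited


def large_rank_moves(
    prev_ranks: Dict[str, int],
    curr_ranks: Dict[str, int],
    max_jump: int = 50,  # hypothetical alert threshold, in rank places
) -> Dict[str, int]:
    """Return the rank delta of each project that moved at least `max_jump` places."""
    moves = {}
    for slug in prev_ranks.keys() & curr_ranks.keys():
        delta = curr_ranks[slug] - prev_ranks[slug]
        if abs(delta) >= max_jump:
            moves[slug] = delta  # positive delta means the project fell in rank
    return moves
```

A very large move in a single ingestion cycle is often a sign of a bad market-cap value rather than a real market event, which is why such moves are worth checking before the project list is updated.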
Run URL checks on all whitelisted projects.