Creating Public Goods for the Ecosystem
At Session, we are committed to fostering an ecosystem that thrives on openness, collaboration, and shared progress. We believe that the true potential of artificial intelligence is realized when communities work together to contribute and innovate. To this end, we aim to create public goods that benefit not only our users but also the broader AI and technology communities.
The Challenge of Non-Transparent Data
A significant challenge in AI today is the prevalence of non-transparent data. Many models are trained on datasets with unclear origins, raising concerns about data quality, biases, and ethical issues. This lack of transparency hinders trust and innovation, as developers and researchers cannot fully understand or verify the foundations of AI systems.
Collaborative Data Sharing Through Session's Network
To address this, we are collaborating with institutions that use Session's network for web scraping and data collection. With appropriate permissions and in compliance with data regulations, the collected data is automatically shared back with the community. We use blockchain technology to record metadata on-chain through a verifiable mechanism: each piece of data contributed to public datasets carries immutable metadata detailing its provenance, such as its source, collection time, and processing methods. This transparency allows developers, researchers, and users to verify data origins, assess suitability, and build with confidence.
Example of Data Provenance Metadata
For example, a dataset collected for training an AI model on natural language understanding would have its associated metadata recorded on the blockchain.
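The exact schema is still being finalized; as a minimal sketch, a provenance record of this kind could be expressed as follows, where every field name and value is illustrative rather than part of a fixed Session specification. The SHA-256 digest of the record is what would be anchored on-chain, letting anyone verify that the off-chain metadata has not been altered.

```python
import hashlib
import json

# Hypothetical provenance record; field names and values are
# illustrative, not a finalized Session schema.
provenance = {
    "dataset_id": "nlu-corpus-example",             # illustrative identifier
    "source": "publicly accessible web pages",      # where the data came from
    "collected_by": "partner-institution-node",     # collecting party
    "collection_time": "2024-01-15T00:00:00Z",      # when it was gathered
    "processing": ["deduplication", "PII removal"], # processing steps applied
    "license": "CC-BY-4.0",                         # usage terms
}

# Hash the canonical JSON form of the record. Anchoring this digest
# on-chain makes the full record tamper-evident: recomputing the hash
# over the published metadata must reproduce the on-chain value.
digest = hashlib.sha256(
    json.dumps(provenance, sort_keys=True).encode("utf-8")
).hexdigest()
print(digest)
```

Because `sort_keys=True` fixes the serialization order, the same record always produces the same digest, which is what makes the on-chain anchor useful as a verification reference.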
This ensures anyone using the data knows its origin, processing, and usage terms, promoting accountability and ethical use.
Training a Transparent Large Language Model (LLM)
Building on transparent data sharing, we plan to train a Large Language Model (LLM) exclusively on data with verified provenance. Traditional LLMs often face limitations due to training on datasets with uncertain origins, leading to biases and ethical concerns. Our LLM will be trained on datasets with fully documented and transparent provenance, setting a new standard in the AI industry.
We will use the public datasets generated through Session's network, ensuring all training data meets strict transparency and ethical sourcing standards. The training process will be open and observable, allowing the community to understand how the model learns and evolves. This transparency is crucial for building trust, as stakeholders can assess the model's biases, strengths, and limitations.
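The sourcing gate described above can be sketched in a few lines. This is an assumption-laden illustration, not Session's actual pipeline: the local `registry` set stands in for a lookup against digests recorded on-chain, and all names are hypothetical.

```python
import hashlib

def content_digest(text: str) -> str:
    """SHA-256 digest of a document's text, used as its provenance key."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def filter_verified(documents, registry):
    """Keep only documents whose digest appears in the provenance registry.

    `registry` stands in for the on-chain metadata records; in practice
    this would query the chain rather than a local set.
    """
    return [doc for doc in documents if content_digest(doc) in registry]

# Toy corpus: only the first document has a recorded provenance entry.
corpus = ["verified sample text", "unverified scraped text"]
registry = {content_digest("verified sample text")}

training_set = filter_verified(corpus, registry)
print(training_set)  # only the provenance-verified document remains
```

Gating the corpus this way means every training document can be traced back to an on-chain provenance record, which is what allows the training process itself to be audited.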
Deployment on Privacy-Preserving Platforms
We plan to deploy the LLM on a privacy-preserving platform like Nillion, which specializes in secure, decentralized computation. This allows us to perform complex computations and serve AI models without exposing sensitive data or compromising user privacy. Users can interact with the LLM, confident that their inputs and interactions are protected.
Creating and deploying a verifiable LLM on a privacy-preserving platform demonstrates that advanced AI systems can be developed without sacrificing ethical standards or user privacy. It encourages others in the industry to adopt similar practices and provides users and developers with tools that are both powerful and trustworthy.
Conclusion
In summary, our efforts to create public goods are rooted in the conviction that transparency, collaboration, and ethical practices are essential for the sustainable advancement of artificial intelligence. By transforming idle bandwidth into shared value, recording data provenance on-chain, and developing a verifiable LLM on a privacy-preserving platform, we are taking concrete steps toward this vision.
We invite developers, researchers, institutions, and enthusiasts to join us in this journey. Together, we can build a future where AI amplifies human potential and fosters innovation in a transparent, accountable, and beneficial manner for all.