AI Fusion

Awesome Public Datasets

Awesome Public Datasets is a curated list of publicly available datasets across various domains and disciplines. It is hosted on GitHub and maintained by the community. The repository contains a collection of links to high-quality datasets that are freely accessible for research, analysis, and other purposes.

AWS Open Data Registry

The AWS Open Data Registry is a platform provided by Amazon Web Services (AWS) that offers access to a wide range of publicly available datasets. These datasets cover various domains such as climate, healthcare, finance, and more. The registry makes it easier for users to discover and access these datasets, which can be used for research, analysis, and building applications. AWS provides tools and services to help users efficiently process and analyze these datasets in the cloud.
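As a rough illustration of how such data can be reached programmatically: registry datasets are typically stored in Amazon S3 buckets, and many (not all) of them allow anonymous listing over plain HTTPS. The sketch below assumes such a bucket and uses only the Python standard library. The bucket name in the usage note is a placeholder, and the helper names `build_list_url` and `parse_keys` are hypothetical, introduced here for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespace used by S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket, prefix="", max_keys=10):
    """Build an anonymous ListObjectsV2 URL for a public S3 bucket."""
    query = urlencode({"list-type": 2, "prefix": prefix, "max-keys": max_keys})
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_keys(xml_text):
    """Return the object keys found in a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("s3:Contents/s3:Key", S3_NS)]
```

To try it, fetch `build_list_url("example-open-data-bucket", prefix="2024/")` with `urllib.request.urlopen`, substituting a real bucket name taken from a dataset's registry page, and pass the response body to `parse_keys`. Buckets that require credentials or requester-pays billing will reject this kind of unauthenticated request.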


Data.gov

Data.gov is the United States government's open data website. It provides access to datasets published by agencies across the federal government. The site aims to make government more open and accountable, and to create opportunities for economic development and informed decision-making.
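For readers who want to search the federal catalog from code, here is a minimal sketch. It assumes the catalog is served through the standard CKAN Action API at `catalog.data.gov` and that the `package_search` endpoint is reachable without an API key; both are assumptions worth checking against the site's current developer documentation. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the catalog exposes the standard CKAN Action API at this address.
CKAN_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Return a CKAN package_search URL for a free-text query."""
    return CKAN_SEARCH_URL + "?" + urlencode({"q": query, "rows": rows})


def extract_titles(payload):
    """Pull dataset titles out of a decoded package_search response."""
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


def search_datasets(query, rows=5):
    """Query the catalog over HTTP and return the matching dataset titles."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        return extract_titles(json.load(response))
```

Calling `search_datasets("air quality")` would return up to five dataset titles. Keeping URL construction and response parsing in separate functions means both can be checked without network access.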


NASA Earthdata

NASA Earthdata is a gateway providing free and open access to a wide variety of NASA's Earth science data and information. Users can search, download, and explore satellite imagery, weather data, climate data, and more. It is a valuable resource for scientists, students, and anyone interested in learning about our planet.

Google Dataset Search

Dataset Search is a free search engine from Google for finding datasets hosted in repositories worldwide. Users can search by keyword or topic and narrow the results with filters. Each result includes a description of the dataset and a link to its home page. The index is regularly updated, which makes it a useful discovery tool for researchers, data scientists, and journalists.


Kaggle

Kaggle lets users find and publish datasets, explore data and build models in a web-based data science environment, collaborate with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. The platform also lets data scientists and other developers run machine learning contests and write and share code.


LAION

LAION (an acronym for Large-scale Artificial Intelligence Open Network) is a German non-profit that releases open-source artificial intelligence models and datasets.

Open Images Dataset

The Open Images Dataset by Google is a collection of millions of labeled images for computer vision tasks. The images cover objects, scenes, and activities, and each one carries annotations describing its contents. The dataset is regularly updated and is widely used by researchers and developers to train and evaluate computer vision models for tasks such as object recognition and scene understanding.

opendata CERN

The CERN Open Data portal provides access to a vast collection of research data from CERN, including the Large Hadron Collider experiments. It offers open licenses, extensive documentation, and various tools for exploration and analysis. This valuable resource for researchers and educators in particle physics is constantly updated and available in multiple formats.

UCI Machine Learning Repository

The UCI Machine Learning Repository is a comprehensive resource with over 500 freely available datasets for machine learning. It offers easy access to well-documented datasets, organized by subject area, and provides various formats for download.
Developed by friends and Open Source Community