We design, build and manage data platforms for organizations of all types and data of all sizes, from small to big.
Draw on our years of experience designing and implementing data-driven projects.
We can locate and prepare data for you fast, saving you time, effort and hassle.
Draw on our extensive experience in all areas of open data from platforms to policy.
Data OS is Datopian's open source data management framework. Metadata-oriented, it has built-in data quality and business rules management. Get in touch to find out more.
CyberGreen are leading a drive to make the Internet safer and more secure by proactively identifying and tackling vulnerabilities before they are exploited.
They needed to integrate billions of data points about Internet cybersecurity vulnerabilities from disparate sources and present them to policymakers in a compelling and understandable form.
Datopian were selected as CyberGreen's primary technical partner. Datopian architected and implemented a robust, cost-effective data integration solution using a combination of cloud services and open source technologies, including Amazon Redshift, S3 and RDS, with Luigi for workflow management.
The resulting platform has the capacity to ingest billions of new data points on a weekly basis and track that data through to aggregate statistics for use in frontend dashboards. Datopian also took responsibility for the design and implementation of a custom dashboard experience to present the resulting data to policymakers.
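As an illustration only (this is not CyberGreen's actual pipeline), the step of rolling raw data points up into weekly aggregate statistics can be sketched in a few lines of Python. The record fields (`date`, `country`, `risk`), the sample values and the function name `weekly_counts` are all hypothetical.

```python
from collections import Counter
from datetime import date


def weekly_counts(points):
    """Count data points per (ISO year, ISO week, country, risk)."""
    counts = Counter()
    for p in points:
        iso = p["date"].isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1], p["country"], p["risk"])] += 1
    return counts


# Made-up sample records; real inputs would come from the ingestion stage.
points = [
    {"date": date(2017, 1, 2), "country": "GB", "risk": "dns"},
    {"date": date(2017, 1, 4), "country": "GB", "risk": "dns"},
    {"date": date(2017, 1, 9), "country": "US", "risk": "ntp"},
]
summary = weekly_counts(points)
```

At the scale described here, the same grouping would more plausibly run as a query in the data warehouse; the sketch only shows the shape of the aggregation that a dashboard would read.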
Brave Venture Labs are building a next-generation machine learning platform for hiring and talent discovery.
Their work requires up-to-date, high-quality integrated data on jobs and talent across diverse markets in Africa and the US.
Datopian were selected as their long-term technical partner for data sourcing and integration, and have designed and implemented a data unification system to automatically ingest and process data from a broad variety of sources. Data is unified into a form suitable for analysis and used by Brave Venture Labs' insights and frontend teams.
The platform uses a variety of open source technologies and cloud infrastructure including Docker, Python, Data Package pipelines, S3, RDS and Apache Kafka.
Oxford University has over twenty-six individual libraries, not counting its colleges. None of their systems are interconnected, making it hard for scholars to discover the information they need. Between them the systems hold tens of millions of pieces of metadata as well as 26km of archival material.
Datopian led a data unification effort to bring together metadata from across all 26 libraries, stored in disparate formats and of varying quality (some of it almost entirely undocumented). Datopian were able to move from initial discovery, through prototyping, to a demonstration production system in a matter of weeks, delivering on schedule and on budget.
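To give a flavour of what metadata unification involves, here is a minimal Python sketch that maps records from two differently structured sources onto one common schema. It is not the system built for Oxford: the source names, field names, common schema and the `unify` function are all assumptions made for illustration.

```python
# Hypothetical per-source mappings from local field names to a common schema.
FIELD_MAPS = {
    "catalogue_a": {"ttl": "title", "auth": "creator", "yr": "date"},
    "catalogue_b": {"Title": "title", "Author": "creator", "Published": "date"},
}
COMMON_FIELDS = ("title", "creator", "date")


def unify(record, source):
    """Rename source-specific fields to the common schema and tag provenance."""
    mapping = FIELD_MAPS[source]
    # Start with every common field present so downstream code sees one shape.
    unified = {field: None for field in COMMON_FIELDS}
    for field, value in record.items():
        if field in mapping:  # unmapped fields are dropped in this sketch
            unified[mapping[field]] = str(value).strip()
    unified["source"] = source
    return unified


a = unify({"ttl": " Bleak House ", "auth": "Dickens, Charles", "yr": 1853}, "catalogue_a")
b = unify({"Title": "Bleak House", "Shelfmark": "X.1"}, "catalogue_b")
```

Keeping the mappings as data rather than code means that adding another source is a matter of writing one more dictionary, which is useful when source formats are poorly documented and have to be worked out incrementally.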
Datopian used a variety of technologies including Python, Data Packages, Elasticsearch, S3, Docker and Apache Kafka.
Governments around the world have implemented data portals to present and share their open data with businesses and citizens. These platforms need to aggregate metadata and data, showcase datasets, enable exploration and data quality validation as well as integrate with a variety of backend processes and systems.
The majority of the world’s leading data portals, including those of the US and UK governments, now run on software (CKAN) architected and implemented by one of our principals (Dr Rufus Pollock). He also played a leading role in developing implementations for a variety of governments including the US and UK.
Dr Pollock is an expert in data management and analysis with more than fifteen years of experience working with organisations on the technical, legal and social challenges of using and sharing information. A recognized global expert on digital policy and open data, he is also a serial entrepreneur and a published researcher, and has worked with G7 and G20 governments, IGOs such as the World Bank and the UN, businesses and CSOs.