
Enhancing Data Operations: Datopian’s DaaS Solution for a Fortune 500 Logistics Leader

Key facts
Service providers:

Datopian

Client:

Fortune 500 logistics company

Services:
Data Engineering; ETL; Data Delivery; Data-as-a-Service; Data Aggregation; Data Integration; Data Standardization; Schema Design; Data Validation; Metadata Management; API Development; Agile Delivery; Data Consultancy
Period:
June 2020 - Present

Summary

To streamline global operations, a Fortune 500 logistics company partnered with Datopian for a comprehensive postal code solution. By sourcing and standardizing data from hundreds of countries, Datopian enabled seamless integration with the company's systems, optimizing route planning and enhancing logistics accuracy.

Problem

The client struggled to manage a vast array of open datasets due to challenges in resource allocation, data quality, customization, and licensing complexities. Building an in-house data team was costly and posed operational risks.

Need

They required a scalable, cost-effective solution for sourcing, processing, and customizing open data, along with ongoing support to ensure reliability and minimize operational disruptions.

Solution

Datopian provided a tailored DaaS solution, including automated ETL pipelines, bespoke data customization, and dedicated support. By managing compliance and delivering high-quality datasets that integrated seamlessly into the client's workflows, we enabled them to focus on their core operations with confidence.

Main technologies & tools used
GitHub
GitHub-Actions
ETL
CSV
Cloudflare-R2
Python
Frictionless-Data
FTP

Context

Our client, a leading Fortune 500 company, sought reliable and comprehensive datasets to support their global operations. They required reference data such as country codes, time zones, and other geopolitical information, alongside custom data processing capabilities. Despite the availability of many open datasets, the client needed a robust, streamlined process to access, manage, and customize this data efficiently.

Datopian enabled the client’s data team to build data products and fuel the rest of the enterprise with essential information in a standardized and high-quality fashion. This support empowered the client to make informed decisions and maintain operational efficiency across their global operations.

The Challenge

The client faced a complex yet relatable challenge: wrangling a massive array of openly available datasets while keeping costs, quality, and sanity in check. Here’s what they were up against:

  • Resource Allocation: Building an in-house data engineering dream team to extract, process, and update datasets might sound great on paper, but the numbers told a different story. A team of 2-3 engineers and a manager would rack up costs of $150,000 to $250,000 annually, not to mention inherent risks such as potential delays, technical issues, and ongoing personnel management.
  • Data Quality and Reliability: Open data is fantastic - until it isn’t. The client needed up-to-date, reliable data they could trust. They also required custom, business-specific data on top of the publicly available datasets, plus a reliable point of contact to sort out any pesky data errors or inconsistencies.
  • Customization: Although much of the data was openly available on platforms like Datahub.io, the client required customized schemas and integrations tailored to their business needs.
  • Compliance and Licensing: Who wants to spend hours untangling licensing agreements or processing open data when there are better things to do? The client needed a hands-off solution that would handle the licensing red tape, along with the sourcing and processing of openly available data, freeing them to focus on their core business.

The Solution

Datopian stepped in to provide a tailored Data-as-a-Service (DaaS) solution that combined technical expertise with a deep understanding of the client’s unique needs. Our approach ensured the client could focus on their core business while we handled the complexities of data management. Here’s how we delivered:

  • Comprehensive Data Delivery: Over 30 data tables were delivered in CSV format, encompassing a wide range of geopolitical datasets, such as airport codes, country codes, time zones, and more. Datopian’s solution provided regular updates to keep the data current and relevant. To align with the client’s internal requirements, the data was delivered via FTP, ensuring seamless integration with their existing systems.
  • ETL System: We built an Extract, Transform, and Load (ETL) system that automates the extraction of data from various open sources. This system normalizes, cleans, and transforms the data, consolidating it into a format aligned with the client’s specifications. We implemented the Frictionless Data metadata specification to ensure consistency and interoperability of datasets.
  • Data Curation and Customization: While most of the data was sourced from open platforms like Datahub.io, Datopian curated and customized these datasets to meet the client’s specific needs. This included modifying existing schemas and creating additional bespoke datasets.
  • Support and Reliability: A dedicated support team was established to address any data-related issues, giving the client a reliable point of contact. This level of support is crucial for enterprise clients who rely on accurate data for their daily operations.
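
The extract, transform, load, and delivery steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not Datopian's actual pipeline: the raw input, the column names, and the `deliver_via_ftp` helper with its host and credential parameters are all hypothetical.

```python
import csv
import io
from ftplib import FTP

# Hypothetical raw extract: inconsistent spacing and casing, plus a duplicate row.
RAW_CSV = """Country Code , Country Name
ad, Andorra
AE ,United Arab Emirates
ad, Andorra
"""


def extract(text):
    """Parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize headers and values, then drop duplicate rows."""
    seen, cleaned = set(), []
    for row in rows:
        record = {
            key.strip().lower().replace(" ", "_"): value.strip()
            for key, value in row.items()
        }
        record["country_code"] = record["country_code"].upper()
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(record)
    return cleaned


def load(rows):
    """Serialize cleaned rows back to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def deliver_via_ftp(host, user, password, remote_name, payload):
    """Upload CSV text to an FTP server (host and credentials are placeholders)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.storbinary(f"STOR {remote_name}", io.BytesIO(payload.encode("utf-8")))


cleaned_csv = load(transform(extract(RAW_CSV)))
print(cleaned_csv)
```

In a scheduled setup such as a GitHub Actions workflow, a script of this shape would run on a timer and call the delivery step once the cleaned output is ready; that wiring is left out here.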

Value Delivered

Partnering with Datopian transformed the client’s data operations, delivering measurable benefits that extended beyond cost savings. Here’s how we empowered their business:

  • Significant Cost Savings: By outsourcing their data engineering needs to Datopian, the client avoided the significant costs of building and maintaining an in-house team. The estimated annual cost savings ranged from $150,000 to $250,000, without factoring in the potential risks of managing an internal operation.
  • Reliable and Up-to-Date Data: Datopian’s automated ETL system ensured that the client received up-to-date and accurate datasets. The regular updates and cleaning processes provided the client with confidence in their data's integrity.
  • Bespoke Data Tailored to Business Needs: Our team tailored the data delivery to fit the client's specific needs, modifying schemas and integrating bespoke datasets. This customization enabled the client to use the data directly in their workflows, enhancing operational efficiency.
  • Reduced Complexity and Risk: With Datopian handling data sourcing, processing, and licensing considerations, the client could focus on their core business operations. This reduced the complexity and risk associated with managing open data.
  • Enterprise-Grade Support and Responsiveness: Having a dedicated team for data support meant the client could quickly resolve any issues, minimizing disruptions to their operations.

Agile Collaboration and Communication

To ensure continuous alignment with the client’s evolving needs, Datopian adopted an agile delivery model for this service. This approach fostered a close, collaborative relationship between our teams, enabling effective communication and rapid response to changes.

  • Regular Standups: We conducted regular meetings with the client’s team, similar to standups, to discuss ongoing progress, identify potential or current issues, and plan for upcoming tasks. These meetings facilitated transparency, kept both teams aligned, and ensured smooth data delivery.
  • Team Structure: Our agile team consisted of a diverse set of roles, each contributing to the seamless execution of the data service:
    • Project Manager: Oversaw the project's progress, managed timelines, and served as the main point of contact for the client, ensuring that their requirements were met promptly.
    • Senior Data Engineer: Led the development of the ETL processes, ensuring the data was extracted, cleaned, transformed, and delivered efficiently.
    • Data Analyst: Worked closely with the data engineer to verify data quality, structure, and relevance, making sure it aligned with the client’s business requirements.
    • Business Analyst: Engaged with the client to gather and refine requirements, translating them into actionable tasks for the technical team.
    • Support Specialists: Provided continuous support to address any data-related issues, ensuring that the client received timely assistance whenever needed.
  • Adaptability: This agile, cross-functional team structure allowed us to rapidly adapt to changing client requirements, including modifying data schemas or integrating new datasets as needed. By maintaining regular communication and collaboration, we ensured that the client always had up-to-date, high-quality data tailored to their needs.

This collaborative and adaptive approach was crucial in delivering value, allowing the client to rely on Datopian not just as a data provider but as a strategic partner.

Sample Datasets Provided

Country Codes Dataset: Comprehensive country codes, including ISO 3166, ITU, ISO 4217 currency codes, and many more.

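| Alpha_2_Country_code | Country_Name_English | Country_Name_English_CLDR | Country_Name_English_Readable | Alpha_3_Country_code | Numeric_Country_Code | Continental_Code |
| --- | --- | --- | --- | --- | --- | --- |
| AD | Andorra | Andorra | Andorra | AND | 20 | EU |
| AE | United Arab Emirates | United Arab Emirates | United Arab Emirates | ARE | 784 | AS |
| AF | Afghanistan | Afghanistan | Afghanistan | AFG | 4 | AS |
| AG | Antigua and Barbuda | Antigua & Barbuda | Antigua & Barbuda | ATG | 28 | NA |
| AI | Anguilla | Anguilla | Anguilla | AIA | 660 | NA |
| AL | Albania | Albania | Albania | ALB | 8 | EU |
| AM | Armenia | Armenia | Armenia | ARM | 51 | AS |
| AO | Angola | Angola | Angola | AGO | 24 | AF |
| AQ | Antarctica | Antarctica | Antarctica | ATA | 10 | AN |
| AR | Argentina | Argentina | Argentina | ARG | 32 | SA |
| AS | American Samoa | American Samoa | American Samoa | ASM | 16 | OC |
| AT | Austria | Austria | Austria | AUT | 40 | EU |
| AU | Australia | Australia | Australia | AUS | 36 | OC |
| AW | Aruba | Aruba | Aruba | ABW | 533 | NA |
| AX | Åland Islands | Åland Islands | Åland Islands | ALA | 248 | EU |
| AZ | Azerbaijan | Azerbaijan | Azerbaijan | AZE | 31 | AS |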

Holidays Dataset: List of holidays per country with their names, types and dates.

| Country Code | Holiday Name | Type of Holiday | Date |
| --- | --- | --- | --- |
| AD | New Year's Day | public | 2024-01-01 |
| AD | Epiphany | public | 2024-01-06 |
| AD | Shrove Tuesday | public | 2024-02-13 |
| AD | Constitution Day | public | 2024-03-14 |
| AD | Maundy Thursday | bank | 2024-03-28 |
| AD | Good Friday | public | 2024-03-29 |
| AD | Easter Sunday | public | 2024-03-31 |
| AD | Easter Monday | public | 2024-04-01 |
| AD | Labour Day | public | 2024-05-01 |
| AD | Pentecost | public | 2024-05-19 |
| AD | Whit Monday | public | 2024-05-20 |
| AD | Assumption | public | 2024-08-15 |
| AD | Our Lady of Meritxell | public | 2024-09-08 |
| AD | All Saints' Day | public | 2024-11-01 |
| AD | Immaculate Conception | public | 2024-12-08 |
| AD | Christmas Eve | bank | 2024-12-24 |
| AD | Christmas Day | public | 2024-12-25 |
| AD | Boxing Day | public | 2024-12-26 |
| AE | New Year's Day | public | 2024-01-01 |
| AE | Laylat al-Mi'raj | public | 2024-02-08 |
| AE | First day of Ramadan | public | 2024-03-11 |
| AE | End of Ramadan (Eid al-Fitr) | public | 2024-04-10 |
| AE | Feast of the Sacrifice (Eid al-Adha) | public | 2024-06-16 |
| AE | Islamic New Year | public | 2024-07-07 |
| AE | Birthday of Muhammad (Mawlid) | public | 2024-09-15 |
| AE | National Day | public | 2024-12-02 |
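
As an illustration of how a consumer might use the holidays data, the sketch below indexes a few of the sample rows by country and date so that a planned date can be checked against it. The CSV layout is assumed from the column headers above, and the function names are hypothetical.

```python
import csv
import io
from datetime import date

# A few rows from the sample above, written as CSV (layout assumed from the headers).
HOLIDAYS_CSV = """Country Code,Holiday Name,Type of Holiday,Date
AD,Constitution Day,public,2024-03-14
AD,Maundy Thursday,bank,2024-03-28
AE,National Day,public,2024-12-02
"""


def build_holiday_index(text):
    """Map (country code, date) to (holiday name, holiday type)."""
    index = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["Country Code"], date.fromisoformat(row["Date"]))
        index[key] = (row["Holiday Name"], row["Type of Holiday"])
    return index


def holiday_on(index, country_code, day):
    """Return (name, type) if the day is a listed holiday, else None."""
    return index.get((country_code, day))


index = build_holiday_index(HOLIDAYS_CSV)
print(holiday_on(index, "AD", date(2024, 3, 14)))
print(holiday_on(index, "AD", date(2024, 3, 15)))
```

Keying on the parsed date rather than the raw string keeps lookups correct even if an upstream source formats dates differently before normalization.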

Time Zones Dataset: Lists standardized time zone information, including UTC offsets, time zone names, and daylight saving time status.

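| countryCode | countryName | zoneName | gmtOffset |
| --- | --- | --- | --- |
| CI | Ivory Coast | Africa/Abidjan | 0 |
| GH | Ghana | Africa/Accra | 0 |
| ET | Ethiopia | Africa/Addis_Ababa | 10800 |
| DZ | Algeria | Africa/Algiers | 3600 |
| ER | Eritrea | Africa/Asmara | 10800 |
| ML | Mali | Africa/Bamako | 0 |
| CF | Central African Republic | Africa/Bangui | 3600 |
| GM | Gambia | Africa/Banjul | 0 |
| GW | Guinea-Bissau | Africa/Bissau | 0 |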
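
The `gmtOffset` values in the sample appear to be expressed in seconds (10800 for Africa/Addis_Ababa corresponds to three hours ahead of UTC). Under that assumption, a small hypothetical helper can turn them into a readable offset:

```python
def format_utc_offset(seconds):
    """Render an offset in seconds as a UTC±HH:MM string."""
    sign = "+" if seconds >= 0 else "-"
    hours, remainder = divmod(abs(seconds), 3600)
    minutes = remainder // 60
    return f"UTC{sign}{hours:02d}:{minutes:02d}"


print(format_utc_offset(10800))  # Africa/Addis_Ababa -> UTC+03:00
print(format_utc_offset(3600))   # Africa/Algiers -> UTC+01:00
print(format_utc_offset(0))      # Africa/Abidjan -> UTC+00:00
```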

Metadata Examples

Below is a snippet of the Frictionless Data metadata for one of the datasets, showing how the data is documented and organized:

name: country-codes
title: Comprehensive country codes: ISO 3166, ITU, ISO 4217 currency codes and many more
format: csv
datapackage_version: 1.0.0
last_modified: 2024-09-25
licenses:
  - name: ODC-PDDL-1.0
    path: http://opendatacommons.org/licenses/pddl/
    title: Open Data Commons Public Domain Dedication and License v1.0
sources:
  - name: United Nations Protocol and Liaison Service
    title: United Nations Protocol and Liaison Service
    path: https://www.un.org/dgacm/sites/www.un.org.dgacm/files/Documents_Protocol/unterm-efsrca.xlsx
  - name: Unicode Common Locale Data Repository (CLDR) Project
    title: Unicode Common Locale Data Repository (CLDR) Project
    path: https://github.com/unicode-org/cldr-json/blob/d38478855dd8342749f0494332cc8acc2895d20d/cldr-json/cldr-localenames-full/main/ms/territories.json
  - name: United Nations Department of Economic and Social Affairs Statistics Division
    title: United Nations Department of Economic and Social Affairs Statistics Division
    path: https://unstats.un.org/unsd/methodology/m49/overview/
  - name: SIX Interbank Clearing Ltd (on behalf of ISO)
    title: SIX Interbank Clearing Ltd (on behalf of ISO)
    path: https://www.six-group.com/dam/download/financial-information/data-center/iso-currrency/lists/list-one.xml
  - name: Statoids
    title: Statoids
    path: http://www.statoids.com/wab.html
  - name: Geonames
    title: Geonames
    path: http://download.geonames.org/export/dump/countryInfo.txt
  - name: US Securities and Exchange Commission
    title: US Securities and Exchange Commission
    path: https://www.sec.gov/submit-filings/filer-support-resources/edgar-state-country-codes
resources:
  - name: country-codes
    format: csv
    path: data/country-codes.csv
    schema:
      fields:
        - name: FIFA
          title: FIFA code
          description: Codes assigned by the Fédération Internationale de Football Association
          type: string
        - name: Dial
          title: telephone dialing code
          description: Country code from ITU-T recommendation E.164, sometimes followed by area code
          type: string
        - name: ISO3166-1-Alpha-3
          title: ISO3166-1-Alpha-3
          description: Alpha-3 codes from ISO 3166-1 (synonymous with World Bank Codes)
          type: string
          constraints:
            unique: true
            minLength: 3
            maxLength: 3
        - name: MARC
          title: MARC code
          description: MAchine-Readable Cataloging codes from the Library of Congress
          type: string
        - name: is_independent
          title: independent country
          description: Country status, based on the CIA World Factbook
          type: string
        - name: ISO3166-1-numeric
          title: ISO3166-1-numeric
          description: Numeric codes from ISO 3166-1
          type: string
        - name: GAUL
          title: GAUL code
          description: Global Administrative Unit Layers from the Food and Agriculture Organization
          type: string
        - name: FIPS
          title: FIPS code
          description: Codes from the U.S. standard FIPS PUB 10-4
          type: string
        - name: WMO
          title: WMO code
          description: Country abbreviations by the World Meteorological Organization
          type: string
          constraints:
            maxLength: 2
        - name: ISO3166-1-Alpha-2
          title: ISO3166-1-Alpha-2
          description: Alpha-2 codes from ISO 3166-1
          type: string
          constraints:
            unique: true
            minLength: 2
            maxLength: 2
        - name: ITU
          title: ITU code
          description: Codes assigned by the International Telecommunications Union
          type: string
        - name: IOC
          title: IOC code
          description: Codes assigned by the International Olympics Committee
          type: string
          constraints:
            maxLength: 3
        - name: DS
          title: distinguishing signs of vehicles
          description: Distinguishing signs of vehicles in international traffic
          type: string
        - name: UNTERM Spanish Formal
          title: UNTERM Spanish Formal
          description: Country's formal Spanish name from UN Protocol and Liaison Service
          type: string
        - name: Global Code
          title: global code
          description: Country classification from United Nations Statistics Division
          type: string
        - name: Intermediate Region Code
          title: intermediate region code
          description: Country classification from United Nations Statistics Division
          type: string
        - name: official_name_fr
          title: official name French
          description: Country or Area official French short name from UN Statistics Division
          type: string
        - name: UNTERM French Short
          title: UNTERM French Short
          description: Country's short French name from UN Protocol and Liaison Service
          type: string
        - name: ISO4217-currency_name
          title: ISO4217-currency_name
          description: ISO 4217 currency name
          type: string
        - name: UNTERM Russian Formal
          title: UNTERM Russian Formal
          description: Country's formal Russian name from UN Protocol and Liaison Service
          type: string
        - name: UNTERM English Short
          title: UNTERM English Short
          description: Country's short English name from UN Protocol and Liaison Service
          type: string
        - name: ISO4217-currency_alphabetic_code
          title: ISO4217-currency_alphabetic_code
          description: ISO 4217 currency alphabetic code
          type: string
        - name: Small Island Developing States (SIDS)
          title: small island developing state (SIDS)
          description: Country classification from United Nations Statistics Division
          type: string
        - name: UNTERM Spanish Short
          title: UNTERM Spanish Short
          description: Country's short Spanish name from UN Protocol and Liaison Service
          type: string
        - name: ISO4217-currency_numeric_code
          title: ISO4217-currency_numeric_code
          description: ISO 4217 currency numeric code
          type: string
        - name: UNTERM Chinese Formal
          title: UNTERM Chinese Formal
          description: Country's formal Chinese name from UN Protocol and Liaison Service
          type: string
        - name: UNTERM French Formal
          title: UNTERM French Formal
          description: Country's formal French name from UN Protocol and Liaison Service
          type: string
        - name: UNTERM Russian Short
          title: UNTERM Russian Short
          description: Country's short Russian name from UN Protocol and Liaison Service
          type: string
        - name: M49
          title: M49
          description: UN Statistics M49 numeric codes (nearly synonymous with ISO 3166-1 numeric codes, which are based on UN M49. ISO 3166-1 does not include Channel Islands or Sark, for example)
          type: number
          constraints:
            unique: true
        - name: Sub-region Code
          title: sub-region code
          description: Country classification from United Nations Statistics Division
          type: string
        - name: Region Code
          title: region code
          description: Country classification from United Nations Statistics Division
          type: string
        - name: official_name_ar
          title: official name Arabic
          description: Country or Area official Arabic short name from UN Statistics Division
          type: string
        - name: ISO4217-currency_minor_unit
          title: ISO4217-currency_minor_unit
          description: ISO 4217 currency number of minor units
          type: string
        - name: UNTERM Arabic Formal
          title: UNTERM Arabic Formal
          description: Country's formal Arabic name from UN Protocol and Liaison Service
          type: string
        - name: UNTERM Chinese Short
          title: UNTERM Chinese Short
          description: Country's short Chinese name from UN Protocol and Liaison Service
          type: string
        - name: Land Locked Developing Countries (LLDC)
          title: land locked developing country (LLDC)
          description: Country classification from United Nations Statistics Division
          type: string
        - name: Intermediate Region Name
          title: intermediate region name
          description: Country classification from United Nations Statistics Division
          type: string
        - name: official_name_es
          title: official name Spanish
          description: Country or Area official Spanish short name from UN Statistics Division
          type: string
        - name: UNTERM English Formal
          title: UNTERM English Formal
          description: Country's formal English name from UN Protocol and Liaison Service
          type: string
        - name: official_name_cn
          title: official name Chinese
          description: Country or Area official Chinese short name from UN Statistics Division
          type: string
        - name: official_name_en
          title: official name English
          description: Country or Area official English short name from UN Statistics Division
          type: string
        - name: ISO4217-currency_country_name
          title: ISO4217-currency_country_name
          description: ISO 4217 country name
          type: string
        - name: Least Developed Countries (LDC)
          title: least developed country (LDC)
          description: Country classification from United Nations Statistics Division
          type: string
        - name: Region Name
          title: region name
          description: Country classification from United Nations Statistics Division
          type: string
        - name: UNTERM Arabic Short
          title: UNTERM Arabic Short
          description: Country's short Arabic name from UN Protocol and Liaison Service
          type: string
        - name: Sub-region Name
          title: sub-region name
          description: Country classification from United Nations Statistics Division
          type: string
        - name: official_name_ru
          title: official name Russian
          description: Country or Area official Russian short name from UN Statistics Division
          type: string
        - name: Global Name
          title: global name
          description: Country classification from United Nations Statistics Division
          type: string
        - name: Capital
          title: capital city
          description: Capital city from Geonames
          type: string
        - name: Continent
          title: continent
          description: Continent from Geonames
          type: string
          constraints:
            minLength: 2
            maxLength: 2
        - name: TLD
          title: TLD
          description: Top level domain from Geonames
          type: string
        - name: Languages
          title: languages
          description: Languages from Geonames
          type: string
        - name: Geoname ID
          title: Geoname ID
          description: Geoname ID
          type: number
          constraints:
            unique: true
        - name: CLDR display name
          title: CLDR display name
          description: Country's customary English short name (CLDR)
          type: string
        - name: EDGAR
          title: EDGAR code
          description: EDGAR country code from SEC
          type: string
          constraints:
            maxLength: 2
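
To show what the `constraints` entries in the schema above imply in practice, here is a minimal sketch that checks rows against `unique`, `minLength`, and `maxLength`. It is written in plain Python for illustration and is not the validation code used in the project; the `frictionless` Python package provides fuller validation tooling. The sample rows, including the deliberately broken third one, are invented.

```python
# Three fields and their constraints, copied from the schema above.
FIELDS = [
    {"name": "ISO3166-1-Alpha-2",
     "constraints": {"unique": True, "minLength": 2, "maxLength": 2}},
    {"name": "ISO3166-1-Alpha-3",
     "constraints": {"unique": True, "minLength": 3, "maxLength": 3}},
    {"name": "Continent", "constraints": {"minLength": 2, "maxLength": 2}},
]


def validate(rows, fields):
    """Return a list of (row number, field name, violated constraint)."""
    errors = []
    seen = {field["name"]: set() for field in fields}
    for row_no, row in enumerate(rows, start=1):
        for field in fields:
            name = field["name"]
            rules = field.get("constraints", {})
            value = row.get(name, "")
            if len(value) < rules.get("minLength", 0):
                errors.append((row_no, name, "minLength"))
            if "maxLength" in rules and len(value) > rules["maxLength"]:
                errors.append((row_no, name, "maxLength"))
            if rules.get("unique"):
                if value in seen[name]:
                    errors.append((row_no, name, "unique"))
                seen[name].add(value)
    return errors


# Hypothetical rows; the third one is deliberately invalid.
ROWS = [
    {"ISO3166-1-Alpha-2": "AD", "ISO3166-1-Alpha-3": "AND", "Continent": "EU"},
    {"ISO3166-1-Alpha-2": "AE", "ISO3166-1-Alpha-3": "ARE", "Continent": "AS"},
    {"ISO3166-1-Alpha-2": "AD", "ISO3166-1-Alpha-3": "ANDR", "Continent": "E"},
]

errors = validate(ROWS, FIELDS)
for error in errors:
    print(error)
```

Reporting every violation with its row number, rather than stopping at the first failure, makes it easier to trace a problem back to the upstream source that introduced it.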

Conclusion

The partnership with Datopian has empowered the client to leverage open data efficiently while avoiding the pitfalls and expenses of managing an in-house data engineering team. Our professional services, combined with our data expertise, provided the client with the reliability and customization they needed, making Datopian a valuable partner in their data journey.

If you are an organization looking to streamline your data operations, Datopian offers the expertise and support to deliver high-quality, tailored data solutions.

Not finding the data you need? We can get it for you! Check out our premium data service at Datahub.io. You can also reach out to us to discuss how we can help you achieve your goals.

Don't forget to check out our postal codes collection page at datahub.io/collections/postal-codes-datasets

