Content Library
Learn more about how DDD supports projects in Computer Vision and Natural Language Processing for innovative Autonomous Driving, Agricultural Technology, Robotics, and more.
Read our case studies on successful projects in content conversion and digitization for Cultural Heritage organizations, National Libraries, Museums, and more.
Featured content
Key Strategies for Advancing Autonomous Driving Levels
What Are Electric Carmakers Doing to Enable Higher SAE Levels?
EBOOK
Why MSMs Are Critical to Training Autonomous Driving Systems
The autonomous driving industry is growing fast, and Managed Service Models are becoming increasingly important.
WHITE PAPER
Work with DDD
Learn about what services DDD offers and how we can benefit your AI project.
BRAND BROCHURE
Aerial Image Segmentation
Learn about the common pitfalls and challenges of aerial image segmentation.
EBOOK
Training Data Considerations
Learn what you need to take into account before you start an AI project at scale.
EBOOK
Reliable Data Annotation Demands a Disciplined Methodology
Industry specialists need to develop methodologies that contribute to delivering consistent, high-quality results.
WHITE PAPER
Accelerating Autonomous Driving Systems with Digital Twins and HITL Processes
Explore how technologies like digital twins and AI are transforming ADS and ADAS development.
WHITE PAPER
Maximizing AV Performance by Optimizing Workflow Architecture
In this whitepaper, we will dive into the practical implications of each approach on your AVP, VNV, and triage processes.
WHITE PAPER
Enhancing Driver and In-Cabin Monitoring with HITL to Ensure Safety and Reliability
This whitepaper examines the DMS landscape, including sensor technologies, data processing paradigms, and machine learning algorithms.
WHITE PAPER
Leveraging Digital Twins for ADAS Excellence with Insights on Data Management
This whitepaper examines the sophisticated techniques and implementation strategies behind AI, digital twins, and simulation in AD/ADAS development.
WHITE PAPER
Optimizing AI Models for Real-World AD/ADAS Perception Using HITL
This whitepaper explores how human-in-the-loop (HITL) processes power AI-enhanced perception and prediction for autonomous driving and help AD/ADAS systems become safer, more robust, and ultimately more reliable.
WHITE PAPER
The Ethical Annotation Playbook for Autonomous Driving Vehicles
Drawing from the latest research, real-world case studies, and hard-won industry insights, this ebook offers a comprehensive framework for managing bias and implementing trustworthy AI in AV development.
Case Studies
Corporate
-
Microsoft is a US-based software company. Microsoft was in need of training data to develop and improve their machine learning platform to detect recorded audio and speech from video and audio clips. In addition, Microsoft was developing their photo recognition software to detect images within photos.
Microsoft partnered with DDD to verify, classify, and tag hundreds of auditory utterances for input into the platform, and to accurately and efficiently transcribe the audio files for further analysis and quality assurance. Additionally, DDD associates tagged and annotated thousands of photos, including object labeling and image quality assessment. With these data sets, Microsoft was able to develop and improve their machine learning platform to recognize hundreds of speech utterances and images.
-
Large scale photo editing and post production for the world’s leading fashion brands
Beautiful images sell products, especially online. But making beautiful images is a complicated, time-consuming and expensive endeavor. Pixelz partnered with DDD to create a cost-effective workflow to get millions of product images edited consistently, quickly and professionally.
When executives from DDD and Pixelz first met, they knew they had a shared mission. “At Pixelz, we pride ourselves in providing excellence to our clients, and we are deeply committed to empowering our employees. In DDD, we found a partner who does both,” said Thomas Kragelund, Co-Founder and CEO of Pixelz.
-
AWS Cloud Migration Services
DDD partnered with Enquizit to provide cloud migration and managed services to Enquizit’s customers. DDD’s AWS SysOps and DevOps engineers in Kenya work 24/5 to support Enquizit customers.
About the recent partnership, TC Ratnapuri, Enquizit President, said, “We hope to further dissolve barriers to socio-economic gains for motivated and intelligent youths in regions that lack the opportunity for them to realize their potential.
“Demonstrating our strong relationship with AWS and a commitment to being a socially responsible IT company, we’ll strive to get these young professionals the chance they’ve earned.”
-
AMP Robotics develops robotic systems to remove recyclable materials from conveyor belts for recovery. DDD input valuable recycling data, segmented identifiable recycling materials using color-coded polygons, and labeled thousands of different materials based on 17 categories such as plastic, paper, and aluminum. Extremely high accuracy was required to prevent materials from ending up in landfills.
Results: DDD performed QA to ensure that 100% of labels were correct to train and improve the robotic systems.
-
DDD partnered with a sports analytics company to improve their video training data by tagging and categorizing sports videos for teams to track their performance and the performance of their opponents. DDD associates also analyzed and classified player positions. With this data, DDD generated high-quality reports, analysis, and visualizations for teams to improve their performance and win more games.
Results: DDD improved the video training data, enabling teams to improve their performance and win more games.
-
DDD partnered with a startup that assists the global farming community by analyzing drone and satellite imagery of orchards to monitor crop health. DDD classified, structured, cleaned, and optimized the satellite images for use. Our associates labeled different trees, effectively teaching the AI system to identify trees. The DDD team created ML training data by cleaning up and annotating orchard crop imagery involving nearly a million trees.
Results: Data was used to detect agricultural pests and diseases in orchards.
-
iWantGreatCare is the world’s leading platform for independent patient ratings and reviews. Every month, they receive over 100,000 new reviews in multiple languages. iWantGreatCare partnered with DDD to transcribe the handwritten feedback surveys from patients. DDD transcribed, tagged, and labeled difficult-to-read, complex handwritten data to train engines for intelligent document analysis.
Results: DDD developed a document analysis tool to transcribe handwritten review cards.
-
Shrinkage, or the loss of inventory by theft, accidents, and mistakes, costs retailers billions of dollars a year in the US alone. As self-checkouts grow in popularity, there is an equally growing need to more accurately identify unscanned items that leave the store.
DDD helped our client label 25,000 images over a 5-week period
DDD and our client built a schema to track customer movements, relevant objects, and flag potential misidentification events
DDD’s dataset captured 99%+ of qualifying objects with an estimated F1 score of >0.97
Results: Our client was able to create a Proof of Concept (POC) system to begin building a new product line for this and similar retail use cases. They have been able to successfully build a pipeline around this project.
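The F1 score quoted for the dataset above is the harmonic mean of precision and recall. A minimal sketch of that calculation follows; the function name and the audit counts are hypothetical, chosen only for illustration, and are not taken from the project data.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall for a set of labels."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    # Guard against empty denominators (no predictions or no ground-truth objects).
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical audit sample: 970 correct labels, 30 spurious labels, 30 missed objects.
score = f1_score(true_positives=970, false_positives=30, false_negatives=30)
print(f"{score:.2f}")
```

With these illustrative counts, precision and recall are both 0.97, so the F1 score is also 0.97.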
-
Broadband Bill
Challenge: Broadband bills are intentionally difficult to read and parse. DDD’s client was investigating irregularities and trends in billing practices, and DDD analyzed 30,000 bills and extracted 10+ fields per bill with accuracy levels exceeding 97%.
Solution:
DDD automated Personally Identifiable Information (PII) redaction for easier file storage
DDD built data structures and dictionaries for different American broadband bills
DDD associates tagged the data for thousands of bills to create a training corpus
DDD deployed an AWS-based machine learning model to augment data extraction
Results: DDD’s client was able to perform a variety of analyses on the dataset to determine variability in pricing structures across the US. They are currently consolidating these insights to inform policy changes and educate consumers.
-
Natural Language Processing
Challenge: We helped a financial institution build a chatbot to help customers execute a growing list of commands and requests without the need for human assistance. This is a challenge because of the variety of ways in which people express their intents.
Solution:
Intent Creation - e.g., display my balance
Text Variant Collection and Validation
Tagged key entities to help drive further analysis and understanding.
Performed Relationship Tagging between entities and sentiments to understand more complex commands
Results: Over 50,000 unique, validated utterances were generated across 100+ intents. Complex utterances with multiple intents enabled a more useful and functional Proof of Concept (POC). We expanded into several other use cases for the client and were able to reuse the process for 4 other languages within 2 years.
-
Computer Vision
Challenge: Factory and industrial safety is an ever-present concern. Keeping workers safe remains a high priority, and is especially critical in high-risk work environments involving heavy machinery and equipment. Ensuring adherence to safety protocols and guidelines keeps workers safe and reduces risk for companies and their employees.
Solution:
DDD worked with our client to develop a rigorous schema to apply to CCTV images that were often complex and unclear
DDD labeled over 10,000 images within 1 month to identify workers and body parts (heads, body, etc.), PPE, and objects being carried, and grouped these instances together
The overall dataset was delivered with an F1 score exceeding 0.96
Results: Our customer was able to build and deploy a system to augment their existing safety check and protocols.
-
Data Cleanup
Challenge: A large commercial dataset vendor acquires millions of records for new and existing products, including descriptions, SKU numbers, MSRP, and units sold. This company needed to aggregate and augment the data with a focus on quality, as this was one of their product offerings.
Solution:
We trained a team of 50 individuals to understand the proper format
We identified outliers to drive updated categorization
We performed market research to understand accurate descriptions, classifications, etc., of the millions of products the client monitors
Results: The client was able to use DDD as a one-stop data shop and reduced their costs by 40% while maintaining quality. The DDD team presented edge cases and preliminary data analysis to drive new classifications on a regular basis. Our client continued to migrate their internal solutions to DDD for cost savings and additional flexibility.
Universities & Research Institutions
-
Challenge: Economic historians at Emory University and Gesellschaft für Kapitalmarktforschung were researching global financial growth throughout history. The main sources were New York and German newspapers from the late 19th and early 20th centuries, which included detailed daily stock tables from both the New York Stock Exchange and the Berlin Stock Exchange. Due to the quality of the scans, table format, and font size, the data needed to be entered manually.
Solution:
The principal investigators partnered with DDD to:
Capture and verify over two million financial data points from English and German newspapers
Clean and normalize the data to include standardized abbreviations and names
Enter the data with high accuracy and attention to detail
Results: With this data, the collaborators were able to complete their analysis and are looking forward to publishing their findings.
-
Digitizing and structuring over 2 million pages of cultural heritage and historical content
The National Digital Newspaper Program (NDNP), a partnership between the National Endowment for the Humanities and the Library of Congress, is a long-term effort to develop an internet-based, searchable database of U.S. newspapers with descriptive information and select digitization of historic pages.
DDD worked with the Library of Congress and over 10 state and local institutions to digitize more than 2 million pages of historic newspaper archives residing in their collections. DDD encoded the text into METS/ALTO XML files with article segmentation and descriptive metadata to create an enhanced browsing and searching experience. With these files, the Library of Congress was able to create a digital archive to enhance the study of American history.
-
Product migration and optimization on AWS Cloud
DuraSpace is a not-for-profit organization that stewards community-supported open source software developed by librarians, archivists, technologists, and researchers who share the goal of creating and preserving long-term access to the world’s digital heritage. DuraSpace’s applications are deployed on the AWS cloud platform.
DDD has a team of AWS certified SysOps engineers in Kenya working closely with the US-based DuraSpace team to optimize the deployment of their product suite, build test automations, and perform R&D support where the team looks at new and innovative ways to reduce costs, improve resource utilization and scaling on product deployments on the AWS platform. The DDD team built out a new architecture in AWS for the deployment of DSpaceDirect, including load balancing, use of an elastic file system, and managed database services.
DuraSpace’s Services Technical Director, Bill Branan, had an opportunity to meet two of the Cloud Academy engineers involved in the project. “Hearing the stories of where these students started their journey, and all the work they have done as part of the DDD program to establish technical proficiency, provides a startling glimpse at the wealth of talent that is available if we are only willing to look. DDD is doing an incredible job of finding students who are eager to learn and providing them with the training they need to succeed. I look forward to continued work with DDD, to further both the DuraSpace and DDD missions.”
-
DDD’s team placed stickers inside over 1,200 public buses encouraging passengers to speak up against reckless driving.
Data Collection, Baseline and Endline Surveys, and Randomized Control Trials in Africa
Georgetown University’s Initiative on Innovation, Development and Evaluation (Gui2de) was researching different interventions to improve road safety and decrease road deaths and accidents in East Africa. Beginning in Kenya, stickers were placed inside buses with motivational messages encouraging passengers to speak up directly to their driver against bad driving. Results from multiple studies showed that buses in which these stickers were placed had 25-50% fewer insurance accident claims, translating into 140 avoided accidents and 55 lives saved annually. Gui2de wanted to scale this intervention to Tanzania.
Over the course of 1 year, DDD’s field research team randomly selected, inspected, and placed stickers inside over 1,200 public buses encouraging passengers to speak up against reckless driving. DDD then collected accident data from police stations in 26 regions in Tanzania, digitizing and cleaning up the data in order to analyze and compare accident rates among vehicles with/without stickers.
Through this project, Gui2de was able to collect and analyze important data to increase the effectiveness of road safety interventions in Tanzania.
-
DDD’s field research team collected data on the financial behavior of 300 households among low-income families in Kenya.
Bankable Frontier Associates (BFA) is a global consulting firm specializing in using finance to create solutions for low-income people. BFA was conducting a study about the financial behavior of low-income households in Africa through financial data collection and the compilation of “Financial Diaries.”
DDD’s field research team worked on a landmark study to collect detailed data on the financial behavior of 300 households among low-income families in Kenya across 5 locations. DDD associates visited households every two weeks for 18 months, collecting over 500,000 transaction entries. Data included information on wages, income, spending habits, day-to-day expenses, capital expenditures for businesses, and schooling costs. With this data, BFA was able to create a collage illustrating the financial behavior of the poverty-stricken, giving researchers a vivid picture of the lives of low-income households. A similar study was conducted on a smaller scale in Tanzania, funded by CGAP.
-
DDD’s field research team distributed calendars to 36,000 households to motivate families to demand better WASH facilities.
CARE International works around the globe to save lives, defeat poverty and achieve social justice. CARE, with funding from the Bill and Melinda Gates Foundation, partnered with DDD to research and implement a School Water Sanitation Hygiene plus community action (SWASH+) project to test and evaluate alternative strategies to improve the sustainable provision of sanitation facilities in primary schools in Kenya. The implementations included a cell phone-based Education Management Information System (EMIS) to improve the collection of reliable school-level information for planning and management purposes and complementary interventions to improve delivery and management of school WASH inputs.
DDD’s field research team distributed provocative and educational calendars to 36,000 households to motivate families to demand better WASH facilities for their children. Parents were given a mobile phone application to rate and share school WASH performance, thereby exerting community-level pressure on administrators. DDD associates implemented a series of randomized interventions and tracked sanitation and hygiene benchmarks in over 350 schools to measure changes in demand for improved hygiene and sanitation through the EMIS.
With this data, CARE International was able to implement a number of complementary interventions to improve WASH provisions in schools across Kenya. A year later, DDD performed a secondary study to test and evaluate the success of the improved interventions.
Cultural Heritage
-
Data Curation and Annotation
Libraries, museums, and archives rely on their data being extracted from modern and ancient documents for research and preservation. DDD transcribed, tagged, and labeled difficult-to-read, complex handwritten data to train engines for intelligent document analysis. DDD has transcribed over 500 million characters of handwritten text annually across various document types.
Results: Historical documents are able to be digitized for broader audience reach.
-
Digital preservation, Cloud-based digital archive and collections management system for one of the world’s largest archeology and paleontology collections
The National Museums of Kenya (NMK) is the custodian of Kenya’s natural and cultural heritage. With over 10 million artifacts, fossils, and specimens, its collections represent the longest record of human evolution in the world. For years, NMK sought to preserve these rare and important collections through digital preservation to mitigate the risk of losing valuable information and records due to decay and the passage of time.
DDD is enabling NMK to achieve this objective by creating an entire digital records management, collections, and archiving system on the AWS Cloud. The DDD team is also digitizing the collections including undertaking 3D imaging, photometry, geotagging, geo-spatial analysis and training the NMK teams. Additionally, DDD is creating a virtual museum experience for the public, while providing access to the rarest of materials and artifacts to the research and academic community.
Dr. Mzalendo Kibunjia, Director General of NMK, said, “The searchable digital archive will not only be of immense value to researchers worldwide, but will also make our evolutionary history and culture more accessible to the younger generations which would foster a deeper and richer understanding of our heritage.”
-
DDD digitized and preserved the Tuol Sleng Genocide Museum archives of Cambodia, a Cambodian high-school-turned-prison.
Digital Preservation of at-risk records at the Tuol Sleng Genocide Museum in Cambodia
UNESCO partnered with DDD and The Brechin Group Inc to digitize and preserve the Tuol Sleng Genocide Museum archives of Cambodia to recognize its historical importance and increase awareness about the Cambodian genocide. The Tuol Sleng Genocide Museum, a Cambodian high school turned into a prison by the Khmer Rouge in 1975, has a collection of over 400,000 documents. The archive contains photographs of over 6,000 prisoners, as well as elaborate handwritten “confessions” to real and imagined offenses, many extracted under torture, and other biographical records of prisoners and prison guards.
Together, DDD and Brechin undertook an extensive project to digitize the collection, assessing and improving conditions for optimal preservation. After receiving indexing and technical metadata training from Brechin, DDD associates were able to digitize the fragile collection. Approximately 400,000 documents were indexed and descriptive metadata was added. The digital files were then compiled into a database in English and Khmer.
Additionally, Brechin trained the Tuol Sleng museum staff in preservation, digitization, and indexing techniques to continue preserving the collection as needed. Currently, DDD associates are building the website to host and disseminate the educational resources to thousands of online visitors.
-
DDD worked with the White House Historical Association to digitize their image collection dating back to 1962.
The White House Historical Association (WHHA) was founded in 1962 to enhance the understanding and appreciation of the Executive Mansion. From 1962 through the mid-1980s, the WHHA worked with photographers to document major events and day-to-day life in the White House. These images were preserved on 35mm slide film, remaining in storage for more than two decades. The physical slides were at risk of degradation given their age, and many had never been scanned before. In the early 2000s, WHHA switched to digital photography, generating roughly 7TB of image files. These digital files were also not being utilized, as they were stored on external hard drives and thus unsearchable.
WHHA partnered with DDD to digitize their image collections. DDD associates digitized the images, adding metadata and creating searchable files for their digital asset management (DAM) system. The digital files were then optimized and processed for Amazon Glacier cloud storage for long-term preservation.
With these digital images, the WHHA was able to create a Digital Library - launched on Amazon Snowball - to ensure public access to these unique records and to protect the physical assets by creating digital copies.
WHHA was recently awarded the Digital Preservation Award for their significant and innovative contributions to securing digital legacy.
Case Studies
Corporate
-
Microsoft is a US-based software company. Microsoft was in need of training data to develop and improve their machine learning platform to detect recorded audio and speech from video and audio clips. In addition, Microsoft was developing their photo recognition software to detect images within photos.
Microsoft partnered with DDD to verify, classify, and tag hundreds of auditory utterances to input into the platform and accurately and efficiently transcribed the audio files for further analysis and quality assurance. Additionally, DDD associates tagged and annotated thousands of photos, including object labeling and image quality assessment. With these data sets, Microsoft was able to develop and improve their machine learning platform to recognize hundreds of speech utterances and images.
-
Large scale photo editing and post production for the world’s leading fashion brands
Beautiful images sell products, especially online. But making beautiful images is a complicated, time-consuming and expensive endeavor. Pixelz partnered with DDD to create a cost-effective workflow to get millions of product images edited consistently, quickly and professionally.
When executives from DDD and Pixelz first met, they knew they had a shared mission. “At Pixelz, we pride ourselves in providing excellence to our clients, and we are deeply committed to empowering our employees. In DDD, we found a partner who does both,” said Thomas Kragelund, Co-Founder and CEO of Pixelz.
-
AWS Cloud Migration Services
DDD partnered with Enquizit to provide cloud migration and managed services to Enquizit’s customers. DDD’s AWS SysOps and DevOps engineers in Kenya work 24/5 to support Enquizit customers.
About the recent partnership, TC Ratnapuri, Enquizit President said, “We hope to further dissolve barriers to socio-economic gains for motivated and intelligent youths in regions that lack the opportunity for them to realize their potential.
Demonstrating our strong relationship with AWS and a commitment to being a socially responsible IT company, we’ll strive to get these young professionals the chance they’ve earned."
-
AMP Robotics develops robotic systems to remove recyclable materials from conveyor belts for recovery. DDD input valuable recycling data, segmented identifiable recycling materials using color-coded polygons, and labeled thousands of different materials based on 17 categories such as plastic, paper, and aluminum. Extremely high accuracy was required to prevent materials from ending up in landfills.
Results: DDD performed QA to ensure that 100% of labels were correct to train and improve the robotic systems.
-
DDD partnered with a sports analytics company to improve their video training data by tagging and categorizing sports videos for teams to track their performance and the performance of their opponents. DDD associates also analyzed and classified player positions. With this data, DDD generated high-quality reports, analysis, and visualizations for teams to improve their performance and win more games.
Results: DDD’s improved video training data to enable teams to improve their performance and win more games.
-
DDD partnered with a startup that assists the global farming community by analyzing drone and satellite imagery of orchards to monitor crop health. DDD classified, structured, cleaned, and optimized the satellite images for use. Our associates labeled different trees, effectively teaching the AI system to identify trees. The DDD team created ML training data by cleaning up and annotating orchard crop imagery involving nearly a million trees.
Results: Data was used to detect agricultural pests and diseases in orchards.
-
iWantGreatCare is the world’s leading platform for independent patient ratings and reviews. Every month, they receive over 100,000 new reviews in multiple languages. iWantGreatCare partnered with DDD to transcribe the handwritten feedback surveys from patients. DDD transcribed, tagged, and labeled difficult-to-read, complex handwritten data to train engines for intelligent document analysis.
Results: DDD developed a document analysis tool to transcribe handwritten review cards.
-
Spillage, or the loss of inventory by theft, accidents, and mistakes, costs retailers billions of dollars a year in the US alone. As self-checkouts grow in popularity, there is an equally growing need to help increase accurate identification of unscanned items that leave the store.
DDD helped our client label 25,000 images over a 5-week period
DDD and our client built a schema to track customer movements, relevant objects, and flag potential misidentification events
DDD’s dataset captured 99%+ of qualifying objects with an estimated F1 score of >0.97
Results: Our client was able to create a Proof of Concept (POC) system to begin building a new product line for this and similar retail use cases. They have been able to successfully build a pipeline around this project.
-
Broadband Bill
Challenge: Broadband bills are intentionally difficult to read and parse. DDD’s client was investigating irregularities and trends in billing practices, and DDD analyzed 30,000 bills and extracted 10+ fields per bill with accuracy levels exceeding 97%.
Solution:
DDD automated Personal Identifiable Information (PII) redaction for easier file storage DDD built data structures and dictionaries for different American broadband bills DDD associates tagged the data for thousands of bills to create a training corpus DDD deployed an AWS-based machine learning model to augment data extraction.
Results: DDD’s client was able to perform a variety of analyses on the dataset to determine variability in pricing structures across the US. They are currently consolidating these insights to inform policy changes and educate consumers.
-
Natural Language Processing
Challenge: We helped a financial institution build a chatbot to help customers execute a growing list of commands and requests without the need for human assistance. This is a challenge because of the variety of ways in which people express their intents.
Solution:
Intent Creation - i.e. display my balance
Text Variant Collection and Validation
Tagged key entities to help drive further analysis and understanding.
Performed Relationship Tagging between entities and sentiments to understand more complex commands
Results: Over 50,000 unique, validated utterances were generated across 100+ intents. Complex utterances with multiple intents enabled a more useful and functional Proof of Concept (POC). We expanded into several other use cases for the client and were able to reuse the process for 4 other languages within 2 years.
-
Computer Vision
Challenge: Factory and Industrial Safety is an ever- present concern. Keeping workers safe remains a high priority, and is especially critical in high-risk work environments involving heavy machinery and equipment. Ensuring adherence to safety protocols and guidelines keeps workers safe and reduces risk for companies and their employees.
Solution:
DDD worked with our client to develop a rigorous schema to apply to CCTV images that were often complex and unclear
DDD labeled over 10,000 images within 1 month to identify workers and body parts (heads, body, etc), PPE, and objects being carried, as well as grouped these instances together
The overall dataset was delivered with an F1 score exceeding 0.96
Results: Our customer was able to build and deploy a system to augment their existing safety check and protocols.
-
Data Cleanup
Challenge: A large commercial dataset vendor acquires millions of records for new and existing products, including descriptions, SKU numbers, MSRP, and units sold. This company needed to aggregate and augment the data with a focus on quality, as this was one of their product offerings.
Solution:
We trained a team of 50 individuals to understand the proper format
We identified outliers to drive updated categorization
We performed market research to understand accurate descriptions, classifications, etc, of the millions of products the client monitors
Results: The client was able to use DDD as a one-stop data shop and reduced their costs by 40% while maintaining quality. The DDD team presented edge cases and preliminary data analysis to drive new classifications on a regular basis. Our client continued to migrate their internal solutions to DDD for cost savings and additional flexibility.
universities & research institutions
-
Challenge: Economic historians at Emory University and Gesellschaft für Kapitalmarktforschung were researching global financial growth throughout history. The main sources were New York and German newspapers from the late 19th and early 20th century, which included detailed daily stock tables from both the New York Stock Exchange as well as the Berlin Stock Exchange. Due to the quality of the scans, table format and font size, the data needed to be entered manually.
Solution:
The principal investigators partnered with DDD to:
Capture and verify over two million financial data points from English and German newspapers
Clean and normalize the data to include standardized abbreviations and names
Enter the data with high accuracy and attention to detail
Results: With this data, the collaborators were able to complete their analysis and are looking forward to publishing their findings.
-
Digitizing and structuring over 2 million pages of cultural heritage and historical content
The National Digital Newspaper Program (NDNP), a partnership between the National Endowment for the Humanities and the Library of Congress, is a long-term effort to develop an internet-based, searchable database of U.S. newspapers with descriptive information and select digitization of historic pages.
DDD worked with the Library of Congress and over 10 state and local institutions to digitize more than 2 million pages of historic newspaper archives residing in their collections. DDD encoded the text into METS/ALTO XML files with article segmentation and descriptive metadata to create an enhanced browsing and searching experience. With these files, the Library of Congress was able to create a digital archive to enhance the study of American history.
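To give a sense of what the ALTO side of such files looks like, the sketch below reads recognized text back out of a small ALTO fragment. ALTO stores each recognized word in a String element with a CONTENT attribute, grouped into TextLine elements. The sample fragment, its namespace version, and the function names are assumptions made for illustration; they are not taken from the NDNP deliverables.

```python
import xml.etree.ElementTree as ET

# Invented sample fragment; real files also carry coordinates, styles, and IDs.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="TB1">
      <TextLine><String CONTENT="Daily"/><SP/><String CONTENT="Gazette"/></TextLine>
      <TextLine><String CONTENT="Stock"/><SP/><String CONTENT="prices"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text: str) -> list[str]:
    """Return the text of each TextLine, joining its String contents with spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.attrib.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))  # prints ['Daily Gazette', 'Stock prices']
```

Text extracted this way is what a search index would typically consume, while the METS wrapper describes how pages and articles fit together.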
-
Product migration and optimization on AWS Cloud
DuraSpace is a not-for-profit organization that stewards community-supported open source software developed by librarians, archivists, technologists, and researchers who share the goal of creating and preserving long-term access to the world’s digital heritage. DuraSpace’s applications are deployed on the AWS cloud platform.
DDD has a team of AWS-certified SysOps engineers in Kenya working closely with the US-based DuraSpace team to optimize the deployment of their product suite, build test automation, and provide R&D support, exploring new ways to reduce costs and improve resource utilization and scaling for product deployments on the AWS platform. The DDD team built out a new architecture in AWS for the deployment of DSpaceDirect, including load balancing, use of an elastic file system, and managed database services.
DuraSpace’s Services Technical Director, Bill Branan, had an opportunity to meet two of the Cloud Academy engineers involved in the project. “Hearing the stories of where these students started their journey, and all the work they have done as part of the DDD program to establish technical proficiency, provides a startling glimpse at the wealth of talent that is available if we are only willing to look. DDD is doing an incredible job of finding students who are eager to learn and providing them with the training they need to succeed. I look forward to continued work with DDD, to further both the DuraSpace and DDD missions.”
-
DDD’s team placed stickers inside over 1,200 public buses encouraging passengers to speak up against reckless driving.
Data Collection, Baseline and Endline Surveys, and Randomized Control Trials in Africa
Georgetown University’s Initiative on Innovation, Development and Evaluation (Gui2de) was researching different interventions to improve road safety and decrease road deaths and accidents in East Africa. Beginning in Kenya, stickers were placed inside buses with motivational messages encouraging passengers to speak up directly to their driver against bad driving. Results from multiple studies showed that buses in which these stickers were placed had 25-50% fewer insurance accident claims, translating into 140 avoided accidents and 55 lives saved annually. Gui2de wanted to scale this intervention to Tanzania.
Over the course of 1 year, DDD’s field research team randomly selected, inspected, and placed stickers inside over 1,200 public buses encouraging passengers to speak up against reckless driving. DDD then collected accident data from police stations in 26 regions in Tanzania, digitizing and cleaning up the data in order to analyze and compare accident rates among vehicles with and without stickers.
Through this project, Gui2de was able to collect and analyze important data to increase the effectiveness of road safety interventions in Tanzania.
-
DDD’s field research team collected data on the financial behavior of 300 households among low-income families in Kenya.
Bankable Frontier Associates (BFA) is a global consulting firm specializing in using finance to create solutions for low-income people. BFA was conducting a study about the financial behavior of low-income households in Africa through financial data collection and the compilation of “Financial Diaries.”
DDD’s field research team worked on a landmark study to collect detailed data on the financial behavior of 300 low-income households across 5 locations in Kenya. DDD associates visited households every two weeks for 18 months, collecting over 500,000 transaction entries. Data included information on wages, income, spending habits, day-to-day expenses, capital expenditures for businesses, and schooling costs. With this data, BFA was able to create a collage illustrating the financial behavior of the poverty-stricken, giving researchers a vivid picture of the lives of low-income households. A similar study was conducted on a smaller scale in Tanzania, funded by CGAP.
-
DDD’s field research team distributed calendars to 36,000 households to motivate families to demand better WASH facilities.
CARE International works around the globe to save lives, defeat poverty and achieve social justice. CARE, with funding from the Bill and Melinda Gates Foundation, partnered with DDD to research and implement a School Water Sanitation Hygiene plus community action (SWASH+) project to test and evaluate alternative strategies to improve the sustainable provision of sanitation facilities in primary schools in Kenya. The implementations included a cell phone-based Education Management Information System (EMIS) to improve the collection of reliable school-level information for planning and management purposes and complementary interventions to improve delivery and management of school WASH inputs.
DDD’s field research team distributed provocative and educational calendars to 36,000 households to motivate families to demand better WASH facilities for their children. Parents were given a mobile phone application to rate and share school WASH performance, thereby exerting community-level pressure on administrators. DDD associates implemented a series of randomized interventions and tracked sanitation and hygiene benchmarks in over 350 schools to measure changes in demand for improved hygiene and sanitation through the EMIS.
With this data, CARE International was able to implement a number of complementary interventions to improve WASH provisions in schools across Kenya. A year later, DDD performed a secondary study to test and evaluate the success of the improved interventions.
CULTURAL HERITAGE
-
Data Curation and Annotation
Libraries, museums, and archives rely on their data being extracted from modern and ancient documents for research and preservation. DDD transcribed, tagged, and labeled difficult-to-read, complex handwritten data to train engines for intelligent document analysis. DDD has transcribed over 500 million characters of handwritten text annually across various document types.
Results: Historical documents can be digitized to reach a broader audience.
-
Digital preservation, Cloud-based digital archive and collections management system for one of the world’s largest archeology and paleontology collections
The National Museums of Kenya (NMK) is the custodian of Kenya’s natural and cultural heritage. With over 10 million artifacts, fossils, and specimens, its collections represent the longest record of human evolution in the world. For years, NMK sought to preserve these rare and important collections through digital preservation to mitigate the risk of losing valuable information and records due to decay and the passage of time.
DDD is enabling NMK to achieve this objective by creating an entire digital records management, collections, and archiving system on the AWS Cloud. The DDD team is also digitizing the collections, including 3D imaging, photometry, geotagging, and geospatial analysis, and training the NMK teams. Additionally, DDD is creating a virtual museum experience for the public, while providing access to the rarest of materials and artifacts to the research and academic community.
Dr. Mzalendo Kibunjia, Director General of NMK, said, “The searchable digital archive will not only be of immense value to researchers worldwide, but will also make our evolutionary history and culture more accessible to the younger generations which would foster a deeper and richer understanding of our heritage.”
-
DDD digitized and preserved the archives of Cambodia’s Tuol Sleng Genocide Museum, a former high school turned prison.
Digital Preservation of at-risk records at the Tuol Sleng Genocide Museum in Cambodia
UNESCO partnered with DDD and The Brechin Group Inc to digitize and preserve the Tuol Sleng Genocide Museum archives of Cambodia to recognize its historical importance and increase awareness about the Cambodian genocide. The Tuol Sleng Genocide Museum, a Cambodian high school turned into a prison by the Khmer Rouge in 1975, has a collection of over 400,000 documents. The archive contains photographs of over 6,000 prisoners, as well as elaborate handwritten “confessions” to real and imagined offenses, many extracted under torture, and other biographical records of prisoners and prison guards.
Together, DDD and Brechin undertook an extensive project to digitize the collection, assessing and improving conditions for optimal preservation. After receiving indexing and technical metadata training from Brechin, DDD associates were able to digitize the fragile collection. Approximately 400,000 documents were indexed and descriptive metadata was added. The digital files were then compiled into a database in English and Khmer.
Additionally, Brechin trained the Tuol Sleng museum staff in preservation, digitization, and indexing techniques to continue preserving the collection as needed. Currently, DDD associates are building the website to host and disseminate the educational resources to thousands of online visitors.
-
DDD worked with the White House Historical Association to digitize their image collection dating back to 1962.
The White House Historical Association (WHHA) was founded in 1962 to enhance the understanding and appreciation of the Executive Mansion. From 1962 through the mid-1980s, the WHHA worked with photographers to document major events and day-to-day life in the White House. These images were preserved on 35mm slide film, remaining in storage for more than two decades. The physical slides were at risk of degradation given their age, and many had never been scanned before. In the early 2000s, WHHA switched to digital photography, generating roughly 7TB of image files. These digital files were also not being utilized, as they were stored on external hard drives and thus unsearchable.
WHHA partnered with DDD to digitize their image collections. DDD associates digitized the images, adding metadata and creating searchable files for their digital asset management (DAM) system. The digital files were then optimized and processed for Amazon Glacier cloud storage for long-term preservation.
With these digital images, the WHHA was able to create a Digital Library - launched on Amazon Snowball - to ensure public access to these unique records and to protect the physical assets by creating digital copies.
WHHA was recently awarded the Digital Preservation Award for their significant and innovative contributions to securing digital legacy.
Case Studies
Corporate
-
Microsoft is a US-based software company. Microsoft was in need of training data to develop and improve their machine learning platform to detect recorded audio and speech from video and audio clips. In addition, Microsoft was developing their photo recognition software to detect images within photos.
Microsoft partnered with DDD to verify, classify, and tag hundreds of auditory utterances for input into the platform, and to accurately and efficiently transcribe the audio files for further analysis and quality assurance. Additionally, DDD associates tagged and annotated thousands of photos, including object labeling and image quality assessment. With these data sets, Microsoft was able to develop and improve their machine learning platform to recognize hundreds of speech utterances and images.
-
Large scale photo editing and post production for the world’s leading fashion brands
Beautiful images sell products, especially online. But making beautiful images is a complicated, time-consuming and expensive endeavor. Pixelz partnered with DDD to create a cost-effective workflow to get millions of product images edited consistently, quickly and professionally.
When executives from DDD and Pixelz first met, they knew they had a shared mission. “At Pixelz, we pride ourselves in providing excellence to our clients, and we are deeply committed to empowering our employees. In DDD, we found a partner who does both,” said Thomas Kragelund, Co-Founder and CEO of Pixelz.
-
AWS Cloud Migration Services
DDD partnered with Enquizit to provide cloud migration and managed services to Enquizit’s customers. DDD’s AWS SysOps and DevOps engineers in Kenya work 24/5 to support Enquizit customers.
About the recent partnership, TC Ratnapuri, Enquizit President, said, “We hope to further dissolve barriers to socio-economic gains for motivated and intelligent youths in regions that lack the opportunity for them to realize their potential. Demonstrating our strong relationship with AWS and a commitment to being a socially responsible IT company, we’ll strive to get these young professionals the chance they’ve earned.”
-
AMP Robotics develops robotic systems to remove recyclable materials from conveyor belts for recovery. DDD input valuable recycling data, segmented identifiable recycling materials using color-coded polygons, and labeled thousands of different materials based on 17 categories such as plastic, paper, and aluminum. Extremely high accuracy was required to prevent materials from ending up in landfills.
Results: DDD performed QA to ensure that 100% of labels were correct to train and improve the robotic systems.
-
DDD partnered with a sports analytics company to improve their video training data by tagging and categorizing sports videos for teams to track their performance and the performance of their opponents. DDD associates also analyzed and classified player positions. With this data, DDD generated high-quality reports, analysis, and visualizations for teams to improve their performance and win more games.
Results: DDD’s improved video training data enabled teams to improve their performance and win more games.
-
DDD partnered with a startup that assists the global farming community by analyzing drone and satellite imagery of orchards to monitor crop health. DDD classified, structured, cleaned, and optimized the satellite images for use. Our associates labeled different trees, effectively teaching the AI system to identify trees. The DDD team created ML training data by cleaning up and annotating orchard crop imagery involving nearly a million trees.
Results: Data was used to detect agricultural pests and diseases in orchards.
-
iWantGreatCare is the world’s leading platform for independent patient ratings and reviews. Every month, they receive over 100,000 new reviews in multiple languages. iWantGreatCare partnered with DDD to transcribe the handwritten feedback surveys from patients. DDD transcribed, tagged, and labeled difficult-to-read, complex handwritten data to train engines for intelligent document analysis.
Results: DDD developed a document analysis tool to transcribe handwritten review cards.
-
Shrinkage, or the loss of inventory by theft, accidents, and mistakes, costs retailers billions of dollars a year in the US alone. As self-checkouts grow in popularity, there is an equally growing need to more accurately identify unscanned items that leave the store.
DDD helped our client label 25,000 images over a 5-week period
DDD and our client built a schema to track customer movements, relevant objects, and flag potential misidentification events
DDD’s dataset captured 99%+ of qualifying objects with an estimated F1 score of >0.97
Results: Our client was able to create a Proof of Concept (POC) system to begin building a new product line for this and similar retail use cases. They have been able to successfully build a pipeline around this project.
-
Broadband Bill
Challenge: Broadband bills are intentionally difficult to read and parse. DDD’s client was investigating irregularities and trends in billing practices, and DDD analyzed 30,000 bills and extracted 10+ fields per bill with accuracy levels exceeding 97%.
Solution:
DDD automated Personally Identifiable Information (PII) redaction for easier file storage
DDD built data structures and dictionaries for different American broadband bills
DDD associates tagged the data for thousands of bills to create a training corpus
DDD deployed an AWS-based machine learning model to augment data extraction
Results: DDD’s client was able to perform a variety of analyses on the dataset to determine variability in pricing structures across the US. They are currently consolidating these insights to inform policy changes and educate consumers.
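As a rough illustration of what automated PII redaction can involve, the sketch below masks a few common identifiers with regular expressions before a bill's text is stored. The patterns, placeholder tokens, and sample text are hypothetical and deliberately simple; the approach DDD actually used is not described here, and production redaction usually needs far broader coverage and human review.

```python
import re

# Hypothetical patterns for illustration: (compiled pattern, replacement).
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"), "[PHONE]"),
    # Keep the field label (group 1) and mask only the digits that follow it.
    (re.compile(r"(account number:\s*)\d+", re.IGNORECASE), r"\g<1>[ACCOUNT]"),
]


def redact(text: str) -> str:
    """Apply each redaction pattern in turn and return the masked text."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


sample = "Account Number: 123456789, contact jane.doe@example.com or 555-123-4567."
print(redact(sample))
# prints: Account Number: [ACCOUNT], contact [EMAIL] or [PHONE].
```

Keeping the field label while masking its value leaves the redacted bill usable for downstream field extraction.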
-
Natural Language Processing
Challenge: We helped a financial institution build a chatbot to help customers execute a growing list of commands and requests without the need for human assistance. This is a challenge because of the variety of ways in which people express their intents.
Solution:
Intent Creation - e.g., “display my balance”
Text Variant Collection and Validation
Tagged key entities to help drive further analysis and understanding.
Performed Relationship Tagging between entities and sentiments to understand more complex commands
Results: Over 50,000 unique, validated utterances were generated across 100+ intents. Complex utterances with multiple intents enabled a more useful and functional Proof of Concept (POC). We expanded into several other use cases for the client and were able to reuse the process for 4 other languages within 2 years.
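One way to picture the resulting training data is as utterances that each carry one or more intents plus tagged entity spans. The sketch below is an assumed, simplified representation; the class names, intent labels, and validation rule are hypothetical and do not reflect the client's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    label: str
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity


@dataclass
class Utterance:
    text: str
    intents: list[str]
    entities: list[Entity] = field(default_factory=list)

    def tag(self, label: str, value: str) -> None:
        """Tag the first occurrence of `value`; raises ValueError if absent."""
        start = self.text.index(value)
        self.entities.append(Entity(label, start, start + len(value)))

    def entity_text(self, entity: Entity) -> str:
        return self.text[entity.start:entity.end]


def is_valid(utterance: Utterance, known_intents: set[str]) -> bool:
    """Accept an utterance only if its intents are known and its spans are in range."""
    intents_ok = bool(utterance.intents) and all(
        intent in known_intents for intent in utterance.intents
    )
    spans_ok = all(
        0 <= e.start < e.end <= len(utterance.text) for e in utterance.entities
    )
    return intents_ok and spans_ok


# Hypothetical intent inventory and a multi-intent utterance.
KNOWN_INTENTS = {"display_balance", "transfer_funds"}
utterance = Utterance(
    "Show my balance and transfer $50 to savings",
    intents=["display_balance", "transfer_funds"],
)
utterance.tag("amount", "$50")
utterance.tag("account", "savings")
print([utterance.entity_text(e) for e in utterance.entities])  # prints ['$50', 'savings']
print(is_valid(utterance, KNOWN_INTENTS))  # prints True
```

Storing entities as character offsets rather than copied strings keeps each tag tied to its exact position in the text, which matters when the same word appears more than once.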