FinMarketNews

How synthetic data could save AI

by chris
March 21, 2021

AI is facing several critical challenges. Not only does it need huge amounts of data to deliver accurate results, but it also needs to be able to ensure that data isn’t biased, and it needs to comply with increasingly restrictive data privacy regulations. We have seen several solutions proposed over the last couple of years to address these challenges — including various tools designed to identify and reduce bias, tools that anonymize user data, and programs to ensure that data is only collected with user consent. But each of these solutions is facing challenges of its own.

Now we’re seeing a new industry emerge that promises to be a saving grace: synthetic data. Synthetic data is artificial, computer-generated data that can stand in for data obtained from the real world.

A synthetic dataset must have the same mathematical and statistical properties as the real-world dataset it is replacing but does not explicitly represent real individuals. Think of this as a digital mirror of real-world data that is statistically reflective of that world. This enables training AI systems in a completely virtual realm. And it can be readily customized for a variety of use cases ranging from healthcare to retail, finance, transportation, and agriculture.
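To make "same statistical properties, no real individuals" concrete, here is a minimal sketch using only Python's standard library. It assumes a single numeric column that is roughly normally distributed; the `real_amounts` values and the helper names `fit_gaussian` and `synthesize` are invented for illustration. Commercial synthetic data tools model joint distributions across many columns, which this sketch does not attempt.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and standard deviation of a numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def synthesize(values, n, seed=0):
    """Draw n synthetic values from a normal distribution fitted to `values`.

    No synthetic value is copied from a real record; only the two summary
    statistics are carried over.
    """
    mu, sigma = fit_gaussian(values)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up "real" column, e.g. purchase amounts (illustrative values only).
real_amounts = [12.5, 48.0, 33.2, 27.9, 51.3, 19.8, 40.1, 36.6]
synthetic_amounts = synthesize(real_amounts, n=5000)

real_mu, real_sigma = fit_gaussian(real_amounts)
syn_mu, syn_sigma = fit_gaussian(synthetic_amounts)
print(f"real:      mean={real_mu:.2f} stdev={real_sigma:.2f}")
print(f"synthetic: mean={syn_mu:.2f} stdev={syn_sigma:.2f}")
```

The two printed lines should show closely matching means and standard deviations, even though none of the 5,000 synthetic rows corresponds to a real purchase.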

There’s significant movement happening on this front. More than 50 vendors have already developed synthetic data solutions, according to research last June by StartUs Insights. I will outline some of the leading players in a moment. First, though, let’s take a closer look at the problems they’re promising to solve.

The trouble with real data

Over the last few years, there has been increasing concern about how inherent biases in datasets can unwittingly lead to AI algorithms that perpetuate systemic discrimination. In fact, Gartner predicts that through 2022, 85% of AI projects will deliver erroneous outcomes due to bias in data, algorithms, or the teams responsible for managing them.

The proliferation of AI algorithms has also led to growing concerns over data privacy. In turn, this has led to stronger consumer data privacy and protection laws, including GDPR in the EU and state laws in U.S. jurisdictions such as California and, most recently, Virginia.

These laws give consumers more control over their personal data. For example, the Virginia law grants consumers the right to access, correct, delete, and obtain a copy of their personal data. It also lets them opt out of the sale of that data and deny algorithmic access to it for the purposes of targeted advertising or consumer profiling.

Restricting access to this information buys a certain amount of individual protection, but at the cost of the algorithm’s effectiveness. In general, the more data an AI algorithm can train on, the more accurate and effective its results will be. Without access to ample data, the upsides of AI, such as assisting with medical diagnoses and drug research, could also be limited.

One alternative often used to offset privacy concerns is anonymization. Personal data can be anonymized by masking or eliminating identifying characteristics, for example by removing names and credit card numbers from ecommerce transactions or removing identifying content from healthcare records. But there is growing evidence that even if data from one source has been anonymized, it can be correlated with consumer datasets exposed in security breaches. In fact, by combining data from multiple sources, it is possible to form a surprisingly clear picture of our identities even if there has been a degree of anonymization. In some instances, this can even be done by correlating data from public sources, without a nefarious security hack.
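The correlation risk described above is often called a linkage attack. The following is a small sketch of the idea, not a description of any specific incident: every record is invented, and the choice of ZIP code, birth year, and sex as the matching fields (the `QUASI_IDENTIFIERS` tuple) is an assumption made for illustration.

```python
# All records below are invented for illustration.
anonymized_health = [
    {"zip": "30301", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip": "30301", "birth_year": 1991, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "30309", "birth_year": 1984, "sex": "F", "diagnosis": "migraine"},
]
public_directory = [
    {"name": "Alice Example", "zip": "30301", "birth_year": 1984, "sex": "F"},
    {"name": "Bob Example", "zip": "30301", "birth_year": 1991, "sex": "M"},
    {"name": "Carol Example", "zip": "30309", "birth_year": 1975, "sex": "F"},
]

# Fields that are not names, yet can single a person out when combined.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def link_records(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Re-identify anonymized rows whose key fields match exactly one public row."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person)
    matches = []
    for row in anonymized:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match pins the row to one person
            matches.append((candidates[0]["name"], row["diagnosis"]))
    return matches


reidentified = link_records(anonymized_health, public_directory)
print(reidentified)
```

Two of the three "anonymized" rows are tied back to a named person here, although the health dataset itself contains no names.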

Synthetic data’s solution

Synthetic data promises to deliver the advantages of AI without the downsides. Not only does it take our real personal data out of the equation, but a general goal for synthetic data is to perform better than real-world data by correcting bias that is often ingrained in the real world.

Although ideal for applications that use personal data, synthetic information has other use cases, too. One example is complex computer vision modeling where many factors interact in real time. Synthetic video datasets leveraging advanced gaming engines can be created with hyper-realistic imagery to portray all the possible eventualities in an autonomous driving scenario, whereas trying to shoot photos or videos of the real world to capture all these events would be impractical, maybe impossible, and likely dangerous. These synthetic datasets can dramatically speed up and improve training of autonomous driving systems.

(Image: Synthetic images are used to train autonomous vehicle algorithms. Source: synthetic data provider Parallel Domain.)

Perhaps ironically, one of the primary tools for building synthetic data is the same one used to create deepfake videos. Both make use of generative adversarial networks (GANs), each of which is a pair of neural networks. One network, the generator, produces the synthetic data, and the second, the discriminator, tries to detect whether it is real. The two are trained in a loop, with the generator improving the quality of its data until the discriminator cannot tell the difference between real and synthetic.
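The generator-versus-discriminator loop can be sketched in a few lines of plain Python. This is a toy under strong assumptions, not how production systems are built: both "networks" are single linear units with hand-derived gradients, the "real" data is assumed to follow a normal distribution with mean 4, and the names `train_toy_gan`, `a`, `b`, `w`, and `c` are illustrative. A linear discriminator can only separate real from fake by location, and simple alternating updates like these are known to oscillate rather than settle, so the sketch shows the structure of the loop rather than a reliable training recipe.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=500, lr=0.01, seed=0):
    """Alternate discriminator and generator updates on 1-D data.

    Generator:      G(z) = a * z + b, with z ~ N(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c), the estimated P(x is real)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters
    w, c = 0.0, 0.0   # discriminator parameters
    for _ in range(steps):
        x_real = rng.gauss(4.0, 1.0)   # assumed "real" distribution
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
        grad_c = (d_real - 1.0) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimize -log D(G(z)) so fakes look real to D.
        d_fake = sigmoid(w * x_fake + c)
        grad_a = (d_fake - 1.0) * w * z
        grad_b = (d_fake - 1.0) * w
        a -= lr * grad_a
        b -= lr * grad_b
    return {"a": a, "b": b, "w": w, "c": c}


params = train_toy_gan()
print(params)
```

Each pass updates the discriminator to tell real from generated samples, then updates the generator using the discriminator's feedback. Real GANs replace the linear units with deep networks and rely on an automatic differentiation library instead of hand-written gradients.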

The emerging ecosystem

Forrester Research recently identified several critical technologies, including synthetic data, that will make up what it calls “AI 2.0”: advances that radically expand AI possibilities. By more completely anonymizing data and correcting for inherent biases, as well as creating data that would otherwise be difficult to obtain, synthetic data could become the saving grace for many big data applications.

Synthetic data also comes with some other big benefits: You can create datasets quickly and often with the data labeled for supervised learning. And it does not need to be cleaned and maintained the way real data does. So, theoretically at least, it comes with some large time and cost savings.
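The point about labels is easiest to see in code: because the generator decides what each record is before creating it, the label comes for free. The sketch below is hypothetical throughout. The function name `make_labeled_transactions`, the fraud rate, and the amount and hour ranges are invented for illustration and are not fitted to any real data.

```python
import random


def make_labeled_transactions(n, fraud_rate=0.1, seed=0):
    """Generate n synthetic transaction rows; label 1 marks fraud.

    The distributions are invented for illustration, not fitted to real data.
    """
    rng = random.Random(seed)
    n_fraud = round(n * fraud_rate)
    rows = []
    for i in range(n):
        is_fraud = 1 if i < n_fraud else 0
        if is_fraud:
            amount = round(rng.uniform(500.0, 5000.0), 2)
            hour = rng.randint(0, 5)    # assumed: fraud clusters at night
        else:
            amount = round(rng.uniform(5.0, 300.0), 2)
            hour = rng.randint(7, 22)
        # The label is known by construction, so no manual annotation is needed.
        rows.append({"amount": amount, "hour": hour, "label": is_fraud})
    rng.shuffle(rows)
    return rows


dataset = make_labeled_transactions(1000, fraud_rate=0.1)
n_labeled_fraud = sum(row["label"] for row in dataset)
print(n_labeled_fraud, "of", len(dataset), "rows are labeled fraud")
```

The same property lets a generator oversample rare cases on demand, which is hard to do when labels have to be collected from real events.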

Several well-established companies are among those that generate synthetic data. IBM describes this as data fabrication, creating synthetic test data to eliminate the risk of confidential information leakage and address GDPR and regulatory issues. AWS has developed in-house synthetic data tools to generate datasets for training Alexa on new languages. And Microsoft, in collaboration with Harvard, has developed a tool with a synthetic data capability that allows for increased collaboration between research parties. Notwithstanding these examples, it is still early days for synthetic data, and the developing market is being led by startups.

To wrap up, let’s take a look at some of the early leaders in this emerging industry. The list is based on my own research and on reports from industry research organizations, including G2 and StartUs Insights.

  1. AiFi: Uses synthetically generated data to simulate retail stores and shopper behavior.
  2. AI.Reverie: Generates synthetic data to train computer vision algorithms for activity recognition, object detection, and segmentation. Work has included wide-scope scenes like smart cities, rare plane identification, and agriculture, along with smart-store retail.
  3. Anyverse: Simulates scenarios to create synthetic datasets using raw sensor data, image processing functions, and custom LiDAR settings for the automotive industry.
  4. Cvedia: Creates synthetic images that simplify the sourcing of large volumes of labeled, real, and visual data. The simulation platform employs multiple sensors to synthesize photo-realistic environments resulting in empirical dataset creation.
  5. DataGen: Generates synthetic data for interior-environment use cases, like smart stores, in-home robotics, and augmented reality.
  6. Diveplane: Creates synthetic ‘twin’ datasets for the healthcare industry with the same statistical properties as the original data.
  7. Gretel: Aiming to be the GitHub equivalent for data, the company produces synthetic datasets for developers that retain the same insights as the original data source.
  8. Hazy: Generates datasets to boost fraud and money laundering detection to combat financial crime.
  9. Mostly AI: Focuses on insurance and finance sectors and was one of the first companies to create synthetic structured data.
  10. OneView: Develops virtual synthetic datasets for analysis of earth observation imagery by machine learning algorithms.

Gary Grossman is the Senior VP of Technology Practice at Edelman and Global Lead of the Edelman AI Center of Excellence.

VentureBeat


