Introduction
Emerging technologies like artificial intelligence (AI) are changing the way humans interact with machines. As AI has made huge progress over the last decade, the processing of modalities such as text, voice, image, and video has shifted to data-driven large AI models. These models were primarily designed to let machines comprehend various kinds of data and perform tasks without human intervention. Now, with the emergence of generative AI like ChatGPT, these models are capable of generating data such as text, voice, images, or video. Policymakers across the globe are struggling to draft legislation that governs the ethical use of data and regulates the creation of safe, secure, and trustworthy AI models.
Data privacy is a major concern with the advent of AI technology. Actions by the US Congress, such as the proposed American Privacy Rights Act, aim to enforce strict data privacy rights. With emerging AI applications aimed at children, protecting children’s privacy and safeguarding their personal information pose a further legislative challenge.
Why is children’s voice privacy important?
Congress must act to protect children’s voice privacy before it’s too late. Companies that store children’s voice recordings and use them for profit-driven applications (or advertising) without parental consent pose serious privacy threats to children and families. The proposed revisions to the Children’s Online Privacy Protection Act (COPPA) aim to restrict companies’ capacity to profit from children’s data and transfer the responsibility of compliance from parents to companies. However, several measures in the proposed legislation need more clarity and additional guidelines.
Challenge and Opportunity
Human voice is one of the most popular modalities for AI technology. Advancements in voice AI, such as the voice assistants in smartphones (Siri, Google, Bixby, Alexa, etc.), have made many day-to-day activities easier; however, there are also emerging threats from voice AI and a lack of regulations governing voice data and voice AI applications. One example is AI voice impersonation scams. Using the latest voice AI technology, a high-quality personalized voice recording can be generated from as little as 15 seconds of the speaker’s recorded voice. A technology rat race among Big Tech has begun, as companies try to achieve the same result from recordings only a few seconds long. Scammers have increasingly been using this technology for their benefit. OpenAI, the creator of ChatGPT, recently developed a product called Voice Engine but refrained from commercializing it, acknowledging that the technology poses “serious risks,” especially in an election year.
A voice recording contains deeply personal information about a speaker and can be used to pick out a target speaker from recordings of multiple speakers. Emerging research in voice AI suggests that recordings can reveal medical and health-related conditions as well as attributes such as age and height. When cloud-based applications are used, privacy concerns also arise during voice data transfer and from data storage leaks caused by noncompliance with data collection and storage requirements. The threats from misuse of voice data and voice AI technology are therefore enormous.
What are the current legislative measures?
Social media services, educational technology, online games, and smart toys are just a few of the services for children that have started adopting voice technology (e.g., Alexa for Kids). Any service operator (or company) collecting and using children’s personal information, including their voice, is bound by the Children’s Online Privacy Protection Act (COPPA). The Federal Trade Commission (FTC) is the federal agency that enforces COPPA. However, several companies have recently violated COPPA by collecting personal information from children without parental consent and using it for advertising and maximizing their platform profits. “Amazon’s history of misleading parents, keeping children’s recordings indefinitely, and flouting parents’ deletion requests violated COPPA and sacrificed privacy for profits,” said Samuel Levine of the FTC’s Bureau of Consumer Protection. The FTC alleges that Amazon maintained records of children’s data in disregard of parents’ deletion requests and trained its voice AI algorithms on that data.
Children’s speech characteristics differ from those of adults; thus, developing voice AI technology for children is more challenging. Most commercial voice-AI-enabled services work smoothly for adults, but their accuracy in understanding children’s voices is often limited. Another challenge is the relatively sparse availability of children’s voice data for training AI models. Big Tech is therefore looking for ways to acquire as much children’s voice data as possible to train voice AI models. This challenge is prevalent not only in industry but also in academic research, owing to very limited data availability and children’s varying speaking skills. However, misusing acquired data, especially without consent, is not a solution, and operators must be penalized for such actions.
Considering the recent violations of COPPA by operators, and with the goal of strengthening compliance in safeguarding personal information such as voice and preventing its misuse, Congress is updating COPPA with new legislation. The COPPA updates propose to extend and update the definitions of “operator,” “personal information” (including voice prints), “consent,” and “website/service/application” (including devices connected to the internet), along with guidelines for the “collection, use, disclosure, and deletion of personal information.” These updates are especially critical now that users’ (or consumers’) personal information can serve as valuable data for operators’ profit-driven applications and be misused in the absence of federal regulation. The FTC acknowledges that the current version of COPPA is insufficient; these updates would therefore also enable the FTC to take strict action against noncompliant operators.
Plan of Action
The Children and Teens’ Online Privacy Protection Act (COPPA 2.0) has been proposed in both the Senate and House to update COPPA for the modern internet age, with a renewed focus on limiting misuse of children’s personal data (including voice recordings). This proposed legislation has gained momentum and bipartisan support. However, the text in this legislation could still be updated to ensure consumer privacy and support future innovation.
Recommendation 1. Clarify the exclusion clause for audio files.
An exclusion clause has been added to this legislation specifically for audio files containing a child’s voice, declaring that a collected audio file is not considered personal information if it meets certain criteria. The clause adopts a more expansive audio file exception, chiefly to allow operators to provide certain features to their users (or consumers).
While the text “only uses the voice within the audio file solely as a replacement for written words” on its own might be overly restrictive for voice-based applications, the text “to perform a task” might open the use of audio files to any task that benefits operators. The task should relate only to performing a request or providing a service to the user, and that needs to be clarified in the text. Potential misuse of this text could be (1) to train AI models for tasks that might help operators provide a service to the user, especially personalization, or (2) to extract and store “audio features” (most voice AI models are trained on audio features rather than the raw audio itself). Operators might argue that extracting audio features is necessary as part of the algorithm that assists in providing a service to the user. Therefore, the phrasing “to perform a task” in this exclusion is open-ended and should be modified as suggested:
Current text: “(iii) only uses the voice within the audio file solely as a replacement for written words, to perform a task, or engage with a website, online service, online application, or mobile application, such as to perform a search or fulfill a verbal instruction or request; and”
Suggested text: “(iii) only uses the voice within the audio file solely as a replacement for written words, to only perform a task to engage with a website, online service, online application, or mobile application, such as to perform a search or fulfill a verbal instruction or request; and”
On a similar note, legislators should consider adding the term “audio features.” Audio features alone are enough to train voice AI models and develop any voice-related application, even if the original audio file is deleted (see the brief sketch at the end of this recommendation). Therefore, the deletion provision in the exclusion clause should be modified as suggested:
Current text: “(iv) only maintains the audio file long enough to complete the stated purpose and then immediately deletes the audio file and does not make any other use of the audio file prior to deletion.”
Suggested text: “(iv) only maintains the audio file long enough to complete the stated purpose and then immediately deletes the audio file and any extracted audio-based features and does not make any other use of the audio file (or extracted audio-based features) prior to deletion.”
Adding this clarity to the exclusion will help prevent misuse of children’s voices for any task that companies might still find beneficial and will also ensure that operators delete every form of the audio data that could be used to train AI models.
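To illustrate the gap the suggested text closes, here is a minimal sketch, assuming the open-source librosa and NumPy libraries (the function name and file names are hypothetical). An operator could extract and store acoustic features from a recording, then delete the raw audio while retaining data that is still sufficient to train voice AI models:

```python
# Hypothetical sketch: extracted acoustic features outlive the deleted audio.
import os

import librosa
import numpy as np

def extract_features_then_delete(audio_path: str, feature_path: str) -> None:
    # Load the recording and compute MFCCs, a common acoustic feature
    # representation used to train many voice AI models.
    waveform, sample_rate = librosa.load(audio_path, sr=16000)
    mfccs = librosa.feature.mfcc(y=waveform, sr=sample_rate, n_mfcc=13)

    # Persist the features to disk for later model training.
    np.save(feature_path, mfccs)

    # Deleting the raw audio does not delete the stored features.
    os.remove(audio_path)

extract_features_then_delete("child_request.wav", "child_request_mfcc.npy")
```

Under the current text, an operator running code like this could truthfully claim the audio file was “immediately deleted” while keeping everything needed to train a model; the suggested text requires the extracted features to be deleted as well.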
Recommendation 2. Add guidelines on the deidentification of audio files to enhance innovation.
A deidentified audio file is one that cannot be used to identify the speaker whose voice it records. The legislative text of COPPA 2.0 neither mentions deidentification nor offers guidelines on how to deidentify an audio file. Such guidelines would not only protect users’ privacy but also allow operators to use deidentified audio files to add features and improve their products. The guidelines could include steps to be followed by operators as well as additional commitments from operators.
The steps, illustrated in the sketch after this list, include:
- Each audio file collected by an application should be stored with an anonymous identifier.
- Each audio file collected from the same user account (a child) or device (e.g., smartphone, tablet, laptop) should be treated as an individual file and stored under its own anonymous identifier. This prevents linking multiple audio files to the same user or device.
- If an audio file contains any personally identifiable information, it must be deleted immediately, and the user (the child’s parent) must be informed. Because there is no guarantee that an audio recording will be free of personally identifiable information, this step is critical.
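The following is a minimal sketch of these steps, using only Python’s standard library; the function name and the PII flag are hypothetical, and a real deployment would also need a vetted PII detector and a parental-notification workflow:

```python
# Hypothetical sketch of the per-file anonymization steps above.
import shutil
import uuid
from pathlib import Path
from typing import Optional

def store_recording(audio_path: Path, storage_dir: Path,
                    contains_pii: bool) -> Optional[Path]:
    # Step 3: delete the recording immediately if a separate, vetted
    # PII detector flagged it; the caller must then notify the parent.
    if contains_pii:
        audio_path.unlink()
        return None

    # Steps 1 and 2: assign a fresh random identifier to every file,
    # keeping no record of the user account or device it came from,
    # so recordings from the same child cannot be linked together.
    anonymous_name = f"{uuid.uuid4().hex}.wav"
    destination = storage_dir / anonymous_name
    shutil.move(str(audio_path), str(destination))
    return destination
```

The key design choice is that the identifier is random per file rather than derived from the user or device, so even the operator cannot later join recordings by speaker through the identifiers alone.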
The commitments include:
- An operator should commit not to reidentify the speaker from any given audio file, whether anonymized or not (a task often called “speaker identification”).
- An operator should seek consent (or approval) from the child’s parent before using deidentified audio files for product development; absent that consent, the operator should delete each audio file immediately after completing the task for which it was collected.
Following these guidelines might be expensive for operators; however, it is crucial to take as many precautions as possible. The deidentification steps operators currently follow are not sufficient, and there have been numerous instances in which anonymized data was reidentified, according to a statement released by a group of State Attorneys General; the sketch below shows how easily this can be done with modern speaker embeddings. The proposed guidelines would let operators deidentify audio files and use them for product development, allowing innovation in voice AI technology for children to flourish.
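To make concrete how “anonymized” recordings can be reidentified, here is a minimal sketch, assuming the open-source Resemblyzer speaker-embedding library (the file names are hypothetical). Two recordings stored under unlinked anonymous identifiers can still be matched to the same speaker by comparing voice embeddings:

```python
# Hypothetical sketch: matching two anonymized recordings by voice alone.
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

encoder = VoiceEncoder()

# Two recordings stored under unlinked, anonymous identifiers.
embed_a = encoder.embed_utterance(preprocess_wav("3f2a9c.wav"))
embed_b = encoder.embed_utterance(preprocess_wav("b81d04.wav"))

# Cosine similarity between the speaker embeddings; a high score
# strongly suggests both recordings come from the same child.
similarity = np.dot(embed_a, embed_b) / (
    np.linalg.norm(embed_a) * np.linalg.norm(embed_b)
)
print(f"speaker similarity: {similarity:.2f}")
```

This is exactly the comparison the first commitment above forbids, and it is why removing identifiers alone does not make an audio file safe.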
Recommendation 3. Add AI-generated avatars in the definition of personal information.
With the emerging applications of generative AI and the growing use of virtual reality for education (in classrooms) and leisure (in online games), “AI-based avatar generation from a child’s image, audio, or video” should be added to the legislative definition of “personal information.” Virtual reality is a growing space, and digital representations of the human user (avatars) are increasingly used to let users see and interact with virtual reality environments and with other users.
Conclusion
As new applications of AI emerge, operators must ensure compliance in the collection and use of consumers’ personal information and safety in the design of products that use that data, especially when dealing with vulnerable populations like children. Since the original passage of COPPA in 1998, how consumers use online services for day-to-day activities, including educational technology and amusement for children, has changed dramatically. The ever-changing scope and reach of online services require strong legislative action to bring online privacy standards into the 21st century. Without a doubt, COPPA 2.0 will lead this regulatory drive, not only protecting children’s personal information collected by online services and operators from misuse but also ensuring that the burden of compliance rests on operators rather than parents. These recommendations will help strengthen the protections of COPPA 2.0 even further while leaving open avenues for innovation in voice AI technology for children.
This idea is part of our AI Legislation Policy Sprint. To see all of the policy ideas spanning innovation, education, healthcare, and trust, safety, and privacy, head to our sprint landing page.