

AI and Data Ethics: 5 Principles to Consider

As organizations develop their own internal ethical practices and countries continue to develop legal requirements, we are at the beginning of determining standards for ethical use of data and artificial intelligence (AI).

In the past 20 years, our ability to collect, store, and process data has dramatically increased. There are exciting new tools that can help us automate processes, learn things we couldn’t see before, recognize patterns, and predict what is likely to happen. Since our capacity to do new things has developed quickly, the focus in tech has been primarily on what we can do. Today, organizations are starting to ask what’s the right thing to do.

This is partly a global legal question as countries implement new requirements for the use and protection of data, especially information directly or indirectly connected to individuals. It’s also an ethical question as we address concerns about bias and discrimination, and explore concerns about privacy and a person’s rights to understand how data about them is being used.

What is AI and Data Ethics?

Ethical use of data and algorithms means working to do the right thing in the design, functionality, and use of data in Artificial Intelligence (AI).

It’s evaluating how data is used and what it’s used for, considering who does and should have access, and anticipating how data could be misused. It means thinking through what data should and should not be connected with other data and how to securely store, move, and use it. Ethical use considerations include privacy, bias, access, personally identifiable information, encryption, legal requirements and restrictions, and what might go wrong.

Data Ethics also means asking hard questions about the possible risks and consequences to people whom the data is about and the organizations who use that data. These considerations include how to be more transparent about what data organizations have and what they do with it. It also means being able to explain how the technology works, so people can make informed choices on how data about them is used and shared.

Why is Ethics Important in HR Technology?

Technology is evolving fast. We can create algorithms that connect and compare information, see patterns and correlations, and offer predictions. Tools based on data and AI are changing organizations, the way we work, and what we work on. But we also need to be careful about arriving at incorrect conclusions from data, amplifying bias, or relying on AI opinions or predictions without thoroughly understanding what they are based on.

We want to think through what data goes into workplace decisions, how AI and technology affect those decisions, and then come up with fair principles for how we use data and AI.

What Are Data Ethics Principles?

Ethics is about acknowledging competing interests and considering what is fair. Ethics asks questions like: What matters? What is required? What is just? What could possibly go wrong? Should we do this?

In trying to answer these questions, there are some common principles for using data and AI ethically.

  1. Transparency – This includes disclosing what data is being collected, what decisions are made with the assistance of AI, and whether a user is dealing with bots or humans. It also means being able to explain how algorithms work and what their outputs are based on. That way, we can evaluate the information they give us against the problems we’re trying to solve. Transparency also includes how we let people know what data an organization has about them and how it is used. Sometimes, this includes giving people an opportunity to have information corrected or deleted.
  2. Fairness – AI doesn’t just offer information. Sometimes it offers opinions. This means we have to think through how these tools and the information they give us are used. Since data comes from and concerns humans, it’s essential to look for biases in what data is collected, what rules are applied, and what questions are asked of the data. For example, if you want to increase diversity in hiring, you don’t want to only rely on tools that tell you who has been successful in your organization in the past. This information alone would likely give you more of the same rather than more diversity. While there is no way to completely eliminate bias in tools created by and about people, we need to understand how the tools are biased so we can reduce and manage the bias and correct for it in our decision making.
  3. Accuracy – The data used in AI should be up to date and accurate. And there need to be ways to correct it. Data should also be handled, cleaned, sorted, connected, and shared with care to retain its accuracy. Sometimes taking data out of context can make it misleading or untrue. So accuracy depends partly on whether the data is true, and partly on whether it makes sense and is useful based on what we are trying to do or learn.
  4. Privacy – Some cultures believe that privacy is part of fundamental human rights and dignity. An increasing number of privacy laws around the globe recognize privacy rights in our names and likeness, financial and medical records, personal relationships, homes, and property. We are still working out how to balance privacy and the need to use so much personal data. Lawmakers have been more comfortable allowing broader uses of anonymized data than data where you know, or can easily discover, who it’s about. But as more data is collected and connected, questions arise about how to maintain that anonymity. Other privacy issues include security of the information and what people should know about who has data about them and how it’s used.
  5. Accountability – This is not just compliance with global laws and regulations. Accountability is also about the accuracy and integrity of data sources, understanding and evaluating risks and potential consequences of developing and using data and AI, and implementing processes to make sure that new tools and technologies are created ethically.

As organizations develop their own internal ethical practices and countries continue to develop legal requirements, we are at the beginning of determining standards for ethical use of data and AI.

ADP is already working on its AI and data ethics, through establishing an AI and Data Ethics Board and developing ethical principles that are customized to ADP’s data, products and services. Next in our series on AI and Ethics, we will be talking to each of ADP’s AI and Data Ethics Board members about ADP’s guiding ethical principles and how ADP applies those principles to its design, processes, and products.

Read our position paper, “ADP: Ethics in Artificial Intelligence,” available on the Privacy at ADP page.


How Data Becomes Insight: The Right Data Matters

By SPARK Team

What goes into selecting, gathering, cleaning and testing data for machine-learning systems?

It’s not enough to have a lot of data and some good ideas. The quality, quantity, and nature of the data are the foundation for using it effectively.


We asked members of the ADP® DataCloud Team to help us understand what goes into selecting, gathering, cleaning and testing data for machine-learning systems.

Q: How do you go from lots of information to usable data in a machine-learning system?

DataCloud Team: The first thing to figure out is whether you have the information you want to answer the questions or solve the problem you’re working on. So, we look at what data we have and figure out what we can do with it. Sometimes, we know right away we need some other data to fill in gaps or provide more context. Other times, we realize that some other data would be useful as we build and test the system. One of the exciting things about machine learning is that it often gives us better questions, which sometimes need new data that we hadn’t thought about when we started.


Once you know what data you want to start with, then you want it “clean and normalized.” This just means that the data is all in a consistent format so it can be combined with other data and analyzed. It’s the process of making sure we have the right data, getting rid of irrelevant or corrupt data, confirming the data is accurate, and ensuring we can use it with all our other data when the information is coming from multiple sources.
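As an illustration, that cleaning-and-normalizing step might look like the following sketch. The record fields, date formats, and validation rules here are invented for the example; they are not ADP’s actual pipeline.

```python
from datetime import datetime

# Hypothetical raw records from two sources with inconsistent formats,
# plus one corrupt row that should be dropped.
raw_records = [
    {"name": "  Ada Lovelace ", "hire_date": "2021-03-15", "salary": "95,000"},
    {"name": "GRACE HOPPER",    "hire_date": "03/15/2021", "salary": "101000"},
    {"name": "",                "hire_date": "unknown",    "salary": "80k"},
]

def parse_date(value):
    """Try each date format the sources use; return None if unparseable."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None

def parse_salary(value):
    """Strip thousands separators; reject values that are not plain numbers."""
    cleaned = value.replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None

def normalize(record):
    """Return the record in one consistent format, or None if it is corrupt."""
    name = record["name"].strip().title()
    date = parse_date(record["hire_date"])
    salary = parse_salary(record["salary"])
    if not name or date is None or salary is None:
        return None  # irrelevant or corrupt data gets dropped
    return {"name": name, "hire_date": date.isoformat(), "salary": salary}

clean = [rec for rec in (normalize(r) for r in raw_records) if rec is not None]
```

The corrupt third row fails validation and is dropped, while the two good rows end up with the same name casing, date format, and numeric salary type.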


A great example is job titles. Every company uses different titles. A “director” could be an entry-level position, a senior executive, or something in between. So we couldn’t compare jobs based on job titles alone. We had to figure out what each job actually was and where it fit in a standard hierarchy before we could use the data in our system.
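A minimal sketch of that kind of title standardization, using an invented taxonomy table (the titles, categories, and level numbers are assumptions for illustration, not ADP’s taxonomy):

```python
# Hypothetical mapping from raw, company-specific titles to a
# (standard category, hierarchy level) pair.
TITLE_TAXONOMY = {
    "director of first impressions": ("Receptionist", 1),
    "director":                      ("Director", 5),
    "managing director":             ("Senior Executive", 7),
}

def standardize_title(raw_title):
    """Look up a raw title in the taxonomy; flag unknowns for human review."""
    key = raw_title.strip().lower()
    return TITLE_TAXONOMY.get(key, ("Unmapped", None))
```

Unknown titles come back as “Unmapped” rather than being guessed, so a person can review and extend the taxonomy.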

Q: This sounds difficult.

DataCloud Team: There’s a joke that data scientists spend 80 percent of their time cleaning data and the other 20 percent complaining about it.


At ADP, we are fortunate that much of the data we work with is collected in an organized and usable way through our payroll and HR systems, which makes part of the process easier. Every time we change one of our products or build new ones, data compatibility is an important consideration. This allows us to work on the more complex issues, like coming up with a workable taxonomy for jobs with different titles.


But getting the data right is foundational to everything that happens, so it’s effort well spent.

Q: If you are working with HR and payroll data, doesn’t it have a lot of personal information about people? How do you handle privacy and confidentiality issues?

DataCloud Team: We are extremely sensitive to people’s privacy and go to great lengths to protect both the security of the data we have as well as people’s personal information.


With machine learning we are looking for patterns, connections or matches and correlations. So, we don’t need personally identifying data about individuals. We anonymize the information and label and organize it by categories such as job, level in hierarchy, location, industry, size of organization, and tenure. This is sometimes called “chunking.” For example, instead of keeping track of exact salaries, we combine them into salary ranges. This both makes the information easier to sort and protects people’s privacy.
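The salary-range “chunking” described above can be sketched as a simple binning function. The $10,000 band width is an invented value for illustration:

```python
def salary_band(salary, width=10_000):
    """Replace an exact salary with its range, e.g. 95_000 -> '90000-99999'."""
    low = (salary // width) * width
    return f"{low}-{low + width - 1}"
```

Every salary inside a band maps to the same label, so the exact figure is no longer stored, which both simplifies sorting and protects privacy.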


With benchmarking analytics, if any data set is too small to make anonymous (meaning it would be too easy to figure out who it was about), then we don’t include that data in the benchmark analysis.
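That small-group rule can be sketched as a threshold filter, in the spirit of k-anonymity. The minimum group size of 5 here is an invented value; a real benchmark would set its own threshold:

```python
from collections import Counter

def benchmark_groups(records, key, min_group_size=5):
    """Count records per group, keeping only groups large enough to stay anonymous."""
    counts = Counter(r[key] for r in records)
    return {group: n for group, n in counts.items() if n >= min_group_size}

# Hypothetical example: the two-person mining group is suppressed.
records = [{"industry": "retail"}] * 6 + [{"industry": "mining"}] * 2
safe = benchmark_groups(records, "industry")
```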

Q: Once you have your initial data set, how do you know when you need or want more?

DataCloud Team: The essence of machine learning is more data.


We want to be able to see what is happening over time, what is changing, and be able to adjust our systems based on this fresh flow of data. As people use the programs, we are also able to validate or correct information. For example, with our jobs information, users tell us how the positions in their organization fit into our categories. This makes the program useful to them, and makes the overall database more accurate.
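One way such user corrections could feed back into a shared taxonomy is a simple majority vote per title. The data shapes here, pairs of (raw title, category the user chose), are assumptions for illustration, not ADP’s actual schema:

```python
from collections import Counter

def apply_feedback(taxonomy, corrections):
    """Re-map each raw title to whichever category users chose most often.

    `corrections` is a list of (raw_title, chosen_category) pairs.
    Returns an updated copy; the original taxonomy is left unchanged.
    """
    votes = {}
    for raw_title, category in corrections:
        votes.setdefault(raw_title, Counter())[category] += 1
    updated = dict(taxonomy)
    for raw_title, counter in votes.items():
        updated[raw_title] = counter.most_common(1)[0][0]
    return updated

# Hypothetical example: most users say their "director" is an office manager.
taxonomy = {"director": "Director"}
corrections = [
    ("director", "Office Manager"),
    ("director", "Office Manager"),
    ("director", "Director"),
]
updated = apply_feedback(taxonomy, corrections)
```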


As people use machine-learning systems, they create new data which the system learns from and adjusts to. It allows us to detect changes, see cycles over time, and come up with new questions and applications. Sometimes we decide we need to add a new category of information or ask the system to process the information a different way.


These are the things that both keep us up at night and make it exciting to show up at work every day.



Learn more by getting our guide, “Proving the Power of People Data.”