
Creating human-focused solutions in today’s product strategy

ADP Business Anthropologist Martha Bird sat down with Daniel Litwin, the Voice of B2B, at CES 2020 to discuss a wide range of topics related to how her anthropological work and research impact businesses and consumer needs.

Bird has worked in business anthropology for numerous companies since the early 2000s, creating human-focused solutions to business needs.

Bird and Litwin touch on their CES experience, the modern focus on human-centered and human-responsive products, how those concepts affect consumer product development, consumers’ longing for personalized experiences, and more.


How to select, gather, clean, and test data for machine learning systems

https://explore.adp.com/spark3/how-data-becomes-insight-the-right-data-matters-454FC-31577B.html

 

How Data Becomes Insight: The Right Data Matters

By SPARK Team

What goes into selecting, gathering, cleaning and testing data for machine-learning systems?

It’s not enough to have a lot of data and some good ideas. The quality, quantity and nature of the data are the foundation for using it effectively.

 

We asked members of the ADP® DataCloud Team to help us understand what goes into selecting, gathering, cleaning and testing data for machine-learning systems.

Q: How do you go from lots of information to usable data in a machine-learning system?

DataCloud Team: The first thing to figure out is whether you have the information you want to answer the questions or solve the problem you’re working on. So, we look at what data we have and figure out what we can do with it. Sometimes, we know right away we need some other data to fill in gaps or provide more context. Other times, we realize that some other data would be useful as we build and test the system. One of the exciting things about machine learning is that it often gives us better questions, which sometimes need new data that we hadn’t thought about when we started.

 

Once you know what data you want to start with, then you want it “clean and normalized.” This just means that the data is all in a consistent format so it can be combined with other data and analyzed. It’s the process where we make sure we have the right data, get rid of irrelevant or corrupt data, verify that the data is accurate, and confirm that it can be used with all our other data when the information comes from multiple sources.
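As a rough illustration of what that normalization step can look like, here is a minimal sketch; the sources, field names, and salary units below are invented for illustration and are not ADP's actual schema.

// Hypothetical records from two sources with different shapes; corrupt rows are
// dropped and salaries are converted to one unit so everything can be analyzed together.
const rawRecords = [
  { source: 'payroll', data: { employeeId: 'E1', annualSalary: 90000, jobTitle: ' Analyst ' } },
  { source: 'hr', data: { id: 'E2', monthlySalary: 6000, title: 'Director' } },
  { source: 'hr', data: { id: null, monthlySalary: 'n/a', title: '' } }, // corrupt row
];

function normalize({ source, data }) {
  if (source === 'payroll') {
    if (!data.employeeId || typeof data.annualSalary !== 'number') return null;
    return { id: data.employeeId, salary: data.annualSalary, title: (data.jobTitle || '').trim() };
  }
  if (!data.id || typeof data.monthlySalary !== 'number') return null;
  return { id: data.id, salary: data.monthlySalary * 12, title: (data.title || '').trim() };
}

const cleaned = rawRecords.map(normalize).filter(Boolean);
console.log(cleaned);
// [ { id: 'E1', salary: 90000, title: 'Analyst' },
//   { id: 'E2', salary: 72000, title: 'Director' } ]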

 

A great example is job titles. Every company uses different titles. A “director” could be an entry-level position, a senior executive, or something in between. So, we could not compare jobs based on job titles. We had to figure out what each job actually was and where it fit in a standard hierarchy before we could use the data in our system.
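As a very rough sketch of that idea, the same raw title can land at different standard levels depending on contextual signals; the signals, thresholds, and level names here are invented for illustration and are not ADP's actual taxonomy.

// The same raw title maps to different standard levels depending on context
// (reporting distance from the top and span of control); all values are made up.
function standardizeLevel(job) {
  if (job.levelsBelowCeo <= 1 && job.directReports >= 10) return 'senior executive';
  if (job.directReports > 0) return 'manager';
  return 'individual contributor';
}

console.log(standardizeLevel({ rawTitle: 'Director', levelsBelowCeo: 1, directReports: 40 }));
// 'senior executive'
console.log(standardizeLevel({ rawTitle: 'Director', levelsBelowCeo: 4, directReports: 0 }));
// 'individual contributor'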

Q: This sounds difficult.

DataCloud Team: There’s a joke that data scientists spend 80 percent of their time cleaning data and the other 20 percent complaining about it.

 

At ADP, we are fortunate that much of the data we work with is collected in an organized and usable way through our payroll and HR systems, which makes part of the process easier. Every time we change one of our products or build new ones, data compatibility is an important consideration. This allows us to work on the more complex issues, like coming up with a workable taxonomy for jobs with different titles.

 

But getting the data right is foundational to everything that happens, so it’s effort well spent.

Q: If you are working with HR and payroll data, doesn’t it have a lot of personal information about people? How do you handle privacy and confidentiality issues?

DataCloud Team: We are extremely sensitive to people’s privacy and go to great lengths to protect both the security of the data we have and people’s personal information.

 

With machine learning, we are looking for patterns, connections, matches and correlations. So, we don’t need personally identifying data about individuals. We anonymize the information and label and organize it by categories such as job, level in hierarchy, location, industry, size of organization, and tenure. This is sometimes called “chunking.” For example, instead of keeping track of exact salaries, we combine them into salary ranges. This both makes the information easier to sort and protects people’s privacy.
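A minimal sketch of that salary example, assuming a simple fixed band width (the width and format are arbitrary placeholders):

// Replace an exact salary with a range ("chunk") so records are easier to group
// and no exact figure is retained; the 10,000 band width is arbitrary.
function toSalaryBand(salary, bandWidth = 10000) {
  const lower = Math.floor(salary / bandWidth) * bandWidth;
  return `${lower}-${lower + bandWidth - 1}`;
}

console.log(toSalaryBand(67250)); // '60000-69999'
console.log(toSalaryBand(70100)); // '70000-79999'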

 

With benchmarking analytics, if any data set is too small to anonymize effectively ― meaning it would be too easy to figure out who the people are ― then we don’t include that data in the benchmark analysis.
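In code, that rule can be as simple as filtering out any group below a minimum size; the threshold of 10 below is a placeholder, not ADP's actual cutoff.

// Drop benchmark groups that are too small to anonymize; 10 is a placeholder threshold.
const MIN_GROUP_SIZE = 10;
const groups = [
  { segment: 'Retail / Northeast', records: new Array(42) },
  { segment: 'Mining / Remote region', records: new Array(3) }, // too easy to re-identify
];
const publishable = groups.filter((group) => group.records.length >= MIN_GROUP_SIZE);
console.log(publishable.map((group) => group.segment)); // [ 'Retail / Northeast' ]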

Q: Once you have your initial data set, how do you know when you need or want more?

DataCloud Team: The essence of machine learning is more data.

 

We want to be able to see what is happening over time, what is changing, and be able to adjust our systems based on this fresh flow of data. As people use the programs, we are also able to validate or correct information. For example, with our jobs information, users tell us how the positions in their organization fit into our categories. This makes the program useful to them and makes the overall database more accurate.

 

As people use machine-learning systems, they create new data which the system learns from and adjusts to. It allows us to detect changes, see cycles over time, and come up with new questions and applications. Sometimes we decide we need to add a new category of information or ask the system to process the information a different way.

 

These are the things that both keep us up at night and make it exciting to show up at work every day.

 

 

Learn more by getting our guide, “Proving the Power of People Data.”


Let’s Talk About Sets

https://eng.lifion.com/lets-talk-about-sets-813dfeb2185

 

Let’s Talk About Sets

A re-introduction to JavaScript Sets and the new Set methods

Edgardo Avilés

Mar 1, 2019 · 5 min read

Let’s talk about you and me and how we used to find unique items before ES6. We really only had two ways to do it (if you had another one, let me know). In the first one, we would create a new empty object, iterate through the items we wanted to deduplicate, and create a new property using each item as the key and something like “true” as the value; then we would get the list of keys of that new object and we were done. In the second way, we would create a new empty array, iterate through the items, and for each item check whether it was already in the array: if it was, continue; if not, add it. By the end, the array would contain all the unique items.
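Roughly, those two approaches looked like this:

// Pre-ES6 approach 1: use an object as a set; keys are unique, so the key list
// is the deduplicated list.
var items = ['a', 'b', 'a', 'c', 'b'];
var seen = {};
for (var i = 0; i < items.length; i++) {
  seen[items[i]] = true;
}
var uniqueViaObject = Object.keys(seen); // ['a', 'b', 'c']

// Pre-ES6 approach 2: scan an array and only push items that aren't already there.
var uniqueViaArray = [];
for (var j = 0; j < items.length; j++) {
  if (uniqueViaArray.indexOf(items[j]) === -1) {
    uniqueViaArray.push(items[j]);
  }
}
// uniqueViaArray: ['a', 'b', 'c']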

ES6 introduced Sets, a new data structure with a very simple API for handling unique items that is not just convenient but also very fast. The intention of this article is to introduce you to some new methods coming to Sets soon that will make them even more useful, but before that, let’s review the basics.

Here at Lifion, we are big users of JavaScript; about 90% of our platform services are Node.js-based. If you are interested in seeing some examples of how Sets are used in our codebase, check out our open source projects on Lifion’s GitHub profile.

The basics of Sets

To create a new set, we only need to use the constructor. We can optionally pass any iterable, such as an array or a string, and the iterated items will become elements of the new set (repeated items will be ignored).

const emptySet = new Set();
const prefilledSet = new Set(['a', 'b', 'a', 'c']); // Set { 'a', 'b', 'c' }
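As a quick preview, here is a minimal sketch of the kind of methods the proposal adds, assuming an engine or polyfill that implements the TC39 Set-methods proposal; the examples are illustrative and not taken from the rest of this article.

// Proposed Set methods (TC39 "Set methods" proposal): combination methods return
// new Sets, comparison methods return booleans.
const a = new Set([1, 2, 3]);
const b = new Set([2, 3, 4]);

console.log(a.union(b));                // Set { 1, 2, 3, 4 }
console.log(a.intersection(b));         // Set { 2, 3 }
console.log(a.difference(b));           // Set { 1 }
console.log(a.symmetricDifference(b));  // Set { 1, 4 }
console.log(a.isSubsetOf(b));           // false
console.log(a.isDisjointFrom(new Set([5, 6]))); // true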


Lifion’s cloud transformation journey

https://eng.lifion.com/lifions-cloud-transformation-journey-2333b7c0897d

 

Lifion’s Cloud Transformation Journey

On moving to managed services in a microservice architecture

Zaid Masud

Mar 26, 2019 · 5 min read

Since Lifion’s inception as ADP’s next-generation Human Capital Management (HCM) platform, we’ve made an effort to embrace relevant technology trends and advancements. From microservices and container orchestration frameworks to distributed databases, and everything in between, we’re continually exploring ways we can evolve our architecture. Our readiness to evaluate non-traditional, cutting-edge technology has meant that some bets have stuck while others have pivoted.

One of our biggest pivots has been a shift from self-managed databases & streaming systems, running on cloud compute services (like Amazon EC2) and deployed with tools like Terraform and Ansible, towards fully cloud-managed services.

When we launched the effort to make this shift in early 2018, we began by executing a structured, planned initiative across an organization of 200+ engineers. After overcoming the initial inertia, the effort continued to gain momentum, eventually taking on a life of its own, and finally becoming fully embedded in how our teams work.

Along the way, we’ve been thinking about what we can give back. For example, we’ve previously written about a Node.js client for AWS Kinesis that we’re working on as an open source initiative.

AWS’s re:Invent conference is perhaps the largest cloud community conference in the world. In late 2018, we presented our cloud transformation journey at re:Invent. As you can see in the recording, we described our journey and key learnings in adopting specific AWS managed services.

In this post, we discuss key factors that made the initiative successful, its benefits in our microservice architecture, and how managed services helped us shift our teams’ focus to our core product while improving overall reliability.

Why Services Don’t Share Databases

The notion of services sharing databases, making direct connections to the same database system and depending on shared schemas, is a recognized microservice anti-pattern. With shared databases, changes to the underlying database (including schema changes, scaling operations such as sharding, or even migrating to a better database) become very difficult, requiring coordination across multiple service teams and releases.

As Amazon.com CTO Werner Vogels writes in his blog:

Each service encapsulates its own data and presents a hardened API for others to use. Most importantly, direct database access to the data from outside its respective service is not allowed. This architectural pattern was a response to the scaling challenges that had challenged Amazon.com through its first 5 years…

And Martin Fowler on integration databases:

On the whole integration databases lead to serious problems becaue [sic] the database becomes a point of coupling between the applications that access it. This is usually a deep coupling that significantly increases the risk involved in changing those applications and making it harder to evolve them. As a result most software architects that I respect take the view that integration databases should be avoided.

The Right Tool for the Job

Applying the database-per-service principle means that, in practice, service teams have significant autonomy in selecting the right database technologies for their purposes. Among other factors, their data modeling, query flexibility, consistency, latency, and throughput requirements will dictate which technologies work best for them.

Up to this point, all is well — every service has isolated its data. However, when architecting a product with a double-digit number of domains, several important database infrastructure decisions need to be made:

  • Shared vs dedicated clusters: Should services share database clusters with logically isolated namespaces (like logical databases in MySQL), or should each have its own expensive cluster with dedicated resources?
  • Ownership: What level of ownership does a service team take for the deployment, monitoring, reliability, and maintenance of their infrastructure?
  • Consolidation: Is there an agreed set of technologies that teams can pick from, is there a process for introducing something new, or can a team pick anything they like?

From Self-Managed to Fully Managed Services

When we first started building out our services, we had a sprawl of supporting databases, streaming, and queuing systems. Each of these technologies was deployed on AWS EC2, and we were responsible for the full scope of managing this infrastructure: from the OS level, to topology design, configuration, upgrades and backups.

It didn’t take us long to realize how much time we were spending on managing all of this infrastructure. When we made the bet on managed services, several of the decisions we’d been struggling with started falling into place:

  • Shared vs dedicated clusters: Dedicated clusters for services, clearly preferable from a reliability and availability perspective, became easier to deploy and maintain. Offerings like SQS, DynamoDB, and Kinesis, with no nodes or clusters to manage, removed the concern altogether.
  • Ownership: Infrastructure simplification meant that service teams were able to develop deeper insight into their production usage and take greater responsibility for their infrastructure.
  • Consolidation: We were now working with a major cloud provider’s service offerings and found that there was enough breadth to span our use cases.

Evolutionary Architecture

On our Lifion engineering blog, we’ve previously written about our Lifion Developer Platform Credos. One of these speaks to the evolutionary nature of our work:

  • Build to evolve: We design our domains and services fully expecting that they will evolve over time.
  • Backwards compatible, versioned: Instead of big-bang releases, we use versions or feature flags, letting service teams deploy at any time without coordinating dependencies.
  • Managed deprecations: When deprecating APIs or features, we carefully plan the rollout and ensure that consumer impact is minimal.

When we started adopting managed services, we went for drop-in replacements first (for example, Aurora MySQL is wire-compatible with the previous MySQL cluster we were using). This approach helped us get some early momentum while uncovering dimensions like authentication, monitoring, and discoverability that would help us later.

Our evolutionary architecture credo helped to ensure that the transition would be smooth for our services and our customers. Each deployment was done as a fully online operation, without customer impact. We recognize that we will undergo more evolutions, for which we intend to follow the same principles.


Performance implications of misunderstanding Node.js promises

https://eng.lifion.com/promise-allpocalypse-cfb6741298a7

Promise.allpocalypse

The performance implications of misunderstanding Node.js promises

Ali Yousuf

Jan 22 · 8 min read

for…of over unknown collection with await in loop
Promise.all() on an entire unknown collection
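As a rough sketch of those two unbounded patterns (not the article's exact test code; doWork() is a placeholder for any async operation):

// Pattern 1: await inside a for…of loop — fully serial; nothing else starts
// until the current item finishes.
async function doWork(item) {
  return item; // placeholder for an HTTP call, database query, etc.
}

async function serial(items) {
  const results = [];
  for (const item of items) {
    results.push(await doWork(item));
  }
  return results;
}

// Pattern 2: Promise.all() over the entire collection — everything starts at
// once, with no upper bound on concurrent work if the collection is large.
async function unbounded(items) {
  return Promise.all(items.map((item) => doWork(item)));
}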

Benchmarking unbounded promise scenarios

╔═══════════════╦══════════════════════════════════╗
║     Test      ║ Average Execution Time (Seconds) ║
╠═══════════════╬══════════════════════════════════╣
║ await-for-of  ║                            6.943 ║
║ bluebird-map  ║                            4.550 ║
║ for-of        ║                            6.745 ║
║ p-limit       ║                            4.523 ║
║ promise-all   ║                            4.524 ║
║ promise-limit ║                            4.457 ║
╚═══════════════╩══════════════════════════════════╝
for…of test code
for await…of test code

Clinic.js doctor output for for await…of and for…of, respectively

Clinic.js bubbleprof output for for await…of and for…of, respectively
Promise.all() test code

Clinic.js doctor output for Promise.all()

Promise chain execution order example
Async chain execution order example
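As a generic illustration of those two orderings (not the article's original examples; each snippet is meant to be run on its own):

// Promise chain: each .then() callback runs in a later microtask, after the
// current synchronous code finishes.
Promise.resolve()
  .then(() => console.log('then #1'))
  .then(() => console.log('then #2'));
console.log('sync code after building the chain');
// Output: sync code after building the chain, then #1, then #2

// Async function: code before the first await runs synchronously; the rest
// resumes in a microtask.
async function run() {
  console.log('before await');
  await Promise.resolve();
  console.log('after await');
}
run();
console.log('sync code after calling run()');
// Output: before await, sync code after calling run(), after await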
Bluebird.map() with concurrency limit test code
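A minimal sketch of this approach (not the article's exact test code): Bluebird.map() with a concurrency option caps how many operations are in flight at once.

// At most 10 doWork() calls run concurrently; doWork() is a placeholder.
const Bluebird = require('bluebird');

async function doWork(item) {
  return item; // placeholder for real async work
}

function boundedWithBluebird(items) {
  return Bluebird.map(items, (item) => doWork(item), { concurrency: 10 });
}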

Clinic.js doctor output for Bluebird.map() with concurrency limit
promise-limit module test code
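A minimal sketch of the same idea with the promise-limit module (not the article's exact test code):

// limit() wraps each task so that at most 10 run concurrently.
const promiseLimit = require('promise-limit');

async function doWork(item) {
  return item; // placeholder for real async work
}

function boundedWithPromiseLimit(items) {
  const limit = promiseLimit(10);
  return Promise.all(items.map((item) => limit(() => doWork(item))));
}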

Clinic.js doctor output for the promise-limit module
p-limit module test code
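And a minimal sketch with the p-limit module (not the article's exact test code; newer p-limit releases are ESM-only, so the require() form assumes an older version):

// pLimit(10) returns a function that queues tasks so at most 10 run at once.
const pLimit = require('p-limit');

async function doWork(item) {
  return item; // placeholder for real async work
}

function boundedWithPLimit(items) {
  const limit = pLimit(10);
  return Promise.all(items.map((item) => limit(() => doWork(item))));
}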

Clinic.js doctor output for the p-limit module

Conclusion