“Data scientists”: those who make data speak



“In the years to come, no one will be able to escape it!” says Illyes, half laughing, half serious. At 23, this young statistics graduate has just started out as a freelancer, but he is already extremely confident about the future. His job? Data scientist, or, literally, a “scientist of data.” “Today, I don’t know of any company that doesn’t produce data, and where there is data, there is always something to do.”


A kind of version 2.0 of the work once done by statisticians and actuaries, “data science” appeared in the early 2010s with the digital transition and the ever larger volumes of data that came with it. France’s data volume reportedly grew by 701% between 2016 and 2018, according to the 2019 report of the Global Data Protection Index, contributing to the worldwide record of 64.2 zettabytes (one zettabyte is 10 to the power of 21 bytes) in 2020. Those with the skills to dive into this data and draw information or trends from it quickly became highly valued scientists, particularly within companies.

Devising models

Their role? To use these millions of figures, images or statements to devise models that automatically solve extremely diverse problems. Bachir, 29, worked for a long time for a music platform where the task was to “couple data on users’ tastes with other information such as time or frequency of listening, either to improve suggestions or to better target subscription offers”. Recently hired by a financial services company, he is now responsible for “predicting fraud” by analyzing behavioral patterns and unwanted transactions among customers.

To devise a model and then program it, you need “as many skills in statistical mathematics as in computer science”, says Driss, a young data science consultant who likes to point out that he has to be a “Swiss Army knife”. Among the various artificial intelligence technologies that data scientists regularly draw on is the well-known machine learning, which allows increasingly powerful computers to learn for themselves by training to identify patterns (recurring motifs) in databases too vast for the human mind. The goal? To use information from the past to anticipate future trends. Prediction is a real buzzword, and it is not used only for fine-tuning marketing strategies or avoiding security vulnerabilities.

In sport, in health

Applied to the sports world, data science can, for example, improve the performance of athletes. Illyes, who chose to specialize in computer vision, spent almost nine months analyzing rather unusual data: videos of the fights of the French Boxing Federation. “For a long time, people were in charge of watching and taking notes, first live, then in detail once video came along,” he explains. “But each boxing match consists of about 300 punches and thousands of other parameters, which makes the work too laborious.” The idea is therefore to develop an artificial intelligence capable of identifying, for example, the number of blows, their nature and the conditions in which they are delivered. It will make it possible to establish a precise profile (pace, strengths, weaknesses, etc.) of athletes and their opponents, notably in anticipation of the 2024 Olympic Games.

In fields such as health, data science sometimes represents a revolutionary opportunity, as underlined by Chloé-Agathe Azencott, who holds a doctorate in computer science and is a teacher-researcher affiliated with the bioinformatics center of Mines ParisTech, the Institut Curie and Inserm: “Working on genome samples (genetic material, Editor’s note), which number in the billions, was a problem until now; now we can try to identify the presence or absence of a mutation and deduce risk factors.”

Human intervention, irreplaceable?

Although the presence of a data scientist seems to have become a sine qua non for optimizing a company’s growth, guaranteeing its competitiveness or leaving room for progress, the level of maturity remains uneven across sectors: some are at the forefront, such as social networks, video platforms, banks and insurance; others are lagging behind, such as construction firms or very small businesses.

“As the data culture is not yet present everywhere, it is essential to make sure that our mission is feasible and relevant”, explains Driss, who sometimes spends 80% of his time sorting and cleaning data just to make it usable. Obsolete, erroneous or incomplete elements can indeed skew the results, as in facial recognition, where ethnic biases have recently been documented: certain algorithms had been trained too heavily on images of white faces.


All these tasks aim, in the long term, at making the models autonomous and therefore automated. Is the data scientist therefore doomed to disappear? No risk of that, according to Driss and Bachir, for whom human intervention will always be necessary “to correct the biases, themselves often introduced by humans, or to adapt to the specificities of the business side”. Laurent, 36, who recently retrained, compares the advent of the profession to that of developers in the 2000s: “At the time of the Internet bubble, we did not really understand what they were doing”, he remembers, before adding: “Today everyone has a website, and even if the simplest of them can be created almost by themselves, there is always a need for a human being to imagine, improve, maintain or even secure them.”

————–

Other data professions

With this rare hybrid profile, the data scientist often overshadows other, more specialized professions that revolve around “data”.

The data architect works upstream: once the raw data has been collected, the architect designs a secure database system.

The data engineer, with a more technical profile, is responsible for classifying the data and finding the means to analyze it.

The data analyst, for their part, takes the final information and reports it back to professionals in the field.
