Artificial Intelligence (AI) in the era of the General Data Protection Law (LGPD) in Brazil!
We are adapting to Brazil's General Data Protection Law (LGPD). The law's main objective is to guarantee data privacy and reliability, but how is the data field adapting to this new reality? What strategies are being adopted? And what about Artificial Intelligence (AI)?
These are some of the strategies currently being tried in the data area:
- Infinite Forms?
This strategy creates one or more forms to record who accesses the data and where the data and its sources live. The problem with this approach is that for each new data source that appears, you have to create a new form or adapt the old ones, so fully adopting this strategy takes a long time.
- Magic data traceability tools and/or solutions:
This strategy adopts the ideas and concepts of data lineage. In an environment with high data replication, it is necessary to know who is accessing data sources and which processes replicate data. The problem with this approach is control: monitoring the data does not, by itself, guarantee privacy or control over access to this information.
- Data Catalog
This strategy creates a data catalog: instead of applications and users accessing data sources directly, they go through an interface that controls and mediates access. This gives us control and ways to manage access to data, avoid unnecessary replication, and build resilience mechanisms for the data sources and backups.
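The mediated-access idea above can be sketched in a few lines of Python. This is a minimal illustration, not a real catalog product: the class, method names, and source names are all hypothetical, and real tools such as Dremio handle this with far more machinery (authentication, lineage, caching).

```python
class DataCatalog:
    """Illustrative catalog: applications request data through the catalog
    instead of reaching data sources directly (all names are hypothetical)."""

    def __init__(self):
        self.sources = {}      # source name -> callable that reads the source
        self.permissions = {}  # user -> set of source names the user may read
        self.access_log = []   # (user, source) pairs, for auditability

    def register_source(self, name, reader):
        # The catalog is the single place that knows how to reach each source.
        self.sources[name] = reader

    def grant(self, user, source):
        self.permissions.setdefault(user, set()).add(source)

    def read(self, user, source):
        # Every access is checked and logged before the source is touched.
        if source not in self.permissions.get(user, set()):
            raise PermissionError(f"{user} may not access {source}")
        self.access_log.append((user, source))
        return self.sources[source]()


catalog = DataCatalog()
catalog.register_source("customers", lambda: [{"id": 1}])
catalog.grant("alice", "customers")
print(catalog.read("alice", "customers"))  # allowed, and logged
```

The point of the sketch is the shape of the control flow: access checks and logging happen in one place, which is what makes auditing for LGPD compliance tractable.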
The main market tools for adopting this strategy are Dremio and Qlik Data Catalyst, among others…
What about artificial intelligence? Is there a strategy? A secret trick?
When we look at an Artificial Intelligence model, we notice that it is not just another piece of software where we can apply only the principles and practices of software engineering; after all, we also have data in this context.
From software engineering, we learn about life cycles in which new software requirements are gathered, developed, tested, and then undergo maintenance and evolution. An AI model, however, has a much more complex life cycle, spanning model generation, training, and retraining.
We use different algorithms to create and train different types of models on data samples that are often random. How do we control this cycle? How do we guarantee data privacy and the reliability of a new model's results?
#Can we adopt MLOps?!
At the end of 2018, many people began to realize that they had the means to build new models easily, even in an automated way with AutoML, but deploying them and keeping them in production is, to this day, another story. Out of this came the discipline of MLOps (Machine Learning + "Information Technology OPerationS"), which aims to simplify and automate the life cycle of Artificial Intelligence models.
“MLOps (a compound of Machine Learning and “information technology OPerationS”) is [a] new discipline/focus/practice for collaboration and communication between data scientists and information technology (IT) professionals while automating and productizing machine learning algorithms.” — Nisha Talagala (2018)
In 2019, MLOps was used to automate the deployment of new models, resulting in different automated pipeline solutions, generally driven by GitOps. In most cases, a Continuous Integration (CI) process encapsulates the new model in a Docker image, and a Continuous Deployment (CD) process takes it to production, where the image runs in one or more containers managed by Kubernetes (K8s), OpenShift, and other solutions…
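The CD step described above typically ends in a Kubernetes manifest like the following. This is a minimal sketch under assumptions: the model name, image registry, and port are hypothetical placeholders, not taken from any specific pipeline.

```yaml
# Hypothetical Deployment: runs the model image produced by the CI step.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fraud-model                # hypothetical model service name
spec:
  replicas: 2                      # two containers serving the same model
  selector:
    matchLabels:
      app: fraud-model
  template:
    metadata:
      labels:
        app: fraud-model
    spec:
      containers:
        - name: model-server
          image: registry.example.com/fraud-model:v1   # image built and tagged by CI
          ports:
            - containerPort: 8080
```

In a GitOps setup, committing a manifest like this to the repository is what triggers the rollout; the new image tag is the only thing the pipeline changes between model versions.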
Today, market solutions are no longer just automation pipelines; they enable and manage the entire cycle of new models. We now have MLflow, Kubeflow, Polyaxon, and many other solutions aimed at making MLOps adoption possible.
With MLOps, we can track and manage a model's entire life cycle: the data engineer working with data from different sources and creating the datasets; the data scientist using those datasets with different algorithms and techniques to generate trained models; automation pipelines for model deployment; and even monitoring the need for retraining with a new set of data.
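The life-cycle tracking described above can be illustrated with a toy registry in plain Python. This is not the API of MLflow or any real tool; it is a hedged sketch whose class, fields, and stage names ("staging", "production", "archived") are assumptions chosen to show the idea: every trained model is recorded together with the dataset that produced it, so access and lineage stay auditable.

```python
import hashlib
import json
import time


class ModelRegistry:
    """Toy registry tracking which dataset and algorithm produced each model,
    and which model is currently in production (names are hypothetical)."""

    def __init__(self):
        self.runs = []

    def log_run(self, dataset_id, algorithm, params, metrics):
        # Derive a short, deterministic id from the run's inputs.
        run_id = hashlib.sha1(
            json.dumps([dataset_id, algorithm, params], sort_keys=True).encode()
        ).hexdigest()[:8]
        self.runs.append({
            "run_id": run_id,
            "dataset_id": dataset_id,   # lineage: which dataset trained this model
            "algorithm": algorithm,
            "params": params,
            "metrics": metrics,
            "stage": "staging",         # every new model starts in staging
            "timestamp": time.time(),
        })
        return run_id

    def promote(self, run_id):
        # Exactly one model serves production; the rest are archived.
        for run in self.runs:
            run["stage"] = "production" if run["run_id"] == run_id else "archived"

    def production_model(self):
        return next(r for r in self.runs if r["stage"] == "production")


registry = ModelRegistry()
rid = registry.log_run("sales_v3", "random_forest",
                       {"n_estimators": 100}, {"auc": 0.91})
registry.promote(rid)
print(registry.production_model()["dataset_id"])  # lineage survives promotion
```

Because each run keeps its `dataset_id`, answering "which data trained the model currently in production?" — the kind of question LGPD audits raise — becomes a lookup instead of an investigation.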
Using MLOps, we can manage all access to data and the full life cycle of AI models, which makes this discipline a viable path to complying with the General Data Protection Law.
Some interesting links about MLOps:
Polyaxon, Argo and Seldon for model training, package and deployment in Kubernetes
The ultimate combination of open-source frameworks for model management in Kubernetes?
The Rise of the Term “MLOps”
Properly Operationalized Machine Learning is the New Holy Grail
ML Ops: Machine Learning as an Engineering Discipline
As ML matures from research to applied business solutions, so do we need to improve the maturity of its operation…
MLOps: The Upcoming Shining Star
The right path to building a full-stack machine learning system. MLOps is the new emerging practice to streamline…
MLOps Done Right
The company Tom works at wants to get rich, so his boss asks him to use his newly acquired knowledge (3 days workshop)…
Machine Learning and MLOps - Hipsters #171 - Hipsters Ponto Tech
We're back with one of the podcast listeners' favorite subjects: Machine Learning! And it seems this market is…
DataOps: your next job in Data Science? — Data Hackers Podcast 16
Learn what the Data Operations methodology is and how it will impact businesses in the coming years
This is only the first part; with your feedback, we will publish more articles on the subject.
#Thank you for reading. :)