August 23-24, 2022 - Virtual
Note: The schedule is subject to change.

The Sched app allows you to build your schedule but is not a substitute for your event registration. You must be registered for Open Source Summit Latin America 2022 to participate in the sessions. If you have not registered but would like to join us, please go to the event registration page to purchase a registration.

All times in this schedule are shown in Eastern Daylight Time (UTC-4).
Wednesday, August 24 • 10:45am - 11:25am
Training AI To Code Using The Largest Code Dataset (Project CodeNet) - Christian Kadner & Tommy Li, IBM [Presented in English]

Project CodeNet is a large dataset of 14 million code samples, totaling 500 million lines of code in 55 programming languages. It enables machine learning for code, such as finding code similarity, extracting semantic context, and even translating between programming languages. Using the Machine Learning Exchange (MLX), a Linux Foundation for AI & Data Sandbox Project, we demonstrate how Project CodeNet can be leveraged to classify code and analyze code complexity in three steps. First, using DataShim, we turn domain-specific subsets of the data into Kubernetes Custom Resources. Second, running Jupyter notebooks on Kubernetes, we use the datasets to train deep learning models. Third, the models are containerized and served for inferencing on Kubernetes. For each of these steps, MLX generates Kubeflow Pipelines on Tekton, so data scientists are not required to write Kubernetes-specific code. Using the curated datasets, example notebooks, and pre-trained models, teams of data scientists can use the Machine Learning Exchange to bring machine learning and AI into the world of code.
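As a rough illustration of what "classifying code" can mean, the sketch below trains a tiny naive Bayes model on token counts to guess the programming language of a snippet. It is not the deep learning approach described in the abstract, and it does not use Project CodeNet, MLX, or Kubernetes. The four toy samples and the names `SAMPLES`, `tokenize`, `train`, and `classify` are hypothetical and exist only for this example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy samples; Project CodeNet itself is not loaded here.
SAMPLES = [
    ("python", "def add(a, b):\n    return a + b"),
    ("python", "for i in range(10):\n    print(i)"),
    ("c", "int add(int a, int b) { return a + b; }"),
    ("c", "#include <stdio.h>\nint main() { printf(\"hi\"); return 0; }"),
]


def tokenize(code):
    """Split source text into identifiers and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_0-9]", code)


def train(samples):
    """Collect per-label token counts for a multinomial naive Bayes model."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, code in samples:
        label_counts[label] += 1
        token_counts[label].update(tokenize(code))
    vocab = {tok for counts in token_counts.values() for tok in counts}
    return token_counts, label_counts, vocab


def classify(model, code):
    """Return the label with the highest log-probability for the snippet."""
    token_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus add-one (Laplace) smoothed log likelihood of each token.
        score = math.log(label_counts[label] / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokenize(code):
            score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(SAMPLES)
print(classify(model, "def square(x):\n    return x * x"))
print(classify(model, "int main() { return 0; }"))
```

A real workflow of the kind described above would replace the toy samples with a curated subset of the dataset and the token counter with a trained deep learning model. The overall shape stays the same: prepare labeled samples, fit a model, then query it.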

Tommy Li

Senior Software Developer, IBM
Tommy Li is a senior software developer at IBM focusing on Cloud, Kubernetes, and Machine Learning. He is one of the Kubeflow committers and has worked on various open-source projects related to Kubernetes, microservices, and deep learning applications to provide advanced use cases on...

Christian Kadner

Software Engineer, IBM
Christian Kadner is a software developer at the IBM Center for Open Source Data and AI Technologies (CODAIT). Christian has a background in systems programming and data management. Most recently he has been working on Kubeflow Pipelines on Tekton, the Machine Learning eXchange (MLX...

Wednesday August 24, 2022 10:45am - 11:25am EDT
  Open AI & Data Forum