-
Data-Prep-Kit: getting your data ready for LLM application development
Authors:
David Wood,
Boris Lublinsky,
Alexy Roytman,
Shivdeep Singh,
Abdulhamid Adebayo,
Revital Eres,
Mohammad Nassar,
Hima Patel,
Yousaf Shah,
Constantin Adam,
Petros Zerfos,
Nirmit Desai,
Daiki Tsuzuku,
Takuya Goto,
Michele Dolfi,
Saptha Surendran,
Paramesvaran Selvam,
Sungeun An,
Yuan Chi Chang,
Dhiraj Joshi,
Hajar Emami-Gohari,
Xuan-Hong Dang,
Yan Koyfman,
Shahrokh Daijavad
Abstract:
Data preparation is the first, and a very important, step in any Large Language Model (LLM) development. This paper introduces an easy-to-use, extensible, and scale-flexible open-source data preparation toolkit called Data Prep Kit (DPK). DPK is architected and designed to enable users to scale their data preparation to their needs. With DPK, they can prepare data on a local machine or effortlessly scale to run on a cluster with thousands of CPU cores. DPK comes with a highly scalable, yet extensible, set of modules that transform natural language and code data. If users need additional transforms, they can easily develop them using DPK's extensive support for transform creation. These modules can be used independently or pipelined to perform a series of operations. In this paper, we describe the DPK architecture and show its performance from a small scale to a very large number of CPUs. DPK modules have been used to prepare data for the Granite Models [1] [2]. We believe DPK is a valuable contribution to the AI community, enabling users to easily prepare data to enhance the performance of their LLMs or to fine-tune models with Retrieval-Augmented Generation (RAG).
Submitted 26 September, 2024;
originally announced September 2024.
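The DPK abstract describes modules that can be used independently or pipelined into a series of operations. As a rough illustration of that composition pattern only, the sketch below models each transform as a function from a list of records to a list of records. The transform names (`exact_dedup`, `min_length_filter`, `add_word_count`), the `run_pipeline` helper, and the record layout are hypothetical and are not the DPK API.

```python
from typing import Callable, Dict, List

# Hypothetical record layout: each document is a dict with at least a "contents" field.
# This is NOT the DPK data model; it only illustrates how transforms compose.
Doc = Dict[str, object]
Transform = Callable[[List[Doc]], List[Doc]]


def exact_dedup(docs: List[Doc]) -> List[Doc]:
    """Keep only the first document for each distinct 'contents' value."""
    seen = set()
    kept = []
    for doc in docs:
        key = doc["contents"]
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept


def min_length_filter(min_chars: int) -> Transform:
    """Build a transform that drops documents shorter than min_chars characters."""
    def _filter(docs: List[Doc]) -> List[Doc]:
        return [d for d in docs if len(str(d["contents"])) >= min_chars]
    return _filter


def add_word_count(docs: List[Doc]) -> List[Doc]:
    """Annotate each document with a whitespace-based word count."""
    return [{**d, "num_words": len(str(d["contents"]).split())} for d in docs]


def run_pipeline(docs: List[Doc], transforms: List[Transform]) -> List[Doc]:
    """Apply each transform in order, feeding the output of one into the next."""
    for transform in transforms:
        docs = transform(docs)
    return docs


# Usage: the same transforms can be run alone or chained.
docs = [
    {"id": "a", "contents": "hello world"},
    {"id": "b", "contents": "hello world"},  # exact duplicate of "a"
    {"id": "c", "contents": "hi"},           # too short for the filter below
]
result = run_pipeline(docs, [exact_dedup, min_length_filter(5), add_word_count])
print(result)  # [{'id': 'a', 'contents': 'hello world', 'num_words': 2}]
```

Because every transform shares one signature, each can be called on its own or placed anywhere in the chain; a real toolkit would additionally need to partition the data so that the same functions can run across many workers.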
-
Electronic and phonon contributions to the Thermoelectric properties of newly discovered half-Heusler alloys XHfPb (X= Ni, Pd, and Pt)
Authors:
Paul O. Adebambo,
Gboyega A. Adebayo,
Roberto Guerra,
Davide Ceresoli
Abstract:
In this work we calculate the thermoelectric figure of merit of XHfPb (X = Ni, Pd, and Pt) by computing both the power factor and the lattice thermal conductivity from first principles. We make reasonable approximations: we use the Constant Relaxation Time Approximation (CRTA) to compute the electron transport contribution and the modified Debye-Callaway model to calculate the lattice thermal conductivity. We also report the dielectric properties of these semiconductors and the mode Grüneisen parameters. Not surprisingly, we find that the average Grüneisen coefficient correlates with the thermal conductivity. Next, we consider a realistic relaxation time $τ$ and carrier concentration $n$ from experimental data on ZrHfPb and obtain the figure of merit $ZT$ as a function of temperature. Our main finding is that, although Pt is isoelectronic with Ni and Pd, the $ZT$ of PtHfPb is larger and behaves differently from that of the other two materials, suggesting that PtHfPb is better suited for high-temperature thermoelectric generators.
Submitted 21 December, 2022;
originally announced December 2022.
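The XHfPb abstract combines a power factor with a lattice thermal conductivity to obtain $ZT$. The standard definition is $ZT = S^2 σ T / (κ_e + κ_L)$, where $S^2σ$ is the power factor. In the CRTA, Boltzmann transport codes typically report $σ/τ$ and $κ_e/τ$, so a relaxation time $τ$ must be supplied, while $S$ is independent of $τ$. The minimal sketch below encodes that arithmetic; the function names and all numerical inputs are illustrative assumptions, not values from the paper.

```python
def power_factor(seebeck: float, sigma: float) -> float:
    """Power factor S^2 * sigma, in W m^-1 K^-2 (S in V/K, sigma in S/m)."""
    return seebeck ** 2 * sigma


def figure_of_merit(seebeck: float, sigma: float, kappa_e: float,
                    kappa_l: float, temperature: float) -> float:
    """Dimensionless ZT = S^2 sigma T / (kappa_e + kappa_l); kappas in W m^-1 K^-1."""
    return power_factor(seebeck, sigma) * temperature / (kappa_e + kappa_l)


def figure_of_merit_crta(seebeck: float, sigma_over_tau: float,
                         kappa_e_over_tau: float, tau: float,
                         kappa_l: float, temperature: float) -> float:
    """ZT under the constant relaxation time approximation.

    sigma and kappa_e both scale linearly with tau; S does not depend on tau,
    so only the two conductivities are rescaled here.
    """
    return figure_of_merit(seebeck, sigma_over_tau * tau,
                           kappa_e_over_tau * tau, kappa_l, temperature)


# Illustrative numbers only (not values from the paper):
zt = figure_of_merit(seebeck=200e-6, sigma=1e5, kappa_e=0.5, kappa_l=1.5, temperature=300)
zt_crta = figure_of_merit_crta(200e-6, 1e19, 5e13, 1e-14, 1.5, 300)
print(round(zt, 3), round(zt_crta, 3))  # 0.6 0.6
```

Since both $σ$ and $κ_e$ scale with $τ$, the choice of $τ$ shifts the balance between the power factor and the electronic part of the thermal conductivity, which is why a realistic $τ$ from experiment matters for the final $ZT$.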
-
Partially Trusting the Service Mesh Control Plane
Authors:
Constantin Adam,
Abdulhamid Adebayo,
Hubertus Franke,
Edward Snible,
Tobin Feldman-Fitzthum,
James Cadden,
Nerla Jean-Louis
Abstract:
Zero Trust is a novel cybersecurity model that focuses on continually evaluating trust to prevent the initiation and horizontal spreading of attacks. A cloud-native Service Mesh is an example of a Zero Trust Architecture that can filter out external threats. However, the Service Mesh does not shield the Application Owner from internal threats, such as a rogue administrator of the cluster where their application is deployed. In this work, we enhance the Service Mesh to allow the definition and enforcement of a Verifiable Configuration that is defined and signed off by the Application Owner. Backed by automated digital signing solutions and confidential computing technologies, the Verifiable Configuration allows changing the trust model of the Service Mesh from the data plane fully trusting the control plane to partially trusting it. This lets the application benefit from all the functions provided by the Service Mesh (resource discovery, traffic management, mutual authentication, access control, observability), while ensuring that the Cluster Administrator cannot change the state of the application in a way that was not intended by the Application Owner.
Submitted 23 October, 2022;
originally announced October 2022.
-
Automated Compliance Blueprint Optimization with Artificial Intelligence
Authors:
Abdulhamid Adebayo,
Daby Sow,
Muhammed Fatih Bulut
Abstract:
For highly regulated industries such as banking and healthcare, one of the major hindrances to the adoption of cloud computing is compliance with regulatory standards. This is a complex problem because of the many regulatory and technical specification (techspec) documents with which companies need to comply. The critical problem is to establish the mapping between techspecs and regulation controls so that, from day one, companies can comply with regulations with minimal effort. We demonstrate the practicality of an approach that automatically analyzes regulatory standards using Artificial Intelligence (AI) techniques. We present early results on identifying the mapping between techspecs and regulation controls, and discuss challenges that must be overcome for this solution to be fully practical.
Submitted 22 June, 2022;
originally announced June 2022.
-
Vulnerability Prioritization: An Offensive Security Approach
Authors:
Muhammed Fatih Bulut,
Abdulhamid Adebayo,
Daby Sow,
Steve Ocepek
Abstract:
Organizations struggle to handle the sheer number of vulnerabilities in their cloud environments. The de facto methodology for prioritizing vulnerabilities is the Common Vulnerability Scoring System (CVSS). However, CVSS has inherent limitations that make it not ideal for prioritization. In this work, we propose a new way of prioritizing vulnerabilities. Our approach is inspired by how offensive security practitioners perform penetration testing. We evaluate our approach with a real-world case study for a large client, and we evaluate the accuracy of machine learning in automating the process end to end.
Submitted 22 June, 2022;
originally announced June 2022.
-
Universal $α$-elasticity of generalized Moufang loops
Authors:
A. O. Abdulkareem,
J. O. Adeniran,
A. A. A. Agboola,
G. A. Adebayo
Abstract:
In this study we introduce the $α$-elasticity property for generalized Moufang loops. Necessary and sufficient conditions for the $α$-elasticity of generalized Moufang loops to be universal are given. Using these universality conditions and, in some cases, the newly introduced right and left $α$-alternative laws for generalized Moufang loops, we study some properties of generalized Moufang loops. A condition under which a generalized Moufang loop is an abelian group is stated.
Submitted 16 October, 2019; v1 submitted 12 March, 2018;
originally announced March 2018.
-
Elastic constants and thermodynamics properties of pristine PEDOT revealed: A first-principles PBE/PBE PAW approach
Authors:
R. O. Agbaoye,
P. O. Adebambo,
J. O. Akinlami,
T. A. Afolabi,
S. Zh. Karazhanov,
D. Ceresoli,
G. A. Adebayo
Abstract:
In this work, we report, for the first time, detailed calculations of the elastic and thermodynamic properties of organic poly(3,4-ethylenedioxythiophene), PEDOT, in an undiluted state, using PBE and PBEsol-PAW pseudopotentials within the framework of Generalized Gradient Approximation Density Functional Theory. In contrast to Molecular Dynamics simulations, a series of PBE and PBEsol-PAW calculations in the current work revealed the most stable state of monoclinic-structured pristine PEDOT. We determined thirteen (13) independent elastic constants, together with the elastic compliance, which enabled us to establish other elastic properties of pristine PEDOT; the Pugh's ratio and Vickers hardness calculations show small mismatches between the PBE and PBEsol-PAW pseudopotentials. The Debye temperature TD is predicted in both the PBE and PBEsol-PAW calculations, while the specific heat capacity Cv(T) follows the Dulong-Petit curve and shows no mismatch with the Debye model at low temperature, with PBE predicting a higher Debye sound velocity than PBEsol-PAW. As an accuracy test only, we performed electronic structure calculations of PEDOT and compared them with available data in the literature.
Submitted 1 August, 2017; v1 submitted 19 October, 2016;
originally announced October 2016.
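The PEDOT abstract states that Cv(T) follows the Dulong-Petit curve and agrees with the Debye model at low temperature. Both limits follow from the Debye heat capacity per mole of atoms, $C_V = 9R\,(T/T_D)^3 \int_0^{T_D/T} x^4 e^x/(e^x-1)^2\,dx$, which tends to $3R$ at high $T$ and to $(12π^4/5)\,R\,(T/T_D)^3$ at low $T$. The sketch below evaluates this textbook model numerically; the Debye temperature used is an illustrative assumption, not the value computed in the paper, and the paper's own first-principles procedure is not reproduced here.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def _debye_integrand(x: float) -> float:
    """x^4 e^x / (e^x - 1)^2, rewritten with e^-x to avoid overflow at large x."""
    if x == 0.0:
        return 0.0  # the integrand behaves like x^2 near zero
    return x ** 4 * math.exp(-x) / math.expm1(-x) ** 2


def debye_cv(temperature: float, theta_d: float, n: int = 2000) -> float:
    """Debye-model heat capacity per mole of atoms, in J mol^-1 K^-1.

    Integral evaluated with composite Simpson's rule (n must be even).
    """
    if temperature <= 0.0:
        return 0.0
    x_max = theta_d / temperature
    h = x_max / n
    total = _debye_integrand(0.0) + _debye_integrand(x_max)
    for i in range(1, n):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * _debye_integrand(i * h)
    integral = total * h / 3.0
    return 9.0 * R * (temperature / theta_d) ** 3 * integral


theta = 300.0  # illustrative Debye temperature in K (not the paper's value)

# High-temperature limit: ratio to the Dulong-Petit value 3R approaches 1.
high_t = debye_cv(3000.0, theta) / (3 * R)

# Low-temperature limit: ratio to the Debye T^3 law approaches 1.
t3_law = (12 * math.pi ** 4 / 5) * R * (3.0 / theta) ** 3
low_t = debye_cv(3.0, theta) / t3_law
```

For a multi-atom system such as a polymer crystal, this per-atom quantity would be multiplied by the number of atoms per formula unit to give a per-mole-of-formula-units value.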
-
Projector augmented-wave and all-electron calculations across the periodic table: a comparison of structural and energetic properties
Authors:
E. Kucukbenli,
M. Monni,
B. I. Adetunji,
X. Ge,
G. A. Adebayo,
N. Marzari,
S. de Gironcoli,
A. Dal Corso
Abstract:
We construct a reference database of materials properties calculated using density-functional theory in the local or generalized-gradient approximation, and an all-electron or a projector augmented-wave (PAW) formulation, for verification and validation of first-principles simulations. All-electron calculations use the full-potential linearised augmented-plane wave method, as implemented in the Elk open-source code, while PAW calculations use the datasets developed by some of us in the open-source PSlibrary repository and the Quantum ESPRESSO distribution. We first calculate lattice parameters, bulk moduli, and energy differences for alkali metals, alkaline earths, and $3d$ and $4d$ transition metals in three ideal reference phases (simple cubic, fcc, and bcc), representing a standardized crystalline monoatomic solid-state test. Then, as suggested by K. Lejaeghere et al. [Critical Reviews in Solid State and Material Sciences 39, p. 1 (2014)], we compare the equations of state for all elements, except lanthanides and actinides, in their experimental phase (or occasionally a simpler, closely related one). PAW and all-electron energy differences and structural parameters agree in most cases within a few meV/atom and a fraction of a percent, respectively. This level of agreement, comparable with the previous study, also extends to other PAW and all-electron data from the electronic-structure codes VASP and WIEN2K, and underscores the overall reliability of current state-of-the-art electronic-structure calculations. At the same time, discrepancies that arise even within the same formulation for simple, fundamental structural properties point to the urgent need to establish standards for verification and validation, reference data sets, and careful refinements of the computational approaches used.
Submitted 11 April, 2014;
originally announced April 2014.
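The PAW versus all-electron comparison rests on lattice parameters, bulk moduli, and equations of state extracted from computed E(V) curves. As one common choice of equation of state, and not necessarily the form used in the paper, the sketch below evaluates the third-order Birch-Murnaghan energy expression and recovers the bulk modulus numerically from $B = V\,d^2E/dV^2$, together with the zero-pressure condition at the equilibrium volume $V_0$. The parameter values are illustrative assumptions, and fitting the expression to actual data (e.g. by least squares) is omitted.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Å^3 expressed in GPa


def birch_murnaghan_energy(volume: float, e0: float, v0: float,
                           b0: float, b0_prime: float) -> float:
    """Third-order Birch-Murnaghan E(V); volume in Å^3, energy in eV, b0 in eV/Å^3."""
    eta = (v0 / volume) ** (2.0 / 3.0)
    return e0 + 9.0 * v0 * b0 / 16.0 * (
        (eta - 1.0) ** 3 * b0_prime + (eta - 1.0) ** 2 * (6.0 - 4.0 * eta)
    )


def pressure(energy_fn, volume: float, h: float) -> float:
    """P = -dE/dV, estimated by a central difference."""
    return -(energy_fn(volume + h) - energy_fn(volume - h)) / (2.0 * h)


def bulk_modulus(energy_fn, volume: float, h: float) -> float:
    """B = V d^2E/dV^2, estimated by a central second difference."""
    d2e = (energy_fn(volume + h) - 2.0 * energy_fn(volume) + energy_fn(volume - h)) / h ** 2
    return volume * d2e


# Illustrative parameters (not taken from the paper):
params = dict(e0=-10.0, v0=20.0, b0=0.5, b0_prime=4.5)


def eos(v: float) -> float:
    """E(V) for the illustrative parameter set above."""
    return birch_murnaghan_energy(v, **params)


b_at_v0 = bulk_modulus(eos, params["v0"], h=0.02)
p_at_v0 = pressure(eos, params["v0"], h=0.02)
print(f"B(V0) = {b_at_v0 * EV_PER_A3_TO_GPA:.1f} GPa")  # B(V0) = 80.1 GPa
```

At $V_0$ the bracketed terms vanish, so the energy equals $E_0$, the pressure is zero, and the numerical bulk modulus recovers the input $B_0$; comparing two codes then amounts to comparing such fitted parameters, or the E(V) curves themselves.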