-
Blowfish: Topological and statistical signatures for quantifying ambiguity in semantic search
Authors:
Thomas Roland Barillot,
Alex De Castro
Abstract:
This work reports evidence for topological signatures of ambiguity in sentence embeddings that could be leveraged for ranking and/or explanation purposes in the context of vector search and Retrieval Augmented Generation (RAG) systems. We propose a working definition of ambiguity and design an experiment in which we break down a proprietary dataset into collections of chunks of varying size (3, 5, and 10 lines) and use the different collections successively as query and answer sets. This allows us to test for signatures of ambiguity while removing confounding factors. Our results show that proxy ambiguous queries (size 10 queries against size 3 documents) display different distributions of homology 0 and homology 1 based features than proxy clear queries (size 5 queries against size 10 documents). We then discuss those results in terms of increased manifold complexity and/or approximately discontinuous embedding submanifolds. Finally, we propose a strategy to leverage those findings as a new scoring strategy for semantic similarity.
Submitted 12 June, 2024;
originally announced June 2024.
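For readers unfamiliar with homology-based features, the following is a minimal sketch of how dimension-0 persistence can be computed for a small point cloud. It is not the authors' pipeline: the abstract does not say which filtration or library was used, so the Vietoris-Rips construction, the function name `h0_death_times`, and the toy 2-D points standing in for sentence embeddings are all assumptions made for illustration.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Return the finite H0 death scales of a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. A component dies
    at the length of the first edge that merges it into another component,
    so the death scales are exactly the edge lengths selected by Kruskal's
    minimum-spanning-tree algorithm (returned in increasing order).
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # two components merge: one of them dies
            deaths.append(length)
    return deaths


# Hypothetical toy data: two tight pairs of points, far apart from each other.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
deaths = h0_death_times(toy_points)
print(deaths)  # [1.0, 1.0, 10.0]
```

The single large death scale reflects the gap between the two clusters. Summary statistics of such death scales are one plausible kind of "homology 0 based feature"; dimension-1 features (loops) need a full persistent homology library and are not covered by this sketch.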
-
Modelling the semantics of text in complex document layouts using graph transformer networks
Authors:
Thomas Roland Barillot,
Jacob Saks,
Polena Lilyanova,
Edward Torgas,
Yachen Hu,
Yuanqing Liu,
Varun Balupuri,
Paul Gaskell
Abstract:
Representing structured text from complex documents typically calls for different machine learning techniques, such as language models for paragraphs and convolutional neural networks (CNNs) for table extraction, which prohibits drawing links between text spans from different content types. In this article we propose a model that approximates the human reading pattern of a document and outputs a unique semantic representation for every text span irrespective of the content type they are found in. We base our architecture on a graph representation of the structured text, and we demonstrate that not only can we retrieve semantically similar information across documents but also that the embedding space we generate captures useful semantic information, similar to language models that work only on text sequences.
Submitted 18 February, 2022;
originally announced February 2022.
-
Correlation Driven Transient Hole Dynamics Resolved in Space and Time in the Isopropanol Molecule
Authors:
T. Barillot,
O. Alexander,
B. Cooper,
T. Driver,
D. Garratt,
S. Li,
A. Al Haddad,
A. Sanchez-Gonzalez,
M. Agåker,
C. Arrell,
M. Bearpark,
N. Berrah,
C. Bostedt,
J. Bozek,
C. Brahms,
P. H. Bucksbaum,
A. Clark,
G. Doumy,
R. Feifel,
L. J. Frasinski,
S. Jarosch,
A. S. Johnson,
L. Kjellsson,
P. Kolorenč,
Y. Kumagai,
et al. (24 additional authors not shown)
Abstract:
The possibility of suddenly ionized molecules undergoing extremely fast electron hole dynamics prior to significant structural change was first recognized more than 20 years ago and termed charge migration. The accurate probing of ultrafast electron hole dynamics requires measurements that both have sufficient temporal resolution and can detect the localization of a specific hole within the molecule. We report an investigation of the dynamics of inner valence hole states in isopropanol where we use an x-ray pump/x-ray probe experiment, with site and state-specific probing of a transient hole state localized near the oxygen atom in the molecule, together with an ab initio theoretical treatment. We record the signature of transient hole dynamics and make the first observation of dynamics driven by frustrated Auger-Meitner transitions. We verify that the hole lifetime is consistent with our theoretical prediction. This state-specific measurement paves the way to widespread application for observations of transient hole dynamics localized in space and time in molecules and thus to charge transfer phenomena that are fundamental in chemical and material physics.
Submitted 13 May, 2021;
originally announced May 2021.
-
Evidence of large polarons in photoemission band mapping of the perovskite semiconductor CsPbBr$_3$
Authors:
M. Puppin,
S. Polishchuk,
N. Colonna,
A. Crepaldi,
D. N. Dirin,
O. Nazarenko,
R. De Gennaro,
G. Gatti,
S. Roth,
T. Barillot,
L. Poletto,
R. P. Xian,
L. Rettig,
M. Wolf,
R. Ernstorfer,
M. V. Kovalenko,
N. Marzari,
M. Grioni,
M. Chergui
Abstract:
Lead-halide perovskite (LHP) semiconductors are emergent optoelectronic materials with outstanding transport properties which are not yet fully understood. We find signatures of large polaron formation in the electronic structure of the inorganic LHP CsPbBr$_3$ by means of angle-resolved photoelectron spectroscopy. The experimental valence band dispersion shows a hole effective mass of $0.26\pm0.02\,m_e$, 50% heavier than the bare mass $m_0 = 0.17\,m_e$ predicted by density functional theory. Calculations of electron-phonon coupling indicate that phonon dressing of the carriers mainly occurs via distortions of the Pb-Br bond with a Fröhlich coupling parameter $\alpha=1.82$. Good agreement with our experimental data is obtained within the Feynman polaron model, validating a viable theoretical method to predict the carrier effective mass of LHPs ab initio.
Submitted 31 August, 2019;
originally announced September 2019.
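As a quick check of the mass enhancement quoted in the abstract above (a reader's arithmetic, not a calculation taken from the paper; the symbol $m^*$ is introduced here only as a label for the measured hole effective mass):

$$\frac{m^*}{m_0} = \frac{0.26\,m_e}{0.17\,m_e} \approx 1.53$$

With the quoted uncertainty of $\pm0.02\,m_e$ on the measured mass, the ratio spans roughly 1.41 to 1.65, so the stated "50% heavier" is a rounded figure that lies within that range.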
-
Machine learning applied to single-shot x-ray diagnostics in an XFEL
Authors:
A. Sanchez-Gonzalez,
P. Micaelli,
C. Olivier,
T. R. Barillot,
M. Ilchen,
A. A. Lutman,
A. Marinelli,
T. Maxwell,
A. Achner,
M. Agåker,
N. Berrah,
C. Bostedt,
J. Buck,
P. H. Bucksbaum,
S. Carron Montero,
B. Cooper,
J. P. Cryan,
M. Dong,
R. Feifel,
L. J. Frasinski,
H. Fukuzawa,
A. Galler,
G. Hartmann,
N. Hartmann,
W. Helml,
et al. (17 additional authors not shown)
Abstract:
X-ray free-electron lasers (XFELs) are the only sources currently able to produce bright few-fs pulses with tunable photon energies from 100 eV to more than 10 keV. Due to the stochastic self-amplified spontaneous emission (SASE) operating principle and other technical issues, the output pulses are subject to large fluctuations, making it necessary to characterize the x-ray pulses on every shot for data sorting purposes. We present a technique that applies machine learning tools to predict x-ray pulse properties using simple electron beam and x-ray parameters as input. Using this technique at the Linac Coherent Light Source (LCLS), we report mean errors below 0.3 eV for the prediction of the photon energy at 530 eV and below 1.6 fs for the prediction of the delay between two x-ray pulses. We also demonstrate spectral shape prediction with a mean agreement of 97%. This approach could potentially be used at the next generation of high-repetition-rate XFELs to provide accurate knowledge of complex x-ray pulses at the full repetition rate.
Submitted 11 October, 2016;
originally announced October 2016.