Skip to main content

Showing 1–23 of 23 results for author: Godfrey, P B

  1. arXiv:2312.07813  [pdf, other

    cs.OS cs.LG

    On a Foundation Model for Operating Systems

    Authors: Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Sebastian Angel, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, P. Brighten Godfrey, Daehyeok Kim, Chris Rossbach, Gang Wang

    Abstract: This paper lays down the research agenda for a domain-specific foundation model for operating systems (OSes). Our case for a foundation model revolves around the observations that several OS components such as CPU, memory, and network subsystems are interrelated and that OS traces offer the ideal dataset for a foundation model to grasp the intricacies of diverse OS components and their behavior in… ▽ More

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: Machine Learning for Systems Workshop at 37th NeurIPS Conference, 2023, New Orleans, LA, USA

  2. arXiv:2311.02800  [pdf, other

    cs.DC

    Kivi: Verification for Cluster Management

    Authors: Bingzhe Liu, Gangmuk Lim, Ryan Beckett, P. Brighten Godfrey

    Abstract: Modern cloud infrastructure is powered by cluster management systems such as Kubernetes and Docker Swarm. While these systems seek to minimize users' operational burden, the complex, dynamic, and non-deterministic nature of these systems makes them hard to reason about, potentially leading to failures ranging from performance degradation to outages. We present Kivi, the first system for verifying… ▽ More

    Submitted 5 November, 2023; originally announced November 2023.

    Comments: 18 pages

  3. arXiv:2305.03348  [pdf, other

    cs.NI

    Flock: Accurate network fault localization at scale

    Authors: Vipul Harsh, Tong Meng, Kapil Agrawal, P. Brighten Godfrey

    Abstract: Inferring the root cause of failures among thousands of components in a data center network is challenging, especially for "gray" failures that are not reported directly by switches. Faults can be localized through end-to-end measurements, but past localization schemes are either too slow for large-scale networks or sacrifice accuracy. We describe Flock, a network fault localization algorithm and… ▽ More

    Submitted 5 May, 2023; originally announced May 2023.

    Comments: To appear in ACM PACMNET, Vol 1, June 2023

  4. arXiv:1911.02128  [pdf, other

    cs.NI

    Plankton: Scalable network configuration verification through model checking

    Authors: Santhosh Prabhu, Kuan-Yen Chou, Ali Kheradmand, P. Brighten Godfrey, Matthew Caesar

    Abstract: Network configuration verification enables operators to ensure that the network will behave as intended, prior to deployment of their configurations. Although techniques ranging from graph algorithms to SMT solvers have been proposed, scalable configuration verification with sufficient protocol support continues to be a challenge. In this paper, we show that by combining equivalence partitioning w… ▽ More

    Submitted 5 November, 2019; originally announced November 2019.

    Comments: NSDI'20

  5. arXiv:1811.10737  [pdf, other

    cs.NI

    Dissecting Latency in the Internet's Fiber Infrastructure

    Authors: Ilker Nadi Bozkurt, Waqar Aqeel, Debopam Bhattacherjee, Balakrishnan Chandrasekaran, Philip Brighten Godfrey, Gregory Laughlin, Bruce M. Maggs, Ankit Singla

    Abstract: The recent publication of the `InterTubes' map of long-haul fiber-optic cables in the contiguous United States invites an exciting question: how much faster would the Internet be if routes were chosen to minimize latency? Previous measurement campaigns suggest the following rule of thumb for estimating Internet latency: multiply line-of-sight distance by 2.1, then divide by the speed of light in f… ▽ More

    Submitted 26 November, 2018; originally announced November 2018.

  6. arXiv:1811.00212  [pdf, other

    cs.NI

    Expander Datacenters: From Theory to Practice

    Authors: Vipul Harsh, Sangeetha Abdu Jyothi, Inderdeep Singh, P. Brighten Godfrey

    Abstract: Recent work has shown that expander-based data center topologies are robust and can yield superior performance over Clos topologies. However, to achieve these benefits, previous proposals use routing and transport schemes that impede quick industry adoption. In this paper, we examine if expanders can be effective for the technology and environments practical in today's data centers, including the… ▽ More

    Submitted 31 October, 2018; originally announced November 2018.

    Comments: 15 pages, 17 figures

  7. arXiv:1810.03259  [pdf, other

    cs.NI

    Internet Congestion Control via Deep Reinforcement Learning

    Authors: Nathan Jay, Noga H. Rotman, P. Brighten Godfrey, Michael Schapira, Aviv Tamar

    Abstract: We present and investigate a novel and timely application domain for deep reinforcement learning (RL): Internet congestion control. Congestion control is the core networking task of modulating traffic sources' data-transmission rates to efficiently utilize network capacity, and is the subject of extensive attention in light of the advent of Internet services such as live video, virtual reality, In… ▽ More

    Submitted 21 May, 2019; v1 submitted 7 October, 2018; originally announced October 2018.

    Comments: 10 pages, accepted to ICML 2019

  8. arXiv:1809.10897  [pdf, other

    cs.NI

    cISP: A Speed-of-Light Internet Service Provider

    Authors: Debopam Bhattacherjee, Sangeetha Abdu Jyothi, Ilker Nadi Bozkurt, Muhammad Tirmazi, Waqar Aqeel, Anthony Aguirre, Balakrishnan Chandrasekaran, P. Brighten Godfrey, Gregory P. Laughlin, Bruce M. Maggs, Ankit Singla

    Abstract: Low latency is a requirement for a variety of interactive network applications. The Internet, however, is not optimized for latency. We thus explore the design of cost-effective wide-area networks that move data over paths very close to great-circle paths, at speeds very close to the speed of light in vacuum. Our cISP design augments the Internet's fiber with free-space wireless connectivity. cISP… ▽ More

    Submitted 10 October, 2018; v1 submitted 28 September, 2018; originally announced September 2018.

  9. arXiv:1505.03449  [pdf, other

    cs.NI

    Towards a Speed of Light Internet

    Authors: Ankit Singla, Balakrishnan Chandrasekaran, P. Brighten Godfrey, Bruce Maggs

    Abstract: In principle, a network can transfer data at nearly the speed of light. Today's Internet, however, is much slower: our measurements show that latencies are typically more than one, and often more than two orders of magnitude larger than the lower bound implied by the speed of light. Closing this gap would not only add value to today's Internet applications, but might also open the door to exciting… ▽ More

    Submitted 13 May, 2015; originally announced May 2015.

  10. arXiv:1402.2531  [pdf, other

    cs.NI

    Measuring and Understanding Throughput of Network Topologies

    Authors: Sangeetha Abdu Jyothi, Ankit Singla, P. Brighten Godfrey, Alexandra Kolla

    Abstract: High throughput is of particular interest in data center and HPC networks. Although myriad network topologies have been proposed, a broad head-to-head comparison across topologies and across traffic patterns is absent, and the right way to compare worst-case throughput performance is a subtle problem. In this paper, we develop a framework to benchmark the throughput of network topologies, using… ▽ More

    Submitted 14 November, 2016; v1 submitted 11 February, 2014; originally announced February 2014.

  11. arXiv:1309.7066  [pdf, other

    cs.NI

    High Throughput Data Center Topology Design

    Authors: Ankit Singla, P. Brighten Godfrey, Alexandra Kolla

    Abstract: With high throughput networks acquiring a crucial role in supporting data-intensive applications, a variety of data center network topologies have been proposed to achieve high capacity at low cost. While this literature explores a large number of design points, even in the limited case of a network of identical switches, no proposal has been able to claim any notion of optimality. The case of het… ▽ More

    Submitted 12 February, 2014; v1 submitted 26 September, 2013; originally announced September 2013.

    Comments: 15 pages

  12. arXiv:1309.0874  [pdf

    cs.DC cs.DS cs.SI physics.soc-ph

    Shortest Paths in Microseconds

    Authors: Rachit Agarwal, Matthew Caesar, P. Brighten Godfrey, Ben Y. Zhao

    Abstract: Computing shortest paths is a fundamental primitive for several social network applications including socially-sensitive ranking, location-aware search, social auctions and social network privacy. Since these applications compute paths in response to a user query, the goal is to minimize latency while maintaining feasible memory requirements. We present ASAP, a system that achieves this goal by ex… ▽ More

    Submitted 3 September, 2013; originally announced September 2013.

    Comments: Extended version of WOSN'12 paper: new techniques (reduced memory, faster computations), distributed (MapReduce) algorithm, multiple paths between a source-destination pair

  13. arXiv:1306.3707  [pdf, other

    cs.NI

    Low latency via redundancy

    Authors: Ashish Vulimiri, P. Brighten Godfrey, Radhika Mittal, Justine Sherry, Sylvia Ratnasamy, Scott Shenker

    Abstract: Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency --- especially the tail of the latency distribution --- can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resou… ▽ More

    Submitted 16 June, 2013; originally announced June 2013.

  14. arXiv:1306.3534  [pdf, other

    cs.NI cs.DC

    A cost-benefit analysis of low latency via added utilization

    Authors: Ashish Vulimiri, P. Brighten Godfrey, Sri Varsha Gorge, Zitian Liu, Scott Shenker

    Abstract: Several recently proposed techniques achieve latency reduction by trading it off for some amount of additional bandwidth usage. But how would one quantify whether the tradeoff is actually beneficial in a given system? We develop an economic cost vs. benefit analysis for answering this question. We use the analysis to derive a benchmark for wide-area client-server applications, and demonstrate how… ▽ More

    Submitted 4 December, 2014; v1 submitted 14 June, 2013; originally announced June 2013.

  15. arXiv:1302.6156  [pdf, other

    cs.NI

    Scalable Routing on Flat Names

    Authors: Ankit Singla, P. Brighten Godfrey, Kevin Fall, Gianluca Iannaccone, Sylvia Ratnasamy

    Abstract: We introduce a protocol which routes on flat, location-independent identifiers with guaranteed scalability and low stretch. Our design builds on theoretical advances in the area of compact routing, and is the first to realize these guarantees in a dynamic distributed setting.

    Submitted 25 February, 2013; originally announced February 2013.

    Comments: 13 pages

    Journal ref: Extends our ACM CoNEXT 2010 paper with proofs for the theoretical results

  16. arXiv:1206.2057  [pdf, other

    cs.NI

    Finishing Flows Quickly with Preemptive Scheduling

    Authors: Chi-Yao Hong, Matthew Caesar, P. Brighten Godfrey

    Abstract: Today's data centers face extreme challenges in providing low latency. However, fair sharing, a principle commonly adopted in current congestion control protocols, is far from optimal for satisfying latency requirements. We propose Preemptive Distributed Quick (PDQ) flow scheduling, a protocol designed to complete flows quickly and meet flow deadlines. PDQ enables flow preemption to approximate… ▽ More

    Submitted 12 June, 2012; v1 submitted 10 June, 2012; originally announced June 2012.

    Comments: The conference version was published in SIGCOMM 2012

  17. arXiv:1206.1134  [pdf

    cs.SI cs.DB physics.soc-ph

    Shortest Paths in Less Than a Millisecond

    Authors: Rachit Agarwal, Matthew Caesar, P. Brighten Godfrey, Ben Y. Zhao

    Abstract: We consider the problem of answering point-to-point shortest path queries on massive social networks. The goal is to answer queries within tens of milliseconds while minimizing the memory requirements. We present a technique that achieves this goal for an extremely large fraction of path queries by exploiting the structure of the social networks. Using evaluations on real-world datasets, we argu… ▽ More

    Submitted 6 June, 2012; originally announced June 2012.

    Comments: 6 pages; to appear in SIGCOMM WOSN 2012

  18. arXiv:1201.2703  [pdf

    cs.DS cs.DC cs.NI cs.SI

    Faster Approximate Distance Queries and Compact Routing in Sparse Graphs

    Authors: Rachit Agarwal, P. Brighten Godfrey, Sariel Har-Peled

    Abstract: A distance oracle is a compact representation of the shortest distance matrix of a graph. It can be queried to approximate shortest paths between any pair of vertices. Any distance oracle that returns paths of worst-case stretch (2k-1) must require space $Ω(n^{1 + 1/k})$ for graphs of n nodes. The hard cases that enforce this lower bound are, however, rather dense graphs with average degree Ω(n^{1… ▽ More

    Submitted 12 January, 2012; originally announced January 2012.

    Comments: 20 pages, an earlier version appeared in INFOCOM 2011, this version presents data structures with improved space/query-time trade-off

  19. arXiv:1201.1661  [pdf, ps, other

    cs.NI

    Slick Packets

    Authors: Giang T. K. Nguyen, Rachit Agarwal, Junda Liu, Matthew Caesar, P. Brighten Godfrey, Scott Shenker

    Abstract: Source-controlled routing has been proposed as a way to improve flexibility of future network architectures, as well as simplifying the data plane. However, if a packet specifies its path, this precludes fast local re-routing within the network. We propose SlickPackets, a novel solution that allows packets to slip around failures by specifying alternate paths in their headers, in the form of compa… ▽ More

    Submitted 8 January, 2012; originally announced January 2012.

    Comments: This is the full version of a paper with the same title that appeared in ACM SIGMETRICS 2011, with the inclusion of the appendix. 16 pages

    ACM Class: C.2.1; C.2.2; C.2.6

  20. arXiv:1110.1687  [pdf, other

    cs.NI

    Jellyfish: Networking Data Centers Randomly

    Authors: Ankit Singla, Chi-Yao Hong, Lucian Popa, P. Brighten Godfrey

    Abstract: Industry experience indicates that the ability to incrementally expand data centers is essential. However, existing high-bandwidth network designs have rigid structure that interferes with incremental expansion. We present Jellyfish, a high-capacity network interconnect, which, by adopting a random graph topology, yields itself naturally to incremental expansion. Somewhat surprisingly, Jellyfish i… ▽ More

    Submitted 20 April, 2012; v1 submitted 7 October, 2011; originally announced October 2011.

    Comments: 14 pages, 12 figures

  21. arXiv:1108.0192  [pdf, other

    cs.NI

    BGP Stability is Precarious

    Authors: P. Brighten Godfrey

    Abstract: We note a fact which is simple, but may be useful for the networking research community: essentially any change to BGP's decision process can cause divergence --- or convergence when BGP would otherwise diverge.

    Submitted 31 July, 2011; originally announced August 2011.

  22. arXiv:0803.0632  [pdf, ps, other

    cs.NI cs.IT

    Network Coding for Distributed Storage Systems

    Authors: Alexandros G. Dimakis, P. Brighten Godfrey, Yunnan Wu, Martin J. Wainwright, Kannan Ramchandran

    Abstract: Distributed storage systems provide reliable access to data through redundancy spread over individually unreliable nodes. Application scenarios include data centers, peer-to-peer storage systems, and storage in wireless networks. Storing data using an erasure code, in fragments spread across nodes, requires less redundancy than simple replication for the same level of reliability. However, since… ▽ More

    Submitted 5 March, 2008; originally announced March 2008.

    Report number: EECS 14546

  23. arXiv:cs/0702015  [pdf, ps, other

    cs.IT cs.NI

    Network Coding for Distributed Storage Systems

    Authors: Alexandros G. Dimakis, P. Brighten Godfrey, Martin J. Wainwright, Kannan Ramchandran

    Abstract: Peer-to-peer distributed storage systems provide reliable access to data through redundancy spread over nodes across the Internet. A key goal is to minimize the amount of bandwidth used to maintain that redundancy. Storing a file using an erasure code, in fragments spread across nodes, promises to require less redundancy and hence less maintenance bandwidth than simple replication to provide the… ▽ More

    Submitted 2 February, 2007; originally announced February 2007.

    Comments: To appear in INFOCOM 2007