Virtual sessions at-a-glance
Each livestream has a different keynote with ~6 unique sessions, so you may want to join both livestreams. And the ~25 on-demand sessions are inspiring too. All talks are in English and will have captioning. Be sure to click the “add to calendar” buttons.
Join the #cituscon channel on Discord to join the conversation.
- Live sessions: Americas Live Sessions and EMEA Live Sessions (livestream replays available)
- KEYNOTE: Big Opportunities in Small Data (Simon Willison)
Not all data needs to be big.
Civic data is more abundant than ever, with local and national governments around the world publishing rich data to open data portals. Every organization has untapped data about their business, and every individual has untapped data about their personal activity.
This data is measured in megabytes, not terabytes. What's missing are the tools that help people explore and understand data at this smaller scale: too big for Excel, but not so big that it demands a Big Data warehouse.
I've spent the last five years exploring the world of small data with Datasette, my open source multi-tool for exploring, analyzing and publishing data.
Datasette is built on SQLite. Why SQLite? It's tiny, fast and ubiquitous - and supports a workflow where databases can be created, shared and even discarded with ease.
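As a rough illustration of the create, query, and discard workflow described above, here is a minimal sketch using Python's built-in `sqlite3` module. The table, column names, and sample rows are hypothetical and are not taken from Datasette or the talk.

```python
import sqlite3

# Hypothetical sample rows standing in for a small civic dataset.
ROWS = [
    ("Oak", "Main St", 12),
    ("Elm", "Main St", 7),
    ("Oak", "Park Ave", 3),
]


def trees_by_species(rows):
    """Load rows into a throwaway SQLite database and total trees per species."""
    conn = sqlite3.connect(":memory:")  # in-memory database, nothing touches disk
    try:
        conn.execute(
            "CREATE TABLE trees (species TEXT, street TEXT, n_trees INTEGER)"
        )
        conn.executemany("INSERT INTO trees VALUES (?, ?, ?)", rows)
        cur = conn.execute(
            "SELECT species, SUM(n_trees) FROM trees "
            "GROUP BY species ORDER BY species"
        )
        return cur.fetchall()
    finally:
        conn.close()  # closing the connection discards the in-memory database


if __name__ == "__main__":
    print(trees_by_species(ROWS))
```

Swapping `":memory:"` for a file path would instead produce a single database file that can be copied or shared like any other file.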
Through the lens of Datasette and SQLite, I'll explore this problem space and ask how the PostgreSQL ecosystem can evolve to best address the fascinating opportunities presented by Small Data.
Simon Willison, Creator of Datasette.
Simon Willison is the creator of Datasette, an open source tool for exploring and publishing data. He recently completed a JSK journalism fellowship at Stanford, during which he focused on building open source tools for journalism based on his experience working as a data journalist at the UK's Guardian newspaper.
Prior to the fellowship, Simon was an engineering director at Eventbrite. Simon joined Eventbrite through their acquisition of Lanyrd, a Y Combinator funded company he co-founded in 2010.
He is a co-creator of the Django Web Framework, and has been blogging about web development and programming since 2002 at https://simonwillison.net/
- Postgres without SQL: Natural language queries using GPT-3 & Rust (Jelte Fennema)
Generative AI has gotten very popular in the last year. In this talk I’ll show a Postgres extension that I wrote, which allows you to use the power of GPT-3 right from your database. This new extension makes it easy to optimize your database schema, query your data, and even distribute your Postgres tables using the open source Citus database extension. All of this by using normal human language, and without the need to know any SQL. So, now you can finally jump on the NoSQL bandwagon, while still using good old trusty Postgres.
In this talk you’ll learn:
- How to write Postgres extensions using Rust (no Rust experience required)
- How generative AI in your database can make your own life easier
- What dangers to watch out for when letting AI loose on your Postgres database
Jelte Fennema, Senior Software Engineer @ Microsoft
Currently I'm working on Citus, Postgres and PgBouncer at Microsoft. Before that I was a big-time Postgres user at Stream, where I worked on low latency APIs for chat and social timelines. I'm one of the current maintainers of the PgBouncer project. I studied at the University of Amsterdam where I got my BSc in Computer Science and MSc in System and Network Engineering.
- Citus and JSON for real-time analytics at Vizor Games (Ivan Vyazmitinov)
At Vizor Games, we rely completely on the open-source Citus extension to Postgres, since it combines the widely adopted, feature rich and mature PostgreSQL database with the possibility to scale indefinitely.
In this Citus and PostgreSQL user talk I will discuss:
- Deploying the Citus database cluster on bare metal servers with Gentoo and Btrfs
- Implementing ETL for large amounts of raw data (about 100GB per day)
- Building analytics on semi-structured JSONB data and creating an analytical layer of views over it. (Side-note: Postgres support for JSONB is a key foundation of our analytics)
- Administering Citus on 20+ databases within one cluster
- Interaction with direct cluster users, including analysts and data scientists
- Integration with BI tools, like Tableau and Metabase
Ivan Vyazmitinov, Vizor Games, Internal Tools Tech Lead
I am a Java developer with 5+ years of experience and an accidental DBA. Since 2018 at Vizor Games, I have gradually taken on the role of tech lead and gained experience with Citus while developing an internal analytics system.
- Deploying PostgreSQL to Azure with Bicep (Pamela Fox)
The best kind of deploy is a repeatable deploy; one that you can redo and know that your infrastructure will be configured exactly the way you like it. For Azure deployments, the Bicep language enables you to programmatically describe your Azure infrastructure and deploy a whole web app stack with a single command.
In this talk, I'll explain the basics of using Bicep to deploy an app, with a focus on configuring Flexible Servers in the Azure Database for PostgreSQL managed service. I'll show firewall configuration, secret generation and Key Vault storage, and virtual network setup. Together, we'll experience the joy of repeatable PostgreSQL deploys on Azure!
Pamela Fox, Cloud Advocate in Python, Microsoft
Pamela Fox is a human that loves to learn, teach, and create. She's currently a Cloud Advocate in Python at Microsoft, where she helps developers use Python with the many Azure offerings.
On the teaching front, Pamela has taught computer science at UC Berkeley and volunteered in Bay Area classrooms as part of the TEALS, GirlsWhoCode, and CoderDojo organizations. She also started the SF chapter of GirlDevelopIt, where she taught dozens of web development workshops.
Pamela's been in the tech industry for 15 years now, starting with her first role at Google as one of their first developer advocates. She went on to be an early full-stack engineer at Coursera and spent many years after at Khan Academy, both as an engineer and the creator of the computer programming content.
- Additional IO Observability in Postgres with pg_stat_io (Melanie Plageman)
pg_stat_io, a new cumulative statistics view in Postgres, provides additional visibility into IO activity split out by backend type, IO context, and IO operation.
Previously, IO statistics in Postgres, both those built-in and those available through extensions, did not divide IO activity at a sufficient level of granularity to inform tuning decisions. pg_stat_io addresses these gaps.
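To make the breakdown by backend type and IO context concrete, here is a minimal sketch. It assumes the `pg_stat_io` columns `backend_type`, `object`, `context`, `reads`, and `hits` as shipped in PostgreSQL 16; the `hit_ratio` helper is a hypothetical illustration and not part of the talk. It treats each read as one block, which is only a rough approximation.

```python
# Query text only; run it with any PostgreSQL client against a server
# that provides pg_stat_io (assumed here: PostgreSQL 16 column names).
PG_STAT_IO_QUERY = """
SELECT backend_type, context, reads, hits
FROM pg_stat_io
WHERE object = 'relation'
ORDER BY backend_type, context;
"""


def hit_ratio(hits, reads):
    """Rough share of block accesses served from shared buffers.

    Returns None when a counter is NULL (not tracked for that backend type
    and context) or when there has been no activity at all.
    """
    if hits is None or reads is None:
        return None
    total = hits + reads
    if total == 0:
        return None
    return hits / total


if __name__ == "__main__":
    print(PG_STAT_IO_QUERY)
    print(hit_ratio(90, 10))
```

Computing the ratio per row, rather than for the whole server, is what lets a low value be traced to a specific backend type or context.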
Using pared-down walk-throughs of the internal Postgres systems responsible for accessing and persisting your data, this talk will explain the causes of common IO bottlenecks. Then, through systematic breakdowns of potential symptoms visible in pg_stat_io, it will explore the most likely misconfigurations leading to these issues.
Melanie Plageman, Senior Software Engineer at Microsoft
Melanie is a Postgres hacker working at Microsoft. She has worked on the Postgres executor, planner, storage, and statistics subsystems. Most recently she has been hacking on the proposed asynchronous and direct IO patch set. She is passionate about writing maintainable code and about building developer tools.
- On compression of everything in Postgres (Andrey Borodin)
For many years Postgres had only a pglz algorithm to compress TOASTs and full page images in WAL. But recently things have started to change!
In this talk, I'm going to uncover the impact of Postgres features already committed - lz4 everywhere. But what's more important are the prospects of compression application that are yet to come: protocol compression, temp files compression, WAL compression, and data segments compression.
One of the most interesting, required components is the so-called random access compressed file. In some cases, we need even random write compressed files. And there are so many approaches (and working implementations!) to do this. Having this component in the Postgres core would allow many very cool things. I think that this component could increase OLTP performance by a very significant multiplier on IO bottlenecked installations.
Andrey Borodin, Postgres Contributor
Hacking on Postgres since 2016. Associate professor at Yandex School for Data Analysis and Ural Federal University.
- Post-Americas Livestream Wrap Up (Claire Giordano, Robert Treat)
Wrap-up to the Americas Livestream for Citus Con: An Event for Postgres 2023, with livestream co-hosts Claire Giordano and Robert Treat. Includes impressions of the overall event from the virtual hallway track on Discord, to the swag, and links to the other livestream in EMEA, plus highlight video reels of the 25 on-demand talks, too.
Claire Giordano, Citus & Postgres Open Source Champion @ Microsoft
Robert Treat, I make Postgres less painful
- KEYNOTE: The Distributed PostgreSQL problem & how Citus solves it (Marco Slot)
Building Distributed PostgreSQL is perhaps one of the most challenging software engineering projects imaginable. Early on, we decided to architect Citus as a PostgreSQL extension. That way, Citus would always remain part of the PostgreSQL ecosystem even as PostgreSQL keeps developing. Moreover, architecting Citus as an extension made distribution a feature that can simply be added to PostgreSQL without losing any of its versatile feature set or its mature, efficient implementations.
The goal of Citus is to provide high PostgreSQL performance at any scale, but we learned that simply distributing data across machines is rarely sufficient to achieve that. We needed crisp distribution concepts and careful trade-offs that favor workload patterns that benefit from scaling out. Moreover, we had to tackle many complex engineering problems given the large PostgreSQL feature set, failures and concurrency in distributed systems, and mission-critical nature of databases.
In this keynote, I will discuss the main engineering challenges we faced over the past 10 years of developing the fastest, most mature, open-source Distributed PostgreSQL implementation: Citus.
Marco Slot, Principal Software Engineer on the Citus team at Microsoft
Marco Slot is a Principal Software Engineer on the Citus team at Microsoft. He has been working on PostgreSQL extensions including Citus and pg_cron since 2014 when he joined Citus Data and has continued to lead the Citus development at Microsoft since 2019. Prior to Citus Data, Marco earned a PhD in cooperative self-driving cars at Trinity College Dublin and worked on globally distributed systems at Amazon Web Services.
- Parallelism in PostgreSQL 15 (Thomas Munro)
An introduction to the way PostgreSQL plans and executes parallel queries. In contrast to the distributed multi-server parallelism that the Citus database extension provides, this talk is about stock PostgreSQL using multiple CPU cores on a single machine to run a single query.
The talk will illustrate the key concepts and problems by working through simple examples of workloads that can and can't benefit from CPU parallelism. There are also many cases where parallelism could help, but doesn't yet. Some of the opportunities for future development will be discussed.
Thomas Munro, PostgreSQL hacker working at Microsoft
I am a PostgreSQL developer and committer based in New Zealand. I began working full time on PostgreSQL and related technologies about 8 years ago, first at EnterpriseDB and now Citus/Microsoft. Before that I worked with Unix and relational databases in the web, finance and software industries for a couple of decades. Some of my PostgreSQL interests include query parallelism, taming resource management, transaction machinery, portability, and modernizing database/operating system interfaces. My other interests include hacking on the FreeBSD operating system, trying to learn other languages and trying to ride on various forms of transport with wheels or fins.
- What I learned benchmarking Citus & Postgres performance with HammerDB (Naisila Puka)
In this session, you will learn about measuring database performance of Azure Cosmos DB for PostgreSQL, the new home for Citus on Azure, through the HammerDB benchmark. When I first started, it wasn't obvious how to even get started with "running benchmarks". Therefore, you will get a tour of the whole process, including the following:
- Choosing the Citus database cluster we're interested in testing: its size and the tuning of each node
- Configuring HammerDB based on the cluster's capacity in order to utilize the cluster's resources as much as possible
- Interpreting the benchmark result
- Tweaking parameters in steps 1 and 2 based on the CPU and Disk utilization graphs
Naisila Puka, Software Engineer at Microsoft
Software Engineer working on the Citus Engine product in the Postgres team at Microsoft. Fan of organizing & decluttering (which I try to apply in my daily work in Citus as well), algebraic objects, and foreign languages.
- Postgres Storytelling: Support in the Darkest Hour (Boriss Mejías)
This is a story experienced by Monica DeBea, a talented Postgres support engineer based in Brussels. A story some of you may have experienced firsthand. And if you haven’t yet, then you someday might. So grab a cup of your favorite beverage, and get ready for some Postgres storytelling.
One dark and cold night in February, Monica was the last person in the building, left alone with an almost broken application. The monitoring system she had just put in place immediately started alerting about the age of the oldest transaction. Have you ever heard about transaction ID wraparound in PostgreSQL? This is what Monica was fighting against. Why was vacuum not freezing old transaction IDs? The autovacuum process was not doing it, and a manual VACUUM execution was not helping either. Monica searched and searched for the root cause. She kept searching and the clock kept ticking. The sun was gone, her teammates were gone, even the office lights were dark, but Monica was not going to allow any database downtime on her watch. This is a story of Postgres support in the darkest hour.
Boriss Mejías, PostgreSQL Solution Architect at EDB
I'm a holistic system software engineer, PostgreSQL solution architect at EDB, free software user, and headbanger. I got my PhD researching distributed self-managing systems and I have been working with PostgreSQL since version 9.1. In 2018 I started the PostgreSQL User Group in Belgium. I have presented at many conferences in academia, open source, and Postgres. Being a father of two fantastic daughters, I also have experience in storytelling. Now that they have grown up, I have decided to try telling stories to the Postgres community.
- How we keep Azure Database for PostgreSQL free of bloat to maximize disk space (Bob Wuisman, Eleni Siampali)
We were facing the issue that bloat was not removed effectively, resulting in poor database and server performance. Now we automatically update the autovacuum settings every week on our Azure Database for PostgreSQL flexible servers.
Each week we add new client database servers to our Azure Database for PostgreSQL Flexible Server subscription. Each database is different in size and activity. Some are >300GB with daily insert, update, and delete activities; some are <10 GB with infrequent changes; and then we have everything in between. This varies per table within the databases as well.
Our automation of autovacuum has improved and stabilized Postgres query performance significantly, and has consistently saved more than one TB of server disk space, a figure that grows with each new database added.
The weekly automatic updates adjust the following autovacuum parameters, based on segmenting the database tables into different clusters:
- autovacuum_vacuum_cost_limit (increase)
- autovacuum_vacuum_cost_delay (reduce)
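A minimal sketch of how per-table settings like these might be generated from table segments. The segment names, the numeric settings, and the size-only segmentation rule are assumptions for illustration (the thresholds echo the >300GB and <10 GB examples in the abstract); this is not the speakers' actual implementation. It relies only on the fact that both parameters can be set per table as PostgreSQL storage parameters.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    name: str       # schema-qualified table name, assumed to come from a trusted catalog query
    size_gb: float  # on-disk size of the table


# Hypothetical per-segment settings: (cost_limit, cost_delay in milliseconds).
SEGMENT_SETTINGS = {
    "large": (2000, 1),
    "medium": (1000, 2),
    "small": (400, 2),
}


def segment(table):
    """Assign a table to a segment by size alone (activity is ignored in this sketch)."""
    if table.size_gb > 300:
        return "large"
    if table.size_gb < 10:
        return "small"
    return "medium"


def autovacuum_statement(table):
    """Build the ALTER TABLE statement that applies the segment's settings."""
    cost_limit, cost_delay = SEGMENT_SETTINGS[segment(table)]
    return (
        f"ALTER TABLE {table.name} SET ("
        f"autovacuum_vacuum_cost_limit = {cost_limit}, "
        f"autovacuum_vacuum_cost_delay = {cost_delay});"
    )


if __name__ == "__main__":
    for t in [TableStats("public.events", 420.0), TableStats("public.lookup", 0.5)]:
        print(autovacuum_statement(t))
```

Generating statements as text keeps the weekly job easy to review before anything is applied to a server.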
The 7 steps:
- Design the solution
- Collect database statistics
- Segment the database tables
- Determine the autovacuuming factors
- Automatically update the factors on each table
- Return disk space back to the server
- Analyze results and adjust where needed
Bob Wuisman, Ebiquity, Head of Production (Data and Technology)
Bob has successfully built business intelligence environments in various businesses. With a holistic vision and process driven mindset, Bob thrives on building teams and sustainably growing data driven operations.
Database servers: PostgreSQL server, data warehousing, data architecture, data governance. Professional competences include people, process, and project management.
Eleni Siampali, Ebiquity, Senior Data Engineer
Eleni is a Data Engineer who has years of experience in automating data processes and running data science projects. She thrives in solving complex projects and trying out new applications by combining and integrating different technologies. Within Ebiquity she has automated many processes and enabled automated query tasks across multiple Postgres databases. Technical competences are: PostgreSQL, Python, Databricks, Kubernetes, Docker and Azure Pipelines.
- Citus & Patroni: The Key to Scalable and Fault-Tolerant PostgreSQL (Alexander Kukushkin)
Citus is an open source extension to PostgreSQL that enables you to scale out your database horizontally by sharding your data across many nodes, and Patroni is an open source tool for managing and automating PostgreSQL High Availability. Combined, the three open source projects (PostgreSQL, Citus, and Patroni) become a superhero: a scalable PostgreSQL cluster with self-healing capabilities.
In my presentation I will cover implementation details of the Patroni & Citus integration, do a live demo of cluster deployment, and showcase maintenance on Citus worker nodes without interrupting client connections.
Alexander Kukushkin, Principal Software Engineer at Microsoft
Alexander is better known in the PostgreSQL community as "the Patroni guy". Patroni is an open source tool for implementing PostgreSQL clustering and high availability. Besides Patroni, Alexander occasionally contributes to PostgreSQL and other open source projects and tools, usually Postgres related.
- Post-EMEA Livestream Wrap Up (Claire Giordano, Jelte Fennema)
Wrap-up to the EMEA Livestream for Citus Con: An Event for Postgres 2023, with livestream co-hosts Claire Giordano and Jelte Fennema. Includes impressions of the overall event from the virtual hallway track on Discord, to the swag, and links to the other livestream in Americas, plus highlight video reels of the 25 on-demand talks, too.
Claire Giordano, Citus & Postgres Open Source Champion @ Microsoft
Jelte Fennema, Senior Software Engineer @ Microsoft
The Postgres and Citus team at Microsoft is proud to be the host of Citus Con: An Event for Postgres.