Data-Level Conflict Resolution Strategy for Writing DataFrames to Postgres
A quick and easy solution
A while ago I wrote an article on reading and writing DataFrames with FastAPI. At that time, I only had conflict resolution at the table level, but I recently needed to introduce it at the data level for one of my personal projects.
This article is just a quick overview of the solution.
What is a Conflict? How Do We Deal With Them?
A conflict occurs when any two rows in a table are identical on a chosen subset of the columns, which I call data_conflict_properties. For instance, suppose I have a table with primary key id that already contains the row id=1, name=yousef, surname=nami.
Trying to write id=1, name=yousef and surname=nami again is a complete conflict. However, I could define my conflict properties to be just id, meaning that I couldn’t write any new entry that has id=1. In fact, this is exactly what a primary key does in a database. What I am proposing with data_conflict_properties is a flexible way to choose which properties you want to check for conflicts, without necessarily defining those properties as primary keys or unique constraints.
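To make the idea concrete, here is a minimal pandas sketch of checking a new DataFrame for conflicts against rows already in the table, on whichever columns are listed in data_conflict_properties. The helper name find_conflicts and the in-memory `existing` DataFrame standing in for the Postgres table are assumptions for illustration, not the implementation described in this article.

```python
import pandas as pd


def find_conflicts(new: pd.DataFrame, existing: pd.DataFrame,
                   data_conflict_properties: list[str]) -> pd.DataFrame:
    """Hypothetical helper: return the rows of `new` that match some row of
    `existing` on every column in `data_conflict_properties`."""
    # Only the conflict columns matter; dedupe them so the left merge
    # yields exactly one output row per row of `new`, in the same order.
    keys = existing[data_conflict_properties].drop_duplicates()
    merged = new.merge(keys, on=data_conflict_properties,
                       how="left", indicator=True)
    # "_merge" == "both" means this key combination is already in the table.
    mask = (merged["_merge"] == "both").to_numpy()
    return new[mask]


# Illustrative data: the table already holds id=1, yousef, nami.
existing = pd.DataFrame({"id": [1], "name": ["yousef"], "surname": ["nami"]})
new = pd.DataFrame({
    "id": [1, 1, 2],
    "name": ["yousef", "other", "yousef"],
    "surname": ["nami", "nami", "nami"],
})

# Conflict on all columns: only the exact duplicate (row 0) is flagged.
print(find_conflicts(new, existing, ["id", "name", "surname"]))
# Conflict on id alone: every incoming row with id=1 (rows 0 and 1) is flagged.
print(find_conflicts(new, existing, ["id"]))
```

In practice the `existing` rows would come from the Postgres table itself (for example via `pandas.read_sql`), ideally reading only the conflict columns. Changing the list passed as data_conflict_properties is what moves the check between "complete conflict" and a primary-key-like check on id, without touching the table's constraints.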
Now that we understand what a conflict is… how do we deal with them? Here are four options:
- append: this means that…