

Is this already in planning? This would be highly appreciated. Yossi Shaul speaks of a better cleanup mechanism. > Seq Scan on indexed_archives_entries i (cost=80.78 rows=19742278 width=8) Hash Cond: (archive_names.name_id = i.entry_name_id) We run with Artifactory 3.6 and Postgresql 9.3.6 and this DELETE statement is still a long running one. The new query took about 1 second instead of a couple of hours. So when the 'NOT IN' query is slower on MySQL on average, why is using it better?

In the long term we will have to devise a better cleanup mechanism. So for the short term, I'll use 'NOT EXISTS' for PostgreSQL and 'NOT IN' for other databases. We tested those two queries on MySQL, and on average the query with 'NOT EXISTS' is 10-15% slower on average. Apparently, PostgreSQL suffers badly from inefficient query plan when performing the query with the 'NOT IN' syntax. The query might work on big tables and it was expected to take a while. This is indeed faster, but only in PostgreSQL. It would be great to check if this would work and if this is also faster, because the performance of the current approach is really bad. " Index Cond: (entry_name_id = archive_names.name_id)" " -> Index Scan using indexed_entries_name_idx on indexed_archives_entries (cost=0.43.2.78 rows=96 width=14)" " -> Seq Scan on indexed_archives_entries (cost=2.95 rows=2939295 width=8)"Įxplain DELETE FROM archive_names WHERE NOT EXISTS (SELECT 1 FROM indexed_archives_entries WHERE entry_name_id=name_id) I checked the query plan of the original query and also the same with a modified query using not exists:Įxplain DELETE FROM archive_names WHERE name_id NOT IN (SELECT entry_name_id FROM indexed_archives_entries) Something like a join and getting the IDs and then DELETE by id maybe do the trick.īecause the query is running for days now it would be great to fix this. So I think using NOT IN is not the right way, because it uses a sequential scan of the indexed_archive_entries table. As the archive_names table usually is smaller than the indexed_archive_entries it should be possible to optimize the query because in our case we have about 4 million rows in indexed_archive_entries and about 100.000 rows in archive_names. I think this maybe is not a bug (although it feels like one) but it should be improved.

Note that you have to restart and manually clear the related database tables (at your own risk - backup your db first in any case). This can be disabled by editing the mimetypes.xml file.
POSTGRES ANTI JOIN ARCHIVE
Please note that I wrote archive indexing. Thanks for your answer, but the Maven Indexer Support is not enable, so I guest the problem may come from an other thing. You can't cancel this query but you can cancel the archive indexing if you're not using the class search feature and then empty the related tables (while Artifactory is down) There is a solution for cancel this query ? I try to increase the work memory per process to 50Mb, I had the same problem. Maybe the NOT IN clause makes an issue performance. │ -> Seq Scan on indexed_archives_entries (cost=4.92 rows=1923992 width=8) │ I saw a long query in the database server :ĭELETE FROM artifactory.archive_names WHERE name_id NOT IN (SELECT entry_name_id FROM artifactory.indexed_archives_entries) Įxplain DELETE FROM artifactory.archive_names WHERE name_id NOT IN (SELECT entry_name_id FROM artifactory.indexed_archives_entries) Sometimes I have a postmaster process (worker process for Postgres Database) that consumes 100% CPU on my Artifactory server.
