hi my dears, I have an issue at work where we have to work with millions (150 mln~) of product data points. We are using SQL server because it was inhouse available for development. however using various tables growing beyond 10 mln the server becomes quite slow and waiting/buffer time becomes >7000ms/sec. which is tearing our complete setup of various microservices who read, write and delete from the tables continuously down. All the stackoverflow answers lead to - its complex. read a 2000 page book.
the thing is. my queries are not that complex. they simply go through the whole table to identify any duplicates which are not further processed then, because the processing takes time (which we thought would be the bottleneck). but the time savings to not process duplicates seems now probably less than that it takes to compare batches with the SQL table. the other culprit is that our server runs on a HDD which is with 150mb read and write per second probably on its edge.
the question is. is there a wizard move to bypass any of my restriction or is a change in the setup and algorithm inevitable?
edit: I know that my questions seems broad. but as I am new to database architecture I welcome any input and discussion since the topic itself is a lifetime know-how by itself. thanks for every feedbach.
What? Problems like this usually come down to some missing indexes. Can you view the query plan for your slow queries? See how long they are taking? IDK about SQL Server but usually there is a command called something like ANALYZE, that breaks down a query into the different parts of its execution plan, executes it, and measures how long each part takes. If you see something like “FULL TABLE SCAN” taking a long time, that can usually be fixed with an index.
If this doesn’t make any sense to you, ask if there are any database gurus at your company, or book a few hours with a consultant. If you go the paid consultant route, say you want someone good at SQL Server query optimization.
By the way I think some people in this thread are overestimating the complexity of this type of problem or are maybe unintentionally spreading FUD. I’m not a DB guru but I would say that by now I’m somewhat clueful, and I got that way mostly by reading the SQLlite docs including the implementation manuals over a few evenings. That’s probably a few hundred pages but not 2000 or anything like that.
First question: how many separate tables does your DB have? If less than say 20, you are probably in simple territory.
Also, look at your slowest queries. They likely say SELECT something FROM this JOIN that JOIN otherthing bla bla bla. How many different JOINs are in that query? If just one, you probably need an index; if two or three, it might take a bit of head scratching; and if 4 or more, something is possibly wrong with your schema or how the queries are written and you have to straighten that out.
Basically from having seen this type of thing many times before, there is about a 50% that it can be solved with very little effort, by adding indexes based on studying the slow query executions.