php - Best way to create nested array from tables: multiple queries/loops VS single query/loop style

Say I have 2 tables, which I can "merge" and represent in a single nested array.

I'm wondering what would be the best way to do that, considering:

  • efficiency
  • best-practice
  • DB/server-side usage trade-off
  • what you should do in real life
  • same case for 3, 4 or more tables that can be "merged" that way

The question is about ANY server-side/relational-db.

2 simple ways I was thinking about (if you have others, please suggest! Note that I'm asking about a simple SERVER-SIDE and RELATIONAL-DB setup, so please don't waste your time explaining why I shouldn't use this kind of DB, why I should use an MVC design instead, etc.):

  1. 2 loops, N+1 simple "SELECT" queries (5 for the example data below)
  2. 1 loop, 1 "JOIN" query

I've tried to give a simple and detailed example in order to explain myself and to better understand your answers (how to write the code and/or finding possible mistakes is not the issue here, so try not to focus on that...)

SQL SCRIPTS FOR CREATING AND INSERTING DATA TO TABLES

CREATE TABLE persons
(
    id int NOT NULL AUTO_INCREMENT,
    fullName varchar(255),
    PRIMARY KEY (id)
);

INSERT INTO persons (fullName) VALUES ('Alice'), ('Bob'), ('Carl'), ('Dan');

CREATE TABLE phoneNumbers
(
    id int NOT NULL AUTO_INCREMENT,
    personId int,
    phoneNumber varchar(255),
    PRIMARY KEY (id)
);

INSERT INTO phoneNumbers (personId, phoneNumber) VALUES ( 1, '123-456'), ( 1, '234-567'), (1, '345-678'), (2, '456-789'), (2, '567-890'), (3, '678-901'), (4, '789-012');  

A JSON REPRESENTATION OF THE TABLES AFTER I "MERGED" THEM:

[
  {
    "id": 1,
    "fullName": "Alice",
    "phoneNumbers": [
      "123-456",
      "234-567",
      "345-678"
    ]
  },
  {
    "id": 2,
    "fullName": "Bob",
    "phoneNumbers": [
      "456-789",
      "567-890"
    ]
  },
  {
    "id": 3,
    "fullName": "Carl",
    "phoneNumbers": [
      "678-901"
    ]
  },
  {
    "id": 4,
    "fullName": "Dan",
    "phoneNumbers": [
      "789-012"
    ]
  }
]

PSEUDO CODE FOR 2 WAYS:

1.

query: "SELECT id, fullName FROM persons"
personList = new List<Person>()
foreach row x in query result:
    current = new Person(x.fullName)
    "SELECT phoneNumber FROM phoneNumbers WHERE personId = x.id"
    foreach row y in query result:
        current.phoneNumbers.Push(y.phoneNumber)
    personList.Push(current)        
print personList         

2.

query: "SELECT persons.id, fullName, phoneNumber FROM persons
            LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId"
personList = new List<Person>()
current = null
previouseId = null
foreach row x in query result:
    if ( x.id !=  previouseId )
        if ( current != null )
            personList.Push(current)
            current = null
        current = new Person(x.fullName)
    current.phoneNumbers.Push(x.phoneNumber)
print personList            

CODE IMPLEMENTATION IN PHP/MYSQL:

1.

/* get all persons */
$result = mysql_query("SELECT id, fullName FROM persons"); 
$personsArray = array(); //Create an array
//loop all persons
while ($row = mysql_fetch_assoc($result))
{
    //add new person
    $current = array();
    $current['id'] = $row['id'];
    $current['fullName'] = $row['fullName'];

    /* add all person phone-numbers */
    $id = $current['id'];
    $sub_result = mysql_query("SELECT phoneNumber FROM phoneNumbers WHERE personId = {$id}");
    $phoneNumbers = array();
    while ($sub_row = mysql_fetch_assoc($sub_result))
    {
        $phoneNumbers[] = $sub_row['phoneNumber'];
    }
    //add phoneNumbers array to person
    $current['phoneNumbers'] = $phoneNumbers;

    //add person to final result array
    $personsArray[] = $current;
}

echo json_encode($personsArray);
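
For comparison, the same N+1 approach can be sketched with PDO and a prepared statement for the inner query. This is only a sketch and not part of the original question; it assumes an existing PDO connection in $pdo and the same table and column names as above.

/* Approach 1 with PDO and a prepared statement (sketch; assumes an existing PDO connection in $pdo) */
$persons   = $pdo->query("SELECT id, fullName FROM persons");
$phoneStmt = $pdo->prepare("SELECT phoneNumber FROM phoneNumbers WHERE personId = ?");

$personsArray = array();
while ($row = $persons->fetch(PDO::FETCH_ASSOC))
{
    $current = array(
        'id'       => (int) $row['id'],
        'fullName' => $row['fullName'],
    );

    // one extra query per person: N+1 queries in total
    $phoneStmt->execute(array($current['id']));
    $current['phoneNumbers'] = $phoneStmt->fetchAll(PDO::FETCH_COLUMN);

    $personsArray[] = $current;
}

echo json_encode($personsArray);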

2.

/* get all persons and their phone-numbers in a single query */
$sql = "SELECT persons.id, fullName, phoneNumber FROM persons
            LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId";
$result = mysql_query($sql); 

$personsArray = array();
/* init temp vars to save the current person's data */
$current = null;
$previousId = null;
$phoneNumbers = array();
while ($row = mysql_fetch_assoc($result))
{
    /*
       if the current id is different from the previous id,
       you've got to a new person:
       save the previous person (if one exists),
       and create a new one
    */
    if ($row['id'] != $previousId)
    {
        // in the first iteration,
        // $current (the previous person) is null,
        // so don't add it
        if (!is_null($current))
        {
            $current['phoneNumbers'] = $phoneNumbers;
            $personsArray[] = $current;
            $phoneNumbers = array();
        }

        // create a new person
        $current = array();
        $current['id'] = $row['id'];
        $current['fullName'] = $row['fullName'];
        // remember this id as the previous id
        $previousId = $current['id'];
    }

    // always add the phone-number
    // to the current phone-number list
    $phoneNumbers[] = $row['phoneNumber'];
}

// don't forget to add the last person (saved in $current)
if (!is_null($current))
{
    $current['phoneNumbers'] = $phoneNumbers;
    $personsArray[] = $current;
}

echo json_encode($personsArray);
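
A variation on the second way, again only a sketch and not part of the original question: group the joined rows into an array keyed by the person id. This avoids tracking the previous id, does not depend on the rows arriving ordered by person, needs no special handling for the last person, and copes with persons that have no phone numbers (where the LEFT JOIN returns NULL).

/* Approach 2, grouping by person id instead of tracking the previous id
   (sketch; uses the same mysql_* calls and query as above) */
$sql = "SELECT persons.id, fullName, phoneNumber FROM persons
            LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId";
$result = mysql_query($sql);

$personsById = array();
while ($row = mysql_fetch_assoc($result))
{
    $id = $row['id'];

    // create the person entry the first time this id is seen
    if (!isset($personsById[$id]))
    {
        $personsById[$id] = array(
            'id'           => $id,
            'fullName'     => $row['fullName'],
            'phoneNumbers' => array(),
        );
    }

    // LEFT JOIN returns NULL for persons without any phone number
    if (!is_null($row['phoneNumber']))
    {
        $personsById[$id]['phoneNumbers'][] = $row['phoneNumber'];
    }
}

// re-index to get a plain list, as in the JSON above
echo json_encode(array_values($personsById));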

P.S. This link is an example of a different question here, where I tried to suggest the second way: tables to single json


1 Reply


Preliminary

First, thank you for putting that much effort into explaining the problem, and for the formatting. It is great to see someone who is clear about what they are doing, and what they are asking.

But it must be noted that that, in itself, forms a limitation: you are fixed on the notion that this is the correct solution, and that with some small correction or guidance, this will work. That is incorrect. So I must ask you to give that notion up, to take a big step back, and to view (a) the whole problem and (b) my answer without that notion.

The context of this answer is:

  • all the explicit considerations you have given, which are very important, which I will not repeat

  • the two most important of which are: what is best practice, and what I would do in real life

This answer is rooted in Standards, the higher order of, or frame of reference for, best practice. This is what the commercial Client/Server world does, or should be doing.

This issue, this whole problem space, is becoming a common problem. I will give a full consideration here, and thus answer another SO question as well. Therefore it might contain a tiny bit more detail than you require. If it does, please forgive this.

Consideration

  1. The database is a server-based resource, shared by many users. In an online system, the database is constantly changing. It contains that One Version of the Truth (as distinct from One Fact in One Place, which is a separate, Normalisation issue) of each Fact.

    • the facts that some database systems do not have a server architecture, and that the notion of a server in such software is therefore false and misleading, are separate but noted points.
  2. As I understand it, JSON and JSON-like structures are required for "performance reasons", precisely because the "server" doesn't, cannot, perform as a server. The concept is to cache the data on each (every) client, such that you are not fetching it from the "server" all the time.

    • This opens up a can of worms. If you do not design and implement this properly, the worms will overrun the app.

    • Such an implementation is a gross violation of the Client/Server Architecture, which allows simple code on both sides, and appropriate deployment of software and data components, such that implementation times are small, and efficiency is high.

    • Further, such an implementation requires a substantial implementation effort, and it is complex, consisting of many parts. Each of those parts must be appropriately designed.

    • The web, and the many books written in this subject area, provide a confusing mix of methods, marketed on the basis of supposed simplicity; ease; anyone-can-do-anything; freeware-can-do-anything; etc. There is no scientific basis for any of those proposals.

Non-architecture & Sub-standard

As evidenced, you have learned that some approaches to database design are incorrect. You have encountered one problem, one instance in which that advice is false. As soon as you solve this one problem, the next problem, which is not apparent to you right now, will be exposed. The notions are a never-ending set of problems.

I will not enumerate all the false notions that are sometimes advocated. I trust that as you progress through my answer, you will notice that one marketed notion after another is false.

The two bottom lines are:

  1. The notions violate Architecture and Design Standards, namely Client/Server Architecture; Open Architecture; Engineering Principles; and, to a lesser extent in this particular problem, Database Design Principles.

  2. Which leads to people like you, who are trying to do an honest job, being tricked into implementing simple notions, which turn into massive implementations. Implementations that will never quite work, so they require substantial ongoing maintenance, and will eventually be replaced, wholesale.

Architecture

The central principle being violated is, never duplicate anything. The moment you have a location where data is duplicated (due to caching or replication or two separate monolithic apps, etc), you create a duplicate that will go out of synch in an online situation. So the principle is to avoid doing that.

  • Sure, for serious third-party software, such as a gruntly report tool, by design, they may well cache server-based data in the client. But note that they have put hundreds of man-years into implementing it correctly, with due consideration to the above. Yours is not such a piece of software.

Rather than providing a lecture on the principles that must be understood, or the evils and costs of each error, the rest of this answer provides the requested what would you do in real life, using the correct architectural method (a step above best practice).

Architecture 1

Do not confuse

  • the data which must be Normalised

with

  • the result set, which, by definition, is the flattened ("de-normalised" is not quite correct) view of the data.

The data, given that it is Normalised, will not contain duplicate values; repeating groups. The result set will contain duplicate values; repeating groups. That is pedestrian.

  • Note that the notion of Nested Sets (or Nested Relations), which is in my view not good advice, is based on precisely this confusion.

  • For forty-five years since the advent of the RM, they have been unable to differentiate base relations (for which Normalisation does apply) from derived relations (for which Normalisation does not apply).

  • Two of these proponents are currently questioning the definition of First Normal Form. 1NF is the foundation of the other NFs; if the new definition is accepted, all the NFs will be rendered value-less. The result would be that Normalisation itself (sparsely defined in mathematical terms, but clearly understood as a science by professionals) will be severely damaged, if not destroyed.
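
To make the distinction concrete with the question's own data: the LEFT JOIN in method 2 returns the flattened result set below, in which fullName repeats for every phone number, while the two base tables remain Normalised and hold each fact once.

id  fullName  phoneNumber
--  --------  -----------
 1  Alice     123-456
 1  Alice     234-567
 1  Alice     345-678
 2  Bob       456-789
 2  Bob       567-890
 3  Carl      678-901
 4  Dan       789-012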

Architecture 2

There is a centuries-old scientific or engineering principle, that content (data) must be separated from control (program elements). This is because the analysis, design, and implementation of the two are completely different. This principle is no less important in the software sciences, where it has specific articulation.

In order to keep this brief (ha ha), instead of a discourse, I will assume that you understand:

  • That there is a scientifically demanded boundary between data and program elements. Mixing them up results in complex objects that are error-prone and hard to maintain.

    • Confusion about this principle has reached epidemic proportions in the OO/ORM world, and the consequences reach far and wide.

    • Only professionals avoid this. For the rest, the great majority, they accept the new definition as "normal", and they spend their lives fixing problems that we simply do not have.

  • The architectural superiority, the great value, of data being both stored and presented in Tabular Form per Dr E F Codd's Relational Model. That there are specific rules for Normalisation of data.

  • And importantly, that you can determine when the people who write and market books are advising non-relational or anti-relational methods.

Architecture 3

If you cache data on the client:

  1. Cache the absolute minimum.

    That means cache only the data that does not change in the online environment. That means Reference and Lookup tables only, the tables that populate the higher level classifiers, the drop-downs, etc.

  2. Currency

    For every table that you do cache, you must have a method of (a) determining that the cached data has become stale, compared to the One Version of the Truth which exists on the server, and (b) refreshing it from the server, (c) on a table-by-table basis.

    Typically, this involves a background process that executes every (say) five minutes, queries the MAX UpdatedDateTime for each cached table on the client vs the DateTime on the server, and, if it has changed, refreshes the table and all of its child tables, those that depend on the changed table (a sketch follows at the end of this section).

    That, of course, requires that you have an UpdatedDateTime column on every table. That is not a burden, because you need that for OLTP ACID Transactions anyway (if you have a real database, instead of a bunch of sub-standard files).

Which really means: never replicate, because the coding burden is prohibitive.
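
As an illustration only, here is a minimal sketch of that currency check. It assumes a PDO connection to the server in $server, an UpdatedDateTime column on every cached table, and a hypothetical client-side $cache object with lastRefreshDateTime() and refreshTable() helpers; the table names are likewise illustrative.

/* Sketch of the per-table currency check described above.
   $cache is a hypothetical client-side store; its two helpers are assumed. */
$cachedTables = array('refPhoneType', 'refCountry');   // Reference/Lookup tables only

foreach ($cachedTables as $table)
{
    // latest change on the server for this table
    $stmt = $server->query("SELECT MAX(UpdatedDateTime) FROM {$table}");
    $serverDateTime = $stmt->fetchColumn();

    // refresh the client copy only if the server copy changed since the last refresh
    if ($serverDateTime > $cache->lastRefreshDateTime($table))
    {
        // re-fetch the table, and the child tables that depend on it
        $cache->refreshTable($table);
    }
}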

Architecture 4

In the sub-commercial, non-server world, I understand that some people advise the reverse: caching "everything".

  • That is the only way that programs like PostgreSQL can be used in a multi-user system.

  • You always get what you pay for: you pay peanuts, you get monkeys; you pay zero, you get zero.

The corollary to Architecture 3 is, if you do cache data on the client, do not cache tables that change frequently. These are the transaction and history tables. The notion of caching such tables, or all tables, on the client is completely bankrupt.

In a genuine Client/Server deployment, due to use of applicable standards, for each data window, the app should query only the rows that are required, for that particular need, at that particular time, based on context or filter values, etc. The app should never load the entire table.

If the same user, using the same window, inspected its contents 15 minutes after the first inspection, the data would be 15 minutes out of date.
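
Referring back to "query only the rows that are required": with the schema from the question, a window showing a single person would run something like the sketch below, rather than loading the whole persons table. This is a sketch only; $pdo and $personId are placeholders, and prepared-statement style is assumed.

/* Fetch only the rows needed for the current window
   (sketch; assumes a PDO connection in $pdo and a context value in $personId). */
$stmt = $pdo->prepare(
    "SELECT persons.id, fullName, phoneNumber
       FROM persons
       LEFT JOIN phoneNumbers ON persons.id = phoneNumbers.personId
      WHERE persons.id = ?"
);
$stmt->execute(array($personId));      // only the rows for the person on screen
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);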

  • For freeware/shareware/vapourware platforms, which define themselves by the absence of a server architecture, and thus by the result, that performance is non-existent, sure, you have to cache more than the minimum tables on the client.

  • If you do that, you must take all the above into account, and implement it correctly, otherwise your app will be broken, and the ramifications will drive the users to seek your termination. If there is more than one user, they will have the same cause, and soon form an army.

Architecture 5

Now we get to how you cache those carefully chosen tables on the client.

Note that databases grow, they are extended.

  • If the system is broken, a failure, it will grow in small increments, and require a lot of effort.

  • If the system is even a small success, it will grow exponentially.

  • If the system (each of the database, and the app, separately) is designed and implemented well, the changes will be easy, the bugs will be few.

Therefore, all the components in the app must be designed properly, to comply with applicable standards, and the database must be fully Normalised. This in turn minimises the effect of changes in the database, on the app, and vice versa.

  • The app will consist of simple, not complex, objects, which are easy to maintain and change.

