Document versioning (was: Coordinate system fix)
W dniu 6 kwietnia 2010 18:31 użytkownik bulia byak <buliabyak@...400...> napisał:
2010/4/6 Krzysztof Kosiński <tweenk.pl@...400...>:
W dniu 6 kwietnia 2010 15:33 użytkownik bulia byak <buliabyak@...400...> napisał:
This sounds inevitable, but rather hairy. Why have both inkscape:version and inkscape:document-version? Can you get what you need with inkscape:version only?
- Document version changes must be atomic, because the upgrade is
conducted only one time for each type of defect. This means inkscape:document-version (or inkscape:document-format) can change several times between two releases.
But why can't we just fix ALL the defects we find between 0.47 and 0.48 and have just two states of upgrade for <=0.47 and for >=0.48?
The upgrade procedure must run exactly once for any document. This means that if we introduce a change in the interpretation of XML between revision FOO and FOO+1, then FOO+1 must perform the upgrade on documents saved in FOO and earlier, but it must also be smart enough not to upgrade documents saved with FOO+1. That's true regardless of how many defects are fixed in each version or how many document formats there are. This means the version number must be accurate down to a single revision, which is a bad idea for reasons I've already mentioned. If we have only the bare version number (e.g. 0.47 or 0.48), then anything saved in FOO+1 or later cannot be opened until we make a stable release.
It would be possible to have the document format version number equal to the Bazaar revision in which it was first introduced. However, this would complicate working on the backwards compatibility fixes in branches. I think a simple number increasing from 1 would be the best way to go.
This is hairy, possibly tedious to implement, etc., but at the same it's the long-term solution.
Regards, Krzysztof
On Apr 6, 2010, at 11:47 AM, Krzysztof Kosiński wrote:
The upgrade procedure must run exactly once for any document. This means that if we introduce a change in the interpretation of XML between revision FOO and FOO+1, then FOO+1 must perform the upgrade on documents saved in FOO and earlier,
[SNIP]
This is hairy, possibly tedious to implement, etc., but at the same it's the long-term solution.
Well, it is *one* way to go, but it is not the *only* way to solve this problem. All considerations of the details of compatibility numbering aside, I think viewing things as an "upgrade process" is viewing the problem in a fundamentally flawed way.
As we kick off for 0.49 we can get to pin down specifics, but this is not really a case where we want to "upgrade files", but rather one where we need to "load different versions correctly". That is the big picture of what we're trying to end up with. The low-level "how" of the implementation can go a few ways, but I've seen viewing things as an "upgrade procedure" will cause us the most long-term pain. A simple shift in how we view the problem, though, can save us quite a lot of complexity and pain.
W dniu 6 kwietnia 2010 21:06 użytkownik Jon Cruz <jon@...18...> napisał:
As we kick off for 0.49 we can get to pin down specifics, but this is not really a case where we want to "upgrade files", but rather one where we need to "load different versions correctly". That is the big picture of what we're trying to end up with. The low-level "how" of the implementation can go a few ways, but I've seen viewing things as an "upgrade procedure" will cause us the most long-term pain. A simple shift in how we view the problem, though, can save us quite a lot of complexity and pain.
"Loading different versions correctly" is not a solution, it is a description of the problem! Propose something specific.
In the "upgrading files" solution, the SP layer code can remain understandable and free of backwards compatibility hacks. It also gives some degree of forward compatibility: if the document format number is higher that what we understand, because it was saved with a later version of Inkscape, we can offer to reload the document as plain SVG.
Regards, Krzysztof
On Apr 6, 2010, at 12:39 PM, Krzysztof Kosiński wrote:
"Loading different versions correctly" is not a solution, it is a description of the problem! Propose something specific.
Exactly!
If one wants to get a good solution to a problem, one must first have an accurate description of that problem. Far too often I see people jump straight into hammering on the details of the "how" without considering the big picutre of the "why".
So for those out there just starting to look at what you've pointed out, a very clear stating of the problem is of utmost importance.
In the "upgrading files" solution, the SP layer code can remain understandable and free of backwards compatibility hacks. It also gives some degree of forward compatibility: if the document format number is higher that what we understand, because it was saved with a later version of Inkscape, we can offer to reload the document as plain SVG.
Ooh, those are all very helpful details. See if you can get those broken up to some simple bullet-point lists on the wiki.
From my viewpoint it does seem like you are considering many of the pertinent factors. It is just that *how* they are being applied might not be the best *how*. For example, avoiding "backwards compatibility hacks" is a very good things, however there are many good design approaches that can be brought to use in solving that specific sub-problem.
And if we can get things listed simply and explicitly then others can help adding a point or two here and there that we might have overlooked. It's in gathering up all such points that one can see which solution will be the best solution overall. We need to be sure to be looking at different solutions to solve the problem, instead of searching for a problem to fit a solution. (that's a trap that is easy to fall into, especially for developing software).
Hi,
Just to give my point of view from my personnal experience, I totally agree with Krzysztof on this. A document versioning information is necessary when previous data has been saved in a wrong way, and on the upgrading point, this is also entirely true. The update procedure must be independant of the core code as it should be incremental and some "old" upgrades could be not always supported by the main structure.
Following is an example of what I'm applying in my software:
const SOFTWARE_VERSION = 4.3;
procedure Upgrade; var CurrentVersion begin CurrentVersion := GetCurrentVersionFromXmlFile;
if CurrentVersion > SOFTWARE_VERSION then begin //Warn user that it's software version seems too old to correctly open the file return; end;
if CurrentVersion <= 1.0 then begin //APPLY ?.? -> 1.0 Upgrades here CurrentVersion := 1.0; end;
if CurrentVersion <= 2.0 then begin //APPLY 1.0 -> 2.0 Upgrades here CurrentVersion := 2.0; end;
if CurrentVersion <= 3.0 then begin //APPLY 2.0 -> 3.0 Upgrades here CurrentVersion := 3.0; end;
// etc... if CurrentVersion <= xx.yy then begin //APPLY 3.0 -> xx.yy Upgrades here CurrentVersion := xx.yy ; end;
if CurrentVersion <= SOFTWARE_VERSION then begin //APPLY xx.yy -> SOFTWARE_VERSION Upgrades here CurrentVersion := SOFTWARE_VERSION;
//Warn user that it's file project has been updated (to a compliant application format) and changes will be kept after a save end;
end;
Note that when I'm applying the version_A -> version_B upgrade, I'm not relying on the core code, all is managed internally by the upgrade procedure in order to avoid upgrade fails that could occured due to a progressive refactoring (deletion, renaming, etc.)
W dniu 6 kwietnia 2010 23:58 użytkownik Yann Papouin <yann.papouin@...400...> napisał:
Following is an example of what I'm applying in my software:
[SNIP]
In C++, I would do something like this. It's practically the same idea, but switch + fallthrough makes it clearer.
bool sp_document_upgrade(Inkscape::XML::Document *doc) { const int document_format = 7; int current_document_format = doc->root()->parseInt("inkscape:document-version"); bool ret = true;
switch (current_document_format) { case 0: fix_3dbox_coords(doc); fix_guide_coords(doc); case 1: add_flowtext_fallbacks(doc); case 2: rename_sodipodi_attributes(doc); case 3: convert_lpe_to_vector_effects(doc); // etc. for later versions case 7: break; // nothing more to do default: ret = false; // unrecognized version - offer to load as plain SVG }
doc->root()->setInt("inkscape:document-version", document_format); return ret; }
Note that when I'm applying the version_A -> version_B upgrade, I'm not relying on the core code, all is managed internally by the upgrade procedure in order to avoid upgrade fails that could occured due to a progressive refactoring (deletion, renaming, etc.)
This is somewhat like using only generic XML processing, instead of the SP tree.
Regardless of what this function will ultimately do, we'd need to create several test documents for every document format change, and verify that the conversion works correctly after every change to older portions of the update procedure.
Regards, Krzysztof
On Apr 6, 2010, at 3:37 PM, Krzysztof Kosiński wrote:
W dniu 6 kwietnia 2010 23:58 użytkownik Yann Papouin <yann.papouin@...400...> napisał:
Following is an example of what I'm applying in my software:
[SNIP]
In C++, I would do something like this. It's practically the same idea, but switch + fallthrough makes it clearer.
That's a good initial approach. However there are several cases we've gone over in the past that make a simple case-statemt sequential update not the best choice.
A lot of this has to do with decisions on Inkscape release versions, incremental updates, etc. There are several situations you can get into where this approach will actually lead to data loss.
case 0: fix_3dbox_coords(doc); fix_guide_coords(doc); case 1: add_flowtext_fallbacks(doc); case 2:
Now that is starting to be quite helpful. Are those actual cases you've hit on, or just representative of what *might* need to be done to be compatible?
Regardless, breaking down things in that manner really helps to reveal some patterns. And one of them I see right at the start is the general one to avoid *version* checks and instead go with *feature* checks. In the abstract this is exactly the same thing that Microsoft warns developers about using various Windows APIs.
The fixup functions you have listed there definitely look like they are organized based on "features". That's very good. If we take that and expand on it a little we can get something quite robust that also will be able to minimize data loss. (let me know when you've got this started in a wiki page and I'll see what past cases we've covered that need to be included). It also is very helpful that you have each 'version' broken into separate logical sub-fixups.
Regardless of what this function will ultimately do, we'd need to create several test documents for every document format change, and verify that the conversion works correctly after every change to older portions of the update procedure.
Yes. You are right. This is a very important step. In fact, this might even be a good first step. Even if we don't actually need the functionality for older versions we can go ahead and collect up some samples of when we've done such breaking changes in the past and then see if they match our expectations. Then we can also expand the set forward with *anticipated* breaking changes such as those related to the coords flip.
2010/4/6 Krzysztof Kosiński <tweenk.pl@...400...>:
W dniu 6 kwietnia 2010 23:58 użytkownik Yann Papouin <yann.papouin@...400...> napisał:
Following is an example of what I'm applying in my software:
[SNIP]
In C++, I would do something like this. It's practically the same idea, but switch + fallthrough makes it clearer.
This makes sense, but I still cannot understand why you can't have instead
case 0.47: ... case 0.48: ...
etc. Is this that you're trying to care for the users of daily builds between 0.47 and 0.48?
On Apr 7, 2010, at 8:11 AM, bulia byak wrote:
This makes sense, but I still cannot understand why you can't have instead
case 0.47: ... case 0.48: ...
etc. Is this that you're trying to care for the users of daily builds between 0.47 and 0.48?
I believe that is what was intended. It seemed fairly straightforward from:
On Apr 6, 2010, at 7:14 AM, Krzysztof Kosiński wrote:
- Document version changes must be atomic, because the upgrade is
conducted only one time for each type of defect. This means inkscape:document-version (or inkscape:document-format) can change several times between two releases.
This does seem to run counter to the consensus from the last time we visited this issue. Do you happen to have any new insights or opinions since then?
Oh, and yes. Among the cases I'm concerned about are those where something is removed from our extras earlier in the dev cycle and then added back before release, or where something is added or changed in an interesting way and then changed back before release. Both are among those situations where a simple case statement implementation will cause data loss.
W dniu 7 kwietnia 2010 17:11 użytkownik bulia byak <buliabyak@...400...> napisał:
This makes sense, but I still cannot understand why you can't have instead
case 0.47: ... case 0.48: ...
etc. Is this that you're trying to care for the users of daily builds between 0.47 and 0.48?
Consider this: 1. I open a 0.47 document with 0.47+devel that has the coords fix 2. I save this document 3. I open it again with the same 0.47+devel
In step 3, how do I know that I've fixed the guides already? If I fix them again, they will be mirrored vertically and we fail. Do I write inkscape:version="0.48" to the SVG after fixing it, even though the actual version is 0.47+devel r9876? I don't like this, because: 1. The SVG now lies about what created it. 2. We can change the document format only once per release. 3. We can't have a revision (let's call it 0.47+foo) that fixes only some defects (e.g. fixes guides but not 3D boxes), and later add more fixes. Otherwise if somebody opens a file in 0.47+foo and saves it, the file will be permanently broken. Later revisions that do fix e.g. 3D boxes will recognize it as already fixed and do nothing to remediate remaining problems. 4. If two people decide to do something that requires XML compatibility fixes in a single release, it will be hard to coordinate between them.
If we don't change inkscape:version in the document and don't introduce a new field, then it would be impossible to use the trunk for any work between the time an XML fix is introduced, and the time the stable version is released. Merging XML fixes just before a release is an extremely bad idea, since there would be much less time to test them.
So far there are three possibilities for the document version number: 1. Inkscape major version. 2. Bazaar revision. This would mean extra code when Inkscape is built from a tarball with no VCS, and in general unacceptably strong ties to the VCS and the build system. Also problematic when working on XML fixes in local branches, which have different revision numbering than the trunk. 3. Some number not tied to Inkscape version, stored as a constant in the definition of the upgrade procedure (what I proposed).
If you don't like the fact that document version numbers would appear unrelated to the Inkscape version, I have to point out that this number doesn't have to be increasing sequentially from 0. The switch will work correctly no matter what the sequence looks like, as long as it doesn't have duplicate elements. It doesn't even have to be monotonic. We can use document format numbers like 4701, 4702, 4703, 4801, 4802, 4901, etc. and store them in XML as "0.47-01", "0.47-02" or in any other easily parsable form.
Minor nitpick: you can't switch on floating point numbers in C++.
W dniu 7 kwietnia 2010 17:29 użytkownik Jon Cruz <jon@...18...> napisał:
Oh, and yes. Among the cases I'm concerned about are those where something is removed from our extras earlier in the dev cycle and then added back before release, or where something is added or changed in an interesting way and then changed back before release. Both are among those situations where a simple case statement implementation will cause data loss.
Those cases would cause data loss only if the rules for modifying the switch are not followed. In particular, you are not allowed to remove code for any of the existing document versions or redefine what operations are performed between document versions. The switch is append-only.
If you made some XML fix that e.g. changed the way some coordinates are expressed but before a release you suddenly think it doesn't make any sense, you have to bump the document format and add code that reverts the fix, rather than removing the code that executes it. As long as the switch is append-only, and none of the upgrade operations cause data loss, it will work correctly.
You can also add a goto to skip the fix-revert sequence if the document version is earlier than the version which introduced the unwanted fix, or if the fix actually breaks something and you only do best-effort recovery when undoing it. However, I would be very careful when adding such shortcuts.
Regards, Krzysztof
On Apr 7, 2010, at 9:45 AM, Krzysztof Kosiński wrote:
If we don't change inkscape:version in the document and don't introduce a new field, then it would be impossible to use the trunk for any work between the time an XML fix is introduced, and the time the stable version is released.
This is probably the number one mistaken assumption.
It definitely is *not* impossible to make things work. A *simple* approach to a solution is to use a compatibility number, but that is most definitely not the *only* way to do it.
There are many, many other ways to approach this, and SVG even has explicit support for some such mechanisms.
Merging XML fixes just before a release is an extremely bad idea, since there would be much less time to test them.
That is really just a straw-man argument, and most definitely is not what I was proposing.
On Apr 7, 2010, at 9:45 AM, Krzysztof Kosiński wrote:
If you made some XML fix that e.g. changed the way some coordinates are expressed but before a release you suddenly think it doesn't make any sense, you have to bump the document format and add code that reverts the fix, rather than removing the code that executes it. As long as the switch is append-only, and none of the upgrade operations cause data loss, it will work correctly.
Problem is, you now have just introduced all sorts of artificial constraints on your simple solution, and they have a high chance of not working.
And, yes, if *any* of the fixups does any sort of change on the data that loses something (e.g. one that was deemed unimportant at one point in time and then later on was found to be significant) then running through all cases *will* result in data loss.
So at the very least we will need full documentation on what will and will not be allowed. And someone adding in an experimental feature that gets removed will result in extra cruft being added to files.
W dniu 7 kwietnia 2010 19:06 użytkownik Jon Cruz <jon@...18...> napisał:
On Apr 7, 2010, at 9:45 AM, Krzysztof Kosiński wrote:
If you made some XML fix that e.g. changed the way some coordinates are expressed but before a release you suddenly think it doesn't make any sense, you have to bump the document format and add code that reverts the fix, rather than removing the code that executes it. As long as the switch is append-only, and none of the upgrade operations cause data loss, it will work correctly.
Problem is, you now have just introduced all sorts of artificial constraints on your simple solution, and they have a high chance of not working.
Those constraints were part of my solution from the very beginning, I just didn't spell them out explicitly because I thought they are obvious. They have a chance not of "not working" but of "not being followed by developers", since they are primarily on the code in the switch and the way in which it is modified by developers. (There is also a constraint on the content of documents, e.g. that every document is really in the format
No matter how the backwards compatibility system is implemented, you cannot prevent the developers from modifying it in a way that breaks it. However, my system has the considerable advantage that it will always work correctly as long as some simple constraints are followed. For other systems, such correctness-guaranteeing constraints might be very complex or not exist at all.
And, yes, if *any* of the fixups does any sort of change on the data that loses something (e.g. one that was deemed unimportant at one point in time and then later on was found to be significant) then running through all cases *will* result in data loss.
Which is the rationale for the goto hack. You can skip both the data loss step and the best effort recovery, but as I said before if you do this, there's a risk that you'll violate some of the constraints.
So at the very least we will need full documentation on what will and will not be allowed. And someone adding in an experimental feature that gets removed will result in extra cruft being added to files.
Yes, if we decide to implement this, there will be detailed documentation on what you're allowed to do, what is a good idea and what's not. The cruft will only accumulate in the upgrade switch (which is actually its purpose). Any deprecated cruft in the documents themselves can be removed using the upgrade mechanism.
W dniu 7 kwietnia 2010 19:25 użytkownik bulia byak <buliabyak@...400...> napisał:
2010/4/7 Krzysztof Kosiński <tweenk.pl@...400...>:
Consider this:
- I open a 0.47 document with 0.47+devel that has the coords fix
- I save this document
But why do you think it will trigger the upgrade? The upgrade will only happen when your Inkscape is 0.48. The devel series is as good as plain 0.47. Those who want to test the conversion before 0.47 can do this by manually changing version in their tree.
The point of this mechanism is not to make the XML nicer, but to allow us to remove backwards compatibility hacks from the SP tree. Consider this situation: 1. I change the interpretation of guide coordinates in sp-guide.cpp. 2. At the same time, I introduce the XML upgrade which removes the Y axis inversion in XML. 3. Now I have to *downgrade* the document when saving - e.g. add cruft back in!
This means for every XML change, I have to implement the conversion in both directions. It also means that the XML editor does not represent what the content of the saved file will actually look like.
Point 3 might actually be good to have, but in this case we could just leave the choice of the document version up to the user. We would also need unit tests for every operation, to guarantee that each of them really is reversible.
Regards, Krzysztof
2010/4/7 Krzysztof Kosiński <tweenk.pl@...400...>:
Consider this:
- I open a 0.47 document with 0.47+devel that has the coords fix
- I save this document
But why do you think it will trigger the upgrade? The upgrade will only happen when your Inkscape is 0.48. The devel series is as good as plain 0.47. Those who want to test the conversion before 0.47 can do this by manually changing version in their tree.
- I open it again with the same 0.47+devel
In step 3, how do I know that I've fixed the guides already? If I fix them again, they will be mirrored vertically and we fail. Do I write inkscape:version="0.48" to the SVG after fixing it, even though the actual version is 0.47+devel r9876?
No, you write 0.48 to the document only if your Inkscape is 0.48
I don't like this, because:
- The SVG now lies about what created it.
- We can change the document format only once per release.
Not really, you can implement and test any number of upgrades in the 0.48 cycle, but only effect them on users when they actually upgrade to 0.48 release.
Minor nitpick: you can't switch on floating point numbers in C++.
I know :) It was only an illustration :)
On Wed, 2010-04-07 at 18:45 +0200, Krzysztof Kosiński wrote:
In step 3, how do I know that I've fixed the guides already? If I fix them again, they will be mirrored vertically and we fail. Do I write inkscape:version="0.48" to the SVG after fixing it, even though the actual version is 0.47+devel r9876? I don't like this, because:
- The SVG now lies about what created it.
- We can change the document format only once per release.
- We can't have a revision (let's call it 0.47+foo) that fixes only
some defects (e.g. fixes guides but not 3D boxes), and later add more fixes. Otherwise if somebody opens a file in 0.47+foo and saves it, the file will be permanently broken. Later revisions that do fix e.g. 3D boxes will recognize it as already fixed and do nothing to remediate remaining problems. 4. If two people decide to do something that requires XML compatibility fixes in a single release, it will be hard to coordinate between them.
Okay, but if the coords orientation is optional (as it should be), then this whole conversation makes no sense. It makes sense to me if the coords are stored on a per-document basis (but top-left oriented on all new docs), and people could then flip them on old docs to cause the upgrade to happen if they wish. Then you don't have to worry about knowing which is which. If it doesn't have the markup stating the coord system, write that it has the old one and it gets saved to the doc... if it does have it, serve up the guides and 3d boxes oriented correctly for whichever system (normal or inverted-y).
In this scenario, the bottom-left oriented people aren't forced into changing (they can set up templates with bottom-left orientation if they prefer it that much), and there's no worrying about who we upgrade things for and if it's been done.
Cheers, Josh
W dniu 7 kwietnia 2010 20:22 użytkownik Joshua A. Andler <scislac@...400...> napisał:
Okay, but if the coords orientation is optional (as it should be), then this whole conversation makes no sense.
You are introducing confusion :)
We cannot pick the orientation at the XML level. Everything stored in the XML must be in SVG coordinates. Otherwise it won't display correctly in other SVG viewers.
Changing the desktop coordinate system should be implemented at the UI level, and propagated to the XML by adding a transform attribute to the root SVG element.
This system is about fixing an error in existing Inkscape files, namely that some objects (guides and 3D boxes) are stored using a different coordinate system than all other objects. It is not related to whether we allow to change the desktop coordinate system or not.
Regards, Krzysztof
On Wed, 2010-04-07 at 20:43 +0200, Krzysztof Kosiński wrote:
W dniu 7 kwietnia 2010 20:22 użytkownik Joshua A. Andler <scislac@...400...> napisał:
Okay, but if the coords orientation is optional (as it should be), then this whole conversation makes no sense.
You are introducing confusion :)
Are you sure? It seems more simple than your proposed solution to my simple mine. :D
We cannot pick the orientation at the XML level. Everything stored in the XML must be in SVG coordinates. Otherwise it won't display correctly in other SVG viewers.
Didn't our SVGs display correctly in other SVG viewers before? I only suggested adding a preference attribute in the xml, not mucking with actual doc coords.
Cheers, Josh
W dniu 7 kwietnia 2010 21:00 użytkownik Joshua A. Andler <scislac@...400...> napisał:
Didn't our SVGs display correctly in other SVG viewers before? I only suggested adding a preference attribute in the xml, not mucking with actual doc coords.
This "XML upgrade" system is about making sure that what the document looks like does not depend on this setting. As it is now, 3D boxes and guides would only work with one specific desktop coordinate system (and to make matters worse, a non-default one) and behave incorrectly in all others.
Regards, Krzysztof
Please forgive me for entering this discussion at this stage, but I am neither a master of Inkscape development nor of SVG structure. So, I hesitated to make any comment because it would necessarily be only slightly informed by having worked with other data structures that also evolve as the programmers work on the programs that create and maintain them.
Specifically I have worked with the Initial Graphics Exchange Specification (IGES) and the Product Data Exchange Specification (PDES). Both of those standards attempt to move data from one computer-aided drawing/design system to another through the neutral data format specified in the standard. There are several small companies that work to customize SysA to Standard to SysB solutions. None of them ever attacks the entire file from SysA in one all encompassing translation or fix-up utility. The problem is too complex for such a solution. In fact, if you look at PDES, you'll find they carefully set up Application Profiles that identify workable subsets of possible engineering data and only work one of those subsets with each utility. Even then the subset may be too complex. All the programmers expect trained users to babysit the translations and report back errors in the translation. Some don't even release their translators, but hire themselves out to intelligently use the tools, because they know they are the only ones who know what the programs can really do accurately.
To sum up, my suggestion is you may need to attack the problem of evolving data structures from a more granular perspective than 0.47 to 0.48.
participants (6)
-
unknown@example.com
-
bulia byak
-
Jon Cruz
-
Joshua A. Andler
-
Krzysztof Kosiński
-
Yann Papouin