Counterproductive commits by Jon
Hello I noticed that Jon Cruz is removing forward headers and replacing them with hand-written forward declarations. This is completely backwards! Jon why are you doing this? Regards, Krzysztof
On Oct 4, 2011, at 9:27 AM, Krzysztof Kosiński wrote:
Hello I noticed that Jon Cruz is removing forward headers and replacing them with hand-written forward declarations. This is completely backwards! Jon why are you doing this?
Quite simple: I'm not undoing the work, I'm finishing it.
Forward declarations are good, but forward declaration files are not really.
One main rule we have in play here is "Don't include headers you don't need. Especially in headers." And often also "make headers independent".
There are two main benefits:
1) reduces compile time by avoiding pulling in additional headers.
2) minimizes rebuild times when one header is changed.
So we remove #including of detailed extra headers. That is good. However, if we replace that with "let's include this *different* file instead", then we lose much of the benefit. Though that one forward.h might compile a little faster, it still has to be checked and consulted. Also it introduces an artificial dependency that when touched will cascade out and cause more files to be compiled than should. (The main example we have in our source code of such cascading is verbs.h - touch that and the whole world rebuilds)
The full solution is to just forward-declare the single items *used* within a given header. No extra #includes need to be added. It also makes those .h files independent of each other.
Take a look at the changes I was making. Among other things you should see that though there might be dozens of declarations in a forward.h, any given .h file only needs perhaps one or two of those.
Then over time the forward.h files tend to accumulate cruft. If the have outdated or sometimes wrong declarations, or unused ones, they just add to the potential for slowdown or error. On the other hand if forward declarations are in a file that needed them, one can very trivially search that one .h file to see if they are used. If not, they can be removed with no problem. Much easier to keep in sync and free of cruft. And safer to do so also (especially when one considers conditional or platform specific code). And safer code is good code.
Another potential problem is the naming. A work-around is to prefix a forward.h with some other name, but we still end up with at least a few potential collisions. On the other hand, if each .h file is independent with it's own few declarations, then there does not even exist the possibility of needing a different file with the same name.
So picture a specific method being added that takes a reference to some type. One approach is for the coder to add a forward declaration for that type in the .h file that he is also adding the method to. One file touched, and only consumers of that specific file get rebuilt.
The alternative is to place the type in a forward.h. So in that case the coder has to edit the .h to add a new method, also add a new #include line to that same .h and expand it's dependencies, and finally has to also edit the pertinent forward.h file to add a declaration for that type. At least two .h files get changed, but since one is a .h file that is pulled into many other .h files *and* .cpp files (though .cpp files should never use them, they often end up accidentally having them). So now many dozens of .cpp files get rebuilt in response to what should have been an isolated case.
To be fair, that scenario is a bit less frequent, but we have a choice of following a practice that ensures it *never* gets to happen vs. one that let's it *sometimes* happen, we should always go for the code enforcing the safer of never allowing the problem.
On 4-10-2011 21:48, Jon Cruz wrote:
On Oct 4, 2011, at 9:27 AM, Krzysztof Kosiński wrote:
Hello I noticed that Jon Cruz is removing forward headers and replacing them with hand-written forward declarations. This is completely backwards! Jon why are you doing this?
Quite simple: I'm not undoing the work, I'm finishing it.
Forward declarations are good, but forward declaration files are not really.
One main rule we have in play here is "Don't include headers you don't need. Especially in headers." And often also "make headers independent".
There are two main benefits:
reduces compile time by avoiding pulling in additional headers.
minimizes rebuild times when one header is changed.
So we remove #including of detailed extra headers. That is good. However, if we replace that with "let's include this *different* file instead", then we lose much of the benefit. Though that one forward.h might compile a little faster, it still has to be checked and consulted. Also it introduces an artificial dependency that when touched will cascade out and cause more files to be compiled than should. (The main example we have in our source code of such cascading is verbs.h - touch that and the whole world rebuilds)
The full solution is to just forward-declare the single items *used* within a given header. No extra #includes need to be added. It also makes those .h files independent of each other.
Take a look at the changes I was making. Among other things you should see that though there might be dozens of declarations in a forward.h, any given .h file only needs perhaps one or two of those.
Then over time the forward.h files tend to accumulate cruft. If the have outdated or sometimes wrong declarations, or unused ones, they just add to the potential for slowdown or error. On the other hand if forward declarations are in a file that needed them, one can very trivially search that one .h file to see if they are used. If not, they can be removed with no problem. Much easier to keep in sync and free of cruft. And safer to do so also (especially when one considers conditional or platform specific code). And safer code is good code.
Another potential problem is the naming. A work-around is to prefix a forward.h with some other name, but we still end up with at least a few potential collisions. On the other hand, if each .h file is independent with it's own few declarations, then there does not even exist the possibility of needing a different file with the same name.
So picture a specific method being added that takes a reference to some type. One approach is for the coder to add a forward declaration for that type in the .h file that he is also adding the method to. One file touched, and only consumers of that specific file get rebuilt.
The alternative is to place the type in a forward.h. So in that case the coder has to edit the .h to add a new method, also add a new #include line to that same .h and expand it's dependencies, and finally has to also edit the pertinent forward.h file to add a declaration for that type. At least two .h files get changed, but since one is a .h file that is pulled into many other .h files *and* .cpp files (though .cpp files should never use them, they often end up accidentally having them). So now many dozens of .cpp files get rebuilt in response to what should have been an isolated case.
To be fair, that scenario is a bit less frequent, but we have a choice of following a practice that ensures it *never* gets to happen vs. one that let's it *sometimes* happen, we should always go for the code enforcing the safer of never allowing the problem.
I guess there will always be discussion on what is "best" with these forward declarations. Personally, I hate it when there are all these meaningless forward declaration lines at the start of a file that I have to scroll past. I find that 'cleaning up' code one way or the other is simply a waste of time that would have been better spent otherwise. (like the "@brief" removal that I do not understand; if anything, it makes doxygen stuff more clear to new-comers...)
Please do not remove the awesome "#include <2geom/forward.h>". It is very concise, and much better than handwriting the whole namespacing etc...
Ciao, Johan
On Oct 4, 2011, at 1:07 PM, Johan Engelen wrote:
I guess there will always be discussion on what is "best" with these forward declarations. Personally, I hate it when there are all these meaningless forward declaration lines at the start of a file that I have to scroll past. I find that 'cleaning up' code one way or the other is simply a waste of time that would have been better spent otherwise. (like the "@brief" removal that I do not understand; if anything, it makes doxygen stuff more clear to new-comers...)
Actually the @brief tags makes it harder for newcomers, and also different problems for the developers needing to maintain and add such comments. Aside from helping the newbies on IRC, Twitter, etc. get going, I've been doing this sort of thing at my day job for years now. Most of that includes coordinating with personnel not only in different timezones, but also spread across many continents. So even though my experience is not solely from open source, the working environment has been very similar.
The reason I'm spending time on the 'waste' is that I am trying to get our code documented better, and restructuring from the legacy C approach to the C++ that has replaced it is a big part.
In order to get a good architectural overview, Doxygen output is a key tool. And if we don't have good Doxygen output, it's hard to see what's going on. It's also hard for newcomers to use it.
So the effort is part of the overall work to have diagrams, charts, maps of where to find what, etc., that will be exactly what newcomers need. And I should also be able to get some templates, guidelines, etc. to help the rest of us gain maximum use with minimal effort.
Oh, and for a more quantitative evaluation, I ran a spot-check shortly before finishing the cleanup of src/forward.h and came up with the following:
65 total spot-checked files
22 removed forward.h and needed 1 forward declaration added 19 removed forward.h and needed 0 forward declarations added 8 removed forward.h and needed 3 forward declarations added 7 had no forward.h but needed 1 forward declaration added 6 removed forward.h and needed 2 forward declarations added 1 had no forward.h but needed 1 forward declaration added 1 removed forward.h and needed 4 forward declarations added 1 removed forward.h and needed 7 forward declarations added
So we have about two-thirds of the files where there was no real need, and almost half of that was due to forward.h being included when there was no need at all. Then only one file of the total would be considered potentially problematic and worth looking into.
(For running the check, I had each .h file individually compiled into a .cpp where including that .h was the only thing done.)
Please do not remove the awesome "#include <2geom/forward.h>". It is very concise, and much better than handwriting the whole namespacing etc...
That's a slightly different problem.
For that, you are looking at an 'I want to use some library and not worry' situation. For that purpose, I'd say that <2geom/forward.h> is a bit misleading. A better approach would be to narrow down to some specific types. Or if you really want the convenience of using the library without needing to worry, look for a full-name kind of include-the-whole-world file. Ah, there we go with <2geom/2geom.h>
*However* often actually being able to *use* a library will not be safely served via forward includes. To save space in one's header, a few sub-component includes might be more helpful. However, even 2geom's current forward.h appears to be broken.
If I look at effect.h (a true case of a somewhat 'interesting' header), there are a few causes for concern. As expected, it does have a few stale declarations that should be removed. However, another one is that a lot of use of forward declared types abounds, including in std::vector. In the general case this is not correct code, and may not work (depending on the compiler, etc.)
http://www.parashift.com/c++-faq-lite/misc-technical-issues.html#faq-39.14
However, in this specific it turns out that other #includes already pulled in the full files. I did a quick spot check and found that one can remove the #include of 2geom/forward.h completely and other #includes already have things covered. Of course, I don't encourage this, but it is a clear sign that we might have a lot of code pulling in more than we expect, and that some forward declarations are only working due to side-effects of other #includes.
At some point we should look into such aspects in more detail.
2011/10/5 Jon Cruz <jon@...18...>:
In order to get a good architectural overview, Doxygen output is a key tool. And if we don't have good Doxygen output, it's hard to see what's going on. It's also hard for newcomers to use it.
This is a non-argument because removing the brief tag does not change the Doxygen output.
So the effort is part of the overall work to have diagrams, charts, maps of where to find what, etc., that will be exactly what newcomers need. And I should also be able to get some templates, guidelines, etc. to help the rest of us gain maximum use with minimal effort.
Also completely unrelated to what you've been actually doing. These things would be valuable, and I'm all for you doing them, but removing forward headers and @brief is at best unnecessary and at worst counterproductive.
For that, you are looking at an 'I want to use some library and not worry' situation. For that purpose, I'd say that <2geom/forward.h> is a bit misleading. A better approach would be to narrow down to some specific types. Or if you really want the convenience of using the library without needing to worry, look for a full-name kind of include-the-whole-world file. Ah, there we go with <2geom/2geom.h> *However* often actually being able to *use* a library will not be safely served via forward includes. To save space in one's header, a few sub-component includes might be more helpful. However, even 2geom's current forward.h appears to be broken.
Writing forward declarations for template types such as std::ostream is non-trivial, and in fact the declarations are often an implementation detail. The same is the case for Geom::Rect. This is why we have <2geom/forward.h> - to allow the library to provide its forward declarations, rather than the user, so that people might use Geom::Rect const & without having to include the entire definition of this type or writing 2Geom implementation details in their headers. <2geom/forward.h> is not broken.
Regards, Krzysztof
On Oct 6, 2011, at 12:49 PM, Krzysztof Kosiński wrote:
So the effort is part of the overall work to have diagrams, charts, maps of where to find what, etc., that will be exactly what newcomers need. And I should also be able to get some templates, guidelines, etc. to help the rest of us gain maximum use with minimal effort.
Also completely unrelated to what you've been actually doing. These things would be valuable, and I'm all for you doing them, but removing forward headers and @brief is at best unnecessary and at worst counterproductive.
Oh, no. It's not unrelated at all.
For many years now I've used doxygen to generate overviews of projects to discover the structure. I need to get good up to date structure docs done on our codebase. Without clean comments it's difficult to get doxygen to give its best output. And without good doxygen output it is harder to be able to see and document the structure.
Also, to be counter productive, the @brief tags would really have to add something to the documentation. From my experience getting doxygen and such adopted at different companies, I've only ever seen @brief get in the way, and not help.
On Oct 6, 2011, at 12:49 PM, Krzysztof Kosiński wrote:
Writing forward declarations for template types such as std::ostream is non-trivial, and in fact the declarations are often an implementation detail. The same is the case for Geom::Rect. This is why we have <2geom/forward.h> - to allow the library to provide its forward declarations, rather than the user, so that people might use Geom::Rect const & without having to include the entire definition of this type or writing 2Geom implementation details in their headers. <2geom/forward.h> is not broken.
Ah, this is actually a good point, and highlights what I mean.
If one wants to use Geom::Rect, one can just include <2geom/rect.h>. For more complex declarations, especially templates, it is often the most efficient way. To have something else, such as merely a complex forward declaration, we'd usually need some measured performance gain to adding it. For simple types, then forward declarations in a independent file most often do not bring much.
Additionally, for types used by other headers in a library one can just have those library headers use their own forward declarations. Then a user automatically gets the ones they might need pulled in for free, and don't need to explicitly ask for things.
For the simple case, having a <2geom/2geom.h> that pulls in everything makes it simple to use.
Also... we often find in analyzing code that something attempting an optimization via forward declare headers actually ends up pulling in the full headers anyway. So any intended compilation benefit is lost, and only an extra include gets added.
2011/10/7 Jon Cruz <jon@...18...>:
On Oct 6, 2011, at 12:49 PM, Krzysztof Kosiński wrote:
Writing forward declarations for template types such as std::ostream is non-trivial, and in fact the declarations are often an implementation detail. The same is the case for Geom::Rect. This is why we have <2geom/forward.h> - to allow the library to provide its forward declarations, rather than the user, so that people might use Geom::Rect const & without having to include the entire definition of this type or writing 2Geom implementation details in their headers. <2geom/forward.h> is not broken.
Ah, this is actually a good point, and highlights what I mean.
If one wants to use Geom::Rect, one can just include <2geom/rect.h>. For more complex declarations, especially templates, it is often the most efficient way. To have something else, such as merely a complex forward declaration, we'd usually need some measured performance gain to adding it. For simple types, then forward declarations in a independent file most often do not bring much.
Additionally, for types used by other headers in a library one can just have those library headers use their own forward declarations. Then a user automatically gets the ones they might need pulled in for free, and don't need to explicitly ask for things.
For the simple case, having a <2geom/2geom.h> that pulls in everything makes it simple to use.
Also... we often find in analyzing code that something attempting an optimization via forward declare headers actually ends up pulling in the full headers anyway. So any intended compilation benefit is lost, and only an extra include gets added.
Can this discussion die in a fire please? Johan & Krzysztof have expressed their desire to not modify 2geom in the way that Inkscape has recently been modified. Given that 2geom is a separate project and they are probably currently the 2 most active members working on it, it seems like the discussion is done since you did what you felt was best for Inkscape and we don't modify our in-tree copy of 2geom as a policy.
Cheers, Josh
P.S. This is where Johan and I are so on-page with each other... if people put half the time into coding that they did into other things like useless discussion and whitespace cleanup of files they didn't already happen to be working on, we'd be so much further along.
On Sat, Oct 8, 2011 at 3:52 PM, Josh Andler wrote:
Can this discussion die in a fire please? Johan & Krzysztof have expressed their desire to not modify 2geom in the way that Inkscape has recently been modified. Given that 2geom is a separate project and they are probably currently the 2 most active members working on it, it seems like the discussion is done since you did what you felt was best for Inkscape and we don't modify our in-tree copy of 2geom as a policy.
/me throws a firebomb
Everybody out! :)
Alexandre Prokoudine http://libregraphicsworld.org
W dniu 4 października 2011 21:48 użytkownik Jon Cruz <jon@...1672....> napisał:
There are two main benefits:
- reduces compile time by avoiding pulling in additional headers.
Non-issue. The majority of compile time is spent on optimizations, not file I/O or parsing. If you want to improve your build times, disable optimization or use Clang.
- minimizes rebuild times when one header is changed.
On the other hand, if the forward declaration of some type changes (for example, it becomes a template or a typedef instead of an ordinary class), you have to change all files which use the type.
So we remove #including of detailed extra headers. That is good. However, if we replace that with "let's include this *different* file instead", then we lose much of the benefit. Though that one forward.h might compile a little faster, it still has to be checked and consulted. Also it introduces an artificial dependency that when touched will cascade out and cause more files to be compiled than should. (The main example we have in our source code of such cascading is verbs.h - touch that and the whole world rebuilds)
verbs.h is not a forward declaration header.
Take a look at the changes I was making. Among other things you should see that though there might be dozens of declarations in a forward.h, any given .h file only needs perhaps one or two of those.
So maybe you should start replacing <glib.h> in files which use only a few GLib functions with the prototypes of those functions?
Then over time the forward.h files tend to accumulate cruft. If the have outdated or sometimes wrong declarations, or unused ones, they just add to the potential for slowdown or error. On the other hand if forward declarations are in a file that needed them, one can very trivially search that one .h file to see if they are used. If not, they can be removed with no problem. Much easier to keep in sync and free of cruft. And safer to do so also (especially when one considers conditional or platform specific code). And safer code is good code.
This is actually an argument in favor of forward headers! You run at a much greater risk of having outdated forward declarations if you put them directly in the header using them. What's easier - changing a forward declaration in one place or changing it in 20 places?
So picture a specific method being added that takes a reference to some type. One approach is for the coder to add a forward declaration for that type in the .h file that he is also adding the method to. One file touched, and only consumers of that specific file get rebuilt.
The alternative is to place the type in a forward.h. So in that case the coder has to edit the .h to add a new method, also add a new #include line to that same .h and expand it's dependencies, and finally has to also edit the pertinent forward.h file to add a declaration for that type. At least two .h files get changed, but since one is a .h file that is pulled into many other .h files *and* .cpp files (though .cpp files should never use them, they often end up accidentally having them). So now many dozens of .cpp files get rebuilt in response to what should have been an isolated case.
Your example is bad. If you are adding a method that uses some type by reference, you almost never have to add this type to the forward header, because it's already there! Additionally, the forward declaration might be non-trivial (e.g. namespaced) or even an implementation detail (e.g. template types which are only supposed to be used as typedefed secializations, such as Geom::Rect).
This problem only applies if you are adding a new type to the library that has the forward header, which is a much less frequent occurrence.
Regards, Krzysztof
participants (5)
-
Alexandre Prokoudine
-
Johan Engelen
-
Jon Cruz
-
Josh Andler
-
Krzysztof Kosiński