1
Visual Analytics for Understanding the Evolution
of Large Software Products
  • Alexandru Telea
  • University of Groningen, the Netherlands

2
Software life cycle
[Figure: software life cycle over time - start, release, refactor, migrate, end lifecycle]
Maintenance effort: Corrective (repair bugs) 25%, Perfective (new features) 50%, Adaptive (new framework) 25%
Effort per phase: Design 10%, Implementation 15%, Testing 30%, Analysis 45%
Goal: reduce development and maintenance costs, increase quality.
Focus: reduce testing time in development; support informed, efficient decision-making in releasing, refactoring, migration.
3
Problems in the software industry
  • software is outsourced, gets older and more complex → quality decreases
  • software size, team size, and complexity all increase
  • time-to-market decreases → quality decreases
  • defect removal costs increase exponentially with time since introduction
  • management decisions are based on subjective information

4
1. Problem statement
  • Maintenance facts:
  • thousands of files, hundreds of developers, many years
  • knowledge is lost and bugs are created as software evolves
  • costs: >80% of the entire software lifecycle
  • 40% of maintenance is spent on understanding software
  • Goal: support maintenance by analyzing evolution data
  • mine relevant facts from software repositories
  • analyze, correlate, and filter facts
  • support questions with easy-to-use tools

5
Software Analytics
  • Visual Analytics: the science of analytical reasoning facilitated by interactive visual interfaces [Thomas, 2001]
  • Software Analytics: the application of visual analytics to the understanding, maintenance, assessment, and evolution of software [Telea, 2008]


[Figure: analysis tools connect Research & Development, Management, and Software in the client environment - decision-making support; increased productivity and quality]
6
Software Analytics Framework
[Figure: framework pipeline within the client environment - software repositories (CVS, Subversion, ...) feed fact extraction engines, which populate a central fact database; query and mining engines and interactive visualization tools operate on that database]
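As a concrete illustration of this pipeline, here is a minimal Python sketch (all names are hypothetical, not the actual framework): repositories feed a fact extraction engine, facts land in a central database, and a simple query engine serves the visualization tools.

```python
# Hypothetical sketch of the framework pipeline, not the actual tool.
from dataclasses import dataclass

@dataclass
class Fact:                      # one extracted fact, e.g. a metric value
    file: str
    version: int
    name: str                    # e.g. "LOC", "McCabe", "author"
    value: object

class FactDatabase:              # central fact database
    def __init__(self):
        self.facts = []

    def add(self, fact):
        self.facts.append(fact)

    def query(self, **criteria): # minimal query engine: match on attributes
        return [f for f in self.facts
                if all(getattr(f, k) == v for k, v in criteria.items())]

def extract_facts(repository):   # fact extraction engine (stub)
    for file, version, metrics in repository:
        for name, value in metrics.items():
            yield Fact(file, version, name, value)

# A toy "repository": (file, version, metrics) tuples.
repo = [("core.c", 1, {"LOC": 120}), ("core.c", 2, {"LOC": 150})]
db = FactDatabase()
for fact in extract_facts(repo):
    db.add(fact)
print(db.query(file="core.c", name="LOC"))  # results feed a visualization tool
```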
7
Involved Techniques
  • graph layouts
  • software metrics
  • static analysis
  • pixel-filling layouts
  • treemaps
  • code flows
Let us see all these next!
8
Trend Analyzer
  • get data from a repository (SVN, CVS, CM/Synergy, ...)
  • use a simple 2D layout to show version attributes
  • answer questions by sorting, coloring, and clustering files

Let us see a simple demo!
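As a stand-in for the demo, a minimal sketch of such a 2D layout (synthetic data, hypothetical code, not the actual Trend Analyzer): files as rows, versions as columns, one metric mapped to color, with row sorting as the basic interaction.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
metric = rng.random((50, 15))            # 50 files x 15 versions (synthetic)

order = np.argsort(metric.sum(axis=1))   # the "sort files" interaction
plt.imshow(metric[order], aspect="auto", cmap="viridis")
plt.xlabel("time (version)")
plt.ylabel("file (sorted by total metric)")
plt.colorbar(label="metric value")
plt.show()
```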
9
Trend Analysis: Evolution at file level
[Figure: files (rows) vs. time/version (columns)]
  • unit of analysis: file (not finer-grained)
  • shows the evolution of 1..3 per-file metrics
  • correlates metric changes across files

10
Trend Analysis: Evolution at line level
[Figure: lines (rows) vs. time/version (columns)]
  • unit of analysis: individual lines (not coarser-grained)
  • shows insertions, deletions, and constant code blocks
  • cannot show drift/merge; sensitive to syntax details

11
Trend Analysis: Evolution at block level
[Figure: the WinDiff tool - line groups per version, with a detail view]
  • unit of analysis: line blocks (as detected by diff; see the sketch below)
  • shows insertions, deletions, constant blocks, drift
  • cannot handle more than 2 versions
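As a stand-in for such block-level diffing, a minimal sketch with Python's difflib (not WinDiff itself): the matching blocks give the constant code, and the gaps between them are the insertions and deletions.

```python
import difflib

old = ["int f() {", "  return 1;", "}"]
new = ["// comment", "int f() {", "  return 2;", "}"]

# SequenceMatcher works on any hashable items; here, whole lines.
sm = difflib.SequenceMatcher(a=old, b=new)
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, old[i1:i2], new[j1:j2])   # equal / replace / insert / delete
```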

12
Trend Analysis: Evolution at syntax level?
Goal: we would like a technique that
  1. handles all events: inserts, deletes, constants, merges, splits, drifts
  2. can handle more versions of a file (2..20)
  3. is insensitive to small or irrelevant program changes, e.g. comments, identifier renaming, declaration order
  4. works between the line and file level-of-detail, as specified by the user
Let's see next how to do this!
16
Code Matching
Idea: use code matching techniques [Auber et al., 07; Chevalier et al., 07]
  • given N versions of a file f1 ... fN
  • extract their syntax trees T1 ... TN
  • construct correspondences between all pairs Ti, Ti+1

1. hash all nodes u ∈ Ti, v ∈ Tj into equivalence classes, using the distance
   d(u,v) = dtyp(u,v) + dstr(u,v)
   where dtyp(u,v) is the type distance (0 if u, v have the same type, else 1) and
   dstr(u,v) = sqrt((d(u)-d(v))² + (m(u)-m(v))² + (s(u)-s(v))²) is the structural distance between the subtrees at u, v
2. find the best matches between subtrees in the same class
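A minimal sketch of the two steps in Python (the meaning of the node attributes d, m, s and the way dtyp and dstr combine into one score are assumptions): nodes are hashed into equivalence classes, then candidate pairs within a class are scored with the two distances.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Node:
    type: str       # syntactic type, e.g. "class", "for", "decl"
    d: int          # attribute d, e.g. depth in the syntax tree (assumed)
    m: int          # attribute m, e.g. number of children (assumed)
    s: int          # attribute s, e.g. subtree size (assumed)

def d_typ(u, v):    # type distance: 0 if same type, else 1
    return 0 if u.type == v.type else 1

def d_str(u, v):    # structural distance between the subtrees at u, v
    return math.sqrt((u.d - v.d)**2 + (u.m - v.m)**2 + (u.s - v.s)**2)

def equivalence_classes(nodes):   # step 1: hash nodes into classes by type
    classes = defaultdict(list)
    for n in nodes:
        classes[n.type].append(n)
    return classes

def best_match(u, candidates):    # step 2: best match within u's class
    return min(candidates, key=lambda v: d_typ(u, v) + d_str(u, v),
               default=None)

old = [Node("class", 1, 3, 40), Node("for", 2, 2, 10)]
new = [Node("class", 1, 3, 42), Node("for", 2, 2, 11), Node("decl", 3, 0, 1)]
classes = equivalence_classes(new)
for u in old:
    print(u.type, "->", best_match(u, classes[u.type]))
```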
17
Code Matching
Example:
  • two matches are found: class A and the for loop
  • matches between matched children are not considered
  • unmatched nodes represent insertions and deletions (E, H)

18
Visualization
  • OK, now we have the matches; how do we visualize them?
  • draw the syntax trees using a cushioned icicle plot
  • compact usage of screen space
  • good for correspondence visualization (next)

[Figure: classical tree drawing vs. cushioned icicle plot]
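A minimal sketch of a plain icicle layout (without the cushion shading; the tree encoding is hypothetical): each node spans a horizontal extent proportional to its subtree size, children partition their parent's extent, and depth maps to the vertical axis.

```python
def size(node):                              # subtree size = number of nodes
    return 1 + sum(size(c) for c in node[1])

def icicle(node, x0, x1, depth, rects):
    """node = (label, [children]); emits (label, x0, x1, depth) rectangles."""
    label, children = node
    rects.append((label, x0, x1, depth))
    total = sum(size(c) for c in children) or 1
    x = x0
    for c in children:
        w = (x1 - x0) * size(c) / total      # width proportional to subtree size
        icicle(c, x, x + w, depth + 1, rects)
        x += w
    return rects

tree = ("file", [("class A", [("f", []), ("g", [])]), ("for", [("body", [])])])
for rect in icicle(tree, 0.0, 1.0, 0, []):
    print(rect)
```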
19
Correspondence drawing
  • mirror icicle plots against previous and next
    version
  • connect matched nodes with spline tubes

20
Correspondence drawing
  • use a translucent cushion-like texture along the tubes
  • diminishes visual clutter
  • draw an opaque, 3-pixel fixed-width tube axis
  • guarantees visibility

[Figure: transparency texture vs. luminance texture]
21
Structure tracking
  • how to follow the evolution of a code fragment over N versions?
  • code tracking algorithm (sketched below):
  • connect each matched node with its children (dashed edges)
  • together with the correspondences, we now have a flow graph G
  • assign a color to each n ∈ Ti which is not matched in Ti-1
  • propagate colors downstream in G
  • at merges, mix colors weighted by tree size
  • repeat the process upstream from the sinks

[Figure: downstream propagation, then upstream propagation]
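A minimal sketch of the downstream pass (the graph encoding and the weighting details are assumptions): freshly appearing nodes seed new colors; colors flow along the edges of G, and at merges the incoming colors are mixed, weighted by the subtree size of each predecessor.

```python
def propagate_downstream(versions, preds, seed_colors, sizes):
    """versions: node lists, one per version, in time order.
    preds[n]: predecessor nodes of n in the flow graph G.
    seed_colors: RGB color for each node unmatched in the previous version.
    sizes[n]: subtree size of node n, used as a mixing weight at merges."""
    color = dict(seed_colors)
    for nodes in versions[1:]:                 # sweep versions in time order
        for n in nodes:
            sources = [p for p in preds.get(n, []) if p in color]
            if not sources:
                continue                       # keeps its seed color, if any
            w = [sizes[p] for p in sources]    # merge: mix weighted by size
            cols = [color[p] for p in sources]
            color[n] = tuple(sum(c[i] * wi for c, wi in zip(cols, w)) / sum(w)
                             for i in range(3))
    return color

# Two fragments "a" (red, large) and "b" (blue, small) merge into "ab":
mixed = propagate_downstream([["a", "b"], ["ab"]], {"ab": ["a", "b"]},
                             {"a": (1, 0, 0), "b": (0, 0, 1)},
                             {"a": 30, "b": 10})
print(mixed["ab"])   # (0.75, 0.0, 0.25): the larger fragment dominates
```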
22
Results
original method [Chevalier et al.]
improved method
23
Results
original method [Chevalier et al.]
improved method
24
Visualizing events of interest
  • Insertions and deletions
  • appear as white gaps between the tubes
  • Splits and merges
  • detected via a labeling of code fragments over the positions 1 .. N/2 .. N
  • a split occurs from version i to i+1 if f > 5 and kmin ∈ {1, 2}, where
  • f > 5 means n, m are split apart
  • kmin ∈ {1, 2} means n, m are in the same code fragment
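A minimal sketch of this split test (the precise definitions of f and kmin are assumptions; the thresholds follow the slide): two fragments n, m are reported as a split from version i to i+1 when they were part of the same code fragment in version i but end up far apart in version i+1.

```python
def is_split(f_value, kmin_value, f_threshold=5, kmin_max=2):
    """f_value: how far apart n, m end up in version i+1 (f > 5 = split apart).
    kmin_value: separation of n, m in version i (1 or 2 = same fragment)."""
    return f_value > f_threshold and kmin_value <= kmin_max

print(is_split(f_value=8, kmin_value=1))   # True: the fragment was split
print(is_split(f_value=3, kmin_value=1))   # False: the pieces stayed close
```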
25
Example application
  • real-world C++ code base (6000 lines), 45 versions
  • zoom in on 6 versions of interest
  • complex constructions (e.g. templates) and evolution changes
  • added noise: random identifier renaming, spaces, layout changes
  • Visual enhancements:
  • color matched code fragments in gray
  • mark splits and merges with icons

26
Example application
27
Example application
code shrinks by 10%
28
Example application
a method f gets split
a small fragment drifts to the end
29
Example application
surviving code of f
a method f gets split
30
Example application
f now undergoes many changes
surviving code of f
but stays constant from now on
a method f gets split
31
Example application
two fragments get swapped
32
Example application
and they get swapped once more
33
Example application
and there is a third swap (if you look carefully)
34
Code flows: Summary
  • visualization and detection of code evolution events
  • emphasis on structure at the syntax level (between lines and files)
  • detects and shows evolution events (split, merge, drift, ...)
  • scales to thousands of lines, 10..20 versions
  • Future work:
  • multiscale visualization
  • add code metrics atop the structure
  • show more than just correspondence relations

35
Now a large-scale application
  • Situation:
  • client: an established embedded-software producer
  • product: 8 years of evolution (2002-2008)
  • 3.5 MLOC of source code (1881 files)
  • 1 MLOC of headers (2454 files)
  • 15 releases
  • 3 teams, 60 people (2 x EU, 1 x India)
  • in the end, the product failed to meet requests
  • Questions:
  • what happened right/wrong?
  • what can the SW archive tell us? (post-mortem)
  • can such lessons be used in the future?

36
Methodology
  • Create a number of data visualizations
  • try to spot attribute correlations and data trends
  • discuss the relevant images with the project team
  • For each visualization:
  • the team is invited to derive one or more findings: what can you read from it?
  • we present our own findings
  • we discuss the differences
37
a1. Team structure: Code ownership (findings)
[Color legend: number of developers, 1..8]
Red modules: contributions from more than 8 developers
38
a1. Team structure: Code ownership (findings)
OSPR1C1.c
PRDT1C3.c
INSCL1C2.c
SNSAD1C1.c
39
a2. Team structure: Team assignment
[Color legend: number of Modification Requests (MRs), 1..30]
Some modules have many red(dish) files
40
a2. Team structure: Team assignment
[Figure: modules grouped by Team A, Team B, Team C]
7 of the 11 red(dish) modules are assigned to the red team
41
a2. Team structure: Team assignment
[Figure: modules grouped by Team A, Team B, Team C]
Many strategic/problematic components (70%) are outsourced (to India, Team A). This team is responsible for many MRs!
42
a1. Product requirements: Impacted areas
[Figure: 329 files over time; MR-related check-ins marked; R1.3 start indicated]
Little increase in the file curve. Many check-ins in files that existed before R1.3 started.
43
a1. Product requirements Impacted areas
  • Few new files were added in R1.3; most activity/changes happened in old files
  • an indication of (too) long closure of maintenance requirements
44
a2. Product requirements: MR duration
[Figure: number of file commits over time, per MR id range (4000-5000, grouped in hundreds); e.g., commits referring to MRs with IDs in the range 4700-4800]
In mid-2008, activity related to MRs addressed in 2006-2008 still takes place
45
a2. Product requirements: MR duration
[Figure: closure duration per MR id]
  • MRs have historically had a (too) long duration
  • helps us empirically predict closure times for ongoing/future requirements
46
b1. SW Architecture: Dependency graph
[Figure: "uses" / "is used" dependency graph; uses = call, type, variable, macro, ...]
Most use-dependencies go to files within the IFACE module, basicfunctions, and platform (system headers)
47
b1. SW Architecture: Dependency graph
[Figure: the same graph without the IFACE, basicfunctions, and platform modules]
We discovered several unwanted dependencies
48
b1. SW Architecture: Dependency graph
Most module interaction takes place via the interface package (IFACE module) and via the basicfunctions and platform packages.
Yet, some modules are accessed directly, outside the interface domain (this is not desired).
49
b2. SW Architecture: Call graph
Many connections between each package and most other packages
50
b2. SW Architecture: Call graph
Showing only the call relations between modules that are mutually call-dependent:
  • many modules (in different packages) are mutually call-dependent
  • this is not an ideal situation

51
b2. SW Architecture: Call graph (findings)
High coupling at the package level.
No strict layering in the system: for example, the OSPR module is highly entangled with the rest of the system.
52
b3. SW Architecture: metrics
53
b3. SW Architecture: metrics (findings)
  • at the function level, evolution stability is achieved in terms of fan-in / fan-out
  • other typical complexity-related metrics also grow (sub)linearly in time
  • exploding size is likely not the cause of the maintenance/evolution difficulties
  • strengthens our belief: suboptimal team structure and SW architecture

54
c1. Source code: testing complexity
The average complexity per method is higher than 20.
Total complexity increased by 20% in R1.3.
55
c1. Source code: testing complexity (findings)
Module testing requires high effort to obtain good code coverage.
New tests have to be added and old tests updated: testing complexity increases.
56
c2. Source code: external duplication
Connections: module pairs that contain blocks of near-similar code of over 25 LOC.
Few connections, so little external duplication.
57
c2. Source code: internal duplication
[Color legend: number of duplicated blocks, 1..60]
Few modules have some red files: little internal duplication
58
c3. Source code metrics (LOT)
[Color legend: number of lines of text, 1..2500]
Some modules have a high percentage of red files
59
c3. Source code metrics (LOC)
[Color legend: number of lines of code, 1..1500]
The same modules have a high percentage of red files
60
c3. Source code metrics (McCabe)
[Color legend: McCabe complexity, 1..500]
The same modules have a high percentage of red files
61
c3. Source code metrics (findings)
  • file size in LOT is a good indicator of file size in LOC
  • which in turn is a good indicator of complexity, for relative assessments
  • this was noticed by other researchers too

LOT → LOC → McCabe complexity
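A quick sketch of how such a relative assessment can be checked (the metric values below are made up): correlate LOT, LOC, and McCabe per file; high coefficients justify using the cheaper metric as a proxy for the more expensive one.

```python
import numpy as np

lot    = np.array([2400, 300, 1200, 800, 150])   # lines of text per file
loc    = np.array([1500, 180,  900, 500,  90])   # lines of code per file
mccabe = np.array([ 420,  30,  250, 140,  20])   # McCabe complexity per file

print(np.corrcoef(lot, loc)[0, 1])      # LOT vs. LOC
print(np.corrcoef(loc, mccabe)[0, 1])   # LOC vs. McCabe
```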
62
c4. Source code: criticality
[Figure, four panels: a) number of MRs (1..30); b) average MR closure (1..90 days); c) average MR propagation across files (1..30); d) average MR propagation across teams (1..3)]
63
d1. Documentation
[Figure: documentation files over time - 854 doc/html files, 1688 other supporting files]
64
d1. Documentation
65
d2. Documentation
[Figure: activity heat map - files vs. time]
66
d2. Documentation
67
Conclusions
  • Software Evolution Analysis:
  • an extremely rich field, still only at its beginnings
  • a wealth of information sources
  • clear interest from both researchers and industry
  • Important points:
  • scalable, easy-to-use tools are absolutely essential
  • integrating multiple information types is hard
  • visual analytics is an excellent aid!

Thank you! a.c.telea@rug.nl