Buy vs Build: Clean copy-paste function cost estimate
Published March 29th, 2022
Everyone loves shortcuts – and ‘copy-and-paste’ is one of the best. It’s the most popular, simple method of moving and reproducing text (or other content) from a source to a destination. But it has its downsides.
Communications Specialist at Tiny
Despite its inherent promise of ease, too often things don't work out cleanly.
Unbeknown to you, instead of the content you’re moving looking the same in its destination (a rich text editor) as it did in its source (MSWord, Google Docs or Excel), extra HTML is carried across in the background – giving weird results. Your original formatting, images, styles and other attributes are lost, and hours are wasted either replicating what you’d already spent hours creating, or you’re waiting for dev team support to fix the weirdness.
The ctrl-c and ctrl-v function may feel like a reflex action, but building a best-in-class, feature-rich copy-paste feature that produces error-free content in your rich text editor (destination) is far from simple or straightforward. So what’s involved in building that functionality? And how much would it cost versus buying a third-party component and assembling it as part of your tech stack? Let’s find out.
What does an advanced copy-paste function do?
An advanced copy-paste plugin helps users cleanly transfer content from its source to the rich text editor (the destination). Ideally, it should automatically parse the content for security vulnerabilities, remove unnecessary style elements as well as generally clean up and modernize the background HTML.
At its most basic level, the tool cleans up pasted content to ensure it’s correct, accessible, secure and clean. What does that mean?
- Correct = it’s well-formed HTML and CSS.
For example, the plugin must ensure there are no tags that aren’t closed properly, and no tags nested inside other tags in ways that modern HTML standards don’t allow.
- Accessible = The content follows best practice guidelines regarding accessible HTML and is structured such that it can be read well by a screen reader.
- Secure = It’s been sanitized to prevent any potential security risks.
Both the sanitization and the parsing processes must themselves be secure.
- Clean = Extraneous HTML tags and CSS statements have been removed.
The plugin must ensure it’s the minimal (best practice) HTML and CSS required to correctly represent the information, thereby making editing it easier and reducing the risk of weird behavior.
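To make the ‘correct’ requirement more concrete, here is a minimal sketch of a check for unclosed or badly nested tags, using only the Python standard library. The names `NestingChecker` and `find_nesting_problems` are hypothetical, and the check is deliberately stricter than the HTML standard (which lets some end tags, such as `</p>` and `</li>`, be omitted). A production plugin would normally lean on a full HTML parser instead.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Collects tags that are left open or closed out of order."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tags
        self.problems = []   # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"stray </{tag}>")
            return
        # Anything opened after `tag` and still open was never closed.
        while self.open_tags[-1] != tag:
            self.problems.append(f"<{self.open_tags.pop()}> closed by </{tag}>")
        self.open_tags.pop()


def find_nesting_problems(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    leftovers = [f"<{tag}> never closed" for tag in reversed(checker.open_tags)]
    return checker.problems + leftovers
```

An empty list means the fragment passed this (simplified) check; anything else is a list of findings that the clean-up step would need to repair.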
Complexities of building an advanced copy-paste feature
Building a copy-paste feature looks simple, purely because most of us only see what’s on the screen (i.e. WYSIWYG functionality). But in the background, numerous things need to happen for the copying and pasting to deliver an error-free replication of the source material in its destination.
Things that need to be considered are:
For each source the feature handles, it needs to have inbuilt filtering that’s specific to that source (e.g. PowerPaste has filtering for MSWord and Excel, GDocs, general HTML, plain text and images).
Handling and updating paste sources is an ongoing challenge. This work continues throughout the feature’s life (both during maintenance and extensibility work), contributing greatly to its total cost of ownership (TCO) in both dollar cost and person-effort.
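Source-specific filtering starts with working out where the clipboard content came from. The sketch below shows one way that routing could look. The fingerprints are assumptions based on markup commonly seen in clipboard HTML (Office namespaces and `Mso` class names for MSOffice, a `docs-internal-guid-` id for GDocs); they are not documented contracts and can change without notice. The function name `detect_paste_source` is hypothetical, and image handling is left out.

```python
import re

# Assumed fingerprints, checked in order. Vendors can change these at any time.
SOURCE_MARKERS = [
    ("msoffice", re.compile(r"urn:schemas-microsoft-com:office|class=[\"']?Mso", re.I)),
    ("gdocs", re.compile(r"id=[\"']?docs-internal-guid-", re.I)),
]


def detect_paste_source(html, plain_text=""):
    """Return a label naming the filter that should handle this paste."""
    if html:
        for name, pattern in SOURCE_MARKERS:
            if pattern.search(html):
                return name
        return "html"  # fall back to the general HTML filter
    return "plain_text" if plain_text else "empty"
```

Each label would then select its own filter, which is where most of the ongoing effort described above ends up.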
Each source has different considerations:
Plain text and images are fairly simple to handle.
General HTML is simple at a basic level. But on a deeper level, there are browser inconsistencies and a lack of standardization around what counts as ‘good’ HTML.
Therefore, every website and web-based app may use slightly different HTML structures, so the copy-paste feature must set guidelines for:
- What it supports
- What it doesn’t support
- How best to handle those things that are unsupported
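Those three guidelines can be expressed as a policy table. The following is a minimal sketch under stated assumptions: the `POLICY` contents, the `DEFAULT_ACTION` and the `apply_policy` name are all hypothetical, attributes are discarded entirely, and dropped tags are assumed not to be void elements.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy table: "keep" the tag, "unwrap" it (keep only its
# children), or "drop" it together with everything inside it.
POLICY = {
    "p": "keep", "strong": "keep", "em": "keep", "ul": "keep", "li": "keep",
    "font": "unwrap", "span": "unwrap",
    "object": "drop", "form": "drop",
}
DEFAULT_ACTION = "unwrap"  # how unsupported (unlisted) tags are handled


class PolicyFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.dropping = None  # tag name of the dropped element we are inside
        self.depth = 0        # nesting depth of that same tag name

    def handle_starttag(self, tag, attrs):
        if self.dropping:
            if tag == self.dropping:
                self.depth += 1
            return
        action = POLICY.get(tag, DEFAULT_ACTION)
        if action == "drop":
            self.dropping, self.depth = tag, 1
        elif action == "keep":
            self.out.append(f"<{tag}>")  # attributes are discarded in this sketch

    def handle_endtag(self, tag):
        if self.dropping:
            if tag == self.dropping:
                self.depth -= 1
                if self.depth == 0:
                    self.dropping = None
            return
        if POLICY.get(tag, DEFAULT_ACTION) == "keep":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.dropping:
            self.out.append(escape(data, quote=False))


def apply_policy(html):
    parser = PolicyFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Keeping the policy as data rather than code makes it easier to publish what is and isn’t supported, and to change the handling of an unsupported tag later.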
GDocs HTML is specific to GDocs. It’s more akin to normal HTML than MSWord, but it’s also a webapp that can change at any time. It isn’t versioned, and it doesn’t have a changelog or release notes.
Therefore the copy-paste feature not only has to figure out how to support it initially, but then it also must be monitored on an ongoing basis for unannounced changes within GDocs.
Every MSOffice app has a different HTML structure, and each version of each Office app can have different HTML. In addition, Office Online is different to the desktop Office apps. The list of software that has different HTML is therefore extensive, and the copy-paste feature needs to define which versions, apps and platforms are supported (and for how long), and then handle the differences between them.
The copy-paste feature’s MSOffice parsing can also be built so that it adequately (but not perfectly) handles other Office apps – such as Outlook – and other versions, such as Word Online.
Word HTML has oddities (e.g. lists are paragraphs that are styled to look like lists), so it’s not a simple matter of just cleaning up the HTML by checking for non-closed tags etc. Instead, the copy-paste feature needs to entirely transform the content, to shift it from MSWord’s idea of HTML to standard HTML.
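As a rough illustration of that transformation, the sketch below groups consecutive ‘list-looking’ paragraphs into a real list. It assumes that Word marks list items with an `mso-list` style declaration and wraps the bullet glyph in `<![if !supportLists]>` conditional markup; treat both as assumptions about typical Word output. The function name is hypothetical, and the sketch ignores nesting levels, numbered lists and any content outside `<p>` elements.

```python
import re
from itertools import groupby

PARAGRAPH = re.compile(r"<p\b([^>]*)>(.*?)</p>", re.I | re.S)
# Assumption: the visible bullet glyph sits inside this conditional markup.
BULLET_MARKUP = re.compile(r"<!\[if !supportLists\]>.*?<!\[endif\]>", re.S)


def word_paragraphs_to_lists(html):
    """Turn runs of Word list paragraphs into <ul><li> markup."""
    blocks = []
    for attrs, body in PARAGRAPH.findall(html):
        is_item = "mso-list" in attrs.lower()
        blocks.append((is_item, BULLET_MARKUP.sub("", body).strip()))

    out = []
    # Consecutive paragraphs of the same kind are handled as one group.
    for is_item, group in groupby(blocks, key=lambda block: block[0]):
        bodies = [body for _, body in group]
        if is_item:
            items = "".join(f"<li>{body}</li>" for body in bodies)
            out.append(f"<ul>{items}</ul>")
        else:
            out.extend(f"<p>{body}</p>" for body in bodies)
    return "".join(out)
```

Even this toy version has to rebuild structure rather than just tidy tags, which is the point the paragraph above makes.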
MSWord also has an extensive list of features – from basic text to lists, tables to image editing (crop, rotate, etc.), from comments to fancy styling. Each of those features needs to be considered and paste source support either provided or not, with further areas of support considered and possibly added in later releases.
Finally, there are uncontrollable aspects such as MSWord using RTF data for images, and that the browsers all have limits on how much RTF data they can grab from the clipboard at a time. Past that point the browser refuses to paste if the user has copied a document that has too much RTF data (i.e. too many images, or the images are too big, etc.).
If you’re not careful, grabbing data off the clipboard, then sanitizing, parsing, and transforming it, and inserting it back into the DOM can be a slow process. Therefore, optimization needs to be considered and scoped from the beginning.
On the security side, the feature must:
- Handle clipboard data in a secure manner
- Prevent attacks via the copy-paste feature itself
- Prevent attacks via malicious content being copied in by an unknowing user.
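To illustrate the last point, here is a minimal sanitizing sketch that drops `<script>` and `<style>` elements, inline event handlers, and links with unexpected URL schemes. The tag sets, the scheme allowlist and the function names are assumptions made for this example. It is intentionally conservative (for instance, it also rejects `data:` URLs) and is not a substitute for a maintained, security-reviewed sanitizer.

```python
from html import escape
from html.parser import HTMLParser

DANGEROUS_TAGS = {"script", "style"}  # removed together with their content
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "wbr"}
URL_ATTRS = {"href", "src"}
SAFE_SCHEMES = ("http:", "https:", "mailto:")


def is_safe_attr(name, value):
    if name.startswith("on"):  # onclick, onerror, ...
        return False
    if name in URL_ATTRS:
        # Strip whitespace so "java\nscript:" style tricks are still caught.
        url = "".join((value or "").split()).lower()
        has_scheme = ":" in url.split("/")[0]
        if has_scheme and not url.startswith(SAFE_SCHEMES):
            return False
    return True


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skipping = None  # name of the dangerous tag being skipped

    def handle_starttag(self, tag, attrs):
        if self.skipping:
            return
        if tag in DANGEROUS_TAGS:
            self.skipping = tag
            return
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs if is_safe_attr(name, value)
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if self.skipping:
            if tag == self.skipping:
                self.skipping = None
            return
        if tag not in VOID_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = Sanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Re-serializing from parsed tokens, rather than deleting substrings from the original markup, means nothing reaches the editor unless the sanitizer has explicitly written it out.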
Different users may want different output, so settings (or modes) need to be built into the feature that control certain aspects (e.g. a ‘clean’ and ‘merge’ mode that removes or keeps CSS styles respectively).
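A ‘clean’ versus ‘merge’ setting could be sketched as follows. This is an illustration only: `apply_paste_mode` and the `MERGE_KEEP` allowlist are hypothetical, merge mode here keeps only a handful of properties rather than every style, and a regular expression over HTML is a shortcut that a real implementation would replace with a proper parser.

```python
import re

STYLE_ATTR = re.compile(r"""\s+style\s*=\s*("[^"]*"|'[^']*')""", re.I)
# Hypothetical allowlist of CSS properties that "merge" mode carries across.
MERGE_KEEP = {"color", "font-weight", "font-style", "text-align", "text-decoration"}


def apply_paste_mode(html, mode="clean"):
    """Remove inline styles ("clean") or keep an allowlisted subset ("merge")."""
    if mode not in ("clean", "merge"):
        raise ValueError(f"unknown paste mode: {mode!r}")

    def rewrite(match):
        if mode == "clean":
            return ""
        kept = []
        for declaration in match.group(1)[1:-1].split(";"):
            prop, sep, value = declaration.partition(":")
            if sep and prop.strip().lower() in MERGE_KEEP:
                kept.append(f"{prop.strip().lower()}: {value.strip()}")
        if not kept:
            return ""
        joined = "; ".join(kept)
        return f' style="{joined}"'

    return STYLE_ATTR.sub(rewrite, html)
```

For example, given `<p style="color: red; mso-margin-top-alt: auto">`, clean mode returns a bare `<p>`, while merge mode keeps only `color: red`.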
What functionality is crucial to an advanced copy-paste feature?
An advanced copy-paste function must:
- Perform security functions, as above
- Handle HTML, plain text and image paste:
– Technically, GDocs and MSWord could go via an HTML path since they do put HTML on the clipboard. It just might come out quite badly, and be hard for users to work with, since the HTML from these applications can be quite different to standard HTML.
- Stretch goals for an advanced copy-paste feature are:
– Specific source handling such as MSWord and GDocs
– Specific MS Office settings (as above)
What expertise is required to build a brilliant copy-paste function?
- Deep knowledge of modern HTML and CSS standards
– To be able to set guidelines and goals
– To be able to understand the various ways different applications and websites can represent content
- HTML and CSS parsing and transformation
- HTML, CSS and JS security and sanitisation
- Deep knowledge of each supported paste source – e.g. MSWord, GDocs – how each works, and knowledge of changes/updates made to each source
- Browser clipboard functionality, the various APIs, and the limitations thereof (such as RTF data limits)
Cost Estimate for an Advanced RTE Copy-paste Feature
Building an advanced copy-paste feature doesn’t start and end with the development and building phase. It also requires ongoing maintenance and extensibility work through the life of the feature.
All of these complexities and interactions need to be factored into the total cost of ownership (TCO) of the plugin.
COST ESTIMATE CURRENCY
All cost estimates quoted are in US$
This includes development/build work, maintenance and extensibility work
Build/Development – 1 x RTE Copy-paste Feature Cost Estimate (excl. core editor)
The estimated engineering requirements for building an advanced copy-paste feature, calculated using a normalized COCOMO Model*, are:
- A Senior Software Engineer
- Average salary rate (US$128,749** p/yr excluding oncosts, RSUs and bonuses)
- 39836 lines of code (The total LOC includes 23085 LOC for the plugin itself, as well as 16751 LOC for the dependent libraries that are maintained ongoing, as part of the feature)
- 169.1 person-months of effort over an 18.8-month schedule, using 11 developers
- Excluding ongoing maintenance and extensibility work
EQUALS = $1,814,399 in development cost
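The article doesn’t list the model parameters behind these figures. As an assumption, the sketch below uses the published COCOMO II.2000 equations with nominal scale factors and an effort adjustment factor of 1.0, which happens to land on the quoted effort and schedule. Treat it as one plausible reconstruction rather than the calculation actually used.

```python
# COCOMO II.2000 calibration constants (assumed; not stated in the article).
A, B, C, D = 2.94, 0.91, 3.67, 0.28
# Nominal ratings for PREC, FLEX, RESL, TEAM and PMAT (also an assumption).
NOMINAL_SCALE_FACTORS = [3.72, 3.04, 4.24, 3.29, 4.68]


def cocomo_ii(ksloc, scale_factors=NOMINAL_SCALE_FACTORS, eaf=1.0):
    """Return (effort in person-months, schedule in calendar months)."""
    exponent = B + 0.01 * sum(scale_factors)
    effort_pm = A * eaf * ksloc ** exponent
    schedule_months = C * effort_pm ** (D + 0.2 * (exponent - B))
    return effort_pm, schedule_months


ANNUAL_SALARY = 128_749                # US$, from the article
ksloc = (23_085 + 16_751) / 1000       # plugin plus dependent libraries

effort, schedule = cocomo_ii(ksloc)
cost = effort * ANNUAL_SALARY / 12     # salary only, no on-costs

print(f"{effort:.1f} person-months over {schedule:.1f} months")
print(f"development cost: ${cost:,.0f}")
```

Under these assumptions the effort and schedule round to 169.1 person-months and 18.8 months, and the cost comes out within a few dollars of the quoted $1,814,399. Dividing effort by schedule in this sketch gives an average of about nine people; the figure of 11 developers is taken as given from the article and is not derived here.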
[Figure: Advanced RTE Feature COCOMO Modeling. Software Development (Elaboration and Construction) results, showing total equivalent size, effort adjustment factor (EAF) and Acquisition Phase Distribution]
It should be noted that the above $1.8M estimate for a single advanced copy-paste feature excludes the additional support costs required to develop a product-ready feature:
A full time Senior Product Manager***
- During the Inception/Discovery Phase
- Throughout the 169.1 person-months of the project
A full time Senior Product Designer****
- 1 week during the project
Ongoing Maintenance – 1 x RTE Copy-paste Feature Cost Estimate (excl. core editor)
Engineering time and resources are required to both maintain and evolve the feature to keep pace with market, feature and related plugin changes.
For maintenance work, expect to dedicate a minimum of:
- 2 person-months full time per year, every year of the plugin’s life, of Senior Software Engineer** resources, to keep the copy-paste feature afloat.
- Average salary rate (US$128,749** per year excluding oncosts, RSUs, and bonuses) or $10,729.08 per person-month
- $21,458** per year in maintenance cost for this single copy-paste feature, ongoing
This work would include bug fixes, other tasks and various required maintenance. As with the build estimate, this figure excludes the following support costs:
- A full time Senior Product Manager*** throughout any maintenance work.
- A full time Senior Product Designer**** sporadically during maintenance work
What maintenance work is likely to be required?
As you can see, building your own copy-paste feature isn’t quite as easy as it seems on the surface, let alone keeping up with maintenance such as:
- Fixing bugs as users report them
- Browser changes to HTML, CSS and clipboard APIs
e.g. At one time Safari changed how it represented image data on the clipboard, and code needed to be added to specifically handle images on Safari
- Changes to HTML and CSS standards
- Changes to MSWord versions, GDocs, etc.
- Relevant security changes
e.g. New clipboard-based attacks
The additional ongoing maintenance, infrastructure and QA requirements for all rich text editor advanced features, include:
- Continual testing and keeping development infrastructure up to date.
This is the biggest consumer of engineering time and cost, to ensure the engineering team avoids the accumulation of technical debt (due to dependencies not being kept up to date and having to constantly play catchup).
- Having sufficient licensing across all supported platforms (e.g. all the versions of MSWord, for testing)
- Balancing the prioritization of plugin maintenance over primary features (productivity vs core product revenue).
Long-term Extensibility – 1 x RTE Copy-paste Feature Cost Estimate (excl. core editor)
Extensibility work isn’t always easy to predict or plan. However, the one certainty is that users will demand feature upgrades and extended functionality, as new technologies continue to develop.
For extensibility work, expect to dedicate a minimum of:
- 6 person-months of full time work every year of the plugin’s life, of Senior Software Engineer** resources, to account for larger changes in MS Word and Google Docs.
This is required due to Microsoft, Google and Apple regularly changing their apps, which frequently breaks the paste feature, and the dev team must be able to react quickly to keep pace.
- Average salary rate (US$128,749** per year excluding oncosts, RSUs, and bonuses) or $10,729.08 per person-month
- $64,374** per year in extensibility costs for this single copy-paste feature, ongoing
This extensibility work includes feature requests and extensions, as well as keeping pace with market developments and changes in user needs. As with the build estimate, this figure excludes the following support costs:
- A full time Senior Product Manager*** throughout any extensibility work.
- A full time Senior Product Designer**** sporadically during extensibility work.
What extension work is likely to be required?
- Adding support for new MSWord and GDocs features (that weren't supported in prior versions)
- Adding support for new paste sources (refer to the paste sources discussed earlier)
- Adding support and fixing bugs for new use cases
- Upgrading the feature when there is a major update to any of the underlying APIs or code within any of the document processors
TOTAL COST ESTIMATE for an RTE Copy-paste Feature (excl. core editor)
- $1,814,399 build/development cost
- $21,458** per year in maintenance cost, ongoing
- $64,374** annually in extensibility cost, ongoing
$1,900,231 TOTAL COST (build plus the first year of maintenance and extensibility)
Note: All estimates exclude on-costs, RSUs and bonuses
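Because the maintenance and extensibility costs recur every year, the total depends on how long the feature lives. The following sketch adds up the three line items as listed, using the article’s salary figure and person-month allowances. The function names are hypothetical, and the Senior Product Manager and Senior Product Designer costs are left out, as they are in the estimates above.

```python
BUILD_COST = 1_814_399            # US$, one-off
ANNUAL_SALARY = 128_749           # US$, Senior Software Engineer base salary
MAINTENANCE_PM_PER_YEAR = 2       # person-months
EXTENSIBILITY_PM_PER_YEAR = 6     # person-months


def annual_cost(person_months):
    """Whole-dollar cost of a yearly person-month allowance (truncated)."""
    return person_months * ANNUAL_SALARY // 12


def total_cost_of_ownership(years):
    """Build cost plus `years` of maintenance and extensibility work."""
    yearly = (annual_cost(MAINTENANCE_PM_PER_YEAR)
              + annual_cost(EXTENSIBILITY_PM_PER_YEAR))
    return BUILD_COST + years * yearly


for years in (1, 3, 5):
    print(f"{years} year(s): ${total_cost_of_ownership(years):,}")
```

With these inputs the yearly allowances come to $21,458 and $64,374, so the build plus one year of upkeep sums to $1,900,231, and each further year adds $85,832.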
* Using the Basic COCOMO Model (Accessed 1 July 2022)
**Average base salary for a Senior Software Engineer is US$128,749 per year in Silicon Valley, CA
*** Estimated base salary for a Senior Product Manager is US$157,340 per year in Silicon Valley, CA
****Estimated base salary for a Senior Product Designer is US$145,700 per year in Silicon Valley, CA
(All salary rates accessed 1 July 2022)
A person-month is equivalent to approximately 160 hours of labor, and is the amount of work performed by a single average worker in one month (i.e. a 12 person-month project will take 4 developers 3 months to finish). Dividing the total effort in person-months by twelve gives the effort in person-years.
Comparing copy-paste plug-ins
When comparing copy-paste features across the various rich text editors, not all of them are equal. Here’s a comprehensive breakdown of the copy-paste capabilities of three popular editors, to see which one works best for a specific use case: Under Pressure – PowerPaste
Comparing rich text editors
Choosing a rich text editor (RTE) to use within your SaaS product, web application, CMS, LMS, email marketing or internal workspace, isn’t a simple decision. Here’s a comprehensive side-by-side comparison of key rich text editors (updated twice yearly):