Difference between revisions of "String Similarity Panel"

From ICANNWiki
Revision as of 15:49, 3 October 2012

The String Similarity Panel, also known as the String Similarity Examiners, is responsible for determining whether any gTLD strings are so similar that they are likely to cause significant confusion among Internet users. The panel compares new gTLD strings with any reserved name, existing TLD, or requested IDN ccTLD, and with other new gTLD string proposals. It also examines the IDN tables submitted by applicants. String similarity evaluation takes place during the initial evaluation phase of the new gTLD application review process.[1]

On February 25, 2009, the ICANN Board issued a call for Expressions of Interest (EOI) from individuals interested in becoming string similarity examiners.[2] ICANN selected InterConnect Communications, in partnership with University College London, to evaluate string similarity.[3]

It remains unclear whether many of the three-character new gTLD applications face a high probability of being deemed too similar to existing ccTLDs. According to the industry blog Domain Incite, 304 of 375 applications for three-letter gTLDs differ by only one character from one or more existing ccTLDs. In total, if a single additional character is enough to create similarity, there are 368 potential ccTLD/gTLD conflicts in the current application round. Furthermore, the visual similarity ratio between ccTLDs and gTLDs, as measured by ICANN's SWORD Algorithm, is generally only a few percentage points lower than in the case of TLDs that have already been rejected on confusing similarity grounds.[4]
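The "single additional character" test described above can be illustrated with a minimal sketch. It assumes one plausible reading of that test: a gTLD string conflicts with a ccTLD when deleting exactly one character from the gTLD yields the ccTLD. Domain Incite's exact counting method is not given here, and the function name and the sample ccTLD set are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Set


def cctlds_one_char_away(gtld: str, cctlds: Iterable[str]) -> Set[str]:
    """Return the ccTLDs obtainable by deleting one character from gtld.

    Assumption: "one additional character" means the gTLD is the ccTLD
    with a single character inserted somewhere.
    """
    gtld = gtld.lower()
    # Every string produced by removing exactly one character.
    candidates = {gtld[:i] + gtld[i + 1:] for i in range(len(gtld))}
    return candidates & {c.lower() for c in cctlds}


# Illustrative sample only, not the full ccTLD list.
SAMPLE_CCTLDS = {"co", "cm", "de", "uk"}

if __name__ == "__main__":
    print(sorted(cctlds_one_char_away("com", SAMPLE_CCTLDS)))
```

Under this reading, a three-letter string can collide with up to three different ccTLDs, which is consistent with the number of potential conflicts being close to the number of applications.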

Sword Algorithm

The Sword Algorithm is the string similarity assessment tool adopted by ICANN to automatically assess whether an applied-for gTLD is confusingly similar to a reserved name or existing TLD. SWORD, an international IT company specializing in verbal search algorithms, developed the tool to automate the process of examining the similarities of proposed and existing TLD strings. The tool is intended to provide an open, objective and predictable mechanism for determining the level of visual likeness between gTLDs.[5]

The String Similarity Panel is responsible for validating the results of the Sword Algorithm and determining whether two or more strings really have a level of visual similarity high enough to confuse users. The panel ultimately decides whether the strings should be put in a contention set or direct contention.[6]

The algorithm uses proprietary software that mathematically calculates the visual similarity of strings based on the length of the strings, the number of similar letters within sequences of two or more letters, the number of similar letters not in sequence, the number of dissimilar letters, and the length of common prefixes and suffixes if greater than one. The algorithm also uses an image recognition program that supports the most common characters in other languages, including Arabic, Chinese, Cyrillic, Devanagari, Greek, Japanese, Korean and Latin. It is capable of comparing cross-script strings under the same group of scripts.[7]
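The inputs listed above can be made concrete with a rough sketch. Because the Sword software is proprietary, this is not its implementation: it only extracts comparable features from two strings using Python's standard library, and it does not combine them into a score or attempt the image recognition step. The function name, the dictionary keys, and the use of `difflib` matching blocks to approximate "letters in sequence" are all assumptions made for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher
from os.path import commonprefix


def similarity_features(a: str, b: str) -> dict:
    """Extract illustrative features for two strings (not the real Sword score)."""
    a, b = a.lower(), b.lower()

    # Letters the two strings share, counted with multiplicity.
    shared = sum((Counter(a) & Counter(b)).values())

    # Shared letters that fall inside matching runs of two or more letters.
    blocks = SequenceMatcher(None, a, b).get_matching_blocks()
    in_sequence = sum(m.size for m in blocks if m.size >= 2)

    # commonprefix compares character by character, so it works on any strings.
    prefix = len(commonprefix([a, b]))
    suffix = len(commonprefix([a[::-1], b[::-1]]))

    return {
        "lengths": (len(a), len(b)),
        "in_sequence": in_sequence,
        "out_of_sequence": shared - in_sequence,
        "dissimilar_letters": len(a) + len(b) - 2 * shared,
        # Prefixes and suffixes only count when longer than one letter.
        "common_prefix": prefix if prefix > 1 else 0,
        "common_suffix": suffix if suffix > 1 else 0,
    }


if __name__ == "__main__":
    print(similarity_features("hotel", "hotels"))
    print(similarity_features("com", "cam"))
```

A real scoring tool would still need to weight these features and account for visually similar glyphs (for example, characters that look alike across scripts), which this sketch leaves out.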

The Sword Algorithm tool is available at https://icann.sword-group.com/algorithm/.

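References

1. What are the evaluation panels? http://www.new-gtld.ch/faq.php
2. ICANN Call for Expressions of Interest (EOIs) for New gTLD String Similarity Examiners. http://archive.icann.org/en/topics/new-gtlds/eoi-string-sim-25feb09-en.pdf
3. Preparing Evaluators for the New gTLD Application Process. http://newgtlds.icann.org/en/blog/preparing-evaluators-22nov11-en
4. String Similarity Panel, DomainIncite.com/pro. http://domainincite.com/pro/tag/string-similarity-panel/
5. String Similarity Assessment Tool. https://icann.sword-group.com/algorithm/
6. SWORD Fights and String Theory: Part 1. http://www.gtldstrategy.com/technical-details-vendor-advice/sword-fights-and-string-theory-part-1
7. About This Tool. https://icann.sword-group.com/algorithm/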