Encoding is just one of those things that need to be done right. If done wrong, everything seems to be broken and nothing works. If done right, no one will notice. This makes dealing with encoding so annoying.
Nevertheless, we are quite lucky and most of the things are already really well-prepared. We only need to ensure that our documents are saved (and transmitted) with the right encoding. The right encoding is the one we specify. It could be anything, as long as it contains all characters we need, and as long as we stay consistent.
There are three important text encoding rules for HTML:
In this article we will have a closer look at all three rules in more detail, especially the second and the third. In the end we will also look at form encoding, which has nothing to do with text encoding directly, but does indirectly. We will see why there is some connection.
Either we know directly that our content should be delivered in some exotic encoding or we should just pick UTF-8. There are many reasons why we would want to use UTF-8. It is not a great format for storing characters in memory, but it is just wonderful as a basis for data exchange and content transmission. It is basically a no-brainer. Nevertheless, one of the more common mistakes is to save files without proper encoding. As there is no text without encoding, we should choose the encoding carefully.
Users of Sublime Text and most other text editors have probably never faced a problem with wrong encoding, since these editors save in UTF-8 by default. There are editors, mostly for the Windows platform, which a use different default format, e.g., Windows-1252.
Even in Sublime Text it is one of the more standard operations to change the encoding of the file. In the File menu we select Save with Encoding and select the one we want. That’s it!
In principle every more advanced editor should have such options. Sometimes they are contained in an advanced save menu. For instance, the editor for Microsoft’s Visual Studio triggers a special dialog after clicking Advanced Save Options… in the File menu.
We should make sure to use the right encoding. This will use the corresponding bytes for our content. UTF-8 has the major advantage of only requiring a single byte if we do not use a special character. At most 4 bytes per character are consumed. This is dynamic and makes UTF-8 an ideal format for text storage and transmission. The caveat is, however, that UTF-8 is not the best format for using strings from memory.
The HTTP protocol transmits data as plain text. Even if we decide to encode the transmitted content as GZip or if we use HTTPS, which encrypts the content, the underlying content is still just plain text. We’ve already learned that there is no such thing as just plain text. We always need to associate the content with some encoding to get a text representation.
An HTTP message is split in two parts. The upper part is called the headers. Separated by an empty line is the lower part: the body.
There are always at least two HTTP messages: a request and its associated response. Both types of messages share this structure. The body of a response is the content we want to transmit. The body of a request is only of interest for form submission, which we’ll care about later. If we want to provide some information on the encoding of the content, we have to supply some information in the header.
The following header tells the receiving side that the body contains a special text format called HTML, using the UTF-8 character set.
Content-Type: text/html; charset=utf-8
There is also the Content-Encoding
header. We can easily confuse the content encoding with the actual text encoding of the content. The former is used to specify encoding of the whole package, e.g. GZip, while the latter is used as an initial setting for, e.g., parsing the provided content.
If we care about the correctness of this step we have to make sure that our web server sends the correct header. Most web frameworks offer such an ability. In PHP we could write:
header('Content-Type:text/html; charset=utf-8');
In Node.js we may want to use the following, where res
is the variable representing the request:
res.setHeader('Content-Type', 'text/html; charset=utf-8');
The transmitted header will set the text scanner of the HTML input to the provided setting. In the case of the previous example we use UTF-8. But wait: Initial setting! There are many ways to override this. If the actual content is not UTF-8, the scanner may recognize this and change the setting. Such a change may be triggered by Byte-Order-Mark (known as BOM) detection or by finding encoding-specific patterns in the content. In contrast, the former looks for artificially prepended patterns.
Finally, the encoding may change due to our HTML code. This can only be changed once.
Once the DOM constructor hits a meta
tag, it will look for a charset
declaration. If one is found, the character set will be extracted. If we can extract it successfully and if the encoding is valid, we set the new encoding for scanning further characters. At this point the encoding will be frozen, and no further changes are possible.
There is just one caveat. To check if the previous scanning was alright, we need to compare the characters that have already been scanned with the characters that would have been scanned. Hence we need to see if changing the encoding earlier would have made some difference. If we find a difference, we need to restart the whole parsing procedure. Otherwise the whole DOM structure may be wrong up to this point.
As a consequence we’ve already learned two lessons:
<meta charset=utf-8>
(or some other encoding) tag as soon as possible.charset
attribute in HTML.Finally, a good starter for a boilerplate looks as follows. As we learned in the previous article, we can omit the head
and body
tags. The snippet does two things right: It uses the correct document type, and it selects the character set as soon as possible.
<!DOCTYPE html> <html lang="en"> <meta charset="utf-8"> <title>Title here</title> <!-- ... -->
The only remaining question is: What happens if I forget one of these three steps? Well, the first and third steps are the most important ones. The transmission is actually not that bad. If no initial encoding is given from the HTTP headers, the browser will select the initial encoding based on the user’s locale. With a German locale we get Windows-1252. This is actually the default for most countries. Some countries, like Poland or Hungary, select Latin2, also known as iso-8859-2.
In principle we do not have to worry about this initial encoding if we followed the best practices described earlier. ASCII is a subset of Unicode, and most of the listed encodings are actually just ASCII extensions to satisfy the specific needs of one or more countries. If we only use basic ASCII characters until the character set is specified, we should be fine.
Much more severe is a conflict between the stored / read or generated data, which is delivered to the client, and the statement in the meta
tag. If something went wrong we may see renderings like the following. This is not a pleasant user experience.
Coming back to determining the right encoding, there are many reasons why UTF-8 would be the best choice. Any other encoding should at least be sufficient for the characters we want to display. However, if we provide form input fields, we may be in trouble. At this point we do not control the characters that are used any more. Users are allowed to input anything here. Let’s see how we can control the encoding for form input.
A form is submitted with a certain encoding type, which is not the same as the encoding type of a server’s response, e.g. GZip. The form’s encoding type determines how the form is serialized before sending it to the server. It is particularly useful in conjunction with the HTTP verb.
Ordinary form submissions use POST
as HTTP verb, but GET
, PUT
and DELETE
are also common. Only POST
and PUT
are supposed to use the body for content transmission in the request. The browser will construct the content with respect to the choice of the enctype
attribute of the <form>
element, specifying the encoding type. The encoding type is transported by setting the Content-Type
header in the HTTP request.
There are three well-established encoding types:
The first and the second are quite similar, but they have subtle (and very important) differences. The third variant is the most powerful method. It even allows the transporting of arbitrary files as attachments.
The key difference between the first two types is that URL encoded form transmission percent-encodes all names and values, which is not done by plain text. The percent-encoding guarantees that the receiving side can distinguish between names and values. This guarantee does not exist with plain text form submission. The third variant uses a boundary string to separate the entries, which is unique by construction.
Let’s visualize the differences by submitting a simple form. The form contains the following code:
<input name="first" value="With spaces+signs"> <input name="second" value="Hällo Dü?"> <textarea name="third"> Multi lines rock</textarea>
Submitting the form without specifying any encoding type transmits the following body:
first=With+spaces%2Bsigns&second=H%E4llo+D%FC%3F&third=Multi%0D%0Alines%0D%0Arock
The URL encoding transforms the white-space characters to plus signs. Existing plus signs, like all “special” characters, are transformed by the percent-encoding rules. This especially applies to new lines, originally represented by \r\n
, which are now displayed as %0D%0A
.
Let’s see what the outcome for plain text encoding looks like.
first=With spaces+signs second=Hällo Dü? third=Multi lines rock
The pairs are split by new lines. This is especially problematic for multi-line content and may lead to incorrect representations.
In a way the multipart encoding combines the advantages of plain text submission with a defined boundary, which essentially solves the problems of the plain text encoding. The only drawback is the increased content length.
------WebKitFormBoundaryzQRASBvDO1bUB5Lp Content-Disposition: form-data; name="first" With spaces+signs ------WebKitFormBoundaryzQRASBvDO1bUB5Lp Content-Disposition: form-data; name="second" Hällo Dü? ------WebKitFormBoundaryzQRASBvDO1bUB5Lp Content-Disposition: form-data; name="third" Multi lines rock ------WebKitFormBoundaryzQRASBvDO1bUB5Lp--
The last two form encoding methods also displayed special characters exactly as we’ve entered them. Form transmission primarily uses the accept-charset
attribute of the corresponding <form>
element. If no such attribute is given, the encoding of the page is used. Again, setting the correct encoding is important.
In the future we will see a fourth encoding type, called application/json
. As the name suggests, it will pack the form content into a JSON string.
Choosing the right encoding can be as easy as just picking UTF-8. Typical problems can be avoided by using the same encoding consistently. Declaring the encoding during transport is certainly useful, although not required, especially if we follow best practices for placing a <meta>
element with the charset
attribute.
Form submission is a process that relies on the right encoding choice—not only for the text, but for the submission itself. In general we can always choose multipart/form-data
as enctype
, even though the default encoding type might be better (smaller) in most scenarios. In production we should never use text/plain
.
Create Modern Vue Apps Using Create-Vue and Vite
/Pros and Cons of Using WordPress
/How to Fix the “There Has Been a Critical Error in Your Website” Error in WordPress
/How To Fix The “There Has Been A Critical Error in Your Website” Error in WordPress
/How to Create a Privacy Policy Page in WordPress
/How Long Does It Take to Learn JavaScript?
/The Best Way to Deep Copy an Object in JavaScript
/Adding and Removing Elements From Arrays in JavaScript
/Create a JavaScript AJAX Post Request: With and Without jQuery
/5 Real-Life Uses for the JavaScript reduce() Method
/How to Enable or Disable a Button With JavaScript: jQuery vs. Vanilla
/How to Enable or Disable a Button With JavaScript: jQuery vs Vanilla
/Confirm Yes or No With JavaScript
/How to Change the URL in JavaScript: Redirecting
/15+ Best WordPress Twitter Widgets
/27 Best Tab and Accordion Widget Plugins for WordPress (Free & Premium)
/21 Best Tab and Accordion Widget Plugins for WordPress (Free & Premium)
/30 HTML Best Practices for Beginners
/31 Best WordPress Calendar Plugins and Widgets (With 5 Free Plugins)
/25 Ridiculously Impressive HTML5 Canvas Experiments
/How to Implement Email Verification for New Members
/How to Create a Simple Web-Based Chat Application
/30 Popular WordPress User Interface Elements
/Top 18 Best Practices for Writing Super Readable Code
/Best Affiliate WooCommerce Plugins Compared
/18 Best WordPress Star Rating Plugins
/10+ Best WordPress Twitter Widgets
/20+ Best WordPress Booking and Reservation Plugins
/Working With Tables in React: Part Two
/Best CSS Animations and Effects on CodeCanyon
/30 CSS Best Practices for Beginners
/How to Create a Custom WordPress Plugin From Scratch
/10 Best Responsive HTML5 Sliders for Images and Text… and 3 Free Options
/16 Best Tab and Accordion Widget Plugins for WordPress
/18 Best WordPress Membership Plugins and 5 Free Plugins
/25 Best WooCommerce Plugins for Products, Pricing, Payments and More
/10 Best WordPress Twitter Widgets
1 /12 Best Contact Form PHP Scripts for 2020
/20 Popular WordPress User Interface Elements
/10 Best WordPress Star Rating Plugins
/12 Best CSS Animations on CodeCanyon
/12 Best WordPress Booking and Reservation Plugins
/12 Elegant CSS Pricing Tables for Your Latest Web Project
/24 Best WordPress Form Plugins for 2020
/14 Best PHP Event Calendar and Booking Scripts
/Getting Started With Django: Newly Updated Course
/Create a Blog for Each Category or Department in Your WooCommerce Store
/8 Best WordPress Booking and Reservation Plugins
/Best Exit Popups for WordPress Compared
/Best Exit Popups for WordPress Compared
/11 Best Tab & Accordion WordPress Widgets & Plugins
/12 Best Tab & Accordion WordPress Widgets & Plugins
1 /New Course: Practical React Fundamentals
/Preview Our New Course on Angular Material
/Build Your Own CAPTCHA and Contact Form in PHP
/Object-Oriented PHP With Classes and Objects
/Best Practices for ARIA Implementation
/Accessible Apps: Barriers to Access and Getting Started With Accessibility
/Dramatically Speed Up Your React Front-End App Using Lazy Loading
/15 Best Modern JavaScript Admin Templates for React, Angular, and Vue.js
/15 Best Modern JavaScript Admin Templates for React, Angular and Vue.js
/19 Best JavaScript Admin Templates for React, Angular, and Vue.js
/New Course: Build an App With JavaScript and the MEAN Stack
/10 Best WordPress Facebook Widgets
13 /Hands-on With ARIA: Accessibility for eCommerce
/New eBooks Available for Subscribers
/Hands-on With ARIA: Homepage Elements and Standard Navigation
/Site Accessibility: Getting Started With ARIA
/How Secure Are Your JavaScript Open-Source Dependencies?
/New Course: Secure Your WordPress Site With SSL
/Testing Components in React Using Jest and Enzyme
/Testing Components in React Using Jest: The Basics
/15 Best PHP Event Calendar and Booking Scripts
/Create Interactive Gradient Animations Using Granim.js
/How to Build Complex, Large-Scale Vue.js Apps With Vuex
1 /Examples of Dependency Injection in PHP With Symfony Components
/Set Up Routing in PHP Applications Using the Symfony Routing Component
1 /A Beginner’s Guide to Regular Expressions in JavaScript
/Introduction to Popmotion: Custom Animation Scrubber
/Introduction to Popmotion: Pointers and Physics
/New Course: Connect to a Database With Laravel’s Eloquent ORM
/How to Create a Custom Settings Panel in WooCommerce
/Building the DOM faster: speculative parsing, async, defer and preload
1 /20 Useful PHP Scripts Available on CodeCanyon
3 /How to Find and Fix Poor Page Load Times With Raygun
/Introduction to the Stimulus Framework
/Single-Page React Applications With the React-Router and React-Transition-Group Modules
12 Best Contact Form PHP Scripts
1 /Getting Started With the Mojs Animation Library: The ShapeSwirl and Stagger Modules
/Getting Started With the Mojs Animation Library: The Shape Module
/Getting Started With the Mojs Animation Library: The HTML Module
/Project Management Considerations for Your WordPress Project
/8 Things That Make Jest the Best React Testing Framework
/Creating an Image Editor Using CamanJS: Layers, Blend Modes, and Events
/New Short Course: Code a Front-End App With GraphQL and React
/Creating an Image Editor Using CamanJS: Applying Basic Filters
/Creating an Image Editor Using CamanJS: Creating Custom Filters and Blend Modes
/Modern Web Scraping With BeautifulSoup and Selenium
/Challenge: Create a To-Do List in React
1 /Deploy PHP Web Applications Using Laravel Forge
/Getting Started With the Mojs Animation Library: The Burst Module
/10 Things Men Can Do to Support Women in Tech
/A Gentle Introduction to Higher-Order Components in React: Best Practices
/Challenge: Build a React Component
/A Gentle Introduction to HOC in React: Learn by Example
/A Gentle Introduction to Higher-Order Components in React
/Creating Pretty Popup Messages Using SweetAlert2
/Creating Stylish and Responsive Progress Bars Using ProgressBar.js
/18 Best Contact Form PHP Scripts for 2022
/How to Make a Real-Time Sports Application Using Node.js
/Creating a Blogging App Using Angular & MongoDB: Delete Post
/Set Up an OAuth2 Server Using Passport in Laravel
/Creating a Blogging App Using Angular & MongoDB: Edit Post
/Creating a Blogging App Using Angular & MongoDB: Add Post
/Introduction to Mocking in Python
/Creating a Blogging App Using Angular & MongoDB: Show Post
/Creating a Blogging App Using Angular & MongoDB: Home
/Creating a Blogging App Using Angular & MongoDB: Login
/Creating Your First Angular App: Implement Routing
/Persisted WordPress Admin Notices: Part 4
/Creating Your First Angular App: Components, Part 2
/Persisted WordPress Admin Notices: Part 3
/Creating Your First Angular App: Components, Part 1
/How Laravel Broadcasting Works
/Persisted WordPress Admin Notices: Part 2
/Create Your First Angular App: Storing and Accessing Data
/Persisted WordPress Admin Notices: Part 1
/Error and Performance Monitoring for Web & Mobile Apps Using Raygun
/Using Luxon for Date and Time in JavaScript
7 /How to Create an Audio Oscillator With the Web Audio API
/How to Cache Using Redis in Django Applications
/20 Essential WordPress Utilities to Manage Your Site
/Beginner’s Guide to Angular 4: HTTP
/Rapid Web Deployment for Laravel With GitHub, Linode, and RunCloud.io
/Beginners Guide to Angular 4: Routing
/Beginner’s Guide to Angular 4: Services
/Beginner’s Guide to Angular 4: Components
/Creating a Drop-Down Menu for Mobile Pages
/Introduction to Forms in Angular 4: Writing Custom Form Validators
/10 Best WordPress Booking & Reservation Plugins
/Getting Started With Redux: Connecting Redux With React
/Getting Started With Redux: Learn by Example
/Getting Started With Redux: Why Redux?
/Understanding Recursion With JavaScript
/How to Auto Update WordPress Salts
/How to Download Files in Python
/Eloquent Mutators and Accessors in Laravel
1 /10 Best HTML5 Sliders for Images and Text
/Creating a Task Manager App Using Ionic: Part 2
/Creating a Task Manager App Using Ionic: Part 1
/Introduction to Forms in Angular 4: Reactive Forms
/Introduction to Forms in Angular 4: Template-Driven Forms
/24 Essential WordPress Utilities to Manage Your Site
/25 Essential WordPress Utilities to Manage Your Site
/Get Rid of Bugs Quickly Using BugReplay
1 /Manipulating HTML5 Canvas Using Konva: Part 1, Getting Started
/10 Must-See Easy Digital Downloads Extensions for Your WordPress Site
/22 Best WordPress Booking and Reservation Plugins
/Understanding ExpressJS Routing
/15 Best WordPress Star Rating Plugins
/Creating Your First Angular App: Basics
/Inheritance and Extending Objects With JavaScript
/Introduction to the CSS Grid Layout With Examples
1Performant Animations Using KUTE.js: Part 5, Easing Functions and Attributes
Performant Animations Using KUTE.js: Part 4, Animating Text
/Performant Animations Using KUTE.js: Part 3, Animating SVG
/New Course: Code a Quiz App With Vue.js
/Performant Animations Using KUTE.js: Part 2, Animating CSS Properties
Performant Animations Using KUTE.js: Part 1, Getting Started
/10 Best Responsive HTML5 Sliders for Images and Text (Plus 3 Free Options)
/Single-Page Applications With ngRoute and ngAnimate in AngularJS
/Deferring Tasks in Laravel Using Queues
/Site Authentication in Node.js: User Signup and Login
/Working With Tables in React, Part Two
/Working With Tables in React, Part One
/How to Set Up a Scalable, E-Commerce-Ready WordPress Site Using ClusterCS
/New Course on WordPress Conditional Tags
/TypeScript for Beginners, Part 5: Generics
/Building With Vue.js 2 and Firebase
6 /Essential JavaScript Libraries and Frameworks You Should Know About
/Vue.js Crash Course: Create a Simple Blog Using Vue.js
/Build a React App With a Laravel RESTful Back End: Part 1, Laravel 5.5 API
/API Authentication With Node.js
/Beginner’s Guide to Angular: Routing
/Beginners Guide to Angular: Routing
/Beginner’s Guide to Angular: Services
/Beginner’s Guide to Angular: Components
/How to Create a Custom Authentication Guard in Laravel
/Learn Computer Science With JavaScript: Part 3, Loops
/Build Web Applications Using Node.js
/Learn Computer Science With JavaScript: Part 4, Functions
/Learn Computer Science With JavaScript: Part 2, Conditionals
/Create Interactive Charts Using Plotly.js, Part 5: Pie and Gauge Charts
/Create Interactive Charts Using Plotly.js, Part 4: Bubble and Dot Charts
/Create Interactive Charts Using Plotly.js, Part 3: Bar Charts
/Awesome JavaScript Libraries and Frameworks You Should Know About
/Create Interactive Charts Using Plotly.js, Part 2: Line Charts
/Bulk Import a CSV File Into MongoDB Using Mongoose With Node.js
/Build a To-Do API With Node, Express, and MongoDB
/Getting Started With End-to-End Testing in Angular Using Protractor
/TypeScript for Beginners, Part 4: Classes
/Object-Oriented Programming With JavaScript
/10 Best Affiliate WooCommerce Plugins Compared
/Stateful vs. Stateless Functional Components in React
/Make Your JavaScript Code Robust With Flow
/Build a To-Do API With Node and Restify
/Testing Components in Angular Using Jasmine: Part 2, Services
/Testing Components in Angular Using Jasmine: Part 1
/Creating a Blogging App Using React, Part 6: Tags
/React Crash Course for Beginners, Part 3
/React Crash Course for Beginners, Part 2
/React Crash Course for Beginners, Part 1
/Set Up a React Environment, Part 4
1 /Set Up a React Environment, Part 3
/New Course: Get Started With Phoenix
/Set Up a React Environment, Part 2
/Set Up a React Environment, Part 1
/Command Line Basics and Useful Tricks With the Terminal
/How to Create a Real-Time Feed Using Phoenix and React
/Build a React App With a Laravel Back End: Part 2, React
/Build a React App With a Laravel RESTful Back End: Part 1, Laravel 9 API
/Creating a Blogging App Using React, Part 5: Profile Page
/Pagination in CodeIgniter: The Complete Guide
/JavaScript-Based Animations Using Anime.js, Part 4: Callbacks, Easings, and SVG
/JavaScript-Based Animations Using Anime.js, Part 3: Values, Timeline, and Playback
/Learn to Code With JavaScript: Part 1, The Basics
/10 Elegant CSS Pricing Tables for Your Latest Web Project
/Getting Started With the Flux Architecture in React
/Getting Started With Matter.js: The Composites and Composite Modules
Getting Started With Matter.js: The Engine and World Modules
/10 More Popular HTML5 Projects for You to Use and Study
/Understand the Basics of Laravel Middleware
/Iterating Fast With Django & Heroku
/Creating a Blogging App Using React, Part 4: Update & Delete Posts
/Creating a jQuery Plugin for Long Shadow Design
/How to Register & Use Laravel Service Providers
2 /Unit Testing in React: Shallow vs. Static Testing
/Creating a Blogging App Using React, Part 3: Add & Display Post
/Creating a Blogging App Using React, Part 2: User Sign-Up
20 /Creating a Blogging App Using React, Part 1: User Sign-In
/Creating a Grocery List Manager Using Angular, Part 2: Managing Items
/9 Elegant CSS Pricing Tables for Your Latest Web Project
/Dynamic Page Templates in WordPress, Part 3
/Angular vs. React: 7 Key Features Compared
/Creating a Grocery List Manager Using Angular, Part 1: Add & Display Items
New eBooks Available for Subscribers in June 2017
/Create Interactive Charts Using Plotly.js, Part 1: Getting Started
/The 5 Best IDEs for WordPress Development (And Why)
/33 Popular WordPress User Interface Elements
/New Course: How to Hack Your Own App
/How to Install Yii on Windows or a Mac
/What Is a JavaScript Operator?
/How to Register and Use Laravel Service Providers
/
waly Good blog post. I absolutely love this…