D18028.1775914810.diff
Index: head/share/man/man4/cpufreq.4
===================================================================
--- head/share/man/man4/cpufreq.4
+++ head/share/man/man4/cpufreq.4
@@ -24,7 +24,7 @@
.\"
.\" $FreeBSD$
.\"
-.Dd March 3, 2006
+.Dd January 22, 2020
.Dt CPUFREQ 4
.Os
.Sh NAME
@@ -85,6 +85,10 @@
.Bl -tag -width indent
.It Va dev.cpu.%d.freq
Current active CPU frequency in MHz.
+.It Va dev.cpu.%d.freq_driver
+The specific
+.Nm
+driver used by this CPU.
.It Va dev.cpu.%d.freq_levels
Currently available levels for the CPU (frequency/power usage).
Values are in units of MHz and milliwatts.
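The sysctls documented above can be inspected from a shell on a running FreeBSD system; a typical session might look like this (unit numbers and any output are illustrative, not taken from this review):

```shell
# Inspect the per-CPU cpufreq sysctls, including the new freq_driver node
# (FreeBSD only; requires a cpufreq-capable CPU).
sysctl dev.cpu.0.freq
sysctl dev.cpu.0.freq_driver
sysctl dev.cpu.0.freq_levels
```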
Index: head/share/man/man4/hwpstate_intel.4
===================================================================
--- head/share/man/man4/hwpstate_intel.4
+++ head/share/man/man4/hwpstate_intel.4
@@ -0,0 +1,89 @@
+.\"
+.\" Copyright (c) 2019 Intel Corporation
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice, this list of conditions and the following disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.\" $FreeBSD$
+.\"
+.Dd January 22, 2020
+.Dt HWPSTATE_INTEL 4
+.Os
+.Sh NAME
+.Nm hwpstate_intel
+.Nd Intel Speed Shift Technology driver
+.Sh SYNOPSIS
+To compile this driver into your kernel,
+place the following line in your kernel
+configuration file:
+.Bd -ragged -offset indent
+.Cd "device cpufreq"
+.Ed
+.Sh DESCRIPTION
+The
+.Nm
+driver provides support for hardware-controlled performance states on Intel
+platforms, also known as Intel Speed Shift Technology.
+.Sh LOADER TUNABLES
+.Bl -tag -width indent
+.It Va hint.hwpstate_intel.0.disabled
+Can be used to disable
+.Nm ,
+allowing other compatible drivers to manage performance states, like
+.Xr est 4 .
+.Pq default 0
+.El
+.Sh SYSCTL VARIABLES
+The following
+.Xr sysctl 8
+values are available:
+.Bl -tag -width indent
+.It Va dev.hwpstate_intel.%d.\%desc
+Describes the attached driver.
+.It dev.hwpstate_intel.0.%desc: Intel Speed Shift
+.It Va dev.hwpstate_intel.%d.\%driver
+Driver in use, always hwpstate_intel.
+.It dev.hwpstate_intel.0.%driver: hwpstate_intel
+.It Va dev.hwpstate_intel.%d.\%parent
+The CPU that is exposing these frequencies.
+For example
+.Va cpu0 .
+.It dev.hwpstate_intel.0.%parent: cpu0
+.It Va dev.hwpstate_intel.%d.epp
+Energy/Performance Preference.
+Valid values range from 0 to 100.
+Setting this field conveys a hint to the hardware regarding a preference towards
+performance (at value 0), energy efficiency (at value 100), or somewhere in
+between.
+.It dev.hwpstate_intel.0.epp: 0
+.El
+.Sh COMPATIBILITY
+.Nm
+is only present on Intel CPUs that support Speed Shift.
+.Sh SEE ALSO
+.Xr cpufreq 4
+.Rs
+.%T "Intel 64 and IA-32 Architectures Software Developer Manuals"
+.%U "http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html"
+.Re
+.Sh AUTHORS
+This manual page was written by
+.An D Scott Phillips Aq Mt scottph@FreeBSD.org .
Index: head/sys/conf/files.x86
===================================================================
--- head/sys/conf/files.x86
+++ head/sys/conf/files.x86
@@ -290,7 +290,8 @@
x86/bios/smbios.c optional smbios
x86/bios/vpd.c optional vpd
x86/cpufreq/est.c optional cpufreq
-x86/cpufreq/hwpstate.c optional cpufreq
+x86/cpufreq/hwpstate_amd.c optional cpufreq
+x86/cpufreq/hwpstate_intel.c optional cpufreq
x86/cpufreq/p4tcc.c optional cpufreq
x86/cpufreq/powernow.c optional cpufreq
x86/iommu/busdma_dmar.c optional acpi acpi_dmar pci
Index: head/sys/kern/kern_cpu.c
===================================================================
--- head/sys/kern/kern_cpu.c
+++ head/sys/kern/kern_cpu.c
@@ -76,6 +76,7 @@
int all_count;
int max_mhz;
device_t dev;
+ device_t cf_drv_dev;
struct sysctl_ctx_list sysctl_ctx;
struct task startup_task;
struct cf_level *levels_buf;
@@ -142,6 +143,11 @@
SYSCTL_INT(_debug_cpufreq, OID_AUTO, verbose, CTLFLAG_RWTUN, &cf_verbose, 1,
"Print verbose debugging messages");
+/*
+ * This is called as the result of a hardware-specific frequency control driver
+ * calling cpufreq_register.  It provides a general interface for system-wide
+ * frequency control and operates on a per-CPU basis.
+ */
static int
cpufreq_attach(device_t dev)
{
@@ -149,7 +155,6 @@
struct pcpu *pc;
device_t parent;
uint64_t rate;
- int numdevs;
CF_DEBUG("initializing %s\n", device_get_nameunit(dev));
sc = device_get_softc(dev);
@@ -164,6 +169,7 @@
sc->max_mhz = cpu_get_nominal_mhz(dev);
/* If that fails, try to measure the current rate */
if (sc->max_mhz <= 0) {
+ CF_DEBUG("Unable to obtain nominal frequency.\n");
pc = cpu_get_pcpu(dev);
if (cpu_est_clockrate(pc->pc_cpuid, &rate) == 0)
sc->max_mhz = rate / 1000000;
@@ -171,15 +177,6 @@
sc->max_mhz = CPUFREQ_VAL_UNKNOWN;
}
- /*
- * Only initialize one set of sysctls for all CPUs. In the future,
- * if multiple CPUs can have different settings, we can move these
- * sysctls to be under every CPU instead of just the first one.
- */
- numdevs = devclass_get_count(cpufreq_dc);
- if (numdevs > 1)
- return (0);
-
CF_DEBUG("initializing one-time data for %s\n",
device_get_nameunit(dev));
sc->levels_buf = malloc(CF_MAX_LEVELS * sizeof(*sc->levels_buf),
@@ -216,7 +213,6 @@
{
struct cpufreq_softc *sc;
struct cf_saved_freq *saved_freq;
- int numdevs;
CF_DEBUG("shutdown %s\n", device_get_nameunit(dev));
sc = device_get_softc(dev);
@@ -227,12 +223,7 @@
free(saved_freq, M_TEMP);
}
- /* Only clean up these resources when the last device is detaching. */
- numdevs = devclass_get_count(cpufreq_dc);
- if (numdevs == 1) {
- CF_DEBUG("final shutdown for %s\n", device_get_nameunit(dev));
- free(sc->levels_buf, M_DEVBUF);
- }
+ free(sc->levels_buf, M_DEVBUF);
return (0);
}
@@ -422,25 +413,74 @@
}
static int
+cpufreq_get_frequency(device_t dev)
+{
+ struct cf_setting set;
+
+ if (CPUFREQ_DRV_GET(dev, &set) != 0)
+ return (-1);
+
+ return (set.freq);
+}
+
+/* Returns the index into *levels with the match */
+static int
+cpufreq_get_level(device_t dev, struct cf_level *levels, int count)
+{
+ int i, freq;
+
+ if ((freq = cpufreq_get_frequency(dev)) < 0)
+ return (-1);
+ for (i = 0; i < count; i++)
+ if (freq == levels[i].total_set.freq)
+ return (i);
+
+ return (-1);
+}
+
+/*
+ * Used by the cpufreq core, this function populates *level with the current
+ * frequency, normally taken from the cached value in sc->curr_level.  When
+ * the lower-level driver has set the CPUFREQ_FLAG_UNCACHED flag, the
+ * frequency is instead obtained from the driver itself.
+ */
+static int
cf_get_method(device_t dev, struct cf_level *level)
{
struct cpufreq_softc *sc;
struct cf_level *levels;
- struct cf_setting *curr_set, set;
+ struct cf_setting *curr_set;
struct pcpu *pc;
- device_t *devs;
- int bdiff, count, diff, error, i, n, numdevs;
+ int bdiff, count, diff, error, i, type;
uint64_t rate;
sc = device_get_softc(dev);
error = 0;
levels = NULL;
- /* If we already know the current frequency, we're done. */
+ /*
+ * If we already know the current frequency, and the driver didn't ask
+ * for uncached usage, we're done.
+ */
CF_MTX_LOCK(&sc->lock);
curr_set = &sc->curr_level.total_set;
- if (curr_set->freq != CPUFREQ_VAL_UNKNOWN) {
+ error = CPUFREQ_DRV_TYPE(sc->cf_drv_dev, &type);
+ if (error == 0 && (type & CPUFREQ_FLAG_UNCACHED)) {
+ struct cf_setting set;
+
+ /*
+ * If the driver wants to always report back the real frequency,
+ * first try the driver and if that fails, fall back to
+ * estimating.
+ */
+ if (CPUFREQ_DRV_GET(sc->cf_drv_dev, &set) != 0)
+ goto estimate;
+ sc->curr_level.total_set = set;
+ CF_DEBUG("get returning immediate freq %d\n", curr_set->freq);
+ goto out;
+ } else if (curr_set->freq != CPUFREQ_VAL_UNKNOWN) {
CF_DEBUG("get returning known freq %d\n", curr_set->freq);
+ error = 0;
goto out;
}
CF_MTX_UNLOCK(&sc->lock);
@@ -461,11 +501,6 @@
free(levels, M_TEMP);
return (error);
}
- error = device_get_children(device_get_parent(dev), &devs, &numdevs);
- if (error) {
- free(levels, M_TEMP);
- return (error);
- }
/*
* Reacquire the lock and search for the given level.
@@ -476,24 +511,21 @@
* The estimation code below catches this case though.
*/
CF_MTX_LOCK(&sc->lock);
- for (n = 0; n < numdevs && curr_set->freq == CPUFREQ_VAL_UNKNOWN; n++) {
- if (!device_is_attached(devs[n]))
- continue;
- if (CPUFREQ_DRV_GET(devs[n], &set) != 0)
- continue;
- for (i = 0; i < count; i++) {
- if (set.freq == levels[i].total_set.freq) {
- sc->curr_level = levels[i];
- break;
- }
- }
- }
- free(devs, M_TEMP);
+ i = cpufreq_get_level(sc->cf_drv_dev, levels, count);
+ if (i >= 0)
+ sc->curr_level = levels[i];
+ else
+ CF_DEBUG("Couldn't find supported level for %s\n",
+ device_get_nameunit(sc->cf_drv_dev));
+
if (curr_set->freq != CPUFREQ_VAL_UNKNOWN) {
CF_DEBUG("get matched freq %d from drivers\n", curr_set->freq);
goto out;
}
+estimate:
+ CF_MTX_ASSERT(&sc->lock);
+
/*
* We couldn't find an exact match, so attempt to estimate and then
* match against a level.
@@ -525,17 +557,82 @@
return (error);
}
+/*
+ * Either directly obtain settings from the cpufreq driver, or build a list of
+ * relative settings to be integrated later against an absolute max.
+ */
static int
+cpufreq_add_levels(device_t cf_dev, struct cf_setting_lst *rel_sets)
+{
+ struct cf_setting_array *set_arr;
+ struct cf_setting *sets;
+ device_t dev;
+ struct cpufreq_softc *sc;
+ int type, set_count, error;
+
+ sc = device_get_softc(cf_dev);
+ dev = sc->cf_drv_dev;
+
+ /* Skip devices that aren't ready. */
+ if (!device_is_attached(cf_dev))
+ return (0);
+
+ /*
+ * Get settings, skipping drivers that offer no settings or
+ * provide settings for informational purposes only.
+ */
+ error = CPUFREQ_DRV_TYPE(dev, &type);
+ if (error != 0 || (type & CPUFREQ_FLAG_INFO_ONLY)) {
+ if (error == 0) {
+ CF_DEBUG("skipping info-only driver %s\n",
+ device_get_nameunit(cf_dev));
+ }
+ return (error);
+ }
+
+ sets = malloc(MAX_SETTINGS * sizeof(*sets), M_TEMP, M_NOWAIT);
+ if (sets == NULL)
+ return (ENOMEM);
+
+ set_count = MAX_SETTINGS;
+ error = CPUFREQ_DRV_SETTINGS(dev, sets, &set_count);
+ if (error != 0 || set_count == 0)
+ goto out;
+
+ /* Add the settings to our absolute/relative lists. */
+ switch (type & CPUFREQ_TYPE_MASK) {
+ case CPUFREQ_TYPE_ABSOLUTE:
+ error = cpufreq_insert_abs(sc, sets, set_count);
+ break;
+ case CPUFREQ_TYPE_RELATIVE:
+ CF_DEBUG("adding %d relative settings\n", set_count);
+ set_arr = malloc(sizeof(*set_arr), M_TEMP, M_NOWAIT);
+ if (set_arr == NULL) {
+ error = ENOMEM;
+ goto out;
+ }
+ bcopy(sets, set_arr->sets, set_count * sizeof(*sets));
+ set_arr->count = set_count;
+ TAILQ_INSERT_TAIL(rel_sets, set_arr, link);
+ break;
+ default:
+ error = EINVAL;
+ }
+
+out:
+ free(sets, M_TEMP);
+ return (error);
+}
+
+static int
cf_levels_method(device_t dev, struct cf_level *levels, int *count)
{
struct cf_setting_array *set_arr;
struct cf_setting_lst rel_sets;
struct cpufreq_softc *sc;
struct cf_level *lev;
- struct cf_setting *sets;
struct pcpu *pc;
- device_t *devs;
- int error, i, numdevs, set_count, type;
+ int error, i;
uint64_t rate;
if (levels == NULL || count == NULL)
@@ -543,67 +640,21 @@
TAILQ_INIT(&rel_sets);
sc = device_get_softc(dev);
- error = device_get_children(device_get_parent(dev), &devs, &numdevs);
- if (error)
- return (error);
- sets = malloc(MAX_SETTINGS * sizeof(*sets), M_TEMP, M_NOWAIT);
- if (sets == NULL) {
- free(devs, M_TEMP);
- return (ENOMEM);
- }
- /* Get settings from all cpufreq drivers. */
CF_MTX_LOCK(&sc->lock);
- for (i = 0; i < numdevs; i++) {
- /* Skip devices that aren't ready. */
- if (!device_is_attached(devs[i]))
- continue;
+ error = cpufreq_add_levels(sc->dev, &rel_sets);
+ if (error)
+ goto out;
- /*
- * Get settings, skipping drivers that offer no settings or
- * provide settings for informational purposes only.
- */
- error = CPUFREQ_DRV_TYPE(devs[i], &type);
- if (error || (type & CPUFREQ_FLAG_INFO_ONLY)) {
- if (error == 0) {
- CF_DEBUG("skipping info-only driver %s\n",
- device_get_nameunit(devs[i]));
- }
- continue;
- }
- set_count = MAX_SETTINGS;
- error = CPUFREQ_DRV_SETTINGS(devs[i], sets, &set_count);
- if (error || set_count == 0)
- continue;
-
- /* Add the settings to our absolute/relative lists. */
- switch (type & CPUFREQ_TYPE_MASK) {
- case CPUFREQ_TYPE_ABSOLUTE:
- error = cpufreq_insert_abs(sc, sets, set_count);
- break;
- case CPUFREQ_TYPE_RELATIVE:
- CF_DEBUG("adding %d relative settings\n", set_count);
- set_arr = malloc(sizeof(*set_arr), M_TEMP, M_NOWAIT);
- if (set_arr == NULL) {
- error = ENOMEM;
- goto out;
- }
- bcopy(sets, set_arr->sets, set_count * sizeof(*sets));
- set_arr->count = set_count;
- TAILQ_INSERT_TAIL(&rel_sets, set_arr, link);
- break;
- default:
- error = EINVAL;
- }
- if (error)
- goto out;
- }
-
/*
* If there are no absolute levels, create a fake one at 100%. We
* then cache the clockrate for later use as our base frequency.
*/
if (TAILQ_EMPTY(&sc->all_levels)) {
+ struct cf_setting set;
+
+ CF_DEBUG("No absolute levels returned by driver\n");
+
if (sc->max_mhz == CPUFREQ_VAL_UNKNOWN) {
sc->max_mhz = cpu_get_nominal_mhz(dev);
/*
@@ -617,10 +668,10 @@
sc->max_mhz = rate / 1000000;
}
}
- memset(&sets[0], CPUFREQ_VAL_UNKNOWN, sizeof(*sets));
- sets[0].freq = sc->max_mhz;
- sets[0].dev = NULL;
- error = cpufreq_insert_abs(sc, sets, 1);
+ memset(&set, CPUFREQ_VAL_UNKNOWN, sizeof(set));
+ set.freq = sc->max_mhz;
+ set.dev = NULL;
+ error = cpufreq_insert_abs(sc, &set, 1);
if (error)
goto out;
}
@@ -665,8 +716,6 @@
TAILQ_REMOVE(&rel_sets, set_arr, link);
free(set_arr, M_TEMP);
}
- free(devs, M_TEMP);
- free(sets, M_TEMP);
return (error);
}
@@ -1011,11 +1060,24 @@
return (error);
}
+static void
+cpufreq_add_freq_driver_sysctl(device_t cf_dev)
+{
+ struct cpufreq_softc *sc;
+
+ sc = device_get_softc(cf_dev);
+ SYSCTL_ADD_CONST_STRING(&sc->sysctl_ctx,
+ SYSCTL_CHILDREN(device_get_sysctl_tree(cf_dev)), OID_AUTO,
+ "freq_driver", CTLFLAG_RD, device_get_nameunit(sc->cf_drv_dev),
+ "cpufreq driver used by this cpu");
+}
+
int
cpufreq_register(device_t dev)
{
struct cpufreq_softc *sc;
device_t cf_dev, cpu_dev;
+ int error;
/* Add a sysctl to get each driver's settings separately. */
SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
@@ -1031,6 +1093,7 @@
if ((cf_dev = device_find_child(cpu_dev, "cpufreq", -1))) {
sc = device_get_softc(cf_dev);
sc->max_mhz = CPUFREQ_VAL_UNKNOWN;
+ MPASS(sc->cf_drv_dev != NULL);
return (0);
}
@@ -1040,40 +1103,36 @@
return (ENOMEM);
device_quiet(cf_dev);
- return (device_probe_and_attach(cf_dev));
+ error = device_probe_and_attach(cf_dev);
+ if (error)
+ return (error);
+
+ sc = device_get_softc(cf_dev);
+ sc->cf_drv_dev = dev;
+ cpufreq_add_freq_driver_sysctl(cf_dev);
+ return (error);
}
int
cpufreq_unregister(device_t dev)
{
- device_t cf_dev, *devs;
- int cfcount, devcount, error, i, type;
+ device_t cf_dev;
+ struct cpufreq_softc *sc;
/*
* If this is the last cpufreq child device, remove the control
* device as well. We identify cpufreq children by calling a method
* they support.
*/
- error = device_get_children(device_get_parent(dev), &devs, &devcount);
- if (error)
- return (error);
cf_dev = device_find_child(device_get_parent(dev), "cpufreq", -1);
if (cf_dev == NULL) {
device_printf(dev,
"warning: cpufreq_unregister called with no cpufreq device active\n");
- free(devs, M_TEMP);
return (0);
}
- cfcount = 0;
- for (i = 0; i < devcount; i++) {
- if (!device_is_attached(devs[i]))
- continue;
- if (CPUFREQ_DRV_TYPE(devs[i], &type) == 0)
- cfcount++;
- }
- if (cfcount <= 1)
- device_delete_child(device_get_parent(cf_dev), cf_dev);
- free(devs, M_TEMP);
+ sc = device_get_softc(cf_dev);
+ MPASS(sc->cf_drv_dev == dev);
+ device_delete_child(device_get_parent(cf_dev), cf_dev);
return (0);
}
Index: head/sys/modules/cpufreq/Makefile
===================================================================
--- head/sys/modules/cpufreq/Makefile
+++ head/sys/modules/cpufreq/Makefile
@@ -11,7 +11,7 @@
.PATH: ${SRCTOP}/sys/x86/cpufreq
SRCS+= acpi_if.h opt_acpi.h
-SRCS+= est.c hwpstate.c p4tcc.c powernow.c
+SRCS+= est.c hwpstate_amd.c p4tcc.c powernow.c hwpstate_intel.c
.endif
.if ${MACHINE} == "i386"
Index: head/sys/sys/cpu.h
===================================================================
--- head/sys/sys/cpu.h
+++ head/sys/sys/cpu.h
@@ -120,11 +120,16 @@
* information about settings but rely on another machine-dependent driver
* for actually performing the frequency transition (e.g., ACPI performance
* states of type "functional fixed hardware.")
+ *
+ * The "uncached" flag tells CPUFREQ_DRV_GET to try obtaining the real
+ * instantaneous frequency from the underlying hardware regardless of cached
+ * state.  It is probably a bug not to combine this with "info only".
*/
#define CPUFREQ_TYPE_MASK 0xffff
#define CPUFREQ_TYPE_RELATIVE (1<<0)
#define CPUFREQ_TYPE_ABSOLUTE (1<<1)
#define CPUFREQ_FLAG_INFO_ONLY (1<<16)
+#define CPUFREQ_FLAG_UNCACHED (1<<17)
/*
* When setting a level, the caller indicates the priority of this request.
Index: head/sys/x86/cpufreq/est.c
===================================================================
--- head/sys/x86/cpufreq/est.c
+++ head/sys/x86/cpufreq/est.c
@@ -50,6 +50,8 @@
#include <dev/acpica/acpivar.h>
#include "acpi_if.h"
+#include <x86/cpufreq/hwpstate_intel_internal.h>
+
/* Status/control registers (from the IA-32 System Programming Guide). */
#define MSR_PERF_STATUS 0x198
#define MSR_PERF_CTL 0x199
@@ -898,6 +900,7 @@
static devclass_t est_devclass;
DRIVER_MODULE(est, cpu, est_driver, est_devclass, 0, 0);
+MODULE_DEPEND(est, hwpstate_intel, 1, 1, 1);
static int
est_features(driver_t *driver, u_int *features)
@@ -915,6 +918,15 @@
est_identify(driver_t *driver, device_t parent)
{
device_t child;
+
+ /*
+ * Defer to hwpstate if it is present. This priority logic
+ * should be replaced with normal newbus probing in the
+ * future.
+ */
+ intel_hwpstate_identify(NULL, parent);
+ if (device_find_child(parent, "hwpstate_intel", -1) != NULL)
+ return;
/* Make sure we're not being doubly invoked. */
if (device_find_child(parent, "est", -1) != NULL)
Index: head/sys/x86/cpufreq/hwpstate.c
===================================================================
--- head/sys/x86/cpufreq/hwpstate.c
+++ head/sys/x86/cpufreq/hwpstate.c
@@ -1,543 +0,0 @@
-/*-
- * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
- *
- * Copyright (c) 2005 Nate Lawson
- * Copyright (c) 2004 Colin Percival
- * Copyright (c) 2004-2005 Bruno Durcot
- * Copyright (c) 2004 FUKUDA Nobuhiko
- * Copyright (c) 2009 Michael Reifenberger
- * Copyright (c) 2009 Norikatsu Shigemura
- * Copyright (c) 2008-2009 Gen Otsuji
- *
- * This code is depending on kern_cpu.c, est.c, powernow.c, p4tcc.c, smist.c
- * in various parts. The authors of these files are Nate Lawson,
- * Colin Percival, Bruno Durcot, and FUKUDA Nobuhiko.
- * This code contains patches by Michael Reifenberger and Norikatsu Shigemura.
- * Thank you.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted providing that the following conditions
- * are met:
- * 1. Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- * 2. Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in the
- * documentation and/or other materials provided with the distribution.
- *
- * THIS SOFTWARE IS PROVIDED BY THE AUTHOR``AS IS'' AND ANY EXPRESS OR
- * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
- * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
- * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
- * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
- * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
- * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
- * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
- * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- * POSSIBILITY OF SUCH DAMAGE.
- */
-
-/*
- * For more info:
- * BIOS and Kernel Developer's Guide(BKDG) for AMD Family 10h Processors
- * 31116 Rev 3.20 February 04, 2009
- * BIOS and Kernel Developer's Guide(BKDG) for AMD Family 11h Processors
- * 41256 Rev 3.00 - July 07, 2008
- */
-
-#include <sys/cdefs.h>
-__FBSDID("$FreeBSD$");
-
-#include <sys/param.h>
-#include <sys/bus.h>
-#include <sys/cpu.h>
-#include <sys/kernel.h>
-#include <sys/module.h>
-#include <sys/malloc.h>
-#include <sys/proc.h>
-#include <sys/pcpu.h>
-#include <sys/smp.h>
-#include <sys/sched.h>
-
-#include <machine/md_var.h>
-#include <machine/cputypes.h>
-#include <machine/specialreg.h>
-
-#include <contrib/dev/acpica/include/acpi.h>
-
-#include <dev/acpica/acpivar.h>
-
-#include "acpi_if.h"
-#include "cpufreq_if.h"
-
-#define MSR_AMD_10H_11H_LIMIT 0xc0010061
-#define MSR_AMD_10H_11H_CONTROL 0xc0010062
-#define MSR_AMD_10H_11H_STATUS 0xc0010063
-#define MSR_AMD_10H_11H_CONFIG 0xc0010064
-
-#define AMD_10H_11H_MAX_STATES 16
-
-/* for MSR_AMD_10H_11H_LIMIT C001_0061 */
-#define AMD_10H_11H_GET_PSTATE_MAX_VAL(msr) (((msr) >> 4) & 0x7)
-#define AMD_10H_11H_GET_PSTATE_LIMIT(msr) (((msr)) & 0x7)
-/* for MSR_AMD_10H_11H_CONFIG 10h:C001_0064:68 / 11h:C001_0064:6B */
-#define AMD_10H_11H_CUR_VID(msr) (((msr) >> 9) & 0x7F)
-#define AMD_10H_11H_CUR_DID(msr) (((msr) >> 6) & 0x07)
-#define AMD_10H_11H_CUR_FID(msr) ((msr) & 0x3F)
-
-#define AMD_17H_CUR_VID(msr) (((msr) >> 14) & 0xFF)
-#define AMD_17H_CUR_DID(msr) (((msr) >> 8) & 0x3F)
-#define AMD_17H_CUR_FID(msr) ((msr) & 0xFF)
-
-#define HWPSTATE_DEBUG(dev, msg...) \
- do { \
- if (hwpstate_verbose) \
- device_printf(dev, msg); \
- } while (0)
-
-struct hwpstate_setting {
- int freq; /* CPU clock in Mhz or 100ths of a percent. */
- int volts; /* Voltage in mV. */
- int power; /* Power consumed in mW. */
- int lat; /* Transition latency in us. */
- int pstate_id; /* P-State id */
-};
-
-struct hwpstate_softc {
- device_t dev;
- struct hwpstate_setting hwpstate_settings[AMD_10H_11H_MAX_STATES];
- int cfnum;
-};
-
-static void hwpstate_identify(driver_t *driver, device_t parent);
-static int hwpstate_probe(device_t dev);
-static int hwpstate_attach(device_t dev);
-static int hwpstate_detach(device_t dev);
-static int hwpstate_set(device_t dev, const struct cf_setting *cf);
-static int hwpstate_get(device_t dev, struct cf_setting *cf);
-static int hwpstate_settings(device_t dev, struct cf_setting *sets, int *count);
-static int hwpstate_type(device_t dev, int *type);
-static int hwpstate_shutdown(device_t dev);
-static int hwpstate_features(driver_t *driver, u_int *features);
-static int hwpstate_get_info_from_acpi_perf(device_t dev, device_t perf_dev);
-static int hwpstate_get_info_from_msr(device_t dev);
-static int hwpstate_goto_pstate(device_t dev, int pstate_id);
-
-static int hwpstate_verbose;
-SYSCTL_INT(_debug, OID_AUTO, hwpstate_verbose, CTLFLAG_RWTUN,
- &hwpstate_verbose, 0, "Debug hwpstate");
-
-static int hwpstate_verify;
-SYSCTL_INT(_debug, OID_AUTO, hwpstate_verify, CTLFLAG_RWTUN,
- &hwpstate_verify, 0, "Verify P-state after setting");
-
-static device_method_t hwpstate_methods[] = {
- /* Device interface */
- DEVMETHOD(device_identify, hwpstate_identify),
- DEVMETHOD(device_probe, hwpstate_probe),
- DEVMETHOD(device_attach, hwpstate_attach),
- DEVMETHOD(device_detach, hwpstate_detach),
- DEVMETHOD(device_shutdown, hwpstate_shutdown),
-
- /* cpufreq interface */
- DEVMETHOD(cpufreq_drv_set, hwpstate_set),
- DEVMETHOD(cpufreq_drv_get, hwpstate_get),
- DEVMETHOD(cpufreq_drv_settings, hwpstate_settings),
- DEVMETHOD(cpufreq_drv_type, hwpstate_type),
-
- /* ACPI interface */
- DEVMETHOD(acpi_get_features, hwpstate_features),
-
- {0, 0}
-};
-
-static devclass_t hwpstate_devclass;
-static driver_t hwpstate_driver = {
- "hwpstate",
- hwpstate_methods,
- sizeof(struct hwpstate_softc),
-};
-
-DRIVER_MODULE(hwpstate, cpu, hwpstate_driver, hwpstate_devclass, 0, 0);
-
-/*
- * Go to Px-state on all cpus considering the limit.
- */
-static int
-hwpstate_goto_pstate(device_t dev, int id)
-{
- sbintime_t sbt;
- uint64_t msr;
- int cpu, i, j, limit;
-
- /* get the current pstate limit */
- msr = rdmsr(MSR_AMD_10H_11H_LIMIT);
- limit = AMD_10H_11H_GET_PSTATE_LIMIT(msr);
- if (limit > id)
- id = limit;
-
- cpu = curcpu;
- HWPSTATE_DEBUG(dev, "setting P%d-state on cpu%d\n", id, cpu);
- /* Go To Px-state */
- wrmsr(MSR_AMD_10H_11H_CONTROL, id);
-
- /*
- * We are going to the same Px-state on all cpus.
- * Probably should take _PSD into account.
- */
- CPU_FOREACH(i) {
- if (i == cpu)
- continue;
-
- /* Bind to each cpu. */
- thread_lock(curthread);
- sched_bind(curthread, i);
- thread_unlock(curthread);
- HWPSTATE_DEBUG(dev, "setting P%d-state on cpu%d\n", id, i);
- /* Go To Px-state */
- wrmsr(MSR_AMD_10H_11H_CONTROL, id);
- }
-
- /*
- * Verify whether each core is in the requested P-state.
- */
- if (hwpstate_verify) {
- CPU_FOREACH(i) {
- thread_lock(curthread);
- sched_bind(curthread, i);
- thread_unlock(curthread);
- /* wait loop (100*100 usec is enough ?) */
- for (j = 0; j < 100; j++) {
- /* get the result. not assure msr=id */
- msr = rdmsr(MSR_AMD_10H_11H_STATUS);
- if (msr == id)
- break;
- sbt = SBT_1MS / 10;
- tsleep_sbt(dev, PZERO, "pstate_goto", sbt,
- sbt >> tc_precexp, 0);
- }
- HWPSTATE_DEBUG(dev, "result: P%d-state on cpu%d\n",
- (int)msr, i);
- if (msr != id) {
- HWPSTATE_DEBUG(dev,
- "error: loop is not enough.\n");
- return (ENXIO);
- }
- }
- }
-
- return (0);
-}
-
-static int
-hwpstate_set(device_t dev, const struct cf_setting *cf)
-{
- struct hwpstate_softc *sc;
- struct hwpstate_setting *set;
- int i;
-
- if (cf == NULL)
- return (EINVAL);
- sc = device_get_softc(dev);
- set = sc->hwpstate_settings;
- for (i = 0; i < sc->cfnum; i++)
- if (CPUFREQ_CMP(cf->freq, set[i].freq))
- break;
- if (i == sc->cfnum)
- return (EINVAL);
-
- return (hwpstate_goto_pstate(dev, set[i].pstate_id));
-}
-
-static int
-hwpstate_get(device_t dev, struct cf_setting *cf)
-{
- struct hwpstate_softc *sc;
- struct hwpstate_setting set;
- uint64_t msr;
-
- sc = device_get_softc(dev);
- if (cf == NULL)
- return (EINVAL);
- msr = rdmsr(MSR_AMD_10H_11H_STATUS);
- if (msr >= sc->cfnum)
- return (EINVAL);
- set = sc->hwpstate_settings[msr];
-
- cf->freq = set.freq;
- cf->volts = set.volts;
- cf->power = set.power;
- cf->lat = set.lat;
- cf->dev = dev;
- return (0);
-}
-
-static int
-hwpstate_settings(device_t dev, struct cf_setting *sets, int *count)
-{
- struct hwpstate_softc *sc;
- struct hwpstate_setting set;
- int i;
-
- if (sets == NULL || count == NULL)
- return (EINVAL);
- sc = device_get_softc(dev);
- if (*count < sc->cfnum)
- return (E2BIG);
- for (i = 0; i < sc->cfnum; i++, sets++) {
- set = sc->hwpstate_settings[i];
- sets->freq = set.freq;
- sets->volts = set.volts;
- sets->power = set.power;
- sets->lat = set.lat;
- sets->dev = dev;
- }
- *count = sc->cfnum;
-
- return (0);
-}
-
-static int
-hwpstate_type(device_t dev, int *type)
-{
-
- if (type == NULL)
- return (EINVAL);
-
- *type = CPUFREQ_TYPE_ABSOLUTE;
- return (0);
-}
-
-static void
-hwpstate_identify(driver_t *driver, device_t parent)
-{
-
- if (device_find_child(parent, "hwpstate", -1) != NULL)
- return;
-
- if ((cpu_vendor_id != CPU_VENDOR_AMD || CPUID_TO_FAMILY(cpu_id) < 0x10) &&
- cpu_vendor_id != CPU_VENDOR_HYGON)
- return;
-
- /*
- * Check if hardware pstate enable bit is set.
- */
- if ((amd_pminfo & AMDPM_HW_PSTATE) == 0) {
- HWPSTATE_DEBUG(parent, "hwpstate enable bit is not set.\n");
- return;
- }
-
- if (resource_disabled("hwpstate", 0))
- return;
-
- if (BUS_ADD_CHILD(parent, 10, "hwpstate", -1) == NULL)
- device_printf(parent, "hwpstate: add child failed\n");
-}
-
-static int
-hwpstate_probe(device_t dev)
-{
- struct hwpstate_softc *sc;
- device_t perf_dev;
- uint64_t msr;
- int error, type;
-
- /*
- * Only hwpstate0.
- * It goes well with acpi_throttle.
- */
- if (device_get_unit(dev) != 0)
- return (ENXIO);
-
- sc = device_get_softc(dev);
- sc->dev = dev;
-
- /*
- * Check if acpi_perf has INFO only flag.
- */
- perf_dev = device_find_child(device_get_parent(dev), "acpi_perf", -1);
- error = TRUE;
- if (perf_dev && device_is_attached(perf_dev)) {
- error = CPUFREQ_DRV_TYPE(perf_dev, &type);
- if (error == 0) {
- if ((type & CPUFREQ_FLAG_INFO_ONLY) == 0) {
- /*
- * If acpi_perf doesn't have INFO_ONLY flag,
- * it will take care of pstate transitions.
- */
- HWPSTATE_DEBUG(dev, "acpi_perf will take care of pstate transitions.\n");
- return (ENXIO);
- } else {
- /*
- * If acpi_perf has INFO_ONLY flag, (_PCT has FFixedHW)
- * we can get _PSS info from acpi_perf
- * without going into ACPI.
- */
- HWPSTATE_DEBUG(dev, "going to fetch info from acpi_perf\n");
- error = hwpstate_get_info_from_acpi_perf(dev, perf_dev);
- }
- }
- }
-
- if (error == 0) {
- /*
- * Now we get _PSS info from acpi_perf without error.
- * Let's check it.
- */
- msr = rdmsr(MSR_AMD_10H_11H_LIMIT);
- if (sc->cfnum != 1 + AMD_10H_11H_GET_PSTATE_MAX_VAL(msr)) {
- HWPSTATE_DEBUG(dev, "MSR (%jd) and ACPI _PSS (%d)"
- " count mismatch\n", (intmax_t)msr, sc->cfnum);
- error = TRUE;
- }
- }
-
- /*
- * If we cannot get info from acpi_perf,
- * Let's get info from MSRs.
- */
- if (error)
- error = hwpstate_get_info_from_msr(dev);
- if (error)
- return (error);
-
- device_set_desc(dev, "Cool`n'Quiet 2.0");
- return (0);
-}
-
-static int
-hwpstate_attach(device_t dev)
-{
-
- return (cpufreq_register(dev));
-}
-
-static int
-hwpstate_get_info_from_msr(device_t dev)
-{
- struct hwpstate_softc *sc;
- struct hwpstate_setting *hwpstate_set;
- uint64_t msr;
- int family, i, fid, did;
-
- family = CPUID_TO_FAMILY(cpu_id);
- sc = device_get_softc(dev);
- /* Get pstate count */
- msr = rdmsr(MSR_AMD_10H_11H_LIMIT);
- sc->cfnum = 1 + AMD_10H_11H_GET_PSTATE_MAX_VAL(msr);
- hwpstate_set = sc->hwpstate_settings;
- for (i = 0; i < sc->cfnum; i++) {
- msr = rdmsr(MSR_AMD_10H_11H_CONFIG + i);
- if ((msr & ((uint64_t)1 << 63)) == 0) {
- HWPSTATE_DEBUG(dev, "msr is not valid.\n");
- return (ENXIO);
- }
- did = AMD_10H_11H_CUR_DID(msr);
- fid = AMD_10H_11H_CUR_FID(msr);
-
- /* Convert fid/did to frequency. */
- switch (family) {
- case 0x11:
- hwpstate_set[i].freq = (100 * (fid + 0x08)) >> did;
- break;
- case 0x10:
- case 0x12:
- case 0x15:
- case 0x16:
- hwpstate_set[i].freq = (100 * (fid + 0x10)) >> did;
- break;
- case 0x17:
- case 0x18:
- did = AMD_17H_CUR_DID(msr);
- if (did == 0) {
- HWPSTATE_DEBUG(dev, "unexpected did: 0\n");
- did = 1;
- }
- fid = AMD_17H_CUR_FID(msr);
- hwpstate_set[i].freq = (200 * fid) / did;
- break;
- default:
- HWPSTATE_DEBUG(dev, "get_info_from_msr: %s family"
- " 0x%02x CPUs are not supported yet\n",
- cpu_vendor_id == CPU_VENDOR_HYGON ? "Hygon" : "AMD",
- family);
- return (ENXIO);
- }
- hwpstate_set[i].pstate_id = i;
- /* There was volts calculation, but deleted it. */
- hwpstate_set[i].volts = CPUFREQ_VAL_UNKNOWN;
- hwpstate_set[i].power = CPUFREQ_VAL_UNKNOWN;
- hwpstate_set[i].lat = CPUFREQ_VAL_UNKNOWN;
- }
- return (0);
-}
-
-static int
-hwpstate_get_info_from_acpi_perf(device_t dev, device_t perf_dev)
-{
- struct hwpstate_softc *sc;
- struct cf_setting *perf_set;
- struct hwpstate_setting *hwpstate_set;
- int count, error, i;
-
- perf_set = malloc(MAX_SETTINGS * sizeof(*perf_set), M_TEMP, M_NOWAIT);
- if (perf_set == NULL) {
- HWPSTATE_DEBUG(dev, "nomem\n");
- return (ENOMEM);
- }
- /*
- * Fetch settings from acpi_perf.
- * Now it is attached, and has info only flag.
- */
- count = MAX_SETTINGS;
- error = CPUFREQ_DRV_SETTINGS(perf_dev, perf_set, &count);
- if (error) {
- HWPSTATE_DEBUG(dev, "error: CPUFREQ_DRV_SETTINGS.\n");
- goto out;
- }
- sc = device_get_softc(dev);
- sc->cfnum = count;
- hwpstate_set = sc->hwpstate_settings;
- for (i = 0; i < count; i++) {
- if (i == perf_set[i].spec[0]) {
- hwpstate_set[i].pstate_id = i;
- hwpstate_set[i].freq = perf_set[i].freq;
- hwpstate_set[i].volts = perf_set[i].volts;
- hwpstate_set[i].power = perf_set[i].power;
- hwpstate_set[i].lat = perf_set[i].lat;
- } else {
- HWPSTATE_DEBUG(dev, "ACPI _PSS object mismatch.\n");
- error = ENXIO;
- goto out;
- }
- }
-out:
- if (perf_set)
- free(perf_set, M_TEMP);
- return (error);
-}
-
-static int
-hwpstate_detach(device_t dev)
-{
-
- hwpstate_goto_pstate(dev, 0);
- return (cpufreq_unregister(dev));
-}
-
-static int
-hwpstate_shutdown(device_t dev)
-{
-
- /* hwpstate_goto_pstate(dev, 0); */
- return (0);
-}
-
-static int
-hwpstate_features(driver_t *driver, u_int *features)
-{
-
- /* Notify the ACPI CPU that we support direct access to MSRs */
- *features = ACPI_CAP_PERF_MSRS;
- return (0);
-}
Index: head/sys/x86/cpufreq/hwpstate_amd.c
===================================================================
--- head/sys/x86/cpufreq/hwpstate_amd.c
+++ head/sys/x86/cpufreq/hwpstate_amd.c
@@ -0,0 +1,543 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ *
+ * Copyright (c) 2005 Nate Lawson
+ * Copyright (c) 2004 Colin Percival
+ * Copyright (c) 2004-2005 Bruno Durcot
+ * Copyright (c) 2004 FUKUDA Nobuhiko
+ * Copyright (c) 2009 Michael Reifenberger
+ * Copyright (c) 2009 Norikatsu Shigemura
+ * Copyright (c) 2008-2009 Gen Otsuji
+ *
+ * This code is depending on kern_cpu.c, est.c, powernow.c, p4tcc.c, smist.c
+ * in various parts. The authors of these files are Nate Lawson,
+ * Colin Percival, Bruno Durcot, and FUKUDA Nobuhiko.
+ * This code contains patches by Michael Reifenberger and Norikatsu Shigemura.
+ * Thank you.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted providing that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+/*
+ * For more info:
+ * BIOS and Kernel Developer's Guide(BKDG) for AMD Family 10h Processors
+ * 31116 Rev 3.20 February 04, 2009
+ * BIOS and Kernel Developer's Guide(BKDG) for AMD Family 11h Processors
+ * 41256 Rev 3.00 - July 07, 2008
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/param.h>
+#include <sys/bus.h>
+#include <sys/cpu.h>
+#include <sys/kernel.h>
+#include <sys/module.h>
+#include <sys/malloc.h>
+#include <sys/proc.h>
+#include <sys/pcpu.h>
+#include <sys/smp.h>
+#include <sys/sched.h>
+
+#include <machine/md_var.h>
+#include <machine/cputypes.h>
+#include <machine/specialreg.h>
+
+#include <contrib/dev/acpica/include/acpi.h>
+
+#include <dev/acpica/acpivar.h>
+
+#include "acpi_if.h"
+#include "cpufreq_if.h"
+
+#define MSR_AMD_10H_11H_LIMIT 0xc0010061
+#define MSR_AMD_10H_11H_CONTROL 0xc0010062
+#define MSR_AMD_10H_11H_STATUS 0xc0010063
+#define MSR_AMD_10H_11H_CONFIG 0xc0010064
+
+#define AMD_10H_11H_MAX_STATES 16
+
+/* for MSR_AMD_10H_11H_LIMIT C001_0061 */
+#define AMD_10H_11H_GET_PSTATE_MAX_VAL(msr) (((msr) >> 4) & 0x7)
+#define AMD_10H_11H_GET_PSTATE_LIMIT(msr) (((msr)) & 0x7)
+/* for MSR_AMD_10H_11H_CONFIG 10h:C001_0064:68 / 11h:C001_0064:6B */
+#define AMD_10H_11H_CUR_VID(msr) (((msr) >> 9) & 0x7F)
+#define AMD_10H_11H_CUR_DID(msr) (((msr) >> 6) & 0x07)
+#define AMD_10H_11H_CUR_FID(msr) ((msr) & 0x3F)
+
+#define AMD_17H_CUR_VID(msr) (((msr) >> 14) & 0xFF)
+#define AMD_17H_CUR_DID(msr) (((msr) >> 8) & 0x3F)
+#define AMD_17H_CUR_FID(msr) ((msr) & 0xFF)
+
+#define HWPSTATE_DEBUG(dev, msg...) \
+ do { \
+ if (hwpstate_verbose) \
+ device_printf(dev, msg); \
+ } while (0)
+
+struct hwpstate_setting {
+	int	freq;	/* CPU clock in MHz or 100ths of a percent. */
+ int volts; /* Voltage in mV. */
+ int power; /* Power consumed in mW. */
+ int lat; /* Transition latency in us. */
+ int pstate_id; /* P-State id */
+};
+
+struct hwpstate_softc {
+ device_t dev;
+ struct hwpstate_setting hwpstate_settings[AMD_10H_11H_MAX_STATES];
+ int cfnum;
+};
+
+static void hwpstate_identify(driver_t *driver, device_t parent);
+static int hwpstate_probe(device_t dev);
+static int hwpstate_attach(device_t dev);
+static int hwpstate_detach(device_t dev);
+static int hwpstate_set(device_t dev, const struct cf_setting *cf);
+static int hwpstate_get(device_t dev, struct cf_setting *cf);
+static int hwpstate_settings(device_t dev, struct cf_setting *sets, int *count);
+static int hwpstate_type(device_t dev, int *type);
+static int hwpstate_shutdown(device_t dev);
+static int hwpstate_features(driver_t *driver, u_int *features);
+static int hwpstate_get_info_from_acpi_perf(device_t dev, device_t perf_dev);
+static int hwpstate_get_info_from_msr(device_t dev);
+static int hwpstate_goto_pstate(device_t dev, int pstate_id);
+
+static int hwpstate_verbose;
+SYSCTL_INT(_debug, OID_AUTO, hwpstate_verbose, CTLFLAG_RWTUN,
+ &hwpstate_verbose, 0, "Debug hwpstate");
+
+static int hwpstate_verify;
+SYSCTL_INT(_debug, OID_AUTO, hwpstate_verify, CTLFLAG_RWTUN,
+ &hwpstate_verify, 0, "Verify P-state after setting");
+
+static device_method_t hwpstate_methods[] = {
+ /* Device interface */
+ DEVMETHOD(device_identify, hwpstate_identify),
+ DEVMETHOD(device_probe, hwpstate_probe),
+ DEVMETHOD(device_attach, hwpstate_attach),
+ DEVMETHOD(device_detach, hwpstate_detach),
+ DEVMETHOD(device_shutdown, hwpstate_shutdown),
+
+ /* cpufreq interface */
+ DEVMETHOD(cpufreq_drv_set, hwpstate_set),
+ DEVMETHOD(cpufreq_drv_get, hwpstate_get),
+ DEVMETHOD(cpufreq_drv_settings, hwpstate_settings),
+ DEVMETHOD(cpufreq_drv_type, hwpstate_type),
+
+ /* ACPI interface */
+ DEVMETHOD(acpi_get_features, hwpstate_features),
+
+ {0, 0}
+};
+
+static devclass_t hwpstate_devclass;
+static driver_t hwpstate_driver = {
+ "hwpstate",
+ hwpstate_methods,
+ sizeof(struct hwpstate_softc),
+};
+
+DRIVER_MODULE(hwpstate, cpu, hwpstate_driver, hwpstate_devclass, 0, 0);
+
+/*
+ * Move all CPUs to the given Px-state, honoring the current P-state limit.
+ */
+static int
+hwpstate_goto_pstate(device_t dev, int id)
+{
+ sbintime_t sbt;
+ uint64_t msr;
+ int cpu, i, j, limit;
+
+ /* get the current pstate limit */
+ msr = rdmsr(MSR_AMD_10H_11H_LIMIT);
+ limit = AMD_10H_11H_GET_PSTATE_LIMIT(msr);
+ if (limit > id)
+ id = limit;
+
+ cpu = curcpu;
+ HWPSTATE_DEBUG(dev, "setting P%d-state on cpu%d\n", id, cpu);
+ /* Go To Px-state */
+ wrmsr(MSR_AMD_10H_11H_CONTROL, id);
+
+ /*
+ * We are going to the same Px-state on all cpus.
+ * Probably should take _PSD into account.
+ */
+ CPU_FOREACH(i) {
+ if (i == cpu)
+ continue;
+
+ /* Bind to each cpu. */
+ thread_lock(curthread);
+ sched_bind(curthread, i);
+ thread_unlock(curthread);
+ HWPSTATE_DEBUG(dev, "setting P%d-state on cpu%d\n", id, i);
+ /* Go To Px-state */
+ wrmsr(MSR_AMD_10H_11H_CONTROL, id);
+ }
+
+ /*
+ * Verify whether each core is in the requested P-state.
+ */
+ if (hwpstate_verify) {
+ CPU_FOREACH(i) {
+ thread_lock(curthread);
+ sched_bind(curthread, i);
+ thread_unlock(curthread);
+			/* Wait up to 100 * 100 us for the transition. */
+ for (j = 0; j < 100; j++) {
+				/* Read the status; it may not yet equal id. */
+ msr = rdmsr(MSR_AMD_10H_11H_STATUS);
+ if (msr == id)
+ break;
+ sbt = SBT_1MS / 10;
+ tsleep_sbt(dev, PZERO, "pstate_goto", sbt,
+ sbt >> tc_precexp, 0);
+ }
+ HWPSTATE_DEBUG(dev, "result: P%d-state on cpu%d\n",
+ (int)msr, i);
+ if (msr != id) {
+ HWPSTATE_DEBUG(dev,
+				    "error: P-state transition timed out.\n");
+ return (ENXIO);
+ }
+ }
+ }
+
+ return (0);
+}
+
+static int
+hwpstate_set(device_t dev, const struct cf_setting *cf)
+{
+ struct hwpstate_softc *sc;
+ struct hwpstate_setting *set;
+ int i;
+
+ if (cf == NULL)
+ return (EINVAL);
+ sc = device_get_softc(dev);
+ set = sc->hwpstate_settings;
+ for (i = 0; i < sc->cfnum; i++)
+ if (CPUFREQ_CMP(cf->freq, set[i].freq))
+ break;
+ if (i == sc->cfnum)
+ return (EINVAL);
+
+ return (hwpstate_goto_pstate(dev, set[i].pstate_id));
+}
+
+static int
+hwpstate_get(device_t dev, struct cf_setting *cf)
+{
+ struct hwpstate_softc *sc;
+ struct hwpstate_setting set;
+ uint64_t msr;
+
+ sc = device_get_softc(dev);
+ if (cf == NULL)
+ return (EINVAL);
+ msr = rdmsr(MSR_AMD_10H_11H_STATUS);
+ if (msr >= sc->cfnum)
+ return (EINVAL);
+ set = sc->hwpstate_settings[msr];
+
+ cf->freq = set.freq;
+ cf->volts = set.volts;
+ cf->power = set.power;
+ cf->lat = set.lat;
+ cf->dev = dev;
+ return (0);
+}
+
+static int
+hwpstate_settings(device_t dev, struct cf_setting *sets, int *count)
+{
+ struct hwpstate_softc *sc;
+ struct hwpstate_setting set;
+ int i;
+
+ if (sets == NULL || count == NULL)
+ return (EINVAL);
+ sc = device_get_softc(dev);
+ if (*count < sc->cfnum)
+ return (E2BIG);
+ for (i = 0; i < sc->cfnum; i++, sets++) {
+ set = sc->hwpstate_settings[i];
+ sets->freq = set.freq;
+ sets->volts = set.volts;
+ sets->power = set.power;
+ sets->lat = set.lat;
+ sets->dev = dev;
+ }
+ *count = sc->cfnum;
+
+ return (0);
+}
+
+static int
+hwpstate_type(device_t dev, int *type)
+{
+
+ if (type == NULL)
+ return (EINVAL);
+
+ *type = CPUFREQ_TYPE_ABSOLUTE;
+ return (0);
+}
+
+static void
+hwpstate_identify(driver_t *driver, device_t parent)
+{
+
+ if (device_find_child(parent, "hwpstate", -1) != NULL)
+ return;
+
+ if ((cpu_vendor_id != CPU_VENDOR_AMD || CPUID_TO_FAMILY(cpu_id) < 0x10) &&
+ cpu_vendor_id != CPU_VENDOR_HYGON)
+ return;
+
+ /*
+ * Check if hardware pstate enable bit is set.
+ */
+ if ((amd_pminfo & AMDPM_HW_PSTATE) == 0) {
+ HWPSTATE_DEBUG(parent, "hwpstate enable bit is not set.\n");
+ return;
+ }
+
+ if (resource_disabled("hwpstate", 0))
+ return;
+
+ if (BUS_ADD_CHILD(parent, 10, "hwpstate", -1) == NULL)
+ device_printf(parent, "hwpstate: add child failed\n");
+}
+
+static int
+hwpstate_probe(device_t dev)
+{
+ struct hwpstate_softc *sc;
+ device_t perf_dev;
+ uint64_t msr;
+ int error, type;
+
+ /*
+	 * Attach only to hwpstate0; it coexists cleanly with
+	 * acpi_throttle.
+ */
+ if (device_get_unit(dev) != 0)
+ return (ENXIO);
+
+ sc = device_get_softc(dev);
+ sc->dev = dev;
+
+ /*
+ * Check if acpi_perf has INFO only flag.
+ */
+ perf_dev = device_find_child(device_get_parent(dev), "acpi_perf", -1);
+ error = TRUE;
+ if (perf_dev && device_is_attached(perf_dev)) {
+ error = CPUFREQ_DRV_TYPE(perf_dev, &type);
+ if (error == 0) {
+ if ((type & CPUFREQ_FLAG_INFO_ONLY) == 0) {
+ /*
+ * If acpi_perf doesn't have INFO_ONLY flag,
+ * it will take care of pstate transitions.
+ */
+ HWPSTATE_DEBUG(dev, "acpi_perf will take care of pstate transitions.\n");
+ return (ENXIO);
+ } else {
+ /*
+ * If acpi_perf has INFO_ONLY flag, (_PCT has FFixedHW)
+ * we can get _PSS info from acpi_perf
+ * without going into ACPI.
+ */
+ HWPSTATE_DEBUG(dev, "going to fetch info from acpi_perf\n");
+ error = hwpstate_get_info_from_acpi_perf(dev, perf_dev);
+ }
+ }
+ }
+
+ if (error == 0) {
+ /*
+		 * We fetched _PSS info from acpi_perf without error;
+		 * cross-check it against the MSR P-state count.
+ */
+ msr = rdmsr(MSR_AMD_10H_11H_LIMIT);
+ if (sc->cfnum != 1 + AMD_10H_11H_GET_PSTATE_MAX_VAL(msr)) {
+ HWPSTATE_DEBUG(dev, "MSR (%jd) and ACPI _PSS (%d)"
+ " count mismatch\n", (intmax_t)msr, sc->cfnum);
+ error = TRUE;
+ }
+ }
+
+ /*
+	 * If we could not get info from acpi_perf,
+	 * fall back to reading the MSRs directly.
+ */
+ if (error)
+ error = hwpstate_get_info_from_msr(dev);
+ if (error)
+ return (error);
+
+ device_set_desc(dev, "Cool`n'Quiet 2.0");
+ return (0);
+}
+
+static int
+hwpstate_attach(device_t dev)
+{
+
+ return (cpufreq_register(dev));
+}
+
+static int
+hwpstate_get_info_from_msr(device_t dev)
+{
+ struct hwpstate_softc *sc;
+ struct hwpstate_setting *hwpstate_set;
+ uint64_t msr;
+ int family, i, fid, did;
+
+ family = CPUID_TO_FAMILY(cpu_id);
+ sc = device_get_softc(dev);
+ /* Get pstate count */
+ msr = rdmsr(MSR_AMD_10H_11H_LIMIT);
+ sc->cfnum = 1 + AMD_10H_11H_GET_PSTATE_MAX_VAL(msr);
+ hwpstate_set = sc->hwpstate_settings;
+ for (i = 0; i < sc->cfnum; i++) {
+ msr = rdmsr(MSR_AMD_10H_11H_CONFIG + i);
+ if ((msr & ((uint64_t)1 << 63)) == 0) {
+ HWPSTATE_DEBUG(dev, "msr is not valid.\n");
+ return (ENXIO);
+ }
+ did = AMD_10H_11H_CUR_DID(msr);
+ fid = AMD_10H_11H_CUR_FID(msr);
+
+ /* Convert fid/did to frequency. */
+ switch (family) {
+ case 0x11:
+ hwpstate_set[i].freq = (100 * (fid + 0x08)) >> did;
+ break;
+ case 0x10:
+ case 0x12:
+ case 0x15:
+ case 0x16:
+ hwpstate_set[i].freq = (100 * (fid + 0x10)) >> did;
+ break;
+ case 0x17:
+ case 0x18:
+ did = AMD_17H_CUR_DID(msr);
+ if (did == 0) {
+ HWPSTATE_DEBUG(dev, "unexpected did: 0\n");
+ did = 1;
+ }
+ fid = AMD_17H_CUR_FID(msr);
+ hwpstate_set[i].freq = (200 * fid) / did;
+ break;
+ default:
+ HWPSTATE_DEBUG(dev, "get_info_from_msr: %s family"
+ " 0x%02x CPUs are not supported yet\n",
+ cpu_vendor_id == CPU_VENDOR_HYGON ? "Hygon" : "AMD",
+ family);
+ return (ENXIO);
+ }
+ hwpstate_set[i].pstate_id = i;
+		/* The voltage calculation was removed; report unknown values. */
+ hwpstate_set[i].volts = CPUFREQ_VAL_UNKNOWN;
+ hwpstate_set[i].power = CPUFREQ_VAL_UNKNOWN;
+ hwpstate_set[i].lat = CPUFREQ_VAL_UNKNOWN;
+ }
+ return (0);
+}
+
+static int
+hwpstate_get_info_from_acpi_perf(device_t dev, device_t perf_dev)
+{
+ struct hwpstate_softc *sc;
+ struct cf_setting *perf_set;
+ struct hwpstate_setting *hwpstate_set;
+ int count, error, i;
+
+ perf_set = malloc(MAX_SETTINGS * sizeof(*perf_set), M_TEMP, M_NOWAIT);
+ if (perf_set == NULL) {
+ HWPSTATE_DEBUG(dev, "nomem\n");
+ return (ENOMEM);
+ }
+ /*
+ * Fetch settings from acpi_perf.
+	 * It is attached at this point and has the INFO_ONLY flag.
+ */
+ count = MAX_SETTINGS;
+ error = CPUFREQ_DRV_SETTINGS(perf_dev, perf_set, &count);
+ if (error) {
+ HWPSTATE_DEBUG(dev, "error: CPUFREQ_DRV_SETTINGS.\n");
+ goto out;
+ }
+ sc = device_get_softc(dev);
+ sc->cfnum = count;
+ hwpstate_set = sc->hwpstate_settings;
+ for (i = 0; i < count; i++) {
+ if (i == perf_set[i].spec[0]) {
+ hwpstate_set[i].pstate_id = i;
+ hwpstate_set[i].freq = perf_set[i].freq;
+ hwpstate_set[i].volts = perf_set[i].volts;
+ hwpstate_set[i].power = perf_set[i].power;
+ hwpstate_set[i].lat = perf_set[i].lat;
+ } else {
+ HWPSTATE_DEBUG(dev, "ACPI _PSS object mismatch.\n");
+ error = ENXIO;
+ goto out;
+ }
+ }
+out:
+ if (perf_set)
+ free(perf_set, M_TEMP);
+ return (error);
+}
+
+static int
+hwpstate_detach(device_t dev)
+{
+
+ hwpstate_goto_pstate(dev, 0);
+ return (cpufreq_unregister(dev));
+}
+
+static int
+hwpstate_shutdown(device_t dev)
+{
+
+ /* hwpstate_goto_pstate(dev, 0); */
+ return (0);
+}
+
+static int
+hwpstate_features(driver_t *driver, u_int *features)
+{
+
+ /* Notify the ACPI CPU that we support direct access to MSRs */
+ *features = ACPI_CAP_PERF_MSRS;
+ return (0);
+}
Index: head/sys/x86/cpufreq/hwpstate_intel.c
===================================================================
--- head/sys/x86/cpufreq/hwpstate_intel.c
+++ head/sys/x86/cpufreq/hwpstate_intel.c
@@ -0,0 +1,516 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ *
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted providing that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/types.h>
+#include <sys/sbuf.h>
+#include <sys/module.h>
+#include <sys/systm.h>
+#include <sys/errno.h>
+#include <sys/param.h>
+#include <sys/kernel.h>
+#include <sys/bus.h>
+#include <sys/cpu.h>
+#include <sys/smp.h>
+#include <sys/proc.h>
+#include <sys/sched.h>
+
+#include <machine/cpu.h>
+#include <machine/md_var.h>
+#include <machine/cputypes.h>
+#include <machine/specialreg.h>
+
+#include <contrib/dev/acpica/include/acpi.h>
+
+#include <dev/acpica/acpivar.h>
+
+#include <x86/cpufreq/hwpstate_intel_internal.h>
+
+#include "acpi_if.h"
+#include "cpufreq_if.h"
+
+extern uint64_t tsc_freq;
+
+static int intel_hwpstate_probe(device_t dev);
+static int intel_hwpstate_attach(device_t dev);
+static int intel_hwpstate_detach(device_t dev);
+static int intel_hwpstate_suspend(device_t dev);
+static int intel_hwpstate_resume(device_t dev);
+
+static int intel_hwpstate_get(device_t dev, struct cf_setting *cf);
+static int intel_hwpstate_type(device_t dev, int *type);
+
+static device_method_t intel_hwpstate_methods[] = {
+ /* Device interface */
+ DEVMETHOD(device_identify, intel_hwpstate_identify),
+ DEVMETHOD(device_probe, intel_hwpstate_probe),
+ DEVMETHOD(device_attach, intel_hwpstate_attach),
+ DEVMETHOD(device_detach, intel_hwpstate_detach),
+ DEVMETHOD(device_suspend, intel_hwpstate_suspend),
+ DEVMETHOD(device_resume, intel_hwpstate_resume),
+
+ /* cpufreq interface */
+ DEVMETHOD(cpufreq_drv_get, intel_hwpstate_get),
+ DEVMETHOD(cpufreq_drv_type, intel_hwpstate_type),
+
+ DEVMETHOD_END
+};
+
+struct hwp_softc {
+ device_t dev;
+ bool hwp_notifications;
+ bool hwp_activity_window;
+ bool hwp_pref_ctrl;
+ bool hwp_pkg_ctrl;
+
+ uint64_t req; /* Cached copy of last request */
+
+ uint8_t high;
+ uint8_t guaranteed;
+ uint8_t efficient;
+ uint8_t low;
+};
+
+static devclass_t hwpstate_intel_devclass;
+static driver_t hwpstate_intel_driver = {
+ "hwpstate_intel",
+ intel_hwpstate_methods,
+ sizeof(struct hwp_softc),
+};
+
+DRIVER_MODULE(hwpstate_intel, cpu, hwpstate_intel_driver,
+ hwpstate_intel_devclass, NULL, NULL);
+
+static int
+intel_hwp_dump_sysctl_handler(SYSCTL_HANDLER_ARGS)
+{
+ device_t dev;
+ struct pcpu *pc;
+ struct sbuf *sb;
+ struct hwp_softc *sc;
+ uint64_t data, data2;
+ int ret;
+
+ sc = (struct hwp_softc *)arg1;
+ dev = sc->dev;
+
+ pc = cpu_get_pcpu(dev);
+ if (pc == NULL)
+ return (ENXIO);
+
+ sb = sbuf_new(NULL, NULL, 1024, SBUF_FIXEDLEN | SBUF_INCLUDENUL);
+ sbuf_putc(sb, '\n');
+ thread_lock(curthread);
+ sched_bind(curthread, pc->pc_cpuid);
+ thread_unlock(curthread);
+
+ rdmsr_safe(MSR_IA32_PM_ENABLE, &data);
+ sbuf_printf(sb, "CPU%d: HWP %sabled\n", pc->pc_cpuid,
+ ((data & 1) ? "En" : "Dis"));
+
+ if (data == 0) {
+ ret = 0;
+ goto out;
+ }
+
+ rdmsr_safe(MSR_IA32_HWP_CAPABILITIES, &data);
+ sbuf_printf(sb, "\tHighest Performance: %03lu\n", data & 0xff);
+ sbuf_printf(sb, "\tGuaranteed Performance: %03lu\n", (data >> 8) & 0xff);
+ sbuf_printf(sb, "\tEfficient Performance: %03lu\n", (data >> 16) & 0xff);
+ sbuf_printf(sb, "\tLowest Performance: %03lu\n", (data >> 24) & 0xff);
+
+ rdmsr_safe(MSR_IA32_HWP_REQUEST, &data);
+ if (sc->hwp_pkg_ctrl && (data & IA32_HWP_REQUEST_PACKAGE_CONTROL)) {
+ rdmsr_safe(MSR_IA32_HWP_REQUEST_PKG, &data2);
+ }
+
+ sbuf_putc(sb, '\n');
+
+#define pkg_print(x, name, offset) do { \
+ if (!sc->hwp_pkg_ctrl || (data & x) != 0) \
+ sbuf_printf(sb, "\t%s: %03lu\n", name, (data >> offset) & 0xff);\
+ else \
+ sbuf_printf(sb, "\t%s: %03lu\n", name, (data2 >> offset) & 0xff);\
+} while (0)
+
+ pkg_print(IA32_HWP_REQUEST_EPP_VALID,
+ "Requested Efficiency Performance Preference", 24);
+ pkg_print(IA32_HWP_REQUEST_DESIRED_VALID,
+ "Requested Desired Performance", 16);
+ pkg_print(IA32_HWP_REQUEST_MAXIMUM_VALID,
+ "Requested Maximum Performance", 8);
+ pkg_print(IA32_HWP_REQUEST_MINIMUM_VALID,
+ "Requested Minimum Performance", 0);
+#undef pkg_print
+
+ sbuf_putc(sb, '\n');
+
+out:
+ thread_lock(curthread);
+ sched_unbind(curthread);
+ thread_unlock(curthread);
+
+ ret = sbuf_finish(sb);
+ if (ret == 0)
+ ret = SYSCTL_OUT(req, sbuf_data(sb), sbuf_len(sb));
+ sbuf_delete(sb);
+
+ return (ret);
+}
+
+static inline int
+percent_to_raw(int x)
+{
+
+ MPASS(x <= 100 && x >= 0);
+ return (0xff * x / 100);
+}
+
+/*
+ * Given x * 10 in [0, 1000], round to the integer nearest x.
+ *
+ * This allows round-tripping nice human readable numbers through this
+ * interface. Otherwise, user-provided percentages such as 25, 50, 75 get
+ * rounded down to 24, 49, and 74, which is a bit ugly.
+ */
+static inline int
+round10(int xtimes10)
+{
+ return ((xtimes10 + 5) / 10);
+}
+
+static inline int
+raw_to_percent(int x)
+{
+ MPASS(x <= 0xff && x >= 0);
+ return (round10(x * 1000 / 0xff));
+}
+
+static int
+sysctl_epp_select(SYSCTL_HANDLER_ARGS)
+{
+ device_t dev;
+ struct pcpu *pc;
+ uint64_t requested;
+ uint32_t val;
+ int ret;
+
+ dev = oidp->oid_arg1;
+ pc = cpu_get_pcpu(dev);
+ if (pc == NULL)
+ return (ENXIO);
+
+ thread_lock(curthread);
+ sched_bind(curthread, pc->pc_cpuid);
+ thread_unlock(curthread);
+
+ rdmsr_safe(MSR_IA32_HWP_REQUEST, &requested);
+ val = (requested & IA32_HWP_REQUEST_ENERGY_PERFORMANCE_PREFERENCE) >> 24;
+ val = raw_to_percent(val);
+
+ MPASS(val >= 0 && val <= 100);
+
+ ret = sysctl_handle_int(oidp, &val, 0, req);
+ if (ret || req->newptr == NULL)
+ goto out;
+
+ if (val > 100) {
+ ret = EINVAL;
+ goto out;
+ }
+
+ val = percent_to_raw(val);
+
+ requested &= ~IA32_HWP_REQUEST_ENERGY_PERFORMANCE_PREFERENCE;
+ requested |= val << 24;
+
+ wrmsr_safe(MSR_IA32_HWP_REQUEST, requested);
+
+out:
+ thread_lock(curthread);
+ sched_unbind(curthread);
+ thread_unlock(curthread);
+
+ return (ret);
+}
+
+void
+intel_hwpstate_identify(driver_t *driver, device_t parent)
+{
+ uint32_t regs[4];
+
+ if (device_find_child(parent, "hwpstate_intel", -1) != NULL)
+ return;
+
+ if (cpu_vendor_id != CPU_VENDOR_INTEL)
+ return;
+
+ if (resource_disabled("hwpstate_intel", 0))
+ return;
+
+ /*
+ * Intel SDM 14.4.1 (HWP Programming Interfaces):
+ * The CPUID instruction allows software to discover the presence of
+	 * HWP support in an Intel processor. Specifically, executing the CPUID
+	 * instruction with EAX=06H as input returns 5 bit flags covering
+	 * the following aspects in bits 7 through 11 of CPUID.06H:EAX.
+ */
+
+ if (cpu_high < 6)
+ return;
+
+ /*
+ * Intel SDM 14.4.1 (HWP Programming Interfaces):
+ * Availability of HWP baseline resource and capability,
+ * CPUID.06H:EAX[bit 7]: If this bit is set, HWP provides several new
+ * architectural MSRs: IA32_PM_ENABLE, IA32_HWP_CAPABILITIES,
+ * IA32_HWP_REQUEST, IA32_HWP_STATUS.
+ */
+
+ do_cpuid(6, regs);
+ if ((regs[0] & CPUTPM1_HWP) == 0)
+ return;
+
+ if (BUS_ADD_CHILD(parent, 10, "hwpstate_intel", -1) == NULL)
+ return;
+
+ if (bootverbose)
+ device_printf(parent, "hwpstate registered\n");
+}
+
+static int
+intel_hwpstate_probe(device_t dev)
+{
+
+ device_set_desc(dev, "Intel Speed Shift");
+ return (BUS_PROBE_NOWILDCARD);
+}
+
+/* FIXME: Need to support PKG variant */
+static int
+set_autonomous_hwp(struct hwp_softc *sc)
+{
+ struct pcpu *pc;
+ device_t dev;
+ uint64_t caps;
+ int ret;
+
+ dev = sc->dev;
+
+ pc = cpu_get_pcpu(dev);
+ if (pc == NULL)
+ return (ENXIO);
+
+ thread_lock(curthread);
+ sched_bind(curthread, pc->pc_cpuid);
+ thread_unlock(curthread);
+
+ /* XXX: Many MSRs aren't readable until feature is enabled */
+ ret = wrmsr_safe(MSR_IA32_PM_ENABLE, 1);
+ if (ret) {
+ device_printf(dev, "Failed to enable HWP for cpu%d (%d)\n",
+ pc->pc_cpuid, ret);
+ goto out;
+ }
+
+ ret = rdmsr_safe(MSR_IA32_HWP_REQUEST, &sc->req);
+ if (ret)
+		goto out;
+
+ ret = rdmsr_safe(MSR_IA32_HWP_CAPABILITIES, &caps);
+ if (ret)
+		goto out;
+
+ sc->high = IA32_HWP_CAPABILITIES_HIGHEST_PERFORMANCE(caps);
+ sc->guaranteed = IA32_HWP_CAPABILITIES_GUARANTEED_PERFORMANCE(caps);
+ sc->efficient = IA32_HWP_CAPABILITIES_EFFICIENT_PERFORMANCE(caps);
+ sc->low = IA32_HWP_CAPABILITIES_LOWEST_PERFORMANCE(caps);
+
+ /* hardware autonomous selection determines the performance target */
+ sc->req &= ~IA32_HWP_DESIRED_PERFORMANCE;
+
+ /* enable HW dynamic selection of window size */
+ sc->req &= ~IA32_HWP_ACTIVITY_WINDOW;
+
+ /* IA32_HWP_REQUEST.Minimum_Performance = IA32_HWP_CAPABILITIES.Lowest_Performance */
+ sc->req &= ~IA32_HWP_MINIMUM_PERFORMANCE;
+ sc->req |= sc->low;
+
+ /* IA32_HWP_REQUEST.Maximum_Performance = IA32_HWP_CAPABILITIES.Highest_Performance. */
+ sc->req &= ~IA32_HWP_REQUEST_MAXIMUM_PERFORMANCE;
+ sc->req |= sc->high << 8;
+
+ ret = wrmsr_safe(MSR_IA32_HWP_REQUEST, sc->req);
+ if (ret) {
+ device_printf(dev,
+ "Failed to setup autonomous HWP for cpu%d (file a bug)\n",
+ pc->pc_cpuid);
+ }
+
+out:
+ thread_lock(curthread);
+ sched_unbind(curthread);
+ thread_unlock(curthread);
+
+ return (ret);
+}
+
+static int
+intel_hwpstate_attach(device_t dev)
+{
+ struct hwp_softc *sc;
+ uint32_t regs[4];
+ int ret;
+
+ sc = device_get_softc(dev);
+ sc->dev = dev;
+
+ do_cpuid(6, regs);
+ if (regs[0] & CPUTPM1_HWP_NOTIFICATION)
+ sc->hwp_notifications = true;
+ if (regs[0] & CPUTPM1_HWP_ACTIVITY_WINDOW)
+ sc->hwp_activity_window = true;
+ if (regs[0] & CPUTPM1_HWP_PERF_PREF)
+ sc->hwp_pref_ctrl = true;
+ if (regs[0] & CPUTPM1_HWP_PKG)
+ sc->hwp_pkg_ctrl = true;
+
+ ret = set_autonomous_hwp(sc);
+ if (ret)
+ return (ret);
+
+ SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
+ SYSCTL_STATIC_CHILDREN(_debug), OID_AUTO, device_get_nameunit(dev),
+ CTLTYPE_STRING | CTLFLAG_RD | CTLFLAG_SKIP,
+ sc, 0, intel_hwp_dump_sysctl_handler, "A", "");
+
+ SYSCTL_ADD_PROC(device_get_sysctl_ctx(dev),
+ SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), OID_AUTO,
+ "epp", CTLTYPE_INT | CTLFLAG_RWTUN, dev, sizeof(dev),
+ sysctl_epp_select, "I",
+ "Efficiency/Performance Preference "
+ "(range from 0, most performant, through 100, most efficient)");
+
+ return (cpufreq_register(dev));
+}
+
+static int
+intel_hwpstate_detach(device_t dev)
+{
+
+ return (cpufreq_unregister(dev));
+}
+
+static int
+intel_hwpstate_get(device_t dev, struct cf_setting *set)
+{
+ struct pcpu *pc;
+ uint64_t rate;
+ int ret;
+
+ if (set == NULL)
+ return (EINVAL);
+
+ pc = cpu_get_pcpu(dev);
+ if (pc == NULL)
+ return (ENXIO);
+
+ memset(set, CPUFREQ_VAL_UNKNOWN, sizeof(*set));
+ set->dev = dev;
+
+ ret = cpu_est_clockrate(pc->pc_cpuid, &rate);
+ if (ret == 0)
+ set->freq = rate / 1000000;
+
+ set->volts = CPUFREQ_VAL_UNKNOWN;
+ set->power = CPUFREQ_VAL_UNKNOWN;
+ set->lat = CPUFREQ_VAL_UNKNOWN;
+
+ return (0);
+}
+
+static int
+intel_hwpstate_type(device_t dev, int *type)
+{
+ if (type == NULL)
+ return (EINVAL);
+ *type = CPUFREQ_TYPE_ABSOLUTE | CPUFREQ_FLAG_INFO_ONLY | CPUFREQ_FLAG_UNCACHED;
+
+ return (0);
+}
+
+static int
+intel_hwpstate_suspend(device_t dev)
+{
+ return (0);
+}
+
+/*
+ * Redo a subset of set_autonomous_hwp on resume; untested. Without this,
+ * testers observed that on resume MSR_IA32_HWP_REQUEST was bogus.
+ */
+static int
+intel_hwpstate_resume(device_t dev)
+{
+ struct hwp_softc *sc;
+ struct pcpu *pc;
+ int ret;
+
+ sc = device_get_softc(dev);
+
+ pc = cpu_get_pcpu(dev);
+ if (pc == NULL)
+ return (ENXIO);
+
+ thread_lock(curthread);
+ sched_bind(curthread, pc->pc_cpuid);
+ thread_unlock(curthread);
+
+ ret = wrmsr_safe(MSR_IA32_PM_ENABLE, 1);
+ if (ret) {
+ device_printf(dev,
+ "Failed to enable HWP for cpu%d after suspend (%d)\n",
+ pc->pc_cpuid, ret);
+ goto out;
+ }
+
+ ret = wrmsr_safe(MSR_IA32_HWP_REQUEST, sc->req);
+ if (ret) {
+ device_printf(dev,
+ "Failed to setup autonomous HWP for cpu%d after suspend\n",
+ pc->pc_cpuid);
+ }
+
+out:
+ thread_lock(curthread);
+ sched_unbind(curthread);
+ thread_unlock(curthread);
+
+ return (ret);
+}
Index: head/sys/x86/cpufreq/hwpstate_intel_internal.h
===================================================================
--- head/sys/x86/cpufreq/hwpstate_intel_internal.h
+++ head/sys/x86/cpufreq/hwpstate_intel_internal.h
@@ -0,0 +1,35 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ *
+ * Copyright (c) 2018 Intel Corporation
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted providing that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR``AS IS'' AND ANY EXPRESS OR
+ * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
+ * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
+ * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ * $FreeBSD$
+ */
+
+#ifndef __X86_CPUFREQ_HWPSTATE_INTEL_INTERNAL_H
+#define __X86_CPUFREQ_HWPSTATE_INTEL_INTERNAL_H
+
+void intel_hwpstate_identify(driver_t *driver, device_t parent);
+
+#endif
D18028: Add support for Intel Speed Shift